Proteomics in Systems Biology: Methods and Protocols (Methods in Molecular Biology, 2456) 1071621238, 9781071621233

This detailed book highlights the diverse techniques and applications of proteomics in an accessible, informative, and c

English Pages 382 [372] Year 2022

Table of contents :
Preface
Contents
Contributors
Chapter 1: Review of the Real and Sometimes Hidden Costs in Proteomics Experimental Workflows
1 Introduction
2 Quality Assurance of Samples
2.1 Sample Selection, Collection, and Storage
2.2 Ensuring High-Quality Samples to Begin with
2.3 How Much Data Is Really Needed?
3 Nanoflow LC: The Costly Achilles' Heel of Proteomics
4 Balancing Replication vs. Fractionation
5 Costs of Quantitation
6 Conclusions
References
Chapter 2: High-Throughput Mass Spectrometry-Based Proteomics with dia-PASEF
1 Introduction
2 Materials
2.1 Samples
2.2 Liquid Chromatography
2.3 Mass Spectrometry
2.4 Data Analysis
3 Methods
3.1 Instrument Setup and Ion Mobility Calibration
3.2 Setting up a dia-PASEF Acquisition Method
3.3 Setting up an LC-MS Acquisition Method
3.4 Acquire a Sequence of dia-PASEF Experiments
3.5 Data Processing and Expected Results
4 Notes
References
Chapter 3: Isolation of Detergent Insoluble Proteins from Mouse Brain Tissue for Quantitative Analysis Using Data Independent ...
1 Introduction
2 Materials
2.1 Tissue Lysis
2.2 Centrifugation
2.3 Trypsin Digestion of Proteins
2.4 Stage Tips
2.5 Fractionation of Peptide Library
2.6 DDA Library Generation
2.7 DIA Mass Spec
2.8 Data Analysis
3 Methods
3.1 Tissue Lysis
3.2 Centrifugation and Protein Concentration Measurement
3.3 Protein Digestion and Peptide Elution
3.4 Stage Tipping
3.4.1 Prepare Stage Tips
3.4.2 Conditioning of Stage Tips
3.4.3 Stage Tipping of Samples
3.5 Library Generation
3.6 Mass Spectrometry Analysis of Library Using Data Dependent Acquisition (DDA)
3.7 Mass Spectrometry Analysis of Samples (Data Independent Acquisition, DIA)
3.8 Spectral Library Generation
3.9 Data Analysis of DIA Samples
3.10 directDIA Analysis
3.11 Analysis of Pooled Standards
3.12 Results: Reproducibility of DIA Workflow
4 Notes
References
Chapter 4: Rodent Lung Tissue Sample Preparation and Processing for Shotgun Proteomics
1 Introduction
2 Materials
2.1 Tissue Homogenization and Protein Extraction
2.2 Total Protein Quantitation
2.3 SDS-PAGE
2.4 Protein Reduction and Alkylation
2.5 Protein Clean Up Using Single-Pot, Solid-Phase-Enhanced Sample Preparation (SP3)
2.6 Trypsin Digestion
2.7 Peptide Clean Up
2.8 Instruments
3 Methods
3.1 Tissue Homogenization
3.2 Protein Quantification
3.3 Quality Assessment
3.4 Sample Preparation for MS
3.4.1 To Prepare Protein Lysate for SP3 Protein Clean Up
3.4.2 To Prepare SP3 Beads
3.4.3 Perform Protein Clean Up
3.4.4 Trypsin Digestion
3.4.5 Peptide Desalting (See Note 5)
3.5 Running Samples on LC-MS/MS
3.5.1 Preparation Prior to LC-MS Analysis
3.5.2 Determine Peptide Concentration Using NanoDrop Spectrophotometer
4 Notes
References
Chapter 5: Protein Purification and Digestion Methods for Bacterial Proteomic Analyses
1 Introduction
2 Materials
2.1 Bacterial Cell Lysis
2.2 Protein Purification and Solubilization
2.2.1 Acetone Precipitation
2.2.2 TRIzol-Based Protein Extraction
2.3 Protein Reduction, Alkylation, and Digestion
2.3.1 In-solution Digestion
2.3.2 On-filter Digestion
3 Methods
3.1 Cell Harvest, Lysis and Acetone Precipitation and Solubilization (SDS or SDC)
3.2 Cell Harvest, Lysis and TRIzol-Based Extraction and Solubilization with SDS
3.3 In-solution Digestion (SDC)
3.4 Filter-Aided Desalting and Digest
4 Notes
References
Chapter 6: Mapping Cell Surface Proteolysis with Plasma Membrane-Targeted Subtiligase
1 Introduction
2 Materials
2.1 Cell Culture and Cell Line Construction
2.2 Subtiligase Cell Labeling
2.3 Fluorescence Imaging
2.4 Cell Harvest and Cell Lysis
2.5 N-Terminal Peptide Enrichment
2.6 Peptide Desalting
2.7 LC-MS/MS Analysis
3 Methods
3.1 Lentivirus Production
3.2 Lentivirus Infection of HEK293T Cells
3.3 Fluorescence Imaging to Validate Subtiligase-TM Expression and Activity
3.4 Cell Harvest
3.5 Subtiligase-TM Labeling
3.6 Cell Lysis
3.7 Biotinylated Protein Enrichment
3.8 Reduction, Alkylation, and Trypsin Digestion
3.9 TEV Protease Elution of N-Terminal Peptides
3.10 Sample Desalting
3.11 LC-MS/MS and Data Analysis
4 Notes
References
Chapter 7: N-Terminomics/TAILS of Tissue and Liquid Biopsies
1 Introduction
2 Materials
3 Methods
3.1 Tissue Homogenization
3.2 Pre-Enrichment TAILS and TAILS
3.2.1 Protein Preparation, Denaturation, Alkylation, and Amine Labeling
3.2.2 Protein Precipitation and Trypsinization
3.2.3 PreTAILS Fraction and Polymer Selection
3.2.4 TAILS Fraction and Filtration
3.3 Alternative Labeling Option for Analysis of More than 2 Conditions: Tandem Mass Tag (TMT) Labeling (or iTRAQ)
4 Notes
References
Chapter 8: HUNTER: Sensitive Automated Characterization of Proteolytic Systems by N Termini Enrichment from Microscale Specimen
1 Introduction
2 Materials
3 Methods
3.1 Sample Preparation
3.1.1 Sample Lysis
3.1.2 Protein Reduction and Alkylation
3.2 SP3 Bead Binding and Proteome Clean up
3.2.1 Prepare SP3 Beads
3.2.2 Binding and Clean up
3.3 Protein Dimethyl Labeling
3.3.1 Initiate Labeling Reaction
3.3.2 Quench Labeling Reaction
3.4 Protein Digestion
3.5 Undecanal-Based Enrichment of Protein N Termini
3.5.1 Undecanal Labeling
3.5.2 Peptide Clean up / Removal of Excess Undecanal
3.5.3 Sample Desalting (Second C18 Clean up)
3.6 Automated HUNTER (for Plasma Sample)
3.6.1 Sample Preparation
3.6.2 SP3 Bead Particles Preparation
3.6.3 Determine the Amount of Ethanol for Sample Binding
3.6.4 Determine the Working Stock Concentration of Dimethyl Labeling Reagents
3.6.5 Determine the Amount of Undecanal Labeling Reagents
3.6.6 Set up Application Workflow
3.6.7 Application A: Day 1 Part 1
3.6.8 Application B: Day 1 Part 2
3.6.9 Application C: Day 2
3.7 Offline High pH Fractionation (Optional)
3.8 Mass Spectrometry Analysis by DDA and DIA
3.8.1 Concatenated Samples
3.8.2 Individual Samples
3.8.3 Instrumentation Setup
3.9 Data Processing and Statistical Analysis
3.9.1 DDA Data on Individual Samples
3.9.2 Sample Spectral Library Generation
3.9.3 DIA Data on Samples
3.9.4 Data Analysis for DDA and DIA
4 Notes
References
Chapter 9: Phosphoproteomics and Organelle Proteomics in Pancreatic Islets
1 Introduction
2 Materials
2.1 Islet Isolation
2.2 Total Proteome and Phosphoproteome Sample Preparation
2.3 Organelle Fractionation
2.4 Equipment
3 Methods
3.1 Islet Isolation
3.2 Phosphoproteome Sample Preparation
3.3 Organelle Fractionation
3.3.1 Preparation of Gradient Tubes
3.3.2 Lysis of Islets
3.3.3 Ultracentrifugation and Collection of Gradient Samples
3.3.4 Protein Precipitation and Total Proteome Sample Preparation
3.3.5 Stage Tip Preparation
StageTip Activation
3.3.6 Sample Load and in StageTip Wash
3.3.7 Elution of Peptides from StageTip Membrane
3.3.8 Bioinformatic Analysis of PCP Data
Identification of Separable Compartments and Organelle Markers
Support Vector Machines (SVM)-Based Assignment of the Main Organelle
Assignment of a Secondary Organellar Localization by Correlation Analysis
4 Notes
References
Chapter 10: Phosphoproteomic Sample Preparation for Global Phosphorylation Profiling of a Fungal Pathogen
1 Introduction
2 Materials
2.1 Culturing of C. neoformans
2.2 Proteome Analysis
3 Methods
3.1 Growth of C. neoformans
3.2 Protein Extraction from C. neoformans
3.3 Phosphoenrichment for MS Analysis
3.4 Peptide Purification
3.5 Mass Spectrometry and Data Analysis
4 Notes
References
Chapter 11: Glycopeptide-Centric Approaches for the Characterization of Microbial Glycoproteomes
1 Introduction
2 Materials
2.1 Preparation of Proteome Samples
2.2 Hydrophilicity-Based Glycopeptide Enrichment
2.3 Antibody-Based Glycopeptide Enrichment
2.4 Glycopeptide MS Analysis
2.5 Bioinformatic Analysis of Microbial Glycopeptides
3 Methods
3.1 Preparation of Proteome Samples for Glycopeptide Enrichment/Analysis
3.2 Hydrophilicity-Based Glycopeptide Enrichment
3.3 Antibody-Based Glycopeptide Enrichment
3.3.1 Coupling of Antibodies to Protein A/G Beads
3.3.2 Cross-Linking Antibodies to Protein A/G Beads
3.3.3 Antibody-Based Affinity Purification of Glycopeptides
3.4 Glycopeptide MS Analysis
3.5 Bioinformatic Analysis of Microbial Glycopeptides
4 Notes
References
Chapter 12: Integrated Network Discovery Using Multi-Proteomic Data
1 Introduction
2 Multi-P Overview
3 Methods
3.1 Prepare proteinGroup.txt
3.2 Network Enrichment Analysis
3.3 Multi-P & SIV Calculation
References
Chapter 13: Targeted Cross-Linking Mass Spectrometry on Single-Step Affinity Purified Molecular Complexes in the Yeast Sacchar...
1 Introduction
2 Materials
2.1 Titrating the SM(PEG)2 Cross-Linker
2.1.1 Small-Scale ssAP
2.1.2 Two Steps SM(PEG)2 Cross-Linking Titration
2.2 Estimating CH-Tagged Protein Amounts from Isolated Small-Scale ssAP Complexes
2.2.1 Method 1
2.2.2 Method 2
2.3 Large-scale ssAP-anchXL-MS
2.3.1 Large-scale ssAP
2.3.2 SM(PEG)2 Two-Step Reaction
2.3.3 IMAC Enrichment of CH-Tagged Cross-Linked Peptides
2.4 MS Method and Analysis
3 Method
3.1 Starting Amount of Material and ssAP Buffer Optimization
3.2 Titrating the SM(PEG)2 Cross-Linker
3.2.1 Small-scale ssAP
3.2.2 Titrating the SM(PEG)2 Cross-Linker
3.2.3 Estimating CH-Tagged Protein Amounts from Isolated Small-Scale ssAP Complexes
Method 1
Method 2
3.3 Large-scale ssAP-anchXL-MS
3.3.1 Large-scale ssAP
3.3.2 SM(PEG)2 Controlled Two-Step Reactions
3.3.3 On-Bead Trypsin Digestion of ssAP-anchXL Complexes
3.3.4 IMAC Enrichment of Anchored chXL Peptides
3.3.5 Mass Spectrometry
3.3.6 ssAP-anchXL-MS Data Analysis Using pLink2
4 Notes
References
Chapter 14: A Crosslinking Mass Spectrometry Protocol for the Structural Analysis of Microtubule-Associated Proteins
1 Introduction
2 Materials
2.1 Protein Preparations
2.2 Buffers and Stock Reagents
2.3 Required Equipment
3 Methods
3.1 MT-MAP Preparation and Initial Testing
3.2 Evaluation of the MAP-MT Construct by Fluorescence Microscopy
3.3 Crosslinking of MAP-MT and Preparation for LC-MS
3.4 LC-MS Analysis
3.5 XL-MS Data Analysis
4 Notes
References
Chapter 15: Comprehensive Interactome Mapping of Nuclear Receptors Using Proximity Biotinylation
1 Introduction
2 Materials
2.1 Plasmids and Lentivirus Production
2.2 Lentiviral Transduction
2.3 Construct Validation in Polyclonal Populations and Clonal Isolation
2.4 Cell Induction for TurboID
2.5 TurboID
2.6 Peptide Desalting
3 Methods
3.1 Plasmid and Lentivirus Production
3.2 Lentiviral Transduction
3.3 Construct Validation in a Polyclonal Population and Monoclonal Cell Population Isolation
3.4 MDA-MB-231 Cell Induction for TurboID
3.5 MCF-7 Cell Induction for TurboID
3.6 TurboID
3.7 Peptide Desalting
3.8 MS Data Acquisition
3.9 MS Data Analysis
3.10 MS Data Archiving
4 Notes
References
Chapter 16: Mining Proteomics Datasets to Uncover Functional Pseudogenes
1 Introduction
2 Materials
2.1 Sample Preparation
2.2 PRM/MRM Analysis
3 Methods
3.1 Identification of Pseudogenes
3.1.1 Pseudogene Database Search
3.1.2 Transcription Evidence Search
3.1.3 Translation Verification
3.1.4 Translation Analysis
3.1.5 Protein Expression
3.1.6 Interaction Databases
3.2 Validation of Functional Pseudogenes
3.2.1 Protein Specific Unique Heavy Peptide Design
3.2.2 Optimization of Peptide Detection by MS
3.2.3 Sample Preparation
3.3 Options for Functional Analysis of Identified Pseudogene
4 Notes
References
Chapter 17: Proteomic Profiling of the Interplay Between a Bacterial Pathogen and Host Uncovers Novel Anti-Virulence Strategies
1 Introduction
2 Materials
2.1 Culturing of Klebsiella pneumoniae
2.2 Culturing Macrophages
2.3 Cellular Proteome Analysis
3 Methods
3.1 Culturing K. pneumoniae Cells
3.2 Culturing of Macrophages
3.2.1 Seeding Macrophages
3.2.2 Passaging Macrophages for Co-culture
3.3 Co-culture of Macrophages with K. pneumoniae
3.4 Collection of Cells
3.5 Proteome Extraction
3.6 Mass Spectrometry
3.7 Data Analysis
4 Notes
References
Chapter 18: Affinity Enrichment of Salmonella-Modified Membranes from Murine Macrophages for Proteomic Analyses
1 Introduction
2 Materials
2.1 Basic Equipment for Cell Cultivation, Infection, and Harvest
2.1.1 Cell Culture: Host
2.1.2 Cell Culture: Pathogen Salmonella enterica
2.2 Basic Equipment for Protein Extraction and Affinity Enrichment
2.2.1 Protein Extraction for Affinity Enrichment
2.2.2 Labeling of Protein G Magnetic Beads with M45 Antibody
2.2.3 Affinity Enrichment of Salmonella-Modified Membranes
3 Methods
3.1 RAW264.7 Cell Infection
3.2 Preparation of Protein Fraction for Affinity Enrichment
3.3 Labeling of Magnetic Beads for Affinity Enrichment
3.4 Affinity Enrichment of Salmonella-Modified Membranes (SMM)
4 Notes
References
Chapter 19: Proteomic Profiling of Interplay Between Agrobacterium tumefaciens and Nicotiana benthamiana for Improved Molecula...
1 Introduction
2 Materials
2.1 Plant Material
2.2 Protein Extraction
2.3 Protein Digestion
2.4 Peptide Purification
2.5 TMT Labeling
3 Methods
3.1 Preparing Plant Material
3.2 Protein Extraction
3.3 Protein Digestion
3.4 Peptide Purification
3.5 TMT Labeling
3.6 Mass Spectrometry
3.7 Data Analysis
4 Notes
References
Chapter 20: Label-Free Quantitative Proteomic Profiling of Fusarium Head Blight in Wheat
1 Introduction
2 Materials
2.1 Media for Inoculation
2.2 Inoculation and Harvest
2.3 Total Protein Extraction
2.4 Stop-and-Go Extraction Tips (STAGE-Tip) Desalting
2.5 Liquid Chromatography with Tandem Mass Spectrometry (LC-MS/MS)
2.6 Bioinformatics
3 Methods
3.1 Culturing Fusarium graminearum and Inoculating Triticum aestivum
3.2 Tissue Disruption and Protein Extraction
3.3 Protein Solubilization, Quantification, and Digestion
3.4 Preparing and Equilibrating the C18 Stop-and-Go Extraction Tips (STAGE-Tip)
3.5 STAGE-Tip Samples
3.6 LC-MS/MS Analysis
3.7 Proteome Data Analysis
4 Notes
References
Chapter 21: DIA Proteomics and Machine Learning for the Fast Identification of Bacterial Species in Biological Samples
1 Introduction
2 Materials
2.1 Sample Preparation
2.2 High-pH Reversed-Phase High-Pressure Liquid Chromatography
2.3 NanoLC-MS/MS Analysis
2.4 Software
3 Methods
3.1 Spectral Libraries: Sample Preparation
3.2 Spectral Libraries: DDA LC-MS/MS Analysis
3.3 Spectral Libraries: Bioinformatic Treatment
3.4 Training Step: Sample Preparation of Bacterial Inoculates
3.5 Training Step: DIA LC-MS/MS Analysis
3.6 Training Step: DIA Signal Extraction
3.7 Training Step: Machine Learning Model and Signature Identification
3.8 Identification Step: Monitoring of Peptide Signature with PRM
4 Notes
References
Chapter 22: Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies
1 Introduction
2 Peptide and Protein Identification
2.1 Mass Spectrum Acquisition
2.2 Peptide Identification with Database Search
2.3 Protein Sequence Database Choice
2.4 Protein Sequence Database Search
2.5 Spectral Library Search
2.6 Assessment of Peptide-Spectrum Match Quality and False Discovery Rate Estimation
2.7 De Novo Sequencing
3 Peptide and Protein Quantification
3.1 Workflows Combining Identification and Quantification
4 Data Refinement
4.1 Normalization
4.2 Data Imputation
4.3 Data Aggregation
5 Data Mining and Functional Analysis
5.1 Taxonomic Analysis
5.2 Functional Analysis
5.3 Metaproteomics Data Visualization
6 Application: Metaproteomics Analytical Methods in Action with Real-World Data
References
Chapter 23: MaxQuant Module for the Identification of Genomic Variants Propagated into Peptides
1 Introduction
2 Materials
2.1 Data Downloads
2.2 Software
3 Methods
3.1 Variant Extraction
3.2 Variant-Aware MaxQuant Proteomics Search
3.3 Data Analysis
3.3.1 Proteogenomic Analysis of Ultra-deep HeLa Proteome
3.3.2 Immunopeptidomic Analysis of HLA Peptides
4 Notes
References
Chapter 24: Untargeted Metabolomic Profiling of Fungal Species Populations
1 Introduction
1.1 Culturing Methodology
1.2 UPLC-HRMS
1.3 Data Analysis
2 Materials
2.1 Materials for Biological Sample Culturing and Extraction
2.2 Materials for UPLC-HRMS
2.3 Materials for Data Analysis and Mining
3 Methods
3.1 Fermentation
3.2 Solvent Extraction
3.3 UPLC-HRMS Analysis
3.4 Metabolomics Data Preprocessing
3.5 Data Processing (in "R" Environment)
4 Notes
References
Correction to: DIA Proteomics and Machine Learning for the Fast Identification of Bacterial Species in Biological Samples
Index

Methods in Molecular Biology 2456

Jennifer Geddes-McAlister Editor

Proteomics in Systems Biology Methods and Protocols

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Proteomics in Systems Biology Methods and Protocols

Edited by

Jennifer Geddes-McAlister Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada

ISSN 1064-3745
ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-2123-3
ISBN 978-1-0716-2124-0 (eBook)
https://doi.org/10.1007/978-1-0716-2124-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022, Corrected Publication 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover Illustration Caption: Mass spectrometry-based proteomics from start to finish. Illustrative mass spectra derived from bacterial proteins with volcano plot representing differences in protein abundance. Software used: XCalibur (ThermoScientific) and Perseus (https://maxquant.net/perseus/). Credit: Jason A. McAlister, PhD

This Humana imprint is published by the registered company Springer Science+Business Media, LLC part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

Mass spectrometry-based proteomics and the accompanying techniques, instruments, bioinformatics tools, and applications are diverse and relevant for studying a plethora of biological systems. With proteomics, we gain valuable insight into cellular processes, environmental responses, modifications controlling regulatory mechanisms, and interaction networks critical to biological function. In addition, applications within diverse disciplines, including biochemistry, microbiology, immunology, plant biology, chemistry, drug discovery, and computing, are only the beginning of appreciating the intricacies and complexities of this field. Over the past 25 years, proteomics technologies have progressed rapidly, gaining accolades in medical and agricultural sectors, by advancing our fundamental knowledge of biology and uncovering new functions and perspectives. Looking through previous editions of the Methods in Molecular Biology series exploring proteomics, the impact of this field, both emerging and established, is evident and clearly demonstrates the interconnectivity and interdisciplinarity of the platform.

The goal of this edition, focused on proteomics in systems biology, is to highlight the diverse techniques and applications of proteomics in an accessible, informative, and concise manner. Contributions capture the latest technologies and bioinformatics platforms, covering sample preparation from diverse collections, quantification, enrichment, modification, and interactome methodology for the in-depth exploration of biological systems, and application of proteomics in clinical, infectious disease, and agricultural practices. Moreover, cutting-edge bioinformatics tools, encompassing machine learning and data integration strategies, extract accurate, reproducible, and useful information, and finally, techniques expanding beyond proteomics into the realm of metabolomics are presented. Furthermore, considerations beyond technical aspects, including costs and data availability, are needed for a holistic appreciation of the technology and its impact.

Taken together, this collaborative and dynamic collection emphasizes the importance of proteomics and demonstrates a plethora of approaches for investigating diverse biological entities from a systems perspective.

Guelph, ON, Canada
Jennifer Geddes-McAlister

Contents

Preface
Contributors

1 Review of the Real and Sometimes Hidden Costs in Proteomics Experimental Workflows
Aicha Asma Houfani and Leonard James Foster

2 High-Throughput Mass Spectrometry-Based Proteomics with dia-PASEF
Patricia Skowronek and Florian Meier

3 Isolation of Detergent Insoluble Proteins from Mouse Brain Tissue for Quantitative Analysis Using Data Independent Acquisition (DIA)
Cristen Molzahn, Lorenz Nierves, Philipp F. Lange, and Thibault Mayor

4 Rodent Lung Tissue Sample Preparation and Processing for Shotgun Proteomics
Hadeesha Piyadasa, Ying Lao, Oleg Krokhin, and Neeloffer Mookherjee

5 Protein Purification and Digestion Methods for Bacterial Proteomic Analyses
Nicole Hansmeier, Samrachana Sharma, and Tzu-Chiao Chao

6 Mapping Cell Surface Proteolysis with Plasma Membrane-Targeted Subtiligase
Aspasia A. Amiridis and Amy M. Weeks

7 N-Terminomics/TAILS of Tissue and Liquid Biopsies
Anthonia Anowai, Sameeksha Chopra, Barbara Mainoli, Daniel Young, and Antoine Dufour

8 HUNTER: Sensitive Automated Characterization of Proteolytic Systems by N Termini Enrichment from Microscale Specimen
Anuli C. Uzozie, Janice Tsui, and Philipp F. Lange

9 Phosphoproteomics and Organelle Proteomics in Pancreatic Islets
Özüm Sehnaz Caliskan, Giorgia Massacci, Natalie Krahmer, and Francesca Sacco

10 Phosphoproteomic Sample Preparation for Global Phosphorylation Profiling of a Fungal Pathogen
Brianna Ball, Jonathan R. Krieger, and Jennifer Geddes-McAlister

11 Glycopeptide-Centric Approaches for the Characterization of Microbial Glycoproteomes
Nichollas E. Scott

12 Integrated Network Discovery Using Multi-Proteomic Data
Rafe Helwer and Vincent C. Chen

13 Targeted Cross-Linking Mass Spectrometry on Single-Step Affinity Purified Molecular Complexes in the Yeast Saccharomyces cerevisiae
Christian Trahan and Marlene Oeffinger

14 A Crosslinking Mass Spectrometry Protocol for the Structural Analysis of Microtubule-Associated Proteins
Atefeh Rafiei and David C. Schriemer

15 Comprehensive Interactome Mapping of Nuclear Receptors Using Proximity Biotinylation
Lynda Agbo, Sophie Anne Blanchet, Pata-Eting Kougnassoukou Tchara, Amélie Fradet-Turcotte, and Jean-Philippe Lambert

16 Mining Proteomics Datasets to Uncover Functional Pseudogenes
Anna Meller and François-Michel Boisvert

17 Proteomic Profiling of the Interplay Between a Bacterial Pathogen and Host Uncovers Novel Anti-Virulence Strategies
Arjun Sukumaran and Jennifer Geddes-McAlister

18 Affinity Enrichment of Salmonella-Modified Membranes from Murine Macrophages for Proteomic Analyses
Tzu-Chiao Chao, Samina Thapa, and Nicole Hansmeier

19 Proteomic Profiling of Interplay Between Agrobacterium tumefaciens and Nicotiana benthamiana for Improved Molecular Pharming Outcomes
Nicholas Prudhomme, Jonathan R. Krieger, Michael D. McLean, Doug Cossar, and Jennifer Geddes-McAlister

20 Label-Free Quantitative Proteomic Profiling of Fusarium Head Blight in Wheat
Boyan Liu, Danisha Johal, Mitra Serajazari, and Jennifer Geddes-McAlister

21 DIA Proteomics and Machine Learning for the Fast Identification of Bacterial Species in Biological Samples
Florence Roux-Dalvai, Mickaël Leclercq, Clarisse Gotti, and Arnaud Droit

22 Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies
Caitlin M. A. Simopoulos, Daniel Figeys, and Mathieu Lavallée-Adam

23 MaxQuant Module for the Identification of Genomic Variants Propagated into Peptides
Pavel Sinitcyn, Maximilian Gerwien, and Jürgen Cox

24 Untargeted Metabolomic Profiling of Fungal Species Populations
Thomas E. Witte and David P. Overy

Correction to: DIA Proteomics and Machine Learning for the Fast Identification of Bacterial Species in Biological Samples
Index

Contributors LYNDA AGBO • Department of Molecular Medicine, Cancer Research Center and Big Data Research Center, Universite´ Laval, Que´bec, QC, Canada; Endocrinology and Nephrology Division, CHU de Que´bec-Universite´ Laval Research Center, Que´bec, QC, Canada ASPASIA A. AMIRIDIS • Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA ANTHONIA ANOWAI • Department of Physiology and Pharmacology, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada BRIANNA BALL • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada SOPHIE ANNE BLANCHET • Oncology Division, CHU de Que´bec-Universite´ Laval Research Center, Que´bec, QC, Canada; Department of Molecular Biology, Medical Biochemistry and Pathology, Cancer Research Center, Universite´ Laval, Que´bec, QC, Canada FRANC¸OIS-MICHEL BOISVERT • Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Universite´ de Sherbrooke, Que´bec, QC, Canada ¨ ZU¨M SEHNAZ CALISKAN • Diabetes Center, Helmholtz Zentrum Mu¨nchen, German O Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany TZU-CHIAO CHAO • Department of Biology, University of Regina, Regina, SK, Canada; Institute of Environmental Change and Society, University of Regina, Regina, SK, Canada VINCENT C. 
CHEN • Department of Chemistry, Brandon University, Brandon, MB, Canada SAMEEKSHA CHOPRA • Department of Physiology and Pharmacology, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada DOUG COSSAR • PlantForm Corporation Canada, Toronto, ON, Canada JU¨RGEN COX • Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany ARNAUD DROIT • Proteomics Platform, CHU de Que´bec-Universite´ Laval Research Center, Que´bec, QC, Canada; Computational Biology Laboratory, CHU de Que´bec-Universite´ Laval Research Center, Que´bec, QC, Canada ANTOINE DUFOUR • Department of Physiology and Pharmacology, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada DANIEL FIGEYS • Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada; School of Pharmaceutical Sciences, University of Ottawa, Ottawa, ON, Canada

ix

x

Contributors

LEONARD JAMES FOSTER • Michael Smith Laboratories and Department of Biochemistry & Molecular Biology, University of British Columbia, Vancouver, BC, Canada
AMÉLIE FRADET-TURCOTTE • Oncology Division, CHU de Québec-Université Laval Research Center, Québec, QC, Canada; Department of Molecular Biology, Medical Biochemistry and Pathology, Cancer Research Center, Université Laval, Québec, QC, Canada
JENNIFER GEDDES-MCALISTER • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
MAXIMILIAN GERWIEN • Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
CLARISSE GOTTI • Proteomics Platform, CHU de Québec-Université Laval Research Center, Québec, QC, Canada; Computational Biology Laboratory, CHU de Québec-Université Laval Research Center, Québec, QC, Canada
NICOLE HANSMEIER • Department of Biology, University of Regina, Regina, SK, Canada; Luther College at University of Regina, Regina, SK, Canada
RAFE HELWER • Department of Chemistry, Brandon University, Brandon, MB, Canada
AICHA ASMA HOUFANI • Michael Smith Laboratories and Department of Biochemistry & Molecular Biology, University of British Columbia, Vancouver, BC, Canada
DANISHA JOHAL • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
NATALIE KRAHMER • Diabetes Center, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany; German Center for Diabetes Research (DZD), Neuherberg, Germany
JONATHAN R. KRIEGER • Bioinformatics Solutions Inc., Waterloo, ON, Canada
OLEG KROKHIN • Manitoba Centre for Proteomics and Systems Biology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada
JEAN-PHILIPPE LAMBERT • Department of Molecular Medicine, Cancer Research Center and Big Data Research Center, Université Laval, Québec, QC, Canada; Endocrinology and Nephrology Division, CHU de Québec-Université Laval Research Center, Québec, QC, Canada
PHILIPP F. LANGE • Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada; Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital, Vancouver, BC, Canada; Department of Molecular Oncology, BC Cancer, Vancouver, BC, Canada
YING LAO • Manitoba Centre for Proteomics and Systems Biology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada
MATHIEU LAVALLÉE-ADAM • Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
MICKAËL LECLERCQ • Computational Biology Laboratory, CHU de Québec-Université Laval Research Center, Québec, QC, Canada
BOYAN LIU • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
BARBARA MAINOLI • Department of Physiology and Pharmacology, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada
GIORGIA MASSACCI • Department of Biology, University of Rome Tor Vergata, Rome, Italy


THIBAULT MAYOR • Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
MICHAEL D. MCLEAN • PlantForm Corporation Canada, Toronto, ON, Canada
FLORIAN MEIER • Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany; Functional Proteomics, Jena University Hospital, Jena, Germany
ANNA MELLER • Department of Immunology and Cell Biology, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Québec, QC, Canada
CRISTEN MOLZAHN • Michael Smith Laboratories, Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
NEELOFFER MOOKHERJEE • Manitoba Centre for Proteomics and Systems Biology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada; Department of Immunology, University of Manitoba, Winnipeg, MB, Canada
LORENZ NIERVES • Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada; Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital, Vancouver, BC, Canada
MARLENE OEFFINGER • Institut de recherches cliniques de Montréal, Montréal, QC, Canada; Département de biochimie, Faculté de médecine, Université de Montréal, QC, Canada; Faculty of Medicine, Division of Experimental Medicine, McGill University, QC, Canada
DAVID P. OVERY • Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
HADEESHA PIYADASA • Manitoba Centre for Proteomics and Systems Biology, Department of Internal Medicine, University of Manitoba, Winnipeg, MB, Canada; Department of Immunology, University of Manitoba, Winnipeg, MB, Canada; Department of Pathology, School of Medicine, Stanford University, Palo Alto, CA, USA
NICHOLAS PRUDHOMME • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
ATEFEH RAFIEI • Department of Chemistry, University of Calgary, Calgary, AB, Canada
FLORENCE ROUX-DALVAI • Proteomics Platform, CHU de Québec-Université Laval Research Center, Québec, QC, Canada; Computational Biology Laboratory, CHU de Québec-Université Laval Research Center, Québec, QC, Canada
FRANCESCA SACCO • Department of Biology, University of Rome Tor Vergata, Rome, Italy
DAVID C. SCHRIEMER • Department of Chemistry, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada
NICHOLLAS E. SCOTT • Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Parkville, VIC, Australia
MITRA SERAJAZARI • Ontario Agriculture College, University of Guelph, Guelph, ON, Canada
SAMRACHANA SHARMA • Department of Biology, University of Regina, Regina, SK, Canada
CAITLIN M. A. SIMOPOULOS • Department of Biochemistry, Microbiology and Immunology and Ottawa Institute of Systems Biology, University of Ottawa, Ottawa, ON, Canada
PAVEL SINITCYN • Computational Systems Biochemistry Research Group, Max-Planck Institute of Biochemistry, Martinsried, Germany
PATRICIA SKOWRONEK • Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany


ARJUN SUKUMARAN • Department of Molecular and Cellular Biology, University of Guelph, Guelph, ON, Canada
PATA-ETING KOUGNASSOUKOU TCHARA • Department of Molecular Medicine, Cancer Research Center and Big Data Research Center, Université Laval, Québec, QC, Canada; Endocrinology and Nephrology Division, CHU de Québec-Université Laval Research Center, Québec, QC, Canada
SAMINA THAPA • Department of Biology, University of Regina, Regina, SK, Canada
CHRISTIAN TRAHAN • Institut de recherches cliniques de Montréal, Montréal, QC, Canada
JANICE TSUI • Department of Pathology, University of British Columbia, Vancouver, BC, Canada; Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital, Vancouver, BC, Canada
ANULI C. UZOZIE • Department of Pathology, University of British Columbia, Vancouver, BC, Canada; Michael Cuccione Childhood Cancer Research Program, BC Children’s Hospital, Vancouver, BC, Canada
AMY M. WEEKS • Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA
THOMAS E. WITTE • Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada; Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, ON, Canada
DANIEL YOUNG • Department of Physiology and Pharmacology, University of Calgary, Calgary, AB, Canada; Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, Canada; McCaig Institute for Bone and Joint Health, University of Calgary, Calgary, AB, Canada; Hotchkiss Brain Institute, University of Calgary, Calgary, AB, Canada

Chapter 1

Review of the Real and Sometimes Hidden Costs in Proteomics Experimental Workflows

Aicha Asma Houfani and Leonard James Foster

Abstract

A typical proteomics workflow covers every step, from growing or collecting the cells, tissues, or organisms, through protein extraction, digestion and cleanup, and mass spectrometric analysis, to the extensive bioinformatics needed to derive biological insight from the data. The details of the procedures employed can vary widely by laboratory and by sample type: e.g., hard tissues or cells with walls require much more mechanical disruption to extract proteins than do soft tissues, biological fluids, or wall-less cells. Everything then converges on the mass spectrometer, where there are further choices to be made about how to do the analysis. There is one commonality, however: virtually every group around the world now uses liquid chromatography coupled on-line to tandem mass spectrometry, which means that significant amounts of instrument time are dedicated to every sample. There are many other reviews and methods papers, including in this volume, that cover the details of the various procedures involved in proteomic analyses of all types of samples. Our focus here is on the cost considerations for such analyses, including considerations to ensure that useful data can be obtained the first time a sample is analyzed. Some of these costs are often overlooked, particularly by groups who operate their own mass spectrometer(s) and do not have to go to a fee-for-service facility to have something analyzed. The chapter presents several challenges and key suggestions for testing hypotheses with proteomics experimental workflows in different biological systems, with specific regard to the costs involved, both real and hidden. The detailed methodology for cost-based studies reported in this chapter can help researchers to set up their laboratory with appropriate equipment as well as to identify potential collaborations based on their analytical instrumentation.

Key words Mass spectrometry, Sample preparation, Proteomics, Automation, Systems biology

1 Introduction

Proteomics is used to characterize “all” the proteins in a sample, but “all” is in quotes because the technology is still not at the point where every distinct protein can be identified. Studies have reported more than 10,000 proteins identified from a cell lysate [1–3], but no one has yet made a verifiable claim to have identified all the proteins. Various technical and biological reasons still limit this holy grail of proteomics, including, but not limited to:

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


(1) abundance: some proteins are simply below the current limits of detection, which is both a biological and a technical limitation;
(2) sequence biases: some proteins do not yield tryptic peptides of the right size or composition to be detected;
(3) modifications: we may miss proteins that are modified in unexpected ways;
(4) genome annotation: some genes, especially in non-model organisms, are incorrectly annotated, so the protein sequences that proteomics data are searched against are incorrect and will never allow identification of some proteins; and
(5) sample preparation and/or analytical mistakes and biases.

Despite the challenges, however, a fundamental measure of the success of a proteomics experiment is how many proteins were identified, so every laboratory does what it can to overcome these issues. One easy way to increase the number of proteins identified from a sample is to spend more instrument time analyzing that sample (Fig. 1). This could be done by extending the length of the liquid chromatography (LC) gradient or by fractionating the sample prior to LC-tandem mass spectrometry (LC-MS/MS). Extending the gradient might use 50 or 100% more instrument time, while fractionation could use 20 or even 50 times more instrument time. Simply throwing more instrument time at a sample is one way to solve the problem, but that has costs. If a laboratory has essentially unlimited time on an LC-MS/MS system then, by all means, use that time to get better data.

More commonly, and especially in laboratories supported by taxpayer-funded grants, instrument time is limiting: many people want to get their samples on an instrument, so people end up waiting. Even apart from the waiting, the whole analysis process has costs: the direct costs of operating the instrument, the amortized capital costs of the instrument itself, the human costs that went into preparing the sample, and the human costs incurred when others have to wait and science is slowed down.
Our goal here is to focus on the costs of a proteomics experiment. Some costs are obvious and unavoidable: there is a required capital outlay to obtain an instrument; most newer instruments require special and sometimes proprietary expertise to maintain, so there are the costs of service contracts; there are the utility and building costs; and there are consumable costs for solvents, buffers, plasticware, spare parts, etc. Then there are the salaries of the highly qualified personnel who prepare the samples and do the analyses. There are also costs for the computer hardware and software needed to analyze and store the data. Anyone who runs a laboratory is aware of all these costs, with the possible exception of utility and building costs, which are typically absorbed by the host institution and not passed on to the researchers. Those working in a laboratory are also likely aware of most of these costs. In our experience, though, these costs are largely not considered when choices are made about experiments and analyses, even by those who are aware of them.


[Figure 1 panel contents: (a) Generation of samples (weeks to years): sample throughput and replicates; cases and controls; wide variety of sample types (biofluids, cells, tissues, organs, plants, insects, microbes); complex protein mixture. (b) MS sample preparation (hours to days): cell lysis and degradation; chemical or enzymatic cleavage; proteolytic digestion (e.g., trypsin); complex peptide mixture; peptide fractionation and separation. (c) LC-MS/MS data acquisition (hours to days): MS1 (preliminary); MS2 (fragment analysis). (d) Data analysis and quality (hours to days): proteome coverage; variable selection; peptide identification; quantification; visualization and data interpretation.]

Fig. 1 Considerations during the proteomics workflow. (a) The generation time of biological samples can vary according to the source of material. (b) Extensive, time-consuming sample preparation, which can take days, brings a set of challenges at the analytical level. (c) The number of fractions to analyze and the type of mass spectrometer employed in a given experiment determine the extent of LC-MS/MS analysis. (d) There is also a long time lag between acquiring qualitative and quantitative MS data and completing protein searching and interpretation


This is in contrast to some of the other obvious costs, around which much time is often spent considering the options, such as the choice of labeling reagents (If I use SILAC, is that cheaper than iTRAQ?), whether to label or not, how many antibodies to buy (Can my lab afford to purchase 25 antibodies at $500 each to follow up on my proteomics data?), and whether to use specialized samples or not (Can I just use HeLa cells for this or do I have to buy expensive stem cells?).

2 Quality Assurance of Samples

We know from our own experience and from talking to colleagues that there is considerable wastage in running a proteomics facility: samples were not as rich as they should have been, the column clogged right after loading, the last person to use the instrument left it contaminated with polymer, the wrong control was used, etc. Some of these issues are avoidable with better planning or more care, while some can only be learned empirically. But even empiricism can be done with more efficiency. These are some of the considerations that should be taken into account to ensure efficient use of time to prepare samples, as well as to analyze them.

2.1 Sample Selection, Collection, and Storage

As the first physical step after planning an experiment, acquiring the right samples obviously plays a critical role in the proteomic workflow. But despite knowing what you really want to achieve with an experiment, it is not always obvious what the best sample is to accomplish that. Perhaps the key metric for deciding on which biological system to use is how much protein you can obtain. Typically, one will be loading a few hundred nanograms up to a few micrograms of total protein onto an instrument, but those amounts of sample are very hard to work with when using bulk methods (i.e., a manual pipette and microfuge tubes), so we would typically want at least 10 μg, or possibly 25 μg, of protein to start with. This can vary, however, depending on the type of experiment. For example, if you are seeking to identify interaction partners of a protein of interest that you have immunoprecipitated, it can be challenging to accurately measure how much protein is recoverable from the immunoprecipitate. Our rule-of-thumb in such cases is to run the sample by SDS-PAGE and stain with a colloidal Coomassie protocol [4]—if the target protein can be observed as a specific band of the expected molecular weight in the experimental sample relative to the negative control then there is likely enough of any interaction partners present to lead to a successful LC-MS/MS experiment. A similar test can be used with other samples where spectrophotometric measurement of protein amounts is difficult, such as in the isolation of subcellular organelles.


Many proteomics experiments aim to characterize whole cells or tissues, perhaps comparing differences in expression between one state and another. As with any biological experiment, one must decide which system to use, but the choices made here may lead to better or worse outcomes with the data, which may, in turn, increase costs and waste time. For example, primary cells or tissues are considered more physiologically relevant than immortalized cell lines, but they often cost more and/or are harder to obtain. Getting enough primary cells to yield the needed amount of protein can also be challenging. If you have to kill 100 mice in order to harvest enough of a specialized neuronal cell type to extract 25 μg of protein, both your institutional animal ethics board and your lab budget would probably prefer that you do some of the discovery experiments on a suitable cell line, then use animals for the follow-up validation experiments. Dogmatic adherence to “the most physiologically relevant system” is not always the most useful approach.

Along with selecting the right samples, you must also make sure that they are collected appropriately, considering all the subsequent steps that will be needed. For example, when lysing cells of any organism, one would typically include protease inhibitors to prevent endogenous enzymes from prematurely degrading the proteins. But if you are then going to use trypsin or another protease to digest those proteins, the protease inhibitors are going to impede that. You could get rid of protease inhibitors later, either by precipitating the proteins or by running the proteins into a polyacrylamide gel, but do you really even need them there in the first place? You could save yourself time later on by suspending the cells in an SDS buffer right away and heating to 95 °C for 5 min [5].
Likewise, if you are doing a phosphoproteomics experiment, do you really need phosphatase inhibitors there at the beginning, or can you just denature all the enzymes immediately, bypassing the need to use inhibitors?

Another consideration is how and when to store your samples so that you can be successful when you go back to them to complete the analysis. While it is always best to go non-stop from collecting the sample through the preparation steps directly onto the LC-MS/MS system, this is almost always impossible. So, consider when and how to store the samples so that the proteins/peptides are fully recoverable, with no unwanted modifications or contaminants. This may require some trial and error to determine whether freezing the proteins causes them to fall out of solution or whether they will be irreversibly adsorbed to the plasticware. Perhaps the single storage factor that has led to the most wasted mass spectrometry time is the potential for polymers or other contaminants to sneak into stored samples. Storing samples in organic solvents or certain organic acids (trifluoroacetic acid is particularly bad) can leach plasticizers and other components out of plastics. In addition to suppressing the ionization of the peptides that you want to detect, and thus leading to poor data, this can also lead to substantial other costs due to having to clean the instrument, perhaps replace parts, buy new analytical columns, etc.

2.2 Ensuring High-Quality Samples to Begin with

The reliability of the results of MS analysis will depend on the quality and consistency of sample preparation [6]. Proteomics analysis is most sensitive when there is enough protein to detect and that protein material is as free from other biomolecules (e.g., lipids), salts, and synthetic organic compounds (e.g., polymers) as possible [7]. One very relevant trend here is the recently developed workflows in which all the sample processing is done in a single tube or a single well of a microwell plate [8, 9]. Separating the molecules of interest in a sample, the proteins, from everything else requires efficient protein extraction, the steps for which can vary depending on whether one is using cell culture, tissue samples, plants, or any other biological source [10]. Cell/tissue lysis or disruption steps, including the choice of lysis buffer, can have a great impact on both the recovery and purity of protein. The wrong choice of disruption conditions or lysis buffer can lead to biases in the proteins extracted (e.g., some classes of proteins might not be soluble, or parts of tissue may be left intact) or can extract unwanted components (e.g., complex cell wall components from plant and fungal samples can clog columns or lead to unwanted precipitates).

Another important component of the upstream biochemistry steps in a proteomics workflow is the detergent(s) employed to solubilize and denature proteins. The choice of detergent and how it is employed can have a very large impact on the efficient operation of a mass spectrometry facility. Detergents can easily contaminate inlet ports, ion optics, detectors, and all other components of a mass spectrometer, to the point where the instrument detects only detergent ions and no proteins/peptides. Detergents are also very hard to separate from peptides because they typically have a hydrophobicity similar to that of peptides and are usually used in vast molar excess over the proteins/peptides one is interested in.
In our experience, there is no perfect detergent that does its job extracting proteins but can then be completely separated from the peptides prior to LC-MS/MS. So-called mass spec-friendly detergents are available from some vendors: these typically hydrolyze in acidic conditions, leading to breakdown products that should not co-purify with peptides. In our experience, however, the breakdown products are hard to get rid of and better results can be obtained with more conventional detergents. There is not enough space here to discuss all detergents, but we recommend avoiding the Triton and ethoxylated nonylphenol (e.g., NP-40) classes of detergents, which are commonly used in other areas of biochemistry. Even when a sample containing one of these is cleaned up by running it into a polyacrylamide gel, residues of these detergents can still be detected in the mass spectrometer. Instead, we mostly use sodium dodecyl sulfate (SDS) when a detergent is required. SDS is also very detrimental if it is introduced into a mass spectrometer [11], but it can be more efficiently removed from a sample, such as with standard in-gel digestion procedures [12].

Cells can also be lysed in a detergent-free approach using a lysis buffer supplemented with 2,2,2-trifluoroethanol (TFE). TFE enhances protein solubility, and it also evaporates easily, which allows for quick cleanup. In a study that compared lysis protocols for processing tissue samples, use of TFE led to the highest average and total number of peptides and proteins identified, up to 2.3 times more than samples prepared with filter-aided sample preparation (FASP) or RapiGest (one of the acid-hydrolyzable “mass spec-friendly” detergents mentioned above). TFE has several other advantages as well: the protocol using it was the least time-consuming of all those tested, it is low-cost, and its volatility means that it evaporates easily if one needs to get rid of it. Despite TFE’s advantages over most other protein extraction methods, it is a hazardous substance that poses greater safety risks than most other methods. Acetonitrile is a less toxic organic solvent that is physicochemically similar to TFE. In the same study, replacing TFE with acetonitrile in a proteomic workflow led to similar peptide and protein identifications, further supported by high proteome correlations [13].

As mentioned several times, loading a sample that has too much salt or that is contaminated by polymers onto an LC-MS/MS system can lead to many different problems, wasting all the time that was spent preparing and analyzing that sample, but also forcing the instrument off-line while it is cleaned, disrupting others.
A key test that we now do to try to avoid such costly mistakes is to measure every sample on a lesser instrument before it goes on the high-end instrument. This test is not meant to discover any new biology but rather to quickly check whether the samples are clean enough and rich enough to warrant further analysis. Of course, with the limited resources of a laboratory and the time pressure on graduate students and post-docs, it is impossible to check every single sample this way, but every new procedure, or even every modification to an existing procedure, should be checked in advance of its being used for real samples. This applies to every researcher, too: if someone is new in the lab or is using a new protocol, measuring their ability to produce high-quality samples is essential. While this may seem somewhat Draconian, it will ultimately benefit everyone in the lab, including the person undergoing the test, since it will ensure that they do not waste their own time.


2.3 How Much Data Is Really Needed?

Proteomics is very much a numbers game: where a Nobel Prize was once won for detecting a single protein [14–16], you are now unlikely to get your manuscript favorably reviewed without having identified thousands of proteins, let alone get it past the editor’s desk at Science, Nature or Cell. Of course, proteomics is one of the tools that could be employed to gain biological insight, and most studies now involve substantial follow-up to test the hypotheses generated by a discovery approach such as proteomics. This includes the bioinformatics analysis required to fully derive meaning from proteomics data, which takes much more time than the sample generation and LC-MS/MS [17].

So how many proteins should you be trying to identify? It seems that the lower limit is really about how many you have to identify in order to convince the reviewers that you have sufficient expertise to be conducting a proteomics experiment. Depending on the biological system that you are studying, this minimum is probably in the range of two to three thousand proteins. But a much more practical answer, which also gets directly at the costs, is that you only really need to identify enough proteins to enable you to form solid hypotheses that you can go on to test in other ways. For example, identifying 20,000 phosphopeptides in samples that are infected with a coronavirus like SARS-CoV-2 is great if, e.g., you are looking for a key regulatory point that determines pathogenicity. But if you could have found that master regulator in the first 4500 phosphopeptides [18], then all that extra effort was not worth it. Of course, you will not know ahead of time how many proteins you will have to identify to find that key point, so aiming for more is going to be the safer route. And having more proteins usually makes bioinformatic investigations of the data more statistically robust.
Again though, keeping the costs in mind, there are several strategies that can be employed to go about this optimally:

1. Test samples before choosing to fractionate: our approach is to always analyze whole samples first, including all the different conditions to be tested. For example, if we were looking at differential expression between a treated and untreated sample, we would do triplicate analyses of treated vs. untreated with no fractionation first. This confirms that the sample preparation is appropriate and that the samples are of good quality. Depending on what is found, we would then evaluate whether more protein IDs are needed and how to achieve that (e.g., further fractionation, optimal type of fractionation, extending the LC gradients).

2. Experience: perhaps you have some hypotheses about which protein(s) might be involved in the process you are studying. Have you seen those proteins in proteomics experiments before? How much instrument time (or other considerations) was needed to see those proteins?

3. Parallel approaches: perhaps the initial experiments in point #1 above show some differentially expressed proteins but we still think that a deeper analysis is needed (e.g., for better bioinformatic analysis). While fractionating the samples and getting more data, you could also start doing follow-up experiments on some of the differentially expressed proteins identified initially.

3 Nanoflow LC: The Costly Achilles’ Heel of Proteomics

As we have been discussing, there are many factors that affect the cost of a proteomics experiment. Nothing increases the per-sample cost in a facility more, however, than instrument downtime: Table 1 makes it clear that analyzing fewer samples in a year increases the cost of every sample. There are many reasons that an LC-MS/MS instrument has to be taken off-line: regular cleaning/maintenance, irregular cleaning (e.g., when a highly contaminated sample is injected), hardware failures, power failures, quality control (an important step for ensuring optimal performance but not

Table 1 Putting numbers to the real cost of a proteomics experiment

| Category | Description | Cost/sample (a) |
|---|---|---|
| Sample preparation | Extraction of proteins, digestion, cleanup | $10 |
| | Graduate student or post-doc stipend | $200 (b) |
| Capital costs | LC-MS/MS instrumentation, including installation, taxes, shipping: $1 M | $50 (c) |
| Compute | Data storage | $1 (d) |
| | Protein identification with an open-access algorithm | $1 |
| Operations | Staff engineer or scientist in charge of keeping instruments running, @1/2 of $80,000 | $20 |
| | Solvents, columns, capillaries, gases | $4 |
| MS maintenance and repairs | Service contract, @$50,000/year | $12.50 |
| Total | | $298.50 |

(a) Approximate cost in USD for preparing or analyzing a single sample. Assuming that the total instrument time required would be 90 min, that this would generate 3 GB of data, and that the database search would take 1 h
(b) Assuming a post-doc stipend, with benefits, of $50,000. After all the controls are done and the procedure is checked out, it might take a week for one person to generate 7–10 samples
(c) Assuming 4000 real samples analyzed per year (after downtime, quality control checks, etc.), amortized over a 5-year useful life
(d) This can vary widely, from purchasing an external portable hard drive to a mirrored, redundant data server


directly productive). But by far the biggest source of downtime is issues with the LC side of the LC-MS/MS equation: thus the Achilles’ heel but, like everyone’s heel, an essential part if you want to walk. Marrying ultra-high pressures with flow rates in the nL/min range gives a system maximum sensitivity; this is essential in proteomics, but it leads to the least robust system imaginable. The nanoflow LC system can fail for many reasons: leaks in the plumbing, microscopic scratches in the rotor/stator, or a piece of dust or build-up of contaminants that increases backpressure in the analytical column, etc. In addition to the lost instrument time, fixing LC problems also wastes a considerable amount of personnel time. Higher flow rates can be used for certain targeted approaches, and this will drastically improve robustness, but for virtually all discovery approaches in proteomics, one must learn to live in the nanoflow regime.

Beyond the costs of not having the LC system operating, the primary consumable cost for nanoflow LC is the analytical columns. Commercial columns can cost between $1000 and $2000, and if these are damaged by bad samples, they can amount to a significant fraction of the total cost of an experiment. It is relatively simple to make analytical columns in-house, but this requires some level of expertise and takes up some of an experienced person's time, which also adds to the cost.
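As a rough illustration of how Table 1 arrives at its total, and of why lost throughput raises the cost of every sample, the arithmetic can be sketched in Python. This is a minimal sketch, not the authors' costing method: the function name `cost_per_sample` and the dictionary keys are hypothetical, and it assumes that only the capital and service-contract items scale with annual throughput while every other entry stays at its Table 1 per-sample value.

```python
# Per-sample figures copied from Table 1 (USD). Treating them as independent
# of annual throughput is a simplifying assumption made for this sketch.
FIXED_PER_SAMPLE = {
    "sample preparation consumables": 10.00,
    "student or post-doc time": 200.00,
    "data storage": 1.00,
    "protein identification": 1.00,
    "operations staff": 20.00,
    "solvents, columns, capillaries, gases": 4.00,
}

INSTRUMENT_PRICE = 1_000_000        # Table 1: $1 M instrument, installed
USEFUL_LIFE_YEARS = 5               # Table 1, footnote c
SERVICE_CONTRACT_PER_YEAR = 50_000  # Table 1: service contract


def cost_per_sample(samples_per_year: int) -> float:
    """Return the total per-sample cost in USD for a given annual throughput."""
    if samples_per_year <= 0:
        raise ValueError("samples_per_year must be positive")
    # Annual costs are spread over however many samples are actually run.
    capital = INSTRUMENT_PRICE / USEFUL_LIFE_YEARS / samples_per_year
    service = SERVICE_CONTRACT_PER_YEAR / samples_per_year
    return sum(FIXED_PER_SAMPLE.values()) + capital + service


if __name__ == "__main__":
    for n in (4000, 2000, 1000):
        print(f"{n:>5} samples/year: ${cost_per_sample(n):,.2f} per sample")
```

Under these assumptions, 4000 samples per year reproduces the $298.50 total in Table 1, and halving the throughput to 2000 samples per year raises the figure to $361.00 per sample.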

4 Balancing Replication vs. Fractionation

Replication is a fundamental principle in science: without repeated measurements we cannot know the inherent variation in the system being studied and thus we cannot tell if a change in that system is significant or not. In the early days of any new discipline, replication is often overlooked in favor of just demonstrating the abilities of a novel technique [19, 20]. Biological replication is now expected in proteomics, as it should be, so the costs of those additional analyses need to be considered. Fractionation is also a key consideration in proteomics: the easiest way to identify more proteins is to fractionate protein or peptide samples ahead of LC-MS/MS analysis. Fractionating a single sample into ten fractions effectively increases the cost for that sample by an order of magnitude. So, with replication and fractionation, the cost for a proteomics experiment can easily reach tens of thousands of dollars.

As a thought experiment, imagine a graduate student who wants to generate a rich dataset of diseased vs. healthy tissues to form the basis for their thesis. They know that they are going to need three biological replicates: that decision is made for them, so they resign themselves to generating that many samples. They also know that they can easily increase their protein IDs by 50–100% by fractionating each sample into ten more fractions using their lab’s

Review of the Real and Sometimes Hidden Costs in Proteomics Experimental. . .


favorite separation technique. With disease vs. healthy, three replicates and ten fractions each, that is sixty samples. But could the student get a better dataset for about the same cost? What if they did five replicates with six fractions each? Or ten replicates with three fractions each? The analysis costs for these scenarios would be equal, although the sample preparation costs and the amount of time required of the student would certainly be higher, and significantly so (Table 1). The optimal balance between replication and fractionation will vary with biological variables in the system being studied, the system’s inherent cost and challenges, and the goals of the experiment. Generally, however, five replicates with less fractionation are going to lead to a better dataset than three replicates with more fractionation. For one thing, with five replicates there will be fewer missing data. For another, fractionation suffers greatly from the law of diminishing returns: most of the gains are realized with the first few fractions [21]. For certain types of experiments, especially when analyzing small clinical trials or rare disease cohorts, incorporating a larger number of replicates with little or no fractionation may be a useful strategy despite the fact that replication of LC-MS/MS is clearly more costly [22].
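The run counts in this thought experiment are easy to tabulate. The following is a minimal sketch rather than part of the original chapter; the function name `lcms_runs` and the assumption that every fraction of every replicate corresponds to exactly one LC-MS/MS run are ours.

```python
def lcms_runs(conditions: int, replicates: int, fractions: int) -> int:
    """Number of LC-MS/MS runs, assuming one run per fraction of each replicate."""
    return conditions * replicates * fractions


# Designs discussed in the text, as (replicates, fractions) pairs.
designs = [(3, 10), (5, 6), (10, 3)]
conditions = 2  # diseased vs. healthy

for replicates, fractions in designs:
    runs = lcms_runs(conditions, replicates, fractions)
    preparations = conditions * replicates  # biological samples to prepare
    print(f"{replicates} replicates x {fractions} fractions: "
          f"{runs} runs, {preparations} sample preparations")
```

Under these assumptions all three designs need the same number of runs, while the number of biological samples to prepare grows with the number of replicates, which is where the additional preparation cost and time come from.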

5 Costs of Quantitation

The development of quantitative approaches in proteomics has been rapid, with both label-free and stable-isotope labeling technologies being very popular [23]. When selecting the quantification strategy, it is important to consider the complexity and reproducibility of the samples, as well as the cost of the experiments [24]. Assuming that the cost of obtaining the samples and doing the data analysis is the same, the unique costs involved in a quantitative experiment break down into the costs of the label (if used) and the LC-MS/MS costs. Metabolic and isobaric tagging approaches require a much higher up-front cost to label the samples, whereas some other labeling strategies can be accomplished at negligible additional cost [25]. MS1-based labels (i.e., metabolic, non-isobaric derivatization) have a relatively low multiplexing factor (typically no more than three), but still do reduce the amount of LC-MS/MS time required to analyze a sample. Isobaric labels can lead to very high multiplexing [26, 27], albeit at a substantially higher labeling cost per sample and with restrictions on the type of instrument used. In addition, the long-known analytical pitfalls [28, 29] of properly analyzing isobaric tags are often still overlooked, in no small part because dealing with them would be harder and increase the cost of the analysis even more. Nonetheless, to the point of this article, the multiplexing achievable with isobaric tagging can substantially reduce the LC-MS/MS costs of an experiment.
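The effect of multiplexing on instrument time can be illustrated with a short calculation. This is a sketch under simplifying assumptions of our own: each run holds exactly `plex` pooled samples, no channels are reserved for references, and fractionation is ignored. The 10-plex value is a hypothetical example, not a figure from the text.

```python
import math


def lcms_runs_needed(n_samples: int, plex: int = 1) -> int:
    """Runs needed when `plex` labeled samples are pooled into each run."""
    if n_samples < 0 or plex < 1:
        raise ValueError("n_samples must be >= 0 and plex must be >= 1")
    return math.ceil(n_samples / plex)


n_samples = 60
strategies = [
    ("label-free", 1),
    ("MS1-based label (3-plex)", 3),
    ("isobaric label (hypothetical 10-plex)", 10),
]

for name, plex in strategies:
    print(f"{name}: {lcms_runs_needed(n_samples, plex)} runs")
```

The saving in runs has to be weighed against the labeling cost per sample, which this sketch deliberately leaves out.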


Label-free quantitation strategies fall into two categories: ion intensity-based and spectral counting. Ion intensity-based methods can provide adequate accuracy and precision although not quite as good as with isotope labeling [30]. Even if one only analyzes each sample once, the costs saved by not having to label are easily outweighed by the LC-MS/MS costs but, as we have been arguing, these costs are often not recognized since a lab is running those instruments anyway. Spectral counting [31], as it is typically applied, often leads to severe undersampling, meaning that the data is then overinterpreted. The principle of random sampling as a metric of abundance is sound—it forms the whole basis for RNAseq [32]—but to get robust data on a proteome [33], much more sampling is required. Thus, while spectral counting can seem cheaper, easier, and more intuitive on its surface, to do it well and to be confident in the results, one must make many more technical and biological replicate measures compared to what one would have to do with labels or ion intensity-based strategies.
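Why undersampling is a problem for spectral counting can be seen from counting statistics alone. The sketch below assumes that the spectral count of a protein behaves like a Poisson variable, which is a modeling assumption of ours and not a claim made in the text; real data carry additional sources of variation on top of this counting noise.

```python
import math


def poisson_cv(expected_count: float) -> float:
    """Relative standard deviation (CV) of a Poisson-distributed count."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)


# Counting noise for proteins observed with few or many spectra.
for count in (4, 25, 100):
    print(f"{count} spectra: counting CV of about {poisson_cv(count):.0%}")
```

Under this assumption, the relative noise only falls with the square root of the number of spectra, which is consistent with the point that many more measurements are needed for confident results.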

6 Conclusions

Proteomics has progressed rapidly and become more comprehensive over the past years. Proteomic research requires a clear, comprehensive plan to maximize its potential to improve experimental results and biological insight. Proteomics experiments involve both hidden and more obvious costs, which can appear over time; reducing costs as throughput increases has direct and indirect impacts on step-by-step experimental decisions and, by default, on the consumption of mass spectrometry facility resources. While high-throughput and sensitive quantitative proteome analysis is an essential tool in systems biology, it comes with significant costs that are not always factored into decision-making. Testing budgets face challenges in controlling sample quality and stability, as well as costs and a lack of highly qualified workers. We hope that the experiences and considerations presented here will help groups and core facilities implement policies and procedures that will improve their efficiency. This review looks at how to calculate the cost of research based on the type of data and the purpose of the research project. Based on the available data, costs for atypical cases can be adjusted; however, most of these approaches involve determining the relative cost of an experimental subject, reducing the hours of work, and providing rapid evaluation of research needs when developing proteomics protocols.


References

1. Rosenberger G, Koh CC, Guo T et al (2014) A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci Data 1:1–15
2. Muntel J, Gandhi T, Verbeke L, Bernhardt OM, Treiber T, Bruderer R, Reiter L (2019) Surpassing 10 000 identified and quantified proteins in a single run by optimizing current LC-MS instrumentation and data analysis strategy. Mol Omics 15:348–360
3. Geiger T, Wehner A, Schaab C, Cox J, Mann M (2012) Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol Cell Proteomics 11:1–11
4. Candiano G, Bruschi M, Musante L, Santucci L, Ghiggeri GM, Carnemolla B, Orecchia P, Zardi L, Righetti PG (2004) Blue silver: a very sensitive colloidal Coomassie G-250 staining for proteome analysis. Electrophoresis 25:1327–1333
5. Rogers LD, Fang Y, Foster LJ (2010) An integrated global strategy for cell lysis, fractionation, enrichment and mass spectrometric analysis of phosphorylated peptides. Mol BioSyst 6:822–829
6. Rogers JC, Bomgarden RD (2016) Sample preparation for mass spectrometry-based proteomics; from proteomes to peptides. Adv Exp Med Biol 919:43–62
7. Hughes CS, Moggridge S, Müller T, Sorensen PH, Morin GB, Krijgsveld J (2019) Single-pot, solid-phase-enhanced sample preparation for proteomics experiments. Nat Protoc 14:68–85
8. Macklin A, Khan S, Kislinger T (2020) Recent advances in mass spectrometry based clinical proteomics: applications to cancer research. Clin Proteomics 17:1–25
9. Humphrey SJ, Karayel O, James DE, Mann M (2018) High-throughput and high-sensitivity phosphoproteomics with the EasyPhos platform. Nat Protoc 13:1897–1916
10. Iliuk A (2018) Identification of phosphorylated proteins on a global scale. Curr Protoc Chem Biol 10:e48
11. Kachuk C, Doucette AA (2018) The benefits (and misfortunes) of SDS in top-down proteomics. J Proteome 175:75–86
12. Shevchenko A, Wilm M, Vorm O, Mann M (1996) Mass spectrometric sequencing of proteins from silver-stained polyacrylamide gels. Anal Chem 68:850–858
13. Coscia F, Doll S, Bech JM, Schweizer L, Mund A, Lengyel E, Lindebjerg J, Madsen GI, Moreira JM, Mann M (2020) A streamlined mass spectrometry–based proteomics workflow for large-scale FFPE tissue analysis. J Pathol 251:100–112
14. Mann M (2019) The ever expanding scope of electrospray mass spectrometry—a 30 year journey. Nat Commun 10:3744
15. Fenn JB (2003) Electrospray wings for molecular elephants (Nobel lecture). Angew Chem Int Ed Engl 42(33):3871–3894
16. Cho A, Normile D (2002) Nobel prize in chemistry. Mastering macromolecules. Science 298:527–528
17. Andrecht S, von Hagen J (2008) General aspects of sample preparation for comprehensive proteome analysis. In: Proteomics sample preparation. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp 5–20
18. Bouhaddou M, Memon D, Meyer B et al (2020) The global phosphorylation landscape of SARS-CoV-2 infection. Cell 182:685–712.e19
19. Wright ME, Eng J, Sherman J, Hockenbery DM, Nelson PS, Galitski T, Aebersold R (2003) Identification of androgen-coregulated protein networks from the microsomes of human prostate cancer cells. Genome Biol 5:R4
20. Foster LJ, De Hoog CL, Mann M (2003) Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc Natl Acad Sci U S A 100:5813–5818
21. Ly L, Wasinger VC (2011) Protein and peptide fractionation, enrichment and depletion: tools for the complex proteome. Proteomics 11:513–534
22. Poulos RC, Hains PG, Shah R et al (2020) Strategies to enable large-scale proteomics for reproducible research. Nat Commun 11:1–13
23. Mann M, Kelleher NL (2008) Precision proteomics: the case for high resolution and high mass accuracy. Proc Natl Acad Sci U S A 105:18132–18138
24. Mirza SP (2012) Quantitative mass spectrometry-based approaches in cardiovascular research. Circ Cardiovasc Genet 5:477–477
25. Hsu JL, Huang SY, Chow NH, Chen SH (2003) Stable-isotope dimethyl labeling for quantitative proteomics. Anal Chem 75:6843–6852
26. Braun CR, Bird GH, Wühr M, Erickson BK, Rad R, Walensky LD, Gygi SP, Haas W (2015) Generation of multiple reporter ions from a single isobaric reagent increases multiplexing capacity for quantitative proteomics. Anal Chem 87:9855–9863
27. Wang Z, Yu K, Tan H, Wu Z, Cho JH, Han X, Sun H, Beach TG, Peng J (2020) 27-plex tandem mass tag mass spectrometry for profiling brain proteome in Alzheimer’s disease. Anal Chem 92:7162–7170
28. Christoforou AL, Lilley KS (2012) Isobaric tagging approaches in quantitative proteomics: the ups and downs. Anal Bioanal Chem 404:1029–1037
29. Christoforou A, Lilley KS (2011) Taming the isobaric tagging elephant in the room in quantitative proteomics. Nat Methods 8:911–913
30. Wang M, You J, Bemis KG, Tegeler TJ, Brown DPG (2008) Label-free mass spectrometry-based protein quantification technologies in proteomic analysis. Brief Funct Genomics Proteomics 7:329–339
31. Liu H, Sadygov RG, Yates JR (2004) A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal Chem 76:4193–4201
32. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320:1344–1349
33. Girard M, Allaire PD, McPherson PS, Blondeau F (2005) Non-stoichiometric relationship between clathrin heavy and light chains revealed by quantitative comparative proteomics of clathrin-coated vesicles from brain and liver. Mol Cell Proteomics 4:1145–1154

Chapter 2

High-Throughput Mass Spectrometry-Based Proteomics with dia-PASEF

Patricia Skowronek and Florian Meier

Abstract

Ion mobility separation is becoming an integral part of mass spectrometry-based proteomics. Here we describe the use of a trapped ion mobility-quadrupole time-of-flight (TIMS-QTOF) mass spectrometer for high-throughput label-free quantification with data-independent acquisition. The parallel accumulation-serial fragmentation (PASEF) operation mode positions the mass-selecting quadrupole as a function of the TIMS separation, which allows highly efficient data-independent acquisition schemes (dia-PASEF), but also increases complexity in the method design. We provide a step-by-step protocol for instrument setup, method design, data acquisition and ion mobility-aware, library-based data analysis with Spectronaut. We highlight key acquisition parameters and illustrate their optimization for short gradients. Using the Evosep One liquid chromatography system, we demonstrate expected results for the analysis of a human cancer cell line at a throughput of 60 samples per day, leading to the quantification of about 6000 protein groups with very high reproducibility. Importantly, the protocol can be readily adapted to other gradients and sample types such as modified peptides.

Key words Mass spectrometry, Ion mobility, Proteomics, Data-independent acquisition, TIMS, PASEF, dia-PASEF

1 Introduction

Mass spectrometry-based proteomics provides increasingly detailed insights into the architecture and dynamics of biological systems [1, 2]. However, the complexity of the proteome still poses an enormous analytical challenge that spurs the development of more scalable workflows with ever-increasing coverage [3]. Ion mobility spectrometry (IMS) is a rapid gas phase separation technique that can be integrated with conventional liquid chromatography-mass spectrometry (LC-MS) to increase the analytical peak capacity and selectivity [4]. Advances in ion transmission and simplified handling of commercially available devices have turned this technology into an attractive option for many proteomics workflows [5–8].

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


The mobility of ions drifting in an electric field through a tube filled with inert gas is determined by their size and shape [9]. Trapped ion mobility spectrometry (TIMS) reverses this concept of classic drift tube IMS by holding ions in an axial electric field gradient against a gas flow at 1–3 mbar [10–12]. In this setup, ions are trapped at a position where their mobility in the electric field is balanced by the drag of the gas flow: larger, low-mobility ions towards the exit, and smaller, high-mobility ions towards the entrance of the TIMS analyzer. Progressively lowering the electric field strength releases ions to the downstream mass analyzer as a function of their ion mobility. A compelling feature of TIMS is that a high resolving power can be achieved within 50–200 ms scan times even with comparably compact devices of 5–10 cm length [13]. This also opens up the possibility to install two TIMS in series for efficient ion utilization [14]. In a TIMS-quadrupole-time-of-flight (QTOF) configuration, the selection of precursors for fragmentation can be further synchronized with the release of ions from the TIMS device [15]. In this “parallel accumulation – serial fragmentation” (PASEF) mode, the isolation window of the analytical quadrupole rapidly switches its mass position during a single TIMS scan [7, 16]. Note that this does not compromise sensitivity as the precursors are released in narrow ion mobility peaks. Because peptide mass and mobility are correlated, typically high-m/z (low mobility) peptides are released first, followed by ions of decreasing m/z (higher mobility). Following this trend with the quadrupole in a data-independent acquisition mode (dia-PASEF) can sample up to 100% of the peptide precursor ion beam [17]. Here, we describe a step-by-step protocol for label-free proteome quantification with dia-PASEF using a high-throughput nanoflow liquid chromatography system coupled to a TIMS-QTOF mass spectrometer. 
We highlight critical steps in the instrument setup and considerations for method design. To illustrate the application of our protocol, we provide an acquisition scheme for the analysis of 60 human cancer cell line samples per day, which should serve as a starting point for dia-PASEF measurements of other sample types or with other liquid chromatography systems. We also briefly describe the data analysis using library-based targeted data extraction in mass, retention time, and ion mobility dimensions.

2 Materials

2.1 Samples

1. ESI-L Low Concentration Tuning Mix (Agilent Technologies).
2. Purified peptides from cells or tissues as obtained following standard protocols for mass spectrometry-based proteomics.

High-Throughput MS-Based Proteomics with dia-PASEF


We demonstrate our protocol using a tryptic digest of HeLa S3 human cervical cancer cells, which were processed according to the “iST” method [18].

2.2 Liquid Chromatography

1. Nanoflow ultra-high performance liquid chromatography system. We use an Evosep One [19] (Evosep Biosystems) system for the analysis of 60 samples per day with a standardized gradient (see Note 1).
2. Disposable trap columns with C18 solid phase extraction material (Evotips, Evosep Biosystems).
3. Analytical column for nanoflow reversed-phase liquid chromatography. Here, we use a commercially available 8 cm × 150 μm capillary column packed with 1.5 μm C18 porous beads and finger-tight zero-dead volume connectors (Evosep Biosystems) (see Note 2).
4. Highest-purity grade mobile phases A and B: water with 0.1 vol.-% formic acid and acetonitrile with 0.1 vol.-% formic acid.

2.3 Mass Spectrometry

1. High-resolution quadrupole time-of-flight mass spectrometer equipped with a dual TIMS device in the first vacuum stage and the possibility to operate in PASEF mode. We demonstrate the protocol with a Bruker timsTOF Pro system (see Note 3).
2. Nanoelectrospray ion source and emitter, for example, a Bruker CaptiveSpray source and a tapered tip electrospray emitter (inner diameter 20 μm) with a zero-dead volume connector (see Note 4).
3. Instrument control software to generate acquisition methods and perform dia-PASEF measurements. The protocol describes the use of Bruker timsControl v2.0.18.
4. Instrument control software to operate the liquid chromatography system and schedule acquisition sequences. The protocol describes the use of Bruker HyStar v5.1 with a software plug-in for the Evosep One system.

2.4 Data Analysis

1. Software for the analysis of dia-PASEF proteomics data (see Note 5). To demonstrate expected results, we analyze the data with Spectronaut v14 (Biognosys AG) [20].
2. Spectral library for targeted data extraction [21] (see Note 6). We use a project-specific library generated with Spectronaut from 48 high pH reversed-phase fractions [22] of a HeLa digest acquired with PASEF in data-dependent acquisition mode using the same LC-MS setup and gradient as in the dia-PASEF experiments. Our library comprises precursor and fragment ion masses, retention time, and ion mobility information of 176,885 precursors from 122,928 unique peptide sequences, from which 9688 protein groups were inferred.


3. Appropriate proteome database for the species under investigation. We downloaded a FASTA file containing human canonical and isoform protein sequences from UniProt (http://www.uniprot.org, taxonomy ID: 9606).

3 Methods

Care should be taken when handling hot surfaces, fragile parts, or components to which high voltage is applied. Wear gloves to protect yourself and avoid contamination. Strictly follow all relevant instrument and laboratory safety instructions.

3.1 Instrument Setup and Ion Mobility Calibration

1. Follow the standard routine to calibrate the time-of-flight dimension of the timsTOF Pro instrument in TIMS mode with the ESI-L Low Concentration Tuning Mix.
2. Assemble the CaptiveSpray source by first placing the emitter in the respective holder following the manufacturer’s instructions (see Note 7).
3. Mount the CaptiveSpray ion source on the timsTOF Pro mass spectrometer according to the manufacturer’s instructions. Then connect the analytical column to the electrospray emitter. Ensure that the source housing and all connections are tight to achieve stable operating conditions and to avoid post-column dead volumes.
4. Add ESI-L Low Concentration Tuning Mix as lock masses onto the CaptiveSpray filter or the inner walls of the adaptor (see Note 8).
5. Place fresh solvents in the Evosep One liquid chromatography system and connect the sample line to the analytical column. It is critical that all connections are tight and that there are no leaks.
6. Open the HyStar software and establish a connection to the timsTOF Pro mass spectrometer as well as the Evosep One LC system.
7. Start the idle flow in the “preparation” tab of the LC system to establish a constant flow at a rate of 250 nL/min.
8. Activate the “CaptiveSpray” ion source in the timsControl software and switch the mass spectrometer to “Operate” mode. If necessary, adjust the electrospray parameters to achieve a stable ion current. Typical settings will be dry gas temp. 180 °C; dry gas flow 3 L/min; capillary voltage 1750 V.
9. Enable TIMS and choose appropriate mass spectrometry parameters for your experiment or load a previously generated acquisition method (see Subheading 3.2). Typical parameters will be scan begin: m/z 100; scan end: m/z 1700; ion polarity: positive; scan mode: dia-PASEF; 1/K0 start: 0.60 Vs cm⁻²; 1/K0 end: 1.50 Vs cm⁻²; ramp time: 100 ms; accumulation time: 100 ms. Confirm a sufficient signal-to-noise ratio for the lock masses at nominal m/z 622, 922, and 1222 (see Subheading 3.1, step 4).
10. Add extracted ion mobilogram traces for the lock mass ions to the live view.
11. Change the axis of the mobilogram view (y-axis) in the “TIMS View” window to the TIMS elution voltage and adjust the elution voltage of the m/z 622 ions to 132 V by manually restricting the source gas flow.
12. Set the mobilogram axis back to the reduced ion mobility coefficient 1/K0 and navigate to the “calibration” tab in the timsControl software (see Note 9). Use a linear regression model to calibrate the TIMS measurement with the following values (m/z, 1/K0): 622.0289, 0.9848 Vs cm⁻²; 922.0097, 1.1895 Vs cm⁻²; 1221.9906, 1.3820 Vs cm⁻². These values may also be found in the predefined “[ESI] Tuning Mix ES-TOF (ESI)” reference list. Confirm the correct peak assignment by clicking on the respective entries manually and accept the calibration if the score is 100%.

3.2 Setting up a dia-PASEF Acquisition Method

1. Navigate to the “MS Settings” window in the timsControl software and confirm that the scan mode is set to “dia-PASEF.”
2. In the “MS/MS” tab, set appropriate collision energies for your experiment as a function of the ion mobility (see Note 10). Typical parameters for tryptic full-proteome digests will be linear interpolation from 20 eV at 1/K0 = 0.6 Vs cm⁻² to 59 eV at 1/K0 = 1.6 Vs cm⁻².
3. In the current implementation of dia-PASEF, the quadrupole isolation is stepped as a function of the ion mobility to cover rectangular windows in the m/z vs. ion mobility precursor space (Fig. 1) [17]. Acquisition schemes cover the full precursor space either in a single or in multiple dia-PASEF scans, sampling a fraction of all precursors in each. Adjusting the number of dia-PASEF scans per cycle as well as the position in m/z and ion mobility dimensions of the isolation areas thus provides a handle to balance sensitivity, speed, and selectivity according to experimental needs (see Note 11).
4. Follow the steps below to generate a new dia-PASEF acquisition scheme or import an existing scheme (“isolation list”) from a text file and proceed to the next section. Figure 2 shows an acquisition scheme that we found well suited for high-throughput label-free proteome quantification [17].


Fig. 1 Schematics of dia-PASEF scans on a trapped ion mobility-quadrupole-time-of-flight mass spectrometer. In dia-PASEF mode, the mass spectrometer cycles through all dia-PASEF scans defined in an acquisition cycle, which is typically interspersed with TIMS-MS1 scans. The position of peptide precursor and fragment ions is plotted in the m/z vs. ion mobility plane, and the projections on the upper and right axes show the corresponding mass and ion mobility spectra. The left panels illustrate the precursor isolation scheme, i.e., the position of the mass-selecting quadrupole as a function of the TIMS separation. The right panels illustrate the resulting ion mobility-resolved fragment ion spectra (MS2). (a) A single-scan dia-PASEF acquisition scheme with six isolation windows. The diagonal arrow indicates the scan direction. (b) A three-scan dia-PASEF acquisition scheme (numbered 1, 2, 3) with three windows in each, collectively covering the full precursor space. Horizontal arrows indicate switching of the quadrupole isolation window


Fig. 2 Screenshots of the timsControl software showing the setup of an eight-scan dia-PASEF acquisition method for short gradients. (a) Method parameters and visualization of the isolation scheme overlaid on averaged ion intensities detected in a 21 min analysis of HeLa digest. The windows in each of the eight dia-PASEF scans per acquisition cycle are grouped by color. (b) Tabular representation of the acquisition scheme using the same color code

5. To generate a new dia-PASEF acquisition scheme, open the “window editor” in the MS/MS tab and visualize the distribution of precursor ions in the two-dimensional m/z vs. ion mobility space by loading a representative experiment of your sample via “Open Analysis” (Fig. 2a). The chromatogram panel will show the total ion chromatogram, while the heatmap shows the average intensities of precursors distributed in the m/z vs. ion mobility space. Complex tryptic digests typically show distinct clouds for singly charged background ions and multiply charged peptide species (see Note 12).
6. Click into the heatmap to define a polygon covering the precursor area of interest. The example in Fig. 2a uses the following four anchor points, clockwise starting from the top (m/z, 1/K0): 1000, 1.48; 1000, 1.24; 400, 0.58; 400, 0.78.
7. Use the method editor to calculate isolation windows within the defined area of interest (see Note 13). To reproduce the scheme in Fig. 2a, set the following parameters: mass width: 25.0 Da; mass overlap: 0; mass steps per cycle: 24; calculate from polygon: mass steps; mobility overlap: 0; number of mobility windows: 1; mass range: m/z 400–1000. Confirm that the estimated cycle time for this acquisition scheme is 0.95 s.
8. Click “Apply dia-PASEF windows to method” to save the current acquisition scheme.


9. Optionally, the isolation list (Fig. 2b) can be exported as a text file and edited in a text editor (see Note 14). Importing acquisition schemes from text files allows highly customized schemes and facilitates method transfer between laboratories.
10. “Save” or “Save as . . .” the MS acquisition method.

3.3 Setting up an LC-MS Acquisition Method

1. Open the HyStar software and ensure that the Evosep One LC configuration is active.
2. Ensure that the standardized Evosep LC gradients to analyze 30, 60, 100, 200, and 300 samples per day [19] are correctly installed on your system and available in HyStar (see Note 15).
3. It is convenient to link MS and LC methods in a method set. To do so, navigate to the “Method Set” tab and select the predefined and standardized “60_SPD.m” method as “separation method,” injection method “Standard” and the MS method created above as “MS method.”
4. Save the method set.

3.4 Acquire a Sequence of dia-PASEF Experiments

1. Open the HyStar software and confirm that a connection to both the timsTOF Pro mass spectrometer and the Evosep One LC system is established.
2. The Evosep One LC system uses C18 StageTips [23] as disposable trap columns (Evotips) to increase sample throughput and robustness [19]. Wash and equilibrate the Evotips prior to loading approximately 200 ng of each sample on an Evotip according to the manufacturer’s instructions (see Note 16).
3. Place the Evotips in the autosampler tray of the Evosep One.
4. Open the “Sample Table” tab in HyStar to set up a new acquisition sequence.
5. Choose the appropriate 96-well plate layout and tray slot.
6. Use the graphical user interface to define the position of each sample and enter a sample ID as well as a data path.
7. For each sample, select the method set created above, containing the dia-PASEF MS method and the 60 samples per day LC method.
8. Save the sample table.
9. Navigate to the “Acquisition” tab and open the sample table from above.
10. Confirm that the checkboxes are active for all samples and select the first sample in your sequence.
11. Click “Start sequence” to start the acquisition. The .d raw data will be saved under the sample ID in the data path specified in the sample table.
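Before a sequence is started, it can be helpful to check numerically which collision energies the linear ramp from Subheading 3.2, step 2 yields across the mobility range. The following is a minimal sketch of plain linear interpolation between the two typical anchor points given there; the function name `collision_energy` is ours, and how the instrument software treats mobilities outside the anchor range is not covered by this sketch (here values are simply extrapolated).

```python
def collision_energy(inv_k0: float,
                     low: tuple = (0.6, 20.0),
                     high: tuple = (1.6, 59.0)) -> float:
    """Collision energy in eV at reduced mobility 1/K0 (Vs cm^-2).

    `low` and `high` are (1/K0, eV) anchor points; the defaults are the
    typical values quoted in the protocol for tryptic digests.
    """
    (x0, e0), (x1, e1) = low, high
    return e0 + (e1 - e0) * (inv_k0 - x0) / (x1 - x0)


# Tabulate the ramp in steps of 0.2 Vs cm^-2.
for inv_k0 in (0.6, 0.8, 1.0, 1.2, 1.4, 1.6):
    print(f"1/K0 = {inv_k0:.1f} Vs cm^-2 -> {collision_energy(inv_k0):.1f} eV")
```

With the default anchor points, the ramp rises by 3.9 eV for every 0.1 Vs cm⁻² of reduced mobility.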


Fig. 3 Single-run proteome analysis of a HeLa digest with dia-PASEF at a throughput of 60 samples per day. (a) Design and results of triplicate 200 ng injections with three alternative dia-PASEF acquisition schemes. (b) Total number of identified precursors, peptides, and protein groups in triplicate injections with the 25 Th method from a. Quantifications with a coefficient of variation (CV) below 20% are highlighted. (c) Detected number of peptides per minute as a function of the retention time in a single dia-PASEF experiment

3.5 Data Processing and Expected Results

Several processing tools and strategies are suitable for the analysis of dia-PASEF raw data (see Notes 5 and 6). As an example, we used Spectronaut [20] to process replicate injections of 200 ng HeLa samples acquired with the “60 samples per day” gradient. We chose a comprehensive project-specific library (see Subheading 2.4, item 2) for targeted data extraction and label-free quantification. This approach queries the data set for the peptide precursor and fragment ions contained in a spectral library [21], which are positioned in the three-dimensional space of retention time, m/z, and ion mobility [17]. To illustrate the effect of acquisition parameters and the choices in method design (see Notes 11 and 13), we varied the quadrupole isolation width from 10 to 75 Da, while keeping the polygon area fixed to cover, in theory, close to 90% of all peptide precursors in the library (Fig. 3a). As narrower isolation windows cover fewer precursors per dia-PASEF scan, we added more dia-PASEF scans to the acquisition cycle when going from wide to narrow isolation windows. While this reduced the spectral complexity and thus facilitated the analysis, it also increased the cycle time and thereby lowered the number of data points per peak. As a result, the median coefficients of variation were lowest for the fastest method (75 Th) and highest for the slowest method (10 Th). Conversely, the average number of quantified peptide precursors and protein groups was higher for the methods with narrower isolation windows. From this, we concluded that an acquisition scheme with eight dia-PASEF scans per acquisition cycle and three isolation windows per dia-PASEF scan provides a good compromise between proteome coverage and quantitative accuracy for the analysis of 200 ng sample loads at a throughput of 60 samples per day. In triplicate injections, this method resulted in the quantification of, in total, 65,449 precursors and 56,582 peptides, from which 6186 protein groups were inferred (Fig. 3b). Strikingly, almost 4000 peptides were detected per minute in the center of the 21 min gradient (Fig. 3c). The data completeness on the protein group level was 97.5%. Taken together, this protocol enables rapid, quantitative analysis of complex proteome samples from minimal sample amounts. The discussion of the dia-PASEF parameter space should provide guidance for the development of methods for other applications such as extended gradients or the analysis of post-translational modifications.
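The two summary metrics used above, the coefficient of variation across replicate injections and the data completeness, can be sketched in a few lines. This is a minimal illustration, not the calculation performed by Spectronaut: the function names and the toy intensity values are hypothetical, and the CV is computed only for protein groups quantified in every replicate, which is an assumption of this sketch.

```python
import math


def coefficient_of_variation(values):
    """Sample CV (standard deviation / mean) of replicate intensities."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance) / mean


def summarize(quant):
    """quant maps a protein group to its replicate intensities (None = missing)."""
    total = sum(len(reps) for reps in quant.values())
    observed = sum(v is not None for reps in quant.values() for v in reps)
    completeness = observed / total

    cvs = {}
    for name, reps in quant.items():
        present = [v for v in reps if v is not None]
        # Assumption: report a CV only when all replicates were quantified.
        if len(present) == len(reps) and len(present) >= 2:
            cvs[name] = coefficient_of_variation(present)

    below_20 = sum(cv < 0.20 for cv in cvs.values())
    return completeness, cvs, below_20


# Hypothetical triplicate intensities for three protein groups.
example = {
    "P1": [100.0, 110.0, 90.0],
    "P2": [50.0, 80.0, 20.0],
    "P3": [10.0, None, 12.0],
}
completeness, cvs, below_20 = summarize(example)
print(f"data completeness: {completeness:.1%}")
print(f"protein groups with CV < 20%: {below_20} of {len(cvs)}")
```

Counting missing values against the full matrix of protein groups and replicates is one common definition of data completeness; other tools may define it differently.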

4 Notes

1. This protocol can be readily transferred to alternative liquid chromatography setups. A detailed description of the Evosep One system is available in ref. [19].
2. Depending on your choice of analytical column and electrospray emitter, a heated column compartment may be necessary for optimal performance and endurance.
3. A detailed description of the timsTOF Pro mass spectrometer is available in ref. [7].
4. As an alternative, we used an electrospray emitter with 10 μm inner diameter (Bruker), resulting in a somewhat higher backpressure.
5. A wide range of proteomics software packages is available for the analysis of dia-PASEF data, including free and open-source tools such as Skyline [24], OpenSWATH [17, 25], and DIA-NN [26, 27].
6. Library-based strategies are widely used for the analysis of data-independent acquisition data [28]. As an alternative, library-free or hybrid approaches are applicable.
7. Confirm that the emitter is grounded to ensure a stable electrospray.
8. Alternatively, the lock mass compounds can be purchased individually and applied in higher concentrations. However, it is sufficient to detect low signal levels when operating the liquid chromatography system in idle flow.
9. TIMS calibration is critical to position the dia-PASEF isolation windows accurately in the m/z vs. ion mobility plane. The calibration parameters depend on the ion mobility scan range as well as the scan time defined in the acquisition method, and therefore calibration must be performed whenever these settings are changed. In addition, calibration values are sensitive to variations in the gas flow through the TIMS device, for example, through changes in the ambient air pressure. We therefore recommend verifying the ion mobility position of the lock mass ions regularly and recalibrating if necessary.
10. Collision energies are typically set as a function of the TIMS elution voltage, i.e., the ion mobility. When using a project-specific library, we recommend using the same collision energy settings for the acquisition of library and dia-PASEF data.
11. The complexity of fragment ion spectra in dia-PASEF can be reduced by narrowing the quadrupole isolation window. To cover the full mass range, precursor windows can be split across multiple dia-PASEF scans, thereby increasing the cycle time and lowering the ion utilization. To minimize the effect of lower ion transmission at the borders of the isolation windows in either the m/z or the ion mobility dimension, dia-PASEF scans in a given dia-PASEF scan cycle can have overlapping windows in either one or both dimensions. An optimal method will thus be a tradeoff between selectivity (number of precursors in a window), sensitivity (ion utilization), and quantitative accuracy (number of data points per chromatographic peak). For the analysis of 60 samples per day and comparably high sample loads of 200 ng, we found a method with 25 Da isolation windows and a cycle time of 0.95 s to provide a good compromise between proteome coverage and quantitative accuracy.
12. A detailed investigation of the conformational space of peptide ions in TIMS can be found in ref. [29].
13. In dia-PASEF acquisition schemes, the quadrupole isolation window switches as a function of the ion mobility, which means that at any point in the TIMS ramp time one m/z range is selected for fragmentation and mass analysis. The number of selected windows per dia-PASEF scan depends on the ion mobility width and slope of the defined precursor polygon, since a higher number of windows per dia-PASEF scan reduces the available time, and thus ion mobility range, per window.
14. In case the ion mobility range of the polygon area is less than the total TIMS ion mobility range, the first and last windows in each dia-PASEF scan may be extended to cover the full ion mobility range by manually editing the isolation list.
15. Activating “automatic idle flow” in the LC configuration editor/main pump menu may help to prolong the lifetime of column and emitter.
16. Avoid drying of the Evotips throughout the protocol and, if feasible, load your samples just before MS analysis. An up-to-date protocol for loading samples onto Evotips is available at https://www.evosep.com/wp-content/uploads/2020/03/Sample-loading-protocol.pdf.
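The tradeoff described in Note 11 between the number of dia-PASEF scans, the cycle time, and the number of data points per chromatographic peak can be made concrete with a small calculation. The sketch below assumes that every MS1 or dia-PASEF scan occupies exactly one TIMS ramp and that one MS1 scan is acquired per cycle; the ramp time of 0.1 s and the peak width of 6 s are illustrative assumptions and are not taken from this protocol. Under these assumptions, eight dia-PASEF scans give a cycle time of the same order as the 0.95 s quoted in Note 11.

```python
def cycle_time_s(n_dia_scans, ramp_time_s, n_ms1_scans=1):
    """Cycle time if every MS1 or dia-PASEF scan occupies one TIMS ramp (assumption)."""
    return (n_ms1_scans + n_dia_scans) * ramp_time_s


def points_per_peak(peak_width_s, cycle_s):
    """Average number of acquisition cycles that sample one chromatographic peak."""
    return peak_width_s / cycle_s


RAMP_TIME_S = 0.1   # assumed TIMS ramp time, not taken from the protocol
PEAK_WIDTH_S = 6.0  # assumed chromatographic peak width at base

for n_scans in (4, 8, 16, 24):
    cycle = cycle_time_s(n_scans, RAMP_TIME_S)
    print(
        f"{n_scans:2d} dia-PASEF scans: cycle {cycle:.2f} s, "
        f"{points_per_peak(PEAK_WIDTH_S, cycle):.1f} points per peak"
    )
```

Doubling the number of dia-PASEF scans roughly halves the number of data points per peak, which is why narrower isolation windows trade quantitative precision for selectivity.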


Patricia Skowronek and Florian Meier

Acknowledgements

We thank Prof. Dr. Matthias Mann for his generous support of this project. We are grateful to our colleagues in the Department Proteomics and Signal Transduction at the Max Planck Institute of Biochemistry as well as at Bruker Daltonics and Evosep Biosystems for fruitful discussions and valuable feedback. Research in the Department Proteomics and Signal Transduction is supported by the Max Planck Society for the Advancement of Science. F.M. acknowledges support by the Federal Ministry of Education and Research and the Thuringian Ministry for Economic Affairs, Science and a Digital Society through the Joint Federal Government-Länder Tenure-Track Programme and by the Free State of Thuringia and the European Union via the ‘Innovationszentrum für Thüringer Medizintechnik-Lösungen’ (ThIMEDOP; #2018 IZN 002).

References

1. Larance M, Lamond AI (2015) Multidimensional proteomics for cell biology. Nat Rev Mol Cell Biol 16:269–280. https://doi.org/10.1038/nrm3970
2. Aebersold R, Mann M (2016) Mass-spectrometric exploration of proteome structure and function. Nature 537:347–355. https://doi.org/10.1038/nature19949
3. Riley NM, Hebert AS, Coon JJ (2016) Proteomics moves into the fast lane. Cell Syst 2:142–143. https://doi.org/10.1016/j.cels.2016.03.002
4. Dodds JN, Baker ES (2019) Ion mobility spectrometry: fundamental concepts, instrumentation, applications, and the road ahead. J Am Soc Mass Spectrom 30:2185–2195. https://doi.org/10.1007/s13361-019-02288-2
5. Pfammatter S, Bonneil E, McManus FP et al (2018) A novel differential ion mobility device expands the depth of proteome coverage and the sensitivity of multiplex proteomic measurements. Mol Cell Proteomics 17:2051–2067. https://doi.org/10.1074/mcp.TIR118.000862
6. Hebert AS, Prasad S, Belford MW et al (2018) Comprehensive single-shot proteomics with FAIMS on a hybrid orbitrap mass spectrometer. Anal Chem 90:9529–9537. https://doi.org/10.1021/acs.analchem.8b02233
7. Meier F, Brunner A-D, Koch S et al (2018) Online parallel accumulation-serial fragmentation (PASEF) with a novel trapped ion mobility mass spectrometer. Mol Cell Proteomics 17:2534–2545. https://doi.org/10.1074/mcp.TIR118.000900
8. Bekker-Jensen DB, Martínez-Val A, Steigerwald S et al (2020) A compact quadrupole-orbitrap mass spectrometer with FAIMS interface improves proteome coverage in short LC gradients. Mol Cell Proteomics 19:716–729. https://doi.org/10.1074/mcp.tir119.001906
9. Revercomb HE, Mason EA (1975) Theory of plasma chromatography/gaseous electrophoresis. Review. Anal Chem 47:970–983. https://doi.org/10.1021/ac60357a043
10. Fernandez-Lima F, Kaplan D, Suetering J, Park MA (2011) Gas-phase separation using a trapped ion mobility spectrometer. Int J Ion Mobil Spectrom 14:93–98. https://doi.org/10.1007/s12127-011-0067-8
11. Fernandez-Lima FA, Kaplan DA, Park MA (2011) Note: integration of trapped ion mobility spectrometry with mass spectrometry. Rev Sci Instrum 82:126106. https://doi.org/10.1063/1.3665933
12. Ridgeway ME, Lubeck M, Jordens J et al (2018) Trapped ion mobility spectrometry: a short review. Int J Mass Spectrom 425:22–35. https://doi.org/10.1016/j.ijms.2018.01.006
13. Michelmann K, Silveira JA, Ridgeway ME, Park MA (2014) Fundamentals of trapped ion mobility spectrometry. J Am Soc Mass Spectrom 26:14–24. https://doi.org/10.1007/s13361-014-0999-4
14. Silveira JA, Ridgeway ME, Laukien FH et al (2017) Parallel accumulation for 100% duty cycle trapped ion mobility-mass spectrometry. Int J Mass Spectrom 413:168–175. https://doi.org/10.1016/j.ijms.2016.03.004
15. Meier F, Beck S, Grassl N et al (2015) Parallel accumulation–serial fragmentation (PASEF): multiplying sequencing speed and sensitivity by synchronized scans in a trapped ion mobility device. J Proteome Res 14:5378–5387. https://doi.org/10.1021/acs.jproteome.5b00932
16. Vasilopoulou CG, Sulek K, Brunner A-D et al (2020) Trapped ion mobility spectrometry and PASEF enable in-depth lipidomics from minimal sample amounts. Nat Commun 11:331. https://doi.org/10.1038/s41467-019-14044-x
17. Meier F, Brunner A-D, Frank M et al (2020) diaPASEF: parallel accumulation–serial fragmentation combined with data-independent acquisition. Nat Methods 17:1229–1236. https://doi.org/10.1038/s41592-020-00998-0
18. Kulak NA, Pichler G, Paron I et al (2014) Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat Methods 11:319–324. https://doi.org/10.1038/nmeth.2834
19. Bache N, Geyer PE, Bekker-Jensen DB et al (2018) A novel LC system embeds analytes in pre-formed gradients for rapid, ultra-robust proteomics. Mol Cell Proteomics 17(11):2284–2296. https://doi.org/10.1101/323048
20. Bruderer R, Bernhardt OM, Gandhi T et al (2015) Extending the limits of quantitative proteome profiling with data-independent acquisition and application to acetaminophen-treated three-dimensional liver microtissues. Mol Cell Proteomics 14:1400–1410. https://doi.org/10.1074/mcp.M114.044305
21. Gillet LC, Navarro P, Tate S et al (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 11:O111.016717. https://doi.org/10.1074/mcp.O111.016717
22. Kulak NA, Geyer PE, Mann M (2017) Lossless nano-fractionator for high sensitivity, high coverage proteomics. Mol Cell Proteomics 16(4):694–705. https://doi.org/10.1074/mcp.O116.065136
23. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261
24. Pino LK, Searle BC, Bollinger JG et al (2020) The Skyline ecosystem: informatics for quantitative mass spectrometry proteomics. Mass Spectrom Rev 39:229–244. https://doi.org/10.1002/mas.21540
25. Röst HL, Rosenberger G, Navarro P et al (2014) OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol 32:219–223. https://doi.org/10.1038/nbt.2841
26. Demichev V, Messner CB, Vernardis SI et al (2020) DIA-NN: neural networks and interference correction enable deep proteome coverage in high throughput. Nat Methods 17:41–44. https://doi.org/10.1038/s41592-019-0638-x
27. Demichev V, Yu F, Teo GC et al (2021) High sensitivity dia-PASEF proteomics with DIA-NN and FragPipe. bioRxiv. https://doi.org/10.1101/2021.03.08.434385
28. Zhang F, Ge W, Ruan G et al (2020) Data-independent acquisition mass spectrometry-based proteomics and software tools: a glimpse in 2020. Proteomics 20:1900276. https://doi.org/10.1002/pmic.201900276
29. Meier F, Köhler ND, Brunner A et al (2021) Deep learning the collisional cross sections of the peptide universe from a million experimental values. Nat Commun 12:1185. https://doi.org/10.1038/s41467-021-21352-8

Chapter 3

Isolation of Detergent Insoluble Proteins from Mouse Brain Tissue for Quantitative Analysis Using Data Independent Acquisition (DIA)

Cristen Molzahn, Lorenz Nierves, Philipp F. Lange, and Thibault Mayor

Abstract

Enrichment of detergent insoluble proteins is a commonly used technique for analyzing proteins that may be aggregating in disease or with age. However, various methods for enriching for these proteins are used. Here we present a method using a mild detergent (Triton X-100) and high centrifugation speed (20,000 × g) allowing for sufficient protein extraction and enrichment for large protein assemblies. Digestion is performed on columns allowing for a methanol chloroform wash to remove the highly prevalent lipids in brain tissue. This is followed by analysis by data independent acquisition mass spectrometry, which we have found to be highly reproducible. Our method is intended to enrich for amorphous aggregates, which may accumulate upon the collapse of protein homeostasis.

Key words Protein aggregation, Aging, Neurodegenerative diseases, Mass spectrometry

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

Aging is often associated with the onset of neurodegeneration and cognitive decline. The principal cause of these neurodegenerative diseases (NDDs) is the aberrant accumulation of aggregated proteins. Despite intense investigation, the instigating event in the formation of toxic protein aggregates remains unknown, but it is thought to proceed through accumulated damage and loss of protein homeostasis (proteostasis). In model organisms such as Caenorhabditis elegans, lifespan extension has been shown to alter protein aggregation, suggesting an intimate link between protein aggregation and aging. In order to better characterize the loss of proteostasis with age, further investigation is necessary into the composition of age-related or disease-induced protein aggregates.

Detergent insoluble and protease resistant amyloid aggregates are characteristic of many NDDs [1]. Although amyloid fibrils are frequently associated with NDDs, the smaller oligomeric aggregates that precede the formation of fibrillar aggregates have been found to be more cytotoxic [2–4]. Indeed, mouse models show cognitive decline before formation of amyloids [5, 6]. A possible mechanism for this toxicity may stem from the ability of fibril-forming proteins to disrupt cellular function by sequestering additional proteins into amorphous aggregates [7]. Unlike amyloids, oligomers have been shown to consist of proteins retaining some of their native structure, suggesting that these proteins are amenable to digestion by trypsin for mass spectrometry-based proteomics [8].

Protein aggregation with age has been identified in worms and has become well characterized over the years [9–13]. Long- and short-lived worms, such as daf-2 and hsf-1 mutants, respectively, have been used to understand the relationship between protein aggregation and lifespan. While both mutant worms accumulate aggregates, the composition of those aggregates is different, suggesting that the cellular mechanisms for dealing with aggregates have implications for determining lifespan. These studies have established important foundations in aging research, but they lack the context for understanding aggregation in brain tissue. Expanding studies of aggregation to mice presents a crucial next step in determining how aggregation and aging are linked. Here we present methods for extracting Triton-insoluble aggregates from mouse brain tissue as well as identifying and quantifying their components with data independent mass spectrometry.

In recent years, many studies have combined high-speed centrifugation with liquid chromatography coupled to tandem mass spectrometry (LC-MS/MS). Using a variety of detergents for extraction and a variety of fractionation conditions, these studies have attempted to characterize different aggregates to identify novel “co-aggregators” with various disease-associated proteins (Table 1) [14–22]. The studies in Table 1 have led to the identification of hundreds of proteins found in aggregates with diverse properties. This includes the amyloid fibrils of Aβ as well as inclusions enriched with TDP-43 that can also form liquid-like granules. One key distinction between these studies is that several detergents were used, including the nonionic Triton X-100, which permeabilizes membranes to extract proteins, and the anionic surfactants Sarkosyl and sodium dodecyl sulfate (SDS), which can also further denature and solubilize proteins. Many amyloid fibrils remain insoluble in SDS, while amorphous aggregates can be more readily solubilized in denaturing conditions. Another experimental variable during the fractionation is the centrifugation step, which typically varies from high speed at 14,000–20,000 × g to ultracentrifugation at 100,000–500,000 × g, thereby enriching for anything from large, dense aggregates to potentially small protofibrils. Understanding the composition of brain protein aggregates is a key step in identifying the mechanisms of decline in proteostasis. Here we specifically present a method based on Triton X-100 extraction and high-speed centrifugation that was used to characterize brain tissues derived from wild type mice. The method is adapted for LC-MS/MS analysis using data independent acquisition (DIA) based on a peptide library generated using data dependent acquisition (DDA). In this particular case, we sought to identify proteins that accumulate in denser particles, including potential amorphous aggregates that may arise during natural aging in the absence of specific genetic factors causing NDDs. We show compelling evidence that our proteomic platform enables robust comparison of samples analyzed by DIA.

Table 1 Methods of detergent insoluble protein extraction

Organism | Model | Conditions | Speed (× g) | Duration (min) | Target | Citation
--- | --- | --- | --- | --- | --- | ---
C. elegans | WT (N2), fem-1, gon-2, glp-1, daf-2 | Sequential: RAB buffer; 1 M sucrose; RIPA (3x) | 20,000 | NS | | [12]
C. elegans | WT (N2) and temperature sensitive mutants | Sequential: tris buffer; 1% SDS (3x) | 16,000 | NS | | [11]
C. elegans | WT (N2), daf-2, daf-16, hsf-1 and others | Sequential: 1% Igepal CA630; 0.5% sodium deoxycholate (3x) | 500,000 | 10 | | [10]
C. elegans | WT (N2) | Sequential: tris buffer; 1% SDS | 20,000 | 15 | | [13]
Mouse | Huntington’s | 2% SDS | 16,000 | 15 | Inclusion bodies | [14]
Mouse | Huntington’s | 1% Triton | 14,000 | 20 | mHtt aggregates | [15]
Mouse | Alzheimer’s | Sequential: PBS insoluble; 0.5% NP-40; 2% deoxycholate; 1% SDS | 100,000 | 30 each | Heterogeneous aggregates | [25]
Mouse | Alzheimer’s | 1% lauroyl sarcosine | 200,000 | 40 | Pre-fibrillar aggregates | [16]
Mouse | WT | 0.5% Na-deoxycholate, 1% Igepal CA-630 | 50,000 | 60 | | [26]
Mouse | WT | 4% SDS | 100,000 | 30 | | [17]
Human | Frontotemporal dementia | Sequential: tris buffer; 1% Triton; 1% Triton X-100 with 30% sucrose; 1% N-lauroyl sarcosine; 7 M urea with 2 M thiourea and 2% SDS | 25,000 (first and last step), 180,000 (other) | 30 each | TDP-43 positive aggregates | [19]
Human | Alzheimer’s | 1% lauroyl sarcosine (2x) | 180,000 | 30 | Senile plaques and NFTs | [20]
Human | Chronic traumatic encephalopathy | 1% lauroyl sarcosine (2x) | 180,000 | 30 | pTau and pTDP-43 positive aggregates | [21]

2 Materials

2.1 Tissue Lysis

1. Liquid nitrogen (lN2) and ice.
2. Mouse brain tissue (see Note 1).
3. 1% Sodium carbonate: dissolve 2 g Na2CO3 in 200 mL of deionized ultrapure water (ddH2O) (see Note 2).
4. OPS Diagnostics cryogrinder kit and drill (Black & Decker).
5. OPS CryoCooler or Styrofoam box (20 cm × 20 cm) with a cover cut to fit and stabilize the mortar.
6. Lysis Buffer: 50 mM phosphate buffer pH 7.4, 150 mM NaCl, 1% Triton X-100, 0.5 mM dithiothreitol (DTT), 1× protease inhibitor cocktail (PIC; Roche), 1 mM phenylmethylsulfonyl fluoride (PMSF). Combine phosphate buffer (0.25 M stock), NaCl (5 M stock), and Triton (10% solution), and bring to 90 mL with deionized ultrapure water to make a 1.11× premix. Store at 4 °C for up to 48 h or freeze if longer storage is required. Immediately before use, for each 1 mL of lysis buffer, add 40 μL PIC (25× stock), 10 μL 100 mM PMSF (stock in methanol), and 50 μL 1 M DTT to 900 μL of 1.11× premix.
7. Bioruptor UCD-200 bath sonicator (Diagenode).
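The final concentrations that result from the additions in item 6 follow from simple dilution arithmetic (stock concentration × volume added / total volume). The sketch below only reports what the volumes as printed imply; the helper name is hypothetical. With these numbers the PIC and PMSF additions give 1× and 1 mM as listed, whereas the DTT addition works out to 50 mM rather than the 0.5 mM in the recipe, so the DTT stock concentration or volume is worth checking before use. The sketch does not decide which figure is intended.

```python
def final_concentration(stock, added_ul, total_ul):
    """Dilution arithmetic: C_final = C_stock * V_added / V_total."""
    return stock * added_ul / total_ul


# (component, stock concentration, unit, volume added in microliters),
# copied from the lysis buffer recipe for 1 mL of buffer.
additions = [
    ("premix", 1.11, "x", 900.0),
    ("PIC", 25.0, "x", 40.0),
    ("PMSF", 100.0, "mM", 10.0),
    ("DTT", 1000.0, "mM", 50.0),
]

total_ul = sum(volume for _, _, _, volume in additions)
finals = {
    name: final_concentration(stock, volume, total_ul)
    for name, stock, _, volume in additions
}

for name, _, unit, _ in additions:
    print(f"{name}: {finals[name]:.3g} {unit} in {total_ul:.0f} uL")
```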

2.2 Centrifugation

1. Beckman reinforced 1.5 mL centrifuge tubes.
2. Sorvall Legend Micro 21R microcentrifuge (Thermo Scientific) prechilled to 4 °C.
3. Reducing agent compatible microplate BCA kit (compatible with 5% SDS).
4. Tecan M200 plate reader.
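The BCA kit and plate reader listed above are used in Subheading 3.2 to build a calibration curve from BSA standards and to read sample concentrations off that curve. A minimal sketch of that calculation follows, using an ordinary least-squares line. Several things here are assumptions: the linear model itself (kit manuals often recommend a curved fit), the intermediate standard concentrations between 0.125 and 1.5 mg/mL, and all absorbance values, which are invented for illustration. The factor of 2 reflects the twofold sample dilution made before plating.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean(values):
    return sum(values) / len(values)


def concentration(absorbance, slope, intercept, dilution=1.0):
    """Invert the calibration line and correct for the sample dilution."""
    return (absorbance - intercept) / slope * dilution


# Blank plus seven BSA standards (mg/mL); intermediate values are assumed.
standards_mg_ml = [0.0, 0.125, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5]
# Hypothetical absorbance readings at 562 nm for those standards.
standard_a562 = [0.10, 0.20, 0.30, 0.50, 0.70, 0.90, 1.10, 1.30]

slope, intercept = fit_line(standards_mg_ml, standard_a562)

# Hypothetical technical triplicate of one sample that was diluted twofold.
sample_triplicate_a562 = [0.49, 0.50, 0.51]
sample_mg_ml = concentration(
    mean(sample_triplicate_a562), slope, intercept, dilution=2.0
)
print(f"slope {slope:.3f}, intercept {intercept:.3f}, sample {sample_mg_ml:.2f} mg/mL")
```

Readings above the highest standard fall outside the calibrated range and are better handled by diluting the sample further than by extrapolating.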

2.3 Trypsin Digestion of Proteins

1. 1 M triethylammonium bicarbonate (TEAB) at pH 7.1 and pH 7.55. The pH of TEAB will change over time. Once you have achieved the desired pH, store stocks at −20 °C and check with pH paper before using.
2. 2× SDS Buffer: 10% SDS (from 20% stock) in 100 mM TEAB pH 7.55 (from 1 M stock), prepared fresh for each experiment.
3. S-Trap micro column (Protifi).
4. 0.5 M DTT: dissolve 77.125 mg DTT powder and bring to a volume of 1 mL with ddH2O. Store at −20 °C.
5. 0.56 M chloroacetamide: dissolve 52.422 mg and bring to a volume of 1 mL with ddH2O (see Note 3).
6. 12% phosphoric acid, HPLC grade, stored in glass vials.
7. Methanol and chloroform 1:2 (HPLC grade). Must be stored in glass vials, as the chloroform will leach polymers from plastic tubes.
8. S-Trap Binding Buffer: 90% MeOH in 100 mM TEAB pH 7.1, stored at 4 °C.
9. Digestion Buffer: 50 mM TEAB pH 7.1.
10. Sequencing grade trypsin (Promega), stored at −70 °C.
11. Lowbind tubes: 1.5 mL nonstick surface microcentrifuge tubes (VWR) that reduce polymer contamination, improving data quality.
12. 0.2% formic acid, HPLC grade, stored in glass vials.
13. 50% acetonitrile (ACN) with 0.2% formic acid.
14. Liquid nitrogen.
15. Micro Modulyo Lyophilizer.
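The DTT, chloroacetamide, and phosphoric acid stocks listed above are added in fixed proportions to a 25 μL sample during digestion (Subheading 3.3, steps 2–5), and the protocol notes that these volumes can be adjusted to the sample volume. A minimal sketch of that scaling is shown below, assuming strictly linear scaling from the 25 μL reference; the function names are hypothetical. The second helper checks the resulting molar concentrations against the values quoted in the protocol.

```python
REFERENCE_SAMPLE_UL = 25.0  # sample volume the protocol volumes refer to


def digestion_volumes(sample_ul):
    """Scale the per-25-uL reagent volumes linearly to another sample volume."""
    scale = sample_ul / REFERENCE_SAMPLE_UL
    return {
        "dtt_ul": 1.0 * scale,               # 0.5 M DTT
        "caa_ul": 2.0 * scale,               # 0.56 M chloroacetamide
        "phosphoric_acid_ul": 3.22 * scale,  # 12% phosphoric acid
        "binding_buffer_ul": 175.0 * scale,  # 7 volumes of S-Trap Binding Buffer
    }


def final_mM(stock_mM, added_ul, volume_before_ul):
    """Concentration after adding a stock to an existing volume."""
    return stock_mM * added_ul / (volume_before_ul + added_ul)


volumes = digestion_volumes(50.0)
print(volumes)

# Check the reference case: 1 uL DTT into 25 uL, then 2 uL chloroacetamide into 26 uL.
print(f"DTT: {final_mM(500.0, 1.0, 25.0):.1f} mM")
print(f"chloroacetamide: {final_mM(560.0, 2.0, 26.0):.1f} mM")
```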

2.4 Stage Tips

1. 200 μL nonstick pipette tips (Sarstedt) with C18 disk (Empore) as prepared in Rappsilber et al. [23].
2. Buffer A: 0.1% TFA and 2% ACN.
3. Buffer B: 80% ACN and 0.1% TFA.
4. Methanol.
5. Stage tip adapters (https://www.thingiverse.com/thing:4687978) (see Note 4).
6. Eppendorf miniSpin plus centrifuge.
7. Mass spectrometry (MS) Injection Buffer: 0.1% formic acid.
8. Vacuum centrifuge such as Eppendorf Vacufuge plus.

2.5 Fractionation of Peptide Library

1. Digested sample peptides.
2. High Performance Liquid Chromatography (composite Agilent 1100/1200 HPLC system, temperature controlled autosampler with 40 μL sample loop, inline degasser, binary pump, UV detector, and temperature controlled fraction collector) and Agilent Open Lab CDS ChemStation Edition version C.01.07.
3. C18 column (Agilent Zorbax Extend-C18, 3.5 μm particles).
4. Axygen 96 well PCR microplate.
5. High pH Buffer A: 5 mM NH4HCO2, 2% acetonitrile, pH 10.
6. High pH Buffer B: 5 mM NH4HCO2, 90% acetonitrile, pH 10.
7. Vacuum centrifuge.

2.6 DDA Library Generation

1. Q Exactive HF Orbitrap mass spectrometer coupled to an Easy-nLC 1200 liquid chromatography system (Thermo Scientific) configured with a 20 nL sample loop and a 30 μm ID steel emitter, fully cleaned, calibrated, and connected, or compatible instrumentation.
2. 50 cm μPAC™ column with trap-column (Pharmafluidics).
3. MS Injection Buffer: 0.1% formic acid.
4. Buffer A: 0.1% FA and 2% ACN (LC-MS grade, high purity, low UV absorptivity).
5. Buffer B: 0.1% FA and 95% ACN (LC-MS grade, high purity, low UV absorptivity).
6. NanoDrop™.
7. Indexed Retention Time (iRT) Peptide Kit (Biognosys).

2.7 DIA Mass Spec

1. Q Exactive HF plus Orbitrap mass spectrometer coupled to an Easy-nLC 1200 liquid chromatography system configured with a 20 nL sample loop and a 30 μm ID steel emitter, fully cleaned, calibrated, and connected, or compatible instrumentation.
2. 50 cm μPAC™ column with trap-column (Pharmafluidics).
3. MS Injection Buffer: 0.1% formic acid.
4. Buffer A: 0.1% FA and 2% ACN (LC-MS grade, high purity, low UV absorptivity).
5. Buffer B: 0.1% FA and 95% ACN (LC-MS grade, high purity, low UV absorptivity).
6. NanoDrop™.
7. Indexed Retention Time (iRT) Peptide kit.

2.8 Data Analysis

1. Spectronaut™ Powered by Pulsar version 14.3.200701.47784 (Copernicus) (Biognosys) (see Note 5).
2. UniprotKB reviewed mouse FASTA (17,056 proteins, downloaded 09/25/2020).
3. R software (version 3.6.1).
(a) tidyverse package.
(b) MSnbase package.

3 Methods

Various mouse brain regions, such as cortex or hippocampus, can be processed with this method. We recommend extracting 2 mg of protein, which corresponds to about 50 mg of wet tissue. Tissue lysis is achieved by cryogrinding (Subheading 3.1) and is immediately followed by sample fractionation (Subheading 3.2), protein digestion (Subheading 3.3), and stage tipping (Subheading 3.4). This procedure is typically done for multiple samples processed in parallel, and up to ten different samples can be prepared over a 3-day period. The samples can then be processed immediately for mass spectrometry analysis (Subheading 3.6) or stored for several days or weeks on stage tips. We also recommend preparing a peptide library to improve the identifications of the DIA (Subheadings 3.7 and 3.8). The peptide library requires another 100 μg of protein material, which we typically obtain by pooling aliquots from each sample to give 50 μg of pellet and 50 μg of supernatant protein, combined 1:1. We recommend creating separate libraries for each brain region analyzed. It is noteworthy that the pooled sample for library generation can also be composed from a subset of samples if, for example, some showed a lower protein yield. This does not compromise the quality of the library.

3.1 Tissue Lysis

1. Clean the OPS Diagnostics mortar and pestle by incubating in 1% Na2CO3 for 1 h. Store in ddH2O until ready to use. Let dry immediately before lysis.
2. Chill mortar, pestle, spatula, and nonstick tubes in lN2 in the CryoCooler or Styrofoam box (see Note 6).
3. Transfer tissues into the well of the pre-chilled mortar and break tissue into pieces with the pestle. Attach pestle to motor and grind tissue for 4 min, changing direction every 4 s until the tissue forms a powder.
4. Slowly remove pestle against the side of the mortar chamber to remove any remaining tissue.
5. Using a spatula, transfer at least 50 mg tissue to low bind tubes on lN2. Place the tissues on ice for 1–2 min, then add lysis buffer (1000 μL/50 mg of tissue). Incubate tissues with lysis buffer on ice for 20 min (see Note 7).
6. Sonicate 2 × 30 s on the high setting with the bath sonicator cooled with ice (see Note 8). Transfer the samples back on ice and immediately proceed to Subheading 3.2.

3.2 Centrifugation and Protein Concentration Measurement

This step enriches for the detergent insoluble proteins. In past experiments, about 9% of the supernatant protein amount was recovered in this fraction.

1. Centrifuge samples at 3000 × g for 10 min at 4 °C to pellet cell debris. Transfer supernatant to a 1.5 mL reinforced Beckman centrifuge tube (see Note 9).
2. Centrifuge samples at 20,000 × g for 30 min at 4 °C to pellet the 1% Triton insoluble fraction. Remove supernatant and transfer to a new nonstick tube. Keep on ice. Wash pellet 2× with 1 mL Lysis Buffer without disturbing the pellet and centrifuge at 20,000 × g for 10 min each time.
3. Resuspend pellet fraction in 25 μL 1× SDS Buffer (for S-Trap micro), obtained by diluting the 2× SDS Buffer 1:1 with the Lysis Buffer (see Note 10).
4. Combine 12.5 μL of supernatant with 12.5 μL of 2× SDS Buffer.
5. Measure protein concentrations in aliquots of pellet and supernatant fractions using the Pierce microplate BCA Protein Assay kit (reducing agent compatible) as described below.
6. Prepare BSA protein standards by diluting the 2 mg/mL BSA stock provided in the kit with a 1:1 mixture of Lysis Buffer and 2× SDS Buffer. Prepare a series of 7 standards in 50 μL with a range of 0.125–1.5 mg/mL BSA and a blank.
7. Prepare the Working Reagent (from kit) according to the protocol to the appropriate volume for the number of samples.
8. Dilute 15 μL of the samples 2× by adding 15 μL of the 1:1 mixture of Lysis Buffer and 2× SDS Buffer.
9. Add 9 μL of diluted sample to each well, in technical triplicate.
10. To the sample add 4 μL of Compatibility Reagent (from kit) and incubate at 37 °C for 15 min.
11. Add 260 μL of the Working Reagent to each well and incubate at 37 °C for 30 min.
12. Allow plate to cool to room temperature for approximately 5 min and then measure the absorbance at 562 nm with a plate reader.
13. Create a calibration curve of absorbance vs. concentration to calculate concentration of samples (see Note 11).

3.3 Protein Digestion and Peptide Elution

1. Transfer the appropriate amount of protein to be processed (1–100 μg for S-Trap micro) to a nonstick tube. We typically start with 25 μL of sample volume, and the volumes indicated below in steps 2–4 can be adjusted accordingly.
2. Reduce by adding 1 μL 0.5 M DTT for a final concentration of ~20 mM. Heat for 10 min at 95 °C.
3. Alkylate by adding 2 μL 0.56 M chloroacetamide to a final concentration of 40 mM. Incubate for 30 min in the dark.
4. To the reduced and alkylated mixture add 3.22 μL 12% phosphoric acid for a final concentration of 1.2%.
5. Add 175 μL of S-Trap Binding Buffer to the acidified protein solution. If more volume of protein solution is required, increase the amount of S-Trap Binding Buffer, maintaining a 1:7 ratio of 1× SDS lysis mixture to Binding Buffer.
6. Add acidified solution to the spin column, up to 100 μL at a time.
7. Place the column in a 1.5 mL nonstick tube and centrifuge at 4000 × g for 30 s or until the solution has passed through the spin column. Repeat step 6, if required.
8. Wash the column 3× by adding 150 μL S-Trap Binding Buffer and centrifuging at 4000 × g for 30 s.
9. Remove lipids by washing once with 150 μL of 2:1 chloroform:MeOH, then wash 2× with 150 μL S-Trap Binding Buffer. After each 150 μL volume, centrifuge at 4000 × g for 30 s.
10. Move the S-Trap to a clean nonstick 1.5 mL sample tube.
11. To the column add 20 μL Digestion Buffer to which the protease was added at 1:25 trypsin:protein. Spin the protease solution into the column briefly and return any solution that passes through to the top of the column.
12. Close the cap to limit evaporative loss (see Note 12).
13. Incubate for 16 h at 37 °C.
14. Add 40 μL of Digestion Buffer to the S-Trap column.
15. Centrifuge at 4000 × g for 60 s to elute the peptides (see Note 13).
16. Add 40 μL of 0.2% formic acid to the column and spin at 4000 × g for 60 s.
17. Elute hydrophobic peptides with 40 μL of 50% aqueous ACN containing 0.2% formic acid and spin at 4000 × g for 60 s.
18. Snap freeze peptides in lN2 and lyophilize in the Micro Modulyo Lyophilizer until dry (about 16 h), then resuspend in 100 μL 0.1% TFA. Proceed to stage tipping or to Subheading 3.5 to generate the library.

3.4 Stage Tipping

3.4.1 Prepare Stage Tips

We use two different stage tip methods, depending on the samples. For the pellet fraction, and if only processing a small amount of peptides from the supernatant (10 μg or less), we use a low binding capacity stage tip (see steps 1 and 2). For the library preparation, or if more peptides are processed, we use a high capacity stage tip (up to 100 μg; steps 1–4) (see Note 14).

1. Prepare one stage tip per sample. Using a blunt 16-gauge needle, punch a circle out of the C18 disk.
2. Transfer the C18 disk into a 200 μL pipette tip by passing a straightened paperclip through the needle, forcing the C18 into the tip. For a low binding capacity stage tip, add a second C18 disk and proceed to Subheading 3.4.2. For a high capacity stage tip, add only one disk and proceed to step 3.
3. Prepare C18 resin solution by transferring a dry volume of about 100 μL of C18 powder to a nonstick 1.5 mL tube (up to the corresponding mark on the tube). Add 1 mL of 50% ACN and store at 4 °C.
4. Transfer the C18 slurry to the stage tip 20 μL at a time. After each addition, centrifuge the tip, placed in the stage tip adapter in a 1.5 mL nonstick tube, at 1000 × g for 1 min or until the ACN solution has passed through. Repeat until there is about 3 mm of C18 in the stage tip.

3.4.2 Conditioning of Stage Tips

1. Rehydrate the C18 with 100 μL of MeOH. Spin the stage tip in the adapter in a 1.5 mL nonstick tube at 1000 × g until the liquid has passed through but the C18 is not dry (about 2 min).
2. Wash with 100 μL of Buffer B by centrifugation at 1000 × g until the liquid has passed through.
3. Wash 2× with 100 μL of Buffer A by centrifugation at 1000 × g until the liquid has passed through.
4. The stage tip is now ready to bind sample and should be used immediately.

3.4.3 Stage Tipping of Samples

1. For low binding capacity stage tips (double C18 disk), transfer 10 μg of peptides or less. For high capacity stage tips, transfer ~100 μg of peptides, 100 μL at a time. Place the stage tip in the adapter and centrifuge at 1000 × g for 2–5 min or until the solution has passed through. Repeat until all the sample is loaded (see Note 14).
2. Wash with 100 μL of Buffer A by centrifugation at 1000 × g for 2–5 min. The samples can now be stored at 4 °C for several days.
3. To elute the samples, add 100 μL of Buffer B and centrifuge at 1000 × g for 2–5 min into a new nonstick tube.
4. Dry the peptides in a speed vac (about 1.5 h), then proceed to mass spectrometry analysis (Subheading 3.7). For the library samples, resuspend in 30 μL of high pH Buffer A and proceed to Subheading 3.5.
5. Store at −20 °C until the samples can be run.

3.5 Library Generation

The peptide library is fractionated into eight sample pools using a high pH reverse phase liquid chromatography (RPLC) method adapted from Udeshi et al. [24]. The pools are then analyzed by data-dependent mass spectrometry to generate a library of MS2 spectra (Subheading 3.6).
1. Combine equal amounts of the pellet and supernatant samples to obtain a total of ~100 μg (50 μg of pellet sample and 50 μg of supernatant).
2. First clean the HPLC column by running 100% high pH Buffer B for 10 min at 50 μL/min.
3. Equilibrate the column by running 100% high pH Buffer A for 40 min at 50 μL/min.
4. Inject 30 μL of the peptide library sample onto the column and begin collecting fractions in the 96-well plate at 0.66 min/well for 64 min (see Table 2 for the gradient).
5. Concatenate the 96 fractions by combining every eighth fraction, for a total of 8 final concatenated fractions (e.g., the first pool of peptides consists of fractions 1, 9, 17, . . ., 89).
6. Dry the samples in the vacuum concentrator (about 1 h) and proceed to mass spectrometry analysis (Subheading 3.6) (see Note 15).
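The concatenation scheme in step 5 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the fractions are assumed to be numbered 1–96 in collection order, and the timing helper simply applies the 0.66 min/well setting from step 4 with collection assumed to start at t = 0.

```python
def concatenate_fractions(n_fractions=96, n_pools=8):
    """Group fractions 1..n_fractions into pools by taking every n_pools-th fraction."""
    pools = {pool: [] for pool in range(1, n_pools + 1)}
    for fraction in range(1, n_fractions + 1):
        # Fractions 1, 9, 17, ... go to pool 1; 2, 10, 18, ... to pool 2; and so on.
        pool = (fraction - 1) % n_pools + 1
        pools[pool].append(fraction)
    return pools


def collection_window(fraction, minutes_per_well=0.66):
    """Approximate start/end time (min) of a fraction (assumes collection starts at t = 0)."""
    start = (fraction - 1) * minutes_per_well
    return start, start + minutes_per_well


pools = concatenate_fractions()
first_pool = pools[1]  # fractions combined into the first concatenated pool
```

Because the pools are interleaved, each of the 8 pools contains 12 fractions drawn from the early, middle, and late parts of the high pH gradient.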

3.6 Mass Spectrometry Analysis of Library Using Data Dependent Acquisition (DDA)

1. Resuspend each concatenated fraction in 12 μL of MS Injection Buffer (0.1% FA with iRT peptides at 1:30 v/v).
2. Blank the NanoDrop with 1 μL of MS Injection Buffer.
3. Use 1.2 μL from each concatenated fraction to measure the peptide concentration by NanoDrop (see Note 16).
4. Adjust each concatenated fraction to a final concentration of 0.1 μg/μL (1.2 μg in 12 μL) using the MS Injection Buffer.


Table 2 High pH peptide fractionation gradient (64-min gradient)

Time (min)    % Buffer B
0–5           6
5–7           8
7–45          27
45–49         31
49–53         39
53–64         60
64–85         0

5. Load 1 μg of peptides on the 50-cm μPAC™ column and elute over 85 min using a 60-min gradient (Table 3). The flow rate during the gradient was 300 nL/min. Sample injection was done at a flow rate of 10 μL/min, with the pressure capped at 100 bar.
6. The pre-column equilibration volume was 3 μL at a flow rate of 3 μL/min, with the maximum pressure capped at 200 bar; the analytical column equilibration volume was 9 μL at a flow rate of 3 μL/min, with the maximum pressure capped at 200 bar.
7. Acquire DDA data for 85 min:
(a) One MS1 scan with a mass range of 400–1800 m/z at 60,000 resolution, a maximum injection time of 75 ms, and an Automatic Gain Control (AGC) target of 3e6.
(b) The 12 most intense precursors were selected for fragmentation; +1 and +5 charged precursors were excluded.
(c) Up to 12 MS2 scans were acquired at 15,000 resolution, with a maximum injection time of 50 ms and an AGC target of 5e4. Normalized collision energy was set to 28. A dynamic exclusion of 20 s was used.
8. Repeat steps 2–6 until all pooled peptides are analyzed.

3.7 Mass Spectrometry Analysis of Samples (Data Independent Acquisition, DIA)

1. Resuspend the dried peptide sample (from Subheading 3.4) in 12 μL of MS Injection Buffer (0.1% FA with iRT peptides at 1:30 v/v).
2. Using the NanoDrop, measure the peptide concentration from a 1.2 μL sample aliquot. Blank the NanoDrop with the MS Injection Buffer.


Table 3 LC-MS/MS gradient

Time (min)    % Buffer B
0–5           4–9%
5–10          9–10%
10–15         10–12%
15–20         12–14%
20–25         14–15%
25–30         15–17%
30–35         17–18%
35–40         18–19%
40–45         19–21%
45–50         21–24%
50–55         24–27%
55–60         27–80%
60–85         80%

3. Adjust each sample to a final concentration of 0.125 μg/μL (1.5 μg in 12 μL) using MS Injection Buffer.
4. Load 800 ng of peptides (i.e., 6.5 μL of the diluted sample) on the 50-cm μPAC™ column and elute over 85 min using a 60-min gradient (Table 3). The flow rate during the gradient was 300 nL/min. Sample injection was done at a flow rate of 10 μL/min, with the pressure capped at 100 bar.
5. The pre-column equilibration volume was 3 μL at a flow rate of 3 μL/min, with the maximum pressure capped at 200 bar; the analytical column equilibration volume was 9 μL at a flow rate of 3 μL/min, with the maximum pressure capped at 200 bar.
6. Acquire DIA data for 85 min:
(a) One MS1 scan with a mass range of 300–1650 m/z at 120,000 resolution, a maximum injection time of 60 ms, and an AGC target of 3e6.
(b) 24 MS2 DIA scans in a 24-variable-window format (Table 4) at 30,000 resolution, with an AGC target of 3e6. Maximum injection time was set to “auto.”
(c) Stepped NCE set at 25.5, 27, and 30.
7. Repeat steps 2–5 for each sample.
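The variable-window scheme referenced in step 6(b) can be checked with a short Python sketch. It assumes, as an interpretation of Table 4 rather than something the text states, that each inclusion-list value is the center of its isolation window and that the second column is the full window width; the function names are hypothetical.

```python
# Window centers and widths (m/z) copied from Table 4.
CENTERS = [367.5, 398, 422.5, 444, 464, 483.5, 503, 522.5, 542.5, 562, 582, 603,
           624.5, 647, 670.5, 695.5, 723, 754, 789, 827.5, 873, 932, 1016.5, 1358.5]
WIDTHS = [35, 28, 23, 22, 20, 21, 20, 21, 21, 20, 22, 22,
          23, 24, 25, 27, 30, 34, 38, 41, 52, 68, 103, 583]


def window_bounds(centers, widths):
    """Return (lower, upper) m/z bounds, assuming center +/- half the width."""
    return [(c - w / 2, c + w / 2) for c, w in zip(centers, widths)]


def adjacent_overlaps(bounds):
    """Overlap (m/z) between each window and the next; negative values would be gaps."""
    return [upper - next_lower
            for (_, upper), (next_lower, _) in zip(bounds, bounds[1:])]


bounds = window_bounds(CENTERS, WIDTHS)
overlaps = adjacent_overlaps(bounds)
```

Under these assumptions the 24 windows tile the range from 350 to 1650 m/z, with neighbouring windows overlapping by 1 m/z and the narrowest windows placed in the 450–570 m/z region.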


Table 4 Inclusion list and isolation windows for the DIA method

Inclusion list    Isolation window (m/z)
367.5             35
398               28
422.5             23
444               22
464               20
483.5             21
503               20
522.5             21
542.5             21
562               20
582               22
603               22
624.5             23
647               24
670.5             25
695.5             27
723               30
754               34
789               38
827.5             41
873               52
932               68
1016.5            103
1358.5            583

3.8 Spectral Library Generation

1. The spectral library was prepared using Pulsar, Biognosys’ proprietary search engine that is integrated into Spectronaut (see Note 17).
2. To begin, select “Generate Library from Pulsar.” A wizard will appear to assist in the processing.
3. Select “Add Runs from File” to upload the input data. Raw MS DDA files collected from the peptide fractions and raw MS DIA files collected from the samples were used as input run files for the spectral library generation.


4. Select the sequence databases to be used for the spectral library generation. A mouse FASTA file and an iRT FASTA file were used as sequence databases.
5. Set the search settings as follows (see Note 18):
(a) Cleavage rules set to Specific Trypsin/P, with a minimum peptide length of 7, a maximum of 52, and missed cleavages set to 2;
(b) Carbamidomethyl (C) set as a fixed modification;
(c) Acetyl (Protein N-term) and Oxidation (M) set as variable modifications.
6. Enter the additional settings for the spectral library generation before initiating the process (see Note 19). These include parameters for tolerances (mass tolerances for calibration and main search), identification (search engine thresholds such as FDR for protein, peptide, and PSM), spectral library filters (fragment ion and precursor level settings), iRT calibration (preference for iRT calibration settings), and workflow (fragment ion selection strategy and option for in-silico generation of missing channels for isotopically labeled samples) (see Note 20).

3.9 Data Analysis of DIA Samples

1. Analysis of DIA samples was done using the Analysis Perspective wizard in Spectronaut (see Note 21).
2. To begin, select “Set up a DIA Analysis from File.” This opens a file explorer window for selecting the input files for the analysis. Raw MS DIA files obtained from the samples were used as input files.
3. After the input files are selected, a wizard assists in the processing. A prepared spectral library must be assigned to the raw DIA files.
4. Prepare the DIA Analysis Settings Schema (see Note 22). This includes parameters for Data Extraction (method used for the calculation of intensity), XIC Extraction (use of iRT to predict peptide elution), Calibration (iRT and m/z calibration settings), Identification (decoys and FDR cutoffs for precursor and protein identification), Quantification (major and minor grouping, minimum and maximum requirements for quantification, normalization), Workflow (label-free analysis or other quantification workflow), Protein Inference, Post Analysis (settings for differential abundance testing and clustering), and Pipeline Mode (specification of reports and files to be generated).
5. The default schema (BGS Factory Settings) was used to analyze the DIA files:


(a) Intensity extraction for MS1 and MS2 was set to maximum; MS1 and MS2 tolerance strategy were set to dynamic.
(b) XIC Ion Mobility and XIC Retention Time Extraction Windows were both set to dynamic.
(c) Calibration Mode was Automatic and MZ Extraction Strategy was set to Maximum; Precision iRT was activated, deamidated peptides were excluded, and Local (non-linear) Regression was selected for the RT iRT Regression Type; MS1 and MS2 Tolerance Strategy was set to System default.
(d) Decoys were generated via the Mutated method and Decoy Limit Strategy was set to Dynamic, with the maximum number of decoys set to 0.1 of the library size; Machine learning and Q-value calculation were set to Per Run; Protein and Precursor Q-value Cutoffs were set to 0.1; Single hit definition was by Stripped Sequence; PTM Localization score was calculated, and Probability cutoffs were set to 0.75; P-value Estimator was set to Kernel Density Estimator.
(e) Proteotypicity filter was set to none; Major (Protein) Grouping was done by Protein Group Id and Minor (Peptide) Grouping was done by Stripped Sequence; Major Group Quantity was calculated by Mean Peptide Quantity, selecting only Top N peptides and requiring a minimum of 1 and a maximum of 3 peptides; Minor Group Quantity was calculated by Mean Precursor Quantity, selecting only Top N precursors and requiring a minimum of 1 and a maximum of 3 precursors. Quantity MS Level was set to MS2; Quantity Type was set to Area; Data Filtering was done using Q-value; Cross Run Normalization was enabled; Normalization Strategy was set to Global Normalization on median.
6. The data set containing identified protein groups and quantified intensities was exported from Spectronaut.
7. Missing values were imputed via the function “impute” in the R package “MSnbase” using the “MinProb” method (see Note 23).
8. Differential analyses were done using R and the limma package. Two differential analyses were conducted, each holding one variable constant (i.e., either age (young vs. old) or type (supernatant vs. pellet)). This was done because the two variables are not independent covariates. The analyses yield two sets of p-values and fold changes, which were then plotted on FC–FC plots.
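The idea behind the “MinProb” imputation in step 7 (described in Note 23) can be sketched in Python. This is a simplified illustration of that description, not the MSnbase implementation used by the authors: the function names, the linear-interpolation quantile, the use of None for missing values, and the assumption that intensities are already log-transformed are all ours.

```python
import random
import statistics


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list (0 <= q <= 1)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def impute_minprob(matrix, q=0.01, seed=0):
    """Fill missing values (None) per sample with draws from a Gaussian.

    matrix: rows are protein groups, columns are samples (assumed log-intensities).
    Mean: q-th quantile of the observed values in that sample.
    SD: median of the protein-wise standard deviations (rows with >= 2 values).
    """
    rng = random.Random(seed)
    row_sds = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        if len(observed) >= 2:
            row_sds.append(statistics.stdev(observed))
    sd = statistics.median(row_sds)

    imputed = [list(row) for row in matrix]
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix if row[j] is not None]
        mean = quantile(column, q)  # low-abundance end of this sample
        for i, row in enumerate(matrix):
            if row[j] is None:
                imputed[i][j] = rng.gauss(mean, sd)
    return imputed


# Made-up example: 4 protein groups x 3 samples, None marks a missing value.
example = [
    [20.1, 20.3, None],
    [18.0, None, 18.4],
    [25.2, 25.0, 25.1],
    [None, 15.9, 16.2],
]
filled = impute_minprob(example, seed=1)
```

Observed values are left untouched; only the missing entries are replaced by values drawn near the low end of each sample's intensity distribution, which matches the left-censored assumption stated in Note 23.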


3.10 directDIA Analysis


directDIA is similar to the DIA analysis described in Subheading 3.9, except that no spectral library is required.
1. To begin, select “Set up a directDIA Analysis from File.” This opens a file explorer window for selecting the input files for the analysis. Raw MS DIA files from the “standard pool” runs were used as input files.
2. Select the FASTA files to be used as the sequence database. Mouse FASTA and iRT FASTA files were used as sequence databases.
3. Set the search settings and directDIA Analysis Settings schema as follows (see Note 24):
(a) Cleavage rules set to Specific Trypsin/P, with a minimum peptide length of 7, a maximum of 52, and missed cleavages set to 2;
(b) Carbamidomethyl (C) set as a fixed modification;
(c) Acetyl (Protein N-term) and Oxidation (M) set as variable modifications;
(d) The default schema (BGS Factory Settings) was used to perform the directDIA analysis.
4. The directDIA results for the “pooled standard” were analyzed using R. Protein group intensities were obtained from the Report tab in Spectronaut. A Run Pivot Report for Protein Quant was used to export a data set, which has the separate DIA runs of the “pooled standard” as columns and the identified and/or quantified protein groups as rows.
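To make the cleavage settings in step 3(a) concrete, the following Python sketch performs a simple in-silico digestion. It is an illustration only and not how Spectronaut implements its search: the function name is hypothetical, and the sketch assumes that Trypsin/P means cleavage after every K or R (including before proline) and ignores modifications and N-terminal methionine handling.

```python
def digest_trypsin_p(sequence, min_len=7, max_len=52, missed_cleavages=2):
    """Return the sorted set of peptides allowed by the cleavage settings."""
    # Split after every K or R (Trypsin/P: no exception for a following proline).
    pieces, current = [], ""
    for residue in sequence:
        current += residue
        if residue in "KR":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)

    peptides = set()
    for start in range(len(pieces)):
        # Join 1..(missed_cleavages + 1) consecutive pieces.
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(pieces):
                break
            peptide = "".join(pieces[start:end])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)


# Made-up sequence for illustration only.
peptides = digest_trypsin_p("ACDEFGHKPLMNQRSTVWYAK")
```

In this made-up example the six-residue piece PLMNQR is too short to be reported on its own but still appears within longer peptides that contain missed cleavages.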

3.11 Analysis of Pooled Standards

This analysis is performed to estimate the precision of the DIA measurement by comparing data obtained from technical replicates (see Subheading 3.12).
1. To demonstrate the consistency of the protein group identifications, data completeness was determined for each protein group in the report by counting the non-missing and missing (i.e., NA) values. Here, a protein group is 100% complete if it was quantified in all 8 “pooled standard” runs.
2. The coefficient of variation (CV) of each protein group was calculated row by row by dividing the standard deviation by the mean and multiplying by 100. These calculations used base R functions. To get an overall CV distribution, the CVs calculated for the entire data set were plotted as a violin plot using the ggplot2 package. Here, the width of the plot correlates with the number of protein groups with a particular CV. In addition, we created a bar plot using the ggplot2 package to further emphasize reproducibility. For this plot, we categorized the protein groups into 4 bins: CV < 10%, 10% < CV < 20%, 20% < CV < 50%, and CV > 50%.
3. To determine the correlation between protein group intensity and CV, we first calculated the log10 mean intensity of each protein group. We then used the ggplot2 package to prepare a scatter plot with the log10 mean intensity on the x-axis and the CV on the y-axis.

3.12 Results: Reproducibility of DIA Workflow

We performed a series of analyses on brain cortex tissue from 15- to 100-week-old male C57BL/6J mice to assess the reproducibility of the DIA workflow using technical replicates of a supernatant fraction. The tissue was processed as described in Subheadings 3.1–3.3 over a 3-day period, then processed by stage tipping in a high/low binding capacity stage tip. To generate the standard supernatant sample, aliquots of all supernatant fractions were combined for a total of ~10 μg, then analyzed as described in Subheadings 3.7 and 3.9. The peptide library was generated in parallel by pooling samples from 28 mouse cortexes in the two age groups following the procedure described in Subheadings 3.1–3.6, then Subheading 3.8. In that particular case a total of 100 μg of pooled peptides was used for the RPLC (Subheading 3.5). To control for reproducibility of the mass spectrometric data acquisition, we injected a “standard pool” sample at regular intervals among another 56 samples. A total of 8 injections were performed for this analysis. Cumulatively, we quantified 31,463 peptides and 3521 protein groups from the “standard pool” injections using directDIA analysis. Of these, 31,195 peptides (>99%) and 3507 protein groups (>99%) were quantified in all injections (Fig. 1a). These results illustrate the high reproducibility of identifications in the DIA analysis and justify imputing missing values using low values. We then calculated the coefficient of variation (CV) of the protein intensities. The protein group intensities were calculated based on the average intensity of the top 1–3 peptides, as per the Spectronaut directDIA schema used (Subheading 3.9). We found a median CV of 7.5% for all the protein groups in the 8 “standard pool” injections (Fig. 1b). Furthermore, the majority of the protein groups (2346, 67%) detected in the “standard pool” had a CV of less than 10% (Fig. 1c). An additional 790 (22%) protein groups had a CV within 10–20%.
Finally, we determined the correlation between the observed CV and the reported intensity of each protein group. As expected, protein groups with a lower mean intensity had a greater CV, whereas protein groups with a higher mean intensity had a lower CV (Fig. 1d). These results confirm the overall high quantitative precision of the DIA analysis. This will directly translate to an improved ability to identify proteins that are differentially enriched between pellet and supernatant. It should, however, be noted that the sample processing workflow introduces variability that is not accounted for by repeat injections of the standard pool sample.
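The completeness and CV calculations described in Subheading 3.11 were done by the authors in base R; the same arithmetic can be sketched in Python as follows. The function names and the example intensities are hypothetical, missing values are represented by None, and the handling of values falling exactly on a bin boundary (assigned to the higher bin here) is our assumption, since the text leaves the boundaries unassigned.

```python
import statistics


def completeness(row):
    """Fraction of runs in which a protein group was quantified (None = missing)."""
    return sum(value is not None for value in row) / len(row)


def cv_percent(row):
    """Coefficient of variation (%): sample SD / mean * 100 over non-missing values."""
    observed = [value for value in row if value is not None]
    return statistics.stdev(observed) / statistics.mean(observed) * 100


def cv_bin(cv):
    """Assign a CV (%) to one of the four bins used for the bar plot."""
    if cv < 10:
        return "<10%"
    if cv < 20:
        return "10-20%"
    if cv < 50:
        return "20-50%"
    return ">50%"


# Made-up intensities of one protein group across 8 "standard pool" injections.
intensities = [100.0, 110.0, 90.0, 105.0, 95.0, 100.0, 102.0, 98.0]
cv = cv_percent(intensities)
```

For the made-up row above the CV is about 6%, so this protein group would fall into the lowest bin.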


Fig. 1 Technical replicates analyzed by DIA. (a) Protein and peptide completeness. The number of times a protein or peptide was detected in the “standard pool” injections was determined, with 8 being the maximum. Proteins or peptides were arranged on the x-axis, with those having 100% completeness first. The plot is annotated at the end of the 100% completeness region with the number and percentage of proteins and peptides corresponding to that point. (b) Overall protein CV. The CV for each protein group was calculated and plotted as a violin plot. The plot shows the distribution of the CVs from the entire data set, with the wider areas indicating more protein groups having that particular CV. (c) Stratified protein CV. Calculated protein CVs were stratified into 4 bins (CV < 10%, 10% < CV < 20%, 20% < CV < 50%, and CV > 50%) to demonstrate that lower CVs were observed more often than higher ones. (d) Relationship between CV and mean protein intensity. The log10 mean protein intensity and log10 CV were plotted as a scatterplot, and a linear correlation line was added to demonstrate the relationship between the two variables. Proteins with a lower mean intensity generally had a higher CV.

4 Notes
1. With the help of our collaborators, we obtained mouse brain regions for analysis. Mice were euthanized with anesthesia (isoflurane gas) and asphyxiation with carbon dioxide. Death was confirmed by inserting a needle into the heart rather than by cervical dislocation, in order to preserve brain tissue. Mice were perfused with 1× PBS containing Halt protease and phosphatase inhibitor cocktail (ThermoFisher). This step improves identifications by eliminating serum proteins. For our analysis, the whole mouse brain is typically split along the two hemispheres and dissected into different regions to collect the cortex, hippocampus, and cerebellum. The tissues are then immediately snap frozen in lN2, where they are stored until processing. Importantly, tissues cannot be fixed. In this method we present results from mouse cortex samples corresponding to 50–100 mg of tissue, from which we extract about 5% as protein.
2. Unless otherwise specified, all buffers and reagents are dissolved in deionized ultrapure water. Where applicable, all glassware and reusable plasticware used for mass spectrometry reagents are washed and rinsed manually and kept separate from other glassware washed by our central facility.
3. Chloroacetamide solution must be used immediately.
4. The adapter is used to hold the tip in a 1.5 mL tube for centrifugation. We print it with a Prusa I3 MK3S 3D printer and 1.75 mm polylactic acid (PLA) filament.
5. The software can be used on any system meeting the minimum system requirements.
6. Weighing the cooled low-bind tubes before and after transferring tissue is useful for calculating the amount of tissue transferred.
7. Leave tubes open as they warm on ice to prevent pressure from building within the tube and expelling tissue.
8. Homogenization by ten passages through a 27G needle has also been used, but becomes inconvenient with a large number of samples.
9. This is particularly important at higher speeds; however, our nonstick tubes are rated to a maximum of 14,000 × g, so reinforced tubes were required even at 20,000 × g.
10. Bath sonication with the Bioruptor may be required to solubilize the pellet (increments of 30 s on the high setting until the pellet is no longer visible).
11. We first calculate the average values for each standard, then fit a linear regression within the linear range to derive the equation of the line. Using that equation, we then calculate x (the protein concentration) from the averaged absorbance of the sample.
12. To prevent evaporation of the digestion buffer, create a humid environment for incubation by placing the S-trap columns in a container with water at the bottom.
13. Most peptides will be eluted in this step.
14. There may not be enough material from the pellet fraction to use the high capacity stage tips. Instead, use 2 layers of C18 disk and apply no more than 10 μg of protein to the stage tip.
15. Peptides can be stored at −20 °C. If not running immediately, peptides should be stored dry.
16. Sample concentration was measured using the NanoDrop with the Protein A280 setting.


17. For more details on spectral library generation in Spectronaut, refer to the user manual, Subheading 3.3.
18. A more detailed explanation of the different parameters in the spectral library preparation schema is available in the user manual, Appendix 7.4.
19. A more detailed explanation of the search settings in Spectronaut is available in the user manual, Appendix 7.2.
20. The default settings (BGS Factory Settings) were used to generate the spectral library: Mass tolerance for the calibration was set to “Dynamic” with the MS1 and MS2 Correction factor set to 1. Protein, peptide, and PSM FDR were set to 0.01. Single Hit Proteins were not excluded, and the protein localization filter was not enabled. Fragment ions were required to have a minimum of 3 amino acids, 300–1800 m/z, and >5% relative intensity; precursor ion settings required a minimum and maximum number of fragment ions of 3 and 6, respectively. iRT Reference Strategy was set to Deep Learning Assisted iRT regression.
21. For more details on DIA or directDIA analysis in Spectronaut, refer to the user manual, Subheading 3.4.
22. A more detailed explanation of the different parameters in the DIA analysis schema is available in the user manual, Appendix 7.1.
23. The “MinProb” method imputes left-censored missing data by random draws from a Gaussian distribution with a mean equal to the q-th quantile (default q = 0.01) of the observed values in that sample. The standard deviation is estimated as the median of the protein-wise standard deviations.
24. A more detailed explanation of the search settings and directDIA analysis schema is available in the Spectronaut user manual, Appendix 7.3.
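The standard-curve calculation described in Note 11 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the standard concentrations and absorbance readings are made-up numbers chosen to lie on a straight line, and all standards are assumed to fall within the linear range of the assay.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Solve the fitted line for x (concentration) given a sample absorbance."""
    return (absorbance - intercept) / slope


# Made-up replicate readings: standard concentration -> absorbance values.
standards = {0.0: [0.10, 0.10], 250.0: [0.35, 0.35], 500.0: [0.60, 0.60], 1000.0: [1.10, 1.10]}
concentrations = sorted(standards)
mean_absorbances = [sum(standards[c]) / len(standards[c]) for c in concentrations]

slope, intercept = fit_line(concentrations, mean_absorbances)

sample_readings = [0.45, 0.47]  # made-up replicate readings of one sample
sample_mean = sum(sample_readings) / len(sample_readings)
sample_concentration = concentration_from_absorbance(sample_mean, slope, intercept)
```

The result is in the same units as the standards; any dilution applied to the sample before the assay still has to be multiplied back in.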

Acknowledgments

We thank the members of the Karsan lab for their contribution in housing, sacrificing, and dissecting the mice for this project. Additionally, we thank Jing Wang from the Cashman lab for the brain region dissections. The method for 3D printing of the stage tip adapters was provided by Mang Zhu. This work was supported by the Canadian Institutes of Health Research (CIHR; PJT-148489).


References

1. Taylor JP, Hardy J, Fischbeck KH (2002) Toxic proteins in neurodegenerative disease. Science 296(5575):1991–1995
2. Bucciantini M, Giannoni E, Chiti F, Baroni F, Formigli L, Zurdo J, Taddei N, Ramponi G, Dobson CM, Stefani M (2002) Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature 416(6880):507–511
3. Lue LF, Kuo YM, Roher AE, Brachova L, Shen Y, Sue L, Beach T, Kurth JH, Rydel RE, Rogers J (1999) Soluble amyloid beta peptide concentration as a predictor of synaptic change in Alzheimer’s disease. Am J Pathol 155(3):853–862
4. Campioni S, Mannini B, Zampagni M, Pensalfini A, Parrini C, Evangelisti E, Relini A, Stefani M, Dobson CM, Cecchi C, Chiti F (2010) A causative link between the structure of aberrant protein oligomers and their toxicity. Nat Chem Biol 6(2):140–147
5. Moechars D, Dewachter I, Lorent K, Reversé D, Baekelandt V, Naidu A, Tesseur I, Spittaels K, Haute CV, Checler F, Godaux E, Cordell B, Van Leuven F (1999) Early phenotypic changes in transgenic mice that overexpress different mutants of amyloid precursor protein in brain. J Biol Chem 274(10):6483–6492
6. Dewachter I, van Dorpe J, Spittaels K, Tesseur I, Van Den Haute C, Moechars D, Van Leuven F (2000) Modeling Alzheimer’s disease in transgenic mice: effect of age and of presenilin1 on amyloid biochemistry and pathology in APP/London mice. Exp Gerontol 35(6–7):831–841
7. Olzscha H, Schermann SM, Woerner AC, Pinkert S, Hecht MH, Tartaglia GG, Vendruscolo M, Hayer-Hartl M, Hartl FU, Vabulas RM (2011) Amyloid-like aggregates sequester numerous metastable proteins with essential cellular functions. Cell 144(1):67–78
8. Bouchard M, Zurdo J, Nettleton EJ, Dobson CM, Robinson CV (2000) Formation of insulin amyloid fibrils followed by FTIR simultaneously with CD and electron microscopy. Protein Sci 9(10):1960–1967
9. Ben-Zvi A, Miller EA, Morimoto RI (2009) Collapse of proteostasis represents an early molecular event in Caenorhabditis elegans aging. Proc Natl Acad Sci U S A 106(35):14914–14919
10. Walther DM, Kasturi P, Zheng M, Pinkert S, Vecchi G, Ciryam P, Morimoto RI, Dobson CM, Vendruscolo M, Mann M, Hartl FU (2015) Widespread proteome remodeling and aggregation in aging C. elegans. Cell 161(4):919–932
11. Reis-Rodrigues P, Czerwieniec G, Peters TW, Evani US, Alavez S, Gaman EA, Vantipalli M, Mooney SD, Gibson BW, Lithgow GJ, Hughes RE (2012) Proteomic analysis of age-dependent changes in protein solubility identifies genes that modulate lifespan. Aging Cell 11(1):120–127
12. David DC, Ollikainen N, Trinidad JC, Cary MP, Burlingame AL, Kenyon C (2010) Widespread protein aggregation as an inherent part of aging in C. elegans. PLoS Biol 8(8):e1000450
13. Xie X, Chamoli M, Bhaumik D, Sivapatham R, Angeli S, Andersen JK, Lithgow GJ, Schilling B (2020) Quantification of insoluble protein aggregation in Caenorhabditis elegans during aging with a novel data-independent acquisition workflow. J Vis Exp 162:e61366
14. Hosp F, Gutiérrez-Ángel S, Schaefer MH, Cox J, Meissner F, Hipp MS, Hartl FU, Klein R, Dudanova I, Mann M (2017) Spatiotemporal proteomic profiling of Huntington’s disease inclusions reveals widespread loss of protein function. Cell Rep 21(8):2291–2303
15. Sap KA, Guler AT, Bezstarosti K, Bury AE, Juenemann K, Demmers JAA, Reits EA (2019) Global proteome and ubiquitinome changes in the soluble and insoluble fractions of Q175 Huntington mice brains. Mol Cell Proteomics 18(9):1705–1720
16. Thygesen C, Metaxas A, Larsen MR, Finsen B (2018) Age-dependent changes in the sarkosyl-insoluble proteome of APPSWE/PS1ΔE9 transgenic mice implicate dysfunctional mitochondria in the pathogenesis of Alzheimer’s disease. J Alzheimers Dis 64(4):1247–1259
17. Kelmer Sacramento E, Kirkpatrick JM, Mazzetto M, Baumgart M, Bartolome A, Di Sanzo S, Caterino C, Sanguanini M, Papaevgeniou N, Lefaki M, Childs D, Bagnoli S, Terzibasi Tozzini E, Di Fraia D, Romanov N, Sudmant PH, Huber W, Chondrogianni N, Vendruscolo M, Cellerino A, Ori A (2020) Reduced proteasome activity in the aging brain results in ribosome stoichiometry loss and aggregation. Mol Syst Biol 16(6):e9596
18. Pace MC, Xu G, Fromholt S, Howard J, Crosby K, Giasson BI, Lewis J, Borchelt DR (2018) Changes in proteome solubility indicate widespread proteostatic disruption in mouse models of neurodegenerative disease. Acta Neuropathol 136(6):919–938
19. Seyfried NT, Gozal YM, Donovan LE, Herskowitz JH, Dammer EB, Xia Q, Ku L, Chang J, Duong DM, Rees HD, Cooper DS, Glass JD, Gearing M, Tansey MG, Lah JJ, Feng Y, Levey AI, Peng J (2012) Quantitative analysis of the detergent-insoluble brain proteome in frontotemporal lobar degeneration using SILAC internal standards. J Proteome Res 11(5):2721–2738
20. Hales CM, Dammer EB, Deng Q, Duong DM, Gearing M, Troncoso JC, Thambisetty M, Lah JJ, Shulman JM, Levey AI, Seyfried NT (2016) Changes in the detergent-insoluble brain proteome linked to amyloid and tau in Alzheimer’s disease progression. Proteomics 16(23):3042–3053
21. Cherry JD, Zeineddin A, Dammer EB, Webster JA, Duong D, Seyfried NT, Levey AI, Alvarez VE, Huber BR, Stein TD, Kiernan PT, McKee AC, Lah JJ, Hales CM (2018) Characterization of detergent insoluble proteome in chronic traumatic encephalopathy. J Neuropathol Exp Neurol 77(1):40–49
22. Xu G, Fromholt SE, Chakrabarty P, Zhu F, Liu X, Pace MC, Koh J, Golde TE, Levites Y, Lewis J, Borchelt DR (2020) Diversity in Aβ deposit morphology and secondary proteome insolubility across models of Alzheimer-type amyloidosis. Acta Neuropathol Commun 8(1):43
23. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2(8):1896–1906
24. Udeshi ND, Mertins P, Svinkina T, Carr SA (2013) Large-scale identification of ubiquitination sites by mass spectrometry. Nat Protoc 8(10):1950–1960
25. Xu G, Stevens SM, Moore BD, McClung S, Borchelt DR (2013) Cytosolic proteins lose solubility as amyloid deposits in a transgenic mouse model of Alzheimer-type amyloidosis. Hum Mol Genet 22(14):2765–2774
26. Albu RF, Chan GT, Zhu M, Wong ET, Taghizadeh F, Hu X, Mehran AE, Johnson JD, Gsponer J, Mayor T (2015) A feature analysis of lower solubility proteins in three eukaryotic systems. J Proteome 118:21–38

Chapter 4

Rodent Lung Tissue Sample Preparation and Processing for Shotgun Proteomics

Hadeesha Piyadasa, Ying Lao, Oleg Krokhin, and Neeloffer Mookherjee

Abstract

Mass spectrometry (MS) is a routinely used approach to characterize the global protein profile of various biological samples. Here we describe rodent lung tissue homogenization, sample preparation, and a liquid chromatography with tandem mass spectrometry (LC-MS/MS) method for shotgun proteomics.

Key words Rodent lungs, Sample preparation, Mass spectrometry, Shotgun proteomics

1 Introduction

Changes in the proteome of tissues isolated from rodent models of disease have been pivotal in understanding the fundamental mechanistic processes involved in disease pathogenesis, including in studies of respiratory disease [1, 2]. Mass spectrometry (MS)-based approaches can be used to identify and characterize global protein/peptide changes in lung tissues. However, meticulous and consistent sample preparation of complex biological starting material is crucial for successful downstream MS analysis. Therefore, in this chapter we outline a carefully optimized sample preparation and processing workflow that can be used to characterize the proteome of rat and mouse lung tissues using a liquid chromatography with tandem mass spectrometry (LC-MS/MS) method [3, 4].

2 Materials

2.1 Tissue Homogenization and Protein Extraction

1. Homogenization buffer (keep on ice): 50 mM Tris-HCl (pH 7.5), 2 mM MgCl2, 150 mM NaCl, 1% sodium deoxycholate, 1% SDS, with 1× protease inhibitor cocktail (PIC).

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


Hadeesha Piyadasa et al.

2. Phosphate-buffered saline (PBS).
3. Sterile scalpel.
4. Homogenizer (Cole-Parmer LabGEN 125 homogenizer).
5. Homogenizer tips (Cole-Parmer LabGEN Plastic Tip Probes).
6. Centrifuge.

2.2 Total Protein Quantitation

1. Pierce Micro BCA Protein Assay Kit.
2. Microplate, non-treated, 96-well, flat-bottom, clear.
3. Microplate sealing tapes for 96-well plates.
4. Microplate reader.
5. 37 °C oven.

2.3

SDS-PAGE

1. XCell SureLock™ (Invitrogen™).

Mini-Cell

electrophoresis

system

2. Invitrogen™ 4–12% Tris-Bis Gels, 10 well (Invitrogen™). 3. 20 MOPS SDS Running buffer (Invitrogen™). 4. 4 NuPAGE™ LDS Sample buffer (Invitrogen™). 5. NuPAGE™ Antioxidant (Invitrogen™). 6. 10 NuPAGE Reducing agent (Invitrogen™). 7. Amersham™ ECL™ Rainbow™ Marker. 8. GelCode™ Blue Stain Reagent. 9. Square Disposable Petri Dish. 2.4 Protein Reduction and Alkylation

1. 100 mM Dithiothreitol (DTT).
2. 500 mM Iodoacetamide (IAA).

2.5 Protein Clean Up Using Single-Pot, Solid-Phase-Enhanced Sample Preparation (SP3)

1. Cytiva Sera-Mag™ Speedbead Carboxylate-modified Magnetic Beads (hydrophilic), 50 mg/mL.
2. Cytiva Sera-Mag™ Speedbead Carboxylate-modified Magnetic Beads (hydrophobic), 50 mg/mL.
3. LC-MS grade water.
4. 500 mM Tris-HCl buffer (pH 7.0).
5. 70% ethanol.
6. 100% acetonitrile.
7. Magnetic stand such as DynaMag™-2 Magnet.

2.6

Trypsin Digestion

1. 50 mM Tris-HCl buffer (pH 8.0). 2. Promega Sequencing Grade Modified Trypsin (Promega, V5111).

Rodent Lung Tissue Sample Preparation and Processing for Shotgun Proteomics

2.7

Peptide Clean Up

1. SOLA™ HRP SPE. 2. 100% acetonitrile. 3. Conditioning buffer: 0.1% trifluoroacetic acid (TFA). 4. Elution buffer: 65% acetonitrile, 0.1% TFA in water. 5. A syringe plunger (or a vacuum manifold).

2.8

Instruments

LC: Nano-LC system Easy-nLC 1200 system. MS: Orbitrap Q Exactive HF-X. Other instruments used: NanoDrop™ Spectrophotometer 2000.

3

Methods

3.1 Tissue Homogenization

1. Prepare homogenization buffer (500 μL per tissue) and keep on ice.
2. Prepare PBS with 1× PIC (5 mL per tissue, plus 2–3 mL extra to wash the homogenizer tips). Aliquot 5 mL into 15 mL Falcon tubes, one per sample, and keep on ice.
3. Remove frozen lung tissues from −80 °C or liquid nitrogen storage. Immediately score (mar) the surface of the frozen lung fragments with a sterile scalpel to increase the surface area.
4. Place the scored lung tissues in the 15 mL tubes containing PBS + PIC and rotate slowly at 4 °C for 30 min.
5. Aliquot 300 μL of ice-cold homogenization buffer into round-bottom 5 mL polypropylene tubes and keep on ice.
6. Once the lungs have been washed for 30 min, transfer the tissue into the round-bottom 5 mL tubes containing homogenization buffer on ice.
7. Carefully homogenize on ice while preventing foam buildup (see Note 1).
8. Use 200 μL of homogenization buffer to wash the homogenizer tip, and collect the wash into the round-bottom tube to ensure recovery of all sample from the tip.
9. Once all tissues are homogenized, transfer the contents of each 5 mL round-bottom tube into 1.5 mL Eppendorf tubes. Then centrifuge the samples at 10,000 × g for 10 min at 4 °C.
10. Aliquot supernatants (soluble proteins) as desired. Keep a 20 μL aliquot of each sample on ice for Micro BCA protein quantification.
11. Freeze all samples at −80 °C for long-term storage.
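The per-tissue volumes in steps 1, 2, 5, and 8 scale linearly with the number of samples. A minimal Python sketch of that bookkeeping follows. The function name is hypothetical, and budgeting the 2–3 mL of tip-wash PBS per tissue is an assumption, not something the protocol states.

```python
def homogenization_plan(n_tissues, tip_wash_pbs_ml=3.0):
    """Total reagent volumes for n_tissues lung samples.

    Assumption: the extra PBS + PIC used to wash homogenizer tips
    (2-3 mL in step 2) is budgeted once per tissue.
    """
    if n_tissues < 1:
        raise ValueError("n_tissues must be at least 1")
    # Step 5 (300 uL in the tube) plus step 8 (200 uL tip wash) = 500 uL per tissue.
    buffer_per_tissue_ul = 300 + 200
    return {
        "homogenization_buffer_ul": n_tissues * buffer_per_tissue_ul,
        "pbs_pic_ml": n_tissues * (5.0 + tip_wash_pbs_ml),
    }


plan = homogenization_plan(6)
print(plan)
```

In practice a small excess of each solution would normally be prepared on top of these totals.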

3.2 Protein Quantification

Micro BCA Protein Assay

1. If samples were frozen, thaw all samples in Eppendorf tubes on ice and centrifuge at 1000 × g for 5 min at 4 °C.
2. Add 100 μL of standard (BSA) in duplicate to columns 1 and 2 of a 96-well untreated microplate. Use the dilutions and concentration range recommended by the manufacturer (see Note 2).
3. Dilute samples appropriately, based on the starting material, so that they fall within the Micro BCA detection range (see Note 3).
4. Once dilution ratios are confirmed, add the appropriate volume of water, followed by the samples, to wells of the 96-well microplate.
5. Calculate the amount of Micro BCA reagents required (provided in the kit) based on the number of wells that require reagents. For example, for 60 wells (including 16 wells for standards), use: Micro BCA™ Reagent A: 50 μL × 60 = 3000 μL; Micro BCA™ Reagent B: 48 μL × 60 = 2880 μL; Micro BCA™ Reagent C: 2 μL × 60 = 120 μL. The Micro BCA™ protein assay kit contains Reagents A, B, and C.
6. Combine Reagents A, B, and C in a Falcon tube. Immediately add 100 μL of the complete Micro BCA™ reagent to each well containing sample or standard. Use of a multichannel pipette is recommended.
7. Seal the 96-well microplate with microplate sealing tape, and incubate the plate in a 37 °C oven for 60 min.
8. Determine absorbance at 562 nm using a standard spectrophotometer or a microplate reader.
9. Determine protein concentration using the standard curve generated with BSA.
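The reagent arithmetic in step 5 and the standard-curve read-out in step 9 can be sketched in Python as below. The function names are hypothetical, the straight-line fit is an assumption (the kit documentation may recommend a different curve model), and the standard values are synthetic numbers chosen only to exercise the code, not measured data.

```python
def micro_bca_reagent_volumes(n_wells):
    """Reagent volumes (uL) at the 50:48:2 per-well ratio from step 5."""
    per_well_ul = {"A": 50, "B": 48, "C": 2}
    return {name: vol * n_wells for name, vol in per_well_ul.items()}


def fit_standard_curve(conc, absorbance):
    """Least-squares line: absorbance = slope * conc + intercept."""
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a562, slope, intercept, dilution_factor=1.0):
    """Interpolate a sample on the fitted line and undo its dilution."""
    return (a562 - intercept) / slope * dilution_factor


volumes = micro_bca_reagent_volumes(60)

# Synthetic, perfectly linear standards (ug/mL vs. A562), for illustration only.
std_conc = [0.0, 10.0, 20.0, 40.0]
std_abs = [0.05, 0.25, 0.45, 0.85]
slope, intercept = fit_standard_curve(std_conc, std_abs)
sample_conc = concentration_from_absorbance(0.45, slope, intercept, dilution_factor=50)
print(volumes, round(sample_conc, 1))
```

Samples whose absorbance falls outside the range of the standards should be re-diluted rather than extrapolated.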

3.3 Quality Assessment

Gel Electrophoresis with NuPAGE™ System

1. Based on the protein concentration of each sample, aliquot 10 μg of total protein per sample into a 1.5 mL Eppendorf tube. All samples should have equal protein content prior to loading onto the gel.
2. Add ddH2O to each sample so that all samples have equal volume. The master mix reagent volumes shown in Table 1 are for a total sample volume of 20 μL: add ddH2O to each sample to bring its volume to 13 μL, then add 7 μL of master mix buffer (Table 1), for a total sample volume of 20 μL (see Note 4).
3. Vortex to mix the sample well.

Table 1 Sample master mix

Reagents for master mix          Per sample (for a total sample volume of 20 μL)
NuPAGE LDS sample buffer (4×)    5 μL
NuPAGE reducing agent (10×)      2 μL

4. Pulse centrifuge samples for ~5 s to collect the solution at the bottom of the microcentrifuge tubes.
5. Make extra master sample buffer for blank loading as required. Use 5 μL of the molecular weight marker.
6. Heat all samples at 80–90 °C for 8–10 min using a water bath.
7. Pulse centrifuge samples for ~5 s to collect the solution at the bottom of the microcentrifuge tubes.
8. Take the white strip off the gels and place them with the comb side facing inward, touching the bottom of the chamber; clamp tightly, with the inner part fitted into the outer container part of the casting.
9. Pour running buffer into the front chamber to the fill line, ensuring there are no leaks. Then pour running buffer into the back chamber, filling to approximately the same height.
10. Take out the comb and add 500 μL of antioxidant to the inner chamber (optional, recommended). Then carefully flush the wells with running buffer, using a 1 mL pipette, to clean out residues.
11. Carefully load samples and molecular weight markers into the desired lanes. Load all empty wells with blank sample buffer/mock samples.
12. Run the gel at 150 V until the bromophenol blue line reaches the bottom edge of the gel (about 90 min).
13. Remove the gel from the cassette and transfer it into a large container with ddH2O (~500–1000 mL).
14. Place on an orbital shaker at low speed to wash the gel and remove salts. Replace the ddH2O every 20 min for 1 h.
15. Place the gel into a square Petri dish.
16. Add GelCode™ Blue Stain until the gel is well submerged.
17. Incubate overnight at room temperature.
18. Cover the dish to prevent evaporation.
19. Wash the gel in ddH2O (3×, 10 min each) to remove stain.
20. Image the gel.
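The loading calculation in steps 1 and 2 (10 μg of protein, topped up with ddH2O to 13 μL, plus 7 μL of master mix) can be sketched as follows. The function name and the error raised for overly dilute samples are assumptions made for illustration; the protocol itself does not say how to handle that case.

```python
def gel_loading_volumes(conc_ug_per_ul, protein_ug=10.0,
                        sample_plus_water_ul=13.0, master_mix_ul=7.0):
    """Volumes (uL) of lysate, ddH2O, and master mix for one 20 uL gel sample."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    sample_ul = protein_ug / conc_ug_per_ul
    if sample_ul > sample_plus_water_ul:
        # Hypothetical guard: the lysate alone would exceed the 13 uL allowance.
        raise ValueError("lysate too dilute to load the requested protein amount")
    water_ul = sample_plus_water_ul - sample_ul
    return {
        "sample_ul": sample_ul,
        "water_ul": water_ul,
        "master_mix_ul": master_mix_ul,
        "total_ul": sample_ul + water_ul + master_mix_ul,
    }


loading = gel_loading_volumes(2.0)  # a lysate at 2 ug/uL
print(loading)
```

Running the function over all samples makes it easy to spot, before pipetting, any lysate that cannot deliver 10 μg within the 13 μL allowance.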

3.4 Sample Preparation for MS

3.4.1 To Prepare Protein Lysate for SP3 Protein Clean Up

1. Based on the protein concentration of the sample (as determined by Micro BCA assay), aliquot 100 μg of total protein into a 1.5 mL Eppendorf tube.
2. Add 100 mM DTT to the 100 μg of lysate for a final concentration of 10 mM DTT and incubate at 57 °C for 30 min.
3. Cool samples to room temperature prior to adding IAA.
4. Add IAA (500 mM) to the sample for a final concentration of 50 mM IAA. Incubate samples for 45 min in the dark at room temperature.
5. Add DTT (100 mM) for a final concentration of 20 mM DTT per sample, to quench excess IAA in solution.
6. Vortex for 10 min at room temperature at the 800 rpm setting on the vortex mixer.
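Each addition in steps 2, 4, and 5 is specified as a final concentration, so the volume to add depends on the current sample volume. A minimal sketch of that calculation follows. The function names are hypothetical, and treating the quench in step 5 as a fresh addition that ignores DTT already in the tube is an assumption.

```python
def volume_to_add(current_volume_ul, stock_conc, final_conc):
    """Volume of stock that brings the mixture to final_conc.

    Solves stock * v / (V + v) = final for v; both concentrations
    must be in the same unit.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be between 0 and the stock")
    return current_volume_ul * final_conc / (stock_conc - final_conc)


def reduction_alkylation_plan(lysate_ul):
    """Sequential additions for steps 2, 4, and 5 (all volumes in uL)."""
    volume = lysate_ul
    dtt_reduce = volume_to_add(volume, 100.0, 10.0)    # 100 mM DTT to 10 mM
    volume += dtt_reduce
    iaa = volume_to_add(volume, 500.0, 50.0)           # 500 mM IAA to 50 mM
    volume += iaa
    # Assumption: DTT from step 2 is not counted toward the 20 mM quench.
    dtt_quench = volume_to_add(volume, 100.0, 20.0)
    volume += dtt_quench
    return {"dtt_reduce_ul": dtt_reduce, "iaa_ul": iaa,
            "dtt_quench_ul": dtt_quench, "final_volume_ul": volume}


steps = reduction_alkylation_plan(90.0)
print({k: round(v, 2) for k, v in steps.items()})
```

The same helper applies to percentage-based additions, for example bringing a sample to 70% acetonitrile from a 100% stock with `volume_to_add(V, 100, 70)`.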

3.4.2 To Prepare SP3 Beads

1. Equilibrate beads to room temperature. 2. Combine 20 μL of Sera-Mag™ (hydrophilic) and 20 μL of Sera-Mag™ (hydrophobic) in a 1.5 mL Eppendorf tube. 3. Add 500 μL of LC-MS grade water to rinse beads. 4. Place Eppendorf tube containing beads on a magnetic stand and let beads settle for 2 min. 5. Remove and discard supernatant. 6. Repeat steps 3–5 twice. 7. Reconstitute beads in LC-MS grade water at a concentration of 20 μg solid/μL (at this stage, the prepared beads can be stored in fridge).
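The bead quantities above can be checked with a short calculation: 20 μL of each 50 mg/mL bead stock gives 2 mg of beads, which at 20 μg/μL corresponds to 100 μL of water. The sketch below also checks the bead-to-protein ratio for the 10 μL of prepared beads per 100 μg of lysate used in the clean-up that follows. Function names are hypothetical, and the calculation assumes no bead loss during rinsing.

```python
def sp3_bead_prep(hydrophilic_ul=20.0, hydrophobic_ul=20.0,
                  stock_mg_per_ml=50.0, target_ug_per_ul=20.0):
    """Return (total bead mass in ug, resuspension volume in uL)."""
    # 1 mg/mL is numerically equal to 1 ug/uL.
    total_beads_ug = (hydrophilic_ul + hydrophobic_ul) * stock_mg_per_ml
    resuspension_ul = total_beads_ug / target_ug_per_ul
    return total_beads_ug, resuspension_ul


def bead_to_protein_ratio(bead_volume_ul, bead_conc_ug_per_ul, protein_ug):
    """Mass of beads per mass of protein (w/w)."""
    return bead_volume_ul * bead_conc_ug_per_ul / protein_ug


beads_ug, water_ul = sp3_bead_prep()
ratio = bead_to_protein_ratio(10.0, 20.0, 100.0)
print(beads_ug, water_ul, ratio)
```

One 100 μL preparation therefore covers ten samples at 10 μL each.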

3.4.3 Perform Protein Clean Up

1. Use minimum volume to adjust prepared protein lysate samples to pH 7 with 500 mM Tris-HCl (pH 7.0). 2. Add 10 μL of prepared beads to lysate sample at a 1:2 (w/w) lysate: bead ratio. 3. Add acetonitrile to a final concentration of 70% to promote protein binding to the beads. 4. Incubate the samples for 18 min at room temperature. 5. Place sample on magnetic stand for 1 min. 6. Remove and discard supernatant. 7. Wash pellet with 200 μL of 70% ethanol, while keeping the sample on the magnetic rack. 8. Remove wash solution from beads. 9. Repeat steps 7 and 8 once more. 10. Wash pellet with 200 μL of acetonitrile, while keeping the sample on magnetic rack.

11. Remove wash solution from the beads and proceed to trypsin digestion. Avoid keeping beads dry for extended periods of time.

3.4.4 Trypsin Digestion

1. Prepare a digestion buffer by dissolving 20 μg of trypsin in 50 mM Tris-HCl buffer at a concentration of 0.1 μg/μL.
2. Add 40 μL of digestion buffer to the protein sample, for an enzyme:protein ratio of 1:25.
3. Incubate samples at 37 °C for 16 h.
4. Sonicate samples for 15 s.
5. Quick pulse in a microfuge to pull the beads and solution from the side walls back into solution.
6. Place samples on a magnetic rack for 1 min.
7. Collect and transfer the peptide solution to a new Eppendorf tube.
8. Acidify the sample with TFA solution to a final concentration of 0.5% TFA to stop trypsin activity. At this stage the samples can be stored at −80 °C until the next step.
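The numbers in steps 1 and 2 are mutually consistent, which a short calculation confirms: 20 μg of trypsin at 0.1 μg/μL is 200 μL of digestion buffer, and 40 μL of it delivers 4 μg of enzyme, or 1:25 against 100 μg of protein. The helper names below are hypothetical.

```python
def digestion_buffer_volume_ul(trypsin_ug=20.0, conc_ug_per_ul=0.1):
    """Volume of Tris-HCl needed to dissolve the trypsin at the target concentration."""
    return trypsin_ug / conc_ug_per_ul


def protein_per_enzyme(buffer_ul, conc_ug_per_ul, protein_ug):
    """Return (trypsin mass in ug, ug of protein per ug of enzyme)."""
    trypsin_ug = buffer_ul * conc_ug_per_ul
    return trypsin_ug, protein_ug / trypsin_ug


buffer_ul = digestion_buffer_volume_ul()
trypsin_ug, fold = protein_per_enzyme(40.0, 0.1, 100.0)
print(f"{buffer_ul:.0f} uL buffer, {trypsin_ug:.1f} ug trypsin, ratio 1:{fold:.0f}")
```

With these volumes, one 20 μg vial of trypsin is enough for five 100 μg samples.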

3.4.5 Peptide Desalting (See Note 5)

1. Activate a cartridge with 1 mL of acetonitrile by applying positive pressure with a syringe at a rate of one drop per second. It is important to keep the sorbent wet throughout the procedure. 2. Add 1 mL of 0.1% TFA to condition the cartridge. 3. Push solvent through with a syringe. 4. Load peptide sample onto a cartridge. 5. Repeat step 3. 6. Wash the peptide-bound column with 1 mL of 0.1% TFA. 7. Repeat step 3. 8. Add 150 μL of elution buffer to elute peptides with a syringe plunger into a new Eppendorf tube. 9. Apply another 150 μL of elution buffer and collect eluate into the sample tube. 10. Lyophilize peptide sample in vacuo.

3.5 Running Samples on LC-MS/MS

3.5.1 Preparation Prior to LC-MS Analysis

1. Assuming a peptide yield from protein digestion of 50%, reconstitute the sample in LC-MS grade 0.1% FA to a peptide concentration of ~0.5 μg/μL.
2. Vortex samples for 15 min at 1000 rpm.

3.5.2 Determine Peptide Concentration Using NanoDrop™ Spectrophotometer

1. Select protocol: measure peptides at 205 nm wavelength.
2. Clean the UV measuring cuvette with 5 μL of 0.5 M HCl for 1 min.

3. Blank the machine with diluent (LC-MS grade 0.1% formic acid) and measure the blank ( pyro-Gln Glu- > pyroGlu DimethNter0

MS2 Ion types

a b

MS2 DeMultiplexing

Automatic

supernatant. Beads must be immersed and resuspended in solution to have a good wash.
6. For stock preparations, the volume of HPLC water is calculated based on bead concentration and number of samples. For example, 20 μL of bead stock is reconstituted in 10 μL of water for one sample to keep the working volume low.
7. Prepare enough SP3 bead mix stock for this step and for the dimethylation step.
8. Calculate the volume of 100% ethanol using the formula below, where $C_{\text{stock}}$ is the percentage of the ethanol stock:

$$V_{\text{EtOH}} = \frac{(V_{\text{sample}} + V_{\text{bead}}) \times 80\%}{C_{\text{stock}} - 80\%}$$

Anuli C. Uzozie et al.

9. A heavily clogged lysate sample after adding SP3 beads could result from a high amount of DNA in the sample. Use the required amount of benzonase and incubate the samples. Other methods, such as sonication, can also be used to shear DNA efficiently.
10. Formaldehyde and sodium cyanoborohydride are toxic and should be handled with extra care in a fume hood.
11. Do not use a strong acid. Ensure Tris is not acidic. If acidic, hydrogen cyanide (HCN) can be produced.
12. Ensure beads are fully immersed in the solution. Do not pipette mix! Gently push beads into the solution with a pipette tip.
13. Approximately 10% of the sample volume or 10 μg of protein, whichever is less, is sufficient.
14. StageTip preparation: fit four layers of circular Empore octadecyl C18 extraction disk into a P200 pipette tip as described by Rappsilber and colleagues [16]. Select the appropriate C18 column based on protein starting amount. The following columns were tested in our protocol:
• 1–5 μg: self-packed 4-layered C18 StageTips
• 5–20 μg: Nest Group microspin column
• 20–100 μg: Nest Group macrospin column
• 100–1000 μg: Waters SepPak columns

15. Apply manual force with a syringe to push the sample through StageTips. For spin columns, centrifuge at room temperature at 600 rcf for 1 min (see Table 1).
16. Top up the sample with 0.1% TFA in HPLC water to reach the required sample loading volume given in Table 1 for each column type.
17. Due to the limited capacity of PCR 96-well plates, the maximum sample volume in each well should be less than 20 μL before the start of applications. Lysis, reduction, and alkylation volumes should be taken into consideration. Samples with a protein amount above 200–250 μg are not recommended for this PCR plate setup, as they would require a larger amount of beads, which form a thicker ring inside the wells on the magnet, leading to interference with TS_50 pipette aspiration and potential loss of beads.
18. The type of 96-well plate used in this application determines the lysis volume, the pipette types used, the magnet type, the shaking speed needed to mix without spilling, and the binding time on the magnet needed to achieve complete separation. Alternatively, a 96 deep-well plate can be used to achieve a larger lysis volume or binding with higher organic content. However, this would require different accessories for the platform.
19. Avoid dispensing less than 2 μL of reagent with the epMotion 5073m. This can be achieved by preparing diluted stocks, or by manually combining compatible reagents prior to the runs and dispensing these as one stock.
20. Formaldehyde and sodium cyanoborohydride reagents are handled in a fume hood. To minimize hydrolysis, the diluted sodium cyanoborohydride stock is prepared freshly for each dispense.
21. Undecanal is compatible with organic solvents; thus an undecanal/ethanol mixture at the correct ratio is prepared in one microfuge tube to be dispensed.
22. To reduce the risk of air bubbles during aspiration and to ensure uniform dispensing, some of the epMotion 5073m settings used in this protocol have been modified from the defaults, for example, the bottom tolerance distance of reservoirs and microfuge tubes.
23. The epMotion® 5073m takes into account the number of tips required in an application and scans the available tips before the start of a run. With this software version (40.4.0.38), the platform cannot stack pipette tip boxes and cannot rescan during a run; thus separate, consecutive applications might be required. Tips are reused until the end of a defined program.
24. The TS_1000 cannot go into the wells of a 250 μL 96-well plate. Liquid can only be dispensed from the top of the wells, and aspiration from inside the wells has to be done with the TS_50.
25. There will be residual volume remaining in the wells after aspiration. The exact volume depends on the distance between the well bottom and the tips. To remove as much liquid as possible, applications are programmed to aspirate more than the volume present in the wells. Another option would be to change the depth to which the pipette tips go into the wells, with a risk of creating pressure and possible air bubbles inside the tip.
26. The user intervention step temporarily pauses the application to allow manual steps to be performed, for example, placing freshly made reagents or sealing the plate.
27. Create a separate method file for DDA and for DIA analysis.
28. DDA settings detail the methods used in our published data; however, a suitable method file for any mass spectrometer can be used.
29. DIA settings detail the methods for a 24-window variable format first described in Bruderer et al. [17] and optimized for use in our published data [8]. However, a suitable method file for any mass spectrometer can be used.
30. Settings can be pre-set in Spectronaut Pulsar software using steps described in the manual.
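The ethanol-volume formula in Note 8 can be sketched in Python as follows. The function name and the default arguments are assumptions for illustration; the formula is the one given in the note, which adds enough ethanol stock to bring the combined sample and bead volume to 80% ethanol.

```python
def ethanol_volume_ul(sample_ul, bead_ul, stock_percent=100.0, target_percent=80.0):
    """Volume of ethanol stock giving target_percent ethanol in the final mixture."""
    if not 0 < target_percent < stock_percent:
        raise ValueError("target must be between 0 and the stock percentage")
    combined_ul = sample_ul + bead_ul
    return combined_ul * target_percent / (stock_percent - target_percent)


etoh_100 = ethanol_volume_ul(18.0, 2.0)                      # 100% ethanol stock
etoh_96 = ethanol_volume_ul(18.0, 2.0, stock_percent=96.0)   # 96% ethanol stock
print(etoh_100, etoh_96)
```

With a 100% stock the result is always four times the combined sample and bead volume, which is worth checking against the capacity of the wells before starting.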

References

1. Huesgen PF, Lange PF, Overall CM (2014) Ensembles of protein termini and specific proteolytic signatures as candidate biomarkers of disease. Proteomics Clin Appl 8:338–350. https://doi.org/10.1002/prca.201300104
2. Lange PF, Huesgen PF, Nguyen K, Overall CM (2014) Annotating N termini for the human proteome project: N termini and Nα-acetylation status differentiate stable cleaved protein species from degradation remnants in the human erythrocyte proteome. J Proteome Res 13:2028–2044. https://doi.org/10.1021/pr401191w
3. Lange PF, Overall CM (2013) Protein TAILS: when termini tell tales of proteolysis and function. Curr Opin Chem Biol 17:73–82. https://doi.org/10.1016/j.cbpa.2012.11.025
4. Uzozie AC, Ergin EK, Rolf N, Tsui J, Lorentzian A, Weng SSH, Nierves L, Smith TG, Lim CJ, Maxwell CA, Reid GSD, Lange PF (2021) PDX models reflect the proteome landscape of pediatric acute lymphoblastic leukemia but divert in select pathways. J Exp Clin Cancer Res 40:96. https://doi.org/10.1186/s13046-021-01835-8
5. Huesgen PF, Overall CM (2012) N- and C-terminal degradomics: new approaches to reveal biological roles for plant proteases from substrate identification. Physiol Plant 145:5–17. https://doi.org/10.1111/j.1399-3054.2011.01536.x
6. Savickas S, Kastl P, auf dem Keller U (2020) Combinatorial degradomics: precision tools to unveil proteolytic processes in biological systems. Biochim Biophys Acta Proteins Proteom 1868:140392. https://doi.org/10.1016/j.bbapap.2020.140392
7. Kleifeld O, Doucet A, Prudova A, auf dem Keller U, Gioia M, Kizhakkedathu JN, Overall CM (2011) Identifying and quantifying proteolytic events and the natural N terminome by terminal amine isotopic labeling of substrates. Nat Protoc 6:1578–1611. https://doi.org/10.1038/nprot.2011.382
8. Weng SSH, Demir F, Ergin EK, Dirnberger S, Uzozie A, Tuscher D, Nierves L, Tsui J, Huesgen PF, Lange PF (2019) Sensitive determination of proteolytic proteoforms in limited microscale proteome samples. Mol Cell Proteomics 18:2335–2347. https://doi.org/10.1074/mcp.TIR119.001560
9. Burger B, Vaudel M, Barsnes H (2021) Importance of block randomization when designing proteomics experiments. J Proteome Res 20:122–128. https://doi.org/10.1021/acs.jproteome.0c00536
10. Oberg AL, Vitek O (2009) Statistical design of quantitative mass spectrometry-based proteomic experiments. J Proteome Res 8:2144–2156. https://doi.org/10.1021/pr8010099
11. Smith PK, Krohn RI, Hermanson GT, Mallia AK, Gartner FH, Provenzano MD, Fujimoto EK, Goeke NM, Olson BJ, Klenk DC (1985) Measurement of protein using bicinchoninic acid. Anal Biochem 150:76–85. https://doi.org/10.1016/0003-2697(85)90442-7
12. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. https://doi.org/10.1038/nbt.1511
13. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319. https://doi.org/10.1038/nprot.2016.136
14. Fortelny N, Yang S, Pavlidis P, Lange PF, Overall CM (2015) Proteome TopFIND 3.0 with TopFINDer and PathFINDer: database and analysis tools for the association of protein termini to pre- and post-translational events. Nucleic Acids Res 43:D290–D297. https://doi.org/10.1093/nar/gku1012
15. Lange PF, Huesgen PF, Overall CM (2012) TopFIND 2.0 – linking protein termini with proteolytic processing and modifications altering protein function. Nucleic Acids Res 40:D351–D361. https://doi.org/10.1093/nar/gkr1025
16. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261
17. Bruderer R, Bernhardt OM, Gandhi T, Xuan Y, Sondermann J, Schmidt M, Gomez-Varela D, Reiter L (2017) Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol Cell Proteomics 16:2296–2309. https://doi.org/10.1074/mcp.RA117.000314

Chapter 9

Phosphoproteomics and Organelle Proteomics in Pancreatic Islets

Özüm Sehnaz Caliskan, Giorgia Massacci, Natalie Krahmer, and Francesca Sacco

Abstract

In recent years, mass spectrometry (MS)-based proteomics has undergone dramatic advances in sample preparation, instrumentation, and computational methods. Here, we describe in detail a workflow that quantifies global protein phosphorylation in pancreatic islets and characterizes intracellular organelle composition at the protein level by MS-based proteomics.

Key words Mass spectrometry-based phosphoproteomics, Protein correlation profiling, Organelles, Pancreatic islets

1

Introduction

The rewiring of phosphorylation signaling networks in vivo in healthy and diseased islets has become a key question in the diabetes field, yet the extremely limited amount of material that can be extracted from pancreatic islets has hampered experimental progress. Recently, a high-sensitivity, state-of-the-art mass spectrometry (MS)-based phosphoproteomic workflow has been developed that offers very high reproducibility and requires only a limited amount of starting material (200 μg of protein) [1, 2]. Using this highly sensitive EasyPhos workflow, we have recently characterized in vivo changes in the signaling networks of diabetic and healthy murine and human pancreatic islets [3]. Here, we provide a detailed description of the experimental and computational procedures needed to quantify phosphoproteomic changes in islets. Protein phosphorylation is an important determinant of protein localization. In addition, we describe the optimization of protein correlation profiling (PCP), a previously established organelle proteomic tool [4], for small sample amounts such as pancreatic islets. The combination of subcellular proteome profiling with phosphoproteomics of pancreatic islets provides a powerful tool to study protein relocalization processes and their dependence on signaling pathways.

Özüm Sehnaz Caliskan and Giorgia Massacci contributed equally to this work.

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_9, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

2

Materials All solutions are prepared with MilliQ water unless otherwise stated.

2.1 Islet Isolation

1. G-solution: Hanks' Balanced Salt Solution (HBSS) (with calcium and magnesium; can include phenol red), 1% (v/v) penicillin/streptomycin (P/S), 1% (w/v) bovine serum albumin (BSA). Dissolve 5 mg BSA per 50 mL HBSS on the bench top, but filter it into new 50 mL Falcon tubes under a cell culture hood. BSA stock solutions can be stored at −20 °C until further use. Store G-solution at 4 °C for a maximum of 1 month.
2. Cell culture medium: RPMI 1640, 1% (v/v) P/S, 10% (v/v) fetal bovine serum (FBS). Store at 4 °C.
3. 40% Optiprep: 40% Optiprep, 10 mM HEPES. Take 20 mL of 60% Optiprep (Sigma), add 300 μL of 1 M HEPES, and bring the volume to 30 mL with 9.7 mL of Dulbecco's phosphate buffered saline (DPBS). Store at 4 °C for a maximum of 7 days.
4. 10% RPMI: HBSS, 1% (v/v) P/S, 10% (v/v) cell culture medium. Add 5 mL of cell culture medium to 45 mL of HBSS with 1% (v/v) P/S. Store at 4 °C for a maximum of 7 days.
5. 15% Optiprep gradient medium: Mix 5 mL of 10% RPMI with 3 mL of 40% Optiprep. Store at 4 °C for a maximum of 7 days.
6. Collagenase P: G-solution, 0.1% (w/v) Collagenase P. Dissolve 6 mg of Collagenase P (1.9 U/mg) in 6 mL of G-solution. Prepare fresh; do not store for longer than 2 h. Keep on ice.

2.2 Total Proteome and Phosphoproteome Sample Preparation

All stock solutions can be stored at room temperature for >1 year.

• 1 M Tris-HCl (pH 8.5)
• 5 M potassium hydroxide (KOH)
• 100 mM KH2PO4
• 2 M calcium chloride (CaCl2)
• Acetonitrile (ACN)
• 30% MeOH + 1% TFA: 30% (v/v) methanol (MeOH), 1% (v/v) trifluoroacetic acid (TFA). This buffer is stable for >6 months at RT.

1. 4% SDC lysis buffer: 4% (w/v) sodium deoxycholate (SDC), 100 mM Tris-HCl (pH 8.5). For 50 mL, weigh 2 g of SDC and add 45 mL of MilliQ water. Vortex until the SDC dissolves. This solution is stable for >3 months at RT. On the day of use, add 5 mL of 1 M Tris-HCl (pH 8.5). Prepare fresh to avoid SDC crystallization (see Note 1).
2. 2.2% (w/v) SDC stock solution: For 50 mL, weigh 1.1 g of SDC and add 50 mL of MilliQ water. Vortex until the SDC dissolves. SDC stock solution is stable for >3 months at RT.
3. 2% SDC lysis buffer: Mix 2.2% (w/v) SDC stock solution with 1 M Tris-HCl (pH 8.5) in a 9:1 (v/v) ratio. For 50 mL of solution, take 45 mL of 2.2% (w/v) SDC stock solution and add 5 mL of 1 M Tris-HCl (pH 8.5). Prepare fresh to avoid SDC crystallization.
4. Reduction/alkylation buffer: For the reduction buffer, 0.5 M Tris(2-carboxyethyl)phosphine hydrochloride (TCEP) stock solution is commercially available. For the alkylation buffer, prepare a 10× stock solution of 400 mM chloroacetamide (CAA) in distilled water. Prepare this solution in a fume hood and handle it with gloves. Adjust the pH to 7.0–8.0. Make 50 or 200 μL aliquots and store them at −20 °C.
5. Trypsin solution: Reconstitute 1 mg of lyophilized trypsin in 1 mL of trypsin buffer (0.05% (v/v) AcOH and 2 mM CaCl2). Vortex and centrifuge at 1000 × g for 1 min at RT. Divide into aliquots to avoid multiple freeze-thaw cycles and store for 6 months at −80 °C.
6. LysC solution: Dissolve 10 AU of LysC in 3 mL of MilliQ water. Vortex and centrifuge at 1000 × g for 1 min at RT. Divide into aliquots to avoid multiple freeze-thaw cycles and store for at least 6 months at −80 °C.
7. EP Loading buffer: 80% (v/v) acetonitrile (ACN), 6% (v/v) trifluoroacetic acid (TFA). This buffer is stable for >3 months at RT. CAUTION (see Note 2).
8. EP Enrichment buffer: 36% TFA, 3 mM KH2PO4. This buffer is stable for >3 months at RT.
9. EP Wash buffer: 60% (v/v) ACN, 1% (v/v) TFA. This buffer is stable for >3 months at RT.
10. EP Transfer: 80% (v/v) ACN, 0.5% acetic acid. This buffer is stable for >3 months at RT.
11. EP Elution: Add 200 μL of NH4OH to 800 μL of 40% (v/v) ACN. This buffer must be prepared fresh!
12. SDB-RPS Loading buffer: 2% trifluoroacetic acid (TFA) in isopropanol. This buffer is stable for >3 months at RT.

13. SDB-RPS Wash buffer 1: 1% (v/v) TFA in EtOAc. Prepare fresh!
14. SDB-RPS Wash buffer 2: 1% (v/v) TFA in isopropanol. This buffer is stable for >3 months at RT.
15. SDB-RPS Wash buffer 3: 0.2% TFA. This buffer is stable for >3 months at RT.
16. SDB-RPS Elution: Add 20 μL of NH4OH to 4 mL of 60% (v/v) ACN. This buffer must be prepared fresh!
17. Buffer A*: 2% ACN, 0.1% TFA. This buffer is stable for >6 months at RT.
18. Buffer A: 0.1% (v/v) formic acid.
19. Buffer B: 80% (v/v) ACN, 0.1% formic acid. This buffer is stable for >6 months at RT.

2.3 Organelle Fractionation

Keep all the solutions at room temperature unless otherwise stated. All stock solutions can be stored at room temperature for >1 year.

• 1 M Tris-HCl (pH 7.4)
• 0.5 M EDTA (pH 8.0)
• 1 M KCl
• 1 M MgCl2
• DPBS

1. 20% sucrose solution: 20% (w/v) sucrose, 20 mM Tris-HCl (pH 7.4), 0.5 mM EDTA (pH 8.0), 5 mM KCl, 3 mM MgCl2. For 500 mL of 20% sucrose solution, weigh 100 g of sucrose and add 350 mL of pre-warmed MilliQ water. Dissolve the sucrose on a magnetic stirrer and add 10 mL of 1 M Tris-HCl (pH 7.4), 0.5 mL of 0.5 M EDTA (pH 8.0), 2.5 mL of 1 M KCl, and 1.5 mL of 1 M MgCl2. Make up the total volume to 500 mL with MilliQ water. Filter the buffer through a 0.22 μm filter with a vacuum pump.
2. 50% sucrose solution: 50% (w/v) sucrose, 20 mM Tris-HCl (pH 7.4), 0.5 mM EDTA (pH 8.0), 5 mM KCl, 3 mM MgCl2. For 500 mL of 50% sucrose solution, weigh 250 g of sucrose and add 350 mL of pre-warmed MilliQ water. Dissolve the sucrose on a magnetic stirrer and add 10 mL of 1 M Tris-HCl (pH 7.4), 0.5 mL of 0.5 M EDTA (pH 8.0), 2.5 mL of 1 M KCl, and 1.5 mL of 1 M MgCl2. Make up the total volume to 500 mL with MilliQ water. Filter the buffer through a 0.22 μm filter with a vacuum pump.
3. Lysis buffer: 10 mL of 20% sucrose solution, 1 tablet of protease inhibitor, 1 tablet of phosphatase inhibitor. Prepare fresh and keep on ice until use on the same day (see Note 3).

4. Ethanol precipitation buffer: Ethanol, 50 mM sodium acetate. Prepare a 2.5 M sodium acetate stock solution in MilliQ water and adjust to pH 5.0 by adding TFA. Add 10 mL of the stock solution to 500 mL of ethanol.

2.4 Equipment

• Deep-well plates, 96 wells
• Silicone sealing mats for 2-mL deep-well plates
• Temperature-controlled high-speed orbital shaker
• 2-mL tube adapter for orbital shaker
• Deep-well plate adapter for orbital shaker
• Solid-phase extraction disks for SDB-RPS StageTips. Cut out two layers of SDB-RPS material by using a blunt-end 14-gauge syringe and insert them into the ends of 200 μL pipette tips
• Solid-phase extraction disks for C8 StageTips. Cut three layers of C8 material by using a blunt-end 14-gauge syringe and insert them into the ends of 200 μL pipette tips
• Titanium dioxide (TiO2) beads
• In-house 96-well StageTip centrifuge device
• Electronic positive-displacement pipette or eight-channel electronic pipette and pipette tips
• Multichannel 200-μL pipette
• PCR strip tubes
• Sealing mats for PCR tubes
• Vacuum liquid aspiration line and disposable borosilicate Pasteur pipettes
• Evaporative concentrator with 96-well plate rotor
• Nanospray columns for online ultra-high-performance liquid chromatography (UHPLC)–MS/MS analysis
• UHPLC system for online LC–MS/MS analysis
• Column oven enabling heating of nanospray columns to 50 °C
• 15 and 50 mL Falcon tubes
• 5 and 10 mL pipettes
• Pipette boy
• 70-μm cell strainers
• Thermoshaker
• Centrifuge
• 5 cm cell suspension dishes
• Ultracentrifuge – Beckman Optima L-70
• SW41 ultracentrifuge rotor
• Ultracentrifuge tubes – 14 × 89 mm

• BioComp Instruments Gradient Master 108
• 0.22 μm filter bottles
• 1.0 and 0.1 mL douncers
• 21, 24, and 26 gauge needles

3

Methods

3.1 Islet Isolation

Perform the whole process on ice.

1. Prepare one Falcon tube of 15% Optiprep per mouse: put 8 mL of 15% Optiprep into each 15 mL Falcon tube. Keep them on ice.
2. Prepare 6 mL of G-solution per mouse.
3. Take 3 mL of Collagenase P solution into a syringe and 3 mL into a 50 mL Falcon tube and put on ice. Do not keep them for longer than 2 h.
4. Exsanguinate the animal by a method of preference.
5. Clamp off the common bile duct near the junction with the small intestine. Refer to "A Practical Guide to Rodent Islet Isolation and Assessment" for the anatomy of the mouse upper intraperitoneal cavity and the clamping and injection sites [5].
6. Inject 2–3 mL of Collagenase P solution into the common bile duct at the duodenum (see Note 4).
7. Transfer the pancreas into the 50 mL Falcon tube containing 3 mL of Collagenase P solution, which was prepared in Subheading 3.1, step 3.
8. Repeat the above steps for each mouse, and then proceed to the next step.
9. Incubate the pancreas in a thermoshaker at 37 °C, with medium shaking force, for 7 min. During this period, set the centrifuge to 4 °C.
10. Take the tubes and shake vigorously by hand for 5 s.
11. Place the tubes back into the thermoshaker for an additional 7–8 min (see Note 5).
12. Take the tubes and shake vigorously by hand for 10 s (see Note 6).
13. Put them on ice and immediately fill with 15 mL of ice-cold G-solution to slow down the digestion process.
14. Centrifuge for 2 min at 560 × g and 4 °C.

Phosphoproteomics and Organelle Proteomics in Pancreatic Islets

129

15. Discard the supernatant by decanting into a waste container (no vacuum pump) and leave the pellet inside. 16. Put Falcon tube on ice and add 10-12 mL G-solution. 17. Resuspend the pellet with 10 mL pipette by pipetting strongly (see Note 7). 18. Centrifuge 2 min, at 560  g, at 4  C. 19. Discard supernatant. Decant supernatant in one-step. Place the tube mouth facing down on a paper towel for 2 s remove as much medium as possible. Do not make the pellet too dry. 20. Adjust the pipette boy to lowest speed. Gently add 5.5 mL 15% Optiprep to each tube at RT. 21. Gently resuspend the pellets starting from the first tube that 15% Optiprep added. Avoid bubble formation. 22. To make gradient, transfer this mixed 5.5 mL to the Falcon tube which left 2.5 mL 15% Optiprep gently, with minimal speed, and add the mixture through the wall. Key point: avoid disturbing the layer border! 23. Wash the left cells in Falcon tube where you resuspend the pellet at Subheading 3.1, step 22 by adding 6 mL G-solution. Pipette 2–3 times slowly and gently transfer to the Falcon tube to create the third phase. Checkpoint: Here, separated 3 phases are easily observed. 24. Leave the Falcon tube at RT, for 10 min, to gravity pellet large tissue pieces to the ground phase. 25. Centrifuge 10 min, at 475  g, RT. Set the brake (deceleration) to 0 and the acceleration to 3 in order to maximize the yield. 26. Prewet 70-μm cell strainers with G-solution and place them on top of new 50 mL Falcon tubes. 27. Take the Falcon tubes from the centrifuge. Islets are in between upper and middle phase, in the fluffy layer. 28. Remove upper phase (around 5 mL) by 1 mL pipette. 29. Absorb the interphase (with the islets) with 1 mL pipette and put into pre-wetted 70-μm cell strainer. 30. Wash islets in cell strainer with 5 mL G-solution. 31. Repeat Subheading 3.1, step 30 and 31. 32. Turn the strainer upside down in a suspension dish and rinse the captured islets into dish with 5 mL G-solution, twice. 33. 
Pick the islets by 200 μL pipette under light microscope and transfer them into a new 5 cm suspension cell dish. Place every 100 islets in a new dish. The brown round cell is the islet. Check the islet morphology following isolation.
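Centrifugation steps in this procedure are specified in × g, whereas many benchtop centrifuges are programmed in rpm. The following is a minimal sketch of the standard conversion, RCF = 1.118 × 10^-5 × r (cm) × rpm^2. The rotor radius of 15 cm is a purely illustrative assumption and must be replaced with the value for the rotor actually used.

```python
import math

# Standard relation: RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_to_rpm(rcf, radius_cm):
    """Return the rotor speed (rpm) that gives the requested RCF (x g)."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


def rpm_to_rcf(rpm, radius_cm):
    """Return the RCF (x g) produced at a given rotor speed (rpm)."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


if __name__ == "__main__":
    radius_cm = 15.0  # hypothetical rotor radius; check the rotor manual
    for rcf in (560, 475):  # forces used in Subheading 3.1
        print(f"{rcf} x g -> {rcf_to_rpm(rcf, radius_cm):.0f} rpm")
```

Because the force scales with the square of the speed, small rpm errors matter more at the higher settings; the radius should be taken from the rotor documentation.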


3.2 Phosphoproteome Sample Preparation

1. Pre-heat the ThermoMixer to 95 °C.
2. Add 4% SDC lysis buffer to the cells to obtain a protein concentration of ~2–4 mg/mL (see Note 8).
3. Boil immediately at 95 °C for 5 min and cool on ice.
4. Sonicate at 4 °C in a Bioruptor (ten cycles at high intensity). For pellets that are hard to lyse, boil and sonicate again.
5. Perform protein quantification and dilute equal protein amounts (>750 μg) in 4% SDC buffer to a maximum volume of 450 μL (see Note 9).
   Troubleshooting
   (i) High viscosity of the sample after lysis: the viscosity is due to the release of genomic DNA. Perform additional sonication before proceeding with subsequent steps.
   (ii) Too large a volume for optimal protein yield: take care that the mixing velocity is not so high that samples come in contact with the lid for a prolonged time.
6. Add 1:10 volume of reduction/alkylation buffer to the samples and place on the ThermoMixer at 45 °C for 5 min.
7. Allow the samples to cool to RT. Add LysC and trypsin solution at 1:100 (enzyme:protein amount) and incubate overnight at 37 °C (see Note 3).
   PAUSE POINT Digested samples can be stored for weeks at −20 °C or for months at −80 °C. Take an aliquot equal to 25 μg of protein for total proteome analysis.
8. Resuspend TiO2 beads at a bead-to-protein ratio of 12:1 (wt/wt) and add 1.5 μL of EP Loading buffer per mg of beads.
9. Add 750 μL of ACN to each sample and mix for 30 s (see Note 10).
10. Add 250 μL of EP enrichment buffer to the samples and mix for 30 s (see Note 11).
11. Centrifuge for 15 min at maximum speed (see Note 12).
12. Transfer the samples to a 2 mL Eppendorf deep-well plate, avoiding the pellet.
13. Add TiO2 beads to each sample and incubate for 5 min at 40 °C.
14. Centrifuge at 2000 × g for 1 min at RT to pellet the beads and discard the supernatant.
15. Resuspend the beads in 1 mL of EP wash buffer 1 and mix for 3 s. Pellet the beads at 2000 × g for 1 min and discard the supernatant. Repeat a further four times.
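The bead and buffer amounts in step 8 scale with the protein input. As a rough sketch of that arithmetic, the function below applies the 12:1 (wt/wt) bead-to-protein ratio and the 1.5 μL of EP Loading buffer per mg of beads stated above. The `overage` parameter is a hypothetical addition for weighing extra beads to cover handling losses; it is not a value given in the protocol.

```python
def tio2_bead_plan(protein_ug, bead_to_protein=12.0,
                   loading_ul_per_mg=1.5, overage=1.0):
    """Return (bead mass in mg, EP Loading buffer volume in uL).

    protein_ug        protein input per sample in micrograms
    bead_to_protein   wt/wt ratio from step 8 (12:1)
    loading_ul_per_mg buffer volume per mg of beads from step 8
    overage           hypothetical safety factor (1.0 = no extra beads)
    """
    bead_mg = protein_ug / 1000.0 * bead_to_protein * overage
    loading_ul = bead_mg * loading_ul_per_mg
    return bead_mg, loading_ul


if __name__ == "__main__":
    beads, buffer_ul = tio2_bead_plan(750)
    print(f"750 ug protein: {beads:.1f} mg beads in {buffer_ul:.1f} uL buffer")
```

For several samples, the per-sample values can simply be multiplied by the sample count before weighing, keeping in mind that the bead slurry must be mixed before aliquoting.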


Table 1 Ultra-high performance liquid chromatography setup (see Note 15)

Time interval (min)    Gradient (% buffer B)
0                      5
95                     30
100                    60
105                    95
110                    95
120                    5
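To check what Table 1 implies at intermediate time points, the gradient can be written as a list of (time, % buffer B) pairs and interpolated. This sketch assumes linear ramps between the listed time points, which is the usual behavior of LC pumps but is not stated explicitly in the table.

```python
# (time in min, % buffer B) pairs taken from Table 1
GRADIENT = [(0, 5), (95, 30), (100, 60), (105, 95), (110, 95), (120, 5)]


def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate % buffer B at time t_min (assumes linear ramps)."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 47.5, 95, 102.5, 115):
        print(f"{t:6.1f} min: {percent_b(t):5.1f} % B")
```

The long, shallow segment from 0 to 95 min carries the peptide separation; the later segments correspond to the column wash and re-equilibration.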

16. Resuspend the beads in 75 μL of EP transfer buffer and transfer them to the top of a C8 StageTip. Add another 75 μL of EP transfer buffer to each well and transfer it to the C8 StageTip.
17. Spin at 1000 × g for 5 min at RT and make sure that all transfer buffer has flowed through. Spin longer, until dry, if necessary (see Note 13).
18. Elute the phosphopeptides with 30 μL of EP loading buffer into PCR tubes by centrifuging at 1000 × g for 3 min, and repeat.
19. Immediately place the PCR tubes into a SpeedVac at 45 °C for 20 min (see Note 14).
   Troubleshooting
   (i) Formation of insoluble material after Step 4: the formation of insoluble material indicates the presence of lipids. Centrifuge in the deep-well plate at 2000 × g for 15 min at RT and transfer the supernatants to a clean 96-well plate.
20. During the SpeedVac step, equilibrate the SDB-RPS StageTips.
21. Add SDB-RPS Loading buffer and follow Subheadings 3.3.5, 3.3.6, and 3.3.7.
22. For MS analysis, peptides are loaded onto a 50-cm column with a 75 μm inner diameter, packed in-house with 1.9 μm C18 ReproSil particles (Dr. Maisch GmbH) and maintained at 60 °C. LC-MS analysis is performed on a Q Exactive HF-X mass spectrometer (Thermo Fisher) with the settings reported in Tables 1 and 2.
23. Process raw files using MaxQuant or compatible software [6, 7].
24. Use default settings with the following minor changes: methionine oxidation, protein N-term acetylation, and phosphorylation of serine, threonine, and tyrosine as variable modifications.


Table 2 Mass spectrometer setup

Setting                            Value (Q Exactive HF-X)

Instrument
  Polarity                         Positive
  S-lens/ion-funnel RF level       45
  Capillary temperature            300

Full MS
  Microscans                       1
  Resolution                       60,000
  Automatic gain control target    3 × 10^6 ion counts

dd-MS2
  Microscans                       1
  Resolution                       15,000
  Automatic gain control target    1 × 10^5 ion counts
  Maximum ion time                 50 ms
  Loop count                       10
  Isolation window                 1.6 m/z
  Isolation offset                 0
  Fixed first mass                 100 m/z
  Normalized collision energy      27

DD settings
  Minimum AGC target               1 × 10^4 ion counts
  Apex trigger                     2–4 s
  Charge exclusion                 Unassigned, 1, 5
  Peptide match                    Preferred
  Exclude isotopes                 On
  Dynamic exclusion                30 s

25. Set the enzyme specificity to trypsin (maximum: two missed cleavages; minimum peptide length: seven amino acids).
26. Perform downstream bioinformatics analysis using the Perseus platform [8] or other suitable software.
27. Use a site localization probability of at least 0.75 as the threshold for the localization of phosphorylated residues (class I phosphosites). See Tyanova et al. for in-depth information about settings [8, 9].
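The class I cutoff in step 27 can also be applied outside Perseus. The sketch below filters a tab-separated site table on a localization probability of at least 0.75. The column name "Localization prob" is an assumption about the layout of the exported site table and should be checked against the actual file header; the three rows are invented toy data used only to make the example runnable.

```python
import csv
import io


def class_i_sites(handle, min_prob=0.75, prob_column="Localization prob"):
    """Return the rows whose localization probability is >= min_prob.

    handle       open text file (tab-separated, with a header line)
    prob_column  assumed column name; verify against the real table
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row[prob_column]) >= min_prob]


# Toy table (illustrative values only, not real measurements)
TOY_TABLE = (
    "Protein\tAmino acid\tPosition\tLocalization prob\n"
    "P1\tS\t15\t0.99\n"
    "P1\tT\t18\t0.51\n"
    "P2\tY\t7\t0.75\n"
)

if __name__ == "__main__":
    kept = class_i_sites(io.StringIO(TOY_TABLE))
    for row in kept:
        print(row["Protein"], row["Amino acid"] + row["Position"],
              row["Localization prob"])
```

With a real file, `open(path, newline="")` would replace the `io.StringIO` object; the threshold is inclusive, matching the "at least 0.75" wording of step 27.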


3.3 Organelle Fractionation

3.3.1 Preparation of Gradient Tubes

1. Mark the filling line on the tubes using the metal rack provided with the BioComp Instruments Gradient Master 108.
2. Pipet 6–7 mL of 20% sucrose solution into the gradient tubes (slightly above the line marked in step 1).
3. Take up 50% sucrose solution with a syringe and injection needle (15 gauge or an equivalent needle) and remove air bubbles. Use buffers at RT.
4. Carefully add the 50% sucrose solution under the 20% sucrose solution with the syringe, up to the marked line.
5. Close the tubes with the special (long) lid provided with the BioComp Instruments Gradient Master 108. Start at an angle from the side opposite the hole to let all air escape.
6. Remove all residual buffer from the top of the lid with a pipette and tissue paper.
7. On the BioComp Instruments Gradient Master 108, select SW41 and the “Long Sucr 20–50% wv first” program to form the gradients.
8. Store the gradients at 4 °C until use and take great care not to disturb them.

3.3.2 Lysis of Islets

The minimum number of islets required is approximately 1000–1200. The number of mice needed to supply this many islets depends on the success rate of islet isolation and on the age, gender, and species of the animal. As a reference, 1000 islets correspond to 6 × 16-week-old male C57BL/6J mice. Spiking heavy SILAC-labeled cells into each fraction is an option to increase the amount of proteome per fraction and to obtain increased proteome coverage.

1. Collect islets from the suspension plates into Eppendorf tubes (see Note 16).
2. Centrifuge the islets at 500–1000 × g for approximately 1 min.
3. Discard the supernatant with a 200 μL pipette.
4. Wash the islets with cold PBS (PBS kept at 4 °C). Centrifuge again and discard the supernatant.
5. Dissolve the islets in lysis buffer and transfer them into a glass homogenizer. Use half of the required lysis buffer volume to dissolve the islets and transfer them to the homogenizer, and use the other half to collect any islets remaining in the Eppendorf tubes (see Note 17).
6. Dounce the islets approximately 60 times in an ice-cold 1.0 mL homogenizer (see Note 18).
7. Transfer the homogenized islets directly to the gradient tubes or, if another lysate has to be prepared, keep the homogenized islets on ice until use.

3.3.3 Ultracentrifugation and Collection of Gradient Samples

1. Precool the ultracentrifuge, rotor, and tube holders to 4 °C.
2. Carefully remove the lid from the gradient tubes.
3. From the upper layer of each sucrose gradient, discard a volume approximately equal to that of the islet lysate prepared in Subheading 3.3.2.
4. Add the islet lysate carefully to each tube by pipetting slowly onto the tube wall. Take care not to disturb the gradient while loading the lysate (see Note 19).
5. Centrifuge at 100,000 × g for 3 h at 4 °C.
6. Remove 0.5 mL fractions from the very top and put them in 5.0 mL Eppendorf tubes.
7. Fractions can be snap frozen in liquid nitrogen and stored at −80 °C. Otherwise, proceed directly to the protein precipitation step.

3.3.4 Protein Precipitation and Total Proteome Sample Preparation

1. Dilute fractions 13–24 1:1 (v/v) with MilliQ water: add 500 μL of MilliQ water to each of these fractions to dilute the sucrose before adding ethanol precipitation buffer.
2. Add 4 volumes (v/v) of ethanol precipitation buffer. For a 500 μL fraction, add 2 mL of ethanol precipitation buffer and shake the tube well. Keep the tubes at −20 °C overnight (see Note 20).
3. Centrifuge the tubes at 13,500–15,000 × g for 15 min.
4. Decant the supernatant and keep the samples under a fume hood for at least 30 min to evaporate the remaining solution.
5. Dissolve the pellet in 50 or 100 μL of 2% SDC lysis buffer and boil immediately at 95 °C for 10 min (see Note 21).
6. Sonicate for 15 min in a Bioruptor at high intensity, with 30 s on/30 s off cycles, at 4 °C.
7. Measure the protein concentration with BCA (see Note 22).
8. Calculate the sample volume that contains 25 μg of protein.
9. Dilute the samples with 2% SDC buffer if necessary (see Note 23).
10. Add 1/10 (v/v) of 10× CAA and 1/50 (v/v) of neutral TCEP solution (40 and 10 mM final concentration in the sample, respectively). See Table 3 for an example calculation.
11. Incubate the samples for 10 min at 45 °C and 1000 rpm in the dark.
12. Add LysC and trypsin stocks (0.5 μg/μL) to the sample solution at 1:40 (1 part protease to 40 parts protein, e.g., 1.25 μL of each enzyme for a 25 μg protein digest) (see Note 24). Incubate at 37 °C and 1000 rpm overnight, for 17–20 h.


Table 3 Required volumes (μL) of reduction and alkylation reagents for 100 μL and 500 μL final sample volumes

Sample volume                                  100 μL    500 μL
Diluted sample                                 88        440
10× CAA                                        10        50
0.5 M neutral TCEP (commercially available)    2         10
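The volumes in Table 3 follow directly from the ratios given in the digestion steps: 1/10 of the final volume as 10× CAA, 1/50 as 0.5 M TCEP, and the remainder as diluted sample, with each protease added at 1 part per 40 parts protein from a 0.5 μg/μL stock. The function below is a small sketch of that arithmetic for other final volumes; the function and parameter names are illustrative, not part of the protocol.

```python
def digest_volumes(final_ul, protein_ug,
                   enzyme_stock_ug_per_ul=0.5, protein_per_enzyme=40):
    """Return reagent volumes (uL) for reduction, alkylation and digestion.

    final_ul    final sample volume after adding CAA and TCEP
    protein_ug  protein amount in the digest
    """
    caa_ul = final_ul / 10.0          # 10x CAA stock, 1/10 (v/v)
    tcep_ul = final_ul / 50.0         # 0.5 M TCEP stock, 1/50 (v/v)
    sample_ul = final_ul - caa_ul - tcep_ul
    enzyme_ul = protein_ug / protein_per_enzyme / enzyme_stock_ug_per_ul
    return {
        "diluted sample": sample_ul,
        "10x CAA": caa_ul,
        "0.5 M TCEP": tcep_ul,
        "LysC": enzyme_ul,
        "trypsin": enzyme_ul,
    }


if __name__ == "__main__":
    for name, vol in digest_volumes(100, 25).items():
        print(f"{name:>15}: {vol:.2f} uL")
```

For a 100 μL final volume and 25 μg of protein this reproduces the 88/10/2 μL split of Table 3 and the 1.25 μL of each enzyme quoted in the digestion step.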

13. Following overnight digestion, acidify the samples by adding 1 sample volume of SDB-RPS Loading buffer.
14. Proceed directly with stage tipping or store the samples at −20 °C.

3.3.5 Stage Tip Preparation

This step is common to phosphoproteome samples, total proteome samples, and organelle fraction samples [10].

1. Prepare StageTips with three layers of SDB-RPS membrane (see Note 25).
2. Put three sheets of SDB-RPS membrane on top of each other.
3. Punch out the stacked membranes with a 15-gauge custom-made syringe and stuff them into 200 μL Eppendorf tips. Prepare 10% extra tips.
4. Transfer the tips to the stage-tipping rack.

StageTip activation

In each step, spin at room temperature at 1000 × g until dry to displace the liquid, unless stated otherwise. 3–5 min is generally enough.

1. Wash the StageTips with 100 μL ACN.
2. Activate with 100 μL 30% MeOH + 1% TFA.
3. Wash with 150 μL 0.2% TFA. In this step, check whether the tips are packed well: spin for 1 min at 500 × g and check the tips. If some liquid is left, the tips are packed well. Discard tips whose remaining liquid volume deviates clearly from that of the other StageTips.
4. Spin to dryness, approximately 3 min more.

3.3.6 Sample Loading and In-StageTip Wash

1. Load the samples on the equilibrated columns and spin for 7 min. If liquid remains in the tips, spin for approximately 5 min more. In some cases, the tips may be clogged. If so, remove the clog with a syringe needle, trying not to deform the membrane layers. Increasing the spin force to 1250 × g might also be helpful.

Wash the samples with the buffers below, spinning each time at room temperature at 1000 × g until dry to displace the liquid.

2. 100 μL SDB-RPS Wash buffer 1 (see Note 26).
3. 100 μL SDB-RPS Wash buffer 2.
4. 150 μL SDB-RPS Wash buffer 3.

PAUSE POINT for phosphopeptides: immediately before elution, phosphopeptides can be stored bound to the StageTip material for several weeks at 4 °C.

3.3.7 Elution of Peptides from StageTip Membrane

1. Elute with 60 μL of elution buffer.
2. SpeedVac until the samples are dry. Most frequently, the SpeedVac is run for 45 min at 45 °C.
3. Dissolve the samples in 6 μL of Buffer A and pipet along the tube wall 8–10 times to dissolve all peptides.
4. Cover the tubes with a plastic lid, spin down, and measure the peptide concentration on the nanodrop (see Note 27).
   PAUSE POINT Samples can be stored in MS loading buffer for several days at 4 °C, several weeks at −20 °C, or several months at −80 °C. Avoid freeze-thaw cycles for phosphopeptide samples. For long-term storage, do not use silica caps to prevent evaporation (see Note 28).
5. Put on a silica cap and place the samples in an LC autosampler cooled to 4–8 °C.
6. For MS analysis, peptides are loaded onto a 50 cm column with a 75 μm inner diameter, packed in-house with 1.9 μm C18 ReproSil particles (Dr. Maisch GmbH) and maintained at 60 °C. LC-MS analysis is performed on a Q Exactive HF-X mass spectrometer (Thermo Fisher) with the settings reported in Tables 1 and 2. The peptides are then separated by reversed-phase chromatography using a binary buffer system consisting of buffer A and buffer B.

3.3.8 Bioinformatic Analysis of PCP Data

Identification of Separable Compartments and Organelle Markers

In order to identify separable cellular compartments by the PCP approach, protein or phosphopeptide profiles (medians from biological replicates) can be used for Euclidean hierarchical clustering with average linkage, as implemented in Perseus. This reveals clusters of proteins or phosphopeptides corresponding to distinct subcellular compartments. Markers are selected based on their documented GO annotations and stable cluster assignment among all experimental conditions. Because annotations in the database overlap and are not validated, a marker selection based exclusively on GO annotations is not useful.

Support Vector Machines (SVM)-Based Assignment of the Main Organelle

The defined marker set is then used for parameter optimization and training of the SVM-based supervised learning approach implemented in Perseus software [11]. Parameters are set to Sigma = 0.2 and C = 4. With SVM classification, the main subcellular localization is assigned to every identified protein for each condition separately, or for all conditions combined. For every protein, SVM classification is performed on all fractions of all biological replicates combined. The typical prediction accuracy is around 95% for marker proteins and 90% for marker phosphopeptides.

Assignment of a Secondary Organellar Localization by Correlation Analysis

As most proteins show dual subcellular localizations, we have implemented an algorithm for correlation analysis in Perseus software to estimate the contribution of a second subcellular compartment [4]. This algorithm determines the highest correlation between the protein or phosphopeptide profile determined by our PCP experiment and in silico generated combination profiles (the main organelle profile determined in the previous SVM analysis combined with every other possible median organelle marker profile).
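The correlation analysis described above can be illustrated with a small, self-contained sketch. It is not the Perseus implementation: the mixing fractions, the use of Pearson correlation, and the toy four-fraction marker profiles are all assumptions chosen only to show the idea of comparing a measured profile against in silico combinations of the main organelle profile with every other marker profile.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def secondary_localization(profile, markers, main,
                           fractions=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Return (second organelle, mixing fraction, correlation).

    The baseline is the main organelle profile alone; a second organelle
    is reported only if a combination profile correlates better.
    """
    best = (None, 0.0, pearson(profile, markers[main]))
    for name, other in markers.items():
        if name == main:
            continue
        for f in fractions:  # assumed grid of mixing fractions
            mixed = [(1 - f) * m + f * o
                     for m, o in zip(markers[main], other)]
            r = pearson(profile, mixed)
            if r > best[2]:
                best = (name, f, r)
    return best


# Toy median marker profiles over four gradient fractions (invented values)
MARKERS = {
    "ER": [0.1, 0.6, 0.3, 0.0],
    "Mito": [0.0, 0.1, 0.3, 0.6],
    "Nucleus": [0.7, 0.2, 0.1, 0.0],
}

if __name__ == "__main__":
    protein = [0.07, 0.45, 0.30, 0.18]  # built as 70% ER + 30% Mito
    name, frac, r = secondary_localization(protein, MARKERS, main="ER")
    print(name, frac, round(r, 3))
```

In practice the main organelle would come from the SVM assignment and the marker profiles from the median of the selected marker proteins or phosphopeptides.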

4 Notes

1. Do not add phosphatase or protease inhibitors to lysis buffers because they may interfere with proteome digestion. Proceed immediately to heat treatment upon lysis of the cells in order to inactivate endogenous proteases and phosphatases.
2. Be sure that the pH is neutral (~7–8).
3. To prepare the lysis buffer, vortex and then use a roller shaker to dissolve the protease and phosphatase inhibitor tablets. Dissolution of the tablets might take some time. The lysis buffer might have white particles floating in it; this is normal.
4. Clamp well and close completely. This is a key point that affects the success of the isolation. Inject the Collagenase P solution slowly, and do not rush.
5. The incubation time might vary among different lots of collagenase P and their U/mg values. The amount of collagenase P should be recalculated if the U/mg value differs from the value provided in Materials part 2.2.1.6.
6. The result should be free of large pieces of pancreatic tissue if the pancreas was completely inflated. It should have the consistency of pea soup.
7. Each time, take up 10 mL and pipette onto the sidewall of the tube. Repeat 10 times in total. Bubbles might occur, but they do not interfere with the process.
8. In case of too large volumes, limit the mixing velocity so that the samples do not touch the lid during overnight incubation.
9. Weigh enough beads to compensate for material losses during the protocol. TiO2 beads precipitate quickly and must be vortexed before aliquoting to ensure an equal amount of beads in each sample.
10. Mix adequately with ACN to avoid the formation of precipitates.
11. If a precipitate is present, centrifuge at 2000 × g for 15 min at RT and transfer the supernatants to the wells of a 96-well plate before adding the TiO2 beads.
12. For all steps, the duration of centrifugation is only a guide. Longer or shorter centrifugation may be required to pass buffers completely through the StageTips.
13. If not all of the buffer passes through the StageTips, centrifuge for another 3–5 min.
14. The samples should not dry completely.
15. The LC parameters are specific for a nanospray column packed with 40 cm of ReproSil-Pur 1.9 μm C18 resin, and the MS parameters are specific for Q Exactive HF or Q Exactive HF-X mass spectrometers.
16. Islets are highly sticky in the absence of BSA in the environment. To reduce the time they spend in Eppendorf tubes and the risk of sticking there, first pool the islets from different suspension dishes and then transfer them into an Eppendorf tube. An Eppendorf LoBind tube is suggested for this step. Transfer the islets to the homogenizer as fast as possible to minimize the time they spend before transfer into the glass homogenizer.
17. 100 μL of lysis buffer is enough for lysing 1000 islets. However, the volume of lysis buffer can be scaled up to 1 mL.
18. Use a tight pestle to homogenize the islets. Fast douncing can create bubbles, and bubbles might reduce lysis efficiency. Pass the homogenized islets through a 24 Gauge needle to ensure lysis of the islets. A small volume of the lysed solution can be observed under the microscope in the presence of Trypan Blue to check the lysis success.
   Troubleshooting
   (i) If the nuclear membrane is also ruptured, decrease the number of homogenizing steps or use a wider needle (i.e., a 21 Gauge needle).
   (ii) If not all islets are lysed, increase the number of homogenizing steps and use a narrower needle (i.e., a 26 Gauge needle).
19. Prior to addition of the lysate to the gradient tubes, 1/10 of the islet lysate can be removed and kept in a new Eppendorf tube for whole-lysate proteomics.
20. Storing at RT is also possible, either O/N or for 4–6 h. However, storing at −20 °C increases the number of proteins detected in the proteome. Samples can also be kept for 2–3 days. Methanol/chloroform precipitation can be considered as an alternative method.

21. Pellets are rarely visible and are seen only in fractions containing a high amount of material. It is normal not to observe a pellet; continue processing anyway. On the other hand, if a pellet is hard to dissolve, use sonication to dissolve it and boil again for 5 min.
22. Because islets are the starting material, the amount of material is limited most of the time. The amount of sample used for BCA can be reduced by using half-area plates for the measurement (e.g., sample and standard volumes can be 5 μL).
23. A final digestion volume of 100 μL or less is ideal (for a 100 μL final volume, dilute samples to 88 μL before adding CAA and TCEP). Some of the islet fractions generally yield less than 25 μg of protein. For these fractions, use the entire sample volume to proceed further.
24. LysC and trypsin have different working pH values. Therefore, they should not be mixed with each other before they are added to the sample.
25. Eppendorf tips are more favorable for preparing StageTips due to their shape. See [10] for the preparation of StageTips.
26. If a multichannel pipette is used, use a glass container.
27. Sonication for 3–5 min can be done before the nanodrop measurement. Do not measure phosphoproteome samples. Use 5 μL of the phospho-samples to load onto the column for MS measurement.
28. If samples have gone through freeze/thaw cycles before MS measurement, sonicate them before placing them in the autosampler tray.

References

1. Humphrey SJ, Karayel O, James DE, Mann M (2018) High-throughput and high-sensitivity phosphoproteomics with the EasyPhos platform. Nat Protoc 13:1897–1916
2. Humphrey SJ, Azimifar SB, Mann M (2015) High-throughput phosphoproteomics reveals in vivo insulin signaling dynamics. Nat Biotechnol 33:990–995. https://doi.org/10.1038/nbt.3327
3. Sacco F, Seelig A, Humphrey SJ et al (2019) Phosphoproteomics reveals the GSK3-PDX1 axis as a key pathogenic signaling node in diabetic islets. Cell Metab 29:1422–1432.e3. https://doi.org/10.1016/j.cmet.2019.02.012
4. Krahmer N, Najafi B, Schueder F et al (2018) Organellar proteomics and phospho-proteomics reveal subcellular reorganization in diet-induced hepatic steatosis:205–221. https://doi.org/10.1016/j.devcel.2018.09.017
5. Carter JD, Dula SB, Corbin KL et al (2009) A practical guide to rodent islet isolation and assessment. Biol Proced Online 11:3–31. https://doi.org/10.1007/s12575-009-9021-0
6. Orsburn BC (2021) Proteome discoverer-a community enhanced data processing suite for protein informatics. Proteomes 9. https://doi.org/10.3390/proteomes9010015
7. Koenig T, Menze BH, Kirchner M et al (2008) Robust prediction of the MASCOT score for an improved quality assessment in mass spectrometric proteomics. J Proteome Res 7:3708–3717. https://doi.org/10.1021/pr700859x
8. Tyanova S, Temu T, Sinitcyn P et al (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. 13. https://doi.org/10.1038/nmeth.3901
9. Sharma K, D’Souza RCJ, Tyanova S et al (2014) Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell Rep 8:1583–1594. https://doi.org/10.1016/j.celrep.2014.07.036
10. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261
11. Deeb SJ, Tyanova S, Hummel M et al (2015) Machine learning-based classification of diffuse large B-cell lymphoma patients by their protein expression profiles. Mol Cell Proteomics 14:2947–2960. https://doi.org/10.1074/mcp.M115.050245

Chapter 10

Phosphoproteomic Sample Preparation for Global Phosphorylation Profiling of a Fungal Pathogen

Brianna Ball, Jonathan R. Krieger, and Jennifer Geddes-McAlister

Abstract

Phosphorylation is a key post-translational modification central to the biological behavior of proteins. This reversible modification specifically regulates cell signaling mechanisms to control survival and growth. Moreover, microbial pathogens, including both fungi and bacteria, rely on this modification to coordinate protein production and functioning during infection and dissemination within a host. Understanding phosphorylation and its involvement with effector proteins and complex networks is now possible with the recent technological advancements of mass spectrometry. Herein, we describe a phosphopeptide enrichment strategy optimized for the invasive mycosis-causing fungal pathogen Cryptococcus neoformans. Our protocol details proper sample preparation for efficient lysis and protein extraction with minimal phosphorylation losses, followed by outlined steps for enrichment, instrumentation handling, and data analysis to permit deep profiling of the global phosphoproteome. The high-throughput versatility of bottom-up proteomics combined with our sample preparation approach facilitates opportunities for in-depth phosphorylation mapping and novel biological discoveries.

Key words Phosphorylation, Phosphoproteomics, Quantitative mass spectrometry, Fungal pathogens, Cryptococcus neoformans, Cell signaling

1 Introduction

Protein phosphorylation is the most well-studied post-translational modification (PTM) due to its ubiquitous and crucial regulation of cellular processes. The breadth of temporal and spatial control of phosphorylation is highlighted by an estimated one-third of all proteins in eukaryotes containing phosphorylated residues; this considerable PTM impacts everything from minute protein interactions to global complex cascades [1]. The far-reaching implications of protein phosphorylation and dephosphorylation include metabolism, cellular growth, proliferation, and cell signaling. Furthermore, it is a vital player in signal transduction, specifically in kinase signaling involving the reversible enzymatic addition of a phosphate group onto a substrate [2]. Dysregulation of these

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_10, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


events is a hallmark in complex diseases, whereas microbial pathogens depend on operative phosphorylation cascades for infection and disease progression within a host.

The study of kinase signaling networks in fungal pathogens has advanced recently with diverse applications of mass spectrometry (MS)-based proteomics [3–6]. MS-based techniques are the method of choice for overcoming the traditional experimental challenges of hard-to-detect and transient modifications with a low stoichiometric ratio. The combination of MS and high-throughput bioinformatics tools enables large-scale quantitative mapping of thousands of phosphorylation events with localization to the precise site on a protein [7, 8]. Presently, these resources have equipped researchers with various platforms for phosphoproteomics studies; this encompasses in-house sample preparation workflows, instrumentation, and statistical and data visualization software [9–11]. These powerful tools can illuminate the complex relationship of kinase to substrate.

Understanding the upstream regulation of phosphorylation in connection to the downstream events is invaluable in the study of PTMs that enhance virulence in microbial pathogens. For instance, the notorious opportunistic human fungal pathogen Cryptococcus neoformans relies on complex signal transduction pathways to regulate its vital virulence factors, including a polysaccharide capsule, melanin, and thermotolerance [12]. C. neoformans is the etiological agent of cryptococcosis, which may clinically manifest as cryptococcal meningitis or meningoencephalitis without proper antifungal therapeutics. This invasive mycosis-causing fungus infects more than 220,000 immunocompromised individuals each year and is reported to cause more than 15% of AIDS-related deaths [13]. The increase of clinically antifungal-resistant isolates combined with the already unacceptably high mortality rate raises global concern for the limited antifungal treatment options [14, 15]. Thus, it is imperative to develop new strategies to understand fungal virulence and uncover opportunities to overcome antifungal resistance. For example, mapping of signaling networks vital for virulence in C. neoformans, such as cyclic AMP (cAMP)/protein kinase A (PKA), protein kinase C (PKC)/mitogen-activated protein kinase (MAPK), and calcium-calcineurin, provides such opportunities to develop new strategies to overcome fungal infections on a global scale [16–18].

Here, we outline the processing steps for a global phosphoproteome analysis beginning with crude sample collection of a fungal culture from both WT C. neoformans and a kinase deletion strain, and describe how to translate the sample into refined protein identification hits, phosphosite localization, and quantification of phosphopeptides (Fig. 1). This bottom-up strategy recognizes the nuances associated with labile phosphate groups and accommodates these unique challenges in the sample preparation protocol by controlling pH, temperature, and lysis conditions. This phosphopeptide enrichment strategy is designed for the opportunistic fungal pathogen C. neoformans; however, this technique is readily applicable to various microbial species, tissue samples, and cell lines with appropriate optimization. This strategy provides boundless discovery-based opportunities on PTM networks, kinase-substrate mapping, and therapeutic and biomarker targets.

Fig. 1 Workflow illustrating main steps in the phosphoproteome extraction from C. neoformans. Fungal culture is collected and mechanically and chemically disrupted to isolate proteins followed by complete digestion into peptides. The resulting peptides are separated into total and phosphoproteome aliquots, with the phosphoproteome sample subjected to an enrichment step for phosphorylated peptides followed by purification along with the total proteome digest prior to measurement on a mass spectrometer. Publicly available platforms, MaxQuant and Perseus, are recommended for user-friendly bioinformatic analysis. Figure created with BioRender.com

2 Materials

All solutions are to be prepared with double deionized water and high-purity, MS-grade reagents. Reagents are stored at room temperature unless otherwise specified.

2.1 Culturing of C. neoformans

1. Cryptococcus neoformans H99 strain. 2. Yeast peptone dextrose (YPD) broth. 3. Yeast nitrogen base (YNB) with amino acids. 4. Agar. 5. 10 mL test tube. 6. 150 mL flask.


Brianna Ball et al.

7. 15 mL conical tube.
8. Sterile phosphate-buffered saline (PBS).
9. 1.5 mL lo-bind microcentrifuge tube.
10. 37 °C shaking and static incubator.
11. Spectrophotometer.
12. Microcentrifuge.
13. Liquid nitrogen.

2.2 Proteome Analysis

1. Resuspension buffer: dissolve one phosphatase inhibitor tablet in 10 mL of 100 mM Tris-HCl (pH 8.5); make fresh and maintain at 4 °C (see Note 1).
2. Probe sonicator.
3. 20% Sodium dodecyl sulfate (SDS).
4. 1 M Dithiothreitol (DTT) (see Note 2).
5. 0.55 M Iodoacetamide (IAA) (see Note 2).
6. Thermal shaker.
7. Acetone solutions: dilute acetone with water to produce 80% acetone; store both 100% and 80% acetone at −20 °C.
8. 8 M Urea.
9. 40 mM HEPES.
10. Water bath sonicator.
11. 50 mM Ammonium bicarbonate (ABC).
12. Trypsin/Lys-C protease mix, MS grade.
13. Stopping solution: 20% (v/v) acetonitrile and 6% (v/v) trifluoroacetic acid diluted in water.
14. Buffer A: 2% (v/v) acetonitrile, 0.1% (v/v) trifluoroacetic acid, and 0.5% (v/v) acetic acid, diluted in water.
15. Buffer B: 80% (v/v) acetonitrile and 0.5% (v/v) acetic acid diluted in water.
16. C18 resin.
17. STAGE tipping centrifuge.
18. Centrifugal vacuum concentrator.
19. ThermoFisher™ High-select Fe-NTA Phosphopeptide Enrichment Kit.
20. High-resolution mass spectrometer (e.g., Orbitrap, QToF).
21. 5–60% acetonitrile in 0.5% acetic acid.
22. 50 cm Easy-Spray column.
23. Easy-nLC 1200 system.
24. Mass spectrometry data analysis software (e.g., MaxQuant and Perseus [7, 8]).
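
The v/v recipes for the stopping solution, Buffer A, and Buffer B can be turned into component volumes for any batch size. The following is a minimal Python sketch, not part of the original protocol: the function name, the 50 mL batch size, and the assumption that volumes are simply additive (mixing contraction is ignored) are illustrative choices.

```python
# Illustrative helper (hypothetical, not from the protocol): convert the
# % (v/v) recipes listed in Subheading 2.2 into component volumes.
# Assumes volumes are additive, which is only an approximation.

RECIPES = {
    "stopping solution": {"acetonitrile": 20.0, "trifluoroacetic acid": 6.0},
    "Buffer A": {"acetonitrile": 2.0, "trifluoroacetic acid": 0.1, "acetic acid": 0.5},
    "Buffer B": {"acetonitrile": 80.0, "acetic acid": 0.5},
}


def component_volumes(recipe, total_ml):
    """Return the volume (mL) of each component, plus water, for total_ml of buffer."""
    volumes = {name: total_ml * percent / 100.0 for name, percent in recipe.items()}
    water = total_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("component percentages add up to more than 100%")
    volumes["water"] = water
    return volumes


def print_recipes(total_ml=50.0):
    """Print a bench sheet for every recipe at the chosen batch size."""
    for buffer_name, recipe in RECIPES.items():
        print(f"{buffer_name} ({total_ml:g} mL):")
        for component, ml in component_volumes(recipe, total_ml).items():
            print(f"  {component}: {ml:.2f} mL")


if __name__ == "__main__":
    print_recipes()
```

Calling `component_volumes(RECIPES["Buffer A"], 100.0)` gives the volumes for a 100 mL batch; small-percentage components such as 0.1% TFA are usually easier to pipette accurately in larger batches.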

3 Methods

3.1 Growth of C. neoformans

1. Isolate single colonies of C. neoformans H99 strain from glycerol stock onto a YPD agar plate with a sterile inoculant tip.
2. Incubate streaked plate overnight at 37 °C in a static incubator.
3. Select a single C. neoformans colony with a sterile inoculant tip to inoculate 5 mL of YPD media in a loosely capped 10 mL test tube; complete in quadruplicate.
4. Incubate overnight in a shaking incubator at 37 °C and 200 rpm.
5. The following day, use each overnight culture to inoculate 15 mL YNB media in a 150 mL flask at a dilution of 1:100 (see Note 3).
6. Incubate subculture to mid-log phase at 37 °C in a shaking incubator at 200 rpm (see Note 4).
7. Collect 15 mL culture in a 15 mL conical tube.
8. Pellet fungal cells by centrifugation for 10 min at 1500 × g at room temperature.
9. Remove and discard supernatant (see Note 5).
10. Gently wash fungal cells with 5 mL of sterile PBS.
11. Centrifuge at 1500 × g for 10 min at room temperature.
12. Repeat steps 8–11 two additional times, for a total of three washes.
13. Discard supernatant.
14. Flash freeze cell pellet in liquid nitrogen, and store at −80 °C until ready for processing.

3.2 Protein Extraction from C. neoformans

1. Resuspend the fungal cell pellet collected in a 15 mL conical tube in 300 μL of 100 mM Tris-HCl containing a phosphatase inhibitor cocktail tablet (see Note 1).
2. Place the 15 mL tube in an ice-water bath, and probe sonicate to lyse cells for 5 cycles (30 s on/30 s off) at an amplitude of 30% (see Note 6).
3. Collect liquid on the sides of the tube by brief centrifugation, without pelleting cell debris, and transfer sample to a 2 mL lo-bind microcentrifuge tube.
4. Add 1:10 volume of 20% SDS to a final concentration of 2% SDS in each sample.
5. Add 1 M dithiothreitol (DTT) to a final concentration of 10 mM; mix by pipetting (see Note 2).
6. Incubate samples at 95 °C for 10 min with shaking at 800 rpm in a thermal heating block.
7. Cool samples to room temperature; this step can be done on ice.


8. Add 0.55 M iodoacetamide (IAA) to a final concentration of 55 mM; mix by pipetting (see Note 2).
9. Incubate samples in the dark at room temperature for 20 min.
10. Add cold 100% acetone to a final concentration of 80%, and store at −20 °C overnight (see Note 7).
11. Centrifuge samples at 10,000 × g for 10 min at 4 °C.
12. Discard supernatant and wash pellet with 500 μL of ice-cold 80% acetone.
13. Repeat steps 11 and 12.
14. Air dry pellet at room temperature.
15. Add 100 μL of 8 M urea/40 mM HEPES to samples (see Note 8).
16. Resolubilize precipitated pellet by alternating between vortexing and sonicating in a water bath sonicator for 15 cycles (30 s on/30 s off) (see Note 9).
17. Quantify the amount of protein in each sample using a protein concentration assay, such as BSA tryptophan assay or BCA protein assay.
18. Add 50 mM ammonium bicarbonate (ABC) to dilute urea to a final concentration of 2 M.
19. Add trypsin/Lys-C protease mixture at a 2:50 (v/w) ratio of enzyme volume to protein amount.
20. Mix tube gently by tapping and incubate samples at room temperature overnight.
21. The following day, stop digestion by adding stopping solution at a 1:10 dilution.
22. To pellet any cellular debris, centrifuge samples for 10 min at 10,000 × g at room temperature.
23. Transfer supernatants to a new 1.5 mL lo-bind microfuge tube.
24. Aliquot 100 μg (total proteome) and approximately 800 μg (phosphoproteome) into new lo-bind microcentrifuge tubes (see Note 10). Remaining samples can be flash frozen in liquid nitrogen and stored short term at −20 °C or long term at −80 °C (see Note 11).

3.3 Phosphoenrichment for MS Analysis

1. Dry the digested samples for phosphoenrichment using a vacuum concentrator at 45 °C until completely dry.
2. To enrich for phosphorylated peptides from the dried sample, follow the manufacturer's instructions for the ThermoFisher™ High-select Fe-NTA Phosphopeptide Enrichment Kit (see Note 12).


3. Dry the eluted phosphoenriched peptides using a vacuum concentrator at 45 °C until completely dry.
4. Resuspend the phosphoenriched peptides in 200 μL of Buffer A with ice-water bath sonication for 15 cycles (30 s on/30 s off).

3.4 Peptide Purification

1. Equilibrate C18 STAGE tip by washing with 100 μL 100% acetonitrile and centrifuge at 1000 × g for 2 min.
2. Equilibrate C18 STAGE tip with 50 μL Buffer B and centrifuge at 1000 × g for 2 min.
3. Equilibrate C18 STAGE tip with 200 μL Buffer A and centrifuge at 1000 × g for 3–5 min.
4. Load 50 μg of digested total proteome sample and the entire resuspended phosphoenriched sample into their own respective STAGE tips, and centrifuge at 1000 × g until the sample has fully passed through the STAGE tip. Remaining digested sample can be flash frozen and stored long term at −80 °C (see Note 13).
5. Wash C18 STAGE tip with 200 μL Buffer A and centrifuge at 1000 × g for 3–5 min.
6. Elute peptides from C18 STAGE tip with 50 μL Buffer B, centrifuge at 500 × g for 2 min, and collect in 0.2 mL PCR tubes.
7. Dry the eluted peptides using a vacuum concentrator at 45 °C until completely dry; dried peptides are stable at room temperature or at −20 °C until processed.

3.5 Mass Spectrometry and Data Analysis

1. Reconstitute dried peptides in 10 μL Buffer A by pipetting and measure the concentration of the sample.
2. Inject 1.5–3 μg peptides onto the high-performance liquid chromatography column (see Note 14).
3. Separate peptides over a 60 min gradient (5–60% acetonitrile in 0.5% acetic acid), followed by elution using a 50 cm Easy-Spray column (50 °C) over a 10 min period with a flow rate of 300 nL/min on the Easy-nLC 1200 system (see Note 15).
4. Using a high-resolution Orbitrap mass spectrometer, acquire full MS scans in the data dependent acquisition mode (m/z 300 to 1500) using the Orbitrap analyzer (resolution 60,000, 100 m/z).
5. Load .RAW mass spectrometer output data files into MaxQuant software, or a comparable data processing platform [8].
6. Set total proteome data files to one parameter group (i.e., Group 0), and phosphoproteome data files to a separate parameter group (i.e., Group 1).


7. Label each sample file according to the experiment, with the sample name being identical between total proteome and phosphoproteome raw files (e.g., WT1, WT2, WT3).
8. Set PTM to “False” for the total proteome parameter group, and “True” for the phosphoproteome parameter group.
9. In group-specific parameters, select the variable modification “Phospho (STY)” for the phosphoproteome parameter group.
10. Set general parameters for analysis of both total and phosphoproteome: enable label-free quantification and “match between runs,” require a minimum of two peptides for protein identification, set carbamidomethylation of cysteine as a fixed modification and N-acetylation and oxidation of methionine as variable modifications, specify trypsin digestion allowing up to two missed cleavages, and filter peptide spectral matches with a target-decoy approach at a 1% false discovery rate. Use the Andromeda search engine to search against a Cryptococcus neoformans FASTA file obtained from the UniProt database [19].
11. Upload the output file “Phospho(STY)sites.txt” into Perseus software, or a comparable bioinformatic and statistical processing platform [7] (see Note 16).
12. Set general statistical processing: filter rows to remove contaminants and reverse hits, and retain sites with localization probability >0.75. Transform LFQ intensities to log2 values and expand the site table to combine multiplicity columns into one main column. Categorically annotate samples, and filter for replicates based on valid values, followed by imputation based on a normal distribution. Statistical analysis and data visualization are user and experiment oriented; representative data are provided (Fig. 2) (see ref. [7]).
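
The row filtering, log2 transformation, valid-value filtering, and imputation described in step 12 can be sketched outside Perseus. The Python example below is a minimal illustration under stated assumptions, not the Perseus implementation: the row layout, the function names, the `min_valid` threshold, and the imputation settings (`width=0.3`, `downshift=1.8`, values commonly quoted as Perseus defaults) are all assumptions for illustration, and the intensities are made-up numbers.

```python
import math
import random
import statistics

# Made-up example rows; a real table would come from "Phospho(STY)sites.txt".
SAMPLES = ["WT1", "WT2", "KO1", "KO2"]
EXAMPLE_ROWS = [
    {"site": "S1", "loc_prob": 0.99, "reverse": False, "contaminant": False,
     "intensity": {"WT1": 1024, "WT2": 2048, "KO1": 512, "KO2": 256}},
    {"site": "S2", "loc_prob": 0.50, "reverse": False, "contaminant": False,
     "intensity": {"WT1": 1024, "WT2": 1024, "KO1": 1024, "KO2": 1024}},
    {"site": "S3", "loc_prob": 0.90, "reverse": True, "contaminant": False,
     "intensity": {"WT1": 1024, "WT2": 1024, "KO1": 1024, "KO2": 1024}},
    {"site": "S4", "loc_prob": 0.80, "reverse": False, "contaminant": False,
     "intensity": {"WT1": 4096, "WT2": 4096, "KO1": 0, "KO2": 0}},
    {"site": "S5", "loc_prob": 0.95, "reverse": False, "contaminant": False,
     "intensity": {"WT1": 256, "WT2": 0, "KO1": 0, "KO2": 0}},
    {"site": "S6", "loc_prob": 0.90, "reverse": False, "contaminant": False,
     "intensity": {"WT1": 2048, "WT2": 1024, "KO1": 2048, "KO2": 4096}},
]


def keep_site(row, min_loc_prob=0.75):
    """Drop contaminants and reverse hits; keep well-localized sites."""
    return (not row["contaminant"] and not row["reverse"]
            and row["loc_prob"] > min_loc_prob)


def log2_or_none(x):
    """log2-transform an intensity; zero or missing values become None."""
    return math.log2(x) if x is not None and x > 0 else None


def impute_column(column, width=0.3, downshift=1.8, seed=0):
    """Fill None with draws from a narrowed, down-shifted normal distribution."""
    observed = [v for v in column if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values per sample")
    mu, sd = statistics.mean(observed), statistics.stdev(observed)
    rng = random.Random(seed)
    return [v if v is not None else rng.gauss(mu - downshift * sd, width * sd)
            for v in column]


def process(rows, samples, min_valid=2):
    """Filter rows, log2-transform, apply a valid-value filter, then impute."""
    kept = [r for r in rows if keep_site(r)]
    table = [[log2_or_none(r["intensity"][s]) for s in samples] for r in kept]
    pairs = [(r, vals) for r, vals in zip(kept, table)
             if sum(v is not None for v in vals) >= min_valid]
    sites = [r["site"] for r, _ in pairs]
    table = [vals for _, vals in pairs]
    # Impute each sample (column) separately.
    columns = [impute_column([row[j] for row in table])
               for j in range(len(samples))]
    imputed = [[columns[j][i] for j in range(len(samples))]
               for i in range(len(table))]
    return sites, imputed
```

In this sketch the valid-value filter counts observations across all samples; filtering within each strain group, as is often done for replicate designs, would require passing the group annotation as well.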

4 Notes

1. Dissolve one phosphatase inhibitor tablet in 10 mL of 100 mM Tris-HCl (pH 8.5). Prepare fresh on the day of processing for single use, and keep the solution cold on ice during sample preparation.
2. Stock solutions of 1 M DTT and 0.55 M IAA can be prepared in bulk, divided into single-use aliquots, flash frozen, and stored at −20 °C until use. IAA stock and aliquots should be kept in the dark with minimal exposure to light. On the day of processing, any unused DTT and IAA should be discarded.
3. Measurement of phosphorylated peptides requires a large amount of starting material; therefore, the volume of


Fig. 2 Phosphoproteome data analysis in C. neoformans. (a) Principal component analysis comparing the two sample conditions, WT C. neoformans (red) and kinase mutant strain (blue). Clustering of biological replicates based on strain is highlighted along Component 1 (37.8%) and reproducibility of the replicates along Component 2 (15.2%). (b) Volcano plot comparing WT C. neoformans and kinase mutant strain; changes in protein abundance are depicted with statistical analysis (Student’s t-test, p-value ≤ 0.05; FDR = 0.01; S0 = 1)

subculture should be optimized for the fungal strain being used to obtain approximately 0.5–1.5 mg of extracted protein.
4. C. neoformans H99 strain reaches mid-log phase following subculture for 4–5 h in enriched YPD media. Depending on the spectrophotometer, an approximate OD600nm reading of 1.0–1.5 is optimal for collection in mid-log phase.
5. C. neoformans H99 strain produces a polysaccharide capsule; ensure that capsular material and cell pellet are collected together. To avoid loss of cellular material when removing the supernatant, gently pipette off the supernatant without disturbing the capsular “cloud-like” layer above the cell pellet. Repeated wash steps compress the capsular layer into the cell pellet, making it easier to remove excess supernatant.
6. Sonication cycles can be increased or reduced depending on the presence of a polysaccharide capsule. Sufficient lysis can be identified in certain cases by a visible change of the resuspended sample from cloudy to clear.
7. Samples can be maintained at the acetone precipitation step for up to 2 weeks at −20 °C.


8. The volume of 8 M urea/40 mM HEPES can be increased to ease the resolubilization of large protein extraction experiments. If the volume is increased, the amount of 50 mM ABC must be adjusted accordingly to ensure final dilution of 8 M urea to 2 M in step 18.
9. Samples must always be kept cold when resolubilizing; keep samples on ice and ensure the water bath sonicator is pre-cooled to 4 °C.
10. A total proteome should be prepared that is complementary to the PTM-enriched proteome to allow for normalization of the phosphoproteome data. Preparing the protein extraction in excess ensures sufficient protein for total proteome analysis (~100 μg), with the remainder of the protein satisfying phosphoenrichment requirements (0.8–1.5 mg).
11. The digested total proteome aliquot can be processed immediately following the steps outlined in Subheading 3.4, Peptide Purification, or it may be flash frozen in liquid nitrogen, stored at −20 °C, and purified along with the phosphopeptide-enriched samples.
12. There are many user-friendly and optimized phosphoenrichment kits; the choice of procedure depends on the researcher's preference and experimental goals. Regardless of the phosphoenrichment kit, the labile nature of phosphate groups requires careful maintenance of optimal temperature and pH to minimize dephosphorylation.
13. The speed and length of centrifugation can be increased depending on sample composition to ease sample flow through the C18 STAGE tips. A maximum speed of 3500 × g is recommended; highly complex samples containing residual lipids and polysaccharides may require more than 20 min (see ref. [20]).
14. The amount of sample to be injected depends on instrument selectivity and on the reverse phase column; these factors should be optimized for each mass spectrometer.
15. Gradient percentage and length depend on the complexity of the sample and the sensitivity of the instrument. For phosphoenriched samples, the complexity is relatively low and a 60 min gradient is recommended, whereas the gradient should be extended for complex total proteome samples.
16. Two output files are generated in MaxQuant following PTM data processing: “proteinGroups.txt” corresponds to the total proteome, and “Phospho(STY)sites.txt” is PTM specific and normalized against background PTM occupancies identified in the total proteome (see ref. [7]).
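
The dilution arithmetic behind Subheading 3.2 (steps 4, 5, 8, 10, and 18) and Note 8 follows from C1V1 = C2V2. The Python sketch below is a minimal illustration; the function names are hypothetical, concentrations only need to share a unit within each call, and volumes are assumed to be additive. It solves for the exact volume that reaches the stated final concentration, which can differ slightly from a rounded bench ratio.

```python
# Hypothetical helpers for the dilutions in Subheading 3.2; volumes in μL.
# Assumes ideal, additive mixing.


def stock_volume_to_add(sample_ul, stock_conc, final_conc):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_ul + v) for v,
    assuming the sample contains none of the reagent beforehand.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return sample_ul * final_conc / (stock_conc - final_conc)


def diluent_volume(sample_ul, start_conc, final_conc):
    """Volume of diluent that lowers start_conc to final_conc."""
    if not 0 < final_conc <= start_conc:
        raise ValueError("final_conc must be positive and not above start_conc")
    return sample_ul * (start_conc / final_conc - 1.0)


if __name__ == "__main__":
    lysate = 300.0  # μL of resuspended pellet (step 1)
    sds = stock_volume_to_add(lysate, stock_conc=20.0, final_conc=2.0)  # % SDS
    print(f"20% SDS to add: {sds:.1f} μL")
    dtt = stock_volume_to_add(lysate + sds, stock_conc=1000.0, final_conc=10.0)  # mM
    print(f"1 M DTT to add: {dtt:.1f} μL")
    # Note 8: if the urea/HEPES volume is raised to 150 μL, the ABC volume scales.
    for urea_ul in (100.0, 150.0):
        abc = diluent_volume(urea_ul, start_conc=8.0, final_conc=2.0)  # M urea
        print(f"{urea_ul:g} μL of 8 M urea needs {abc:g} μL of 50 mM ABC")
```

For the default 100 μL of 8 M urea/40 mM HEPES, this corresponds to adding three volumes (300 μL) of 50 mM ABC; the same three-to-one ratio applies to any larger resolubilization volume.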


References

1. Moorhead GBG, Trinkle-Mulcahy L, Ulke-Lemée A (2007) Emerging roles of nuclear protein phosphatases. Nat Rev Mol Cell Biol 8:234–244. https://doi.org/10.1038/nrm2126
2. Cohen P (2002) The origins of protein phosphorylation. Nat Cell Biol 4:E127–E130. https://doi.org/10.1038/ncb0502-e127
3. Retanal C, Ball B, Geddes-McAlister J (2021) Post-translational modifications drive success and failure of fungal–host interactions. J Fungi 7:124. https://doi.org/10.3390/jof7020124
4. Ball B, Bermas A, Carruthers-Lay D, Geddes-McAlister J (2019) Mass spectrometry-based proteomics of fungal pathogenesis, host-fungal interactions, and antifungal development. J Fungi 5. https://doi.org/10.3390/jof5020052
5. Selvan LDN, Renuse S, Kaviyil JE, Sharma J, Pinto SM, Yelamanchi SD, Puttamallesh VN, Ravikumar R, Pandey A, Prasad TSK, Harsha HC (2014) Phosphoproteome of Cryptococcus neoformans. J Proteomics 97:287–295. https://doi.org/10.1016/j.jprot.2013.06.029
6. Ball B, Langille M, Geddes-McAlister J (2020) Fun(gi)omics: advanced and diverse technologies to explore emerging fungal pathogens and define mechanisms of antifungal resistance. MBio 11. https://doi.org/10.1128/mBio.01020-20
7. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, Cox J (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731–740. https://doi.org/10.1038/nmeth.3901
8. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. https://doi.org/10.1038/nbt.1511
9. Needham EJ, Parker BL, Burykin T, James DE, Humphrey SJ (2019) Illuminating the dark phosphoproteome. Sci Signal 12:eaau8645. https://doi.org/10.1126/scisignal.aau8645
10. Ball B, Geddes-McAlister J (2019) Quantitative proteomic profiling of Cryptococcus neoformans. Curr Protoc Microbiol 55:1–15. https://doi.org/10.1002/cpmc.94
11. Ball B, Sukumaran A, Geddes-McAlister J (2020) Label-free quantitative proteomics workflow for discovery-driven host-pathogen interactions. J Vis Exp. https://doi.org/10.3791/61881
12. Zaragoza O (2019) Basic principles of the virulence of Cryptococcus. Virulence 10:490–501. https://doi.org/10.1080/21505594.2019.1614383
13. Rajasingham R, Smith RM, Park BJ, Jarvis JN, Govender NP, Chiller TM, Denning DW, Loyse A, Boulware DR (2017) Global burden of disease of HIV-associated cryptococcal meningitis: an updated analysis. Lancet Infect Dis 17:873–881. https://doi.org/10.1016/S1473-3099(17)30243-8
14. Bermas A, Geddes-McAlister J (2020) Combatting the evolution of antifungal resistance in Cryptococcus neoformans. Mol Microbiol: mmi.14565. https://doi.org/10.1111/mmi.14565
15. Geddes-McAlister J, Shapiro RS (2019) New pathogens, new tricks: emerging, drug-resistant fungal pathogens and future prospects for antifungal therapeutics. Ann N Y Acad Sci 1435:57–78. https://doi.org/10.1111/nyas.13739
16. Geddes JMH, Caza M, Croll D, Stoynov N, Foster LJ, Kronstad JW (2016) Analysis of the protein kinase A-regulated proteome of Cryptococcus neoformans identifies a role for the ubiquitin-proteasome pathway in capsule formation. MBio 7:1–15. https://doi.org/10.1128/mBio.01862-15
17. Geddes JMH, Croll D, Caza M, Stoynov N, Foster LJ, Kronstad JW (2015) Secretome profiling of Cryptococcus neoformans reveals regulation of a subset of virulence-associated proteins and potential biomarkers by protein kinase A. BMC Microbiol 15:1–26. https://doi.org/10.1186/s12866-015-0532-3
18. Kozubowski L, Heitman J (2012) Profiling a killer, the development of Cryptococcus neoformans. FEMS Microbiol Rev 36:78–94. https://doi.org/10.1111/j.1574-6976.2011.00286.x
19. Cox J, Neuhauser N, Michalski A, Scheltema RA, Olsen JV, Mann M (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res 10:1794–1805. https://doi.org/10.1021/pr101065j
20. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261

Chapter 11

Glycopeptide-Centric Approaches for the Characterization of Microbial Glycoproteomes

Nichollas E. Scott

Abstract

Protein glycosylation is increasingly recognized as a common class of modifications within microbial species that can shape protein functions and the proteome at large. As a result, there is an increasing need for robust analytical methods that allow the identification and characterization of microbial glycopeptides from proteome samples in a high-throughput manner. Using affinity-based enrichment (either hydrophilicity or antibody-based approaches), glycopeptides can easily be separated from non-glycosylated peptides and analyzed using mass spectrometry. By utilizing multiple mass spectrometry fragmentation approaches and open searching-based bioinformatic techniques, novel glycopeptides can be identified and characterized without prior knowledge of the glycans used for glycosylation. Using these approaches, glycopeptides within samples can rapidly be identified as well as quantified to understand how glycosylation changes in response to stimuli or how changes in glycosylation systems impact the glycoproteome. This chapter outlines a set of robust protocols for the initial preparation, enrichment, and analysis of microbial glycopeptides for both qualitative and quantitative glycoproteomic studies. Using these approaches, glycosylation events can be easily identified by researchers without the need for extensive manual analysis of proteomic datasets.

Key words Glycopeptide enrichment, Zwitterionic hydrophilic interaction liquid chromatography (ZIC-HILIC), Electron-Transfer/Higher-Energy Collision Dissociation (EThcD), Collision-induced dissociation (CID), Higher-energy collisional dissociation (HCD), Mass spectrometry

1 Introduction

Glycoproteomics is a rapidly developing sub-field within proteomics that seeks to characterize glycosylation events on a proteome scale [1–3]. Over the last decade, this field has seen tremendous growth as new instrumentation [1, 4], enrichment approaches [5], and bioinformatic tools [5] now enable glycoproteomic analysis without extensive training or bespoke experimental setups. Although the interest and growth of glycoproteomics have largely been observed in mammalian systems, it is important to note that increasingly glycosylation events are also being identified in

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_11, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


bacteria [6, 7] and eukaryotic parasites [8]. Within microbes, these once overlooked events are increasingly gaining recognition, with a growing body of work highlighting that not only does microbial glycosylation occur in a wide range of organisms, but it is also critical for physiological functions and microbial pathogenesis [6, 7]. Thus, the importance of these events in microbial physiology and pathology, as well as the potential to use microbial glycosylation to create next-generation vaccines [9, 10], makes the study of microbial glycoproteomics more critical than ever. As within the study of mammalian glycosylation events, mass spectrometry has become a critical tool for the identification, characterization, and quantification of both microbial glycopeptides and glycoproteins. Although similar enrichment and identification approaches can be used to identify microbial glycosylation, a common theme within microbial glycosylation is that the diversity of carbohydrates and amino acids targeted for glycosylation can be dramatically different from that of eukaryotic systems [6, 7]. This difference means that approaches developed for eukaryotic glycoproteomics are rarely compatible with all microbial glycosylation systems; the applicability of a strategy depends on the glycosylation system in question. This leads to the need to take one of two approaches for characterizing microbial glycopeptides: (I) a less targeted approach exploiting general properties of glycans, such as hydrophilicity, for glycopeptide enrichment; or (II) an extremely focused approach using affinity reagents to epitopes within glycans, or chemical biology to label glycans, enabling their enrichment. Both styles of approach have been used widely to study microbial glycosylation, with a notable change over the last decade.
Studies from the early 2010s favored less targeted approaches utilizing techniques, such as normal phase glycopeptide enrichment [11, 12], zwitterionic hydrophilic interaction liquid chromatography (ZIC-HILIC) glycopeptide enrichment [13–20], and graphite based glycopeptide enrichment [21, 22] to enable the identification of glycopeptides. As the field has matured, targeted approaches have become favored which leverage defined aspects of the biochemistry or conservation of microbial glycosylation to allow the enrichment of microbial glycosylation events. For example, recent targeted studies have used metabolic labelling to identify novel targets of the Legionella O-glucosyltransferase SetA and the enteropathogenic Escherichia coli Arginine-GlcNAc transferase NleB1 [23]. Knowledge of carbohydrate utilization has also aided glycoproteomic analysis in recent studies of the Neisseria glycoproteome, where it is now known multiple species incorporate negatively charged sugars into O-linked glycans [24]. This knowledge was recently exploited enabling the use of titanium dioxide, an established method for enriching sialic acid containing glycopeptides [25], to enrich Neisseria glycopeptides decorated with sialic


acid-like sugars [26]. Finally, new tools such as novel affinity reagents are also allowing targeted studies, with a particularly noteworthy example being the recent studies of Arginine-GlcNAcylation [27–31] using a commercial Arginine-GlcNAc antibody [32]. Although these examples highlight how targeted approaches are facilitating the study of microbial glycoproteomes, less targeted techniques remain a powerful tool for initial discovery studies. The methodologies described in this chapter cover three key aspects of microbial glycosylation analysis: (1) the preparation of proteome samples for enrichment of microbial glycopeptides; (2) general guidelines for the mass spectrometry-based acquisition of microbial glycopeptide data; and (3) the bioinformatic approaches to identify and characterize glycosylation events. Due to the diversity of microbial glycosylation, two broad approaches for enriching microbial glycopeptides are highlighted: enrichment with ZIC-HILIC and enrichment with antibodies. The suitability of these approaches will depend on the microbial glycosylation system in question.

2 Materials

Where applicable, all buffers should be prepared with high-performance liquid chromatography (HPLC)-grade reagents or solvents.

2.1 Preparation of Proteome Samples

1. Sodium deoxycholate (SDC) lysis buffer: 4% SDC in 100 mM Tris pH 8.5. Prepare fresh and chill prior to use.
2. 10× reduction/alkylation buffer: 100 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP), 400 mM 2-chloroacetamide (CAA), and 1 M Tris pH 8.5. Prepare fresh (see Note 1).
3. Trypsin/Lys-C (Promega).
4. 100 mM Tris pH 8.5 in Milli-Q H2O.
5. 3M Empore™ SPE SDB-RPS (Merck).
6. Kel-F hub, point style 3, gauge 14 (Hamilton Company).
7. Isopropanol.
8. 10% trifluoroacetic acid (TFA) in Milli-Q H2O.
9. Acetonitrile.
10. 30% methanol, 1% TFA in Milli-Q H2O.
11. 90% isopropanol, 1% TFA in Milli-Q H2O.
12. 1% TFA in Milli-Q H2O.


13. 5% ammonium hydroxide in 80% acetonitrile.
14. A speed-vacuum concentrator.
15. A thermomixer with a 96-well PCR block for overnight digestion.

2.2 Hydrophilicity-Based Glycopeptide Enrichment

1. ZIC-HILIC loading/wash buffer: 80% acetonitrile, 1% TFA in Milli-Q H2O; prepare fresh.
2. ZIC-HILIC preparation buffer: 95% acetonitrile in Milli-Q H2O.
3. ZIC-HILIC elution buffer: 0.1% TFA in Milli-Q H2O.
4. ZIC-HILIC material, 5 μm, 200 Å (Merck).
5. 3M Empore™ SPE C8 (Merck).
6. A speed-vacuum concentrator.

2.3 Antibody-Based Glycopeptide Enrichment

1. Protein A/G plus Agarose beads (Santa Cruz).
2. Immunoprecipitation affinity buffer: 10 mM Na2HPO4, 50 mM NaCl, and 50 mM MOPS in Milli-Q H2O, adjusted to pH 7.2. Prepare fresh the day prior to use.
3. Sodium borate buffer: 100 mM sodium tetraborate decahydrate in Milli-Q water, pH 9.0. Prepare fresh the day prior to use.
4. Dimethyl pimelimidate (DMP) cross-linking buffer: 20 mM DMP in 100 mM HEPES, pH 8.0. Prepare fresh just prior to use.
5. Ethanolamine buffer: 200 mM ethanolamine, pH 8.0. Prepare fresh prior to use.
6. 1 M HEPES in Milli-Q H2O, pH 7.2.
7. TFA peptide elution buffer: 0.2% TFA in Milli-Q H2O.
8. 3M Empore™ SPE C18 (Merck).
9. A speed-vacuum concentrator.

2.4 Glycopeptide MS Analysis

1. Nano flow HPLC system (see Note 2).
2. An Orbitrap Fusion Lumos Tribrid Mass Spectrometer or equivalent.

2.5 Bioinformatic Analysis of Microbial Glycopeptides

1. Software packages for the analysis of proteomic data such as MSFragger [33, 34], MetaMorpheus [35], Byonic [36], or MaxQuant [37].

3 Methods

All analyses should be performed in at least biological triplicate to allow statistical interpretation of observed glycosylation events and confirmation that these events are representative of the biological samples being analyzed.

3.1 Preparation of Proteome Samples for Glycopeptide Enrichment/Analysis

1. Multiple sample preparation approaches can be used to generate proteomic samples for further downstream analysis. The key considerations when deciding on a sample preparation approach are: (1) compatibility of the sample preparation approach with the glycoproteins of interest; (2) compatibility of the resulting peptide mixtures with downstream enrichment; and (3) appropriateness of the resulting peptides for downstream analysis. Increasingly, we utilize minimal sample handling approaches, such as the in-stage tip sample preparation method [38], as this allows the preparation of 5–300 μg of highly clean peptide preparations for direct analysis or enrichment-based studies (see Notes 3–5).
2. Prior to harvesting samples, prepare the SDC lysis buffer and store on ice until required (see Note 6).
3. If cell samples are to be prepared, wash with ice-cold PBS prior to lysis to remove residual media-associated proteins.
4. Add SDC lysis buffer to samples of interest and boil at 95 °C for 10 min with shaking at 2000 rpm on a thermomixer. For 1 mL of bacterial culture at OD600nm 1.0, 200 μL of SDC lysis buffer is sufficient. Ensure complete solubilization of the sample; if required, additional SDC lysis buffer can be added and multiple rounds of boiling at 95 °C for 10 min with shaking at 2000 rpm undertaken (see Note 7).
5. Store samples on ice and quantify protein yield using a bicinchoninic acid (BCA) assay. Protein (20–300 μg) is typically sufficient for non-enrichment as well as enrichment-based analysis.
6. Aliquot the required protein amounts into 0.2 mL PCR tubes. Reduce and alkylate samples by adding a 1:10 volume of reduction/alkylation buffer and incubate in the dark for 30 min at 45 °C with shaking at 1500 rpm.
7. Spin down samples at 2000 × g for 1 min and allow to cool at room temperature for 10 min.
8. Digest samples overnight at 37 °C with trypsin/Lys-C (1:100 protease:protein ratio) added in 10 μL of 100 mM Tris pH 8.5. Shake digests at 1500 rpm (see Note 8).
9. Add 1.1 volumes of isopropanol to the digests (for example, if a digest was 80 μL add 88 μL of isopropanol), vortex for 1 min to


Fig. 1 Preparation of samples using SDB-RPS columns. (a) Using a P200 tip (1), pack the required number of disks of SDB-RPS into the tip using a Kel-F hub needle (2). (b) Place tips within the tip spinner (3) and, using a 96-well plate (4), collect washes. Once washed, samples can be eluted one at a time by hand into tubes or en masse using a tip spinner into PCR tubes or plates

ensure the samples are well mixed and spin down at 2000 × g for 1 min (see Note 9).
10. Adjust the isopropanol/digest mixture to 1% TFA with 10% TFA (for example, if the isopropanol/digest mix is 168 μL, add 17 μL 10% TFA), vortex to ensure the samples are well mixed, and spin down for 1 min at 2000 × g.
11. Prepare one SDB-RPS Stage Tip for each sample as previously described [39]. Empirically, we use 3 × 14G disks of SDB-RPS to bind 50 μg of peptide and adjust the number of disks according to the peptide amount (Fig. 1a). To enable the processing of multiple samples, a 3D-printed tip spinner is recommended (Fig. 1b, see Note 10).
12. Wet SDB-RPS columns with 150 μL of acetonitrile and centrifuge at 1000 × g for 3 min in a tip spinner.
13. Wash SDB-RPS columns with 150 μL of 30% methanol, 1% TFA by centrifuging at 1000 × g for 3 min in a tip spinner.
14. Equilibrate SDB-RPS columns with 150 μL of 90% isopropanol, 1% TFA by centrifuging at 1000 × g for 3 min in a tip spinner.
15. Load samples onto the SDB-RPS columns by centrifuging at 1000 × g for 5 min in a tip spinner.
16. Wash SDB-RPS columns with 150 μL of 90% isopropanol, 1% TFA by centrifuging at 1000 × g for 5 min in a tip spinner.
17. Wash SDB-RPS columns with 150 μL of 1% TFA by centrifuging at 1000 × g for 5 min in a tip spinner.
18. Elute SDB-RPS columns with 150 μL of 5% ammonium hydroxide in 80% acetonitrile by centrifuging at 1000 × g for

Glycopeptide-Centric Approaches for the Characterization of Microbial. . .

159

5 minutes in a tip spinner. Collect samples in either a 96-well PCR tray or individual PCR tubes. 19. Dry down peptide elutions by vacuum centrifugation. 150 μL of elution will take approximately 30–40 min to dry. 3.2 HydrophilicityBased Glycopeptide Enrichment

1. Hydrophilicity-based glycopeptide enrichment approaches, such as normal phase [11, 12] and ZIC-HILIC enrichment [13–20], have been extensively used by the microbial glycoproteomic community. Although these technologies do allow the isolation of glycopeptides even when the composition of the glycan is unknown, the key caveat of these approaches is that glycopeptides must have a glycan of moderate size (>3 carbohydrates). As such, these enrichment approaches are not suitable for all microbial glycopeptides, and failure to isolate glycopeptides may indicate not the absence of glycosylation but the absence of glycopeptides compatible with enrichment (see Note 11). Within our laboratory, ZIC-HILIC enrichment is typically used as an initial approach to characterize microbial glycoproteomes and glycan diversity [40] due to its ease and versatility.
2. Prior to glycopeptide enrichment, prepare the ZIC-HILIC loading/wash, ZIC-HILIC preparation, and ZIC-HILIC elution buffers (see Note 12).
3. Prepare ZIC-HILIC Stage tips as previously described [39, 41] using C8 Empore material to create a frit, which is packed with 0.5 cm of ZIC-HILIC material.
4. Equilibrate the column with 20 bed volumes (200 μL) of ZIC-HILIC elution buffer followed by 20 bed volumes (200 μL) of ZIC-HILIC preparation buffer, then 20 bed volumes (200 μL) of ZIC-HILIC loading/wash buffer (see Note 13).
5. Resuspend dried-down proteome samples in ZIC-HILIC loading/wash buffer to a final concentration of 4 μg/μL (for example, for 200 μg of peptide resuspend in 50 μL of ZIC-HILIC loading/wash buffer). Vortex for 1 min to ensure the samples are resuspended and spin down for 1 min at 2000 × g.
6. Load the resuspended peptide sample onto a conditioned ZIC-HILIC column and wash with 20–50 bed volumes (200–500 μL) of ZIC-HILIC loading/wash buffer (see Note 14).
7. Elute glycopeptides with 20 bed volumes (200 μL) of ZIC-HILIC elution buffer into a 1.5 mL tube and dry down the elution by vacuum centrifugation. 200 μL of elution will take approximately 120 min to dry.
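The volume arithmetic given as worked examples in Subheading 3.1 (steps 9 and 10) and Subheading 3.2 (steps 4–6) can be sketched in a few lines of Python. This is only an illustrative helper and not part of the published protocol: the function names are hypothetical, the 10 μL bed volume is inferred from "20 bed volumes (200 μL)", and the one-tenth rule for the 10% TFA stock follows the example in step 10 (it yields roughly 0.9% TFA rather than exactly 1%).

```python
def isopropanol_volume_ul(digest_ul: float) -> float:
    """1.1 volumes of isopropanol per volume of digest (Subheading 3.1, step 9)."""
    return round(1.1 * digest_ul, 1)


def tfa_stock_volume_ul(mix_ul: float) -> float:
    """Volume of 10% TFA stock, taken as one-tenth of the mixture (step 10 example)."""
    return round(mix_ul / 10.0, 1)


def resuspension_volume_ul(peptide_ug: float, conc_ug_per_ul: float = 4.0) -> float:
    """Buffer volume needed to reach 4 ug/uL (Subheading 3.2, step 5)."""
    return round(peptide_ug / conc_ug_per_ul, 1)


def bed_volumes_to_ul(n_bed_volumes: float, bed_volume_ul: float = 10.0) -> float:
    """Convert bed volumes to uL; 10 uL per bed volume is an inferred assumption."""
    return n_bed_volumes * bed_volume_ul


# Worked example matching the numbers quoted in the text.
digest_ul = 80.0
ipa_ul = isopropanol_volume_ul(digest_ul)          # isopropanol to add
tfa_ul = tfa_stock_volume_ul(digest_ul + ipa_ul)   # 10% TFA to add
total_ul = round(digest_ul + ipa_ul + tfa_ul, 1)   # final volume in the tube
final_tfa_pct = round(10.0 * tfa_ul / total_ul, 2)  # resulting TFA percentage

print(f"isopropanol: {ipa_ul} uL, 10% TFA: {tfa_ul} uL, total: {total_ul} uL")
print(f"approximate final TFA: {final_tfa_pct}%")
print(f"ZIC-HILIC resuspension for 200 ug: {resuspension_volume_ul(200.0)} uL")
print(f"20 bed volumes: {bed_volumes_to_ul(20)} uL")
```

For an 80 μL digest the total volume stays below the 0.2 mL capacity of a PCR tube, which is consistent with the advice in Note 7 to use no more than 80 μL of lysate.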


Nichollas E. Scott

3.3 Antibody-Based Glycopeptide Enrichment

1. Traditionally restricted to PTMs such as ubiquitination or phosphorylation [42, 43], antibody-based enrichment is increasingly being used in studies exploring glycosylation [28, 30, 44]. The rapid growth in new platforms to create anti-glycan reagents, such as the VLRB of lampreys [45], makes this method an increasingly attractive option for enriching glycopeptides. Within the lab, we have heavily used an Anti-Arginine GlcNAc (Abcam/ab195033) antibody for the enrichment of Arginine-GlcNAc modified glycopeptides from bacterial samples [28, 30] as well as in vitro [29–31] and in vivo [31] infection models, finding antibody glycopeptide enrichment to be robust even with limited sample.

3.3.1 Coupling of Antibodies to Protein A/G Beads

1. Prior to antibody-based glycopeptide enrichment, prepare the immunoprecipitation affinity buffer and allow to chill before use.
2. Aliquot 100 μL of Protein A/G plus Agarose beads into a 1.5 mL Eppendorf tube and wash three times with chilled immunoprecipitation affinity buffer.
3. Add 10 μg of anti-glycan antibody (for example, anti-Arg-GlcNAc) and tumble overnight at 4 °C to allow coupling.
4. Prepare 100 mM sodium borate, allowing it to dissolve overnight (see Note 15).

3.3.2 Cross-Linking Antibodies to Protein A/G Beads

1. Wash beads three times with 1 mL of 100 mM sodium borate to remove non-bound proteins.
2. Add 1 mL of freshly prepared 20 mM DMP in 100 mM HEPES, pH 8.0 to the antibody-conjugated beads and tumble at room temperature for 30 min.
3. Remove the DMP cross-linking buffer and quench/remove residual cross-linking agent by washing three times with Ethanolamine buffer (3 × 1 mL, see Note 16).
4. Tumble beads with 1 mL of Ethanolamine buffer for 2 h at 4 °C to ensure complete quenching of the DMP cross-linker (see Note 17).
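Because the DMP solution in step 2 must be prepared fresh, it can be useful to calculate the amount to weigh out just before use. The sketch below is a generic molarity calculation under stated assumptions: it assumes DMP is supplied as dimethyl pimelimidate dihydrochloride with a molecular weight of about 259.17 g/mol (check the supplier's datasheet for the form actually used), and the function name is hypothetical.

```python
# Assumed molecular weight of dimethyl pimelimidate dihydrochloride (g/mol).
# This value is an assumption for illustration; confirm it for your reagent lot.
DMP_MW = 259.17


def reagent_mass_mg(conc_mM: float, volume_mL: float, mw_g_per_mol: float) -> float:
    """Mass in mg needed for a given concentration and volume.

    mmol/L x mL = umol; umol x g/mol = ug; dividing by 1000 gives mg.
    """
    return conc_mM * volume_mL * mw_g_per_mol / 1000.0


# 20 mM DMP, 1 mL per bead preparation (Subheading 3.3.2, step 2).
n_preparations = 4  # hypothetical number of bead preparations
per_prep_mg = reagent_mass_mg(20.0, 1.0, DMP_MW)
print(f"DMP per 1 mL: {per_prep_mg:.2f} mg")
print(f"DMP for {n_preparations} preparations: {per_prep_mg * n_preparations:.2f} mg")
```

Under this assumption roughly 5.2 mg of DMP is needed per 1 mL of cross-linking buffer; in practice a small excess volume is usually prepared.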

3.3.3 Antibody-Based Affinity Purification of Glycopeptides

1. Resuspend peptide samples in 1 mL of immunoprecipitation affinity buffer and check that the pH is 7.2 to ensure compatibility with affinity conditions. If required, adjust the pH with 1 M HEPES, pH 7.2.
2. Add the resuspended peptides in the immunoprecipitation affinity buffer to the cross-linked anti-glycan antibody beads and tumble for 3 h at 4 °C.
3. Collect the unbound peptide solution and wash beads six times with 1 mL of ice-cold immunoprecipitation affinity buffer.


4. Add 100 μL of 0.2% TFA elution buffer to the beads and allow the beads to stand at room temperature for 10 min with gentle shaking every minute. Spin down for 1 min at 2000 × g and collect the supernatant, ensuring no beads are collected. Repeat elution twice and pool the resulting elution samples.
5. Desalt collected samples using C18 Stage tips as previously described [39] before analysis by LC-MS.

3.4 Glycopeptide MS Analysis

1. Multiple instruments can be used to acquire glycopeptide data, yet within the field of glycoproteomics, Thermo instruments have gained notable popularity due to their ability to undertake multiple fragmentation approaches, including electron-transfer dissociation (ETD) and electron-transfer/higher-energy collision dissociation (EThcD), which can enable the localization of labile glycosylation events (see Note 18). The instrument settings described here are designed to be a starting point to acquire high-quality glycopeptide data. An appropriate nanoflow LC-MS setup should be used for proteomic-based analysis, with the column chemistry, column length, flow rate, and gradient conditions determined by the user.
2. Using an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (or similar instrument), glycopeptide analysis is performed as shown within Fig. 2. The initial full MS scan is undertaken within the Orbitrap at a resolution of 60 k using a mass range of 400–2000 m/z (see Note 19). Multiply charged ions are selected for fragmentation using higher-energy collisional dissociation (HCD) MS/MS. Precursors are isolated using quadrupole isolation with a width of 1.6 m/z, with up to 3 s of HCD MS/MS scans allowed. Initial HCD scans are analyzed within the Orbitrap at a resolution of 15 k with a normalized collision energy of 32, an Automated Gain Control (AGC) target of 1 × 10⁵ or 200% AGC, and a maximum injection time of 80 ms. Based on the chromatography performance, adjust the dynamic exclusion time between 20 and 60 s.
3. Using a scouting approach, glycopeptide HCD scans that contain ions associated with glycan structures (Table 1) are subjected to additional MS/MS events, allowing information about the glycan, peptide, and site of attachment to be generated.
4. The three additional MS/MS events triggered by ions associated with glycan structures are:
• An electron-transfer/higher-energy collision dissociation (EThcD) scan with a normalized collision energy of 25, an AGC target of 2.5 × 10⁵ or 500% AGC, and a maximum injection time of 250 ms. Ensure the high mass range Orbitrap option is enabled (53) and the resulting scan is analyzed within the Orbitrap at a resolution of 30 k.
• A CID scan using CID fragmentation with a normalized collision energy of 25, an AGC target of 5 × 10⁴ or 100% AGC, and a maximum injection time of 50 ms. The resulting scan can be analyzed within either the ion trap or Orbitrap.
• An HCD scan using HCD fragmentation with stepped normalized collision energies of 28, 33, and 38, an AGC target of 2.5 × 10⁵ or 500% AGC, and a maximum injection time of 250 ms, with the resulting scan analyzed within the Orbitrap at a resolution of 30 k.

Fig. 2 Scouting glycopeptide method: Outline of the MS method setup for scouting glycopeptide analysis within the Thermo Xcalibur Instrument method editor (version 4.3.73.11). (1) MS1 setting modified to enable the observation of large glycopeptides; (2) HCD scouting scan used to identify potential glycopeptides; (3) List of potential carbohydrate fragment ions which will trigger additional scans for ions of interest; (4) EThcD scan (provides site localization information on glycans); (5) CID scan (provides glycan information), and (6) the HCD scan (provides peptide and glycan fragment information)

Table 1 Common oxonium ions within microbial glycans

Carbohydrate associated ions (m/z)    Carbohydrate associated ion identity
366.1395                              Hex-HexNAc
204.0865                              HexNAc
186.0759                              HexNAc-H2O
168.0654                              HexNAc-2H2O
229.1188                              Bac^a
211.1082                              Bac^a-H2O

^a Bacillosamine (Bac), also known as 2,4-diacetamido-2,4,6-trideoxyglucopyranose, and its derivatives are commonly observed in bacterial glycoproteins [46]

3.5 Bioinformatic Analysis of Microbial Glycopeptides

1. Once collected, glycopeptide data can be analyzed using a range of software packages, including MSFragger [33, 34], MetaMorpheus/O-Pair [35], Byonic [36], and MaxQuant [37], with these programs varying in their suitability for glycopeptide identification based on the type of glycopeptides expected in the sample. Database searching tools such as MaxQuant are suitable for searching known glycan modifications where only one or two glycans are present within samples. Searching with variable modifications is increasingly referred to as a “closed” searching approach due to the fixed number of modifications being considered, while “open” searching refers to approaches which allow for a range of modifications within a given mass range [47]. For unknown modifications, or samples where multiple glycans are expected, open search tools such as MSFragger, O-Pair, or Byonic have been found to be highly effective.
2. Search the MS data files for glycopeptides using your software package of choice. As an example, a ZIC-HILIC enriched sample of Campylobacter fetus fetus NCTC10842 has been searched with MSFragger [40] using the open searching setting, allowing delta masses of up to 2000 Da, with the resulting data shown in Fig. 3.
3. Using the unique peptide spectrum matches (PSMs) identified, the observed delta masses can be plotted to visualize commonly observed modifications (Fig. 3). These delta mass plots enable common modifications to be identified by the number of times a specific delta mass is observed. These PSMs can be further filtered by identification scores to focus on high-confidence identifications, which can be useful during the initial discovery of previously unknown glycans.
4. For high-confidence peptide assignments, manual inspection of the CID, EThcD, and HCD scans can be used to further characterize glycopeptides of interest (Fig. 4, see Note 20).
5. Once glycoforms have been identified within datasets, these glycoforms can be incorporated into targeted (closed) database searches, which we have found improve the identification of specific classes of glycopeptides [40].
6. Once glycoforms have been identified, these modifications can be monitored to assess how glycosylation changes in response to biological conditions using MaxQuant or targeted analysis using software such as Skyline [48].
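During manual inspection of HCD scans (step 4), a quick offline check for the oxonium ions listed in Table 1 can help confirm that a spectrum derives from a glycopeptide. The following is a minimal sketch of such a check and is not the instrument's own trigger logic: the 20 ppm tolerance, the function name, and the example peak list are all hypothetical, while the reference m/z values are those of Table 1.

```python
# Reference oxonium ions from Table 1 (m/z -> identity).
OXONIUM_IONS = {
    366.1395: "Hex-HexNAc",
    204.0865: "HexNAc",
    186.0759: "HexNAc-H2O",
    168.0654: "HexNAc-2H2O",
    229.1188: "Bac",
    211.1082: "Bac-H2O",
}


def find_oxonium_ions(peaks_mz, tol_ppm=20.0):
    """Return identities of Table 1 ions found within tol_ppm of any observed peak.

    peaks_mz: iterable of observed fragment m/z values.
    tol_ppm: mass tolerance in parts per million (an assumed default).
    """
    hits = []
    for ref_mz, identity in sorted(OXONIUM_IONS.items()):
        tolerance = ref_mz * tol_ppm / 1e6  # absolute tolerance in m/z units
        if any(abs(mz - ref_mz) <= tolerance for mz in peaks_mz):
            hits.append(identity)
    return hits


# Hypothetical peak list for illustration only (not real data).
example_peaks = [168.0655, 186.0761, 204.0867, 229.1190, 500.2500]
found = find_oxonium_ions(example_peaks)
print(found)
```

A spectrum returning one or more identities would be a candidate for closer inspection; the tolerance should be matched to the mass accuracy of the analyzer used for the HCD scan.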


Fig. 3 Open search results of ZIC-HILIC enriched samples of C. fetus fetus NCTC10842: (a) Delta mass plots enable the identification of known modifications, such as formylation and the addition of lysine, as well as high mass modifications, highlighted in blue. (b) Zooming in on the high mass modifications reveals multiple glycoforms
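A delta mass plot such as Fig. 3 is essentially a histogram of the mass differences reported for each PSM by an open search. As a rough sketch of the counting step, assuming the delta masses have already been exported from the search results as a list of numbers, the code below bins them and reports frequently observed high-mass bins. The bin width, the mass and count cutoffs, and the input values are assumptions chosen for illustration; the values are invented and are not taken from the C. fetus dataset.

```python
from collections import Counter


def delta_mass_histogram(delta_masses, bin_width=0.01):
    """Count PSMs per delta-mass bin (bin centres rounded to 4 decimals)."""
    return Counter(
        round(round(dm / bin_width) * bin_width, 4) for dm in delta_masses
    )


def frequent_high_mass(counts, min_mass=200.0, min_count=2):
    """Return (delta mass, PSM count) pairs above assumed mass and count cutoffs."""
    return sorted(
        (mass, n) for mass, n in counts.items()
        if mass >= min_mass and n >= min_count
    )


# Invented delta masses (Da) for illustration only.
example_delta_masses = [
    0.0012, -0.0021, 0.0004,       # unmodified peptides
    27.9949, 27.9947,              # a small, frequently seen modification
    1040.3912, 1040.3908, 1040.3921,  # a recurring high-mass modification
    1243.4700,                     # a high-mass modification seen only once
]

counts = delta_mass_histogram(example_delta_masses)
candidates = frequent_high_mass(counts)
print(candidates)
```

Plotting `counts` as a bar chart gives the kind of overview shown in Fig. 3a, while the filtered list corresponds to zooming in on recurring high-mass modifications as in Fig. 3b.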

Fig. 4 Combining information from multiple fragmentation types enables more complete characterization of microbial glycopeptides: The 1040 Da glycoform of C. fetus fetus NCTC10842 observed on the glycopeptide DINQTFTQSGLYK fragmented with: (a) CID fragmentation, (b) HCD fragmentation, (c) EThcD fragmentation. Note: the z11 ion corresponds to the addition of Bac to the peptide alone, not the complete 1040 Da glycoform

4 Notes

1. Check that the pH of the reduction/alkylation buffer is ~7, as TCEP is acidic and will lower the pH of the buffer.
2. Multiple nanoflow LC-MS setups can be used, with the selection of the setup, column used, and flow rate all influencing the overall performance of the LC-MS analysis. LC-MS methods should be optimized prior to analysis to ensure optimal performance.
3. Multiple approaches can be used to generate protein samples for downstream digestions. If highly hydrophobic glycoproteins are to be analyzed, strong detergents, such as SDS, can be used with the filter-aided sample preparation approach [49]. Alternatively, approaches such as the SP3 method [50, 51] or S-traps [52] can also be used.
4. If ZIC-HILIC enrichment is to be used, care must be taken to remove trace levels of salt as these can interfere with glycopeptide enrichment [53]. Similarly, if antibody-based enrichment approaches are used, care must be taken to ensure the final pH of the peptide mixture is neutral to enable effective affinity purification.
5. Care must be taken when considering the choice of proteases used to generate glycopeptides, as the size and composition of glycopeptides impact both their ability to be enriched and identified. For example, we have noted that when using ZIC-HILIC enrichment, short aliphatic O-linked glycopeptides are typically absent from enrichments [54].
6. If dilute samples which cannot be concentrated by pelleting, such as secretome samples, are to be prepared, a 2.5× SDC buffer (10% SDC in 250 mM Tris pH 8.5) can be prepared.
7. To simplify the purification of resulting digested samples, aim to generate microbial lysates at a concentration of 3–5 μg/μL and use no more than 80 μL of lysate for digestion. A volume of less than 80 μL will enable isopropanol and TFA to be added directly to the PCR tube prior to cleaning up the samples with SDB-RPS.
8. Multiple proteases have been used for microbial glycoproteomics, including Pepsin (Promega), Thermolysin (Promega), and Glu-C (Promega) [18, 19, 55], with the optimal protease dependent on the protein substrates of interest.
9. Isopropanol is a polar protic solvent that prevents the precipitation of SDC even under acidic conditions [56].
10. CAD designs for a tip spinner can be found within the supplementary material of Harney et al. [57]. The use of tip spinners simplifies sample preparation and improves reproducibility.
11. Alternative physicochemical properties can be used to enrich microbial glycopeptides. For example, the presence of negatively charged sugars can be exploited using titanium dioxide-based enrichment approaches [25], and this approach has been used to characterize Neisseria glycopeptides [26].

Glycopeptide-Centric Approaches for the Characterization of Microbial. . .


12. The ion-pairing agent and organic solvent used within the loading/wash buffer can be altered to tailor the enrichment of glycopeptides of interest [53, 58].
13. ZIC-HILIC enrichment requires the resin to remain wet at all times to ensure the integrity of the pseudo-water layer on the surface of the resin, which is required to enrich glycopeptides. When washing the resin, always leave ~10 μL of solvent on the resin.
14. Depending on the complexity of the sample, the number of washes can be altered. For the enrichment of glycopeptides from a single protein, 20 bed volumes (200 μL) of ZIC-HILIC loading/wash buffer is sufficient, while for total proteome samples 50 bed volumes (500 μL) of ZIC-HILIC loading/wash buffer may be required.
15. 100 mM sodium borate should have a pH of 9.0 and the pH should not need to be adjusted. Prior to use, check the pH and remake the solution if the pH is incorrect.
16. The Ethanolamine buffer can be challenging to adjust to pH 8.0 due to its low buffer capacity and high pKa of 9.5 at 25 °C; ensure the pH is correct prior to use.
17. Beads can be stored for future use in a sodium azide storage buffer (phosphate buffered saline with 0.02% sodium azide, pH 7.4) or used immediately.
18. A complete description of all parameters and acquisition approaches that can be used to collect glycoproteomic data is beyond the scope of this method chapter, but readers are encouraged to refer to Riley et al. [4] for a comparison of different acquisition methods.
19. The addition of glycans can dramatically shift the observed m/z of glycopeptides outside the commonly used 350–1200 m/z range set within standard proteomic instrument methods. To ensure the detection of glycopeptides, widening the m/z range up to 2000 m/z can be advantageous.
20. Annotation tools, such as the Interactive Peptide Spectral Annotator [59] (http://www.interactivepeptidespectralannotator.com/PeptideAnnotator.html), are recommended to aid the annotation of glycopeptides.
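The effect described in Note 19 can be illustrated with the standard relationship between neutral mass and m/z for a protonated ion, m/z = (M + z × 1.007276)/z. The sketch below uses a hypothetical peptide mass of 1500 Da and a nominal 1040 Da glycan (the nominal mass of the glycoform shown in Fig. 4); both numbers are placeholders for illustration and not measured values.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def in_window(value: float, low: float, high: float) -> bool:
    """True if value falls within the scan range [low, high]."""
    return low <= value <= high


peptide_mass = 1500.0  # hypothetical peptide neutral mass (Da)
glycan_mass = 1040.0   # nominal glycan mass (Da), placeholder value

for charge in (2, 3):
    bare = mz(peptide_mass, charge)
    glyco = mz(peptide_mass + glycan_mass, charge)
    print(
        f"z={charge}: peptide {bare:.3f}, glycopeptide {glyco:.3f}, "
        f"in 350-1200: {in_window(glyco, 350, 1200)}, "
        f"in 400-2000: {in_window(glyco, 400, 2000)}"
    )
```

In this example the doubly charged glycopeptide falls above 1200 m/z and would be missed by a 350–1200 m/z scan range, but it is covered by the 400–2000 m/z range used in Subheading 3.4.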

Acknowledgements

N.E.S. is supported by an Australian Research Council Future Fellowship (FT200100270) and an ARC Discovery Project Grant (DP210100362).


References

1. Riley NM, Bertozzi CR, Pitteri SJ (2020a) A pragmatic guide to enrichment strategies for mass spectrometry-based glycoproteomics. Mol Cell Proteomics 20:100029. https://doi.org/10.1074/mcp.R120.002277
2. Thaysen-Andersen M, Packer NH, Schulz BL (2016) Maturing glycoproteomics technologies provide unique structural insights into the N-glycoproteome and its regulation in health and disease. Mol Cell Proteomics 15(6):1773–1790. https://doi.org/10.1074/mcp.O115.057638
3. Thomas DR, Scott NE (2020) Glycoproteomics: growing up fast. Curr Opin Struct Biol 68:18–25. https://doi.org/10.1016/j.sbi.2020.10.028
4. Riley NM, Malaker SA, Driessen M, Bertozzi CR (2020b) Optimal dissociation methods differ for N- and O-glycopeptides. J Proteome Res 19(8):3286–3301. https://doi.org/10.1021/acs.jproteome.0c00218
5. Cioce A, Malaker SA, Schumann B (2021) Generating orthogonal glycosyltransferase and nucleotide sugar pairs as next-generation glycobiology tools. Curr Opin Chem Biol 60:66–78. https://doi.org/10.1016/j.cbpa.2020.09.001
6. Koomey M (2019) O-linked protein glycosylation in bacteria: snapshots and current perspectives. Curr Opin Struct Biol 56:198–203. https://doi.org/10.1016/j.sbi.2019.03.020
7. Nothaft H, Szymanski CM (2019) New discoveries in bacterial N-glycosylation to expand the synthetic biology toolbox. Curr Opin Chem Biol 53:16–24. https://doi.org/10.1016/j.cbpa.2019.05.032
8. Bandini G, Albuquerque-Wendt A, Hegermann J, Samuelson J, Routier FH (2019) Protein O- and C-glycosylation pathways in Toxoplasma gondii and Plasmodium falciparum. Parasitology 146(14):1755–1766. https://doi.org/10.1017/S0031182019000040
9. Goddard-Borger ED, Boddey JA (2018) Implications of Plasmodium glycosylation on vaccine efficacy and design. Future Microbiol 13:609–612. https://doi.org/10.2217/fmb-2017-0284
10. Kightlinger W, Warfel KF, DeLisa MP, Jewett MC (2020) Synthetic glycobiology: parts, systems, and applications. ACS Synth Biol 9(7):1534–1562. https://doi.org/10.1021/acssynbio.0c00210
11. Ding W, Nothaft H, Szymanski CM, Kelly J (2009) Identification and quantification of glycoproteins using ion-pairing normal-phase liquid chromatography and mass spectrometry. Mol Cell Proteomics 8(9):2170–2185. https://doi.org/10.1074/mcp.M900088-MCP200
12. Thomas RM, Twine SM, Fulton KM, Tessier L, Kilmury SL, Ding W et al (2011) Glycosylation of DsbA in Francisella tularensis subsp. tularensis. J Bacteriol 193(19):5498–5509. https://doi.org/10.1128/JB.00438-11
13. Elhenawy W, Scott NE, Tondo ML, Orellano EG, Foster LJ, Feldman MF (2016) Protein O-linked glycosylation in the plant pathogen Ralstonia solanacearum. Glycobiology 26(3):301–311. https://doi.org/10.1093/glycob/cwv098
14. Harding CM, Nasr MA, Kinsella RL, Scott NE, Foster LJ, Weber BS et al (2015) Acinetobacter strains carry two functional oligosaccharyltransferases, one devoted exclusively to type IV pilin, and the other one dedicated to O-glycosylation of multiple proteins. Mol Microbiol 96(5):1023–1041. https://doi.org/10.1111/mmi.12986
15. Iwashkiw JA, Seper A, Weber BS, Scott NE, Vinogradov E, Stratilo C et al (2012) Identification of a general O-linked protein glycosylation system in Acinetobacter baumannii and its role in virulence and biofilm formation. PLoS Pathog 8(6):e1002758. https://doi.org/10.1371/journal.ppat.1002758
16. Lithgow KV, Scott NE, Iwashkiw JA, Thomson EL, Foster LJ, Feldman MF et al (2014) A general protein O-glycosylation system within the Burkholderia cepacia complex is involved in motility and virulence. Mol Microbiol 92(1):116–137. https://doi.org/10.1111/mmi.12540
17. Scott NE, Kinsella RL, Edwards AV, Larsen MR, Dutta S, Saba J et al (2014a) Diversity within the O-linked protein glycosylation systems of Acinetobacter species. Mol Cell Proteomics 13(9):2354–2370. https://doi.org/10.1074/mcp.M114.038315
18. Scott NE, Marzook NB, Cain JA, Solis N, Thaysen-Andersen M, Djordjevic SP et al (2014b) Comparative proteomics and glycoproteomics reveal increased N-linked glycosylation and relaxed sequon specificity in Campylobacter jejuni NCTC11168 O. J Proteome Res 13(11):5136–5150. https://doi.org/10.1021/pr5005554
19. Scott NE, Nothaft H, Edwards AV, Labbate M, Djordjevic SP, Larsen MR et al (2012) Modification of the Campylobacter jejuni N-linked glycan by EptC protein-mediated addition of phosphoethanolamine. J Biol Chem 287(35):29384–29396. https://doi.org/10.1074/jbc.M112.380212
20. Scott NE, Parker BL, Connolly AM, Paulech J, Edwards AV, Crossett B et al (2011) Simultaneous glycan-peptide characterization using hydrophilic interaction chromatography and parallel fragmentation by CID, higher energy collisional dissociation, and electron transfer dissociation MS applied to the N-linked glycoproteome of Campylobacter jejuni. Mol Cell Proteomics 10(2):M000031-MCP201. https://doi.org/10.1074/mcp.M000031-MCP201
21. Posch G, Pabst M, Brecker L, Altmann F, Messner P, Schaffer C (2011) Characterization and scope of S-layer protein O-glycosylation in Tannerella forsythia. J Biol Chem 286(44):38714–38724. https://doi.org/10.1074/jbc.M111.284893
22. Scott NE, Bogema DR, Connolly AM, Falconer L, Djordjevic SP, Cordwell SJ (2009) Mass spectrometric characterization of the surface-associated 42 kDa lipoprotein JlpA as a glycosylated antigen in strains of Campylobacter jejuni. J Proteome Res 8(10):4654–4664. https://doi.org/10.1021/pr900544x
23. Gao L, Song Q, Liang H, Zhu Y, Wei T, Dong N et al (2019) Legionella effector SetA as a general O-glucosyltransferase for eukaryotic proteins. Nat Chem Biol 15(3):213–216. https://doi.org/10.1038/s41589-018-0189-y
24. Anonsen JH, Vik A, Borud B, Viburiene R, Aas FE, Kidd SW et al (2016) Characterization of a unique tetrasaccharide and distinct glycoproteome in the O-linked protein glycosylation system of Neisseria elongata subsp. glycolytica. J Bacteriol 198(2):256–267. https://doi.org/10.1128/JB.00620-15
25. Palmisano G, Lendal SE, Engholm-Keller K, Leth-Larsen R, Parker BL, Larsen MR (2010) Selective enrichment of sialic acid-containing glycopeptides using titanium dioxide chromatography with analysis by HILIC and mass spectrometry. Nat Protoc 5(12):1974–1982. https://doi.org/10.1038/nprot.2010.167
26. Hadjineophytou C, Anonsen JH, Wang N, Ma KC, Viburiene R, Vik A et al (2019) Genetic determinants of genus-level glycan diversity in a bacterial protein glycosylation system. PLoS Genet 15(12):e1008532. https://doi.org/10.1371/journal.pgen.1008532
27. Araujo-Garrido JL, Bernal-Bayard J, Ramos-Morales F (2020) Type III secretion effectors with arginine N-glycosyltransferase activity. Microorganisms 8(3):357. https://doi.org/10.3390/microorganisms8030357
28. El Qaidi S, Scott NE, Hays MP, Geisbrecht BV, Watkins S, Hardwidge PR (2020) An intra-bacterial activity for a T3SS effector. Sci Rep 10(1):1073. https://doi.org/10.1038/s41598-020-58062-y
29. Gan J, Scott NE, Newson JPM, Wibawa RR, Wong Fok Lung T, Pollock GL et al (2020) The Salmonella effector SseK3 targets small Rab GTPases. Front Cell Infect Microbiol 10:419. https://doi.org/10.3389/fcimb.2020.00419
30. Newson JP, Scott NE, Yeuk Wah Chung I, Wong Fok Lung T, Giogha C, Gan J et al (2019) Salmonella effectors SseK1 and SseK3 target death domain proteins in the TNF and TRAIL signaling pathways. Mol Cell Proteomics 18(6):1138–1156. https://doi.org/10.1074/mcp.RA118.001093
31. Scott NE, Giogha C, Pollock GL, Kennedy CL, Webb AI, Williamson NA et al (2017) The bacterial arginine glycosyltransferase effector NleB preferentially modifies Fas-associated death domain protein (FADD). J Biol Chem 292(42):17337–17350. https://doi.org/10.1074/jbc.M117.805036
32. Pan M, Li S, Li X, Shao F, Liu L, Hu HG (2014) Synthesis of and specific antibody generation for glycopeptides with arginine N-GlcNAcylation. Angew Chem Int Ed Engl 53(52):14517–14521. https://doi.org/10.1002/anie.201407824
33. Kong AT, Leprevost FV, Avtonomov DM, Mellacheruvu D, Nesvizhskii AI (2017) MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat Methods 14(5):513–520. https://doi.org/10.1038/nmeth.4256
34. Polasky DA, Yu F, Teo GC, Nesvizhskii AI (2020) Fast and comprehensive N- and O-glycoproteomics analysis with MSFragger-Glyco. Nat Methods 17(11):1125–1132. https://doi.org/10.1038/s41592-020-0967-9
35. Lu L, Riley NM, Shortreed MR, Bertozzi CR, Smith LM (2020) O-pair search with MetaMorpheus for O-glycopeptide characterization. Nat Methods 17(11):1133–1138. https://doi.org/10.1038/s41592-020-00985-5
36. Bern M, Kil YJ, Becker C (2012) Byonic: advanced peptide and protein identification software. Curr Protoc Bioinform, Chapter 13, Unit 13.20. https://doi.org/10.1002/0471250953.bi1320s40
37. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26(12):1367–1372. https://doi.org/10.1038/nbt.1511
38. Kulak NA, Pichler G, Paron I, Nagaraj N, Mann M (2014) Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat Methods 11(3):319–324. https://doi.org/10.1038/nmeth.2834
39. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2(8):1896–1906. https://doi.org/10.1038/nprot.2007.261
40. Izaham ARA, Scott NE (2020) Open database searching enables the identification and comparison of glycoproteomes without defining glycan compositions prior to searching. Mol Cell Proteomics 19(9):1561–1574. https://doi.org/10.1074/mcp.TIR120.002100
41. Scott NE (2017) Characterizing glycoproteins by mass spectrometry in Campylobacter jejuni. Methods Mol Biol 1512:211–232. https://doi.org/10.1007/978-1-4939-6536-6_18
42. Udeshi ND, Mertins P, Svinkina T, Carr SA (2013a) Large-scale identification of ubiquitination sites by mass spectrometry. Nat Protoc 8(10):1950–1960. https://doi.org/10.1038/nprot.2013.120
43. Udeshi ND, Svinkina T, Mertins P, Kuhn E, Mani DR, Qiao JW et al (2013b) Refined preparation and use of anti-diglycine remnant (K-epsilon-GG) antibody enables routine quantification of 10,000s of ubiquitination sites in single proteomics experiments. Mol Cell Proteomics 12(3):825–831. https://doi.org/10.1074/mcp.O112.027094
44. Anonsen JH, Vik A, Egge-Jacobsen W, Koomey M (2012) An extended spectrum of target proteins and modification sites in the general O-linked protein glycosylation system in Neisseria gonorrhoeae. J Proteome Res 11(12):5781–5793. https://doi.org/10.1021/pr300584x
45. McKitrick TR, Goth CK, Rosenberg CS, Nakahara H, Heimburg-Molinaro J, McQuillan AM et al (2020) Development of smart anti-glycan reagents using immunized lampreys. Commun Biol 3(1):91. https://doi.org/10.1038/s42003-020-0819-2
46. Imperiali B (2019) Bacterial carbohydrate diversity – a brave new world. Curr Opin Chem Biol 53:1–8. https://doi.org/10.1016/j.cbpa.2019.04.026

47. Chick JM, Kolippakkam D, Nusinow DP, Zhai B, Rad R, Huttlin EL et al (2015) A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides. Nat Biotechnol 33(7):743–749. https://doi.org/10.1038/nbt.3267
48. MacLean B, Tomazela DM, Shulman N, Chambers M, Finney GL, Frewen B et al (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26(7):966–968. https://doi.org/10.1093/bioinformatics/btq054
49. Wisniewski JR, Zougman A, Nagaraj N, Mann M (2009) Universal sample preparation method for proteome analysis. Nat Methods 6(5):359–362. https://doi.org/10.1038/nmeth.1322
50. Batth TS, Tollenaere MAX, Ruther P, Gonzalez-Franquesa A, Prabhakar BS, Bekker-Jensen S et al (2019) Protein aggregation capture on microparticles enables multipurpose proteomics sample preparation. Mol Cell Proteomics 18(5):1027–1035. https://doi.org/10.1074/mcp.TIR118.001270
51. Hughes CS, Moggridge S, Muller T, Sorensen PH, Morin GB, Krijgsveld J (2019) Single-pot, solid-phase-enhanced sample preparation for proteomics experiments. Nat Protoc 14(1):68–85. https://doi.org/10.1038/s41596-018-0082-x
52. HaileMariam M, Eguez RV, Singh H, Bekele S, Ameni G, Pieper R et al (2018) S-trap, an ultrafast sample-preparation approach for shotgun proteomics. J Proteome Res 17(9):2917–2924. https://doi.org/10.1021/acs.jproteome.8b00505
53. Alagesan K, Khilji SK, Kolarich D (2017) It is all about the solvent: on the importance of the mobile phase for ZIC-HILIC glycopeptide enrichment. Anal Bioanal Chem 409(2):529–538. https://doi.org/10.1007/s00216-016-0051-6
54. Izaham ARA, Ang CS, Nie S, Bird LE, Williamson NA, Scott NE (2021) What are we missing by using hydrophilic enrichment? Improving bacterial glycoproteome coverage using total proteome and FAIMS analysis. J Proteome Res 20(1):599–612. https://doi.org/10.1021/acs.jproteome.0c00565
55. Khurana S, Coffey MJ, John A, Uboldi AD, Huynh MH, Stewart RJ et al (2019) Protein O-fucosyltransferase 2-mediated O-glycosylation of the adhesin MIC2 is dispensable for Toxoplasma gondii tachyzoite infection. J Biol Chem 294(5):1541–1553. https://doi.org/10.1074/jbc.RA118.005357
56. Humphrey SJ, Karayel O, James DE, Mann M (2018) High-throughput and high-sensitivity phosphoproteomics with the EasyPhos platform. Nat Protoc 13(9):1897–1916. https://doi.org/10.1038/s41596-018-0014-9
57. Harney DJ, Hutchison AT, Hatchwell L, Humphrey SJ, James DE, Hocking S et al (2019) Proteomic analysis of human plasma during intermittent fasting. J Proteome Res 18(5):2228–2240. https://doi.org/10.1021/acs.jproteome.9b00090
58. Mysling S, Palmisano G, Hojrup P, Thaysen-Andersen M (2010) Utilizing ion-pairing hydrophilic interaction chromatography solid phase extraction for efficient glycopeptide enrichment in glycoproteomics. Anal Chem 82(13):5598–5609. https://doi.org/10.1021/ac100530w
59. Brademan DR, Riley NM, Kwiecien NW, Coon JJ (2019) Interactive peptide spectral annotator: a versatile web-based tool for proteomic applications. Mol Cell Proteomics 18(8 suppl 1):S193–S201. https://doi.org/10.1074/mcp.TIR118.001209
60. Cao W, Liu M, Kong S, Wu M, Zhang Y, Yang P (2021) Recent advances in software tools for more generic and precise intact glycopeptide analysis. Mol Cell Proteomics 20:100060. https://doi.org/10.1074/mcp.R120.002090

Chapter 12

Integrated Network Discovery Using Multi-Proteomic Data

Rafe Helwer and Vincent C. Chen

Abstract

A fundamental goal of systems biology is to seek a better understanding of the cell’s molecular mechanisms. Experimentalists most frequently rely upon reductionist methods to isolate and analyze discrete signaling compartments, including subcellular domains, organelles, and protein–protein interactions. Among the systems-biology community, there is a growing need to integrate multiple datasets to resolve complex cellular networks. In this chapter, we share our procedures for the discovery of integrated signaling networks across multi-proteomic data. To demonstrate these procedures, we provide an integrated analysis of the cellular proteome and extracellular proteome (secretome) of the human glioma cell line LN229.

Key words Proteomics, Network Enrichment Analysis, Bioinformatics, Multi-omics, Multi-proteomics, MS/MS, Network Integration, Cell Signaling

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-1-0716-2124-0_12.

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_12, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

A fundamental goal of systems biology is to better understand how the cell is precisely controlled. Technological developments have contributed immensely towards our ability to decipher the mechanisms of cell-signal transduction. These methods typically require the isolation and subsequent analysis of discrete signaling compartments. In proteomics, these processes include the isolation of organelles (nuclei, mitochondria, ER/Golgi), subcellular fractionation, enrichment, protein–protein interactions, and the isolation of posttranslational modifications. Subcellular proteomes produced by these methods often undergo digestion (trypsin) for high-throughput identification by high-performance liquid chromatography mass spectrometry (HPLC-MS). Once analyzed, these cell-signal transduction networks are often “reconstructed” using statistical enrichment tests [1]. Although proteomic analyses have matured, there is a growing need to integrate multi-dimensional data to address the complexity of biological systems, and to decipher the underlying mechanisms of health and disease [2, 3].

Table 1 Required software and resources for Multi-P

Data/Software                        Input/Output                                       Resource or Website
Protein identification
  MS data files                      Sample/MS raw                                      Proteome Core Facility
  MaxQuant [ref]                     MS files (.raw, .d)/Protein ID (UniProt, .txt)     maxquant.org
  g:Profiler [ref]                   Protein IDs (.txt)/Network IDs (.txt)              biit.cs.ut.ee/gprofiler/gost
Data processing and presentation
  MS Excel                           (.txt)/(.txt, graphs)                              Microsoft

To help address this need, we share our procedures for Multi-P. Multi-P provides the procedural framework to discover signaling networks that functionally coalesce across multiple proteomic experiments. In this approach, we calculate a signal integration value (SIV), or score (Eq. 1):

$$\mathrm{SIV} = \left(P\text{-value}_1 \times P\text{-value}_2 \times \cdots \times P\text{-value}_n\right)^{1/n} \tag{1}$$

The calculation is fully scalable: n represents the number of experiments for a given cell or tissue type, and each P-value is the enrichment P-value of a given network in one experiment. To facilitate comparisons, we take the nth root of this product, so that the SIV is the geometric mean of the P-values. It should be noted that once combined into SIVs, the multiplied P-values no longer retain their statistical properties. However, as a discovery tool, Multi-P has provided a versatile approach for the identification of secretome and protein–protein interaction networks that are implicated in glioma invasion [4, 5]. Starting from two or more proteomic experiments (i.e., “signaling compartments”), we outline the software (Table 1) and steps for signal integration discovery using Multi-P (Fig. 1).
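As a brief worked illustration of Eq. 1, the block below restates the SIV on a logarithmic scale and evaluates it for n = 2. The numerical P-values are hypothetical and chosen only for easy arithmetic; the second case assumes that a network absent from one dataset is assigned a P-value of 1, as is done later in Subheading 3.3.

```latex
% Eq. 1 on a log scale: the SIV is the geometric mean of the P-values,
% so -log10(SIV) is the arithmetic mean of the individual -log10(P) values.
\[
  -\log_{10}\mathrm{SIV} \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(-\log_{10} P\text{-value}_i\bigr)
\]

% Hypothetical network enriched in both datasets (n = 2):
\[
  \mathrm{SIV} = \left(10^{-8}\times 10^{-4}\right)^{1/2} = \left(10^{-12}\right)^{1/2} = 10^{-6}
\]

% Same network, but absent from the second dataset (P-value set to 1):
\[
  \mathrm{SIV} = \left(10^{-8}\times 1\right)^{1/2} = 10^{-4}
\]
```

In this reading, a network supported by both compartments ranks ahead of one that is strongly enriched in a single compartment only, which is the behavior Multi-P relies on for discovery.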

2 Multi-P Overview

The procedures outlined here start from the outputs of a mass spectrometry (MS) protein database search. These outputs comprise lists of identified proteins, such as the MaxQuant proteinGroups.txt output [6]. Proteins identified here undergo pathway enrichment and SIV calculation. Although we highlight the use of MaxQuant data files, g:Profiler, and Excel (see Table 1), other equivalent software should also work. Supplemental procedures and MaxQuant output files representing the cellular proteome and secretome of LN229 human glioma have been included (see Appendix in this chapter, as well as Electronic Supplemental Material available with this chapter on link.springer.com). Associated MS data files are available from ProteomeXchange (www.proteomexchange.org, PXD024001).

Fig. 1 Overview of Multi-P. (a) Multi-proteomic analysis of a specific cell or tissue type. Subcellular proteomes represent discrete signaling compartments. Proteomes undergo network enrichment. (b) Resultant P-values are used for SIV calculation/ranking (Eq. 1), and network discovery. (c) To gain further biological insight, networks and signaling compartments can be functionally mapped

3 Methods

3.1 Prepare proteinGroups.txt

We begin by preparing and extracting the relevant experimental information from a MaxQuant search [6]. As mentioned, proteinGroups files for the LN229 secretome and cellular proteome have been made available for download (Electronic Supplemental Material, available on this chapter’s page on link.springer.com).

1. Locate the proteinGroups.txt file. The proteinGroups.txt file contains the proteins identified by MaxQuant. This file is located within the “combined” folder -> “txt” -> “proteinGroups.txt.” By default, the combined folder will be situated in the same drive location as the searched MS files. A summary of the proteinGroups file can be found here: “combined” -> “txt” -> “tables.pdf.”

2. Open proteinGroups.txt in Excel. You will be directed to the Excel import selection tool. Import the file as “Delimited” and advance using the “Next” button. Select “Tab” followed by “Finish.” You will now see the tabulated information contained within the proteinGroups file. At this time, it is useful to rename the file and save it as an Excel workbook.

3. Omit contaminant and reverse proteins. Protein groups data will contain sequences for contaminants (CON__[identifier]) and reverse proteins (REV__[identifier]). Contaminant IDs are likely artifacts of sample processing (e.g., keratins, trypsin), whereas reverse IDs are by-products of false discovery rate estimation (% FDR). It is important to exclude both.

4. Locate the identifiers in the Majority Protein ID column. Unique protein identifiers have the form of a 6-character or, in some cases, a 10-character code. As it is possible for a given peptide to be traced to two or more proteins, identifications may be reported as a string of UniProt identifiers. As the size of this group may introduce bias, we limit these identifiers to the protein with the strongest evidence. Fortunately, the protein with the strongest evidence (the largest number of MS/MS peptide spectrum matches) is listed first. In the following example, “tr|E7ETU9|E7ETU9_HUMAN;sp|O00469|PLOD2_HUMAN,” we observe “E7ETU9” and “O00469.” Of these, only E7ETU9 will be used.

5. Transfer the identifier string to a new worksheet. In the Excel workbook, press the “+” located at the bottom of the page. This will add a new worksheet. Use the “select,” “copy” [Ctrl + C], and “paste” [Ctrl + V] functions to transfer the entirety of this column to this worksheet.

6. Parse UniProt identifiers (Sheet 1). As noted in step 4, Subheading 3.1, the protein ID string is delimited, in this case by a vertical bar [ | ]. We will use this delimiter to parse the information. Select the protein groups column. In the “Data” menu, select “Text to Columns.” Within the conversion wizard, select “Delimited,” followed by “Next.” Select “Other:” and enter the vertical bar [Shift + \] (or the appropriate delimiter), and press “Next.” Under “Column data format” select “General.” Click “Finish” to advance. Identifiers will now be separated. Select the column containing the “first” UniProt/ID code. Transfer this column to a new worksheet and save.

7. Repeat steps 1–6, Subheading 3.1 for each proteinGroups.txt of the multi-proteomic set. Excel workbooks completing these steps have been provided (see examples LN229_cellular_proteome_proteinGroups.xlsx and LN229_secretome_proteinGroups.xlsx). This data will be used for network enrichment.

3.2 Network Enrichment Analysis
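The input to enrichment is the list of first majority protein IDs prepared in Subheading 3.1. As an optional alternative to the Excel steps there, the filtering and parsing (steps 3–6) can be sketched in Python. This is a minimal sketch rather than part of the published procedure: it assumes a tab-delimited table with a column named “Majority protein IDs,” contaminant and reverse entries that start with “CON__” or “REV__,” and the helper names first_accession and first_uniprot_ids are hypothetical. The REV__ accession in the example is made up for illustration.

```python
import csv
import io

def first_accession(majority_ids):
    """Return the accession of the first-listed protein in a group, or None."""
    first = majority_ids.split(";")[0].strip()
    if not first or first.startswith(("CON__", "REV__")):
        return None  # empty cell, contaminant, or reverse (decoy) entry
    parts = first.split("|")
    # FASTA-style IDs look like "tr|E7ETU9|E7ETU9_HUMAN"; plain accessions have no bars
    return parts[1] if len(parts) >= 3 else parts[0]

def first_uniprot_ids(handle, column="Majority protein IDs"):
    """Collect one accession per protein group from a tab-delimited table."""
    ids, seen = [], set()
    for row in csv.DictReader(handle, delimiter="\t"):
        accession = first_accession(row.get(column) or "")
        if accession and accession not in seen:
            seen.add(accession)
            ids.append(accession)
    return ids

# Small in-memory example; for a real file use open("proteinGroups.txt", newline="")
example = io.StringIO(
    "Majority protein IDs\tPeptides\n"
    "tr|E7ETU9|E7ETU9_HUMAN;sp|O00469|PLOD2_HUMAN\t12\n"
    "CON__P00761\t5\n"
    "REV__sp|P12345|XXXX_HUMAN\t1\n"
)
print("\n".join(first_uniprot_ids(example)))
```

The printed list (one accession per line) can be pasted directly into the g:Profiler query box in step 1 below.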

g:Profiler is a collection of tools for the computational analysis of biological networks [1]. Of these, functional enrichment will be conducted with g:GOSt [1]. Databases available include Gene Ontology Molecular Functions (GO:MF), Gene Ontology Biological Processes (GO:BP), Gene Ontology Cellular Components (GO:CC), Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome (REACT), WikiPathways (WP), Transcription Factors (TF), miRNA, Human Protein Atlas (HPA), the comprehensive resource of mammalian protein complexes (CORUM), and the Human Phenotype Ontology (HP) [1].

1. Upload protein lists to g:Profiler using the web site: https://biit.cs.ut.ee/gprofiler/. Transfer the UniProt identifiers generated in step 7, Subheading 3.1 using copy/paste. Select the appropriate organism from the g:Profiler pull-down menu. In this example we will use “Homo sapiens” to characterize the networks of LN229. We will use the default settings for “Advanced options” and “Data sources” (Fig. 2a–c). Click “Run Query” to perform the analysis.

2. Download g:Profiler results. Under the “Results” and “Detailed Results” tabs you may review networks (Fig. 2d). Under “Detailed Results,” you are given options to select all or a subset of this data. Download the CSV file (Fig. 2e) and save.

3. Repeat steps 1–2, Subheading 3.2 for each proteomic dataset. Example output of these steps has been provided (see gProfiler_LN229_cellular_proteome_hsapiens.xlsx and gProfiler_LN229_secretome_hsapiens.xlsx). This data will be used for SIV calculation and the discovery of integrated networks by Multi-P.

Fig. 2 g:Profiler. (a–c) Protein IDs are submitted to g:Profiler, selecting options for organisms and network databases (source). (d) g:Profiler output summary highlighting functionally enriched networks and P-value. (e) Export the data output file in CSV format for further processing

3.3 Multi-P & SIV Calculation

For SIV calculation it is necessary to combine network IDs and significance levels. The outputs from g:Profiler contain information on the database source (e.g., GO, KEGG, CORUM), significance (p-value, adjusted p-value), network size (term_size), and the number and identity of proteins within your data. Using this information, we will complete the remaining steps to discover collaborative networks using Multi-P.

Fig. 3 (a) Operations for extracting network P-values from multi-proteomic datasets using Excel. In this example, =VLOOKUP(D3, L:M, 2, FALSE) references “D3” (secretome term_id = 10, highlighted blue). The adjusted P-value reported by this operation is 0.002099874. (b) Operation for SIV calculation across multi-proteomic datasets

1. Open the g:Profiler CSV files in Excel. Copy/paste the g:Profiler results into adjacent columns. We have included an example .xlsx compiling this data (gProfiler_Combine_Secretome&CellularProteome.xlsx). Some users may find it useful to inspect the organization and operations of this file.

2. Correlate network P-values across multiple proteomic datasets using the Excel VLOOKUP function. It is necessary to determine which networks are unique to one signaling compartment and which are shared across two or more. To do this, we will use VLOOKUP. VLOOKUP takes four parameters: (1) the lookup value, (2) the array of columns that will be searched, (3) the column position of the value (within the array) that will be returned, and (4) an instruction to report a close (TRUE) or an exact (FALSE) match (Fig. 3). Here, “=VLOOKUP(D3, L:M, 2, FALSE)” will search for the contents of D3 in the first column of the array L:M. Once the contents of “D3” have been found in column L, the operation will report the contents of the cell at the second position of the array (column M). Using the network identifier “term_id,” we will extract P-values for each network. This step is repeated for all experimental/proteomic datasets. It will report either the corresponding network P-value or #N/A (not available) when the network is absent from a dataset.

3. Replace #N/A with 1. This allows SIVs to be calculated for networks that are missing from a dataset. Transfer the contents of this sheet (step 2, Subheading 3.3) to a new worksheet: copy [Ctrl + C], then use “Paste Special” to transfer “Values.” Next, replace all #N/As with “1” using the “Find” and “Replace” function. In the Replace window, under “Find what:” enter “#N/A” and under “Replace with:” enter “1.” Press “Replace All” to complete this task. In the provided example (gProfiler_Combine_Secretome&CellularProteome.xlsx), the worksheet completing this task is labeled “#NA->1.”

4. Generate SIVs. The network enrichment scores (i.e., P-values), originally reported separately for each proteomic dataset, are now combined in this worksheet using the SIV calculation (see Eq. 1 and Fig. 3b). For two datasets, calculate the SIV of each network using the =SQRT([P-value1]*[P-value2]) function.

5. Unify data and remove duplicates. Unify the network data by combining columns: transfer the columns generated in step 4, Subheading 3.3 to a new worksheet. Under the “Data” menu, remove duplicates using “Remove Duplicates.” Each network and its SIV will now be listed only once. In the gProfiler_Combine_Secretome&CellularProteome.xlsx example, this sheet is labeled “term id sorted duplicate remove.”

6. Sort and organize this data by SIV. We have sorted worksheets by network database (GO:CC, GO:BP, CORUM, etc.) and rank-ordered these networks by SIV.

7. Radar plots and other network visualization tools. We find that radar plots provide a reasonable representation of the data. We include example results for GO Cellular Compartment (Fig. 4) and integrated signals involving VEGFA/VEGFR2 (Fig. 5). In some cases it may be useful to transfer integrated networks to other visualization tools such as WikiPathways and Cytoscape [7, 8].
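For readers who prefer scripting to spreadsheets, steps 2–6 of Subheading 3.3 can be sketched in Python for the two-dataset case. This is an illustrative sketch, not the authors' implementation: the function name rank_networks_by_siv is hypothetical, each dataset is assumed to have been reduced beforehand to a mapping of term_id to adjusted P-value, and the P-values shown are invented for the example rather than taken from the LN229 data.

```python
import math

def rank_networks_by_siv(secretome, proteome):
    """Combine two {term_id: adjusted P-value} tables and rank networks by SIV."""
    ranked = []
    # The union of term IDs plays the role of combining columns and removing duplicates
    for term_id in set(secretome) | set(proteome):
        # dict.get(..., 1.0) mirrors an exact-match VLOOKUP followed by replacing #N/A with 1
        p_secretome = secretome.get(term_id, 1.0)
        p_proteome = proteome.get(term_id, 1.0)
        siv = math.sqrt(p_secretome * p_proteome)  # Eq. 1 with n = 2
        ranked.append((term_id, p_secretome, p_proteome, siv))
    # Smallest SIV first; term_id breaks ties so the order is reproducible
    ranked.sort(key=lambda row: (row[3], row[0]))
    return ranked

# Hypothetical adjusted P-values, for illustration only
secretome = {"GO:0070062": 1e-8, "WP:WP3888": 1e-4}
proteome = {"GO:0070062": 1e-4, "GO:0022626": 1e-10}

for term_id, p_sec, p_cell, siv in rank_networks_by_siv(secretome, proteome):
    print(f"{term_id}\t{p_sec:.1e}\t{p_cell:.1e}\t{siv:.1e}")
```

Treating a missing network as P = 1 keeps it in the ranking but penalizes it relative to networks enriched in both compartments. For more than two datasets, the square root would be replaced by the nth root of the product, as in Eq. 1.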


Fig. 4 (a) Multi-P radar plots. Individual radar plots for secretome and cellular proteome for GO: Cellular Compartments. (b) Combined analysis using Multi-P suggests that the secretome and cellular proteome intersect in the regulation of the extracellular matrix/environment (Extracellular exosome, Extracellular organelle, Extracellular vesicle). (c) The “Extracellular exosome” network appears to traverse the cellular proteome and secretome compartments of LN229 cells. Venn diagram summarizes the distribution of proteins that are unique and shared between these signaling compartments


Fig. 5 (a) SIV radar plot demonstrates networks for Cytoplasmic Ribosome (Rank 1) and VEGFA-VEGFR2 (Rank 2). (b) Venn diagram of the VEGFA-VEGFR2 Network demonstrates the identity and distribution of these proteins. (c) Network map of the VEGFA-VEGFR2 (Wikipathway, WP3888) provides a systems-level view of proteins found in the secretome and cellular fractions of LN229 cells

Acknowledgements

VC is supported by NSERC Discovery Grant and the Canadian Foundation for Innovation (CFI). The authors would like to acknowledge Christian C. Naus and Wun Chey Sin for provisions of LN229 glioma. The authors also thank Leonard J. Foster and Nikolay Stoynov for providing LC-MS/MS instrument time.

References

1. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H et al (2019) g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res 47(W1):W191–W198. https://doi.org/10.1093/nar/gkz369
2. Yan J, Risacher SL, Shen L, Saykin AJ (2018) Network approaches to systems biology analysis of complex disease: integrative methods for multi-omics data. Brief Bioinform 19(6):1370–1381. https://doi.org/10.1093/bib/bbx066
3. Hasin Y, Seldin M, Lusis A (2017) Multi-omics approaches to disease. Genome Biol 18(1):83. https://doi.org/10.1186/s13059-017-1215-1
4. Aftab Q, Mesnil M, Ojefua E, Poole A, Noordenbos J, Strale PO et al (2019) Cx43-associated secretome and interactome reveal synergistic mechanisms for glioma migration and MMP3 activation. Front Neurosci 13:143. https://doi.org/10.3389/fnins.2019.00143
5. Poole AT, Sitko CA, Le C, Naus CC, Hill BM, Bushnell EAC et al (2020) Examination of sulfonamide-based inhibitors of MMP3 using the conditioned media of invasive glioma cells. J Enzyme Inhib Med Chem 35(1):672–681. https://doi.org/10.1080/14756366.2020.1715387
6. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11(12):2301–2319. https://doi.org/10.1038/nprot.2016.136
7. Martens M, Ammar A, Riutta A, Waagmeester A, Slenter DN, Hanspers K et al (2021) WikiPathways: connecting communities. Nucleic Acids Res 49(D1):D613–D621. https://doi.org/10.1093/nar/gkaa1024
8. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D et al (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. https://doi.org/10.1101/gr.1239303

Chapter 13

Targeted Cross-Linking Mass Spectrometry on Single-Step Affinity Purified Molecular Complexes in the Yeast Saccharomyces cerevisiae

Christian Trahan and Marlene Oeffinger

Abstract

Protein cross-linking mass spectrometry (XL-MS) has been developed into a powerful and robust tool that is now well implemented and routinely used by an increasing number of laboratories. While bulk cross-linking of complexes provides useful information on whole complexes, it is limiting for the probing of specific protein “neighbourhoods,” or vicinity interactomes. For example, it is not unusual to find cross-linked peptide pairs that are disproportionately overrepresented compared to the surface areas of complexes, while very few or no cross-links are identified in other regions. When studying dynamic complexes along their pathways, some vicinity cross-links may be of too low abundance in the pool of heterogeneous complexes of interest to be efficiently identified by standard XL-MS. In this chapter, we describe a targeted XL-MS approach from single-step affinity purified (ssAP) complexes that enables the investigation of specific protein “neighbourhoods” within molecular complexes in yeast, using a small cross-linker anchoring tag, the CH-tag. One advantage of this method over a general cross-linking strategy is the possibility to significantly enrich for localized anchored cross-links within complexes, thus yielding a higher sensitivity to detect highly dynamic or low-abundance protein interactions within a specific protein “neighbourhood” occurring along the pathway of a selected bait protein. Moreover, many variations of the method can be employed; the ssAP-tag and the CH-tag can either be fused to the same or different proteins in the complex, or the CH-tag can be fused to multiple protein components in the same cell line to explore dynamic vicinity interactions along a pathway.

Key words ssAP-anchXL-MS, anchXL-MS, Targeted XL-MS, Targeted CL-MS, XL-MS, CL-MS, Protein cross-linking, Single-step affinity purification, Mass spectrometry, Cryo-lysate, Cryo-milling, Yeast

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_13, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

Chemical cross-linking mass spectrometry (XL-MS; also often abbreviated as CL-MS or CX-MS) has become a robust tool adopted by many laboratories, often in conjunction with other structural investigation techniques [1–5], or used on its own to determine protein interfaces, protein hierarchy, or positioning as well as surface topology within molecular complexes. XL-MS can be applied to any protein complex and has been successfully applied to chromatin-associated complexes [6–13], the transcription machinery [14–27], the nuclear pore complex and membrane-bound complexes [28–35], ribonucleoproteins such as ribosomes and mitoribosomes [36–43], protein-only complexes such as the proteasome or purified monomeric or multimeric proteins [44–48], and mixtures of these through cross-linking of whole-cell proteomes [49–62].

Protein–protein interactions within complexes are dynamic to varying degrees within their respective pathways. While standard XL-MS requires less homogeneity than most structural investigation techniques, and mixing cross-linkers with different chemistries can increase overall coverage and sensitivity [4], it is not unusual to find cross-linked peptide pairs that are disproportionately overrepresented over a complex’s surface area, while very few or no cross-links can be identified in other regions. Plausible explanations for the absence of cross-links at specific “neighbourhoods” within complexes include a lack of interactors, a lack of reactive amino acid residues, or a lack of sensitivity to detect transient and/or low-abundance interactors from cell subcomplexes or cellular networks.

We have developed a targeted cross-linking approach to identify near neighbors, or vicinity interactomes, within stable as well as highly dynamic or low-abundance (sub)complexes along selected pathways in yeast. Using this method, we previously identified specific interactions between Mex67/Mtr2 dimers and (Fx)FG repeats of Nsp1 and Nup159 as well as Gle1, visualizing the passage of the Mex67/Mtr2 mRNA export dimer through the nuclear pore [63]. The method employs cryo-milling of cells, which better preserves the integrity of complexes compared to other lysis methods, and uses a rapid single-step affinity purification (ssAP) to isolate complexes via a protein A (PrA)-tagged protein that is purified with magnetic beads densely conjugated with rabbit IgG antibodies [64, 65]. Once purified, and while still bound to the resin ex vivo, the complexes are cross-linked with the commercially available heterobifunctional SM(PEG)2 cross-linker (Fig. 1a) directly on the beads in a two-step reaction to (1) anchor the cross-linker via its maleimide moiety to an exposed sulfhydryl group introduced at a specific location in a complex of interest, and (2) probe the “neighbourhood” for nearby lysines (and to a lesser extent serines, threonines, and tyrosines) by simply changing the pH to activate the N-hydroxysuccinimide (NHS) ester group. As a means to anchor cross-links (anchXL), we developed the CH-tag, consisting of an arginine (R) that serves as a tag cleavage site for trypsin, followed by a cysteine (C) to which the cross-linker is anchored via its maleimide moiety in the first step, an aspartate-proline (DP) that facilitates MS fragmentation of the tag, and a His10 sequence for efficient IMAC enrichment of the cross-linked and trypsinized peptides (Fig. 1b). The anchoring strategy takes advantage of the rarity of cysteines in proteins [66], even more so on protein surfaces. By anchoring the cross-linker to a specific location on an affinity-purifiable complex, it is possible to achieve significantly increased sensitivity to detect cross-links in a defined region.

Fig. 1 Structure of the SM(PEG)2 cross-linker and illustrations of CH-tag variations that can be used for ssAP-anchXL-MS. (a) SM(PEG)2 is a heterobifunctional cross-linker that displays both NHS ester and maleimide groups, linked by a spacer arm of defined length containing two polyethylene glycol units. (b) The CH-tag can be fused to either extremity of any protein of interest (POI). It can be applied on a single protein, in which case both POI-independent and POI-dependent schemes can be used. It can also be used on more than one protein at a time in a (sub)complex, in which case only the POI terminal peptide-dependent CH-tag scheme can be used to identify the origins of cross-linked peptides via the POI terminal peptide

Different variations of our method can be used; for example, the CH-tag can be fused to either protein terminus, but the sequence of the tag will have to be adapted to the extremity at which the CH-tag is used (Fig. 1b). It can also be combined with an affinity purification epitope tag (i.e., PrA) on one protein. Yet while having a cell line with both affinity and anchoring tags on a single protein would reveal interactions along its whole cellular pathway, we do not recommend this strategy, as proteins that fold with two-step kinetics have both termini in close proximity [67]. Having both tags in tandem at one protein extremity could lead to lower detection of protein interactors caused by steric hindrance of PrA towards the CH-tag. Hence, it is recommended to select proteins that fold with non-two-step kinetics to use this approach. Moreover, by fusing the tags to two different proteins present in a complex or along a pathway, only interactions occurring from the moment that the two tagged proteins are co-present within the complex(es) will be detected. It is also possible to envision a strain in which multiple proteins of a highly dynamic complex are fused to CH-tags in order to study changes over time and potential subcomplex dependencies; here, the degeneracy of the genetic code can be exploited to prevent recombination of a new CH-tag over an already inserted CH-tag. Using multiple CH-tagged proteins from common (sub)complexes requires the elimination of the arginine in the CH-tag, so that anchored cross-links can be assigned to the correct CH-tagged protein by identifying the native N- or C-terminal protein peptide as a bar code fused to the CH-tag (Fig. 1b). We are currently creating such strains in our lab using CRISPR [68].

The length of the cross-linker spacer arm can also be varied. The SM(PEG)n cross-linker series is available with different homogeneous lengths of polyethylene glycol (PEG) spacer arm, SM(PEG)2, 4, 6, 8, 12, and 24, corresponding to spacer arm lengths of 17.6, 26.4, 32.5, 39.25, 53.4, and 92.5 Å, respectively. Shorter versions of the reagent usually result in fewer observable cross-links; however, a shorter spacer arm provides more stringent distance constraints for structural modeling. Conversely, longer spacer arms normally yield more observable cross-links but provide little structural information beyond interactions [69].

The amount of starting material for ssAP-anchXL-MS is crucial and will have to be determined empirically. It relies on multiple factors such as (1) the buffer conditions, (2) the protein copy number per cell, (3) the amount of free bait versus the fraction of the bait pool assembled into complexes, (4) the ratio of the CH-tagged protein over the PrA-tagged bait protein in cases where the tags are fused to different proteins, (5) the relative abundance of interactors near the anchoring tag of the CH-tagged protein along the pathway of the bait protein, as well as (6) the accessibility of both ssAP and CH-tags. General guidelines for determining the amount of cryo-lysed cell material (cell grindate) required and for buffer optimization are described in [65]. Here, we describe our ssAP-anchXL-MS workflow (Fig. 2) and use the Saccharomyces cerevisiae Mex67-PrA/Mtr2-CH endogenously tagged heterodimer as an example.
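To make the CH-tag layout described above more concrete, the following Python sketch assembles the tag and performs a simple in silico tryptic digest. It is an illustration under stated assumptions, not the authors' construct: it assumes a C-terminal fusion with the elements in the order listed in the text (R, C, DP, His10), uses a made-up protein-of-interest sequence, and applies the common simplified trypsin rule (cleavage after K or R unless the next residue is P).

```python
import re

# CH-tag elements in the order described in the text (assumed C-terminal fusion)
CH_TAG = "R" + "C" + "DP" + "H" * 10

def tryptic_peptides(sequence):
    """Split after K or R, except before P (simplified trypsin rule).

    Splitting on a zero-width pattern requires Python 3.7 or later.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

# Hypothetical C-terminal stretch of a protein of interest, for illustration only
poi_c_terminus = "MSTNEQLKAVDLFGEW"

# POI-independent scheme: the arginine lets trypsin release the tag peptide on its own
fusion = poi_c_terminus + CH_TAG
print(tryptic_peptides(fusion)[-1])

# POI-dependent scheme (no arginine): the native terminal peptide stays attached
# to the tag and can act as a "bar code" for the tagged protein
fusion_no_r = poi_c_terminus + CH_TAG[1:]
print(tryptic_peptides(fusion_no_r)[-1])
```

In the first case the cysteine-bearing peptide is identical for every tagged protein, whereas in the second it carries the terminal residues of the protein it came from, which is why the arginine-free variant is needed when several proteins in one strain carry a CH-tag.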

Fig. 2 Workflow of the ssAP-anchXL-MS method. A cryo-grindate is first made from a cell strain in which the protein for ssAP is tagged with three IgG-binding domains of protein A (PrA), in addition to a CH-tagged protein of interest that takes part in the same pathway. The cryo-grindate is resuspended in a selected buffer, the complexes are purified by single-step affinity purification using IgG-coupled magnetic beads, and the cross-linking reaction is performed in a two-step fashion while the complexes are attached to the beads (ex vivo). The maleimide group is first anchored to the cysteine via its sulfhydryl group at pH 6.6, and, after removing excess unanchored SM(PEG)2, the NHS ester group is activated at pH 8.0 to mainly react with primary amines of nearby lysines and protein N-termini. The cross-linked complexes are then digested on the beads, and the CH-tagged peptides are enriched using an UptiTip (InterChim) nickel-coated pipette tip. The samples are then dried and resuspended, injected into the MS instrument, and analyzed with pLink2. Optionally, a peptide size exclusion chromatography step can be used to remove uncross-linked CH-tags as well as CH-tags with hydrolyzed or Tris-quenched cross-linkers from the samples if these are in excess

2 Materials

2.1 Titrating the SM(PEG)2 Cross-Linker

2.1.1 Small-Scale ssAP

1. Predetermined ssAP buffer with 1 mM DTT, protease inhibitors, and 1:5000 Antifoam B Emulsion. Avoid using buffers that contain primary amines like Tris or glycine.
2. Vortex.
3. Ice and ice bucket.
4. Metal spatula.
5. Liquid nitrogen.
6. Styrofoam box that fits a 50 mL tube rack.
7. 50 mL tube rack.
8. Polytron (Kinematica, PT 1200 E) equipped with either a 7- or 12-mm EC standard immersion disperser for volumes below 10 or 250 mL, respectively.
9. 15- or 50-mL conical polypropylene tubes.
10. Centrifuge for 15- and 50-mL tubes capable of 2600 rcf.
11. Conjugated IgG magnetic beads [65].
12. 1.5 mL LoBind tubes.
13. 2 mL LoBind tubes.
14. Timer.
15. Nutator or any slow rocking/rotating platform or wheel providing gentle mixing.
16. Cold room.
17. DynaMag-2, -15 and/or -50 Magnets (Life Technologies).
18. Optional: Last Wash Buffer (LWB; make fresh): 0.1 M NH4OAc, 0.1 mM MgCl2, 0.02% Tween-20 in MS grade water.
19. Elution solution: 1.48 M (10%) NH4OH, 0.5 mM EDTA (make fresh).
20. Vacuum concentrator compatible with organic solvents (SpeedVac).
21. Maleimide reaction buffer (MRB): 50 mM sodium phosphate pH 6.6, 150 mM NaCl, 1 mM EDTA.

2.1.2 Two-Step SM(PEG)2 Cross-Linking Titration

1. 2 mL LoBind tubes.
2. SM(PEG)2 cross-linker, no-weigh format.
3. Maleimide reaction buffer (MRB): 50 mM sodium phosphate pH 6.6, 150 mM NaCl, 1 mM EDTA.
4. NHS ester reaction buffer (NRB): 50 mM sodium phosphate pH 8, 150 mM NaCl, 1 mM EDTA.
5. Vortex.
6. Tris 1 M pH 8.
7. Elution solution: 1.48 M NH4OH in water (make fresh).
8. SDS-PAGE sample loading buffer: mix half-half solution A and B (see below).
9. Solution A: 0.5 M Tris-HCl pH 8, 5% SDS.
10. Solution B: 75% glycerol, 124.5 mM DTT, 0.05% Bromophenol Blue; store at 4 °C and add fresh DTT separately before use.
11. SDS-PAGE system and gels.
12. Western blot transfer apparatus.
13. Nitrocellulose membrane 0.2 μm.
14. Anti-His mouse monoclonal antibody, 1:1000 (ABM good).
15. Peroxidase anti-peroxidase rabbit, 1:5000.

2.2 Estimating CH-Tagged Protein Amounts from Isolated Small-Scale ssAP Complexes

Two methods can be used to estimate the abundance of CH-tagged proteins from ssAP-anchXL experiments. The first method can also be used to estimate the ratio of interprotein cross-links over free and anchored SM(PEG)2 CH-tagged proteins. This allows the user to re-evaluate the efficiency of interprotein cross-links and monitor for consistency across experiments. Alternatively, the second method is faster but no such ratio can be determined. Determining this ratio will only be possible from the SM(PEG)2 titration before proceeding with a large-scale ssAP-anchXL-MS experiment.

2.2.1 Method 1

1. SDS-PAGE sample loading buffer: mix half-half solution A and B (see below).
2. Solution A: 0.5 M Tris-HCl pH 8, 5% SDS.
3. Solution B: 75% glycerol, 124.5 mM DTT, 0.05% Bromophenol Blue; store at 4 °C and add fresh DTT separately before use.
4. SDS-PAGE system and gels.
5. Western blot transfer apparatus.
6. Nitrocellulose membrane 0.2 μm.
7. Purified His-tagged protein of known concentration.
8. Anti-His mouse monoclonal antibody, 1:1000 (ABM good).
9. ImageJ (http://wsr.imagej.net/distros/).

2.2.2 Method 2

1. Sodium phosphate 50 mM pH 7.4 with 150 mM NaCl.
2. Nitrocellulose membrane, 0.2 μm.
3. Purified His-tagged protein of known concentration or CH-tag synthetic peptide.
4. Anti-His mouse monoclonal antibody, 1:1000 (ABM good).
5. ImageJ (http://wsr.imagej.net/distros/).

2.3 Large-scale ssAP-anchXL-MS

2.3.1 Large-scale ssAP

1. ssAP buffer with 1 mM DTT, protease inhibitors, and 1:5000 Antifoam B Emulsion.
2. Vortex.
3. Ice and ice bucket.
4. Metal spatula.
5. Liquid nitrogen.
6. Styrofoam box that fits a 50 mL tube rack.
7. 50 mL tube rack.


Christian Trahan and Marlene Oeffinger

8. Polytron (Kinematica, PT 1200 E) equipped with a 12 mm EC standard immersion disperser for volumes between 10 and 250 mL.
9. 50 mL conical polypropylene tubes.
10. Centrifuge for 50 mL tubes capable of 2600 rcf.
11. Conjugated IgG magnetic beads [65].
12. 1.5 mL LoBind tubes.
13. Timer.
14. Nutator or any slow rocking/rotating platform or wheel providing gentle mixing.
15. Cold room.
16. DynaMag-2, -15 and/or -50 Magnets (Life Technologies).
17. Last Wash Buffer (LWB; make fresh): 0.1 M NH4OAc, 0.1 mM MgCl2, 0.02% Tween-20 in MS grade water.
18. Elution solution: 1.48 M (10%) NH4OH, 0.5 mM EDTA (make fresh).
19. Vacuum concentrator compatible with organic solvents (SpeedVac).
20. Maleimide reaction buffer (MRB): 50 mM sodium phosphate pH 6.6, 150 mM NaCl, 1 mM EDTA.

2.3.2 SM(PEG)2 Two-Step Reaction

1. SM(PEG)2 cross-linker, no-weigh format.
2. Maleimide reaction buffer (MRB): 50 mM sodium phosphate pH 6.6, 150 mM NaCl, 1 mM EDTA.
3. NHS ester reaction buffer (NRB): 50 mM sodium phosphate pH 8, 150 mM NaCl, 1 mM EDTA.
4. Elution solution: 1.48 M NH4OH in water (make fresh).
5. SDS-PAGE sample loading buffer: mix equal volumes of solutions A and B (see below).
6. Solution A: 0.5 M Tris-HCl pH 8, 5% SDS.
7. Solution B: 75% glycerol, 124.5 mM DTT, 0.05% Bromophenol Blue; store at 4 °C and add fresh DTT separately before use.
8. SDS-PAGE system and gels.
9. Western blot transfer apparatus.
10. Nitrocellulose membrane, 0.2 μm.
11. Trypsin digestion buffer: 50 mM sodium phosphate pH 8.
12. Trypsin or Trypsin/Lys-C mix (Promega).

2.3.3 IMAC Enrichment of CH-Tagged Cross-Linked Peptides

1. UptiTip nickel coated 10–200 μL (Interchim).
2. 1.5 mL LoBind tubes.
3. MS grade water.
4. UptiTip Pre-Wash (UPW) buffer: 5% acetonitrile (HPLC grade) and 5% acetic acid in MS grade water.
5. UptiTip Equilibration (UEq) buffer: 50 mM sodium phosphate buffer pH 8 with 250 mM NaCl and 5 mM high purity imidazole in MS grade water.
6. UptiTip Wash Buffer (UWB): 50 mM sodium phosphate buffer pH 6 and 60% acetonitrile in MS grade water.
7. UptiTip Elution Buffer (UEB): 0.5 M NH4OH and 5% acetonitrile in MS grade water.
8. Vacuum concentrator compatible with organic solvent (SpeedVac).

2.4 MS Method and Analysis

1. MSB: 1% trifluoroacetic acid (TFA), 15% acetonitrile (ACN), 1 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP).
2. Solvent A: 0.2% formic acid in MS grade water.
3. Solvent B: 0.2% formic acid in 100% acetonitrile (ACN).
4. PicoFrit fused silica capillary column, 15 cm × 75 μm i.d. (New Objective).
5. C18 reverse-phase media: Jupiter 5 μm particles, 300 Å pore size (Phenomenex).
6. Easy-nLC II system (Thermo Fisher Scientific).
7. Nanospray Flex Ion source (Thermo Fisher Scientific).
8. Mass spectrometer: Orbitrap Fusion mass spectrometer (Thermo Fisher Scientific).
9. pLink v2.3.9 and higher (http://pfind.ict.ac.cn/index.html).

3 Method

3.1 Starting Amount of Material and ssAP Buffer Optimization

For conjugation of Dynabeads epoxy M-270 IgG, cryogrinding (cryo-milling), cell grindate weight, and amounts of grindate requirements for small-scale ssAP please refer to [65].

3.2 Titrating the SM(PEG)2 Cross-Linker

Once the optimal buffer and the amount of grindate for small-scale ssAP have been determined, start by titrating the amount of SM(PEG)2 cross-linker needed for optimal cross-linking of ssAP material. Because cysteines are of low abundance, the SM(PEG)2 concentration required to optimally anchor the maleimide reactive group to cysteines in the first step of the two-step reaction will be lower than the concentrations commonly used for NHS ester homobifunctional cross-linkers.


Fig. 3 SM(PEG)2 titration on small-scale ssAP. To determine the optimal SM(PEG)2 concentration to use for ssAP-anchXL-MS, it is preferable to first titrate the cross-linker in a small-scale ssAP. The titration of Mtr2-CH from Mex67-PrA isolated complexes is provided as an example. Because the SM(PEG)2 reaction is performed in two steps, with unreacted cross-linker eliminated in between, and because cysteines are of low abundance, the chances of over-cross-linking are very low. We recommend using twice the concentration needed to reach a plateau, which would be 100 μM in the example shown here. Mex67-Mtr2-CH cross-links are identified, while cross-links of unknown composition are marked by asterisks.

3.2.1 Small-scale ssAP

1. While keeping the grindate frozen (in a rack submerged in a box filled half-way with liquid N2), weigh out the amount of grindate needed for all SM(PEG)2 concentration titration samples, plus the equivalent of half a small-scale sample into a 10 mL polypropylene tube. This will ensure enough material for each condition considering pipetting errors after lysate clearing. For example, with Mex67-PrA/Mtr2-CH requiring 0.2 g per small-scale ssAP, testing six cross-linking concentrations ranging from 0 to 100 μM (Fig. 3) will require 1.3 g of grindate. Let the tubes with the weighed-out grindate sit on ice until it has an ice cream-like appearance. If more than 1.5 g of grindate is needed, either divide the grindate in two 15 mL tubes or use one 50 mL tube.


2. Add the predetermined ssAP buffer containing 1 mM DTT, 1:5000 antifoam B, and protease inhibitors at a 9:1 buffer-to-grindate ratio, and vortex for 30 s to 1 min, cooling the sample on ice after 30 s of vortexing.
3. With the crude lysate on ice, further homogenize the sample using a Polytron equipped with a suitably sized dispersing probe, set at two-thirds of its full speed, for 30 s.
4. Clear the lysate by centrifugation at 2900 rcf at 4 °C for 10 min.
5. During this time, pre-wash the amount of beads required for all SM(PEG)2 concentrations tested, plus the equivalent of half a condition. To test six conditions between 0 and 100 μM as in Fig. 3, use six times 10 μL of conjugated Dynabeads plus 5 μL (65 μL total) to account for pipetting error when dispensing the beads into microtubes after washing them. In the current example, pipette 65 μL of beads into a 1.5–2 mL microtube. Wash the beads with 1 mL of ssAP buffer, making sure the beads are uniformly resuspended during each wash, either by pipetting or by sufficiently inverting the microtube. If pipetting is used, rinse the pipette tip in the cleared buffer once the beads are magnetically recovered to avoid loss of beads and material. If inversion is used, invert the tubes slowly while they are on the magnet holder to recover the beads left under the microtube cap, and make sure to remove the buffer from underneath the cap.
6. Repeat the wash from step 5 twice and then resuspend the beads at 100 μL of ssAP buffer per 10 μL of beads. Make sure that the beads are homogeneously resuspended before transferring 100 μL of beads to new 2 mL round bottom microtubes, one tube per tested cross-linker concentration. Leave the tubes aside, off the magnet. For our Mex67-PrA/Mtr2-CH example, resuspend the beads in 650 μL of ssAP buffer, and transfer 100 μL of pre-washed beads into six new 2 mL round bottom microtubes. Round bottom tubes are best for testing cross-linking conditions because, unlike the narrower V-shaped bottom of 1.5 mL microtubes, they allow the beads to be gently mixed in low volumes while remaining in suspension. The size of tubes needed may vary according to the amount of grindate and ssAP buffer volume required for any specific bait in relation to its cell abundance/copy number [65].
7. When the lysate has finished clearing by centrifugation, transfer the supernatant on ice into a fresh tube with a pipette and measure its total volume. The volume of cleared lysate to use for each cross-linker concentration tested equals the total volume divided by the number of cross-linking reaction conditions planned (see step 2). In the current example for Mex67-PrA/Mtr2-CH, the total volume would be divided by 6.5 samples.
8. Place the pre-washed bead microtubes on the magnetic rack and remove the buffer.
9. Take the tubes off the magnetic rack and dispense the predetermined volume of cleared lysate into each microtube.
10. Incubate the tubes for 30 min at 4 °C on a nutator, a slow oscillating platform, or a slow rotating wheel.
11. At room temperature, wash the beads three times with the ssAP buffer. From this point, all steps are performed at room temperature.
12. Wash the beads with 1 mL of LWB for 5 min with gentle agitation. Caution: it is crucial to efficiently remove the ammonium salt before proceeding with the two-step cross-linking reactions, as it can compete with the NHS ester coupling of primary amines.
13. Wash and equilibrate the purified complexes bound to the beads four times with 1 mL of MRB to efficiently remove the DTT from the ssAP buffer. The sulfhydryl-containing DTT can compete with the maleimide during the sulfhydryl coupling reaction. The types of salts and their concentrations may be varied for the two-step cross-linking reactions without affecting the reactions, but keep in mind that they can have an impact on protein folding and protein–protein interactions. The type and concentration of salt used during ssAP can be kept for the two-step cross-linking reactions, but we recommend 150 mM NaCl for more physiological conditions. Caution: do not use any salt containing primary amines. If the beads are not already in a 2 mL LoBind tube, use the last equilibration wash to transfer the beads.

3.2.2 Titrating the SM(PEG)2 Cross-Linker
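Before setting up the titration, the quantities prepared in Subheading 3.2.1 can be cross-checked with a short calculation. The sketch below is only an illustration of the "number of conditions plus half a sample" rule, using the Mex67-PrA/Mtr2-CH values from the text; the function names and the 11.7 mL cleared-lysate volume are hypothetical and should be replaced by your own values.

```python
def small_scale_plan(n_conditions, grindate_per_sample_g,
                     beads_per_sample_ul=10.0, extra_fraction=0.5):
    """Scale grindate and beads for a titration, adding half a sample as overage."""
    n_effective = n_conditions + extra_fraction
    beads_ul = n_effective * beads_per_sample_ul
    return {
        "effective_samples": n_effective,
        "grindate_g": n_effective * grindate_per_sample_g,
        "beads_ul": beads_ul,
        # Beads are resuspended at 100 uL of ssAP buffer per 10 uL of beads.
        "bead_resuspension_ul": beads_ul / 10.0 * 100.0,
    }


def lysate_per_condition_ml(total_cleared_lysate_ml, n_effective):
    """Split the measured cleared lysate evenly over the effective sample count."""
    return total_cleared_lysate_ml / n_effective


# Mex67-PrA/Mtr2-CH example: six concentrations, 0.2 g of grindate per sample.
plan = small_scale_plan(n_conditions=6, grindate_per_sample_g=0.2)
print(plan)

# Hypothetical measured volume of cleared lysate (mL).
per_tube_ml = lysate_per_condition_ml(11.7, plan["effective_samples"])
print(round(per_tube_ml, 2))
```

With these inputs the plan reproduces the 1.3 g of grindate, 65 μL of beads, and 650 μL of bead resuspension buffer quoted in the protocol.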

1. Resuspend the beads from each tube in 20 μL of MRB with salt containing the various predetermined SM(PEG)2 concentrations for the titration. Remember to include a negative control without cross-linker.
2. Vortex at low speed (350 rpm) at 25 °C or room temperature for 30 min. From time to time, verify that the beads stay in suspension and do not sediment at the bottom of the tube. If the beads sediment, increase the agitation until they do not.
3. Wash the beads quickly three times with 1 mL of MRB to remove any cross-linkers that were not anchored to cysteines.
4. With the tubes off the magnetic tube holder, add 1.35 mL of NRB with salt to quickly disperse and dilute the beads while activating the NHS ester group of anchored cross-linkers.
5. Incubate for 45–60 min on a nutator, rotating wheel, or any other slow agitation platform.
6. Quench the NHS ester group by adding 150 μL of Tris 1 M pH 8 and incubate for 15 more min. Put the tubes on the magnet and, once cleared, discard the supernatant.
7. Elute the cross-linked complexes twice with 500 μL of 1.48 M NH4OH, incubating for 20 min each time on a slow agitating platform or turning wheel.
8. Pool both eluate fractions and bring all samples to dryness overnight using a SpeedVac that can accommodate organic solvent, without turning the heating on.
9. Resuspend the dried cross-linked complexes in 20 μL of SDS-PAGE sample loading buffer.
10. To monitor the anchored cross-linking titration reactions by western blotting, start by loading and separating one-tenth of each resuspended eluate per SDS-PAGE well, and transfer the proteins onto nitrocellulose. The amount of material required to properly visualize the CH-tagged protein and its cross-linked partners may need to be adjusted on a case-by-case basis, depending on the ratio of PrA-bait to CH-tagged protein amounts as well as on the secondary antibody and developing system used.
11. For the detection of the CH-tag and its cross-linked proteins, use an anti-His tag antibody after blocking the membrane. We use an IgG2b mouse monoclonal anti-His antibody (1:1000 dilution) that binds poorly to the ssAP PrA-tagged bait and that does not show any cross-reactivity with endogenous proteins in yeast. Antibodies directed against native proteins carrying the CH-tag can also be used to confirm the presence of the CH-tagged protein signals migrating at a higher molecular weight on western blots. For the detection of the PrA-bait protein, use a peroxidase anti-peroxidase antibody at a 1:5000 dilution.
12. To determine the required concentration of SM(PEG)2 for a large-scale anchXL-MS experiment, first identify the lowest cross-linker concentration that provides a signal plateau of cross-linked protein(s) according to the western blot, and double this concentration (see Fig. 3). For example, according to Fig. 3, Mtr2-CH cross-links plateau at 50 μM, so 100 μM of SM(PEG)2 would be used for Mex67-PrA/Mtr2-CH.
13. Optional: to visualize all proteins from the small-scale ssAP-anchXL-MS, load one-half to three-quarters of the eluates per SDS-PAGE well, and silver-stain the separated proteins in the gel.
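The rule in step 12 (take the lowest concentration that reaches the plateau, then double it) can be written down as a small helper. This is a minimal sketch, not part of the published method: the band intensities are invented, and defining the plateau as "within 10% of the maximum signal" is an assumption made only to have a concrete criterion; in practice the plateau is judged by eye from the western blot.

```python
def working_concentration_um(titration, tolerance=0.10):
    """Return twice the lowest concentration whose signal reaches the plateau.

    titration: iterable of (concentration_uM, cross_linked_band_signal).
    The plateau is assumed (hypothetically) to start at the lowest concentration
    whose signal is within `tolerance` of the maximum signal observed.
    """
    points = sorted(titration)
    max_signal = max(signal for _, signal in points)
    if max_signal <= 0:
        raise ValueError("no cross-linked signal detected in the titration")
    for conc, signal in points:
        if signal >= (1.0 - tolerance) * max_signal:
            return 2.0 * conc
    raise ValueError("no plateau found")


# Hypothetical densitometry values for a six-point titration (arbitrary units).
example = [(0, 0), (5, 20), (10, 45), (25, 80), (50, 98), (100, 100)]
recommended = working_concentration_um(example)
print(recommended)
```

With these invented values the plateau starts at 50 μM, so the helper returns 100 μM, in line with the Mex67-PrA/Mtr2-CH example.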


3.2.3 Estimating CH-Tagged Protein Amounts from Isolated Small-scale ssAP Complexes

Method 1

Two methods can be used to roughly estimate the amount of CH-tagged protein present in isolated complexes from the PrA-bait ssAP. Although the first method is more time-consuming, it allows estimation of the ratio of interprotein cross-linked CH-tagged proteins over free CH-tagged proteins and CH-tagged proteins carrying anchored SM(PEG)2 that was hydrolyzed or Tris-quenched, to ensure consistency across experiments.

1. Repeat Subheading 3.2.1 and steps 1–9 from Subheading 3.2.2 using the determined optimal SM(PEG)2 concentration and an uncross-linked negative control.
2. Make a serial dilution of both the optimally cross-linked and the uncross-linked sample eluates in SDS-PAGE sample loading buffer, and separate the proteins from each serial dilution on a 15-well SDS-PAGE gel along with a serial dilution of a purified His10-tagged protein of known concentration.
3. Transfer to a nitrocellulose membrane, block the membrane, and use an anti-His antibody for CH-tag detection as in Subheading 3.2.2, step 11.
4. Use densitometry of the western blot image to make a standard curve based on the His10-tagged protein of known concentration, and estimate the abundance of the CH-tagged protein after subtracting the signal from the uncross-linked negative control sample. Software such as ImageJ offers this possibility (http://wsr.imagej.net/distros/). The amount and ratio of both free and cross-linked CH-protein can be easily estimated using this method. Be sure to convert mass to moles so that numbers of molecules are compared.

Alternatively, the second method is faster but does not allow estimation of the ratio of CH-tag interprotein cross-links over free CH-tagged proteins and CH-proteins carrying hydrolyzed or Tris-quenched anchored SM(PEG)2.
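The densitometry step of Method 1 amounts to fitting a standard curve and reading sample bands off it in molar units. The sketch below illustrates this with an ordinary least-squares line; all numbers (standard amounts, band intensities, and the 30 kDa molecular weight of the His10-tagged standard) are made up for illustration, and a linear detector response is assumed.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ng_to_fmol(mass_ng, mw_da):
    """Convert a mass in ng to fmol: 1 ng of a 1 Da species is 1e6 fmol."""
    return mass_ng / mw_da * 1e6


def signal_to_fmol(signal, slope, intercept, background=0.0):
    """Read an amount off the standard curve after background subtraction."""
    return (signal - background - intercept) / slope


# Hypothetical His10-tagged standard (30 kDa) loaded at 5, 10, 20, and 40 ng.
standard_fmol = [ng_to_fmol(ng, 30000.0) for ng in (5, 10, 20, 40)]
standard_signal = [1000.0, 2000.0, 4000.0, 8000.0]
slope, intercept = fit_line(standard_fmol, standard_signal)

# Hypothetical band intensities for the free and cross-linked CH-tagged protein.
free_fmol = signal_to_fmol(6000.0, slope, intercept)
xl_fmol = signal_to_fmol(1500.0, slope, intercept)
print(round(free_fmol), round(xl_fmol), round(xl_fmol / free_fmol, 2))
```

Working in fmol rather than ng is what makes the cross-linked-to-free comparison meaningful, since the standard and the CH-tagged protein generally differ in molecular weight.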

Method 2

1. Repeat Subheading 3.2.1 and steps 1–9 from Subheading 3.2.2 using the determined optimal SM(PEG)2 concentration and an uncross-linked negative control, but resuspend the eluates in 20 μL of sodium phosphate 50 mM pH 7.4 with 150 mM NaCl.
2. Serially dilute the resuspended eluates in the same buffer and perform a dot blot on nitrocellulose by spotting 1 μL at a time, letting the eluate dry on the membrane before adding another μL. To make a standard curve, use either a serially diluted purified His10-tagged protein of known concentration or a synthetic CH-tag or His10 peptide.
3. Let the membrane dry, block the membrane, and probe it with an anti-His antibody as in step 3 of Method 1 above.
4. Proceed with the densitometry analysis, make a standard curve based on the synthetic peptide or His10-tagged protein of known concentration, and estimate the amount of CH-tag present in the eluates by reference to the standard curve (see Note 1).

3.3 Large-scale ssAP-anchXL-MS

3.3.1 Large-scale ssAP

1. Let the weighed-out cryo-ground lysate partially thaw on ice until it has an ice cream-like appearance. The amount of powder depends on the amount of CH-tagged protein co-isolated by ssAP with the PrA-protein bait as well as on the free CH-tagged to cross-linked CH-protein ratio. For Mex67-PrA/Mtr2-CH, 10 g of cell grindate are used. Weigh 5 g into each of two 50 mL conical polypropylene tubes.
2. Add the resuspension buffer at a 9:1 buffer-to-grindate ratio. The ratio can be slightly lowered for practical reasons: for our Mex67-PrA/Mtr2-CH example, a ratio of 8:1 (40 mL per 5 g of grindate) was used in order to accommodate the sample in two 50 mL tubes, while still being close to the optimal ratio of 9:1.
3. Vortex at maximum speed for 1 min and put the resuspended lysate back on ice.
4. Use a Polytron homogenizer equipped with a suitably sized standard immersion disperser, set at two-thirds of the maximum speed, for 30 s to further homogenize the sample. Use a 7 mm insert for volumes up to 10 mL, and a 12 mm insert for volumes from 10 to 50 mL.
5. Clear the sample by centrifugation at 2700 rcf for 10 min at 4 °C.
6. While the lysate is clearing, pre-wash the amount of beads required for the ssAP-anchXL-MS as in Subheading 3.2.1, step 6. Based on the beads-to-cell grindate ratio normally used for Mex67-PrA ssAP [65], 500 μL of beads are required for 10 g of Mex67-PrA/Mtr2-CH cell grindate (50 μL of beads for every gram of Mex67-PrA cell grindate).
7. Transfer the cleared lysate to a fresh tube and place the tube on a magnet support.
8. Remove the supernatant from the last bead wash, and use the cleared lysate to resuspend the beads and transfer them into the cleared lysate tube on the magnet support.
9. Let the beads settle on the tube wall by the magnet and repeat until all beads are transferred. Rinse the tip at the top of the lysate tube to make sure no beads are lost in the pipette tip.
10. Incubate for 30 min at 4 °C on a nutator, rotation wheel, or any device providing gentle agitation.


11. Place the tube back on the magnet support, wait for the beads to settle on the magnet side of the tube, and slowly aspirate the supernatant from the opposite side of the tube wall using either a long Pasteur pipette or an unfiltered 1 mL pipet connected to a vacuum source.
12. Remove the tube from the magnet support and wash the beads off the wall of the tube by delivering 10 mL of resuspension buffer containing DTT and protease inhibitors above the compacted beads. Then hold the tube at a 45-degree angle and gently swirl it to make sure that the beads are homogeneously dispersed, while preventing the wash buffer and beads from touching the tube cap. Repeat this step two more times for a total of three washes.
13. Wash the beads once in 10 mL of NH4OAc 0.1 M pH 7.5 as described above.

3.3.2 SM(PEG)2 Controlled Two-Step Reactions
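Before moving on to the cross-linking reactions, the quantities used for the large-scale ssAP of Subheading 3.3.1 can be cross-checked with a short calculation. The sketch below uses the Mex67-PrA/Mtr2-CH values from the text (10 g of grindate, 5 g per tube, 50 μL of beads per gram). The fill-volume check is a rough approximation that assumes 1 g of grindate occupies about 1 mL, which is an assumption of this sketch and not a value from the protocol.

```python
import math


def large_scale_plan(grindate_g, buffer_ratio, beads_ul_per_g=50.0,
                     grindate_per_tube_g=5.0, tube_capacity_ml=50.0):
    """Estimate tubes, buffer per tube, and bead volume for a large-scale ssAP."""
    n_tubes = math.ceil(grindate_g / grindate_per_tube_g)
    g_per_tube = grindate_g / n_tubes
    buffer_per_tube_ml = buffer_ratio * g_per_tube
    # Assumption: 1 g of grindate contributes roughly 1 mL to the tube volume.
    approx_fill_ml = buffer_per_tube_ml + g_per_tube
    return {
        "tubes": n_tubes,
        "buffer_per_tube_ml": buffer_per_tube_ml,
        "approx_fill_ml": approx_fill_ml,
        "fits_with_headroom": approx_fill_ml < tube_capacity_ml,
        "beads_ul": beads_ul_per_g * grindate_g,
    }


optimal = large_scale_plan(grindate_g=10.0, buffer_ratio=9.0)
practical = large_scale_plan(grindate_g=10.0, buffer_ratio=8.0)
print(optimal)
print(practical)
```

Under these assumptions the 9:1 ratio fills a 50 mL tube completely, whereas the 8:1 ratio (40 mL per 5 g) leaves headroom for vortexing and homogenization, which is consistent with the practical adjustment described in step 2.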

1. Wash the beads once with 20 mL of MRB containing salts, and two more times with 10 mL, taking care to wash all sides of the tube while delivering the buffer, to remove as much NH4OAc as possible from the sides of the tube and avoid its carryover to the next steps. While the beads are homogeneously dispersed in the last 10 mL MRB wash, transfer 200 μL of beads into a 1.5 mL LoBind microtube. Place the tube on the magnetic rack and remove the supernatant. Elute the complexes off the beads as in Subheading 3.2.2, step 7. This fraction will constitute the uncross-linked sample for western blotting and SDS-PAGE silver staining.
2. Resuspend the beads in 1 mL of MRB containing 0.1 mM SM(PEG)2, and gently tap and swirl the upright tube to disperse the beads.
3. Incubate the tube standing upright at room temperature for 30 min with gentle agitation (about 160 rpm). To achieve such a low rotation speed with the vortex, a 50 mL tube support can be taped to a vortex plate adaptor with a metal PCR block in between to add weight.
4. Add 10 mL of MRB to dilute the unreacted SM(PEG)2, place the tube on the magnet holder, and discard the supernatant.
5. After removing the MRB from step 4, with the tube off the magnetic rack, quickly disperse the beads by delivering 10 mL of NRB above the beads and swirling the tube before adding 35 more mL of NRB. It is important to disperse the beads rapidly while adding NRB since the NHS ester moiety is highly reactive at pH 8. Using a large volume for the NHS ester reaction keeps the beads further apart, minimizing the risk of cross-linking material between beads.
6. Incubate for 45–60 min at room temperature with the tube standing upright, slowly inverting the tube every few minutes.
7. Add 5 mL of Tris 1 M pH 8, and further incubate at room temperature for 15 min to quench the NHS ester reaction, inverting the tube every few minutes. Transfer 1 mL of the 50 mL suspension into a 1.5 mL LoBind microtube and elute the cross-linked sample as in Subheading 3.2.2, step 7. This will constitute the cross-linked fraction for western blotting and silver staining of SDS-PAGE.

3.3.3 On-Bead Trypsin Digestion of ssAP-anchXL Complexes

1. Equilibrate the bead-bound cross-linked complexes once in 10 mL of 50 mM sodium phosphate pH 8, place the tube back on the magnetic rack, and remove about 9 mL of the wash buffer.
2. Remove the 50 mL tube from the magnet and use the remaining 1 mL to transfer the beads into a 1.5 mL LoBind tube placed on a magnetic rack. Let the beads settle on the side of the microtube and use the supernatant to recover the remaining beads from the 50 mL tube until the buffer transferred is clear.
3. Discard the supernatant from the 1.5 mL tube and resuspend the resin with 355 μL of trypsin digestion buffer containing 5 μg of trypsin. Do not mix with the pipette tip, but invert the tube until all beads seem dispersed. Incubate on a slow spinning wheel overnight at 37 °C.
4. The next day, add 2 μg of trypsin and prolong the incubation for four more hours.
5. Quick-spin the tube at low speed (600–800 rcf) to avoid losing material in the tube cap, then place the tube in the magnet rack and transfer the supernatant containing the tryptic peptides into a fresh tube.
6. Add 20 μL of 5 M NaCl and 0.8 μL of 2.5 M imidazole pH 8 to the sample to reach final concentrations of 250 mM and 5 mM, respectively. Mix well and split the 400 μL tryptic digest into two LoBind 1.5 mL microtubes, 200 μL in each tube.
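The additions in step 6 follow from the usual dilution relation C1·V1 = C2·V2 applied to the 400 μL final digest volume. The helper below is a small sketch of that arithmetic; the function name is ours, and the stock and target values are those quoted in the step.

```python
def stock_volume_ul(stock_mM, final_mM, final_volume_ul):
    """Volume of stock needed so that it ends up at final_mM in final_volume_ul."""
    return final_mM * final_volume_ul / stock_mM


# 400 uL tryptic digest, adjusted to 250 mM NaCl and 5 mM imidazole.
nacl_ul = stock_volume_ul(stock_mM=5000.0, final_mM=250.0, final_volume_ul=400.0)
imidazole_ul = stock_volume_ul(stock_mM=2500.0, final_mM=5.0, final_volume_ul=400.0)
print(nacl_ul, imidazole_ul)
```

If the digest volume differs from 400 μL, the same function gives the adjusted NaCl and imidazole volumes needed to match the UptiTip equilibration conditions.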

3.3.4 IMAC Enrichment of Anchored chXL Peptides

1. Use one UptiTip per 200 μL of tryptic digest. Using a P200 pipet, wet a nickel-coated UptiTip by aspirating 200 μL of MS grade water and expelling it into a waste receptacle. Repeat four times.
2. Pre-wash the tip three times with 200 μL of UPW buffer, discarding the buffer into a waste receptacle.
3. Equilibrate the tip by washing it five times with 200 μL of UEq buffer, discarding the buffer as above.
4. With a maximum of 200 μL of sample per UptiTip, bind the CH-tagged peptides to the nickel-coated UptiTip by pipetting up and down 100 times, with the pipette set at a slightly lower volume than the sample volume to avoid creating bubbles.
5. Wash the tip 30 times with 20 μL of equilibration buffer, discarding the aspirated buffer into a waste receptacle.
6. Wash the tip three times with 20 μL of UWB, discarding the buffer as above.
7. To elute the enriched CH-tagged peptides, prepare three LoBind tubes with 10 μL of UEB in each tube. Proceed by pipetting the 10 μL of UEB up and down 10 times in each tube, then pool all three elution fractions (30 μL total).
8. Pool the eluates from both UptiTips used (60 μL total).
9. Dry the samples in a vacuum concentrator (SpeedVac) at room temperature, leaving the tube lid open and pointing toward the exterior of the rotor. Store the dried samples at −80 °C or proceed immediately to the next step.

3.3.5 Mass Spectrometry

1. Samples are resuspended in 12 μL of MSB. The use of TCEP (see Subheading 2) as reducing agent prevents the formation of cysteine disulfide bridges while being an acidic, MS-compatible reagent. To avoid over-diluting the samples, it is best to first resuspend the sample in a minimal volume such as 10–12 μL. The sample may be too concentrated, and we recommend first injecting 1 or 2 μL of a one-fifth dilution for a quick MS gradient test run to evaluate the quality and quantity of the sample and to determine the volume to be injected to reach near LC and MS signal saturation.
2. Optional: As unreacted CH-tag peptides, as well as anchored SM(PEG)2 with hydrolyzed or Tris-quenched NHS ester groups, may be overrepresented in the samples, peptide size exclusion may be performed on the samples [70] to remove them. Ideally, use synthetic CH-tag peptides on which SM(PEG)2 has been anchored and then hydrolyzed as a cut-off for the size exclusion enrichment of anchXL cross-links, along with a set of different synthetic peptides for chromatography calibration. We recommend proceeding with this step if the estimated ratio of interprotein cross-links over free CH-tagged and SM(PEG)2-anchored CH-tagged proteins is very low. In such circumstances, the free CH-tag and anchored cross-linker peptide precursors will saturate the LC-MS instrument, limiting the volume that can be injected and thus lowering the sensitivity of the method.
3. The samples are loaded directly onto a PicoFrit fused silica capillary column (15 cm × 75 μm i.d.; New Objective), self-packed with C18 reverse-phase media using a high-pressure packing cell. The column is installed on the Easy-nLC II system (Proxeon Biosystems) and coupled to the Fusion mass spectrometer equipped with a Proxeon Nanospray Flex ion source (Thermo Fisher Scientific).
4. Samples are then loaded on the column at 600 nL/min and eluted at 250 nL/min using a 2-slope gradient. The gradient may need to be adjusted according to the complexity and concentration of the samples. For the Mex67-PrA/Mtr2-CH strain, the samples are loaded in 10% of solution B, and then eluted by increasing solution B to 40% over 40 min, and then to 85% over 15 min.
5. Orbitrap Fusion Tribrid mass spectrometer method settings: the mass spectrometer is operated in data-dependent acquisition mode with a locked mass (371.101233 Da) using a Top 3 s cycle method. The MS1 full scan range of 360–1560 m/z is acquired at 120K Orbitrap resolution with an automatic gain control (AGC) target of 1 × 10^6 coupled to a maximum injection time of 100 ms. The minimum ion intensity threshold is set to 1 × 10^4. The dynamic exclusion criteria are set to 4 scans within 12 s, with an exclusion time of 40 s. Mass tolerance of precursors is set to 10 ppm. The quadrupole precursor isolation window is left at the default (1.6 m/z), and only precursor ions carrying between 3 and 7 charges are fragmented by HCD with a collision energy of 30% for MS2 detection. MS2 fragment ions are also detected in the Orbitrap, at a resolution of 30,000. The MS2 AGC target and maximum injection time are set to 1 × 10^5 and 80 ms, respectively. The MS2 minimum ion intensity threshold is set to 1 × 10^4. A precursor ion exclusion list is used to avoid wasting MS2 scan time on uncross-linked CH-tag peptides, or on cysteine-anchored SM(PEG)2 mono-linked peptides having either a hydrolyzed or a Tris-quenched NHS ester end (see Table 1) (see Note 2).

3.3.6 ssAP-anchXL-MS Data Analysis Using pLink2

Thanks to the pFind Studio team that developed pLink2, the software has been modified from version 2.3.9 onward to efficiently identify anchored chXL peptides. pLink2 can be downloaded from the pFind Studio website at http://pfind.ict.ac.cn/index.html.

1. After installing the software, first open the pConfig file to:
(a) Add your custom FASTA library, adding the correct CH-tag sequence to the appropriate protein in the FASTA file (see Fig. 1b). A decoy database will automatically be created.
(b) If it is not listed, add the SM(PEG)2 cross-linker to the linkers list according to the specifications below. Different entries can be added for serine, threonine, and tyrosine as Betasites without using the N-terminal representing bracket.
Alphasites = C.
Betasites = K.
LinkerMass = 328.127.
MonoMass = 310.117.
LinkerComposition = C(18)O(9)H(23)N(3).
MonoComposition = C(14)O(6)H(18)N(2).
2. After saving the new database and cross-linker, open pLink2 and proceed as follows:
(a) Create a new task and modify the name and file location if needed.
(b) From the MS data tab: import your RAW files directly and leave the data extraction parameters at their defaults. If a mass spectrometer from another vendor was used, convert the output files to MGF file format using the msconvert tool from ProteoWizard [71].
(c) From the identification tab: use the conventional cross-linking flow and determine the process number (CPU threads) to be used during the analysis (refer to your computer CPU specifications). Choose the linker by selecting SM(PEG)2 from the list in the right box. If you forgot to add the FASTA database to pLink2 in step 1a above, it is also possible to add it here. To do so, from the database search section, select Customize Database. A database name for pLink2 can then be entered along with the path to the FASTA file. The enzyme selection can be left as trypsin regardless of whether trypsin or trypsin/Lys-C was used to generate the peptides for MS. The difference between the trypsin and Trypsin_P enzyme selections is that Trypsin_P will not consider K/R cleavages after prolines; it is only relevant to trypsin, not trypsin/Lys-C. While fixed modifications should be left empty, we strongly recommend using methionine oxidation (Oxidation [M]) as a variable modification. Under the results filter, the peptide spectrum match false discovery rate (FDR) threshold can be changed and calculated either from separate files or across experiments.
(d) Review the search settings in the summary tab, save the task below, then start the process.
3. Optional: A linear peptide search can be done with pFind3 [72], which has a function for discovering unexpected modifications. This could help determine whether other variable modifications should be included.
4. Output: when the search is done, a web page named general.html will automatically open, showing the search results with links to more details from files generated in the htmls folder. A folder named reports is also generated and contains all the search results in csv file format for easy manipulation of the data.

Table 1 Ion exclusion list

m/z          Formula                                z    ID
852.8516     C(72), H(89), N(33), O(16), S(1)       2    CDPHHHHHHHHHH
568.9035     C(72), H(89), N(33), O(16), S(1)       3    CDPHHHHHHHHHH
426.9294     C(72), H(89), N(33), O(16), S(1)       4    CDPHHHHHHHHHH
1016.9151    C(86), H(109), N(35), O(23), S(1)      2    OH-SM(PEG)2-CDPHHHHHHHHHH
678.2791     C(86), H(109), N(35), O(23), S(1)      3    OH-SM(PEG)2-CDPHHHHHHHHHH
508.9612     C(86), H(109), N(35), O(23), S(1)      4    OH-SM(PEG)2-CDPHHHHHHHHHH
407.3704     C(86), H(109), N(35), O(23), S(1)      5    OH-SM(PEG)2-CDPHHHHHHHHHH
1068.4468    C(90), H(118), N(36), O(25), S(1)      2    Tris-SM(PEG)2-CDPHHHHHHHHHH
712.6336     C(90), H(118), N(36), O(25), S(1)      3    Tris-SM(PEG)2-CDPHHHHHHHHHH
534.7270     C(90), H(118), N(36), O(25), S(1)      4    Tris-SM(PEG)2-CDPHHHHHHHHHH
427.9831     C(90), H(118), N(36), O(25), S(1)      5    Tris-SM(PEG)2-CDPHHHHHHHHHH
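If a different CH-tag or SM(PEG)n length is used, the precursor exclusion list of Table 1 has to be recalculated (see Note 2). The sketch below shows one way to recompute m/z values from an elemental composition, using standard monoisotopic masses and the proton mass; it is an illustration and not the authors' tool. Note that the CDPH10 composition used here is C72H89N33O16S, that is, including the sulfur atom of the cysteine residue, which is what the tabulated m/z values for that peptide correspond to.

```python
import re

# Standard monoisotopic masses (Da) and the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
    "S": 31.97207100,
}
PROTON = 1.00727646688


def parse_formula(text):
    """Parse a composition written as 'C(72), H(89), ...' into {element: count}."""
    return {el: int(n) for el, n in re.findall(r"([A-Z][a-z]?)\((\d+)\)", text)}


def monoisotopic_mass(formula):
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mz(formula, charge):
    """m/z of the [M + zH]z+ ion."""
    return (monoisotopic_mass(formula) + charge * PROTON) / charge


species = {
    "CDPH10": ("C(72), H(89), N(33), O(16), S(1)", (2, 3, 4)),
    "OH-SM(PEG)2-CDPH10": ("C(86), H(109), N(35), O(23), S(1)", (2, 3, 4, 5)),
    "Tris-SM(PEG)2-CDPH10": ("C(90), H(118), N(36), O(25), S(1)", (2, 3, 4, 5)),
}
for name, (formula, charges) in species.items():
    for z in charges:
        print(f"{name}\tz={z}\t{mz(formula, z):.4f}")
```

The printed values agree with Table 1 to within about 0.0001 m/z; small differences in the last digit can arise from rounding and from the exact mass constants used.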

4 Notes

1. It is not possible to establish a ratio of free CH-tagged protein to cross-linked CH-tagged proteins using this approach. In that case, refer to the western blotting results from the SM(PEG)2 cross-linking titration in Subheading 3.2.2.
2. These values correspond to a POI-independent CH-tag positioned at a C-terminal end (POI-RCDPH10). If an N-terminal POI-independent CH-tag, POI terminal peptide-dependent CH-tags, or an SM(PEG)n of a different length than SM(PEG)2 is used, then these values should be recalculated (see Fig. 1).


Acknowledgments

We thank Denis Faubert for mass spectrometry support during the initial development of the method as well as the pFind Studio team, especially Shengbo Fan and Pengzhi Mao, for adapting pLink and pLink2, respectively. C.T. is supported by funding awarded to M.O. from the Canadian Institutes for Health Research (PJT153313).

References

1. Piersimoni L, Sinz A (2020) Cross-linking/mass spectrometry at the crossroads. Anal Bioanal Chem 412:5981–5987. https://doi.org/10.1007/s00216-020-02700-x
2. Leitner A, Faini M, Stengel F, Aebersold R (2016) Crosslinking and mass spectrometry: an integrated technology to understand the structure and function of molecular machines. Trends Biochem Sci 41:20–32. https://doi.org/10.1016/j.tibs.2015.10.008
3. Yu C, Huang L (2018) Cross-linking mass spectrometry (XL-MS): an emerging technology for interactomics and structural biology. Anal Chem 90:144–165. https://doi.org/10.1021/acs.analchem.7b04431
4. O’Reilly FJ, Rappsilber J (2018) Cross-linking mass spectrometry: methods and applications in structural, molecular and systems biology. Nat Struct Mol Biol 25:1000–1008. https://doi.org/10.1038/s41594-018-0147-0
5. Tang X, Wippel HH, Chavez JD, Bruce JE (2021) Crosslinking mass spectrometry: a link between structural biology and systems biology. Protein Sci 30:773–784. https://doi.org/10.1002/pro.4045
6. Kim D, Setiaputra D, Jung T, Chung J, Leitner A, Yoon J, Aebersold R, Hebert H, Yip CK, Song J-J (2016) Molecular architecture of yeast chromatin assembly factor 1. Sci Rep 6:26702. https://doi.org/10.1038/srep26702
7. Harrer N, Schindler CEM, Bruetzel LK, Forné I, Ludwigsen J, Imhof A, Zacharias M, Lipfert J, Mueller-Planitz F (2018) Structural architecture of the nucleosome remodeler ISWI determined from cross-linking, mass spectrometry, SAXS, and modeling. Structure 26:282–294.e6. https://doi.org/10.1016/j.str.2017.12.015
8. Kang JJ, Faubert D, Boulais J, Francis NJ (2020) DNA binding reorganizes the intrinsically disordered C-terminal region of PSC in Drosophila PRC1. J Mol Biol 432:4856–4871. https://doi.org/10.1016/j.jmb.2020.07.002

9. Mashtalir N, Suzuki H, Farrell DP, Sankar A, Luo J, Filipovski M, D’Avino AR, St. Pierre R, Valencia AM, Onikubo T, Roeder RG, Han Y, He Y, Ranish JA, DiMaio F, Walz T, Kadoch C (2020) A structural model of the endogenous human BAF complex informs disease mechanisms. Cell 183:802–817.e24. https://doi.org/10.1016/j.cell.2020.09.051
10. Structure and subunit topology of the INO80 chromatin remodeler and its nucleosome complex: cell. https://www.cell.com/fulltext/S0092-8674(13)01010-6. Accessed 3 Jun 2021
11. Nguyen VQ, Ranjan A, Stengel F, Wei D, Aebersold R, Wu C, Leschziner AE (2013) Molecular architecture of the ATP-dependent chromatin-remodeling complex SWR1. Cell 154:1220–1231. https://doi.org/10.1016/j.cell.2013.08.018
12. Kloet SL, Baymaz HI, Makowski M, Groenewold V, Jansen PWTC, Berendsen M, Niazi H, Kops GJ, Vermeulen M (2015) Towards elucidating the stability, dynamics and architecture of the nucleosome remodeling and deacetylase complex by using quantitative interaction proteomics. FEBS J 282:1774–1785. https://doi.org/10.1111/febs.12972
13. Fasci D, van Ingen H, Scheltema RA, Heck AJR (2018) Histone interaction landscapes visualized by crosslinking mass spectrometry in intact cell nuclei. Mol Cell Proteomics 17:2018–2033. https://doi.org/10.1074/mcp.RA118.000924
14. Chen ZA, Jawhari A, Fischer L, Buchen C, Tahir S, Kamenski T, Rasmussen M, Lariviere L, Bukowski-Wills J-C, Nilges M, Cramer P, Rappsilber J (2010) Architecture of the RNA polymerase II-TFIIF complex revealed by cross-linking and mass spectrometry. EMBO J 29:717–726. https://doi.org/10.1038/emboj.2009.401
15. Blattner C, Jennebach S, Herzog F, Mayer A, Cheung ACM, Witte G, Lorenzen K, Hopfner K-P, Heck AJR, Aebersold R, Cramer P (2011) Molecular basis of Rrn3-regulated RNA polymerase I initiation and cell growth. Genes Dev 25:2093–2105. https://doi.org/10.1101/gad.17363311
16. Jennebach S, Herzog F, Aebersold R, Cramer P (2012) Crosslinking-MS analysis reveals RNA polymerase I domain architecture and basis of rRNA cleavage. Nucleic Acids Res 40:5591–5601. https://doi.org/10.1093/nar/gks220
17. Luo J, Fishburn J, Hahn S, Ranish J (2012) An integrated chemical cross-linking and mass spectrometry approach to study protein complex architecture and function. Mol Cell Proteomics 11:M111.008318. https://doi.org/10.1074/mcp.M111.008318
18. Murakami K, Elmlund H, Kalisman N, Bushnell DA, Adams CM, Azubel M, Elmlund D, Levi-Kalisman Y, Liu X, Gibbons BJ, Levitt M, Kornberg RD (2013) Architecture of an RNA polymerase II transcription pre-initiation complex. Science 342:1238724. https://doi.org/10.1126/science.1238724
19. Mühlbacher W, Sainsbury S, Hemann M, Hantsche M, Neyer S, Herzog F, Cramer P (2014) Conserved architecture of the core RNA polymerase II initiation complex. Nat Commun 5:4310. https://doi.org/10.1038/ncomms5310
20. Martinez-Rucobo FW, Kohler R, van de Waterbeemd M, Heck AJR, Hemann M, Herzog F, Stark H, Cramer P (2015) Molecular basis of transcription-coupled pre-mRNA capping. Mol Cell 58:1079–1089. https://doi.org/10.1016/j.molcel.2015.04.004
21. Robinson PJJ, Bushnell DA, Trnka MJ, Burlingame AL, Kornberg RD (2012) Structure of the mediator head module bound to the carboxy-terminal domain of RNA polymerase II. Proc Natl Acad Sci U S A 109:17931–17935. https://doi.org/10.1073/pnas.1215241109
22. Plaschka C, Larivière L, Wenzeck L, Seizl M, Hemann M, Tegunov D, Petrotchenko EV, Borchers CH, Baumeister W, Herzog F, Villa E, Cramer P (2015) Architecture of the RNA polymerase II-mediator core initiation complex. Nature 518:376–380. https://doi.org/10.1038/nature14229
23. Robinson PJ, Trnka MJ, Pellarin R, Greenberg CH, Bushnell DA, Davis R, Burlingame AL, Sali A, Kornberg RD (2015) Molecular architecture of the yeast mediator complex. eLife 4:e08719. https://doi.org/10.7554/eLife.08719
24. Luo J, Cimermancic P, Viswanath S, Ebmeier CC, Kim B, Dehecq M, Raman V, Greenberg CH, Pellarin R, Sali A, Taatjes DJ, Hahn S, Ranish J (2015) Architecture of the human and yeast general transcription and DNA repair factor TFIIH. Mol Cell 59:794–806. https://doi.org/10.1016/j.molcel.2015.07.016
25. Murakami K, Tsai K-L, Kalisman N, Bushnell DA, Asturias FJ, Kornberg RD (2015) Structure of an RNA polymerase II preinitiation complex. Proc Natl Acad Sci U S A 112:13543–13548. https://doi.org/10.1073/pnas.1518255112
26. Sadian Y, Tafur L, Kosinski J, Jakobi AJ, Wetzel R, Buczak K, Hagen WJ, Beck M, Sachse C, Müller CW (2017) Structural insights into transcription initiation by yeast RNA polymerase I. EMBO J 36:2698–2709. https://doi.org/10.15252/embj.201796958
27. Wu C-C, Herzog F, Jennebach S, Lin Y-C, Pai C-Y, Aebersold R, Cramer P, Chen H-T (2012) RNA polymerase III subunit architecture and implications for open promoter complex formation. Proc Natl Acad Sci U S A 109:19232–19237. https://doi.org/10.1073/pnas.1211665109
28. Fernandez-Martinez J, Kim SJ, Shi Y, Upla P, Pellarin R, Gagnon M, Chemmama IE, Wang J, Nudelman I, Zhang W, Williams R, Rice WJ, Stokes DL, Zenklusen D, Chait BT, Sali A, Rout MP (2016) Structure and function of the nuclear pore complex cytoplasmic mRNA export platform. Cell 167:1215–1228.e25. https://doi.org/10.1016/j.cell.2016.10.028
29. Zhong X, Wu X, Schweppe DK, Chavez JD, Mathay M, Eng JK, Keller A, Bruce JE (2020) In vivo cross-linking MS reveals conservation in OmpA linkage to different classes of β-lactamase enzymes. J Am Soc Mass Spectrom 31:190–195. https://doi.org/10.1021/jasms.9b00021
30. Wittig S, Ganzella M, Barth M, Kostmann S, Riedel D, Pérez-Lara Á, Jahn R, Schmidt C (2021) Cross-linking mass spectrometry uncovers protein interactions and functional assemblies in synaptic vesicle membranes. Nat Commun 12:858. https://doi.org/10.1038/s41467-021-21102-w
31. Kim SJ, Fernandez-Martinez J, Nudelman I, Shi Y, Zhang W, Raveh B, Herricks T, Slaughter BD, Hogan JA, Upla P, Chemmama IE, Pellarin R, Echeverria I, Shivaraju M, Chaudhury AS, Wang J, Williams R, Unruh JR, Greenberg CH, Jacobs EY, Yu Z, de la Cruz MJ, Mironska R, Stokes DL, Aitchison JD, Jarrold MF, Gerton JL, Ludtke SJ, Akey CW, Chait BT, Sali A, Rout MP (2018) Integrative structure and functional anatomy of a nuclear pore complex. Nature 555:475–482. https://doi.org/10.1038/nature26003


32. Shi Y, Fernandez-Martinez J, Tjioe E, Pellarin R, Kim SJ, Williams R, Schneidman-Duhovny D, Sali A, Rout MP, Chait BT (2014) Structural characterization by cross-linking reveals the detailed architecture of a coatomer-related heptameric module from the nuclear pore complex. Mol Cell Proteomics 13:2927–2943. https://doi.org/10.1074/mcp.M114.041673
33. Thierbach K, von Appen A, Thoms M, Beck M, Flemming D, Hurt E (2013) Protein interfaces of the conserved Nup84 complex from Chaetomium thermophilum shown by crosslinking mass spectrometry and electron microscopy. Structure 21:1672–1682. https://doi.org/10.1016/j.str.2013.07.004
34. Kim SJ, Fernandez-Martinez J, Sampathkumar P, Martel A, Matsui T, Tsuruta H, Weiss TM, Shi Y, Markina-Inarrairaegui A, Bonanno JB, Sauder JM, Burley SK, Chait BT, Almo SC, Rout MP, Sali A (2014) Integrative structure-function mapping of the nucleoporin Nup133 suggests a conserved mechanism for membrane anchoring of the nuclear pore complex. Mol Cell Proteomics 13:2911–2926. https://doi.org/10.1074/mcp.M114.040915
35. von Appen A, Kosinski J, Sparks L, Ori A, DiGuilio AL, Vollmer B, Mackmull M-T, Banterle N, Parca L, Kastritis P, Buczak K, Mosalaganti S, Hagen W, Andres-Pons A, Lemke EA, Bork P, Antonin W, Glavy JS, Bui KH, Beck M (2015) In situ structural analysis of the human nuclear pore complex. Nature 526:140–143. https://doi.org/10.1038/nature15381
36. Erzberger JP, Stengel F, Pellarin R, Zhang S, Schaefer T, Aylett CHS, Cimermančič P, Boehringer D, Sali A, Aebersold R, Ban N (2014) Molecular architecture of the 40S·eIF1·eIF3 translation initiation complex. Cell 158:1123–1135. https://doi.org/10.1016/j.cell.2014.07.044
37. Lauber MA, Reilly JP (2011) Structural analysis of a prokaryotic ribosome using a novel amidinating cross-linker and mass spectrometry. J Proteome Res 10(8):3604–3616. https://pubs.acs.org/doi/10.1021/pr200260n. Accessed 3 Jun 2021
38. Lauber MA, Rappsilber J, Reilly JP (2012) Dynamics of ribosomal protein S1 on a bacterial ribosome with cross-linking and mass spectrometry. Mol Cell Proteomics 11:1965–1976. https://doi.org/10.1074/mcp.M112.019562
39. Greber BJ, Boehringer D, Leibundgut M, Bieri P, Leitner A, Schmitz N, Aebersold R, Ban N (2014) The complete structure of the large subunit of the mammalian mitochondrial ribosome. Nature 515:283–286. https://doi.org/10.1038/nature13895
40. Greber BJ, Bieri P, Leibundgut M, Leitner A, Aebersold R, Boehringer D, Ban N (2015) Ribosome. The complete structure of the 55S mammalian mitochondrial ribosome. Science 348:303–308. https://doi.org/10.1126/science.aaa3872
41. Kiosze-Becker K, Ori A, Gerovac M, Heuer A, Nürenberg-Goloub E, Rashid UJ, Becker T, Beckmann R, Beck M, Tampé R (2016) Structure of the ribosome post-recycling complex probed by chemical cross-linking and mass spectrometry. Nat Commun 7:13248. https://doi.org/10.1038/ncomms13248
42. Tüting C, Iacobucci C, Ihling CH, Kastritis PL, Sinz A (2020) Structural analysis of 70S ribosomes by cross-linking/mass spectrometry reveals conformational plasticity. Sci Rep 10:12618. https://doi.org/10.1038/s41598-020-69313-3
43. Götze M, Iacobucci C, Ihling CH, Sinz A (2019) A simple cross-linking/mass spectrometry workflow for studying system-wide protein interactions. Anal Chem 91:10236–10244. https://doi.org/10.1021/acs.analchem.9b02372
44. de Oliveira LC, Volpon L, Rahardjo AK, Osborne MJ, Culjkovic-Kraljacic B, Trahan C, Oeffinger M, Kwok BH, Borden KLB (2019) Structural studies of the eIF4E–VPg complex reveal a direct competition for capped RNA: implications for translation. Proc Natl Acad Sci U S A 116:24056–24065. https://doi.org/10.1073/pnas.1904752116
45. Sharon M, Taverner T, Ambroggio XI, Deshaies RJ, Robinson CV (2006) Structural organization of the 19S proteasome lid: insights from MS of intact complexes. PLoS Biol 4:e267. https://doi.org/10.1371/journal.pbio.0040267
46. Bohn S, Beck F, Sakata E, Walzthoeni T, Beck M, Aebersold R, Förster F, Baumeister W, Nickell S (2010) Structure of the 26S proteasome from Schizosaccharomyces pombe at subnanometer resolution. Proc Natl Acad Sci U S A 107:20992–20997. https://doi.org/10.1073/pnas.1015530107
47. Lasker K, Förster F, Bohn S, Walzthoeni T, Villa E, Unverdorben P, Beck F, Aebersold R, Sali A, Baumeister W (2012) Molecular architecture of the 26S proteasome holocomplex determined by an integrative approach. Proc Natl Acad Sci U S A 109:1380–1387. https://doi.org/10.1073/pnas.1120559109
48. Kao A, Randall A, Yang Y, Patel VR, Kandur W, Guan S, Rychnovsky SD, Baldi P, Huang L (2012) Mapping the structural topology of the yeast 19S proteasomal regulatory particle using chemical cross-linking and probabilistic modeling. Mol Cell Proteomics 11:1566–1577. https://doi.org/10.1074/mcp.M112.018374
49. Yang B, Wu Y-J, Zhu M, Fan S-B, Lin J, Zhang K, Li S, Chi H, Li Y-X, Chen H-F, Luo S-K, Ding Y-H, Wang L-H, Hao Z, Xiu L-Y, Chen S, Ye K, He S-M, Dong M-Q (2012) Identification of cross-linked peptides from complex samples. Nat Methods 9:904–906. https://doi.org/10.1038/nmeth.2099
50. Pflieger D, Jünger MA, Müller M, Rinner O, Lee H, Gehrig PM, Gstaiger M, Aebersold R (2008) Quantitative proteomic analysis of protein complexes: concurrent identification of interactors and their state of phosphorylation. Mol Cell Proteomics 7:326–346. https://doi.org/10.1074/mcp.M700282-MCP200
51. Xu H, Hsu P-H, Zhang L, Tsai M-D, Freitas MA (2010) Database search algorithm for identification of intact cross-links in proteins and peptides using tandem mass spectrometry. J Proteome Res 9:3384–3393. https://doi.org/10.1021/pr100369y
52. Buncherd H, Roseboom W, de Koning LJ, de Koster CG, de Jong L (2014) A gas phase cleavage reaction of cross-linked peptides for protein complex topology studies by peptide fragment fingerprinting from large sequence database. J Proteomics 108:65–77. https://doi.org/10.1016/j.jprot.2014.05.003
53. Liu F, Rijkers DTS, Post H, Heck AJR (2015) Proteome-wide profiling of protein assemblies by cross-linking mass spectrometry. Nat Methods 12:1179–1184. https://doi.org/10.1038/nmeth.3603
54. Liu F, Lössl P, Scheltema R, Viner R, Heck AJR (2017) Optimized fragmentation schemes and data analysis strategies for proteome-wide cross-link identification. Nat Commun 8:15473. https://doi.org/10.1038/ncomms15473
55. Tan D, Li Q, Zhang M-J, Liu C, Ma C, Zhang P, Ding Y-H, Fan S-B, Tao L, Yang B, Li X, Ma S, Liu J, Feng B, Liu X, Wang H-W, He S-M, Gao N, Ye K, Dong M-Q, Lei X (2016) Trifunctional cross-linker for mapping protein-protein interaction networks and comparing protein conformational states. eLife 5:e12509. https://doi.org/10.7554/eLife.12509
56. Zhang H, Tang X, Munske GR, Tolic N, Anderson GA, Bruce JE (2009) Identification of protein-protein interactions and topologies in living cells with chemical cross-linking and mass spectrometry. Mol Cell Proteomics 8:409–420. https://doi.org/10.1074/mcp.M800232-MCP200
57. Navare AT, Chavez JD, Zheng C, Weisbrod CR, Eng JK, Siehnel R, Singh PK, Manoil C, Bruce JE (2015) Probing the protein interaction network of Pseudomonas aeruginosa cells by chemical cross-linking mass spectrometry. Structure 23:762–773. https://doi.org/10.1016/j.str.2015.01.022
58. Wu X, Chavez JD, Schweppe DK, Zheng C, Weisbrod CR, Eng JK, Murali A, Lee SA, Ramage E, Gallagher LA, Kulasekara HD, Edrozo ME, Kamischke CN, Brittnacher MJ, Miller SI, Singh PK, Manoil C, Bruce JE (2016) In vivo protein interaction network analysis reveals porin-localized antibiotic inactivation in Acinetobacter baumannii strain AB5075. Nat Commun 7:13414. https://doi.org/10.1038/ncomms13414
59. Zhong X, Navare AT, Chavez JD, Eng JK, Schweppe DK, Bruce JE (2017) Large-scale and targeted quantitative cross-linking MS using isotope-labeled protein interaction reporter (PIR) cross-linkers. J Proteome Res 16:720–727. https://doi.org/10.1021/acs.jproteome.6b00752
60. Chavez JD, Weisbrod CR, Zheng C, Eng JK, Bruce JE (2013) Protein interactions, posttranslational modifications and topologies in human cells. Mol Cell Proteomics 12:1451–1467. https://doi.org/10.1074/mcp.M112.024497
61. Schweppe DK, Chavez JD, Lee CF, Caudal A, Kruse SE, Stuppard R, Marcinek DJ, Shadel GS, Tian R, Bruce JE (2017) Mitochondrial protein interactome elucidated by chemical cross-linking mass spectrometry. Proc Natl Acad Sci U S A 114:1732–1737. https://doi.org/10.1073/pnas.1617220114
62. de Jong L, de Koning EA, Roseboom W, Buncherd H, Wanner MJ, Dapic I, Jansen PJ, van Maarseveen JH, Corthals GL, Lewis PJ, Hamoen LW, de Koster CG (2017) In-culture cross-linking of bacterial cells reveals large-scale dynamic protein-protein interactions at the peptide level. J Proteome Res 16:2457–2471. https://doi.org/10.1021/acs.jproteome.7b00068
63. Trahan C, Oeffinger M (2016) Targeted crosslinking-mass spectrometry determines vicinal interactomes within heterogeneous RNP complexes. Nucleic Acids Res 44:1354–1369. https://doi.org/10.1093/nar/gkv1366
64. Oeffinger M, Wei KE, Rogers R, DeGrasse JA, Chait BT, Aitchison JD, Rout MP (2007) Comprehensive analysis of diverse ribonucleoprotein complexes. Nat Methods 4:951–956. https://doi.org/10.1038/nmeth1101


65. Trahan C, Oeffinger M (2021) Single-step affinity purification (ssAP) and mass spectrometry of macromolecular complexes in the yeast S. cerevisiae. Methods Mol Biol 1361:265–287
66. Miseta A, Csutora P (2000) Relationship between the occurrence of cysteine in proteins and the complexity of organisms. Mol Biol Evol 17:1232–1239. https://doi.org/10.1093/oxfordjournals.molbev.a026406
67. Krishna MMG, Englander SW (2005) The N-terminal to C-terminal motif in protein folding and function. Proc Natl Acad Sci U S A 102:1053–1058. https://doi.org/10.1073/pnas.0409114102
68. Laughery MF, Hunter T, Brown A, Hoopes J, Ostbye T, Shumaker T, Wyrick JJ (2015) New vectors for simple and streamlined CRISPR-Cas9 genome editing in Saccharomyces cerevisiae. Yeast 32:711–720. https://doi.org/10.1002/yea.3098
69. Hofmann T, Fischer AW, Meiler J, Kalkhof S (2015) Protein structure prediction guided by crosslinking restraints – A systematic evaluation of the impact of the crosslinking spacer length. Methods 89:79–90. https://doi.org/10.1016/j.ymeth.2015.05.014
70. Leitner A, Reischl R, Walzthoeni T, Herzog F, Bohn S, Förster F, Aebersold R (2012) Expanding the chemical cross-linking toolbox by the use of multiple proteases and enrichment by size exclusion chromatography. Mol Cell Proteomics 11:M111.014126. https://doi.org/10.1074/mcp.M111.014126
71. Kessner D, Chambers M, Burke R, Agus D, Mallick P (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24:2534–2536. https://doi.org/10.1093/bioinformatics/btn323
72. Chi H, Liu C, Yang H, Zeng W-F, Wu L, Zhou W-J, Wang R-M, Niu X-N, Ding Y-H, Zhang Y, Wang Z-W, Chen Z-L, Sun R-X, Liu T, Tan G-M, Dong M-Q, Xu P, Zhang P-H, He S-M (2018) Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat Biotechnol 36:1059–1061. https://doi.org/10.1038/nbt.4236

Chapter 14

A Crosslinking Mass Spectrometry Protocol for the Structural Analysis of Microtubule-Associated Proteins

Atefeh Rafiei and David C. Schriemer

Abstract

Microtubule-associated proteins (MAPs) engage microtubules (MTs) to regulate both the MT state and a wide variety of cytoskeletal functions. A comprehensive understanding of MAP function requires the structural characterization of the physical contacts MAPs make with other proteins, particularly when engaged with the MT lattice. Most interactions between MAPs and MTs evade classical structural determination techniques, as the interactions can be both heterogeneous and sub-stoichiometric. Crosslinking mass spectrometry (XL-MS) can aid in MAP-MT structure analysis by providing a wealth of residue-based distance restraints. This protocol provides an XL-MS workflow for accurate and unbiased sampling of an equilibrated MAP-MT interaction, involving modifications to the preparation and validation of a MAP-MT construct suitable for crosslinking with fast-sampling heterobifunctional crosslinkers. The distance restraints obtained by this protocol can be used to generate accurate models assembled with an integrative structural modeling approach.

Key words Microtubules, Microtubule-associated protein, Structural mass spectrometry, Crosslinking mass spectrometry, Photo-crosslinking, Protein–protein interaction analysis, Integrative modeling

1 Introduction

Microtubule-associated proteins (MAPs) are a group of proteins that interact with microtubules (MTs) to regulate a wide variety of functions. MAPs can fine-tune microtubule structure and regulate dynamic instability, which is a hallmark of the cytoskeleton, but can also serve as an interface between MTs and different cellular components. These include the cell cortex, kinetochores, centrosomes, and a wide variety of “payloads” that can traffic along MTs. Hundreds of proteins have been classified as MAPs and many possess a high degree of disorder, making structural analysis by classical techniques quite challenging. Some examples include MT stabilizers such as tau [1], MAP2 [2], doublecortin [3], and

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_14, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


non-neuronal MAP4 [4] as well as destabilizers like stathmin [5] and XKCM1 [6] and capping proteins that are concentrated at MT ends like EB1 [7], XMAP215 [8], CLIP-170 [9, 10], and γ-TuRC [11]. Many motor proteins also engage the MT lattice in a directional manner (e.g., kinesins [12] and dyneins [13]), using motor domains that are linked in dimeric associations through weakly ordered distal domains. MAP functions are defined in large part by these higher-order associations and disordered domains, often involving many other proteins in larger complexes that are specifically nucleated by the MT lattice itself. Understanding their function requires structural analysis, ideally in situ, but minimally on a system that reconstitutes the full regulatory set of interacting proteins on the lattice.

Cryo-EM has been used to resolve the structure of several MAP-MT interactions [14–21], but in many cases these are partial determinations given the technology’s difficulty in imaging heterogeneous systems, particularly ones that possess a degree of disorder. It also struggles with systems that inherently involve partial lattice occupation and non-symmetrical associations. New techniques are required to supplement cryo-EM and take advantage of integrative methods in structure determination, where all available complementary structural information is combined through modeling.

Integrative structural modeling can be considered an optimization exercise, which involves sampling all possible combinations of elements to reach a final model that best fits the input information [22]. The success of modeling is determined by agreement between the input structural data and the final model (Fig. 1), without exceeding the bounds of the precision of the underlying techniques, and in this regard it is no different from any other structural technique. However, in the integrative approach, input structural data from disparate sources are used to generate a scoring function to guide optimization. The approach can accept experimental data as well as physical theories and statistical analysis to reveal the conformation, position, and orientation of ultra-large multimeric protein complexes [23]. For MAP-MT interactions, the integrative method can take advantage of high-resolution MT structures [24, 25] as well as the atomic structures for stable domains from many individual MAPs. Occasionally, low-resolution structures of the MAP-MT interactions are also available. The method therefore requires a source of data that can assemble all elements and fill in missing regions of structure.

Chemical crosslinking mass spectrometry (XL-MS) is an ideal source of complementary information, as it can generate hundreds of distance restraints from even moderately sized systems [26]. It is especially valuable in that it can accommodate dynamics, flexibility, and a measure of heterogeneity. In XL-MS, bifunctional chemical reagents (crosslinkers) are covalently bound to the side chains of surface amino acids in protein(s) that are nearby in space. Upon


Fig. 1 A schematic representation of the four stages of the integrative modeling workflow, comprising (1) gathering of data, (2) representation of the subunits and translation of data into spatial restraints, (3) conformational sampling, and (4) analysis and validation of the ensemble of models

proteolytic digestion of the protein sample, high-resolution tandem MS is used to identify crosslinking sites, which provide the distance restraints for integrative modeling. XL-MS is a relatively simple experimental workflow that can be applied to small quantities of protein, which makes it applicable to nearly any protein complex, regardless of flexibility [27], size [28], and solubility [29]. XL-MS has only been applied to a few MT-based interactions [30–35], in part because of the unique constraints of the MT lattice itself. The exterior of the lattice is very negatively charged, and it possesses relatively few exposed lysines. This is a concern, given that most of the efficient crosslinkers available are designed to react with free amines. In addition, the polymeric nature of the lattice can make localization of MAPs somewhat challenging. Both of these problems have been exacerbated by the long lifetime of most crosslinkers, leading to the entrapment of non-native conformations [36, 37]. We have addressed many of these concerns by developing a strategy that samples structure more rapidly and by defining a pipeline for integrative modeling of MT-MAP interactions that uses the Integrative Modeling Platform (IMP) [22, 38] (Fig. 2). Here, we present an optimized XL-MS experimental workflow using a photo-activatable NHS-diazirine crosslinking scheme for MAP-MT analysis, which can be used to supply this pipeline with accurate crosslinking data for modeling. The NHS-ester reacts with


Fig. 2 A schematic representation of a crosslinking-based structural analysis of a MAP-MT interaction, comprising the MAP-MT construction, biochemical validation, and XL-MS procedure. The integrative model presented in this figure is the doublecortin-microtubule interaction model generated based on this workflow

primary amines (and, to a minor extent, with serine, threonine, and tyrosine), while the photo-activation of the diazirine moiety results in a di-radical carbene that inserts non-specifically into any of the 20 amino acids. Application of the short-lived photo-crosslinking reagent is necessary for sampling the equilibrium conformation of a MAP-MT complex and for achieving a less biased sampling of the interaction with the lattice. In this protocol, we provide the methodology required to use this tailored crosslinking method for any MAP-MT interaction. Because MTs are labile protein structures that are very sensitive to buffer composition, temperature, and pH, special care is required for successful XL-MS experiments. We present a revised protocol for reconstituting and validating MAP-MT constructs, followed by the protocol for photo-crosslinking, MS data acquisition, and XL-MS data analysis, generating data ready for modeling activities.
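To make the idea of crosslinks as distance restraints concrete, the sketch below checks what fraction of a set of crosslinks is satisfied by a candidate model. It is a toy illustration only and is not how IMP scores models: the function names, the residue keys, the coordinates, and the 25 Å cutoff are all assumptions chosen for the example, and an appropriate cutoff depends on the crosslinker and on which atoms are measured.

```python
import math


def crosslink_distances(coords, crosslinks):
    """Return the Euclidean distance (in Å) for each crosslinked residue pair.

    coords maps a residue key, e.g. ("MAP", 45), to (x, y, z) in Å.
    crosslinks is a list of (residue key, residue key) pairs.
    """
    return [math.dist(coords[a], coords[b]) for a, b in crosslinks]


def satisfied_fraction(coords, crosslinks, max_distance):
    """Fraction of crosslinks whose residue pair lies within max_distance (Å)."""
    if not crosslinks:
        raise ValueError("at least one crosslink is required")
    distances = crosslink_distances(coords, crosslinks)
    return sum(d <= max_distance for d in distances) / len(distances)


# Hypothetical toy coordinates (Å); not taken from any real structure.
toy_coords = {
    ("MAP", 45): (0.0, 0.0, 0.0),
    ("TUB", 10): (10.0, 0.0, 0.0),
    ("TUB", 50): (40.0, 0.0, 0.0),
}
toy_crosslinks = [
    (("MAP", 45), ("TUB", 10)),  # 10 Å apart: within the cutoff
    (("MAP", 45), ("TUB", 50)),  # 40 Å apart: beyond the cutoff
]

# 25 Å is an illustrative cutoff only; choose one suited to the crosslinker used.
print(satisfied_fraction(toy_coords, toy_crosslinks, max_distance=25.0))  # 0.5
```

In an integrative workflow, a term of this kind would be only one contribution to the scoring function, alongside the other structural data described above.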

2 Materials

2.1 Protein Preparations

1. Purify α/β-tubulin from bovine or porcine brains as reported [39] or, alternatively, as the amounts needed are not high, purchase α/β-tubulin from a commercial source (e.g., Cytoskeleton Inc.) (see Note 1).
2. Prepare α/β-tubulin with a conjugated fluorophore [40] or, alternatively, purchase labeled α/β-tubulin from a commercial source (e.g., Cytoskeleton Inc.) (see Note 1).


3. Reconstitute or buffer exchange α/β-tubulin stocks in cold BRB80 buffer (described below) to 10–20 mg/mL and aliquot to single-experiment volumes on ice (5–10 μL), flash freeze, and store at −80 °C until use.
4. Purify the recombinantly expressed MAPs according to accepted procedures specific to the protein of interest. Commercial sources are available for numerous MAPs. Reconstitute or buffer exchange into BRB80 where possible.

2.2 Buffers and Stock Reagents

1. BRB80 buffer: 80 mM PIPES, 1 mM EGTA, 1 mM MgCl2, pH 6.8 (with KOH). Prepare 5X, aliquot, and store at 4 °C.
2. Size exclusion chromatography (SEC) mobile phase: 0.1% formic acid in 30% acetonitrile, degassed. Keep at room temperature; use within 1 week.
3. Reversed-phase liquid chromatography mobile phase A: 0.1% formic acid in water (both LC-MS grade), degassed. Keep at room temperature; use within 1 week.
4. Reversed-phase liquid chromatography mobile phase B: 0.1% formic acid in 80% acetonitrile (both LC-MS grade). Keep at room temperature; use within 1 week.
5. Taxotere: 100 mM in DMSO. Aliquot and store at −20 °C.
6. Guanosine 5′-triphosphate dilithium (GTP): 100 mM in LC-MS grade water. Aliquot and store at −20 °C.
7. Dithiothreitol (DTT): 100 mM in LC-MS grade water. Aliquot and store at −20 °C.
8. Chloroacetamide (CAA): 50 mM in LC-MS grade water. Prepare before use and avoid light exposure by wrapping the tube in aluminum foil.
9. NHS-diazirine class crosslinking reagent (e.g., LC-SDA, see Note 2): prepare a stock in anhydrous DMSO. The DMSO content in the final MT preparation should be kept below 5% (vol/vol), so prepare the crosslinking reagent stock at an appropriate concentration, ideally 20 mM or higher. Prepare immediately before use.
10. Ammonium bicarbonate (ABC): 500 mM in LC-MS grade water. Keep at room temperature and check pH regularly.
11. Trypsin: 1 mg/mL in cold LC-MS grade water, on ice. Aliquot and store at −80 °C.
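The DMSO limit in item 9 sets a lower bound on the crosslinker stock concentration for any chosen final concentration. The following minimal sketch works through that arithmetic. It assumes the stock is made in 100% DMSO and that the crosslinker is the only source of DMSO (any drug stock in DMSO would add to the total); the function names and the 1 mM final concentration in the example are hypothetical and are not values prescribed by this protocol.

```python
def min_stock_mM(final_mM, max_dmso_fraction=0.05):
    """Lowest stock concentration (mM) that keeps DMSO at or below the limit."""
    if final_mM <= 0 or not 0 < max_dmso_fraction <= 1:
        raise ValueError("final_mM must be > 0 and 0 < max_dmso_fraction <= 1")
    return final_mM / max_dmso_fraction


def stock_volume_uL(final_mM, stock_mM, final_volume_uL):
    """Volume of stock (µL) to reach final_mM in final_volume_uL (C1V1 = C2V2)."""
    if stock_mM <= 0 or final_volume_uL <= 0:
        raise ValueError("stock_mM and final_volume_uL must be > 0")
    return final_mM * final_volume_uL / stock_mM


def dmso_fraction(stock_uL, final_volume_uL):
    """DMSO volume fraction contributed by a stock prepared in 100% DMSO."""
    return stock_uL / final_volume_uL


# Hypothetical example: 1 mM crosslinker in a 50 µL reaction.
final_mM, volume_uL = 1.0, 50.0
stock_mM = min_stock_mM(final_mM)                           # 20 mM at a 5% DMSO limit
added_uL = stock_volume_uL(final_mM, stock_mM, volume_uL)   # 2.5 µL of stock
print(stock_mM, added_uL, dmso_fraction(added_uL, volume_uL))
```

A more concentrated stock lowers the DMSO fraction proportionally, which leaves headroom for other DMSO-containing additions.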

2.3 Required Equipment

1. Centrifugation: standard refrigerated and non-refrigerated benchtop centrifuges for the preparation and isolation of polymerized α/β-tubulin at different stages in the workflow.
2. Fluorescence microscopy for interrogating MT assemblies: for example, a fluorescence microscope equipped with a 100× oil-immersion objective lens and a filter set with 530–550 nm excitation and 580–600 nm emission (for rhodamine-labeled specimens).
3. A spectrophotometer for monitoring turbidity (degree and rate of polymerization). We recommend a 384-well plate reader equipped with a 340 nm filter.
4. Crosslinking apparatus. A setup is required whereby the second step of the crosslinking reaction can be achieved quickly, ideally in seconds or less. Two options include an Nd:YAG laser operated at the third harmonic (355 nm) (model YG 980; Quantel, Les Ulis, France) or a high-intensity arc lamp filtered for near-UV output.
5. SEC. An Agilent 1100 series or similar LC system configured with a peptide size exclusion column.
6. Liquid chromatography–mass spectrometry system. A nanospray tandem mass spectrometer with an associated nanoLC system is required. Any later-model Orbitrap instrument is ideal. We have used both an Orbitrap Velos and an Orbitrap Fusion Lumos.

3 Methods

The procedure involves reconstituting an equilibrated MT-MAP interaction (on drug-stabilized MTs if required) and using microscopy, SDS-PAGE, and turbidimetric methods to validate complex formation, lattice occupancy, and rate of growth. The equilibrated system is then crosslinked using a heterobifunctional reagent, digested, and processed for LC-MS/MS identification of crosslinked peptides.

3.1 MT-MAP Preparation and Initial Testing

1. Rapidly thaw an aliquot of α/β-tubulin stock in room-temperature water. Remove the tube immediately after thawing and place it on ice. Reconstitute tubulin at 4.0 mg/mL in cold BRB80 buffer (see Note 3).
2. Centrifuge the sample at a minimum of 16,800 × g for 10 min at 4 °C to pellet any badly aggregated and denatured α/β-tubulin prior to polymerization. Aggregates, if present, are observed as star-shaped constructs in fluorescence microscopy. Gently remove the supernatant and dilute it to 10 μM (approximately 1.0 mg/mL) in warm BRB80 + GTP (1 mM). This is the critical concentration for polymerization in the absence of drugs. Supplement with an appropriate concentration of MAP, generally in the range of 0–20 μM (see Note 4).
3. Incubate the sample at 37 °C in a water bath for 30–60 min to induce polymerization. Alternatively, to monitor polymerization, prepare as above with different concentrations of MAP in BRB80 + GTP (1 mM) on ice, transfer to a 384-well plate, and quickly move the plate to the spectrophotometer, held at 37 °C. Avoid bubbles and ensure that each well contains at least 40 μL for a sufficient pathlength.
4. Monitor light scattering using a spectrophotometer equipped with a 340 nm filter. Take scans every 10 s for an appropriate time (e.g., 100 min). Standardize the values to a pathlength of 1 cm.
5. In separate MAP-MT formulations, use SDS-PAGE to determine occupancy (and an estimate of stoichiometry) in a pelleted sample. Centrifuge the sample at 16,800 × g for 10 min at room temperature and remove the supernatant with a gel-loader pipette tip.
6. Wash the MT pellet with warm BRB80 and analyze the supernatant and pellet on a 6% or 8% SDS-PAGE gel.

3.2 Evaluation of the MAP-MT Construct by Fluorescence Microscopy

1. Reconstitute the MAP-MT construct using a 1:5 molar ratio of rhodamine-labeled to unlabeled α/β-tubulin as described above. Protect MAP-MT samples prepared for fluorescence microscopy from light by wrapping tubes in aluminum foil and placing glass slides in a dark cassette when transferring them to the microscope.
2. Dilute the MAP-MT sample 50–100 times in warm resuspension buffer (BRB80 supplemented with 10 μM Taxotere) immediately before fluorescence microscopy. Maintain the temperature of the MT incubation chamber and dilution buffer at 37 °C to preserve MT structure (see Note 5).
3. Place 3 μL of diluted sample on a glass slide, cover with a cover glass (e.g., 18 mm × 18 mm, 0.170 ± 0.005 mm; Carl Zeiss Microscopy, LLC, USA), and seal it with nail polish. The glass slides should be clean, acid-etched overnight with 1 M HCl, and rinsed thoroughly with water and then with 95% ethanol.
4. Image MTs using a fluorescence microscope equipped with an oil-immersion 100× objective and a CCD camera. Use an appropriate light source intensity and exposure time to capture MT images. Access to a temperature-controlled microscopy stage is ideal for imaging temperature-sensitive MTs, but room-temperature imaging works if performed no later than 10–20 min after deposition on the glass slide.
5. Process the microscopy images to adjust for contrast using microscopy-specific tools, if needed, and evaluate the overall MT morphology, length, and curvature using standard measurement tools in ImageJ [41]. Compare the results with a MAP-free control. For stabilizers, MTs may appear slightly longer; for nucleators added during polymerization, MTs may appear shorter. The addition of a fluorescently tagged MAP compatible with the rhodamine-tubulin can be considered to validate lattice occupancy.

3.3 Crosslinking of MAP-MT and Preparation for LC-MS

1. Polymerize MAP-MT as above and add the crosslinking reagent to a final concentration of 1 mM (see Note 6). Incubate the sample for a maximum of 10 min at 37 °C, then immediately photolyze for 5 s at 355 nm (50 × 100 mJ laser pulses) [36] (see Note 7). Quench the reaction by adding ammonium bicarbonate to a final concentration of 50 mM and incubate for 20 min at 37 °C.
2. Separate the crosslinked MAP-MT complex from any free tubulin and/or MAP by centrifugation at 16,800 × g for 10 min at room temperature. Remove the supernatant, wash the microtubule pellet once with warm BRB80 (37 °C), and then dissolve it in 50 mM ammonium bicarbonate solution to a final protein concentration of 1 mg/mL.
3. Reduce the cysteines by adding DTT to a final concentration of 10 mM and heating to 90 °C for 10 min. Cool the sample to room temperature, alkylate the cysteines by adding chloroacetamide to a final concentration of 50 mM, and incubate at 37 °C for 30 min. Minimize exposure to light.
4. Add trypsin to an enzyme-to-substrate ratio of 1:50 (w/w) and incubate overnight at 37 °C with nutation (150 rpm) (see Note 8). Quench the digestion by adding formic acid to a final concentration of 2% (v/v) (pH 2–3), and separate the digest into two fractions (80:20). Desalt the smaller fraction and store it at −80 °C for LC-MS.
5. Lyophilize the larger fraction and reconstitute it in SEC mobile phase for enrichment of crosslinked peptides (see Note 9). Enrich the crosslinked peptides by separation on a peptide SEC column, using an LC system at 100 μL/min with fraction collection. For further details see Leitner et al. [42]. Dry down the SEC fractions in a vacuum centrifuge. Dried samples can be analyzed immediately or stored at −80 °C.
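Several steps above specify reagents by final concentration or by mass ratio rather than by volume. The sketch below converts two of them (the DTT addition and the 1:50 trypsin digest) into pipetting volumes, using the stock concentrations listed in Subheading 2.2. The function names, the 50 μL sample volume, and the assumption that volumes are additive are illustrative choices, not part of the protocol.

```python
def spike_volume_uL(sample_volume_uL: float, stock_conc: float,
                    final_conc: float) -> float:
    """Volume of stock to add so that the mixture reaches final_conc.

    stock_conc and final_conc must share the same unit (e.g., mM).
    Solves stock_conc * v = final_conc * (sample_volume_uL + v) for v.
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return sample_volume_uL * final_conc / (stock_conc - final_conc)


def trypsin_volume_uL(protein_ug: float, substrate_per_enzyme: float = 50.0,
                      trypsin_stock_mg_per_mL: float = 1.0) -> float:
    """Volume of trypsin stock for a 1:substrate_per_enzyme (w/w) digest.

    1 mg/mL equals 1 ug/uL, so dividing micrograms by the stock
    concentration gives microliters directly.
    """
    trypsin_ug = protein_ug / substrate_per_enzyme
    return trypsin_ug / trypsin_stock_mg_per_mL


if __name__ == "__main__":
    sample_uL = 50.0              # hypothetical volume of dissolved pellet
    protein_ug = sample_uL * 1.0  # 1 mg/mL protein = 1 ug/uL (step 2)

    dtt_uL = spike_volume_uL(sample_uL, stock_conc=100.0, final_conc=10.0)
    tryp_uL = trypsin_volume_uL(protein_ug)
    print(f"DTT (100 mM stock -> 10 mM final): {dtt_uL:.2f} uL")
    print(f"Trypsin (1 mg/mL stock, 1:50 w/w): {tryp_uL:.2f} uL")
```

The same `spike_volume_uL` helper applies to the ammonium bicarbonate quench in step 1 (500 mM stock, 50 mM final), provided the reaction volume at that point is known.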

3.4 LC-MS Analysis

1. Analyze the SEC fractions and un-enriched samples by high-resolution LC-MS/MS by reconstituting the samples in 0.1% formic acid and injecting approximately 1 pmol onto the nanoLC-MS system. Estimates of amounts can be based on the UV absorbance trace of each SEC fraction. Separation can be achieved using a typical proteomics column configuration, such as a 50 cm PepMap RSLC C18 column (75 μm × 50 cm, 2 μm particles, 100 Å; Thermo Scientific), with a gradient of 5–28% mobile phase B for 60 min, 28–40% B for 20 min, 40–95% B for 10 min, and 95% B for 10 min.


2. Operate the mass spectrometer in positive ion mode, in a high-resolution configuration for both MS and MS/MS. For example, use an OT/OT configuration on an Orbitrap Lumos with MS resolution set at 120,000 (350–1300 m/z) and MS2 resolution at 15,000. The instrument parameters should be optimized for each system as appropriate (see Note 10).

3.5 XL-MS Data Analysis

1. Download the latest version of CRIMP, software for crosslink data analysis, from www.msstudio.ca, and familiarize yourself with the software through online tutorials and publications (e.g., [43]).
2. Import all raw LC-MS/MS data (SEC fractions and un-enriched sample) into CRIMP for processing, along with the FASTA files of the MAPs and the α/β-tubulin isoforms (TB-α-1A, TB-α-1B, TB-α-1C, TB-α-1D, TB-α-4A, TB-β, TB-β-2B, TB-β-4A, TB-β-4B, TB-β-3, and TB-β-5) (see Note 11).
3. Identify crosslinked peptides using search criteria appropriate for the protein system and MS configuration employed. Example parameters include methionine oxidation and carbamidomethylation of cysteines as variable and fixed modifications, respectively; MS accuracy of 5 ppm; MS/MS accuracy of 10 ppm; enzyme specified as trypsin (Naive); and a peptide length range of 4–50 residues. Unique to CRIMP, select an E-threshold of 50 or higher, and set the FDR to an estimated level of 1% (see Note 12).
4. Export the results for IMP-based modeling using the available data aggregators, which remove redundancies in the raw results and generate a final list of high-quality unique crosslinks. The data can be exported directly into the Integrative Modeling Platform (IMP, [22]), provided it is preconfigured on a suitable computer. Alternatively, a complete deployment bundle is available at www.msstudio.ca, and MT-related project files are available at PDB-Dev (https://pdb-dev.wwpdb.org/).
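The aggregation in step 4 collapses redundant identifications (for example, the same residue pair observed in several SEC fractions or reported in either orientation) into a list of unique crosslinks. The sketch below illustrates the general idea only; it is not CRIMP's implementation or export format. The `CrosslinkMatch` record, its field names, the example values, and the higher-is-better score are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (protein name, residue number)


@dataclass(frozen=True)
class CrosslinkMatch:
    """One identified crosslink (hypothetical record layout)."""
    protein_a: str
    residue_a: int
    protein_b: str
    residue_b: int
    score: float  # assumed: higher means a more confident match


def site_pair_key(match: CrosslinkMatch) -> Tuple[Site, Site]:
    """Orientation-independent key, so A-B and B-A count as the same link."""
    a = (match.protein_a, match.residue_a)
    b = (match.protein_b, match.residue_b)
    return (a, b) if a <= b else (b, a)


def unique_crosslinks(matches: Iterable[CrosslinkMatch],
                      min_score: float = 0.0) -> List[CrosslinkMatch]:
    """Keep the best-scoring match for each unique residue pair."""
    best: Dict[Tuple[Site, Site], CrosslinkMatch] = {}
    for match in matches:
        if match.score < min_score:
            continue  # drop matches below the chosen cut-off
        key = site_pair_key(match)
        if key not in best or match.score > best[key].score:
            best[key] = match
    return [best[key] for key in sorted(best)]


if __name__ == "__main__":
    # Placeholder identifications, not data from this protocol.
    raw = [
        CrosslinkMatch("MAP", 10, "TB-alpha", 40, 55.0),
        CrosslinkMatch("TB-alpha", 40, "MAP", 10, 72.0),  # same link, reversed
        CrosslinkMatch("TB-beta", 5, "MAP", 10, 60.0),
        CrosslinkMatch("MAP", 99, "TB-beta", 7, 30.0),    # below cut-off
    ]
    for link in unique_crosslinks(raw, min_score=50.0):
        print(link)
```

Keying on the residue pair rather than on the peptide sequence is a design choice suited to distance-restraint modeling, where each restraint is defined by the two linked residues.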

4 Notes

1. Start with enough sample, especially if you plan to enrich crosslinked peptides. We recommend starting with at least 50 μg tubulin for each biological replicate. Fluorescence microscopy experiments require less starting material (20 μg per replicate).
2. Different NHS-diazirine crosslinkers are available, which differ in spacer arm length and solubility. Two examples are SDA and LC-SDA, with spacer arm lengths of 3.9 Å and 12.5 Å, respectively; both are membrane-permeable. We noted better crosslinking efficiency with LC-SDA than with SDA (possibly due to the longer spacer arm) [36].
3. Never refreeze left-over α/β-tubulin after thawing; discard any extra sample. Avoid leaving α/β-tubulin stock on ice for more than 20 min, as it can lose polymerization competency.
4. Depending on the nature of the interaction, the α/β-tubulin can be polymerized in the presence of the MAP and may tolerate lower concentrations of α/β-tubulin as a result. A stabilizer drug can be used to lower the required concentration of α/β-tubulin. We have had greater success using Taxotere than Taxol.
5. Microtubule samples should be handled carefully, as these polymeric protein structures are very sensitive to shear stress from pipetting. Narrow pipette tips should therefore be avoided. We recommend cutting the end off larger pipette tips (e.g., a P20 tip, at a height of 3 mm).
6. The concentration of crosslinking reagent should be optimized for each protein system. This can be done by SDS-PAGE of the crosslinked sample, if the linked product can be separated from the free forms. A good rule of thumb is to use a crosslinker concentration equal to the molar concentration of available lysine residues. For instance, for a protein system at 10 μM that contains 100 lysines per protein complex, use 1 mM crosslinking reagent.
7. A lower-fluence light source can be used, but the reaction time would need to be increased. If this method is chosen, it may be advisable to quench the first reaction prior to photolysis.
8. This is a general procedure for proteolytic digestion; however, crosslinking can render samples more difficult to digest. The method can be supplemented with chaotropes (e.g., urea) and alternative enzymes as required.
9. An alternative enrichment process involves strong cation exchange (SCX) chromatography (see Fritzsche et al. [44]).
10. Use higher-energy collisional dissociation (HCD) to generate MS2 spectra, setting ion selection to 4+ charge states and higher, with a normalized collision energy of at least 32% and an isolation width of 1.2 m/z.
11. Include all major α/β-tubulin isoforms in the database to decrease the chance of missing crosslinks that involve isoform-specific peptides. The tubulin isoforms present can be confirmed for each α/β-tubulin sample by proteomics analysis of an un-crosslinked MT sample.
12. For smaller searches such as these, the FDR may not be a reliable gauge of false discoveries, and manual inspection of identifications near the cut-off may be required.

References

1. Baas PW, Pienkowski TP, Cimbalnik KA, Toyama K, Bakalis S, Ahmad FJ, Kosik KS (1994) Tau confers drug stability but not cold stability to microtubules in living cells. J Cell Sci 107(1):135–143
2. Lewis SA, Wang D, Cowan NJ (1988) Microtubule-associated protein MAP 2 shares a microtubule binding motif with tau protein. Science 242(4880):936–939
3. Horesh D, Sapir T, Francis F, Grayer Wolf S, Caspi M, Elbaum M, Chelly J, Reiner O (1999) Doublecortin, a stabilizer of microtubules. Hum Mol Genet 8(9):1599–1610
4. Chapin SJ, Bulinski JC (1991) Non-neuronal 210 × 10(3) Mr microtubule-associated protein (MAP 4) contains a domain homologous to the microtubule-binding domains of neuronal MAP 2 and tau. J Cell Sci 98(1):27–36
5. Andersen SS (2000) Spindle assembly and the art of regulating microtubule dynamics by MAPs and Stathmin/Op18. Trends Cell Biol 10(7):261–267
6. Kline-Smith SL, Walczak CE (2002) The microtubule-destabilizing kinesin XKCM1 regulates microtubule dynamic instability in cells. Mol Biol Cell 13(8):2718–2731
7. Mimori-Kiyosue Y, Shiina N, Tsukita S (2000) The dynamic behavior of the APC-binding protein EB1 on the distal ends of microtubules. Curr Biol 10(14):865–868
8. Van Breugel M, Drechsel D, Hyman A (2003) Stu2p, the budding yeast member of the conserved Dis1/XMAP215 family of microtubule-associated proteins is a plus end–binding microtubule destabilizer. J Cell Biol 161(2):359–369
9. Pierre P, Scheel J, Rickard JE, Kreis TE (1992) CLIP-170 links endocytic vesicles to microtubules. Cell 70(6):887–900
10. Perez F, Diamantopoulos GS, Stalder R, Kreis TE (1999) CLIP-170 highlights growing microtubule ends in vivo. Cell 96(4):517–527
11. Zheng Y, Wong ML, Alberts B, Mitchison T (1995) Nucleation of microtubule assembly by a γ-tubulin-containing ring complex. Nature 378(6557):578–583
12. Rice S, Lin AW, Safer D, Hart CL, Naber N, Carragher BO, Cain SM, Pechatnikova E, Wilson-Kubalek EM, Whittaker M (1999) A structural change in the kinesin motor protein that drives motility. Nature 402(6763):778–784
13. Vallee RB, Williams JC, Varma D, Barnhart LE (2004) Dynein: an ancient motor protein involved in multiple modes of transport. J Neurobiol 58(2):189–200
14. Maurer SP, Fourniol FJ, Bohner G, Moores CA, Surrey T (2012) EBs recognize a nucleotide-dependent structural cap at growing microtubule ends. Cell 149(2):371–382
15. Kellogg EH, Howes S, Ti S-C, Ramírez-Aportela E, Kapoor TM, Chacón P, Nogales E (2016) Near-atomic cryo-EM structure of PRC1 bound to the microtubule. Proc Natl Acad Sci 113(34):9430–9439. https://doi.org/10.1073/pnas.1609903113
16. Gigant B, Wang W, Dreier B, Jiang Q, Pecqueur L, Plückthun A, Wang C, Knossow M (2013) Structure of a kinesin–tubulin complex and implications for kinesin motility. Nat Struct Mol Biol 20(8):1001
17. Redwine WB, Hernández-López R, Zou S, Huang J, Reck-Peterson SL, Leschziner AE (2012) Structural basis for microtubule binding and release by dynein. Science 337(6101):1532–1536
18. Kellogg EH, Hejab NM, Poepsel S, Downing KH, DiMaio F, Nogales E (2018) Near-atomic model of microtubule-tau interactions. Science 360(6394):1242–1246
19. Fourniol FJ, Sindelar CV, Amigues B, Clare DK, Thomas G, Perderiset M, Francis F, Houdusse A, Moores CA (2010) Template-free 13-protofilament microtubule–MAP assembly visualized at 8 Å resolution. J Cell Biol 191(3):463–470. https://doi.org/10.1083/jcb.201007081
20. Tan D, Rice WJ, Sosa H (2008) Structure of the kinesin13-microtubule ring complex. Structure 16(11):1732–1739
21. Sosa H, Dias DP, Hoenger A, Whittaker M, Wilson-Kubalek E, Sablin E, Fletterick RJ, Vale RD, Milligan RA (1997) A model for the microtubule-Ncd motor protein complex obtained by cryo-electron microscopy and image analysis. Cell 90(2):217–224
22. Russel D, Lasker K, Webb B, Velázquez-Muriel J, Tjioe E, Schneidman-Duhovny D, Peterson B, Sali A (2012) Putting the pieces together: integrative modeling platform software for structure determination of macromolecular assemblies. PLoS Biol 10(1):e1001244
23. Ward AB, Sali A, Wilson IA (2013) Integrative structural biology. Science 339(6122):913–915
24. Nogales E, Zhang R (2016) Visualizing microtubule structural transitions and interactions with associated proteins. Curr Opin Struct Biol 37:90–96
25. Alushin GM, Lander GC, Kellogg EH, Zhang R, Baker D, Nogales E (2014) High-resolution microtubule structures reveal the structural transitions in αβ-tubulin upon GTP hydrolysis. Cell 157(5):1117–1129
26. Liu F, Heck AJ (2015) Interrogating the architecture of protein assemblies and protein interaction networks by cross-linking mass spectrometry. Curr Opin Struct Biol 35:100–108
27. Klykov O, van der Zwaan C, Heck AJ, Meijer AB, Scheltema RA (2020) Missing regions within the molecular architecture of human fibrin clots structurally resolved by XL-MS and integrative structural modeling. Proc Natl Acad Sci 117(4):1976–1987
28. Kim SJ, Fernandez-Martinez J, Nudelman I, Shi Y, Zhang W, Raveh B, Herricks T, Slaughter BD, Hogan JA, Upla P (2018) Integrative structure and functional anatomy of a nuclear pore complex. Nature 555(7697):475–482
29. Debelyy MO, Waridel P, Quadroni M, Schneiter R, Conzelmann A (2017) Chemical crosslinking and mass spectrometry to elucidate the topology of integral membrane proteins. PLoS One 12(10):e0186840
30. Kadavath H, Hofele RV, Biernat J, Kumar S, Tepper K, Urlaub H, Mandelkow E, Zweckstetter M (2015) Tau stabilizes microtubules by binding at the interface between tubulin heterodimers. Proc Natl Acad Sci 2015:201504081. https://doi.org/10.1073/pnas.1504081112
31. Rafiei A, Schriemer DC (2019) A microtubule crosslinking protocol for integrative structural modeling activities. Anal Biochem 586:113416
32. Legal T, Zou J, Sochaj A, Rappsilber J, Welburn JP (2016) Molecular architecture of the Dam1 complex–microtubule interaction. Open Biol 6(3):150237. https://doi.org/10.1098/rsob.150237
33. Zelter A, Bonomi M, Kim J, Umbreit NT, Hoopmann MR, Johnson R, Riffle M, Jaschob D, MacCoss MJ, Moritz RL (2015) The molecular architecture of the Dam1 kinetochore complex is defined by cross-linking based structural modelling. Nat Commun 6:8673. https://doi.org/10.1038/ncomms9673
34. Rafiei A (2021) Mass spectrometry-based integrative structural modeling of the doublecortin-microtubule interaction. PhD Thesis
35. Abad MA, Medina B, Santamaria A, Zou J, Plasberg-Hill C, Madhumalar A, Jayachandran U, Redli PM, Rappsilber J, Nigg EA (2014) Structural basis for microtubule recognition by the human kinetochore Ska complex. Nat Commun 5:2964. https://doi.org/10.1038/ncomms3964
36. Ziemianowicz DS, Ng D, Schryvers AB, Schriemer DC (2018) Photo-cross-linking mass spectrometry and integrative modeling enables rapid screening of antigen interactions involving bacterial transferrin receptors. J Proteome Res 18(3):934–946
37. Belsom A, Rappsilber J (2020) Anatomy of a crosslinker. Curr Opin Chem Biol 60:39–46
38. Rafiei A, Lee L, Crowder DA, Saltzberg DJ, Sali A, Brouhard GJ, Schriemer DC (2022) Doublecortin engages the microtubule lattice through a cooperative binding mode involving its C-terminal domain. eLife 11:e66975
39. Gell C, Friel CT, Borgonovo B, Drechsel DN, Hyman AA, Howard J (2011) Purification of tubulin from porcine brain. In: Microtubule dynamics. Springer, pp 15–28
40. Peloquin J, Komarova Y, Borisy G (2005) Conjugation of fluorophores to tubulin. Nat Methods 2(4):299–303
41. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, Tinevez J-Y, White DJ, Hartenstein V, Eliceiri K, Tomancak P, Cardona A (2012) Fiji: an open-source platform for biological-image analysis. Nat Methods 9(7):676–682. https://doi.org/10.1038/nmeth.2019
42. Leitner A, Walzthoeni T, Aebersold R (2014) Lysine-specific chemical cross-linking of protein complexes and identification of cross-linking sites using LC-MS/MS and the xQuest/xProphet software pipeline. Nat Protoc 9(1):120–137
43. Sarpe V, Rafiei A, Hepburn M, Ostan N, Schryvers AB, Schriemer DC (2016) High sensitivity crosslink detection coupled with integrative structure modeling in the Mass Spec Studio. Mol Cell Proteomics 15(9):3071–3080. https://doi.org/10.1074/mcp.O116.058685
44. Fritzsche R, Ihling CH, Götze M, Sinz A (2012) Optimizing the enrichment of cross-linked products for mass spectrometric protein analysis. Rapid Commun Mass Spectrom 26(6):653–658

Chapter 15

Comprehensive Interactome Mapping of Nuclear Receptors Using Proximity Biotinylation

Lynda Agbo, Sophie Anne Blanchet, Pata-Eting Kougnassoukou Tchara, Amélie Fradet-Turcotte, and Jean-Philippe Lambert

Abstract

Nuclear receptors, including hormone receptors, perform their cellular activities by modulating their protein–protein interactions. They engage with specific ligands and translocate to the nucleus, where they bind DNA and activate extensive transcriptional programs. Therefore, gaining a comprehensive overview of the protein–protein interactions they establish requires methods that function effectively throughout the cell with fast dynamics and high reproducibility. Focusing on estrogen receptor alpha (ESR1), the founding member of the nuclear receptor family, this chapter describes a new lentiviral system that allows the expression of TurboID-hemagglutinin (HA)-2×Strep tagged proteins in mammalian cells to perform fast proximity biotinylation assays. Key validation steps for these reagents and their use in interactome mapping experiments in two distinct breast cancer cell lines are described. Our protocol enabled quantification of the ESR1 interactome in hormone-sensitive and hormone-insensitive cellular contexts.

Key words Proximity biotinylation, Interactome mapping, Nuclear receptor, Mass spectrometry, Proteomics, TurboID

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/978-1-0716-2124-0_15.

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_15, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

The first nuclear receptor (NR), estrogen receptor alpha (ESR1), was described in the 1950s [1]. Since then, 48 NR genes have been discovered in the human genome [2], which share a common structure comprising a variable N-terminal domain, a DNA binding domain, a ligand binding domain, and a C-terminal heterodimerization domain [3]. NRs are activated by lipid-soluble ligands [4] and, depending on the nature of those ligands, can be sorted into three subtypes, namely steroid, thyroid hormone, and orphan receptors [5]. They are involved in the regulation of several fundamental cellular processes, such as proliferation, differentiation, and metabolism [6, 7].

To fulfill their cellular functions, most proteins interact with other proteins. Therefore, much can be learned about biological systems and cellular pathways by mapping protein–protein interactions (PPIs). In the last few decades, numerous small- and large-scale efforts have significantly enhanced our understanding of the molecular mechanisms underlying numerous aspects of cellular biology. Importantly, interactome mapping has contributed to our knowledge of cellular physiology as well as the pathological details of numerous diseases [8, 9], such as cancers [10, 11], diabetes [12], and neurodegenerative diseases [13, 14].

NRs can translocate through multiple cellular compartments, where they interact with distinct partners to impact cellular functions. Their PPIs are quite heterogeneous, including both stable and transient complexes. The varying dissociation constants of these interactions impact their durations and functions. Traditionally, PPIs are identified through biochemical assays, most commonly by affinity purification coupled to mass spectrometry (AP-MS), in which cells are lysed in a mild lysis buffer to preserve PPIs throughout the procedure [15]. Unfortunately, this approach is poorly suited to NRs, as their associations with membranes and chromatin impart poor solubility.

The introduction of proximity-dependent labeling methods overcame many of the limitations of traditional NR AP-MS [16]. When coupled with other quantitative approaches, these methods have helped address several research questions [17]. In these experiments, a fusion protein consisting of a "bait" protein of interest and an enzyme capable of biotinylation is expressed in cells.
Supplementing the culture medium with biotin or biotin-phenol induces the biotinylation of proteins that come into direct or indirect contact with the tagged bait. The spatial radius of biotinylation depends on the method used, the reaction time, and the dynamics of the subcellular structure being studied, but is sufficiently limited to permit bait specificity [17]. Once "prey" proteins are biotinylated, they are efficiently purified by exploiting the strong affinity between biotin and streptavidin [18] and quantified by MS. Two main systems are used for proximity labeling, catalyzed by biotin ligases and ascorbate peroxidases, respectively. There are many biotin ligase-based (BioID, BioID2, BASU, split BioID, TurboID, and miniTurbo) [19–21] and ascorbate peroxidase-based (APEX, APEX2, split APEX) [22–24] labeling techniques, which have been described elsewhere (see Table 1 for an overview of these approaches). Despite their technical differences, these assays all map transient PPIs effectively because they covalently mark proximal proteins, removing the need for sustained PPIs throughout biochemical purification.

Table 1 Overview of the characteristics of common proximity-dependent biotinylation assays

Characteristic            BioID                    TurboID                               APEX2
Enzyme                    Biotin ligase (BirA*)    Engineered biotin ligase (TurboID)    Ascorbate peroxidase
Substrate                 Biotin                   Biotin                                Biotin-phenol
Biotinylation trigger     Enzyme + substrate       Enzyme + substrate                    Enzyme + substrate + H2O2
Biotinylation quencher    None                     None                                  Sodium ascorbate + sodium azide + Trolox®
Average reaction time     24 h                     10–60 min                             1 min
Optimal temperature       37 °C                    30 °C                                 30 °C
Target residue            Lysine                   Lysine                                Tyrosine
Activated substrate, Product: chemical structures (not reproduced)

Fig. 1 A Gateway-compatible lentiviral system to express N-terminal TurboID baits tagged with HA-2×Strep in mammalian cells. (a) Schematic representation of the Gateway-compatible lentiviral vector enabling generation of TurboID-tagged fusion proteins (pDEST-TurboID-HA-2×Strep). LTR, long terminal repeat; PTight, tight tetracycline response element (TRE) promoter; HA, hemagglutinin tag; 2×Strep, 2× Strep tag II; AttR, Gateway recombination sequence; CmR, chloramphenicol resistance gene; CcdB, CcdB toxin gene; hPGK, human phosphoglycerate kinase promoter; PuroR, puromycin resistance gene; rtTA, reverse tetracycline-controlled transactivator. (b) Transient expression of the indicated Turbo-HA-2×Strep fusion proteins in HEK293T cells 24 h after induction with 5 μg/mL doxycycline. Whole cell extracts were resolved by SDS-polyacrylamide gel electrophoresis and analyzed by immunoblotting with an anti-HA antibody. KAP1 was used as a loading control

Proximity-dependent biotinylation assays are effective interactome mapping strategies to characterize NRs in an array of cellular contexts. In this chapter, we describe a novel lentiviral system enabling simple TurboID experiments in mammalian cell lines (Fig. 1). We have validated these reagents and the protocols described here in two breast cancer cell lines using immunofluorescence (Fig. 2a, b) and western blotting (Fig. 2c, d) approaches. By coupling TurboID to MS analysis, we were able to detect 81 (MCF-7) and 56 (MDA-MB-231) significant interaction partners for ESR1 at a SAINTexpress false discovery rate (FDR) of 1% (Fig. 2e, Table S1). Benchmarking these results against BioGRID [25] revealed that 57% (MCF-7) and 70% (MDA-MB-231) of the hits were putative novel interaction partners for ESR1. Additionally, we observed that the cellular context had a major impact on ESR1's interactome, reflecting the differences in sex hormone responses between the two cell lines (Fig. 2f). The protocols described can be broadly applied to the study of mammalian proteins of interest and can be enhanced by coupling them to fully quantitative mass spectrometry approaches, as we have done before [26, 27]. Additionally, the timescale of the TurboID approach is amenable to the study of NR agonists and antagonists and could be used to characterize their modes of action.
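The benchmarking against BioGRID described above amounts to a set comparison between the significant preys and the interactors already reported for the bait. A minimal sketch follows; the gene symbols are placeholders rather than results from this study, and the function name is hypothetical. In practice the reported set would come from a BioGRID download for the bait, with identifiers mapped to the same namespace as the SAINTexpress output.

```python
from typing import Iterable, List, Tuple


def split_by_prior_report(hits: Iterable[str],
                          reported: Iterable[str]) -> Tuple[List[str], List[str], float]:
    """Split significant preys into novel and previously reported interactors.

    Returns (novel, known, percent_novel). Symbols are compared
    case-insensitively after upper-casing.
    """
    hit_set = {h.upper() for h in hits}
    reported_set = {r.upper() for r in reported}
    novel = hit_set - reported_set
    known = hit_set & reported_set
    percent_novel = 100.0 * len(novel) / len(hit_set) if hit_set else 0.0
    return sorted(novel), sorted(known), percent_novel


if __name__ == "__main__":
    # Placeholder symbols only; replace with real prey and database lists.
    significant_preys = ["GENE_A", "GENE_B", "gene_c", "GENE_D"]
    reported_interactors = ["GENE_B", "GENE_X"]
    novel, known, pct = split_by_prior_report(significant_preys,
                                              reported_interactors)
    print("novel:", novel)
    print("known:", known)
    print(f"{pct:.0f}% of hits not previously reported")
```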

Fig. 2 Validation and interaction mapping of TurboID-tagged ESR1 in breast cancer cell lines. Immunofluorescence of TurboID-HA-2×Strep-ESR1 fusion protein in MDA-MB-231 (a) and MCF-7 (b) cells. Cells were immunostained for HA and stained with streptavidin and DAPI following a 24 h induction with 1 μg/mL of doxycycline and a 1 h treatment with 500 μM biotin. Western blots obtained from MDA-MB-231 (c) and MCF-7 (d) cells treated as in (a) and (b) show the TurboID-HA-2×Strep-tagged ESR1 bait and biotinylated proteins. Tubulin was used as a loading control. (e) Bar graph of ESR1 interaction partners (SAINTexpress FDR ≤ 1%) identified by TurboID in MCF-7 (green) and MDA-MB-231 (red) cells. Novel PPIs detected in this study are shown in dark red and green, while those reported in the BioGRID repository are shown in light red and green. (f) Overlap between the ESR1 TurboID results in MCF-7 (green) and MDA-MB-231 (red) cells. Complete SAINTexpress results can be found in Table S1

2 Materials

2.1 Plasmids and Lentivirus Production

1. Gateway entry clones for the gene of interest (ESR1 pDONR223; HsCD00376961) and a control (enhanced green fluorescent protein fused to a nuclear localization sequence (eGFP-NLS)).
2. Gateway destination vector pCW57.1 containing a Turbo-HA-2×Strep tag (see Note 1 for details regarding its creation).
3. Gateway LR Clonase II enzyme mix.
4. MAX Efficiency Stbl2 Competent Cells.
5. HEK 293T cell line (ATCC).
6. pCMV-VSV-G packaging vector (a gift from Bob Weinberg, Addgene #8454).
7. psPAX2 packaging vector (a gift from Didier Trono, Addgene #12260).
8. Opti-MEM reduced serum media.
9. Polyethylenimine (PEI MAX; Polysciences Inc.).
10. Complete Dulbecco's modified Eagle's medium (DMEM): DMEM with L-glutamine, sodium pyruvate, and phenol red (Wisent), 10% fetal bovine serum (FBS; Wisent), and 1% Penicillin-Streptomycin (P/S, 10,000 U/mL; Gibco). Heat-inactivated FBS is generated by heating 50 mL FBS aliquots to 56 °C in a water bath for 30 min.
11. 1 M HEPES pH 7.4.
12. 0.45-μm filters.

2.2 Lentiviral Transduction

1. MDA-MB-231 cell line (ATCC).
2. MCF-7 cell line (ATCC).
3. Biotin-depleted medium: DMEM, 10% FBS, and 1% P/S (see Note 2).
4. Polybrene, 8 mg/mL.
5. 1× phosphate-buffered saline (PBS) pH 7.4, without calcium and magnesium.
6. Puromycin, 1 mg/mL.

2.3 Construct Validation in Polyclonal Populations and Clonal Isolation

1. Coverslips. 2. 4% formaldehyde in 1 PBS. 3. Blocking buffer: 1 PBS, 0.3% Triton X-100, 5% goat normal serum. 4. Dilution buffer: 1 PBS, 1% bovine serum albumin (BSA), 0.3% Triton X-100.

Comprehensive Interactome Mapping of Nuclear Receptors Using Proximity. . .


5. Mouse polyclonal anti-HA (6E2) primary antibody (Cell Signaling Technologies).
6. Rabbit polyclonal anti-KAP1 primary antibody (Bethyl).
7. Alexa Fluor 488-conjugated goat anti-mouse antibody.
8. Alexa Fluor 555-conjugated streptavidin.
9. Prolong Gold antifade reagent with 4′,6-diamidino-2-phenylindole (DAPI).
10. Microscope slides.
11. Goat anti-mouse and anti-rabbit horseradish peroxidase (HRP)-conjugated antibodies.
12. Sonicator. We employ a Diagenode Bioruptor Plus UCD-300 at 300 W intensity, using five cycles of 30 s on and 30 s off at 4 °C. The parameters for other sonicators should be determined empirically.
13. Home-made 1× Tris-buffered saline, 0.1% Tween (TBST). To prepare, dissolve 6.06 g Tris (Bioshop) and 8.76 g sodium chloride (Biobasic) in distilled water (ddH2O). Adjust the pH to 7.4 with 1 M HCl and bring the volume to 1 L. Add 1 mL of Tween (Biobasic) while gently mixing.

2.4 Cell Induction for TurboID

1. Doxycycline: 1 mg/mL stock in H2O (w/v).
2. D-Biotin stock solution, 20 mM: dissolve in 30% (V/V) NH4OH in H2O, then neutralize to pH 7.4 with 1 M HCl and filter sterilize.
3. Phenol red-free DMEM with L-glutamine and sodium pyruvate.
4. 7.5% sodium bicarbonate.
5. 100 μM E2. Prepare a 100 μM stock solution by first dissolving 136 mg of E2 in 10 mL of 100% ethanol to generate a 50 mM solution. Then, take 100 μL of this solution and dilute it in 50 mL of 100% ethanol. Store the 100 μM stock solution at −20 °C.
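As a quick check of the E2 stock arithmetic in item 5, the following Python sketch converts the weighed mass to a molar concentration and then applies C1V1 = C2V2 to the dilution step. It assumes that E2 is 17β-estradiol with a molar mass of about 272.38 g/mol and that 50 mL is the final volume of the dilution; the function names are illustrative only.

```python
# Sketch only: verifies the E2 stock preparation numbers given in the text.
# Assumption: E2 is 17-beta-estradiol, molar mass ~272.38 g/mol.
E2_MOLAR_MASS_G_PER_MOL = 272.38


def molarity_mM(mass_mg: float, volume_mL: float, molar_mass: float) -> float:
    """Concentration (mM) obtained by dissolving mass_mg in volume_mL."""
    # mg / (g/mol) = mmol; mmol / mL = mol/L; multiply by 1000 for mM
    return mass_mg / molar_mass / volume_mL * 1000.0


def dilute(c_stock: float, v_stock: float, v_final: float) -> float:
    """C1*V1 = C2*V2 solved for C2 (same concentration unit as c_stock)."""
    return c_stock * v_stock / v_final


primary_mM = molarity_mM(136.0, 10.0, E2_MOLAR_MASS_G_PER_MOL)
# 100 uL (0.100 mL) of the primary stock brought to 50 mL (assumed final volume)
working_uM = dilute(primary_mM * 1000.0, 0.100, 50.0)

print(f"primary stock: {primary_mM:.1f} mM")
print(f"working stock: {working_uM:.1f} uM")
```

Under these assumptions the two values come out very close to the nominal 50 mM and 100 μM stated in the recipe.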

2.5 TurboID

1. Radioimmunoprecipitation (RIPA) lysis buffer: 1% (v/v) NP-40, 0.1% (v/v) sodium dodecyl sulfate (SDS), 50 mM Tris-HCl (pH 7.4), 150 mM NaCl, 0.5% (w/v) sodium deoxycholate, 1 mM ethylenediaminetetraacetic acid (EDTA). Supplement with 1× protease inhibitor cocktail, 10 mM dithiothreitol (DTT), and 100 mM phenylmethylsulfonyl fluoride (PMSF) immediately before use.
2. Fisher F60 Sonic Dismembrator.
3. Turbonuclease, 250 units/μL.
4. Streptavidin Sepharose beads.


5. Trypsin.
6. Ammonium bicarbonate (ABC) pH 8.5.

2.6 Peptide Desalting

1. Desalting Buffer A: 0.5% formic acid in H2O (see Note 3).
2. Desalting Buffer B: 0.5% formic acid in 80% acetonitrile and 20% H2O.
3. Home-made C18 StageTips prepared from Empore solid discs as per Rappsilber et al. [28]. Alternatively, commercial StageTips can be used.
4. Methanol.
5. CentriVap Vacuum Concentrator (Speedvac).

3 Methods

3.1 Plasmid and Lentivirus Production

1. Lentiviral plasmids are amplified in Stbl2 competent cells (see Note 4).
2. On day 0, seed 1.2 × 10⁵ HEK293T cells in complete DMEM in a 10-cm plate, aiming for ~70% confluence the next day (see Note 5).
3. On day 1, add 0.8 μg of VSV-G plasmid, 7.4 μg of PAX2 plasmid, 7.8 μg of the destination vector, and 32 μL of PEI MAX (1 mg/mL stock) to 700 μL of warm Opti-MEM and mix.
4. Incubate for 20 min at room temperature (21–23 °C), then carefully add the mixture dropwise to the medium.
5. Incubate cells at 37 °C in 5% CO2 for 24 h.
6. On day 2, exchange the medium with DMEM supplemented with 10% heat-inactivated FBS. Add HEPES pH 7.4 to a final concentration of 10 mM (7.5 mL of 1 M stock solution for 500 mL DMEM) and incubate overnight at 37 °C and 5% CO2.
7. On day 3, collect the supernatant containing the lentiviral particles and filter it through a 0.45-μm filter. Store the supernatant at 4 °C for up to 14 days (see Note 6).
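The transfection mixture in step 3 can be sanity-checked with a few lines of Python. This sketch only totals the plasmid DNA and derives the PEI-to-DNA mass ratio from the quantities listed above; the variable names are illustrative, and nothing here goes beyond that arithmetic.

```python
# Sketch only: totals the plasmid DNA from step 3 and derives the PEI:DNA ratio.
plasmids_ug = {
    "VSV-G": 0.8,
    "PAX2": 7.4,
    "destination vector": 7.8,
}
pei_volume_uL = 32.0
pei_stock_ug_per_uL = 1.0  # 1 mg/mL stock is equivalent to 1 ug/uL

total_dna_ug = sum(plasmids_ug.values())
pei_ug = pei_volume_uL * pei_stock_ug_per_uL
pei_to_dna_ratio = pei_ug / total_dna_ug

print(f"total DNA: {total_dna_ug:.1f} ug")
print(f"PEI:DNA mass ratio: {pei_to_dna_ratio:.1f}:1")
```

Keeping the ratio as a computed value rather than a hard-coded one makes it easier to rescale the mixture if a different plate format is used.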

3.2 Lentiviral Transduction

1. On day 0, seed 2 × 10⁵ MDA-MB-231 or MCF-7 cells in 2 mL of complete DMEM in a 6-well plate, aiming for ~25% confluency. Add 2 μL of 8 mg/mL stock polybrene solution, to a final concentration of 8 μg/mL, and 20 μL of 1 M HEPES pH 7.4, to a final 10 mM concentration, to each well.
2. On day 1, add 1 mL of supernatant containing the appropriate lentiviral particles to each well. The multiplicity of infection range can be optimized to minimize the amount of lentiviral particle-containing supernatant employed per transduction.
3. On day 2, remove the medium and wash with 1× PBS to remove all remaining lentiviral particles, then add 2 mL of fresh complete DMEM.
4. On day 4, begin selection by adding 4 μL of a 1 mg/mL puromycin stock to a final concentration of 2 μg/mL for at least 48 h.
5. After selection, remove the puromycin-containing medium and add fresh complete DMEM.
6. Expand cells and proceed to polyclonal population validation and monoclonal population isolation.

3.3 Construct Validation in a Polyclonal Population and Monoclonal Cell Population Isolation

1. On day 0, seed 1 × 10⁶ MDA-MB-231-Turbo-HA-2×Strep-X or MCF7-Turbo-HA-2×Strep-X cells (where "X" indicates the bait protein expressed by the destination vector) on coverslips in a 6-well plate.
2. On day 1, induce bait protein expression with 1 μg/mL of doxycycline for 24 h.
3. On day 2, supplement the medium with 500 μM biotin for 1 h.
4. Aspirate the medium and quickly wash each well three times with 1 mL of 1× PBS.
5. Fix cells in 4% formaldehyde for 15 min at room temperature in a fume hood.
6. Wash each well three times for 5 min with 1 mL of 1× PBS.
7. Block and permeabilize cells with blocking buffer for 1 h at room temperature.
8. Remove blocking buffer and incubate cells with primary antibody (anti-HA diluted 1:1000 in antibody dilution buffer) at 4 °C overnight.
9. The next day, wash each well three times for 5 min with 1 mL of 1× PBS.
10. Incubate cells for 1 h at room temperature with appropriate secondary antibodies (Alexa Fluor 488-conjugated goat anti-mouse and Alexa Fluor 555-conjugated streptavidin) at 1:1000 in antibody dilution buffer. From this point on, minimize the exposure of the coverslips to ambient light.
11. Wash each well three times for 5 min with 1 mL of 1× PBS.
12. Mount coverslips on slides with Prolong Gold Antifade reagent with DAPI.
13. Image samples using a confocal microscope.


14. Proceed to construct validation by immunofluorescence in the polyclonal population and to monoclonal cell isolation in parallel to save time (see Note 7).
15. Trypsinize the stable polyclonal population and isolate monoclonal cells by performing a limiting dilution in 96-well plates.
16. Expand individual cells until colonies are visible under a microscope, marking for discard any wells that initially contained more than one cell.
17. When clones reach ~50% confluency in 96-well plates, divide them into two aliquots in 24-well plates. One aliquot should be incubated in biotin-depleted DMEM for western blot validation; the other in complete DMEM to expand cell numbers (to be employed in Subheading 3.4).
18. When cells reach ~60% confluency, transfer them to 6-well plates in biotin-depleted DMEM.
19. When cells reach ~80% confluency, exchange their medium with fresh biotin-depleted DMEM.
20. The next day, induce expression of the tagged protein with 1 μg/mL doxycycline for 24 h.
21. The next day, supplement the medium with 500 μM biotin for 1 h.
22. Remove the medium and wash wells carefully with cold 1× PBS.
23. Scrape cells in 1 mL of cold 1× PBS with a clean silicone spatula and pellet them at 1500 × g for 5 min in 1.5 mL Eppendorf tubes. The pellets can be snap frozen and stored at −80 °C for many months prior to lysis.
24. Resuspend each cell pellet in 200 μL RIPA lysis buffer supplemented with protease inhibitors by pipetting up and down. Lyse cells on ice for ~5 min.
25. Sonicate lysates using a Bioruptor UCD-300 at high intensity (300 W), using five cycles of 30 s on and 30 s off at 4 °C.
26. Centrifuge at 13,000 × g for 15 min at 4 °C to clear the lysates.
27. Transfer supernatants to fresh tubes. Samples can be stored for many months at −80 °C at this point.
28. Determine the protein concentration of each sample using a bicinchoninic acid assay or similar protein quantification assay.
29. Dilute lysate with 4× Laemmli buffer as appropriate.
30. Incubate samples at 95 °C for 5 min.
31. Load 10–50 μg protein onto a polyacrylamide gel and resolve proteins as appropriate (see Note 8).
32. Transfer proteins to a nitrocellulose membrane, then block it for 1 h in 5% non-fat milk in TBST on a rocking platform at room temperature. To evaluate biotinylation, we recommend blocking in 5% BSA in TBST, as non-fat milk drastically reduces the signal obtained from streptavidin-HRP reagents.
33. Incubate the membrane with mouse anti-HA primary antibody (1:1000) diluted in 5% non-fat milk in TBST overnight.
34. The next day, wash the membrane four times for 5 min with TBST, then incubate with the appropriate secondary antibody in 5% non-fat milk in TBST for 1 h. To visualize biotinylation, we use HRP-conjugated streptavidin; thus, no secondary antibody is required.
35. Wash the membranes four times for 5 min with TBST before incubating them with enhanced chemiluminescence (ECL) reagents for 5 min, then detect the chemiluminescence signals on film. We use KAP1 or Tubulin as a loading control.
36. Based on western blot analysis, choose a clone that expresses the bait protein at a level similar to that of the control (TurboID-HA-2×Strep-eGFP-NLS).

3.4 MDA-MB-231 Cell Induction for TurboID

1. Transfer cells from a 6-well plate to a 10-cm plate containing biotin-depleted DMEM.
2. When cells are near confluency, passage them to two 15-cm plates in biotin-depleted DMEM.
3. When the cells reach ~60% confluency, exchange the medium with fresh biotin-depleted DMEM.
4. When the cells reach ~80% confluency, induce bait expression with 1 μg/mL doxycycline for 24 h.
5. To promote the biotinylation of proximal proteins, supplement the media with 500 μM biotin for 1 h.
6. Remove the media and rinse the cells with cold 1× PBS before harvesting them with a scraper or silicone spatula in 1 mL of cold 1× PBS and pelleting them by gentle centrifugation. The cell pellet is then flash-frozen on dry ice. Cell pellets can be stored at −80 °C for up to a year before analysis without noticeable impact on the results.
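The induction steps above specify final concentrations rather than pipetting volumes. A minimal Python sketch of that conversion follows, using the stock concentrations listed in Subheading 2.4. The 20 mL of medium per 15-cm plate is a hypothetical value chosen for illustration (the protocol does not state it), and the helper ignores the small volume contributed by the stock itself.

```python
# Sketch only: volume of stock to add to reach a target final concentration.
# Simplification: the volume of added stock is neglected relative to the medium.
def stock_volume_uL(c_final: float, c_stock: float, medium_mL: float) -> float:
    """Return uL of stock needed; c_final and c_stock must share the same unit."""
    return c_final / c_stock * medium_mL * 1000.0


MEDIUM_ML = 20.0  # hypothetical medium volume per 15-cm plate

# Doxycycline: 1 ug/mL final from a 1 mg/mL (1000 ug/mL) stock
dox_uL = stock_volume_uL(1.0, 1000.0, MEDIUM_ML)
# Biotin: 500 uM final from a 20 mM (20000 uM) stock
biotin_uL = stock_volume_uL(500.0, 20000.0, MEDIUM_ML)
# Cross-check against Subheading 3.2: 8 ug/mL polybrene in 2 mL from 8 mg/mL
polybrene_uL = stock_volume_uL(8.0, 8000.0, 2.0)

print(f"doxycycline: {dox_uL:.0f} uL, biotin: {biotin_uL:.0f} uL")
print(f"polybrene check: {polybrene_uL:.0f} uL")
```

The polybrene line reproduces the 2 μL per well given in Subheading 3.2, which is a convenient way to confirm the helper before applying it to other plate formats.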

3.5 MCF-7 Cell Induction for TurboID

1. Transfer cells from a 6-well plate to a 10-cm plate containing biotin-depleted DMEM.
2. When cells are near confluency, passage them into two 15-cm plates in biotin-depleted DMEM.
3. When the cells reach ~60% confluency, rinse carefully with 1× PBS and then with biotin-depleted DMEM without phenol red, then add fresh biotin-depleted DMEM without phenol red.
4. When the cells reach ~80% confluency, induce bait expression with 1 μg/mL doxycycline for 24 h.
5. Exchange the medium with biotin-depleted DMEM without phenol red containing 10 nM E2 (50 μL of the 100 μM E2 stock solution in 500 mL DMEM) to activate ESR1.
6. Incubate for 5 h, then supplement the medium with 500 μM of biotin for 1 h.
7. Remove the medium and collect cell pellets as described in Subheading 3.4, step 6.

3.6 TurboID

1. Place the cell pellet on ice until partially thawed.
2. Resuspend the pellet in 1.5 mL RIPA lysis buffer supplemented with protease inhibitors by pipetting up and down. Lyse on ice for ~5 min.
3. Shear chromatin to generate fragments of 500 bp. When using a Fisher F60 Sonic Dismembrator, this is achieved by sonicating at amplitude 4 for 30 s. Working conditions should be optimized empirically for each sonicator.
4. Add ~250 units (1 μL) of turbonuclease and incubate for 1 h at 4 °C with gentle rotation to further digest the chromatin into 150-bp fragments.
5. Clear samples by centrifugation at 13,000 × g for 20 min at 4 °C and transfer supernatants to fresh tubes for TurboID. Aliquots can be taken for downstream immunoblotting analysis and stored at −20 °C.
6. While the samples are being centrifuged, transfer 60 μL of streptavidin Sepharose bead slurry (50/50, V/V) per sample to new tubes and wash the beads three times with 1 mL of RIPA lysis buffer.
7. Add the clarified protein samples to the washed streptavidin beads and incubate for 3 h at 4 °C with rotation.
8. Centrifuge for 2 min at 300 × g and discard the supernatants. An aliquot of each supernatant can be saved as the unbound fraction for downstream analysis.
9. Wash the beads twice with 1 mL of RIPA lysis buffer and discard the supernatants. Additional washes can be performed at this point to further remove nonspecific or weak interaction partners of biotinylated proteins (see Note 9).
10. Wash the beads three times with 1 mL of 50 mM ABC pH 8.5.
11. Resuspend beads in 100 μL of 50 mM ABC pH 8.0 and add 1 μg of trypsin (from a 0.1 μg/μL stock solution in 20 mM Tris-HCl pH 8.0).
12. Incubate overnight at 37 °C with gentle lateral shaking.
13. The next day, add another 1 μg of trypsin to ensure complete sample digestion and incubate for 3–4 h at 37 °C with gentle lateral shaking.
14. Pellet the beads by centrifuging for 2 min at 300 × g and collect the peptides in a new tube, taking care to avoid the beads.
15. Wash the beads twice more with 200 μL acetonitrile, pelleting them after each wash, and collect the peptides in the same tube (again avoiding the beads).
16. Acidify the samples by adding 25 μL of 50% formic acid.
17. Speedvac the samples to dryness at low heat.

3.7 Peptide Desalting

1. Prepare a C18 StageTip as per Rappsilber et al. [28]. Alternatively, commercial StageTips can also be used.
2. Wet the C18 disk by adding 20 μL of 100% methanol and centrifuge at 1500 × g for 1 min.
3. Add 20 μL desalting buffer B to the StageTip and centrifuge at 1500 × g for 1 min.
4. Add 20 μL desalting buffer A to the StageTip and centrifuge at 1500 × g for 4 min.
5. Dissolve the dry peptide samples from Subheading 3.6, step 17 in 20 μL of desalting buffer A.
6. Add peptide mixtures to StageTips and centrifuge at 1500 × g for 4 min.
7. Reload each flow-through onto its StageTip to allow complete peptide capture and centrifuge again at 1500 × g for 4 min.
8. Wash StageTips twice with 20 μL of desalting buffer A and centrifuge at 1500 × g for 4 min each time.
9. Transfer StageTips to new tubes.
10. Elute peptides three times by adding 20 μL of desalting buffer B and centrifuging at 1500 × g for 1 min. Collect desalted peptides in a fresh tube.
11. Speedvac samples to dryness and store at −80 °C until liquid chromatography (LC)-MS/MS analysis.

3.8 MS Data Acquisition

Below we present our current MS method to analyze protein samples derived from proximity biotinylation assays. Similarly effective acquisition methods can be developed based on the infrastructure available. Briefly, peptide samples are separated by online reversed-phase nanoscale capillary LC and analyzed by electrospray MS/MS. The experiments are performed with a Dionex UltiMate 3000 nanoRSLC chromatography system connected to an Orbitrap Fusion mass spectrometer equipped with a nanoelectrospray ion source. Peptides are trapped at 20 μL/min in loading solvent (2% acetonitrile, 0.05% TFA) on a 5 mm × 300 μm C18 PepMap cartridge pre-column for 5 min. Then, the pre-column is switched online with a self-made 50 cm × 75 μm internal diameter separation column packed with ReproSil-Pur C18-AQ 3-μm resin (Dr. Maisch HPLC) and the peptides are eluted with a linear gradient of 5–40% solvent B (A: 0.1% formic acid, B: 80% acetonitrile, 0.1% formic acid) over 90 min at 300 nL/min. Mass spectra are acquired in data-dependent acquisition mode using Thermo XCalibur software version 3.0.63. Full scan mass spectra (350–1800 m/z) are acquired in the Orbitrap using an automatic gain control (AGC) target of 4e5, a maximum injection time of 50 ms, and a resolution of 120,000. Internal calibration is performed using the m/z 445.12003 siloxane ion as a lock mass. Each MS scan is followed by the acquisition of fragmentation spectra from the most intense ions for a total cycle time of 3 s (top speed mode). The selected ions are isolated using the quadrupole analyzer in 1.6-m/z windows and fragmented by higher-energy collision-induced dissociation at 35% collision energy. The resulting fragments are detected by the linear ion trap in rapid scan mode with an AGC target of 1e4 and a maximum injection time of 50 ms. Dynamic exclusion of previously fragmented peptides is set for a duration of 20 s and a tolerance of 10 ppm.

3.9 MS Data Analysis

Our MS data are stored, searched, and analyzed using the ProHits laboratory information management system [29]. Thermo Fisher Scientific RAW mass spectrometry files are converted to mzML and mzXML files using ProteoWizard (3.0.4468; [30]). These files are then searched using Mascot (v2.3.02) and Comet (v2012.02 rev.0). Peptide sequences are searched against the RefSeq database (version 57, January 30th, 2013) acquired from the National Center for Biotechnology Information, which contains 72,482 human and adenoviral sequences supplemented with common contaminants from the Max Planck Institute (http://lotus1.gwdg.de/mpg/mmbc/maxquant_input.nsf/7994124a4298328fc125748d0048fee2/$FILE/contaminants.fasta) and the Global Proteome Machine (http://www.thegpm.org/crap/index.html). Charges of +2, +3, and +4 are allowed, and the parent mass tolerance is set at 12 ppm while the fragment bin tolerance is set at 0.6 amu. Deamidated asparagine and glutamine and oxidized methionine are allowed as variable modifications. The results from each search engine are analyzed using the Trans-Proteomic Pipeline (v4.6 OCCUPY rev 3) [31]. SAINTexpress version 3.6.1 [32] is used with default parameters as a statistical tool to calculate the probability of each potential protein–protein interaction relative to background contaminants. Two unique peptide ions and a minimum iProphet probability of 0.95 are required for protein identification prior to SAINTexpress. See Table S1 for complete SAINTexpress results.
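To make two of the numerical criteria above concrete, the Python sketch below converts a ppm tolerance into an absolute m/z window and applies the identification filter (at least two unique peptides and an iProphet probability of at least 0.95). It is an illustration only and not part of ProHits, the Trans-Proteomic Pipeline, or SAINTexpress; the `ProteinHit` record, the accession names, and the example values are hypothetical. It assumes Python 3.7 or later.

```python
# Sketch only: illustrates the ppm tolerance and the pre-SAINTexpress filter.
# ProteinHit and the example records are hypothetical, not a real tool's format.
from dataclasses import dataclass


def ppm_window(mz: float, tol_ppm: float) -> tuple:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


@dataclass
class ProteinHit:
    accession: str
    unique_peptides: int
    iprophet_prob: float


def passes_filter(hit: ProteinHit, min_peptides: int = 2,
                  min_prob: float = 0.95) -> bool:
    """Keep hits with enough unique peptides and a high enough probability."""
    return hit.unique_peptides >= min_peptides and hit.iprophet_prob >= min_prob


hits = [
    ProteinHit("PROT_A", 5, 0.99),  # passes both criteria
    ProteinHit("PROT_B", 1, 0.99),  # too few unique peptides
    ProteinHit("PROT_C", 3, 0.90),  # probability too low
]
kept = [h.accession for h in hits if passes_filter(h)]

low, high = ppm_window(500.0, 12.0)  # 12 ppm parent mass tolerance at m/z 500
print(kept)
print(f"{low:.3f} to {high:.3f}")
```

At m/z 500, a 12 ppm tolerance corresponds to a window of ±0.006, which shows why ppm-based tolerances scale with precursor mass while the 0.6 amu fragment bin does not.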

3.10 MS Data Archiving

All MS files are deposited in MassIVE (http://massive.ucsd.edu) and ProteomeXchange (http://www.proteomexchange.org/). The MS files reported here were assigned the identifiers MassIVE MSV000087446 and PXD026061 and can be accessed at doi: https://doi.org/10.25345/C5BV6H and ftp://massive.ucsd.edu/MSV000087446/.

4 Notes

1. The TurboID-HA-2×Strep lentiviral expression vector, which consists of an engineered TurboID biotin ligase fused to a G-S-rich linker and an HA-2×Strep tag, was generated using an overlapping PCR approach. TurboID was PCR amplified from V5-TurboID-NES_pCDNA3 (a gift from Alice Ting, Addgene #107169) [33], the 10-amino-acid-long G-S-rich linker was PCR amplified from pSTV2 N-BioID2 Flag pDEST (a gift from Anne-Claude Gingras, University of Toronto) [34], and the HA tag was inserted into the 5′ primer used to amplify the twin Strep tag during PCR amplification from pAAVS1.V2_neo_PGK1–3×Flag_Twin_Strep (a gift from Yannick Doyon, Université Laval) [35]. The pENTRY-TurboID-HA-2×Strep was created by introducing the annealed amplicons into the pDONR221 vector using Invitrogen™ Gateway™ recombination cloning (Invitrogen, California, USA). To generate the pDEST-pCW57.1-TurboID-HA-2×Strep vector, TurboID-HA-2×Strep was PCR amplified from pENTRY-TurboID-HA-2×Strep and cloned into pCW57.1 (a gift from David Root, Addgene #41393) using the NheI restriction site. pCW57.1-TurboID-HA-2×Strep-eGFP and pCW57.1-TurboID-HA-2×Strep-eGFPnls vectors were generated by Gateway recombination of pENTRY-eGFP and pENTRY-eGFPnls into the pDEST-pCW57.1-TurboID-HA-2×Strep vector. All constructs were confirmed by sequencing.
2. Because of the high efficiency of TurboID, the growth medium used prior to fusion protein induction should be depleted of biotin to minimize background effects. Under usual growth conditions, both media and FBS can contain biotin. For example, Roswell Park Memorial Institute (RPMI) medium contains 200 μg/L of biotin, while DMEM contains none. Therefore, we grow our cells in DMEM for at least three passages prior to TurboID to deplete cellular biotin levels. Commercial biotin-free RPMI medium (e.g., MyBioSource) can be sourced if other media are not tolerated by the cells being studied. To deplete biotin from our serum, we incubate our FBS with pre-washed streptavidin Sepharose beads in sterile PBS (1/100, V/V) for 24 h with rotation at 4 °C. Then, the now biotin-depleted FBS is filter sterilized and added to DMEM.
3. All buffers used for peptide preparation prior to MS analysis should be LC-MS grade or higher to minimize interference during MS acquisition.
4. Stbl2 competent cells are designed for cloning unstable inserts. In our experiments, they improve cloning efficiency for vectors containing viral sequences. Additionally, we found that growing Stbl2 cells at 30 °C also improves yield; however, doing so may generate two types of colonies, as described by Feldman et al. [36]. If this occurs, smaller colonies should be screened first, as the larger ones often carry mutated plasmids.
5. The confluency of the HEK293T cells is a key parameter of lentiviral production, with significantly decreased production at high confluency. HEK293T cells used for lentiviral production should always be maintained below 90% confluency.
6. Lentiviral particles can be harvested twice if large quantities are required. After harvesting on day 3, new medium can be added and subsequently harvested on day 4. All supernatants should be pooled prior to use to minimize batch effects. Lentivirus-containing media can be used immediately, maintained at 4 °C for ~2 weeks, or snap frozen at −80 °C for later use. Freeze-thaw cycles should be avoided to maintain the quality of the lentivirus.
7. Lentiviral transduction results in a heterogeneous cell population. To avoid loss of transgene expression over time, clonal populations can be selected to maintain constant gene expression across multiple experiments. If pools of cells are used after transduction, the expression level of the gene of interest should be similar to that of the eGFP-NLS control to ensure proper modeling of the TurboID background.
8. Please note that the TurboID-HA-2×Strep tag adds ~30 kDa to the molecular weight of the bait protein when observed by western blot.
9. Additional washes can be used to remove possible interaction partners of biotinylated proteins and decrease the complexity of the samples prior to MS analysis. The biotin–streptavidin interaction is quite strong and will resist very harsh wash conditions, including 2% SDS [34].

Acknowledgments

This research was supported by Project Grants from the Canadian Institutes of Health Research (PJT-168969 and PJT-152948) and Leader's Opportunity Funds from the Canada Foundation for Innovation (37454, 41426). L.A. is supported by a scholarship from the Fondation du CHU de Québec. S.A.B. is supported by a doctoral scholarship from the Fonds de Recherche du Québec Santé (FRQS). P.-E.K.T. is supported by a Bourse Distinction Luc Bélanger from the Cancer Research Center – Université Laval and by a doctoral scholarship from the FRQS. J.-P.L. is supported by a Junior 1 salary award from the FRQS. A.F.-T. is a tier 2 Canada Research Chair in Molecular Virology and Genomic Instability and is supported by the Fondation J.-Louis Lévesque.

References

1. Gustafsson J-A (2016) Historical overview of nuclear receptors. J Steroid Biochem Mol Biol 157:3–6. https://doi.org/10.1016/j.jsbmb.2015.03.004
2. Mangelsdorf DJ, Thummel C, Beato M et al (1995) The nuclear receptor superfamily: the second decade. Cell 83(6):835–839. https://doi.org/10.1016/0092-8674(95)90199-x
3. Huang P, Chandra V, Rastinejad F (2010) Structural overview of the nuclear receptor superfamily: insights into physiology and therapeutics. Annu Rev Physiol 72(1):247–272. https://doi.org/10.1146/annurev-physiol-021909-135917
4. McEwan IJ (2016) The nuclear receptor superfamily at thirty. Methods Mol Biol (Clifton, NJ) 1443:3–9
5. Weikum ER, Liu X, Ortlund EA (2018) The nuclear receptor superfamily: a structural perspective. Protein Sci 27(11):1876–1892. https://doi.org/10.1002/pro.3496
6. Berrabah W, Aumercier P, Lefebvre P et al (2011) Control of nuclear receptor activities in metabolism by post-translational modifications. FEBS Lett 585(11):1640–1650. https://doi.org/10.1016/j.febslet.2011.03.066
7. Sever R, Glass CK (2013) Signaling by nuclear receptors. Cold Spring Harb Perspect Biol 5(3):a016709. https://doi.org/10.1101/cshperspect.a016709
8. Huttlin EL, Bruckner RJ, Paulo JA et al (2017) Architecture of the human interactome defines protein communities and disease networks. Nature 545(7655):505–509. https://doi.org/10.1038/nature22366
9. Vidal M, Cusick ME, Barabási A-L (2011) Interactome networks and human disease. Cell 144(6):986–998. https://doi.org/10.1016/j.cell.2011.02.016
10. Li Y, Sahni N, Yi S (2016) Comparative analysis of protein interactome networks prioritizes candidate genes with cancer signatures. Oncotarget 7(48):78841–78849. https://doi.org/10.18632/oncotarget.12879
11. Gulfidan G, Turanli B, Beklen H et al (2020) Pan-cancer mapping of differential protein-protein interactions. Sci Rep 10(1):3272. https://doi.org/10.1038/s41598-020-60127-x
12. Li J-W, Lee H-M, Wang Y et al (2016) Interactome-transcriptome analysis discovers signatures complementary to GWAS loci of type 2 diabetes. Sci Rep 6(1):35228. https://doi.org/10.1038/srep35228
13. Haenig C, Atias N, Taylor AK et al (2020) Interactome mapping provides a network of neurodegenerative disease proteins and uncovers widespread protein aggregation in affected brains. Cell Rep 32(7):108050. https://doi.org/10.1016/j.celrep.2020.108050
14. Ganapathiraju MK, Thahir M, Handen A et al (2016) Schizophrenia interactome with 504 novel protein–protein interactions. NPJ Schizophr 2(1):16012. https://doi.org/10.1038/npjschz.2016.12
15. Agbo L, Lambert J-P (2019) Proteomics contribution to the elucidation of the steroid hormone receptors functions. J Steroid Biochem Mol Biol 192:105387
16. Vélot L, Lessard F, Bérubé-Simard F-A et al (2021) Proximity-dependent mapping of the androgen receptor identifies kruppel-like factor 4 as a functional partner. Mol Cell Proteomics 20:100064. https://doi.org/10.1016/j.mcpro.2021.100064
17. Gingras A-C, Abe KT, Raught B (2019) Getting to know the neighborhood: using proximity-dependent biotinylation to characterize protein complexes and map organelles. Curr Opin Chem Biol 48:44–54. https://doi.org/10.1016/j.cbpa.2018.10.017
18. De Boer E, Rodriguez P, Bonte E et al (2003) Efficient biotinylation and single-step purification of tagged transcription factors in mammalian cells and transgenic mice. Proc Natl Acad Sci 100(13):7480–7485. https://doi.org/10.1073/pnas.1332608100
19. Roux KJ, Kim DI, Raida M et al (2012) A promiscuous biotin ligase fusion protein identifies proximal and interacting proteins in mammalian cells. J Cell Biol 196(6):801–810. https://doi.org/10.1083/jcb.201112098
20. Kim DI, Jensen SC, Noble KA et al (2016) An improved smaller biotin ligase for BioID proximity labeling. Mol Biol Cell 27(8):1188–1196. https://doi.org/10.1091/mbc.e15-12-0844
21. Kim DI, Kc B, Zhu W et al (2014) Probing nuclear pore complex architecture with proximity-dependent biotinylation. Proc Natl Acad Sci 111(24):E2453–E2461. https://doi.org/10.1073/pnas.1406459111
22. Lobingier BT, Hüttenhain R, Eichel K et al (2017) An approach to spatiotemporally resolve protein interaction networks in living cells. Cell 169(2):350–360.e312. https://doi.org/10.1016/j.cell.2017.03.022
23. Lam SS, Martell JD, Kamer KJ et al (2015) Directed evolution of APEX2 for electron microscopy and proximity labeling. Nat Methods 12(1):51–54. https://doi.org/10.1038/nmeth.3179
24. Han Y, Branon TC, Martell JD et al (2019) Directed evolution of split APEX2 peroxidase. ACS Chem Biol 14(4):619–635. https://doi.org/10.1021/acschembio.8b00919
25. Oughtred R, Stark C, Breitkreutz B-J et al (2019) The BioGRID interaction database: 2019 update. Nucleic Acids Res 47(D1):D529–D541. https://doi.org/10.1093/nar/gky1079
26. Lambert J-P, Ivosev G, Couzens AL et al (2013) Mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. Nat Methods 10(12):1239–1245. https://doi.org/10.1038/nmeth.2702
27. Lambert J-P, Picaud S, Fujisawa T et al (2019) Interactome rewiring following pharmacological targeting of BET bromodomains. Mol Cell 73(3):621–638.e617. https://doi.org/10.1016/j.molcel.2018.11.006
28. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2(8):1896–1906
29. Liu G, Knight JD, Zhang JP et al (2016) Data independent acquisition analysis in ProHits 4.0. J Proteomics 149:64–68. https://doi.org/10.1016/j.jprot.2016.04.042
30. Kessner D, Chambers M, Burke R et al (2008) ProteoWizard: open source software for rapid proteomics tools development. Bioinformatics 24(21):2534–2536. https://doi.org/10.1093/bioinformatics/btn323
31. Deutsch EW, Mendoza L, Shteynberg D et al (2015) Trans-proteomic pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin Appl 9(7–8):745–754. https://doi.org/10.1002/prca.201400164
32. Teo G, Liu G, Zhang J et al (2014) SAINTexpress: improvements and additional features in significance analysis of INTeractome software. J Proteomics 100:37–43. https://doi.org/10.1016/j.jprot.2013.10.023
33. Branon TC, Bosch JA, Sanchez AD et al (2018) Efficient proximity labeling in living cells and organisms with TurboID. Nat Biotechnol 36(9):880–887. https://doi.org/10.1038/nbt.4201
34. Samavarchi-Tehrani P, Abdouni H, Samson R et al (2018) A versatile lentiviral delivery toolkit for proximity-dependent biotinylation in diverse cell types. Mol Cell Proteomics 17(11):2256–2269. https://doi.org/10.1074/mcp.TIR118.000902
35. Dalvai M, Loehr J, Jacquet K et al (2015) A scalable genome-editing-based approach for mapping multiprotein complexes in human cells. Cell Rep 13(3):621–633. https://doi.org/10.1016/j.celrep.2015.09.009
36. Feldman DH, Lossin C (2014) The Nav channel bench series: plasmid preparation. MethodsX 1:6–11. https://doi.org/10.1016/j.mex.2014.01.002

Chapter 16

Mining Proteomics Datasets to Uncover Functional Pseudogenes

Anna Meller and François-Michel Boisvert

Abstract

Analysis of the proteome, combined with the human genome sequence, has improved gene annotation and confirmed that some genes are actually expressed as proteins, including pseudogenes, alternative reading frames, and additional protein isoforms. Although sequencing a genome is a challenge in itself, the annotation of all genes encoding proteins is a bigger one. Here, we describe an in silico approach to identify evidence of pseudogene expression, as well as an experimental approach for the validation of the proteins encoded by pseudogenes, including the steps necessary to quantify these proteins. This technique enables a comprehensive analysis of the expression of proteins that were not suspected to exist.

Key words Pseudogenes, Proteomics, Mass spectrometry, Parallel reaction monitoring, Protein expression

1 Introduction

Pseudogenes are regions of the genome that contain copies of presumably non-expressed genes [1]. They arise through different mechanisms and have been grouped into three categories [2]. Processed pseudogenes are derived from retrotransposition of processed mRNAs and are often characterized by a lack of promoter regions and by the presence of a poly-A tail [3]. Unprocessed pseudogenes are derived from gene duplications, resulting from recombination events, that have subsequently mutated [4]. Finally, unitary pseudogenes are the result of inactivating mutations in a gene that was once expressed and functional [5]. Although often assumed to lack function, several pseudogenes are now known to play important biological roles at either the RNA or the protein level, including the possibility that some of them were actually misclassified [6]. Due to this initial error in categorization, they are frequently dismissed and excluded from genomic analyses. The advent of technologies for large-scale identification of gene expression and protein translation can identify several instances of pseudogenes that are expressed [7]. Herein, we present an approach to evaluate whether some of these genomic elements can actually be expressed and exert important biological functions.

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_16, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

2 Materials

2.1 Sample Preparation

All solutions and dilutions are prepared in mass spectrometry (MS)-grade H2O (MS H2O) on the day of the experiment. Iodoacetamide (IAA) and ammonium bicarbonate (NH4HCO3) can be stored at −20 °C as 1 M stock solutions in aliquots. Storage of other solutions is not recommended.

1. Protein lysis buffer: 8 M Urea, 10 mM HEPES.
2. LoBind Eppendorf tubes.
3. BCA kit or other for protein quantification.
4. MS H2O.
5. Dithiothreitol (DTT) as a 1 M solution in MS H2O (store aliquots at −20 °C).
6. Iodoacetamide (IAA) or 2-chloroacetamide (light sensitive!) as a 1 M solution in MS H2O (store aliquots at −20 °C; do not refreeze!).
7. 100% Acetonitrile (ACN).
8. 100% Trifluoroacetic acid (TFA).
9. 100% Glacial acetic acid.
10. Ammonium bicarbonate (NH4HCO3) as a 1 M solution in MS H2O (store aliquots at −20 °C).
11. 1 μg/μL MS grade Trypsin in 50 mM acetic acid in MS H2O in LoBind Eppendorf tube (store aliquots at −80 °C) or other enzyme for protein digest.
12. 100% Formic acid (FA).
13. Elution buffer: 1% FA, 50% ACN.
14. C-18 tips or columns.
15. 9 mm 350 μL Fused insert MS macrovials.
16. Heat block.
17. Tabletop centrifuge.
18. Vortex.
19. UV/Vis spectrophotometer.
20. Speedvac.
21. NanoDrop.
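The 1 M stocks listed above are later brought to working concentrations in the sample (Subheading 3.2.3), and the urea is diluted from 8 M to 2 M. The volumes follow from simple mass balance; the sketch below is one way to compute them. The function names and the 100 μL sample volume are illustrative assumptions, and the sketch ignores the small extra dilution of urea caused by the DTT and IAA additions.

```python
def spike_volume(sample_ul, stock_conc, final_conc):
    """Volume of stock (uL) to add so that the mixture reaches final_conc.

    stock_conc and final_conc must share the same unit (e.g., mM).
    Derived from stock_conc * v = final_conc * (sample_ul + v).
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return final_conc * sample_ul / (stock_conc - final_conc)


def diluent_volume(sample_ul, start_conc, final_conc):
    """Volume of analyte-free diluent (uL) that lowers start_conc to final_conc."""
    if final_conc > start_conc:
        raise ValueError("dilution cannot increase the concentration")
    return sample_ul * (start_conc / final_conc - 1.0)


# Hypothetical 100 uL protein extract in 8 M urea lysis buffer.
sample_ul = 100.0
dtt_ul = spike_volume(sample_ul, stock_conc=1000.0, final_conc=5.0)   # 1 M DTT to 5 mM
bicarb_ul = diluent_volume(sample_ul, start_conc=8.0, final_conc=2.0)  # 8 M urea to 2 M

print(f"1 M DTT to add: {dtt_ul:.2f} uL")
print(f"50 mM NH4HCO3 to add: {bicarb_ul:.0f} uL")
```

Because such small stock volumes are hard to pipette accurately, an intermediate dilution of the stock may be more practical; the same function applies with the intermediate concentration.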

2.2 PRM/MRM Analysis

1. Heavy labeled peptides corresponding to the protein of interest.
2. Skyline analysis software.

3 Methods

3.1 Identification of Pseudogenes

3.1.1 Pseudogene Database Search

For human or mouse genes, pseudogenes can be identified by searching readily available datasets such as GENCODE v.10 (https://www.gencodegenes.org/) [2] and PseudoPipe (http://pseudogene.org/) [8]. For other species, less information is available, so a primary bioinformatic analysis is required to generate a starting database (Fig. 1).

1. Both databases provide a downloadable list of all identified pseudogenes in .txt format. This list can be opened and searched in Excel by filtering for the name of the parent gene of interest to find specific pseudogenes. Although the method and criteria defining a pseudogene differ between datasets, there are likely to be overlaps between them. Cross-reference the datasets and remove any repeated entries (see Note 1).
2. Analyze the genomic context of the selected pseudogenes by entering their names in the gene search bar of the NCBI (https://www.ncbi.nlm.nih.gov/) or Ensembl (https://useast.ensembl.org/index.html) databases. Find the genomic sequence and check whether it contains any introns or a poly-A repeat. Also look for potential promoter regions, for example, a TATA box 5′ upstream of the pseudogene sequence. These features can all indicate that the gene may be transcribed.
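The filtering and cross-referencing in step 1 can also be scripted instead of done in Excel. The sketch below is a minimal illustration that removes duplicates by genomic overlap, as suggested in Note 1. The column names (`name`, `parent`, `chrom`, `start`, `end`), the gene names, and the coordinates are all hypothetical; the real download files use their own layouts and would need to be mapped onto these fields first.

```python
import csv
import io


def load_pseudogenes(handle, source):
    """Read a tab-delimited pseudogene list into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["source"] = source
        row["start"], row["end"] = int(row["start"]), int(row["end"])
        rows.append(row)
    return rows


def overlaps(a, b):
    """True when two entries share a chromosome and their intervals overlap."""
    return a["chrom"] == b["chrom"] and a["start"] <= b["end"] and b["start"] <= a["end"]


def merge_candidates(first, second, parent_gene):
    """Keep entries for one parent gene; drop entries from the second list that
    overlap an entry already kept (likely the same locus under another name)."""
    kept = [r for r in first if r["parent"] == parent_gene]
    for cand in (r for r in second if r["parent"] == parent_gene):
        if not any(overlaps(cand, k) for k in kept):
            kept.append(cand)
    return kept


# Toy tables standing in for the two downloaded .txt files (invented data).
header = "name\tparent\tchrom\tstart\tend\n"
table_a = header + "GENEAP1\tGENEA\tchr17\t1000\t1700\nGENEBP1\tGENEB\tchr1\t500\t900\n"
table_b = header + "PG0001\tGENEA\tchr17\t1010\t1690\nPG0002\tGENEA\tchr2\t300\t800\n"

list_a = load_pseudogenes(io.StringIO(table_a), "database A")
list_b = load_pseudogenes(io.StringIO(table_b), "database B")
candidates = merge_candidates(list_a, list_b, "GENEA")
for c in candidates:
    print(c["name"], c["chrom"], c["start"], c["end"], c["source"])
```

Comparing coordinates only makes sense when both lists refer to the same genome assembly, which should be checked before merging.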

3.1.2 Transcription Evidence Search

As most pseudogenes do not encode functional proteins, it is important to determine whether there is transcriptional evidence for a given pseudogene, which can then point towards potential translation. This can be verified by examining RNA-seq and ChIP-seq datasets.

1. Search for the pseudogene name on GEO (https://www.ncbi.nlm.nih.gov/geo/) [9] or ENA (https://www.ebi.ac.uk/ena/browser/home) [10] to retrieve a list of large-scale studies in which the given RNA has been identified. This provides further evidence that a pseudogene is actively transcribed and might produce a functional protein (see Note 2). Transcriptomic and proteomic evidence is also available on Expression Atlas (https://www.ebi.ac.uk/gxa/home) [11], an interactive visual interface that collects experimental data from different studies for each gene.


Fig. 1 Workflow of pseudogene database mining and analysis. Mining gene specific databases for candidate pseudogenes gives a basic dataset to further analyze by discovering their potential of being transcribed and translated. Exploring and re-analyzing proteomics datasets can further narrow down the list of potentially interesting pseudogenes

3.1.3 Translation Verification

To determine whether a pseudogene can be translated, it is important to look for evidence of ribosome binding on the mRNA.

1. Verify the potential for active translation of each pseudogene using ribosome profiling datasets such as GWIPS-viz (https://gwips.ucc.ie/) [12]. After entering the name of each pseudogene in the search bar, a graphic interface shows peaks at the locations along the mRNA where ribosome binding events were recorded.

3.1.4 Translation Analysis

While there can be translational evidence for a certain pseudogene, it does not necessarily mean that the translated protein is functional or that it resembles the original gene.

1. Analyze the potential translation of the sequence by in silico translation using Expasy (https://web.expasy.org/translate/) [13]. Paste the mRNA sequence in the search area and look through each reading frame in the results to see how closely the translated pseudogene resembles the parental gene. Different reading frames can also be present, providing alternative translation options (see Note 3).
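For readers who want to screen many sequences, the same six-frame translation can be sketched locally. The code below is an illustrative stand-in, not the Expasy implementation: it uses the standard genetic code, writes `*` for stop codons, and scores similarity to the parent protein with a naive position-by-position identity (no alignment), which is an assumption made for brevity. The example sequences are invented.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate from `offset`; '*' marks stops and 'X' marks unknown codons."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return the three forward and three reverse translations of a DNA or mRNA sequence."""
    dna = seq.upper().replace("U", "T")
    rc = reverse_complement(dna)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = translate(dna, k)
        frames[f"-{k + 1}"] = translate(rc, k)
    return frames


def identity(a, b):
    """Fraction of identical residues over the shorter sequence (ungapped)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0


frames = six_frames("AUGCAGAUCUUCGUGAAGACCUGA")  # invented example mRNA
parent_protein = "MQIFAKT"                        # invented parent sequence
for name, protein in frames.items():
    print(name, protein, round(identity(protein, parent_protein), 2))
```

A proper comparison would align the translated frame to the parent protein, since insertions or deletions shift every downstream position in this naive score.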

3.1.5 Protein Expression

To gather supporting evidence for the presence of a protein originating from a pseudogene, searching proteomics repository databases is a good starting point. The identification of unique peptides related to a pseudogene provides further proof for the existence of a protein encoded by the pseudogene.

1. Enter the name of the pseudogene in the search tab on OpenProt (https://www.openprot.org/) [14] to retrieve a list of studies with experimental or predicted evidence. OpenProt provides information not only about pseudogenes but also about potential alternative translations (see Note 4). Look at the details and find the identified unique peptides and their spectrum matches for each study in the mass spectrometry tab.

3.1.6 Interaction Databases

Optionally, to further characterize the potential functional expression of a pseudogene, interaction databases such as the BioPlex Interactome (https://bioplex.hms.harvard.edu/) [15] can also be re-analyzed. This can help to identify whether a given pseudogene functions within a known complex or whether the known associations appear to be random interactions with unrelated proteins.

3.2 Validation of Functional Pseudogenes

For experimental identification, the most straightforward method is mass spectrometry analysis, in which a peptide-specific search can be combined with quantitative analysis (Fig. 2). Quantification of wild type and pseudogene-derived proteins by MS/MS analysis is generally performed using the PRM (Parallel Reaction Monitoring) or MRM (Multiple Reaction Monitoring, also known as Selected Reaction Monitoring, SRM) method coupled with heavy (non-radioactive, stable isotope) labeled peptides as the measurement reference. When quantifying proteins, it is important to change from the standard DDA (Data Dependent Acquisition) analysis to increase sensitivity and specificity. In DDA, the most abundant precursors in each cycle are selected for analysis, which means that a protein with low expression may not be detected at all. In a PRM or MRM setting, by contrast, the mass spectrometer only cycles through a list of pre-selected precursors, which means that the instrument is specifically searching for the peptides defined in an inclusion list. The difference between the PRM and MRM methods is that MRM measures only selected fragment ions of a given precursor peptide, while PRM measures them all. In general, PRM is preferred over MRM as it requires fewer optimization steps.

Fig. 2 Workflow of protein quantification. Promising pseudogenes with experimental translational level evidence can further be validated using mass spectrometry analysis coupled with absolute quantification

3.2.1 Protein Specific Unique Heavy Peptide Design

1. Using the mass spectrometry evidence from the database results, choose and design unique, heavy labeled peptides for each protein to be quantified (ideally 2 peptides/protein). Alternatively, the target protein can be transiently expressed, immunoprecipitated, and purified using an epitope tag. The digested protein can then be analyzed in the mass spectrometer to find the best unique peptides for endogenous identification and quantification. Choose peptides that are easily detectable and have been found in several previous mass spectrometry studies.
2. Choose which amino acids will be heavy labeled. Labels can be placed at both the N- and C-termini of the peptide (see Note 5). Using only one heavy amino acid is also an option, but keep in mind that in this case only y or b ions (depending on which end was labeled) can be used for quantification.
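Candidate peptides that distinguish a pseudogene product from its parent protein can be shortlisted with an in silico digest before consulting the experimental evidence. The sketch below assumes simplified trypsin rules (cleavage after K or R, not before P, no missed cleavages) and an arbitrary length window; both protein sequences are invented for illustration.

```python
import re


def tryptic_peptides(protein, min_len=6, max_len=25):
    """Fully cleaved tryptic peptides within a length window.

    Cleaves C-terminal to K or R unless the next residue is P.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if min_len <= len(p) <= max_len}


def unique_to_pseudogene(pseudogene_protein, parent_protein):
    """Peptides produced by the pseudogene protein but not by the parent."""
    return tryptic_peptides(pseudogene_protein) - tryptic_peptides(parent_protein)


# Invented sequences that differ by a single residue (P to S).
parent = "MQIFVKTLTGKTITLEVEPSDTIENVKAK"
pseudo = "MQIFVKTLTGKTITLEVESSDTIENVKAK"

unique = unique_to_pseudogene(pseudo, parent)
print(sorted(unique))
```

Uniqueness against the parent alone is not sufficient in practice; a shortlisted peptide should also be checked against the rest of the proteome, and against the detectability criteria in step 1.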

3.2.2 Optimization of Peptide Detection by MS

1. As a quality check, it is advisable to set up a test run with the heavy peptides on the mass spectrometer using a standard DDA method to see how they behave and to obtain the exact monoisotopic precursor m/z value for each. Alternatively, this value can be calculated from the m/z of the light (unlabeled) peptide by adding the mass of the heavy label, either manually or using Skyline software (https://skyline.ms/project/home/begin.view) [16].
2. Generate an inclusion list containing the m/z values of all peptides to be measured (both heavy and light) (see Note 6).
3. Set up a PRM or MRM method on the mass spectrometer (the setup interface depends on the instrument, but templates are available on both Orbitrap and timsTOF mass spectrometers) and run the heavy peptides alone first to see if further adjustments are required.
4. Analyze the data using Skyline software. Verify the peak quality (see Note 7) and the presence of different charge states (see Note 8). Also look at the product ion intensities and select which ones will be used for quantification (choose the three highest) (see Note 9). If necessary, make changes to the PRM/MRM method based on the results (see Note 10).
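The manual calculation mentioned in step 1 amounts to adding the label mass divided by the charge to the light m/z. The sketch below assumes the commonly used 13C6,15N2-lysine and 13C6,15N4-arginine labels; if other labeled amino acids are ordered, their mass shifts must be substituted. The function names and the example m/z are illustrative, and Skyline remains the reference for the final values.

```python
PROTON_MASS = 1.007276  # Da

# Monoisotopic mass shifts (Da) for two common heavy amino acids (assumed labels).
HEAVY_SHIFT_DA = {
    "K": 8.014199,   # 13C6 15N2 lysine
    "R": 10.008269,  # 13C6 15N4 arginine
}


def heavy_mz(light_mz, charge, labeled_residues):
    """Precursor m/z of the heavy peptide from the light m/z at the same charge."""
    shift = sum(HEAVY_SHIFT_DA[res] for res in labeled_residues)
    return light_mz + shift / charge


def mz_at_charge(mz, charge, new_charge):
    """Convert an m/z observed at one charge state to another charge state."""
    neutral_mass = (mz - PROTON_MASS) * charge
    return neutral_mass / new_charge + PROTON_MASS


def inclusion_list(light_mz, charge, labeled_residues, charges=(2, 3)):
    """Light and heavy m/z values for each requested charge state (see Note 6)."""
    entries = []
    for z in charges:
        light = mz_at_charge(light_mz, charge, z)
        entries.append((z, round(light, 4), round(heavy_mz(light, z, labeled_residues), 4)))
    return entries


# Hypothetical doubly charged light peptide ending in a labeled lysine.
for z, light, heavy in inclusion_list(500.0, 2, "K"):
    print(f"z={z}  light={light}  heavy={heavy}")
```

Listing both charge states here mirrors Note 6; charge states that give no signal in the test run can then be dropped (Note 8).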

3.2.3 Sample Preparation

1. Denaturation: Prepare protein extracts using Protein lysis buffer. Measure protein concentration using BCA assay or equivalent. Aliquot the desired amount of protein extract from each sample into a LoBind tube (see Note 11).
2. Disulfide reduction: Add DTT to the samples at a final concentration of 5 mM and boil for 2 min at 95 °C.
3. Alkylation: Cool samples at room temperature for 30 min and add IAA at a final concentration of 7.5 mM. Incubate in the dark at room temperature for 20 min. Decrease the urea concentration (see Note 12) to 2 M by the addition of 50 mM NH4HCO3.
4. Digestion: Add trypsin to the samples (1 μg/100 μg protein) and incubate at 30 °C overnight. For the standard curve, first spike different concentrations of each heavy peptide into a control sample (see Note 13). For the experimental samples, spike in a predetermined concentration of each heavy peptide, chosen from the results of the standard curve (see Note 14). Cool samples to room temperature and quench the tryptic digest with TFA at a final concentration of 0.2%.
5. Desalting: Clean samples using C-18 tips. In the equilibration step, take up and eject three times an appropriate amount of 100% ACN (determined by the C-18 tip size, which is related to binding capacity). Take up and eject 0.1% TFA three times to equilibrate the column. For peptide binding, pipette the sample up and down ten times in a separate LoBind tube, then discard it. Repeat the binding step until the total volume of the sample has been processed. Wash the column by taking up and ejecting 0.1% TFA three times. Elute peptides from the column in a similar way as in the binding step (see Note 15) using elution buffer (the amount of buffer is 3 times the amount the C-18 tip can take up, e.g., for a 100 μL tip use 300 μL), but after 10 up and down cycles in a new LoBind tube, leave the eluted peptides in that final tube and repeat the cycles in the same tube until all of the elution buffer has been transferred. Dry the peptides in a Speedvac.
6. Concentration measurement: Resuspend the dried peptides in 1% FA by vigorous vortexing. Measure the peptide concentration in each sample using NanoDrop at an absorbance of 205 nm. Transfer the samples into MS vials to be loaded on the mass spectrometer.

3.3 Options for Functional Analysis of Identified Pseudogene

To further study the identified pseudogene of interest and determine its cellular functions, overexpression can be performed to investigate behavior, localization, and effect compared to the wild type protein. Epitope tagged pseudogenes can be subjected to co-immunoprecipitation assay followed by western blot or mass spectrometry analysis to identify novel interactors. Similarly, generating knockouts can also be helpful to study cellular changes in the absence of the pseudogene derived protein.

4 Notes

1. Different databases can name a given pseudogene differently, so comparing genomic locations can help to remove duplicates. Also, if an entry is obviously missing (for example, there is no pseudogene 3 but there are 4 and 5), a gene search on NCBI (https://www.ncbi.nlm.nih.gov/) or Ensembl (https://useast.ensembl.org/index.html) can be helpful.
2. In addition, by re-analyzing the raw data on ChIP-Atlas (https://chip-atlas.org/) [17], specific transcription factors can be found to be associated with a pseudogene, giving additional information on its potential function or regulation.
3. In certain cases, it is also important to check whether the translated pseudogene can function similarly to the parent gene. It is advisable to check for a major mutation in functional domains or a large truncation in the translated protein compared to the wild type. For example, in the case of ubiquitin genes, a C-terminal Gly-Gly motif is essential for substrate conjugation.
4. Bear in mind that the absence of evidence does not mean that the protein translated from a pseudogene is not present. In certain cases, digestion with a specific enzyme such as chymotrypsin, Glu-C, or Lys-C is required to identify peptides that differ from the ones present in the wild type protein.
5. It is not necessary to choose the very last amino acid at the N- or C-terminus; the second to last can be heavy labeled instead. Having the option to choose between the last two amino acids on both ends can also be useful in terms of cost efficiency, since some heavy labeled amino acids are more difficult to generate, which can increase the cost of production.
6. Certain peptides can be present in different charge states, so it is also advisable to include "alternatively" charged peptides in the list.
7. Ideally, peaks should be sharp, narrow in time, and free of tailing, with around 10 data points across each peak.
8. Charge states where no signal was found can be removed from the PRM inclusion list.
9. If only one end of the peptide is heavy labeled, make sure to choose the product ions corresponding to the labeled terminus.
10. Several parameters can be changed depending on the problem encountered. For example, wider peaks can be narrowed by changing the flow pressure as well as the column size. For asymmetrical, tailing peaks, increasing the ramp can also be an option.
11. Peptides can stick easily to plastic surfaces, so using a LoBind tube will increase sample recovery.
12. The final urea concentration and conditions for enzymatic digest can vary depending on the choice of enzyme.
13. Ideally, heavy peptides are added to the samples prior to digestion. If that is not possible, the spike-in can be performed at later stages as well, but this will decrease the accuracy, as the heavy peptides will not suffer the same processing losses as the endogenous ones.
14. Ideally, the heavy labeled and the endogenous peptides should be present in the sample at similar concentrations to make the peak ratio calculation more reliable.
15. Elution from the column starts as soon as the column touches the solution, so make sure that the entire elution buffer is used.
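The standard curve and peak ratio calculation referred to in Notes 13 and 14 can be sketched as follows. This is a simplified illustration, assuming that summed product-ion peak areas have already been exported (for example from Skyline), that the response is linear over the range used, and that the units (fmol) and all numbers are invented.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def endogenous_amount(light_area, heavy_area, heavy_spiked_fmol):
    """Endogenous (light) peptide amount from the light/heavy peak area ratio."""
    if heavy_area <= 0:
        raise ValueError("heavy peptide was not detected")
    return light_area / heavy_area * heavy_spiked_fmol


# Invented standard curve: heavy peptide spiked into a control sample.
spiked_fmol = [1.0, 2.0, 4.0, 8.0]
heavy_areas = [10.0, 20.0, 40.0, 80.0]
slope, intercept = fit_line(spiked_fmol, heavy_areas)
print(f"response = {slope:.2f} * fmol + {intercept:.2f}")

# Invented sample measurement with 50 fmol heavy peptide spiked in.
amount = endogenous_amount(light_area=2.0e6, heavy_area=4.0e6, heavy_spiked_fmol=50.0)
print(f"endogenous peptide: {amount:.1f} fmol")
```

A light/heavy ratio far from 1 suggests adjusting the spiked amount, in line with Note 14.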

Acknowledgements

A.M. is a recipient of a post-doctoral fellowship from the CRCHUS, and F.M.B. is a recipient of a FRQS Senior scholarship. Funding was provided from the Canadian Institutes for Health Research, grant number #398925 to F.M.B. F.M.B. is a member of the FRQS-funded "Centre de Recherche du CHUS."

References

1. Harrison PM, Zheng D, Zhang Z et al (2005) Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res 33:2374–2383
2. Frankish A, Diekhans M, Ferreira AM et al (2019) GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 47:D766–D773
3. Vanin EF (1985) Processed pseudogenes: characteristics and evolution. Annu Rev Genet 19:253–272
4. Mighell AJ, Smith NR, Robinson PA et al (2000) Vertebrate pseudogenes. FEBS Lett 468:109–114
5. Zhang ZD, Frankish A, Hunt T et al (2010) Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates. Genome Biol 11:R26
6. Xu J, Zhang J (2016) Are human translated pseudogenes functional? Mol Biol Evol 33:755–760
7. Sisu C, Pei B, Leng J et al (2014) Comparative analysis of pseudogenes across three phyla. Proc Natl Acad Sci U S A 111:13361–13366
8. Zhang Z, Carriero N, Zheng D et al (2006) PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 22:1437–1439
9. Edgar R, Domrachev M, Lash AE (2002) Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30:207–210
10. Amid C, Alako BTF, Balavenkataraman Kadhirvelu V et al (2020) The European Nucleotide Archive in 2019. Nucleic Acids Res 48:D70–D76
11. Papatheodorou I, Fonseca NA, Keays M et al (2018) Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res 46:D246–D251
12. Michel AM, Fox G, Kiran AM et al (2014) GWIPS-viz: development of a ribo-seq genome browser. Nucleic Acids Res 42:D859–D864
13. Gasteiger E, Gattiker A, Hoogland C et al (2003) ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res 31:3784–3788
14. Brunet MA, Brunelle M, Lucier JF et al (2019) OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes. Nucleic Acids Res 47:D403–D410
15. Huttlin EL, Ting L, Bruckner RJ et al (2015) The BioPlex network: a systematic exploration of the human interactome. Cell 162:425–440
16. MacLean B, Tomazela DM, Shulman N et al (2010) Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26:966–968
17. Oki S, Ohta T, Shioi G et al (2018) ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep 19

Chapter 17

Proteomic Profiling of the Interplay Between a Bacterial Pathogen and Host Uncovers Novel Anti-Virulence Strategies

Arjun Sukumaran and Jennifer Geddes-McAlister

Abstract

Bottom-up proteomics enables a systems-level analysis of proteins involved in a particular sample set. In this protocol, we describe the workflow to prepare Klebsiella pneumoniae and macrophage cells for co-culture, how to extract and prepare samples for analysis by high-resolution mass spectrometry, and lastly, how to analyze the output data files. This workflow allows for the identification of proteins involved in pathogenesis from both the bacterial and host perspectives.

Key words Quantitative proteomics, Mass spectrometry, Bacterial pathogen, Macrophage

1 Introduction

The progression of microbial infection consists of bypassing physical barriers and overcoming interactions with the host immune system to promote bacterial survival. The host defense response can be broadly separated into innate and adaptive immunity, with cross-talk commonly occurring between these arms to protect the host and clear the invading microbe [1]. Foreign substances, including pathogens, are initially met by components of the innate immune system, which comprises physical barriers (e.g., epithelial cells), phagocytes (e.g., macrophages), and circulating serum proteins (e.g., complement) [1]. The response of these components is tailored to the invading organism and the site of infection.

Klebsiella pneumoniae is an opportunistic bacterial pathogen, commonly associated with nosocomial infections, but it can also readily cause community-acquired infections. K. pneumoniae colonizes mucosal surfaces, such as the upper respiratory or gastrointestinal tracts, following exposure of such surfaces to bacterial cells or biofilms [2–4]. An infection will primarily lead to pneumonia or contamination of the urinary tract, but can also progress to liver abscesses or soft tissue infections [5]. Survival of the bacterial cells within the hostile environment of the host relies on the outcome of many simultaneous interactions between host and pathogen proteins [6, 7]. For example, K. pneumoniae produces an array of proteins (i.e., virulence factors) that aid in the acquisition of nutrients, evasion of host defense molecules, and eventual replication of cells. Similarly, host cells seek to prevent bacterial growth and survival by sequestering metal ions required for bacterial function or by inhibiting bacterial proteins through interference with their active sites. Understanding the interplay of both sides will provide insight into the infection process; this information will aid in formulating antimicrobial strategies that can be used to treat K. pneumoniae infections.

Mass spectrometry-based quantitative proteomics is an analytical chemistry technique that allows for the identification and quantification of proteins in a sample. Following extraction of proteins from the biological samples, proteins are either directly analyzed (top-down proteomics) or digested to peptides prior to analysis (bottom-up or shotgun proteomics). We recently highlighted the role of mass spectrometry-based proteomics to profile communication among the immune system, define interactions between bacterial pathogens and the host, assess the role of post-translational modifications on disease outcome, and explore applications in pathogenesis [6–10]. In this chapter, we offer guidance in utilizing bottom-up proteomics to analyze the co-culture of K. pneumoniae (pathogen) and macrophage (host) to identify shifts in abundance of both host and pathogen proteins. Our application of mass spectrometry-based proteomics to define the dual-perspective interplay of disease provides an opportunity to uncover novel therapeutic strategies.

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_17, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

2 Materials

2.1 Culturing of Klebsiella pneumoniae

1. K. pneumoniae strain to be tested.
2. Lysogeny broth (LB) or Tryptic Soy broth (TSB) Premix.
3. LB agar or Tryptic Soy agar (TSA) Premix.
4. 10 mL test tube.
5. Hemocytometer.
6. 15 mL conical centrifuge tube.
7. 1.5 mL microcentrifuge tube.
8. Phosphate Buffered Saline (PBS) (see Note 1).

Proteomic Profiling of Bacterial Pathogenesis

2.2 Culturing Macrophages

1. Total DMEM with Antibiotics: 500 mL DMEM, high glucose, GlutaMAX™ Supplement, Fetal Bovine Serum (FBS) (10% final concentration), L-glutamine (1% final concentration), penicillin:streptomycin 10 k/10 k (5% final concentration). Filter combined medium through a 0.2 μm complete filter system and store at 4 °C (see Notes 2 and 3).
2. Total DMEM without Antibiotics: 500 mL DMEM, high glucose, GlutaMAX™ Supplement, FBS (10% final concentration), L-glutamine (1% final concentration). Filter combined medium through a 0.2 μm complete filter system and store at 4 °C (see Note 2).
3. Serological pipettes (10, 25, and 50 mL).
4. Bel-Art™ HiFlow Vacuum Aspirator Collection System.
5. Cell Scrapers (20 mm blade width).
6. 100 × 15 mm dishes.
7. 6-well cell culture plates.
8. Hemocytometer or Countess™ II Automated Cell Counter.

2.3 Cellular Proteome Analysis

All buffers and reagents are made with Milli-Q ddH2O.

1. 100 mM Tris-HCl (pH 8.5).
2. Protease inhibitor cocktail tablet (see Note 4).
3. Probe sonicator.
4. 4% Sodium dodecyl sulfate (SDS).
5. 1 M Dithiothreitol (DTT) (see Note 5).
6. 0.55 M Iodoacetamide (IAA) (see Note 5).
7. Thermal Shaker.
8. 100% Acetone stored at −20 °C.
9. 80% Acetone stored at −20 °C.
10. 8 M Urea.
11. 40 mM HEPES.
12. Water bath sonicator.
13. 300 mM Ammonium bicarbonate (ABC).
14. Trypsin/Lys-C protease mix, mass spectrometry grade (0.5 μg/μL; 100:1 protein:enzyme).
15. 100% Acetic acid.
16. 100% Acetonitrile, mass spectrometry grade.
17. 100% Trifluoroacetic acid.
18. Stopping solution: 20% acetonitrile, 6% trifluoroacetic acid.
19. Buffer A: 2% (v/v) acetonitrile, 0.1% (v/v) trifluoroacetic acid, 0.5% (v/v) acetic acid [11].

3 Methods

Steps are to be performed at room temperature unless stated otherwise.

3.1 Culturing K. pneumoniae Cells

K. pneumoniae is a Biosafety Level 2 organism and the necessary precautions should be taken prior to culture work. K. pneumoniae 52.145 can be maintained on a nutrient-rich plate (i.e., LB agar or TSA). The following protocol uses LB media throughout but can be similarly performed using TSB.

1. Isolate single colonies by streaking glycerol stock onto an LB agar plate.
2. Incubate overnight in a static incubator at 37 °C.
3. Use a single colony to inoculate 5 mL of LB in a 10 mL test tube. Set in quadruplicate.
4. Incubate overnight at 37 °C with 200 rpm shaking.
5. Use the overnight culture to inoculate 5 mL of LB in a 10 mL test tube at a dilution of 1:100. Set four cultures for infection co-culture. Set an additional four cultures for in vitro proteome.
6. Incubate for 3 h at 37 °C with 200 rpm shaking.
7. Use a hemocytometer to measure the cell count of the four cultures designated for co-culture.
8. Collect 5 × 10⁷ cells per replicate in 1.5 mL microcentrifuge tubes.
9. Harvest cells by centrifuging at 3500 × g for 10 min at room temperature.
10. Remove and discard supernatant.
11. Wash cells with 1 mL of sterile PBS.
12. Centrifuge at 3500 × g for 10 min at room temperature.
13. Discard supernatant.
14. Repeat steps 11–13.
15. Store cells at room temperature until ready to use (max. 30 min).
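Steps 7 and 8 involve converting a chamber count into a culture volume. The sketch below shows the arithmetic under one stated assumption: counts are taken per 1 mm × 1 mm large square of a 0.1 mm deep chamber, so each square holds 0.1 μL and the conversion factor is 10⁴ per mL. Other chamber geometries need a different factor, and the counts and dilution shown are invented.

```python
def cells_per_ml(counts_per_square, dilution_factor=1.0, squares_per_ml=1e4):
    """Cell concentration from counts per large square of a counting chamber.

    squares_per_ml=1e4 assumes each counted square holds 0.1 uL.
    """
    mean_count = sum(counts_per_square) / len(counts_per_square)
    return mean_count * dilution_factor * squares_per_ml


def volume_for_cells(target_cells, concentration_per_ml):
    """Culture volume (mL) that contains the requested number of cells."""
    return target_cells / concentration_per_ml


# Invented counts from four squares of a culture diluted 1:100 before counting.
concentration = cells_per_ml([48, 52, 50, 50], dilution_factor=100)
volume_ml = volume_for_cells(5e7, concentration)
print(f"{concentration:.2e} cells/mL, collect {volume_ml:.2f} mL per replicate")
```

If the required volume exceeds the 1.5 mL tube, the cells can be pelleted in two rounds in the same tube or the culture concentrated first.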

3.2 Culturing of Macrophages

3.2.1 Seeding Macrophages

Total DMEM media should be warmed to 37 °C for 30 min prior to the experiment. When working with macrophages, pipette gently.

1. Thaw a cryopreserved vial of macrophages by resuspending in 10 mL of total DMEM with antibiotics.
2. Transfer cells and media to a 15 mL conical centrifuge tube.
3. Harvest cells by centrifuging at 400 × g for 5 min.
4. Discard supernatant with an aspirator or a serological pipette.
5. Resuspend cells in total DMEM with antibiotics.
6. Pipette cells onto a 100 × 15 mm dish.
7. Incubate dish in a humidified static incubator at 37 °C with 5% CO2.

3.2.2 Passaging Macrophages for Coculture

1. Remove cell culture media from the dish.
2. Add 5 mL of room temperature PBS to the dish.
3. Tilt the dish to rinse cells gently with PBS. Remove PBS.
4. Add 1 mL of cold PBS. Release cells using a cell scraper.
5. Add 9 mL of total DMEM with antibiotics.
6. Count cells using a hemocytometer or an automated counter.
7. Dilute cells using total DMEM with antibiotics to 0.6 × 10⁶ cells per mL.
8. Add 2 mL of cells to a well of a 6-well culture plate. Fill four wells in one plate for co-culture. Similarly, fill a second plate for non-infected control.
9. Incubate plates in a humidified static incubator at 37 °C with 5% CO2. Allow at least 4 h for cells to adhere onto the surface of the plate prior to co-culture.
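The dilution in step 7 and the plating in step 8 can be planned together. The sketch below is a minimal illustration; the measured concentration and the 10% pipetting excess are assumptions, while the target of 0.6 × 10⁶ cells per mL, the 2 mL per well, and the eight wells (four co-culture, four control) come from the steps above.

```python
def dilution_volumes(stock_per_ml, target_per_ml, final_volume_ml):
    """Volumes (mL) of cell suspension and medium giving the target concentration."""
    if stock_per_ml < target_per_ml:
        raise ValueError("suspension is already below the target concentration")
    cells_ml = target_per_ml * final_volume_ml / stock_per_ml
    return cells_ml, final_volume_ml - cells_ml


wells = 8                 # four co-culture wells plus four non-infected control wells
volume_per_well_ml = 2.0
excess = 1.10             # assumed 10% extra to cover pipetting losses

final_volume = wells * volume_per_well_ml * excess
cells_ml, medium_ml = dilution_volumes(
    stock_per_ml=2.4e6,   # hypothetical count from step 6
    target_per_ml=0.6e6,
    final_volume_ml=final_volume,
)
print(f"Mix {cells_ml:.1f} mL cells with {medium_ml:.1f} mL medium ({final_volume:.1f} mL total)")
```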

3.3 Co-culture of Macrophages with K. pneumoniae

1. Remove cell culture media from each well in the plate.
2. Add 1 mL of room temperature PBS to each well.
3. Tilt the plate to rinse cells with PBS. Remove PBS.
4. Repeat steps 2 and 3.
5. Resuspend harvested K. pneumoniae cells (5 × 10⁷) from Subheading 3.1 with 2 mL total DMEM without antibiotics.
6. Add resuspended cells from a single biological replicate to a single well designated for co-culture. Repeat with the other three wells (see Note 6).
7. For non-infected samples, perform steps 1–4. Add 2 mL of total DMEM without antibiotics or bacteria.
8. Incubate plates in a humidified static incubator at 37 °C with 5% CO2 for 90 min.

3.4 Collection of Cells

Procedure applies to both co-cultured and non-infected macrophage samples.

1. Remove cell culture media from each well on the plate.
2. Add 1 mL of room temperature PBS to each well.
3. Tilt the plate to rinse cells with PBS. Remove PBS.
4. Repeat steps 2 and 3.
5. Add 1 mL of cold PBS to each well.
6. Harvest cells by using a cell scraper. Transfer cells to a 15 mL conical centrifuge tube.
7. Centrifuge cells at 400 × g for 5 min at 4 °C.
8. Gently remove and discard supernatant.
9. Cells can be flash frozen and stored for later processing, or the user can proceed directly to the next section.

3.5 Proteome Extraction

1. Resuspend harvested cells with 300 μL of cold 100 mM Tris-HCl (pH 8.5) containing a protease inhibitor cocktail (see Note 4).
2. Lyse cells by using a probe sonicator programmed to 5 cycles of 30 s on/30 s off at an amplitude of 20% (see Note 7).
3. Briefly centrifuge to collect cell debris at the bottom. Transfer cells to a 2 mL microcentrifuge tube (see Note 8).
4. Add 20% SDS to a final concentration of 2% in each sample.
5. Add 1 M dithiothreitol (DTT) to a final concentration of 10 mM in each sample.
6. Incubate samples at 95 °C for 10 min with 800 rpm shaking in a thermal shaker.
7. Cool samples to room temperature.
8. Add 0.55 M iodoacetamide (IAA) to a final concentration of 55 mM in each sample.
9. Incubate samples at room temperature for 20 min in the dark.
10. Add cold 100% acetone to a final concentration of 80%.
11. Incubate samples at −20 °C overnight (see Note 9).
12. Centrifuge precipitate at 10,000 × g for 10 min at 4 °C.
13. Discard supernatant. Wash pellet with 500 μL of cold 80% acetone.
14. Centrifuge at 10,000 × g for 10 min at 4 °C.
15. Repeat steps 13 and 14.
16. Air dry pellet at room temperature.
17. Add 100 μL of 8 M Urea/40 mM HEPES to each sample (see Note 10).
18. Resolubilize pellet by vortexing or sonicating using a water bath sonicator set to 15 cycles of 30 s on/30 s off (see Note 11).
19. Quantify protein concentration. Examples of assays include BCA protein assay or BSA tryptophan assay.
20. Add 50 mM ammonium bicarbonate (ABC) to dilute urea in each sample to a final concentration of 2 M.

Proteomic Profiling of Bacterial Pathogenesis


21. Aliquot 100 μg from each sample into new microcentrifuge tubes. Remaining sample can be flash frozen and kept at −20 °C for short-term (e.g., 2 weeks) or at −80 °C for longer-term (e.g., >1 month) storage.
22. Add 2:50 (v/w) enzyme-to-protein ratio of trypsin/Lys-C protease mixture.
23. Incubate samples at room temperature overnight.
24. The next day, add stopping solution at a dilution of 1:10 to quench digestion and follow previously reported protocol [11] to prepare dried peptides.

3.6 Mass Spectrometry

1. Resuspend dried peptides in 10 μL Buffer A.
2. Measure concentration of sample.
3. Inject ~1.5–3 μg of peptides onto the high performance liquid chromatography column, depending on the instrument (see Note 12).
4. Gradient percentage and length are dependent on the user's experiment, instrument, and preference (see Note 13).
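The reagent additions in Subheading 3.5 (SDS, DTT, IAA, acetone) and the urea dilution with ABC are all stated as final concentrations. A minimal sketch of the underlying arithmetic follows. It assumes the additions are made sequentially, that each target concentration refers to the mixture at the moment of addition, and that volumes are additive. The function names are illustrative and not part of the protocol.

```python
def stock_volume_to_add(sample_volume_ul, stock_conc, final_conc):
    """Volume of stock (in uL) to add so the mixture reaches final_conc.

    Solves C_stock * V_add = C_final * (V_sample + V_add) for V_add.
    stock_conc and final_conc must share the same unit (%, mM, M).
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final_conc must be positive and below stock_conc")
    return sample_volume_ul * final_conc / (stock_conc - final_conc)


def diluent_volume_to_add(sample_volume_ul, start_conc, final_conc):
    """Volume of solute-free diluent (in uL) that lowers start_conc to final_conc."""
    if not 0 < final_conc <= start_conc:
        raise ValueError("final_conc must be positive and not above start_conc")
    return sample_volume_ul * (start_conc - final_conc) / final_conc


# Sequential additions starting from the 300 uL resuspension volume (step 1).
volume_ul = 300.0
sds_ul = stock_volume_to_add(volume_ul, stock_conc=20.0, final_conc=2.0)    # % SDS
volume_ul += sds_ul
dtt_ul = stock_volume_to_add(volume_ul, stock_conc=1000.0, final_conc=10.0)  # mM DTT
volume_ul += dtt_ul
iaa_ul = stock_volume_to_add(volume_ul, stock_conc=550.0, final_conc=55.0)   # mM IAA
volume_ul += iaa_ul
acetone_ul = stock_volume_to_add(volume_ul, stock_conc=100.0, final_conc=80.0)  # % acetone

# Urea dilution: 100 uL of 8 M urea brought to 2 M with 50 mM ABC (step 20).
abc_ul = diluent_volume_to_add(100.0, start_conc=8.0, final_conc=2.0)

for name, value in [("20% SDS", sds_ul), ("1 M DTT", dtt_ul), ("0.55 M IAA", iaa_ul),
                    ("100% acetone", acetone_ul), ("50 mM ABC", abc_ul)]:
    print(f"{name}: {value:.1f} uL")
```

Bringing a sample to 80% acetone requires four sample volumes of acetone, which is why larger lysate volumes may need the 15 mL tube mentioned in Note 8.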

3.7 Data Analysis

1. Load data files output from the mass spectrometer into data processing software.
2. Set general parameters for analysis, depending on user preference. For the representative data, we employed label-free quantification; required a minimum of two peptides for protein identification and a minimum peptide length of seven amino acids; allowed up to two missed cleavages with trypsin digestion; set carbamidomethylation of cysteine as a fixed modification and N-acetylation of proteins and oxidation of methionine as variable modifications; and filtered peptide spectral matches using a target-decoy approach at a false discovery rate of 1%. "Match between runs" was enabled with a match time window of 0.7 min and an alignment time window of 20 min. Spectra were searched using the Andromeda search engine against FASTA files obtained from the UniProt database for K. pneumoniae and Mus musculus.
3. Upload the appropriate output file into data analysis software.
4. Filter rows to remove contaminants, reverse hits, etc., label samples, and filter for replicates. Complete data processing as required by the user. Representative data are provided below (Figs. 1 and 2).
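The row filtering in step 4 can be sketched in a few lines of Python. The protocol does not name the software, so the column names below are an assumption: they follow the MaxQuant-style protein groups table ("Reverse", "Potential contaminant", "Only identified by site", "LFQ intensity ..."). The threshold of two valid values in at least one condition, the sample names, and all toy values are illustrative only.

```python
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")


def is_flagged(row):
    """True if the row is marked '+' in any of the flag columns."""
    return any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS)


def passes_replicate_filter(row, groups, min_valid):
    """True if at least one condition has min_valid non-zero intensities."""
    for columns in groups.values():
        valid = sum(1 for col in columns if float(row.get(col) or 0) > 0)
        if valid >= min_valid:
            return True
    return False


def filter_protein_groups(rows, groups, min_valid=2):
    """Drop flagged rows, then rows quantified in too few replicates."""
    return [row for row in rows
            if not is_flagged(row) and passes_replicate_filter(row, groups, min_valid)]


# Hypothetical sample layout: three replicates per condition.
GROUPS = {
    "infected": [f"LFQ intensity inf_{i}" for i in (1, 2, 3)],
    "uninfected": [f"LFQ intensity ctrl_{i}" for i in (1, 2, 3)],
}
ALL_COLUMNS = GROUPS["infected"] + GROUPS["uninfected"]


def toy_row(protein_id, intensities, **flags):
    """Build one invented table row; intensities follow ALL_COLUMNS order."""
    row = {"Protein IDs": protein_id, **dict(zip(ALL_COLUMNS, intensities))}
    row.update(flags)
    return row


rows = [
    toy_row("TOY_A", ["1.2e7", "9.8e6", "0", "0", "0", "0"]),        # kept
    toy_row("REV__TOY_B", ["5e6"] * 6, Reverse="+"),                   # reverse hit
    toy_row("CON__TOY_C", ["5e6"] * 6, **{"Potential contaminant": "+"}),
    toy_row("TOY_D", ["5e6", "0", "0", "4e6", "0", "0"]),            # too few values
]

kept = filter_protein_groups(rows, GROUPS)
print([row["Protein IDs"] for row in kept])
```

In practice the rows would be read from the tab-delimited output file (for example with `csv.DictReader` and `delimiter="\t"`) rather than built by hand.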


Fig. 1 Workflow highlighting key steps in proteome extraction. Sample (i.e., K. pneumoniae, uninfected macrophage, or co-cultured samples) is mechanically disrupted to extract proteins followed by digestion into peptides. Purified sample is loaded onto a mass spectrometer followed by bioinformatic analysis to identify proteins and analyze data. Figure is generated in Biorender

Fig. 2 (a) Principal component analysis. Clustering of biological replicates based on condition (Component 1, 66.1%) and replicate reproducibility (Component 2, 10.5%). (b) Venn diagram indicating the species composition of the proteins identified in the infection data. The dataset identified 5009 proteins, of which, 4194 are macrophage proteins and 815 are K. pneumoniae

4 Notes

1. Store aliquot of sterile PBS at 4 °C to use when separating macrophage cells from plate/dish.
2. This method can be used for immortalized or primary cells. Optimization for cell types should be performed prior to co-culturing with bacterial cells.


3. L-glutamine is prone to precipitate when frozen. Ensure while thawing that the solution is inverted to promote dissolution of any particulates.
4. Dissolve one PIC tablet in 10 mL of 100 mM Tris-HCl (pH 8.5). Keep cold at 4 °C.
5. Prepare 1 M DTT by dissolving 1.54 g in 10 mL of water. Prepare 0.55 M IAA by dissolving 1.02 g in 10 mL ethanol. Prepare aliquots of DTT and IAA for single use. Any unused DTT or IAA should be discarded and not frozen.
6. At this stage, the plate can be centrifuged at 200 × g for 5 min to synchronize infection.
7. In certain cases, successful lysis of cells can be identified by the visible change of the samples from cloudy to clear.
8. If volume exceeds 400 μL, the following steps can be performed in a 15 mL conical centrifuge tube.
9. Samples can be acetone precipitated for up to 2 weeks at −20 °C. Increased precipitation time has not correlated with increased protein identification.
10. Depending on total protein content, more than 100 μL of 8 M Urea/40 mM HEPES may be required. Add in 100 μL increments until the sample is completely resolubilized, and adjust the amount of 50 mM ABC accordingly so that urea is diluted to 2 M.
11. Ensure that samples are kept cold at all times. Keep samples on ice when not vortexing. Make sure the water in the water bath sonicator is cooled to 4 °C.
12. The amount of sample injected into the instrument depends on the reverse-phase column and instrument sensitivity. Parameters should be optimized for each mass spectrometer.
13. For high-resolution mass spectrometry systems (e.g., Thermo Scientific Orbitrap Fusion™ Lumos™ Tribrid™ or Orbitrap Exploris™ or Bruker timsTOF Pro), we recommend a 2- to 3-h gradient for cellular proteome or co-cultured samples.

References

1. Warrington R, Watson W, Kim HL, Antonetti F (2011) An introduction to immunology and immunopathology. Allergy, Asthma Clin Immunol 7:S1. https://doi.org/10.1186/1710-1492-7-S1-S1
2. Jacobsen SM, Stickler DJ, Mobley HLT, Shirtliff ME (2008) Complicated catheter-associated urinary tract infections due to Escherichia coli and Proteus mirabilis. Clin Microbiol Rev 21:26–59. https://doi.org/10.1128/CMR.00019-07

3. Papakonstantinou I, Angelopoulos E, Baraboutis I et al (2015) Risk factors for tracheobronchial acquisition of resistant gram-negative bacterial pathogens in mechanically ventilated ICU patients. J Chemother 27:283–289. https://doi.org/10.1179/1973947814Y.0000000199
4. Lau HY, Huffnagle GB, Moore TA (2008) Host and microbiota factors that control Klebsiella pneumoniae mucosal colonization in mice. Microbes Infect 10:1283–1290. https://doi.org/10.1016/j.micinf.2008.07.040
5. Paczosa MK, Mecsas J (2016) Klebsiella pneumoniae: going on the offense with a strong defense. Microbiol Mol Biol Rev 80:629–661. https://doi.org/10.1128/MMBR.00078-15
6. Sukumaran A, Woroszchuk E, Ross T, Geddes-McAlister J (2020) Proteomics of host-bacterial interactions: new insights from dual perspectives. Can J Microbiol:1–43
7. Sukumaran A, Coish JM, Yeung J et al (2019) Decoding communication patterns of the innate immune system by quantitative proteomics. J Leukoc Biol 106:1221–1232. https://doi.org/10.1002/JLB.2RI0919-302R
8. Ball B, Bermas A, Carruthers-Lay D, Geddes-McAlister J (2019) Mass spectrometry-based proteomics of fungal pathogenesis, host-fungal interactions, and antifungal development. J Fungi (Basel, Switzerland) 5:52. https://doi.org/10.3390/jof5020052
9. Ball B, Langille M, Geddes-McAlister J (2020) Fun(gi)omics: advanced and diverse technologies to explore emerging fungal pathogens and define mechanisms of antifungal resistance. MBio 11:e01020. https://doi.org/10.1128/mBio.01020-20
10. Retanal C, Ball B, Geddes-McAlister J (2021) Post-translational modifications drive success and failure of fungal-host interactions. J Fungi (Basel, Switzerland) 7:124. https://doi.org/10.3390/jof7020124
11. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using stage tips. Nat Protoc 2:1896–1906. https://doi.org/10.1038/nprot.2007.261

Chapter 18

Affinity Enrichment of Salmonella-Modified Membranes from Murine Macrophages for Proteomic Analyses

Tzu-Chiao Chao, Samina Thapa, and Nicole Hansmeier

Abstract

Dissecting host–pathogen interactions requires the ability to specifically enrich distinct proteins along with their co-assembled constituents or complexes. Affinity technologies leverage the specificity of reagents for desired targets and help to enrich proteins of interest along with specifically associated proteins. Coupled with mass-spectrometry-based proteomics, this technology has become a powerful tool to explore pathogen compartments of diverse facultative and obligate intracellular pathogens. Here, we describe the process from infection of macrophages with Salmonella enterica to the affinity enrichment of Salmonella-modified membranes from murine macrophages.

Key words Affinity enrichment, Pathogen compartment, Host–pathogen interaction, Salmonella enterica, Macrophages, Immunoprecipitation

1 Introduction

Diverse bacterial pathogens adopt an intracellular lifestyle to escape the host immune system. Host targets include immune cells such as macrophages. The foodborne pathogen Salmonella enterica subsp. enterica serovar Typhimurium (STM) is one of these pathogens. After internalization by the host, Salmonella enterica transforms the phagosome into the so-called Salmonella-containing vacuole (SCV) to avoid the bactericidal phagocytic pathway (Fig. 1) [1–3]. The SCVs mature by continuous interaction with the endosomal and recycling pathway, accompanied by a spatial shift close to the microtubule-organizing center, into an extensive tubular network [4, 5]. This replication-permissive pathogen-containing compartment (PCC) is formed due to the actions of virulence proteins, so-called effector proteins, deployed into the host cytoplasm or integrated as part of the PCC via two type III secretion systems (T3SS) encoded on Salmonella pathogenicity islands 1 (SPI1) and 2 (SPI2) [6]. We summarize the PCC membranes modified by the

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_18, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


Tzu-Chiao Chao et al.

Fig. 1 Biogenesis of the pathogen-containing compartment of Salmonella enterica. Abbreviation: ER endoplasmic reticulum; ERGIC ER-Golgi intermediate compartment; SCV Salmonella-containing vacuole; SMM Salmonella-modified membranes; PCC pathogen-containing compartment

intracellular activities of Salmonella as Salmonella-modified membranes (SMM). The analysis of the proteome composition of these membranous compartments via traditional approaches has been challenging, as host proteins not involved in the infection process vastly outnumber the proteins involved in PCC biogenesis. Moreover, the fragile PCC structures are often not amenable to classic organelle enrichment techniques such as gradient centrifugation. Here we describe a protocol that we used to analyze the SMM of Salmonella enterica-infected murine RAW264.7 macrophages (Fig. 2) [7]. We coupled subcellular fractionation, to reduce cytoplasmic host proteins, with affinity enrichment using the prominent PCC membrane-integral effector protein SseF as bait. Since specific antibodies for effector proteins are often missing, we used a genetically engineered C-terminal M45-tagged SseF fusion protein encoded on a low copy vector with its native promoter (p3711). To avoid effector overproduction and alteration of the infection process, the vector was used in an sseF deletion mutant. As a control to filter out nonspecific protein enrichments, we included the Salmonella enterica ΔssaV mutant, which is unable to form the mature PCC. PCC components were subsequently identified and

Affinity-Enrichment of Salmonella-Modified Membranes


Fig. 2 Workflow using affinity enrichment to resolve the proteins of the Salmonella-modified membranes

quantified via mass spectrometry and successfully validated by live-cell imaging during Salmonella enterica infection of macrophages. Similar procedures have been successfully employed for the analysis of the PCC for Salmonella enterica in epithelial cells [8, 9] and for Legionella pneumophila [10–13].

2 Materials

All buffers and solutions are prepared using ultrapure water (18 MΩ-cm at 25 °C) and HPLC-grade material. For cell cultivation, cell culture-grade material is used. Most media, buffers, and solvents can be made in advance and stored at room temperature, unless mentioned otherwise. Take care when preparing solutions for crosslinking or protein extraction, as some contain toxic substances (consult material safety data sheets before use).

2.1 Basic Equipment for Cell Cultivation, Infection, and Harvest

1. Cell culture hood (i.e., biosafety cabinet) and incubators (humid CO2 incubator recommended for eukaryotic cell cultures and horizontal shaker for bacterial cell cultures).


2. Cell counter (i.e., hemocytometer) and spectrophotometer.
3. Centrifuge for cell culture vessels (up to 1000 × g).
4. Inverted microscope.
5. Water bath (37 °C).
6. Refrigerator and freezer (−20 °C).
7. Sterilizer (i.e., autoclave).

2.1.1 Cell Culture: Host

1. Host cell line RAW264.7 (ATCC no. TIB-71) (see Note 1).
2. Host cell medium: Dulbecco’s Modified Eagle’s Medium (DMEM) containing 4.5 g/L glucose and 4 mM stable glutamine (Biochrom) supplemented with 6% inactivated fetal calf serum (iFCS). Store medium at 4 °C (see Note 2).
3. 1× Phosphate Buffered Saline (PBS) (pH 7.3).
4. Antibiotic: 10 or 100 μg/mL gentamicin in water (store at −20 °C).
5. Cell culture treated vessels and sterile plastic material such as cell scraper, pipettes, tips, and tubes.

2.1.2 Cell Culture: Pathogen Salmonella enterica

1. Salmonella enterica serovar Typhimurium strains (Salmonella enterica) NCTC12023 expressing tagged membrane-integral effector protein for affinity enrichment (see Note 3).
2. Miller’s Lysogeny Broth (LB) medium: 10 g/L tryptone, 5 g/L yeast extract, 10 g/L sodium chloride (NaCl) in water (pH 7–7.5).
3. Antibiotic for selection of plasmid (if required).
4. Cell culture vessels and sterile plastic material such as pipettes, tips, and tubes.

2.2 Basic Equipment for Protein Extraction and Affinity Enrichment

1. Centrifuge for 50 mL tubes (up to 500 × g) and refrigerated centrifuge for 1–2 mL tubes (up to 12,000 × g).
2. Homogenizer (e.g., Vortex-2 Genie with Turbomix).
3. Rotary shaker with end-over-end mixing.
4. Thermomixer.
5. Magnetic separation rack.
6. Spectrophotometer.
7. Refrigerator (4 °C) and freezer (−20 °C).
8. Sterilizer (i.e., autoclave).
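Centrifugation steps in this protocol are given as relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. The following minimal sketch converts between the two with the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², where r is the rotor radius in cm. The function names and the 8 cm radius are illustrative assumptions, not values from this chapter; use the radius specified for your rotor.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius_cm positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


ROTOR_RADIUS_CM = 8.0  # hypothetical rotor radius, for illustration only

# Forces used in this chapter: 100, 500, 8000, and 12,000 x g.
for rcf in (100, 500, 8000, 12000):
    rpm = rpm_from_rcf(rcf, ROTOR_RADIUS_CM)
    print(f"{rcf:>6} x g -> {rpm:,.0f} rpm at r = {ROTOR_RADIUS_CM} cm")
```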

2.2.1 Protein Extraction for Affinity Enrichment

1. Phosphate Buffered Saline (PBS): 1.06 mM potassium dihydrogen phosphate (KH2PO4), 2.98 mM disodium hydrogen phosphate (Na2HPO4), 155 mM sodium chloride (NaCl) in water (pH 7.4).


2. Osmo-stabilizing buffer: 250 mM sucrose, 20 mM HEPES, 0.5 mM EGTA in water (pH 7.4).
3. Homogenization buffer: osmo-stabilizing buffer with 1× protease inhibitor cocktail (Serva). Prepare fresh before use.
4. Digestion buffer: osmo-stabilizing buffer supplemented with 1.5 mM magnesium chloride (MgCl2) and 50 μg/mL DNase I (pH 7.4). Prepare fresh before use.
5. Trypan Blue solution.
6. Protein assay kit (i.e., Bradford reagent).
7. PowerBead tubes (0.5 mm glass beads) for mechanical cell lysis of eukaryotic cells.
8. Sterile plastic material such as cell scraper, pipettes, tips, and tubes.

2.2.2 Labeling of Protein G Magnetic Beads with M45 Antibody

1. GE Protein G Mag Sepharose.
2. M45 antibody.
3. PBS.
4. Crosslink buffer A: 0.2 M triethanolamine (TEA) (pH 8.9).
5. Crosslink buffer B: 0.05 M dimethyl-pimelimidate dihydrochloride (DMP) in 0.2 M triethanolamine (TEA) (pH 8.9). Prepare fresh before use.
6. Blocking buffer A: 0.1 mM ethanolamine (ETA) in water (pH 8.9).
7. Washing buffer A: 0.1 M glycine-hydrochloride (HCl) in water (pH 2.9).
8. Blocking buffer B: 1% BSA (w/v) in PBS (pH 7.4). Prepare fresh before use.
9. Sterile plastic material such as pipettes, tips, and tubes.

2.2.3 Affinity Enrichment of Salmonella-Modified Membranes

1. M45-labeled protein G magnetic beads.
2. Resuspension mix: 1.5 mM magnesium chloride (MgCl2), 10 mM potassium chloride (KCl), 0.1% nonidet P-40 (NP-40) in water. Prepare fresh before use.
3. Washing buffer B: 0.1% nonidet P-40 (NP-40) in PBS (pH 7.4). Prepare fresh before use.
4. SDS buffer A: 12.5% glycerol, 4% (w/v) sodium dodecyl sulfate (SDS), 2% beta-mercaptoethanol (C2H6OS) in 50 mM Tris (pH 6.8). Store at −20 °C.
5. SDS buffer B for liquid digest: 4% (w/v) sodium dodecyl sulfate (SDS), 10 mM dithiothreitol (DTT) in 50 mM Tris (pH 6.8). Store at −20 °C.
6. Sterile plastic material such as pipettes, tips, and tubes.

3 Methods

3.1 RAW264.7 Cell Infection

1. Seed RAW264.7 cells 48 h before infection at 37 °C in an atmosphere of 5% CO2 (see Note 4).
2. Inoculate liquid medium (supplement medium with antibiotics, if required) with a single colony of Salmonella enterica and grow in a rotary shaker at 37 °C for 14–16 h (overnight) (see Note 5).
3. Infect RAW264.7 cells with Salmonella enterica overnight culture with a multiplicity of infection (MOI) of 50 for 30 min at 37 °C in an atmosphere of 5% CO2 (see Note 6).
4. Remove infection medium and wash RAW264.7 cells carefully with pre-warmed (37 °C) PBS (see Note 7).
5. Repeat washing step thrice.
6. Add DMEM containing 100 μg/mL gentamicin and incubate cells at 37 °C in an atmosphere of 5% CO2 for 1 h (see Note 8).
7. Replace medium gently with DMEM containing 10 μg/mL gentamicin and incubate cells at 37 °C in an atmosphere of 5% CO2 until infection time-point of interest (see Note 9).
8. Check Salmonella replication via replication assay and inspect each culture vessel by microscopy before proceeding with protein preparation for affinity enrichment (see Note 10).
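The inoculum for step 3 can be estimated from the values reported in Note 6 (an OD600 of 0.2 corresponding to roughly 3 × 10⁸ cfu/mL, and 1.6 × 10⁷ RAW264.7 cells per culture vessel). The sketch below only illustrates the arithmetic. The function names are hypothetical, and the OD-to-cfu conversion should be calibrated for each strain and spectrophotometer.

```python
CFU_PER_ML_AT_OD_0_2 = 3e8       # approximate value reported in Note 6
HOST_CELLS_PER_VESSEL = 1.6e7    # RAW264.7 cells per 75 cm2 vessel (Note 6)


def multiplicity_of_infection(inoculum_ml, cfu_per_ml, host_cells):
    """Bacteria added per host cell."""
    if host_cells <= 0:
        raise ValueError("host_cells must be positive")
    return inoculum_ml * cfu_per_ml / host_cells


def inoculum_volume_ml(target_moi, host_cells, cfu_per_ml):
    """Volume of diluted bacterial suspension needed for a target MOI."""
    if cfu_per_ml <= 0:
        raise ValueError("cfu_per_ml must be positive")
    return target_moi * host_cells / cfu_per_ml


# MOI obtained with the 2.6 mL inoculum described in Note 6.
moi_note6 = multiplicity_of_infection(2.6, CFU_PER_ML_AT_OD_0_2, HOST_CELLS_PER_VESSEL)

# Volume that would give exactly MOI 50 under the same assumptions.
volume_moi50 = inoculum_volume_ml(50, HOST_CELLS_PER_VESSEL, CFU_PER_ML_AT_OD_0_2)

print(f"MOI with 2.6 mL inoculum: {moi_note6:.2f}")
print(f"Inoculum for MOI 50: {volume_moi50:.2f} mL")
```

Under these assumptions, the 2.6 mL inoculum corresponds to an MOI slightly below 50, which is consistent with the nominal MOI stated in step 3 given the approximate cfu estimate.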

3.2 Preparation of Protein Fraction for Affinity Enrichment

1. At infection time-point of interest, remove cell culture media from culture vessels and wash cells carefully with pre-warmed (37 °C) PBS (see Note 7).
2. Repeat washing step thrice.
3. Add pre-warmed (37 °C) osmo-stabilizing buffer and use the cell scraper to gently detach RAW264.7 cells. Afterwards transfer detached cells into a conical tube and centrifuge at 500 × g for 10 min to pellet cells (see Note 11).
4. Remove osmo-stabilizing buffer and resuspend cells in 1 mL of pre-cooled (4 °C) homogenization buffer (see Note 12).
5. Transfer cells into a 2 mL PowerBead tube (pre-packed with 0.5 mm glass beads) and lyse cells mechanically by applying 5 × 1 min strokes using a homogenizer (see Note 13).
6. Centrifuge lysate at 100 × g for 15 min at 4 °C to remove unbroken cells and beads. Then transfer supernatant into a new 1 mL tube.
7. Centrifuge at 8000 × g for 10 min at 4 °C to collect protein fraction for affinity enrichment.
8. Wash pellet twice with pre-cooled (4 °C) homogenization buffer to reduce cytoplasmic proteins (see Note 14).


9. Resuspend pellet in 500 μL digestion buffer and incubate on thermomixer for 30 min at 37 °C (see Note 15).
10. Determine protein concentration via protein assay (i.e., Bradford assay according to manufacturer’s instruction) and proceed directly with affinity enrichment (see Note 16).

3.3 Labeling of Magnetic Beads for Affinity Enrichment

1. Aliquot slurry containing protein G magnetic beads, apply magnet for 30 s and remove any liquid (see Note 17).
2. Wash beads with pre-cooled (4 °C) PBS (see Note 18).
3. Repeat washing step.
4. Add antibody resuspended in 100 μL PBS (pH 7.4) and incubate on a rotary shaker with end-over-end mixing at 4 °C overnight (see Note 19).
5. Remove liquid and wash beads with pre-cooled (4 °C) PBS (see Note 18).
6. Add 0.5 mL of crosslinking buffer A, vortex gently, apply magnet and remove liquid.
7. Repeat previous step.
8. Resuspend magnetic beads in 0.5 mL crosslinking buffer B and incubate for 30 min at 4 °C with end-over-end mixing.
9. Replace liquid with 0.5 mL of crosslinking buffer A.
10. Vortex gently and incubate for 15 min at 4 °C with end-over-end mixing.
11. Replace crosslinking buffer A with 0.5 mL washing buffer to remove non-crosslinked antibody. Apply magnet to remove liquid after 30 s.
12. Wash beads twice with pre-cooled (4 °C) PBS (see Note 18).
13. Add 0.5 mL blocking buffer B and incubate for 30 min at 4 °C with end-over-end mixing (see Note 20). Then apply magnet and remove liquid.
14. Wash beads with pre-cooled (4 °C) PBS (see Note 18).
15. Use M45-labeled magnetic beads directly in affinity enrichment.

3.4 Affinity Enrichment of Salmonella-Modified Membranes (SMM)

1. Resuspend 0.5 mg of the enriched protein fraction in 200 μL resuspension buffer and incubate mixture with M45-labeled magnetic beads on rotary shaker with end-over-end mixing for 12 h at 4 °C (see Note 21).
2. Apply magnet for 30 s to separate beads and remove liquid with unbound proteins.
3. Add 0.5 mL pre-cooled (4 °C) washing buffer B, vortex gently, separate beads and remove liquid.
4. Repeat washing step five times.


5. For elution, add either 20 μL SDS buffer A for protein fractionation via SDS-PAGE or B for filter-aided sample preparation (FASP) [14] and incubate sample on thermomixer at 95 °C for 2 min. After short centrifugation (12,000 × g for 5 min) collect the supernatant in a new tube. Sample can be stored at −80 °C or immediately processed for mass spectrometry (see Note 22).
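Planning the enrichment for several samples is simple bookkeeping. The sketch below uses the per-sample amounts reported in the Notes (0.5 mg protein fraction, 25 μL bead slurry, 40 μg M45 antibody) to estimate what is needed for a set of samples. The function name and the example protein concentration are hypothetical. The calculation assumes the protein concentration comes from the Bradford assay in Subheading 3.2.

```python
def enrichment_plan(protein_conc_mg_per_ml, n_samples,
                    protein_mg=0.5, slurry_ul=25.0, antibody_ug=40.0):
    """Return per-sample protein volume and total bead/antibody needs.

    Defaults are the per-sample amounts given in Notes 17, 19, and 21.
    """
    if protein_conc_mg_per_ml <= 0:
        raise ValueError("protein concentration must be positive")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return {
        # volume of protein fraction that contains protein_mg of protein
        "protein_volume_ul": protein_mg / protein_conc_mg_per_ml * 1000.0,
        "total_slurry_ul": slurry_ul * n_samples,
        "total_antibody_ug": antibody_ug * n_samples,
    }


# Example: two enrichments (e.g., the sseF and ssaV strains) at a
# hypothetical Bradford reading of 2.5 mg/mL.
plan = enrichment_plan(protein_conc_mg_per_ml=2.5, n_samples=2)
for key, value in plan.items():
    print(f"{key}: {value:.1f}")
```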

4 Notes

1. This protocol can also be applied to epithelial cell lines. Host cell cultivation and infection conditions need to be adapted to the particular cell line. Details for the application on HeLa cells can be found in Vorwerk et al. [8].
2. FCS is heat inactivated before it is added to the DMEM. For heat inactivation, FCS is incubated at 56 °C for 30 min and then sterile filtered (0.45 μm filter).
3. As monoclonal antibodies for Salmonella enterica effector proteins are missing, we used a low copy vector for the expression of an epitope-tagged effector protein and deleted its chromosomal-encoded copy to avoid effector overproduction and alteration of the infection process. We selected the SPI2-T3SS effector protein SseF, as it is one of the most abundant membrane-integral components of Salmonella’s pathogen compartment with a long half-life [15] and amenability as fusion partner for various heterologous antigens and tags, such as a tandem M45 [16]. Details about the construction of the low copy vector named p3711 (SseF-2TEV-2M45) carrying a carbenicillin resistance for selection can be found in Vorwerk et al. [8]. For the affinity enrichment, we used two strains:
a. STM ΔsseF p3711, which expresses and translocates the M45-tagged SseF effector protein and forms an elaborate wild-type-like pathogen compartment.
b. STM ΔssaV p3711, which expresses SseF but is unable to translocate the M45-tagged SseF effector protein and is thereby impaired in the formation of a wild-type-like pathogen-containing compartment.
STM ΔssaV p3711 served as the control for the affinity enrichment to determine non-specific enrichment, which is of paramount importance for every affinity purification/enrichment-based analysis.
4. Cell numbers per culture vessel are determined via hemocytometer. To analyze the Salmonella-modified membrane at 12 h post infection (h.p.i.), we seeded six cell culture-treated vessels with a 75 cm² surface area (cell density per culture vessel: 4 × 10⁶ cells in 12 mL DMEM) 48 h before infection. As PCCs are smaller during earlier time-points of infection (e.g.,


4 h or 8 h p.i.), host cell numbers per application have to be adjusted. For instance, to analyze PCCs at 8 h p.i., we used twice as many culture-treated vessels/host cells per application.
5. To ensure phagocytic uptake of Salmonella enterica by macrophages, STM strains were grown until late stationary growth phase (14–16 h). LB medium (3 mL) supplemented with 50 μg/mL carbenicillin was inoculated either with STM ΔsseF p3711 or the control strain STM ΔssaV p3711, both expressing the M45-tagged effector protein SseF.
6. For infection, the STM cultures are diluted with PBS (pH 7.3) to an optical density (600 nm) of 0.2 (~3 × 10⁸ cfu/mL). We added 2.6 mL of this diluted cell suspension into the cell culture medium containing 1.6 × 10⁷ RAW264.7 cells per culture vessel. To synchronize infection, we centrifuged culture vessels at 500 × g for 5 min to mediate contact between pathogen and host cells. This step is followed by 25 min of incubation at 37 °C in an atmosphere of 5% CO2. This time-point is used as time-point zero (0 h.p.i.) of the infection.
7. To wash RAW264.7 cells but avoid disturbing the cell layer, pre-warmed (37 °C) PBS (approximately 5 mL per 75 cm² culture surface area) is added to the side of the vessel. The vessel is gently rocked several times back and forth for efficient rinsing. Afterwards PBS is carefully removed from the culture vessel.
8. To kill extracellular Salmonella enterica, DMEM medium is supplemented with 100 μg/mL gentamicin before it is added to the culture vessel. 12 mL medium is required per 75 cm² culture surface area.
9. To reduce re-infection of RAW264.7 cells (post 0 h.p.i.), DMEM medium is supplemented with 10 μg/mL gentamicin.
10. As the intracellular replication efficiency of Salmonella enterica depends on SPI2 expression as well as internal host factors, it is important to confirm intracellular Salmonella replication per infected culture vessel via an inverted light microscope.
11. To stabilize infected cells and minimize cell lysis, cells are scraped in osmo-stabilizing buffer (8 mL per 75 cm² culture surface) and collected via low g centrifugation.
12. For each experiment, the homogenization buffer needs to be freshly prepared as the added protease inhibitor cocktail (prepared according to manufacturer’s instruction (Serva)) does not remain stable over an extended period of time.
13. Between each stroke, cool tubes for 30 s on ice to avoid local heating during mechanical cell lysis. To monitor cell lysis, remove 5–10 μL of the homogenate, mix it with an equal volume


of trypan blue solution and inspect it under a microscope. More than 99% of the RAW264.7 cells should be lysed. If lysis efficiency is too low, add two additional strokes with the homogenizer and re-evaluate the cell lysis.
14. For our approach, the reduction of cytoplasmic proteins was critical to reduce nonspecific protein enrichment.
15. This step is important, as excessive DNA contamination interferes with successful affinity enrichment. However, DNase treatment can be replaced by short, low-power sonication, which is enough to reduce the viscosity but not powerful enough to damage protein complexes.
16. Avoid freezing the protein extracts.
17. It is essential to work with a thoroughly dispersed slurry before aliquoting. Therefore, vortex thoroughly to break up any aggregates. To separate the supernatant from the beads, place the tube into a magnetic stand for 30 s to collect the magnetic beads. Then remove the liquid by drawing from the opposite side of the vessel and away from the beads to avoid losing beads. Per sample, we used 25 μL of the slurry containing protein G magnetic beads. For each affinity enrichment, magnetic beads were freshly labeled the day before their use in affinity enrichment.
18. Bead washing is an important step to reduce non-specific binding. We used 0.5 mL PBS, vortexed gently for 30 s, applied the magnet for 30 s to collect the beads and then removed all liquid by drawing from the opposite side of the beads.
19. We used 40 μg M45-antibody for labeling and chose incubation at 4 °C overnight as it fit well with our infection and protein preparation workflow. This step can alternatively be performed for 2–4 h at room temperature. In our hands, the use of a smaller amount of M45-antibody resulted in an increase of nonspecifically bound proteins.
20. Incubation with blocking buffer B reduces non-specific enrichment.
21. In our hands, a minimum of 0.5 mg of the enriched protein fraction was required per affinity enrichment.
22. We selected the harshest elution method (SDS buffer), since with milder eluents (6–8 M urea buffer or glycine buffer (pH 2.7)) most of our M45-tagged SseF was still retained on the magnetic beads.


Acknowledgements

I would like to thank my former team at Osnabrück University, in particular Drs. Vorwerk, Deiwick, Walter and Hensel and MSc. Böhles. This work was supported by NSERC grant RGPIN2019-07152.

References

1. Steele-Mortimer O, Meresse S, Gorvel JP, Toh BH, Finlay BB (1999) Biogenesis of Salmonella typhimurium-containing vacuoles in epithelial cells involves interactions with the early endocytic pathway. Cell Microbiol 1(1):33–49
2. Brumell JH, Scidmore MA (2007) Manipulation of rab GTPase function by intracellular bacterial pathogens. Microbiol Mol Biol Rev 71(4):636–652. https://doi.org/10.1128/MMBR.00023-07
3. Figueira R, Holden DW (2012) Functions of the Salmonella pathogenicity island 2 (SPI-2) type III secretion system effectors. Microbiology 158(Pt 5):1147–1161. https://doi.org/10.1099/mic.0.058115-0
4. Schroeder N, Mota LJ, Meresse S (2011) Salmonella-induced tubular networks. Trends Microbiol 19(6):268–277. https://doi.org/10.1016/j.tim.2011.01.006
5. Liss V, Hensel M (2015) Take the tube: remodelling of the endosomal system by intracellular Salmonella enterica. Cell Microbiol 17(5):639–647. https://doi.org/10.1111/cmi.12441
6. Haraga A, Ohlson MB, Miller SI (2008) Salmonellae interplay with host cells. Nat Rev Microbiol 6(1):53–66. https://doi.org/10.1038/nrmicro1788
7. Reuter T, Vorwerk S, Liss V, Chao TC, Hensel M, Hansmeier N (2020) Proteomic analysis of Salmonella-modified membranes reveals adaptations to macrophage hosts. Mol Cell Proteomics 19(5):900–912. https://doi.org/10.1074/mcp.RA119.001841
8. Vorwerk S, Krieger V, Deiwick J, Hensel M, Hansmeier N (2015) Proteomes of host cell membranes modified by intracellular activities of Salmonella enterica. Mol Cell Proteomics 14(1):81–92. https://doi.org/10.1074/mcp.M114.041145
9. Santos JC, Duchateau M, Fredlund J, Weiner A, Mallet A, Schmitt C et al (2015) The COPII complex and lysosomal VAMP7 determine intracellular Salmonella localization and growth. Cell Microbiol 17(12):1699–1720. https://doi.org/10.1111/cmi.12475
10. Hoffmann C, Finsel I, Otto A, Pfaffinger G, Rothmeier E, Hecker M, Becher D, Hilbi H (2014) Functional analysis of novel Rab GTPases identified in the proteome of purified Legionella-containing vacuoles from macrophages. Cell Microbiol 16(7):1034–1052. https://doi.org/10.1111/cmi.12256
11. Schmolders J, Manske C, Otto A, Hoffmann C, Steiner B, Welin A, Becher D, Hilbi H (2017) Comparative proteomics of purified pathogen vacuoles correlates intracellular replication of Legionella pneumophila with the small GTPase Ras-related protein 1 (Rap1). Mol Cell Proteomics 16(4):622–641. https://doi.org/10.1074/mcp.M116.063453
12. Naujoks J, Tabeling C, Dill BD, Hoffmann C, Brown AS, Kunze M et al (2016) IFNs modify the proteome of Legionella-containing vacuoles and restrict infection via IRG1-derived itaconic acid. PLoS Pathog 12(2):e1005408. https://doi.org/10.1371/journal.ppat.1005408
13. Herweg JA, Hansmeier N, Otto A, Geffken AC, Subbarayal P, Prusty BK et al (2015) Purification and proteomics of pathogen-modified vacuoles and membranes. Front Cell Infect Microbiol 5:48. https://doi.org/10.3389/fcimb.2015.00048
14. Wisniewski JR, Zougman A, Nagaraj N, Mann M (2009) Universal sample preparation method for proteome analysis. Nat Methods 6(5):359–362. https://doi.org/10.1038/nmeth.1322
15. Kuhle V, Hensel M (2002) SseF and SseG are translocated effectors of the type III secretion system of Salmonella pathogenicity island 2 that modulate aggregation of endosomal compartments. Cell Microbiol 4(12):813–824
16. Xiong G, Husseiny MI, Song L, Erdreich-Epstein A, Shackleford GM et al (2010) Novel cancer vaccine based on genes of Salmonella pathogenicity island 2. Int J Cancer 126(11):2622–2634. https://doi.org/10.1002/ijc.24957

Chapter 19

Proteomic Profiling of Interplay Between Agrobacterium tumefaciens and Nicotiana benthamiana for Improved Molecular Pharming Outcomes

Nicholas Prudhomme, Jonathan R. Krieger, Michael D. McLean, Doug Cossar, and Jennifer Geddes-McAlister

Abstract

Transient expression of recombinant proteins in plants is being used as a platform for production of therapeutic proteins. Benefits of this system include a reduced cost of drug development, rapid delivery of new products to the market, and an ability to provide safe and efficacious medicines for diseases. Although plant-based production systems offer excellent potential for therapeutic protein production, barriers, such as plant host defense response, exist which negatively impact the yield of product. Here we provide a protocol using tandem mass tags and mass spectrometry-based proteomics to quickly and robustly quantify the change in abundance of host defense proteins produced during the production process. These proteins can then become candidates for genetic manipulation to create host plants with reduced plant defenses capable of producing higher therapeutic protein yields.

Key words Agrobacterium tumefaciens, Nicotiana benthamiana, Molecular pharming, Mass spectrometry-based proteomics, Tandem mass tags, Infectome

1 Introduction

Plant-based production systems for pharmaceutical proteins are becoming an attractive alternative to current mammalian cell or bacterial systems due to scalability, reduced cost, and swift deployment [1]. The most common plant system using transient gene expression is the model plant Nicotiana benthamiana, which is infected with the bacterial pathogen Agrobacterium tumefaciens to incorporate a gene of interest into the plant genome and facilitate target protein production. This process includes the introduction of A. tumefaciens into plant cells through agroinfiltration, where whole plants are submerged in infiltration media containing a bacterial suspension and a vacuum is applied to force bacterial cells through the stomata and into the plant tissue [2]. Infected plants are

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_19, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


incubated for approximately 7 days and proteins of interest are extracted and purified. Host defense responses are an important component affecting target protein yield in plant-based production systems [3]. These responses include post-transcriptional gene silencing (PTGS) [4], elevated hydrogen peroxide production [5], the hypersensitive response [6], and unintended proteolysis [7], which may increase in the presence of environmental stress, including agroinfiltration with A. tumefaciens. Such host defense responses protect the plant from the pathogen by suppressing bacterial virulence and coincidentally reduce target protein yield. While certain responses like PTGS have been successfully countered through co-expression of silencing suppressors, such as the Tombusvirus P19 protein [8], other host defense responses continue to be yield-limiting factors. Proteins responsible for host defense responses would likely begin to increase in abundance post-agroinfiltration. To identify and quantify the change in host defense proteins over time, mass spectrometry-based proteomics can be used [9]. As the proteins of interest are unknown, a discovery-based "bottom-up" mass spectrometry approach is employed. Many strategies are available to quantify and determine changes in protein abundance in such a proteomics approach. Peptide labeling with tandem mass tags (TMT) or alternative strategies, including isobaric tag for relative and absolute quantification (iTRAQ), allows the efficient and reproducible labeling of up to 16 samples, which can be multiplexed together for MS analysis [10, 11]. Multiplexing allows for deep proteome coverage of large sample sets without an accompanying increase in measuring time. These advantages make TMT attractive for this type of study, where analysis requires robust quantitation of peptides across multiple time points and conditions.
Here, infiltrated plant material at a given time point post-infection is homogenized, and proteins are extracted, digested into peptides, labeled, and measured with a mass spectrometer (Fig. 1). The output files can be analyzed with various software platforms, including open-source software such as MaxQuant [12] and Perseus [13], to provide an unbiased profile of how the plant proteome changes throughout the infection process and to identify specific proteins involved in the host defense response. These proteins can then become candidates for genetic manipulation to create host plants with reduced plant defenses capable of producing higher target protein yields. Alternatively, changes to the bacterial proteome can be explored to uncover novel bacterial virulence factors that may promote invasion and survival within the plant, overcoming plant defense responses and leading to higher protein yields. We previously demonstrated the impact of bacterial growth conditions on the proteome and secretome, along with the effect of agroinfiltration on bacterial cellular remodeling


Fig. 1 Mass spectrometry-based proteomics workflow for protein identification and quantification. Proteins are extracted from N. benthamiana plants infiltrated with A. tumefaciens by Bullet Blender® homogenization, and acetone precipitation, followed by trypsin digestion. The subsequent peptides from each sample are labeled with TMTpro™ 16plex tandem mass tags (TMT), combined, purified with Peptide Desalting Spin Columns, and separated by high-performance liquid chromatography (HPLC). The peptides are then ionized by electrospray ionization (ESI) and analyzed by mass spectrometry. Computational proteomics is performed using the publicly available software platforms, MaxQuant and Perseus, for data analysis and visualization

[14, 15]. In this chapter, we expand upon these previous proteomic profiling studies to describe the methodology to detect and quantify changes in protein abundance following agroinfiltration of N. benthamiana.

2 Materials

2.1 Plant Material

Starting plant material is determined by the researcher. This protocol is optimized for processing approx. 30 mg of N. benthamiana leaves following agroinfiltration with A. tumefaciens and up to 7 days of growth. Notably, younger plant material, un-infiltrated plant material, or other plant tissues can be selected for processing; however, the extraction protocol may need further optimization for these tissues, as determined by the researcher.


All reagents are made with mass spectrometry (MS)-grade H2O, unless stated otherwise.

2.2 Protein Extraction

1. 100 mM Tris-HCl pH 8.5.
2. cOmplete™, EDTA-free Protease Inhibitor Cocktail (PIC) tablet.
3. 2 mL Eppendorf LoBind tube.
4. Next Advance SSB14B Stainless Steel Beads.
5. Bullet Blender® Storm.
6. Refrigerated table top centrifuge.
7. 20% Sodium Dodecyl Sulfate (SDS).
8. 1 M Dithiothreitol (DTT) (see Note 1).
9. 0.55 M Iodoacetamide (IAA) (see Note 1).
10. 100% Acetone stored at −20 °C.
11. 80% Acetone stored at −20 °C.
12. 8 M urea/40 mM HEPES.
13. Water bath sonicator.
14. 50 mM Ammonium Bicarbonate (ABC).

2.3 Protein Digestion

1. Trypsin/Lys-C Protease Mix, MS-grade.
2. 0.1% Acetic acid.
3. Stopping solution: 20% acetonitrile (ACN), 6% trifluoroacetic acid (TFA), 74% H2O.

2.4 Peptide Purification

1. 100% ACN.
2. Buffer A: 2% ACN, 0.1% TFA, 0.5% Acetic acid, 97.4% H2O.
3. Buffer B: 80% ACN, 0.5% Acetic acid, 19.5% H2O.
4. PCR strip tubes.
5. Vacuum centrifuge.

2.5 TMT Labeling

1. 100 mM HEPES pH 8.5 or 100 mM triethylammonium bicarbonate (TEAB).
2. TMTpro™ 16plex Isobaric Label Kit.
3. Pierce™ Peptide Desalting Spin Columns.
4. Nanodrop spectrophotometer.

3 Methods

3.1 Preparing Plant Material

1. Weigh 30 mg frozen N. benthamiana leaf in a 2 mL Eppendorf LoBind tube (see Note 2).
2. Add 0.8 g Next Advance SSB14B Stainless Steel Beads (0.9–2 mm diameter) (see Note 3).
3. Add 325 μL of 100 mM Tris-HCl with PIC tablet (see Note 4).
4. Homogenize tissue using the Bullet Blender® Storm on power 8 for 2 min (see Note 5).
5. Centrifuge for 30 s at 1000 × g to remove splash from the sides of the tubes.
6. Transfer 300 μL to a new 2 mL LoBind tube (see Note 6).

3.2 Protein Extraction

1. Add 1/10 volume of 20% SDS (30 μL).
2. Add 1/100 volume of 1 M DTT (3.3 μL).
3. Vortex briefly, then place on a heating block at 95 °C and 800 rpm for 10 min.
4. Allow samples to cool to room temperature (see Note 7), then add 1/10 volume of 0.55 M IAA (33.3 μL) (see Note 1), vortex briefly, and incubate in the dark for 20 min.
5. Add four volumes (approximately 1.5 mL) of 100% ice-cold acetone (final concentration 80% acetone) and incubate overnight at −20 °C (see Note 8).
6. Centrifuge at 4 °C and >10,000 × g for 10 min to pellet precipitate.
7. Remove supernatant, wash pellet with 500 μL of 80% ice-cold acetone, and centrifuge at 4 °C and >10,000 × g for 10 min.
8. Repeat steps 6 and 7 for a total of two washes with 80% acetone (see Note 9).
9. Air dry pellet (see Note 10).
10. Add 100 μL 8 M urea/40 mM HEPES or 100 mM TEAB (see Note 11).
11. Sonicate in a water bath at 4 °C for 30 s on/30 s off for 15 cycles or until the pellet is completely dissolved.
12. Add 300 μL 50 mM ABC.
13. Aliquot 120 μL to a new 1.5 mL LoBind tube.
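The reagent volumes in steps 1–5 and the urea dilution in steps 10–12 (see Note 11) follow from simple proportional arithmetic. The sketch below reproduces them under the assumption that each fraction is taken relative to the running sample volume, starting from the 300 μL of homogenate transferred in Subheading 3.1; the function names are illustrative and are not part of the protocol.

```python
def extraction_volumes(start_ul=300.0):
    """Running-volume arithmetic for the SDS, DTT, IAA, and acetone additions."""
    sds = start_ul / 10            # 1/10 volume of 20% SDS
    vol = start_ul + sds
    dtt = vol / 100                # 1/100 volume of 1 M DTT
    vol += dtt
    iaa = vol / 10                 # 1/10 volume of 0.55 M IAA
    vol += iaa
    acetone = 4 * vol              # four volumes of 100% acetone
    return {
        "sds": sds,
        "dtt": dtt,
        "iaa": iaa,
        "acetone": acetone,
        "acetone_fraction": acetone / (acetone + vol),
    }


def abc_for_urea(urea_ul, urea_molar=8.0, target_molar=2.0):
    """Volume of 50 mM ABC needed to dilute the urea buffer to the target molarity."""
    return urea_ul * (urea_molar / target_molar - 1)


if __name__ == "__main__":
    for name, value in extraction_volumes().items():
        print(f"{name}: {value:.2f}")
    print("ABC for 100 uL urea buffer:", abc_for_urea(100.0), "uL")
```

If additional 100 μL increments of urea buffer are needed to dissolve the pellet, calling `abc_for_urea` with the new total gives the matching ABC volume.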

3.3 Protein Digestion

1. Dilute Trypsin/Lys-C Protease Mix to 0.5 μg/μL in 0.1% acetic acid using MS-grade water.
2. Add 5 μL Protease Mix to sample and incubate at room temperature for approx. 12–18 h.
3. Add 1/10 volume of Stopping solution (12.5 μL).
4. Centrifuge at max speed for 10 min to pellet precipitate.
5. Transfer supernatant to a new 1.5 mL LoBind tube and store on ice prior to peptide purification (see Note 12).

3.4 Peptide Purification

1. Prepare STAGE tips with 3 layers of C18 resin (see Note 13).
2. Add 100 μL 100% ACN to STAGE tip and centrifuge at 1000 × g for 1–2 min or until all liquid has passed through the filter.
3. Add 50 μL Buffer B and centrifuge at 1000 × g for 1–2 min or until all liquid has passed through the filter.
4. Add 200 μL Buffer A and centrifuge at 1000 × g for 1–2 min or until all liquid has passed through the filter.
5. Add 50 μg (70 μL) of hydrolyzed plant protein extract and centrifuge at 1000 × g for 3–5 min or until all liquid has passed through the filter.
6. Wash with 200 μL Buffer A and centrifuge at 1000 × g for 3–5 min or until all liquid has passed through the filter.
7. Elute with 50 μL (2 × 25 μL) Buffer B in PCR strip tubes.
8. Dry samples in SpeedVac at 45 °C for 45 min (see Note 14).

3.5 TMT Labeling

1. Resuspend dried peptides in 100 μL 100 mM HEPES pH 8.5 or 100 mM TEAB.
2. Quantify peptides by spectrophotometer (absorbance at 280 nm).
3. Label peptides with TMTpro™ 16plex Isobaric Label Kit, according to manufacturer's instructions (Table 1).
4. Clean up labeled samples with Peptide Desalting Spin Columns, according to manufacturer's instructions.

3.6 Mass Spectrometry

1. Resuspend dried peptides in 10 μL Buffer A.
2. Measure concentration of sample.
3. Inject approximately 1.5–3 μg of peptides onto the high-performance liquid chromatography column for measurement on the high-resolution mass spectrometer, depending on the instrument (see Note 15).
4. Gradient length, ACN percentage, appropriate MS1 and MS2 resolution, and other instrument parameters depend on the experiment and instrument (see Note 16).

3.7 Data Analysis

1. Load the data files output from the mass spectrometer into data processing software. We routinely use MaxQuant, but comparable software packages may be used.

Table 1 Example table for creating 4 MS experiments from 60 samples using the TMTpro™ 16plex Isobaric Label Kit. A 16plex TMT experiment with 60 samples will generate 4 MS experiments. To normalize protein abundances quantified across these experiments, an internal standard or "common" channel is generated by mixing equal parts of all 60 samples together. For this example, at least 200 μg of internal standard is required. It is good practice to create more internal standard than is needed in case one MS experiment needs to be repeated or the TMT experiment is expanded in the future. Each channel contains 50 μg of protein labeled with 500 μg of TMT label.

TMT label      MS experiment 1   MS experiment 2   MS experiment 3   MS experiment 4
TMTpro-126     Common            Common            Common            Common
TMTpro-127N    Sample 4          Sample 37         Sample 36         Sample 54
TMTpro-127C    Sample 21         Sample 35         Sample 19         Sample 28
TMTpro-128N    Sample 18         Sample 58         Sample 38         Sample 55
TMTpro-128C    Sample 27         Sample 12         Sample 48         Sample 49
TMTpro-129N    Sample 11         Sample 43         Sample 46         Sample 33
TMTpro-129C    Sample 16         Sample 30         Sample 25         Sample 26
TMTpro-130N    Sample 24         Sample 22         Sample 51         Sample 52
TMTpro-130C    Sample 3          Sample 41         Sample 13         Sample 7
TMTpro-131N    Sample 6          Sample 57         Sample 23         Sample 42
TMTpro-131C    Sample 17         Sample 47         Sample 32         Sample 9
TMTpro-132N    Sample 34         Sample 50         Sample 1          Sample 59
TMTpro-132C    Sample 5          Sample 8          Sample 2          Sample 10
TMTpro-133N    Sample 56         Sample 40         Sample 45         Sample 29
TMTpro-133C    Sample 15         Sample 44         Sample 20         Sample 31
TMTpro-134N    Sample 39         Sample 60         Sample 14         Sample 53
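A layout like the one in Table 1 can be generated programmatically. The sketch below is one possible approach rather than the authors' procedure: it assumes that samples are assigned to channels at random, that channel 126 always carries the common pool, and that a fixed seed is used so the layout can be regenerated. The function name and its parameters are hypothetical.

```python
import random

CHANNELS = ["126", "127N", "127C", "128N", "128C", "129N", "129C", "130N",
            "130C", "131N", "131C", "132N", "132C", "133N", "133C", "134N"]


def tmt_layout(n_samples=60, protein_ug=50, seed=1):
    """Randomly assign samples to plexes; channel 126 carries the common pool.

    Returns the list of plexes and the amount (ug) of common pool required.
    """
    per_plex = len(CHANNELS) - 1              # 15 sample channels per plex
    n_plex = -(-n_samples // per_plex)        # ceiling division
    samples = list(range(1, n_samples + 1))
    random.Random(seed).shuffle(samples)      # assumed randomization step
    layout = []
    for p in range(n_plex):
        chunk = samples[p * per_plex:(p + 1) * per_plex]
        plex = {"126": "Common"}
        plex.update({ch: f"Sample {s}" for ch, s in zip(CHANNELS[1:], chunk)})
        layout.append(plex)
    return layout, n_plex * protein_ug


if __name__ == "__main__":
    plexes, pool_ug = tmt_layout()
    for i, plex in enumerate(plexes, start=1):
        print(f"MS experiment {i}: {plex}")
    print(f"Minimum common pool: {pool_ug} ug")
```

With 60 samples and 50 μg per channel, the function reports four plexes and a minimum of 200 μg of common pool, matching the figures in the table caption; preparing extra pool remains advisable.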


2. Set general parameters for analysis, depending on user preference and instrumentation: reporter ion MS3-level quantification (if used); isobaric labels selected or loaded for 16plex TMT; a minimum of two peptides required for protein identification; a minimum peptide length of seven amino acids; up to two missed cleavages allowed; trypsin digestion; carbamidomethylation of cysteine as a fixed modification and oxidation of methionine as a variable modification; and peptide spectral matches filtered using a target-decoy approach at a false discovery rate of 1%. Enable match between runs with a match time window of 0.7 min and an alignment time window of 20 min. Search for protein identifications with the Andromeda [16] search engine against FASTA files obtained from the UniProt database for N. benthamiana and A. tumefaciens.
3. Upload the appropriate output file into data analysis software. We routinely use Perseus, but comparable software packages may be used.
4. Filter rows to remove contaminants, reverse hits, etc., label samples, and filter for replicates based on valid values. Complete data processing, as required by the user. Representative data are provided below (Fig. 2).
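For readers who prefer scripting to Perseus, the row filtering in step 4 can be sketched in plain Python. This is an illustration, not the chapter's workflow: it assumes rows read from a MaxQuant protein groups table (for example with `csv.DictReader` and a tab delimiter), in which flagged rows carry a "+" in the "Potential contaminant", "Reverse", or "Only identified by site" columns. Check these column names against your own output file. The intensity column names and the `min_valid` threshold are supplied by the user and are hypothetical here.

```python
import math

# Flag columns as commonly found in MaxQuant protein group tables (verify locally).
FLAG_COLUMNS = ("Potential contaminant", "Reverse", "Only identified by site")


def filter_protein_groups(rows, intensity_cols, min_valid):
    """Drop flagged rows, log2-transform intensities, and keep rows with
    at least `min_valid` non-missing values."""
    kept = []
    for row in rows:
        if any(row.get(col, "") == "+" for col in FLAG_COLUMNS):
            continue                              # contaminant, reverse hit, etc.
        values = {}
        for col in intensity_cols:
            x = float(row.get(col) or 0)
            # Zero, empty, or NaN intensities are treated as missing values.
            values[col] = math.log2(x) if x > 0 else None
        if sum(v is not None for v in values.values()) >= min_valid:
            kept.append({"Protein IDs": row.get("Protein IDs", ""), **values})
    return kept
```

The valid-value threshold plays the same role as the "filter for replicates based on valid values" step; stricter or per-group thresholds may suit a given experimental design better.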

[Figure 2: panel A, Venn diagram (2328 and 286 proteins); panel B, PCA plot of Component 1 (x-axis) versus Component 2 (y-axis)]

Fig. 2 Overview of the proteomics of the N. benthamiana and A. tumefaciens infectome. (a) Venn diagram of total proteins identified: N. benthamiana (2328 proteins; green) vs. A. tumefaciens (286 proteins; yellow). In total, 2614 proteins were identified. (b) Protein-level principal component analysis (PCA) plot of the infectome, showing A. tumefaciens (yellow) and N. benthamiana (green) proteins. Response proteins associated with defense, hydrogen peroxide catabolism, oxidative stress, and proteolysis are shown in black. The plot highlights particular proteins in high abundance in both host and pathogen following agroinfiltration

4 Notes

1. Prepare single-use aliquots of DTT and IAA. Aliquots can be stored at −20 °C prior to the experiment; thaw fully before use. Any unused DTT or IAA should be discarded and not refrozen. IAA is light sensitive. Keep the 0.55 M aliquot in the dark until ready for use and then quickly add to samples. Bench drawers work well as a dark incubation chamber.
2. Ensure that samples are kept cold at all times. Keep samples on ice when not vortexing. Make sure the water in the water bath sonicator is cooled to 4 °C.
3. The bag of Next Advance SSB14B Stainless Steel Beads comes with a small white spoon labeled "0699 0.10G." One scoop of the provided spoon is approximately 0.8 g. Beads do not arrive sterile and should be autoclaved before use. Beads can be washed, autoclaved, and reused.
4. Dissolve one PIC tablet in 10 mL of 100 mM Tris-HCl (pH 8.5).
5. Power settings on the Bullet Blender® Storm go up to 12. If dealing with difficult-to-grind plant material, power and/or time can be increased. Eppendorf LoBind tubes begin to lose structural integrity after power 10 for 4 min. If more power or time is required, stronger tubes can be used for homogenization.
6. When transferring homogenate to a new tube, tilt the tube 45°, wiggle the pipette tip down the side of the tube into the beads, and gently pipette up and down to loosen any material that may have pelleted. Beads will not be sucked up by the pipette but can occasionally block the tip; if this occurs, readjust the tip until the liquid flows freely.
7. Samples can be cooled to room temperature by placing on ice for 2–3 min.
8. Mixing by inversion after adding acetone will cause proteins to immediately begin precipitating. Samples can be left in acetone at −20 °C for up to 2 weeks.
9. When removing the final acetone wash, blot the tube on a clean paper towel after decanting to remove the final drops of acetone.
10. Air drying time can be reduced by placing the tube on a heat block at 32 °C for 5 min; however, use caution not to overdry the pellet, as this will result in a pellet that is difficult to resuspend.
11. When adding urea/HEPES, pipetting up and down several times helps to start dissolving the pellet. More than 100 μL of 8 M urea/40 mM HEPES may be required if protein content is very high. Add additional 8 M urea/40 mM HEPES in 100 μL increments until the sample is completely dissolved. The amount of 50 mM ABC added to samples must be increased accordingly to ensure urea is diluted to 2 M prior to digestion; otherwise, the proteolytic enzyme will not cleave.
12. Samples should be stored on ice during preparation of STAGE tips (max. of 30 min). Alternatively, if samples need to be stored, flash freeze in liquid nitrogen and store at −80 °C for up to 1 day; any longer and peptide degradation may reduce the quality of the sample.
13. Do not pack the STAGE tips too tightly when loading the C18 resin. C18 resin only needs to be tapped down into the tip. If wash steps are taking longer than the recommended time to pass through the filter, then the sample and subsequent wash with Buffer A will take very long to pass through. Instead, discard overpacked STAGE tips, remake, and pack lighter.
14. Dried peptides can be stored at 4 °C for short-term storage or −20 °C for long-term storage.
15. The amount of sample injected into the instrument depends on the reverse-phase column and instrument sensitivity. Parameters should be optimized for each mass spectrometer.
16. For high-resolution mass spectrometry systems (e.g., Thermo Scientific Orbitrap Fusion™ Lumos™ Tribrid™ or Orbitrap Exploris™ or Bruker timsTOF Pro), we recommend a 2 to 3-h gradient for cellular proteome or co-cultured samples.

References

1. Chen Q, Davis KR (2016) The potential of plants as a system for the development and production of human biologics. F1000Research 5:912
2. Mclean MD (2017) Trastuzumab made in plants using vivoXPRESS® platform technology. J Drug Des Res 4:1052
3. Robert S, Goulet MC, D'Aoust MA, Sainsbury F, Michaud D (2015) Leaf proteome rebalancing in Nicotiana benthamiana for upstream enrichment of a transiently expressed recombinant protein. Plant Biotechnol J 13:1169–1179
4. Yu H, Kumar PP (2003) Post-transcriptional gene silencing in plants by RNA. Plant Cell Rep 22:167–174
5. Xu XQ, Pan SQ (2000) An Agrobacterium catalase is a virulence factor involved in tumorigenesis. Mol Microbiol 35:407–414
6. Lee C-W, Efetova M, Engelmann JC, Kramell R, Wasternack C, Ludwig-Müller J, Hedrich R, Deeken R (2009) Agrobacterium tumefaciens promotes tumor induction by modulating pathogen defense in Arabidopsis thaliana. Plant Cell 21:2948–2962
7. Grosse-Holz F, Kelly S, Blaskowski S, Kaschani F, Kaiser M, van der Hoorn RAL (2018) The transcriptome, extracellular proteome and active secretome of agroinfiltrated Nicotiana benthamiana uncover a large, diverse protease repertoire. Plant Biotechnol J 16:1068–1084
8. Garabagi F, Gilbert E, Loos A, Mclean MD, Hall JC (2012) Utility of the P19 suppressor of gene-silencing protein for production of therapeutic antibodies in Nicotiana expression hosts. Plant Biotechnol J 10:1118–1128
9. Geddes-McAlister J, Prudhomme N, Gongora DG, Cossar D, McLean MD (2022) The emerging role of mass spectrometry-based proteomics in molecular pharming practices. Curr Opin Chem Biol 68. https://www.sciencedirect.com/science/article/abs/pii/S1367593122000187?via%3Dihub
10. Thompson A, Schäfer J, Kuhn K, Kienle S, Schwarz J, Schmidt G, Neumann T, Hamon C (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 75:1895–1904
11. Thompson A, Wölmer N, Koncarevic S, Selzer S, Böhm G, Legner H, Schmid P, Kienle S, Penning P, Höhle C, Berfelde A, Martinez-Pinna R, Farztdinov V, Jung S, Kuhn K, Pike I (2019) TMTpro: design, synthesis, and initial evaluation of a proline-based isobaric 16-plex tandem mass tag reagent set. Anal Chem 91(24):15941–15950. https://doi.org/10.1021/acs.analchem.9b04474
12. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372
13. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, Cox J (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731–740
14. Prudhomme N, Pastora R, Muselius B, McLean MD, Cossar D, Geddes-McAlister J (2021) Exposure of Agrobacterium tumefaciens to agroinfiltration medium demonstrates cellular remodelling and may promote enhanced adaptability for molecular pharming. Can J Microbiol 67(1):85–97. https://doi.org/10.1139/cjm-2020-0239. Epub 2020 Jul 28
15. Prudhomme N, Gianetto-Hill C, Pastora R, Cheung WF, Allen-Vercoe E, McLean MD, Cossar D, Geddes-McAlister J (2021) Quantitative proteomic profiling of shake flask versus bioreactor growth reveals distinct responses of Agrobacterium tumefaciens for preparation in molecular pharming. Can J Microbiol 67(1):75–84. https://doi.org/10.1139/cjm-2020-0238. Epub 2020 Aug 26
16. Cox J, Neuhauser N, Michalski A, Scheltema RA, Olsen JV, Mann M (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res 10:1794–1805

Chapter 20

Label-Free Quantitative Proteomic Profiling of Fusarium Head Blight in Wheat

Boyan Liu, Danisha Johal, Mitra Serajazari, and Jennifer Geddes-McAlister

Abstract

Mass spectrometry-based proteomics provides a powerful tool to detect and quantify protein abundance changes in biological systems under different conditions. Improvements in mass spectrometry instrumentation sensitivity and resolution, along with advanced bioinformatics, enable new strategies to study host–pathogen interactions. This protocol uses state-of-the-art MS-based proteomics to assess infection of the worldwide cereal crop Triticum aestivum by the global fungal pathogen Fusarium graminearum, which results in the devastating disease Fusarium head blight (FHB). Here, host infection is mimicked by inoculating F. graminearum onto T. aestivum cultivars (e.g., FHB-resistant and -susceptible) in a growth room under a controlled environment, followed by sample harvesting at different time points (e.g., 24 and 120 h post-inoculation) to assess temporal responses to infection. The collected samples are processed using our in-house pipeline for total protein extraction and quantified via label-free methods by liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS). From this experiment, we define dual perspectives of infection by considering dynamic protein abundance changes in both the pathogen and host simultaneously, allowing us to identify strategies used by the pathogen to evade the host defense responses and those used by the host to protect itself from severe infection.

Key words Fusarium graminearum, Quantitative proteomics analysis, Triticum aestivum, Label-free quantification (LFQ), Mass spectrometry (MS), Host–pathogen interaction

1 Introduction

Fusarium graminearum is a fungal pathogen and the causative agent of the cereal crop disease Fusarium head blight (FHB), which causes millions of dollars in crop losses each year due to fungal contamination and toxin accumulation [1]. The production of secondary metabolites, including mycotoxins such as deoxynivalenol (DON), reduces grain quality and poses a risk to poultry and livestock industries through the consumption of contaminated feed, and it also has severe consequences for human health through agricultural run-off and its presence in food

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_20, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


manufacturing [2]. Current management strategies against FHB include single-dose fungicide treatment at heading, which helps reduce infection rates. However, this significantly increases the cost to growers and provides little to no protection against the accumulation of mycotoxins if infection occurs. Moreover, the emerging rates of resistance towards antifungal agents, including fungicides, as well as recent population changes of FHB pathogens, present compounding issues [3, 4]. Presently, breeding for FHB-resistant cultivars provides the most effective strategy against infection, but no known cultivar of wheat is completely protected from FHB, and mycotoxins may still be present and accumulate despite resistance to infection [5–7]. These challenges require novel techniques and approaches to devise new strategies to combat fungal infections within agricultural settings on a global scale. To combat FHB, the interaction between host and pathogen requires extensive profiling to comprehensively understand how the two biological systems defend against one another, and to uncover new strategies for overcoming the infection. Recent advances in mass spectrometry (MS)-based proteomics enable the study of host–pathogen interactions on a larger scale than before, with in-depth analysis of cellular regulation, extracellular environments, protein–protein interactions, and post-translational modifications regulating protein function and signaling [8]. By using bottom-up proteomics with liquid chromatography with tandem mass spectrometry (LC-MS/MS), this protocol aims to identify time-dependent changes in protein abundance that are associated with F. graminearum-infected T. aestivum samples (i.e., FHB-resistant and -susceptible cultivars) from both host and pathogen perspectives (Fig. 1). Overall, this protocol determines pathogen protein targets for inhibition to aid in the discovery of novel prevention methods against the fungal pathogen.
It also establishes an optimized workflow for observing the interaction between a cereal crop host and a filamentous fungal pathogen, and it can be used for other host–pathogen interactions with agricultural relevance.

2 Materials

Milli-Q water (18 MΩ·cm at 25 °C) and analytical grade reagents are used to prepare all solutions; solutions are prepared fresh unless indicated otherwise. Steps are performed at room temperature unless otherwise specified. Always follow relevant waste disposal regulations.

Fig. 1 Workflow of the label-free quantification method for detecting protein abundance changes during F. graminearum infection of T. aestivum with a bottom-up proteomic approach. Image produced with biorender.com

2.1 Media for Inoculation

1. Potato dextrose agar plates.
2. Wheat straw media: 5 g of ground wheat straw with 125 mL water in a 250 mL Erlenmeyer flask. Autoclave for sterilization.
3. Rotary shaker.
4. Hemocytometer.
5. Desired strain of Fusarium graminearum.

2.2 Inoculation and Harvest

1. Seeds of desired Triticum aestivum cultivars.
2. Pots (one gallon).
3. Growth room with appropriate temperature and humidity.
4. 10 μL micropipette.
5. Plastic bags.
6. Misting bottle.
7. Tags and marker.

2.3 Total Protein Extraction

1. Mortar and pestle (prechilled with liquid nitrogen).
2. Water bath sonicator.
3. Thermal shaker.
4. 15 mL conical tube.
5. 2.0 mL LoBind microcentrifuge tubes.
6. Tris-HCl buffer (pH 8.5): Dissolve 10.90 g Tris in 900 mL water. Adjust pH to 8.5 and top up solution to 1 L. Autoclave for sterilization.
7. Proteinase inhibition buffer: Transfer 10 mL Tris-HCl buffer to a 15 mL Falcon tube and add 1 tablet of protease inhibitor cocktail. Vortex to dissolve.
8. 20% sodium dodecyl sulfate (SDS): Dissolve 200 g SDS in 1 L of water. Store at room temperature.
9. 1 M dithiothreitol (DTT): Dissolve 0.781 g DTT in 5 mL water. Mix properly. Aliquot 100 μL into 1.5 mL sterile microcentrifuge tubes. The aliquoted reagents can be flash frozen with liquid nitrogen and kept at −20 °C. Perform all steps in the fume hood.
10. 0.55 M iodoacetamide (IAA): Dissolve 2.03 g IAA in 20 mL of water. Mix properly. Aliquot 500 μL into sterile 1.5 mL microcentrifuge tubes. The aliquoted reagents can be flash frozen with liquid nitrogen and kept at −20 °C. Perform all steps in the fume hood and protect from light.
11. Acetone (100% and 80%) stored at −20 °C.
12. 8 M urea/40 mM HEPES: Dissolve 0.19 g HEPES and 9.6 g urea in 20 mL water. The aliquoted reagents can be flash frozen with liquid nitrogen and kept at −20 °C.
13. 50 mM ammonium bicarbonate (ABC): Dissolve 0.40 g of NH4HCO3 in 100 mL water. Store at room temperature.
14. Lys-C/trypsin mixture: reconstitute according to manufacturer's instructions.

2.4 Stop-and-Go Extraction Tips (STAGE-Tip) Desalting

1. 10 mL syringe (cut off the needle).
2. PCR strip tubes and caps.
3. 200 μL sterilized pipette tips.
4. C18 resin.
5. Stopping solution: 20% acetonitrile (ACN), 6% trifluoroacetic acid (TFA).
6. 100% acetonitrile: store at room temperature.
7. Buffer A: 2% ACN, 0.1% TFA, 0.5% acetic acid (all in v/v).
8. Buffer B: 80% ACN, 0.5% acetic acid.

2.5 Liquid Chromatography with Tandem Mass Spectrometry (LC-MS/MS)

1. High-resolution mass spectrometer (e.g., Thermo Scientific Orbitrap Exploris 240).
2. Buffer A (as in Subheading 2.4).
3. Determined acetonitrile gradient in 0.5% acetic acid.

2.6 Bioinformatics

1. Data analysis software tools (e.g., MaxQuant and Perseus).
2. FASTA files from proteome database (e.g., UniProt).
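The reagent masses listed in Subheading 2.3 follow from mass = molarity × volume × molecular weight. The sketch below recomputes several of them as a check before weighing. It uses standard molecular weights and assumes the stated volume is the final solution volume, so the results should land close to, though not always exactly on, the listed masses. The dictionary and function names are illustrative only.

```python
# Molecular weights in g/mol (standard values).
MW = {
    "DTT": 154.25,
    "IAA": 184.96,
    "HEPES": 238.30,
    "urea": 60.06,
    "NH4HCO3": 79.06,
}


def grams_needed(compound, molar, volume_ml):
    """Mass (g) = molarity (mol/L) x volume (L) x molecular weight (g/mol)."""
    return molar * (volume_ml / 1000.0) * MW[compound]


if __name__ == "__main__":
    recipes = [
        ("DTT", 1.0, 5),         # 1 M DTT in 5 mL
        ("IAA", 0.55, 20),       # 0.55 M IAA in 20 mL
        ("urea", 8.0, 20),       # 8 M urea in 20 mL
        ("HEPES", 0.04, 20),     # 40 mM HEPES in 20 mL
        ("NH4HCO3", 0.05, 100),  # 50 mM ABC in 100 mL
    ]
    for compound, molar, volume_ml in recipes:
        print(f"{compound}: {grams_needed(compound, molar, volume_ml):.3f} g")
```

Scaling a recipe to a different batch size only requires changing the volume argument.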

3 Methods

All steps are performed at room temperature unless otherwise specified.

3.1 Culturing Fusarium graminearum and Inoculating Triticum aestivum

1. FHB-susceptible (e.g., Norwell) and -resistant (e.g., Sumai 3) spring wheat genotypes are seeded in a 21/18 °C environment with a 16/8 h photoperiod, in 15-cm pots (one gallon/pot), and grown until the anthesis stage. Water as needed.
2. The desired strain of F. graminearum is cultured on a potato dextrose agar (PDA) plate at room temperature under dark conditions. Allow the fungus to grow for 5 days.
3. A macroconidia suspension is made by transferring four PDA plugs (see Note 1) to the autoclaved wheat straw media and incubating on a rotary shaker (120 rpm) at 25 °C for 2 weeks.
4. With a hemocytometer, the macroconidia are counted and diluted with sterile water to 4000 macroconidia/1 μL. This will be the inoculum. Meanwhile, sterilized straw media without mycelium culture is used as the mock inoculum.
5. The fungal inoculum and mock inoculum are inoculated into wheat florets using the point-inoculation method [9]: separate the palea and lemma of the spike, then insert the tip of a 10 μL pipette (Fig. 2a) between the two structures and inject 10 μL inoculum (see Note 2).
6. The newly inoculated heads are misted with water (see Note 3) and covered with a plastic bag to promote a humid environment. The plants are grown at 27/22 °C with a 16/8 h photoperiod in a growth chamber.
7. Wheat heads are harvested at 24 and 120 h post inoculation (hpi), or at desired time points. The samples are flash frozen with liquid nitrogen in aluminum bags. Store the samples at −80 °C until the protein extraction step.
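The counting and dilution in step 4 can be worked through numerically. The sketch below assumes a standard hemocytometer in which one large counting square holds 0.1 μL, and applies C1V1 = C2V2 to reach the 4000 macroconidia/μL target; the function names, example counts, and volumes are hypothetical.

```python
def conidia_per_ul(counts_per_large_square, dilution_factor=1.0):
    """Concentration from hemocytometer counts.

    Assumes a standard chamber in which one large square holds 0.1 uL,
    so the mean count is multiplied by 10 to give conidia per uL.
    """
    mean_count = sum(counts_per_large_square) / len(counts_per_large_square)
    return mean_count * 10.0 * dilution_factor


def water_to_add_ul(stock_per_ul, stock_volume_ul, target_per_ul=4000.0):
    """Sterile water needed to dilute the suspension to the target (C1V1 = C2V2)."""
    if stock_per_ul < target_per_ul:
        raise ValueError("suspension is already below the target concentration")
    final_volume = stock_volume_ul * stock_per_ul / target_per_ul
    return final_volume - stock_volume_ul


if __name__ == "__main__":
    # Hypothetical counts from four large squares of a 10-fold diluted sample.
    stock = conidia_per_ul([80, 120, 100, 100], dilution_factor=10)
    print(f"Stock: {stock:.0f} macroconidia/uL")
    print(f"Add {water_to_add_ul(stock, 1000):.0f} uL sterile water to 1000 uL of stock")
```

If the suspension is already more dilute than the target, the function raises an error rather than returning a negative volume; in that case the suspension would need to be concentrated or regrown.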


Boyan Liu et al.

Fig. 2 Diagram showing the inoculation site on wheat head. Left: infected wheat head showing Fusarium head blight symptoms. Right: diagram showing inoculation site on wheat head. M0 represents the inoculated spike. M+1 is the row above the infected site; M-1 is the row below the infected site. Note that the aerial mycelia expand to M-1 and M+1, so these rows should be considered if more florets are needed for protein extraction. Left image taken by Mitra Serajazari

3.2 Tissue Disruption and Protein Extraction

1. Approximately 100 μg of total protein is required for LC-MS/MS analysis. Depending on the wheat plant used, 3 florets are obtained from each wheat head to reach this amount (see Note 4, Fig. 2). With a precooled mortar and pestle in the presence of liquid nitrogen, grind florets into a fine powder (see Note 5).
2. The chilled powder is transferred to a 2 mL LoBind microcentrifuge tube (see Note 6) and 300 μL of cold 100 mM Tris-HCl, pH 8.5 with a proteinase inhibitor cocktail tablet is added to the samples.
3. SDS (20%) is added to a final concentration of 2%.
4. The mixture is sonicated for 30 s on/30 s off for 2–5 cycles using a 4 °C water bath sonicator (120 W).
5. Vortex the samples briefly.
6. Briefly centrifuge the samples to collect all liquid at the bottom.
7. 1 M stock DTT is added to a final concentration of 10 mM DTT.
8. Vortex samples briefly.
9. Incubate in a thermal shaker at 98 °C for 10 min with 800 rpm rotation.
10. Sample is cooled to room temperature (see Note 7) before adding 1:10 volume of 55 mM IAA (final concentration of 5 mM IAA). Avoid exposure to light.
11. Incubate samples in the dark at room temperature for 20 min.


12. Plant debris is filtered with a 40 μm pore-size filter (see Note 8) and the supernatant is transferred into a clean 2.0 mL LoBind tube.
13. Sample is centrifuged at 9000 × g at 4 °C for 10 min to pellet any remaining debris and the supernatant is transferred to a new tube.
14. 100% ice cold acetone is added to the sample to a final concentration of acetone at 80% (v/v).
15. Incubate samples at −20 °C overnight.

3.3 Protein Solubilization, Quantification, and Digestion

1. Sample is centrifuged at 16,000 × g and 4 °C for 10 min to pellet precipitate.
2. Discard the supernatant.
3. The pellet is washed with 500 μL of 80% ice cold acetone.
4. Repeat step 3 twice.
5. Let the pellet dry completely (see Note 9).
6. Pellet is resuspended in 200 μL of 8 M urea/40 mM HEPES (see Note 10).
7. Sonicate the samples in a 4 °C water bath for 5–7 cycles (30 s on/30 s off) to dissolve pellets completely. Vortex in between intervals.
8. The sample protein concentration is measured with a protein concentration assay (e.g., BSA urea assay) (see Note 11).
9. ABC (50 mM) is added to adjust the final urea concentration in each sample to 2 M.
10. Lys-C and trypsin protease mixture is added to the samples in 1:50 enzyme-to-protein ratio (μg).
11. Digest samples at room temperature overnight (see Note 12).
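Steps 9 and 10 involve two small calculations: the volume of ABC needed to bring urea from 8 M to 2 M, and the enzyme mass for a 1:50 enzyme-to-protein ratio. The sketch below shows this arithmetic under the assumption that the whole 200 μL resuspension is digested and that the sample contains 100 μg of protein; both values and the function names are illustrative only.

```python
def abc_volume_to_dilute_urea(sample_ul, urea_start_m=8.0, urea_final_m=2.0):
    """Volume of ABC (uL) to add so that urea drops from start to final molarity."""
    if urea_final_m <= 0 or urea_final_m > urea_start_m:
        raise ValueError("final urea concentration must be in (0, start]")
    return sample_ul * (urea_start_m / urea_final_m - 1)


def enzyme_mass_ug(protein_ug, ratio=50):
    """Mass of protease (ug) for a 1:ratio enzyme-to-protein ratio."""
    return protein_ug / ratio


# Assumed example: 200 uL of sample in 8 M urea containing 100 ug of protein.
abc_ul = abc_volume_to_dilute_urea(200)
enzyme_ug = enzyme_mass_ug(100)
print(f"Add {abc_ul:.0f} uL of 50 mM ABC and {enzyme_ug:.1f} ug of Lys-C/trypsin")
```

For these assumed inputs the total volume after dilution is 800 μL, which still fits a 2 mL tube.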

3.4 Preparing and equilibrating the C18 Stop-and-Go Extraction Tips (STAGE-Tip)

1. Pack 3 layers of C18 resin to assemble the STAGE-tip in a 200 μL micropipette tip.
2. Add 100 μL acetonitrile to activate the resin.
3. The liquid is removed by centrifuging at 1000 × g for 2 min.
4. Add 50 μL Buffer B to equilibrate the tip.
5. Centrifuge at 1000 × g for 30 s to 1 min.
6. Add 200 μL Buffer A to equilibrate the tip.
7. Centrifuge at 1000 × g for 2 min.

3.5 STAGE-Tip Samples

1. Add 1/10 of stopping solution (v/v) to quench sample peptide digestion (from Subheading 3.3, step 11).
2. Centrifuge at 16,000 × g to pellet precipitate. Pellet can be discarded.


3. Transfer the supernatant to the equilibrated STAGE-tip from Subheading 3.4 (see Note 13).
4. Centrifuge STAGE-tip at 1000 × g for 3–5 min, or until the sample liquid has moved through the tip (see Note 14).
5. Wash the tip with 200 μL Buffer A.
6. Centrifuge at 1000 × g for 2 min to allow the buffer to pass through the tip.
7. Add 50 μL Buffer B to each tip. Apply pressure to the liquid with a syringe to elute the sample.
8. The eluted sample is collected in a 0.2 mL PCR strip tube (see Note 15).
9. Eluted sample is lyophilized in a vacuum centrifuge (45 °C for 30–40 min).

3.6 LC-MS/MS Analysis

1. Peptide is reconstituted in 10 μL Buffer A. 2. Depending on instrument specifications, the required amount of peptide is loaded onto the mass spectrometer. 3. For a high-resolution mass spectrometer (e.g., Thermo Scientific Exploris 240) samples are loaded onto a nanoflow LC system for separation along a reverse-phase silica column. Electrospray is performed over a 3 h gradient of 4–30% ACN in 0.1% formic acid at a constant flow rate followed by a 5 min wash with 95% ACN.

3.7 Proteome Data Analysis

1. Analyze the obtained MS spectra. We routinely use MaxQuant [10] for spectra analysis and Perseus for data processing and visualization [11].
2. In MaxQuant, we load the .RAW files and set the digestion and instrument options according to the protein extraction experiment. Default settings are selected with the following exceptions (see Note 16). Under the “Group-specific parameters,” we choose the “LFQ” option under the “label-free quantification” section, and the min. ratio count is set to “1.” To identify peptides, we import the proteome FASTA files of F. graminearum and T. aestivum from a database (e.g., UniProt; https://www.uniprot.org/proteomes/) into MaxQuant under the “Global parameters.” The “Min. peptides” value under the “Identification” section is set to 2. Scroll to the bottom of the section, and tick the “Match between runs” option, with the false discovery rate set to 0.01. Adjust the number of processors as desired and press “start.”
3. In Perseus, we load the “proteinGroups.txt” file for total proteome analysis (upload columns named “LFQ intensities” to “Main”). First, rows are filtered to remove contaminants, reverse hits, and proteins only identified by site, or other categories as desired. Next, transform the datasets by log2, and select “categorical annotations” to annotate each column by experiment. Filter for valid values (>50%) or according to the number of samples. Then, impute values based on normal distribution (as appropriate). Finally, the data can be processed as preferred. We provide a representative sample of the proteome data set in Fig. 3.
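The Perseus steps in step 3 (log2 transformation, valid-value filtering, and imputation from a normal distribution) can be sketched outside Perseus as follows. This is a minimal standard-library illustration, not the Perseus implementation: the intensity values are invented, and the imputation parameters (width 0.3 and downshift 1.8 standard deviations, applied per sample column) are assumptions modeled on commonly used Perseus settings.

```python
import math
import random
import statistics


def log2_transform(rows):
    """Convert LFQ intensities to log2; zeros (not quantified) become None."""
    return [[math.log2(v) if v and v > 0 else None for v in row] for row in rows]


def filter_valid(rows, min_fraction=0.5):
    """Keep proteins quantified in more than min_fraction of the samples."""
    return [r for r in rows
            if sum(v is not None for v in r) / len(r) > min_fraction]


def impute_normal(rows, width=0.3, downshift=1.8, seed=0):
    """Fill missing values per column from a narrowed, down-shifted normal."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        mu, sd = statistics.mean(observed), statistics.stdev(observed)
        for r in out:
            if r[j] is None:
                r[j] = rng.gauss(mu - downshift * sd, width * sd)
    return out


# Invented LFQ intensities: rows are proteins, columns are samples.
lfq = [
    [1.2e7, 9.8e6, 0.0, 1.1e7],    # 3 of 4 valid, kept
    [0.0, 0.0, 0.0, 4.0e5],        # 1 of 4 valid, dropped
    [3.3e8, 2.9e8, 3.1e8, 3.0e8],  # all valid
    [5.0e6, 0.0, 6.1e6, 5.5e6],    # 3 of 4 valid, kept
]
rows = filter_valid(log2_transform(lfq))
filled = impute_normal(rows)
print(len(rows), "proteins kept;", "first row:", [round(v, 2) for v in filled[0]])
```

Each column needs at least two observed values for the standard deviation to be defined, which holds for this toy matrix but should be checked on real data.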

4 Notes

1. Transfer the selected plugs (based on aerial mycelia) with sterile straws and sticks, or with the bottom of 1 mL pipette tips.
2. The inoculum should be mixed completely, as the macroconidia may not distribute evenly in suspension. Each inoculated floret may be marked (e.g., with a marker or tied with a string). This is especially important for the 24 hpi and FHB-resistant samples, as they have relatively weaker symptoms than 120 hpi and FHB-susceptible samples.
3. Keep 10–20 cm between the spray head and the wheat head, because the macroconidia may otherwise be washed off. Cover the inoculated wheat head with a plastic bag to ensure a humid environment that promotes fungal growth and infection.
4. To select infected samples, the rows from M-1 to M+1 (Fig. 2) should be considered, as the aerial mycelia usually extend to these rows. While processing the samples, they should be kept on ice if possible.
5. When switching between biological replicates (e.g., mock-inoculated replicates), the mortar and pestle are cleaned with 70% ethanol and sterile water. When switching between different biological samples (e.g., 120 hpi FHB-susceptible vs. 120 hpi FHB-resistant samples), wash with detergent, water, and 70% ethanol. Alternatively, freshly autoclaved mortars and pestles can be used in between sample sets. A chilled Bullet Blender (e.g., Bullet Blender® Storm) can also be used as an alternative to the mortar and pestle.
6. The mortar and the microcentrifuge tube should be cold for easier transfer of the sample powder. A liquid nitrogen-chilled spatula can also assist in the process.
7. Place the samples on ice for faster cooling.
8. We used filters such as cell strainers to separate debris and lysate. The crude sample is transferred into a cell strainer and centrifuged at the lowest speed for less than 10 s to collect liquid and avoid strainer rupture. Solid debris should be discarded according to waste disposal regulations.


Fig. 3 Sample proteome dataset from infection of susceptible and resistant wheat cultivars with Fusarium graminearum. (a) Venn diagram showing the number of pathogen proteins in infectome (left) and cell (right). (b) Volcano plot showing protein abundance change. Left: susceptible 120 hpi inoculated sample; Right: resistant 120 hpi inoculated sample. (c) principal component analysis plot

9. Heat the opened microcentrifuge tubes at 37 °C for a faster drying rate. Alternatively, the samples can be left in a fume hood for 30 min.


10. Extend the sonication cycle to resolubilize the entire protein pellet. Longer times and additional urea/HEPES buffer can be used for larger pellets.
11. We recommend 50–100 μg protein for total proteome analysis.
12. Any remaining sample in ABC can be flash-frozen in liquid nitrogen and stored at −20 °C for a maximum of 1 week.
13. The maximum volume for the STAGE-tip used in our procedure is about 200 μL. If the sample volume exceeds this, repeat step 2 to purify the sample.
14. We suggest centrifuging at room temperature, as precipitate may be present and prevent the flow in STAGE-tips at 4 °C. For a clogged STAGE-tip, this centrifuging step may take up to 30 min.
15. First elute with 25 μL and then with another 25 μL of Buffer B to ensure all sample is collected from the resin. Alternatively, elute 50 μL slowly into a PCR tube.
16. Default settings: variable modifications are “Oxidation (M)” and “Acetyl (Protein N-term)”; fixed modification is “Carbamidomethyl (C)”; digestion mode is “Trypsin/P.”

References

1. Xia R, Schaafsma AW, Wu F, Hooker DC (2020) Impact of the improvements in Fusarium head blight and agronomic management on economics of winter wheat. World Mycotoxin J 13:423–439. https://doi.org/10.3920/WMJ2019.2518
2. Tamburic-Ilincic L, Wragg A, Schaafsma A (2015) Mycotoxin accumulation and Fusarium graminearum chemotype diversity in winter wheat grown in southwestern Ontario. Can J Plant Sci. https://doi.org/10.4141/cjps2014-132
3. Geddes-McAlister J, Shapiro RS (2019) New pathogens, new tricks: emerging, drug-resistant fungal pathogens and future prospects for antifungal therapeutics. Ann N Y Acad Sci 1435:57–78. https://doi.org/10.1111/nyas.13739
4. Valverde-Bogantes E, Bianchini A, Herr JR et al (2020) Recent population changes of Fusarium head blight pathogens: drivers and implications. Can J Plant Pathol 42:315–329. https://doi.org/10.1080/07060661.2019.1680442
5. Buerstmayr M, Steiner B, Buerstmayr H (2020) Breeding for Fusarium head blight resistance in wheat—progress and challenges. Plant Breed 139:429–454. https://doi.org/10.1111/pbr.12797
6. Bai G, Shaner G (2004) Management and resistance in wheat and barley to Fusarium head blight. Annu Rev Phytopathol 42:135–161. https://doi.org/10.1146/annurev.phyto.42.040803.140340
7. Fabre F, Bormann J, Urbach S et al (2019) Unbalanced roles of fungal aggressiveness and host cultivars in the establishment of the Fusarium head blight in bread wheat. Front Microbiol 10:2857. https://doi.org/10.3389/fmicb.2019.02857
8. Aebersold R, Mann M (2016) Mass-spectrometric exploration of proteome structure and function. Nature 537:347–355. https://doi.org/10.1038/nature19949
9. Geddes J, Eudes F, Laroche A, Selinger LB (2008) Differential expression of proteins in response to the interaction between the pathogen Fusarium graminearum and its host, Hordeum vulgare. Proteomics 8:545–554. https://doi.org/10.1002/pmic.200700115
10. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. https://doi.org/10.1038/nbt.1511
11. Tyanova S, Temu T, Sinitcyn P et al (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731–740. https://doi.org/10.1038/nmeth.3901

Chapter 21

DIA Proteomics and Machine Learning for the Fast Identification of Bacterial Species in Biological Samples

Florence Roux-Dalvai, Mickaël Leclercq, Clarisse Gotti, and Arnaud Droit

Abstract

Identification of bacterial species in biological samples is essential in many applications. However, the standard methods usually require a time-consuming bacterial culture (24–48 h) and sometimes lack specificity. To overcome these limitations, we developed a new protocol, combining LC-MS/MS analysis in Data Independent Acquisition mode and machine learning algorithms, enabling the accurate identification of the bacterial species contaminating a sample in a few hours without bacterial culture. In this chapter, we describe the three steps of the protocol (spectral libraries generation, training step, identification step) to generate customized peptide signatures and use them for bacterial identification in biological samples through targeted proteomics analyses and prediction models.

Key words Bacterial identification, LC-MS/MS, Data independent acquisition, Machine learning, Peptide signature

1 Introduction

Detection and identification of bacterial species in biological samples are essential in many fields of microbiology, such as epidemiology, food safety, environment, and clinics. However, the standard methods for the identification of microorganisms usually require a time-consuming bacterial culture (typically 24–48 h, but it can extend to weeks) followed by another long step of immunological or biochemical tests. In the past few years, MALDI-TOF Mass Spectrometry has made a breakthrough for bacterial identification in routine laboratories by replacing the standard identification tests; nevertheless, the bacterial culture is still required. Moreover, this method lacks specificity for certain species and is not quantitative.

To overcome these limitations, we developed a new strategy that combines high sensitivity, high specificity Liquid Chromatography-tandem Mass Spectrometry (LC-MS/MS) and artificial intelligence, which enabled the identification of the bacterial species contaminating a biological sample in a few hours without bacterial culture. The proposed protocol is based on a published study [1], where we proposed a fast and accurate method to identify, with more than 95% accuracy, the 15 bacterial species most commonly found in Urinary Tract Infections (UTI). It can be reapplied to any type of biological sample contaminated by a microorganism at a concentration meeting the sensitivity of the mass spectrometer used.

The procedure described below includes three steps (see Fig. 1). The first one consists of the generation of large bacterial spectral libraries for each bacterial species of interest to be included in the final model. Each species is individually cultured to generate a spectral library by LC-MS/MS analysis in Data Dependent Acquisition (DDA) mode. This library will be further used to identify the bacterial peptides easily detectable in the biological background.

The second step is a training step, where each bacterial species of the model is individually inoculated in the biological background of interest. Multiple replicates of inoculation must be prepared for each species to ensure a high accuracy of the final model. The samples are then analyzed by LC-MS/MS in Data Independent Acquisition (DIA) mode and the detectable peptides are identified through the use of the Skyline software [2] with the spectral libraries previously prepared. LC-MS/MS-DIA has been chosen for its capacity to identify and quantify proteins in complex biological samples with a high reproducibility. The final list of identified peptides in each sample replicate is then given to a recently published machine learning tool, BioDiscML [3], in order to identify a peptide signature able to discriminate between all the bacterial species of interest.

Finally, in a third step, the peptide signature is monitored by targeted proteomics analysis (here we present a protocol for Parallel Reaction Monitoring, PRM) on unknown samples and the list of detected and non-detected peptides is given to a prediction model derived from machine learning to obtain a bacterial identification. Once the first two steps have been performed, the third one can be used for validation on known samples and endlessly repeated to make class predictions on unknown samples.

Using the model developed for the UTI, we are able to identify which bacterial species is contaminating the sample in less than 4 h. Of this time, three and a half hours are dedicated to sample preparation, which can be parallelized. LC-MS/MS analysis is performed in half an hour and a couple of minutes are needed for the machine learning prediction. The time required for bacterial identification using a newly developed model will depend on the background sample (some additional “clean up” steps might be added to ensure a sensitive detection) and on the type and the number of bacterial species to be detected (large signatures may require an extension of the LC-MS/MS runtime to ensure an accurate measurement of peptide signal in targeted analysis).

With this protocol, we were able to obtain a bacterial identification from 1 mL of inoculated urine at 1 × 10⁴ CFU/mL with 95% accuracy. However, the sensitivity of the detection is dependent on several factors, including the type and the number of bacteria, the volume of available samples, and the mass spectrometer used. An upstream short broth culture could be added to enrich the sample in bacterial material and therefore improve the limit of detection. Although the protocol presented here used a single mass spectrometer (an Orbitrap Fusion Tribrid able to operate in DDA, DIA, and PRM modes), once the model is developed, the monitoring of the signature is easily transferable to other spectrometers, including triple quadrupoles working in Selected Reaction Monitoring (SRM), making this method ideal for large scale screening in routine laboratories.

Fig. 1 Overview of the three-step workflow. The first two steps (spectral libraries and training) are performed only once to generate a peptide signature, while the third step (identification) can reuse this signature indefinitely to obtain predictions on unknown samples

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-1-0716-2124-0_25

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_21, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022, Corrected Publication 2023
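The identification step described above passes a list of detected and non-detected signature peptides to a prediction model. The sketch below illustrates only the encoding of that list as a 0/1 feature vector, followed by a toy nearest-profile lookup. It is not the BioDiscML model used in the protocol: the peptide names, species labels, reference profiles, and the Hamming-distance rule are all hypothetical stand-ins for illustration.

```python
def encode_detection(signature, detected):
    """Return a 0/1 vector: 1 if the signature peptide was detected."""
    detected = set(detected)
    return [1 if pep in detected else 0 for pep in signature]


def nearest_species(vector, reference_profiles):
    """Toy stand-in for a trained model: smallest Hamming distance wins."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(reference_profiles,
               key=lambda sp: hamming(vector, reference_profiles[sp]))


# Placeholder signature peptides and reference profiles (hypothetical).
signature = ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED"]
profiles = {
    "species_1": [1, 1, 0, 0],
    "species_2": [0, 0, 1, 1],
}

# Peptides detected in an unknown sample; non-signature peptides are ignored.
vec = encode_detection(signature, ["PEPTIDEC", "PEPTIDED", "OTHER"])
print(vec, nearest_species(vec, profiles))
```

In practice the prediction would come from the model trained in the second step; the lookup here only shows how a detection list becomes model input.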

2 Materials

2.1 Sample Preparation

1. Bacterial cell lines: This protocol is applicable for the proteomic analysis of any broth culturable bacterial species.
2. Culture medium: Use the culture media appropriate to the chosen bacterial cell lines.
3. 50 mM Tris.
4. 50 mM Ammonium bicarbonate.
5. Mutanolysin solution: Mutanolysin from Streptomyces globisporus (Sigma, M9901) at 50 U/μL in 50 mM ammonium bicarbonate.
6. SDC solution: 5% Sodium deoxycholate.
7. DTT solution: 2 M 1,4-Dithiothreitol.
8. Bradford Protein Assay reagent.
9. Trypsin solution: resuspend a 20 μg vial of Sequencing Grade Modified Trypsin (Promega, V5111) in 100 μL of 50 mM ammonium bicarbonate.
10. 50% formic acid (FA).
11. Oasis HLB 1 cc Vac Cartridge, 10 mg Sorbent (Waters).
12. Acetonitrile (Oasis HLB conditioning solution).
13. 0.1% formic acid (FA) (Oasis HLB equilibration and washing solution).
14. 70% acetonitrile (Oasis HLB elution solution).
15. 1.5 mL Low protein binding microtubes.
16. Benchtop microcentrifuge with the appropriate rotor to centrifuge 1.5 mL microtubes at 10,000 × g.
17. Laboratory incubator (for 37 °C incubation).
18. Microtube dry block heater (for 95 °C heating).
19. Bioruptor® Plus (Diagenode).
20. Centrifuge vacuum concentrator.

2.2 High-pH Reversed-Phase High-Pressure Liquid Chromatography

1. High-pressure liquid chromatography system with fraction collector (e.g., Agilent 1200 series HPLC).
2. Agilent extend 1.0 mm I.D., 150 mm length, 3.5 μm particles C18 column or equivalent.
3. HpH buffer A: 10 mM ammonium bicarbonate pH 10.
4. HpH buffer B: 90% acetonitrile, 10% ammonium bicarbonate pH 10.
5. 96 well plates compatible with the fraction collector.

2.3 NanoLC-MS/MS Analysis

1. NanoDrop 2000 spectrophotometer.
2. iRT Kit (Biognosys).
3. Orbitrap Fusion Tribrid mass spectrometer equipped with a nanospray source and interfaced with a U3000 RSLCnano liquid chromatography system (Thermo Fisher Scientific), or equivalent LC-MS/MS system able to acquire data in DDA and DIA modes. The same or another LC-MS/MS system able to work in targeted mode (SRM or PRM) can be used for the “Identification step.”
4. NanoLC Loading buffer: 2% (v/v) acetonitrile, 0.05% (v/v) trifluoroacetic acid in LC-MS-grade water.
5. NanoLC buffer A: 0.1% (v/v) formic acid in LC-MS-grade water.
6. NanoLC buffer B: 80% (v/v) acetonitrile, 0.1% (v/v) formic acid in LC-MS-grade water.
7. Acclaim™ PepMap™ 100 C18, 5 mm length, 1 mm I.D., 5 μm particles, trap cartridge (Thermo Fisher Scientific, No. 160434) for online desalting with NanoLC Loading buffer.
8. Acclaim™ PepMap™ 100 C18, 500 mm length, 75 μm I.D., 3 μm particles, analytical column, NanoViper (Thermo Fisher Scientific, No. 164570).

2.4 Software

1. Java version 1.8 (http://java.com).
2. R 3.5 or above (https://www.r-project.org/).
3. BioDiscML, available at https://github.com/mickaelleclercq/BioDiscML (this protocol has been designed with BioDiscML-1.8.7, but the latest version, with the most recent improvements, is also expected to work).
4. Mascot (Matrix Science, UK) or another database search engine.
5. Skyline 4.1 or above (www.skyline.ms).

3 Methods

Experiments involving pathogenic bacterial species should be performed under a microbiological hood until the bacterial inactivation step.

3.1 Spectral Libraries: Sample Preparation

The following steps should be done for each bacterial species to be detected in the final model.

1. Prepare a semi-log broth bacterial culture containing about 1 × 10⁹ CFU/mL of bacterial cells (see Note 1).
2. Centrifuge 1 mL of the bacterial culture at 10,000 × g for 15 min at room temperature.
3. Discard the supernatant and wash the pellet by resuspension with 1 mL 50 mM Tris.
4. Repeat steps 2 and 3.
5. Centrifuge a third time in the same conditions and discard the supernatant. The dried pellet can be directly used or stored at −20 °C.
6. Resuspend each bacterial pellet with 120 μL of 50 mM ammonium bicarbonate and 12 μL of mutanolysin solution (corresponds to 600 enzymatic units) to help the bacterial lysis by digestion of cell wall peptidoglycan. Incubate at 37 °C for 1 h (see Notes 2 and 3).
7. Add 15 μL of SDC solution (final concentration: 0.5%) and 1.5 μL of DTT solution (final concentration: 20 mM) and vortex for 30 s (see Note 4).
8. Perform protein denaturation and bacterial inactivation by heating the sample at 95 °C for 10 min (see Note 5).
9. Sonicate the sample with a contactless system. We use the Bioruptor® Plus (Diagenode) with the following settings: 15 cycles of 30 s ON/30 s OFF at high level (see Note 6).
10. Centrifuge at 16,000 × g for 15 min to remove cell debris. Collect supernatant.
11. Determine protein concentration using a Bradford assay or comparable quantification method.


12. Pipet the volume corresponding to 120 μg of proteins and adjust the volume to 100 μL with 50 mM ammonium bicarbonate containing 0.5% SDC.
13. Digest the proteins by addition of 12 μL of trypsin solution (corresponding to 1:50 enzyme:protein ratio). Vortex 30 s and incubate 1 h at 37 °C.
14. Stop the trypsin reaction by acidification with 40 μL of 50% FA (see Note 7).
15. Centrifuge the sample at 16,000 × g for 5 min and collect the supernatant.
16. Purify the peptides on an Oasis HLB 10 mg cartridge according to the manufacturer's generic procedure with the following solutions: Conditioning solution: Acetonitrile (2 mL); Equilibration solution: 0.1% FA (3 mL); Washing solution: 0.1% FA (3 mL); Elution solution: 70% acetonitrile, 0.1% FA (200 μL).
17. Vacuum dry the eluted peptides. The sample can be stored at −20 °C or used immediately.
18. For high-pH reverse phase peptide fractionation, resuspend the peptides in HpH buffer A.
19. Load the peptide sample at 1 mL/min of HpH buffer A on an HPLC system equipped with a C18 column.
20. Elute with an HpH buffer B gradient according to the parameters in Table 1. Collect the fractions during the elution at 1 min intervals in a 96 well plate.
21. Pool the fractions of each row of the plate to obtain a total of 8 fractions (see Note 8).
22. Vacuum dry the fractions and store them at −20 °C until the LC-MS/MS analysis.

3.2 Spectral Libraries: DDA LC-MS/MS Analysis

The details of the LC-MS/MS methods will depend on the instrumentation available. Here, we describe a method using a U3000 RSLCnano liquid chromatography system interfaced to an Orbitrap Fusion Tribrid mass spectrometer (Thermo Fisher Scientific).

1. Resuspend each fraction sample from each bacterial species with 25 μL of NanoLC loading buffer.
2. Determine the peptide concentration in each fraction by 205 nm absorbance measurement in 1 μL of sample using a NanoDrop spectrophotometer (see Note 9).
3. Adjust the peptide concentration in each fraction to 0.2 μg/μL using NanoLC loading buffer.
4. For each fraction, transfer 5 μL of peptide solution (equivalent to 1 μg) into an injection vial and add 0.5 μL of iRT 1×.


Table 1 Chromatography analysis settings

Parameter | High pH HPLC fractionation | NanoLC
HPLC | Agilent 1200 Series | UltiMate 3000 NanoRSLC (Thermo)
Peptide quantity injected | 120 μg | 5 μg
Sample loading | |
Pre-column | – | μ-Precolumn, 300 μm i.d. × 5 mm, C18 PepMap100, 5 μm, 100 Å (Thermo)
Flow rate | – | 20 μL/min
Loading buffer | – | 2% acetonitrile/0.05% trifluoroacetic acid
Peptide separation | |
Column | Agilent extend C18 (1.0 mm × 150 mm, 3.5 μm) | PepMap100 RSLC, C18 3 μm, 100 Å, 75 μm i.d., 50 cm (Thermo)
Flow rate | 1 mL/min | 300 nL/min
Buffer A | 10 mM ammonium bicarbonate pH 10 | 0.1% formic acid
Buffer B | 90% acetonitrile/10% ammonium bicarbonate pH 10 | 80% acetonitrile in 0.1% formic acid
Gradient | 5–35% solvent B during 60 min; 35–70% solvent B during 24 min | 5–40% solvent B during 90 min

5. Inject all the volume of each vial (5.5 μL) on the LC-MS/MS system and use the liquid chromatography parameters described in Table 1 (column NanoLC) with the mass spectrometer operating in Data Dependent Acquisition (DDA) mode (see Table 2).
6. Store the .raw files for further analysis.

3.3 Spectral Libraries: Bioinformatic Treatment

Prior to generation of the spectral library, DDA analyses must be searched against protein databases using a database search engine such as Mascot, X!Tandem, or Sequest HT. The search parameters (notably the search tolerance) must be adapted to the instrument and acquisition parameters used. Here, we present a database search performed with the Mascot search engine using the DDA raw files generated in Subheading 3.2, step 6.

1. For each bacterial species, make a database search for the eight fractions all together with the following parameters: Enzyme: Trypsin, Variable modifications: Methionine oxidation (see Note 10), Maximum Missed Cleavages: 2, MS mass tolerance: 10 ppm, MS/MS mass tolerance: 25 ppm (see Note 11).


Table 2 Mass spectrometer acquisition parameters

Parameter | DDA | DIA | PRM
Instrument | Orbitrap Fusion | Orbitrap Fusion | Orbitrap Fusion
Method duration | 120 min | 120 min | 120 min
Polarity | Positive | Positive | Positive
MS parameters | | |
Analyzer | Orbitrap | Orbitrap | –
Resolution | 120 K | 60 K | –
Mass range | 350–1800 | 400–1000 | –
AGC | 4.00E+05 | 4.00E+05 | –
Max injection time (ms) | 50 | 50 | –
MS/MS parameters | | |
Isolation window | 1.6 | 10 (40 windows, 400–800) | 0.7
Activation type | HCD | HCD | HCD
Collision energy | 35% | 35% | 35%
Analyzer | Orbitrap | Orbitrap | Orbitrap
Resolution or scan rate | 15 K | 30 K | 30 K
AGC | 5.00E+04 | 4.00E+05 | 5.00E+04
Max injection time (ms) | 22 | 70 | 120
Data dependent MS2 parameters | Most intense precursor with intensity greater than 5000, top speed 3 s, dynamic exclusion 30 s with 10 ppm tolerance | – | –
Internal calibration | Lock mass 445.12003 | Lock mass 445.12003 | Lock mass 445.12003

2. Install the Skyline software tool.
3. In Skyline, go to Settings/Peptide Settings/Library/Build....
4. Enter a name for your spectral library and the path where it should be stored.
5. Set the cut-off score to 0.95 and choose Biognosys-10 as iRT standard peptides.
6. Click on the Next button and then on the Add Files button to select all the results files (i.e., Mascot .dat files) from each single species (see Note 12).
7. Click on the Finish button. Skyline will then load all spectra contained in the results files.
8. A .blib file containing the final spectral library will then be created by Skyline, which can be browsed through the View/Spectral Libraries menu. The spectral library will also be listed in the Settings/Peptide Settings/Library tab and must be checked to be used.
9. Repeat steps 3–8 for all the bacterial species to be included in the final model.

3.4 Training Step: Sample Preparation of Bacterial Inoculates

Since the objective is to identify bacterial contamination in a biological background using a machine learning-defined peptide signature, the model training has to be done on the biological sample inoculated with the different species of bacteria to be detected. For this purpose, bacterial inoculations must be prepared in the background of interest.

1. Prepare a semi-log broth bacterial culture for each species as in Subheading 3.1, step 1.
2. For each species, prepare at least 10 biological replicates by inoculating about 1 × 10⁶ CFU of bacterial cells in 1 mL of sample background (see Notes 13 and 14).
3. Centrifuge the inoculates at 10,000 × g for 15 min to collect bacterial cells (see Note 15).
4. Since the pellet is usually invisible, discard the supernatant while taking care not to touch the bottom of the tube.
5. Wash the pellets by resuspension with 1 mL 50 mM Tris and centrifuge again at 10,000 × g for 15 min. Discard the supernatants.
6. Resuspend each bacterial pellet with 46 μL of 50 mM ammonium bicarbonate and 1 μL of mutanolysin solution (50 U). Incubate at 37 °C for 1 h.
7. Add 2.5 μL of SDC solution (final concentration 0.5%) and 0.5 μL of DTT solution (final concentration 20 mM) and vortex for 30 s.
8. Perform bacterial inactivation by heating the sample for 10 min at 95 °C (see Note 5).
9. Sonicate the sample with a contactless system. We use the Bioruptor® Plus (Diagenode) with the following settings: 15 cycles of 30 s ON/30 s OFF at high level (see Note 6).


10. Centrifuge at 16,000 × g for 15 min to remove cell debris. Collect supernatant.
11. Digest the protein extract by addition of 1 μL of trypsin solution (200 ng). Incubate 1 h at 37 °C.
12. Stop the trypsin reaction by acidification with 20 μL of 50% FA (see Note 7).
13. Centrifuge the sample at 16,000 × g for 5 min and collect the supernatant.
14. Purify on StageTips containing 3M Empore C18 reverse phase as described in Rappsilber et al. [4] (see Note 16).
15. Vacuum dry the StageTip eluates and store at −20 °C until LC-MS/MS analysis.

3.5 Training Step: DIA LC-MS/MS Analysis

As for the spectral library fractions, the LC-MS/MS methods will depend on the instrumentation available. However, the DIA analyses must be performed on the same instrumentation as the preceding DDA analyses.
1. Resuspend each inoculated sample in 10 μL of NanoLC loading buffer.
2. Transfer 5 μL of peptide solution (equivalent to 1 μg) to an injection vial and add 0.5 μL of iRT 1.
3. Inject the entire volume of each vial (5.5 μL) on the LC-MS/MS system, using the liquid chromatography parameters described in Table 1 (column NanoLC) with the mass spectrometer operating this time in Data Independent Acquisition (DIA) mode (see Table 2).
4. Store the .raw files for further analysis.

3.6 Training Step: DIA Signal Extraction

For each bacterial species, the signal of all the peptides identified in the spectral library will be extracted from the DIA raw files. Since iRT peptides have been added to the samples, Skyline will use the retention time predictor it created from each library to extract the peptide signals at their predicted retention times in the DIA files.
1. First, select the appropriate settings in the Peptide Settings and Transition Settings menus as listed in Table 3 (use only the settings in black).
2. Click on View/Spectral Libraries and scroll down to the library of the first species. Make sure that the Associate proteins box is unchecked, then click on the Add All... button to add all the peptides to the Targets panel on the left of the main page. Click on Refine/Add Decoys... to add decoy peptides that will later be used for False Discovery Rate filtering. Keep the


Florence Roux-Dalvai et al.

Table 3 Skyline parameter settings. Parameters in red are those to be changed for the PRM analysis in the identification step

Peptide Settings

Transition Settings

Digestion: Enzyme: Trypsin [KR|P]; Max missed cleavages: 0; Background proteome: None. Prediction: Retention time predictor: Choose your library; Use measured retention times when present: Checked

MS1 filtering

Unchecked

Isotope peaks included

None

2, 3 1, 2 y, b

MS/MS filtering Acquisition method Product mass analyzer Isolation scheme Resolving power

DIA / Targeted Orbitrap DIA 10 m/z 30000 / 15000 200 m/z

from

ion 3

At

to

last ion

Use high selectivity extraction Checked

Special ions

Blank

Use only scans within xx minutes of predicted RT

Checked

Include all matching scans

Library Choose your library

Pick peptides matching

Library and filter

Rank peptides by

Picked intensity

Modifications Structural modifications 0

Max neutral losses

None

Checked

None

Max variable mods

Compensation voltage

8 60 0

2 min

Ion mobility predictor

Library

Instrument: Min m/z: 50; Max m/z: 2000; Dynamic min product m/z: Unchecked; Method match tolerance m/z: 0.055 m/z. Full-Scan

Use optimization values when present Filter Precursor charges Ion charges Ion types Product ion selection

Time window Filter Min length Max length Exclude N-terminal AAs Auto-select all matching peptides

Prediction: Precursor mass: Monoisotopic; Product ion mass: Monoisotopic; Collision energy: None; Declustering potential: None; Optimization library: None

Use DIA precursor window for exclusion Auto-select all matching transitions Library Ion match tolerance If a library spectrum is available, pick its most intense ions

1 Quantification

Regression Fit

None

Normalization Method

None

Regression Weighting MS Level

None All

Pick

10 min

Checked 0.5 m/z Checked 6 product ions 0 / 3 minimum product ions

From filtered ion charges and types: Checked / Unchecked. From filtered ion charges and types plus filtered product ions: Unchecked. From filtered product ions: Unchecked / Checked

number of decoys identical to the number of targets, as suggested by Skyline.
3. Save your Skyline file by clicking on File/Save As...
4. Import the DIA raw files by clicking on File/Import/Results. Select only the raw files corresponding to the samples inoculated with the bacterial species of this library.
5. To filter out Skyline peak-picking errors, the data must be validated with an FDR filtering algorithm. To do so, compute the FDR q-values by clicking on Refine/Reintegrate and selecting mProphet as the Peak scoring model. Do not click on OK; instead, go to Edit current... at the bottom of the Peak scoring model menu.
6. In the Edit Peak Scoring Model window, check the Use decoys box and click on the Train Model button. In the Model Scores tab, you should see a partial separation of the distributions of


targets and decoys, and in the Q Values tab, a large majority of target q-values should be close to 0.
7. Give a name to your model and click on OK. Then click on OK again in the Reintegrate window. Skyline will reintegrate all the peaks according to this validation model and compute a q-value that will appear in the final export.
8. Save the file by clicking on File/Save.
9. Export the data from Skyline by clicking on File/Export/Report. Then create a new report by clicking on Edit List/Add. Give a name to the new report template and select the following columns: Protein Name, Peptide Sequence, Peptide Modified Sequence Unimod Ids, Precursor Mz, Precursor Charge, and Missed Cleavages, as well as Total Area, Average Mass Error PPM, and Detection Q Value from the Precursor Results list. Make sure the Pivot Replicate Name box is checked at the bottom of the window and click on OK. Click on OK again and select the newly created report. Click on Export to save it in .csv format on your computer in a directory called "ExportSkyline."
10. For each bacterial species, open a new Skyline window and repeat Subheading 3.6, steps 1–9. Save all the exports in the same directory.
11. Run the scriptDIA.R script in R, available at https://github.com/ArnaudDroitLab/skylineToCSV/blob/main/DIA/scriptDIA.R, to generate the DIA_InputML_discrete.csv file.
12. This script merges all the Skyline exports, filters out peptides likely to have low reproducibility (it keeps only peptides without missed cleavages, without PTMs, without methionine or cysteine in their sequence, and with at least 8 amino acids), and validates the peptide detection: only peptide peaks with a library dot product (dotP) > 0.75 and a q-value < 0.01 are considered detected. Peaks fulfilling these criteria are given the value "TRUE"; all others are given "FALSE" (see Notes 17 and 18).

3.7 Training Step: Machine Learning Model and Signature Identification
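As background for the input file used in this step, the filtering and detection rules that Subheading 3.6, step 12 attributes to scriptDIA.R can be sketched in Python. This is only an illustration of the stated criteria, not a port of the R script: the function names and the dictionary keys (sample, sequence, missed_cleavages, modified, dotp, q_value) are hypothetical and do not correspond to the actual Skyline export column names.

```python
def keep_peptide(sequence: str, missed_cleavages: int, modified: bool) -> bool:
    """Reproducibility filter: no missed cleavage, no PTM, no M or C, length >= 8."""
    return (
        missed_cleavages == 0
        and not modified
        and "M" not in sequence
        and "C" not in sequence
        and len(sequence) >= 8
    )


def is_detected(dotp: float, q_value: float) -> bool:
    """Detection rule: library dot product > 0.75 and q-value < 0.01."""
    return dotp > 0.75 and q_value < 0.01


def discretize(rows):
    """Build {sample: {peptide: True/False}} from per-peak records.

    `rows` is an iterable of dicts using the hypothetical keys named above.
    Peptides failing the reproducibility filter are dropped entirely.
    """
    table = {}
    for row in rows:
        if not keep_peptide(row["sequence"], row["missed_cleavages"], row["modified"]):
            continue
        detected = is_detected(row["dotp"], row["q_value"])
        table.setdefault(row["sample"], {})[row["sequence"]] = detected
    return table
```

Each retained peptide becomes one TRUE/FALSE feature per sample, which is the discrete representation that the machine learning step expects.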

1. Retrieve the DIA_InputML_discrete.csv file, which contains one sample per row with its identifier, the class to predict, and the values of all peptides (i.e., the features). An example of the content of this file is given in Table 4.
2. Cut (manually or by scripting) 15% of the samples out of your dataset for later validation and store them in DIA_InputML_discrete.validation.csv while keeping the same header (see Note 19). Keep a balanced class distribution in this dataset. For example, if you have 100 samples of bacteria_1 and 100 samples of bacteria_2, the validation file must contain 15 samples of each.


Table 4 Example of 'DIA_InputML_discrete.csv' file

Identifier    Class         peptide_1    peptide_2    ...    peptide_m
sample_1      bacteria_1
sample_2      bacteria_1
sample_3      bacteria_1
sample_4      bacteria_1
...           ...
sample_11     bacteria_2
sample_12     bacteria_2
sample_13     bacteria_2
sample_14     bacteria_2
...           ...
sample_n      bacteria_x

3. Create a text file named config.conf to serve as the configuration file and copy the following parameters into it with a text editor (see Note 20):
project=bacteria
doClassification=true
classificationClassName=class
trainFile=DIA_InputML_discrete.csv
computeBestModel=true
numberOfBestModels=1
numberOfBestModelsSortingMetric=AVG_MCC

4. Execute BioDiscML using this command:
java -jar biodiscml.jar -config config.conf -train

5. You can wait for the process to end or extract the best model found at any time using this command (see Note 21):
java -jar biodiscml.jar -config config.conf -bestmodel

6. Open the file ending in .details.txt to view the information on the model, which includes performance metrics, the signature of features selected by BioDiscML and used in the model, and any correlated features.
7. If the average MCC (AVG_MCC) is not satisfactory (0.1), the model may not be


considered performant or robust across the various cross-validation procedures. One solution is ensemble learning, which gathers the decisions of many models into one. To this end, open the results file ending in _c.classification.results.csv using Excel or any table viewer software and sort the list by AVG_MCC. Then identify in the results file the models you want to include in the ensemble model. These can be a list of model identifiers (see Note 22), the best x models (see Note 23), or the models having a score above a threshold (see Note 24). Furthermore, if the number of features proposed by the best model does not fulfill your objectives (e.g., too many features selected by the best model), select the model in the results with the highest AVG_MCC and a satisfactory number of features.
8. Locate the model that has been generated; its file name starts with the prefix bacteria_d_model.model.
9. Use the following command to evaluate the performance of the final model on the validation file:
java -jar biodiscml.jar -config config.conf -newDataFile DIA_InputML_discrete.validation.csv -model bacteria_d_model.model -predict
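The class-balanced 15% hold-out of step 2 can be done by hand, but for larger datasets a small script may be less error-prone. The sketch below is one possible approach using only the Python standard library; the function name, the "Class" column key, and the fixed random seed are assumptions made for illustration and are not part of BioDiscML.

```python
import random
from collections import defaultdict


def stratified_split(rows, class_key="Class", fraction=0.15, seed=0):
    """Split rows into (train, validation), drawing `fraction` of each class.

    `rows` is a list of dicts, one per sample, e.g. as produced by
    csv.DictReader. Each class contributes round(n * fraction) samples
    to the validation set, so the class balance is preserved.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[class_key]].append(row)

    train, validation = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_val = round(len(members) * fraction)
        validation.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, validation
```

In practice, the rows would be read with csv.DictReader from DIA_InputML_discrete.csv and the two resulting lists written back with csv.DictWriter using the same header, so that the training file no longer contains the validation samples.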

3.8 Identification Step: Monitoring of Peptide Signature with PRM

The peptide signature generated by machine learning can then be monitored indefinitely by targeted proteomics on unknown samples to obtain bacterial identifications. Before using unknown samples, this step can be applied to new inoculates to validate the model and assess the detection limit. We present here monitoring by PRM on an Orbitrap Fusion Tribrid, but the signature can also be monitored on other instruments and by SRM.
1. Prepare the urine samples in the same way as the inoculated samples, as described in Subheading 3.4, steps 3–15.
2. For the LC-MS/MS analysis, repeat Subheading 3.5, steps 1–4, but this time use the PRM parameters in Table 2 (see Note 25).
3. Use Skyline to extract the signal of each peptide of the signature. To do so, open a new Skyline file and use the Peptide and Transition settings listed in Table 3 (including the parameters in red).
4. Create a table in Excel containing two columns: a "Peptide sequence" column listing the peptide sequences of the signature and the 10 iRT peptide sequences, and a "Protein name" column containing only "Signature" or "Biognosys standards."


5. Copy the table and paste it into Edit/Insert/Peptides. The peptides should appear in "Targets" on the left side of the Skyline page.
6. Save the Skyline file by clicking on File/Save As...
7. Click on File/Import/Results... to import all the PRM .raw files.
8. Export the data as a .csv file using the export template created in Subheading 3.6, step 9.
9. Peptide peaks are then validated in R using the script https://github.com/ArnaudDroitLab/skylineToCSV/blob/main/PRM/scriptPRM.R, which generates PRM_InputML_discrete.csv. Peptides are considered detected if they fulfill one of the following criteria: library dot product (dotP) > 0.85 and average mass error < 10 ppm, or library dotP > 0.75 and average mass error < 3 ppm. In this case, they receive the value "TRUE"; otherwise, they receive "FALSE" (see Note 26).
10. Finally, use the following command to identify the bacteria in the new PRM data file PRM_InputML_discrete.csv with the final model:
java -jar biodiscml.jar -config config.conf -newDataFile PRM_InputML_discrete.csv -model bacteria_d_model.model -predict
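The two-tier detection rule of step 9 is easy to misread, so a compact restatement may be useful. The sketch below encodes it as a single Python function; it is not taken from scriptPRM.R, the function name is hypothetical, and comparing the absolute value of the mass error is an assumption, since the text does not say how negative errors are handled.

```python
def prm_detected(dotp: float, mass_error_ppm: float) -> bool:
    """Return True if a PRM peak meets either detection criterion of step 9.

    Criterion A: dotP > 0.85 and |mass error| < 10 ppm.
    Criterion B: dotP > 0.75 and |mass error| < 3 ppm.
    Using the absolute mass error is an assumption of this sketch.
    """
    error = abs(mass_error_ppm)
    return (dotp > 0.85 and error < 10) or (dotp > 0.75 and error < 3)
```

The design trades spectral similarity against mass accuracy: a peak with a weaker library match is accepted only if its mass error is tight, and a strong library match tolerates a looser mass error.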

4 Notes

1. The bacterial cell concentration may vary with the bacterial species. The culture conditions should be adapted to obtain a minimum of 10^9 CFU, so as to yield at least 120 μg of protein at the end of the extraction.
2. Mutanolysin is used to partially digest the peptidoglycan of the Gram-positive bacterial cell wall. If the set of bacteria to be included in the model does not contain any Gram-positive species, this step can be skipped. Conversely, if the model contains at least one Gram-positive species, all the species (Gram-positive and Gram-negative) should undergo the same mutanolysin-containing protocol.
3. This amount of mutanolysin is optimized for 10^9 CFU of bacterial cells and should be adapted if a different number of cells is used.


4. To keep the sample preparation protocol fast, we skipped the reduction and alkylation of cysteines usually performed in proteomics protocols. Cysteine-containing peptides are therefore excluded prior to machine learning training.
5. This incubation should be sufficient to inactivate most bacterial species. However, inactivation can be verified (especially for pathogens) by culture on an agar plate.
6. If sonication is not possible, this step may be replaced by 5 min of vortex mixing.
7. This acidification should precipitate the sodium deoxycholate contained in the sample. If no precipitate is visible, add more 50% FA until precipitation appears.
8. A large number of fractions can be collected to increase the size of the final spectral library. However, eight fractions are usually enough to cover most of the peptides that will also be observable in a non-fractionated sample.
9. If no NanoDrop spectrophotometer is available, the peptide amount in each fraction can be estimated by dividing the peptide amount loaded on the HpH HPLC by the number of fractions collected (e.g., if eight fractions were collected from 120 μg of peptides, each fraction should contain approximately 15 μg of peptides).
10. No fixed modification is set, since carbamidomethylation of cysteines was not performed in order to keep the sample preparation protocol as short as possible. Users may choose to add this step, in which case the search parameters should be adapted accordingly. Other variable modifications may be used if needed.
11. For each bacterial species, search the most complete protein database available, ideally the UniProt reference databases. Some species may have a very incomplete database; in that case, a closely related species or a genus-level database can be used. However, this could impair the ability of the final model to discriminate closely related species.
12. Skyline can use many other types of result files to generate the libraries, depending on which search engine was used.
13. We recommend preparing as many replicates for each species as the total number of classes (species) to be predicted by the final model. Increasing the number of biological replicates may improve the final accuracy of the model. Moreover, the more biological classes (species) there are to predict, the more replicates should be prepared.
14. The inoculation concentration must be chosen to allow efficient detection of the bacteria in the background.


1 × 10^6 CFU/mL should be easily detected by an Orbitrap Fusion or Q-Exactive mass spectrometer. A second level of inoculation, closer to the expected limit of detection, can also be used to improve the efficiency of the detection model at low concentrations. Although this protocol was developed for 1 mL samples, it could be adapted to larger volumes in order to lower the detection limit.
15. After inoculation and before centrifugation, additional steps may be added to eliminate as many background proteins as possible. For instance, human urine inoculates can be centrifuged at low speed (1000 × g, 5 min) to remove human cell debris, and bacterial pellets in a milk background can be washed with a citrate buffer to remove lipids and casein.
16. Since the sample volume to be purified is about 70 μL, StageTips should be loaded in two steps or be prepared in 200 μL tips.
17. We chose stringent peptide filtering in order to keep the most reproducible and most detectable peptides. However, users may modify the R script to include or exclude peptides from the library or to change the detection criteria.
18. Prior to machine learning training, users may also want to exclude peptide sequences that also match the protein background (e.g., Homo sapiens proteins) or other contaminants, so that these are not detected. The Unipept software [5] can be used to identify these sequences.
19. We propose randomly extracting the validation samples into a new file (DIA_InputML_discrete.validation.csv in this protocol); any proportion between 10 and 20% of the samples should be reasonable.
20. BioDiscML installation files and descriptive documentation are available at https://github.com/mickaelleclercq/BioDiscML. Many options are available, such as support for multiple very large input files merged using the mergingID option. Also, if your dataset contains several hundred or thousands of samples, it is better to set loocv=false in the BioDiscML configuration file.
21. BioDiscML may take several hours to run depending on your dataset. All models are stored in a file ending in _c.classification.results.csv. You can open this file in a table viewer such as Excel and sort the models by the AVG_MCC column to get an overview of the best models found by BioDiscML.
22. To choose manually the models to be included in the ensemble model, add the model identifiers listed in the results file to the following command:
java -jar biodiscml.jar -config config.conf -bestmodel model_ID_1 model_ID_2 model_ID_3


23. Depending on your choice, you will need to modify the configuration file according to the section ## Best model autoselection in the config file provided by BioDiscML (https://github.com/mickaelleclercq/BioDiscML/blob/master/config.conf). To select the top x models based on the AVG_MCC metric (the default in BioDiscML), add these lines to config.conf:
numberOfBestModels=1
combineModels=true
Then run the following command:
java -jar biodiscml.jar -config config.conf -bestmodel

24. To let BioDiscML choose the models based on a specific threshold of the AVG_MCC metric (e.g., MCC > 0.8), with a maximum of ten models selected, add these lines to config.conf:
numberOfBestModels=10
numberOfBestModelsSortingMetricThreshold=0.8
combineModels=true
Then run the following command:
java -jar biodiscml.jar -config config.conf -bestmodel

25. Depending on the number of peptides to monitor, the method might need to be scheduled to keep an appropriate cycle time. The run length might also be shortened for high-throughput analysis. In both cases, the iRT peptides added to the samples can be used to adjust the monitoring to the peptide retention times.
26. These criteria might be adjusted depending on the instrumentation used and the acquisition conditions.

References

1. Roux-Dalvai F, Gotti C, Leclercq M et al (2019) Fast and accurate bacterial species identification in urine specimens using LC-MS/MS mass spectrometry and machine learning. Mol Cell Proteomics 18:2492–2505
2. Pino LK, Searle BC, Bollinger JG et al (2020) The Skyline ecosystem: informatics for quantitative mass spectrometry proteomics. Mass Spectrom Rev 39:229–244
3. Leclercq M, Vittrant B, Martin-Magniette ML et al (2019) Large-scale automatic feature selection for biomarker discovery in high-dimensional OMICs data. Front Genet 10:452
4. Rappsilber J, Mann M, Ishihama Y (2007) Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat Protoc 2:1896–1906
5. Mesuere B, Devreese B, Debyser G et al (2012) Unipept: tryptic peptide-based biodiversity analysis of metaproteome samples. J Proteome Res 11:5773–5780

Chapter 22

Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies

Caitlin M. A. Simopoulos, Daniel Figeys, and Mathieu Lavallée-Adam

Abstract

Constant improvements in mass spectrometry technologies and laboratory workflows have enabled the proteomics investigation of biological samples of growing complexity. Microbiomes represent such complex samples, for which metaproteomics analyses are becoming increasingly popular. Metaproteomics experimental procedures create large amounts of data from which biologically relevant signal must be efficiently extracted to draw meaningful conclusions. Such data processing requires appropriate bioinformatics tools specifically developed for, or capable of handling, metaproteomics data. In this chapter, we outline current and novel tools that can perform the most commonly used steps in the analysis of cutting-edge metaproteomics data, such as peptide and protein identification and quantification, as well as data normalization, imputation, mining, and visualization. We also provide details about the experimental setups in which these tools should be used.

Key words Metaproteomics, Bioinformatics, Microbiome, Mass spectrometry, Software, Proteomics, Quantification, Statistics, Computational biology

1 Introduction

Metaproteomics studies are used to understand the complex community dynamics and functional profiles of microbial communities. Experimental methods have been developed to better investigate microbial communities in preparation for metaproteomics studies, including improvements to laboratory workflows with regard to protein extraction, purification, digestion, and mass spectrometry [1]. However, as experimental protocols improve, the amount of data acquired by each experiment increases. Computational tools and methodologies are required for confident peptide and protein identification, quantification, and downstream data mining and analysis. In comparison to single-species proteomics, metaproteomics data pose additional challenges to data analysis due to the complexity of community samples stemming from the presence of

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_22, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


Caitlin M. A. Simopoulos et al.

many, and often previously uncharacterized or highly related, microbial species. Standard proteomics algorithms often fall short of efficiently and properly processing metaproteomics datasets. Furthermore, researchers typically require access to computers with greater power, as well as large and efficient memory to store large amounts of data. Even with the proper hardware in place, the lack of user-friendly computational tools for specific metaproteomics applications can require personnel trained in statistical or computational methods to build custom software packages or pipelines [2]. Large computational resource requirements are compounded by additional challenges in metaproteomics, such as protein inference problems encountered when identifying proteins from community samples and microbes that remain uncharacterized. These challenges highlight the need for metaproteomic-specific bioinformatics tools and pipelines if metaproteomics research is to progress. In this chapter, we outline recent computational strategies in metaproteomics, including tools for peptide and protein identification and quantification, metaproteomic-specific software workflows for functional and taxonomic analysis, and the ways in which researchers have applied tools originally developed for proteomics in the context of metaproteomics research. Figure 1 provides a summary of the main stages of metaproteomics computational data analysis and the software packages tackling these tasks.

2 Peptide and Protein Identification

2.1 Mass Spectrum Acquisition

In modern discovery-based metaproteomics studies, researchers typically use shotgun proteomics approaches to identify peptides and proteins. With such methods, proteins are first enzymatically digested into peptides, using an enzyme such as trypsin. The resulting peptides are then usually separated using liquid chromatography. Tandem mass spectrometry (MS/MS) is then used to acquire data from the resulting separated peptides using two main methods: data-dependent acquisition (DDA) [3] and data-independent acquisition (DIA) [4, 5]. These data acquisition methods differ by how the mass spectrometer selects ions detected in a mass spectrum (MS1 spectrum) for further fragmentation and tandem mass spectrum acquisition (MS2 spectrum). Importantly, these two methods require different computational approaches to identify peptides from the gathered MS2 spectra. The most common approach in metaproteomics using MS/MS is DDA. In DDA, first the mass spectrometer (MS) detects precursor ions of peptides and measures their intensity. Then the most intense precursor ions are selected by the mass spectrometer for fragmentation and the acquisition of a tandem mass spectrum of the fragment ions [6]. DIA, on the other hand, is not biased

Bioinformatics Methods for Metaproteomics


Fig. 1 Graphical representation of the main steps of the computational analysis of metaproteomics data. Software packages are grouped under the tasks they specialize in. Numbers between brackets are references to the original publications of the tools


towards the acquisition of intense ions. Instead, a series of windows of ions with a mass-to-charge ratio (m/z) within a given range are isolated and fragmented at the same time, yielding a chimeric MS2 spectrum, representing the fragment ions of multiple precursor ions. In the end, MS2 spectra resulting from both DDA and DIA can be used for peptide identification.

2.2 Peptide Identification with Database Search

As for traditional MS-based proteomics, the most common method for peptide identification in metaproteomics is the protein sequence database search [7]. Briefly, such an approach generates a database of theoretical MS2 spectra based on a given protein sequence database. The acquired MS2 spectra are then matched to the theoretical spectra and the level of agreement between acquired and theoretical spectra is measured computationally.
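To make the matching of acquired and theoretical spectra more concrete, the sketch below generates singly charged b and y fragment m/z values for a candidate peptide and counts how many fall within a tolerance of an observed peak. The shared-peak count is a deliberately simple stand-in for the scoring functions of real search engines, which are considerably more elaborate; the function names and the 0.02 m/z tolerance are assumptions chosen for illustration. The residue masses are standard monoisotopic values for unmodified amino acids.

```python
# Monoisotopic residue masses (Da) of the 20 unmodified amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_fragments(peptide):
    """Return singly charged b-ion then y-ion m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(peptide, observed_mz, tol=0.02):
    """Count theoretical fragments lying within `tol` m/z of an observed peak."""
    return sum(
        any(abs(theo - obs) <= tol for obs in observed_mz)
        for theo in theoretical_fragments(peptide)
    )


def best_match(observed_mz, candidates, tol=0.02):
    """Return the candidate peptide sharing the most peaks with the spectrum."""
    return max(candidates, key=lambda pep: shared_peak_count(pep, observed_mz, tol))
```

In a real search, the candidates would first be restricted to peptides from the sequence database whose precursor mass matches that of the acquired spectrum, which is why database size directly affects both run time and the chance of random matches.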

2.3 Protein Sequence Database Choice

Protein sequence databases used for mass spectra identification need to match the sample type and be small and specific enough to prevent challenges introduced by false discovery rate (FDR) estimation [8]. In other words, it would not be useful to use a sequence database of proteins derived from an ocean microbiome for a Homo sapiens gut microbiome study. Furthermore, a search through the protein sequences of all identified microbes would be incredibly time consuming and, more importantly, would introduce many false positives due to peptide-spectrum matches (PSMs) that would occur by chance. The ideal protein sequence database is one that contains only the sequences of the proteins present in the analyzed sample. However, given that most microorganisms composing a microbiome sample, and the proteins they express, are typically largely unknown, building such a protein sequence database is often impossible. Hence, a more suitable database for metaproteomics research is one derived from a metagenomic sequence database, but the best way to use metagenomic data to build such a database remains unclear [9, 10]. For example, Timmins-Schiffman et al. suggest a sample-specific "metapeptide" approach, where the database to be searched is built from unassembled metagenomic reads [10]. Alternatively, Tanca et al. discuss the benefits of assembling individual genomes for database searching when analyzing less characterized microbial samples [9]. Metagenomics databases are, however, expensive and time consuming to produce due to the extensive amount of sequencing needed for their construction. Many researchers instead opt to use general sequence databases, such as the NCBI's non-redundant microbial database [11], or system-specific databases, such as the IGC human gut microbiome database [12], for their ease of use. 
Sequence database choices can have important effects on the biological conclusions of a metaproteomics investigation and thus it is essential to be cautious when both choosing a sequence database and interpreting identification results [10]. Fortunately, database quality and completeness can be assessed using


the LiDSiM (LImits of Detection SImulation for Microbes) method, which calculates how many MS2 spectra can be confidently identified in a particular experiment with a given database [13].

2.4 Protein Sequence Database Search

In addition to appropriate and biologically relevant databases, metaproteomic-specific database search algorithms should be considered when analyzing metaproteomics data. To reduce FDRs stemming from the large sequence databases required for accurate metaproteomics studies, common metaproteomics search strategies use iterative or two-step database search methods to reduce the size of such sequence database. Jagtap et al. [14] first described a two-step database search approach. In this two-step method, a primary search is completed on a larger database of gut microbiome sequences. This first search is followed by a second search on a target-decoy database, where target sequences are composed of positive peptide hits from PSMs of the primary search and the host (human) database, and decoys are their reverse sequences. Similarly, MetaPro-IQ [15] uses the X!Tandem [16, 17] database search algorithm, followed by the MaxQuant [18] search engine for a two-step target-decoy database search approach. In this case, the second and final database search is completed on a reduced non-redundant experiment-merged database, where only matched sequences in the first search are kept for the second search [15]. Novel protein sequence database search frameworks, such as the metaproteomic-specific ProteoStorm, now include such an iterative search strategy [19]. On the other hand, Xiao et al. described a metagenomics taxonomy-guided database search strategy that also follows a two-step methodology [20]. The first step of the database search uses inferred “pseudometagenomes,” a sequence database constructed from UniProt taxonomy rather than assembled metagenomes, as a preliminary database [21]. Once proteins are identified from this “pseudometagenomic” database, a merged database is constructed by combining these pseudometagenomes with translated open reading frame (ORF) sequences from assembled metagenomic contigs. 
The addition of the taxonomy-inferred database led to the identification of more peptides than a classic metagenomic database in both synthetic microbial communities and actual human gut microbiome samples, suggesting benefits of also considering "pseudometagenomes." ComPIL, a metaproteomic-specific protein database, was recently updated to ComPIL 2.0 to reflect the increased number of diverse microbes identified in meta-omics studies [22]. ComPIL 2.0 can be searched with a metaproteomic-friendly search algorithm, ProLuCID-ComPIL [23]. In combination, the updated database and search algorithm identify more microbial and host proteins than their original versions and offer another metaproteomic-specific approach to database searching.
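The database reduction at the heart of the two-step search strategy described above can be illustrated with a short sketch. It assumes that a first-pass search has already returned the identifiers of the matched sequences; the function name, the dictionary-based database representation, and the DECOY_ prefix are hypothetical conventions, and the search engine itself is not modeled.

```python
def build_second_step_database(large_db, host_db, first_pass_hits):
    """Build a reduced target-decoy database for the second search.

    large_db        : {protein_id: sequence} for the large microbial database
    host_db         : {protein_id: sequence} for the host (e.g., human) proteins
    first_pass_hits : set of protein_ids matched in the primary search

    Targets are the first-pass hits plus all host sequences; each target
    receives a reversed-sequence decoy under a 'DECOY_' prefix (an assumed
    naming convention).
    """
    targets = {pid: seq for pid, seq in large_db.items() if pid in first_pass_hits}
    targets.update(host_db)
    decoys = {f"DECOY_{pid}": seq[::-1] for pid, seq in targets.items()}
    return {**targets, **decoys}
```

Because the second search runs against far fewer sequences, fewer random matches are possible at a given score threshold, which is the motivation the text gives for these iterative strategies.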


2.5 Spectral Library Search

While protein sequence database searches remain the most popular approach to identify peptides in the context of metaproteomics studies, spectral library searches are also gaining popularity in the field, especially for datasets acquired using DIA. Briefly, in this type of search algorithm, acquired mass spectra are matched against a library of previously acquired and identified MS2 spectra, often acquired through DDA and identified using sequence database search approaches [24–26]. This type of peptide identification approach is often preferred for DIA datasets due to its improved peptide identification sensitivity when analyzing highly complex MS2 spectra containing the fragment ions of multiple precursor ions. However, this method is limited to the identification of peptides that have previously been identified in MS2 spectra. To address this issue, some approaches build in silico spectral libraries using deep learning predictions of the MS2 spectra of peptides without the use of DDA data. Methods such as DeepDIA [27] and Prosit [28] create these libraries based on the MS instrument used or the fragmentation energy selected, respectively, as well as on the predicted peptide retention time. Spectral library search has been used in the context of metaproteomics datasets acquired using DIA. Indeed, the package Diatools [29], a pipeline that processes DIA data and identifies and quantifies peptides, has been used to characterize metaproteomics data from gut microbiomes of microbial mixtures and human stool samples [30]. Similarly, Long et al. performed a metaproteomics analysis of fecal samples of colorectal cancer patients and identified peptides from DIA data with Spectronaut (Biognosys AG, Switzerland). In conclusion, the quality of peptide identification is heavily influenced by the choice of reference sequence database or spectral library and by the computational search strategy used to explore them [8, 10].
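A common way to compare an acquired spectrum with a library spectrum is a normalized dot product (cosine similarity) over binned peaks. The sketch below illustrates that general idea only; it is not the scoring function of any of the tools named above, and the function names and the 1 m/z bin width are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into m/z bins; `peaks` is a list of (mz, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def normalized_dot_product(spectrum_a, spectrum_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, ranging from 0 to 1."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)
```

Because the score is normalized, it depends on the relative pattern of fragment intensities rather than on absolute signal, so a low-abundance peptide can still match its library spectrum well.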

2.6 Assessment of Peptide-Spectrum Match Quality and False Discovery Rate Estimation

Machine learning and statistical analyses are typically used to complement protein database search engines and spectral library search methods to assess PSM reliability. Such an approach is needed since most sequence database and spectral library search approaches match all acquired MS2 spectra to at least one peptide sequence. The quality of this match is typically evaluated using a target-decoy search, in which the search is performed against MS2 spectra from peptides that could be present in a sample (target) and MS2 spectra of peptides not expected to be present in the sample (decoy) [31]. An FDR for a given set of peptide identifications can then be derived from matches to these target and decoy mass spectra. Metaproteomics data analysis can be especially prone to a high FDR due to the presence of many microbial species in a sample, increasing the odds of matching decoy mass spectra by chance. Selecting a database and a search strategy adapted to the analyzed sample is essential to reduce FDR. In addition, one can also use machine learning-based software to accompany a search engine for evaluation of the resulting target and decoy matches and assessment of PSM quality for improved peptide identification. Among these approaches, we count Percolator, first introduced in 2007, which uses a semi-supervised support vector machine approach to estimate the FDR of PSMs [32, 33]. Researchers studying functional redundancy in ocean microbiomes have used Percolator to ensure accurate PSMs with an estimated FDR of 1% [34]. On the other hand, Sipros Ensemble uses ensemble PSM scoring and a supervised machine learning approach to PSM filtering [35]. Sipros Ensemble makes use of multiple PSM quality metrics, such as the multivariate hypergeometric (MVH) scoring function, cross-correlation (Xcorr), and weighted dot product (WDP), in a logistic regression classifier for confident PSM identification. Sipros Ensemble has been tested on both real and synthetic microbial community samples and was able to identify more confident PSMs, peptides, and proteins than methods that use a single metric. Alternatively, PeptideProphet [36] uses a mixture model and a probability-based approach to re-score PSMs after a database search to ensure confidence in the identifications. Finally, the DTASelect [37] quadratic discriminant analysis performs the same task and was recently used to filter PSMs when testing the original and updated metaproteomic-specific ComPIL 2.0 database to assess the reliability of PSMs from human gut microbiomes [38].

2.7 De Novo Sequencing

Peptides can also be identified without any reference sequence database or spectral library using de novo sequencing approaches, such as PEAKS [39], PepNovo [40], and NovoHMM [41]. These methods determine the peptide sequence that generated a given MS2 spectrum based solely on the fragment ions in the MS2 spectrum. Despite the challenges of doing so in noisy mass spectra, such approaches have the benefit of not requiring the construction of a large database of proteins expected to be present in a sample. This can be advantageous when very little is known about which microorganisms are present in the analyzed sample. While applications of de novo sequencing remain marginal, a novel method, NovoBridge, has been proposed to identify peptides in synthetic microbial communities and in microbiomes from the Bering Sea and a wastewater treatment plant [42]. De novo peptide sequencing was also used to identify cyclopeptides in the human gut using CycloNovo, an automated de novo cyclopeptide sequencing algorithm based on de Bruijn graphs [43].
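The principle behind de novo sequencing, reading residues from the mass differences between consecutive fragment ions, can be sketched as follows. This is a deliberately idealized illustration and not the algorithm of PEAKS, PepNovo, NovoHMM, or NovoBridge: it assumes a complete, noise-free ladder of singly charged b-ions, a fixed mass tolerance, and a reduced residue table (isoleucine is omitted because it is isobaric with leucine).

```python
PROTON = 1.00728  # mass of a proton in Da

# Monoisotopic residue masses in Da (subset of the 20 standard amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "R": 156.10111,
    "Y": 163.06333, "W": 186.07931,
}


def residue_for_mass(delta, tol=0.01):
    """Return the residue whose mass matches delta within tol, or '?' if none does."""
    for residue, mass in RESIDUE_MASS.items():
        if abs(delta - mass) <= tol:
            return residue
    return "?"


def read_b_ladder(b_ions, tol=0.01):
    """Translate a ladder of singly charged b-ion m/z values into a sequence.

    The first residue is read from the gap between a bare proton and b1;
    each later residue from the gap between consecutive b-ions.
    """
    masses = [PROTON] + sorted(b_ions)
    return "".join(
        residue_for_mass(high - low, tol) for low, high in zip(masses, masses[1:])
    )


# Idealized b1..b4 ions for the toy peptide PEAK.
sequence = read_b_ladder([98.06004, 227.10263, 298.13974, 426.23470])
print(sequence)
```

When a fragment ion is missing, the gap no longer corresponds to a single residue and the sketch emits `?`, which hints at why noisy or incomplete spectra make de novo sequencing difficult in practice.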

3 Peptide and Protein Quantification

In addition to multiple strategies for peptide identification in metaproteomics, there are also numerous approaches for peptide quantification. Label-free quantification methods, such as spectral counting or intensity-based approaches, are common in metaproteomic studies because of their ease of use and the fact that they do not require any peptide labeling as in TMT (tandem mass tags) [44], 15N [45], or iTRAQ (isobaric tags for relative and absolute quantification) [46] strategies, which, on the other hand, typically result in higher quantification accuracy. Nevertheless, a method named SILAMi (stable isotopically labeled microbiota) [47], using in vitro stable isotopic labeling as a spike-in reference, was proposed to reduce technical variation and enable relative quantification of peptides in microbiomes. Such an approach enables the estimation of peptide and protein abundance fold-changes and statistical assessment of differential abundance across different experimental conditions using this spike-in reference. To alleviate the issues related to peptide quantification inaccuracies of label-free approaches, ANPELA (analysis performance assessment of label-free proteome quantification) can be used to assess the quality of a given label-free metaproteomics workflow [48]. While considering various data normalization, transformation, or imputation methods (further discussed below), ANPELA uses the raw output files of 18 different peptide intensity estimation software tools to assess the quality of estimates in pooled coefficient of variation (PCV) values. Included in the evaluation are precision, classification ability, differential expression analysis, reproducibility, and accuracy.
While there are very few computational tools for the extraction of peptide quantification values (spectral counts or intensity values) from MS data that were explicitly developed for metaproteomics analyses, software packages, such as Census [49] and MaxQuant [18], which are commonly used for peptide and protein quantification in proteomics, can also be used in the analysis of microbiomes [15, 50].
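The two label-free readouts mentioned above, spectral counts and intensity values, can be contrasted with a minimal sketch. It assumes that each PSM is available as a (peptide, intensity) pair, which is a simplification: packages such as MaxQuant derive intensities from integrated chromatographic peaks rather than from per-PSM values, and the peptide sequences below are hypothetical.

```python
from collections import defaultdict


def quantify_label_free(psms):
    """Return (spectral counts, summed intensities) per peptide.

    psms is an iterable of (peptide, intensity) pairs, one per PSM.
    """
    counts = defaultdict(int)
    intensities = defaultdict(float)
    for peptide, intensity in psms:
        counts[peptide] += 1             # spectral counting: one count per PSM
        intensities[peptide] += intensity  # intensity-based: accumulate signal
    return dict(counts), dict(intensities)


# Hypothetical PSM list for one MS run.
psms = [("LVEAGK", 1.0e6), ("LVEAGK", 2.5e6), ("TSDFNR", 4.0e5)]
spectral_counts, summed_intensity = quantify_label_free(psms)
print(spectral_counts)
print(summed_intensity)
```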

3.1 Workflows Combining Identification and Quantification

Recently, integrated peptide identification and quantitation workflows have been developed specifically for metaproteomics. We designed the MetaLab data analysis pipeline that creates experiment-specific sequence databases, identifies and quantifies peptides and protein groups, and completes taxonomic and functional analyses [51, 52]. MetaLab uses the iterative MetaPro-IQ approach to database search for peptide identification [15]. In 2020, MetaLab v2.0 was released and now includes the ability to perform an open database search to favor the identification of posttranslationally modified peptides, which can play a regulatory role in the metabolic activity of microbes [53]. Following MetaLab analysis, iMetaLab Shiny apps are available for further data analysis, allowing end-to-end bioinformatics data analysis and publication-quality figure generation (shiny.imetalab.ca/). Schiebenhoefer et al. [54] have also described a flexible end-to-end metaproteomics workflow that incorporates MetaProteomeAnalyzer (MPA) [55, 56], a tool for MS data processing, as well as taxonomic and functional profiling, and Prophane [57] (www.prophane.de), a software package for metaproteomics data visualization. This workflow enables a complete metaproteomics data analysis from database creation to taxonomic and functional analysis. Unlike MetaLab, which is a single integrated software package, the MPA-Prophane workflow is a customizable pipeline that uses multiple software packages for each step and different sequence database search engines (X!Tandem [16, 17] and OMSSA (open mass spectrometry search algorithm) [58]). PSMs from all algorithms are combined to maximize peptide identification with customizable data aggregation methods for protein group computation. Prophane is then used for automated taxonomic and functional analyses and visualization. Alternatively, Van Den Bossche et al. [59] described an end-to-end workflow that uses MPA, but this time along with PeptideShaker [60] and Unipept [61, 62] for peptide identification and quantification, as well as taxonomic and functional analysis. Of particular interest is the incorporation of PeptideShaker, a tool that connects directly to PRIDE [63], a proteomics data repository member of the ProteomeXchange consortium [64], which includes multiple proteomics databases. This connection allows its peptide identification module to integrate a slew of database search results. This workflow was specifically created for disseminating newly generated data and re-analyzing datasets that have already been deposited in PRIDE to enable easily accessible metaproteomics data sharing.
The Galaxy suite of bioinformatics tools also offers a metaproteomic-specific workflow [65]. Free and open access, Galaxy’s metaproteomics tools allow users to easily complete and share metaproteomics analysis via cloud-based computing. Galaxy includes a package that uses a two-step database search as previously described by Jagtap et al. [14] that is modified for use with ProteinPilot (Sciex, Framingham, MA), a software package used for protein identification and quantification. Users can make use of Galaxy’s other available software to complete taxonomic and functional analyses, such as Unipept [61] and MEGAN5 (MEtaGenome ANalyzer) [66]. In addition, OpenMS 2.0 [67] is a suite of tools that can be used for flexible proteomics analysis. Specifically created for reproducible analysis, OpenMS tools are available through a command line interface (CLI) and through GUI-based workflow management tools, such as Galaxy [68] and KNIME [69]. OpenMS is updated regularly and now contains tools designed for metaproteomics data analysis such as MetaProSIP for stable isotope incorporation quantification in functional metaproteomics [70]. Finally, pipelines and workflows for standard proteomics experiments are also available and are routinely used for metaproteomics data analysis. For example, the Integrated Proteomics Pipeline (IP2, Bruker Scientific LLC, Billerica, MA, http://www.bruker.com) has been previously used to identify and quantify peptides and proteins in microbiomes from mouse fecal samples [50]. Similarly, the Trans Proteomic Pipeline [71], another workflow that integrates software packages designed for proteomics data analysis, was used to analyze microbiomes from human saliva and tongues [72].

4 Data Refinement

Data refinement prior to statistical analysis and further data mining is an essential step in a metaproteomics workflow to reduce the influence of technical biases and noise introduced during data generation. Data normalization, transformation, imputation, and filtering are some of the steps that may need to be performed prior to further downstream analyses to derive the appropriate biological conclusions from a dataset.
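As a concrete illustration of two of these refinement steps, the sketch below applies a log2 transformation followed by median normalization, one of the simplest normalization schemes. It is a minimal example under stated assumptions: the nested-dictionary data layout and the choice to center each sample on the mean of the sample medians are illustrative and do not reproduce any particular tool.

```python
import math
from statistics import median


def log2_median_normalize(samples):
    """Log2-transform intensities and align the median of every sample.

    samples maps sample name -> {peptide: raw intensity}. Non-positive
    intensities are dropped because their logarithm is undefined.
    """
    logged = {
        name: {pep: math.log2(value) for pep, value in peptides.items() if value > 0}
        for name, peptides in samples.items()
    }
    medians = {name: median(values.values()) for name, values in logged.items()}
    target = sum(medians.values()) / len(medians)  # common median after shifting
    return {
        name: {pep: value - medians[name] + target for pep, value in values.items()}
        for name, values in logged.items()
    }


# Toy data: run_B has exactly twice the signal of run_A for every peptide,
# mimicking a purely technical difference such as sample loading.
samples = {
    "run_A": {"pep1": 4.0, "pep2": 16.0, "pep3": 64.0},
    "run_B": {"pep1": 8.0, "pep2": 32.0, "pep3": 128.0},
}
normalized = log2_median_normalize(samples)
print(normalized)
```

After normalization the two runs have identical values, so the twofold technical offset no longer masquerades as a biological difference.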

4.1 Normalization

Data normalization is useful for comparisons between samples or MS runs because it aims to reduce quantification variation introduced by the experimental conditions, allowing for legitimate biological differences to be observed. Some popular methods for proteomics data normalization include linear regression, local regression, variance stabilizing normalization (VSN), median or total intensity normalization, Progenesis normalization, quantile normalization, and EigenMS normalization. A benchmarking of these approaches in the context of label-free proteomics noted that VSN and linear regression typically performed the best [73]. However, the majority of these commonly used normalization methods are borrowed from genomics and can ignore proteomics-specific technical biases. Some normalization tools have recently been introduced to offer proteomics-specific normalization methods and the ability to test multiple methods on a user’s data using metrics such as the pooled coefficient of variation (PCV) to evaluate reproducibility after normalization. Although not specifically developed for metaproteomics, such approaches are also likely to be appropriate for the normalization of such data. Among these, we count NormalyzerDE, a tool available as an R package from Bioconductor (https://www.bioconductor.org) that allows users to make use of classic normalization methods, while considering biases introduced by LC-MS/MS [74]. For example, users can use Loess local regression, VSN, and mean and median intensity normalization, while considering retention time-segmented normalization. Interactive tools for data processing also exist. For example, the InfernoRDN GUI (previously DAnTE [75]), originally developed for microarray data, can be used for data normalization, data visualization, and a set of statistical analyses using an R backend. InfernoRDN now focuses on proteomics data analysis and, while not specifically developed for metaproteomics, is still an appropriate tool that has been used for user-friendly analyses [76].

4.2 Data Imputation

Missing values are frequently encountered in metaproteomic datasets and can occur when a peptide is identified, but not quantified, or is quantified in one sample but not another. They can also occur at the taxonomic level, when a microorganism is detected in one sample but not in another. It is often necessary to impute missing data to enable a more accurate statistical analysis of quantification results and to prevent biases towards more highly expressed peptides and proteins. Similar to data normalization, data imputation tools were usually not designed for the sole purpose of metaproteomics analysis. Nevertheless, state-of-the-art approaches designed for “-omics,” or proteomics specifically, can also often translate to metaproteomics datasets. In metaproteomics, there are two different missing value types: missing completely at random (MCAR) and missing not at random (MNAR). MCAR values arise when peptides are not identified because of technical issues with the instrument, independent of the nature or intensity of the peptide [77]. MCAR values are uniformly distributed throughout an experiment. MNAR values, however, are associated with intensity levels and typically affect peptides with intensities close to the instrument’s limit of detection [78]. Because of the differences between these missing value types, the literature suggests that imputation methods differ in how well they perform on each type [78]. Liu and Dongre [79] recently evaluated the downstream effects of multiple data imputation strategies on differential expression studies in DDA MS-based proteomics. The authors observed that most missing values in DDA datasets are MNAR. Through simulated datasets, Liu and Dongre [79] showed that imputation on a per-experimental condition basis vastly improved differential expression outcomes. In addition, the authors recommend a sample minimum-based imputation method for MNAR missing values and Bayesian PCA and SVD imputation methods for MCAR missing values.
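The sample minimum idea recommended for MNAR values can be sketched in a few lines. This is a simplified illustration only: it replaces each missing value with the lowest intensity observed in the same sample, on the assumption that the peptide fell below the detection limit, and it does not reproduce how Liu and Dongre [79] apply imputation per experimental condition. The data layout (`None` marking a missing value) is an assumption.

```python
def impute_sample_minimum(matrix):
    """Replace missing values with the minimum observed intensity of each sample.

    matrix maps sample name -> {peptide: intensity or None}.
    """
    imputed = {}
    for sample, values in matrix.items():
        observed = [v for v in values.values() if v is not None]
        if not observed:
            raise ValueError(f"sample {sample!r} has no observed values to impute from")
        floor = min(observed)  # proxy for the detection limit in this sample
        imputed[sample] = {
            peptide: floor if value is None else value
            for peptide, value in values.items()
        }
    return imputed


# Toy intensity matrix with one missing value per sample.
matrix = {
    "control_1": {"pep1": 22.1, "pep2": None, "pep3": 18.4},
    "treated_1": {"pep1": None, "pep2": 19.7, "pep3": 25.0},
}
imputed = impute_sample_minimum(matrix)
print(imputed)
```

A minimum-based replacement would be a poor choice for MCAR values, since those are not tied to low intensity, which is why different strategies are recommended for the two types.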
An online tool, NAguideR, was proposed to offer multiple methods of missing value imputation [80]. The Shiny app (http://www.omicsolution.org/wukong/NAguideR/) accepts multiple file formats and evaluates 23 different data imputation methods using 8 criteria, such as normalized root mean square error and various correlation coefficients. While NAguideR’s performance was evaluated on DIA-MS data, this tool remains appropriate for datasets generated with both DIA and DDA. Finally, proteiNorm is an R Shiny-based tool (https://sbyrum.shinyapps.io/proteiNorm/) that can filter datasets based on peptide identification, assess normalization methods, impute missing data, compare differential abundance methods, and complete exploratory data visualization [81].

4.3 Data Aggregation

A major challenge in bottom-up proteomics is the assignment of parent proteins from peptides identified by MS. The nature of enzymatic digestion can lead to the same peptide sequence being found in multiple different proteins, making it difficult to determine the actual parent protein of a peptide. This challenge has been named the Protein Inference Problem [82] and is exacerbated in metaproteomics due to the presence of multiple microbial strains and species in a microbiome, which can share conserved elements of protein sequences. Identifying parent proteins is often an essential step in metaproteomic data analysis since most bioinformatics tools use identified proteins for downstream analyses, such as functional enrichment, and require multiple peptide intensities or spectral counts to be aggregated into a single protein intensity or count value. The main approach to aggregate peptide identifications into a protein identification is the Occam’s razor principle, in which the smallest set of proteins explaining the presence of the identified peptides is reported [83]. Once a protein identification method has been chosen, a number of ways can be used to aggregate intensity or spectral count values. These peptide quantification values can be summed to provide a protein quantification value, which is the most common approach for spectral counting. They can also be averaged, which is often used for intensity-based quantification. When taking the average intensity of the peptides for a given protein, outlier detection can be applied to avoid biasing the average with extreme values. Alternatively, the median of the peptide quantification values can also be taken as the protein quantification value.
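The two steps described above, selecting a small set of proteins and then aggregating peptide values, can be sketched as follows. The sketch assumes a greedy set-cover heuristic as a stand-in for the Occam’s razor principle; finding the truly smallest protein set is computationally hard in general, and published protein inference tools use more refined grouping rules. Protein and peptide names are hypothetical, and shared peptides are assigned to the first protein that claims them.

```python
from statistics import median


def parsimonious_proteins(protein_to_peptides, identified):
    """Greedily pick proteins until all identified peptides are explained.

    protein_to_peptides maps protein -> set of its peptides.
    Returns {protein: set of identified peptides assigned to it}.
    """
    remaining = set(identified)
    chosen = {}
    while remaining:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(protein_to_peptides),
            key=lambda prot: len(protein_to_peptides[prot] & remaining),
        )
        covered = protein_to_peptides[best] & remaining
        if not covered:
            break  # leftover peptides match no protein in the database
        chosen[best] = covered
        remaining -= covered
    return chosen


def aggregate(chosen, peptide_values, how=sum):
    """Collapse peptide values into one value per protein using `how`."""
    return {
        prot: how([peptide_values[pep] for pep in sorted(peps)])
        for prot, peps in chosen.items()
    }


database = {
    "protA": {"p1", "p2", "p3"},
    "protB": {"p2", "p3"},  # fully explained by protA, so never selected
    "protC": {"p4"},
}
values = {"p1": 10.0, "p2": 20.0, "p3": 60.0, "p4": 5.0}

chosen = parsimonious_proteins(database, values.keys())
print(aggregate(chosen, values, sum))
print(aggregate(chosen, values, median))
```

In this toy case the sum for protA is dominated by its most intense peptide, whereas the median is not, which illustrates why the choice of aggregation function matters.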

5 Data Mining and Functional Analysis

After the common steps of identification, quantification, and data refinement, the next natural step is to analyze metaproteomics data in terms of taxonomic composition and functional enrichment. In the following section, we will discuss computational methods for such analyses as well as packages for metaproteomics data visualization.


5.1 Taxonomic Analysis

To understand the complex microbial community structure of metaproteomics samples, one can use taxonomic analysis tools to identify the taxa present in such samples. A common strategy for taxonomic identification is implemented by the Unipept suite of tools [61, 62]. Unipept uses a lowest common ancestor (LCA) approach to identify the taxonomic representation within a sample by matching peptide sequences to the UniProtKB database. In addition, Unipept has been recently expanded to not only allow an evaluation of the biodiversity of a sample, but also explore its functional annotations through Gene Ontology (GO) terms [84] and Enzyme Commission (EC) numbers [85]. Unipept offers a wide array of interfaces and can be used via a web-based platform, locally using a command line interface, with an Application Programming Interface (API), and most recently through a local desktop application. Another software package, ProteoClade, performs taxa-specific peptide identification as well as taxonomic quantification for multispecies experiments [86]. Implemented as a platform-independent Python toolkit, ProteoClade can identify peptides and quantify taxa from either a specified database search or a de novo sequencing approach, removing the requirement of an experiment-specific protein sequence database. ProteoClade does not use the LCA approach, and instead considers peptides that are unique to specific genera. Finally, another taxonomic analysis tool is METATRYP v2 [87]. This method was developed specifically for environmental microbiomes and identifies proteins with shared tryptic peptides for improved taxonomic identification by LCA. METATRYP is included in the Ocean Protein Portal, an online tool for sharing and interpreting ocean metaproteomic data [88], and a coronavirus implementation of METATRYP was recently developed.
A curated marine protein web portal is also available for use with METATRYP; however, local implementation with custom protein sequence databases, particularly those created using sequencing technology, is also possible.
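The lowest common ancestor strategy used for taxonomic assignment can be illustrated with a short sketch. It assumes that each protein matched by a peptide comes with a lineage listed from the most general to the most specific taxon; the lineages below are abbreviated toy examples, and the sketch does not reproduce the implementation of Unipept or METATRYP, which must also handle details such as missing ranks.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or 'root' if none is shared.

    Each lineage is a list of taxon names ordered from general to specific.
    """
    lca = "root"
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # lineages diverge at this rank
        lca = taxa_at_rank[0]
    return lca


# Abbreviated, illustrative lineages of the proteins matched by each peptide.
peptide_hits = {
    "species_specific": [
        ["Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides fragilis"],
    ],
    "genus_level": [
        ["Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides fragilis"],
        ["Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides ovatus"],
    ],
    "conserved": [
        ["Bacteria", "Bacteroidetes", "Bacteroides", "Bacteroides fragilis"],
        ["Bacteria", "Firmicutes", "Clostridium", "Clostridium butyricum"],
    ],
}

assignments = {
    peptide: lowest_common_ancestor(lineages)
    for peptide, lineages in peptide_hits.items()
}
print(assignments)
```

A peptide from a conserved protein region is therefore assigned only at a high taxonomic level, whereas a peptide found in a single species retains species-level resolution.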

5.2 Functional Analysis

Bioinformatics tools have also been developed for the exploration of the functional contributions of microbes in metaproteomics datasets. These methods rely on underlying functional annotations such as GO terms [84], the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [89], the Clusters of Orthologous Genes (COG) database [90], and the eggNOG database [91], to support the characterization of datasets. These functional databases are continuously updated and are accessible directly through an API for software-linked database queries. As previously described, Unipept offers taxonomic-dependent functional analysis using the functional information contained in UniProtKB [92]. Unipept, however, is a tool for exploratory analysis of individual samples, and does not offer the opportunity for complex statistical and enrichment analyses. In addition, recent literature demonstrates that taxonomic shifts may not always be associated with functional changes in microbial communities [93]. Like Unipept, MetaQuantome uses a function-taxonomy approach to functional analysis [94]. Implemented as a platform-independent Python package, MetaQuantome provides multiple modules for data analysis including a “Stat module” for two experimental condition comparisons at the taxonomic or functional level, and a “Viz module” for data visualization. MetaQuantome can accommodate both intensity- and spectral count-based quantification data and is available through a local implementation and the Galaxy framework. A tool that we proposed, pepFunk, offers a taxa-independent approach to peptide-centric functional analysis and includes a statistical analysis that can identify significant differences between metaproteomics samples [95]. Intended for use with intensity-based quantification, pepFunk is not limited by a specific number of metaproteomics treatments or experimental conditions and can run both locally and online as a Shiny application. Currently, pepFunk’s built-in peptide-to-KEGG database is limited to human gut microbiomes, but the application also allows for the upload of a custom database for use with any type of proteomics dataset. Alternatively, MetaGOmics is a peptide-centric GO term-focused enrichment tool for spectral count data [49]. This web application considers the complex topology of GO terms by creating directed acyclic graphs for each identified peptide. MetaGOmics can accept custom protein sequence databases and can identify statistically significant functional differences between two experimental conditions.

5.3 Metaproteomics Data Visualization

Most bioinformatics tools for metaproteome analysis can produce publication quality visualizations. However, there are few tools that focus specifically on data visualization. While not specific to metaproteomics, QIIME2 is a Python-based software package for the data analysis of microbiomes and can be used via a graphical user interface, using the command line, or even within a Jupyter notebook [96]. QIIME2 also supports the online hosting of complex interactive visualizations that can be shared with other users without the need to install QIIME2. With the growth of the field of metaproteomics, we expect to see the emergence of visualization tools specific to this kind of data.

6 Application: Metaproteomics Analytical Methods in Action with Real-World Data

Metaproteomics research is strengthened by bioinformatics strategies that are user-friendly, that solve technical challenges, and that provide novel and innovative methods for data analysis. For example, Rechenberger et al. recently discussed the challenges with metaproteomics that were highlighted during a clinical study of fecal samples from patients with multidrug-resistant Enterobacteriaceae infection [97]. The authors discussed the limitations of community-curated databases, with over 50% of identified peptides missing from the commonly used IGC database. Instead of using pre-built gut microbiome protein sequence databases, Rechenberger et al. [97] used a multi-omic strategy, combining metaproteomics with metagenomics to build a sample-specific protein database for peptide identification. The authors also used MaxQuant [18] for peptide identification, Percolator [33] for PSM quality assessment, and 16S sequencing to enhance taxonomic profiling. Finally, the authors input identified peptides into Unipept [61] for functional and taxonomic analyses. MetaGOmics [49] has also been used to explore the functional contributions of two geographically distinct ocean metaproteomes in response to simulated algal blooms and exposure to oligotrophic environments [34]. The authors also used 16S sequencing to aid in taxonomic profiling but focused a large portion of their analysis on understanding how the ocean’s microbial function may shift over time after organic matter treatment. Through GO term enrichment and taxonomic profiling, the authors explored how identified taxa responded to both the simulated algal bloom treatment and oligotrophic environments. Taxonomic differences and functional redundancy were also identified between the samples taken from the two locations, leading the authors to hypothesize that function plays more of a key role than taxonomy in ocean metaproteomics studies and that different taxonomic signatures can lead to similar functional profiles.
To understand the potential effect of climate change on microbes, researchers have also used metaproteomics to explore the functional impact of reduced irrigation followed by reclaimed water irrigation on the soil microbiome of an apple orchard [98]. Prophane [54] was used for taxonomic and functional analysis and revealed that reduced irrigation altered functional diversity, but not taxonomic diversity. The human gut microbiome is also known to play important roles in human disease processes, and there is interest in understanding how the compositional and functional profiles of human gut microbiomes may be contributing to our health and well-being. In addition, there is growing evidence that xenobiotics alter the human gut microbiome. To further study the connections between the gut microbiome, disease processes, and xenobiotics, we developed an in vitro assay, RapidAIM, to assess the microbiome’s responses to drugs using high-throughput metaproteomics [99]. Using RapidAIM in conjunction with the MetaLab software suite, we were able to identify microbial functional and taxonomic shifts associated with drugs [99], such as berberine and structural analogs [100], and resistant starches [101]. These are some examples of metaproteomics studies that were driven by recently developed bioinformatics tools. As the field of metaproteomics continues to grow, novel bioinformatics strategies will be required to maximize peptide characterization capabilities, complete complex statistical analyses, and make use of the large amounts of data being produced by high-throughput studies. Ease of use and accessibility of such algorithms will be crucial to facilitate metaproteomics discoveries in the future.

Acknowledgements

This work was supported by Natural Sciences and Engineering Research Council of Canada Discovery grants to M.L.A. and D.F. Funding from the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-156), the Natural Sciences and Engineering Research Council of Canada (NSERC, grant no. 210034), and the Ontario Ministry of Economic Development and Innovation (ORF-DIG-14405) to D.F. C.M.A.S. was funded by a stipend from the NSERC CREATE in Technologies for Microbiome Science and Engineering (TECHNOMISE) Program.

References

1. Heyer R, Schallert K, Büdel A et al (2019) A robust and universal metaproteomics workflow for research studies and routine diagnostics within 24 h using phenol extraction, fasp digest, and the metaproteomeanalyzer. Front Microbiol 10:1883
2. Heyer R, Schallert K, Zoun R et al (2017) Challenges and perspectives of metaproteomic data analysis. J Biotechnol 261:24–36
3. Stahl DC, Swiderek KM, Davis MT, Lee TD (1996) Data-controlled automation of liquid chromatography/tandem mass spectrometry analysis of peptide mixtures. J Am Soc Mass Spectrom 7:532–540
4. Venable JD, Dong M-Q, Wohlschlegel J et al (2004) Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat Methods 1:39–45
5. Gillet LC, Navarro P, Tate S et al (2012) Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol Cell Proteomics 11(O111):016717
6. Doerr A (2014) DIA mass spectrometry. Nat Methods 12:35–35
7. Eng JK, McCormack AL, Yates JR (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5:976–989
8. Tanca A, Palomba A, Fraumene C et al (2016) The impact of sequence database choice on metaproteomic results in gut microbiota studies. Microbiome 4:51
9. Tanca A, Palomba A, Deligios M et al (2013) Evaluating the impact of different sequence databases on metaproteome analysis: insights from a lab-assembled microbial mixture. PLoS One 8:e82981
10. Timmins-Schiffman E, May DH, Mikan M et al (2017) Critical decisions in metaproteomics: achieving high confidence protein annotations in a sea of unknowns. ISME J 11:309–314
11. O’Leary NA, Wright MW, Brister JR et al (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44:D733–D745
12. Li J, Jia H, Cai X et al (2014) An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol 32:834–841
13. Kuhring M, Renard BY (2015) Estimating the computational limits of detection of microbial non-model organisms. Proteomics 15:3580–3584
14. Jagtap P, Goslinga J, Kooren JA et al (2013) A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. Proteomics 13:1352–1357
15. Zhang X, Ning Z, Mayne J et al (2016) MetaPro-IQ: a universal metaproteomic approach to studying human and mouse gut microbiota. Microbiome 4:31
16. Craig R, Beavis RC (2003) A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun Mass Spectrom 17:2310–2316
17. Craig R, Beavis RC (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20:1466–1467
18. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319
19. Beyter D, Lin MS, Yu Y et al (2018) ProteoStorm: an ultrafast metaproteomics database search framework. Cell Syst 7:463–467
20. Xiao J, Tanca A, Jia B et al (2018) Metagenomic taxonomy-guided database-searching strategy for improving metaproteomic analysis. J Proteome Res 17:1596–1605
21. UniProt Consortium (2021) UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res 49:D480–D489
22. Park SKR, Jung T, Thuy-Boun PS et al (2019) ComPIL 2.0: an updated comprehensive metaproteomics database. J Proteome Res 18:616–622
23. Xu T, Park SK, Venable JD et al (2015) ProLuCID: an improved SEQUEST-like algorithm with enhanced sensitivity and specificity. J Proteome 129:16–24
24. Lam H, Deutsch EW, Eddes JS et al (2007) Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7:655–667
25. Craig R, Cortens JC, Fenyo D, Beavis RC (2006) Using annotated peptide mass spectrum libraries for protein identification. J Proteome Res 5:1843–1849
26. Frewen BE, Merrihew GE, Wu CC et al (2006) Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal Chem 78:5678–5684
27. Yang Y, Liu X, Shen C et al (2020) In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics. Nat Commun 11:1–11
28. Gessulat S, Schmidt T, Zolg DP et al (2019) Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods 16:509–518
29. Pietilä S, Suomi T, Aakko J, Elo LL (2019) A data analysis protocol for quantitative data-independent acquisition proteomics. Methods Mol Biol 1871:455–465
30. Aakko J, Pietilä S, Suomi T et al (2020) Data-independent acquisition mass spectrometry in metaproteomics of gut microbiota—implementation and computational analysis. J Proteome Res 19:432–436
31. Elias JE, Gygi SP (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4:207–214
32. Käll L, Canterbury JD, Weston J et al (2007) Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods 4:923–925
33. The M, MacCoss MJ, Noble WS, Käll L (2016) Fast and accurate protein false discovery rates on large-scale proteomics data sets with percolator 3.0. J Am Soc Mass Spectrom 27:1719–1727
34. Mikan MP, Harvey HR, Timmins-Schiffman E et al (2020) Metaproteomics reveal that rapid perturbations in organic matter prioritize functional restructuring over taxonomy in western Arctic Ocean microbiomes. ISME J 14:39–52
35. Guo X, Li Z, Yao Q et al (2018) Sipros ensemble improves database searching and filtering for complex metaproteomics. Bioinformatics 34:795–802
36. Keller A, Nesvizhskii AI, Kolker E, Aebersold R (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74:5383–5392
37. Cociorva D, Tabb L, Yates JR (2007) Validation of tandem mass spectrometry database search results using DTASelect. Curr Protoc Bioinform 13:Unit 13.4
38. Chatterjee S, Stupp GS, Park SKR et al (2016) A comprehensive and scalable database search system for metaproteomics. BMC Genomics 17:642
39. Ma B, Zhang K, Hendrie C et al (2003) PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun Mass Spectrom 17:2337–2342
40. Frank A, Pevzner P (2005) PepNovo: de novo peptide sequencing via probabilistic network modeling. Anal Chem 77:964–973
41. Fischer B, Roth V, Roos F et al (2005) NovoHMM: a hidden Markov model for de novo peptide sequencing. Anal Chem 77:7265–7273
42. Kleikamp HBC, Pronk M, Tugui C et al (2021) Database-independent de novo metaproteomics of complex microbial communities. Cell Syst 12:375–383.e5
43. Behsaz B, Mohimani H, Gurevich A et al (2020) De novo peptide sequencing reveals many cyclopeptides in the human gut and other environments. Cell Syst 10:99–108
44. Thompson A, Schäfer J, Kuhn K et al (2003) Tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by MS/MS. Anal Chem 75:1895–1904
45. Ong S-E, Blagoev B, Kratchmarova I et al (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 1:376–386
46. Ross PL, Huang YN, Marchese JN et al (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using amine-reactive isobaric tagging reagents. Mol Cell Proteomics 3:1154–1169
47. Zhang X, Ning Z, Mayne J et al (2016) In vitro metabolic labeling of intestinal microbiota for quantitative metaproteomics. Anal Chem 88:6120–6125
48. Tang J, Fu J, Wang Y et al (2020) ANPELA: analysis and performance assessment of the label-free quantification workflow for metaproteomic studies. Brief Bioinform 21:621–636
49. Riffle M, May DH, Timmins-Schiffman E et al (2018) MetaGOmics: a web-based tool for peptide-centric functional and taxonomic analysis of metaproteomics data. Proteomes 6:2
50.
Mayers MD, Moon C, Stupp GS et al (2017) Quantitative metaproteomics and activitybased probe enrichment reveals significant alterations in protein expression from a mouse model of inflammatory bowel disease. J Proteome Res 16:1014–1026

51. Cheng K, Ning Z, Zhang X et al (2017) MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 5:157 52. Cheng K, Ning Z, Zhang X et al (2020) MetaLab 2.0 enables accurate posttranslational modifications profiling in metaproteomics. J Am Soc Mass Spectrom 31: 1473–1482 53. Zhang X, Ning Z, Mayne J et al (2020) Widespread protein lysine acetylation in gut microbiome and its alterations in patients with Crohn’s disease. Nat Commun 11:1–12 54. Schiebenhoefer H, Schallert K, Renard BY et al (2020) A complete and flexible workflow for metaproteomics data analysis based on MetaProteomeAnalyzer and prophane. Nat Protoc 15:3212–3239 55. Muth T, Behne A, Heyer R et al (2015) The MetaProteomeAnalyzer: a powerful opensource software suite for metaproteomics data analysis and interpretation. J Proteome Res 14:1557–1565 56. Muth T, Kohrs F, Heyer R et al (2018) MPA portable: a stand-alone software package for analyzing metaproteome samples on the go. Anal Chem 90:685–689 57. Schneider T, Schmid E, de Castro JV et al (2011) Structure and function of the symbiosis partners of the lung lichen (Lobaria pulmonaria L. Hoffm.) analyzed by metaproteomics. Proteomics 11:2752–2756 58. Geer LY, Markey SP, Kowalak JA et al Open mass spectrometry search algorithm. J Proteome Res 3:958–964 59. Van Den Bossche T, Verschaffelt P, Schallert K et al (2020) Connecting MetaProteomeAnalyzer and PeptideShaker to unipept for seamless end-to-end metaproteomics data analysis. J Proteome Res 19:3562–3566 60. Vaudel M, Burkhart JM, Zahedi RP et al (2015) PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat Biotechnol 33:22–24 61. Gurdeep Singh R, Tanca A, Palomba A et al (2019) Unipept 4.0: functional analysis of metaproteome data. J Proteome Res 18: 606–615 62. Verschaffelt P, Van Den Bossche T, Martens L et al (2021) Unipept desktop: a faster, more powerful metaproteomics results analysis tool. J Proteome Res 20:2005–2009 63. 
Perez-Riverol Y, Csordas A, Bai J et al (2018) The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res 47: D442–D450

Bioinformatics Methods for Metaproteomics 64. Deutsch EW, Csordas A, Sun Z et al (2017) The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition. Nucleic Acids Res 45: D1100–D1106 65. Jagtap PD, Blakely A, Murray K et al (2015) Metaproteomic analysis using the galaxy framework. Proteomics 15:3553–3565 66. Huson DH, Weber N (2013) Microbial community analysis using MEGAN. Methods Enzymol 531:465–485 67. Ro¨st HL, Sachsenberg T, Aiche S et al (2016) OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat Methods 13:741–748 68. Gru¨ning B, Chilton J, Ko¨ster J et al (2018) Practical computational reproducibility in the life sciences. Cell Syst. 6:631–635 69. Berthold MR, Cebron N, Dill F et al (2007) KNIME: the Konstanz information miner. In: Studies in classification, data analysis, and knowledge organization (GfKL 2007). Springer 70. Sachsenberg T, Herbst FA, Taubert M et al (2015) MetaProSIP: automated inference of stable isotope incorporation rates in proteins for functional metaproteomics. J Proteome 14:619–627 71. Deutsch EW, Mendoza L, Shteynberg D et al (2015) Trans-proteomic pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin Appl 9:745–754 72. Rabe A, Gesell Salazar M, Michalik S et al (2019) Metaproteomics analysis of microbial diversity of human saliva and tongue dorsum in young healthy individuals. J Oral Microbiol 11:1654786 73. V€alikangas T, Suomi T, Elo LL (2018) A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief Bioinform 19:1–11 74. Willforss J, Chawade A, Levander F (2019) NormalyzerDE: online tool for improved normalization of omics expression data and high-sensitivity differential expression analysis. J Proteome Res 18:732–740 75. Polpitiya AD, Qian W-J, Jaitly N et al (2008) DAnTE: a statistical tool for quantitative analysis of -omics data. 
Bioinformatics 24: 1556–1558 76. Marion S, Desharnais L, Studer N et al (2020) Biogeography of microbial bile acid transformations along the murine gut. J Lipid Res 61: 1450–1463 77. Karpievitch YV, Dabney AR, Smith RD (2012) Normalization and missing value

337

imputation for label-free LC-MS analysis. BMC Bioinform 13:1–9 78. Lazar C, Gatto L, Ferro M et al (2016) Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies. J Proteome Res 15:1116–1125 79. Liu M, Dongre A (2020) Proper imputation of missing values in proteomics datasets for differential expression analysis. Brief Bioinform 22:bbaa112 80. Wang S, Li W, Hu L et al (2020) NAguideR: performing and prioritizing missing value imputations for consistent bottom-up proteomic analyses. Nucleic Acids Res 48:e83–e83 81. Graw S, Tang J, Zafar MK et al (2020) proteiNorm—a user-friendly tool for normalization and analysis of TMT and label-free protein quantification. ACS Omega 5: 25625–25633 82. Nesvizhskii AI, Aebersold R (2005) Interpretation of shotgun proteomic data: the protein inference problem. Mol Cell Proteomics 4: 1419–1440 83. Serang O, Noble W (2012) A review of statistical methods for protein identification using tandem mass spectrometry. Stat Interface 5: 3–20 84. Carbon S, Douglass E, Dunn N et al (2019) The gene ontology resource: 20 years and still GOing strong. Nucleic Acids Res 47: D330–D338 85. Bairoch A (2000) The ENZYME database in 2000. Nucleic Acids Res 28:304–305 86. Mooradian AD, van der Post S, Naegle KM, Held JM (2020) ProteoClade: a taxonomic toolkit for multi-species and metaproteomic analysis. PLoS Comput Biol 16:e1007741 87. Saunders JK, Gaylord DA, Held NA et al (2020) METATRYP v 2.0: metaproteomic least common ancestor analysis for taxonomic inference using specialized sequence assemblies-standalone software and web servers for marine microorganisms and coronaviruses. J Proteome Res 19:4718–4729 88. Saito MA, Saunders JK, Chagnon M et al (2021) Development of an ocean protein portal for interactive discovery and education. J Proteome Res 20:326–336 89. Ogata H, Goto S, Sato K et al (1999) KEGG: Kyoto encyclopedia of genes and genomes. 
Nucleic Acids Res 27:29–34 90. Galperin MY, Wolf YI, Makarova KS et al (2021) COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 49: D274–D281

338

Caitlin M. A. Simopoulos et al.

91. Huerta-Cepas J, Szklarczyk D, Heller D et al (2019) EggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res 47(D1): D309–D314 92. The UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47:D506–D515 93. Blakeley-Ruiz JA, Erickson AR, Cantarel BL et al (2019) Metaproteomics reveals persistent and phylum-redundant metabolic functional stability in adult human gut microbiomes of Crohn’s remission patients despite temporal variations in microbial taxa, genomes, and proteomes. Microbiome 7:18 94. Easterly CW, Sajulga R, Mehta S et al (2019) MetaQuantome: an integrated, quantitative metaproteomics approach reveals connections between taxonomy and protein function in complex microbiomes. Mol Cell Proteomics 18:S82–S91 95. Simopoulos CMA, Ning Z, Zhang X et al (2020) pepFunk: a tool for peptide-centric functional analysis of metaproteomic human gut microbiome studies. Bioinformatics 36: 4171–4179

96. Bolyen E, Dillon M, Bokulich N et al (2019) Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol 37:852–857 97. Rechenberger J, Samaras P, Jarzab A et al (2019) Challenges in clinical metaproteomics highlighted by the analysis of acute leukemia patients with gut colonization by multidrugresistant enterobacteriaceae. Proteomes 7:2 98. Starke R, Bastida F, Abadı´a J et al (2017) Ecological and functional adaptations to water management in a semiarid agroecosystem: a soil metaproteomics approach. Sci Rep 7:1–16 99. Li L, Ning Z, Zhang X et al (2020) RapidAIM: a culture- and metaproteomics-based rapid assay of individual microbiome responses to drugs. Microbiome 8:33 100. Li L, Chang L, Zhang X et al (2020) Berberine and its structural analogs have differing effects on functional profiles of individual gut microbiomes. Gut Microbes 11: 1348–1361 101. Li L, Ryan J, Ning Z et al (2020) A functional ecological network based on metaproteomics responses of individual gut microbiomes to resistant starches. Comput Struct Biotechnol J 18:3833–3842

Chapter 23

MaxQuant Module for the Identification of Genomic Variants Propagated into Peptides

Pavel Sinitcyn, Maximilian Gerwien, and Jürgen Cox

Abstract

Standard shotgun proteomics data analysis pipelines usually only identify peptides that are encoded in the reference genome. In many situations, it is desirable to identify peptides resulting from non-synonymous variations as well. Here, we present a new module in the MaxQuant software that takes both DNA- and RNA-based next-generation sequencing (NGS) data as well as raw proteomics data as input. This allows for the identification of variant peptides that are otherwise missed.

Key words MaxQuant, Proteomics, Immunopeptidomics, NGS, Sequence variations, Proteogenomics

1 Introduction

The most common method for identifying peptides in shotgun proteomics makes use of a peptide database search engine [1–3]. In the MaxQuant software [4], the Andromeda search engine [5] fulfills this task. In the standard mode, only peptides that are derived from the reference genome will be identified. Proteogenomics [6, 7] is the general term describing efforts to find peptide-level evidence for genomic variants of any kind. Several large-scale datasets from the cancer field are publicly available to study such proteogenomic effects [8–12]. We specifically describe here a computational workflow for the identification of peptides resulting from single nucleotide polymorphisms (SNPs) (see Fig. 1). This has important applications, for instance, in immunopeptidomics [13, 14] or for allele-specific protein quantification. In this protocol, we show how the novel MaxQuant module is applied to two datasets, one HeLa proteome dataset and one peptidomics dataset. For both, one first has to preprocess the next-generation sequencing (NGS) data by mapping it to the reference genome. The MaxQuant module will then take these data of aligned reads as

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_23, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


Fig. 1 Overview of the protocol, which consists of three main parts. First, variants are extracted from the NGS data. Then, the MaxQuant analysis is performed, with the variants included in the Andromeda search space. Finally, the resulting variant peptide data is validated and further analyzed

input for calling mutations. Based on this, a special FASTA file is written out that specifies, in the header of each FASTA entry, the coding mutations that should be added to the peptide search space. These are used in the subsequent MaxQuant analysis to enlarge the search space for peptide identification in a similar way as variable modifications are applied. The SNP-derived peptide search results are integrated into the standard MaxQuant output tables. We offer several scripts for the post-processing of these results. In the protocol steps, we describe the analysis of both datasets, the HeLa proteome [15] and the peptidomics dataset [16]. There is no dependency between them, and either can be omitted if desired.
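As a rough illustration of how coding variants enlarge the peptide search space, the following Python sketch applies single amino acid substitutions to a protein sequence and digests both versions in silico with a simplified trypsin rule. It is not MaxQuant's implementation: the (position, reference, alternative) tuple format, the toy sequence, and all function names are assumptions made for this example, and the header syntax actually written by the module is not reproduced here.

```python
def tryptic_peptides(sequence, min_length=7):
    """Simplified trypsin digest: cleave after K/R unless followed by P,
    no missed cleavages, keep peptides of at least min_length residues."""
    peptides, start = set(), 0
    for i, aa in enumerate(sequence):
        at_end = i == len(sequence) - 1
        blocked_by_proline = (not at_end) and sequence[i + 1] == "P"
        cleave = aa in "KR" and not blocked_by_proline
        if cleave or at_end:
            if i + 1 - start >= min_length:
                peptides.add(sequence[start:i + 1])
            start = i + 1
    return peptides


def apply_variant(sequence, position, ref, alt):
    """Apply one substitution; position is 1-based (hypothetical convention)."""
    if sequence[position - 1] != ref:
        raise ValueError(f"expected {ref} at position {position}")
    return sequence[:position - 1] + alt + sequence[position:]


def variant_only_peptides(sequence, variants, min_length=7):
    """Peptides that appear only after applying a variant.
    Variants are applied one at a time, not in combination."""
    reference = tryptic_peptides(sequence, min_length)
    extra = set()
    for position, ref, alt in variants:
        mutated = apply_variant(sequence, position, ref, alt)
        extra |= tryptic_peptides(mutated, min_length) - reference
    return extra


# Toy protein and toy variants, invented for illustration only.
protein = "MASTEELKGAVLDQRSPLVTAGK"
variants = [(11, "V", "M"), (8, "K", "N")]
print(sorted(variant_only_peptides(protein, variants)))
```

The second toy variant removes a cleavage site, which shows that a substitution can change peptide boundaries as well as peptide sequences.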

2 Materials

2.1 Data Downloads

1. Download the NGS data corresponding to the HeLa proteome dataset. To that end, obtain single-end and paired-end RNA-seq data of the HeLa-S3 cell line from the SRA repository (ncbi.nlm.nih.gov/sra) with the dataset accession numbers SRX159818 (Caltech_RnaSeq_HeLa-S3_1x75D) and SRX159826 (Caltech_RnaSeq_HeLa-S3_2x75_200). Alternatively, the unaligned fragments (FASTQ) and alignment results (BAM) can be retrieved from the ENCODE webpage (encodeproject.org/), with identifiers ENCFF781CJU/ENCFF308ETR for single-end and ENCFF796WEV/ENCFF129LVO for paired-end data.

2. Download the NGS data for the peptidomics dataset. Obtain paired-end RNA-seq data from A2902/A5101/A5401/A5701-transduced B721.221 cells from the SRA repository with dataset accession numbers SRX2480296, SRX2480297, SRX2480298, and SRX2480299.

3. Download the genomic reference files. The most recent reference genomic information, such as DNA sequence and genomic annotation, can be obtained from the “FTP Download” section of the Ensembl webpage (ensembl.org) or directly from the FTP server (ftp.ensembl.org/pub), where one can find previous releases as well. For this protocol, we suggest the primary assembly of the human genome in FASTA format (Homo_sapiens.GRCh38.dna.primary_assembly.fa) and the genome annotation in GTF format (Homo_sapiens.GRCh38.104.gtf) of Ensembl Release 104 (May 2021).

4. Download the HeLa proteomics raw data. The Orbitrap RAW files can be retrieved from the PRIDE repository (ebi.ac.uk/pride/) with the dataset identifier PXD004452. In this study [15], proteins from the HeLa cell line were digested separately with four proteases (trypsin, LysC, GluC, and chymotrypsin) and each digest was fractionated into 39 fractions, which results in 156 RAW files overall.

5. Download the peptidomics raw data. The Orbitrap RAW files for HLA allele A0201 can be retrieved from the MassIVE repository (massive.ucsd.edu) with the dataset identifier MSV000080527. The Orbitrap RAW files for HLA alleles A3303, B3802, and B4002 can be retrieved from the MassIVE repository with the dataset identifier MSV000084172. Each biological sample is measured in multiple replicates (A0201, A3303, and B3802: 4 replicates each; B4002: 8 replicates), which results in 20 RAW files overall [16].

2.2 Software

1. Although providing processed alignment results (BAM format) together with publications is becoming common practice, especially for large genomics initiatives such as ENCODE [17] and GTEx [18], there is frequently a need to adapt a processing pipeline to a special experimental design or to reprocess the raw NGS data. We suggest using a custom pipeline, such as ours (github.com/cox-labs/VariationSearchProtocol), or a user-friendly web platform, for instance, Galaxy [19].

2. Download the latest version of MaxQuant from maxquant.org/maxquant/ (version 2.0.1 or later). The current version of MaxQuant requires .NET Core 2.1 (SDK or Runtime) to be installed. Specific installation instructions depend on the OS and can be found on the official webpage (dotnet.microsoft.com/). The current version of MaxQuant supports


Fig. 3 Sequence motifs for the identified immunopeptides with a length of 9 amino acids, once for reference peptides (a) and once for variant peptides (b). The mono-allelic HLA cell line is indicated, as is the number of immunopeptides identified

and the output (“Final directory”) will be generated. Optionally, a VCF file can be supplied (“Variant File”) to annotate detected variants. The default “Calling Parameters” are optimal for the most common applications. Refer to Note 1 for exceptions and further details.

2. The variant extraction will generate five files in the specified output directory. Proteins.fa is a FASTA-formatted file that gives the IDs of the proteins, their amino acid sequences and, if a protein contains variants, the IDs of those variants. This file can be used for the variant-aware proteomics search in MaxQuant, as described in Subheading 3.2. Variants.txt is a tab-delimited text file that lists all identified variants and gives crucial information on chromosomal position, reference and variant amino acid, and type of the variant, as well as the transcripts in which the respective variant was identified. Transcripts.txt is a tab-delimited text file that presents the same information as Variants.txt, but in a transcript-centric manner. The last two files, parameters.xml and summary.xml, are XML files that record the parameters used for the extraction and cumulative statistics on the number of initial and filtered fragments in the BAM files. Note that the output of the variant extraction is not only used as input for the proteomics search (Subheading 3.2), but also provides, in conjunction with the proteomics data, unique proteogenomic insights, as illustrated in Subheading 3.3.

3.2 Variant-Aware MaxQuant Proteomics Search

1. Open MaxQuant and navigate to the “Global Parameters” tab, where, under “Sequences,” the proteins FASTA file can be added. Select the file by clicking it (it is then highlighted in blue) and add a “Variation rule,” i.e., a regular expression that correctly parses the variant information. In most cases, the default suggestion, >[^\s]+\s+(.+), is appropriate, but make sure to validate this by clicking “Test” and confirming that the “Variation” column contains the variants expected from the proteins FASTA file provided. Also, change “Variation Mode” from “None” to “Read from FASTA file.” The default settings for peptide length and mass and for the maximum number of missed cleavages are reasonable in most cases but may need to be changed for certain experimental designs (see Note 2). For a more detailed description of how to set up MaxQuant, refer to the MaxQuant protocol [22] (see Note 3).

2. The output of the MaxQuant search will be generated in the specified directory. Depending on the desired information and the demands of the downstream data analysis, different files will be of particular interest. As in a standard proteomics search in MaxQuant, the peptides.txt file lists all identified peptides and contains most of the information needed for the downstream data analysis of this protocol. In particular, two additional columns will be added. The “Mutated” column contains the value “No” (no variants, peptide from the reference proteome), “Yes” (peptide with at least one variant), or “Mixed” (insertion of a variant makes the peptide identical to a reference one), and the “Mutation names” column contains a list of all variants for variant peptides.
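The two steps above can be tried outside MaxQuant with a few lines of Python. The sketch below applies the default variation rule quoted in step 1 to made-up FASTA headers and then tallies the “Mutated” column of a tiny in-memory table shaped like peptides.txt. The header text, the variant identifiers, and the example rows are purely hypothetical, and the real syntax of the variant field in Proteins.fa may differ.

```python
import csv
import io
import re
from collections import Counter

# Default "Variation rule" from step 1: skip ">" and the first whitespace-free
# token (the protein ID), then capture the remainder of the header line.
VARIATION_RULE = re.compile(r">[^\s]+\s+(.+)")


def parse_header(header):
    """Return (protein_id, captured variation text or None) for one header."""
    protein_id = header[1:].split()[0]
    match = VARIATION_RULE.match(header)
    return protein_id, (match.group(1) if match else None)


def count_mutated(peptides_tsv):
    """Tally the values of the 'Mutated' column in a tab-delimited table."""
    reader = csv.DictReader(peptides_tsv, delimiter="\t")
    return Counter(row["Mutated"] for row in reader)


# Hypothetical headers; "var1;var2" stands in for whatever the module writes.
print(parse_header(">PROT_A var1;var2"))
print(parse_header(">PROT_B"))

# Hypothetical rows using the column names mentioned in step 2.
EXAMPLE_PEPTIDES = (
    "Sequence\tMutated\tMutation names\n"
    "GAVLDQR\tNo\t\n"
    "GAMLDQR\tYes\tvar1\n"
    "SPLVTAGK\tNo\t\n"
)
print(count_mutated(io.StringIO(EXAMPLE_PEPTIDES)))
```

A header without any text after the protein ID yields no match, which is the behavior one would expect for proteins that carry no variants.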

3.3 Data Analysis

With the variant information at the genomic/transcriptomic (see Note 4) as well as the proteomic level at hand, the challenge for the researcher at this stage is to extract meaningful insights from it. What constitutes such insights depends heavily on the research question and field. Thus, no sure formula can be provided in a step-by-step fashion as we enter the more exploratory area of downstream data analysis. Therefore, we provide two scenarios in which the variant information, obtained as described in Subheadings 3.1 and 3.2, is used.

3.3.1 Proteogenomic Analysis of Ultra-deep HeLa Proteome

Reference [15] presents an optimized workflow to generate deep proteomes by harnessing multi-enzyme usage and extensive pre-fractionation combined with short gradients and fast scanning. It was benchmarked using the HeLa cell line. In conjunction with RNA-seq data, this resource can be fruitfully explored from a proteogenomic perspective. One of the approaches taken in this study to achieve proteomic depth was using other proteases in addition to trypsin. With our proteogenomic analysis, we were able to quantify the unique variants identified by using trypsin (the most commonly used protease), chymotrypsin, LysC, GluC, or any combination thereof (Fig. 2a). The increased depth achieved by combinations of proteases (especially the addition of LysC to the standard trypsin) is of great value to proteogenomics studies that aim at the discovery of functionally relevant variants, where often few are detected and even fewer prove to be amenable to downstream analysis. Furthermore, the amino acid changes are not uniformly distributed among all possible changes; rather, some transitions are more likely than others (Fig. 2b). This is due to the different frequencies of amino acids and genetic code-related restrictions, but possibly also to the physico-chemical nature of amino acids. Teasing these factors apart can hint at the impact of variants on protein expression, post-translational modifications, and protein–protein interactions.
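Both summaries described above reduce to simple counting once the variant peptides are in hand. The following Python sketch shows one possible way to count variants per protease combination and to tally amino acid substitutions. All data in it are invented toy values, and the variant identifiers and the (reference, variant) pair format are assumptions for illustration, not the layout of the MaxQuant output.

```python
from collections import Counter

# Toy data (hypothetical): variant IDs detected with each protease.
detected = {
    "Trypsin": {"v1", "v2", "v3"},
    "LysC": {"v2", "v4"},
    "GluC": {"v5"},
    "Chymotrypsin": {"v3", "v6"},
}


def protease_combinations(detected):
    """Count variants by the exact set of proteases that detected them."""
    by_variant = {}
    for protease, variant_ids in detected.items():
        for variant_id in variant_ids:
            by_variant.setdefault(variant_id, set()).add(protease)
    return Counter(frozenset(proteases) for proteases in by_variant.values())


def gain_from_adding(detected, base, added):
    """Number of extra variants found when 'added' complements 'base'."""
    return len(detected[base] | detected[added]) - len(detected[base])


def substitution_counts(substitutions):
    """Count (reference amino acid, variant amino acid) pairs."""
    return Counter(substitutions)


combos = protease_combinations(detected)
for proteases, n in combos.items():
    print(sorted(proteases), n)
print("gain from LysC:", gain_from_adding(detected, "Trypsin", "LysC"))

# Toy substitutions (hypothetical).
print(substitution_counts([("V", "M"), ("V", "M"), ("K", "N")]).most_common(1))
```

Using the exact protease set as the key keeps variants seen by several enzymes separate from those seen by one, which is the distinction an intersection plot needs.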

3.3.2 Immunopeptidomic Analysis of HLA Peptides

HLA class I presents short peptides of 8–11 amino acids in length. Understanding the peptide binding motifs of HLA class I is challenging due to the staggering diversity of the HLA binding cleft, in part caused by allelic variation. References [16, 23] report an elegant strategy in which an immunopeptidomic analysis of engineered, mono-allelic cell lines was performed. This allows the discovery of peptide motifs presented by particular mono-allelic HLA cell lines (Fig. 3a). Since the cells present a sampling of peptides derived from their constituent proteome, variant peptides are presented as well. Notably, these are of particular interest from a clinical perspective on immunobiology, since such peptides can serve as tell-tale signs of infection or transformation. In Fig. 3b, the motifs of such variant peptides show the characteristic pattern of HLA peptides, which bind to the HLA complex with their terminal residues. As a sanity check for the variant immunopeptide identification by our workflow, the broad pattern of the reference peptides’ motifs (Fig. 3a) is reproduced in the variant peptides (Fig. 3b).
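The motif comparison described above rests on per-position amino acid frequencies. A minimal Python sketch of that calculation for 9-mers is given below. The peptide sequences are invented toy examples, not peptides from the dataset, and a real sequence logo tool would additionally scale the frequencies by information content.

```python
from collections import Counter


def position_frequencies(peptides, length=9):
    """Per-position amino acid frequencies for peptides of one length.
    Peptides of any other length are ignored."""
    selected = [p for p in peptides if len(p) == length]
    if not selected:
        raise ValueError("no peptides of the requested length")
    frequencies = []
    for i in range(length):
        counts = Counter(p[i] for p in selected)
        frequencies.append({aa: n / len(selected) for aa, n in counts.items()})
    return frequencies


# Toy peptides (hypothetical); the 8-mer is skipped by the length filter.
toy_peptides = ["ALDKVTSEV", "SLAEGRTQV", "KLNPWHDFL", "GMTSCYEIV", "ALDKVTSE"]
freqs = position_frequencies(toy_peptides)

# Report the most frequent residue at each position (1-based).
for position, table in enumerate(freqs, start=1):
    residue, fraction = max(table.items(), key=lambda item: item[1])
    print(position, residue, round(fraction, 2))
```

Running the same function on reference and variant peptides separately gives two tables that can be compared position by position, in the spirit of the sanity check between Fig. 3a and Fig. 3b.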

4 Notes

1. In the case of genetically non-uniform samples, such as patient-derived tumor samples, the calling parameters could be less stringent. For example, we highly recommend reducing “Min alternative frequency” to 0.1–0.15, so that variants that are supported by 10–15% of the NGS fragments will be included in the proteomics search.

2. For the trypsin, LysC, and GluC proteases, we recommend leaving the default setting of two maximum allowed missed cleavages. However, for chymotrypsin, due to its lower specificity, it is generally recommended to allow up to four missed cleavages. In the case of immunopeptides, the “Digestion mode” should be set to “Unspecific.” The minimum and maximum peptide lengths for the unspecific search should be 8 and 11, respectively. Additionally, due to the specific nature of immunopeptide generation, we recommend setting “Protein FDR” to 1.0 and “Minimum delta score for unmodified peptides” to 6.

3. As noted above, reference [22] offers a detailed, step-by-step guide to using MaxQuant. Furthermore, the material of the annual MaxQuant summer school offers both theory lessons and hands-on sessions in video format (youtube.com/c/MaxQuantChannel).

4. Even though both described examples utilize RNA-seq data for the purpose of finding non-synonymous variants, whole genome or exome sequencing is well suited for the same purpose.
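The effect of the frequency threshold in Note 1 can be sketched in a few lines of Python. This is only an illustration of the idea of an alternative-allele frequency cutoff: the read counts are invented, the function is not part of MaxQuant, and the stricter value of 0.3 is an arbitrary comparison point rather than the program's default.

```python
def passes_min_alt_frequency(alt_reads, total_reads, min_alt_frequency):
    """True if the fraction of fragments supporting the variant reaches the cutoff."""
    if total_reads <= 0:
        return False
    return alt_reads / total_reads >= min_alt_frequency


def filter_variants(candidates, min_alt_frequency):
    """Keep the names of candidates whose support reaches the cutoff."""
    return sorted(
        name
        for name, (alt_reads, total_reads) in candidates.items()
        if passes_min_alt_frequency(alt_reads, total_reads, min_alt_frequency)
    )


# Toy candidates (hypothetical): name -> (supporting fragments, total fragments).
candidates = {"varA": (12, 100), "varB": (45, 90), "varC": (3, 60)}

print(filter_variants(candidates, 0.3))  # arbitrary stricter cutoff
print(filter_variants(candidates, 0.1))  # relaxed cutoff as suggested in Note 1
```

With the relaxed cutoff, a variant supported by 12% of the fragments is retained, which is the situation Note 1 has in mind for heterogeneous tumor samples.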

Acknowledgements

We thank Daniel Bader and Michal Bassani-Sternberg for testing and suggestions and all members of the Computational Systems Biochemistry research group for helpful discussions.

References

1. Eng JK, McCormack AL, Yates JR (1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5:976–989. https://doi.org/10.1016/1044-0305(94)80016-2
2. Perkins DN, Pappin DJ, Creasy DM, Cottrell JS (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551–3567
3. Sinitcyn P, Rudolph JD, Cox J (2018) Computational methods for understanding mass spectrometry-based shotgun proteomics data. Annu Rev Biomed Data Sci 1:207–234
4. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372. https://doi.org/10.1038/nbt.1511
5. Cox J, Neuhauser N, Michalski A et al (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J Proteome Res 10:1794–1805. https://doi.org/10.1021/pr101065j
6. Jaffe JD, Berg HC, Church GM (2004) Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4(1):59–77. https://doi.org/10.1002/pmic.200300511
7. Nesvizhskii AI (2014) Proteogenomics: concepts, applications and computational strategies. Nat Methods 11:1114–1125. https://doi.org/10.1038/nmeth.3144
8. Zhang B, Wang J, Wang X et al (2014) Proteogenomic characterization of human colon and rectal cancer. Nature 513:382–387. https://doi.org/10.1038/nature13438
9. Zhang H, Liu T, Zhang Z et al (2016) Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166:755–765. https://doi.org/10.1016/j.cell.2016.05.069
10. Mertins P, Mani DR, Ruggles KV et al (2016) Proteogenomics connects somatic mutations to signalling in breast cancer. Nature 534:55–62. https://doi.org/10.1038/nature18003
11. Krug K, Jaehnig EJ, Satpathy S et al (2020) Proteogenomic landscape of breast cancer tumorigenesis and targeted therapy. Cell 183:1436–1456.e31. https://doi.org/10.1016/j.cell.2020.10.036
12. Johansson HJ, Socciarelli F, Vacanti NM et al (2019) Breast cancer quantitative proteome and proteogenomic landscape. Nat Commun 10:1600. https://doi.org/10.1038/s41467-019-09018-y
13. Schumacher TN, Schreiber RD (2015) Neoantigens in cancer immunotherapy. Science 348:69–74
14. Bassani-Sternberg M, Bräunlein E, Klar R et al (2016) Direct identification of clinically relevant neoepitopes presented on native human melanoma tissue by mass spectrometry. Nat Commun 7:13404. https://doi.org/10.1038/ncomms13404
15. Bekker-Jensen DB, Kelstrup CD, Batth TS et al (2017) An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst 4:587–599.e4. https://doi.org/10.1016/j.cels.2017.05.009
16. Sarkizova S, Klaeger S, Le PM et al (2020) A large peptidome dataset improves HLA class I epitope prediction across most of the human population. Nat Biotechnol 38:199–209. https://doi.org/10.1038/s41587-019-0322-9
17. Davis CA, Hitz BC, Sloan CA et al (2018) The encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res 46:D794–D801. https://doi.org/10.1093/nar/gkx1081
18. Aguet F, Barbeira AN, Bonazzola R et al (2020) The GTEx consortium atlas of genetic regulatory effects across human tissues. Science 369(6509):1318–1330. https://doi.org/10.1126/SCIENCE.AAZ1776
19. Jalili V, Afgan E, Gu Q et al (2021) The galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2020 update. Nucleic Acids Res 48(W1):W395–W402. https://doi.org/10.1093/NAR/GKAA434
20. Sinitcyn P, Tiwary S, Rudolph JD et al (2018) MaxQuant goes Linux. Nat Methods 15:401
21. Tyanova S, Temu T, Sinitcyn P et al (2016) The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731–740. https://doi.org/10.1038/nmeth.3901
22. Tyanova S, Temu T, Cox J (2016) The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat Protoc 11:2301–2319. https://doi.org/10.1038/nprot.2016.136
23. Abelin JG, Keskin DB, Sarkizova S et al (2017) Mass spectrometry profiling of HLA-associated peptidomes in mono-allelic cells enables more accurate epitope prediction. Immunity 46(2):315–326. https://doi.org/10.1016/j.immuni.2017.02.007

Chapter 24 Untargeted Metabolomic Profiling of Fungal Species Populations Thomas E. Witte and David P. Overy Abstract This chapter describes protocols for the development of consensus chemical phenotypes or “metabolomes” of fungal populations using ultra-high pressure liquid chromatography coupled to high resolution mass spectrometry (UPLC-HRMS). Isolates are cultured using multiple media conditions to elicit the expression of diverse secondary metabolite biosynthetic gene clusters. The mycelium and spent culture media are extracted using organic solvents and profiled by ultra-high pressure chromatography coupled with a high resolution Thermo Orbitrap XL mass spectrometer with the ability to trap and fragment ions to general MS2 spectra. MS data preprocessing is explained and illustrated using the freely available software MZMine 2. Through data processing, binary matrices of mass features can be generated and then combined into a consensus secondary metabolite phenotype of all isolates grown in all media conditions. The production of consensus chemical phenotypes is useful for screening large fungal populations (both inter and intra-species populations) for isolates potentially expressing novel secondary metabolites or analogs of known secondary metabolites. Key words Mass spectrometry, UPLC-HRMS, Thermo Orbitrap XL, LCMS, Metabolomics, Secondary metabolites, Fungal natural products, MZMine, Consensus chemical phenotypes, Fungi

1 Introduction

Fungi possess a wealth of unexplored, rapidly evolving biosynthetic gene clusters, each potentially capable of producing important secondary metabolites. Intra-species variation in secondary metabolite production can occur within fungal populations for many reasons, including epigenetic differences [1], rapid sequence divergence in genomic hotspots [2], or sporadic phyletic distributions of secondary metabolite biosynthetic gene clusters, which are suggestive of genome instability [3] and/or horizontal gene transfer [4]. Understanding the full secondary metabolite potential of a population is of particular interest to researchers bioprospecting for novel molecules with associated biological activities, such as pharmaceuticals and agrochemicals, as well as to pathologists and

Jennifer Geddes-McAlister (ed.), Proteomics in Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 2456, https://doi.org/10.1007/978-1-0716-2124-0_24, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022


regulatory agencies that monitor for the emergence of virulent disease pathotypes due to the acquired production of virulence factors such as host-specific toxins [5]. Profiling secondary metabolite phenotypes (here also referred to as “metabolomes”) using mass spectrometry is an effective means to investigate fungal population structure and to detect variation in secondary metabolite production.

The following protocol is an untargeted, high resolution mass spectrometry-based metabolomics approach to define secondary metabolite phenotypes from complex crude fungal extracts. Using a specific example, we describe the secondary metabolite biosynthetic diversity within an intra-species fungal plant pathogen population. Consensus secondary metabolite phenotypes will be used to highlight unique expression of metabolite mass features within the population, with potential ramifications for pathogen virulence. The rapid evolution of secondary metabolite biosynthetic gene clusters is likely a key factor involved in pathogen adaptation to plant defenses or overcoming barriers to invade new plant hosts [6]. Secondary metabolite phenotyping of pathogen populations is therefore relevant when selecting fungal isolates within a target population, such as a species complex or multispecies collection of isolates, for further intensive genomic profiling and plant pathogenicity trials. Consensus phenotypes of mass features associated with plant pathogens are particularly useful in the downstream interpretation of complex extracts of biological systems, such as diseased plants.

It is important to note that many aspects of this protocol are not “one size fits all” solutions. Each fungus will have preferred culturing conditions and may produce secondary metabolites with various affinities for the organic solvents employed during extraction. Mass spectrometer models and chromatography parameters may require method development and troubleshooting to meet the needs of the research experiment. Furthermore, data preprocessing parameters must be adjusted upon manual inspection of each raw data set to suit the experimental setup and research question at hand. The analysis of mass spectrometry data in a metabolomics context is a rapidly evolving field, and new tools are published frequently. Nevertheless, the protocol in this chapter provides the fundamentals required to understand and perform consensus secondary metabolite phenotype analysis for fungal populations.

1.1 Culturing Methodology

The expression of fungal secondary metabolite biosynthetic gene clusters is subject to complex, often overlapping regulatory mechanisms that are sensitive to environmental perturbation [7]. Diversifying the abiotic conditions employed during the axenic culturing of fungal isolates is therefore desirable for maximizing the production of secondary metabolites [8]. Nitrogen, carbon, and


micronutrient types and concentrations, salt and starvation stress, pH, temperature, light, and humidity are all important parameters to consider when designing laboratory-based metabolomics experiments. Furthermore, individual isolates within a species may display different phenotypes in response to these stimuli. The generation of simplified, averaged “consensus” secondary metabolite phenotypes for populations of isolates cultured in diverse media conditions overcomes minor variation in biosynthetic gene regulation to visualize wide-scale secondary metabolite output at a population level [9].

1.2 UPLC-HRMS

The use of ultra-high pressure liquid chromatography coupled to high resolution mass spectrometry (UPLC-HRMS) is critical for separating complex mixtures and obtaining precise mass-to-charge ratio (m/z) data (Δm/z < 5 ppm) on their constituents. Any make or model of high resolution mass spectrometer can be applied to metabolomics research. Mass spectrometers such as the Thermo Orbitrap are an excellent choice for high resolution mass spectrometry, as they can also be used to carry out MSn experiments on specific mass features of interest. Research projects may be constrained by the availability and cost of local high resolution mass spectrometers. Low-resolution mass spectrometers can be used to generate metabolomic visualizations as demonstrated in this analysis; however, downstream mass feature annotation becomes limited by the inability to accurately predict chemical formulas and ion fragment masses from low-resolution m/z.

We have refrained from discussing mass feature annotation in this protocol, as it is a rapidly evolving field and is not a trivial task [10]. For annotation of small molecules from mass spectrometry data, we refer the reader to numerous in silico computation-based tools, such as MS-FINDER [11] or CSI:FingerID [12], in addition to online MS2 database spectral comparison workflows, such as GNPS [13]. MS2-based molecule annotation is assisted by mass spectrometers such as the Thermo Q Exactive, which can rapidly produce large amounts of high resolution MS2 data, can effectively trap and fragment nearly all m/z detected in a sample, and can rapidly switch between negative and positive ion modes without compromising UPLC-HRMS run times. This ability comes at an increased cost, which may not be feasible for many research programs. Samples produced by the protocol described in this chapter can always be further analyzed using different mass spectrometers once they are predicted to contain molecules of interest.
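The 5 ppm mass accuracy figure quoted above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function names and the example m/z values are hypothetical and are not part of the protocol. It converts a mass difference to parts per million and reports the m/z window implied by a given tolerance.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 5.0) -> bool:
    """True if the observed m/z lies within +/- tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def mz_window(theoretical_mz: float, tol_ppm: float = 5.0) -> tuple:
    """Lower and upper m/z bounds for a given ppm tolerance."""
    delta = theoretical_mz * tol_ppm / 1e6
    return theoretical_mz - delta, theoretical_mz + delta


if __name__ == "__main__":
    # Hypothetical values: a feature expected at m/z 500.0000.
    theoretical = 500.0000
    for observed in (500.0020, 500.0030):
        print(observed,
              round(ppm_error(observed, theoretical), 2),
              within_tolerance(observed, theoretical))
    print(mz_window(theoretical))
```

At m/z 500, a 5 ppm tolerance corresponds to a window of only about ±0.0025 m/z units, which helps explain why low-resolution instruments limit formula prediction.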

1.3 Data Analysis

Data preprocessing involves the simplification of ion “peaks” from raw MS data into a matrix of discrete mass features, each consisting of an m/z linked to a chromatographic retention time (RT), essentially compressing three-dimensional data into a two-dimensional matrix. Conversion of preprocessed metabolomics data into a


binary and/or a frequency-based format simplifies the interpretation of secondary metabolite phenotypes by eliminating m/z signal intensity variation, which may arise from extract concentration variability or fungal competence levels. The protocol described here is therefore focused on the visualization of secondary metabolite expression patterns and is not designed for secondary metabolite quantification.
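The conversion to a binary and frequency-based format can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data layout (isolate, then medium, then a mapping of (m/z, RT) features to intensities), the zero detection threshold, the function names, and all example values are hypothetical. A feature counts as present for an isolate if it is detected in any medium, and the consensus is expressed as the fraction of isolates showing each feature.

```python
from collections import defaultdict


def binarize(intensities, threshold=0.0):
    """Return the set of features whose intensity exceeds the threshold."""
    return {feature for feature, value in intensities.items() if value > threshold}


def isolate_phenotype(profiles_by_medium, threshold=0.0):
    """Union of detected features across all media for one isolate."""
    detected = set()
    for intensities in profiles_by_medium.values():
        detected |= binarize(intensities, threshold)
    return detected


def consensus_frequency(phenotypes):
    """Fraction of isolates in which each feature was detected."""
    counts = defaultdict(int)
    for features in phenotypes.values():
        for feature in features:
            counts[feature] += 1
    n_isolates = len(phenotypes)
    return {feature: count / n_isolates for feature, count in counts.items()}


# Hypothetical preprocessed data: isolate -> medium -> {(m/z, RT in min): intensity}
raw = {
    "isolate_A": {
        "CYA": {(301.1412, 4.52): 1.2e6, (455.2903, 7.81): 0.0},
        "YES": {(455.2903, 7.81): 3.4e5},
    },
    "isolate_B": {
        "CYA": {(301.1412, 4.52): 8.0e5},
        "YES": {(301.1412, 4.52): 2.0e5, (512.3301, 9.10): 0.0},
    },
}

phenotypes = {name: isolate_phenotype(media) for name, media in raw.items()}
frequencies = consensus_frequency(phenotypes)

# Features detected in only one isolate are candidates for unique expression.
unique = sorted(f for f, freq in frequencies.items() if freq == 1 / len(phenotypes))
```

Because intensities are discarded at the binarization step, differences in extract concentration no longer influence the result, which is the motivation given in the text.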

2 Materials

2.1 Materials for Biological Sample Culturing and Extraction

1. Frozen stocks of fungal isolates cultured from a single spore (can be mycelial plugs or conidia).
2. Mannitol, Murashige and Skoog Salts medium (MMK2): 40 g mannitol, 5 g yeast extract, 4.3 g Murashige & Skoog salts, per 1 L distilled water.
3. Czapek Yeast Autolysate medium (CYA): 3 g NaNO3, 1 g KH2PO4, 500 mg KCl, 500 mg MgSO4·7H2O, 10 mg FeSO4·7H2O, 5 g yeast extract, 30 g sucrose, 1000 μL “trace elements” (see below), per 1 L distilled water.
4. Trace elements: 1 g ZnSO4·7H2O, 0.5 g CuSO4·5H2O, per 100 mL distilled water.
5. Yeast Extract Sucrose medium (YES): 20 g yeast extract, 150 g sucrose, 500 mg MgSO4·7H2O, per 1 L distilled water.
6. Yeast Extract Sucrose with Instant Ocean medium (YESIO): add 18 g Instant Ocean salts to YES formulation.
7. Glassware: 50 mL glass slant tubes, Pasteur pipettes, 20 mL borosilicate scintillation vials with foil-lined caps, 125 mL Erlenmeyer flasks, 2 mL HPLC vials with polytetrafluoroethylene (PTFE) lined lids.
8. Equipment for cultivating fungi and working in sterile conditions: an incubator, an autoclave, a B2 biosafety cabinet (for plant pathogenic fungi), 118 mL Whirl-Pak bags, and a −20 °C freezer.
9. Extraction solvent: HPLC-grade ethyl acetate.
10. Resuspension solvent: UPLC-grade methanol.
11. Equipment for processing extracts: a chemical fume hood, a Genevac vacuum concentrator or nitrogen blower (or equivalent means for drying organic solvents), and an analytical balance (accurate to 0.1 mg).
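The media recipes above are given per 1 L of distilled water, and smaller batches are often more practical. The Python sketch below scales each recipe linearly to a chosen volume. The amounts are transcribed from the list above (milligram quantities converted to grams), while the dictionary layout, the function name, and the example volumes are hypothetical conveniences rather than part of the protocol.

```python
# Amounts per 1 L distilled water, transcribed from the materials list.
# Solids are in grams; the CYA trace-element stock is in microlitres.
RECIPES_PER_LITRE = {
    "MMK2": {
        "mannitol (g)": 40.0,
        "yeast extract (g)": 5.0,
        "Murashige & Skoog salts (g)": 4.3,
    },
    "CYA": {
        "NaNO3 (g)": 3.0,
        "KH2PO4 (g)": 1.0,
        "KCl (g)": 0.5,
        "MgSO4.7H2O (g)": 0.5,
        "FeSO4.7H2O (g)": 0.01,
        "yeast extract (g)": 5.0,
        "sucrose (g)": 30.0,
        "trace elements (uL)": 1000.0,
    },
    "YES": {
        "yeast extract (g)": 20.0,
        "sucrose (g)": 150.0,
        "MgSO4.7H2O (g)": 0.5,
    },
}
# YESIO is the YES formulation plus 18 g Instant Ocean salts per litre.
RECIPES_PER_LITRE["YESIO"] = {
    **RECIPES_PER_LITRE["YES"],
    "Instant Ocean salts (g)": 18.0,
}


def scale_recipe(medium: str, volume_ml: float) -> dict:
    """Scale a per-litre recipe linearly to the requested volume in mL."""
    if volume_ml <= 0:
        raise ValueError("volume_ml must be positive")
    factor = volume_ml / 1000.0
    return {name: amount * factor
            for name, amount in RECIPES_PER_LITRE[medium].items()}


if __name__ == "__main__":
    # Hypothetical batch size of 250 mL per medium.
    for medium in RECIPES_PER_LITRE:
        print(medium, scale_recipe(medium, 250))
```

Very small scaled amounts, such as the FeSO4·7H2O in CYA, may fall below what can be weighed reliably, in which case preparing a concentrated stock solution is a common workaround.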

2.2 Materials for UPLC-HRMS

1. MS instrument: a UPLC-HRMS equipped with an ion trap capable of acquiring MSn spectra of molecules up to m/z 2000 at high resolution (