Proteomics Data Analysis 1071616404, 9781071616406

This thorough book collects methods and strategies to analyze proteomics data. It is intended to describe how data obtai


English Pages 339 [325] Year 2021

Table of contents :
Dedication
Preface
Contents
Contributors
Part I: Data Analysis for Gel-Based Proteomics
Chapter 1: Two-Dimensional Gel Electrophoresis Image Analysis
1 Introduction
2 Image Preprocessing
3 Spot Detection and Spot Quantification
4 Gel Warping and Matching
5 Conclusions
References
Chapter 2: Chemometric Tools for 2D-PAGE Data Analysis
1 Introduction
2 Data Arrangement and Scaling
3 Pattern Recognition Methods
3.1 Principal Component Analysis (PCA)
3.2 Clustering Methods
4 Classification Methods
4.1 Evaluation of Classification Performances
4.2 Linear Discriminant Analysis (LDA)
4.3 Partial Least Squares Discriminant Analysis (PLS-DA)
4.4 Soft-Independent Model of Class Analogy (SIMCA)
4.5 Ranking-PCA
5 Concluding Remarks
References
Part II: Data Analysis for Gel-Free Proteomics
Chapter 3: Software Options for the Analysis of MS-Proteomic Data
1 Introduction
2 Software Tools for MS Data Processing
2.1 Preprocessing of Raw Data
2.2 Search Engines and Scoring Algorithms
2.3 Sequence Databases for Proteomics
2.4 Database Interrogation for Proteomics Identification
2.5 Quantification Algorithms
2.6 Format of Output Results
2.7 Integrated Analysis Platforms
3 Conclusions
4 Notes
References
Chapter 4: Analysis of Label-Based Quantitative Proteomics Data Using IsoProt
1 Introduction
2 Materials
2.1 Installing Docker
2.2 Preparing the Input Files
2.3 Launching IsoProt
3 Methods
4 Results and Interpretation
4.1 Quality Control
4.2 Data Interpretation
5 Notes
References
Chapter 5: Quantification of Changes in Protein Expression Using SWATH Proteomics
1 Introduction
2 Materials
2.1 Equipment
2.2 Protein Digestion
2.3 Samples
2.4 Chromatography
2.5 Mass Spectrometry
2.6 Data Processing
3 Methods
3.1 Cell Lysis and Protein Digestion
3.2 Preparation of Test Samples
3.3 Instrument Setup and Data Acquisition
3.4 SWATH Relative Abundance Quantification
3.5 Assessment of SWATH Data Quality
3.6 Data Analysis of the Full Dataset
4 Conclusions
5 Notes
References
Chapter 6: Data Processing and Analysis for DIA-Based Phosphoproteomics Using Spectronaut
1 Introduction
2 Materials
3 Methods
3.1 Create Spectral Library
3.2 Set Up DIA Analysis in Spectronaut
3.2.1 DIA Search Using Prerecorded Spectral Libraries
3.2.2 DIA Search Without Libraries or directDIA Search
3.2.3 Export DIA Results
3.3 Collapse to Phospho-Sites: Perseus Plugin
3.4 Differential Regulation Analysis with Prostar
4 Notes
References
Chapter 7: Glycan Compositions with GlyConnect Compozitor to Enhance Glycopeptide Identification
1 Introduction
2 Glycopeptide Identification Software
2.1 An Ever-Growing Catalog
2.2 Glycan Composition File Selection
3 Practical Examples
3.1 Exploring the Urine O-Glycome in GlyConnect
3.2 The N-Glycome of Erythropoietin
3.3 Using GlyConnect Metadata
4 Conclusion
5 Notes
References
Chapter 8: Elaboration Pipeline for the Management of MALDI-MS Imaging Datasets
1 Introduction
2 Materials
2.1 Data Acquisition
2.2 Computer Specifications
2.3 Software and Web Resources
3 Methods
3.1 Data Visualization and Tissue Annotation
3.2 Data Import
3.3 Data Preprocessing and Feature Selection
3.4 Unsupervised Statistical Analysis
3.5 Supervised Statistical Analysis
3.6 Internal Calibration and Protein ID Assignment
4 Notes
References
Chapter 9: Features Selection and Extraction in Statistical Analysis of Proteomics Datasets
1 Introduction
2 Inductive Reasoning, Dimensionality and Sparsity
3 Data Processing Before Feature Selection
4 Feature Selection
4.1 Linear Discriminant Analysis (LDA)
4.2 Partial Least Squares Discriminant Analysis (PLS-DA)
4.3 Principal Component Analysis (PCA)
4.4 Clustering Methods
5 Cross-Validation and Performance Estimation of a Proteomics Signature
5.1 k-Fold CV
5.2 Leave-One-Out CV (LOOCV)
5.3 Performance Estimation
6 Some Examples of Statistical Workflows Applied to Proteomics
6.1 Clinical Proteomics: Diagnostic Protein Signatures
6.2 Classification Methods for High-Dimension Proteomics Data of Mixed Quality
7 Concluding Remarks
8 Notes
References
Part III: Proteomics Data Interpretation
Chapter 10: ORA, FCS, and PT Strategies in Functional Enrichment Analysis
1 Introduction
2 Materials
2.1 Data Sources
2.2 Software and Apps
3 Methods
3.1 Metadata
3.2 Process Discovery Matrix
3.3 Differential Expression (DE) Analysis Using Limma
3.4 Functional Enrichment Analysis: ORA
3.5 Functional Class Scoring (FCS) Using GSEA
3.6 Pathway-Topology (PT) Analysis
4 Notes
References
Chapter 11: GO Enrichment Analysis for Differential Proteomics Using ProteoRE
1 Introduction
2 Materials: ProteoRE Tools and Content
2.1 ProteoRE Tools
2.2 Data Files
2.3 History, Datasets, and Availability of Analyses
3 Methods: Using ProteoRE Tools to Perform the Functional Analysis of a Proteomics Dataset
3.1 Access the ProteoRE Web Interface
3.2 Upload Datasets in Your History
3.3 Conversion of Identifiers (ID)
3.4 Filtering Proteins with No Identifier
3.5 Annotating the List of Differentially Expressed Proteins
3.6 Building a Breast Cancer Proteome as a Reference Background for GO Enrichment Analysis
3.7 Performing GO Singular Enrichment Analysis (Fisher's Exact Test)
3.8 Performing GO Modular Enrichment Analysis Using "Weight" Algorithm
3.9 Running GO Terms Enrichment Comparison Between Up- and Downregulated Proteins
4 Notes
References
Chapter 12: Protein Subcellular Localization Prediction
1 Introduction
2 Methods for Subcellular Extraction of Proteins
3 Mass Spectrometry Qualitative and Quantitative Analysis
4 Data Analysis for Subcellular Localization
4.1 Analysis of Protein Subcellular Localization by BUSCA Software
4.1.1 Input of Data
4.1.2 Output of Analysis
5 Recent Spatial Proteomics Approaches
6 Conclusion
References
Chapter 13: Protein Secretion Prediction Tools and Extracellular Vesicles Databases
1 Introduction
2 Materials
3 Methods
3.1 Classical Protein Secretion Prediction: SignalP 5.0
3.2 Transmembrane Protein Prediction: TMHMM
3.3 Non-classical Protein Secretion Prediction: SecretomeP
3.4 ExoCarta and Vesiclepedia
4 Notes
References
Chapter 14: Databases for Protein-Protein Interactions
1 Introduction
2 Materials
3 Methods
3.1 MINT
3.2 STRING
3.3 BioGRID
3.4 IntAct
3.5 DIP
3.6 HPRD
3.7 I2D
3.8 BIND
3.9 MPact
4 Computational Methods for Predicting Protein-Protein Interactions
4.1 Struct2Net
4.2 HOMCOS
4.3 ENTS
4.4 Comparison of Database Features
References
Chapter 15: Machine and Deep Learning for Prediction of Subcellular Localization
1 Introduction
2 Materials
2.1 Benchmark Datasets
2.2 Evaluation Criteria
3 Methods
3.1 Evolution Information
3.2 Feature Condensing
3.3 Convolutional Neural Network
3.4 Multi-label Classification
4 Notes
References
Chapter 16: Deep Learning for Protein-Protein Interaction Site Prediction
1 Introduction
2 Materials
2.1 Computing Resources
2.1.1 Software Installations
2.1.2 Machine Learning Frameworks
2.2 Databases and Datasets
2.3 Tools for Computing Features and Representations
2.3.1 Sequence-Based
2.3.2 Sequence Embeddings
2.3.3 Structure-Based
3 Methods
3.1 Data
3.1.1 Curation
3.1.2 Train-Test Split Strategies
3.1.3 Representation
3.1.4 Input Features
3.1.5 Pre-processing
3.2 Model Evaluation
3.2.1 Hyperparameter Tuning
3.2.2 Evaluation Metrics
3.2.3 Overfitting
3.2.4 Attribution
3.3 Alternative Training Regimes for Future Model Development
3.3.1 Multi-modal Input
3.3.2 Transfer Learning
3.3.3 Multi-task Learning
3.3.4 Learning Using Privileged Information
3.3.5 Uncertainty Modeling and Active Learning
3.3.6 Attention Mechanisms
3.3.7 Ensembling
4 Notes
References
Part IV: Proteomics Data Integration with Other -Omics
Chapter 17: Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets
1 Introduction
2 Materials
2.1 Data Sources
2.2 Software and Apps
3 Methods
4 Notes
References
Chapter 18: Integration of Proteomics and Other Omics Data
1 Introduction
2 Materials
3 Methods
3.1 Unsupervised Analysis for Identifying Protein Functions
3.2 Unsupervised Analysis for Identifying Sample Heterogeneity or Protein Subgrouping
3.2.1 Sample Heterogeneity Analysis
3.2.2 Protein Clustering Analysis
3.3 Supervised Analysis for Identifying Proteomics Markers That Are Associated with Outcomes/Phenotypes
3.4 Supervised Analysis for Constructing Predictive Models for Outcomes/Phenotypes
3.5 Application Notes
4 Concluding Remarks
References
Index


Proteomics Data Analysis
1071616404, 9781071616406

Author / Uploaded
Daniela Cecconi


Methods in Molecular Biology 2361

Daniela Cecconi Editor

Proteomics Data Analysis

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Proteomics Data Analysis Edited by

Daniela Cecconi Department of Biotechnology, University of Verona, VERONA, Verona, Italy

Editor Daniela Cecconi Department of Biotechnology University of Verona VERONA, Verona, Italy

ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-0716-1640-6 ISBN 978-1-0716-1641-3 (eBook) https://doi.org/10.1007/978-1-0716-1641-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana imprint is published by the registered company Springer Science+Business Media, LLC part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Dedication

To my parents, for always loving and supporting me.


Preface

This book is a collection of methods and strategies to analyze proteomics data. It is intended to describe how data obtained by gel-based or gel-free proteomics approaches can be analyzed, organized, and interpreted to extrapolate biological information. The extensive diffusion of proteomics has been accompanied by the development of dedicated bioinformatics tools, software packages, databases, and workflows, which are continuously updated and developed. This book addresses the need to introduce researchers to new tools and approaches for data analysis, and it is intended for both students and experienced practitioners in the field of proteomics. Anyone who is taking the first steps in proteomics data analysis should be able to take from this book an understanding of the available alternatives, at the level of approaches, software, and bioinformatics tools, to analyze the data, whether they come from two-dimensional electrophoresis experiments or from the most advanced mass spectrometry.

Proteomics Data Analysis had its origins in an invitation received in September 2019 from Professor Emeritus John M. Walker. I felt honored and immediately agreed to be the guest editor of a book on the analysis of proteomic data, aware of the challenge but also confident given the attention I have directed toward this topic during my research activity. I have been dealing with proteomics analysis for 20 years now, and I have always paid attention not only to the application of proteomic approaches but also to the development of analytical strategies. In recent years, I have also been involved in teaching bioinformatic tools for proteomics data analysis, for example during the "Clinical Proteomics" course for master's degree students in Molecular and Medical Biotechnology at the University of Verona. I also taught bioinformatics tools during the Advanced Proteomic School of the "International Proteomics and Metabolomics Conference" organized in collaboration with the Division of Mass Spectrometry of the Italian Chemical Society (DSM-SCI). The enthusiastic response to this School reflected the need to introduce scientists to the new tools for proteomics data analysis. These teaching experiences provided the motivation for this book.

This book is structured in four parts. The first part (Chapters 1 and 2) is composed of chapters dealing with strategies to analyze proteomics data obtained by gel-based approaches, exploring the potential of multivariate statistical techniques for candidate biomarker identification. The second part (Chapters 3–9) highlights the different data analysis approaches for gel-free proteomics experiments, including software options for pre-processing and database search of raw mass spectrometry (MS) data, strategies for label-based and label-free quantitative proteomics, tools for phospho- and glycoproteomics, as well as methods to manage imaging MS and to select and extract significant information from proteomics datasets. The third part (Chapters 10–16) is focused on bioinformatic tools for the interpretation of proteomics data to obtain biologically significant information, such as proteins and pathways that are over-represented and may have an association with disease phenotypes, as well as information on subcellular localization, secretion, and interactions of proteins, also by machine learning approaches. In the final part, two chapters (Chapters 17 and 18) deal with methods to integrate proteomics data with other omics datasets including genomics, transcriptomics, metabolomics, and other types of data.


The people to thank are numerous. First, I wish to thank Professor Emeritus John M. Walker for his guidance in finalizing this book. I also thank the members of my lab, and all the students who have attended the lab over the years, for their experiments, their helpful scientific discussions, and their motivation in proteomics research. They have contributed in some way to my growth, not only from a scientific point of view but also as a person. I am also very grateful to all the authors who accepted my invitation to write a chapter for this book, for their interesting contributions. Finally, I thank you, dear reader, for choosing this book. I hope that it can be an effective help in your research activity.

Verona, Italy
Daniela Cecconi

Contents

Dedication
Preface
Contributors

PART I DATA ANALYSIS FOR GEL-BASED PROTEOMICS

1 Two-Dimensional Gel Electrophoresis Image Analysis
Elisa Robotti, Elisa Calà, and Emilio Marengo
2 Chemometric Tools for 2D-PAGE Data Analysis
Elisa Robotti, Elisa Calà, and Emilio Marengo

PART II DATA ANALYSIS FOR GEL-FREE PROTEOMICS

3 Software Options for the Analysis of MS-Proteomic Data
Avinash Yadav, Federica Marini, Alessandro Cuomo, and Tiziana Bonaldi
4 Analysis of Label-Based Quantitative Proteomics Data Using IsoProt
Johannes Griss and Veit Schwämmle
5 Quantification of Changes in Protein Expression Using SWATH Proteomics
Clarissa Braccia, Nara Liessi, and Andrea Armirotti
6 Data Processing and Analysis for DIA-Based Phosphoproteomics Using Spectronaut
Ana Martinez-Val, Dorte Breinholdt Bekker-Jensen, Alexander Hogrebe, and Jesper Velgaard Olsen
7 Glycan Compositions with GlyConnect Compozitor to Enhance Glycopeptide Identification
Julien Mariethoz, Catherine Hayes, and Frédérique Lisacek
8 Elaboration Pipeline for the Management of MALDI-MS Imaging Datasets
Andrew Smith, Isabella Piga, Vanna Denti, Clizia Chinello, and Fulvio Magni
9 Features Selection and Extraction in Statistical Analysis of Proteomics Datasets
Marta Lualdi and Mauro Fasano

PART III PROTEOMICS DATA INTERPRETATION

10 ORA, FCS, and PT Strategies in Functional Enrichment Analysis
Marco Fernandes and Holger Husi
11 GO Enrichment Analysis for Differential Proteomics Using ProteoRE
Florence Combes, Valentin Loux, and Yves Vandenbrouck
12 Protein Subcellular Localization Prediction
Elettra Barberis, Emilio Marengo, and Marcello Manfredi
13 Protein Secretion Prediction Tools and Extracellular Vesicles Databases
Daniela Cecconi, Claudia Di Carlo, and Jessica Brandi
14 Databases for Protein–Protein Interactions
Natsu Nakajima, Tatsuya Akutsu, and Ryuichiro Nakato
15 Machine and Deep Learning for Prediction of Subcellular Localization
Gaofeng Pan, Chao Sun, Zijun Liao, and Jijun Tang
16 Deep Learning for Protein–Protein Interaction Site Prediction
Arian R. Jamasb, Ben Day, Cătălina Cangea, Pietro Liò, and Tom L. Blundell

PART IV PROTEOMICS DATA INTEGRATION WITH OTHER -OMICS

17 Integrative Analysis of Incongruous Cancer Genomics and Proteomics Datasets
Karla Cervantes-Gracia, Richard Chahwan, and Holger Husi
18 Integration of Proteomics and Other Omics Data
Mengyun Wu, Yu Jiang, and Shuangge Ma

Index

Contributors

TATSUYA AKUTSU • Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
ANDREA ARMIROTTI • Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Genova, Italy
ELETTRA BARBERIS • Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy; Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
DORTE BREINHOLDT BEKKER-JENSEN • Novo Nordisk Foundation Center for Protein Research, Proteomics Program, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Evosep Biosystems, Odense, Denmark
TOM L. BLUNDELL • Department of Biochemistry, University of Cambridge, Cambridge, UK
TIZIANA BONALDI • Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
CLARISSA BRACCIA • D3 Pharmachemistry, Istituto Italiano di Tecnologia, Genova, Italy
JESSICA BRANDI • Department of Biotechnology, University of Verona, Verona, Italy
ELISA CALÀ • Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy
CĂTĂLINA CANGEA • Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
DANIELA CECCONI • Department of Biotechnology, University of Verona, Verona, Italy
KARLA CERVANTES-GRACIA • Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
RICHARD CHAHWAN • Institute of Experimental Immunology, University of Zurich, Zurich, Switzerland
CLIZIA CHINELLO • Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, Milan, Italy
FLORENCE COMBES • Université Grenoble Alpes, INSERM, CEA, UMR BioSanté U1292, Grenoble, France
ALESSANDRO CUOMO • Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
BEN DAY • Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
VANNA DENTI • Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, Milan, Italy
CLAUDIA DI CARLO • Department of Biotechnology, University of Verona, Verona, Italy
MAURO FASANO • Department of Science and High Technology, Center of Bioinformatics, University of Insubria, Busto Arsizio, Italy
MARCO FERNANDES • Department of Psychiatry, University of Oxford, Oxford, UK
JOHANNES GRISS • Department of Dermatology, Medical University of Vienna, Vienna, Austria
CATHERINE HAYES • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland
ALEXANDER HOGREBE • Department of Genome Sciences, University of Washington, Seattle, WA, USA
HOLGER HUSI • Institute of Cardiovascular and Medical Sciences, University of Glasgow, Glasgow, UK; Division of Biomedical Sciences, Institute of Health Research and Innovation, University of the Highlands and Islands, Inverness, UK
ARIAN R. JAMASB • Department of Computer Science and Technology, University of Cambridge, Cambridge, UK; Department of Biochemistry, University of Cambridge, Cambridge, UK
YU JIANG • School of Public Health, University of Memphis, Memphis, TN, USA
ZIJUN LIAO • Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA; Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Fujian Medical University, Fujian, China
NARA LIESSI • Analytical Chemistry Lab, Istituto Italiano di Tecnologia, Genova, Italy
PIETRO LIÒ • Department of Computer Science and Technology, University of Cambridge, Cambridge, UK
FRÉDÉRIQUE LISACEK • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department and Section of Biology, University of Geneva, Geneva, Switzerland
VALENTIN LOUX • Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France; Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
MARTA LUALDI • Department of Science and High Technology, Center of Bioinformatics, University of Insubria, Busto Arsizio, Italy
SHUANGGE MA • Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, USA
FULVIO MAGNI • Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, Milan, Italy
MARCELLO MANFREDI • Department of Translational Medicine, University of Piemonte Orientale, Novara, Italy; Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
EMILIO MARENGO • Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy; Center for Translational Research on Autoimmune and Allergic Diseases, CAAD, University of Piemonte Orientale, Novara, Italy
JULIEN MARIETHOZ • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland
FEDERICA MARINI • Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
ANA MARTINEZ-VAL • Novo Nordisk Foundation Center for Protein Research, Proteomics Program, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
NATSU NAKAJIMA • Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
RYUICHIRO NAKATO • Institute for Quantitative Biosciences, The University of Tokyo, Tokyo, Japan
JESPER VELGAARD OLSEN • Novo Nordisk Foundation Center for Protein Research, Proteomics Program, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Proteomics Program, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen N, Denmark
GAOFENG PAN • Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA
ISABELLA PIGA • Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, Milan, Italy
ELISA ROBOTTI • Department of Sciences and Technological Innovation, University of Piemonte Orientale, Alessandria, Italy
VEIT SCHWÄMMLE • Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark
ANDREW SMITH • Department of Medicine and Surgery, Proteomics and Metabolomics Unit, University of Milano-Bicocca, Milan, Italy
CHAO SUN • Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA
JIJUN TANG • Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA; School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
YVES VANDENBROUCK • Université Grenoble Alpes, INSERM, CEA, UMR BioSanté U1292, Grenoble, France
MENGYUN WU • School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
AVINASH YADAV • Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy

Part I Data Analysis for Gel-Based Proteomics

Chapter 1

Two-Dimensional Gel Electrophoresis Image Analysis

Elisa Robotti, Elisa Calà, and Emilio Marengo

Abstract

Gel-based proteomics is still quite widespread due to its high resolving power; the experimental approach is based on differential analysis, where groups of samples (e.g., control vs. diseased) are compared to identify panels of potential biomarkers. However, the reliability of the result of the differential analysis is strongly influenced by the image analysis procedures applied to the 2D-PAGE maps. The analysis of 2D-PAGE images consists of several steps: image preprocessing, spot detection and quantitation, image warping and alignment, and spot matching. Several approaches are described in the literature, and classical or last-generation commercial software packages exploit different algorithms for each step of the analysis. Here, the most widespread approaches and a comparison of the different strategies are presented.

Key words 2D-PAGE, Image analysis, Warping, Image preprocessing, Spot detection and quantification, Spot matching

1

Introduction Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) [1] is a widespread analytical technique for the separation of complex protein mixtures according to protein isoelectric point and molecular weight. Notwithstanding the increasing development of applications based on liquid chromatography coupled to mass spectrometry, 2D-PAGE is still an exploited tool above all for its high-resolution power. The experimental approach in gel-based proteomics is based on differential analysis, where groups of samples (e.g., control vs. diseased, control vs. drug-treated, etc.) are compared to identify panels of biomarkers, i.e., panels of differentially expressed proteins. To accomplish this procedure, 2D-maps must be analyzed by a multistep procedure [2, 3] consisting in: – Image preprocessing: maps are scanned through a densitometer providing an image characterized by the optical density of each

Daniela Cecconi (ed.), Proteomics Data Analysis, Methods in Molecular Biology, vol. 2361, https://doi.org/10.1007/978-1-0716-1641-3_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

3

4

Elisa Robotti et al.

point of the map; images are then processed to eliminate background and subtract noise [4]. – Spot detection and quantitation: spots are detected on each image independently, and each spot is quantified through its total optical density [5–13]. – Spot filtering: spots not satisfying specific criteria are eliminated. – Image warping and alignment: images are warped through different algorithms to produce the final alignment of all maps, usually through procedures based on the selection of corresponding landmarks present along all the available images, which is performed by the operator [5, 14–28]. – Spot matching: the identified, corresponding spots are matched along all the maps [6, 22, 23, 29–31]. – Differential analysis: this step allows to compare maps from different groups to identify the differences, i.e., the candidate biomarkers. Two approaches can be followed: the comparison of spot volume datasets, where each sample is described by the volumes (namely the total optical density) of the spots detected on its surface [32–38], or the direct comparison of 2D-maps at the pixel level [39–43]. Once the maps are compared, the statistically relevant regulations can be identified through classical monovariate statistical tests (Student’s t-test, Mann-Whitney test, etc.) [44] or by multivariate procedures [45–47]. The possible approaches reported in literature are mainly three (Fig. 1); two are based on a final differential analysis performed on spot volumes, while the last one involves differential analysis at the pixel level: – Mode 1: image preprocessing is followed by spot detection, spot matching and then differential analysis is carried out on a spot volume dataset. – Mode 2: image preprocessing is followed by image alignment, spot detection and then differential analysis is again carried out on spot volumes. – Mode 3: image preprocessing is followed by image alignment and final differential analysis carried out at the pixel level. 
Usually, classical software packages exploit the first approach [6, 7, 13], while the second one is usually applied by last-generation software packages [19, 26, 48, 49]. Each of the three approaches has advantages and disadvantages: in the first approach, spot matching accuracy decreases as sample size increases [50]; the second approach avoids the presence of missing data [51]; the third one avoids two time-consuming steps, since spot detection and quantification are eliminated and images are compared directly at the pixel level [52–55]; moreover, this approach avoids arbitrary choices from the operator that can have a great effect on the final result. Table 1 reports the most popular commercial software packages for 2-D gel analysis. Here, the main steps of image analysis and the advantages and disadvantages of each procedure will be discussed.

Fig. 1 2-DE image analysis strategies (flowchart of Modes 1–3 through pre-processing, alignment, spot detection, spot matching, spot volume dataset, and differential analysis)
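The spotwise comparison of spot volume datasets mentioned above can be illustrated with a minimal sketch of the two-sample Student's t statistic (pooled variance), computed independently for each spot. The spot identifiers and volumes below are invented for illustration only, and the function name `student_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns the statistic and its degrees of freedom.
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical spot volume dataset: spot id -> (control volumes, treated volumes)
volumes = {
    "spot_001": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "spot_002": ([10.0, 12.0, 11.0], [11.0, 10.0, 12.0]),
}

# One monovariate test per spot
results = {spot: student_t(ctrl, trt) for spot, (ctrl, trt) in volumes.items()}
for spot, (t, df) in results.items():
    print(f"{spot}: t = {t:.3f}, df = {df}")
```

Each statistic is then compared with the critical value of the t distribution for the corresponding degrees of freedom. With real data, where hundreds of spots are tested at once, the multivariate procedures cited above are an alternative to this spot-by-spot scheme.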

Table 1 Most popular commercial software packages for 2-D gel analysis

Software                          Company                         Availability
Traditional software packages
Dymension                         Syngene                         http://www.syngene.com/dymension
DeCyder 2D Differential Analysis  GE Healthcare                   http://www.gelifesciences.com
Image Master 2D Platinum          GE Healthcare                   http://www.gelifesciences.com
Proteomweaver                     Bio-Rad Laboratories            http://www.bio-rad.com/it-it/sku/soft-lit-5484proteomweaver-2-d-analysis-software
Phoretix 2D Advanced              Nonlinear Dynamics              http://www.nonlinear.com
PDQuest                           Bio-Rad Laboratories            http://www.bio-rad.com/en-us/product/pdquest-2-danalysis-software
Melanie                           Geneva Bioinformatics           http://www.genebio.com/products/melanie/
Fourth-generation software packages
Progenesis SameSpots              Nonlinear Dynamics              http://www.nonlinear.com
Delta 2D                          Decodon                         http://www.decodon.com/delta2d.html
Redfin Solo                       Ludesi                          http://www.ludesi.com/redfin/
GeneScope                         GeneScope Bioinformatics group  http://www.genescope.com
SameSpots                         Cleaver Scientific              https://www.cleaverscientific.com/electrophoresisproducts/samespots/
RegStat Gel                       -                               http://www.mediafire.com/FengLi/2DGelsoftware
Gel2DE                            -                               http://code.google.com/p/gel2de

2 Image Preprocessing

All three approaches are based on a first step of image preprocessing, which is fundamental to eliminate some common problems in gel-based procedures that increase analytical uncertainty and variability: (a) the effect of background, which shows a different color intensity when different maps are considered; (b) the presence of noise and image artifacts, like spurious spots or streaks. Preprocessing is therefore applied to remove the differences in background intensity and these artifacts, thus eliminating a part of the natural variability that affects 2D-PAGE. This step is fundamental to provide a more reliable detection of the spots present on the map, which in turn gives a more reliable identification of candidate biomarkers from spot volume datasets; Rogers et al. [56] and Tsakanikas et al. [57] proved that noise filtering and background subtraction can affect the spot quantification and the variability of the final analysis.

2D-PAGE maps can present both the so-called “Gaussian noise” and the “spike noise” [16, 58]: the first one is usually caused by the acquisition step, while the second one is generally related to the experimental procedure. Noise can usually be identified easily, and then eliminated, since it is a high-frequency component of the image, whereas spots are low-frequency features [16, 59]. Two main strategies for noise removal are usually applied:
– Spatial filtering is often applied in commercial software [56, 60, 61]. In this approach, a filter is applied to a local n × n kernel/window [16, 17, 61, 62]: the new value of the pixel intensity at the center of the kernel is calculated on the basis of the values of the other pixels in the kernel, by applying the selected filter, and then the kernel is moved to the next position. Several different filters can be chosen: median filter, adaptive filters, Gaussian filters, polynomial filters, and median modified Wiener filter [63]; however, all these strategies alter the intensity values of the spot pixels.
– Spatial-frequency domain filters are usually applied to remove Gaussian noise. The two most exploited alternatives are wavelet filters and contourlet denoising [58, 62]. Wavelet denoising is very efficient, but it is prone to directionality, since noise is captured along three different directions (horizontal, vertical, and diagonal); contourlet denoising was therefore developed to overcome this problem. These approaches are very powerful but require complex filter parameters to be defined by experienced operators, which hampers their use in routine analysis and when large numbers of maps need to be treated together.

Background correction is a mandatory step needed to subtract the contribution due to non-homogeneous gel staining or variability during the image acquisition. Two different strategies are available, based on how the estimation of the background is performed:
– The first approach estimates the background from the value of the pixels just outside the spot boundary. In this case, it is possible to choose between two strategies: the selection of the background as the minimum value of intensity detected along the boundary of the spot, or the average of the intensities of the pixels along the spot boundary. Both approaches can be applied when the background is assumed to be homogeneous along the map [64].
– The second approach is based on the calculation of a model of the background along the gel surface [64]. This method should be applied after the spot detection step, since the pixels used for building the model must not belong to spots. Several software packages exploit this type of procedure, e.g., PDQuest, Delta 2D, Progenesis PG240, SameSpots.
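Two of the preprocessing operations described in this section can be sketched in a few lines: a median filter applied on an n × n window, and the boundary-based background estimate (minimum or average of the pixels along the spot boundary). The example below is a minimal sketch, not the implementation of any cited package; images are plain lists of rows in which higher values mean higher optical density, and all function names, the test images, and the boundary coordinates are hypothetical.

```python
from statistics import median


def median_filter(image, n=3):
    """Replace each pixel by the median of its n x n neighborhood.

    Near the edges only the part of the window inside the image is used.
    """
    half = n // 2
    rows, cols = len(image), len(image[0])
    filtered = []
    for r in range(rows):
        new_row = []
        for c in range(cols):
            window = [image[y][x]
                      for y in range(max(0, r - half), min(rows, r + half + 1))
                      for x in range(max(0, c - half), min(cols, c + half + 1))]
            new_row.append(median(window))
        filtered.append(new_row)
    return filtered


def boundary_background(image, boundary, mode="min"):
    """Estimate the background from the pixels along the spot boundary."""
    values = [image[y][x] for y, x in boundary]
    return min(values) if mode == "min" else sum(values) / len(values)


def subtract_background(image, spot_pixels, background):
    """Subtract the background from the spot pixels, clipping at zero."""
    return {(y, x): max(image[y][x] - background, 0) for y, x in spot_pixels}


# Spike noise: a single outlier pixel on a flat area is removed by the filter.
noisy = [[10, 10, 10],
         [10, 200, 10],
         [10, 10, 10]]
denoised = median_filter(noisy, n=3)

# A one-pixel "spot" at (2, 2) surrounded by its boundary ring.
gel = [[11, 12, 10, 12, 11],
       [12, 13, 12, 14, 12],
       [10, 12, 60, 13, 11],
       [12, 14, 12, 12, 12],
       [11, 12, 11, 12, 10]]
ring = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
bg_min = boundary_background(gel, ring, mode="min")
bg_mean = boundary_background(gel, ring, mode="mean")
corrected = subtract_background(gel, [(2, 2)], bg_min)
```

The median filter removes the isolated spike, but it would also change the intensity of genuine spot pixels, which is the drawback of spatial filters noted above. The boundary-based estimate is only meaningful where the background can be assumed homogeneous around the spot.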

3 Spot Detection and Spot Quantification

Spot detection is focused on identifying the spots present in the image; this step is fundamental prior to gel matching when algorithms based on spot matching are used. The first step usually involves the identification of the spot centers; a spot segmentation step is then applied to identify the area of the gel that contains each spot; the final step of spot modeling is applied to calculate spot features and identify possible co-migrations. As for image segmentation, approaches based on edge detection should be applied after the removal of noise and artifacts, in order to avoid the identification of spurious spots [53, 65]. The most exploited method for spot segmentation is the watershed transformation algorithm [6, 11, 66], borrowed from the geosciences. In this approach, a gel is considered as a topographic relief and the spots as depressions: the algorithm identifies the basins and the spot features by iterative fitting methods [11, 67]. Other methods exploit geometric algorithms, parametric spot models [68], the pixel value collection method [8, 53], and slice tree with confidence evaluation [7].

Once spots are detected, their features are extracted to quantify the optical density of each spot (proportional to concentration); in this case, it is possible to apply either parametric or nonparametric algorithms [67]: the former (usually adopted by classical software) are based on spot modeling, which transforms each spot into an ideal spot with elliptic shape through a Gaussian fitting; last-generation software instead exploits nonparametric algorithms, where each spot is described by its own shape. Gaussian fitting works very well in the case of overlapping spots and small spots, while it provides irregular shapes for larger spots and does not work very well in the case of saturated spots (i.e., in the cases where the Gaussian approximation hardly holds) [56]. In classical software packages, it is usually necessary to perform heavy spot editing after spot detection, in order to remove spurious spots, add faint spots, or split the overlapping ones.
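To make the sequence of segmentation and nonparametric quantitation more concrete, the sketch below segments a denoised image by simple thresholding followed by connected-region labeling, and quantifies each spot through its total optical density (spot volume). This is a deliberately simplified stand-in and not the watershed transformation or any of the cited algorithms; the function name `detect_spots`, the threshold, and the toy image are assumptions made for illustration.

```python
def detect_spots(image, threshold):
    """Label 4-connected regions above the threshold and quantify them.

    Each spot is described by its own shape (nonparametric): the volume is
    the sum of the intensities of its pixels, the center is its peak pixel.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood fill starting from an unvisited pixel above the threshold
            stack, pixels = [(r, c)], []
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            spots.append({
                "center": max(pixels, key=lambda p: image[p[0]][p[1]]),
                "area": len(pixels),
                "volume": sum(image[y][x] for y, x in pixels),
            })
    return spots


# Toy image (higher value = higher optical density) containing two spots
gel = [[0, 0, 0, 0, 0, 0],
       [0, 5, 9, 0, 0, 0],
       [0, 4, 6, 0, 0, 0],
       [0, 0, 0, 0, 7, 0],
       [0, 0, 0, 0, 3, 0]]
spots = detect_spots(gel, threshold=1)
for spot in spots:
    print(spot)
```

A threshold-based segmentation of this kind cannot split overlapping spots, since two touching spots end up in the same region; this is one reason why heavy manual spot editing or more refined segmentation methods are needed in practice.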
In the case of last-generation packages, the alignment is carried out at the pixel level: this procedure implies that spot editing is carried out just once, on the “fusion image,” and is then propagated to all the single images, greatly decreasing the operator’s intervention [19]. Another advantage of last-generation software is that it reduces the problem of missing data. Missing data can in fact be present in a spot volume dataset due to biological reasons (in this case it is proper to maintain the protein as missing) or to technical problems, such as over-focusing, bad transfer between first and second dimension, gel-to-gel variations, etc. When this is the case, problems may arise in biomarker identification. Since last-generation software packages use, for each spot, the same set of spot boundaries for all the single images, missing values are instead represented by a very low intensity value, rather than a null value. On the other hand, last-generation software tends to merge spots that are close to each other on the two-dimensional map, thus increasing the possibility of overlapping spots [50].
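The way missing data appear in a spot volume dataset built from independently detected and matched spots can be shown with a short sketch. Each gel is represented here as a mapping from a matched spot identifier to its volume; a spot that was not detected on a gel yields a null entry. The data, the identifiers, and the function names are hypothetical.

```python
def build_volume_table(gels):
    """Arrange per-gel spot volumes into a samples x spots table.

    A spot that is absent from a gel is stored as None (missing value).
    """
    spot_ids = sorted({spot for gel in gels for spot in gel})
    table = [[gel.get(spot) for spot in spot_ids] for gel in gels]
    return spot_ids, table


def missing_fraction(table):
    """Fraction of gels in which each spot is missing."""
    n_gels = len(table)
    return [sum(row[j] is None for row in table) / n_gels
            for j in range(len(table[0]))]


# Hypothetical matched spot volumes for four gels
gels = [
    {"s1": 120.0, "s2": 35.5, "s3": 8.1},
    {"s1": 131.2, "s2": 33.0},
    {"s1": 118.7, "s3": 7.4},
    {"s1": 125.3, "s2": 36.8},
]
spot_ids, table = build_volume_table(gels)
missing = missing_fraction(table)
for spot, frac in zip(spot_ids, missing):
    print(f"{spot}: missing in {frac:.0%} of the gels")
```

Whether a null entry reflects a biological absence or a technical problem cannot be decided from the table alone, which is why the common spot boundaries used by last-generation packages, yielding a low intensity instead of a null value, simplify the subsequent statistical analysis.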

4 Gel Warping and Matching

The final result of differential analysis is deeply affected by gel matching, which in turn is strongly influenced by gel warping. Gel matching and warping are two deeply connected steps that are essential in the analysis of 2D-PAGE maps, since the experimental technique shows a high variability as regards protein migration and therefore spot position. Warping is therefore intended to correct systematic geometric distortions. In warping procedures, two images at a time are registered: one image (source) is deformed into the second one (template), so that similar features in the images match each other [69]. Warping algorithms can exploit three different approaches [70]:
– Landmark-based methods [60, 71–73]. In these methods, a first step of spot segmentation is mandatory, since the alignment of groups of images is driven by the selection of some landmarks in the images, i.e., spots present in all the images. The landmarks consist of a group of spots considered by the algorithm as individual points, provided as input to the software. The images are then aligned exploiting polynomial functions calculated on the provided landmarks [72, 74]. In the classical approaches, low-order polynomials are used, but other alternatives are also present in the literature, such as thin plate splines (TPS) [74], hierarchical grid transformation [21, 24, 75, 76], the Iterative Closest Point procedure [70], and deformed graphs [30].
– Intensity-based methods [14, 24, 25, 77, 78], where the images are aligned through the optimization of an intensity similarity index, i.e., a mathematical function expressing the similarity between two images, such as the cross-correlation or the sum of squared intensity differences. In intensity-based methods, the segmentation step is not required: for this reason, they are now the most widespread techniques. In these methods, warping is carried out on pixel values [14]; the alignment therefore takes into consideration further information that is not used in spot-based procedures, such as spot shape and intensity spread. Usually, the correlation between two images is optimized for the alignment, but other alternatives are present in the literature [79], such as the pixel-based analysis of multiple images (PMC) [52] or fuzzy warping [14–16, 18, 24, 27, 39, 40, 55, 59, 80, 81].
– The combination of spot-based and intensity-based approaches, meant to overcome the limitations of the two single procedures [69, 82], is becoming more popular.
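The idea behind intensity-based methods can be illustrated with a minimal sketch in which the source image is registered to the template by searching for the rigid shift that minimizes the squared intensity differences. Real warping procedures use far more flexible deformations; the restriction to integer translations, the use of the mean (rather than the sum) of squared differences over the overlap, the function names, and the toy images are all simplifying assumptions.

```python
def mean_squared_difference(template, source, dy, dx):
    """Mean squared intensity difference between the template and the source
    shifted down by dy rows and right by dx columns (overlap region only)."""
    rows, cols = len(template), len(template[0])
    total, count = 0.0, 0
    for y in range(rows):
        for x in range(cols):
            sy, sx = y - dy, x - dx
            if 0 <= sy < rows and 0 <= sx < cols:
                diff = template[y][x] - source[sy][sx]
                total += diff * diff
                count += 1
    return total / count if count else float("inf")


def best_shift(template, source, max_shift=2):
    """Exhaustive search for the shift that optimizes the similarity index."""
    candidates = [(dy, dx)
                  for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda s: mean_squared_difference(template, source, *s))


def blank(rows, cols):
    return [[0] * cols for _ in range(rows)]


# Two toy maps with the same two spots, displaced by 1 row and 2 columns
template = blank(5, 6)
template[2][3], template[3][4] = 9, 5
source = blank(5, 6)
source[1][1], source[2][2] = 9, 5

print(best_shift(template, source))  # shift to apply to the source image
```

No spot segmentation is needed here: the similarity index is computed directly on pixel values, so spot shape and intensity spread contribute to the alignment.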

5 Conclusions

Here, the main steps needed for the image analysis of 2D-PAGE maps have been presented, giving an insight into the single steps and the alternative methods present in the literature. The alternatives available for image analysis of proteomic 2D-maps are numerous, and several software packages relying on different approaches are commercially available. Notwithstanding this plethora of possibilities, it is important to point out that gel-based proteomics is deeply influenced by image analysis and by the choices that must be made during this fundamental and time-consuming step of the analysis. The algorithms that have been proposed so far strongly improve the single steps of spot detection, image warping, and gel matching, but the overall procedure is far from being completely automatic, and human supervision is still needed: this constitutes one of the most relevant limitations of the gel-based approach. The software package and the specific algorithms employed during the analysis should therefore be selected with great care.

References

1. Sheehan D, Tyther R (eds) (2009) Two-dimensional electrophoresis protocols. Methods in molecular biology, vol 519. Humana Press, Totowa, NJ
2. Marengo E, Robotti E, Bobba M (2008) 2D-PAGE maps analysis. In: Vlahou A (ed) Clinical proteomics: methods and protocols. Methods in molecular biology, vol 428. Humana Press, Totowa, NJ, pp 291–325
3. Marengo E, Robotti E (eds) (2016) 2D-PAGE maps analysis. Methods in molecular biology, vol 1384. Humana Press, Totowa, NJ
4. Rye M, Fargestad EM (2012) Preprocessing of electrophoretic images in 2-DE analysis. Chemom Intell Lab Syst 117:70–79
5. Moller B, Posch S (2009) Robust features for 2-DE gel image registration. Electrophoresis 30:4137–4148
6. Srinark T, Kambhamettu C (2008) An image analysis suite for spot detection and spot matching in two-dimensional electrophoresis gels. Electrophoresis 29:706–715
7. Liu YS, Chen SY, Liu RS, Duh DJ, Chao YT, Tsai YC, Hsieh JS (2009) Spot detection for a 2-DE gel image using a slice tree with confidence evaluation. Math Comput Model 50:1–14
8. Cutler P, Heald G, White IR, Ruan J (2003) A novel approach to spot detection for two-dimensional gel electrophoresis images using pixel value collection. Proteomics 3:392–401
9. Kazhiyur-Mannar R, Smiraglia DJ, Plass C, Wenger R (2006) Contour area filtering of two-dimensional electrophoresis images. Med Image Anal 10:353–365
10. Wu Y, Lemkin PF, Upton K (1993) A fast spot segmentation algorithm for two-dimensional gel electrophoresis analysis. Electrophoresis 14:1351–1356
11. Bettens E, Scheunders P, VanDyck D, Moens L, VanOsta P (1997) Computer analysis of two-dimensional electrophoresis gels: a new segmentation and modeling algorithm. Electrophoresis 18:792–798
12. Tsakanikas P, Manolakos ES (2011) Protein spot detection and quantification in 2-DE gel images using machine-learning methods. Proteomics 11:2038–2050
13. Langella O, Zivy M (2008) A method based on bead flows for spot detection on 2-D gel images. Proteomics 8:4914–4918
14. Veeser S, Dunn MJ, Yang GZ (2001) Multiresolution image registration for two-dimensional gel electrophoresis. Proteomics 1:856–870
15. Dowsey AW, English J, Pennington K, Cotter D et al (2006) Examination of 2-DE in the Human Proteome Organisation Brain Proteome Project pilot studies with the new RAIN gel matching technique. Proteomics 6:5030–5047
16. Dowsey AW, Dunn MJ, Yang GZ (2003) The role of bioinformatics in two-dimensional gel electrophoresis. Proteomics 3:1567–1596
17. Conradsen K, Pedersen J (1992) Analysis of two-dimensional electrophoretic gels. Biometrics 48:1273–1287
18. Woodward AM, Rowland JJ, Kell DB (2004) Fast automatic registration of images using the phase of a complex wavelet transform: application to proteome gels. Analyst 129:542–552
19. Luhn S, Berth M, Hecker M, Bernhardt J (2003) Using standard positions and image fusion to create proteome maps from collections of two-dimensional gel electrophoresis images. Proteomics 3:1117–1127
20. Schultz J, Gottlieb DM, Petersen M, Nesic L, Jacobsen S, Sondergaard I (2004) Explorative data analysis of two-dimensional electrophoresis gels. Electrophoresis 25:502–511
21. Salmi J, Aittokallio T, Westerholm J, Griese M, Rosengren A, Nyman TA, Lahesmaa R, Nevalainen O (2002) Hierarchical grid transformation for image warping in the analysis of two dimensional electrophoresis gels. Proteomics 2:1504–1515
22. Panek J, Vohradsky J (1999) Point pattern matching in the analysis of two-dimensional gel electropherograms. Electrophoresis 20:3483–3491
23. Kaczmarek K, Walczak B, de Jong S, Vandeginste BG (2002) Feature based fuzzy matching of 2D gel electrophoresis images. J Chem Inf Comput Sci 42:1431–1442
24. Gustafsson JS, Blomberg A, Rudemo M (2002) Warping two-dimensional electrophoresis gel images to correct for geometric distortions of the spot pattern. Electrophoresis 23:1731–1744
25. Smilansky Z (2001) Automatic registration for images of two-dimensional protein gels. Electrophoresis 22:1616–1626
26. Sorzano CO, Arganda-Carreras I, Thevenaz P, Beloso A, Morales G, Valdes I, Perez-Garcia C, Castillo C, Garrido E, Unser M (2008) Elastic image registration of 2-D gels for differential and repeatability studies. Proteomics 8:62–65
27. Daszykowski M, Faergestad EM, Grove H, Martens H, Walczak B (2009) Matching 2D gel electrophoresis images with Matlab ‘Image Processing Toolbox’. Chemom Intell Lab Syst 96:188–195
28. Potra FA, Liu X, Seillier-Moiseiwitsch F, Roy A, Hang Y, Marten MR, Raman B, Whisnant C (2006) Protein image alignment via piecewise affine transformations. J Comput Biol 13:614–630
29. Rogers M, Graham J (2007) Robust and accurate registration of 2-D electrophoresis gels using point-matching. IEEE Trans Image Process 16:624–635
30. Noma A, Pardo A, Cesar RM (2011) Structural matching of 2D electrophoresis gels using deformed graphs. Pattern Recogn Lett 32:3–11
31. Xin HM, Zhu Y (2009) Multiple information based spot matching method for 2-DE images. Electrophoresis 30:2477–2480
32. Marengo E, Robotti E, Cecconi D, Hamdan M, Scarpa A, Righetti PG (2004) Identification of the regulatory proteins in human pancreatic cancers treated with Trichostatin-A by 2D-PAGE maps and multivariate statistical analysis. Anal Bioanal Chem 379(7–8):992–1003
33. Marengo E, Robotti E, Bobba M, Liparota MC, Rustichelli C, Zamo A, Chilosi M, Righetti PG (2006) Multivariate statistical tools applied to the characterisation of the proteomic profiles of two human lymphoma cell lines by two-dimensional gel electrophoresis. Electrophoresis 27(2):484–494
34. Marengo E, Robotti E, Bobba M, Milli A, Campostrini N, Righetti SC, Cecconi D, Righetti PG (2008) Application of partial least squares discriminant analysis and variable selection procedures: a 2D-PAGE proteomic study. Anal Bioanal Chem 390(5):1327–1342
35. Robotti E, Demartini M, Gosetti F, Calabrese G, Marengo E (2011) Development of a classification and ranking method for the identification of possible biomarkers in two-dimensional gel-electrophoresis based on principal component analysis and variable selection procedures. Mol BioSyst 7(3):677–686
36. Negri AS, Robotti E, Prinsi B, Espen L, Marengo E (2011) Proteins involved in biotic and abiotic stress responses as the most significant biomarkers in the ripening of Pinot Noir skins. Funct Integr Genomics 11(2):341–355
37. Polati R, Menini M, Robotti E, Millioni R, Marengo E, Novelli E, Balzan S, Cecconi D (2012) Proteomic changes involved in tenderization of bovine Longissimus dorsi muscle during prolonged aging. Food Chem 135(3):2052–2069
38. Marengo E, Robotti E, Righetti PG, Campostrini N, Pascali J, Ponzoni M, Hamdan M, Astner H (2004) Study of proteomic changes associated with healthy and tumoral murine samples in neuroblastoma by principal component analysis and classification methods. Clin Chim Acta 345(1–2):55–67
39. Marengo E, Robotti E, Gianotti V, Righetti PG, Cecconi D, Domenici E (2003) A new integrated statistical approach to the diagnostic use of proteomic two-dimensional maps. Electrophoresis 24(1–2):225–236
40. Marengo E, Robotti E, Righetti PG, Antonucci F (2003) New approach based on fuzzy logic and principal component analysis for the classification of two-dimensional maps in health and disease: application to lymphomas. J Chromatogr A 1004(1–2):13–28
41. Marengo E, Bobba M, Liparota MC, Robotti E, Righetti PG (2005) Use of Legendre moments for the fast comparison of two dimensional polyacrylamide gel electrophoresis maps images. J Chromatogr A 1096(1–2):86–91
42. Marengo E, Cocchi M, Demartini M, Robotti E, Bobba M, Righetti PG (2011) Investigation of the applicability of Zernike moments to the classification of SDS 2D-PAGE maps. Anal Bioanal Chem 400(5):1419–1431
43. Marengo E, Robotti E, Bobba M, Demartini M, Righetti PG (2008) A new method for comparing 2D-PAGE maps based on the computation of Zernike moments and multivariate statistical tools. Anal Bioanal Chem 391(4):1163–1173
44. Carpentier S, Panis B, Swennen R, Lammertyn J (2008) Finding the significant markers: statistical analysis of proteomic data. In: Vlahou A (ed) Clinical proteomics. Methods in molecular biology, vol 428. Humana Press, Totowa, NJ, pp 327–347
45. Jacobsen S, Grove H, Jensen KN, Sorensen HA, Jessen F, Hollung K, Uhlen AK, Jorgensen BM, Faergestad EM, Sondergaard I (2007) Multivariate analysis of 2-DE protein patterns—practical approaches. Electrophoresis 28:1289–1299
46. Faergestad EM, Rye MB, Nhek S, Hollung K, Grove H (2011) The use of chemometrics to analyse protein patterns from gel electrophoresis. Acta Chromatogr 23:1–40
47. Grove H, Jorgensen BM, Jessen F, Sondergaard I, Jacobsen S, Hollung K, Indahl U, Faergestad EM (2008) Combination of statistical approaches for analysis of 2-DE data gives complementary results. J Proteome Res 7(12):5119–5124
48. Bandow JE, Baker JD, Berth M, Painter C, Sepulveda OJ, Clark KA, Kilty I, VanBogelen RA (2008) Improved image analysis workflow for 2-D gels enables large-scale 2-D gel-based proteomics studies—COPD biomarker discovery study. Proteomics 8:3030–3041
49. Rye MB, Faergestad EM, Alsberg BK (2008) A new method for assigning common spot boundaries for multiple gels in two dimensional gel electrophoresis. Electrophoresis 29:1359–1368
50. Clark BN, Gutstein HB (2008) The myth of automated, high-throughput two-dimensional gel analysis. Proteomics 8:1197–1203
51. Morris JS, Clark BN, Wei W, Gutstein HB (2010) Evaluating the performance of new approaches to spot quantification and differential expression in 2-dimensional gel electrophoresis studies. J Proteome Res 9:595–604
52. Faergestad EM, Rye M, Walczak B, Gidskehaug L, Wold JP, Grove H, Jia X, Hollung K, Indahl UG, Westad F, van den Berg F, Martens H (2007) Pixel-based analysis of multiple images for the identification of changes: a novel approach applied to unravel proteome patterns [corrected] of 2-D electrophoresis gel images. Proteomics 7:3450–3461
53. Rye MB, Faergestad EM, Martens H, Wold JP, Alsberg BK (2008) An improved pixel-based approach for analyzing images in two dimensional gel electrophoresis. Electrophoresis 29:1382–1393
54. Van Belle W, Anensen N, Haaland I, Bruserud O, Hogda KA, Gjertsen BT (2006) Correlation analysis of two-dimensional gel electrophoretic protein patterns and biological variables. BMC Bioinformatics 7:198
55. Daszykowski M, Stanimirova I, Bodzon-Kulakowska A, Silberring J, Lubec G, Walczak B (2007) Start-to-end processing of two dimensional gel electrophoretic images. J Chromatogr A 1158:306–317
56. Rogers M, Graham J, Tonge RP (2003) Using statistical image models for objective evaluation of spot detection in two-dimensional gels. Proteomics 3(6):879–886
57. Tsakanikas P, Manolakos ES (2009) Improving 2-DE gel image denoising using contourlets. Proteomics 9(15):3877–3888
58. Tsakanikas P, Manolakos I (2007) Effective denoising of 2D gel proteomics images using contourlets. In: Image processing, 2007, ICIP 2007. IEEE international conference on 2, 1–22
59. Marengo E, Robotti E, Antonucci F et al (2005) Numerical approaches for quantitative analysis of two-dimensional maps: a review of commercial software and home-made systems. Proteomics 5:654–666
60. Appel RD, Palagi PM, Walther D et al (1997) Melanie II—a third-generation software package for analysis of two-dimensional electrophoresis images: I. Features and user interface. Electrophoresis 18:2724–2734
61. Appel RD, Vargas JR, Palagi PM et al (1997) Melanie II—a third-generation software package for analysis of two-dimensional electrophoresis images: II. Algorithms. Electrophoresis 18:2735–2748
62. Kaczmarek K, Walczak B, deJong S et al (2004) Preprocessing of two-dimensional gel electrophoresis images. Proteomics 4:2377–2389
63. Cannistraci CV, Montevecchi FM, Alessio M (2009) Median-modified Wiener filter provides efficient denoising, preserving spot edge and morphology in 2-DE image processing. Proteomics 9:4908–4919
64. Robotti E, Marengo E, Quasso F (2016) Image pretreatment tools II: normalization techniques for 2-DE and 2-D DIGE. Methods Mol Biol 1384:91–107
65. Pleissner KP, Hoffmann F, Kriegel K, Wenk C, Wegner S, Sahlstrom A, Oswald H, Alt H, Fleck E (1999) New algorithmic approaches to protein spot detection and pattern matching in two-dimensional electrophoresis gel databases. Electrophoresis 20:755–765
66. Silva E, O’Gorman M, Becker S, Auer G, Eklund A, Grunewald J, Wheelock AM (2010) In the eye of the beholder: does the master see the SameSpots as the novice? J Proteome Res 9(3):1522–1532
67. Rogers M, Graham J, Tonge RP (2003) Statistical models of shape for the analysis of protein spots in two-dimensional electrophoresis gel images. Proteomics 3:887–896
68. Berth M, Moser FM, Kolbe M, Bernhardt J (2007) The state of the art in the analysis of two-dimensional gel electrophoresis images. Appl Microbiol Biotechnol 76:1223–1243
69. Rohr K, Cathier P, Worz S (2004) Elastic registration of electrophoresis images using intensity information and point landmarks. Pattern Recogn 37:1035–1048
70. Shi G, Jiang T, Zhu W, Liu B, Zhao H (2007) Alignment of two-dimensional electrophoresis gels. Biochem Biophys Res Commun 357:427–432
71. Raman B, Cheung A, Marten MR (2002) Quantitative comparison and evaluation of two commercially available, two-dimensional electrophoresis image analysis software packages, Z3 and Melanie. Electrophoresis 23:2194–2202
72. Lemkin PF (1997) Comparing two-dimensional gels across the internet. Electrophoresis 18(3–4):461–470
73. Efrat A, Hoffmann F, Kriegel K, Schultz C, Wenk C (2002) Geometric algorithms for the analysis of 2D-electrophoresis gels. J Comput Biol 9(2):299–316
74. Bookstein FL (1989) Principal warps: thin-plate splines and the decomposition of deformation. IEEE Trans Pattern Anal Mach Intell 11:567–585
75. Marengo E, Cocchi M, Demartini M, Robotti E, Cecconi D, Calabrese G (2012) GENOCOP algorithm and hierarchical grid transformation for image warping of two dimensional gel electrophoretic maps. Mol BioSyst 8(4):975–984
76. Robotti E, Marengo E, Demartini M (2016) GENOCOP algorithm and hierarchical grid transformation for image warping of two-dimensional gel electrophoretic maps. Methods Mol Biol 1384:165–184
77. Baker M, Busse H, Vogt M (2000) An automatic registration and segmentation algorithm for multiple electrophoresis images. In: Hanson KM (ed) Medical imaging 2000—Image processing (MI’2000), Proceedings of the SPIE international symposium, vol 3979, 14–17 February 2000, San Diego/CA, pp 426–436
78. Josso B, Zindy E, Aldemir H (2000) Automatic 2D gel registration using distance minimisation of image morphing. In: Proceedings of the IEEE international conference on information visualization (IV’00), London, England, 19–21 July 2000, pp 357–361
79. Glasbey CA, Mardia KV (2001) A penalized likelihood approach to image warping. J R Stat Soc B 63:465–514
80. Rodriguez A, Fernandez-Lozano C, Dorado J, Rabuñal JR (2014) Two-dimensional gel electrophoresis image registration using block-matching techniques and deformation models. Anal Biochem 454:53–59
81. Lin D-T (2010) Autonomous sub-image matching for two-dimensional electrophoresis gels using MaxRST algorithm. Image Vis Comput 28:1267–1279
82. Worz S, Winz M-L, Rohr K (2009) Geometric alignment of 2D gel electrophoresis images. Methods Inf Med 48:320–323

Chapter 2

Chemometric Tools for 2D-PAGE Data Analysis

Elisa Robotti, Elisa Calà, and Emilio Marengo

Abstract

Two-Dimensional Polyacrylamide Gel Electrophoresis (2D-PAGE) provides two-dimensional maps where proteins appear separated according to their isoelectric point (pI) and molecular weight (MW). Usually these maps are very complex (i.e., hundreds or thousands of spots can be present in each map) and characterized by a low reproducibility, which hinders the identification of reliable biomarkers unless robust methods are applied. The analysis of different sets of 2D-PAGE maps (e.g., control vs. pathological or control vs. drug-treated samples) to identify candidate biomarkers (proteins under- or over-expressed in different conditions) is usually carried out through image analysis systems providing a so-called spot volume dataset, where each sample corresponds to a map described by the optical densities of all the detected spots. The identification of candidate biomarkers can therefore be accomplished by comparing different maps by classical monovariate statistical tests applied spotwise, or by multivariate chemometric tools applied to the entire set of spots present on each map. Here, the most exploited multivariate techniques will be considered, ranging from pattern recognition to classification methods.

Key words 2D-PAGE, Proteomics, Principal component analysis, Partial least squares discriminant analysis, Ranking-PCA, Linear discriminant analysis, Cluster analysis, Biomarker identification

1 Introduction

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) is a very widespread technique in proteomics for the separation of complex protein mixtures in different fields: medical applications, drug design, food chemistry, and environmental studies. All these apparently heterogeneous applications have a common denominator, namely the need to identify biomarkers of a particular condition: the onset of a pathology in the medical field; the effect of an active principle in drug-design studies; the effect of ripening, storage conditions, etc. in food chemistry; the effect on humans, animals, plants, and microorganisms of the exposure to particular environmental conditions or pollution, in the field of environmental biology.

Daniela Cecconi (ed.), Proteomics Data Analysis, Methods in Molecular Biology, vol. 2361, https://doi.org/10.1007/978-1-0716-1641-3_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


2D-PAGE methodology is applied to separate complex protein mixtures, providing a two-dimensional map where the proteins, separated according to their isoelectric point (pI) and molecular weight (MW), appear as spots spread along the gel matrix. Maps from different conditions (e.g., control vs. pathological samples) are then analyzed through image analysis software (see Chapter 1 for more details) to provide a set of protein spots quantified on each gel by means of their optical density. Candidate biomarkers can therefore be identified in spot volume datasets by different approaches via differential analysis, and further characterized by mass spectrometry. This strategy suffers from some drawbacks, the worst being the low reproducibility often affecting 2D-PAGE: this can be partially overcome by collecting different replicates of the same experimental condition or exploiting 2D difference gel electrophoresis (DIGE); however, the number of replicates is usually low (