Oh the Glory of It All [1 ed.] 9780471662587

Content: Overview of proteomics -- Proteomic tools for analysis of cellular dynamics -- Dynamics of functional cellular


English Pages 274 Year 2005


PROTEOMIC BIOLOGY USING LC-MS

PROTEOMIC BIOLOGY USING LC-MS Large Scale Analysis of Cellular Dynamics and Function

NOBUHIRO TAKAHASHI TOSHIAKI ISOBE

Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved. Published by John Wiley & Sons, Inc., Hoboken, New Jersey Published simultaneously in Canada No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, 201-748-6011, fax 201-748-6008, or online at http://www.wiley.com/go/permission. Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at 877-762-2974, outside the United States at 317572-3993 or fax 317-572-4002. 
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Wiley Bicentennial Logo: Richard J. Pacifico

Library of Congress Cataloging-in-Publication Data:
Takahashi, Nobuhiro. Proteomic biology using LC-MS: large scale analysis of cellular dynamics and function / Nobuhiro Takahashi, Toshiaki Isobe. p. ; cm. Includes bibliographical references and index. ISBN-13: 978-0-471-66258-7 (cloth)
1. Proteomics. 2. Liquid chromatography. 3. Mass spectrometry. I. Isobe, Toshiaki. II. Title.
[DNLM: 1. Proteomics–methods. 2. Chromatography, Liquid–methods. 3. Mass Spectrometry–methods. QU 58.5 T136p 2007]
QP551.T115 2008 572'.6–dc22 2006102196

Printed in the United States of America

CONTENTS

Introduction

1 Overview of Proteomics
  1-1 What Is Proteomics?
  1-2 Establishment of Proteomics Using Mass Spectrometry
    1-2-1 Protein Separations by Two-Dimensional Electrophoresis
    1-2-2 Development of the Technologies for Protein Identification
    1-2-3 Protein Identification Based on Gel Separation and Mass Spectrometry
  1-3 Strategies for Characterizing Proteomes and Understanding Proteome Function
    1-3-1 Modification-Specific Proteomics
    1-3-2 Activity-Based Protein Profiling/Enzyme Substrate Proteomics
    1-3-3 Subcellular (Organellar) Proteomics
    1-3-4 Machinery or Complex Interaction Proteomics
    1-3-5 Dynamic Proteomics
  References

2 Proteomic Tools for Analysis of Cellular Dynamics
  2-1 LC-Based Proteomics Technologies
    2-1-1 LC-MS System for Peptide Separation and Identification
    2-1-2 Application of LC-MS Methods to Functional Proteomics
  2-2 Development of Quantitative Proteomics
    2-2-1 Isotope Labeling for Quantitative Analysis Using MS
    2-2-2 Quantification Strategy for LC-MS Analysis of Isotope-Labeled Peptide Mixture and Software for Computer Analysis
    2-2-3 Label-Free Quantification Software
    2-2-4 Absolute Quantification
  References

3 Dynamics of Functional Cellular Machinery: From Statics to Dynamics in Proteomic Biology
  3-1 Dynamic Analysis of Cellular Function
    3-1-1 Strategy for Dynamic Analysis of Cellular Machinery (Multiprotein Complexes)
    3-1-2 Methods for the Isolation of a Cellular Machine/Multiprotein Complex
    3-1-3 Cellular Machinery (Multiprotein Complex)
  3-2 Dynamics of Ribosome Biogenesis
    3-2-1 Snapshot Analysis of Preribosomal Particles in Yeast
    3-2-2 Snapshot Analysis of Preribosomal Particles in Mammals
    3-2-3 Quantitative (Dynamic) Analysis Using Isotope-Labeled Reagents
    3-2-4 Outline of Human/Mammalian Ribosome Biogenesis
  3-3 Dynamic Analysis of Subcellular Structures
    3-3-1 Proteome Dynamics of the Nucleolus
  References

Index

INTRODUCTION

Proteins are involved in all biological processes and can therefore be considered the functionally most important biological molecules. The systematic identification and characterization of proteins and of the “proteome” (a word coined to describe the set of proteins encoded by a genome), called “proteomics,” is becoming one of the major interdisciplinary research areas of life science in the postgenomic era. Like other interdisciplinary research areas categorized as genome science (such as genomics, transcriptomics, metabolomics, and bioinformatics), proteomics has become an essential component of the emerging “systems biology” approach toward the computational simulation of complex biological systems. The advent of proteomics and the other new “omics” fields is changing our understanding of living things and our approach to unsolved biological problems. Proteomics is a particularly rich source of biological information because proteins are involved in almost all biological activities and have diverse properties, which collectively contribute greatly to our understanding of biological systems (1). Currently, a number of international monthly journals are published specifically to cover proteomics studies worldwide, including Electrophoresis, Proteomics, Molecular & Cellular Proteomics, and Journal of Proteome Research. These journals have already gained a reputation for high standards in the papers they publish. The number of publications in this field has been increasing exponentially (Fig. I-1); in 2005 alone, more than 3300 papers were published in those and other authoritative journals, such as Nature and Science. In the last decade, when the field of proteomics was developing fastest, most studies focused on developing fundamental technologies to analyze proteomes and on describing rather static aspects of the proteome in living systems.
Fig. I-1. Number of publications on proteomics research. The data were collected by searching the PubMed database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed) for the term “proteome”* or “proteomics.”

A typical technology in classical proteomics is a combination of two-dimensional electrophoresis and off-line mass
spectrometry, such as MALDI-ToFMS. Although this was a powerful tool in conventional protein science, the current proteomics has expanded more and more to the large scale and dynamic aspects of proteomes, including the analysis of not only all proteins expressed in a cell but also of their status of post-translational modifications, the global analysis of protein–protein interactions and dynamics of functional multiprotein cellular complexes, and the analysis of “postgenomic” matter. Proteomics is a constantly advancing field of state-of-the-art technologies. In this book, we emphasize that the proteome is a dynamic, not static, entity. Advanced proteomics methodologies, such as liquid chromatography (LC)-based mass spectrometry (MS) technology, enhance highly sensitive, high throughput, and comprehensive identification of protein components in complex biological samples. For instance, a typical LC-MS technology can identify several thousand proteins per single run in a crude protein mixture with a subpicomole (or maybe subfemtomole) quantity of sample. Thus, the analysis has been used to catalog proteins and to find novel components in many functional multiprotein complexes such as cell signaling complexes and transcriptional/translational machineries, cellular organelles such as nucleolus and endoplasmic reticulum, and the constituents of a variety of normal and aberrant cells and tissues. In addition, LC-MS technology coupled with and without stable isotope-labeling of proteins is emerging as an innovative tool for quantitative profiling to evaluate the dynamics of protein components in the cells, and thereby allows moment-by-moment snapshot analyses of cellular functions. This book focuses on the latest LC-MS technologies in depth and their biological application to the analysis of dynamic aspects of cellular function, giving rise to a new insight into the molecular basis of cells.

We begin with an overview of proteomics that introduces the concept, reviews the history of proteomic analysis and the state of the art of proteomics technologies, and describes a number of successful biological applications with extensive reference coverage (Chapter 1). In Chapter 2, we describe the basic aspects of LC-MS technology, including the principle of the method and the assembly of the LC-MS system; we explain the integrated LC-based MS methodologies, coupled with bioinformatics to search genome databases, for large scale and high throughput protein analyses. We also present experimental examples that use a nanoliter scale LC-MS/MS technology for femtomole scale protein identification and a fully automated multidimensional LC-MS/MS technology for proteome-wide protein identification. In addition, we extend those technologies by integrating isotope-labeling methodologies for the quantitative evaluation of a large set of proteomes and methodologies that allow specific, large scale identification of post-translational protein modifications, including phosphorylation, glycosylation, and ubiquitination. The goal of proteomics is a genome-wide survey of protein dynamics that provides a bird’s-eye view of the protein society of the cell. The current technologies still need further improvement in this respect: they cannot describe the complete proteome expressed in a cell and its dynamics, but they do have sufficient capability to address the dynamic aspects of more focused, specific cellular functions, including those of an entire organelle. Thus, one of the primary goals of current proteomics research is to describe the composition, dynamics, and connections of the multiprotein “modules” and “machineries” that perform a wide range of biological functions in the cell.
Therefore, in Chapter 3, we focus on the global analysis of protein interactions and multiprotein cellular complexes, which include the affinity capture of a variety of specific cellular complexes by using epitope tags or tandem purification tags and the highly sensitive identification of protein components by the shotgun method with a highly sensitive nanoliter scale LC-MS/MS system. Applicability of those methodologies is shown with the most recent applications that include the analysis of signal transduction complexes following stimulation of cells and the analysis of the dynamic processes of ribosome biogenesis. It is widely accepted that the cell signaling process is a cascade of reactions that start with a cell surface receptor and frequently end at molecular machineries in the nucleus with changes in an RNA/protein expression level. The process is time dependent and involves a series of events that include modifications and the formation of a variety of signal transduction complexes. Characterization of complexes was challenging not only because they have many components but also because most of the complexes are unstable and transient. Proteomics technologies start to reveal parts of the molecular machines such as those involved in transcription, pre-mRNA splicing, and ribosome biogenesis. To explain the concept in more concrete terms, we describe the details of ribosome biogenesis from the proteomics point of view that integrates its role into global cellular function. One of the biggest unsolved problems in cell biology lies in ribosome biogenesis. The ribosome constitutes one of the most fundamental,

and probably one of the most complex, molecular machines in living cells. Given that protein synthesis is essential for cell growth, proliferation, and adaptation, ribosome biogenesis is intimately coupled to the needs of the cell, or one of the outlets of the cell signaling process. Ribosome biogenesis is an extremely complex and dynamic process in that hundreds of proteins and RNAs are involved in the processing of rRNA and the assembly of ribosomal proteins. Methodologies that attack such extremely complex cellular process were not available until proteomics technologies appeared. Chapter 3 describes how proteomics technologies can apply to the analysis of such complex and dynamic processes of the essential cellular function, and includes an extensive catalog of mammalian trans-acting factors in comparison with their yeast orthologs involved in ribosome biogenesis. The approaches and the knowledge obtained may be useful for researchers working on growth and proliferation of mammalian cells. The approaches described in this book can be applicable to the dynamic analysis of many other cellular processes and thus will provide an advanced guide for students who want to learn about proteomics, for investigators and laboratory staff in academic institutions and industries, as well as for the new generation of scientists in training. Current proteomics research has been moving away from merely collating lists of proteins and mapping interactions analyzed in an early phase of the development, to a more integrated approach, in which proteomic datasets are interpreted in the context of other types of biological data, such as those generated by the approaches of genomics and transcriptomics as well as many other conventional disciplines in biology. Eventually, data infrastructure that allows the capture and dissemination of proteomic data has to be established to accomplish this. 
Furthermore, systems biology approaches that detect feedback loops and connections between pathways that have eluded biochemical and genetic analyses carried out on an individual researcher level have to be embraced (2). Therefore, we must fill the gaps between the disciplines that have been focused on addressing specific questions in biology and those that have mostly been involved in the development of technologies for proteomics. This book points the way to the dynamic analysis of cellular functions in proteomics using LC-MS technology not only for geneticists and molecular biologists who are moving away from the study of genomics and genotype to that of proteomics and phenotype, but also cell biologists, developmental biologists, and neuroscientists, who are reluctant to get into proteomics because of the interest gap between their individual research and proteomics research.

REFERENCES

1. Patterson, S. D., and Aebersold, R. H. (2003). Proteomics: the first decade and beyond. Nat. Genet. 33:311–323.
2. Editorial (2003). A cast of thousands. Nat. Biotechnol. 21:213.

1 OVERVIEW OF PROTEOMICS

1-1 WHAT IS PROTEOMICS?

The term proteomics is derived from a word “proteome” coined by M. Wilkins in 1995, which indicates “the entire protein complement expressed by a genome, or by a cell or tissue type” (1). The proteome is the time- and cell-specific protein complement of the genome, encompassing all proteins expressed in a cell at any given time. The study of the proteome, compared to the genome, is much more daunting for several reasons; while the genome of the cell is constant, nearly identical for all cells of an organ or organism, and consistent across a species, the proteome is extremely complex and dynamic as it continuously responds to environmental changes due to the presence of other cells, nutritional status, temperature, and drug treatment, to name only a few (2). Proteomics, which is a new field of interdisciplinary science derived from gene sequencing and classical protein chemistry, deals with the proteome; an initial goal of proteomics is the rapid identification of all the proteins expressed by a cell or tissue, although it has yet to be achieved for any species. The task of identifying and quantifying large scale proteins present in a cell or tissue or even an entire organism at a particular time is often referred to as proteome analysis. Because the proteome varies from time to time, any analysis of the proteome is a “snapshot.” As a result, there is no fixed proteome. Moreover, the dynamic range of protein expression within the proteome complicates study of a proteome; it may vary by as much as 7–12 orders of magnitude compared to only five orders of magnitude for DNA (3). The number of genes encoded in the genome of a species is limited

[Figure 1-1: flow diagram. Genome (DNA), of which a subset of genes is transcribed in a given cell and tissue; transcription yields the transcriptome, comprising noncoding RNAs (rRNA, tRNAs, snRNAs, snoRNAs, scRNAs, etc.) and coding RNAs (pre-mRNAs/mRNAs); translation yields the proteome (proteins); biochemical processes yield the metabolome (metabolites). RNA-level labels: RNA editing/modification, alternative RNA splicing, degradation (amplifying complexity). Protein-level labels: protein folding, proteolytic cleavage, chemical modification, intein splicing, interaction/assembly, degradation (amplifying complexity).]

Fig. 1-1. Complexities of a proteome regulated at transcriptional and translational levels.

and ranges from a few hundred for bacteria to tens of thousands for mammalian species; however, the theoretical number of proteins defined by a genome alone is large enough that it is a struggle to identify them all by proteome analysis using currently available technologies. In addition, proteins are composed of 20 distinct amino acids, each of which has its own chemical and physical characteristics, including molecular weight (Mr) and isoelectric point (pI), and they differ in amino acid sequence and length; thus, they are extremely heterogeneous in chemical and physical characteristics. To make matters worse, the number of protein species produced from a genome is amplified, as the same gene can generate multiple protein products that differ as a result of RNA editing and alternative splicing of pre-mRNA, and of processing and chemical modification of the translated polypeptides (Fig. 1-1). For instance, the number of proteins expressed by the approximately 24,000 human genes can be up to 100 times greater owing to the known diversity of mRNA processing as well as to post-translational modifications of proteins (2). The main difficulty in performing proteome analysis lies in these diverse properties of proteins generated at post-transcriptional and post-translational levels, in addition to the large number of proteins encoded by the genes assigned in a genome. Thus, it is a formidable challenge to develop proteome analysis technologies that can simultaneously handle a huge number of proteins with extremely heterogeneous characteristics and identify such extremely diversified proteins in a comprehensive and flexible manner. Proteomics should be distinguished from conventional protein chemistry, which deals mostly with “individual proteins” to determine sequence, modification state, interaction partner, activity, structure, and so on. Protein chemistry was the

mainstream approach of biological science in the 1970s but decreased in use during the flourishing era of genetic engineering in the 1980s. Proteomics, however, incorporated a number of basic ideas and technologies from conventional protein chemistry, and advanced those with new ideas incorporated into postgenomic science and emerging technologies to handle proteins on a large scale with small quantities and in a systematic way. In other words, protein chemistry has evolved and has been reincarnated as proteomics backed by technological advancements mainly in mass spectrometry and the accumulation of huge protein sequence data as a consequence of the global genomic sequencings of many species, which have catalyzed an expansion of the scope of biological studies from reductionist biochemical analysis of single proteins to proteome-wide measurements (4). Because proteomics deals with huge numbers of proteins in extremely complex mixtures of cells, tissues, and even organisms, the technologies used in proteomics have to achieve comprehensive and high throughput identification of proteins within an appropriate time.
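
The scale of the problem described in this section can be made concrete with a little arithmetic. The Python sketch below is only an illustration: it uses the approximate figures quoted above (about 24,000 human genes, up to 100 protein products per gene, and dynamic ranges of 5 and 7–12 orders of magnitude), and the function names are hypothetical rather than part of any proteomics software.

```python
# Back-of-the-envelope proteome complexity, using the approximate
# figures quoted in the text (illustrative values, not measurements).

HUMAN_GENES = 24_000          # approximate human gene count from the text
MAX_PRODUCTS_PER_GENE = 100   # upper-bound multiplier from the text


def max_protein_species(genes: int, products_per_gene: int) -> int:
    """Upper bound on distinct protein species implied by the two figures."""
    return genes * products_per_gene


def abundance_ratio(orders_of_magnitude: int) -> int:
    """Fold difference between the most and least abundant species
    for a dynamic range given in orders of magnitude."""
    return 10 ** orders_of_magnitude


if __name__ == "__main__":
    print("protein species (upper bound):",
          max_protein_species(HUMAN_GENES, MAX_PRODUCTS_PER_GENE))
    # 5 orders is the figure quoted for DNA; 7 and 12 bracket the proteome.
    for orders in (5, 7, 12):
        print(f"{orders} orders of magnitude ->",
              f"{abundance_ratio(orders):,}-fold")
```

Under these assumptions the upper bound is on the order of millions of protein species, which is why a single separation or identification method cannot be expected to cover an entire proteome.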

1-2 ESTABLISHMENT OF PROTEOMICS USING MASS SPECTROMETRY

1-2-1 Protein Separations by Two-Dimensional Electrophoresis

The first step of proteome analysis is to separate proteins using high resolution and high speed techniques. Conventional protein chemistry had developed methodologies that separate proteins with high resolution, such as two-dimensional electrophoresis (2DE), and with high performance, such as high performance liquid chromatography (HPLC). Two-dimensional electrophoresis, which is performed essentially according to the method described in 1975 by O’Farrell, involves, in the first dimension, a separation by isoelectric focusing in gel rods containing urea and detergents and, in the second dimension, a perpendicular separation in acrylamide (gradient) slab gels containing sodium dodecyl sulfate (SDS) (5). Because denaturing reagents are used in both dimensions, the secondary and tertiary structures of the polypeptides are broken down, and the proteins are distinguished by the physicochemical characteristics of the polypeptides themselves; namely, proteins are separated first by isoelectric point (pI) (based on the electric charge of the polypeptides) and second by polypeptide length (based on the molecular weight of the polypeptides). Those two properties are not correlated at all; thus, proteins are spread effectively over the resulting two-dimensional pattern. The use of denaturing conditions in both dimensions is required in order to obtain high resolution and to distinguish a single charge difference, especially between two small proteins (but not always between large proteins), which may be generated by a single amino acid substitution or post-translational modification. [The single charge difference produces a much greater shift in the pI of a small protein (or subunit) than of a large one (or oligomer).] In addition, 2DE can also detect some differences in molecular weight, which may be produced by proteolytic cleavage and/or post-translational modification. [The molecular weight differences are more easily detected by examination of individual polypeptides (of small proteins) rather than assembled oligomers (of large

TABLE 1-1. Typical Description of 2DE and 2D-DIGE

Two-Dimensional Electrophoresis
  Apparatus:
    First dimension: IPGphor/Multiphor II/HoeferDALT (Amersham Biosciences); Dry IPG gels (Immobiline, APB; Bio-Rad)
    Second dimension: Ettan Dalt (Amersham Biosciences)
  Staining method: Coomassie brilliant blue, silver ion, SYPRO Ruby (Molecular Probes)
  Image scanner: GS710 Imaging densitometry, Molecular Imager FX, FluorS multiimager (Bio-Rad)
  Image analysis software: Melanie 3/PDQuest (Bio-Rad), ImageMaster 2D Elite 3.0/Database 3.0 (Amersham Biosciences)

2D-DIGE, Two-Dimensional Differential Image Gel Electrophoresis
  Prelabeling of proteins: CyDye NHS (Cy3 550 nm, Cy5 649 nm)
    Minimal dye labels at lysine residue (Cy2 = MW 550.59; Cy3 = MW 582.76; Cy5 = MW 580.74)
    Saturation dyes at cysteine residue (Cy3 = MW 672.85; Cy5 = MW 684.86)
  Fluorescence image scanner: 2D ImageMaster, Typhoon (Amersham Biosciences); FLA5000 (Fuji Film Co.)
  Image analysis software: DeCyder, PDQuest (Bio-Rad)

proteins) under denatured conditions.] Thus, 2DE is a suitable method to distinguish proteins that differ in chemical structure, to provide an inventory of the polypeptides in a mixture, and to create a protein catalog (5). Although the resolution of a 2DE separation depends on the dimensions of the slab gel and on the pH gradient used for isoelectric focusing in the first-dimension gel rod, a typical 2DE separates 1000–3000 protein species, which can be detected by protein staining with Coomassie blue, silver ion, or fluorescent dyes such as SYPRO Ruby (6, 7) (Table 1-1). Larger slab gels achieve much better separation and can separate over 10,000 protein species on a single gel in some cases, although a large amount of sample is required. To date, 2DE has the highest resolution of the available technologies used to separate proteins in biological samples. A staining pattern on a 2DE gel seemingly gives a whole view of a proteome, but it is more correctly a subset view and does not necessarily represent the entire set of proteins expressed in the analyzed biological sample. Comparing the 2DE staining patterns obtained from two or more (related or unrelated) biological samples enables us to estimate the changes or differences between their proteomes, either by comparing independent 2DE gels visualized by conventional staining methods or by 2D-DIGE (two-dimensional differential image gel electrophoresis). Coomassie blue dye has traditionally been used to stain polyacrylamide gels but does not provide the sensitivity needed for visualizing low abundance proteins. Silver staining is more sensitive but is not optimal for quantitation purposes because of its narrow dynamic range (8). Consequently, fluorescent dyes that are very sensitive and have a large linear dynamic range have

been developed (9), although the cost of the dyes and of the equipment used for visualization, in addition to the variability between 2D gels, has restricted their use (10). In 2D-DIGE, two or more pools of proteins are labeled with different fluorescent dyes (11, 12), and the labeled proteins are mixed and separated in the same 2DE gel; thus, 2D-DIGE enables one to compare protein expression between two samples, or among several samples, on a single 2DE gel. Development of the immobilized pH gradient as the first-dimensional separation medium of 2DE (in which the pH gradient is fixed within the acrylamide matrix) popularized 2DE as a protein separation method for proteome analysis. In addition, the principles used for the global, quantitative analysis of gene expression, such as the use of clustering algorithms and multivariate statistics that had been developed in the context of 2DE (13, 14), produced various image analysis apparatus and software for quantitative comparison of 2DE gels and further popularized 2DE as a methodology for proteome analysis (Table 1-1).

1-2-2 Development of the Technologies for Protein Identification

The second step of proteome analysis is to identify the proteins separated by high resolution technologies such as the 2DE described above. One of the archetypal proteome analyses was that applied to human plasma, where proteins were separated by 2DE and identified by a combination of various methods, including Western blotting, coelectrophoresis of purified proteins, and Edman sequencing of proteins extracted from gels (5). Although human plasma had been profiled extensively by those analyses in the early 1980s, a large number of antibodies and purified proteins were required for Western blotting and coelectrophoresis, respectively. It was therefore obvious that those approaches were not suitable for the large scale identification of proteins.
Conventional protein chemistry had made great efforts in technological development for identification of proteins separated by gels in the 1980s and early 1990s (4). Edman sequencing technology, which had been the most common technique to determine the amino acid sequence of a purified protein or enzymatically digested peptides, was applied to proteins either extracted in solution or blotted onto membrane by electroelution from gels and facilitated protein identification on gels. The gas phase protein sequencer (based on Edman sequencing technology) improved the sensitivity of protein sequencing. However, because the sequence database contained only a part of the total protein list, the chance that a particular protein and/or gene sequence was already represented in a sequence database was quite low. Therefore, during the 1980s and early 1990s, the main purpose of sequencing proteins separated on gels was either to synthesize degenerate oligonucleotide primer for cloning of its corresponding gene or to link the activity of a purified protein and amino acid sequence (4). Genomic sequence analyses began to flourish and provided entire genomic sequences of bacteria in the mid-1990s, and subsequently established sequences for a number of other species including human beings by the early 2000s. The data of genomic sequences facilitated protein identification with even a partial sequence of proteins separated on gels and allowed one to specify the complete sequence of proteins (genes) without the need for further experimentation. Thus, the first stage

of proteomic analysis used Edman sequencing technology to identify proteins separated by 2DE. The N-terminal sequence or a partial peptide sequence could be used to retrieve the analyzed protein from the protein/gene sequence database; however, Edman sequencing was not sensitive enough to determine the amino acid sequence of many proteins obtained from 2DE gels, and it was not applicable to proteins whose amino terminus (N terminus) was blocked by a chemical modification, such as acetylation or pyroglutamate formation. Consequently, a number of proteins on 2DE gels remained unidentified by Edman sequencing. In addition, Edman sequencing was time consuming; thus, it was not suitable for systematic and rapid identification of proteins. With the normalized sequence database established as a result of the completion of a number of genomic sequencing analyses, a rapid methodology for protein identification was eagerly awaited.

1-2-3 Protein Identification Based on Gel Separation and Mass Spectrometry

Mass spectrometry (MS) measures precise molecular weight and distinguishes closely related molecular species, and it had been a powerful method for the physicochemical analysis of small molecules long before protein chemists began to apply it to the determination of protein sequence in the late 1970s. In principle, a mass spectrometer comprises an ionization source (which adds charge to a neutral analyte, that is, forms an ion) and a mass analyzer (which measures the mass-to-charge ratio, m/z, of the ion). Thus, the analyte for mass spectrometry has to be ionized before being transferred into the high vacuum chamber of the mass analyzer (4). At the earliest stage, protein chemists used fast atom bombardment (FAB) as the ionization source and a magnetic sector as the mass analyzer (Fig. 1-2), but

Fig. 1-2 panel labels. Ion source: fast atom bombardment (FAB); electrospray ionization (ESI); matrix-assisted laser desorption (MALDI). Mass analyzer: magnetic sector; triple/quadrupole; ion-trap (IT); hybrid; time-of-flight (ToF); Fourier transformed ion cyclotron resonance (FT-ICR).

Fig. 1-2. Mass spectrometers used for proteomics research. Mass spectrometers are composed of two major compartments with different functions: an ion source that generates ions of target molecules and a mass analyzer that estimates the mass-to-charge ratio (m/z) of target ions. In mass spectrometers for proteomics research that analyze mostly proteins and peptides, an ESI or MALDI ion source is most frequently coupled with a triple-stage quadrupole (TSQ), ion-trap (IT), time-of-flight (ToF), or Fourier transformed ion cyclotron resonance (FT-ICR) mass analyzer. In some mass spectrometers, multiple mass analyzers are combined in a single instrument to perform “tandem mass spectrometry” for structural analysis of target molecules.
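As a small worked illustration of the quantity that a mass analyzer reports, the following sketch computes m/z for an ion formed by protonation. It assumes that every charge comes from an added proton (an [M + zH] ion with z positive charges); the function names and the example mass are illustrative only and do not come from the text.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_of_protonated_ion(neutral_mass: float, charge: int) -> float:
    """Return m/z for a neutral mass M (Da) carrying `charge` added protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz: float, charge: int) -> float:
    """Invert the relation above: recover M from an observed m/z and charge."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    # Hypothetical peptide of 1500 Da observed at charge states 1 to 3.
    for z in (1, 2, 3):
        print(z, round(mz_of_protonated_ion(1500.0, z), 4))
```

The same neutral mass therefore appears at a different m/z for each charge state, which is why the charge must be known before a molecular weight can be assigned.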

ESTABLISHMENT OF PROTEOMICS USING MASS SPECTROMETRY


had only limited success for very small peptides. By the standards of mass spectrometry at the time, proteins and even peptides were very large molecules, and it was difficult to form ions in a vacuum or at atmospheric pressure without destroying the molecules. Protein sequence determination with a mass spectrometer was therefore hampered for a considerable time, until new methods for the ionization of large molecules were developed. In the late 1980s, two methods, matrix-assisted laser desorption ionization (MALDI) (15) and electrospray ionization (ESI) (16), were developed that allowed peptides and proteins to be ionized at high efficiency and without excessive destruction of the molecule. MALDI is a process in which proteins and peptides are desorbed by short laser pulses fired at a sample in which they are cocrystallized with a matrix on a sample plate held under vacuum (Fig. 1-3A). The matrices are chemical compounds that absorb the energy of the laser and promote

Fig. 1-3 (A) panel labels: 1. Pulse laser, 33 ns, 337 nm. 2. Excitation of matrix molecules fired by laser pulses. 3. Ionization of target molecules. Target molecules. Matrix molecule: α-cyano-4-hydroxycinnamic acid, sinapinic acid, 2,5-dihydroxybenzoic acid, 3-hydroxypicolinic acid, etc. Metal sample plate.

Fig. 1-3. Schematic illustration of (A) matrix-assisted laser desorption ionization (MALDI) and (B) electrospray ionization (ESI) of biological molecules for mass spectrometric analysis. MALDI is performed by embedding target molecules in an excess of a specific wavelength-absorbing matrix, such as α-cyano-4-hydroxycinnamic acid, which is dried to produce a cocrystallized mixture on a metal sample plate. Ions are produced by bombarding the sample with short-duration pulses of ultraviolet (UV) light from a nitrogen laser. The interaction of the laser pulse with the sample results in ionization, usually protonation, of both matrix and target molecule via an energy transfer mechanism from the matrix to the embedded target, rather than by direct laser ionization. ESI creates gas-phase ions by applying a potential to a flowing liquid that contains the target and solvent molecules. A fine spray of microdroplets is generated upon application of a high electric tension (typically ±3–5 kV) through a needle chip. Solvent is removed as the droplets enter the mass spectrometer by heat or some other form of energy, such as energetic collisions with an inert gas.

Fig. 1-3. (Continued) (B) panel labels: High voltage; Needle chip; Mass analyzer (vacuum).

ionization of the proteins and peptides. The ESI process, on the other hand, is achieved by spraying a solution through a charged needle tip at atmospheric pressure toward the inlet of the mass analyzer (Fig. 1-3B). The voltage applied to the needle tip results in the formation of ions, and the pressure difference results in their transfer into the mass analyzer. In combination with mass analyzers (Fig. 1-2), these ionization methods made it possible to determine the molecular weights of peptides and proteins with high precision. Although MALDI ion sources can be coupled with any of the time-of-flight (ToF), ToF-ToF, quadrupole, and Fourier transformed ion cyclotron resonance (FT-ICR) mass analyzers (Fig. 1-2), MALDI-ToF mass spectrometers are often used for obtaining single-stage mass spectrometry (MS) spectra, which provide mass information on all ionizable components in a sample and determine the masses of proteins or peptides with a high degree of accuracy (4). MALDI-ToF mass spectrometers are especially suitable for determining the masses of the peptides generated by cleaving an isolated protein with an enzyme of known cleavage specificity, such as trypsin (which cleaves at the C-terminal side of lysine and arginine) or endopeptidase Lys-C (which cleaves at the C-terminal side of lysine only). The collected list of peptide masses is matched against the masses calculated for the same proteolytic digestion of each entry in a sequence database by database search algorithms (available at http://www.mann.emble-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html, http://prospector.ucsf.edu/ucsfhtml4.0/msfit.html, http://www.matrixscience.com, http://prowl.rockefeller.edu/, http://us.expasy.org/tools/peptident.html, etc.) that were originally reported by several independent groups in 1993 (17–20). This approach is


called the peptide mass fingerprinting (PMF) method (Fig. 1-4) and has greatly facilitated the identification of proteins separated by polyacrylamide gel electrophoresis in conjunction with in-gel protease digestion, in which a protein is proteolytically fragmented in a gel piece excised from an electrophoresed gel (Fig. 1-4B, Protocol 1.1) (21–23). The in-gel protease digestion–PMF method using MALDI-ToF was sensitive enough to identify a protein present in even a single spot on a 2DE gel stained with silver or a fluorescent dye, both of which are highly sensitive methods for detecting proteins on a 2DE gel. The PMF method is far less time consuming than Edman sequencing; thus, it is now a common approach for identifying proteins separated by 2DE. The MS approach in which the protein is first purified and cleaved into peptides, whose relative molecular weight (Mr) values are measured by MS and used for protein identification, is often called the bottom–up approach (3, 24, 25); an approach in which the protein mixture is introduced directly into the MS instrument without digestion, separated with high resolution, and dissociated to measure the resulting fragment masses that are matched

Fig. 1-4. (A) Protein identification by in-gel protease digestion and peptide mass fingerprinting. In this method, a target protein, such as that separated by two-dimensional electrophoresis, is in-gel digested with a sequence-specific protease, usually trypsin, and the resulting peptide mixture is analyzed by mass spectrometry, thus generating a peptide mass fingerprint. Experimentally determined peptide masses are compared with those obtained theoretically by a search algorithm such as MSFit or Mascot. (See insert for color representation.) (B) A typical experimental protocol for in-gel protease digestion of proteins separated by polyacrylamide gel electrophoresis. The resulting peptide mixture is analyzed directly on a mass spectrometer with a MALDI or ESI ion source.
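The matching step of peptide mass fingerprinting can be sketched in a few lines. The example below is a minimal illustration and not the algorithm of MSFit, Mascot, or any other published search engine: it assumes complete cleavage after every lysine and arginine (no missed cleavages and no exception for proline), uses standard monoisotopic residue masses, and scores each database entry simply by the number of observed masses it explains. The two-entry database, the protein names, and the observed masses are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added per free peptide


def tryptic_digest(sequence):
    """Cleave after every K or R (simplified: no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.05):
    """Number of observed masses explained by the in silico digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tol for theo in theoretical)
        for obs in observed_masses
    )


def rank_proteins(observed_masses, database, tol=0.05):
    """Sort database entries by the number of matched peptide masses."""
    scores = {name: count_matches(observed_masses, seq, tol)
              for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    database = {"protA": "MKTAYIAKQR", "protB": "GASPVKLLER"}  # hypothetical
    observed = [277.15, 665.37, 302.17]  # hypothetical neutral masses in Da
    print(rank_proteins(observed, database))
```

Real search engines also account for missed cleavages, modifications, and the statistical significance of a match, all of which this sketch omits.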


Fig. 1-4. (Continued)

against a protein database is called the top–down approach (see Section 2-2) (26). The specificity of the protease used for in-gel digestion, the number of peptides identified from each protein species by mass spectrometry, and the mass accuracy of the mass spectrometer are key elements in successful protein identification using the PMF approach; thus, the PMF method is applicable only to the identification of a single protein or a few proteins, separated mostly by gel-based techniques (1DE, 2DE). When the PMF approach is not sufficient for identification, peptides can be sequenced by tandem MS (MS/MS) with an ESI ion source, and the fragmentation pattern can be used to identify the protein in a database, even on the basis of one or a few peptide sequences. Although ESI ion sources were originally coupled with a triple-quadrupole or ion trap mass analyzer, they can be used in conjunction with most available mass analyzers, including hybrid-type MS/MS analyzers such as quadrupole-ToF and ToF-ToF MS/MS analyzers (Fig. 1-2). These MS/MS instruments are equipped with a mass filter that can select a peptide ion from a mixture of peptide ions (that is, isolate specific ions on the basis of their m/z ratio), a collision cell in which the selected precursor ion collides with a noble gas and is fragmented in a sequence-dependent manner into a series of product ions (a process referred to as collision-induced dissociation, CID), and a second mass analyzer that records the fragment ion mass spectrum (Fig. 1-5A) (4). Thus, these


Fig. 1-5. (A) Protein identification by tandem, or MS/MS, mass spectrometry. In this method, individual peptides from a peptide mixture are isolated in the first step in the mass spectrometer and fragmented by collision-induced dissociation (CID), that is, by collision with an inert gas in a collision cell, during the second step in order to obtain the structural information of the peptide. (See pages 12–13 for more detailed information) (See insert for color representation). (B) Major fragment ions produced by tandem mass spectrometry of polypeptide chain. CID mainly induces the cleavage of peptide at peptide bonds and generates a series of fragment ions from the amino or carboxyl terminus of the peptides, which are termed “b” or “y” series ions, respectively. (C) Sequence tag. In tandem mass spectrometry, a single particular protein can be identified from the structural information of a single peptide produced by digestion with a sequence-specific protease, such as trypsin. The identification is based on the precise molecular mass of a particular peptide (mass-1), the internal amino acid sequence, which serves as a specific “tag” of the peptide, and the molecular masses of the remaining amino and carboxyl terminal portion of the peptides (mass-2 and mass-3). The computer algorithm, such as that developed by Mann and Wilm (27), selects a set of potential peptides having mass-1 from a database and then specifies one of those by the sequence tag and mass-2 and mass-3.
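The b and y series described in panel (B) can be computed directly from a peptide sequence. The sketch below is illustrative only: it assumes singly protonated fragments, unmodified residues, and standard monoisotopic masses, and the function name and example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(peptide):
    """Return (b, y): m/z lists for singly charged b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]

    # b ions grow from the amino terminus: residue sum plus one proton.
    b_ions, running = [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    # y ions grow from the carboxyl terminus: residue sum plus water and a proton.
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("PEPTIDE")  # hypothetical example peptide
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

Because adjacent ions in a series differ by exactly one residue mass, the mass differences between neighboring peaks spell out the sequence, which is what makes the fragmentation sequence dependent.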

Fig. 1-5 (B) panel labels: b series ions (b1 to b5) read from the amino terminus and y series ions (y5 to y1) read from the carboxyl terminus, drawn on the backbone H2N-...-OH of a peptide with side chains R1 to R6.

Fig. 1-5 (C) panel labels: Sequence Tag; Mass-1; Mass-2; Mass-3; peptide K-...-E-I/L-A-I/L-V-...-K; candidate tag readings EIAIV, ELAIV, EIALV, ELALV.

Fig. 1-5. (Continued)
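The sequence tag search outlined in panel (C) can be sketched as a two-step filter: first select the candidate peptides whose total mass equals mass-1, then keep those in which the tag occurs at a position where the flanking portions weigh mass-2 and mass-3. This is a simplified illustration and not the published algorithm of reference 27. In particular, it assumes that mass-2 and mass-3 are the summed residue masses on the amino-terminal and carboxyl-terminal sides of the tag, and it treats the isobaric residues I and L as interchangeable. The candidate list is hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def residue_sum(seq):
    """Sum of residue masses (no terminal water added)."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def tag_positions(peptide, tag):
    """Start indices where the tag occurs, treating I and L as equal."""
    pep, t = peptide.replace("I", "L"), tag.replace("I", "L")
    positions, start = [], pep.find(t)
    while start != -1:
        positions.append(start)
        start = pep.find(t, start + 1)
    return positions


def search_by_tag(candidates, mass1, tag, mass2, mass3, tol=0.01):
    """Keep candidates that satisfy mass-1, the tag, mass-2, and mass-3."""
    hits = []
    for pep in candidates:
        if abs(residue_sum(pep) + WATER - mass1) > tol:
            continue  # wrong total peptide mass
        for pos in tag_positions(pep, tag):
            n_part = residue_sum(pep[:pos])
            c_part = residue_sum(pep[pos + len(tag):])
            if abs(n_part - mass2) <= tol and abs(c_part - mass3) <= tol:
                hits.append(pep)
                break
    return hits


if __name__ == "__main__":
    candidates = ["KEIAIVK", "KELALVK", "KAIEIVK", "GGEIAIVR"]  # hypothetical
    mass1 = residue_sum("KELAIVK") + WATER
    k = RESIDUE_MASS["K"]
    print(search_by_tag(candidates, mass1, "ELAIV", k, k))
```

In this toy search, a peptide with the same composition but a different residue order passes the mass-1 filter and is rejected by the tag, whereas candidates that differ only in I versus L cannot be told apart, which is why the figure lists several readings of the same tag.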

instruments are especially suitable for generating an MS/MS spectrum (or CID spectrum), that is, the fragment ion spectrum of a specific peptide ion, which is generated in an amino acid sequence-dependent manner (Fig. 1-5B). This capability is coupled with the development of algorithms that can retrieve protein entries from a database with either the peptide sequence tag assigned from the MS/MS spectrum (Fig. 1-5C) (27) or the MS/MS spectrum itself (28–30). The ESI-MS/MS approach has become a rapid and reliable method for protein identification. It is referred to as the sequence tag or MS/MS ion search method and identifies proteins with much higher specificity than the PMF method. In addition, ESI-MS/MS can be combined with high performance liquid chromatography (HPLC), which separates peptides and supplies them continuously to the mass spectrometer. This system, known as LC-MS/MS, provides an additional powerful and reliable approach for protein identification and is described in detail in Section 2-2-1. Most mass spectrometry-based analyses identify proteins by searching databases, commonly with Mascot (30) and/or SEQUEST (31). Such analyses generate datasets that include peptide/protein assignments and variables that yield detailed


information on protein structure and function. The resulting datasets generally need further evaluation and reorganization, because they often include ambiguous peptide identifications and redundant peptide assignments for each protein; thus, data processing in proteomics studies is both labor intensive and time consuming (32). A number of software packages have been developed to expedite the data processing step. Of those, Autoquest (33), SEQUEST SUMMARY (31), DTAselect (34), and INTERACT (35), for example, are designed to organize and rearrange peptide/protein identification results. These software packages permit rapid data processing after identification of proteins by SEQUEST. STEM (STrategic Extractor for Mascot’s results, available at http://www.sci.metro-u.ac.jp/proteomicslab/), on the other hand, is a stand-alone computational tool designed to process Mascot search data; it evaluates, integrates, and compares large datasets produced by Mascot (36). The web-based software DBParser has also been developed for the analysis of large scale data obtained from LC-MS/MS (37). The results obtained by STEM and DBParser actively link to the primary mass spectral data and to public online databases such as NCBI, GO, and Swiss-Prot in order to structure contextually specific reports for biologists and biochemists.

With the rapid advances in protein analytical technologies in the early 1990s, a huge number of protein constituents in a biological sample, such as a cell or tissue extract separated by 2DE, could be identified with much higher sensitivity and speed than with the technologies available before the mass-based technologies emerged. The expansion of protein databases, fueled by the large scale genomic sequencing of many species, gave a strong push to the large scale identification of proteins by those approaches.
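The kind of reorganization that search results need (removing redundant peptide assignments and flagging ambiguous ones) can be illustrated with a short sketch. It does not reproduce the logic of any of the software packages named above; the record layout (spectrum identifier, peptide, protein, score), the score threshold, and the example data are all hypothetical.

```python
from collections import defaultdict


def summarize(psms, min_score):
    """Group peptide assignments by protein.

    psms: iterable of (spectrum_id, peptide, protein, score) tuples.
    Returns {protein: {"unique_peptides": [...], "shared_peptides": [...]}}.
    """
    # Step 1: drop low-scoring assignments and collapse redundant ones,
    # keeping only the best score for each (protein, peptide) pair.
    best = {}
    for _spectrum_id, peptide, protein, score in psms:
        if score < min_score:
            continue
        key = (protein, peptide)
        if key not in best or score > best[key]:
            best[key] = score

    # Step 2: find peptides assigned to more than one protein (ambiguous).
    proteins_by_peptide = defaultdict(set)
    for protein, peptide in best:
        proteins_by_peptide[peptide].add(protein)
    shared = {p for p, prots in proteins_by_peptide.items() if len(prots) > 1}

    # Step 3: build a per-protein report.
    report = defaultdict(lambda: {"unique_peptides": [], "shared_peptides": []})
    for protein, peptide in sorted(best):
        field = "shared_peptides" if peptide in shared else "unique_peptides"
        report[protein][field].append(peptide)
    return dict(report)


if __name__ == "__main__":
    psms = [
        ("s1", "AAK", "P1", 50.0),
        ("s2", "AAK", "P1", 30.0),  # redundant assignment of the same peptide
        ("s3", "CCR", "P1", 45.0),
        ("s4", "CCR", "P2", 45.0),  # peptide shared between two proteins
        ("s5", "DDK", "P2", 10.0),  # below the score threshold
    ]
    for protein, entry in summarize(psms, min_score=20.0).items():
        print(protein, entry)
```

A protein supported only by shared peptides, such as P2 in this toy dataset, is the sort of ambiguous identification that calls for further evaluation.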
As described in Section 1-2-1, changes in a subset of the proteome could be seen on 2DE gels as different spots after the separation of proteins present in cells or tissues, similar to the appearance or disappearance of mRNAs in a differential display analysis or on DNA arrays. The PMF and/or MS/MS ion search method was immediately applied to the large scale identification of those changed proteins on 2DE gels, as well as to the identification of the unchanged proteins. Because the methodologies based on 2DE and mass spectrometry were so much more powerful in sensitivity and speed than the methods of conventional protein chemistry, this kind of approach dominated the first generation of proteomic studies. In fact, in the mid- to late 1990s proteome studies consisted mostly of quantitative comparison and mass spectrometry-based identification of proteins separated by 2DE, and they resulted in a number of online 2DE gel-based protein profiling and catalog databases for various organisms, tissues, and cells, including plasma and urine, which had been used for clinical diagnosis (38–42) (http://kr.expasy.org/ch2d/, http://kr.expasy.org/ch2d/2d-index.html). Intense interest grew in applying the approach to the development of new biomarkers for diagnosis and early detection of disease; this led to the identification of a number of disease-related changes in protein expression, including those associated with heart disease and various cancers (43–51). Despite the great advances in large scale protein identification for proteomic analysis, the 2DE mass spectrometry-based identification method came to be recognized as a rather low throughput approach that requires a relatively large amount of protein sample, even after all the improvements in highly sensitive detection methods,


automated spot cutting and in-gel digestion, automated mass spectrometry analysis, and so on had been made. The requirement for a large amount of protein sample was particularly problematic for clinical samples, since such samples are generally procured in limited amounts (51). In addition, clinical samples contain heterogeneous cells (that is, they are mixtures of various types of cells) and thus are extremely complex in terms of protein constituents, which makes their proteomic analysis much more difficult than that of samples used in basic biology, such as bacteria and cultured cells. To reduce the heterogeneity of clinical samples, various tissue microdissection approaches can be used. For example, laser-capture microdissection allows one to isolate defined cell types from tissues, provides very useful samples for comparative analysis between cells of normal and disease-affected areas, and reduces tissue heterogeneity; however, it yields only a small amount of protein, making it difficult to meet the need for the greater amounts required by 2DE (47). In addition, 2DE has technical limitations on protein separation: it can resolve proteins mostly in the pI range of 3 to 11 and in the molecular weight (MW) range of several to 150 kDa. Other limitations facing proteomic analysis using 2DE, although not restricted to the 2DE-based approach, include the inability to cover the great dynamic range of protein abundance [which spans an estimated five to six orders of magnitude for yeast cells (52) and more than ten orders of magnitude for human serum (e.g., from interleukin-6 at ∼2 pg/mL to albumin at 50 mg/mL) (53)] and the extent of hydrophobicity and post-translational modifications. Although 2DE is the highest resolving protein separation method known, its separation was also recognized to be incomplete in many cases; the incidence of comigrating proteins is rather high (54).
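The serum figure quoted above can be checked with a short calculation. The sketch converts both concentrations to the same unit and takes the base-10 logarithm of their ratio; the unit table and function name are illustrative, and the two concentrations are the ones given in the text.

```python
import math

# Conversion factors to grams per millilitre.
UNIT_TO_G_PER_ML = {"pg/mL": 1e-12, "ng/mL": 1e-9, "ug/mL": 1e-6, "mg/mL": 1e-3}


def orders_of_magnitude(low, low_unit, high, high_unit):
    """Dynamic range between two concentrations, as log10 of their ratio."""
    low_g = low * UNIT_TO_G_PER_ML[low_unit]
    high_g = high * UNIT_TO_G_PER_ML[high_unit]
    return math.log10(high_g / low_g)


if __name__ == "__main__":
    # Interleukin-6 at about 2 pg/mL versus albumin at 50 mg/mL.
    span = orders_of_magnitude(2, "pg/mL", 50, "mg/mL")
    print(f"{span:.2f} orders of magnitude")
```

The ratio is 2.5 × 10^10, that is, a little more than ten orders of magnitude, consistent with the estimate for human serum.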
Because quantification in 2DE relies on the assumption that one protein is present in each spot, comigration jeopardizes comparative quantification analysis with 2DE. Furthermore, when unfractionated cell lysates were separated by 2DE, only a subset of the cellular proteome was visualized with the available protein staining methods (4). Despite these limitations, however, it is certain that the 2DE mass spectrometry-based methodology has revolutionized the protein identification strategy and changed ways of thinking in the fields of protein chemistry and biology.

As an estimate of all the constituents in the proteome of a single cell, a global analysis of protein expression has been carried out using a yeast fusion library, in which each open reading frame is tagged with a high affinity epitope and expressed from its natural chromosomal location (http://www.yeastgenome.org/chromosomeupdates/start_changes.shtml) (55). A census of proteins expressed during log-phase growth, together with measurements of their absolute levels through immunodetection of the common tag, suggested that about 80% of the proteome is expressed during normal growth conditions and that the abundance of proteins ranges from fewer than 50 to more than 10^6 molecules per cell (http://yeastgfp.ucsf.edu/). This experiment augments efforts to view the proteome using MS-based protein identification technologies and provides a comprehensive and sensitive view of the expressed proteome in a eukaryotic cell (56). This estimate can also be used to validate the capability of newly developed proteomics techniques, including MS-based protein identification methods.

STRATEGIES FOR CHARACTERIZING PROTEOMES


1-3 STRATEGIES FOR CHARACTERIZING PROTEOMES AND UNDERSTANDING PROTEOME FUNCTION

One of the main purposes of proteomic biology is to identify the particular members of proteomes that participate in specific biological processes, to assign a function to each, and to devise strategies for their selective modulation (57). However, it was realized that a mere description of the protein components of a proteome and their abundance did not meet the requirements for reaching that goal. Backed by the development of powerful proteomics technologies, typically using gel-based mass spectrometry, new strategies for characterizing functional aspects of proteomes began to take shape. One pioneering work adopting such a new strategy is the systematic identification of in vivo substrates of the chaperonin GroEL by using 2DE and mass spectrometry (58). GroEL has an essential role in mediating protein folding in the cytosol of Escherichia coli and is involved in the folding of ∼10% of newly translated polypeptides in vivo, whereas in vitro it interacts with almost any nonnative model protein. Thus, GroEL has an obvious preference for a subset of E. coli proteins in vivo; however, only a few of the proteins constituting the subset were known. In a proteomic approach to elucidating the in vivo substrates of GroEL, the newly synthesized substrates of GroEL were isolated by large scale immunoprecipitation with anti-GroEL antibodies in the presence of EDTA, which prevents the ATP-dependent release of protein substrates from GroEL. The study also analyzed the structurally unstable substrates of GroEL that were generated by heat shock and prepared by the same immunoprecipitation method. The PMF method using MALDI-ToF was applied to the analysis of the immunoprecipitated substrates after separation by 2DE (first dimension, 13 cm Immobiline DryStrip pI 4–7L; second dimension, 11 cm SDS-PAGE gel MW 5–110 kDa) and staining with Coomassie blue (58, 59).
The analysis revealed that GroEL interacts strongly with a well-defined set of about 300 newly translated polypeptides out of the 2500 polypeptides in the cytosol of E. coli, of which about one-third are structurally unstable and return to GroEL for conformational maintenance. Interestingly, GroEL substrates consisted preferentially of two or more domains with αβ folds, which contain α helices and buried β sheets with extensive hydrophobic surfaces. Because those proteins were expected to fold slowly and to be prone to aggregation, the hydrophobic binding regions of GroEL are well adapted to interact with the nonnative states of αβ-domain proteins. Thus, the systematic analysis of GroEL substrates using 2DE gel mass spectrometry and database comparisons successfully highlighted the key structural features that determine the interacting proteins’ need for chaperonins during protein folding in vivo (58). This work provides an example of a unique strategy for identifying proteins associated with a particular biological activity, thereby taking a step toward functional identification of a proteome.

Thus, two main types of strategy with complementary objectives have been formed in proteomics so far: one is the global characterization of protein expression, referred to as expression proteomics, descriptive proteomics, or cataloging proteomics (61, 62); the other is the characterization of proteome function, referred to as functional

TABLE 1-2. Strategies for Quantitative Proteomics

Expression proteomics (descriptive proteomics)
Disease (clinical) proteomics
Functional proteomics (focused proteomics)
Modification-specific proteomics
Activity-based protein profiling/enzyme substrate proteomics
Subcellular (organelle) proteomics
Machinery or complex (interaction) proteomics
Dynamic proteomics

proteomics or focused proteomics (Table 1-2) (3, 63). As described in Section 1-2-3, large scale efforts to measure protein expression have typically relied on 2DE mass spectrometry-based methods and on the LC-MS/MS-based methods described in later sections, which are capable of simultaneously evaluating relative abundance and permitting the identification of new proteins associated with discrete physiological and/or pathological states. By focusing on measurements of protein abundance, however, expression proteomics (descriptive or cataloging proteomics) provides only an indirect assessment of protein function and may fail to detect important post-translational forms of protein regulation, such as those mediated by enzymatic activities and protein–protein and/or protein–biomolecule interactions (64, 65). To expedite the analysis of post-translational forms of protein regulation, focused proteomics and/or functional proteomics have formed subdivided strategies (66, 67), each of which analyzes a limited subset of a proteome with common features, including a particular post-translational modification, an enzymatic activity, a specific cellular localization, or the functional relationship among proteins in an identified protein cluster. Those strategies are also intended to fill the gap between the ideal proteome analysis, which would completely characterize the entire proteome, and the inability to do so because of current technological limitations (60, 68). The subdivided strategies for functional or focused proteomics include modification-specific proteomics, which focuses on mapping post-translational modifications (69, 70), and activity-based protein profiling (ABP), in which a chemical probe is used to label and isolate an enzyme from a complex mixture, allowing a search for the substrates of a specific enzyme and/or of each mechanistically distinct enzyme class (64, 71–73).
Those strategies also include subcellular (organelle) proteomics, which focuses on mapping the proteomes of subcellular structures or organelles (74–76); machinery (complex interaction) proteomics, which focuses on mapping functional multiprotein complexes, cellular machinery, and interactions (interactomes) (77–83); and dynamic proteomics, which deals with the need to monitor protein-redistribution events (Table 1-2) (75, 84–87). The development of various new strategies is also ongoing (88). Those subdivided strategies are well designed for attacking proteome functions from a broad point of view of post-translational forms of protein regulation, as follows.

1-3-1 Modification-Specific Proteomics

Post-translational modifications (PTMs) of proteins are covalent processing events that change the properties of a protein by proteolytic cleavage or by the addition of a modifying group to one or more amino acids, and they can determine its activity state, localization, turnover, and interactions with other proteins (69). Many vital cellular processes are governed not only by the relative abundance of proteins but also by these PTMs. However, the full extent and functional importance of protein modifications in the working cell are not well understood because of a lack of suitable methods for their large scale study. At least 200 different PTMs are known (http://en.wikipedia.org/wiki/Posttranslational_modification) (89); the best characterized PTMs in eukaryotes are phosphorylation and glycosylation, but other common PTMs include acetylation, methylation, ubiquitination, and sumoylation (Table 1-3). A single protein can be modified not only at multiple sites within the molecule but also by various combinations of those PTMs. In addition, protein modifications at given sites are typically not homogeneous; a specific modification may occur only partially at a given site of the protein. The amount of protein in a single modification state can therefore be a very small fraction of the total population of the protein, so that the specification of a PTM at a given site generally requires a large amount of the protein. Furthermore, a single gene can give rise to a number of gene products as a result of alternative splicing. All of these factors contribute to the extreme complexity of an entire proteome and make it difficult to characterize even a specific type of PTM on the proteomic scale. Many techniques for mapping PTMs were developed in classical protein chemistry and are now being examined, together with new ideas and technologies, for their applicability on the proteomic scale.
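The combinatorial argument above can be made concrete with a small enumeration. The sketch below lists every modification state of a protein with a few modifiable sites and estimates what fraction of the protein population occupies each state. It rests on two loudly stated assumptions that are not in the text: each site is modified independently of the others, and each site has a fixed fractional occupancy. The site names and occupancies are hypothetical, and the nominal mass change of 80 Da per phosphate group is the value listed in Table 1-3.

```python
from itertools import product


def modification_states(occupancy, mass_shift=80.0):
    """Enumerate all on/off combinations of the given sites.

    occupancy: {site_name: fraction of molecules modified at that site}.
    Returns a list of (modified_sites, total_mass_shift, fraction) tuples,
    assuming that the sites are modified independently.
    """
    sites = list(occupancy)
    states = []
    for pattern in product((False, True), repeat=len(sites)):
        fraction = 1.0
        for site, modified in zip(sites, pattern):
            p = occupancy[site]
            fraction *= p if modified else (1.0 - p)
        modified_sites = tuple(s for s, m in zip(sites, pattern) if m)
        states.append((modified_sites, len(modified_sites) * mass_shift, fraction))
    return states


if __name__ == "__main__":
    # Three hypothetical phosphorylation sites, each 10% occupied.
    occupancy = {"S10": 0.1, "T25": 0.1, "Y40": 0.1}
    for sites, shift, fraction in modification_states(occupancy):
        print(sites or "(unmodified)", f"+{shift:.0f} Da", f"{fraction:.4f}")
```

With n sites there are 2^n possible states, and under these assumptions the fully modified form of the hypothetical protein above accounts for only 0.1% of the population, which illustrates why a large amount of protein is needed to specify a PTM at a given site.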
Those efforts are called modification-specific proteomics; they have now been reported for a number of different PTMs and, especially in phosphorylation and glycosylation analyses, are beginning to yield results for proteome-wide PTM analysis (Table 1-3) (69, 90). In an early stage of modification-specific proteomics, 2D-PAGE was used as the first choice for mapping PTMs. 2D-PAGE has the highest resolution of the known protein-separation methods and in some cases has sufficient resolution to separate the modification states of a protein. For example, modifications that change the charge of a protein, such as phosphorylation and glycosylation, result in a horizontal shift of protein spots and may be detected on 2D-PAGE gels. The same modification may also result in a change of molecular weight, depending on the molecular weight of the modifying group. However, mobility changes on 2D-PAGE gels alone specify neither the protein nor the type of modification. Because MS measures the mass-to-charge ratio (m/z), yielding the molecular weight and fragmentation pattern of peptides derived from proteins, it represents a general method for all modifications that change the molecular weight (69). Thus, in modification-specific proteomics, MS methods are used in conjunction with methods for protein separation as common technologies to identify proteins, the type of PTM, and/or the specific sites of the modification in proteins (Table 1-3). Although 2D-PAGE is one choice for detecting PTM of proteins in combination with detection methods such as Western blotting, chemical labeling,

18

Catalyzed by protein kinase

Phosphorylation

Modification Type

92

93, 136

Ser/Thr

133, 134

131, 132

References to Proteomic Scale Analysis

Ser/Thr

80

Mass Change (Da)

135

PO4⫺

Functional Group

Tyr

Tyr, Ser/Thr, His/Asp (in prokaryote) Tyr, Ser/Thr

Amino Acid Modified

TABLE 1-3. Post-translational Modifications

Enrichment of proteins with phospho-Ser/ phospho-Thr by immunoprecipitation. Immunodetection with antibody against phospho-Tyr, phospho-Thr, or phospho-Ser. Precursor ion scanning in positive-ion mode utilizing the immonium ion of phosphotyrosine (called phosphotyrosinespecific immonium ion scanning) on a Q-ToF after immunopurification with antiphosphotyrosine antibody Modification of phosphopeptides with free sulfhydryls that are then trapped by covalent attachment to iodoacetic acidlinked glass beads. Acid elution regenerates phosphopeptides, which are then analyzed by MS. β-Elimination and addition of biotinyliodoacetamidyl-3,6-dioxaoctanediamine containing either four alkyl hydrogen or four alkyl deuterium atoms. Biotinylated peptides are further purified in a second avidin-binding step and analyzed by MS.

In vivo labeling with 32P/33P (ATP/GTP)

Principle for Experimental Specification of a Modification

19

139, 140 141, 142

143 144

Tyr, Ser/Thr Tyr, Ser/Thr

138

Ser/Thr

Tyr, Ser/Thr Tyr

137

Ser/Thr

(continued)

β-Elimination and addition of aminoethylcysteine convert phospho-Ser/phospho-Thr into Lys analogs aminoethylcysteine and βmethylaminoethylcysteine, respectively. Trypsin cleaves the peptide bond at the C-terminal side of the converted amino acid. Immobilized aminoethylcysteine can be used to capture proteins/peptides with Ser/Thr phosphorylation sites. An immobilized library of partially degenerate phosphopeptides biased toward a particular protein kinase phosphorylation motif is used to isolate phospho-binding domains that bind to proteins phosphorylated by specific kinase. Top–down MS/bottom–up MS/MS. In vivo labeling with 12C and 13C tyrosine and immunoisolation with anti-phosphotyrosine antibody for quantitative comparison. 32 P labeling and Edman sequencing/2DE. Immunoprecipitation of targeted phosphorylated proteins from cell extract labeled with SILAC, quantitative FTICR-MS analysis to monitor the kinetics of multiple, ordered phosphorylation events on protein players in the canonical mitogen-activated protein kinase signaling pathway.a

147–151

146

Ser/Thr

Tyr, Ser/Thr

145

Tyr

References to Proteomic Scale Analysis

Phosphorylation

Mass Change (Da)

Modification Type

Functional Group

Amino Acid Modified

TABLE 1-3. (Continued)

Oligonucleotide-tagged multiplex assay. Multiple SH2 domains are labeled by domain-specific oligonucleotide tags and applied as probes to complex protein mixtures in a multiplex reaction, and phosphotyrosine-specific interactions are quantified by PCR. The method involves phosphorylation of proteins using ATP-γS and the selective in situ alkylation of the resultant thiophosphorylated proteins, resulting in a stable covalent bond. The thiophosphate-specific alkylating reagent can be linked to biotin or a solid support (e.g., glass or Sepharose beads) with or without a photocleavable linker to facilitate convenient, high-yield isolation of phosphorylated peptides/proteins. Enrichment of phosphoproteins/phosphopeptides with immobilized metal affinity chromatography (IMAC) and analysis by MS.

Principle for Experimental Specification of a Modification


Asn

Catalyzed by a series of glycosyltransferases

Asn

Glycosylation

Oligosaccharide

>800

153

152


Isotope-coded glycosylation-site-specific tagging (IGOT) is based on the lectin column-mediated affinity capture of a set of glycopeptides generated by tryptic digestion of protein mixtures, followed by the peptide:N-glycosidase-mediated incorporation of a stable isotope tag, 18O, specifically into the N-glycosylation site. The 18O-tagged peptides are then assigned by 2D-LC-MS/MS. Proteins from two biological samples are oxidized and coupled to hydrazide resin. Nonglycosylated peptides are removed by proteolysis and extensive washes. The N termini of the glycopeptides are isotope-labeled with succinic anhydride carrying either d0 or d4. The beads are then combined and the isotopically tagged peptides are released by peptide-N-glycosidase F (PNGase F). The recovered peptides are then identified and quantified by MS/MS.


Methylation

14

154

156

CH3

>203, >800

References to Proteomic Scale Analysis

Arg

Glycosylation

Mass Change (Da)

155

Thr/Ser

Modification Type

Functional Group

Ser/Thr

Amino Acid Modified


Four arginine methyl-specific antibodies (ASYM24 and ASYM25 are specific for asymmetrical DMA; SYM10 and SYM11 recognize symmetrical DMA), which were generated by using peptides with aDMA or sDMA in the context of different RG-rich sequences, are used to immunoprecipitate proteins, which are then analyzed by microcapillary LC-MS/MS.

Mild β-elimination followed by Michael addition of dithiothreitol (Cleland’s reagent, DTT) (BEMAD) or biotin pentylamine (BAP) to tag O-GlcNAc sites (as well as phosphorylation sites). The tag allows for enrichment via affinity chromatography and is stable during collision-induced dissociation, allowing for site identification by LC-MS/MS. An immunoaffinity and enzymatic strategy is provided to discriminate between O-GlcNAc and phosphorylation sites with the use of BEMAD. The approach involves lectin isolation, 2D-PAGE, in-gel protease digestion, and MS analysis.

Principle for Experimental Specification of a Modification


140

CH3CO

Ubiquitin (polypeptide)

N terminus

Lys

Acetylation

Ubiquitination

159

140

N terminus

>1000

158

Arg/Lys

42

157

Arg

Catalyzed by methyltransferase


The proteins are purified by immunoaffinity chromatography with anti-ubiquitin antibody under denaturing or native conditions. They are then digested with trypsin, and the resulting peptides are analyzed by 2D-LC-MS/MS. After tryptic digestion, the ubiquitination site carries a Gly-Gly dipeptide remnant (+114.1 Da).

Top–down MS/bottom–up MS/MS.

Direct LC-MS/MS identification of isolated Golgi fraction. Isotopically heavy [13CD(3)]methionine is metabolically converted to the sole biological methyl donor, [13CD(3)]S-adenosyl methionine, by the SILAC method. Heavy methyl groups are fully incorporated into in vivo methylation sites, directly labeling the PTM. Methylated proteins are isolated by using antibodies targeted to methylated residues. Identification and relative quantitation of protein methylation are done by LC-MS/MS. Top–down MS/bottom–up MS/MS.


Sumoylation

Ubiquitination

Catalyzed by several enzymes, including a ubiquitin-activating enzyme (E1), a ubiquitin-conjugating enzyme (E2), a ubiquitin-protein ligase (E3), and a deubiquitinating enzyme (DUB)

Modification Type

Amino Acid Modified

Lys

Lys


Small ubiquitin-like modifier 1 (SUMO-1, 101 AA polypeptide)

Functional Group

∼12 kDa

Mass Change (Da)

161

160

References to Proteomic Scale Analysis

The method involves an in vitro expression cloning (IVEC) screen for SUMO-1 substrates. Briefly, DNA was in vitro transcribed and translated (IVT) in reticulocyte lysate in the presence of 35S-methionine and subjected to in vitro sumoylation reactions, which contained IVT product, Aos1-Uba2, Ubc9, and SUMO-1. Sumoylation was detected by SDS-PAGE followed by autoradiography.b

The method involves expression of 6×His-tagged ubiquitin, isolation by affinity chromatography on a Ni-chelate resin column, proteolysis with trypsin, and analysis by 2D LC-MS/MS.

Principle for Experimental Specification of a Modification


Catalyzed by a cascade of enzymes, including SUMO isopeptidases (SENPs), SUMO-activating enzyme (a heterodimer of Aos1-Uba2), SUMO-conjugating enzyme (Ubc9), and SUMO ligases

SUMO-1, SUMO-2, SUMO-3

SUMO-1, SUMO-2

Lys

Lys

165

162–164


His6-SUMO-1 or His6-SUMO-2 is expressed stably in HeLa cells. These cell lines and control HeLa cells are labeled with stable arginine isotopes. His6-SUMOs are enriched from lysates using immobilized metal affinity chromatography. Quantitative proteomics analyzed the target protein preferences of SUMO-1 and SUMO-2. d

The overall strategy involves the development of a stable transfected cell line expressing a double-tagged SUMO under a tightly negatively regulated promoter, followed by the induction of the expression and conjugation of the tagged modifier to cellular proteins, the use of a tandem affinity purification (TAP) method for the specific enrichment of the modified proteins, and the identification of the enriched proteins by LC-MALDI-MS/MS. c


Sumoylation

Modification Type

SUMO

Lys

SUMO-2

Functional Group

SUMO

Amino Acid Modified

Lys

Mass Change (Da)

168

167

166

References to Proteomic Scale Analysis

A stable HeLa cell line expressing His6-tagged SUMO-2 was established and used to label and purify novel endogenous SUMO-2 target proteins. Tagged forms of SUMO-2 were functional and localized predominantly in the nucleus. His6-tagged SUMO-2 conjugates were affinity purified from nuclear fractions and identified by mass spectrometry. The method involves expression of SUMO with both N-terminal (His)6 and FLAG tags, two-step isolation of the sumoylated proteins by Ni-NTA chromatography and FLAG affinity purification in tandem, and identification by LC-MS/MS analysis using an LTQ FTMS. The method involves expression of SUMO with an N-terminal (His)8 tag, one-step isolation of the sumoylated proteins by Ni-NTA chromatography, and identification by multi-LC-MS/MS–SEQUEST analysis after digestion with Lys-C and trypsin.

Principle for Experimental Specification of a Modification


Catalyzed by NO synthase

S-nitrosylation

Cys

NO

30

170

169


The approach, termed SNOSID (SNO Site Identification), is a modification of the biotin-switch technique, comprising methylthiolation of all Cys-thiols in a protein mixture, followed by selective reduction of S–NO bonds, thereby generating a new unmodified thiol at each former SNO-Cys site. These new thiols are then marked by introduction of a mixed-disulfide bond with a biotin-tagging reagent, followed by capture on immobilized avidin, trypsinolysis, affinity purification of biotinylated peptides, and amino acid sequencing by LC-MS/MS. e The method involves the biotin-switch method. The first step is methylthiolation of all free Cys-thiols in a protein mixture, followed by selective reduction of S–NO bonds, thereby generating a new unmodified thiol at each former SNO-Cys site. These new thiols are then marked by introduction of a mixed-disulfide bond with a biotin-tagging reagent, captured on immobilized avidin, and selectively released from avidin by reduction of the disulfide linker. The isolated proteins are separated by 1D- or 2D-PAGE and subjected to in-gel trypsinolysis for identification of SNO proteins by mass spectrometry.f


Nitration

Acylation (isoprenylation)

Cys

Tyr

Modification Type

Carbonylation

Amino Acid Modified


Farnesyl (15-carbon farnesyl isoprenoid)

NO2

Functional Group

204

45

Mass Change (Da)

175

Tagging-via-substrate (TAS) technology involves metabolic incorporation of a synthetic azido-farnesyl analog and chemoselective derivatization of azido-farnesyl-modified proteins by a Staudinger reaction, using a biotinylated phosphine capture reagent. The resulting protein conjugates can be specifically detected and/or affinity-purified by streptavidin-linked horseradish peroxidase or agarose beads, respectively.i

The method involves derivatization of the carbonyl groups in the protein side chains to 2,4-dinitrophenylhydrazone (DNPH) by reaction with 2,4-dinitrophenylhydrazine; after 2D-PAGE, the proteins are blotted onto PVDF membrane or nitrocellulose paper and detected with anti-DNP antibodies. The approach involves 2D-PAGE separation and LC-MS/MS analysis coupled with a hydrazide biotin–streptavidin methodology in order to identify protein carbonylation in aged mice.h

172, 173

174

The approach involves 2D-PAGE separation, Western blotting with anti-3-nitrotyrosine antibodies, in-gel protease digestion, and identification by MALDI-ToF and MS/MS.g

Principle for Experimental Specification of a Modification

171

References to Proteomic Scale Analysis


210

238

Myristoyl

Palmitoyl

176

The N-myristoylation reaction is coupled to that of pyruvate dehydrogenase, and NADH is continuously detected spectrophotometrically.

a Dynamic analysis of phosphorylation events by quantitative proteomics.

b Small ubiquitin-like modifier (SUMO) regulates diverse cellular processes through its reversible, covalent attachment to target proteins. Many SUMO substrates are involved in transcription and present in chromatin structure. Sumoylation appears to regulate the functions of target proteins by changing their subcellular localization, increasing their stability, and/or mediating their binding to other proteins.

c Sumoylation consensus motif: ΨKXE (Ψ, a hydrophobic residue; X, any residue) in mammals.

d Fourteen of the 25 SUMO-1 conjugated proteins contain zinc fingers. Sumoylation is strongly associated with transcription since nearly one-third of the identified target proteins are putative transcriptional regulators.

e Reversible addition of NO to Cys-sulfur in proteins, a modification termed S-nitrosylation, is a ubiquitous signaling mechanism for regulating diverse cellular processes. The method is applied to rat cerebellum lysates.

f The method is applied to mesangial cells.

g Many mammalian proteins are inactivated by nitration of tyrosines, among which are manganese superoxide dismutase and glutamine synthase. In addition, important enzymes or structural proteins such as manganese superoxide dismutase, neurofilament L, actin, and tyrosine hydroxylase have been indicated as the targets of tyrosine nitration in pathological conditions and animal models of disease. All these findings suggest an important role of protein nitration in modulating the activity of key enzymes in neurodegenerative disorders.

h Protein carbonylation content is widely used as a marker to determine the level of protein oxidation that is caused either by the direct oxidation of amino acid side chains (e.g., proline and arginine to glutamylsemialdehyde, lysine to aminoadipic semialdehyde, and threonine to aminoketobutyrate) or via indirect reactions with oxidative by-products [lipid peroxidation derivatives such as 4-hydroxynonenal (HNE), malondialdehyde (MDA), and advanced glycation end products (AGEs)]. A deleterious consequence of these oxidative impairments is protein dysfunction.

i Protein farnesylation is a post-translational modification involving the covalent attachment of a 15-carbon farnesyl isoprenoid through a thioether bond to a cysteine residue near the C terminus of proteins in a conserved farnesylation motif designated the “CAAX box.” Azido-farnesylated proteins maintain the properties of protein farnesylation, including promoting membrane association, Ras-dependent mitogen-activated protein kinase kinase activation, and inhibition of lovastatin-induced apoptosis.
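The sumoylation consensus motif ΨKXE given in the footnotes to Table 1-3 can be searched for in a protein sequence with a simple pattern match. The following is a minimal sketch, not a validated predictor: the set of residues accepted at the Ψ position (`HYDROPHOBIC`), the function name `find_sumo_motifs`, and the example sequence are all assumptions introduced for illustration, and a motif match alone does not show that a lysine is actually modified.

```python
import re

# Assumption: residues accepted at the Ψ (hydrophobic) position. The table
# footnote only says "a hydrophobic residue", so this set is illustrative.
HYDROPHOBIC = "AILMFV"

# A lookahead is used so that overlapping motif occurrences are all reported.
MOTIF = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z]E))")


def find_sumo_motifs(sequence):
    """Return (lysine_position, motif) pairs; positions are 1-based."""
    hits = []
    for match in MOTIF.finditer(sequence.upper()):
        # match.start() is the 0-based index of Ψ; the lysine follows it,
        # so its 1-based position is start + 2.
        hits.append((match.start() + 2, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence, made up for illustration only.
    print(find_sumo_motifs("MSLKEEAAVKQEPG"))
```

Candidate lysines found this way would still need experimental confirmation, for example by the affinity-tag and LC-MS/MS approaches listed in the table.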

N terminus


OVERVIEW OF PROTEOMICS

Fig. 1-6. Chemical approaches to post-translational proteomics, illustrating methods for analyzing protein phosphorylation in a complex biological mixture. The base-catalyzed phosphate elimination method (left flowchart) and the phosphoramidate modification method (right flowchart) are outlined. TFA, trifluoroacetic acid; tBOC, tert-butoxycarbonyl; DTT, dithiothreitol; EDC, 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride. [From G. C. Adam, E. J. Sorensen, and B. F. Cravatt, Mol. Cell. Proteomics (Ref. 64) (2003). Copyright (2003) by American Society for Biochemistry and Molecular Biology, Inc. Reproduced with permission of the ASBMB via the Copyright Clearance Center.]

or in vivo isotope labeling specific to a certain PTM, affinity-based isolation methods with specific antibodies or with chemical reagents that specifically react to a certain PTM site of the amino acid residues are often adopted to collect or concentrate proteins with a specific type of PTM (Fig. 1-6). The earliest works on modification-specific proteomics using chemical reagents are those for selectively modifying phosphoproteins within complex mixtures (91).

STRATEGIES FOR CHARACTERIZING PROTEOMES


In those approaches, modified peptides are enriched by covalent or high-affinity avidin–biotin coupling to immobilized beads, allowing stringent washing to remove nonphosphorylated peptides. One method begins with a proteolytic digest that has been reduced and alkylated to eliminate the reactivity of cysteine (92). After protection of the N and C termini, phosphoramidate adducts are formed at phosphorylated residues by carbodiimide condensation with cystamine. The free sulfhydryl groups produced in this step are captured covalently on glass beads coupled to iodoacetic acid; the peptides are then cleaved off with trifluoroacetic acid and eluted from the beads. The regenerated phosphopeptides are analyzed by MS. The other method starts with a protein mixture in which cysteine is oxidized with performic acid (93). β-Elimination of phosphate from phosphoserine and phosphothreonine is induced by base hydrolysis, and the resulting alkenes are modified with ethanedithiol to produce free sulfhydryls that allow coupling to biotin. The biotinylated phosphoproteins are then captured with avidin-affinity beads, eluted, and digested with trypsin. The peptides are again captured with avidin beads, washed, and eluted for MS analysis.

Many studies have now been reported that apply affinity-based isolation and/or isotope-labeling methods, in conjunction with MS-based protein identification, to PTMs other than phosphorylation. Each approach is tailored to the specific character of the PTM concerned. These approaches are summarized in Table 1-3. Because of the importance of PTMs, some of the approaches, including experimental protocols for modification-specific proteomics, will be described in detail in Section 2-2.
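The mass changes listed in Table 1-3 are what allow an MS measurement to point to a particular PTM. The sketch below illustrates the idea under simplifying assumptions: it uses the nominal shifts as they appear in the table (plus the familiar nominal value of 80 Da for phosphorylation, which is assumed here because it does not appear in this part of the table), and it compares a single observed mass difference between a modified and an unmodified peptide against them. The names `NOMINAL_SHIFTS_DA` and `candidate_ptms` and the default tolerance are hypothetical; real search software works with monoisotopic masses and instrument-specific tolerances.

```python
# Nominal mass changes (Da), transcribed from Table 1-3 unless noted.
NOMINAL_SHIFTS_DA = {
    "methylation": 14.0,
    "S-nitrosylation": 30.0,
    "acetylation": 42.0,
    "nitration": 45.0,
    "phosphorylation": 80.0,  # assumed standard nominal value, not from this excerpt
    "ubiquitination (Gly-Gly remnant after trypsin)": 114.1,
    "farnesylation": 204.0,
    "myristoylation": 210.0,
    "palmitoylation": 238.0,
}


def candidate_ptms(observed_shift_da, tolerance_da=0.5):
    """Return PTM names whose nominal shift lies within the tolerance,
    ordered from the closest match to the most distant one."""
    matches = []
    for name, shift in NOMINAL_SHIFTS_DA.items():
        difference = abs(shift - observed_shift_da)
        if difference <= tolerance_da:
            matches.append((difference, name))
    return [name for _, name in sorted(matches)]


if __name__ == "__main__":
    # Hypothetical observed shift between modified and unmodified peptide.
    print(candidate_ptms(42.0))
```

A wide tolerance returns several candidates, which is one reason the enrichment and isotope-labeling steps described above are combined with MS rather than relying on the mass shift alone.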
1-3-2 Activity-Based Protein Profiling/Enzyme Substrate Proteomics

Activity-based profiling (ABP) provides a strategy for identifying proteins associated with a particular biological activity, typically an enzymatic activity (57), and uses synthetic chemistry to create tools and assays for characterizing protein samples of high complexity (64). The ABP strategy simplifies a complex biological mixture of proteins before analysis by labeling a specific set of related proteins with an affinity or fluorescence tag (Fig. 1-7). It thus includes the development of chemical affinity tags that react with the active site of enzymes and, in certain cases, measure the relative expression level and PTM state of proteins in cell and tissue proteomes. In a certain sense, this strategy shares some of its approaches with modification-specific proteomics (described in Section 1-3-1) and with quantitative proteomics (which permits the quantitative comparison of proteins and allows the dynamics of protein function to be monitored in complex proteomes; this strategy, typically using ICAT reagents, is described in Chapter 2). Many of those approaches (not restricted to ABP) interface well-established approaches in molecular biology or cell biology with proteomics. As a common theme, but in contrast to classic cell biological or biochemical research, the approaches are all designed to allow systematic screening of proteins in a defined experimental paradigm (75). In the original version of ABP, fluorophosphonates (known to be specific covalent inactivators of serine proteases) were linked to biotin. In this strategy, a fluorophosphonate irreversibly phosphonylates the active-site serine only in those proteases that are catalytically active, thereby attaching a biotin moiety that can be affinity captured with avidin. Biotin can be substituted by a fluorophore to detect


Fig. 1-7. Chemical approaches to activity-based protein profiling (ABP). (A) General structure of activity-based probes and (B) some probes directed toward specific classes of enzymes. ABP probes label the active sites of target enzymes. NU, nucleophilic amino acid residue; RG, reactive group; L, linker; TAG, detection and/or affinity tag. [From G. C. Adam, E. J. Sorensen, and B. F. Cravatt, Mol. Cell. Proteomics (Ref. 64) (2003). Copyright (2003) by American Society for Biochemistry and Molecular Biology, Inc. Reproduced with permission of the ASBMB via the Copyright Clearance Center.]

and increase the sensitivity of the initial screening step. Isolation of the tagged serine proteases followed by MS analysis allows the identification of active serine proteases. Thus, this approach for identifying functional proteins is expected to be specific for members of the serine protease family (57). In principle, the same approach can be applied to the identification of many other functional families of enzymes. In fact, many ABP probes have been designed for isolating other functional families of enzymes, including cysteine proteases (94), ubiquitin-specific proteases (95), threonine proteases, metalloproteases, protein phosphatases, kinases, glucosidases, exoglycosidases, and transglutaminases (Table 1-4). In some cases, well-known inhibitors were exploited to direct probe reactivity toward a specific class of enzyme, and the designed probes have been shown to selectively label active enzymes but not their inactive precursors (e.g., zymogens) or inhibitor-bound forms (71, 96, 97). Active sites of enzymes invariably contain nucleophilic groups (which act as acids, bases, or nucleophiles in diverse reactions) together with neighboring groups that are involved in catalysis; in some other cases, therefore, the ABP probes are designed as nonspecific electrophiles and are directed to react over a much wider range


Epoxide electrophiles [E64 and derivatives containing a P2 leucine residue (DCG-03 and DCG-04)/DCG-04 derivatives replaced Leu with all natural amino acids]

Diazomethyl ketones, fluoromethyl ketones, acyloxymethyl ketones, O-acylhydroxylamines, vinyl sulfones, and epoxysuccinic derivatives.

Biotin/radioiodine

Cys

Bleomycin hydrolases, calpains, caspases, cathepsins, thiol-ester motif-containing protein (TEP) 4 (fly), etc.

Reference

94, 179, 180

Biotin [5-(biotinamido)pentylamine] (NH2-biotin, Pierce)

177, 178

Affinity Tag/ Reporter

Cysteine protease

Diisopropyl fluorophosphate etc.

Known Inhibitor

Fluorophosphonate/ fluorophosphate (FP) derivatives

Binding and Reactive Group

Ser

Active Site

Proteases, lipases, esterases, and amidases

Members Identified

Serine hydrolase

Targeted Protein (Enzyme) Family

TABLE 1-4. Activity-Based Protein Profiling


Most cysteine proteases are synthesized with an inhibitory propeptide that must be proteolytically removed to activate the enzyme, resulting in expression profiles that do not directly correlate with activity. The largest set of papain-like cysteine proteases, the cathepsins, act in concert to digest a protein substrate. An isotope-coded activity-based probe for the quantitative profiling of cysteine proteases has also been developed.

The chemical probes are inert toward most cysteine, aspartyl, and metallohydrolases and react only with enzymes in a catalytically active state.

Note


Members Identified

Active Site

Mechanistically distinct enzyme classes

Probably Cys/Asp/Glu/His

Acetyl-CoA acetyltransferase, aldehyde dehydrogenase, NAD/NADP-dependent oxidoreductase, enoyl CoA hydratase, epoxide hydrolase, glutathione S-transferase, 3β-hydroxysteroid dehydrogenase/Δ5-isomerase, platelet phosphofructokinase, type II tissue transglutaminase, the endocannabinoid-degrading enzyme fatty acid amide hydrolase (FAAH), triacylglycerol hydrolase (TGH), and an uncharacterized membrane-associated hydrolase

Targeted Protein (Enzyme) Family


Binding and Reactive Group

Sulfonate ester derivatives (phenyl-, quinoline-, octyl, nitrophenyl, naphthyl, mesyl, pyridyl, thiophene, and azido)

Known Inhibitor

–

Affinity Tag/Reporter

Biotin, rhodamine/biotin-rhodamine

Reference

71, 72, 181–183

Libraries of candidate probes are screened against complex proteomes for activity-dependent protein reactivity in a nondirected or combinatorial strategy.

Note


Nonspecific

Type II tissue transglutaminase (tTG2), formiminotransferase cyclodeaminase, aldehyde dehydrogenase-9, aminolevulinate Δ-dehydratase, epoxide hydrolase, cathepsin Z, and the muscle and brain isoforms of creatine kinase (CK) [e.g., Asp-Asp-α-CA targets two isoforms of CK; Gly-Gly-α-CA targets tTG2; Leu-Met (maleylacetoacetate isomerase, ATP citrate lyase); Leu-Asp (hydroxypyruvate reductase, peroxiredoxin); Leu-Arg (malic enzyme), etc.]

α-Chloroacetamide (α-CA) reactive group and a variable dipeptide binding group

–

Biotin, rhodamine, biotin-rhodamine

96


The α-chloroacetamide (α-CA) reactive group, consistent with the behavior of other moderately reactive carbon electrophiles, proved capable of labeling in an active site-directed manner several mechanistically unrelated enzyme classes, thereby further expanding the scope of enzymes addressable by ABPP. In total, more than 10 different classes of enzymes were identified as targets of the α-CA probe library, most of which were not labeled by previously described ABPP probes.


Members Identified

Active Site

Ubiquitin-specific protease (deubiquitinating enzyme)

Cys

Ub-processing proteases (UBP), Ub carboxyl-terminal hydrolases, ubiquitin-specific protease (UPS)-4, -5, -7, -8, -9X, -10, -11, -12, -13, -14, -15, -15i, -16, -19, -22, -24, -25, -28, CYLD-1, m64E, UPS flag1 (KIAA891), UCHL1, UCHL3, UCH37, and HSPC263

Targeted Protein (Enzyme) Family


Binding and Reactive Group

Ub-derived active-site-directed probes (four Michael acceptor-derived probes, vinyl methyl sulfone, vinyl methyl ester, vinyl phenyl sulfone, and vinyl cyanide, and three alkyl halide-containing inhibitors, chloroethyl, bromoethyl, and bromopropyl), which are designed to react at a position that corresponds to the C-terminal carbonyl of the Gly76 amide bond conjugating Ub to its substrate.

Known Inhibitor

–

Affinity Tag/Reporter

Influenza hemagglutinin (HA)

Reference

184–188

The family of ubiquitin (Ub)-specific proteases (USP) removes Ub from Ub conjugates and regulates a variety of cellular processes. Four major classes are identified. UBPs can hydrolyze both linear and branched Ub modifications, whereas the activity of UCH enzymes is restricted to the hydrolysis of small Ub C-terminal extensions. A third USP family contains an ovarian tumor (OTU) domain with USP activity. Finally, RPN11/POH-1, a proteasome 19S cap subunit belonging to the Jab1/MPN domain-associated metalloisopeptidase (JAMM) family, lacks the cysteine protease signature and cleaves Ub from substrates in a Zn2+- and ATP-dependent manner.

Note




Biotin/ nitrophenol moiety

Matrix metalloproteinases (MMP)-2, 7, 9, neprilysin, aminopeptidase, and dipeptidylpeptidase

Zinc-activated water molecule

Zinc-chelating hydroxamate (that chelates the conserved zinc atom in MP-active sites in a bidentate manner) and a benzophenone photocrosslinker (a photolabile diazirine group; converting tight-binding reversible MP inhibitors into active site-directed affinity labels by incorporating a photocrosslinking group into these agents)

Hydroxamate inhibitor GM6001 (ilomastat), marimastat, peptidyl hydroxamate zinc-binding group (ZBG)

Biotin-rhodamine/rhodamine

Peptide vinyl sulfone, carboxybenzylleucyl-leucylleucine vinyl sulfone (Z-L3VS)

Metalloproteases

Thr

Proteasomal beta subunits, HslV subunit of the Escherichia coli protease complex HslV/HslU

Proteasome

97, 190

189


Metalloproteases (MPs) are a large and diverse class of enzymes implicated in numerous physiological and pathological processes, including tissue remodeling, peptide hormone processing, and cancer. In the cases of serine and cysteine proteases, ABPP probes are designed to target conserved nucleophiles in protease active sites, an approach that cannot be directly applied to MPs, which use a zinc-activated water molecule (rather than a protein-bound nucleophile) for catalysis.

Proteasomes are multicatalytic proteolytic complexes found in almost all living cells and are responsible for the degradation of the majority of cytosolic proteins in mammalian cells. The tripeptide vinyl sulfone Z-L3VS and related derivatives inhibit the trypsin-like, the chymotrypsin-like, and, unlike lactacystin, the peptidylglutamyl peptidase activity of the proteasome in vitro by covalent modification of the NH2-terminal threonine of the catalytically active β subunits.


Phosphatase

Targeted Protein (Enzyme) Family

Active Site

Protein tyrosine phosphatase (PTP)1B, prostatic acid phosphatase, protein Ser/Thr phosphatase calcineurin

Cys

Members Identified


4-fluoromethylaryl phosphate [phenylphosphate (recognition head)–p-hydroxymandelic acid derivatives]

Binding and Reactive Group

Affinity Tag/Reporter

Biotin/dansyl fluorophore

Known Inhibitor

–

191, 192

Reference

When the designated bond between the recognition head and the latent trapping device (p-hydroxymandelic acid derivatives) is selectively cleaved with the assistance of the target hydrolase, it becomes activated by the release of p-hydroxybenzylic fluoride intermediate. The intermediate quickly undergoes 1,6-elimination to produce highly reactive quinone methide (QM), which in turn could alkylate suitable nucleophiles on nearby hydrolases to form labeled adduct. Here, the latent trapping device serves as the core of the probe and the QM plays a critical role in the covalent labeling of target hydrolase.

Note


β-glucosidase

Glucosidase

ND

Thr

Cyclin G-associated kinase (GAK), casein kinase 1-alpha and 1-epsilon, RICK [Rip-like interacting caspase-like apoptosis-regulatory protein (CLARP) kinase/Rip2/CARDIAK], GSK3 beta, JNK,

Cys

Kinase

The probe is specific to PTPs including PTP1B, HePTP, SHP2, LAR, PTP, PTPH1, VHR, and Cdc14.



β-glucosyl unit (recognition head)–p-hydroxymandelic acid derivatives



p38 kinase inhibitor SB 203580

Protein kinases are key regulators of cellular signaling and therefore represent attractive targets for therapeutic intervention in a variety of human diseases.

p38 kinase inhibitor SB 203580 analogue (pyridinyl imidazole 51) that possesses a primary methylamine function instead of the sulfoxide moiety.

α-bromobenzyl-phosphonate

The crystal structure of p38 in complex with SB 203580 shows exposure of the inhibitor’s sulfoxide moiety at the protein surface, suggesting a suitable site for the attachment of linkers extending from solid support materials.

194


The active site reacts to quinone methide intermediate.

The activity-based probe is targeted to the PTP active site for covalent adduct formation that involves the nucleophilic Cys.

193

Dansyl fluorophore, biotin

191, 195

Epoxy-activated Sepharose 6B

Biotin

5-pentylamine (the acyl acceptor), glutamine-containing peptide (the acyl donor)



Transglutaminase

Substrates of transglutaminase (acyl donor: fatty acid synthase, tumor rejection antigen-1, DNase gamma, etc.; acyl acceptor: myosin heavy polypeptide 9, T-complex protein 1g subunit, etc.)

Binding and Reactive Group

2,4-dinitrophenyl 4′-amido-2,4′-dideoxy-2-fluoro-xylobioside inactivator

Active Site

Glu

Members Identified

Beta-endoglycosidases

β-1,4-glycanases (from Cellulomonas fimi), an endoxylanase (Bcx from Bacillus circulans), and a mixed-function endoxylanase/cellulase (Cex from Cellulomonas fimi)

Targeted Protein (Enzyme) Family


Biotin

Biotin



Affinity Tag/ Reporter



Known Inhibitor

197, 198

98, 196

Reference

Transglutaminases (TGs) (EC 2.3.2.13) constitute a family of enzymes that catalyze the post-translational modification of proteins. Their calcium-dependent catalytic activity is exhibited toward carboxamide groups of peptide-bound glutamine residues and amino groups of peptide-bound lysines, leading to an intrachain or interchain isopeptide bond.

One enzyme molecule reacted with one DNP2FX2SSB (2,4-dinitrophenyl 4′-amino-N{7-[N-(D-biotinoyl)-13-amino-4,7,10-trioxatridecanylamino]}4,5-dithiaheptanoyl-2,4′-dideoxy-2-fluoro-xylobioside) molecule, forming the biotinylated fluoroglycosyl-enzyme intermediate and releasing 2,4-dinitrophenol(ate).

Note


of mechanistically distinct enzyme classes. ABP probes directed at enzymes can be generalized as typically possessing three elements (Fig. 1-7): (1) a binding group that promotes interactions with the active sites of specific classes of enzymes; (2) a reactive (electrophilic) group that covalently labels those active sites (shown together as “binding and reactive group” in Table 1-4); and (3) a reporter group (e.g., a fluorophore or biotin) for the visualization or affinity purification of probe-labeled enzymes (shown as “affinity tag/reporter” in Table 1-4) (98). Multiple enzyme families can be targeted by the same electrophilic group; thus, some ABP probes facilitate the identification of large numbers of functional proteins in a particular proteome (57). By varying the electrophilic group used for the ABP probe, different functional families of enzymes may be targeted. Thus, the ABP strategy analyzes proteomes on the basis of functional properties such as enzymatic activity rather than expression level alone, and it provides exceptional access to low-abundance proteins in complex proteomes by concentrating specific families of enzymes with ABP probes (97). As an extension of this strategy, a challenging new approach uses rhodamine-based fluorogenic substrates encoded with PNA (peptide nucleic acid) tags. The PNA tags have two arms: one is a chemical affinity tag similar to that used for ABP, and the other is a defined oligonucleotide that assigns each substrate interacting with the chemical affinity arm to a predefined location on an oligonucleotide microarray through hybridization with the nucleic acid arm of the PNA tag, thus allowing the deconvolution of multiple signals from a solution (99). The PNA tag approach may thus provide an additional strategy for analyzing the functional aspects of proteomes.
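The three-element view of an ABP probe maps naturally onto a small record type. The sketch below is only an illustration of that organization: the class `ActivityProbe`, the function `families_targeting`, and the field names are hypothetical, and the three entries are a small subset transcribed from Table 1-4 (with the binding and reactive groups combined in one field, as in the table).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActivityProbe:
    """One row of Table 1-4, reduced to the elements discussed in the text."""
    enzyme_family: str
    active_site: str                 # nucleophilic residue labeled by the probe
    binding_and_reactive_group: str  # elements (1) and (2)
    reporter_tags: Tuple[str, ...]   # element (3): detection and/or affinity tag


# Illustrative subset transcribed from Table 1-4; not a complete catalog.
PROBES = (
    ActivityProbe("serine hydrolase", "Ser",
                  "fluorophosphonate (FP) derivatives", ("biotin",)),
    ActivityProbe("cysteine protease", "Cys",
                  "epoxide electrophiles (E64 and derivatives)",
                  ("biotin", "radioiodine")),
    ActivityProbe("ubiquitin-specific protease", "Cys",
                  "Ub-derived active-site-directed probes",
                  ("influenza hemagglutinin (HA)",)),
)


def families_targeting(active_site):
    """Return the enzyme families whose probes label the given residue."""
    return [p.enzyme_family for p in PROBES if p.active_site == active_site]


if __name__ == "__main__":
    print(families_targeting("Cys"))
```

Keeping the reactive group and the reporter as separate fields mirrors the point made above that changing the electrophile retargets the probe to a different enzyme family, while the reporter can be swapped independently.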
1-3-3 Subcellular (Organellar) Proteomics

Eukaryotic cells are compartmentalized to provide distinct and suitable environments for biochemical processes such as protein synthesis and degradation, storage of genetic material, ribosome production, provision of energy-rich metabolites, protein glycosylation, DNA replication, and transcription. Accordingly, the compartmentalized structure of a cell is supported by subsets of proteins that are specifically targeted to particular subcellular structures. Although subcellular structures and organelles are thought to be discrete entities carrying out independent cellular functions, there are complex mechanisms of intracellular communication and contact sites between the organelles. Some proteins are associated with subcellular structures only in certain physiological states, but are localized elsewhere in the cell in other states (100); among the possible mechanisms that underlie such conditional association are protein translocation between different compartments, cycling of proteins between the cell surface and intracellular pools, and shuttling between nucleoplasm and cytoplasm. Thus, protein localization is linked to cellular function and introduces an additional strategy for proteomics at subcellular levels (75). The strategy is to enrich for particular subcellular structures or organelles by subcellular fractionation with classic biochemical fractionation techniques (such as centrifugation), and to map comprehensively the proteome of these structures by an MS-based protein identification method typically


Fig. 1-8. Biochemical and genetic protocols for the isolation of subcellular structures or organelles used in subcellular (organellar) proteomics. The isolated cellular compartments are studied in terms of their protein composition and function. [Adapted by permission from Macmillan Publishers Ltd.; B. Westermann and W. Neupert, Nat. Biotechnol., 21:239–240 (2003).]

after electrophoresis-based separation or by the shotgun analysis described in Chapter 2 (see Section 2-1) (Fig. 1-8) (75). A key strength of this strategy is its capability not only to screen for previously unknown gene products but also to assign them, along with other known but poorly characterized gene products, to particular subcellular structures. This strategy is called subcellular proteomics (75) or organellar proteomics (74). Subcellular structures targeted by this strategy include not only entire organelles (such as the nucleus and mitochondria) but also nonorganelle structures (such as the postsynaptic density and raft), which can be isolated by traditional subcellular fractionation, typically using sucrose density-gradient ultracentrifugation, and comprise a focused set of proteins that fulfill discrete but varied cellular functions. Subcellular or organellar proteomics ranges in scope to include cataloging studies that test the ability of a method to identify as many unique proteins as possible, in particular, unknown low abundance proteins specific to a particular organelle (74). The comprehensive identification of the proteins present in a prepared organelle by MS-based methods may reveal true components of the structure investigated at the level of the endogenous gene products, but will also yield a certain number of false positives, depending on the degree of impurities derived from other subcellular structures present in the preparation. Those false positives make it hard to evaluate the biological significance of proteins that are usually associated with one organelle but are detected in the


proteome of another organelle. Although these proteins could be artifacts of subcellular fractionation procedures, they might also be biologically significant (74). Therefore, cell biological methods (such as immunocytochemical analysis and fluorescence tag analysis) as well as sequence analyses by bioinformatics tools are often required to validate the MS-based identifications (Fig. 1-8) (76). An innovative method called protein correlation profiling (PCP) was introduced to address this problem in the study of the human centrosome (101). In the PCP method, mass spectrometric intensity profiles from centrosomal marker proteins are used to define a consensus profile through a density centrifugation gradient, in direct analogy to Western blot profiling of gradient fractions. Distribution curves generated from the intensities of tens of thousands of peptides from consecutive fractions identified centrosomal proteins by their similarity to the consensus profile, measured as the mean squared deviation (χ² value) (102). When combined with those validation methods (103), the strategy of subcellular (organellar) proteomics allows not only the assignment of known proteins but also the identification of previously unknown gene products in defined subcellular (organellar) structures, and contributes significantly to the functional annotation of the products of a genome. Thus, subcellular structures (organelles) are attractive targets for global proteome analysis because they represent discrete functional units; their complexity in protein composition is reduced relative to whole cells, and lower abundance proteins specific to the organelle are revealed (74). In addition, analysis at the subcellular level is a prerequisite for the detection of important regulatory events such as protein translocation in comparative studies (75).
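The PCP comparison described above can be illustrated with a minimal Python sketch. It is not the published implementation: the peak normalization, the simple averaging of marker profiles, the function names, and the toy intensities are all assumptions made for illustration. The sketch only shows the idea of scoring each protein's gradient profile by its mean squared deviation from a marker-derived consensus profile.

```python
from typing import Dict, List, Sequence, Tuple


def normalize(profile: Sequence[float]) -> List[float]:
    """Scale a per-fraction intensity profile so that its peak equals 1."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile has no positive intensity")
    return [value / peak for value in profile]


def consensus_profile(marker_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Average the normalized marker profiles, fraction by fraction."""
    normalized = [normalize(p) for p in marker_profiles]
    return [sum(column) / len(normalized) for column in zip(*normalized)]


def mean_squared_deviation(profile: Sequence[float],
                           consensus: Sequence[float]) -> float:
    """Mean squared deviation between a normalized profile and the consensus."""
    normalized = normalize(profile)
    squared = [(a - b) ** 2 for a, b in zip(normalized, consensus)]
    return sum(squared) / len(consensus)


def rank_by_similarity(candidates: Dict[str, Sequence[float]],
                       consensus: Sequence[float]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, most consensus-like profile first."""
    scores = {name: mean_squared_deviation(p, consensus)
              for name, p in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical intensities across five consecutive gradient fractions.
    markers = [[0, 2, 10, 4, 0], [1, 3, 9, 3, 0]]
    candidates = {
        "protein_A": [0, 5, 20, 8, 1],   # peaks with the markers
        "protein_B": [30, 10, 4, 1, 0],  # peaks in a different fraction
    }
    consensus = consensus_profile(markers)
    for name, score in rank_by_similarity(candidates, consensus):
        print(f"{name}: {score:.4f}")
```

A low score means the protein co-fractionates with the markers; where to draw the cutoff between genuine components and contaminants is an experimental decision that the sketch does not attempt to make.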
The approach has been applied to a number of subcellular structures, including plasma membrane, nucleus, nucleolus, interchromatin granule clusters, nuclear (inner, outer) membranes, centrosome, midbody, endoplasmic reticulum (microsomes), Golgi apparatus, clathrin-coated vesicles, lysosome, peroxisomes, phagosome, mitochondrion, chloroplast (plant), and to nonorganelles including synaptosome (postsynaptic density) and raft (74, 75, 104–111). The largest collection of protein subcellular localizations obtained in a single study so far is that from mouse liver cells. The study localized 1404 proteins to ten cellular compartments including early endosomes, recycling endosomes, and proteasome (protein lists are available at http://proteome.biochem.mpg.de/ormd.htm) by using PCP, introduced in the analysis of the centrosome, in conjunction with other validation methods such as enzymatic assays, marker protein profiles, and confocal microscopy (102). As a complementary approach to the biochemical protocols described earlier, a comprehensive gene expression approach in yeast has been developed (Fig. 1-8). It relies on the systematic cloning of open reading frames (ORFs) for subsequent expression or on the generation of genomic sets of strains expressing tagged proteins suitable for detecting cellular localization. The initial stage of proteomic scale analysis of protein localization described the cellular localization of almost half of the yeast proteins using plasmid-based overexpression of epitope-tagged proteins and genome-wide transposon mutagenesis for high throughput immunolocalization of tagged gene products (see Saccharomyces Genome Database, http://www.yeastgenome.org/) (56, 112). This approach, however, sometimes led to mislocalization of proteins because their overexpression was not responsive to normal regulatory circuitry. In the second-stage proteomic scale analysis, therefore, each yeast ORF is


fused with an affinity (TAP tag, tandem affinity purification tag) or fluorescent (GFP, green fluorescent protein) tag at the carboxyl (C) terminus, inserted at the predicted ORF in the budding yeast genome by homologous recombination, and is expressed from its native promoter in its endogenous chromosomal location, responsive to normal regulatory circuitry (56, 113). To validate known localization, colocalization experiments are done using monomeric red fluorescent protein (RFP) fused to proteins whose cellular localization has been established. This second-stage proteomic scale analysis for cellular localization has categorized about 4500 ORFs out of over 6000 strains with GFP-tagged ORFs into 22 distinguishable subcellular localization (organelle) patterns: cytoplasm, nucleus, mitochondrion, endoplasmic reticulum (ER), nucleolus, vacuole, cell periphery, punctate speckle, bud neck, spindle pole, vacuolar membrane, nuclear periphery, early Golgi/COPI, endosome, bud, late Golgi/clathrin, Golgi, cytoskeleton, lipid particle, peroxisome, microtubules, and ER to Golgi (113). This genomic-scale information on protein localization in budding yeast enables us to make a comparison with that obtained for other species and allows us to assess functional conservation at subcellular levels among evolutionarily different species. The best information available for subcellular structures other than that for yeast is that for the nucleolus of human HeLa cells. The comparison of the protein composition of the yeast nucleolus with that of human HeLa cells indicates that out of the 142 yeast nucleolar proteins that have at least one human homolog, 124 are found in the human nucleolar proteome (87%). The data indicate that approximately 90% of the yeast nucleolar proteins with human homologs are also nucleolar components in HeLa cells and that the nucleolus is highly conserved throughout the eukaryotic kingdom (84).
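The conservation figure quoted above is a simple overlap count, which the following Python sketch reproduces. The function names, the homolog mapping, and the toy identifiers are hypothetical; only the numbers 124 and 142 come from the text.

```python
from typing import Dict, Set, Tuple


def conserved_localization(homologs: Dict[str, Set[str]],
                           other_proteome: Set[str]) -> Tuple[int, int]:
    """Count proteins whose homolog is found in the other organelle proteome.

    homologs maps each protein of species 1 to the set of its homologs in
    species 2 (an empty set means no known homolog). Returns a pair:
    (number with a homolog in other_proteome, number with any homolog).
    """
    with_homolog = [p for p, h in homologs.items() if h]
    conserved = [p for p in with_homolog if homologs[p] & other_proteome]
    return len(conserved), len(with_homolog)


def percent(part: int, whole: int) -> float:
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Toy example with made-up identifiers.
    toy_homologs = {"y1": {"h1"}, "y2": {"h2", "h3"}, "y3": set(), "y4": {"h4"}}
    toy_nucleolar = {"h1", "h3"}
    found, total = conserved_localization(toy_homologs, toy_nucleolar)
    print(f"{found} of {total} proteins conserved ({percent(found, total):.0f}%)")

    # Figures quoted in the text: 124 of 142 yeast nucleolar proteins.
    print(f"{percent(124, 142):.0f}%")
```

Proteins without any homolog are excluded from the denominator, which matches how the comparison in the text is phrased ("proteins that have at least one human homolog").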
Thus, the genetic tractability of yeast allows a large fraction of yeast ORFs to be tagged for localization studies; however, such an approach is more challenging in mammalian systems due, in part, to artifacts from overexpression and to the difficulty of constructing a system that expresses a gene from its native promoter in its endogenous chromosomal location (114). All the localization information obtained for yeast is available in a public database (http://yeastgfp.ucsf.edu; see Saccharomyces Genome Database, http://www.yeastgenome.org/) and is useful for further analysis with yeast cells and for comparative localization analysis of other species, although proteins with crucial C-terminal targeting signals are often mislocalized in this analysis, and new fusions will have to be constructed to get an accurate view of the subcellular location of this group of proteins. As localization information is integrated into different functional genomics datasets, the challenge will be to formulate biological hypotheses, such as those regarding the correlation of colocalization with transcriptional coexpression (obtained by microarray analysis) and the relationship between colocalization and physical or genetic interaction. [The relative enrichment for colocalization was assessed for the combination of protein–protein or genetic interactions in the GRID database (http://biodata.mshri.on.ca/grid) (56).] Thus, the challenge of global subcellular (organellar) proteomics is to provide a functional context for proteins by associating them with a distinct group of proteins in defined intracellular environments; this context is extremely useful for integrating information obtained from other proteomic strategies as well as from different functional genomics approaches (74).


1-3-4 Machinery or Complex Interaction Proteomics

Multiprotein complexes are among the fundamental units of macromolecular organization and thus are key molecular entities that integrate multiple gene products to perform cellular functions (81). In fact, many multiprotein complexes constitute molecular machineries, such as transcription and RNA processing machinery, nuclear pore complex, preribosomal ribonucleoprotein (pre-rRNP) complex, ribosome, proteasome, and receptor–signal transduction complexes. They carry out and regulate important cell mechanisms such as transcription and RNA processing, membrane transport, ribosome synthesis, protein synthesis and degradation, and receptor–signal transduction. They require precise organization of molecules in time and space, are thought to assemble in a particular order, and often require energy-driven conformational changes, specific post-translational modifications, or chaperone assistance for proper formation. Their composition may also vary according to cellular requirements (81). Because conventional protein chemical technologies have difficulty analyzing such cellular machineries or multiprotein complexes with many components, most multiprotein complexes remain only partially characterized. Thus, an additional strategy for proteomics can be introduced at the level of multiprotein complexes and cellular machineries. We would like to call this strategy machinery proteomics or complex (interaction) proteomics. Some of the multiprotein complexes (molecular machineries) can be prepared by subcellular fractionation methods using ultracentrifugation, which are often used to prepare subcellular structures or organelles as described in Section 1-3-3 [e.g., nuclear pore complex, anaphase-promoting complex, preribosomal ribonucleoprotein (pre-rRNP) complexes] (110, 111, 115, 116) or reconstituted from subcellular extracts (e.g., spliceosome) (108, 117).
They can also be prepared by pull-down analyses, such as affinity purification (using epitope tag, tandem affinity purification tag, antibodies, etc.), as exemplified for spliceosome (117, 118) and pre-rRNP complexes (119–121) (see Section 3-2-1). Once pure complexes or interacting partners are obtained, MS-based technologies allow their components to be identified at subfemtomole levels on a large scale and with high throughput. Machinery proteomics or complex (interaction) proteomics often starts with an initial hypothesis, either to draft a design for a particular known cellular machinery or to identify interaction partners of a particular known protein. In particular, it allows one to assign proteins of unknown function as constituents of functionally defined cellular machinery. For instance, the constituents of the nuclear pore complex are expected to perform events related to nuclear transport, those of the preribosome events related to ribosome synthesis, and those of the spliceosome events related to mRNA splicing. Identification of unknown constituents in known machineries, or of known interaction partners for a protein with unknown function, may lead to the specification of function for the unknown protein. In addition, the strategy allows one to catalog data on known constituents. Thus, analysis of protein complexes or cellular machineries is one of the most useful strategies for directly assigning protein function and for annotating the protein products of the genome in terms of the biological activity of each product.


To date, two groups, the Cellzome Corporation in Germany (82) and the University of Toronto in Canada (122), have taken the TAP-tag approach to genome-wide screening for complexes in an organism, the budding yeast. Both rely on endogenous expression of TAP-tagged bait proteins from their natural chromosomal locations; Saccharomyces cerevisiae strains are generated with in-frame insertions of TAP tags individually introduced by homologous recombination at the 3′ end of each predicted open reading frame (ORF) (http://www.yeastgenome.org/) (55, 123). This construction ensures that each TAP-tagged ORF is expressed from its native promoter in its endogenous chromosomal location, responsive to normal regulatory circuitry; thus, the protein complex formed is expected to be equivalent to its endogenous counterpart unless the TAP tag itself affects complex formation in vivo. The group performed tandem affinity purification repeatedly for over 4500 different tagged proteins of the yeast; the majority of the protein complexes were purified at least several times, and the group characterized the composition and organization of the multiprotein complex/cellular machinery based on the huge dataset obtained and on available data on expression, localization, function, evolutionary conservation, protein structure, and binary interactions. They propose that the ensemble of cellular proteins partitions into 491–547 complexes, of which about half are novel, and that these complexes differentially combine with additional attachment proteins or protein modules to enable a diversification of potential functions. The detailed data are available online (the BioGRID database, http://thebiogrid.org; the IntAct database, http://www.ebi.ac.uk/intact/; the MS protein identifications, http://yeastcomplexes.embl.de; Euroscarf, http://web.uni-frankfurt.de/fb15/mikro/euroscarf/col_index.html) and are useful for future studies on individual proteins, biological data integration, and modeling and thus for functional genomics and systems biology. Genomic scale analysis of machinery or complex (interaction) proteomics has not been reported yet for any species other than yeast; however, at least one group in Japan has been taking an epitope-tag approach to genome-wide screening of multiprotein complexes in humans.

1-3-5 Dynamic Proteomics

Multiprotein complex modules require precise organization of molecules in time and space, are assembled in a particular order, and vary their composition according to cellular requirements, as described in Section 1-3-4. The primary goal of proteomics is to describe not only the composition and connection but also the dynamics of the multiprotein modules and ideally of the entire proteome (124). The approaches using affinity purification and MS methods allow for isolation of almost any multiprotein complex formed in the cell and for the detection of many constituents of complexes in a fraction, and allow one to determine the connection between multiprotein modules on the genomic scale. The approaches enable us to probe specific states of multiprotein complexes and the network formed in some biological states by collating lists of identified proteins (profiling or cataloging analysis), or to distinguish different states by enumerating differences in protein composition among the corresponding complexes obtained from different cell states (subtractive analysis). However, the approaches cannot tell us anything about the extent of those differences, when those


differences are caused, or how long they take to happen; thus, it remains difficult to analyze the dynamic aspect of multiprotein complexes (machineries) and even more difficult to analyze the dynamic aspect of the much higher-ordered organization of multiprotein modules (i.e., subcellular structure or organelle). To analyze the dynamic aspect, quantitative changes in the abundance and composition of protein complexes and subcellular structures formed during physiological alteration in the cell have to be determined (quantitative analysis). We would like to propose a strategy for analyzing the dynamic aspect of multiprotein modules, subcellular structures, or ideally the entire proteome, and call that strategy dynamic proteomics. One approach used for this strategy is direct visual comparison, among different samples, of proteins present in the isolated protein complex or subcellular structure after protein staining on electrophoresis gels, followed by MS-based protein identification. The approach was successfully used to analyze, for example, remodeling of small nuclear ribonucleoprotein (snRNP) complexes during catalytic activation of the spliceosome, which removes introns from mRNA precursors (125). In this example, human 45S activated spliceosomes and a previously unknown 35S U5 snRNP were isolated by tobramycin-affinity selection (118) and characterized by gel-based mass spectrometry. Subtractive comparison of their protein components with those of other snRNP and spliceosomal complexes revealed dynamic changes of proteins that participated in the remodeling of splicing machinery during spliceosome activation (125). A similar approach was also applied to the analysis of reorganization of the entire human nucleolus upon transcriptional inhibition with actinomycin D (126). Proteins from nucleoli isolated from both control and actinomycin D-treated cells were separated by 1D SDS-PAGE and stained with dye.
The total proteomes were similar for both control and actinomycin D-treated nucleoli on stained gels; however, there were 11 protein bands whose intensity in the actinomycin D-treated nucleoli was increased relative to the control nucleoli. Those bands were excised, digested with trypsin, and analyzed by MS. All of the proteins identified by the analysis were examined by immunocytochemical analysis and were shown to be predominantly nucleoplasmic but relocated to the nucleolar periphery following actinomycin D treatment (104, 127). Those approaches are intuitive and certainly very useful; however, extreme care should be taken when stained protein bands are compared quantitatively among different samples, because a given band may not contain a single protein, or the same protein as the corresponding band from another sample. MS-based protein identification covers the shortcomings of this comparison to some extent; however, if an excised band contains multiple proteins, the identified protein does not always correspond to the protein whose staining intensity changed among the compared samples. The method using stable isotope tagging and MS gives an alternative to this gel visualization approach (128), and provides an efficient strategy for determining the specific composition, changes in the composition, and changes in the abundance of multiprotein complexes or subcellular structures. Among the reported stable isotope tagging methods, a well-known approach is based on the use of isotope-coded affinity tag (ICAT) reagents and LC-MS, which can be used to compare the relative abundances of tryptic peptides derived from suitable pairs


between different samples. Derivatization of two distinct proteomes with the light and heavy versions of the ICAT reagent provides the basis for proteome quantitation by MS analysis. Since the ICAT method incorporates the isotopic label only into Cys-containing sites of proteins after protein extraction, it simplifies proteome analysis by isolating only Cys-containing polypeptides and has universal applicability. This approach can be used to distinguish the protein components of a complex or subcellular structure from a background of copurified proteins by comparing the relative abundances of peptides derived from a control sample and the specific complex. An example of this type of analysis is the specific identification of the components present in the RNAP II preinitiation complex that is purified from nuclear extracts by single-step promoter DNA affinity chromatography (129). The same method can also detect quantitative changes in the abundance and dynamic changes in the composition of protein complexes or subcellular structures obtained from different cell states, as exemplified by an analysis of STE12 protein complexes isolated from yeast cells in different states (128). Several other in vitro or in vivo labeling approaches have been introduced in combination with mass spectrometry to quantitate relative protein levels (see Chapter 2). Although those methods are specifically directed toward quantitation of the relative abundance of proteins expressed in cells or tissues, they can also be applied to describe the composition, connection, and dynamics of multiprotein complexes and/or subcellular structures in a way similar to that described for the ICAT method. One example of in vitro labeling is the method using isotope-labeled O-methylisourea [H₂¹⁵N¹³C(OCH₃)=NH] and unlabeled reagent, which allows quantitative guanidination of the N-terminus of the peptide and the epsilon amino group of lysine residues (see Chapter 2 for details).
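The light/heavy comparison described above for ICAT and related labeling reagents can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any published software: the labeling scheme (control light, specific complex heavy), the use of the median peptide ratio per protein, the function names, and the twofold threshold are all hypothetical choices.

```python
from statistics import median
from typing import Dict, Iterable, List, Tuple

# One quantified peptide pair: (protein, light intensity, heavy intensity).
# Assumption: the control sample carries the light tag and the specific
# complex carries the heavy tag.
PeptidePair = Tuple[str, float, float]


def protein_ratios(pairs: Iterable[PeptidePair]) -> Dict[str, float]:
    """Summarize heavy/light peptide ratios as one median ratio per protein."""
    by_protein: Dict[str, List[float]] = {}
    for protein, light, heavy in pairs:
        if light <= 0 or heavy <= 0:
            continue  # skip pairs in which one partner was not quantified
        by_protein.setdefault(protein, []).append(heavy / light)
    return {protein: median(ratios) for protein, ratios in by_protein.items()}


def classify(ratios: Dict[str, float], threshold: float = 2.0) -> Dict[str, str]:
    """Label proteins enriched in the complex versus copurified background.

    The threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    return {protein: ("specific" if ratio >= threshold else "background")
            for protein, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical peptide intensities.
    pairs = [
        ("bait_partner", 100.0, 500.0),
        ("bait_partner", 200.0, 900.0),
        ("abundant_contaminant", 100.0, 110.0),
    ]
    ratios = protein_ratios(pairs)
    for protein, label in classify(ratios).items():
        print(f"{protein}: ratio {ratios[protein]:.2f} -> {label}")
```

The same ratio table can be computed for complexes isolated from two cell states instead of a control and a specific purification; the ratios then report changes in composition or abundance rather than specificity.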
O-Methylisourea modifies all peptides generated by trypsin or Lys-C protease digestion; therefore, the peptide mixture generated by this method is very complex and requires a higher resolution separation technique. However, the chance of quantifying multiple peptides from the same protein is higher than with the ICAT reagent, which labels only cysteine-containing peptides. This guanidination method was applied to the comparative analysis of the preribosomal ribonucleoprotein (pre-rRNP) complexes associated with three typical trans-acting factors (nucleolin, fibrillarin, and B23), which function at different stages of ribosome biogenesis (see Section 2-2-2, Experimental Example 2-3, and Section 3-3-1). The most impressive work done in dynamic proteomics using MS-based organelle proteomics and in vivo stable isotope labeling, called stable isotope labeling by amino acids in cell culture (SILAC), is the dynamic analysis of the entire nucleolus obtained from human HeLa cells (see Section 3-1-1) (84, 130). This study demonstrates the power of the quantitative approach using isotope labeling and LC-MS for the high throughput characterization of the flux of endogenous proteins through even entire subcellular structures or organelles (84). We now have a number of methodologies with throughput high enough to handle the dynamic nature not only of large cellular machineries (protein complexes) but also of entire subcellular structures or organelles, whose protein compositions vary extensively under different growth environments and metabolic conditions. The development of dynamic proteomics


coupled with other strategies heralds a new generation of “proteomic biology” that correlates dynamic proteome changes with cell function and thus enables us to understand biological aspects of living cells from the point of view of proteome dynamics.

REFERENCES

1. Wilkins, M. R. (1995). Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotech. Gen. Eng. Rev. 13:19–50.
2. Neverova, I., and Van Eyk, J. E. (2005). Role of chromatographic techniques in proteomic analysis. J. Chromatogr. B 815:51–63.
3. Pandey, A., and Mann, M. (2000). Proteomics to study genes and genomes. Nature 405:837–846.
4. Patterson, S. D., and Aebersold, R. H. (2003). Proteomics: the first decade and beyond. Nat. Genet. Suppl. 33:311–323.
5. Anderson, N. L., Tracy, R. P., and Anderson, N. G. (1984). High-resolution two-dimensional electrophoretic mapping of plasma proteins. In The Plasma Proteins IV, F. W. Putnam (Ed.), Academic Press, New York, pp. 221–270.
6. Lopez, M. F., Berggren, K., Chernokalskaya, E., Lazarev, A., Robinson, M., and Patton, W. F. (2000). A comparison of silver stain and SYPRO Ruby Protein Gel Stain with respect to protein detection in two-dimensional gels and identification by peptide mass profiling. Electrophoresis 21:3673–3683.
7. Patton, W. F. (2000). A thousand points of light: the application of fluorescence detection technologies to two-dimensional gel electrophoresis and proteomics. Electrophoresis 21:1123–1144.
8. Shevchenko, A., Wilm, M., Vorm, O., and Mann, M. (1996). Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal. Chem. 68:850–858.
9. Berggren, K., Chernokalskaya, E., Steinberg, T. H., Kemper, C., Lopez, M. F., Diwu, Z., Haugland, R. P., and Patton, W. F. (2000). Background-free, high sensitivity staining of proteins in one- and two-dimensional sodium dodecyl sulfate-polyacrylamide gels using a luminescent ruthenium complex. Electrophoresis 21:2509–2521.
10. Steen, H., and Pandey, A. (2002). Proteomics goes quantitative: measuring protein abundance. Trends Biotechnol. 20:361–364.
11. Patton, W. F. (2002). Detection technologies in proteome analysis. J. Chromatogr. B 771:3–31.
12. Zhou, G., Li, H., DeCamp, D., Chen, S., Shu, H., Gong, Y., Flaig, M., Gillespie, J. W., Hu, N., Taylor, P. R., Emmert-Buck, M. R., Liotta, L. A., Petricoin, E. F. 3rd, and Zhao, Y. (2002). 2D differential in-gel electrophoresis for the identification of esophageal scans cell cancer-specific protein markers. Mol. Cell. Proteomics 1:117–124.
13. Anderson, N. L., Hofmann, J. P., Gemmell, A., and Tayler, J. (1984). Global approaches to quantitative analysis of gene-expression patterns observed by use of two-dimensional gel electrophoresis. Clin. Chem. 30:2031–2036.
14. Tarroux, P., Vincens, P., and Rabilloud, T. (1987). HERMes: a second generation approach to the automatic analysis of two-dimensional electrophoresis gels. Part V: Data analysis. Electrophoresis 8:187–199.


15. Tanaka, K., Ido, Y., Akita, S., Yoshida, Y., and Yoshida, T. (1987). Detection of high mass molecules by laser desorption time-of-flight mass spectrometry. In Proceedings of the 2nd Japan–China Joint Symposium on Mass Spectrometry, H. Matsuda, and L. Xiao-tian (Eds.), Osaka, Japan, pp. 185–188.
16. Fenn, J. B., Mann, M., Meng, C. K., Wang, S. F., and Whitehouse, C. M. (1989). Electrospray ionization for mass spectrometry of large molecules. Science 246:64–71.
17. Mann, M., Hojrup, P., and Roepstorff, P. (1993). Use of mass spectrometric molecular weight information to identify proteins in sequence databases. Biol. Mass Spectrom. 22:338–345.
18. James, P., Quadroni, M., Carafoli, E., and Gonnet, G. (1993). Protein identification by mass profile fingerprinting. Biochem. Biophys. Res. Commun. 195:58–64.
19. Pappin, D. J., Hojrup, P., and Bleasby, A. J. (1993). Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327–332.
20. Henzel, W. J., Billeci, T. M., Stults, J. T., Wong, S. C., Grimley, C., and Watanabe, C. (1993). Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc. Natl. Acad. Sci. USA 90:5011–5015.
21. Rosenfeld, J., Capdevielle, J., Guillemot, J. C., and Ferrara, P. (1992). In-gel digestion of proteins for internal sequence analysis after one- or two-dimensional gel electrophoresis. Anal. Biochem. 203:173–179.
22. Jeno, P., Mini, T., Hintermann, E., and Horst, M. (1995). Internal sequences from proteins digested in polyacrylamide gels. Anal. Biochem. 224:451–455.
23. Shevchenko, A., Wilm, M., Vorm, O., and Mann, M. (1996). Mass spectrometric sequencing of proteins silver-stained polyacrylamide gels. Anal. Chem. 68:850–858.
24. Andersen, J. S., Svensson, B., and Roepstorff, P. (1996). Electrospray ionization and matrix assisted laser desorption/ionization mass spectrometry: powerful analytical tools in recombinant protein chemistry. Nat. Biotechnol. 14:449–457.
25. Qin, J., and Chait, B. T. (1997). Identification and characterization of posttranslational modifications of proteins by MALDI ion trap mass spectrometry. Anal. Chem. 69:4002–4009.
26. Zabrouskov, V., Giacomelli, L., van Wijk, K. J., and McLafferty, F. W. (2003). A new approach for plant proteomics: characterization of chloroplast proteins of Arabidopsis thaliana by top down mass spectrometry. Mol. Cell. Proteomics 2(12):1253–1260.
27. Mann, M., and Wilm, M. (1994). Error-tolerant identification of peptides in sequence database by peptide sequence tags. Anal. Chem. 66:4390–4399.
28. Eng, J. K., McCormack, A. L., and Yates, J. R. 3rd (1994). An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database (Sequest). J. Am. Soc. Mass Spectrom. 5:976–989.
29. Field, H. I., Fenyo, D., and Beavis, R. C. (2002). RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimises protein identification, and archives data in a relational database (Sonar). Proteomics 2:36–47.
30. Perkins, D. N., Pappin, D. J., Creasy, D. M., and Cottrell, J. S. (1999). Probability-based protein identification by searching sequence database using mass spectrometry data (Mascot). Electrophoresis 20:3551–3567.
31. Ducret, A., Van Oostveen, I., Eng, J. K., Yates, J. R. 3rd, and Aebersold, R. (1998). High throughput protein characterization by automated reverse-phase chromatography/electrospray tandem mass spectrometry. Protein Sci. 7(3):706–719.


32. MacCoss, M. J. (2005). Computational analysis of shotgun proteomics data. Curr. Opin. Chem. Biol. 9:88–94. 33. Tabb, D. L., Eng, J. K., and Yates, J. R. 3rd (2001). In Proteome Research: Mass Spectrometry, P. James, (Ed.), Springer, New York, Vol. 1, pp. 125–142. 34. Tabb, D. L., McDonald, W. H., and Yates, J. R. 3rd (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1(1):21–26. 35. Von Haller, P. D., Yi, E., Donohoe, S., Vaughn, K., Keller, A., Nesvizhskii, A. I., Eng, J., Li, X. J., Goodlett, D. R., Aebersold, R., and Watts, J. D. (2003). The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis, and the application of statistical tools for data analysis and interpretation. Mol. Cell. Proteomics. 2:428–442. 36. Shinkawa, T., Taoka, M., Yamauchi, Y., Ichimura, T., Kaji, H., Takahashi, N., and Isobe, T. (2005). STEM: a software tool for large-scale proteomic data analyses. J. Proteome Res. 4(5):1826–1831. 37. Yang, X., Dondeti, V., Dezube, R., Maynard, D. M., Geer, L. Y., Epstein, J., Chen, X., Markey, S. P., and Kowalak, J. A. (2004). DBParser: Web-based software for shotgun proteomic data analyses. J. Proteome Res. 3:1002–1008. 38. Evans, G., Wheeler, C. H., Corbett, J. M., and Dunn, M. J. (1997). Construction of HSC-2D PAGE: a two-dimensional gel electrophoresis database of heart proteins. Electrophoresis 18:471–479. 39. Li, X. P., Pleissner, K. P., Scheler, C., Regitz-Zagrosek, V., Salnikow, J., and Jungblut, P. R. (1999). A two-dimensional gel electrophoresis database of rat heart protein. Electrophoresis 20:891–897. 40. Pieper, R., Gatlin, C. L., Makusky, A. J., Russo, P. S., Schatz, C. R., Miller, S. S., Su, Q., McGrath, A. M., Estock, M. A., Parmar, P. 
P., Zhao, M., Huang, S.-T., Zhou, J., Wang, F., Esquer-Blasco, R., Anderson, N. L., Taylor, J., and Steiner, S. (2003). The human serum proteome: display of nearly 3700 chromatographically separated protein spots on two-dimensional electrophoresis gels and identification of 325 distinct proteins. Proteomics 3:1345–1364. 41. Thongboonkerd, V., Mcleish, K. R., Arthur, J. M., and Klein, J. (2002). Proteomic analysis of normal human urinary proteins isolated by acetone precipitation or ultracentrifugation. Kidney Int., 62:1461–1469. 42. Abbott, A. (2003). Brain protein project enlists mice in “dry run,” Nature 425:110. 43. Heinke, M. Y., Wheeler, C. H., Chang, D., Einstein, R., Drake-Holland, A., Dunn, M. J., and Remedios, C. G. (1998). Protein changes observed in pacing-induced heart failure using two-dimensional electrophoresis. Electrophoresis 19:2021–2030. 44. Van Eyk, J. E. (2001). Proteomics: unraveling the complexity of heart disease and striving to change cardiology. Curr. Opin. Mol. Therapeut. 3:546–553. 45. Nelson, P. S., Han, D., Rochon, Y., Corthals, G. L., Lin, B., Monson, A., Nguyen, V., Franza, B. R., Plymate, S. R., Aebersold, R., and Hood, L. (2000). Comprehensive analysis of prostate gene expression: convergence of expressed sequence tag databases, transcript profiling and proteomics. Electrophoresis 21:1823–1831. 46. Hanash, S. M., Madoz-Gurpide, J., and Misek, D. E. (2002). Identification of novel targets for cancer therapy using expression proteomics. Leukemia 16:478–485.


47. Petricoin, E. F., Zoon, K. C., Kohn, E. C., Barrett, J. C., and Liotta, L. A. (2002). Clinical proteomics: translating bench-side promise into bedside reality. Nat. Rev. Drug Discov. 1:683–695. 48. van Der Velden, J., Klein, L. J., Zaremba, R., Boontje, N. M., Huybregts, M. A., Stooker, W., Eijsman, L., de Jong, J. W., Visser, C. A., Visser, F. C., and Stienen, G. J. (2001). Effects of calcium, inorganic phosphate, and pH on isometric force in single skinned cardiomyocytes from donor and failing human hearts. Circulation 104:1140–1146. 49. Oh, P., Li, Y., Yu, J., Durr, E., Krasinska, K. M., Carver, L. A., Testa, J. E., and Schnitzer, J. E. (2004). Subtractive proteomic mapping of the endothelial surface in lung and solid tumours for tissue-specific therapy. Nature 429:629–635. 50. McDonough, J. L., Neverova, I., and Van Eyk, J. E. (2002). Proteomic analysis of human biopsy samples by single two-dimensional electrophoresis: Coomassie, silver, mass spectrometry, and Western blotting. Proteomics 2:978–987. 51. Hanash, S. (2003). Disease proteomics. Nature 422:226–232. 52. Gygi, S. P., Rochon, Y., Franza, B. R., and Aebersold, R. (1999). Correlation between protein and mRNA abundance in yeast. Mol. Cell. Biol. 19:1720–1730. 53. Putnam, F. W. (1984). Progress in plasma proteins. In The Plasma Proteins, Vol. IV, F. W. Putnam (Ed.), Academic Press, Orlando, FL, pp. 1–44. 54. Gygi, S. P., Corthals, G. L., Zhang, Y., Rochon, Y., and Aebersold, R. (2000). Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc. Natl. Acad. Sci. USA 17:9390–9395. 55. Ghaemmaghami, S., Huh, W.-K., Bower, K., Howson, R. W., Belle, A., Dephoure, N., O’Shea, E. K., and Weissman, J. S. (2003). Global analysis of protein expression in yeast. Nature 425:737–741. 56. Andrews, B., Bader, G. D., and Boone, C. (2003). Playing tag with the yeast proteome. Nat. Biotechnol. 21:1297–1299. 57. Gerlt, J. A. (2002). Fishing for the functional proteome. Nat. Biotechnol.
20:786–787. 58. Houry, W. A., Frishman, D., Eckerskorn, C., Lottspeich, F., and Hartl, F. U. (1999). Identification of in vivo substrates of the chaperonin GroEL. Nature 402:147–154. 59. Fountoulakis, M., and Langen, H. (1997). Identification of proteins by matrix-assisted laser desorption ionization-mass spectrometry following in-gel digestion in low-salt, nonvolatile buffer and simplified peptide recovery. Anal. Biochem. 250:153–156. 60. Gygi, S. P., and Aebersold, R. (2000). Using mass spectrometry for quantitative proteomics. In Proteomics: A Trends Guide, Elsevier Science, London, pp. 31–36. 61. Moseley, M. A. (2001). Current trends in differential expression proteomics: Isotopically coded tags. Trends Biotechnol. 19:510–516. 62. Kirkpatrick, D. S., Denison, C., and Gygi, S. P. (2005). Weighing in on ubiquitin: the expanding role of mass spectrometry-based proteomics. Nat. Cell Biol. 7(8):750–757. 63. Takahashi, N., Kaji, H., Yanagida, M., Hayano, T., and Isobe, T. (2003). Proteomics: advanced technology for the analysis of cellular function. J. Nutrition 133:2090–2096. 64. Adam, G. C., Sorensen, E. J., and Cravatt, B. F. (2003). Chemical strategies for functional proteomics. Mol. Cell. Proteomics 1(10):781–790. 65. Kobe, B., and Kemp, B. E. (1999). Active site-directed protein regulation. Nature 402:373–376. 66. Blackstock, W. (2000). Trends in automation and mass spectrometry for proteomics. In Proteomics: A Trends Guide, Elsevier Science, London, pp. 12–16.


67. Blackstock, W. P., and Weir, M. P. (1999). Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol. 3:121–127. 68. Santoni, V., Molloy, M., and Rabilloud, T. (2000). Membrane proteins and proteomics: un amour impossible? Electrophoresis 6:1054–1070. 69. Mann, M., and Jensen, O. N. (2003). Proteomic analysis of post-translational modification. Nat. Biotechnol. 21:255–261. 70. Ptacek, J., Devgan, G., Michaud, G., Zhu, H., Zhu, X., Fasolo, J., Guo, H., Jona, G., Breitkreutz, A., Sopko, R., McCartney, R. R., Schmidt, M. C., Rachidi, N., Lee, S. J., Mah, A. S., Meng, L., Stark, M. J., Stern, D. F., De Virgilio, C., Tyers, M., Andrews, B., Gerstein, M., Schweitzer, B., Predki, P. F., and Snyder, M. (2005). Global analysis of protein phosphorylation in yeast. Nature 438(7068):679–684. 71. Adam, G. C., Sorensen, E. J., and Cravatt, B. F. (2002). Proteomic profiling of mechanistically distinct enzyme classes using a common chemotype. Nat. Biotechnol. 20:805–809. 72. Adam, G. C., Sorensen, E. J., and Cravatt, B. F. (2002). Trifunctional chemical probes for the consolidated detection and identification of enzyme activities from complex proteomes. Mol. Cell. Proteomics. 1(10):828–835. 73. Gerlt, J. A. (2002). Fishing for the functional proteome. Nat. Biotechnol. 20:786–787. 74. Taylor, S. W., Fahy, E., and Ghosh, S. S. (2003). Global organellar proteomics. Trends Biotechnol. 21:82–88. 75. Dreger, M. (2003). Subcellular proteomics. Mass Spectrom. Rev. 22:27–56. 76. Dreger, M. (2003). Proteome analysis at the level of subcellular structures. Eur. J. Biochem. 270:589–599. 77. Han, J. D., Bertin, N., Hao, T., Goldberg, D. S., Berriz, G. F., Zhang, L. V., Dupuy, D., Walhout, A. J., Cusick, M. E., Roth, F. P., and Vidal, M. (2004). Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 430(6995):88–93. 78. Han, J. D., Dupuy, D., Bertin, N., Cusick, M. E., and Vidal, M. (2005).
Effect of sampling on topology predictions of protein–protein interaction networks. Nat. Biotechnol. 23(7):839–844. 79. Li, S., Armstrong, C. M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P. O., Han, J. D., Chesneau, A., Hao, T., Goldberg, D. S., Li, N., Martinez, M., Rual, J. F., Lamesch, P., Xu, L., Tewari, M., Wong, S. L., Zhang, L. V., Berriz, G. F., Jacotot, L., Vaglio, P., Reboul, J., Hirozane-Kishikawa, T., Li, Q., Gabel, H. W., Elewa, A., Baumgartner, B., Rose, D. J., Yu, H., Bosak, S., Sequerra, R., Fraser, A., Mango, S. E., Saxton, W. M., Strome, S., Van Den Heuvel, S., Piano, F., Vandenhaute, J., Sardet, C., Gerstein, M., Doucette-Stamm, L., Gunsalus, K. C., Harper, J. W., Cusick, M. E., Roth, F. P., Hill, D. E., and Vidal, M. (2004). A map of the interactome network of the metazoan C. elegans. Science 303(5657):540–543. 80. Kumar, A., and Snyder, M. (2002). Protein complexes take the bait. Nature 415:123–124. 81. Gavin, A. C., Bosche, M., Krause, R., Grandi, P., Marzioch, M., Bauer, A., Schultz, J., Rick, J. M., Michon, A. M., Cruciat, C. M., Remor, M., Hofert, C., Schelder, M., Brajenovic, M., Ruffner, H., Merino, A., Klein, K., Hudak, M., Dickson, D., Rudi, T., Gnau, V., Bauch, A., Bastuck, S., Huhse, B., Leutwein, C., Heurtier, M.-A., Copley, R. R., Edelmann, A., Querfurth, E., Rybin, V., Drewes, G., Raida, M., Bouwmeester, T., Bork, P., Seraphin, B., Kuster, B., Neubauer, G., and Superti-Furga, G. (2002). Functional

organization of the yeast proteome by systematic analysis of protein complexes. Nature 415:141–147. 82. Gavin, A.-C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dümpelfeld, B., Edelmann, A., Heurtier, M., Hoffman, V., Hoefert, C., Klein, K., Hudak, M., Michon, A. M., Schelder, M., Schirle, M., Remor, M., Rudi, T., Hooper, S., Bauer, A., Bouwmeester, T., Casari, T., Drewes, G., Neubauer, G., Rick, J. M., Kuster, B., Bork, P., Russell, R. B., and Superti-Furga, G. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440(30):631–636. 83. Ho, Y., Gruhler, A., Heilbut, A., Bader, G. D., Moore, L., Adams, S. L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., Yang, L., Wolting, C., Donaldson, I., Schandorff, S., Shewnarane, J., Vo, M., Taggart, J., Goudreault, M., Muskat, B., Alfarano, C., Dewar, D., Lin, Z., Michalickova, K., Willems, A. R., Sassi, H., Nielsen, P. A., Rasmussen, K. J., Andersen, J. R., Johansen, L. E., Hansen, L. H., Jespersen, H., Podtelejnikov, A., Nielsen, E., Crawford, J., Poulsen, V., Soensen, B. D., Matthiesen, J., Hendrickson, R. C., Gleeson, F., Pawson, T., Moran, M. F., Durocher, D., Mann, M., Hogue, C. W. V., Figeys, D., and Tyers, M. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180–183. 84. Andersen, J. S., Lam, Y. W., Leung, A. K., Ong, S. E., Lyon, C. E., Lamond, A. I., and Mann, M. (2005). Nucleolar proteome dynamics. Nature 433(7021):77–83. 85. Patton, W. F. (1999). Proteome analysis. II. Protein subcellular redistribution: linking physiology to genomics via the proteome and separation technologies involved. J. Chromatogr. B Biomed. Sci. Appl. 1(2):203–223. 86. Andersen, J. S., and Mann, M. (2000). Functional genomics by mass spectrometry. FEBS Lett. 1:25–31. 87. Mann, M., Hendrickson, R. C., and Pandey, A. (2001). Analysis of proteins and proteomes by mass spectrometry. Annu. Rev. Biochem. 70:437–473. 88. Godovac-Zimmermann, J., and Brown, L. R. (2001). Perspectives for mass spectrometry and functional proteomics. Mass Spectrom. Rev. 1:1–57. 89. Gudepu, R. G., and Wold, F. (1998). In Proteins: Analysis and Design, R. H. Angeletti (Ed.), Academic Press, San Diego, CA, pp. 121–207. 90. Jensen, O. N. (2004). Modification-specific proteomics: characterization of post-translational modifications by mass spectrometry. Curr. Opin. Chem. Biol. 8(1):33–41. 91. Ahn, N. G., and Resing, K. A. (2001). Toward the phosphoproteome. Nat. Biotechnol. 19(4):317–318. 92. Zhou, H., Watts, J., and Aebersold, R. (2001). A systematic approach to the analysis of protein phosphorylation. Nat. Biotechnol. 19:375–378. 93. Oda, Y., Nagasu, T., and Chait, B. T. (2001). Enrichment analysis of phosphorylated proteins as a tool for probing the phosphoproteome. Nat. Biotechnol. 19:379–382. 94. van Swieten, P. F., Maehr, R., van den Nieuwendijk, A. M., Kessler, B. M., Reich, M., Wong, C. S., Kalbacher, H., Leeuwenburgh, M. A., Driessen, C., van der Marel, G. A., Ploegh, H. L., and Overkleeft, H. S. (2004). Development of an isotope-coded activity-based probe for the quantitative profiling of cysteine proteases. Bioorg. Med. Chem. Lett. 14(12):3131–3134. 95. Hemelaar, J., Galardy, P. J., Borodovsky, A., Kessler, B. M., Ploegh, H. L., and Ovaa, H. (2004). Chemistry-based functional proteomics: mechanism-based activity-profiling tools for ubiquitin and ubiquitin-like specific proteases. J. Proteome Res. 3(2):268–276.


96. Barglow, K. T., and Cravatt, B. F. (2004). Discovering disease-associated enzymes by proteome reactivity profiling. Chem. Biol. 11(11):1523–1531. 97. Saghatelian, A., Jessani, N., Joseph, A., Humphrey, M., and Cravatt, B. F. (2004). Activity-based probes for the proteomic profiling of metalloproteases. Proc. Natl. Acad. Sci. USA 101(27):10000–10005. 98. Hekmat, O., Kim, Y. W., Williams, S. J., He, S., and Withers, S. G. (2005). Active-site peptide “fingerprinting” of glycosidases in complex mixtures by mass spectrometry. Discovery of a novel retaining beta-1,4-glycanase in Cellulomonas fimi. J. Biol. Chem. 280(42):35126–35135. 99. Winssinger, N., Damoiseaux, R., Tully, D. C., Geierstanger, B. H., Burdick, K., and Harris, J. L. (2004). PNA-encoded protease substrate microarrays. Chem. Biol. 11(10):1351–1360. Comment in: Chem. Biol. 11(10):1328–1330. 100. Bryant, N. J., Govers, R., and James, D. E. (2002). Regulated transport of the glucose transporter GLUT4. Nat. Rev. Mol. Cell Biol. 3:267–277. 101. Andersen, J. S., Wilkinson, C. J., Mayor, T., Mortensen, P., Nigg, E. A., and Mann, M. (2003). Proteomic characterization of the human centrosome by protein correlation profiling. Nature 426(6966):570–574. 102. Foster, L. J., de Hoog, C. L., Zhang, Y., Zhang, Y., Xie, X., Mootha, V. K., and Mann, M. (2006). A mammalian organelle map by protein correlation profiling. Cell 125(1):187–199. 103. Donnes, P., and Hoglund, A. (2004). Predicting protein subcellular localization: past, present, and future. Geno. Prot. Bioinfo. 2(4):209–215. 104. Andersen, J. S., Lyon, C. E., Fox, A. H., Leung, A. K. L., Lam, W. W., Steen, H., Mann, M., and Lamond, A. I. (2002). Directed proteomic analysis of the human nucleolus. Curr. Biol. 12:1–11. 105. Scher, A., Coute, Y., De’on, C., Callé, A., Kindbeiter, K., Sanchez, J.-C., Greco, A., Hochstrasser, D., and Diaz, J.-J. (2002). Functional proteomic analysis of human nucleolus. Mol. Biol. Cell 13:4100–4109. 106. Bell, A. W., Ward, M.
A., Blackstock, W. P., Freeman, H. N., Choudhary, J. S., Lewis, A. P., Chotai, D., Fazel, A., Gushue, J. N., Paiement, J., Palcy, S., Chevet, E., Lafreniere-Roula, M., Solari, R., Thomas, D. Y., Rowley, A., and Bergeron, J. J. (2001). Proteomics characterization of abundant Golgi membrane proteins. J. Biol. Chem. 276:5152–5165. 107. Dreger, M., Bengtsson, L., Schneberg, T., Otto, H., and Hucho, F. (2001). Nuclear envelope proteomics: novel integral membrane proteins of the inner nuclear membrane. Proc. Natl. Acad. Sci. USA 98:11943–11948. 108. Neubauer, G., King, A., Rappsilber, J., Calvio, C., Watson, M., Ajuh, P., Sleeman, J., Lamond, A., and Mann, M. (1998). Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat. Genet. 20:46–50. 109. Rout, M. P., Aitchison, J. D., Suprapto, A., Hjertaas, K., Zhao, Y., and Chait, B. T. (2000). The yeast nuclear pore complex: composition, architecture, and transport mechanism. J. Cell Biol. 148:635–651. 110. Peters, J.-M., King, R. W., Hoog, C., and Kirschner, M. W. (1996). Identification of BIME as a subunit of the anaphase-promoting complex. Science 274:1199–1201. 111. Zachariae, W., Shin, T. H., Galova, M., Obermaier, B., and Nasmyth, K. (1996). Identification of subunits of the anaphase-promoting complex of Saccharomyces cerevisiae. Science 274:1201–1204.


112. Kumar, A., Agarwal, S., Heyman, J. A., Matson, S., Heidtman, M., Piccirillo, S., Umansky, L., Drawid, A., Jansen, R., Liu, Y., Cheung, K. H., Miller, P., Gerstein, M., Roeder, G. S., and Snyder, M. (2002). Subcellular localization of the yeast proteome. Genes Dev. 16(6):707–719. 113. Huh, W. K., Falvo, J. V., Gerke, L. C., Carroll, A. S., Howson, R. W., Weissman, J. S., and O’Shea, E. K. (2003). Global analysis of protein localization in budding yeast. Nature 425(6959):686–691. 114. Simpson, J. C., Wellenreuther, R., Poustka, A., Pepperkok, R., and Wiemann, S. (2000). Systematic subcellular localization of novel proteins identified by large-scale cDNA sequencing. EMBO Reports 1(3):287–292. 115. Cronshaw, J. M., Krutchinsky, A. N., Zhang, W., Chait, B. T., and Matunis, M. J. (2002). Proteomic analysis of the mammalian nuclear pore complex. J. Cell Biol. 158(5):915–927. 116. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R. (1999). Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17:676–682. 117. Zhou, Z., Licklider, L. J., Gygi, S. P., and Reed, R. (2002). Comprehensive proteomic analysis of the human spliceosome. Nature 419:182–185. 118. Hartmuth, K., Urlaub, H., Vornlocher, H.-P., Will, C. L., Gentzel, M., Wilm, M., and Lührmann, R. L. (2002). Protein composition of human prespliceosomes isolated by a tobramycin affinity-selection method. Proc. Natl. Acad. Sci. USA 99(26):16719–16724. 119. Fatica, A., and Tollervey, D. (2002). Making ribosomes. Curr. Opin. Cell Biol. 14:313–318. 120. Fromont-Racine, M., Senger, B., Saveanu, C., and Fasiolo, F. (2003). Ribosome assembly in eukaryotes. Gene 313:17–42. 121. Takahashi, N., Yanagida, M., Fujiyama, S., Hayano, T., and Isobe, T. (2003). Proteomic snapshot analysis of preribosomal ribonucleoprotein complexes formed at various stages of ribosome biogenesis in yeast and mammalian cells. Mass Spectrom. Rev. 22:287–317. 122.
Krogan, N. J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A. P., Punna, T., Peregrin-Alvarez, J. M., Shales, M., Zhang, X., Davey, M., Robinson, M. D., Paccanaro, A., Bray, J. E., Sheung, A., Beattie, B., Richards, D. P., Canadien, V., Lalev, A., Mena, F., Wong, P., Starostine, A., Canete, M. M., Vlasblom, J., Wu, S., Orsi, C., Collins, S. R., Chandran, S., Haw, R., Rilstone, J. J., Gandi, K., Thompson, N. J., Musso, G., Onge, P. S., Ghanny, S., Lam, M. H. Y., Butland, G., Altaf-Ul, A. M., Kanaya, K. S., Shilatifard, A., O’Shea, E., Weissman, J. S., Ingles, C. J., Hughes, T. R., Parkinson, J., Gerstein, M., Wodak, S. J., Emili, A., and Greenblatt, J. F. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(30):637–643. 123. Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., and Seraphin, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17:1030–1032. 124. Hartwell, L. H., Hopfield, J. J., Leibler, S., and Murray, A. W. (1999). From molecular to modular cell biology. Nature 402:C47–C52.


125. Makarov, E. M., Makarova, O. V., Urlaub, H., Gentzel, M., Will, C. L., Wilm, M., and Luhrmann, R. (2002). Small nuclear ribonucleoprotein remodeling during catalytic activation of the spliceosome. Science 298(5601):2205–2208. 126. Ospina, J. K., and Matera, A. G. (2002). Proteomics: the nucleolus weighs in dispatch. Curr. Biol. 12:R29–R31. 127. Fox, A. H., Lam, Y. W., Leung, A., Lyon, C. E., Andersen, J. S., Mann, M., and Lamond, A. I. (2002). Paraspeckles: a novel nuclear domain. Curr. Biol. 12:13–25. 128. Ranish, J. A., Yi, E. C., Leslie, D. M., Purvine, S. O., Goodlett, D. R., Eng, J., and Aebersold, R. (2003). The study of macromolecular complexes by quantitative proteomics. Nat. Genet. 33:349–355. 129. Ranish, J. A., Yudkovsky, N., and Hahn, S. (1999). Intermediates in formation and activity of the RNA polymerase II preinitiation complex: holoenzyme recruitment and a postrecruitment role for the TATA box and TFIIB. Genes Dev. 13:49–63. 130. Aebersold, R., and Mann, M. (2003). Mass spectrometry-based proteomics. Nature 422:198–207. 131. Salih, E., Ashkar, S., Gerstenfeld, L. C., and Glimcher, M. J. (1997). Identification of the phosphorylated sites of metabolically 32P-labeled osteopontin from cultured chicken osteoblasts. J. Biol. Chem. 272:13966–13973. 132. Salih, E. (2003). In vivo and in vitro phosphorylation regions of bone sialoprotein. Connect. Tissue Res. 44:223–229. 133. Gronborg, M., Kristiansen, T. Z., Stensballe, A., Andersen, J. S., Ohara, O., Mann, M., Jensen, O. N., and Pandey, A. (2002). A mass spectrometry-based proteomic approach for identification of serine/threonine-phosphorylated proteins by enrichment with phospho-specific antibodies: identification of a novel protein, Frigg, as a protein kinase A substrate. Mol. Cell. Proteomics 1(7):517–527. 134. Yanagida, M., Miura, Y., Yagasaki, K., Taoka, M., Isobe, T., and Takahashi, N. (2000). 
Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis of proteins detected by anti-phosphotyrosine antibody on two-dimensional gels of fibroblast cell lysates after tumor necrosis factor-α stimulation. Electrophoresis 21:1890–1898. 135. Steen, H., Fernandez, M., Ghaffari, S., Pandey, A., and Mann, M. (2003). Phosphotyrosine mapping in Bcr/Abl oncoprotein using phosphotyrosine-specific immonium ion scanning. Mol. Cell. Proteomics 2(3):138–145. 136. Oda, Y., Huang, K., Cross, F. R., Cowburn, D., and Chait, B. T. (1999). Accurate quantitation of protein expression and site-specific phosphorylation. Proc. Natl. Acad. Sci. USA 96:6591–6596. 137. Knight, Z. A., Schilling, B., Row, R. H., Kenski, D. M., Gibson, B. W., and Shokat, K. M. (2003). Phosphospecific proteolysis for mapping sites of protein phosphorylation. Nat. Biotechnol. 21:1047–1054. 138. Elia, A. E. H., Cantley, L. C., and Yaffe, M. B. (2003). Proteomic screen finds pSer/pThr-binding domain localizing Plk1 to mitotic substrates. Science 299:1228–1231. 139. Borchers, C. H., Thapar, R., Petrotchenko, E. V., Torres, M. P., Speir, J. P., Easterling, M., Dominski, Z., and Marzluff, W. F. (2006). Combined top–down and bottom–up proteomics identifies a phosphorylation site in stem-loop-binding proteins that contributes to high-affinity RNA binding. Proc. Natl. Acad. Sci. USA 103:3094–3099.


140. Meng, F., Du, Y., Miller, L. M., Patrie, S. M., Robinson, D. E., and Kelleher, N. L. (2004). Molecular-level description of proteins from Saccharomyces cerevisiae using quadrupole FT hybrid mass spectrometry for top–down proteomics. Anal. Chem. 76(10):2852–2858. 141. Ibarrola, N., Kalume, D. E., Gronborg, M., Iwahori, A., and Pandey, A. (2003). A proteomic approach for quantitation of phosphorylation using stable isotope labeling in cell culture. Anal. Chem. 75(22):6043–6049. 142. Ibarrola, N., Molina, H., Iwahori, A., and Pandey, A. (2004). A novel proteomic approach for specific identification of tyrosine kinase substrates using [13C]tyrosine. J. Biol. Chem. 279(16):15805–15813. 143. MacDonald, J. A., Mackey, A. J., Pearson, W. R., and Haystead, T. A. J. (2002). A strategy for the rapid identification of phosphorylation sites in the phosphoproteome. Mol. Cell. Proteomics 1:314–322. 144. Ballif, B. A., Roux, P. P., Gerber, S. A., MacKeigan, J. P., Blenis, J., and Gygi, S. P. (2005). Quantitative phosphorylation profiling of the ERK/p90 ribosomal S6 kinase-signaling cassette and its targets, the tuberous sclerosis tumor suppressors. Proc. Natl. Acad. Sci. USA 102(3):667–672. 145. Dierck, K., Machida, K., Voigt, A., Thimm, J., Horstmann, M., Fiedler, W., Mayer, B. J., and Nollau, P. (2006). Quantitative multiplexed profiling of cellular signaling networks using phosphotyrosine-specific DNA tagged SH2 domains. Nat. Methods 3(9):737–744. 146. Kwon, S. W., Kim, S. C., Jaunbergs, J., Falck, J. R., and Zhao, Y. (2003). Selective enrichment of thiophosphorylated polypeptides as a tool for the analysis of protein phosphorylation. Mol. Cell. Proteomics 2(4):242–247. 147. Moser, K., and White, F. M. (2006). Phosphoproteomic analysis of rat liver by high capacity IMAC and LC-MS/MS. J. Proteome Res. 5(1):98–104. 148. Aprilita, N. H., Huck, C. W., Bakry, R., Feuerstein, I., Stecher, G., Morandell, S., Huang, H. L., Stasyk, T., Huber, L. A., and Bonn, G. K. (2005).
Poly(glycidyl methacrylate/divinylbenzene)-IDA-FeIII in phosphoproteomics. J. Proteome Res. 4(6):2312–2319. 149. Larsen, M. R., Thingholm, T. E., Jensen, O. N., Roepstorff, P., and Jorgensen, T. J. (2005). Highly selective enrichment of phosphorylated peptides from peptide mixtures using titanium dioxide microcolumns. Mol. Cell. Proteomics. 4(7):873–886. 150. Brill, L. M., Salomon, A. R., Ficarro, S. B., Mukher, J. I. M., Stettler-Gill, M., and Peters, E. C. (2004). Robust phosphoproteomic profiling of tyrosine phosphorylation sites from human T cells using immobilized metal affinity chromatography and tandem mass spectrometry. Anal. Chem. 76(10):2763–2772. 151. Cantin, G. T., Venable, J. D., Cociorva, D., and Yates, J. R. 3rd. (2006). Quantitative phosphoproteomic analysis of the tumor necrosis factor pathway. J. Proteome Res. 5(1):127–134. 152. Kaji, H., Saito, H., Yamauchi, Y., Shinkawa, T., Taoka, M., Hirabayashi, J., Kasai, K., Takahashi, N., and Isobe, T. (2003). Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoproteins. Nat. Biotechnol. 21(6):667–672. 153. Zhang, H., Li, X. J., Martin, D. B., and Aebersold, R. (2003). Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat. Biotechnol. 21(6):660–666.


154. Wells, L., Vosseller, K., Cole, R. N., Cronshaw, J. M., Matunis, M. J., and Hart, G. W. (2002). Mapping sites of O-GlcNAc modification using affinity tags for serine and threonine post-translational modifications. Mol. Cell. Proteomics 1(10): 791–804. 155. Cieniewski-Bernard, C., Bastide, B., Lefebvre, T., Lemoine, J., Mounier, Y., and Michalski, J. C. (2004). Identification of O-linked N-acetylglucosamine proteins in rat skeletal muscle using two-dimensional gel electrophoresis and mass spectrometry. Mol. Cell. Proteomics. 3(6):577–585. 156. Boisvert, F. M., Cote, J., Boulanger, M. C., and Richard, S. (2003). A proteomic analysis of arginine-methylated protein complexes. Mol. Cell. Proteomics. 2(12):1319–1330. 157. Wu, C. C., MacCoss, M. J., Mardones, G., Finnigan, C., Mogelsvang, S., Yates, J. R. 3rd, and Howell, K. E. (2004). Organellar proteomics reveals Golgi arginine dimethylation. Mol. Biol. Cell 15(6):2907–2919. 158. Ong, S. E., Mittler, G., and Mann, M. (2004). Identifying and quantifying in vivo methylation sites by heavy methyl SILAC. Nat. Methods 1(2):119–126. 159. Matsumoto, M., Hatakeyama, S., Oyamada, K., Oda, Y., Nishimura, T., and Nakayama, K. (2005). Large-scale analysis of the human ubiquitin-related proteome. Proteomics 5(16):4145–4151. 160. Peng, J., Schwartz, D., Elias, J. E., Thoreen, C. C., Cheng, D., Marsischky, G., Roelofs, J., Finley, D., and Gygi, S. P. (2003). A proteomics approach to understanding protein ubiquitination. Nat. Biotechnol. 21(8):921–926. 161. Gocke, C., Yu, H., and Kang, J. (2004). Systematic identification and analysis of mammalian SUMO substrates. J. Biol. Chem. 280(6):5004–5012. 162. Rosas-Acosta, G., Russell, W. K., Deyrieux, A., Russell, D. H., and Wilson, V. G. (2005). A universal strategy for proteomic studies of SUMO and other ubiquitin-like modifiers. Mol. Cell. Proteomics 4:56–72. 163. Panse, V. G., Hardeland, U., Werner, T., Kuster, B., and Hurt, E. (2004). 
A proteome-wide approach identifies sumoylated substrate proteins in yeast. J. Biol. Chem. 279:41346–41351. 164. Li, X. J., Pedrioli, P. G., Eng, J., Martin, D., Yi, E. C., Lee, H., and Aebersold, R. (2004). A tool to visualize and evaluate data obtained by liquid chromatography–electrospray ionization–mass spectrometry. Anal. Chem. 76:3856–3860. 165. Vertegaal, A. C., Andersen, J. S., Ogg, S. C., Hay, R. T., Mann, M., and Lamond, A. I. (2006). Distinct and overlapping sets of SUMO-1 and SUMO-2 target proteins revealed by quantitative proteomics. Mol. Cell. Proteomics 5(12):2298–2310. 166. Vertegaal, A. C., Ogg, S. C., Jaffray, E., Rodriguez, M. S., Hay, R. T., Andersen, J. S., Mann, M., and Lamond, A. I. (2004). A proteomic study of SUMO-2 target proteins. J. Biol. Chem. 279:33791–33798. 167. Denison, C., Rudner, A. D., Gerber, S. A., Bakalarski, C. E., Moazed, D., and Gygi, S. P. (2005). A proteomic strategy for gaining insights into protein sumoylation in yeast. Mol. Cell. Proteomics 4(3):246–254. 168. Wohlschlegel, J. A., Johnson, E. S., Reed, S. I., and Yates, J. R. 3rd. (2004). Global analysis of protein sumoylation in Saccharomyces cerevisiae. J. Biol. Chem. 279:45662–45668. 169. Hao, G., Derakhshan, B., Shi, L., Campagne, F., and Gross, S. G. (2006). SNOSID, a proteomic method for identification of cysteine S-nitrosylation sites in complex protein mixtures. Proc. Natl. Acad. Sci. USA 103(4):1012–1017.


170. Kuncewicz, T., Sheta, E. A., Goldknopf, I. L., and Kone, B. C. (2003). Proteomic analysis of S-nitrosylated proteins in mesangial cells. Mol. Cell. Proteomics 2(3):156–163. 171. Castegna, A., Thongboonkerd, V., Klein, J. B., Lynn, B., Markesbery, W. R., and Butterfield, D. A. (2003). Proteomic identification of nitrated proteins in Alzheimer’s disease brain. J. Neurochem. 85:1394–1401. 172. Ballesteros, M., Fredriksson, A., Henriksson, J., and Nystrom, T. (2001). Bacterial senescence: protein oxidation in non-proliferating cells is dictated by the accuracy of the ribosome. EMBO J. 20:5280–5289. 173. Poon, H. F., Castegna, A., Farr, S. A., Thongboonkerd, V., Lynn, B. C., Banks, W. A., Morley, J. E., Klein, J. B., and Butterfield, D. A. (2004). Quantitative proteomics analysis of specific protein expression and oxidative modification in aged senescence-accelerated-prone mice brain. Neuroscience 126:915–926. 174. Soreghan, B. A., Yang, F., Thomas, S. N., Hsu, J., and Yang, A. J. (2003). High-throughput proteomic-based identification of oxidatively induced protein carbonylation in mouse brain. Pharm. Res. 20(11):1713–1720. 175. Kho, Y., Kim, S. C., Jiang, C., Barma, D., Kwon, S. W., Cheng, J., Jaunbergs, J., Weinbaum, C., Tamanoi, F., Falck, J., and Zhao, Y. (2004). A tagging-via-substrate technology for detection and proteomics of farnesylated proteins. Proc. Natl. Acad. Sci. USA 101(34):12479–12484. 176. Boisson, B., and Meinnel, T. (2003). A continuous assay of myristoyl-CoA:protein N-myristoyltransferase for proteomic analysis. Anal. Biochem. 322(1):116–123. 177. Liu, Y., Patricelli, M. P., and Cravatt, B. F. (1999). Activity-based protein profiling: the serine hydrolases. Proc. Natl. Acad. Sci. USA 96(26):14694–14699. 178. Kidd, D., Liu, Y., and Cravatt, B. F. (2001). Profiling serine hydrolase activities in complex proteomes. Biochemistry 40(13):4005–4015. 179. Greenbaum, D., Medzihradszky, K. F., Burlingame, A., and Bogyo, M. (2000).
Epoxide electrophiles as activity-dependent cysteine protease profiling and discovery tools. Chem. Biol. 7(8):569–581. 180. Kocks, C., Maehr, R., Overkleeft, H. S., Wang, E. W., Iyer, L. K., Lennon-Dumenil, A. M., Ploegh, H. L., and Kessler, B. M. (2003). Functional proteomics of the active cysteine protease content in Drosophila S2 cells. Mol. Cell. Proteomics. 2(11):1188–1197. 181. Adam, G. C., Cravatt, B. F., and Sorensen, E. J. (2001). Profi ling the specific reactivity of the proteome with non-directed activity-based probes. Chem. Biol. 8(1):81–95. 182. Speers, A. E., Adam, G. C., and Cravatt, B. F. (2003). Activity-based protein profiling in vivo using a copper(i)-catalyzed azide-alkyne [3 ⫹ 2] cycloaddition. J. Am. Chem. Soc. 125(16):4686–4687. 183. Leung, D., Hardouin, C., Boger, D. L., and Cravatt, B. F. (2003). Discovering potent and selective reversible inhibitors of enzymes in complex proteomes. Nat. Biotechnol. 21(6):687–691. 184. Borodovsky, A., Ovaa, H., Kolli, N., Gan-Erdene, T., Wilkinson, K. D., Ploegh, H. L., and Kessler, B. M. (2002). Chemistry-based functional proteomics reveals novel members of the deubiquitinating enzyme family. Chem. Biol. 9:1149–1159. 185. Ovaa, H., Kessler, B. M., Rolen, U., Galardy, P. J., Ploegh, H. L., and Masucci, M. G. (2004). Activity-based ubiquitin-specific protease (USP) profiling of virus-infected and malignant human cells. Proc. Natl. Acad. Sci. USA 101(8):2253–2258.

REFERENCES

61

186. Ovaa, H., Galardy, P. J., and Ploegh, H. L. (2005). Mechanism-based proteomics tools based on ubiquitin and ubiquitin-like proteins: synthesis of active site-directed probes. Methods Enzymol. 399:468–478. 187. Galardy, P., Ploegh, H. L., and Ovaa, H. (2005). Mechanism-based proteomics tools based on ubiquitin and ubiquitin-like proteins: crystallography, activity profiling, and protease identification. Methods Enzymol. 399:120–131. 188. Rolen, U., Kobzeva, V., Gasparjan, N., Ovaa, H., Winberg, G., Kisseljov, F., and Masucci, M. G. (2006). Activity profiling of deubiquitinating enzymes in cervical carcinoma biopsies and cell lines. Mol. Carcinog. 45(4):260–269. 189. Bogyo, M., McMaster, J. S., Gaczynska, M., Tortorella, D., Goldberg, A. L., and Ploegh, H. (1997). Covalent modification of the active site threonine of proteasomal subunits and the Escherichia coli homolog HslV by a new class of inhibitors. Proc. Natl. Acad. Sci. USA 94:6629–6634. 190. Chan, E. W., Chattopadhaya, S., Panicker, R. C., Huang, X., and Yao, S. Q. (2004). Developing photoactive affinity probes for proteomic profiling: hydroxamate-based probes for metalloproteases. J. Am. Chem. Soc. 126(44):14435–14446. 191. Lo, L. C., Chiang, Y. L., Kuo, C. H., Liao, H. K., Chen, Y. J., and Lin, J. J. (2005). Study of the preferred modification sites of the quinone methide intermediate resulting from the latent trapping device of the activity probes for hydrolases. Biochem. Biophys. Res. Commun. 326(1):30–35. 192. Lo, L.-C., Pang, T.-L., Kuo, C.-H., Chiang, Y.-L., Wang, H.-Y., and Lin, J.-J. (2002). Design and synthesis of class-selective activity probes for protein tyrosine phosphatases. J. Proteome Res. 1:35–40. 193. Kumar, S., Zhou, B., Liang, F., Wang, W.-Q., Huang, Z., and Zhang, Z.-Y. (2004). Activity-based probes for protein tyrosine phosphatases. Proc. Natl. Acad. Sci. USA 101:7943–7948. 194. 
Godl, K., Wissing, J., Kurtenbach, A., Habenberger, P., Blencke, S., Gutbrod, H., Salassidis, K., Stein-Gerlach, M., Missio, A., Cotten, M., and Daub, H. (2003). An efficient proteomics method to identify the cellular targets of protein kinase inhibitors. Proc. Natl. Acad. Sci. USA 100:15434–15439. 195. Tsai, C.-S., Li, Y.-K., and Lo, L.-C. (2002). Design and synthesis of activity probes for glycosidases. Org. Lett. 4:3607–3610. 196. Williams, S. J., Hekmat, O., and Withers, S. G. (2006). Synthesis and testing of mechanism-based protein-profiling probes for retaining endo-glycosidases. Chem. Biochem. 7(1):116–124. 197. Orru, S., Caputo, I., D’Amato, A., Ruoppolo, M., and Esposito, C. (2003). Proteomics identification of acyl-acceptor and acyl-donor substrates for transglutaminase in a human intestinal epithelial cell line: implication for celiac disease. J. Biol. Chem. 273:31766–31773. 198. Ruoppolo, M., Orru, S., D’Amato, A., Francese, S., Rovero, P., Marino, G., and Esposito, C. (2003). Analysis of transglutaminase protein substrates by functional proteomics. Protein Sci. 12:1290–1297.

2 PROTEOMIC TOOLS FOR ANALYSIS OF CELLULAR DYNAMICS

2-1

LC-BASED PROTEOMICS TECHNOLOGIES

In classical proteome analysis, proteins are most commonly separated by gel electrophoresis to reduce sample complexity prior to introduction into the mass spectrometer, as described in Sections 1-2-1, 1-2-2, and 1-2-3. One-dimensional (1D) gels, which separate proteins based on molecular mass, provide a low resolution separation of proteins but, when coupled with tandem mass spectrometry, can be used to identify proteins in moderately complex mixtures. For more complex mixtures, 2D gels are used because they provide better resolution of proteins (1). However, 2D gel-based analyses are still labor intensive, have a limited dynamic range, and are not necessarily applicable to membrane proteins or to highly basic or highly acidic proteins under standard conditions (see Section 1-2-3). As an alternative to gel-based methods, liquid chromatography (LC) [and, in some cases, liquid-phase electrophoretic separation (capillary isoelectric focusing, capillary zone electrophoresis, and free-flow electrophoresis)] is used for the separation of complex protein mixtures. LC exploits one or more inherent characteristics of a protein, namely, its mass, isoelectric point, hydrophobicity, or biospecificity (2); accordingly, proteins can be purified by combining columns with different separation modes, such as anion exchange, cation exchange, reversed phase, size exclusion, or specific affinity.



These separation methods have been used in various combinations to isolate a few proteins of interest, and the remaining coexisting proteins were mostly discarded during the purification process. The purified proteins were then used for further biochemical characterization. More specifically, in those procedures, the eluent from the first separation column was collected in many fractions, and only the fractions of interest were applied to the second separation column. The fractions can be collected by time, drop, or inflection point using an autosampler. Collecting fractions based on inflection points avoids splitting broad peaks between two fractions. If necessary, a third or further column was used for additional purification. This approach can be categorized as multidimensional liquid chromatography (multi-LC). Multi-LC was performed manually in classical protein purification and was used for limited preparative purposes, where only proteins of interest were analyzed by subsequent biochemical methods. 2D-PAGE, in contrast, was used mostly for analytical purposes; for example, visualization of proteins as protein maps by staining 2D gels was often the end point of a series of experiments until the proteomics approach was adapted to 2D-PAGE. Meanwhile, in classical protein sequence determination, a purified protein underwent chemical or enzymatic digestion to produce many peptides, which were then separated by a multi-LC approach into peptides that were pure enough to be sequenced by the Edman method (see Section 1-2-2). Unlike protein purification, the procedure for peptide purification had to be systematic: that is, ideally all peptides (or as many peptides as possible) generated from a protein of interest had to be isolated to cover the entire amino acid sequence of the analyzed protein.
Multi-LC with a combination of various columns, typically ion-exchange and reversed-phase columns, was often used for this purpose and was done manually. To isolate all generated peptides, every peptide-containing fraction collected from the first column was applied to the second column. If necessary, further purification with a third column was done for every peptide-containing fraction eluted from the second column, and the purification steps were continued until all of the peptides had been purified. Thus, systematic peptide separation even for one digest of a protein involved dozens of manual LC applications. Although the development of high performance (pressure) liquid chromatography (HPLC) revolutionized the efficiency of peptide purification in the early 1980s, manual operation of multi-LC often caused many problems, such as precipitation during concentration of samples, sample loss through repeated handling, increased time and labor, chance of human error, and decreased reproducibility through accumulation of experimental errors (3). Nonetheless, no significant effort to develop an automated procedure that separates all peptides systematically in a single experimental operation was made until the mid-1980s. The first automated multi-LC system aimed at the systematic isolation and recovery of all peptides from extremely complex peptide mixtures generated by tryptic digestion of large proteins was developed in 1985 (4, 5). The principle of this system was later adapted to the direct proteomic analysis of complex protein mixtures using multidimensional LC-MS/MS (6). The idea behind this automated multi-LC system is somewhat similar to that of 2D-PAGE, which uses a combination of isoelectric focusing


and SDS-PAGE. Proteins are separated first by charge and second by size. The automated system for the multi-LC technique performs sequential chromatography on two columns with different separation modes: (1) a sample mixture is applied to an ion-exchange column and eluted in a stepwise manner; (2) the eluent from the first column is introduced directly into the second column, which is a reversed-phase column connected online in tandem; (3) after application of the eluent, reversed-phase chromatography is performed by a linear gradient elution; and (4) stepwise elution for ion-exchange chromatography and gradient elution for reversed-phase chromatography are synchronized by a computer program that controls the LC-pump system (Fig. 2-1A,B). The resolving power of the columns is multiplied by combining two columns with different separation modes or specificities. In principle, if each column is capable of resolving a peptide mixture into, for instance, 50 peaks, the theoretical separation capacity of the combination is 50 × 50 = 2500 peaks (Fig. 2-1C) (3). In this approach, the multidimensional system should allow for as much peptide coverage as possible, not only selected fractions (peaks) of the first dimension, as in the heart-cutting procedures used in conventional HPLC, where only regions (peaks) of interest are subjected to further analysis (7). A procedure based on the same principle as this automated system was later applied to the systematic separation of protein mixtures, specifically of extracts obtained from mammalian cerebellum (8). The protein profile obtained by the multidimensional LC system could be used for analysis of expression levels of cellular proteins and their dynamics (9–11). The 2D-LC/MS system can typically resolve several hundred proteins expressed in cells and is therefore a good supplement to 2D-PAGE.
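The multiplicative estimate of separation capacity given above can be written as a short calculation. The following Python sketch is only an illustration: the function name is hypothetical, and the product is a theoretical upper bound that assumes the two separation modes are fully orthogonal.

```python
def theoretical_peak_capacity(*peaks_per_dimension: int) -> int:
    """Return the product of the peak capacities of each separation dimension.

    This is an idealized upper bound: it assumes that the dimensions are
    fully orthogonal, so that every first-dimension peak can be spread over
    the whole second dimension.
    """
    capacity = 1
    for n_peaks in peaks_per_dimension:
        if n_peaks < 1:
            raise ValueError("each dimension must resolve at least one peak")
        capacity *= n_peaks
    return capacity


# The example from the text: two columns that each resolve 50 peaks.
print(theoretical_peak_capacity(50, 50))

# Hypothetical case: a stepwise first dimension with only 10 salt steps.
print(theoretical_peak_capacity(10, 50))
```

With stepwise elution, the effective capacity of the first dimension is limited by the number of salt steps, so the second call is probably closer to what an online system delivers than the 50 × 50 figure.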
For this purpose, one can use a simple mass spectrometer with an electrospray ionization source, such as ESI-ToF MS without a collision cell, and construct an LC-based “protein profiling” system that is fully automated from sample injection to data collection. However, this system cannot provide information on protein identification, because the mass of a protein molecule alone is insufficient to specify a particular protein unless candidates are predicted (12). In the original version of the multidimensional technique, a protein was digested into peptides prior to chromatographic separation, and columns of conventional size were used (4.6 mm I.D. × 30 cm, and 4.6 mm I.D. × 15 cm), with ion-exchange and reversed-phase modes in the first and second chromatography steps, respectively (4). The peptides eluted from the second column were then detected with an ultraviolet (UV) detector. This technique has been refined for the proteomics approach by incorporating MS and low flow micro-LC or nano-LC technologies; for example, the conventional UV detector and conventional large size HPLC columns were replaced by MS with an ESI ion source and miniaturized capillary columns (6, 13, 14). Several configurations are possible for performing automated 2D-LC-MS (MS/MS). For instance, a system coupled with ESI-MS, in which a single capillary column is packed with two independent chromatography phases in series [i.e., a biphasic capillary column (50–150 μm I.D. × 300–1000 mm)], has been developed (Fig. 2-1A) (6). The other system is composed of two independent, conventional HPLC assemblies with an ion-exchange and a reversed-phase column (1–2 mm I.D. × 30–1000 mm) connected in tandem through a column-switching valve, and a mass spectrometer with ESI ion source


Fig. 2-1. (A) Schematic diagram of an automated multidimensional LC-MS/MS system. The system is composed of two independent LC assemblies equipped with an ion-exchange or a reversed-phase microcapillary column connected in tandem through a column-switching valve and a reversed-phase “trap” column-based solvent desalting unit, and a mass spectrometer with electrospray source. The large amount of MS and MS/MS data collected is then processed by a series of data processing and database-searching programs (e.g., Mascot or SEQUEST) for automatic assignment of peptides and proteins on a genome/protein sequence database. [From R. J. Simpson (Ed.), Proteins and Proteomics: A Laboratory Manual, Cold Spring Harbor Laboratory Press (2002).] In another system, two microcapillary columns are replaced by a single biphasic column packed with an ion-exchange and a reversed-phase packing material in series (inset). [Adapted by permission from Macmillan Publishers Ltd.; Link et al., Nat. Biotechnol. (Ref. 6) (1999). Reprinted from Issaq et al., J. Chromatogr. B. (Ref. 7) (2005), with permission from Elsevier.] (B) Time-dependent elution program for the operation of the automated multidimensional LC system. The program synchronizes ion-exchange and reversed-phase chromatography by controlling two LC assemblies (LC1 and LC2) and an electrical six-way column-switching valve. In this program, the elution of solutes from the ion-exchange column (operated by LC1) is stopped while the reversed-phase chromatography, to which the eluent from the ion-exchange column has been applied, is performed. After completion of the reversed-phase chromatography, the ion-exchange chromatography is started again. These separation cycles are repeated a number of times as set by the control program. [From R. J. Simpson (Ed.), Proteins and Proteomics: A Laboratory Manual, Cold Spring Harbor Laboratory Press (2002).]
(C) Hierarchy of peptide separation and MS-based structural analysis in multidimensional LC-MS/MS technology. The peptides in a complex biological mixture are first separated by ion-exchange chromatography according to their charge characteristics, and subsets of peptides eluted from the ion-exchange column are further separated by reversed-phase chromatography according to their hydrophobicity. The eluate is continuously sprayed into the mass spectrometer through an electrospray interface for “data-dependent” tandem MS, in which MS and MS/MS data of each peptide are automatically collected for subsequent structural analysis. (See insert for color representation of parts A and C.)
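The synchronized elution program described for panel B can be sketched as a simple event table. The Python example below is a minimal illustration, not the control software of any instrument: the function and class names, all durations, and the state of LC2 during sample transfer are assumptions made for the sketch, whereas the salt steps (0, 0.2, and 0.4 M NaCl) and the 20 to 60% B gradient follow the labels of the figure.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    start_min: float
    end_min: float
    lc1: str  # state of the ion-exchange pump (LC1)
    lc2: str  # state of the reversed-phase pump (LC2)


def build_program(
    salt_steps_molar: Sequence[float],
    load_min: float = 10.0,      # assumed duration of each salt step
    gradient_min: float = 60.0,  # assumed duration of each RP gradient
    reinit_min: float = 15.0,    # assumed RP re-initialization time
) -> List[Event]:
    """Alternate one ion-exchange salt step with one reversed-phase gradient."""
    events: List[Event] = []
    t = 0.0
    for salt in salt_steps_molar:
        # LC1 elutes a subset of peptides onto the reversed-phase column.
        events.append(Event(t, t + load_min,
                            f"step elution, {salt} M NaCl", "idle (assumed)"))
        t += load_min
        # LC1 is stopped while LC2 runs the reversed-phase gradient.
        events.append(Event(t, t + gradient_min,
                            "pump stopped", "gradient 20-60% B"))
        t += gradient_min
        # LC2 returns to starting conditions before the next salt step.
        events.append(Event(t, t + reinit_min,
                            "pump stopped", "initialization"))
        t += reinit_min
    return events


program = build_program([0.0, 0.2, 0.4])
for event in program:
    print(f"{event.start_min:6.1f}-{event.end_min:6.1f} min | "
          f"LC1: {event.lc1:<26} | LC2: {event.lc2}")
```

Each cycle is strictly sequential, so the total run time grows linearly with the number of salt steps; this is the practical price of the higher peak capacity.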


[Fig. 2-1 (Continued). Panel (B): elution program plotted against time (min). Labels: sample injection; ion-exchange elution on LC1 in steps of 0%B, 50%B, and 100%B (0 M, 0.2 M, and 0.4 M NaCl), with the LC1 pump stopped between steps; column-switching valve alternating between ion exchange and reversed phase; reversed-phase elution on LC2 with initialization followed by a gradient from 20%B to 60%B in each cycle. Panel (C): separation hierarchy with axes charge, hydrophobicity, mass, and structural information, corresponding to (1) ion-exchange chromatography, (2) reversed-phase chromatography, (3) mass spectrometry (MS), and (4) tandem MS.]

(Fig. 2-1A) (12). Integration of MS adds significant advantages to the multi-LC technique. Since MS is an excellent tool for separating biological samples very precisely according to their molecular mass, direct coupling of MS increases resolution of the multi-LC system (Fig. 2-1C). Combined with an automated MS/MS


technique, the system also provides structural information on samples simultaneously upon detection, as described in Section 1-2-3. The use of a miniaturized biphasic capillary column (or of two or more miniaturized capillary columns) drastically reduced the amount of peptide mixture loaded onto the column(s) and increased the sensitivity of the analysis. In addition, the lowered flow rate increased ionization efficiency, improved the detection of the ionized peptides, and thus increased the number of peptides identified subsequently by MS or MS/MS analysis. The automation of multi-LC has several advantages, not only over manual operation of the technique but also over 2D-PAGE; these include high resolution, reproducibility, ease of sample handling and recovery, and dynamic range for quantification. Most importantly, when combined with MS equipment, an LC system can be connected online to the MS analyzer through an ESI ion source and can supply peptides in the eluent continuously for ionization. Thus, the whole process of LC-MS/MS analysis of very complex mixtures can be performed in a completely automated manner (Fig. 2-1A) (13, 15). The power of the mass spectrometer to separate peptides and to analyze their structure almost simultaneously made it possible to couple multi-LC-MS/MS with a database search algorithm (such as SEQUEST and Mascot, see Section 1-2-3) to identify thousands of peptides in a very complex peptide mixture in a single analysis (Fig. 2-1A,C). The database search specifies the proteins that generated the peptides identified by their MS/MS spectra or by sequence tags obtained from MS and MS/MS spectra (see Section 1-2-3). The approach has been given various names, such as “direct analysis of large protein complexes (DALPC)” (6), “multidimensional protein identification technology (MudPIT)” (13), or “large scale identification technology with 2D-LC-MS/MS (LSIT)” (14).
In all approaches, a protein mixture is digested by protease without any further protein separation to generate a mixture of peptides and is directly analyzed by LC-MS/MS, and a database search program is employed to map the peptides onto proteins to determine the original content of the mixture (see Sections 1-2-3 and 1-3-3) (13, 16, 17). Because the approach is similar to that used for shotgun sequencing of genomic DNA, in which fragments of a genome are sequenced randomly and reassembled computationally to determine the genome of an organism, the approach is also called “shotgun” analysis. The term “shotgun” analysis was initially proposed as a general term for a process in which a protein complex was digested into peptides and analyzed via a single dimension of reversed-phase column in an ESI-MS/MS system (Fig. 2-2). Accordingly, the strategy for a large scale analysis, using either multi-LC-MS/MS or single LC-MS/MS technology after protease digestion of protein mixtures, is generally called shotgun proteomics and represents a gel-free approach (without using 1D- or 2D-PAGE) based on peptide separation and identification (16, 18). The advantage of shotgun analysis is that peptides possess greater solubility in a wider range of solvents and are therefore easier to separate than proteins. This fact is especially advantageous for the analysis of membrane proteins, which are typically insoluble in aqueous buffers. However, the approach needs to analyze a huge number of peptides in a single analysis, and often a large number of peptides in a sample remain unanalyzed by MS/MS. Simplification of the peptide mixture


Fig. 2-2. “Shotgun” analysis by 1D-LC- or multidimensional LC-MS/MS. In this analysis, numerous peptides produced by protease digestion of complex biological samples, such as crude cell extracts or protein complexes, are continuously sprayed, like a shotgun, from the LC spray-tip into the mass spectrometer, and the large volume of MS and MS/MS data generated is processed by a data retrieval system for automatic assignment of peptides and proteins on a genome/protein sequence database. (See insert for color representation.)
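The final step of a shotgun analysis, mapping identified peptides back onto proteins, can be illustrated with a small Python sketch. It is deliberately simplified and makes several assumptions: the protein names and sequences are toy data, the digestion rule is the common textbook approximation of trypsin (cleavage after K or R, but not before P, with no missed cleavages), and the peptide sequences are taken as already identified. Search programs such as SEQUEST or Mascot instead score measured MS/MS spectra against candidate peptides, which is not modeled here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def tryptic_digest(sequence: str, min_length: int = 5) -> List[str]:
    """In silico digest: cleave after K or R unless the next residue is P."""
    fragments = re.split(r"(?<=[KR])(?!P)", sequence)
    return [f for f in fragments if len(f) >= min_length]


def build_peptide_index(database: Dict[str, str],
                        min_length: int = 5) -> Dict[str, Set[str]]:
    """Map every theoretical peptide to the proteins that can produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, sequence in database.items():
        for peptide in tryptic_digest(sequence, min_length):
            index[peptide].add(name)
    return index


def assign_peptides(observed: Iterable[str],
                    index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Group observed peptides by the protein(s) they map to."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for peptide in observed:
        for protein in index.get(peptide, ()):
            hits[protein].add(peptide)
    return dict(hits)


# Toy sequence database (hypothetical entries, for illustration only).
database = {
    "protein_A": "MTAYIAKQISFVKSHFSR",
    "protein_B": "MGLSDGEWQLVLNVWGKVEADIPGHGQEVLIR",
}
# Peptides assumed to have been identified from MS/MS spectra.
observed = ["QISFVK", "SHFSR", "VEADIPGHGQEVLIR", "AAAAAK"]

hits = assign_peptides(observed, build_peptide_index(database))
for protein, peptides in sorted(hits.items()):
    print(protein, sorted(peptides))
```

The peptide AAAAAK matches no database entry and is simply left unassigned, which mirrors the fate of spectra that cannot be matched in a real search.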

prior to LC-MS/MS analysis, by adopting methodologies such as ICAT or those used in focused proteomics (see Section 1-3), therefore helps to increase the number of species that will be successfully identified by subsequent MS/MS analysis. The fewer coeluting peptides that are introduced into the spectrometer at any one time, the more successful and comprehensive the characterization of a proteome sample; thus, multidimensional LC separation of peptides prior to introduction into the MS equipment is preferred for the identification of proteins in a very complex protein mixture. A typical shotgun analysis using multi-LC-MS/MS generates from 10,000 to 100,000 MS/MS spectra in an analysis of extremely complex peptide mixtures, such as a tryptic digest of crude cellular extracts, from which thousands of peptides can be assigned on the gene/protein database (12). Currently, the shotgun approach is a major trend in proteomics and has produced a large number of reports on large scale analysis of biological samples (1, 18). This domination of shotgun analysis may be due to the limitation of mass spectrometers


currently available, which are more precise for small rather than for large molecules, because the accuracy of mass detection is an important factor for protein identification (see Section 1-2-3). Moreover, although many attempts have been made to separate and identify proteins based on mass measurements of proteins and their fragments subsequently dissociated by using highly accurate mass spectrometers such as FT-ICR MS (the method is called top–down approach) (19), the mass accuracy for intact proteins has a range, such that any modification resulting in only a small change to the total mass of the protein will be within the error of the method; that is, modified and nonmodified forms and/or amino acid substituted and nonsubstituted forms of the protein could not be distinguished. Therefore, it is very difficult to identify proteins just by measuring masses of proteins using commonly available mass spectrometers. Since mass accuracy is best over the low mass range, analysis of peptides rather than intact proteins allows for superior mass detection and, consequently, protein identification. This is why protein identification using MS requires proteolytic digestion of proteins either directly in gel or in the liquid phase before fractionation for MS analysis. Shotgun analysis takes advantage of the performance of both mass spectrometers and LC for peptide separation and of their automation for routine samples. Protein identification based on mass measurements of peptides derived from given proteins is known as the bottom–up approach (see Section 1-2-3). Therefore, shotgun proteomics using LC coupled with MS and/or MS/MS technology has some difficulty in characterizing the entire structure of each analyzed protein. To identify all differences between the predicted and actual protein structures requires Mr or MS/MS data from peptides representing the entire protein, which is difficult to obtain because of the low sequence coverage of shotgun analyses. 
To allocate a protein to the corresponding DNA sequence on a genome, on the other hand, it is sufficient to obtain reliable MS/MS data for just a few peptides. In fact, the current minimum requirement for protein identification set by leading journals is to identify two peptides per protein (19). Thus, shotgun analyses of complex peptide mixtures rarely find peptide mass differences thought to be due to RNA editing, alternative splicing, signal peptide cleavage, and post-translational modifications unless a strategy for concentrating specific proteins or PTMs is used (see Section 1-3). Nonetheless, the shotgun approach based on LC-MS/MS or multi-LC-MS/MS has the best performance in terms of large scale identification and quantification of proteins among currently available proteomics technologies (see Sections 1-3-1 to 1-3-5 and 2-1-1 to 2-1-4). It is an unbiased approach, since large or small proteins, proteins of high or low abundance, and proteins with extreme pI are identified with equal sensitivity (1). In addition, the approach is versatile: it can be used to identify proteins in samples from a wide variety of sources, such as tissue and cell extracts, organelles, and subcellular structures including membranes, cellular machineries, protein complexes, and those collected by activity-based protein profiling methods (see Sections 1-3-2 to 1-3-4) (20–25), and it can also be used to identify post-translational modifications (Section 1-3-1) and to compare protein expression quantitatively (see Sections 2-1-1 to 2-1-4).
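The mass-accuracy argument made earlier in this section, that a small modification is lost within the measurement error of an intact protein but not of a peptide, can be made concrete with a short calculation. The Python sketch below rests on assumptions that are not taken from the text: the error is modeled as a fixed relative accuracy in parts per million (20 ppm is a hypothetical figure), the example masses are arbitrary, and the shift of about 0.984 Da corresponds to deamidation. Isotope distributions and charge-state deconvolution are ignored.

```python
def absolute_mass_error(mass_da: float, accuracy_ppm: float) -> float:
    """Absolute mass uncertainty (Da) for a given relative accuracy (ppm)."""
    return mass_da * accuracy_ppm / 1e6


def shift_is_resolvable(mass_da: float, shift_da: float,
                        accuracy_ppm: float) -> bool:
    """True if a mass shift exceeds the absolute error at this mass."""
    return abs(shift_da) > absolute_mass_error(mass_da, accuracy_ppm)


ACCURACY_PPM = 20.0          # hypothetical instrument accuracy
DEAMIDATION_SHIFT_DA = 0.984  # approximate mass change on deamidation

for label, mass in [("tryptic peptide", 1_500.0), ("intact protein", 60_000.0)]:
    error = absolute_mass_error(mass, ACCURACY_PPM)
    verdict = shift_is_resolvable(mass, DEAMIDATION_SHIFT_DA, ACCURACY_PPM)
    print(f"{label}: mass {mass:.0f} Da, error {error:.2f} Da, "
          f"shift resolvable: {verdict}")
```

Under these assumptions the same relative accuracy gives an absolute error of 0.03 Da for the peptide but 1.2 Da for the protein, so the modified and unmodified protein forms could not be told apart by mass alone.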


To make use of the full potential of the shotgun proteomics approach, therefore, it is essential to develop an efficient algorithm that can be automated and allow systematic processing, analyses, validation, and presentation of mass spectral data, as described in Sections 1-2-3 and 2-1-1 to 2-1-4. A number of efforts on how to compare datasets from different MS platforms, given the unique features of each, have also been made, which resulted in the introduction of a series of data analysis platforms, such as mzXML and the Global Proteome Machine; these platforms facilitate data management, interpretation, and dissemination in proteomics research using different instrumentation (18, 25–27). In addition, a number of informatics approaches for the quantitative analysis of complex MS and/or MS/MS datasets generated by multi-LC-MS/MS have been developed for shotgun proteomics (see Sections 2-1-1 to 2-1-4). In the following sections, we describe in more detail the LC system for peptide separation that will allow us to perform successful shotgun analyses of not only moderately complex but also very complex protein mixtures.

2-1-1

LC-MS System for Peptide Separation and Identification

Multidimensional separation of complex peptide mixtures involves the use of two or more separation procedures that have to be orthogonal and compatible with each other (7). Four different mechanisms for separating complex peptide mixtures are available: size exclusion (Stokes’s radius), reversed-phase interaction (hydrophobicity), ion exchange (charge), and biospecificity (specific biomolecular interactions). They can be used in any combination for multidimensional separation of peptides (Table 2-1) (3, 18). However, the combination of an ion-exchange column (mainly a strong cation-exchange column, SCX) for the first dimension and a reversed-phase column for the second dimension is the preferred choice for shotgun analysis today. This is partly because the first-dimension separation has to be complemented by reversed-phase (RP) chromatography in the second dimension, and partly because the ion-exchange column has high resolving power and a peak capacity comparable to that of the reversed-phase column. The samples eluted from the RP column are in the most desirable form for injection into the mass spectrometer; that is, the elution solvents are entirely volatile and suppress the ionization of peptides less. Most RP separations are carried out using acetonitrile gradients; selectivity can be improved by adding an ion pairing agent such as formic acid, acetic acid, or heptafluorobutyric acid (HFBA) instead of trifluoroacetic acid (TFA), which was most commonly used for conventional HPLC separation of peptides but suppresses ESI ionization to an unacceptable degree unless it is used at very low concentration. Ion-exchange chromatography for the first-dimension separation is conducted using salts such as sodium chloride, potassium chloride, ammonium acetate, or ammonium formate at different concentrations in a gradient or stepwise format (3, 7).
The selected salt, its concentration, and the buffer composition used in the first dimension should not deleteriously affect the second dimension of separation in terms of the resolution and the


TABLE 2-1. Various Methods Used to Fractionate^a Peptides Based on Their Physical or Chemical Properties

Fractionation Method                        Physical/Chemical Property
Ultracentrifugation                         Density
Size exclusion chromatography               Stokes’ radius
Isoelectric focusing                        Isoelectric point
Hydrophobic interaction chromatography      Hydrophobicity
Reversed-phase chromatography               Hydrophobicity
Ion-exchange chromatography                 Charge
Affinity chromatography                     Specific biomolecular interaction
Capillary electrophoresis                   Size/charge

Source: Reprinted from (Ref. 7). Copyright 2005 with permission from Elsevier.
^a Size exclusion (SE): peptides or proteins are separated based on their size in solution using noninteractive stationary phases with uniformly sized pores. Anion exchange (AE): peptides or proteins are separated as negatively charged species at high pH interacting with positively charged stationary phases, commonly with amino or ammonium groups, and competing with negatively charged counter ions, such as Cl− or HCOO−. Strong cation exchange (SCX): peptides or proteins are separated as positively charged species at low pH interacting with negatively charged stationary phases, commonly with phospho or sulfo groups, and competing with positively charged counter ions, such as Na+, K+, or NH4+. Reversed-phase (RP): peptides or proteins are separated based on hydrophobicity and their interactions with C4, C8, or C18 alkyl chains, for example, on the surface of a stationary phase, and eluted with a low to high organic solvent gradient. (From Ref. 18.)
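Table 2-1 can also be read as a lookup from fractionation method to the property it exploits, which gives a crude first screen for orthogonal combinations. The Python sketch below is only an illustration under a strong simplifying assumption: two methods are treated as candidate orthogonal partners whenever their listed properties differ as text. It does not test the second requirement stated in the text, compatibility between the dimensions, and it treats "size/charge" as distinct from "charge" even though the two overlap in practice. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Contents of Table 2-1: method -> physical/chemical property exploited.
FRACTIONATION_PROPERTY: Dict[str, str] = {
    "Ultracentrifugation": "density",
    "Size exclusion chromatography": "Stokes' radius",
    "Isoelectric focusing": "isoelectric point",
    "Hydrophobic interaction chromatography": "hydrophobicity",
    "Reversed-phase chromatography": "hydrophobicity",
    "Ion-exchange chromatography": "charge",
    "Affinity chromatography": "specific biomolecular interaction",
    "Capillary electrophoresis": "size/charge",
}


def candidate_orthogonal_pairs(table: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return method pairs whose listed properties differ (a crude screen)."""
    return [(a, b) for a, b in combinations(table, 2) if table[a] != table[b]]


pairs = candidate_orthogonal_pairs(FRACTIONATION_PROPERTY)
print(len(pairs), "candidate pairs")
print(("Reversed-phase chromatography", "Ion-exchange chromatography") in pairs)
```

Only the pairing of hydrophobic interaction with reversed-phase chromatography is rejected by this screen, because both rely on hydrophobicity.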

following MS analysis. Because of this, volatile ammonium salts are often used to replace mineral salts (KCl and NaCl) (1, 28), or a desalting step is introduced between the first ion-exchange and RP chromatography if mineral salts are used in the elution buffer (14).

LC-MS Interface

ESI Apparatus

The most commonly used interface between an LC system and the MS analyzer is an ESI device, in which a liquid flowing from a capillary in the presence of a high electric field causes charge separation during formation of a plume of droplets (Fig. 2-3A). ESI requires the delivery of substrates for ionization in the flow of a volatile and very pure solvent to ensure formation of true ions, not charged clusters or particles. The smaller the size of the droplet, the more rapid is the formation of ions at the higher charge density (see later discussion). Ideally, for 100% efficient ionization, the flow and concentration of the sample should be in the range that provides less than one analyte molecule per droplet (7). There are two types of ESI device used as the interface between the LC system and the MS equipment. One type uses a spray needle that is separate from the separation capillary column. The other type uses an all-in-one column (or integrated column), which combines the spray needle and the separation column in a single capillary tube (30, 31). In other words, this type of ESI device uses the capillary column itself as the ESI spray needle.
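The criterion of less than one analyte molecule per droplet mentioned above can be estimated from the analyte concentration and the droplet size. The following Python sketch assumes spherical droplets of uniform diameter and a homogeneous solution; the concentration and the two droplet diameters are hypothetical values chosen only to show the scaling, not measurements from the text.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_droplet(concentration_molar: float,
                          droplet_diameter_m: float) -> float:
    """Mean number of analyte molecules in one spherical droplet."""
    volume_m3 = math.pi / 6.0 * droplet_diameter_m ** 3
    volume_l = volume_m3 * 1000.0  # 1 cubic meter = 1000 liters
    return concentration_molar * AVOGADRO * volume_l


CONCENTRATION_M = 1e-6  # hypothetical 1 micromolar peptide solution

for diameter_m in (1e-6, 1e-7):  # hypothetical 1 um and 100 nm droplets
    n = molecules_per_droplet(CONCENTRATION_M, diameter_m)
    print(f"droplet diameter {diameter_m * 1e9:.0f} nm: "
          f"about {n:.2f} molecules per droplet")
```

Because the droplet volume scales with the cube of the diameter, a tenfold smaller droplet holds a thousandfold fewer molecules, which is one way to see why small droplets favor efficient ionization.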


Fig. 2-3. Devices for electrospray ionization interface: (A) Spray needle tip with a high voltage connecter on a positioning stage of the mass spectrometer. (B) Stainless steel union that connects the capillary tubing and the all-in-one column with a high voltage connecter. (C) Epoxy-free three-way union that connects the capillary tubing and the all-in-one column with a gold wire electrode. [Reprinted by permission of Elsevier from Ref. 38. Copyright (1998) by the American Society for Mass Spectrometry.] (D) Spray needle tip with Taylor cone.

The ESI device in which the spray needle is separate from the capillary column employs a sharply pointed hollow metal tube, such as a syringe needle [an injection syringe needle for infants (50 μm I.D., 125 μm O.D., GL Science, Tokyo, Japan); PicoTips® (CE, standard coating; N, no coating; D, distal coating; New Objective, Woburn, MA, USA)], with liquid pumped through the tube. The needle can be connected to a capillary column of relatively large size (e.g., 1–2 mm I.D. × 30–1000 mm), with flow rates of 0.5–100 μL/min. The spray needle joins the capillary column through a union [Teflon union (Eksigent, Dublin, CA, USA), stainless steel union (Valco Instruments, http://www.vici.com/ref/contents.php)] with minimized dead space at the connection site. In this type of device, a high voltage power supply is connected directly to the spray needle, because the needle is made of stainless steel or is coated with an electrically conductive material such as gold, graphite, or polyaniline. On the other hand, when the ESI spray uses an all-in-one (or integrated) capillary column, the voltage has to be applied to the eluent of the column because fused silica capillary tubing is not electrically conductive. Connecting the union (e.g., Valco, Louisville, KY, USA) or cross [Upchurch Scientific, Oak Harbor, WA, USA (http://www.upchurch.com/)]


that holds an all-in-one capillary ESI column to a high voltage power supply can do this, but contact between the elution solvent and the metal within the union or cross has to be confirmed (Fig. 2-3B,C). A high voltage power supply is connected to the outlet of the ESI device (a spray needle or an all-in-one column), which is positioned in front of a plate, called a counterelectrode, commonly held at ground potential (Fig. 2-3A). When the power supply is adjusted to the proper voltage, the liquid being pumped through the spray needle transforms into a fine continuous mist of droplets at the tip of the emitter, which fly rapidly toward the counterelectrode. In general, as the liquid begins to exit the needle, it charges up and forms a conical shape, referred to as the Taylor cone, because when charged up, a conical shape can hold more charge than a sphere (Fig. 2-3D). At the tip of the cone, the liquid again changes shape into a fine jet, which then becomes unstable and breaks up into a mist of fine droplets. Since these droplets all carry a high charge of the same sign, they repel each other very strongly. Thus, the droplets fly apart from each other and cover a wide surface area (from New Objective; http://www.newobjective.com/electrospray/index.html). Since the shape and size of the outlet of the spray needle seem to affect the efficiency of ionization, the tip of the spray needle or all-in-one ESI column (0.5–50 μm I.D.) is usually electropolished (32) or polished gently with alumina paper to obtain a fine tip (15, 33). Stainless steel spray needles connected to capillary columns have much longer lifetimes than fused silica needles coated with gold, graphite, or polyaniline, which are also connected to capillary columns. However, all-in-one ESI columns seem to have good durability when used as spray needles, even when a high voltage is applied to the column.
Micro- and Nanocapillary Columns

Several prepacked capillary columns, which are packed with reversed-phase particles and are used in combination with an ESI spray needle, are commercially available [capillary EX columns (GL Science, Nomura Chemicals); STYROS columns (OraChrom, Woburn, MA, USA)]. There is considerable dead space in the ESI spray needle, and possibly in the connection between the spray needle and the capillary column, which can cause peak broadening after elution of peptides from the capillary column. A separate ESI spray needle interface therefore tends to be used for an LC system with relatively large columns (e.g., 1–2 mm I.D. × 30–1000 mm; often called microcapillary columns), and elution takes place at relatively high flow rates (0.5–100 μL/min). A considerable number of multi-LC-MS/MS systems use this type of ESI spray needle–LC column interface. When an all-in-one ESI column (or integrated column), which combines a spray needle and separation column in a capillary tube, is used as the ESI device (30, 31), peptides can be sprayed directly into the mass spectrometer soon after elution from the all-in-one column without peak broadening, because this type of ESI column has almost no postcolumn dead space. Thus, this ESI column gives better performance for LC-MS/MS analysis when only small amounts of sample are available. Although all-in-one ESI columns are commercially available [e.g., PicoFrit columns (New Objective)], many laboratories make their own columns by using fused silica capillary tubing (Polymicro Technology, Phoenix, AZ, USA; GL Science, Tokyo,


Japan) and commercial laser-based micropipette pullers (P-2000, Sutter Instrument, Novato, CA, USA), or just a gas burner. To achieve efficient ionization with all-in-one ESI columns, the size of the columns is miniaturized; typical columns have dimensions of 50–150 μm I.D. × 300–1000 mm length. The capillary columns are packed into fused silica capillary tubing with outlet diameters as small as 0.5–10 μm and provide routine detection levels in the low femtomole range for peptides. These columns are often called nanocapillary columns. Accordingly, elution with a flow rate as low as 20–500 nL/min is required to achieve maximum performance with LC-MS/MS analysis. Typical nanocapillary columns are packed with 3–5 μm diameter reversed-phase materials, most commonly C18 silica-based matrices (and SCX for a biphasic column) as described later, although much smaller particles, down to 1 μm, have been used in some cases (6, 15). To prevent leakage of packing material, sintered silica particles or a membrane supported by a short piece of fused silica capillary (25 μm I.D. × 150 μm O.D. for a 150 μm I.D. × 350 μm O.D. column) is used as a frit of an all-in-one column (or integrated column) (31). However, preparation of columns with this type of frit is difficult and the column can easily become clogged. To overcome this problem, two methods to retain packing materials in the capillary columns without frits have been developed. The first method uses a single porous particle to retain the packing particles in the capillary at the outlet of the tapered end, whose diameter is less than that of a single particle (30). However, the particle easily clogged the column and blocked the elution (31, 34).
The second method, on the other hand, uses an outlet diameter of the tapered end that is larger than that of the single particle packed in the capillary tube, and forms an “arch” of particles in a self-assembled manner above the outlet at low pressure at the beginning of the packing process (Fig. 2-4A–C) (34). The “arch” has two styles: the “playing hands style arch” and the “keystone style arch”, which serve as a stable structure to retain the particles in the capillary without clogging the column. The playing hands style arch can be prepared by using, for example, ReproSil 3 μm C18 particles and a capillary needle with 8 μm outlet I.D. (New Objective) (Fig. 2-4D, Protocol 2-1A) (35), while the keystone style arch can be prepared by using Mightysil C18 3 μm (Kanto Chemicals) and 150 μm I.D. × 375 μm O.D. fused silica capillary tubing (Polymicro Technology, Phoenix, AZ, USA; GL Science, Tokyo, Japan), which is pulled using a laser puller (Sutter Instruments Co., Novato, CA) to obtain an outlet tip size of 8–10 μm I.D. (Fig. 2-4C,E, Protocol 2-1B) (36). These methods for preparing all-in-one ESI columns with “arch” frits for LC-MS allow one to minimize postcolumn dead space, decrease peak width, and provide highly efficient peptide analysis for proteomics. When the internal diameter of the capillary is further reduced from 150 to 15 μm and, consequently, the flow rate is decreased to 20–40 nL/min, the efficiency of ionization increases ∼100-fold (37). More specifically, a long column (up to 80 cm) with a 15 μm internal diameter, equipped with a short precolumn of larger diameter, provides a flow rate of 20 nL/min at 10,000 psi and at the same time enables detection of zeptomole amounts of protein (38). However, with flow rates in this range, it is still a challenge to maintain high efficiency separation, to process typical sample volumes, and to couple the column effectively to an MS analyzer.
The development of highly efficient separation technology with extremely miniaturized LC columns


Fig. 2-4. Separation columns for LC-MS shotgun analysis: (A) a conventional column with a frit to support packing materials, (B) columns without a frit, and (C) a microscopic view of the outlet of the tapered end of an all-in-one nanospray tip column packed with 3 μm reversed-phase particles (provided by Dr. Y. Yamauchi). (D) A device for the preparation of an all-in-one column, which allows slurry packing of separation media into fused silica capillary tubing under high pressure (e.g., Helium Pressure Cell, Thermo Electron Co., Houston, TX, USA) and (E) alternative devices for preparation of an all-in-one column.


and of technology for handling extremely small volumes may allow one to analyze proteomes of even a single cell in the near future (2, 39).

Protocol 2-1: Method of Column Packing for an All-in-One (Integrated) Capillary Column

(A) With a High Pressure Gas Device (Fig. 2-4D) (35)
• Add 50 mg of C18 particles (ReproSil C18 3 μm) and 0.5 mL of methanol to a 1.5 mL tube, and sonicate with a small stirring bar.
• Dip the end opposite the tapered end of a capillary column tube (75 μm I.D. × ∼100 mm, fused silica tubing) into the sonicated C18–methanol slurry, and introduce the slurry into the capillary column tube by capillary action over a few seconds.
• Fill with methanol to the outlet of the tapered end of the capillary column tube by using a disposable syringe filled with methanol that is connected with a capillary adaptor (Innove Quartz capillary adaptor, Phoenix, AZ, USA).
• Connect the slurry and the capillary column tube filled with methanol onto a pressure gas device so as to keep contact with the slurry (e.g., Helium Pressure Cell, Thermo Electron Co., Houston, TX, USA; this device can be used for packing 50, 75, 100, and up to 200 μm I.D. capillary columns, http://www.brechbuehler.ch/usa/) as shown in Fig. 2-4D.
• Connect the device to a helium gas container, increase the pressure gradually by 50 bar, and maintain the pressure until the capillary column tube is filled with the C18 particles. If necessary, vibrate the capillary column tube to prevent the particles from clogging in midflow.
• Release the pressure gradually.
• Connect the packed capillary column to the LC-pump system and pressurize by about 200 bar by eluting 80% acetonitrile and subsequently a 2% acetonitrile–water solution.

(B) With an Injection Syringe and Stainless Steel Tube Connector (Fig. 2-4E) (36, 40)
• Mix one volume of C18 particles (Mightysil C18 3 μm) and about 5 volumes of chloroform:hexanol = 1:1 solution and sonicate to get a homogeneous slurry.
• Draw up ∼0.5 mL of the C18 particle slurry into an injection syringe.
• Connect the injection syringe with one side of a stainless steel tube attached with unions at both ends, and fill the C18 particle slurry into the stainless steel tube by pushing the injection syringe.
• Connect the other side of the stainless steel tube with the tapered end of a capillary column tube (150 μm I.D. × ∼50 mm) through a union.
• Pressurize the injection syringe by hand to remove air from the capillary column tube.


• Connect the stainless steel tube, filled with slurry and attached to the capillary column tube, to the LC-pump system, and pressurize the capillary column by pumping methanol. Set the maximum pressure of the LC-pump to 25 MPa.
• Cut the tip of the capillary column tube with alumina paper to elute the methanol.

Packing Materials

The particle size, pore size, surface area, stationary phase, and chemistry of the substrate surface of the packing material determine the separation efficiency. The most popular materials for RP and ion-exchange HPLC column packing used in proteomics are based on spherical silica. Typically, particle sizes of 3 and 5 μm are used for analytical separation of proteins or peptides. The smaller the particle size, the higher the separation efficiency; for instance, separation efficiency increases by 30–40% when particle size is reduced from 5 to 3 μm for the same column length (2). Particle sizes much smaller than 3 μm elevate the backpressure beyond the pressure limit of most commercially available pump systems for HPLC. Therefore, 3 μm is currently the minimum particle size for practical use. Another factor that affects separation efficiency is accessibility of peptides to the surface of the packing material; this can be improved by expanding the porosity of the particles. Commercially available particles have pore sizes varying from 60 to 300 Å (6–30 nm), which places over 90% of the available surface within the pores. In practice, porous packing materials work well for small proteins and peptides obtained by proteolytic digestion of proteins. High molecular weight proteins or extensively modified proteins (e.g., glycosylated proteins) have fairly large radii that cause peak broadening when conventional porous particles are used. To minimize this, an array of macroporous, gigaporous, and gel-filled gigaporous packing materials has been introduced (2).
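The trade-off between particle size, separation efficiency, and backpressure described above can be illustrated with simple scaling rules. The sketch below is not taken from Ref. 2; it assumes the common textbook approximations that, at fixed column length and linear velocity, the plate number is inversely proportional to particle diameter, resolution scales with the square root of the plate number, and backpressure is inversely proportional to the square of the particle diameter. The function names are hypothetical.

```python
import math


def relative_resolution(dp_old_um: float, dp_new_um: float) -> float:
    """Resolution ratio (new/old) at fixed column length.

    Assumes plate number N ~ 1/dp and resolution Rs ~ sqrt(N).
    """
    return math.sqrt(dp_old_um / dp_new_um)


def relative_backpressure(dp_old_um: float, dp_new_um: float) -> float:
    """Backpressure ratio (new/old) at fixed length and linear velocity.

    Assumes pressure drop ~ 1/dp**2 (packed-bed permeability scaling).
    """
    return (dp_old_um / dp_new_um) ** 2


if __name__ == "__main__":
    for old, new in ((5.0, 3.0), (3.0, 1.0)):
        gain = (relative_resolution(old, new) - 1.0) * 100.0
        press = relative_backpressure(old, new)
        print(f"{old:g} -> {new:g} um: resolution +{gain:.0f}%, backpressure x{press:.1f}")
```

Under these assumptions, going from 5 to 3 μm particles gives roughly a 30% gain in resolution for about a 2.8-fold rise in backpressure, whereas going from 3 to 1 μm would raise the backpressure about ninefold, which is consistent with the practical limit noted in the text.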
The surface of the silica-based particles packed in RP columns is modified with alkyl chains varying in length from C4 to C18 so as to have hydrophobic interaction with proteins and peptides. The C18 bound particles [e.g., Zorbax SB-C18 (Agilent Technology, USA), Vydac C18 (Grace Vydac, USA), Inertsil-ODS 3 (Varian, Palo Alto, CA, USA), ReproSil AQ-Pur (Dr. Maisch HPLC GmbH, Ammerbuch-Entringen, Germany), Mightysil C18 (Kanto Chemical, Japan), Aqua C18 (Phenomenex, Torrance, CA, USA)] are most popular for the separation of peptides, as they offer retention and selectivity for a wide range of peptides containing different polar and nonpolar groups on their surfaces. The C4 and C8 bound particles are used preferentially for separation of proteins. In ion-exchange columns, the surfaces of the silica-based or polymer layered silica-based particles commonly used for HPLC separation of proteins and peptides are modified with a diethylaminoethyl (DEAE) group as the weak anion exchanger, with carboxymethyl (CM) as the weak cation exchanger, or with a sulfonic group as the strong cation exchanger (SCX) to achieve electrostatic interaction with the proteins and peptides. Weak anion or cation exchanger-bound particles [SynChropack AX300 (Synchrom, Indianapolis, IN, USA), Bio Rex (Bio-Rad, San Francisco, CA, USA), IEX CM (Toyo Soda Co., Yamaguchi, Japan)] are used mostly for the separation of proteins. Strong cation exchanger-bound particles [PolySULFOETHYL Aspartamide (PolyLC, Columbia, MD, USA), Partisphere (SCX) resin (Whatman, Clifton, NJ, USA), ProPac™ SCX-10 (DIONEX, Sunnyvale, CA, USA), Bioassist-Q (TOSOH, Tokyo, Japan)] are used mostly for the separation of peptides.


In proteomic LC-MS/MS, high speed analysis is a prerequisite for identifying thousands of peptides within a limited time period; the ion-exchange LC and RPLC separations are among the most time-consuming steps in LC-MS/MS analysis. Therefore, high speed LC packing materials are required. In porous materials, separation speed is limited by the rate of analyte transfer between the mobile phase and the stationary liquid in the pores of the matrix particles. Silica monolith columns have macropores with 2 μm diameters (which allow fast flow of the eluent) and fine pores of 13 nm (130 Å) (which provide the surface area required for the separation process), with a total porosity of over 80%, and allow flow rates ten times higher than conventional rates to enhance mass transfer (2, 41, 42). The introduction of monolith columns to LC-MS/MS analysis may further improve the efficiency and throughput of protein identification in proteomics in the near future.

1D-LC System (Application to the Shotgun Analysis of Moderately Complex Protein Mixtures)

A solvent delivery system for a capillary column for gradient RP-LC needs to perform at a slow flow rate (e.g., several dozen nL/min to 100 μL/min). One common approach is to split the flow to deliver solvents to the capillary column at a reduced rate (Fig. 2-5A). [The split type LC system is available from Michrom Bioresources (Auburn, CA, USA; http://www.michrom.com/catalog/index.php), DIONEX (Sunnyvale, CA, USA; http://www1.dionex.com/en-us/index.html), or Agilent Technologies, Inc. (Palo Alto, CA, USA; http://www.chem.agilent.com/Scripts/Phome.asp).]
An advantage of this split flow system is that it performs a gradient separation by using conventional LC systems with a capillary column having a flow rate less than 100 μL/min and also by using conventional LC instruments with a tee tube (Valco) or a commercially available splitter that can change the split ratio of the two flows (splitter valve M405S or M472, Upchurch, Oak Harbor, WA, USA; http://www.upchurch.com/) (31, 43). The split ratio is usually set to no less than 1:1000 to obtain reproducible results. The split flow system can easily be assembled with a conventional microflow LC system. For example, to elute a capillary column with a backpressure of 70 bar at a flow rate of 200 nL/min by using an LC system with a flow rate of 100 μL/min, the length of the capillary tubing for the split flow, which has a backpressure of 9 bar per 10 cm (50 μm I.D.) at 100 μL/min, can be set to 70/9 × 10 = 78 cm. Changing the length of the capillary tubing for split flow allows a fine adjustment of the flow rate for elution of the column (35). The split flow system is, however, susceptible to pressure changes during chromatography, and it is therefore difficult to maintain the constant flow needed to assure a reproducible analysis, particularly at a nanoscale flow rate, although the currently developed split-type LC system (e.g., Agilent 1100 system) uses a feedback control device to maintain a constant split ratio during analysis and gives a relatively stable flow rate at the sub-nanoliter level. Another approach for performing gradient RP-LC is to use a direct delivery pump to deliver solvents to a capillary column at several dozen nL/min (Fig. 2-5B). The main problem with this approach is that it is difficult to achieve a reproducible gradient elution at such a low flow rate; that is, it is very difficult to reproduce a solvent gradient in a volume as small as a few microliters or less. A “preformed gradient” device, in which the solvent gradient formed by a separate solvent delivery system is


stored in a loop and then transferred to an analytical line by a switching valve, has been developed (44). However, it requires solvents to be filled manually before every analytical run, and the gradient profile is susceptible to time-dependent changes due to diffusion during storage in the loop. In addition, it can hardly be applied to a low nanoliter gradient device because of the dead space of the loop instrumentation. These problems have been solved by the development of a fully automated gradient device, called the “revolving nanoconnection” (ReNCon system, Nanosolution Inc., Wakayama, Japan). The device can form almost any gradient in volumes as low as a few microliters just before starting every gradient elution. The ReNCon system, in conjunction with a direct splitless nanoflow delivery pump (KYA Technologies, Tokyo, Japan; the whole LC system is called a direct nano LC (DNLC) system), performs gradient elution of all-in-one ESI capillary columns at a very low flow rate, even as low as 40 nL/min, in a pressure-resistant manner (Fig. 2-5B) (15).

Fig. 2-5. Nanoflow LC system to operate all-in-one spray tip column (A) with and (B) without a splitter. In a split flow system, the elution solvent is divided into a certain ratio (e.g., 5:95) by a splitter to reduce the flow rate and a small portion of the solvent is delivered to the spray tip column. In a direct flow system, a nanoliter volume of solvent pumped out by a “nanoflow” pump is delivered directly to the spray tip column. The direct flow system has advantages over the split flow system in stability and reproducibility; however, a high precision nanoflow pump and a small volume gradient forming device compatible with the nanoliter flow rate is required in a direct flow system, while conventional pump and gradient devices can be used in a split flow system.
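The restrictor-length arithmetic quoted for the split flow system (a column backpressure of 70 bar balanced by split tubing with 9 bar per 10 cm) can be sketched as follows. This is a minimal illustration that assumes the backpressure of the split tubing rises linearly with its length and that nearly all of the pump flow passes through the split line; the function names are hypothetical.

```python
def restrictor_length_cm(column_backpressure_bar: float,
                         restrictor_bar_per_cm: float) -> float:
    """Length of split tubing whose backpressure equals that of the column.

    Assumes backpressure is proportional to tubing length at the pump flow rate.
    """
    return column_backpressure_bar / restrictor_bar_per_cm


def split_fraction(column_flow_nl_min: float, pump_flow_ul_min: float) -> float:
    """Fraction of the pump flow that is delivered to the column."""
    return column_flow_nl_min / (pump_flow_ul_min * 1000.0)


if __name__ == "__main__":
    # Values from the worked example in the text: 70 bar column,
    # 9 bar per 10 cm of 50 um I.D. tubing, 200 nL/min from 100 uL/min.
    length = restrictor_length_cm(70.0, 9.0 / 10.0)
    fraction = split_fraction(200.0, 100.0)
    print(f"split tubing length: {length:.0f} cm")
    print(f"split ratio: 1:{1.0 / fraction:.0f}")
```

With the numbers from the text, the sketch returns about 78 cm of split tubing and a split ratio of 1:500.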


The ReNCon device consists of ten-channel solvent reservoirs connected by means of a ten-port electrical switching valve and a manifold. Each reservoir is filled with step elution solvent for RP-LC supplied from two separate reservoirs for initial and final solvents using an automated high flow-rate mixing module (Fig. 2-6A). The ten-channel solvent reservoirs are made of PEEK tubings (8 cm length, 1.6 mm O.D., 254 μm I.D.; inside volume, 4.0 μL) connected between a ten-port manifold (Z10M1, Valco, Texas) and a ten position-switching valve (C5-1000, Valco) with finger-tight fittings. All these modules are assembled through a two-way six-position switching valve (Fig. 2-6A) that serves to switch the flow path of the system—one for solvent feed to the ReNCon device and another for analysis (Fig. 2-6B) (15, 33). In the feed

[Fig. 2-6A, panel labels: 10-position valve; 10-port manifold; gradient channel (ReNCon); 2-position 6-port valve (ports 1–6); high pressure feeding pumps A and B; nanoflow pump; injection port; sample loop; waste; ESI column; MS; connecting capillaries of 30 μm I.D. × 250 mm, 50 μm I.D. × 150 mm, and 30 μm I.D. × 400 mm.]

Fig. 2-6. (A) Assembly of the direct nanoflow LC system. The system consists of a high pressure syringe pump for constant nanoflow solvent delivery, a gradient device designated as a revolving nanoconnection (ReNCon) system consisting of ten channel solvent reservoirs for step elution solvents to carry out reversed-phase LC, a high pressure pump mixing module for solvent feed to the gradient device, and a fritless spray tip column packed with 3 μm reversed-phase beads. These modules are assembled around the two-position six-port valve, which switches the flow path of the system from the solvent feed to the analysis position or vice versa. (B) Flow diagram of the direct nanoflow LC system. The flow path of the system is switched by the two-position six-port valve shown in (A) from the solvent feed (upper) to the analysis (lower) position or vice versa. (C) General scheme of the direct nanoflow LC system. The solvent reservoirs of the ReNCon gradient device are connected between a ten-port electrical switching valve and a manifold. By sequentially rotating the electrical valves for programmed time duration, each solvent is transferred step-by-step to the fused-silica capillary, where a linear gradient is generated by diffusion of solvent boundaries during transfer to the fritless spray tip column. [A, B, C: Reprinted with permission from Natsume et al. (Ref. 15). Copyright (2002) American Chemical Society.] (D) Diagram of a 1/16 inch LC end fitting as a packing bed support for a glass capillary column, with pictures of ferrules, unions, and a capillary tube cutter.


position, programmed step gradients for RP-LC are generated by the high pressure pump module and are supplied to the reservoirs of the ReNCon device via an inlet on the ten-port manifold (Fig. 2-6B). Thus, each reservoir is filled with a different RP solvent, with increasing acetonitrile concentrations, by rotating the ten position-switching valve automatically at programmed time intervals. After the filling process, the ReNCon device is connected to the nanoflow pump by switching the six-port valve to the analysis position. By rotating the ten-position electrical valve of the gradient device sequentially for an appropriate time duration, each solvent in the ten channel reservoirs is transferred step-by-step to the fused silica capillary connected to the all-in-one ESI column, where a linear gradient is generated by diffusion of solvent boundaries during transfer to the column (Fig. 2-6C). This gradient device can produce almost linear profiles of a solvent gradient over a wide range of flow rates, in particular, at extremely low flow rates (e.g., 40 nL/min during an hour) (Fig. 2-6C). The ReNCon device is integrated into a DNLC system with a single pressure-driven nanoflow pump (Figs. 2-5B and 2-6A) (Nanosolution Inc., Wakayama, Japan; KYA Technologies, Tokyo, Japan). The whole DNLC system therefore consists of two high pressure LC pump modules [feeding pumps A and B in Fig. 2-6A, Shimadzu (Kyoto, Japan); HP1100 G1312A, Hewlett Packard (USA)] to generate and to feed stepwise gradients to the ReNCon device, an injection valve (Cheminert C2-0006, Valco, TX) for sample loading, and a constant flow nanoflow pump with a pressure limit of ∼300 bar (Fig. 2-6A) (Nanosolution, Wakayama, Japan; KYA Technologies, Tokyo, Japan).
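The reservoir dimensions quoted for the ReNCon device (PEEK tubing of 8 cm length and 254 μm I.D., inside volume 4.0 μL) can be checked with a cylinder-volume calculation, which also shows how long one reservoir could supply solvent at a given nanoflow rate. The sketch below is an illustration only; the function names are hypothetical and the tubing is treated as an ideal cylinder.

```python
import math


def tube_volume_ul(length_cm: float, inner_diameter_um: float) -> float:
    """Internal volume of cylindrical tubing in microliters."""
    radius_cm = inner_diameter_um * 1e-4 / 2.0   # 1 um = 1e-4 cm
    volume_cm3 = math.pi * radius_cm ** 2 * length_cm
    return volume_cm3 * 1000.0                   # 1 cm^3 = 1000 uL


def minutes_to_empty(volume_ul: float, flow_nl_min: float) -> float:
    """Time for a reservoir of the given volume to be displaced at a flow rate."""
    return volume_ul * 1000.0 / flow_nl_min


if __name__ == "__main__":
    volume = tube_volume_ul(length_cm=8.0, inner_diameter_um=254.0)
    print(f"reservoir volume: {volume:.2f} uL")
    for flow in (40.0, 100.0):  # nL/min, flow rates mentioned in the text
        print(f"at {flow:g} nL/min: {minutes_to_empty(volume, flow):.0f} min per reservoir")
```

The computed volume of about 4.05 μL agrees with the 4.0 μL stated for each reservoir, so a single reservoir holds enough solvent for well over half an hour of elution at 100 nL/min.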


Because the described LC system works in a flow system with very small volume and elutes peptides from a column at a flow rate of nanoliters per minute, all dead space present in the LC system critically affects the performance of peptide separation and the efficiency of protein identification by subsequent MS analysis. Therefore, when one assembles an LC system, dead spaces resulting from the connections among the tubing lines, unions, and switching valves have to be eliminated as much as possible. Selection of parts and tubing also affects the dead spaces introduced during system assembly; the inner and outer diameters of capillary tubing and the materials, valves, ferrules, PEEK fittings, and unions have to match one another completely in the system assembly (Fig. 2-6D). In particular, dead spaces are often introduced by unskilled cutting of capillary tubing. Use of commercially available tube cutters [Hewlett-Packard, Capillary and Fitting Kits (Agilent)] is therefore recommended to cut capillary tubing when connecting capillary tubing to the column, valve, and injector (Fig. 2-6D). The 1D-RPLC-MS/MS systems have often been used to analyze peptide mixtures generated from a purified protein or a relatively simple protein mixture, such as that obtained by in-gel protease digestion of an excised stained band on a 1D-SDS-PAGE or 2D-PAGE gel (see Fig. 1-4B, Protocol 1-1). In those analyses, the 1D-RPLC-MS/MS approach has a high probability of identifying the proteins analyzed; it usually detects many peptides from the same protein and thus gives high peptide coverage for a single protein, which is advantageous in identifying post-translational modifications of a protein (see Sections 1-3-1 and 2-1-2). This 1D-RPLC-MS/MS approach can also be used for “shotgun” analysis, as described in Section 2-1.
A combination of 1D-RPLC-MS/MS with data-dependent collision-induced dissociation MS/MS plus automated data processing, when used for shotgun analysis, allows identification of up to 250 protein components in a low femtomole amount of a moderately complex protein mixture, such as that of a functional multiprotein complex isolated by immunoaffinity purification in machinery or complex (interaction) proteomics (see Section 1-3-4), and of a subcellular structure isolated by ultracentrifugation fractionation in subcellular (organelle) proteomics (see Section 1-3-3). Because the 1D-RPLC-MS/MS system with an all-in-one ESI column has very high resolution and gives highly reproducible separation, it is especially suitable for the label-free quantification of extremely complex protein mixtures and absolute quantification without using an internal standard, as described in Sections 2-2-3 and 2-2-4. Thus, the 1D-RPLC-MS/MS system is very useful in various strategies in proteomics.

Experimental Example 2-1
Shotgun analysis of preribosomal ribonucleoprotein (pre-rRNP) complex by using 1D-RPLC-MS/MS (DNLC system) (15, 45, 46).

MATERIALS
• Achromobacter protease I (Lysylendopeptidase, Lys-C) (Wako Pure Chemicals, Osaka, Japan).
• Trypsin (sequence grade) (Promega, Madison, WI, USA).
• Octylglucopyranoside (Sigma, St. Louis, MO, USA).
• HPLC-grade acetonitrile (Waken Chemical, Tokyo, Japan).
• HPLC-grade formic acid (Waken Chemical, Tokyo, Japan).
• A solvent (0.1% formic acid in water).
• B solvent (0.1% formic acid in acetonitrile).

APPARATUS
• All-in-one capillary ESI column: 0.15 mm × 50 mm (Mightysil RP 18 GP, 3 μm particles, Cica, Tokyo, Japan), packed according to Protocol 2-1B.
• Direct nano liquid chromatography (DNLC) system with a ReNCon gradient device (Nanosolution, Wakayama, Japan).
• Centrifugal vacuum concentrator.
• Electrospray-ionization Q-ToF tandem mass spectrometer (Waters-Micromass Q-ToF 2).

SOFTWARE TOOLS
• MassLynx.
• Mascot.
• STEM 13 (available from http://www.sci.metro-u.ac.jp/proteomicslab/).

APPARATUS SETUP
• 1D-RP-LC (see Figs. 2-5B and 2-6).

RP-LC GRADIENT ELUTION (FIG. 2-7A)

Line Number of ReNCon Reservoirs
from a Ten-Port Valve               Solvent
1 and 2                             0.1% Formic acid
3                                   5% Acetonitrile in 0.1% formic acid
4                                   10% Acetonitrile in 0.1% formic acid
5                                   15% Acetonitrile in 0.1% formic acid
6                                   20% Acetonitrile in 0.1% formic acid
7                                   25% Acetonitrile in 0.1% formic acid
8                                   30% Acetonitrile in 0.1% formic acid
9                                   40% Acetonitrile in 0.1% formic acid
10                                  70% Acetonitrile in 0.1% formic acid

Loading Time and Flow Rate to Form a Standard Gradient: 5 min × 100 nL/min; 5 min × 100 nL/min; 5 min × 100 nL/min; 5 min × 100 nL/min; 5 min × 100 nL/min; 10 min × 100 nL/min; 30 min × 100 nL/min; 30 min × 100 nL/min


PROCEDURE

Protease Digestion of Pre-rRNP Complex in Solution
1. Precipitate 0.2 μg of a nucleolin (NCL)-associated pre-rRNP complex isolated by immunoprecipitation (see Section 3-3-2, snapshot analysis) using 20 μL of mixed methanol and chloroform (1:1 v/v).
2. To collect the precipitate, centrifuge at 15,000 rpm for 1 min, and remove the supernatant.
3. After vacuum drying, digest the precipitate with 5 μL of Achromobacter protease I (40 pM, substrate-to-enzyme ratio = 50:1) dissolved in Tris buffer (50 mM Tris-HCl, 6 M urea, 0.005% n-octylglucopyranoside, pH 9.0) overnight at 37 °C.

Preparation of Control Peptides
4. As a control peptide mixture, dissolve the tryptic digest of human serum albumin (HSA) (5 mg) in 2 mL of 0.5 M Tris-HCl (pH 8.5) containing 7 M guanidinium hydrochloride and 10 mM EDTA.
5. Add dithiothreitol to a 3 mM final concentration while gently bubbling the solution with N2 gas.
6. After 2 h, add 40 mg of iodoacetamide, and let stand in the dark for 1 h at room temperature.
7. Dialyze the mixture extensively against 10 mM ammonium bicarbonate buffer (pH 8.0) and recover the S-carbamoyl HSA by lyophilization.
8. Digest the S-carbamoyl HSA with trypsin at 37 °C for 8 h in 10 mM ammonium bicarbonate buffer (pH 8.0) at an HSA-to-trypsin ratio of 50:1.
9. Inject the digestion mixture directly at a flow rate of 500 nL/min for 10 min (5 μL) onto the all-in-one nanocapillary column via a 5 μL sample loop; then decrease the flow rate to 50–100 nL/min.
10. After the backpressure of the column becomes stable (within 5–10 min), elute the peptides by a 60 min gradient from 0% to 70% B solvent, generated by the ReNCon system (Fig. 2-6A–C and Fig. 2-7A).

LC-MS/MS Analysis
11. Analyze the peptide mixture using a DNLC system connected to an electrospray-ionization tandem mass spectrometer. A direct nanoflow all-in-one capillary ESI column is utilized. The chromatography is performed automatically under the time-dependent control program, and the eluate is sprayed directly into a high resolution Q-ToF2 hybrid mass spectrometer for data-dependent MS and MS/MS analyses. Up to four precursor ions above the intensity threshold of 10 counts/second are selected for MS/MS analyses from each survey scan.


Fig. 2-7. (A) ReNCon gradient device. The ten-channel solvent reservoirs of the device are filled with step elution solvents to carry out reversed-phase LC, and each solvent is transferred step-by-step to the fused-silica capillary by sequentially rotating the electrical valves for a programmed time duration. A linear gradient is generated by diffusion of solvent boundaries during transfer to the fritless spray tip column. (B) Base peak chromatogram of the tryptic digest of human serum albumin (100 femtomole) analyzed by the direct nanoflow LC system connected to a quadrupole/time-of-flight hybrid MS. (C) Total ion MS/MS chromatograms. [Reprinted with permission from Natsume et al. (Ref. 15). Copyright (2002) American Chemical Society.]


Protein Identification
12. Acquire the MS/MS signals with MassLynx (Waters Micromass, Manchester, UK) and convert them to text files with ProteinLynx software (Micromass).
13. Perform the database search in triplicate with Mascot (Matrix Science Ltd., London, UK) against the NCBInr protein sequence databases (Refseq, Swiss-Prot) for the appropriate species, with the following parameters: fixed modification, carbamoylmethylation (Cys); variable modifications, deamidation on Asn, pyroglutamination on N-terminal Gln, and oxidation (Met); maximum missed cleavages, 3; peptide mass tolerance, 150 ppm; and MS/MS tolerance, 0.5 Da. Criteria for match acceptance are as follows: (a) When the match scores exceed 30 above the threshold, identifications are accepted without further consideration. (b) When scores are lower than 30 or identifications are based on single matched MS/MS spectra, the raw data are inspected prior to acceptance. (c) Peptides assigned by fewer than three y-series ions and those with a +4 charge state are all eliminated regardless of scores.
14. Where necessary, evaluate the dataset (DAT file obtained by the Mascot search) with the V (view)-mode processing program in STEM 13 software (http://www.sci.metro-u.ac.jp/proteomicslab/) to remove unreliable Mascot peptide identifications and redundant assignments and to integrate the results with key information from the experiment (47). In the V mode of STEM software, the information for rapid and automated identification of proteins can be extracted from Mascot search results, and the identified proteins can be properly organized and displayed along with key information from the experiment.
15. Convert each MS/MS spectrum to an independent peak list (DAT) file using MassLynx (Micromass) to obtain the dataset used for the evaluation by STEM. The STEM V mode collects important information from DAT files to confirm peptide identification and outputs the data to a flat text file.
Each DAT file is read to acquire the Ions Score, Threshold Score, the identified sequence with modifications, precursor mass, calculated precursor mass, charge state, the number of assigned y-series ions, the number of assigned b-series ions, and the protein or genetic locus from which the peptide derives. The STEM V mode stores the information from DAT files and database lookup in the STEM txt file, and filters out ambiguous peptide identifications according to the significance threshold calculated from each Threshold Score and the number of y- or b-series signals. Both the threshold and the minimum number of series ions can be set in these steps; the defaults are 95% confidence and 3, respectively. Only the information that passes through these steps is judged as an “identified peptide” and carried forward to catalog the data, in which the peptides are sorted by locus, and peptides for each locus are sorted by sequence. Finally, the same database used by Mascot is searched for the descriptive


name, peptide positions, and percentage of identified peptide sequence in the full protein sequence (47).

RESULT
To validate the resolution and sensitivity of the DNLC-MS/MS system, the tryptic digest of reduced and S-carbamoyl HSA was applied to the DNLC system and the eluate was sprayed directly into a Q-ToF hybrid mass spectrometer to generate MS and MS/MS data. The DNLC system exhibited excellent peak resolution and reproducibility, as illustrated by the base peak chromatogram obtained with a 100 fmol HSA digest (Fig. 2-7B). The average half-width of ten major peaks was 7.0 s (standard deviation = 0.7), and variations in retention times of these peaks were less than 0.5 min in three repeated analytical runs. The detection limit of this system was estimated to be 120 attomol at a signal-to-noise ratio (S/N) of 2, based on a fragment with m/z = 682.38 (3+) (indicated by a closed arrow in Fig. 2-7B) detected using a single ion monitor. A total of 44 peptides were assigned to HSA, which corresponded to 66% coverage of the total sequence (401 of 609 residues). To demonstrate the potential of this system in interaction proteomics, we used the DNLC-MS/MS system to analyze the peptide mixture generated by Achromobacter protease I digestion of the human NCL-associated pre-rRNP complex (∼0.2 μg) (Fig. 2-7C). Sequence database searching with the more than 7500 MS/MS spectra obtained in three runs of the DNLC-MS/MS system resulted in a total of 1080 identified peptides, and 134 proteins met our identification criteria. Namely, at least two different peptides were identified in a single run of the DNLC-MS/MS system, and/or at least one peptide was identified at least twice in the three separate runs of the system with highly significant database matching scores.
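The protein-level identification criteria stated at the end of the result can be written down as a short sketch. It assumes that every input hit has already passed the peptide-level score filter, and it reads "identified at least twice in the three separate runs" as "found in at least two runs"; both points, like the function name and the example data, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

PeptideHit = Tuple[int, str, str]  # (run number, protein identifier, peptide sequence)


def accepted_proteins(hits: Iterable[PeptideHit]) -> Set[str]:
    """Apply the two protein-level criteria to score-filtered peptide hits."""
    peptides_by_run: Dict[str, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for run, protein, peptide in hits:
        peptides_by_run[protein][run].add(peptide)

    accepted = set()
    for protein, runs in peptides_by_run.items():
        # Criterion 1: at least two different peptides in a single run.
        two_in_one_run = any(len(peptides) >= 2 for peptides in runs.values())
        # Criterion 2: the same peptide found in at least two separate runs.
        runs_per_peptide: Dict[str, int] = defaultdict(int)
        for peptides in runs.values():
            for peptide in peptides:
                runs_per_peptide[peptide] += 1
        repeated = any(count >= 2 for count in runs_per_peptide.values())
        if two_in_one_run or repeated:
            accepted.add(protein)
    return accepted


# Hypothetical hits: P1 passes criterion 1, P2 passes criterion 2, P3 fails both.
example_hits = [(1, "P1", "AAK"), (1, "P1", "CCK"),
                (1, "P2", "DDK"), (2, "P2", "DDK"),
                (3, "P3", "EEK")]
result = accepted_proteins(example_hits)
```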
LIMITATION OF THE METHOD
The DNLC-MS/MS system described here is very rapid, sensitive, and versatile for identifying hundreds of peptides generated from a moderately complex protein mixture. However, when relatively complex protein mixtures that contain more than 200 proteins are analyzed, many MS peaks are not included in the MS/MS analysis because the capacity of MS/MS analysis is relatively small, owing to the limited resolving power and flow rate of 1D-LC. This problem can be solved to some extent by using column-packing particles smaller than 3 μm in diameter and by reducing the flow rate of DNLC. However, the use of smaller particles required high pressure pumping that exceeded the limit of currently available pumps, and a stable flow rate was not obtained at less than 40 nL/min. In theory, by reducing the flow rate to, for example, 10 nL/min, the number of MS peaks


analyzed by MS/MS can increase four times using the same MS equipment. To increase the performance of a DNLC-MS/MS system, therefore, we must develop flow pumps that can withstand high pressure and/or deliver flow rates as low as 10 nL/min or less.

2D-LC System (Application to the Shotgun Analysis of Extremely Complex Protein Mixtures)
Although any combination of separation columns with different separation modes (Table 2-1) can be used for multi-LC, the combination of ion-exchange and reversed-phase columns has been applied most commonly in multi-LC-MS/MS analysis (6, 13, 14, 17, 48, 49). When ion-exchange and reversed-phase columns are used, the result of multi-LC analysis provides information about the charge and hydrophobicity of peptides, because the chromatographic behavior of peptides is relatively well studied for both ion-exchange and reversed-phase columns, and high chromatographic resolution for peptides is well established. In principle, the recovery and the retention time of the expected peptides eluted from a reversed-phase column can be predicted because the retention time in reversed-phase chromatography is linearly related to the natural logarithm of the sum of the hydrophobicity [e.g., the value of ln(1 + H), in which H indicates the hydrophobicity of the peptide, is used to estimate the retention time in reversed-phase chromatography], which can be estimated from the amino acid composition of the peptide (50, 51). The charge of the peptide at a given elution condition for ion-exchange chromatography can also be calculated from the amino acid composition because each of the amino acids has its own specific isoelectric point; thus, the approximate order of the peptides eluted from an ion-exchange column at a given condition can also be predicted (3, 4).
Those predictions help to evaluate the results of MS and/or MS/MS analysis of complex peptide mixtures; namely, most of the peptides that do not separate on the ion-exchange column and/or on the reversed-phase column (or peptides that pass through without interacting with the columns) cannot be subjected to MS and/or MS/MS analysis during multi-LC-MS/MS analysis; thus, they should be eliminated from the results obtained (52–54). In a typical case using an anion-exchange column equilibrated at pH 8.0 as the first dimension and a reversed-phase column eluted by a linear gradient of acetonitrile from 0% to 60% in 0.1% TFA, for example, the peptides with a value of ln(1 + H) < 3 could not be separated by the second-dimension reversed-phase column chromatography. The peptides not subjected to ion-exchange or reversed-phase chromatography should be considered invalid peptides, even if they were hit by database searching of the resulting MS/MS spectra. Validation of peptide hits using chemical properties of peptides and human expert knowledge is often required to gain a higher level of confidence in protein identification, which can be enormously time consuming (18). Thus, the rapid high throughput production of shotgun proteomic data necessitates the development of statistical approaches to evaluate and assemble interpreted MS/MS results into protein identifications from a given sample (18, 55–61).
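The two predictions discussed above, the net charge of a peptide at the ion-exchange pH and the reversed-phase index ln(1 + H), can be sketched as follows. The pKa values are approximate textbook figures, the hydrophobicity coefficients are hypothetical placeholders rather than the published coefficients of Refs. 50 and 51, and the function names are illustrative; only the ln(1 + H) form and the cutoff of 3 come from the text.

```python
import math
from typing import Dict

# Approximate textbook pKa values, assumed for illustration. Cys is omitted
# because it is alkylated in the sample preparation described in this chapter.
PKA_BASIC = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "Y": 10.1}


def net_charge(peptide: str, ph: float) -> float:
    """Estimate the net charge from the amino acid composition (Henderson-Hasselbalch)."""
    basic = ["Nterm"] + [aa for aa in peptide if aa in ("K", "R", "H")]
    acidic = ["Cterm"] + [aa for aa in peptide if aa in ("D", "E", "Y")]
    positive = sum(1.0 / (1.0 + 10 ** (ph - PKA_BASIC[g])) for g in basic)
    negative = sum(1.0 / (1.0 + 10 ** (PKA_ACIDIC[g] - ph)) for g in acidic)
    return positive - negative


def ln_one_plus_h(peptide: str, coefficients: Dict[str, float]) -> float:
    """Return ln(1 + H), where H is the sum of per-residue hydrophobicity coefficients."""
    h = sum(coefficients.get(aa, 0.0) for aa in peptide)
    if h <= -1.0:
        raise ValueError("1 + H must be positive")
    return math.log(1.0 + h)


def is_retained_on_rp(peptide: str, coefficients: Dict[str, float],
                      cutoff: float = 3.0) -> bool:
    """Flag peptides at or above the ln(1 + H) cutoff quoted in the text."""
    return ln_one_plus_h(peptide, coefficients) >= cutoff


# Hypothetical coefficients, NOT a published hydrophobicity scale.
toy_coefficients = {"L": 10.0, "F": 12.0, "G": 0.0, "K": 1.0}

acidic_charge = net_charge("DEDEK", 8.0)   # negative: binds an anion exchanger
basic_charge = net_charge("KRKRK", 8.0)    # positive: expected to pass through
```

A peptide with a positive estimated charge at pH 8.0 would be expected to pass through the anion-exchange column, and one with ln(1 + H) below the cutoff would be a candidate for removal from the search results, as argued in the text.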


The prediction also helps to select the column used, especially for the first dimension. Peptides with positive charges are expected to pass through the first anion-exchange column, although the second column can separate them, and the behavior of peptides on the reversed-phase column seems to contribute more critically than that on the ion-exchange column in multi-LC-MS/MS analysis. If the number of peptides expected to pass through an anion-exchange column is large, a cation-exchange column may be a better choice for the first separation of peptides; these predictions, therefore, are useful for the selection of the columns to be used in a multi-LC system. In addition, these predictions could be used in some absolute quantification methods, typically one with emPAI values that are calculated based on the number of peptides analyzed by MS and that have a linear relationship to the logarithm of the injected amount of the peptides in MS analysis (see Section 2-1-4) (62). Most of the major methods for both relative and absolute quantification of proteins in current proteomics involve shotgun methodology using LC-MS or LC-MS/MS as described in Sections 2-1 and 2-2. Both anion- and cation-exchange columns can be used for the peptide ion-exchange separation as described; however, a strong cation-exchange (SCX) column is currently preferred for the first-dimension separation, primarily because of buffer pH compatibility with the subsequent RP separation and the use of positive ion mode mass detection. Buffers for optimal operation of LC with an anion-exchange column are usually in the pH range of 7 or greater, which can cause column degradation and poor chromatographic reproducibility with silica-based RP columns (63).
In addition, the SCX column has a much greater loading capacity than the RP column, and acts as a peptide reservoir, storing peptides until a peptide subset is eluted to the RP column with incremental increases in salt concentration in the LC gradient or stepwise elution solvent. Furthermore, the choice for the first-dimension ion-exchange column may consider the presence of materials coexisting with the proteins, such as nucleic acids in complex biologically derived samples (e.g., cell lysates or cytosolic fractions), which strongly interact with anion groups on the column depending on the pH of the elution buffers, and potentially cause secondary interaction of proteins with DNA/RNA through polyphosphate-based weak cation exchange. For a 2D-LC system with a column of SCX resin placed directly upstream from the C18 resin in a capillary column, the dislodged or eluted peptides from the first-dimension SCX column are separated on the RP column using an acetonitrile gradient and, after reequilibration, another fraction of peptides is displaced from the SCX to the RP with an increase in salt concentration. The iterative process of salt elution followed by RP separation is repeated until the reserve of peptides on the SCX is exhausted. This method greatly increases the number of digested proteins that can be analyzed and enhances the detection of low abundance proteins in the mixture (1). In this type of 2D-LC separation, one must decide on how many successive pulses with increasing salt concentration should be used or how many fractions should be collected in an off-line salt gradient fractionation. The answer depends on factors such


as the complexity of the peptide mixture, sample size, gradient time, available time, MS mode, and the purpose of the proteome analysis. Another aspect of the SCX column separation that has to be considered for efficient peptide separation and subsequent MS performance is the buffer type, concentration, and pH. In general, high salt concentrations result in sharp peaks, short elution times, and a large number of coeluting peptides. On the other hand, low salt concentrations result in broad peaks, long elution times, and a reduced number of coeluting peptides (7). The fundamental requirements for ideal multidimensional separation are that the mechanisms of separation must be orthogonal and that no resolution gained in the first dimension is lost in any subsequent dimension. The ultimate goal is to maximize the peak capacity of a separation system, where peak capacity is defined as the maximum number of peaks that can consecutively stack into accessible separation space with a complete baseline separation between neighboring peaks (18). A major concern with protein separations, particularly with the practical 2D-LC methods used mostly today, however, is the possibility that a single molecular species is observed in multiple fractions due to fraction splitting. This phenomenon complicates data analysis and can lead to erroneous data interpretation as to the number of distinct entities separated. Because of this, the subsequent MS and MS/MS analysis identifies the same peptide more than once. There are potentially two major causes for the multiple occurrences of this protein multi-identification. First, if a protein elutes at the end of an ion-exchange step, some of the mass may remain bound on the column and be eluted in the subsequent step. 
Implementing a step-gradient approach, rather than the more usual continuous gradients used for protein chromatography, can minimize this fraction splitting, although there are strong arguments against using a step-gradient approach and in favor of linear gradients in terms of maximizing peptide resolution (63). This fraction-splitting phenomenon is characterized by the appearance of the same mass at comparable reversed-phase retention time in consecutive ion-exchange fractions. The second potential mechanism for the appearance of components in multiple fractions is carry-over from the RP-LC step. This can be attributed to peptide remaining on the stationary phase following a gradient run and appearing in subsequent analyses even if no new sample injection is made (63). This phenomenon tends to be observed for larger peptides with higher hydrophobicity. Although there are many concerns, including those discussed earlier, about improving multidimensional LC separation of extremely complex peptide mixtures, it is generally accepted that no single chromatographic method is capable of resolving such complex mixtures of peptides generated from a global proteolytic digest of a proteome. The multidimensional separation in conjunction with MS and/ or MS/MS is certainly a very powerful approach for the identification of a large number of proteins in biological samples via bottom–up methodology using MS technology. Increasing the number of dimensions of separation prior to MS analysis increases the number of peptides that can be identified; however, a balance between the time invested and the overall results obtained must be carefully considered (7). Taking those into consideration, 2D-LC separation is the most practical and efficient choice of multidimensional methods that can be considered to date.
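The signature of fraction splitting described above, the same mass at a comparable reversed-phase retention time in consecutive ion-exchange fractions, lends itself to a simple check. The sketch below is one possible way to flag such pairs; the tolerances, the data structure, and the example features are hypothetical and would need tuning for a real data set.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    fraction: int   # ion-exchange step (salt fraction) number
    mass: float     # peptide mass in Da
    rt_min: float   # reversed-phase retention time in minutes


def find_fraction_splits(features: List[Feature],
                         ppm_tol: float = 50.0,
                         rt_tol_min: float = 1.0) -> List[Tuple[Feature, Feature]]:
    """Return pairs that look like one species split over consecutive fractions.

    The default tolerances are illustrative assumptions, not values from the text.
    """
    splits = []
    for first in features:
        for second in features:
            if second.fraction != first.fraction + 1:
                continue  # only consecutive ion-exchange fractions are compared
            ppm = abs(first.mass - second.mass) / first.mass * 1e6
            if ppm <= ppm_tol and abs(first.rt_min - second.rt_min) <= rt_tol_min:
                splits.append((first, second))
    return splits


# Hypothetical features: the first two match in mass and retention time.
observed = [Feature(3, 1500.750, 22.4), Feature(4, 1500.760, 22.6),
            Feature(4, 980.500, 15.0), Feature(6, 1500.750, 22.5)]
split_pairs = find_fraction_splits(observed)
```

Carry-over from the RP step would look different: the same feature would reappear in later runs that are not necessarily adjacent, so a check of this kind does not capture it.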


Experimental Example 2-2
Shotgun analysis of extremely complex peptide mixtures using multidimensional LC-MS/MS system (12, 14, 64–68).

MATERIALS
• Dithiothreitol.
• Iodoacetamide.
• 10 mM ammonium bicarbonate buffer (pH 8.0).
• TPCK-treated trypsin (Cooper Biochemicals).
• 6 M HCl.
• Ammonia, aqueous.
• 0.5 M Tris-HCl (pH 8.5) containing 7 M guanidinium hydrochloride and 10 mM EDTA.
• Elution buffers and solvents:
  B1 buffer: 0.025 M Tris-HCl buffer (pH 8.0).
  B2 buffer: 0.025 M Tris-HCl buffer (pH 8.0) containing 0.4 M NaCl.
  B3 solvent: 0.2% formic acid (HPLC grade) in water.
  B4 solvent: 0.2% formic acid in acetonitrile (HPLC grade).
  B5 solvent: 1% formic acid in water.

APPARATUS
• First-dimension SCX column; Bioassist Q (5 μm, 2 mm I.D. × 35 mm; Tosoh).
• Second-dimension RP column; Mightysil C18 (3 μm, 0.32 mm I.D. × 100 mm; Kanto Chemicals, Tokyo, Japan).
• Trap RP column; Mightysil C18 (15 μm, 1 mm I.D. × 5 mm; Kanto Chemicals, Tokyo, Japan).
• HPLC assemblies (Model SCL-10AD, Shimadzu) with an SCX column and an RP column connected in tandem through a six-way column-switching valve (Model C2-0006, Valco).
• ESI spray needle; injection syringe needle for baby (50 μm I.D., 125 μm O.D., GL Science, Tokyo, Japan).
• Electrospray-ionization Q-ToF tandem mass spectrometer (Waters-Micromass Q-ToF 2).

SOFTWARE TOOLS
• MassLynx.
• Mascot.
• STEM 13 (available from http://www.sci.metro-u.ac.jp/proteomicslab/).


APPARATUS SETUP
• 2D-LC (see Fig. 2-1A). The first-dimension column is connected in tandem to the second-dimension column through a six-way column-switching valve, which is also used to insert a small trap column between the first and second columns (Fig. 2-8A). This trap column removes salts from peptides eluted from the first-dimension LC, an important process for the efficient ionization of peptides in MS analysis and thus for a reproducible analysis.
• Step gradient elution for the first-dimension LC (Fig. 2-1B).

Cycle                  % B1   % B2   Elution Time (min)   Interval Time (min)   Flow Rate (μL/min)
Prerun                 100    0      20                   0                     100
1 (sample injection)   100    0      5                    55                    100
2                      95     5      5                    55                    100
3                      90     10     5                    55                    100
4                      85     15     5                    55                    100
5                      80     20     5                    55                    100
6                      75     25     5                    55                    100
7                      70     30     5                    55                    100
8                      60     40     5                    55                    100
9                      50     50     5                    55                    100
10                     0      100    5                    0                     100

• Gradient elution for the second-dimension LC (Fig. 2-1B).

Cycle                              % B3     % B4    Elution Time (min)   Interval Time (min)   Flow Rate (μL/min)
Initialization                     100      0       20                   0                     5
Each cycle (repeat ten times)
  Sample injection                 100      0       5                    0                     5
  Gradient                         95→40    5→60    40                   0                     5
  Re-equilibration                 100      0       15                   0                     5
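The two elution programs can be captured as plain data so that the timing of a run is easy to check. The sketch below transcribes the values from the tables; the variable names and the tuple layout are illustrative assumptions. Summing the entries gives a little over 10 h for the second-dimension program, somewhat less than the 12 h total quoted in the procedure, and the tables alone do not explain the difference.

```python
# First-dimension step gradient:
# (label, %B1, %B2, elution_min, interval_min, flow_uL_per_min)
b2_steps = [0, 5, 10, 15, 20, 25, 30, 40, 50, 100]
first_dimension = [("prerun", 100, 0, 20, 0, 100)]
for cycle, b2 in enumerate(b2_steps, start=1):
    interval = 55 if cycle < len(b2_steps) else 0  # the last cycle has no interval
    first_dimension.append((f"cycle {cycle}", 100 - b2, b2, 5, interval, 100))

# Second-dimension program: one initialization, then three segments per cycle.
# (label, duration_min); solvent composition is described in the table above.
initialization_min = 20
second_dimension_cycle = [("sample injection", 5),
                          ("gradient 5->60% B4", 40),
                          ("re-equilibration", 15)]
n_cycles = 10

cycle_min = sum(duration for _, duration in second_dimension_cycle)
second_total_min = initialization_min + n_cycles * cycle_min
first_total_min = sum(elution + interval
                      for _, _, _, elution, interval, _ in first_dimension)

print(f"second-dimension cycle: {cycle_min} min")
print(f"second-dimension total: {second_total_min} min "
      f"({second_total_min / 60:.1f} h)")
print(f"first-dimension total:  {first_total_min} min")
```

Each first-dimension step (5 min elution plus a 55 min interval) matches one 60 min second-dimension cycle, which is what allows the two programs to run in lockstep.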

PROCEDURE
Preparation of Peptide Samples
1. Dissolve 200–500 μg of protein sample [e.g., soluble and insoluble protein extracts from E. coli cell extract (64), Caenorhabditis elegans (14, 68), mouse ES cells (65, 67), or mouse brain (66)] in 200 μL of 0.5 M Tris-HCl (pH 8.5) containing 7 M guanidinium hydrochloride and 10 mM EDTA. Bubble the solution with N2 gas for 10 min.
2. Add 100 μg of dithiothreitol (3 mM final concentration of DTT) to the protein solution and mix under N2 gas bubbling at room temperature for 2 h.
3. After 2 h, add 250 μg of iodoacetamide (7 mM final concentration of iodoacetamide) and leave the solution in the dark for 1 h at room temperature.
4. Dialyze the solution against 200 mL of 10 mM ammonium bicarbonate buffer (pH 8.0) and change the dialysis buffer every 2 h three times. Alternatively, ultrafiltration can be done to remove the excess reagents.
5. Add 1–2 μg of TPCK-treated trypsin to the protein solution and digest overnight at 37 °C.
6. Acidify the digest to pH 2 by adding an aliquot of 6 M HCl and, where necessary, remove any precipitates by centrifugation.
7. Neutralize the supernatant with aqueous ammonia to pH 8, dilute the peptide mixture with an equal volume of water, and apply to the 2D-LC-MS/MS system.
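The final reagent concentrations quoted in steps 2 and 3 follow from the added mass, the molecular weight, and the 200 μL sample volume. A minimal check, assuming molecular weights of 154.25 g/mol for dithiothreitol and 184.96 g/mol for iodoacetamide and ignoring the small volume change on addition (the helper name is illustrative):

```python
def millimolar(mass_ug: float, mw_g_per_mol: float, volume_ul: float) -> float:
    """Convert a reagent mass in a given volume to a concentration in mM."""
    micromoles = mass_ug / mw_g_per_mol      # ug / (g/mol) = umol
    molar = micromoles / volume_ul           # umol / uL = mol/L
    return molar * 1000.0                    # mol/L -> mM


dtt_mm = millimolar(100.0, 154.25, 200.0)    # dithiothreitol, step 2
iaa_mm = millimolar(250.0, 184.96, 200.0)    # iodoacetamide, step 3

print(f"DTT: {dtt_mm:.1f} mM, iodoacetamide: {iaa_mm:.1f} mM")
```

The results, about 3.2 mM and 6.8 mM, agree with the rounded values of 3 mM and 7 mM given in the procedure.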

First Step of “2D-LC” Performance
8. Activate the computer program that controls the HPLC systems. The step-gradient elution for the first-dimension LC and the gradient elution for the second-dimension LC are synchronized automatically by the computer program (see Apparatus Setup, Fig. 2-1A,B). The total time required to analyze the protein sample using this procedure is 12 h.

Preconditioning
9. Equilibrate columns C1 and C2 for 20 min with buffer B1 and solvent B3, respectively, at flow rates of 100 μL/min (for column C1) and 5 μL/min (for column C2) at the six-way valve position shown in Fig. 2-8A.

Sample Loading and Ion-Exchange Chromatography with a Stepwise Elution by LC-1
10. Apply the peptide mixture generated by tryptic digestion of a sample of interest to column C1 through the sample injector (In, Fig. 2-1A), and elute with buffer B1 for 5 min at a flow rate of 100 μL/min.

Desalting
11. Mix the eluent online with solvent B5, which is continuously supplied by pump 5 (P5). The mixture flows into a reversed-phase “trap” column, where salts are separated from peptides and proteins by continuous washing with solvent B5 (Figs. 2-1A and 2-8A).

Column Switching and RP Chromatography with a Gradient Elution by LC-2
12. LC-1 is automatically stopped, and simultaneously the six-way switching valve moves to connect the trap column to column C2 (Fig. 2-8B).
13. Introduce the salt-free peptide sample adsorbed on the trap column into the RP column (C2) and separate it for 40 min with a gradient formed by LC-2 from solvents B3 and B4 (5–60% acetonitrile in 0.2% formic acid) (Fig. 2-1A,B) at a flow rate of 5 μL/min.

MS and MS/MS Analysis
14. Load the eluted samples into the ion source of the mass spectrometer through an ESI spray needle. MS and MS/MS data are collected automatically.

Data Collection and Database Searching
15. MS/MS data are generally converted to text files listing mass values and intensities of the fragment ions. The text files are analyzed using appropriate search engines that scan gene and protein databases and report back with peptide assignments and putative identification of the proteins. For the database searching of the MS/MS data obtained from the 2D-LC-MS/MS

Fig. 2-8. The six-port electrical valve (E-valve) used to switch the flow path of the automated multidimensional LC system. Valve position of (A) first-dimension ion-exchange LC and (B) second-dimension reversed-phase LC. (Diagram labels: pumps P5 and P6, columns C1 and C2, trap column, LC-2, drain.)


analysis of the cell extracts of Caenorhabditis elegans, the collected MS/MS data are converted to text files by MassLynx software (Micromass, UK) and are analyzed by the Mascot Daemon algorithm (Matrix Science) to assign peptides and identify proteins from the C. elegans database (wormpep 66 database, The Wellcome Trust Sanger Institute, Cambridge, UK). More specifically, the database search is performed with the following parameters: The sole fixed modification parameter is carbamoylmethylation (Cys). The variable modification parameters are pyro-Glu, acetylation (protein N terminus), oxidation (Met), and phosphorylation (Ser, Thr, and Tyr). The maximum number of missed cleavages is set at 3, with a peptide mass tolerance of ±500 ppm. Peptide charge states from +2 to +4 and an MS/MS tolerance of ±0.5 Da are allowed. From the search results file (dat file), the parameters of the top-ranked candidate(s), such as the amino acid sequence of the peptide, the coding sequence (CDS) identifier (such as F52D10.3), the probability (total score, threshold, and the difference), and the modification, are extracted as a text file in comma-separated values (csv) format using STEM.

End of the First Cycle of 2D-LC-MS/MS
16. Soon after the gradient elution for the RP column chromatography is finished, the six-way switching valve automatically moves to connect column C1 to the trap column at the position indicated in Fig. 2-8A. Equilibrate column C2 again with solvent B3 for 15 min after the gradient elution is finished. The first cycle of the 2D-LC is now completed.

Column Switching and Ion-Exchange Chromatography with a Gradient Elution by LC-1
17. Once the gradient elution for the RP column chromatography is completed, the computer program automatically reactivates the LC-1 system, so that peptides are again eluted from the ion-exchange column C1 for 5 min at a flow rate of 100 μL/min, but this time with a mixture of buffers B1 and B2 (95% B1:5% B2).

Proceeding Cycles
18. Repeat steps 11–17.
19. Repeat the separation cycles eight more times, each time changing the ratio of buffers B1 and B2 used to elute peptides from the ion-exchange column C1 as described in Apparatus Setup (step gradient elution for the first-dimension LC).
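The search settings listed in step 15 can be kept together as a small record, and the ±500 ppm peptide mass tolerance can be expressed as a simple check. This is a sketch only: the dictionary keys are illustrative and are not the parameter names of Mascot or any other search engine, and the example masses are hypothetical.

```python
# Search settings from step 15, stored under illustrative (non-Mascot) key names.
search_settings = {
    "fixed_modifications": ["carbamoylmethylation (Cys)"],
    "variable_modifications": ["pyro-Glu", "acetylation (protein N terminus)",
                               "oxidation (Met)", "phosphorylation (Ser, Thr, Tyr)"],
    "max_missed_cleavages": 3,
    "peptide_tolerance_ppm": 500.0,
    "msms_tolerance_da": 0.5,
    "charge_states": [2, 3, 4],
}


def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tolerance_ppm: float) -> bool:
    """True if the observed mass lies within +/- tolerance_ppm of the theoretical mass."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tolerance_ppm


tolerance = search_settings["peptide_tolerance_ppm"]
close_match = within_tolerance(1000.3, 1000.0, tolerance)   # about 300 ppm
far_match = within_tolerance(1000.8, 1000.0, tolerance)     # about 800 ppm
```

At a mass of 1000 Da, a 500 ppm window corresponds to ±0.5 Da, which shows how permissive this precursor tolerance is compared with the 150 ppm used in Experimental Example 2-1.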


RESULT
The described 2D-LC-MS/MS system can separate peptides with the simultaneous collection of precise molecular mass for each peptide and MS/MS data for the internal amino acid sequence. The most significant application of this system is mass identification of proteins in extremely complex protein mixtures, such as crude extracts of cells or tissues, cellular organelles, subcellular structures, or very complex cellular machineries. Combined with the automated generation of collision-induced dissociation of peptide fragments and computer-assisted retrieval of the spectra, one can expect to assign many thousands of peptides in a highly complex peptide mixture generated by protease digestion of very complex protein mixtures, from which 100–2500 proteins are routinely identified from the gene and protein databases. Because the identification depends in part on the protein structure (e.g., peptide mass and sequence tag), this technique is applicable to any protein mixture, even intractable ones, such as those that include membrane-bound proteins. When the 2D-LC-MS/MS approach was applied to the protein identification of tryptic digests generated from soluble and insoluble protein fractions of the nematode C. elegans, a single operation of the 2D-LC-MS/MS carried out 8000–14,000 MS/MS analyses. Among the spectra, about one-fourth gave reliable candidate peptides (2200–3700) by searching the sequence database, wormpep. The peptides were assigned to 1700–2500 original proteins (genes). Several peptides derived from a single protein were detected by a single analysis run. About 2.0–3.5 peptides were assigned per protein on average, and a total of 1616 proteins, including 110 secreted/targeted proteins and 242 transmembrane proteins, were identified (14).
The 2D-LC-MS/MS method described here has also been applied to large scale identification of very complex peptide mixtures generated from the cell extracts of mouse embryonic stem (ES) cells (E14-1) (in total, 1797 proteins including 40 transcription factors, 365 potential nuclear proteins, and 260 membrane proteins were identified in this analysis) (65), a total cell extract of Escherichia coli strain K12 (JM109) (this analysis identified 1480 expressed proteins, equivalent to ∼35% of the total open reading frames predicted in the genome) (64), and a postsynaptic density fraction of rat forebrain (this analysis identified 5264 peptides attributed to 492 proteins) (66). The technology has also been used for analysis of the surface of undifferentiated mouse ES cell lines, D3, in combination with cell surface labeling with sulfo-NHS-LC-biotin and subcellular fractionation of plasma membranes (Experimental Example 2–3) (67), and for the characterization of N-linked high mannose- and/or hybrid-type glycoproteins in C. elegans in combination with affinity capture of a set of glycopeptides generated by tryptic digestion of protein mixtures on a lectin column, followed by incorporation of a stable isotope tag, 18O, specifically into the N-glycosylation site upon a peptide: N-glycosidase mediated conversion of the glycosylated Asn to Asp in 18O-labeled water (see Experimental Example 2–5) (68).
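Protein counts of the kind reported above are typically tallied by taking the union of the identifications from each run. The sketch below shows one way to compute the newly identified and cumulative totals over repeated analyses, together with the genome coverage for the E. coli result (1480 proteins against the 4289 predicted ORFs quoted in the limitations discussion that follows); the function names and the toy identifier sets are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def cumulative_identifications(runs: Iterable[Set[str]]) -> List[Tuple[int, int]]:
    """Return (newly identified, cumulative total) protein counts for each run."""
    seen: Set[str] = set()
    rows = []
    for identified in runs:
        new = set(identified) - seen
        seen |= new
        rows.append((len(new), len(seen)))
    return rows


def orf_coverage(n_identified: int, n_predicted_orfs: int) -> float:
    """Percentage of predicted ORFs represented by identified proteins."""
    return 100.0 * n_identified / n_predicted_orfs


# Toy identifier sets standing in for three repeated analyses.
toy_runs = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "d"}]
tally = cumulative_identifications(toy_runs)

e_coli_coverage = orf_coverage(1480, 4289)
print(tally, f"{e_coli_coverage:.1f}%")
```

The declining number of new identifications per run is the quantity to watch when deciding whether further repeats of the same preparation are worthwhile.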


LIMITATION OF THE METHOD
Although a total of 1616 proteins were identified with about 2.0–3.5 peptides assigned per protein on average in the case of the 2D-LC-MS/MS analyses of C. elegans cell extracts, 40–60% of those proteins were assigned by single peptide hits. This is partly because the complexity of the sample peptide mixture often exceeds the separation capacity of the 2D-LC-MS/MS system used and partly because the selection of a peptide for MS/MS analysis is data dependent and somewhat irregular (14, 69). The probability of successful identification by MS/MS analysis is lower for peptides originating from less abundant proteins than for peptides from abundant proteins; this principle is also used for absolute quantification without using an internal standard peptide (see Section 2-2-4). Therefore, multiple measurements of the same preparation generally increase the number of proteins identified. In fact, when the same peptide preparation of E. coli cell extract was analyzed repeatedly under the same conditions, the number of proteins identified increased from 850 to 1480 after repeating the analysis ten times (Fig. 2-9A,B) (64). The total analyses generated ∼162,000 MS/MS spectra and resulted in the assignment of more than 58,700 peptides. The identified proteins corresponded to about 35% of the total 4289 ORFs predicted in the E. coli genome (70). Thus, after analyses of the ten repeats, the total number of identified proteins was near the maximum. This indicates the necessity of repeated analyses of 2D-LC-MS/MS for the same sample preparation in order to obtain the maximum identification of proteins in extremely complex mixtures, such as cell extracts. In addition, the results suggest other possible limitations of the analysis using 2D-LC-MS/MS. In the case of the analysis of E. coli cell extract, the identified protein subset contained a wide range of proteins with respect to physicochemical characteristics such as pI and molecular mass (Mr), and showed that the most acidic protein identified had a pI of 3.42, while the most basic had a pI of 13.1. The smallest protein identified had Mr = 5.1 kDa, and the largest had Mr = 182 kDa. A 2D visualization of the pI and Mr of the 1480 proteins and the E. coli proteome predicted from the ORFs (Fig. 2-9B) suggested that 2D-LC-MS/MS analyses covered 99% of the bacterial proteome with respect to Mr and pI. Almost the same coverage of protein identification using 2D-LC-MS/MS was obtained for the analysis of the cell extract obtained from C. elegans; that is, the 2D-LC-MS/MS system could have detected more than 99% of the proteins predicted from the genome sequence of C. elegans, as far as their Mr and pI values are concerned (pI 3.48–12.41, Mr 6.0–1369 kDa) (Fig. 2-9C) (14). In contrast to the high coverage with respect to Mr and pI ranges, however, the coverage in terms of abundance in the protein mixtures is somewhat different. The codon adaptation index (CAI) is widely used as an indicator of protein expression level. When the proteins identified for C. elegans by 2D-LC-MS/MS were classified using the CAI, 336 (66%) out of 506 were categorized as “highly expressed” proteins (CAI > 0.7) and 666 (30%) out of 2228 were categorized as “moderately expressed” proteins, whereas only 549 (3.4%) out of 16,298 categorized as “low

PROTEOMIC TOOLS FOR ANALYSIS OF CELLULAR DYNAMICS

100

abundance” proteins were detected (Fig. 2-9D). Thus, 2D-LC-based technology favors the identification of proteins with higher abundance, although it appears to have a wider dynamic range than conventional 2D-PAGE, as most of the “low abundance” proteins had not been identified by previous 2D-PAGE studies (14). The number of “peptide hits” used to identify a protein is expected to relate statistically

[Fig. 2-9, panels A and B. (A) Number of identified proteins (total and newly identified) plotted against the number of analyses. (B) Mr calculated (kDa, logarithmic scale) plotted against pI calculated.]

Fig. 2-9. (A) Identification of E. coli proteins by 2D-LC-MS/MS shotgun analysis. Analysis of the tryptic digest of the bacterial extracts was repeated 10 times, and the graph shows the number of newly identified proteins in each analysis (Δ) and the total number identified (♦). (B) Two-dimensional display of the proteome that was experimentally obtained by the shotgun analysis (orange: 1480 proteins) or predicted from the genome sequence (m52p) (yellow: 4280 entries), and the proteome predicted from the genes encoded within horizontally transferred genes called "K-loop" (blue: 490 proteins). Mr and pI of the entire proteome were calculated from the amino acid sequences without considering post-translational modifications. The y-axis is presented as a logarithmic scale. [A, B: From (Ref. 64).] (See insert for color representation of part B.) (C) Two-dimensional display of the C. elegans proteome predicted from the genome sequence (wormpep 66) (20,219 entries) and experimentally obtained by 2D-LC-MS/MS (inset black square) or 2D-PAGE (inset gray square) technology, illustrating that 2D-LC-MS/MS can cover a wider range of proteins than 2D-PAGE. Mr and pI of the proteome were calculated from the amino acid sequences without considering post-translational modifications. The y-axis is presented as a logarithmic scale. (D) Codon adaptation index (CAI) of the genes predicted from the C. elegans proteome (gray bars) and the proteins identified by 2D-LC-MS/MS (black bars). CAI is used as an index of the expression level of the gene in the cell. The figure also indicates the average number of peptides used to identify proteins with different CAI ranges (shown by triangles). Although the 2D-LC-MS/MS system tends to identify proteins with CAI > 0.5, that is, highly and moderately expressed proteins, with relatively high recovery, and to detect the peptides derived from proteins with larger CAI values more frequently, it also allows the identification of "low abundance" proteins in the cell, with CAI < 0.5 (more than 500 proteins in this analysis). [C, D: Reprinted with permission from Mawenyega et al. (Ref. 14). Copyright (2001) American Chemical Society.]
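The coverage percentages quoted in the text follow directly from the counts. The short Python sketch below is an illustration only and is not part of the original analysis; the function and variable names are assumptions introduced here, and the counts are those stated in the text (1480 of 4289 predicted E. coli ORFs, and 336/506, 666/2228, and 549/16,298 C. elegans proteins in the three CAI classes).

```python
# Counts taken from the text: (identified, predicted) for each class.
CAI_CLASS_COUNTS = {
    "highly expressed": (336, 506),
    "moderately expressed": (666, 2228),
    "low abundance": (549, 16298),
}


def coverage_percent(identified: int, predicted: int) -> float:
    """Return the percentage of predicted proteins that were identified."""
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return 100.0 * identified / predicted


# E. coli: proteins identified after ten repeats versus predicted ORFs.
ecoli_coverage = coverage_percent(1480, 4289)

if __name__ == "__main__":
    print(f"E. coli ORF coverage: {ecoli_coverage:.1f}%")
    for name, (found, total) in CAI_CLASS_COUNTS.items():
        print(f"C. elegans, {name}: {coverage_percent(found, total):.1f}%")
```

The steep drop in coverage from the first class to the last is the abundance bias discussed in the text.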

to the abundance of a protein in the sample mixture and to the protein length (the number of peptides generated by tryptic digestion). This is clearly shown by the plot of the number of peptide hits versus the CAI range of the identified protein (Fig. 2-9D), in which the number of peptide hits increased with higher CAI values of the identified proteins. Proteins of low abundance with CAI < 0.5 were identified with 1.82 peptides on average, and those of medium abundance (CAI 0.5–0.7) had about 4.30 peptides. Many highly abundant proteins with CAI > 0.7 were identified on the basis of 11.9 peptides (Fig. 2-9D). This result also explains why 40–60% of the proteins identified by 2D-LC-MS/MS analysis of the cell extract of C. elegans were assigned by single peptide hits. Thus, all of these results emphasize the importance of the preconcentration of "low abundance" proteins or the elimination of "highly expressed" and/or "moderately expressed" proteins before analysis by 2D-LC-MS/MS technologies. ◀

2-1-2 Application of LC-MS Methods to Functional Proteomics

Subcellular (Organelle) Proteomics Using Cell-Surface Modification Reagents and Cell Fractionation
Among the subcellular structures, membranes are challenging targets for subcellular proteomics partly because membrane proteins (that have intracellular, extracellular, and membrane spanning domains) contain both hydrophilic and hydrophobic domains and thus are difficult to solubilize, and partly because they are generally less abundant. While 2D-PAGE can be used for separation of intact membrane proteins, it does not resolve them very well, mainly because of the solubilization difficulties and the low abundance (71, 72). One-dimensional SDS-PAGE can be used as an alternative method for the separation of membrane proteins because this gives better solubility of membrane proteins, but it does not have enough power to separate those proteins. In addition, electrophoresis-based methods are carried out in a strongly denaturing detergent-based environment; under such conditions, topological information (what is inside the cell, what is outside the cell) is lost. To overcome these drawbacks, a "shotgun approach" is introduced in conjunction with protease and alkaline treatments of cell (tissue) extracts or membrane vesicles (73). The shotgun approach refers to the direct analysis of complex peptide mixtures derived from proteolytic digestion of heterogeneous mixtures of proteins to rapidly generate a global profile of the protein complement within the mixture. In this approach, only the nonmembrane (intracellular and extracellular but not membrane spanning) domains are digested with protease into peptides and subjected to subsequent analysis with LC-MS/MS. In a variation on the method using a protease protection strategy, the intact membrane vesicles are first digested with protease at neutral pH. This step completely degrades the exposed domains, which are present on the outer membrane side. The vesicles are then unsealed by treatment with protease at high pH, permitting isolation and identification of previously protease-protected domains. Comparison of the results obtained by the direct and protease-protection strategies using the same sample provides information on relative protein topology and protein localization (73). Another difficulty is associated with the isolation of a specific membrane, especially plasma membrane, in a pure form because the membranes lose their specific structure upon cell lysis, and a typical plasma membrane-rich fraction prepared by ultracentrifugation is heavily contaminated with other membrane components (21, 74).
Several methods, including coating of intact cells with silica derivatives (75) and selective labeling of cell-surface proteins with a membrane-impermeable biotin reagent, have been developed to obtain relatively homogeneous preparations of plasma membranes (74, 76). Those methods, when combined with conventional sucrose density gradient ultracentrifugation, give a high-purity plasma membrane preparation. Because a selective labeling reagent of the cell surface, sulfo-N-hydroxysuccinimide (NHS)-LC-biotin, is impermeable to plasma membrane and labels amines in proteins mainly at the ε-amino group of lysine residues exposed to the extracellular space (Fig. 2-10A, B), the labeled peptides should reside on the extracellular domain of a transmembrane protein. When combined with this method, 2D-LC-MS/MS becomes a very powerful approach for identifying cell-surface proteins, many of which are receptors for various stimuli from outside the cell. Therefore, Experimental Example 2-3 consists of (1) in situ biotinylation of surface proteins on intact cells using the membrane-impermeable reagent sulfo-NHS-LC-biotin, (2) cell lysis and subcellular fractionation of the biotinylated membranes on sucrose gradients, (3) tryptic digestion of the protein mixture, (4) affinity capture of the biotinylated peptides with avidin, and (5) 2D-LC-MS/MS analysis of the peptide mixture.

▶ Experimental Example 2-3 Selective isolation and identification of ES cell-surface proteins using sulfo-NHS-LC-biotin (67).

Fig. 2-10. Cell surface labeling and subcellular fractionation. (A) Mouse embryonic stem cells D3 are labeled with sulfo-NHS-LC-biotin. (B) Biotin labeling (green, FITC-conjugated avidin) is observed using confocal laser microscopy. Fluorescence (top) and transmission microscopy (bottom). (C) A lysate of biotinylated D3 cells was centrifuged at 3000g for 10 min, and the supernatant was fractionated by sucrose density gradient centrifugation (fractions 1–8, 60% to 15% sucrose). Proteins in each fraction were then analyzed by polyacrylamide gel electrophoresis and stained with Coomassie brilliant blue (CBB), or labeled with anti-avidin-alkaline phosphatase (Avidin-AP), anti-annexin II (α-annexin II; a marker for plasma membrane), anti-GM130 (α-GM130; Golgi apparatus), anti-Bip/GRP78 (α-Bip/GRP78; endoplasmic reticulum), or anti-nucleoporin p62 (α-nucleoporin p62; nucleus). (D) MS/MS spectrum of a biotinylated peptide assigned to the transferrin receptor, spanning residues 124 (L) to 136 (K). Arrows indicate the fragment ions at m/z = 227.1 and 340.2, derived from the labeling reagent. (E) Subcellular localization of FLAG-tagged proteins expressed in D3 cells. Panels (top to bottom): RIKEN cDNA B430119L13, trophoblast glycoprotein, glycoprotein A33, hypothetical protein D7Ertd458e, and empty vector. Cells are stained with anti-FLAG, anti-CD9 (an ES cell plasma membrane marker), and Hoechst 33342. (See insert for color representation.)

MATERIALS
• A mouse embryonic stem (ES) cell line, D3 (American Type Culture Collection, Manassas, VA).
• Culture medium: Dulbecco's modified Eagle's medium (Invitrogen, Carlsbad, CA) supplemented with 15% heat-inactivated fetal bovine serum (JRH Biosciences, Lenexa, KS), 0.1 mM β-mercaptoethanol, 100 units/mL penicillin, 100 μg/mL streptomycin, and 1000 units/mL recombinant mouse leukemia inhibitory factor (ESGRO; Chemicon International, Temecula, CA).
• PBS (10 mM NaH2PO4/Na2HPO4, pH 7.4, 138 mM NaCl, 2.7 mM KCl) supplemented with 0.1 mM CaCl2, 1 mM MgCl2 (PBS+).
• EZ-Link™ sulfo-NHS-LC-biotin (Pierce, Rockford, IL).
• 100 mM glycine in PBS+.
• Protease inhibitor cocktail (Roche Diagnostics, Basel, Switzerland).
• 10 mM Hepes-NaOH, pH 7.5, 0.25 M sucrose [8.5% (w/v)].
• Phosphatase-conjugated avidin (Pierce) or respective organelle-specific antibodies (Organelle sampler kit, BD Biosciences, Lexington, KY).
• Acetone.
• Alkylation buffer: 8 M urea in 400 mM NH4HCO3, pH 8.5.
• Dithiothreitol.
• Iodoacetamide.
• N(alpha)-L-tosyl-L-phenylalanine chloromethyl ketone (TPCK)-treated porcine trypsin (Promega, Madison, WI).
• Immunopure Immobilized Monomeric Avidin (Pierce).
• Trifluoroacetic acid.

• Acetonitrile.
• BCA protein assay kit (Pierce).
• HABA; 2-(4′-hydroxyazobenzene)-benzoic acid (Pierce).

APPARATUS
• Ultracentrifuge.
• 2D-LC-MS/MS system (see Experimental Example 2-2).
• Centrifugal vacuum concentrator.

SOFTWARE TOOLS
• MassLynx.
• ProteinLynx software (Micromass).
• Mascot.
• STEM 13 (available from our website, http://www.sci.metro-u.ac.jp/proteomicslab/).

PROCEDURE
Cell-Surface Labeling
1. Culture a mouse embryonic stem (ES) cell line, D3, on 0.1% gelatin-coated tissue culture dishes in culture medium.
2. When the D3 cells have grown to approximately 80% confluence on 150 mm tissue culture dishes, incubate in a serum-free medium for 1 h, then rinse twice with ice-cold PBS.
3. Incubate with 1 mg/mL EZ-Link™ sulfo-NHS-LC-biotin in PBS+ for 20 min at 4 °C with gentle agitation.
4. After removal of the supernatant, quench residual labeling reagent with 100 mM glycine in PBS+.
5. Harvest cells using a plastic scraper.

Preparation of Biotinylated Membrane Fractions
6. Wash biotinylated D3 cells (approximately 4.8 × 10^9 cells) twice with PBS+.
7. Suspend the biotinylated D3 cells in 10 mM Hepes-NaOH, pH 7.5, 0.25 M sucrose [8.5% (w/v)], and protease inhibitor cocktail, and then lyse by nitrogen cavitation at 800 psi on ice for 20 min (71).
8. Centrifuge the cell lysates at 3000g for 10 min to remove large cell debris and nuclei.

9. Layer the supernatant on a discontinuous sucrose density cushion consisting of 15%, 30%, 45%, and 60% sucrose (w/v) in 10 mM Hepes-NaOH, pH 7.5, and ultracentrifuge at 100,000g for 17 h (72).
Comments: When biotinylated peptides are prepared directly by tryptic digestion of total cell lysates of biotin-labeled D3 cells without subcellular fractionation, LC-MS/MS analysis of this peptide preparation results in the identification of many peptides that are derived from a number of highly abundant intracellular proteins, probably originating from nonviable cells. To minimize contamination that can interfere with the selective and sensitive identification of low abundance cell-surface proteins, it is necessary to include this sucrose density centrifugation step to purify plasma membrane prior to tryptic digestion of protein samples.
10. Analyze the resultant fractions by SDS-PAGE, followed by Western blotting using alkaline phosphatase-conjugated avidin (Pierce) or respective organelle-specific antibodies to detect the biotinylated proteins.
11. Collect the fractions containing biotinylated proteins, dilute fourfold with distilled water, and ultracentrifuge at 120,000g for 2 h to obtain plasma membrane-rich pellets.

Tryptic Digestion and Avidin-Affinity Enrichment of Biotinylated Peptides
12. Delipidate the biotinylated membrane fractions twice with cold acetone, dry, and solubilize with 8 M urea in 400 mM NH4HCO3, pH 8.5.
13. Reduce the protein samples with 2 mM dithiothreitol at room temperature for 30 min, alkylate with 2 mM iodoacetamide for 30 min in the dark, and dilute fourfold with distilled water.
14. After removal of dithiothreitol and iodoacetamide by dialysis, digest the protein samples with TPCK-treated porcine trypsin at an enzyme-to-substrate ratio of 1:100 (w/w) at 37 °C for 16 h.
15. Monitor the digestion by SDS-PAGE, followed by Western blotting using alkaline phosphatase-conjugated avidin.
16. Apply approximately 1 mg of the digest to a column packed with 1 mL of Immunopure Immobilized Monomeric Avidin pretreated with 30% CH3CN in 0.4% trifluoroacetic acid and equilibrated with 2 M urea in 100 mM NH4HCO3, pH 8.5.
17. Wash sequentially with (a) 2 M urea in 100 mM NH4HCO3, (b) 2 M urea in 100 mM NH4HCO3 containing 0.5 M NaCl, (c) 2 M urea in 100 mM NH4HCO3 containing 30% CH3CN, and (d) 100 mM NH4HCO3.
18. Elute bound peptides with 30% CH3CN in 0.4% trifluoroacetic acid.
19. Concentrate the eluted peptides on a vacuum concentrator.
20. Determine the amount of peptides and biotin labels using the BCA protein assay kit and HABA, respectively, according to the manufacturer's instructions.
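The buffer and enzyme quantities in the digestion steps are tied together by simple arithmetic: the fourfold dilution of the alkylation buffer (8 M urea in 400 mM NH4HCO3) gives the same nominal composition as the buffer used to equilibrate the avidin column. The Python sketch below illustrates this and the 1:100 (w/w) enzyme-to-substrate ratio. It is an illustration only; the function names are assumptions, the 1000 μg protein amount is a hypothetical input, and any change in buffer composition during dialysis is ignored.

```python
def dilute(concentration: float, fold: float) -> float:
    """Concentration after an n-fold dilution (same units as the input)."""
    if fold <= 0:
        raise ValueError("fold must be positive")
    return concentration / fold


def enzyme_mass_ug(substrate_ug: float, substrate_per_enzyme: float = 100.0) -> float:
    """Enzyme mass for a 1:substrate_per_enzyme (w/w) enzyme-to-substrate ratio."""
    if substrate_per_enzyme <= 0:
        raise ValueError("ratio must be positive")
    return substrate_ug / substrate_per_enzyme


# Alkylation buffer diluted fourfold with water.
urea_molar = dilute(8.0, 4)            # M urea
bicarbonate_mM = dilute(400.0, 4)      # mM NH4HCO3

# Hypothetical example: 1000 ug of membrane protein to be digested.
trypsin_ug = enzyme_mass_ug(1000.0)

if __name__ == "__main__":
    print(f"After dilution: {urea_molar} M urea, {bicarbonate_mM} mM NH4HCO3")
    print(f"Trypsin for 1000 ug protein at 1:100 (w/w): {trypsin_ug} ug")
```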

Automated Multidimensional LC-MS/MS Analysis
21. Analyze the peptide mixture on an automated 2D-LC-MS/MS system using a combination of first-dimension cation exchange and second-dimension reversed-phase chromatography.
22. Separate the peptide mixture (50 μg) on a SP-5PW column (1 mm I.D. × 40 mm long, 20 μm particles; TOSOH).
23. Elute the peptides using a 15 min stepwise elution process (20 mM acetate buffer, pH 4.0, containing 0 mM, 25 mM, 50 mM, 100 mM, and 400 mM NaCl) at a flow rate of 10 μL/min.
24. Capture the eluted peptides at each step on a trap column (Mightysil C18, 0.5 mm I.D. × 1 mm long, 3 μm particles; Kanto Chemicals) for desalting.
25. Separate on a Mightysil C18 column (0.15 mm I.D. × 40 mm long, 3 μm particles) using three-step linear gradients (0–30% CH3CN in 0.1% formic acid for 120 min, 30–70% CH3CN in 0.1% formic acid for 40 min, and 70% CH3CN in 0.1% formic acid for an additional 10 min) at a flow rate of 50 nL/min.
26. Analyze the eluted peptides by a high resolution Q-ToF-2. The total analysis time for a single 2D-LC-MS/MS operation was 22.5 h.

Protein Identification by Database Search
27. Acquire the MS/MS signals by MassLynx and convert to text files by ProteinLynx software.
28. Perform the database search three times, once each against the Refseq sequence databases of mouse, human, and rat, with the following parameters: fixed modification, carbamoylmethylation (Cys); variable modifications, oxidation (Met) and sulfo-NHS-LC-biotin (Lys); maximum missed cleavages, 3; peptide mass tolerance, 150 ppm; MS/MS tolerance, 0.5 Da.
29. For peptide and protein identification, process the search results as follows.
(a) Screen the candidate peptide sequences with the probability-based Mowse scores that exceeded their thresholds (P < 0.05) and with MS/MS signals for y- or b-ions ≥3.
(b) Remove redundant peptide sequences.
(c) Assign each peptide sequence to a protein giving the maximal number of peptide assignments among the candidates.
(d) Combine mouse, human, and rat datasets.
(e) Remove interspecies redundancy of proteins.
For reliable identification of labeled peptide sequences, apply additional criteria with visual inspection of individual MS/MS spectra: (f) the presence of MS/MS signal corresponding to the labeled lysine residue and/or (g) the presence of one or more fragment ions (M+ = 227.1 and 340.2) derived from the labeling reagent (76). Exclude peptide sequences that are identified without biotin modifications regardless of their scores.
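The screening logic of step 29 can be expressed as a simple filter. The Python sketch below is an illustration under stated assumptions, not the authors' software: the `PeptideHit` record and its field names are hypothetical (they are not a Mascot data type), and reusing the 0.5 Da MS/MS tolerance of step 28 to match the reagent-derived ions at 227.1 and 340.2 is an assumption made here. In the protocol, criteria (f) and (g) are judged by visual inspection of spectra, so a filter of this kind could at most serve as a pre-screen.

```python
from dataclasses import dataclass
from typing import List

# m/z values of the reagent-derived fragment ions given in the text.
REPORTER_IONS_MZ = (227.1, 340.2)
# Assumption: reuse the 0.5 Da MS/MS tolerance of step 28 for reporter matching.
REPORTER_TOL_DA = 0.5


@dataclass
class PeptideHit:
    """Hypothetical record for one peptide assignment."""
    sequence: str
    p_value: float            # significance of the probability-based score
    matched_by_ions: int      # number of matched y- or b-ions
    biotin_assigned: bool     # search assigned sulfo-NHS-LC-biotin to a Lys
    labeled_lys_signal: bool  # criterion (f), as judged on the spectrum
    fragment_mz: List[float]  # observed fragment m/z values


def has_reporter_ion(fragment_mz: List[float]) -> bool:
    """Criterion (g): at least one reagent-derived fragment ion is present."""
    return any(
        abs(mz - ref) <= REPORTER_TOL_DA
        for mz in fragment_mz
        for ref in REPORTER_IONS_MZ
    )


def passes_screen(hit: PeptideHit) -> bool:
    """Apply step 29(a) plus the additional criteria (f) and/or (g)."""
    score_ok = hit.p_value < 0.05 and hit.matched_by_ions >= 3
    label_ok = hit.biotin_assigned and (
        hit.labeled_lys_signal or has_reporter_ion(hit.fragment_mz)
    )
    return score_ok and label_ok


def unique_labeled_sequences(hits: List[PeptideHit]) -> List[str]:
    """Step 29(b): remove redundant sequences among the accepted hits."""
    return sorted({h.sequence for h in hits if passes_screen(h)})
```

A hit that scores well but carries no biotin assignment is rejected, which mirrors the instruction to exclude unlabeled peptides regardless of their scores.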

RESULT
The method of cell-surface labeling and preparation of plasma membrane allows the following: (1) selective biotinylation of the cell surfaces of about 95% of cells with sulfo-NHS-LC-biotin (Fig. 2-10B), (2) concentration of the biotinylated plasma membrane and removal of most of the other subcellular structures by ultracentrifugation using a sucrose density gradient cushion (Fig. 2-10C), (3) selective isolation of biotinylated membrane proteins by avidin-affinity chromatography, and (4) generation and purification of biotin-labeled peptides that originate from cell-surface proteins. For example, automated 2D-LC-MS/MS analysis of the avidin-purified biotin-labeled peptide mixture (50 μg) obtained from D3 ES cells generated 5871 MS/MS spectra in a single 22.5 h analysis. These spectra were assigned to 608 unique peptides by the sequential Mascot search of mouse, human, and rat Refseq sequence databases. The advantage of this method is that assignment of the MS/MS signals that originate from the labeling reagent can directly validate the biotin-labeled peptides. As an example, a typical MS/MS spectrum assigned to a peptide from transferrin receptor is shown in Fig. 2-10D. The MS/MS analyses of biotinylated peptides showed fragment ions at m/z = 227.1 and 340.2 due to collision-induced dissociation of the labeling reagent, as indicated by arrows in Fig. 2-10D. By detecting these MS/MS signals, peptide assignments without biotin labels can be excluded from the list of identifications. After careful inspection of the assigned 608 unique peptides, 551 peptides were found to carry the biotin label and were attributed to 240 unique proteins (2.3 biotin-labeled peptides/protein on average). When the 2D-LC-MS/MS analysis was performed twice with different peptide preparations, the number of unique peptides carrying the biotin label increased to 965, which were attributed to 324 unique proteins (67).
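The per-protein averages quoted in this paragraph can be checked from the peptide and protein counts. The Python sketch below is an illustration only; the function name is an assumption, and it treats 965 as the cumulative number of biotin-labeled peptides after the second analysis, an interpretation that is consistent with the reported average of 3.0 peptides per protein.

```python
def peptides_per_protein(peptides: int, proteins: int) -> float:
    """Average number of peptides assigned per identified protein."""
    if proteins <= 0:
        raise ValueError("proteins must be positive")
    return peptides / proteins


# Single 22.5 h analysis: 551 labeled peptides attributed to 240 proteins.
single_run = peptides_per_protein(551, 240)

# Two analyses combined: 965 labeled peptides attributed to 324 proteins.
two_runs = peptides_per_protein(965, 324)

# Share of the 608 assigned unique peptides that carried the biotin label.
labeled_percent = 100.0 * 551 / 608

if __name__ == "__main__":
    print(f"Single run: {single_run:.1f} peptides/protein")
    print(f"Two runs:   {two_runs:.1f} peptides/protein")
    print(f"Biotin-labeled share of assigned peptides: {labeled_percent:.1f}%")
```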
The number of peptides used to identify a single protein ranged from 1 to 37 with an average of 3.0 peptides per protein, and 151 proteins were identified by multiple peptide assignments. The assigned 324 proteins contain 235 known membrane proteins or have putative signal sequences and/or transmembrane segments. To validate experimentally the present procedure for identifying cell-surface molecules, the subcellular localization of several proteins with no annotated localization in the Gene Ontology (GO) database was studied by transiently expressing a FLAG-tagged protein in D3 cells. When RIKEN cDNA B430119L13, trophoblast plasma membrane glycoprotein, glycoprotein A33, and a hypothetical protein D7Ertd458e, which have single predicted transmembrane segments and are known to be components of pre- and/or postimplantation embryo or present in neoplastic tissues, were selected, immunofluorescence staining of transfected D3 cells revealed that all four proteins were localized extensively on the plasma membrane as judged by colocalization with CD9, a cell-surface marker of ES cells (Fig. 2-10E). In contrast to this approach, analysis of tryptic digests of total ES cell lysates using 2D-LC-MS/MS (described in Experimental Example 2-2) identified 1790 proteins. However, about one-third of the cell-surface enriched proteins (∼30%) were not identified by analysis of the total cell extracts. In addition, the method of cell-surface

protein enrichment allowed scientists to identify minor cellular components such as cytokine/growth factor receptors Erbb2, Fgfr2, Flt4, and Lifr, where Lifr, for instance, has been reported to be present at 250–300 copies/cell (77). Thus, the method described here enabled researchers to identify relatively low abundance and insoluble proteins integrated in the plasma membrane.

LIMITATION OF THE METHOD
The described labeling and isolation strategy does not yield plasma membrane proteins that are completely free of contamination. First, intracellular proteins from nonviable cells (usually ∼5% of the cell culture) are labeled with the biotinylating reagent. About one-fourth of the identified proteins in this experiment had no potential signal sequences or TM segments, and included abundant housekeeping proteins, such as ribosomal constituents, structural molecules, histones, and chaperones. Although some of these proteins might potentially be cell-surface components, it is still difficult to distinguish them from the intracellular components of nonviable cells that might be labeled and retained in the membrane fractions. Second, the biotinylating reagent might fail to label proteins that have few reactive lysine residues, small extracellular regions, or many post-translational modifications, such as glycosylation (67). Other relevant methods need to be applied to confirm that the identified proteins are cell-surface membrane proteins. ◀

Modification-Specific Proteomics
Among the more than 200 different posttranslational modifications (PTMs) that are known to date (http://en.wikipedia.org/wiki/Posttranslational_modification), proteomic-scale methods of identifying proteins with specific PTMs have been developed for about a dozen (see Section 1-3-1).
In eukaryotes the best-characterized PTMs using proteomic-scale methods are phosphorylation and glycosylation, while other common PTMs, including ubiquitination, acetylation, methylation, and sumoylation, are also beginning to be characterized (see Table 1-3). Almost all of these methods involve either derivatization of PTMs with modification-specific reagents or specific-affinity binding to a PTM with antibodies or other specific binders to select the proteins with a specific modification. Experimental examples of three of those (phosphorylation, glycosylation, and ubiquitination) are described here.

Phosphorylation Site Mapping
Protein phosphorylation is the most ubiquitous among the known PTMs; almost 2% of the human genome encodes protein kinases and an estimated one-third of all proteins are phosphorylated (78). Protein phosphorylation is a post-translational modification catalyzed by protein kinases that form phosphoester bonds between the phosphate group and the hydroxyl group of tyrosine, serine, or threonine residues of proteins. Common approaches for phosphorylation site mapping rely largely on the use of MS/MS to sequence individual peptides. By using mass spectrometry coupled to phosphopeptide enrichment strategies, such as covalent modifications with thiol reagents that incorporate affinity tags (79), immobilized metal affinity chromatography (80),

strong cation-exchange chromatography (81), or peptide immunoprecipitation (82), thousands of modification sites have been identified. Currently, a new method for data analysis and interpretation of large numbers of protein phosphorylation sites obtained by 1D-RP-LC-MS/MS has emerged and may facilitate the task of phosphoproteomics further (83). Despite the power of these approaches and significant successes, MS analysis of protein phosphorylation is still far from being routine and is believed to have some difficulties in identifying the phosphopeptides and phosphorylation sites because of (1) the low sensitivity to phosphopeptides caused by signal suppression of phosphate-containing molecules in the commonly used positive detection mode, (2) the inherent lability of the phosphate group upon collision-induced dissociation, and (3) the difficulty of achieving full sequence coverage, especially for long peptides, peptides present in low abundance, and peptides phosphorylated at substoichiometric levels, all of which are common for phosphopeptides (84). Although those arguments are challenged by the results obtained by ESI-MS analysis of synthetic phosphopeptides (85), the outcome of MS analysis of phosphoproteins is often unsatisfactory. A new strategy for specific proteolysis at serine and threonine phosphorylation sites may overcome those difficulties, at least for researchers who are unfamiliar with phosphopeptide analysis (78). The strategy relies on the selective chemical transformation of phosphoserine and phosphothreonine residues in phosphopeptides into lysine analogs (aminoethylcysteine and β-methylaminoethylcysteine, respectively), which are cleaved with a lysine-specific protease to map phosphorylation sites (Fig. 2-11A). The use of a solid-phase reagent for the reaction allows the specific enrichment of phosphopeptides and their modification in one step (Fig. 2-11B).
In addition, the strategy utilizes ordinary MS and LC-MS/MS equipment for peptide identification; therefore, it may be a convenient approach for researchers who have not specialized in phosphopeptide analysis by MS and MS/MS.

▶ Experimental Example 2-4 Phosphorylation site mapping by phosphospecific proteolysis. [Translated by permission from Macmillan Publishers Ltd.; Knight et al., Nat. Biotechnol. (Ref. 78) (2003)].

MATERIALS
• β-Elimination solution: saturated Ba(OH)2.
• 5 M NaOH.
• Cysteamine (Sigma).
• Sequencing grade trypsin, Lys-C, and Asp-N (Roche Diagnostics).
• Lysyl endopeptidase (Wako).
• Tentagel AC resin (Advanced Chemtech).
• Synthesized peptides (Anaspec or synthesized using standard Fmoc solid-phase chemistry).
• Acetonitrile (HPLC grade).

• Trifluoroacetic acid (TFA, HPLC grade).
• Reversed-phase ZipTips C18 (C-18 resin, Millipore).
• α-Cyano-4-hydroxycinnamic acid (HCCA, Sigma).
• MALDI-ToF/MS calibration standard: angiotensin I, ACTH 1–17, ACTH 18–39, and ACTH 7–38.
• Solvent A for 1D-RPLC: 0.05% formic acid in 98% H2O/2% acetonitrile.
• Solvent B for 1D-RPLC: 0.05% formic acid in 98% acetonitrile/2% H2O.

Fig. 2-11. Chemical modification of phosphoserine to aminoethylcysteine. (A) Phosphoserine residues are transformed to aminoethylcysteine through dehydroalanine as an intermediate. (B) Solid-phase support for the capture and aminoethylcysteine modification of phosphoserine-containing peptides. (C) ESI-MS/MS spectrum of peptide RELEELNVPGEIVEK* (K* = aminoethylcysteine) [residues Arg1–Lys*15] of β-casein obtained after aminoethylcysteine modification of the phosphoserine residue and digestion with trypsin/lysylendopeptidase. The [M + 2H]2+ ion at m/z 886.40 (M = 1770.79) was selected for CID. (D) Top: β-Casein was modified as aminoethylcysteine, digested with trypsin, and analyzed by MALDI-MS. Masses in bold and magnified indicate aminoethylcysteine-modified peptides. Bottom: Unmodified β-casein was digested with trypsin and analyzed by MALDI-MS. Insets indicate that unmodified phosphoserine-containing peptides, predicted at m/z 2061.8 (Phe33–Lys48) and at m/z 2966.2 (Glu2–Arg25) and 3122.3 (Arg1–Arg25), could not be detected in this spectrum. [Reprinted by permission from Macmillan Publishers Ltd.; Knight et al., Nat. Biotechnol. (Ref. 78) (2003).]
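The precursor value in panel C can be checked from the stated neutral mass: a doubly protonated ion of mass M appears at (M + 2 × proton mass)/2. The Python sketch below is an illustration only; the function names are assumptions, and the proton mass of 1.007276 Da is a standard physical constant rather than a value taken from the text.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons (standard constant)


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a given neutral monoisotopic mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS_DA) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass recovered from the observed m/z of an [M + zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS_DA)


if __name__ == "__main__":
    # Values from the caption: M = 1770.79, doubly charged precursor.
    print(f"[M + 2H]2+ m/z: {mz_from_mass(1770.79, 2):.2f}")
    print(f"Neutral mass from m/z 886.40: {mass_from_mz(886.40, 2):.2f}")
```

Both directions agree with the caption to within 0.01, which is well inside the mass tolerances used in these protocols.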

APPARATUS
• 10,000 MWCO Slide-A-Lyzer Mini Dialysis units (Pierce).
• Dynamax SD-200 HPLC solvent delivery system (Rainin) equipped with a Zorbax 300 C-18 9.4 mm × 25 cm column.
• MALDI-ToF MS on a Voyager DESTR equipped with a 337 nm nitrogen laser (Applied Biosystems).
• Ultimate nanocapillary HPLC system equipped with a PepMap C18 nanocolumn (75 μm I.D. × 15 cm) (Dionex) and CapTrap Microguard column (0.5 μL bed volume, Michrom).
• QSTAR quadrupole orthogonal ToF mass spectrometer (MDS Sciex) equipped with a Protana nanospray ion source.

APPARATUS SETUP
• MALDI-ToF/MS is operated as follows:
1. In positive-ionization mode with reflectron optics.
2. Under delayed extraction conditions in reflectron mode.
3. With a delay time of 190 ns and a grid voltage 66–70% of full acceleration voltage (20–25 kV). For linear mode experiments, the delay time is 100 ns and the grid voltage is 93.4% of the acceleration voltage.
• 1D-RPLC-MS/MS is operated as follows:
1. At a flow rate of 300 nL/min in a gradient of 2% B (from 0 to 5 min), and 2–70% B (from 5 to 55 min) for 1D-RPLC.
2. The column eluent is introduced directly to the QSTAR quadrupole orthogonal ToF mass spectrometer through a Protana nanospray ion source, under MS conditions of typically 2300 V for the nanospray needle voltage in HPLC-MS mode.
3. ESI-MS and ESI-MS/MS results are recorded in positive-ion mode with a resolution of 12,000–15,000 at full-width half-maximum (FWHM).
4. For CID-MS/MS, the mass window for precursor ion selection of the quadrupole mass analyzer is set to ±1 mass unit.
5. The precursor ions are fragmented in a collision cell using nitrogen as the collision gas.
6. The LC-MS results, run on the QSTAR instrument, are acquired in "Information Dependent Acquisition" mode (advanced IDA), which allows the user to acquire MS/MS spectra based on an inclusion mass list and dynamic assessment of relative ion intensity.

PROCEDURE
Aminoethylcysteine Modification
1. Pretreat protein sample with performic acid for 2 h to quantitatively oxidize cysteine residues.
2. Desalt overnight by microdialysis against 2 L of water using 10,000 MWCO Slide-A-Lyzer Mini Dialysis units.
3. After dialysis, transfer dialyzed samples to 0.5 mL Eppendorf tubes, and wash the dialysis membrane three times with 20 μL of water.
4. Concentrate the combined dialysate by Speed-Vac to reduce the volume to ∼5 μL, taking care not to concentrate to dryness.
5. Add 5 μL of a 3:1 mixture of dimethylsulfoxide (DMSO)/ethanol directly to this sample.
6. Initiate β-elimination by the addition of 4.6 μL saturated Ba(OH)2 and 1 μL 500 mM NaOH and incubate 2 h in a 37 °C water bath with gentle vortexing every 20 or 30 min to prevent excessive aggregation. For peptides, dissolve a sample (∼2 μg) in 50 μL of a 4:3:1 solution of H2O/DMSO/ethanol in 0.5 mL microcentrifuge tubes (adjust to maintain a protein concentration ≥0.01 μg/μL).

7. Add 23 μL of the β-elimination solution [saturated Ba(OH)2] and 1 μL of 5 M NaOH, and incubate at room temperature for 1 h.
8. After 2 h, place the sample at room temperature for 5–10 min.
9. Add 10 μL of a freshly prepared 1 M solution of cysteamine HCl directly to the β-elimination sample and allow the Michael addition reaction to proceed 3–6 h at room temperature. (For peptides, add 50 μL of a 1 M solution of cysteamine in H2O directly to the reaction solution and incubate for 3–6 h at room temperature for proteins/peptides containing phosphoserine. Proceed with the β-elimination reaction for proteins/peptides containing pSer-Pro sequences and pThr residues for 2 h at 37 °C.)
10. Transfer the protein solution to a rinsed minidialysis unit (10,000 MWCO Slide-A-Lyzer Mini Dialysis unit) and dialyze overnight against 2 L of 20 mM Tris, pH 8.0.
11. After dialysis, transfer the protein solution to a new 0.5 mL Eppendorf tube, and wash the dialysis membrane three times with 10 μL of 20 mM Tris, pH 8.0.
12. Concentrate the dialyzed sample combined with the wash solution to ∼5 μL by Speed-Vac and use the concentrated sample for digestion with appropriate proteases (e.g., trypsin or Lys-C) and/or for analysis by LC-MS/MS or MALDI-MS. If it is necessary to confirm cysteamine modification, add steps 13 and 14; if not, skip those steps and go to step 15.
13. After dilution into 1 mL H2O/0.1% TFA, separate the reaction products by RP-HPLC on a Dynamax SD-200 solvent delivery system.
14. Confirm the conversion of phosphoserine and phosphothreonine into aminoethylcysteine and β-methylaminoethylcysteine, respectively, by ESI-MS offline or by MALDI-MS and ESI-MS/MS.

Protease Cleavage for Phosphorylation Site Mapping
15. Transfer the dialyzed and concentrated sample to a new 0.5 mL microcentrifuge tube.
16. Add 5 μL of acetonitrile as a denaturant, heat the modified protein sample to 65 °C for 10 min, and digest by the addition of 15 μL of 10 mM Tris, pH 8.0, containing trypsin or Lys-C at 1:10 enzyme-to-substrate ratio by weight for 6 h at 37 °C. (Lys-C cleaves at modified sites somewhat more efficiently than trypsin. Use slightly higher concentrations of protease than would be recommended for an ordinary trypsin digestion, although optimal digestion conditions vary substantially between samples.)
17. Analyze the proteolytic peptide mixtures (∼1 pmol) by MALDI-MS (go to steps 18–20) or by 1D-RPLC-MS/MS (go to steps 21–23).


MALDI-ToF MS Analysis
18. Apply the digested peptide mixture to a ZipTip C18 column equilibrated with 0.1% trifluoroacetic acid (TFA) and elute in 75% acetonitrile, 0.1% TFA (see Protocol 1-1).
19. Mix 1 μL of the peptide mixture (0.1–1 pmol of analyte) with 1 μL of 33 mM HCCA in acetonitrile/methanol [1:1 (v/v)], and then air dry at room temperature on a stainless steel target.
20. Obtain mass spectra by MALDI-ToF MS. Use about 50 laser shots to record each spectrum, and calibrate the obtained mass spectra externally with an equimolar mixture of angiotensin I, ACTH 1–17, ACTH 18–39, and ACTH 7–38.
1D-RP-LC-MS/MS Analysis
21. Load the peptide mixture onto the guard column on an Ultimate nanocapillary HPLC system and wash with the loading solvent (H2O/0.05% formic acid, flow rate: 20 μL/min) for 5 min to remove salts and denaturing reagents.
22. Transfer onto the C18 nanocapillary HPLC column and perform 1D-RPLC-MS/MS at the conditions described in Apparatus Setup.
23. Calibrate spectra in static nanospray mode using the MS/MS fragment ions of a renin peptide standard (histidine immonium ion with m/z at 110.0713, and b8 ion with m/z at 1028.5312) providing a mass accuracy of ≤50 ppm.
Phosphorylation Site Mapping by Phosphospecific Proteolysis After Solid-Phase Capturing and Modification: Resin Synthesis
24. Swell 5 g of Tentagel AC resin in 75 mL of anhydrous THF at room temperature under an inert atmosphere.
25. Add 2.5 g of 1,1-carbonyldiimidazole and stir for 3 h.
26. Filter the resin, wash with THF and diethyl ether, and dry in vacuo overnight.
27. Dissolve 5 g of cysteamine HCl salt in 45 mL of H2O before use.
28. Adjust the pH to 12 with NaOH and extract the cysteamine with CH2Cl2.
29. Dry the organic phase with MgSO4, filter, and remove the solvent in vacuo to give a clear oil.
30. Add ∼1 g of the oil to 2 g of the activated resin swollen in 25 mL of THF.
31. Add 2 mL of N-methylmorpholine and heat the resin to 60 °C for 4–6 h under an inert atmosphere.
32. Filter the resin, wash with THF and diethyl ether, dry in vacuo, and store at −20 °C until use.


33. Immediately before use, treat the resin with 100 mM dithiothreitol in H2O for 15 min to deprotect the resin and expose the cysteamine thiol.
34. Confirm the resin loading capacity by quantification with Ellman’s reagent (typically 60–80% loading, 0.20–0.25 mmol/g).
Solid-Phase Capture and Modification of Phosphoserine Peptides
35. After deprotection, wash the resin five times with H2O and five times with 4:3:1 H2O/DMSO/ethanol.
36. Dissolve a peptide sample in 250 μL of 4:3:1 H2O/DMSO/ethanol and add to 80 mg of resin swollen in the same solvent.
37. Add 225 μL of saturated Ba(OH)2 and 10 μL of 5 M NaOH and incubate the reaction mixture for 1 h at room temperature.
38. Rinse the resin successively with H2O, dimethylformamide, CH2Cl2, and diethyl ether, and dry overnight in vacuo.
39. Suspend the dried resin in 1 mL of 95:2.5:2.5 TFA/dimethylsulfide/H2O for 15 min at room temperature to release the peptides.
40. Filter the resin and wash three times with 1 mL of TFA.
41. Concentrate the filtrate in vacuo. The released peptides are taken up in H2O/0.1% TFA and analyzed by HPLC and MS as described.
RESULT (78)
When this method was applied to phosphorylation site mapping of α- and β-casein that contained three and five sites of phosphorylation, respectively, eight peptides were identified by mass fingerprinting using 1D-RPLC-MS, corresponding to direct cleavage at phosphorylation sites of the two predicted proteins. The peptides, generated from cleavage at the sites of the aminoethylcysteine modification, contained an aminoethylcysteine residue at the C terminus, as confirmed by LC-MS/MS. For example, one of the peptides containing aminoethylcysteine at position 15 in β-casein produced typical peptide MS/MS fragmentation patterns that were readily interpretable, as shown in Fig. 2-11C.
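As a rough check on the C-terminal aminoethylcysteine residue described here, the m/z expected for the corresponding singly protonated y1 fragment can be estimated from elemental composition. The sketch below is illustrative only: it assumes the composition C5H12N2O2S for free S-(2-aminoethyl)cysteine and standard monoisotopic masses, and the names `MONO`, `mono_mass`, and `y1_mz` are hypothetical helpers, not part of any protocol in this chapter.

```python
# Monoisotopic masses (u) of the elements involved; standard reference values.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727646  # mass of a proton (u)

# Assumed composition of free S-(2-aminoethyl)cysteine: C5H12N2O2S.
AEC_FREE = {"C": 5, "H": 12, "N": 2, "O": 2, "S": 1}


def mono_mass(composition):
    """Sum monoisotopic masses for a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in composition.items())


def y1_mz(free_amino_acid):
    """m/z of a singly protonated y1 ion: the intact C-terminal amino acid plus one proton."""
    return mono_mass(free_amino_acid) + PROTON


if __name__ == "__main__":
    print(f"y1 of C-terminal aminoethylcysteine: m/z {y1_mz(AEC_FREE):.2f}")
```

Under these assumptions the calculated value rounds to 165.1, in line with the y1 signal reported for aminoethylcysteine-terminated peptides.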
The tandem mass spectrum for the peptide showed the characteristic y1 ion at m/z 165.1, which results from a loss of a C-terminal aminoethylcysteine residue and appears as a highly abundant product ion in this MS/MS spectrum and other CID spectra of peptides containing this C-terminal residue. Unlike existing precursor ion scanning approaches for phosphopeptides, the detection of this y1 ion is not only indicative of the presence of a phosphoserine-containing peptide, but also positively identifies its precise position in the sequence (i.e., at the C terminus) (Fig. 2-11C). Because epimerization occurs at Cα of the formerly phosphorylated amino acid, aminoethylcysteine modification generates diastereomeric aminoethylcysteine peptides (R, S) in an approximately 1:1 mixture, only one of which, the peptide containing the R stereochemistry at Cα, is a substrate for lysine-specific protease. As a result, cleavage occurs at approximately 50% of the sites for any given phosphopeptide under complete proteolysis conditions. In practice, this


obligatory partial digestion is advantageous for phosphopeptide mapping because it provides staggered and redundant mass information for multiply phosphorylated peptides, although it can also increase the complexity of the resulting mass spectra. Phosphorylation sites can also be examined by comparing two equivalent samples of β-casein, one of which is aminoethylcysteine modified and one of which is untreated. The two samples are subjected to trypsin digestion and analyzed by MALDI-ToF MS. Figure 2-11D shows that four modified phosphopeptides are detected as prominent ions at m/z 1771.9, 2031.1, 2041.0, and 3038.4 in the MALDI spectrum from only the aminoethylcysteine-derivatized sample. On the other hand, tryptic peptides containing the phosphorylation sites are not detected in the MALDI-MS spectrum from the untreated control sample (Fig. 2-11D). This example shows that the common MS equipment used for most proteomics projects, such as conventional MALDI-MS under standard conditions, is sufficient for phosphorylation site mapping, which had previously been difficult to perform with such equipment. The method can detect aminoethylcysteine-modified peptides at as little as 25 fmol of an unseparated tryptic digest. This type of approach will be a valuable complement to traditional MS/MS sequencing as a strategy for phosphorylation site mapping (78). ◀
N-Glycosylation Site Mapping
Glycoproteins, proteins with covalently bound carbohydrate moieties, are ubiquitous in nature and are found in almost all living organisms. Glycoproteins occur in cells in soluble and membrane-bound forms as well as in the intracellular matrix and extracellular fluids. There are two different kinds of linkage between carbohydrate moieties and proteins: first, N-glycosidic linkage of glucosamine, where asparagine in the obligatory signal sequence, Asn-X-Thr/Ser, is involved in the carbohydrate–peptide bond; and second, O-glycosidic linkage of galactosamine, where serine and/or threonine, as well as hydroxylysine in no apparent signal sequence, are involved in the carbohydrate–peptide bond. The constituents of carbohydrate chains include galactose, mannose, fucose, N-acetylglucosamine, N-acetylgalactosamine, and sialic acid (86). Almost one-third of the genes encoded in the genome are estimated to be potential substrates for protein glycosylation in model organisms such as C. elegans and mouse (68). However, the presence of the consensus Asn-X-Ser/Thr sequence for N-glycosylation in an amino acid sequence does not necessarily mean that the site is actually glycosylated. In addition, not only can the same site be only partially glycosylated, but the carbohydrate chains can have variable size even at a single glycosylation site. Single proteins can also be glycosylated at multiple sites, and subsequent processing often modifies the carbohydrate chain attached at each site differently and partially. These factors contribute to the extreme complexity and cause difficulty in characterizing protein glycosylation on a proteomic scale. Historically, glycoproteins were analyzed by separating sugar chains from polypeptide chains enzymatically or chemically; glycobiologists then analyzed the structures of the sugar chains, whereas protein chemists analyzed the structures of the polypeptide chains independently. The identification of glycosylation sites, therefore, had not received much attention for a long period. Because of their biological and clinical importance, the identification of proteins along with their glycosylation sites has been recognized as an important step in the characterization of proteomes (87–89). Two


major methods using LC-MS/MS are now available for proteomic scale identification of N-glycosylation sites (see Table 1-3): one is based on the conjugation of glycoproteins to a solid support using hydrazide chemistry, isotope labeling of glycopeptides, and specific release of formerly N-linked glycosylated peptides via peptide-N-glycosidase (90); another is based on the lectin column-mediated affinity capture of a set of glycopeptides generated by tryptic digestion of a protein mixture, followed by peptide-N-glycosidase-mediated incorporation of a stable isotope tag, 18O, specifically into the N-glycosylation site [this method is termed isotope-coded glycosylation-site-specific tagging (IGOT)] (Fig. 2-12A,B) (68). An MS-based method for the identification of O-glycosylation sites has also been developed based on mild β-elimination followed by

[Fig. 2-12, panels A and B. (A) Workflow: biological materials → complex protein mixture → lectin-affinity capture → glycoproteins → proteolysis (trypsin) → lectin-affinity capture → glycopeptides → PNGase in H218O → 18O-tagged peptides → 2D-LC-MS/MS. (B) Reaction scheme: PNGase in H218O converts glycosylated Asn (R-GlcNAc-NH-CO-CH2-) to 18O-tagged Asp (H18O-CO-CH2-).]

Fig. 2-12. Identification of N-glycoproteins by LC-MS/MS. (A) Scheme for isotope-coded glycosylation-site-specific tagging (IGOT). In the IGOT approach, N-glycosylated peptides are captured by lectin-affinity chromatography and treated with protein-N-glycanase (PNGase) in H218O to remove the glycan and to incorporate the glycosylation-site-specific tag. The 18O-labeled peptides are then identified by LC-MS/MS. (B) Chemistry of PNGase-mediated incorporation of 18O into glycosylated Asn residues. (C) ESI-MS/MS spectra of peptide GPVFANPVAQALVN*SSNYWK (N* = glycosylated Asn) of the product of the gene C42D8.5 in C. elegans, incorporating 16O (top) or 18O (bottom) by IGOT. The glycosylated Asn residue is converted to Asp by a PNGase reaction, and its signal is shifted by +2 mass units (insets). (D) Mass spectra of tryptic peptides, FPNATDKEGK (residues 32–38) of chicken ovomucoid (a), and that containing the N-linked glycan after PNGase reaction in H216O (– – –), H218O (---), and a mixed one (—) N*-glycosylated (b).
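The mass shifts that follow from the PNGase reaction in panel B can be worked out from monoisotopic masses. The sketch below is a minimal illustration, assuming only that the Asn side-chain amide (-CO-NH2) is converted to a carboxyl group (-CO-OH) whose new oxygen comes from the solvent water; the function names are hypothetical.

```python
# Standard monoisotopic masses (u).
H = 1.00782503
N = 14.0030740
O16 = 15.9949146
O18 = 17.9991596


def asn_to_asp_shift(solvent_oxygen_mass):
    """Mass change when Asn becomes Asp: the residue loses NH and gains one solvent oxygen."""
    return solvent_oxygen_mass - (N + H)


def mz_shift(mass_shift, charge):
    """Shift observed on the m/z axis for a peptide ion carrying `charge` protons."""
    return mass_shift / charge


if __name__ == "__main__":
    light = asn_to_asp_shift(O16)  # reaction in H2(16)O
    heavy = asn_to_asp_shift(O18)  # reaction in H2(18)O
    print(f"Asn -> Asp in 16O water: {light:+.3f} Da")
    print(f"Asn -> Asp in 18O water: {heavy:+.3f} Da")
    print(f"18O tag relative to 16O: {heavy - light:+.3f} Da")
    print(f"Tag spacing for a 2+ ion: {mz_shift(heavy - light, 2):.3f} m/z")
```

With these values the two shifts come to roughly +0.98 Da and +2.99 Da, that is, nominally +1 and +3, with about 2 mass units between the 16O and 18O forms of a tagged site.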


Michael addition with dithiothreitol. Because this method utilizes basically the same reaction used for the identification of serine/threonine phosphorylation sites, O-glycosylation and serine/threonine phosphorylation have to be discriminated by adopting affinity purification of O-glycosylated proteins or other methods before β-elimination (91). Of these, the IGOT method is taken up as an experimental example here because the method can discriminate N-glycosylation sites based on the structural difference of formerly attached carbohydrate chains. The method consists of (1) affinity capture of glycoproteins by a lectin from complex biological mixtures, (2) tryptic cleavage of the glycoproteins and affinity capture of glycopeptides by the same lectin, (3) peptide-N-glycosidase (PNGase) digestion of the glycopeptides in H218O, and (4) analysis of the 18O-tagged peptides by an integrated 2D-LC-MS/MS technology (Fig. 2-12A,B).
▶ Experimental Example 2-5 N-linked glycoprotein-specific mapping by lectin affinity and isotope-coded tagging (Fig. 2-12A,B) (68, 92).
MATERIALS
• TBS: 50 mM Tris-HCl, pH 7.5, 150 mM NaCl.
• Protease inhibitor cocktail (Sigma, Missouri, USA).
• Concanavalin A (ConA)-agarose (ConA, specific to high-mannose-type glycans, HONEN).
• Guanidine hydrochloride.
• Dithiothreitol.
• Ethylenediamine tetraacetic acid·2Na (EDTA).
• Extraction buffer (0.5 M Tris-HCl, pH 8.0, containing 7 M guanidine-HCl, 50 mM DTT, 50 mM EDTA).
• Iodoacetamide.
• Hepes buffer: 50 mM Hepes-NaOH, pH 7.5.
• Sequencing-grade modified trypsin (Promega, Madison, WI, USA).
• α-Methyl mannopyranoside (for ConA affinity column).
• ConA elution buffer: 50 mM Hepes-NaOH, pH 7.5, containing 0.2 M α-methyl mannopyranoside.
• Ethanol (EtOH).
• 1-Butyl alcohol (BuOH).
• Hydrophilic interaction chromatography (HIC) solvent A: 50% EtOH (v/v).
• Hydrophilic interaction chromatography (HIC) solvent B: water:EtOH:1-BuOH (1:1:4, v/v).
• Stable isotope, 18O-labeled water (H218O, ≥99 atom% 18O) (Taiyo Nippon Sanso Corp., Japan).
• Peptide-N-glycanase F (PNGase F, lyophilized) (Takara Bio).
• PNGase buffer: 0.1 M Tris-HCl/H218O, pH 8.6 (replace 16O with 18O as far as possible).


APPARATUS
• Lectin affinity column (4.6 mm × 15 cm for HPLC, J-Oil Mills).
• Sepharose CL-4B column: 5 mm × 50 mm (GE Healthcare).
• Mightysil C18 reversed-phase column (2 mm I.D. × 35 mm long, 15 μm resin, Cica, Tokyo, Japan).
• 2D-LC-MS/MS (see the 2D-LC system, Experimental Example 2-2).
PROCEDURE
Preparation of ConA-Bound Glycopeptides
1. Culture C. elegans strain N2 in a liquid medium at 20 °C with E. coli HB101 as food, harvest worms at mixed growth phases, separate from bacteria by centrifugation in 30% sucrose solution, collect, and wash with 0.1 M NaCl. If necessary, store at −20 °C until use.
2. Lyse 13 g (wet weight) of mixed growth phase populations of the worm by sonication in TBS containing protease inhibitor cocktail.
3. Centrifuge the homogenate at 100,000g for 20 min at 4 °C.
4. Apply the soluble fraction to a ConA-agarose column (bed volume 30 mL) equilibrated with TBS.
5. After washing the column with TBS, elute ConA-bound proteins with 0.1 M α-methyl mannopyranoside in TBS.
6. Apply the flow-through fraction again to the same ConA-agarose column, recover the proteins, and combine those with the first protein eluate.
7. To avoid artificial proteolysis, precipitate the collected proteins immediately by addition of trichloroacetic acid.
8. Dissolve protein samples (1–2.5 mg/mL) in 0.5 M Tris-Cl (pH 8.5) containing 7 M guanidinium hydrochloride and 10 mM EDTA. Bubble the solution with N2 gas for 10 min.
9. Add dithiothreitol (3 mM final concentration of DTT) to the protein solution and mix under N2 gas bubbling at room temperature for 2 h.
10. After 2 h, add iodoacetamide (7 mM final concentration of iodoacetamide) and leave the solution in the dark for 1 h at room temperature.
11. Dialyze the solution against 2 L of 10 mM ammonium bicarbonate buffer (pH 8.0) to remove excess reagents and salts, and then dialyze the solution overnight against 25 mM Hepes-NaOH buffer, pH 7.5.
12. Add TPCK-treated trypsin (1:50 w/w protein) or Lys-C protease (1:100 w/w protein) into the protein solution and incubate with gentle mixing overnight at 37 °C. Check the completion of digestion by SDS-PAGE. If necessary, add another aliquot of protease solution and leave the mixture at 37 °C for complete digestion.


13. Apply the tryptic digests (5 mg glycoproteins) to a ConA-agarose column (bed volume 5 mL; for the other lectins, a typical lectin column with dimensions 4.6 mm I.D. × 150 mm can separate ∼400 μg) equilibrated with 25 mM Hepes-NaOH buffer, pH 7.5.
14. Wash the ConA-agarose column with the equilibration buffer until the absorption of the effluent at 280 nm is <0.1.
15. Elute the glycopeptides with the Hepes buffer containing 0.2 M α-methyl mannopyranoside. (Use appropriate oligosaccharides to elute glycopeptides for other lectin columns.)
16. Repeat the ConA-lectin column chromatography three to five times for the flow-through fraction with the same lectin column that is reequilibrated with the Hepes buffer and recollect the glycopeptide fraction under the same conditions as above. Combine the glycopeptide fractions.
Fractionation of Glycopeptides by Hydrophilic Interaction Chromatography
17. Add an equal volume of EtOH and 4 volumes of 1-BuOH into the glycopeptide solution eluted from the lectin column. If precipitates are formed, collect by centrifugation, redissolve the precipitate in a small volume of water, and add EtOH and 1-BuOH.
18. Load the sample solution immediately onto the Sepharose CL-4B column equilibrated with HIC solvent B, and wash the column with the same solvent until the absorption of the effluent at 220 nm is <0.2.
19. Elute glycopeptides with HIC solvent A, monitoring the eluate at 220 nm.
Stable Isotope Tagging of Glycopeptides
20. Load the HIC-eluted glycopeptide solution on a Mightysil C18 reversed-phase column (2 mm I.D. × 35 mm long, 15 μm resin), recover by elution with a small volume of 60% acetonitrile in 0.1% (v/v) TFA, and dry in a 0.5 mL microtube on a centrifugal vacuum concentrator to remove water (H216O) completely.
21. Dissolve the dried glycopeptides (10–20 μg/50–100 μL) in PNGase buffer prepared with H218O.
22. Add PNGase F dissolved in H218O (final concentration: 1 mU/10 μg glycopeptide) to the peptide solution and proceed with the reaction overnight at 37 °C.
23. Acidify the reaction solution by the addition of 1 M HCl to approximately pH 2 (approximately 5 μL/100 μL reaction solution).
Analysis of the IGOT Peptides by 2D-LC-MS/MS
24. Follow Experimental Example 2-2.


Protein Identification and Determination of the Glycosylated Site
25. Search the MS/MS spectra against a nonredundant protein database IPI (http://www.ebi.ac.uk/IPI/IPIhelp.html) using Mascot algorithms for peptide sequence identification with the parameters described in Experimental Example 2-2, except use custom modifications, “deamidation of Asn (+1 Da)” and “deamidation + 18O (Asn + 3 Da)”, for deamidation of Asn incorporating light 16O and heavy 18O, respectively.
26. Inspect the “identified peptides” that contain one or more Asp + 18O(s) on the basis of the MS/MS spectrum, and select the 18O-labeled peptide that has one or more consensus sequences for N-linked glycosylation (i.e., Asn-Xaa-Ser/Thr, where Xaa ≠ Pro). If the candidate peptide has an Asn-Lys or Asn-Arg sequence at the C terminus, examine whether the residue following Lys/Arg in the protein sequence is Ser or Thr, which would complete the consensus tripeptide sequence.
RESULT
In the IGOT strategy, the removal of the oligosaccharides attached to the polypeptide chain with PNGase is coupled with the site-specific 18O tagging of the peptides by using H218O, which proceeds simultaneously with the enzymatic conversion of the glycosylated Asn to Asp. This conversion causes mass shifts of +1 and +3 mass unit(s) in the spectra of glycopeptides with N-glycan after the PNGase digestions in H216O and H218O, respectively, whereas no spectral changes occur for nonglycosylated peptides (Fig. 2-12B, D). Thus, the IGOT method specifically labels N-glycosylation sites and enables one to detect those by MS and MS/MS analysis. When the IGOT method was applied to the glycopeptides prepared from C. elegans proteins by using a ConA-lectin column as an example, the 2D-LC-MS/MS analyses generated about 14,000 MS/MS spectra, which were attributed to 1100–1300 reliable candidate peptides by searching the wormpep database (http://www.sanger.ac.uk/Projects/C_elegans/).
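The consensus-sequence check in step 26, including the case where a sequon runs past the C terminus of a tryptic peptide, can be sketched as below. This is an illustrative helper, not the software used in the study: the function names are hypothetical, and the short protein `AAANKSGGR` is an invented sequence used only to demonstrate the C-terminal case. The peptide from Fig. 2-12C is taken from the text.

```python
import re

# Asn-Xaa-Ser/Thr with Xaa != Pro. A lookahead is used so that every Asn is
# tested, including ones that sit inside a neighboring match.
SEQUON = re.compile(r"N(?=[^P][ST])")


def sequon_positions(sequence):
    """1-based positions of Asn residues that begin a consensus sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence)]


def sequons_in_peptide(protein, peptide):
    """Sequon Asn positions within `peptide` (1-based), judged in protein context.

    Using the protein sequence lets a C-terminal Asn-Lys or Asn-Arg be
    completed by the Ser/Thr that follows the cleavage site.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    return [pos - start for pos in sequon_positions(protein) if start < pos <= end]


if __name__ == "__main__":
    # Peptide shown in Fig. 2-12C (glycosylated Asn written without the asterisk).
    print(sequon_positions("GPVFANPVAQALVNSSNYWK"))
    # Invented example: the sequon N-K-S spans the tryptic cleavage site.
    print(sequon_positions("AAANK"), sequons_in_peptide("AAANKSGGR", "AAANK"))
```

For the Fig. 2-12C peptide only Asn 14 qualifies: Asn 6 is followed by Pro and Asn 17 lacks a Ser/Thr two residues downstream.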
After removing the data for redundant identifications and the peptide assignments without the 18O tag, 400 unique N-glycosylation sites derived from 250 distinct proteins were identified. Most of the glycopeptides had a single glycosylation site, while a small portion of the peptides (about 20) carried two sites within a single molecule. The typical MS/MS spectra of the glycopeptides identified by the IGOT strategy are illustrated in Fig. 2-12C. The glycosylation site is clearly distinguished from the nonglycosylated Asn by the presence of the 18O tag and is assigned unambiguously on the genome database based on the series of MS/MS signals. Given that IGOT is specific to the N-glycosylation sites and can introduce a 2 mass unit difference per site by the tagging reaction in H216O/H218O, IGOT could be used for the quantitative profiling of glycoproteins by a principle similar to that of ICAT/MCAT, if the glycopeptide samples from two different sources, such as those from normal cells and cancer cells, are labeled differentially with 16O and 18O. When the aliquots of ConA-bound worm peptides were processed by IGOT in the


presence of 16O or 18O, respectively, and the mixed preparation was subjected to the LC-MS analysis, the MS spectra of one of the peptides, for example, were identified in the 16O/18O-tagged peptide mixture (Fig. 2-12D). Although the isotope distributions of the 16O- and 18O-tagged peptides partly overlap due to the natural abundance of isotopes, both spectra are distinguishable to a sufficient extent to estimate the relative quantity of the 16O- and 18O-tagged peptides. Thus, the IGOT strategy could be applied to estimate the relative abundance of proteins within biological samples, and thereby for the large scale quantitative profiling of glycoproteins, if an appropriate data processing system for automated peak analysis is developed (68).
LIMITATION OF THE METHOD
The method described uses a lectin column first to collect glycoproteins from crude extracts. The advantage of this is to specify the frame of the structures of the carbohydrate chains attached to the identified sites. On the other hand, this can also be a disadvantage of the method, because only glycoproteins with the carbohydrate chains that bind to the lectin used can be collected, and, thus, the other glycoproteins not bound to the lectin cannot be analyzed by this method. Therefore, to increase the coverage of glycoproteins, multiple lectin columns with distinct specificity need to be used to collect glycoproteins. In fact, glycoproteins captured, for example, from mouse liver extract with ConA (that binds specifically to high-mannose-type glycans) and RCA120 (that binds specifically to complex/hybrid-type glycans with a terminal lactosamine) overlap only about 50% (Kaji et al., unpublished results). The coverage of the glycoproteins may also be increased by applying the method to the analysis of multiple subproteomes, such as subcellular fractions and protein complexes. The method described cannot be applied to O-linked glycoproteins because it utilizes PNGase to incorporate the 18O tag and is only applicable to the identification of N-linked glycoproteins. ◀
Ubiquitinated Protein Identification and Site Mapping
Ubiquitination is a posttranslational modification of protein substrates by ubiquitin, a highly conserved 76 amino acid polypeptide containing 7 lysine residues (Fig. 2-13A), which is a main player in the ubiquitin system and a well-characterized pathway involved in regulating nearly every cellular process in eukaryotes. Ubiquitin at the C-terminal glycine links covalently to the side chain of lysine(s) within the protein substrate through an isopeptide bond (Fig. 2-13A). This is catalyzed by the coordinated actions of an E1 activating enzyme, E2 conjugating enzyme, and E3 ligase. Polyubiquitin chains can be formed through any of seven lysine residues within ubiquitin by E3 ligase (Fig. 2-13A), further elongated by E4 enzymes, and conversely shortened by deubiquitinating enzymes (93). Trypsin cleavage generates unique signature peptides containing a glycine–glycine (GG) modified lysine residue from ubiquitinated proteins as well as polyubiquitin chains themselves, but it cannot cleave at lysines modified by isopeptide-linked ubiquitin. Each of the signature peptides bears an additional mass of 114.04 Da that corresponds to the two glycine residues (GG), denoting the original position of the modification. Database searching algorithms can utilize both the missed


cleavage and GG modification as search criteria when assigning precise sites of ubiquitination; that is, the mass value of the additional GG can be set as a variable modification parameter on lysine. The data search analysis of MS and MS/MS spectra of signature peptides allows the identification not only of ubiquitinated protein substrates, but also of the ubiquitin–ubiquitin linkages that correspond to isopeptide bonds formed between the C-terminal glycine of one ubiquitin and the ε-amino group of a lysine residue within the second. These linkages can be formed through any of seven lysine residues (Fig. 2-13A,B). The identification of ubiquitin–ubiquitin

[Fig. 2-13, panels A and B. (A) Human ubiquitin (Ub) sequence with lysine positions marked: MQIFVK6TLTGK11TITLEVEPSDTIENVK27AK29IQDK33EGIPPDQQRLIFAGK48QLEDGRTLSDYNIQK63ESTLHLVLRLRGG, and the isopeptide (CO–NH) bond between the C-terminal glycine of ubiquitin and the side chain of a substrate lysine (K). (B) Polyubiquitin linkage types K48, K11, K63, and a K29-33 fork, shown with the ubiquitin segments QRLIFAGK48QLEDGRTL, VK6TLTGK11TITLEVEPSDTIENVK27AK29, GRTLSDYNIQK63ESTLHLVLRLR, and VK27AK29IQDK33EGIPPDQQRLI; "GGRL" labels mark the linkage points.]

Fig. 2-13. Identification of ubiquitinated proteins by LC-MS/MS. (A) Left: Amino acid sequence of human ubiquitin that highlights lysine residues (gray) and C-terminal dipeptide sequence GG. Right: Ubiquitin E3 ligase, responsible for protein ubiquitination, mediates the covalent attachment of C-terminal glycine of ubiquitin to the ε-amino group of a lysine residue of target protein via the formation of the CO–NH bond. (B) Polyubiquitination. Different types of polyubiquitination reactions occur via the formation of intramolecular ubiquitination between lysine residues at various positions and C-terminal glycine. [Adapted by permission from Macmillan Publishers Ltd.; Kirkpatrick et al., Nat. Cell Biol. (Ref. 93) (2005).] (C) Interaction of proteins with an affinity column bound with a monoclonal antibody (designated as FK2) to ubiquitin under native or denaturing conditions. Lysates of HEK293T cells cultured in the presence of 20 μM leucyl-leucyl-norleucinal are incubated in the absence or in the presence of 4, 6, 8 M urea and subjected to immunoblot analysis with antibodies to ubiquitin (FK2), to VCP, and to α-subunit of the proteasome, indicating that ubiquitinated proteins are associated with VCP and proteasome subunits under denaturing conditions. (D) Determination of ubiquitinated sites by MS. MS/MS spectra of ubiquitinated peptides containing Lys 48 (residues 43–54) or Lys 63 (residues 55–72) of ubiquitin. [From Ref. 102, 2005. Copyright John Wiley & Sons Limited. Reproduced with permission.]
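The signature peptides in panel D can be reproduced with a simple in silico digest. The sketch below is a simplified model, not a search engine: it assumes trypsin cleaves after Lys or Arg except before Pro and never at the GG-modified lysine, and it takes the ubiquitin sequence from panel A. The function name `signature_peptide` and the constants are illustrative.

```python
# Human ubiquitin, 76 residues, as shown in Fig. 2-13A.
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)

GLY_RESIDUE = 57.02146       # monoisotopic mass of one glycine residue, C2H3NO (u)
GG_SHIFT = 2 * GLY_RESIDUE   # mass added to a lysine carrying the GG remnant


def signature_peptide(protein, gg_site):
    """Return (peptide, first, last) for the tryptic peptide spanning a GG-modified Lys.

    Positions are 1-based. Assumed rule: cleavage after K/R, not before P,
    and no cleavage after the modified lysine itself.
    """
    if protein[gg_site - 1] != "K":
        raise ValueError("gg_site must point at a lysine")

    def cleaves_after(position):
        if position == gg_site or position >= len(protein):
            return False
        return protein[position - 1] in "KR" and protein[position] != "P"

    first = gg_site
    while first > 1 and not cleaves_after(first - 1):
        first -= 1
    last = gg_site
    while last < len(protein) and not cleaves_after(last):
        last += 1
    return protein[first - 1:last], first, last


if __name__ == "__main__":
    for site in (48, 63):
        peptide, first, last = signature_peptide(UBIQUITIN, site)
        print(f"K{site}: {peptide} (residues {first}-{last}), GG adds {GG_SHIFT:.2f} Da")
```

Under these assumptions the K48 and K63 linkages give LIFAGKQLEDGR (residues 43–54) and TLSDYNIQKESTLHLVLR (residues 55–72), matching the residue ranges quoted in the caption.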


linkages is very useful for understanding the biological significance of ubiquitination because some specific linkages are found to relate to specific biological functions of the ubiquitinated proteins; for example, degradation of ubiquitinated proteins occurs through a mechanism largely dependent on ubiquitin–ubiquitin linkages formed through K48 of ubiquitin, whereas substrate modification by monoubiquitin or K63-linked chains regulates a series of proteasome-independent cellular processes (93). In large scale MS analysis of polyubiquitinated protein substrates, the proteins are typically purified via an N-terminal epitope tag fused to ubiquitin, digested using trypsin, and analyzed by proteome-scale shotgun sequencing (94–96). The histidine–biotin (HB) tag, a new tandem affinity tag for two-step purification, is especially useful for this approach because it allows one to isolate ubiquitinated proteins under denaturing conditions (97). A similar approach was also applied to proteins modified by the ubiquitin-like protein SUMO (small ubiquitin-like modifier) (98, 99). The epitope-tagged approaches are very useful for specifically isolating ubiquitinated proteins. They are also advantageous for the quantitative comparison of ubiquitinated proteins in cells cultured under two or more different conditions, if the methods are combined with methods for quantification, such as


ICAT or SILAC. For example, the isotope-coded affinity tag (ICAT) strategy can be used in combination with the epitope-tag approach for comparing ubiquitinated proteins between wild-type cells and mutant cells lacking a ubiquitin pathway enzyme (e.g., DUB, E2, and E3) (93). These approaches, however, require care because exogenous expression of epitope-tagged ubiquitin alongside the endogenous ubiquitin may have unexpected effects on cell functions. This is why most large scale studies have used yeast cells, which offer a distinct advantage over mammalian systems since the multiple genes encoding ubiquitin can be genetically inactivated prior to introduction of epitope-tagged ubiquitin, making it the sole form of ubiquitin within cells (100, 101). Therefore, strategies that enrich targets without epitope tags, such as the use of ubiquitin-binding proteins (96) or antibodies against ubiquitin (102), are needed to overcome this difficulty and to allow the large scale analysis of ubiquitinated proteins from untransfected cells, animal tissues, or possibly even clinical specimens (93). For this reason, Experimental Example 2-6 describes a method for the identification of human ubiquitinated proteins isolated by using an antibody against ubiquitin.
▶ Experimental Example 2-6 Large scale identification of human ubiquitin-related proteins isolated by using antibody against ubiquitin (102, 103).
MATERIALS
• FK2 mouse mAb to ubiquitin (Nippon Bio-Test Laboratories, Japan).
• Ubiquitin (Nippon Bio-Test Laboratories, Japan).
• Protein A Sepharose CL-4B (Amersham-Pharmacia Biotech).
• 100 mM triethanolamine-HCl (pH 8.3).
• 50 mM dimethyl pimelimidate (Pierce).
• 0.1 M Tris-HCl (pH 7.4).
• 100 mM glycine-HCl (pH 2.8).
• DMEM (Life Technologies).
• FBS (Life Technologies).
• Leucyl-leucyl-norleucinal (LLnL, Roche).
• Lysis buffer: 50 mM Tris-HCl (pH 7.4), 300 mM NaCl, 0.5% Triton X-100, aprotinin (10 μg/mL), leupeptin (10 μg/mL), 1 mM PMSF, 400 μM Na3VO4, 400 μM EDTA, 10 mM NaF, and 10 mM sodium pyrophosphate.
• Solvent A for SCX column chromatography (0.2% formic acid, 25% acetonitrile).
• Solvent B for SCX column chromatography (0.2% formic acid, 25% acetonitrile, 500 mM ammonium acetate).


APPARATUS • LC-Q mass spectrometer (Thermoquest). • SMART HPLC system (Amersham Biosciences). • SCX column; PolySULFOETHYL aspartamide (1.0 mm I.D. ⫻ 50 mm, Poly LC, Columbia, MD). • Mono Q (Amersham Biosciences).

SOFTWARE TOOLS
• Mascot (Matrix Science).

APPARATUS SETUP
• A gradient elution of SCX column chromatography.

Time (min)    Solvent composition
0             Solvent A 100%
30            Solvent B 30%
45            Solvent B 60%
Wash          Solvent B 100%
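The gradient above can be sketched numerically. The following is a minimal illustration, not part of the original protocol. It assumes that the table reads as 100% solvent A at 0 min, 30% solvent B at 30 min, and 60% solvent B at 45 min, that the composition changes linearly between these points, and that the final wash (100% solvent B) is untimed. The function names are hypothetical.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints as read from the gradient table.
# Assumption: linear change between breakpoints; the untimed "Wash" step
# (100% B) is not part of the timed gradient.
GRADIENT = [(0.0, 0.0), (30.0, 30.0), (45.0, 60.0)]
SALT_IN_B_MM = 500.0  # ammonium acetate in solvent B (mM); solvent A has none


def percent_b(t_min):
    """Return the interpolated %B at t_min, clamped to the table range."""
    times = [t for t, _ in GRADIENT]
    if t_min <= times[0]:
        return GRADIENT[0][1]
    if t_min >= times[-1]:
        return GRADIENT[-1][1]
    i = bisect_right(times, t_min)
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def ammonium_acetate_mm(t_min):
    """Return the ammonium acetate concentration (mM) delivered at t_min."""
    return SALT_IN_B_MM * percent_b(t_min) / 100.0


if __name__ == "__main__":
    for t in (0, 15, 30, 37.5, 45):
        print(t, percent_b(t), ammonium_acetate_mm(t))
```

Under these assumptions the salt concentration rises more steeply in the second segment (30 to 45 min) than in the first, which is a common way to elute strongly retained, highly charged peptides from an SCX column.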

PROCEDURE

Preparation of an Immunoaffinity Column
1. Mix 1 mg of the FK2 mouse mAb to ubiquitin in 0.5 mL of PBS with 0.5 mL of protein A–Sepharose beads.
2. Rotate the mixture for 2 h at 4 °C.
3. Wash the protein A–Sepharose beads first with 5 mL PBS twice and then with 5 mL 100 mM triethanolamine-HCl (pH 8.0).
4. Incubate the antibody-bound protein A–Sepharose beads with 0.5 mL of 50 mM dimethyl pimelimidate in 100 mM triethanolamine-HCl (pH 8.0) for 1 h at room temperature.
5. Wash with PBS, add 0.1 M Tris-HCl (pH 7.4) containing 150 mM NaCl to the antibody-bound Sepharose beads, and incubate for 2 h at room temperature.
6. Pack the beads into a column tube, wash the column with 2 mL of 100 mM glycine-HCl (pH 2.8) several times, and equilibrate with PBS.


Preparation of Cell Extracts
7. Culture HEK293T cells to 70–80% confluence under an atmosphere of 5% CO2 at 37 °C in DMEM supplemented with 10% FBS (Life Technologies) and antibiotics (15 cm dish × 5).
8. Add leucyl-leucyl-norleucinal (final concentration 20 μM) to the culture dishes and incubate for 6–8 h to accumulate ubiquitylated proteins in the cells.
9. Harvest the cells (5 × 10^8) and wash with PBS.
10. Add 5 mL of lysis buffer, suspend, and let stand for 10 min at 4 °C. (For denatured protein fractionation, add 2.5 mL of lysis buffer containing 8 M urea and let stand for 30 min at room temperature. Then add 2.5 mL of the lysis buffer.)
11. Centrifuge at 16,000g for 10 min at 4 °C and collect the supernatant.

Isolation of Ubiquitin-Related Proteins and Ubiquitinated Proteins
12. Equilibrate an antibody-bound Sepharose CL-4B column thoroughly with lysis buffer. (For denatured protein fractionation, equilibrate an antibody-bound Sepharose CL-4B column thoroughly with lysis buffer containing 4 M urea.)
13. Apply the supernatant of the cell extract to the antibody-bound Sepharose CL-4B column (native condition). [For denatured protein fractionation, apply the supernatant of the cell extract prepared under denatured conditions to the antibody-bound Sepharose CL-4B column (denatured condition).]
14. Wash with three column volumes of lysis buffer three times. (For denatured protein fractionation, wash with three column volumes of lysis buffer containing 4 M urea three times, and then wash with three column volumes of lysis buffer.)
15. Elute bound proteins from the column with three volumes of 100 mM glycine-HCl (pH 2.8), and neutralize the eluate by the addition of 1 M Tris-HCl (pH 7.4) (1/10 of the eluate volume). [For denatured protein fractionation, elute bound proteins from the column with three volumes of 100 mM glycine-HCl (pH 2.8), and neutralize the eluate by the addition of 1 M Tris-HCl (pH 7.4) (1/10 of the eluate volume).]
16. Concentrate the eluates from the column by approximately one-third and precipitate by the addition of 10% trichloroacetic acid.
17. Wash the precipitate with 90% acetone (10% water).

In-solution Digestion of Purified Proteins
18. Dissolve the precipitate in 10 μL of 100 mM Tris-HCl (pH 8.3) containing 8 M urea and 0.05% SDS for 3 h to overnight at 30 °C.
19. Add 2 μL of 20 mM DTT and incubate for 30 min at 37 °C under gently bubbling N2 gas.


20. Add 2 μL of 100 mM iodoacetamide, and let stand for 30 min at room temperature in the dark.
21. Dilute the reaction solution with 100 μL of water and adjust the pH to 8.0 by the addition of 1 M Tris-HCl (pH 8.0).
22. Digest with 2 μg of trypsin for 16 h at 37 °C.
23. Add 500 μL of solvent A for SCX column chromatography, and remove any precipitate by centrifugation.

LC Separation with an SCX Column
24. Load the resulting peptides onto an SCX column equilibrated with solvent A.
25. Elute the peptides with a gradient of ammonium acetate (described in Apparatus Setup) at a flow rate of 50–100 μL/min.
26. Collect the eluate in 100 μL fractions and evaporate each fraction to dryness.
27. Dissolve each fraction in 0.1% TFA in 2% acetonitrile.
28. Analyze each fraction by 1D-RP-LC-MS/MS.

RESULT
In this study, antibody-immobilized affinity column chromatography was used to enrich ubiquitin-conjugated proteins and to eliminate free ubiquitin. The monoclonal antibody used recognizes the ubiquitin moiety of ubiquitin–protein conjugates but not free ubiquitin (104). The proteins that bound to this antibody were recovered under native and denaturing conditions, which makes it possible to distinguish ubiquitin-conjugated proteins from associated proteins that are not themselves ubiquitinated. In fact, treatment of lysates with 8 M urea prevented purification by the affinity column of proteasome subunits (subunits 1, 2, 3, 5, 6, and 7) and VCP, all of which have been shown to bind to ubiquitylated proteins (Fig. 2-13C). On the other hand, polyubiquitylated proteins, including β-catenin, CRM1, and Hsp90, were recovered by antibody-bound affinity column chromatography under the same denatured conditions. Although gel-based separation is often used prior to protein identification by MS, it is not suitable for the fractionation of ubiquitylated proteins because of the heterogeneity of polyubiquitin chains. The shotgun approach is therefore especially useful for the analysis of polyubiquitinated proteins.
The 2D-LC was performed manually in this case, and a total of 680 distinct proteins were identified by comparison of the resulting CID (collision-induced dissociation) spectra with the IPI database; 350 proteins were identified under denaturing conditions, and 330 proteins were identified under native conditions. For some proteins, the LC-MS/MS analyses can also detect ubiquitylation directly from the MS data, based on two criteria: one is an increment in molecular mass of 114 Da for each targeted lysine residue in peptides containing a ubiquitylated site (or sites), due to the addition of the two tandem glycine residues derived from the C-terminus of ubiquitin, and the other is inhibition of proteolytic cleavage by trypsin at


the modified site due to ubiquitin conjugation to a lysine residue. The study identified ubiquitylated peptides containing Lys48 (residues 43 to 54) or Lys63 (residues 55 to 72) of ubiquitin (Fig. 2-13D), as well as Lys6, Lys11, and Lys33 as ubiquitylation sites of ubiquitin, indicating that alternative types of polyubiquitin chains are generated naturally in human cells (102). In addition, the ubiquitylation sites of several proteins, including ribosomal proteins L3, S18, and L24 as well as Hsp70, could be identified by this analysis. The method described involves purification of ubiquitylated proteins by using an antibody against ubiquitin under denaturing conditions and does not require affinity tags, such as a His6 tag; thus, the method may be applicable to the analysis of tissue specificity of protein ubiquitylation and disease-associated ubiquitylation. ◀
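The first criterion (a 114 Da increment per modified lysine) can be checked with a short calculation. The sketch below is illustrative only and is not the software used in the study. It assumes standard monoisotopic residue masses and takes the ubiquitin peptide spanning residues 43 to 54 (LIFAGKQLEDGR, which contains the uncleaved Lys48) as the example. The function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
# The Gly-Gly remnant left on a lysine after tryptic digestion of a
# ubiquitin conjugate adds two glycine residue masses.
GLYGLY = 2 * RESIDUE["G"]


def peptide_mass(sequence, n_glygly=0):
    """Monoisotopic mass of a peptide carrying n_glygly Gly-Gly remnants."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + n_glygly * GLYGLY


def mz(mass, charge):
    """m/z of the protonated ion at the given charge state."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    pep = "LIFAGKQLEDGR"  # ubiquitin residues 43 to 54, Lys48 uncleaved
    plain = peptide_mass(pep)
    modified = peptide_mass(pep, n_glygly=1)
    print(f"unmodified {plain:.3f} Da, Gly-Gly modified {modified:.3f} Da")
    print(f"shift {modified - plain:.4f} Da, [M+2H]2+ {mz(modified, 2):.3f}")
```

The example peptide also illustrates the second criterion: the internal lysine is present only because trypsin did not cleave at the modified site.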

2-2 DEVELOPMENT OF QUANTITATIVE PROTEOMICS

A primary goal of proteomics is to monitor changes in protein abundance in response to temporal or environmental changes, in the context of the biological functions of the proteome. It is therefore desirable to develop methods that measure changes in protein abundance precisely and in a high throughput manner. Such methods enable us to study, in a reasonable time, the dynamic aspects of, if not the entire proteome, at least a subset of the proteome of cells, tissues, or organs, as well as those of multiprotein complexes and subcellular structures. In classical proteomics, the total protein complement of cells, tissues, or organs is separated on a 2DE gel on the basis of charge and molecular weight into several thousand protein spots all at once, which are visualized by fluorescent labeling, silver staining, and so on. Differences between the reference and altered states can be measured by quantifying the ratios of labeling or staining intensities between independent 2DE gels or, in the case of 2D-DIGE, the ratios of intensities of differently labeled protein spots overlapped on a single gel (see Section 1-2-1). MS-based methods allow the identification of those protein spots; thus, 2DE combined with MS can be a tool for the dynamic analysis of proteomes. However, 2DE has technical limitations in its protein separation range, the dynamic range of protein abundance, resolving power, and the accuracy of quantification and coordination of protein spots. MS protein identification methods compensate for some of those limitations of 2DE; in any case, the 2DE-based method is a rather low throughput (time-consuming) and less precise approach and also requires a relatively large amount of protein sample (see Section 1-2-3).
To overcome the limitations associated with 2D-PAGE-based quantification, a variety of efforts have been made to develop MS-based quantification methods, which allow for the simultaneous and automated identification and quantification of complex protein mixtures on a large scale (48, 105–109). The most prominent problem in quantifying protein levels using MS alone is that peptide signals in the mass spectrometer are extremely variable. For peptides to be detected by MS, the analyte molecules have to be ionized and transferred into the gas phase (see Section 1-2-3). This ionization process depends partly on the physicochemical properties of the analyte molecules themselves (ionization efficiency) and partly on the presence of other components,


such as buffer salts, other peptides, and solvents, in the sample mixture at the time of ionization (suppression effects) (110). Because of this, the intensity of a particular peptide signal is not simply a function of the peptide’s abundance unless the conditions are exactly the same at the time of ionization. Neither the ion signals of two different peptides, even when they originate from the same protein, nor the signal intensities of the same peptide ion obtained from two independent experiments can simply be compared with each other. This problem is overcome by introducing into MS-based quantitative proteomics a method used for small molecules in analytical MS chemistry, in which a stable isotope is incorporated into the protein sample of one of the states to be analyzed, as described in Section 1-3-5. This labeling method is known as stable-isotope dilution, which involves the addition to the sample of a chemically identical form of the analytes containing stable heavy isotopes (e.g., 2H, 13C, 15N) as internal standards (111). While most of the physicochemical characteristics of the unlabeled and labeled peptides remain the same, the masses of the isotopically labeled molecules are shifted in the MS spectrum. Stable isotope labeling therefore combines well with mass spectrometry, which not only easily distinguishes labeled from corresponding nonlabeled analytes but also allows accurate quantification of differences by examining the ratio between the labeled and nonlabeled mass peaks in the same experiment. This does not mean, however, that the method determines absolute abundances (105). It gives relative abundances unless the exact amount of the internal standard is known.

2-2-1 Isotope Labeling for Quantitative Analysis Using MS

For commonly accepted quantitative analyses using MS, it is a prerequisite that proteins or peptides be isotopically labeled. The labeling methods developed so far in quantitative proteomics can broadly be classified into two categories depending on the manner in which the labeling is accomplished (Fig. 2-14) (110). One category, in vivo labeling, labels proteins through metabolic incorporation in living cells (106); the other, in vitro labeling, labels proteins through chemical reactions after preparation of a protein mixture from a cell or tissue extract (see Section 1-3-5) (107). Each approach introduces “heavy” and “light” labels into a protein or peptide at distinct points in an experimental system (Fig. 2-14). The earliest introduction of a heavy label is possible when one can control the growth of cells or an organism. Introduction of heavy and light labels as early as possible minimizes differential sample loss; therefore, the in vivo labeling approach is preferable to the in vitro labeling approach in this context. This may be particularly relevant in microorganisms (such as S. cerevisiae, E. coli, and D. radiodurans) as well as in mammalian tissue cultures (48). In vivo labeling, however, cannot be applied to most tissue samples and is very difficult to apply to entire organisms. On the other hand, in vitro labeling is applicable to any sample, including those from tissues or organisms.

In Vivo Labeling To label proteins uniformly in the in vivo approach, the cells have to be grown in growth media to which a heavy or light isotope has been added. This is usually done by culturing the cells either in 15N-rich media


[Figure 2-14 panels. (A) In vivo labeling: Sample A (light; normal 14N-rich media or amino acid) and Sample B (heavy; 15N-rich media or isotope-labeled amino acid) are combined, digested with protease, and quantified by LC-MS-based analysis. (B) In vitro labeling: protein mixture extracts A and B from Samples A and B are digested with protease separately, labeled with light (normal) reagent and heavy reagent, respectively, then combined and quantified by LC-MS-based analysis.]

Fig. 2-14. Strategies for stable isotope labeling of biological samples for quantitative proteomics (110). In in vivo labeling, proteins are labeled through cell/microorganism culture in media containing isotopically labeled N-rich media or amino acids. In in vitro labeling, proteins are labeled through chemical/enzymatic reaction with isotopically labeled reagents or solvents. Note that the in vivo labeled samples are combined before protease digestion, while the in vitro labeling is performed after protease digestion of the samples. In each case, the differentially labeled sample mixtures are analyzed by LC-MS-based technology for quantitative identification of proteins.

(e.g., commercially available Cambridge Isotope Laboratory Bio-Express-1000, Martek 9-N, Sigman 15N-labeled ammonium sulfate) (112, 113) or in media containing isotope-labeled amino acids [stable isotope labeling by amino acids in cell culture (SILAC); e.g., commercially available Isotec deuterium-labeled L-leucine-5,5,5-D3, 99 atom % D, [13C6] L-leucine CLM-2262, Cambridge Isotope Labs [13C6]arginine, L-4,4,5,5-D4-lysine] (114–116). In either labeling method, each of two cell pools is cultured independently in media containing a heavy or light (natural) isotope until the proteins are isotopically labeled, ideally to 100%. The degree of isotope incorporation at this stage affects the quality of the mass spectra used for accurate quantitative analysis. The two cell pools are then combined and subjected to further experiments (Fig. 2-14A). An advantage of metabolic labeling is that the protein lysates are mixed together at an early stage in the experiment, decreasing the chance that an experimental error is inadvertently introduced into one sample and not the other. Disadvantages of this method are that it is limited to situations where the researcher can finely control the growth media of cells, and that it is costly (48). In addition, when isotope-rich media have some


effects on cell growth, cell morphology, or sensitivity to drugs, the metabolic labeling approach is not applicable. When 15N-rich media are used for labeling, the resulting mass shift depends on the number of nitrogen atoms contained within the protein or peptide. This variable mass shift complicates the data analysis of MS spectra and may require the use of high accuracy mass spectrometry, such as Fourier transform ion cyclotron resonance (FT-ICR) (110). In the SILAC approach, on the other hand, all proteins are labeled with a specific, labeled amino acid that has a known mass shift (see Chapter 3); that is, incorporation into peptides is predictable, and thus conventional mass spectrometers can be used for identification and quantification (114–116). Use of isotopically labeled arginine or lysine especially simplifies subsequent MS analysis for quantification when isotopically labeled proteins are digested with trypsin, because each fully cleaved tryptic peptide generally contains only one arginine or lysine. A drawback of this method is the relatively high cost of the labeled amino acids required for cell culture. Otherwise, both 15N metabolic labeling and SILAC allow labeling of almost all proteins synthesized in living cells. No chemical labeling or affinity purification steps are performed, and the methods are compatible with virtually all cell culture conditions. Because labeling of proteins with stable isotopes in living cells requires that a metabolic pathway be accessible to the label, it is difficult to label multicellular organisms in vivo. However, the approach has been extended to the labeling of C. elegans and Drosophila melanogaster, which are fed on isotopically labeled E. coli and yeast grown in 15N-rich media, respectively (112). In addition, a method based on the use of culture-derived isotope tags (CDITs) has been developed for quantitative analysis of tissue or organ proteomes, which have been practically impossible to label in vivo.
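The contrast between the sequence-dependent 15N mass shift and the fixed SILAC mass shift can be made concrete with a small calculation. The following sketch is an illustration under stated assumptions: complete labeling, [13C6]arginine as the SILAC label, and two arginine-terminated peptide sequences chosen only as examples. The function names are hypothetical.

```python
# Nitrogen atoms per residue: one backbone nitrogen plus side-chain nitrogens.
EXTRA_SIDE_CHAIN_N = {"R": 3, "H": 2, "K": 1, "N": 1, "Q": 1, "W": 1}
DELTA_15N = 15.0001089 - 14.0030740  # mass gain per 14N -> 15N (Da)
DELTA_13C = 13.0033548 - 12.0000000  # mass gain per 12C -> 13C (Da)


def nitrogen_count(peptide):
    """Number of nitrogen atoms in an unmodified peptide."""
    return sum(1 + EXTRA_SIDE_CHAIN_N.get(aa, 0) for aa in peptide)


def shift_15n(peptide):
    """Mass shift (Da) after complete 15N metabolic labeling."""
    return nitrogen_count(peptide) * DELTA_15N


def shift_silac(peptide, labeled_residue="R", heavy_carbons=6):
    """Mass shift (Da) for SILAC with a 13C-labeled amino acid.

    Assumes every occurrence of labeled_residue carries heavy_carbons
    13C atoms (the default corresponds to [13C6]arginine).
    """
    return peptide.count(labeled_residue) * heavy_carbons * DELTA_13C


if __name__ == "__main__":
    for pep in ("ESTLHLVLR", "AGFAGDDAPR"):  # illustrative sequences
        print(pep, nitrogen_count(pep),
              round(shift_15n(pep), 4), round(shift_silac(pep), 4))
```

With 15N labeling, the heavy/light spacing must be computed per peptide from its sequence, whereas with [13C6]arginine every peptide containing a single arginine shows the same spacing. This is why the SILAC pairs are easier to locate in a survey scan.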
In the original method of CDITs, a protein extract of mouse Neuro2A cells cultured in a stable isotope-enriched medium is mixed with mouse brain sample to serve as the internal standard (117). Different states of tissue samples can be compared based on the relative ratio of the mass peaks with the CDITs derived from isotopically labeled cell cultures. In this method, known amounts of synthetic unlabeled peptides can be used as internal standards for the absolute quantification of the CDITs, which can then be used for the absolute quantification of the corresponding tissue proteins. The method certainly opens up in vivo quantitative analysis of tissue proteomes if the closely related cell lines are available. In Vitro Labeling Meanwhile, in vitro quantification approaches introduce label into peptides or proteins through chemical or enzymatic reaction of protein mixtures prepared from cell or tissue extracts of interest (Fig. 2-14B). Those are applicable to any protein samples and do not require cell culture or growing cells for preisotope labeling as for the in vivo labeling approach. The main disadvantage is that several chemical reaction steps for two or more different samples and purification are required, which can lead to sample losses and increase the chance of experimental errors. The best known in vitro labeling method for quantitative proteomics is the one using ICAT reagents (see Sections 1-3-5 and 2-2-2), which react with the thiol group of cysteine residues in proteins or peptides. The original version of ICAT reagent consists of three elements: an affinity biotin tag that allows specific isolation of ICAT-labeled peptides or proteins, a linker that incorporates eight atoms of deuterium


(heavy d8) or just hydrogen (light d0), and a reactive iodoacetamide derivative that is specific to cysteine (Fig. 2-15A). In this method, proteins from two different cell or tissue states are collected, denatured, reduced, and reacted separately at cysteine residues with light d0 or heavy d8 ICAT reagent. The labeled protein samples are combined and digested with trypsin. Isolation of the ICAT-labeled peptides on streptavidin-immobilized beads reduces the complexity of the peptide mixture and increases the number of sequences that are identified in a single MS/MS experiment; however, the strong interaction between biotin and streptavidin results in low recovery of ICAT-labeled peptides. The biotin tag itself also adds extra mass to peptides (+442 Da), causes complex fragmentation during MS/MS analysis, and lowers the quality of MS/MS data, thus reducing the number of identified proteins

[Figure 2-15A: ICAT (isotope-coded affinity tag) method. Classical ICAT reagent: biotin tag, linker in which X = 2H (heavy, 2H8) or H (light, H8), and reactive group. Cleavable ICAT reagent: biotin tag, cleavable linker, and reactive group; the isotope-coded part (C10H17N3O3, with 12C9 or 13C9) is labeled 227 amu (light reagent) and 239 amu (heavy reagent).]

Fig. 2-15. (A) A classical isotope-coded affinity tag (ICAT) reagent. The ICAT reagent consists of a group reactive to cysteine residues of peptides/proteins and a biotin tag, connected by an isotopically labeled linker. A modified reagent, cleavable ICAT, has a linker that can be cleaved under acidic conditions to improve recovery of labeled peptides and incorporates a mass difference of 12 units/cysteine. [Reprinted by permission of Elsevier from Ref. 108. Copyright (2000) by Elsevier Science Publishers.] (B) iTRAQ reagents. The reagents consist of a charged reporter group, a group reactive to α-amino groups of peptides/proteins, and a neutral balance group. [Reproduced from Discovery of Biomarkers for Endometrial Cancer with BioScience Technology, via Reprint Management Services.] (C) 2-Nitrobenzenesulfenyl (NBS) reagent. NBS reacts with tryptophan residues of peptides/proteins and incorporates a mass difference of 6 units/tryptophan. [From Kuyama et al., Rapid Commun. Mass Spectrom. 17:1642–1650 (2003), Copyright John Wiley & Sons, Ltd. Reproduced with permission.] (D) 18O labeling of the carboxyl terminus of peptides. The stable labeling takes place during protease digestion in H218O through formation of an acyl-enzyme intermediate complex. [Reprinted with permission from Yao et al. (Ref. 128). Copyright (2006) American Chemical Society.] (E) Guanidination of lysine residues. O-methylisourea reacts with the ε-amino group of lysine and produces homoarginine. The 13C, 15N double-labeled O-methylisourea introduces a mass difference of 3 units/lysine.


per experiment. In addition, some peptides labeled with heavy d8 ICAT reagent have retention times on reversed-phase column chromatography that differ from those of the corresponding peptides labeled with light d0 ICAT reagent (118). Peptides that do not coelute can bias quantitation because the different backgrounds present at different elution times can result in different ionization efficiencies (119). Therefore, several improvements have been made to the ICAT methodology. The first improvement involves crosslinking peptides to beads via their cysteine groups and photoreleasing them afterwards. The method solves many of the above-mentioned problems and leads to a larger number of identifications of cysteine-containing peptides. However, the method may compromise low level analysis (115, 120). The second improvement is the development of cleavable ICAT reagents that incorporate an acid-cleavable linker into the ICAT molecule and use 13C9 rather than deuterium (2H8) in the heavy reagent (Fig. 2-15A). The reagents allow removal of the biotin affinity tag before MS and MS/MS analysis, thus improving MS and MS/MS performance and significantly increasing the number of proteins identified and quantified in a single experiment. The reagents also reduce the separation of heavy- and light-labeled peptides in reversed-phase chromatography, thereby increasing the accuracy of quantification by LC-MS. The ICAT reagents, including the original and acid-cleavable versions, are commercially available (Applied Biosystems Inc., California) and are probably the most commonly used reagents in quantitative proteomics. A variation of the ICAT technology, iTRAQ (Applied Biosystems Inc.), has also been introduced. Although similar in their basic concepts, the ICAT and iTRAQ

[Figure 2-15 (continued), panels B to E. (B) iTRAQ isobaric tag (total mass = 145): a charged reporter group (mass = 114 to 117), a neutral balance group (mass = 31 to 28), and an amine-specific peptide reactive group (PRG). The reporter gives a strong signature ion in MS/MS in the quiet low mass region, gives good b- and y-ion series, and maintains the charge state and ionization efficiency of the peptide; the balance group balances the mass change of the reporter to maintain a total mass of 145 and is lost as a neutral fragment in MS/MS. (C) NBS (2-nitrobenzenesulfenyl) method: NBSCl reacts with a tryptophan residue, releasing HCl; asterisks mark six 13C (heavy) or six 12C (light) atoms. (D) Incorporation of 18O into the carboxyl terminus of peptides during digestion with trypsin, Glu-C, or Lys-C in H218O via a Ser-Enz intermediate. (E) Guanidination of lysine: lysine reacts with O-methylisourea at pH 10 to give homoarginine and methanol; light reagent H2N14-12C(OMe)-14NH, heavy reagent H2N15-13C(OMe)-15NH.]

tagging technologies differ in the reactive group (Fig. 2-15B). The ICAT method relies on tagging cysteine residues and isolating peptides containing these tagged residues by affinity chromatography. The net result is a reduction in the complexity of peptide pools generated by digestion with proteases including trypsin. In the case of the iTRAQ method, tagging is on primary amines. This difference in labeling strategy eliminates the dependence on relatively nonabundant cysteine-containing peptides intrinsic to ICAT-based methods, thus potentially allowing the tagging of most tryptic peptides. In addition, the iTRAQ technology performs relative quantification via MS/MS and uses four possible tags, which permits multiplexing of up to four samples in a single experiment. Quantification is performed via the differences in abundances of four product ions, 114, 115, 116, and 117 mass unit reporter parts (Reporter in Fig. 2-15B), that are each cleaved from one of the four possible tags. The tags have an identical mass (Isobaric tag, total mass ⫽ 145, in Fig. 2-15B), a result of differences in other parts of the iTRAQ tag structure (Balance in Fig. 2-15B), with the consequence that an identical peptide in the four samples will have an identical mass and LC retention time after tagging. This strategy simplifies analysis and will potentially increase analytical accuracy and precision (121). There are a number of other in vitro labeling techniques similar to the ICAT method, which require chemical modification of the peptides or proteins (109, 122, 123). Some of those methods couple the labeling and peptide selection steps as in the ICAT method, whereas others decouple these two steps or do not include the affinity step (124). 
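Reporter-ion quantification as described above can be sketched in a few lines. This is a simplified illustration rather than the vendor's algorithm: the reporter m/z values are approximated as 114.1 to 117.1, the matching tolerance is an assumed value, no isotope-impurity correction is applied, and the peak list is invented for the example.

```python
# Approximate reporter ion m/z values and matching tolerance (assumptions).
REPORTER_MZ = (114.1, 115.1, 116.1, 117.1)
TOLERANCE = 0.05


def reporter_intensities(peaks):
    """Sum the intensity found near each reporter m/z in one MS/MS spectrum.

    peaks is a list of (m/z, intensity) pairs. Keys of the result are the
    nominal reporter masses 114, 115, 116, and 117.
    """
    result = {}
    for target in REPORTER_MZ:
        result[round(target)] = sum(
            intensity for mz, intensity in peaks
            if abs(mz - target) <= TOLERANCE
        )
    return result


def relative_abundance(intensities, reference=114):
    """Express each channel relative to the reference channel."""
    ref = intensities[reference]
    if ref == 0:
        raise ValueError("reference reporter ion not observed")
    return {channel: value / ref for channel, value in intensities.items()}


if __name__ == "__main__":
    # Hypothetical MS/MS peak list: four reporter ions plus one fragment ion.
    spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
                (117.11, 1000.0), (175.12, 300.0)]
    print(relative_abundance(reporter_intensities(spectrum)))
```

Because the four tagged forms of a peptide are isobaric and coelute, all four reporter intensities come from the same MS/MS spectrum. The ratios are therefore taken under identical ionization conditions.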
At least two methods label at specific amino acid residues: one involves the derivatization of the indole group of tryptophan with the isotopically heavy 2-nitrobenzenesulfenyl chloride (NBSCl-13C6), which introduces a 6 mass unit differential from the corresponding light reagent NBSCl-12C6 (Fig. 2-15C) (125), and another involves the derivatization of the sulfhydryl group of cysteine with commercially available d3-acrylamide, which differs by 3 mass units from its natural d0 counterpart (126). The other in vitro labeling methods include derivatization of either carboxyl or amino groups of peptides for the incorporation of stable isotopes (110). Labeling of a carboxyl group can be done by esterification with normal or deuterated methanolic hydrochloric acid (123). This is simple and less costly; however, it esterifies not only the carboxyl termini of peptides but also the side-chain carboxyl groups of aspartate and glutamate, resulting in low yield, complicated reactions, or unpredicted effects on peptide ionization and fragmentation. Enzymatic digestion of proteins with trypsin, Glu-C, or Lys-C in the presence of 18O-water also specifically labels the carboxyl termini of the peptides generated by cleavage at lysine/arginine, glutamic acid, or lysine, respectively; this relies on digesting one sample in 16O-containing water and a parallel sample in 18O-containing water (Fig. 2-15D) (127–130). Two 18O atoms are incorporated into the carboxyl termini of all peptides during the proteolytic cleavage of all proteins in the first pool (Fig. 2-15D). Proteins in the second pool are cleaved analogously, with the carboxyl termini of the resulting peptides containing two 16O atoms (i.e., no labeling). The two peptide mixtures are pooled for fractionation and separation, and MS measures the masses and isotope ratios of each peptide pair (128). This labeling method introduces a difference of only 4 mass units, so that high mass resolution is required to distinguish the labeled peptides. However, proteolytic 18O labeling specifically


occurs during the protease digestion process and does not require any other experimental step before MS and MS/MS analysis; thus, it may be suitable for a shotgun approach and is a useful tool for comparative proteomics studies of very complex protein mixtures if MS equipment with high mass resolution is available. Analogous to the derivatization of carboxyl groups, labeling can be done at amino groups present in peptides. One method, known as mass-coded abundance tagging (MCAT), relies on selective and quantitative guanidination of the ε-amino group of C-terminal lysine residues of tryptic peptides at high pH with O-methylisourea (Fig. 2-15E) (109). This method efficiently transforms lysine into homoarginine, which is 42 Da heavier than lysine, but does not affect the peptide amino terminus or other side groups, except for the α-amino group of glycine when it is the N-terminal residue of a peptide. The MCAT approach compares two samples of which only one is chemically modified, instead of using an isotopically labeled reagent. Despite the obvious structural difference between the modified and unmodified tryptic peptides, it is proposed to serve as an effective method for determining the relative abundance of peptides based on the fact that: (1) O-methylisourea modifies all lysine-containing peptides present in the mixture in a quantitative manner; (2) the mass tag is easily resolvable by MS; and (3) the modification preserves the charge and ionization properties of peptides so that the efficiency of ionization and overall signal intensity are unaffected (109). However, because one of the compounds is chemically distinct from the other, time- and sample-consuming reversed-labeling experiments have to be performed to take the different ionization efficiencies into account and ensure reliable quantitation data (110).
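The mass differences discussed above, and the reason a small difference demands high resolution, can be illustrated numerically. The sketch below uses standard monoisotopic atomic masses and assumes that guanidination adds a net CH2N2 to lysine. The variable and function names are hypothetical.

```python
# Monoisotopic atomic masses (Da).
H1 = 1.007825
C12 = 12.000000
N14 = 14.003074
O16 = 15.994915
O18 = 17.999160

# Two 18O atoms replace two 16O atoms at the peptide carboxyl terminus.
SHIFT_18O2 = 2 * (O18 - O16)
# Guanidination converts lysine to homoarginine (net addition of CH2N2).
SHIFT_GUANIDINATION = C12 + 2 * H1 + 2 * N14


def mz_spacing(mass_shift, charge):
    """Spacing on the m/z axis between the two members of a labeled pair."""
    return mass_shift / charge


if __name__ == "__main__":
    for name, shift in (("18O2", SHIFT_18O2),
                        ("guanidination", SHIFT_GUANIDINATION)):
        spacings = [round(mz_spacing(shift, z), 3) for z in (1, 2, 3)]
        print(name, round(shift, 4), spacings)
```

For a triply charged peptide the 18O pair is separated by only about 1.3 m/z units, which falls within the natural isotope envelope of the light peptide. The 42 Da guanidination tag remains well separated at any of these charge states.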
This approach may better serve as an efficient de novo sequencing method: when a mixture of unmodified and MCAT-modified peptides is analyzed, the C-terminal daughter ions produced during MS/MS fragmentation of the unmodified and MCAT-modified sister peptides exhibit a mass differential of 42 Da (easily resolvable by MS regardless of charge state), which makes data acquisition for de novo sequencing easier (109). By taking advantage of the specificity and efficiency of the MCAT reaction, and of the fact that unmodified and MCAT-modified peptides fragment and ionize in the same way during MS/MS analysis apart from the mass difference of the C-terminal daughter ions, an economical and efficient in vitro labeling method has been developed for large scale quantitative proteomics (see Section 2-2-2). In this method, both isotopically labeled [H215N13C(OCH3)15NH] (heavy) and unlabeled [H214N12C(OCH3)14NH] (light) O-methylisourea are used for modification of the ε-amino group of lysine residues of tryptic peptides, instead of just one unlabeled O-methylisourea as in the MCAT method (Fig. 2-15E). This modification introduces a difference of 3 mass units between the heavy and light O-methylisourea-modified peptides. Because the reagents modify essentially all lysine-containing peptides generated by trypsin or Lys-C protease digestion (whereas the ICAT method labels only cysteine-containing peptides and thus reduces the complexity of the generated peptide mixture), the generated peptide mixture is extremely complex. In addition, because heavy and light reagents introduce a difference of only 3 mass units, MS equipment with high resolution has to be used for

quantification. However, this method does not require any extra step for prefractionation of peptides and increases the chance of quantifying multiple peptides obtained from the same protein when compared with the ICAT method (see Sections 1-3-5, 2-2-2, and 3-2-3 for details on Experimental Example 3-4). In addition, peptides labeled with heavy and light O-methylisourea separate neither in ion-exchange LC nor in reversed-phase LC; thus, the method may be suitable for large scale shotgun analysis with multidimensional LC-MS systems. Another method similar to this has been developed but uses a different reagent. The method involves acylation of all primary amino groups; that is, the N-terminal amino groups and the ε-amino groups of lysine residues are acylated with ¹³C₄ succinic anhydride (118, 122). This introduces a 4 mass unit difference between the heavy and light peptides, which behave identically on reversed-phase column chromatography. The ¹³C/¹²C isotope ratio can therefore be determined from a single mass spectrum taken at any point in the elution profile of reversed-phase chromatography, and the mixture of peptides labeled with the ¹³C/¹²C reagents can be subjected directly to LC-MS/MS for large scale, highly accurate quantitative analysis without any concern about separation of isotopically different peptides.

2-2-2 Quantification Strategy for LC-MS Analysis of Isotope-Labeled Peptide Mixture and Software for Computer Analysis

Various strategies based on stable isotope coding and MS have been developed as described previously and are becoming increasingly popular as alternatives to 2D-PAGE-based methods for quantitative proteomics. In most of those strategies, differentially labeled peptides are analyzed by LC-MS/MS in a data-dependent manner in which MS/MS scans are automatically triggered after MS survey scans. Usually, protein identification is based on MS/MS data, and quantification is based directly on MS data (Fig.
2-16A left). In some cases, quantification is also performed based indirectly on extracted ion chromatograms (XICs) of the MS signal (XICs are given as total intensities of a series of mass peaks originating from either an isotopically labeled or unlabeled peptide) (Fig. 2-16A right) (114, 131). Protein ratios can be determined by comparing the relative intensities of MS signals (or XICs) from differentially labeled peptides (132). The intensity of an MS spectrum can be measured by the manual calculation of the MS peak area, which is actually applicable for a small amount of MS data obtained typically by MALDI-ToF or nanospray-ESI-MS or small scale LC-MS analysis of protein spots excised from 1D or 2D-PAGE gels. Elimination of peptide pairs with equal intensities from the quantitative data analysis increases the efficiency of quantitative calculations (because those indicate proteins whose expression is unchanged and need not be investigated further) (110); thus, a considerable proportion of quantitative proteomic analysis has been carried out manually (112, 115, 128). In a quantitative calculation, isotopic mass pairs on MS spectra need to be recognized based not only on the characteristic isotope pattern but also on the abundance ratios of MS peaks. In particular, the ESI-MS step in LC-MS/MS analysis generates a set of multiple mass peaks due to the multiple ionization and natural abundance of isotopes (this may not be true for MALDI-MS because the ionization predominantly generates singly charged ions for peptides.); for example, when the mass difference

is 3 mass units between heavy- and light-labeled peptides using O-methylisourea, ESI-MS generates 2+ or 3+ ions for each of those peptides, and a complex series of signals separated by 1.5 or 1 m/z unit, respectively, is produced (Fig. 2-16C). Such a set of multiple mass peaks generates characteristic isotope patterns for each labeling method, depending on which atoms are isotopically labeled and on the mass difference between the heavy and light forms of a single peptide. Because of this, the mass signals for a heavy peptide often overlap in part with those for the corresponding light peptide. Therefore, for accurate measurement of the relative abundance of a heavy and light peptide pair, the partial overlap of mass signals has to be resolved mathematically and corrected on the basis of the theoretical abundance of natural isotopes (Fig. 2-16D) (47). This correction is critical when the mass difference between heavy and light peptides is small, for example, 3 or 4 mass units. Those calculations require tedious effort; however, they can be done manually when the number of mass peaks is small. [Online software for this manual calculation is available at the ProteinProspector site (“MS-Isotope,” http://prospector.ucsf.edu/ucsfhtml4.0/msiso.htm, provided by P. Baker and K. Clauser).]

Fig. 2-16. Schematic view of different strategies for protein quantification. (A) The XIC- or MS-based strategies. A small precursor window is used to isolate a single peptide to obtain MS/MS data for protein identification. Quantification is based directly or indirectly (through XIC) on MS data. [From G. Zhang and T. A. Neubert, Mol. Cell. Proteomics 5(2): 401–411 (2006). Copyright (2006) by American Society for Biochemistry and Molecular Biology, Inc. Reproduced with permission of the ASBMB via the Copyright Clearance Center.] (B) The multiplex MS/MS-based strategy. A wide precursor window is used to isolate the peptide pair to obtain multiplex MS/MS data, which are used for both protein identification and quantification. H, heavy; L, light. [From G. Zhang and T. A. Neubert, Mol. Cell. Proteomics 5(2): 401–411 (2006). Copyright (2006) by American Society for Biochemistry and Molecular Biology, Inc. Reproduced with permission of the ASBMB via the Copyright Clearance Center.] (C) A typical mass spectrum of a peptide pair, with an amino acid sequence ZELAAAMK, labeled with “light” and “heavy” O-methylisourea. (From Ref. 33.) (D) Intensity correction. In the isotope labeling with O-methylisourea, the resulting mass signals of a peptide pair labeled with the heavy or light reagent overlap as a consequence of a small mass difference between labeling agents. Thus, the observed signal ratio is corrected by calculating the natural peptide signal ratio that is estimated theoretically from the natural isotope abundance ratio. (See insert for color representation.)

An increasing number of quantitative analyses using MS are done by combining it with high resolution LC, such as multidimensional LC, which allows separation of hundreds of peptides in a single experiment. Those peptides are ionized sequentially by ESI, and peptide ions are selected automatically, for example, in the order of decreasing signal intensity, for

fragmentation by CID. Thus, the analysis often generates many thousands of MS and MS/MS data all at one time, which makes manual calculation unrealistic. Therefore, development of computer programs is a prerequisite to performing high throughput quantification of massive datasets generated by such LC-MS/MS analysis. A number of software packages have been developed to automatically convert mass spectrometry-derived data of peptides into relative protein abundances; however, only several of them, such as XPRESS (133), ASAPRatio (134), RelEx (135), and a few others (136), have achieved relative popularity. For instance, the ASAPRatio program is based on an algorithm for the automated statistical analysis of protein abundance ratios of proteins contained in two samples. The algorithm utilizes the signals recorded for the different isotopic forms of peptides of identical sequences and employs numerical and statistical methods, such as Savitzky–Golay smoothing filters, statistics for weighted samples, and Dixon’s test for outliers, to evaluate protein abundance ratios and their associated errors. The algorithm also provides a statistical assessment to distinguish proteins of significant abundance change from a population of proteins of unchanged abundance (134). The software for quantitative analysis using original and cleavable ICAT reagents is commercially available (Applied Biosystems), although it can only be used with a limited number of MS equipment [i.e., Applied Biosystems/ MDS SCIEX API QSTAR® Pulsar System (Pro ICAT Software) and Applied Biosystems 4700 Proteomics Analyzer (GPS Explorer™ Software)]. 
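To make the statistical ingredients mentioned above more concrete, the sketch below combines a Dixon-type outlier test with a weighted mean to turn several peptide ratios into one protein ratio. It is not the ASAPRatio implementation; the function names are hypothetical, the smoothing step is omitted, and the critical value for the test is passed in by the caller (0.710 is the commonly tabulated two-sided 95% value for five observations and should be checked against a statistics reference).

```python
import math

def dixon_q(values):
    """Return (Q statistic, suspect value) for the most extreme point of a small sample."""
    data = sorted(values)
    spread = data[-1] - data[0]
    if len(data) < 3 or spread == 0:
        return 0.0, None
    gap_low = data[1] - data[0]
    gap_high = data[-1] - data[-2]
    if gap_high >= gap_low:
        return gap_high / spread, data[-1]
    return gap_low / spread, data[0]

def weighted_mean_and_error(values, weights):
    """Weighted mean and weighted (population) standard deviation."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)

def protein_log_ratio(peptide_log_ratios, weights, q_critical):
    """Drop at most one outlying peptide ratio, then average the rest."""
    q, suspect = dixon_q(peptide_log_ratios)
    pairs = list(zip(peptide_log_ratios, weights))
    if suspect is not None and q > q_critical:
        pairs.remove(next(p for p in pairs if p[0] == suspect))
    kept_values, kept_weights = zip(*pairs)
    return weighted_mean_and_error(kept_values, kept_weights)

# Hypothetical log ratios for five peptides of one protein; the last is an outlier.
mean, error = protein_log_ratio([0.10, 0.12, 0.11, 0.09, 1.50], [1.0] * 5, q_critical=0.710)
```

In practice the weights would reflect the quality of each peptide measurement, for example its signal intensity; equal weights are used here only to keep the example simple.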
A quantification program must be used together with an automated search engine that identifies peptides by matching their fragment ions against a designated protein database, such as SEQUEST (137) (after which XPRESS, ASAPRatio, and RelEx permit rapid data processing of the identified proteins), Mascot (138), or Sonar (139); that is, proteins are usually identified and quantified by combining the quantification and identification information obtained independently from the peptides associated with the particular protein (56, 134). While those quantitative software packages for comparative proteomics experiments are designed primarily for ICAT or metabolic labeling studies, which introduce a relatively large mass difference, equivalent software packages have also been developed for analyses that use isotope tags with a smaller mass difference, for instance, O-methylisourea [H₂¹⁴N¹²C(OCH₃)¹⁴NH] (light) / [H₂¹⁵N¹³C(OCH₃)¹⁵NH] (heavy) or H₂¹⁶O/H₂¹⁸O differential labeling (47, 140–142). One such package is STEM (STrategic Extractor for Mascot’s results), which efficiently processes Mascot search data and is compatible with quantitative proteomics studies that utilize various stable isotope tags (47) (http://www.sci.metro-u.ac.jp/proteomicslab/). For instance, the difference in the masses of the isotopes can be set in this software for any type of isotope used for labeling. In addition, when mass peaks in the MS spectrum overlap as a consequence of a small mass difference between heavy and light peptides (e.g., a 4 mass unit difference for H₂¹⁶O/H₂¹⁸O-labeled peptides or a 3 mass unit difference for O-methylisourea-labeled peptides), the natural peptide peak ratio is calculated theoretically from the natural isotope abundance ratio. Then, the observed peak ratio is corrected using the calculated value.
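The overlap correction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the STEM algorithm: the function names and the elemental composition in the example are hypothetical, the natural abundances are rounded standard values, and the heavy and light peptides are assumed to share the same isotope envelope shape.

```python
# Natural isotope abundances as (nominal mass offset, probability); rounded values.
ABUNDANCE = {
    "C": [(0, 0.9893), (1, 0.0107)],
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
    "S": [(0, 0.9499), (1, 0.0075), (2, 0.0425), (4, 0.0001)],
}

def isotope_pattern(composition, n_peaks=6):
    """Relative abundance of the M, M+1, ... peaks, built by adding one atom at a time."""
    pattern = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in composition.items():
        for _ in range(count):
            updated = [0.0] * n_peaks
            for i, p in enumerate(pattern):
                for shift, prob in ABUNDANCE[element]:
                    if i + shift < n_peaks:
                        updated[i + shift] += p * prob
            pattern = updated
    return pattern

def corrected_heavy_to_light(obs_light_mono, obs_at_heavy_mono, pattern, delta):
    """Remove the light peptide's M+delta isotope peak from the heavy monoisotopic signal."""
    spill = obs_light_mono * pattern[delta] / pattern[0]
    return (obs_at_heavy_mono - spill) / obs_light_mono

# Hypothetical peptide composition and a simulated 2:1 heavy/light mixture (delta = 3).
pattern = isotope_pattern({"C": 40, "H": 70, "N": 12, "O": 13, "S": 1})
light, heavy = 1.0, 2.0
obs_m = light * pattern[0]
obs_m3 = light * pattern[3] + heavy * pattern[0]
ratio = corrected_heavy_to_light(obs_m, obs_m3, pattern, delta=3)
```

Without the correction, the ratio read directly from the two monoisotopic positions would be too high by the relative height of the light peptide's M+3 peak, which is why the correction matters most for small mass differences.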
Without those computer programs for peptide identification and quantification, the high throughput LC-MS/MS approach cannot routinely identify and quantify hundreds to

thousands of proteins from a biological sample pair (143) (tools.proteomecenter.org/software.php). Thus, the importance of developing computer programs that are fully automated, fast, and independent of instrument data type cannot be overemphasized. A strategy using MS/MS data for automated quantification has also been proposed as an alternative to using MS or XIC data (132). MS/MS-based quantification is believed to offer great potential for enhancing accuracy and dynamic range, because MS/MS data generally have a much better signal-to-noise ratio than MS or XIC data (115, 131, 144, 145). Two strategies for MS/MS-based quantification have been reported: those using isobaric tags (Fig. 2-15B) and those using “data-independent” acquisition. The isobaric tags, such as iTRAQ, contain labile bonds that are easily cleaved during MS/MS; the resulting fragment ions are of different masses and can be used for quantification (146, 147). Because this method relies on fragment ions from the tags, it cannot be used for metabolic isotope labeling experiments. On the other hand, data-independent acquisition is based on the sequential isolation and fragmentation of defined window increments; for example, a 2.5 m/z precursor window increment is used initially within the ion trap until a desired mass range is covered (131). When the mass spectrometer scans MS/MS signals quickly enough, protein quantification is reliable because it is achieved by reconstructing XICs from fragment ion intensities of MS/MS spectra. However, if narrow predetermined precursor ion selection windows are used, the signal intensities of the two members of a labeled and unlabeled peptide pair are affected differently when one member of the pair lies near a selection window boundary (132). To overcome this problem, a wide precursor window (10 m/z) is applied to include both the light and heavy peptides for simultaneous fragmentation (Fig.
2-16B); the resulting multiplex MS/MS data can be used for protein identification by, for example, Mascot and for quantification by a simple program, such as one written in Perl (132, 148), that compares fragment ion intensities derived from the light and heavy peptides. For instance, when Lys and Arg labeling is used in combination with tryptic digestion, the C-terminal ions (typically y ions) appear as doublets with predictable mass differences in multiplex MS/MS spectra and are therefore easily recognized, whereas N-terminal ions (typically b ions) appear as singlet peaks. The multiplex MS/MS spectra contain all the sequence information obtained from standard uniplex MS/MS spectra, so they can be used for protein identification. In addition, the same MS/MS data can be used for quantification because the relative intensities of the C-terminal ions reflect the relative peptide abundances (Fig. 2-16B) (132, 148). Based on those characteristics of multiplex MS/MS spectra, automated software called MS2Ratio has been developed for comparative quantification using stable isotope labeling (132). This software uses Mascot for protein identification, which supports data from most commercially available MS instruments and provides an interface to allow convenient presentation of results and data visualization (138), and it is incorporated into the reporting system of Mascot. The procedure for calculating relative protein ratios

using MS2Ratio consists of a series of steps: (1) protein hits with matched peptides are obtained from a Mascot result file; (2) peptides that are singly charged or have missed cleavages are excluded; (3) all matched y ions and their labeling partners for each eligible peptide are found; (4) peptide ratios are calculated based on y ion intensities; and (5) protein ratios and errors are calculated. This software also applies two filters for accurate calculation: one removes potential interference from isobaric fragment ions, if any, and the other excludes low intensity y ions; namely, all y ions with intensities below 10% of the most intense y ion are neglected. The peptide ratio is then calculated as the ratio between the sums of all y ion intensities from the labeled and unlabeled peptides instead of simply averaging the ratios of all quantifiable y ions (132). The software improves sensitivity for protein identification, accuracy for quantification with an extended dynamic range, and speed of the quantification process. Most importantly, the method can be fully automated and applied independently of instrument data type (132).

2-2-3 Label-Free Quantification Software

MS-based stable isotope labeling methods provide powerful approaches for quantitative proteomics; however, those methods still have limitations in applications that aim to monitor a large number of proteins in many samples, such as biomarker discovery or the study of cellular responses to environmental perturbations. For example, biomarker discovery needs to process large numbers of samples to achieve sufficient statistical power to distinguish disease-specific markers from coincidental proteome fluctuations within the human population. The in vivo labeling approach cannot be used for such clinical samples, whereas the in vitro labeling approach may have difficulties in completely matching a large number of samples (143). Substantial efforts have been made to develop methods that quantify peptides without stable isotope labeling but meet the requirements of high sample throughput and highly reproducible coverage of the proteome for a large number of samples (61, 143, 149, 150). Those methods are referred to as label-free quantification methods (151) and rely on peptide mapping by LC-MS rather than by LC-MS/MS (Fig. 2-17). The idea behind those methods is that the MS signal intensity of each peptide in a substantially similar sample analyzed under identical conditions is proportional to the abundance of the peptide within the dynamic range of the instrument and is at least monotonic in abundance beyond the dynamic range (60, 143, 152). In most quantitative LC-MS/MS methods, a significant fraction of the peptides present in a sample are not selected by the mass spectrometer for CID (60). Which peptides go unanalyzed is random, but the chance of escaping CID is higher for low abundance peptides. Therefore, the number of peptides unanalyzed by MS/MS, especially low abundance peptides, will increase when multiple substantially similar samples are being analyzed.
Conversely, peptides that are consistently detected tend to be high abundance peptides that generate intense MS signals. As a result, quantitative information on low abundance proteins tends to be eliminated across multiple samples by LC-MS/MS. However, LC-MS analysis does not perform peptide (or MS peak) selection for CID; thus, all peptide MS peaks can be used for quantitative comparison. As described in

Section 2-2, the most prominent problem for quantifying protein levels is that the MS method alone cannot quantify proteins because peptide signals in the mass spectrometer are extremely variable due to the ionization process, which depends partly on the physicochemical properties of the analyte molecules themselves (ionization efficiency) and partly on the presence of other components in the sample mixture at the time of ionization (suppression effects) (110). It is therefore obvious that in label-free quantification methods, the relative abundance of a peptide in different, related samples needs to be analyzed under identical LC-MS conditions and the MS signal intensity of that same peptide must be compared for different LC-MS runs (Fig. 2-17) (61, 143, 150, 152). The experimental processes of LC-MS analysis for label-free quantification are almost the same as those used for shotgun LC-MS/MS analysis without performing CID. However, the mass analyzer used is preferably a high resolution, high accuracy spectrometer, such as a Q-ToF type or similarly performing analyzer. In addition, the MS analyzer requires coupling with an extremely accurate or reproducible LC system in terms of flow rate and gradient formation of elution solvent to achieve identical ionization efficiency and suppression effects at the time of ionization of the same peptide eluted from the LC column in different runs. The main differences from the quantitative LC-MS/MS methods are the algorithm (that compares peptide patterns and extracts peptide relative abundance from LC-MS data) and the approach for peptide identification (in that information such as m/z, charge, and retention time on discriminatory peptides is provided to the mass spectrometer and is analyzed by targeted MS/MS and database searching). The label-free quantification methods first measure the relative abundance of thousands of peptides and then focus the power of MS/MS selectively on discriminatory peptides.

Fig. 2-17. Principle of the label-free quantification based on LC-MS analysis. The method compares the LC-MS spectra of a pair of samples without isotope labels and quantifies the relative abundance of each peptide pair directly from their signal intensities. It is suitable for the analysis of samples that have similar protein composition such as the serum from healthy humans and patients with a particular disease. The method is based principally on the assumption that the signal intensity of each peptide in a substantially similar sample analyzed under identical conditions should be proportional to the abundance of the peptide within the dynamic range of the instrument and is at least monotonic in abundance beyond the dynamic range.

Several software packages for label-free quantitative analysis using LC-MS have been developed. Of these, SpecArray is available for free under an open source license at tools.proteomecenter.org/software.php (143). The software takes a set of LC-MS data as input and outputs a peptide versus sample array and stores the relative abundance of thousands of peptide features matched across all samples. It formats a peptide array in an identical manner to that of a gene expression microarray, except that peptide features replace gene names in a peptide array, which can be subjected to unsupervised clustering analyses to classify sample types and/or to discriminant analyses to identify peptides between samples having different characteristics, just like in a DNA microarray. This software also allows one to visualize LC-MS data in graphic images to ensure data quality, by extracting high quality MS signals from raw noisy MS signals, applying pattern matching to extract peptide features from MS signals, aligning peptide features between different samples, and generating a peptide array from aligned peptide features in all samples. In addition, the software has a feature that evaluates a retention time calibration curve between any two samples and then aligns peptides of all samples by using this calibration curve. This alignment minimizes the adverse effects of possible retention time shift in different LC runs. Furthermore, the software performs sample-dependent ratio normalization before reporting the peptide relative abundance in order to correct any systematic errors due to uneven sample loading or uneven ionization efficiency (tools.proteomecenter.org/software.php.) (143). 
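The matching and normalization steps described above can be illustrated with a short sketch. It is not the SpecArray algorithm: the function names, tolerances, and feature lists are hypothetical, retention times are assumed to have been calibrated already, and a simple median ratio is used as the sample-dependent normalization.

```python
from statistics import median

def match_features(run_a, run_b, mz_tol=0.01, rt_tol=0.5):
    """Pair features (mz, rt, intensity) from two runs that agree in m/z and retention time."""
    matches = []
    for mz_a, rt_a, int_a in run_a:
        candidates = [f for f in run_b
                      if abs(f[0] - mz_a) <= mz_tol and abs(f[1] - rt_a) <= rt_tol]
        if candidates:
            # Keep the candidate closest in retention time.
            best = min(candidates, key=lambda f: abs(f[1] - rt_a))
            matches.append(((mz_a, rt_a, int_a), best))
    return matches

def normalized_ratios(matches):
    """Intensity ratios (run B / run A) divided by their median to correct uneven loading."""
    raw = [b[2] / a[2] for a, b in matches]
    center = median(raw)
    return [r / center for r in raw]

# Hypothetical features: run B was loaded at twice the amount of run A,
# and only the third peptide truly changes in abundance.
run_a = [(500.25, 20.0, 1000.0), (620.31, 35.2, 400.0), (710.40, 41.0, 250.0), (800.00, 50.0, 100.0)]
run_b = [(500.251, 20.1, 2000.0), (620.312, 35.0, 800.0), (710.401, 41.2, 1500.0)]
matches = match_features(run_a, run_b)
ratios = normalized_ratios(matches)
```

Each row of a peptide array would then hold the normalized values of one matched feature across all samples; features without a partner, such as the fourth feature of run A here, are simply left unmatched in this sketch.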
Other software packages for label-free quantitative analysis include those applied to 2D-ICAL (two-dimensional image-converted analysis of liquid chromatography and mass spectrometry), which analyzes vast amounts of data generated by nanoflow LC-MS, to differential protein expression analysis (151), and to online LC-FTICR MS/MS studies in which a set of initially unidentified peptides from a proteome analysis can be selected for identification based on their distinctive changes in abundance following a “perturbation” (153). Thus, most label-free quantification analyses require comparison of identical proteolytic peptides in each of the two experiments to accurately determine relative ratios of the particular proteins of interest. The values of relative abundance for each peptide of a given protein can then be used to characterize quantitatively the differential expression of proteins between different sample states. Quantification using these methods is based on determining the ratios of the peak areas of identical peptides between different conditions (Fig. 2-17). One critical factor limiting the quantitative reproducibility of these methods is the ability to efficiently cluster the detected peptides as described earlier. This in turn relies on the accuracy of the mass measurement and the chromatographic reproducibility (154).

2-2-4 Absolute Quantification

Various methods for quantitative proteomics have been developed as described in previous sections; those include the stable isotope quantification method differing in isotope composition and incorporation strategy (such as metabolic labeling, SILAC, ICAT, iTRAQ, 18O labeling, and O-methylisourea; see Sections 2-2-1 and 2-2-2) and label-free quantification methods utilizing software for large scale intensity

comparison and matching of MS peaks obtained by LC-MS analysis. In principle, those methods establish the relative quantification of proteins expressed in cells and tissues or present in multiprotein complexes or subcellular structures; relative quantification monitors changes in protein abundance under several conditions caused by environmental perturbation such as those induced by drug, disease, or other stimuli (see Section 1-3-5). However, those relative quantification methods neither determine the absolute quantity of proteins present in samples nor provide quantitative comparisons among different proteins within a sample. To address this, an absolute quantification (AQUA) strategy is needed for the precise determination of protein abundance (155). Absolute quantification is an unavoidable task in quantitative proteomics and is also required for understanding the dynamic nature of complicated biological networks of cooperative protein interactions or multiprotein complexes (see Sections 1-3-3, 1-3-4, and 1-3-5).

Method Using Stable Isotope-Labeled Reference Peptides

One of the most widely accepted approaches for the determination of the absolute concentration of proteins in a complex mixture involves the use of stable isotope-labeled peptides spiked into the mixture as an internal standard (Fig. 2-18). The isotopically labeled reference peptide is chemically identical to one of the naturally occurring peptides of a given protein except that it has a predetermined higher mass (156). When reference peptides carrying stable isotopes are added at a known concentration, they can be distinguished from the corresponding peptides in a complex mixture by a mass spectrometer, and the signal intensity ratio observed between the known standard and the unknown sample versions of the same molecule provides a measure of the abundance of each species in the original sample.
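The internal-standard principle described above amounts to simple arithmetic, sketched below. The function names and all numerical values are hypothetical. The first function assumes a single spike level and equal response of the labeled and unlabeled peptide; the other two show how a straight calibration line could be fitted by least squares when several known amounts have been measured.

```python
def absolute_amount(endogenous_intensity, reference_intensity, spiked_amount_fmol):
    """Single-point estimate: amount = spiked amount x (endogenous / reference signal)."""
    return spiked_amount_fmol * endogenous_intensity / reference_intensity

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a calibration curve."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def amount_from_curve(response, slope, intercept):
    """Read an unknown amount off the fitted calibration line."""
    return (response - intercept) / slope

# Hypothetical single-point measurement with 100 fmol of reference peptide spiked in.
single_point = absolute_amount(3.0e5, 1.5e5, 100.0)

# Hypothetical calibration series: known amounts (fmol) versus measured signal ratio.
slope, intercept = fit_line([10.0, 50.0, 100.0, 500.0], [0.2, 1.0, 2.0, 10.0])
from_curve = amount_from_curve(4.0, slope, intercept)
```

A real calibration would also need replicate measurements and a check that the unknown falls inside the calibrated range; neither is shown here.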
To measure the quantity of a protein precisely, a calibration–response curve for specific polypeptides from a protein of interest has to be generated by using one or more external reference peptides (Fig. 2-18). The absolute quantification of the given protein is determined from the observed signal response for the specific polypeptide in the sample relative to that generated in the calibration curve. If the absolute quantification of a number of different proteins is to be determined, separate calibration curves are necessary for each specific external reference peptide for each protein (154). The approach for absolute quantification of proteins using synthesized peptides can be summarized as follows: (1) the reference standard is introduced to a complex mixture; (2) the mixture is analyzed using LC-MS (or MS) to measure the corresponding signal intensity for the derivative peptide along with the endogenous peptide; and (3) the intensity signal response is compared with an intensity calibration curve created using the introduced synthetic molecule to determine the amount of endogenous protein in the mixture (157). The same methods can also be applied to the quantitative analysis of PTMs, in which changes in modified states of particular peptides are measured in protein mixtures obtained from multiprotein complexes, subcellular structures, cells, tissues, or organisms. There is a concern, however, about these popular methods because the internal standard used is a different molecule than the analyte at the start of the experiment; more specifically, the internal standard is a peptide and the analyte is a protein prior to cleavage. It is not until after the cleavage process that the stable isotope-labeled

Fig. 2-18. Absolute quantification by MS-based technology using reference peptides. To quantify the absolute amount of a particular peptide in a sample mixture, known amounts of the isotopically labeled peptide are spiked into the mixture as a reference standard and subjected to LC-MS analysis. The amount of the peptide present in the sample mixture is estimated by comparing the MS signal intensity with that of the reference peptide. A typical calibration standard curve is illustrated in the figure. (Provided by Dr. Yamauchi.) (See insert for color representation.)

synthetic peptide has the same physicochemical behavior as the peptide cleaved from the protein. To dispel this concern, a method using a cleavable stable isotope-labeled synthetic peptide, called PC-IDMS (protein cleavage–isotope dilution mass spectrometry), has been proposed; tryptic cleavage sites are incorporated into the synthetic peptide to create an internal standard that has cleavage characteristics more similar to those of the protein being quantified (158). This method was applied to the quantification of a model biomarker, prostate-specific antigen, in serum and proved to be a promising technique for quantifying proteins; however, concerns still exist regarding its sensitivity compared to existing immunoassays as well as the reproducibility of PC-IDMS performed in different matrices (159). Therefore, for the absolute quantification method using isotopically labeled internal standards, it should be remembered that, at the start of the experiment, the analyte is a protein and the internal standard is a peptide; thus, severe quantification errors may result from the selection of unsuitable reference peptides and/or imperfect protein proteolysis (160). This emphasizes the importance of selecting reference peptides that are produced stoichiometrically from proteins by proteolytic cleavage. There have been a number of reports on the quantitative analyses of particular proteins by using synthesized reference peptides; those include the identification

and quantification of apolipoprotein A1 (161), G protein-coupled receptor rhodopsin (162), myoglobin, yeast silent information regulatory (Sir)2 and Sir4 proteins, and Separase (157), and the analysis of post-translational modifications, including phosphorylation of cyclin-dependent kinases (Cdks), Cdk1, Cdk2, and Cdk3 (163) and polyubiquitin chain formation through lysine 48 (155). The absolute quantification method using isotopically labeled reference peptides has also been applied to determine the stoichiometry of ten components of the human U1 small nuclear ribonucleoprotein complex that constitutes a part of the spliceosome (164), and to determine the absolute abundances of 32 proteins present in the postsynaptic density proteome isolated from rat forebrain and cerebellum (165, 166). In principle, if the necessary reference peptides are provided for all proteins, every protein in a complex protein mixture can be absolutely quantified. At the same time, peptides in the mixture can be identified by their mass differences from the reference peptides in an MS spectrum. Taking advantage of this, a proteome-browsing technology has been proposed that would achieve the automation and sample throughput required for undertaking huge population-based clinical studies (167). The proposed proteome-browsing technology may provide full-scale identification and absolute quantification of a proteome by using LC-MS (or just MS) and isotopically labeled peptides as internal standards for all proteins in a proteome without de novo identification using LC-MS/MS database searching.
The proposed technology consists of the following: (1) selection and synthesis of reference peptides labeled with tags of a heavy stable isotope for each protein, protein isoform, or specifically modified form of a protein; (2) addition of precisely measured amounts of these reference peptides as definitive markers to a sample in which the proteins or peptides were labeled with tags of a light stable isotope; (3) MS or LC-MS analysis of the combined peptide sample to generate an ordered peptide array; (4) examination of each array element by a mass spectrometer and generation of two types of signal, one representing the signals of the peptides for which no reference peptide is added, appearing as single peaks, and the other representing the signals for those peptides for which a reference peptide is added, appearing as paired signals with a mass difference that precisely corresponds to the mass differential encoded in the stable isotope tag; and (5) identification of each protein by correlating the position and the accurately measured mass of each isotope–peptide pair in the array and quantification of each peptide by determining the ratio of the size of the signal of a peptide derived from the protein mixture with the signal of the corresponding reference peptide. 
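Steps (4) and (5) can be illustrated with a minimal Python sketch. It assumes singly charged peaks, a hypothetical tag mass differential of 4.0 Da, and a hypothetical look-up table keyed by protein name; none of these values, nor the names `Peak`, `find_pairs`, and `browse`, come from the cited proposal (167), and the retention position that the proposal also uses for identification is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float         # observed m/z (singly charged ions assumed)
    intensity: float  # integrated signal of the peak


def find_pairs(peaks, tag_delta, tol=0.01):
    """Return (light, heavy) peak pairs separated by the tag mass differential."""
    ordered = sorted(peaks, key=lambda p: p.mz)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            if abs((heavy.mz - light.mz) - tag_delta) <= tol:
                pairs.append((light, heavy))
    return pairs


def browse(peaks, lookup, tag_delta, tol=0.01):
    """Identify and quantify proteins from paired signals.

    lookup maps a protein name to (heavy reference m/z, spiked amount in fmol).
    Peaks without a partner (no reference peptide added) are ignored.
    """
    results = {}
    for light, heavy in find_pairs(peaks, tag_delta, tol):
        for protein, (ref_mz, spiked_fmol) in lookup.items():
            if abs(heavy.mz - ref_mz) <= tol:
                # amount = (sample signal / reference signal) x spiked amount
                results[protein] = spiked_fmol * light.intensity / heavy.intensity
    return results


# Hypothetical data: one light/heavy pair and one unpaired peak.
peaks = [Peak(500.00, 2000.0), Peak(504.00, 1000.0), Peak(620.30, 500.0)]
lookup = {"protein_A": (504.00, 10.0)}
amounts = browse(peaks, lookup, tag_delta=4.0)
```

In this toy case the sample peptide gives twice the signal of the 10 fmol reference, so `amounts` reports 20 fmol for the hypothetical `protein_A`, while the unpaired peak at m/z 620.30 is left unassigned.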
Such a proteome-browsing method will allow (1) the unambiguous identification and quantification of each protein by one peptide (the number of peptides that need to be analyzed to identify and quantify the product of every gene approaches the number of genes in a genome); (2) self-explanatory identification and quantification of proteins by correlating the acquired data with a look-up table, rather than by de novo identification; (3) cross-laboratory standardization; (4) the absolute quantification of each protein, making datasets easily comparable; (5) selective interrogation of any subset of proteins, such as those contained in cellular machinery, multiprotein complexes, organelles, subcellular fractions, or differentiated cells; (6) absolute quantification of splice isoforms and differentially modified or processed


proteins by providing appropriate synthesized reference peptides; and (7) economical analysis per assay by using only minuscule (nanogram to subnanogram) amounts of the peptide standards (167). The major drawback of this method is that a reference peptide has to be synthesized for every protein and then spiked into the sample before the absolute quantity of the protein itself can be determined. The reference peptides need to be analyzed efficiently during LC separation and ionization for MS, and the corresponding peptides (to be quantified in the sample) have to be generated quantitatively and reproducibly by protease (e.g., trypsin or Lys-C) digestion of proteins prior to LC-MS (or MS) analysis. These requirements make the selection of peptides to be used as internal standards, which must precede their synthesis, a troublesome task. Despite the tremendous amount of work required to prepare reference peptides, once the synthesized and isotopically labeled peptides prove to be suitable as references for absolute quantitative analysis by MS or LC-MS, the quantification and identification of proteins in a sample can be done just by LC-MS (or MS) analysis and the use of a look-up table that contains the m/z value and the amount of each reference peptide against the expected m/z value of the corresponding peptide in the sample. The technology may allow a browsing-mode analysis of a biological system without expensive instrumentation and expertise in proteomics, in which the biologically possible event is searched to find a proteome that correlates with a particular state or function of cells, tissues, or organisms (167).

Method Without Using Internal Standards

Attempts have also been reported to develop techniques capable of determining the absolute concentration of proteins in complex mixtures from a simple LC-MS analysis without using specific internal standards for each protein.
Even a typical shotgun analysis using LC-MS/MS generates a large list of identified proteins with the help of database searching, and gives information on the hit rank in identification, the probability score, the number of identified peptides per protein, ion counts of identified peptides, LC retention times, and so on. Although those parameters alone cannot be used directly for absolute quantification, some parameters, such as the hit rank, the score, the integrated ion counts, and the number of peptides per protein, are indicators of protein abundance in the analyzed sample (168). The integrated ion count of the peptides identifying each protein, for instance, was used to compare protein expression in different states (169). The number of identified peptides (or other scores such as spectra counts or the total of the peptide probability scores) in shotgun LC-MS analysis has also been used as an index for the quantification of proteins in a complex mixture (14, 170). However, the mass spectrometer has some limitations as an abundance detector partly because of limited linearity to detect ion counts and partly because of background and ionization suppression effects as described in Section 2-2 (39). Therefore, it is necessary at least to normalize these parameters to obtain approximate quantitative information when feasible. For example, the protein abundance index (PAI), which represents the number of peptides identified divided by the number of theoretically observable proteolytic peptides, is superior to the number of


identified peptides as an index for quantification of proteins in a complex mixture (62, 171). This is because the PAI takes account of the fact that, for the same number of molecules, larger proteins and proteins with many peptides in the preferred mass range for mass spectrometry generate more observed peptides. The PAI method is simply based on the idea that as the amount of a protein increases, the number of observed peptides for that protein also increases. However, PAI alone does not necessarily correlate with the abundance of proteins present in a complex protein mixture. More recently, a reliable method has been reported that uses an exponentially modified version of PAI (emPAI) to determine absolute quantity from the number of observed peptides for each protein (62, 171, 172). This index takes account of the fact that the number of peptides detected by MS analysis is influenced not only by protein size but also by the complexity of the peptide mixture; the very large number of peptides in a total cell lysate affects to some extent the random selection for MS/MS events, ion suppression effects, and saturation of the MS analyzer and/or detector. Although the number of peptides detected by MS does not have a linear relationship with the protein amount even in the region where the peak area is linear, it was found to have a linear relationship with the logarithm of the injected amount in some restricted ranges (e.g., from 3 to 500 fmol) (62, 170). Accordingly, emPAI is defined as 10 raised to the power of PAI, minus one ($\mathrm{emPAI} = 10^{\mathrm{PAI}} - 1$), and is easily calculated from the output of database search engines such as Mascot. This relationship is also independent of the MS equipment used. Therefore, it is possible to apply this approach to previously measured or published datasets to add quantitative information without any additional experiment.
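As a small worked illustration of the two indices, the following Python sketch computes PAI and emPAI from peptide counts. The function names and the example counts are hypothetical; only the definitions (PAI as observed over observable peptides, emPAI as 10 to the power of PAI minus one) are taken from the text.

```python
def pai(n_observed: int, n_observable: int) -> float:
    """Protein abundance index: observed peptides / theoretically observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return n_observed / n_observable


def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified PAI: 10 ** PAI - 1."""
    return 10 ** pai(n_observed, n_observable) - 1


# Hypothetical protein: 5 peptides observed out of 10 observable ones.
value = empai(5, 10)
```

Here PAI is 0.5, so emPAI is the square root of 10 minus one, about 2.16. A protein with no observed peptides gets an emPAI of exactly zero, which is the reason for subtracting one.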
Although emPAI gives only rough estimates of the absolute quantities of proteins in a complex mixture, it is especially useful for quantification in cases where isotope-based approaches cannot be applied because the quantitative changes are too large for accurate measurement of ratios, because metabolic labeling is not possible, or because sensitivity constraints do not allow chemical labeling techniques (62). To calculate the emPAI, the PAI value has to be obtained first. For the calculation of the PAI, the number of theoretically observable tryptic peptides is obtained from all theoretical peptides produced by in silico digestion of the corresponding protein by eliminating those whose MW lies beyond the scan range of the mass spectrometer and those whose expected retention time lies beyond the retention range of the reversed-phase chromatography system used. A program to calculate the peptide number using Microsoft Excel is available at xome.hydra.mki.co.jp:8080/bitt/common/Menu. The number of actually observed peptides per protein is obtained by counting unique parent ions, sequences, sequences without partial modification, and the overlap caused by missed tryptic cleavage. These numbers can be exported from Mascot html files to Excel spreadsheets using the “Export All Peptides” function of MSQuant software, which is available at sourceforge.net. Total protein amounts are measured as weight by conventional BCA assay, and the weight fractions of the measured proteins are calculated using the equation

\[
\text{Protein content (weight \%)} = \frac{\mathrm{emPAI} \times M_r}{\sum \left( \mathrm{emPAI} \times M_r \right)} \times 100
\]


where Mr is the molecular weight of the protein, and Σ(emPAI × Mr) is the summation of the emPAI × Mr values over all identified proteins. emPAI gives the protein abundance with deviation percentages similar to or better than those obtained by standard protein staining (by one estimate, the deviation is within 63% of the actual values on average when the method is applied to a complex peptide mixture); thus, emPAI can be a reliable indicator for absolute quantification (62). The emPAI method is certainly useful within a narrow protein concentration range; however, once a higher protein concentration is reached and all the observable peptides have been identified, the relationship deteriorates toward an asymptotic limit (154). In addition, the emPAI method relies on characterizing peptides by data-directed MS/MS and may therefore be applicable mainly to the more abundant proteins present in a mixture.

Another method that provides absolute quantification of proteins from LC-MS data of simple or complex mixtures of tryptic peptides, without requiring an external reference, is based on the discovery that the average MS signal response of the three most intense tryptic peptides per mole of protein is constant (within a coefficient of variation of less than ±10%) (154). This average value is called the universal signal response factor and is given in units of counts/pmol of protein. When the average MS signal responses of the three most intense peptides from each protein are plotted against the calculated protein concentrations, the plot is reported to be linear, with an R² value of over 0.99. This method is very simple and accurate provided that the three most intense peptides can be identified from the spectra obtained by MS or LC-MS analysis of a peptide mixture; thus, it is suited to the quantitative analysis of less complex protein mixtures.
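The arithmetic behind this approach can be sketched in Python as follows. The sketch assumes that the universal signal response factor has already been obtained, here from a hypothetical protein of known amount in the same run; the function names and all intensity values are illustrative and are not taken from reference 154.

```python
def top3_mean(intensities):
    """Average signal of the three most intense peptides of one protein."""
    top = sorted(intensities, reverse=True)[:3]
    if len(top) < 3:
        raise ValueError("at least three peptide intensities are required")
    return sum(top) / 3


def response_factor(calibrant_intensities, calibrant_pmol):
    """Signal response factor in counts per pmol of protein."""
    return top3_mean(calibrant_intensities) / calibrant_pmol


def estimate_pmol(intensities, factor):
    """Estimated amount of a protein from its three most intense peptides."""
    return top3_mean(intensities) / factor


# Hypothetical calibrant: 2 pmol of a protein with these peptide signals.
factor = response_factor([900.0, 600.0, 300.0, 100.0], calibrant_pmol=2.0)
# Hypothetical unknown protein measured in the same run.
amount = estimate_pmol([450.0, 300.0, 150.0, 20.0], factor)
```

With these numbers the factor is 300 counts/pmol and the unknown protein is estimated at 1 pmol. The weaker fourth peptide of each protein does not enter the calculation, which is why the method depends on finding the three most intense peptides reliably.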
When the method is applied to a very complex mixture, it may be difficult to identify the three most intense peptides of each specific protein, because the extremely complex spectra obtained by MS or LC-MS analysis of the mixture contain a huge number of mass peaks generated from many different proteins (154). Each of these absolute quantification methods has its own limitations when applied to complex protein mixtures; however, they can estimate the quantities of proteins using just MS and MS/MS data, without any external reference peptide or isotope labeling, and can give at least the relative abundances of the proteins present in a sample.
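To make the notion of relative abundance concrete, the weight-fraction equation given for emPAI can be evaluated as in the following Python sketch. The protein names, emPAI values, and molecular weights are hypothetical, and `weight_percent` is an illustrative helper rather than part of any published software.

```python
def weight_percent(proteins):
    """Weight fraction (%) of each protein from its emPAI and molecular weight.

    proteins maps a protein name to (emPAI, Mr).
    Implements 100 * emPAI * Mr / sum(emPAI * Mr) over all identified proteins.
    """
    total = sum(empai * mr for empai, mr in proteins.values())
    if total <= 0:
        raise ValueError("sum of emPAI * Mr must be positive")
    return {
        name: 100 * empai * mr / total
        for name, (empai, mr) in proteins.items()
    }


# Hypothetical identifications: name -> (emPAI, Mr in Da).
identified = {
    "protein_A": (1.0, 50000.0),
    "protein_B": (3.0, 50000.0),
}
fractions = weight_percent(identified)
```

For these two equally sized hypothetical proteins the weight fractions are 25% and 75%; multiplying each fraction by the total protein weight measured by BCA assay would then give an approximate weight per protein.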

REFERENCES

1. Delahunty, C., and Yates, J. R. III (2005). Protein identification using 2D-LC-MS/MS. Methods 35:248–255. 2. Neverova, I., and Van Eyk, J. E. (2005). Role of chromatographic techniques in proteomic analysis. J. Chromatogr. B 815:51–63. 3. Takahashi, N., Isobe, T., and Putnam, F. W. (1991). Multidimensional, microscale HPLC technique in protein sequencing. In HPLC of Proteins, Peptides, and Polynucleotides, M. T. W. Hearn (Ed.), Verlag Chemie International, New York, pp. 307–330.


4. Takahashi, N., Ishioka, N., Takahashi, Y., and Putnam, F. W. (1985). Automated tandem high-performance liquid chromatographic system for separation of extremely complex peptide mixtures. J. Chromatogr. 326:407–418. 5. Takahashi, N., Takahashi, Y., Ishioka, N., Blumberg, B. S., and Putnam, F. W. (1986). Application of an automated tandem HPLC system to peptide mapping of genetic variants of human serum albumin. J. Chromatogr. 359:181–191. 6. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R. (1999). Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17:676–682. 7. Issaq, H. J., Chan, K. C., Janini, G. M., Conrads, T. P., and Veenstra, T. D. (2005). Multidimensional separation of peptides for effective proteomic analysis. J. Chromatogr. B. 817:35–47. 8. Isobe, T., Takahashi, N., and Putnam, F. W. (1991). Protein and peptide mapping by two-dimensional HPLC. In HPLC of Peptides and Proteins: Separation, Analysis, and Conformation, R. Hodges and C. Mant (Eds.), CRC Press, Boca Raton, FL, pp. 835–845. 9. Isobe, T., Uchida, K., Taoka, M., Shinkai, F., Manabe, T., and Okuyama, T. (1991). Automated high performance liquid chromatographic system for mapping proteins in highly complex mixtures. J. Chromatogr. 588:115–123. 10. Taoka, M., Isobe, T., Okuyama, T., Watanabe, M., Kondo, H., Yamakawa, Y., Ozawa, F., Hishinuma, F., Kubota, M., Minegishi, A., Song, S. Y., and Yamakuni, T. (1994). Murine cerebellar neurons express a novel gene encoding a protein related to cell cycle control and cell fate determination proteins. J. Biol. Chem. 269:9946–9951. 11. Taoka, M., Yamakuni, T., Song, S. Y., Yamakawa, Y., Seta, K., Okuyama, T., and Isobe, T. (1992). A rat cerebellar protein containing the cdc10/SW16 motif. Eur. J. Biochem. 207:615–620. 12. Isobe, T., Yamauchi, Y., Taoka, M., and Takahashi, N. (2002). 
Automated two-dimensional high performance liquid chromatography/tandem mass spectrometry for large-scale protein analysis. In Proteins and Proteomics: A Laboratory Manual, R. J. Simpson (Ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 869–876. 13. Washburn, M. P., Wolters, D., and Yates, J. R. III (2001). Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19:242–247. 14. Mawuenyega, K. G., Kaji, H., Yamauchi, Y., Shinkawa, T., Taoka, M., Takahashi, N., and Isobe, T. (2002). Large-scale identification of Caenorhabditis elegans proteins by multidimensional liquid chromatography–tandem mass spectrometry. J. Proteome Res. 2:23–35. 15. Natsume, T., Yamauchi, Y., Nakayama, H., Shinkawa, T., Yanagida, M., Takahashi, N., and Isobe, T. (2002). Direct nano flow liquid chromatography–tandem mass spectrometry system for functional proteomics. Anal. Chem. 74:4725–4733. 16. Yates, J. R. III (1998). Mass spectrometry and the age of the proteomics. J. Mass Spectrom. 33:1–19. 17. MacCoss, M. J., McDonald, W. H., Saraf, A., Sadygov, R., Clark, J. M., Tasto, J. J., Gould, K. L., Wolters, D., Washburn, M., Weiss, A., Clark, J. I., and Yates, J. R. III

(2002). Shotgun identification of protein modifications from protein complexes and lens tissue. Proc. Natl. Acad. Sci. USA. 99(12):7900–7905. 18. Swanson, S. K., and Washburn, M. P. (2005). The continuing evolution of shotgun proteomics. Drug Discovery Today 10:719–725. 19. Zabrouskov, V., Giacomelli, L., van Wijk, K. J., and McLafferty, F. W. (2003). A new approach for plant proteomics: characterization of chloroplast proteins of Arabidopsis thaliana by top down mass spectrometry. Mol. Cell. Proteomics 2(12):1253–1260. 20. Cheeseman, I. M., Anderson, S., Jwa, M., Green, E. M., Kang, J., Yates, J. R. III, Chan, C. S., Drubin, D. G., and Barnes, G. (2002). Phospho-regulation of kinetochore-microtubule attachments by the Aurora kinase Ipl1p. Cell 111:163–172. 21. Schirmer, E. C., Florens, L., Guan, T., Yates, J. R. III, and Gerace, L. (2003). Nuclear membrane proteins with potential disease links found by subtractive proteomics. Science 301:1380–1382. 22. Florens, L., Washburn, M. P., Raines, J. P., Anthony, R. M., Grainger, M., Haynes, J. P., Moch, J. K., Muster, N., Sacci, J. B., Tabb, D. L., Witney, A. A., Wolters, D., Wu, Y., Gardiner, M. J., Holder, A. A., Sinden, R. E., Yates, J. R. III, and Carucci, D. J. (2002). A proteomic view of the Plasmodium falciparum life cycle. Nature 419:520–526. 23. Wu, C. C., MacCoss, M. J., Howell, K. E., and Yates, J. R. III (2003). A method for the comprehensive proteomic analysis of membrane proteins. Nat. Biotechnol. 21(5):532–538. 24. Takahashi, N., Yanagida, M., Fujiyama, S., Hayano, T., and Isobe, T. (2003). Proteomic snapshot analysis of preribosomal ribonucleoprotein complexes formed at various stages of ribosome biogenesis in yeast and mammalian cells. Mass Spectrom. Rev. 22:287–317. 25. Pedrioli, P. G., Eng, J. K., Hubley, R., Vogelzang, M., Deutsch, E. W., Raught, B., Pratt, B., Nilsson, E., Angeletti, R. H., Apweiler, R., Cheung, K., Costello, C. E., Hermjakob, H., Huang, S., Julian, R. K., Kapp, E., McComb, M. E., Oliver, S. G., Omenn, G., Paton, N. W., Simpson, R., Smith, R., Taylor, C. F., Zhu, W., and Aebersold, R. (2004). A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 22:1459–1466. 26. Craig, R., Cortens, J. P., and Beavis, R. C. (2004). Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 3:1234–1242. 27. Craig, R., and Beavis, R. C. (2004). TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20:1466–1467. 28. Fox, S. D., Lempicki, R. A., Hosack, D. A., Baseler, M. W., Kovacs, J. A., Lane, H. C., Veenstra, T. D., and Issaq, H. J. (2003). A comparison of microLC/electrospray ionization-MS and GC/MS for the measurement of stable isotope enrichment from a [2H2]-glucose metabolic probe in T-cell genomic DNA. Anal. Chem. 75(23):6517–6522. 29. McDonald, W. H., Oni, R., Miyamoto, D. T., Mitchinson, T. J., and Yates, J. R. III (2002). Int. J. Mass Spectrom. 219:245–251. 30. Gatlin, C. L., Kleemann, G. R., Hays, L. G., Link, A. J., and Yates, J. R. III (1998). Protein identification at the low femtomole level from silver-stained gels using a new electrospray interface for liquid chromatography–microspray and nanospray mass spectrometry. Anal. Biochem. 263(1):93–101.


31. Davis, M. T., and Lee, T. D. (1998). Rapid protein identification using a microscale electrospray LC/MS system on an ion trap mass spectrometer. J. Am. Soc. Mass Spectrom. 9(3):194–201. 32. Ishihama, Y., Katayama, H., Asakawa, N., and Oda, Y. (2002). Highly robust stainless steel tips as microelectrospray emitters. Rapid Commun. Mass Spectrom. 16(10):913–918. 33. Yamauchi, Y., Taoka, M., Natsume, T., Hayano, T., Takahashi, N., and Isobe, T. (2004). Direct nano-flow-LC-MS/MS for the analysis of large protein complexes. In Experimental Manuals for Proteome Analyses, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 81–91 (in Japanese). 34. Ishihama, Y., Rappsilber, J., Andersen, J. S., and Mann, M. (2002). Microcolumns with self-assembled particle frits for proteomics. J. Chromatogr. A 979(1-2):233–239. 35. Ishihama, Y. (2004). Construction of LC system for nano-LC-MS/MS. In Mass Spectrometry Based Proteomics, Y. Oda and T. Natsume (Eds.), Nakayama Shoten, Tokyo, Japan, pp. 57–71 (in Japanese). 36. Yamauchi, Y., Taoka, M., Natsume, T., Hayano, T., Takahashi, N., and Isobe, T. (2004). Direct nano-flow-LC-MS/MS for the analysis of large protein complexes. In Experimental Manuals for Proteome Analyses, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 81–91 (in Japanese). 37. Shen, Y., Tolic, N., Masselon, C., Pasa-Tolic, L., Camp, D. G. II, Hixson, K. K., Zhao, R., Anderson, G. A., and Smith, R. D. (2004). Ultra-sensitive nano-scale proteomics using high-efficiency on-line MicroSPE-NanoLC-NanoESI MS and MS/MS. Anal. Chem. 76:144–154. 38. Belov, M. E., Anderson, G. A., Wingerd, M. A., Udseth, H. R., Tang, K., Prior, D. C., Swanson, K. R., Buschbach, M. A., Strittmatter, E. F., Moore, R. J., and Smith, R. D. (2004). An automated high performance liquid chromatography–Fourier transform ion cyclotron resonance mass spectrometer for high-throughput proteomics. J. Am. Soc. Mass Spectrom. 15:212–232. 39. Shen, Y., Zhao, R., Berger, S. 
J., Anderson, G. A., Rodriguez, N., and Smith R. D. (2002). High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics. Anal. Chem. 74:4235–4249. 40. Yamauchi, Y., and Kumanotani, J. (1981). Packing of 3-μm particle ODS silicas using hexanol-1-methylenechloride (1:1) as a slurry medium in high-performance liquid chromatography. J. Chromatogr. 210:512–515. 41. Minakuchi, H., Nakanishi, K., Soga, N., Ishizuka, N., and Tanaka, N. (1997). Effect of skeleton size on the performance of octadecylsilylated continuous porous silica columns in reversed-phase liquid chromatography. J. Chromatogr A 762(1-2):135–146. 42. Ishizuka, N., Minakuchi, H., Nakanishi, K., Soga, N., Nagayama, H., Hosoya, K., and Tanaka, N. (2000). Performance of a monolithic silica column in a capillary under pressure-driven and electrodriven conditions. Anal. Chem. 72(6):1275–1280. 43. Balogh, M. P., and Stacey, C. C. (1991). Chromatographic developments in low flow delivery for liquid chromatography–mass spectrometry. J. Chromatogr. 562:73–79. 44. Davis, M. T., Stahl, D. C., Hefta, S. A., and Lee, T. D. (1995). A microscale electrospray interface for on-line, capillary liquid chromatography/tandem mass spectrometry of complex peptide mixtures. Anal. Chem. 67(24):4549–4556.


45. Hayano, T., Yanagida, M., Yamauchi, Y., Shinkawa, T., Isobe, T., and Takahashi, N. (2003). Proteomic analysis of human Nop56p-associated pre-ribosomal ribonucleoprotein complexes: possible link between Nop56p and the nucleolar protein treacle responsible for Treacher Collins syndrome. J. Biol. Chem. 278(36):34309–34319. 46. Yanagida, M., Hayano, T., Yamauchi, Y., Shinkawa, T., Natsume, T., Isobe, T., and Takahashi, N. (2004). Human fibrillarin forms a sub-complex with splicing factor 2 associated p32, protein /arginine methyltransferases, tubulin a3 and b1, which is independent of its association with preribosomal ribonucleoprotein complexes. J. Biol. Chem. 379:1607–1614. 47. Shinkawa, T., Taoka, M., Yamauchi, Y., Ichimura, T., Kaji, H., Takahashi, N., and Isobe, T. (2005). STEM: a software tool for large-scale proteomic data analyses. J. Proteome Res. 4(5):1826–1831. 48. Washburn, M. P., Ulaszek, R., Deciu, C., Schieltz, D. M., and Yates, J. R. III (2002). Analysis of quantitative proteomic data generated via multidimensional protein identification technology. Anal. Chem. 74:1650–1657. 49. Wolters, D. A., Washburn, M. P., and Yates, J. R. III (2001). An automated multidimensional protein identification technology for shotgun proteomics. Anal. Chem. 73:5683– 5690. 50. Meek, J. L. (1980). Prediction of peptide retention times in high-pressure liquid chromatography on the basis of amino acid composition. Proc. Natl. Acad. Sci. USA 77:1632–1636. 51. Sakamoto, Y., Kawakami, N., and Sasagawa, T. (1988). Prediction of peptide retention times. J. Chromatogr. 17:69–79. 52. Strittmatter, E. F., Kangas, L. J., Petritis, K., Mottaz, H. M., Anderson, G. A., Shen, Y., Jacobs, J. M., Camp, D. G. II, and Smith, R. D. (2004). Application of peptide LC retention time information in a discriminant function for peptide identification by tandem mass spectrometry. J. Proteome Res. 3:760–769. 53. Krokhin, O. V., Craig, R., Spicer, V., Ens, W., Standing, K. G., Beavis, R. 
C., and Wilkins, J. A. (2004). An improved model for prediction of retention times of tryptic peptides in ion pair reversed-phase HPLC: its application to protein peptide mapping by offline HPLC-MALDI MS. Mol. Cell. Proteomics 3:908–919. 54. Cargile, B. J., and Stephenson, J. L. Jr. (2004). An alternative to tandem mass spectrometry: isoelectric point and accurate mass for the identification of peptides. Anal. Chem. 76:267–275. 55. Keller, A., Nesvizhskii, A. I., Kolker, E., and Aebersold, R. (2002). Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74:5383–5392. 56. Nesvizhskii, A. I., Keller, A., Kolker, E., and Aebersold, R. (2003). A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75:4646–4658. 57. Sadygov, R. G., Liu, H., and Yates, J. R. (2004). Statistical models for protein validation using tandem mass spectral data and protein amino acid sequence databases. Anal. Chem. 76:1664–1671. 58. Sadygov, R. G., and Yates, J. R. III (2003). A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. Anal. Chem. 75:3792–3798.


59. Tabb, D. L., McDonald, W. H., and Yates, J. R. III (2002). DTASelect and Contrast: tools for assembling and comparing protein identifications from shotgun proteomics. J. Proteome Res. 1:21–26. 60. Li, X. J., Pedrioli, P. G., Eng, J., Martin, D., Yi, E. C., Lee, H., and Aebersold, R. (2004). A tool to visualize and evaluate data obtained by liquid chromatography–electrospray ionization–mass spectrometry. Anal. Chem. 76:3856–3860. 61. Radulovic, D., Jelveh, S., Ryu, S., Hamilton, T. G., Foss, E., Mao, Y., and Emili, A. (2004). Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography–tandem mass spectrometry. Mol. Cell. Proteomics 3:984–997. 62. Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J., and Mann, M. (2005). Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 4(9):1265–1272. 63. Liu, H., Berger, S. J., Chakraborty, A. B., Plumb, R. S., and Cohen, S. A. (2002). Multidimensional chromatography coupled to electrospray ionization time-of-flight mass spectrometry as an alternative to two dimensional gels for the identification and analysis of complex mixtures of intact proteins. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 782(1-2):267–289. 64. Taoka, M., Yamauchi, Y., Sinkawa, T., Kaji, H., Motohashi, W., Nakayama, H., Takahashi, N., and Isobe, T. (2004). Only a small subset of the horizontally transferred chromosomal genes in Escherichia coli are translated into proteins. Mol. Cell. Proteomics 3:780–787. 65. Nagano, K., Taoka, M., Yamauchi, Y., Itagaki, C., Shinkawa, T., Nunomura, K., Okamura, N., Takahashi, N., Izumi, T., and Isobe, T. (2005). Large-scale identification of proteins expressed in mouse embryonic stem cells. Proteomics 5:1346–1361. 66. Yoshimura, Y., Yamauchi, Y., Shinkawa, T., Taoka, M., Donai, H., Takahashi, N., Isobe, T., and Yamauchi, T. 
(2004). Molecular constituents of the postsynaptic density fraction revealed by proteomic analysis using multidimensional liquid chromatography–tandem mass spectrometry. J. Neurochem. 88:759–769. 67. Nunomura, K., Itagaki, C., Nagano, K., Taoka, M., Okamura, N., Sugano, S., Takahashi, N., Izumi, T., and Isobe, T. (2005). Cell-surface labeling and mass spectrometry reveals diversity of cell-surface markers and signaling molecules expressed in undifferentiated mouse embryonic stem cells. Mol. Cell. Proteomics 4(12):1968–1976. 68. Kaji, H., Saito, H., Yamauchi, Y., Sinkawa, T., Taoka, M., Hirabayashi, J., Kasai, K., Takahashi, N., and Isobe, T. (2003). Lectin affinity capture, isotope-coded tagging and mass spectrometry to identify N-linked glycoprotein. Nat. Biotechnol. 21(6):667–672. 69. Davis, M. T., Beierle, J., Bures, E. T., McGinley, M. D., Mort, J., Robinson, J. H., Spahr, C. S., Yu, W., Luethy, R., and Patterson, S. D. (2001). Automated LC-LCMS-MS platform using binary ion-exchange and gradient reversed-phase chromatography for improved proteomic analyses. J. Chromatogr. B. Biomed Sci. Appl. 752:281–291. 70. Blattner, F. R., Plunkett, G. 3rd, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K., Mayhew, G. F., Gregor, J., Davis, N. W., Kirkpatrick, H. A., Goeden, M. A., Rose, D. J., Mau, B., and Shao, Y. (1997). The complete genome sequence of Escherichia coli K-12. Science 277:1453–1474.



3 DYNAMICS OF FUNCTIONAL CELLULAR MACHINERY: FROM STATICS TO DYNAMICS IN PROTEOMIC BIOLOGY

3-1 DYNAMIC ANALYSIS OF CELLULAR FUNCTION

The cell undertakes various tasks, such as transcription, RNA processing, RNA export, translation, transport, degradation, and secretion, and exhibits its specific function as a consequence of integrating all of those tasks. Each task is carried out by highly specialized, elaborate multiprotein complexes that form the basic functional modules of the cell's molecular machinery. Some machinery comprises tens of multiprotein complexes during functional operation, which results in an ordered assembly of hundreds of protein components. How these functional complexes form and assemble at the right time and in the right place, and how those tasks are integrated into the functional machinery and eventually into the architectural framework of the cell, are fundamental, unanswered questions in biology (1). This dynamic nature of cellular function suggests that it is not just the result of a gathering of genes and proteins but is performed as a system, in which all molecular events are integrated specifically in a series of time-dependent actions of the cell. Thus, while cataloging protein expression and mapping networks of protein interactions and multiprotein complexes continue to be important, it is also necessary to understand cell function in terms of the dynamics of multiprotein complexes/machinery and eventually of the cell's structure as a system. In other words, cell function cannot be fully understood merely by drawing diagrams of the interconnections of cell constituents. To quote the systems biologist Kitano: “Although such [a] diagram represents an important first step, it is analogous to a static roadmap, whereas what
we really seek to know are the traffic patterns, why such traffic patterns emerge, and how we can control them. Identifying all the genes and proteins in an organism is like listing all the parts of an airplane …. We need to know how these parts are assembled to form the structure of the airplane. This is analogous to drawing an exhaustive diagram of gene-regulatory networks and their biochemical interactions. Such diagrams provide limited knowledge of how changes to one part of a system may affect other parts, but to understand how a particular system functions, we must first examine how the individual components dynamically interact during operation” (2).

As described in the Introduction of this book, one of the goals of proteomics is a genome-wide survey of protein dynamics, to provide a bird’s-eye view of the protein society of the cell in the context of how cells function dynamically at the molecular level. In fact, there have been a number of attempts to analyze perturbed molecular networks of various cell systems using proteomic and/or transcriptomic technologies from a systems biology perspective, including the analysis of 20 systematic perturbations of the yeast galactose-utilization pathway (3), disease-perturbed networks in prostate cancer cells (4), cell differentiation/apoptosis of promyelocytic leukemia induced by retinoic acid/arsenic trioxide (5), Myc-induced apoptosis (6), and perturbation of PTMs of human histones (7). In addition, quantitative MS coupled with SILAC in vivo labeling was applied to the analysis of a protease-substrate network responsive to a specific environmental stress that caused DNA damage, and revealed a possible new damage response (8).

The current technologies obviously still need further improvement for analyzing whole cell systems (9); however, they are sufficiently capable of addressing the dynamic aspects of more focused, specific cellular functions that are regulated at the level of protein complexes, cellular machinery, or even an entire subcellular structure (10). This chapter describes strategies for analyzing the dynamics of “multiprotein complexes,” or “machinery,” that are at the forefront of a wide range of biological functions in the cell.

3-1-1 Strategy for Dynamic Analysis of Cellular Machinery (Multiprotein Complexes)

The time scale of cellular processes involving proteins typically varies from milliseconds to days (11). Knowing the time scale of a cellular process is essential for studying its dynamics, because the time scale defines the interval and frequency of data collection required to capture the features of that process. For example, the shortest time scale characteristic of cellular reactions involving proteins is probably that of a few PTMs and protein–protein interactions, which occur in milliseconds to seconds (12–14). The time scales of more complex regulatory processes (where a number of PTMs and protein–protein interactions are involved) may vary from milliseconds to hours. For instance, electrophysiological reactions take milliseconds to generate an action potential in neurons (12), and the intracellular signaling process initiated by growth factors
probably takes from minutes to hours (15). The longer time frames required for such regulatory processes are due to rearrangements of interactions between proteins, or between proteins and other components of the cell, typically through PTMs (13, 16–21). In addition, the time scales of cellular function depend on the number of steps involved in the regulatory cascades, the rate of protein movement and of the transport systems that deliver proteins to the intracellular sites where modification reactions and complex formation take place, and the compartmentalization that creates structural barriers to free protein diffusion. Basic cellular tasks (for example, transcription, translation, and ribosome biogenesis, which involve many dozens to hundreds of proteins and other components) take several minutes to complete, which is much faster than the time required for experimental operations using current proteomics technologies (11). Strictly speaking, dynamic analysis of those cellular processes would require experimental datasets to be collected on single protein molecules at intervals of a few milliseconds over a period of minutes to hours or days.

In this context, an analysis of the dynamic organization of nuclear pore complexes, composed of about 30 different polypeptides, may provide a strategy for the dynamic analysis of cellular machinery. The analysis can determine the dissociation rates of protein subunits from complexes that have a traceable localization inside single living cells (22). It shows that the components of nuclear pore complexes exhibit a wide range of residence times covering five orders of magnitude, from seconds to days, and that the central parts of the nuclear pore complexes are very stable, consistent with a function as a structural scaffold, whereas more peripheral components exhibit more dynamic behavior, suggesting adaptor as well as regulatory functions.
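The relation between a dissociation rate and a residence time can be made concrete with a short Python sketch. It assumes simple first-order (single-exponential) dissociation, for which the mean residence time is 1/k_off. The function names and the rate constants are hypothetical, chosen only to span the seconds-to-days range mentioned above; they are not values measured in the cited study (22).

```python
import math

def mean_residence_time(k_off):
    """Mean residence time (s) for first-order dissociation with rate k_off (1/s)."""
    return 1.0 / k_off

def fraction_bound(k_off, t):
    """Fraction of subunits still bound after t seconds, assuming exp(-k_off * t)."""
    return math.exp(-k_off * t)

# Hypothetical rate constants (1/s); illustrative only, not measured values.
examples = {
    "peripheral component (fast exchange)": 1.0e-1,
    "adaptor-like component (intermediate)": 1.0e-3,
    "scaffold component (slow exchange)": 1.0e-6,
}

for name, k_off in examples.items():
    tau = mean_residence_time(k_off)
    bound_1h = fraction_bound(k_off, 3600.0)
    print(f"{name}: residence time ~{tau:.0f} s, fraction bound after 1 h = {bound_1h:.3f}")
```

Under this assumption, a component with a residence time of seconds is fully exchanged within an hour, whereas a scaffold-like component with a residence time of days is essentially unchanged over the same period, which is why a single sampling interval cannot capture both behaviors.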
Although the nuclear pore complex example may provide a possible strategy for characterizing the dynamic behavior of a cellular machine in live cells, this kind of analysis is not yet feasible for many other cellular machines (multiprotein complexes), because techniques for studying single molecules are not suitable for large-scale analyses, and proteomics is still too laborious to analyze hundreds of experimental time points in a single analysis (11, 23). Therefore, an important issue in the study of dynamics using proteomics technologies is the definition of the time intervals and the number of points for data collection. In most cases, the frequency and time intervals for data collection are defined from empirical knowledge of the cellular process under study (11): how long it takes to observe the changes of interest, and the dynamics of those changes (e.g., features of cell division, cell death, differentiation, RNA synthesis and processing, and ribosome biogenesis). Balanced against the workload, this knowledge defines the number of points for data collection (11). In any case, however, the number of data points collected by typical proteomic approaches is two to five, because of the capabilities and limitations of the proteomics technologies available today, which deal with hundreds to thousands of proteins in an analysis. Thus, longer time intervals are usually set for cellular processes with longer time scales, and shorter time intervals for processes with shorter time scales. The cellular process reconstructed from those data may be likened to a four-frame cartoon. The data at each collection point are a snapshot representing which proteins are present and in what amounts (24). More specifically, current proteomic approaches
typically take two to five snapshots as the studied cellular process progresses and reconstruct the dynamics as discrete or jumpy movements. As the number of data collection points increases and the time interval becomes shorter, the reconstructed dynamic movement of the cellular process becomes smoother.

Approach for Collecting Time-Dependent Data

In dynamic analysis, proteomic approaches have to distinguish between different states of the studied cellular process by identifying and quantifying the differences in a time-dependent manner. Various methods of quantitative proteomics, such as in vivo or in vitro isotope labeling and label-free or absolute quantification (described in Section 2-2), can be used for the dynamic analysis of cellular function (also see Section 1-3-5). However, because of the restricted analytical power of the quantitative proteomics technologies available today, their usefulness cannot be demonstrated unless appropriate protein samples are obtained using suitable sample preparation methodology. One of the most successful examples, in terms of the number of data points used for quantitative comparison and the scale of the object prepared, is the dynamic analysis of the nucleolar proteome during environmental perturbations with several inhibitors, such as actinomycin D (25) (see Section 3-3-1). This example used quantitative MS technology combined with in vivo labeling and subcellular fractionation, and showed that currently available proteomics methodology is capable of analyzing the dynamic changes of almost all protein constituents of even a whole subcellular structure.
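The earlier point that more snapshots yield a smoother reconstruction can be sketched numerically. The Python example below is a minimal illustration under stated assumptions: a hypothetical protein whose relative abundance decays exponentially after a perturbation, a 180-min observation window, and evenly spaced snapshots joined by linear interpolation. The curve, the half-life, and the function names are assumptions made for illustration and do not come from the cited studies.

```python
import math

def true_abundance(t_min, half_life_min=30.0):
    """Hypothetical relative abundance at t_min minutes after a perturbation."""
    return math.exp(-math.log(2) * t_min / half_life_min)

def interpolate(times, values, t):
    """Linear interpolation between neighbouring snapshots."""
    points = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the sampled window")

def max_error(n_points, window_min=180.0):
    """Worst-case gap between the true curve and an n_points-snapshot reconstruction."""
    times = [window_min * i / (n_points - 1) for i in range(n_points)]
    values = [true_abundance(t) for t in times]
    grid = [window_min * i / 1000 for i in range(1001)]
    return max(abs(true_abundance(t) - interpolate(times, values, t)) for t in grid)

# Compare reconstructions built from few and from many snapshots.
for n in (2, 3, 5, 20):
    print(f"{n:2d} snapshots: max reconstruction error = {max_error(n):.3f}")
```

With two to five snapshots the reconstruction misses most of the early, fast change in this hypothetical curve, whereas a denser series tracks it closely. In practice the spacing would be chosen from prior knowledge of the process rather than set uniformly.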
In the nucleolus study, nucleoli were prepared by sucrose density gradient ultracentrifugation from human HeLa cells isotopically labeled in vivo with SILAC (Arg0, Arg6, and Arg10) at each of five time points, and the dynamic changes of 489 of 692 nucleolar protein constituents were monitored by quantitative MS analysis during the 3 h following an environmental perturbation (i.e., one set of experiments took data at 0 min, 20 min, and 80 min, another set at 0 min, 40 min, and 180 min, and the two sets were combined to give five data points). The dynamic behavior of the nucleolus during various environmental perturbations was reconstructed by combining the time-dependent changes of all protein components measured (see details in Section 3-3-1).

Other examples have used MS methodologies combined not only with in vivo labeling but also with in vitro labeling, and have analyzed the dynamic aspects of multiprotein complexes (cellular machinery) prepared by affinity-based purification methods (see Sections 1-3-1, 1-3-4, and 1-3-5). Among those examples, that of the RNA polymerase (RNAP) II preinitiation complex provides a commonly applicable strategy for the dynamic analysis of functional cellular machinery (multiprotein complexes) as well as for reliable identification of the true components using a crude affinity preparation (26, 27). In this example, proteins isolated from the specific and control samples by a single-step promoter DNA affinity procedure are labeled differently with the heavy and normal stable isotope forms of the ICAT reagent (Fig. 3-1A; see Chapter 2, Section 2-2-1, for the ICAT method). The proteins from the two purifications are then combined, digested with protease, and analyzed by the shotgun method using LC-MS/MS. The relative abundance of an identified ICAT-labeled peptide pair is determined from the ratios of the peptides’ signal

[Fig. 3-1A, schematic: TBP(143N) nuclear extract, activator, and a DNA affinity probe (activator-binding site, TATA-binding site, PstI site) are incubated with or without TATA-binding protein (TBP); affinity-purified samples 1 and 2 are eluted by PstI cleavage, denatured and reduced, labeled with d0 (light, normal stable isotope) or d8 (heavy stable isotope) ICAT reagent, combined, and proteolyzed; peptides are fractionated by (1) cation-exchange LC, (2) avidin affinity LC, and (3) RP-LC with online ESI-MS/MS; proteins are identified by MS/MS and relative peptide levels are quantified from peak ratios, in order to distinguish specific complex components from copurifying proteins and to determine relative abundance and dynamic changes in the composition of complexes isolated from cells in different states.]

Fig. 3-1. Proteomics approach to the dynamic analysis of affinity-purified cellular machinery (multiprotein complexes). (A) Quantitative analysis of RNAP II preinitiation complex isolated by single-step promoter DNA affinity chromatography. [Adapted by permission from Macmillan Publishers Ltd.; Ranish et al., Nat. Genet. (Ref. 26) (2003).] In this technology, a DNA affinity probe carrying an activator-binding site, a TATA-binding site, and a PstI cleavage site is mixed with nuclear extract and activator in the presence/absence of TATA-binding protein to generate complete/incomplete preinitiation complexes. The protein components of each complex are eluted by cleaving the DNA probe with PstI, differentially labeled with “heavy” and “light” ICAT reagent, respectively, and combined. After protease digestion, the digest is prefractionated by two-step chromatography using a strong cation-exchange column and avidin affinity beads to increase coverage by MS analysis. The peptide mixture is then subjected to LC-MS/MS to analyze quantitatively the difference in protein components of the complete/incomplete preinitiation complex (26). (B) Reverse-tagging approach to isolate protein complexes. In this technology, a target protein complex is first isolated by use of a first bait protein with an epitope tag. After proteomic characterization, another protein component identified in the complex is selected as a second bait to fish protein complexes from the cell. (See pages 173–174 for more detailed information.) [From Takahashi et al., Mass Spectrom. Rev. 22:287–317 (2003). Copyright John Wiley & Sons, Ltd. Reproduced with permission.] (C) Assembly snapshot analysis of cellular machinery (multiprotein complex). The approach can be used to isolate protein complexes (machinery) that differ somewhat from one another, as sequential snapshots of the biochemical composition of cellular machinery (multiprotein complexes) related to one another or formed in a series of processes in a specific biological event. (From Ref. 24.) (See insert for color representation.)

intensities in MS spectra (Fig. 3-1A). Authentic components of the complex are identified by their increased abundance in the specific purification compared with the nonspecific purification. The technique allowed comprehensive analysis of fully assembled RNAP II preinitiation complexes; this provided a detailed description of


the partially purified core RNAP II complex and led to the detection of potential new components of this already extensively studied complex (27). This approach can also be applied to the comparison of protein complexes prepared from different cell states, and can thus reveal how the protein components of a cellular machinery or multiprotein complex of interest change along with the cell state. One example is the analysis of protein complexes associated with the yeast transcription factor STE12 (26). In this example, yeast mating type a (MATa) cells, which carry the DNA fragment encoding protein-A-tagged STE12 at its native locus, are grown in the presence or absence of α-mating factor and subjected to immunopurification. The purified proteins associated with STE12 are then labeled with either isotopically heavy or light ICAT reagent and prepared for MS analysis. The α-mating factor causes MATa cells to arrest in the G1 phase of the cell cycle and to express mating-specific genes in preparation for conjugation with cells of the opposite mating type. STE12 mediates transcriptional induction of those mating-specific genes in response to α-mating factor. The quantitative comparison revealed a coordinated increase in a minimal set of transcriptional machinery associated with STE12, as well as dynamic changes in other constituents of the affinity-purified STE12-associated transcriptional machinery, after α-mating factor treatment (26). This example gave relative quantification of proteins at known concentrations with an accuracy in the range of 15% in mixtures, a linearity over a tenfold range of relative abundance, and an error rate of less than 20% in the same protein


among several different peptides, which are comparable to those measured by Western blotting. Another example using a similar approach is the analysis of the transcription factor NF-E2p18/MafK-associated complex, which changes dynamically during erythroid differentiation (28).

One concern about isolating cellular machinery (multiprotein complexes) by affinity purification for dynamic analysis is that proteins associated with the machinery only transiently or with low affinity may not be isolated at all, or may be isolated in extremely low yield. Although quantitative comparison with an appropriate control sample may obviate the need for extensive purification, which is often accompanied by sample loss and loss of weakly interacting proteins (26), MS-based analysis often fails to identify such proteins. To overcome this problem, a method called quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX) has been developed (29). This method uses in vivo formaldehyde cross-linking to freeze both stable and transient interactions occurring in intact yeast cells prior to lysis. To isolate cross-linked protein complexes with high purification efficiency under fully denaturing conditions, a tandem affinity tag consisting of a hexahistidine sequence and an in vivo biotinylation signal (30) was developed and used for affinity-based purification in conjunction with SILAC in vivo labeling for quantitative MS-based analysis (29). This method was applied to capture and identify the complete composition of the yeast 26S proteasome complex and led to the identification of a total of 64 potential proteasome-interacting proteins, of which 42 are novel interactions. The method may be applicable to the dynamic analysis of other cellular machinery (multiprotein complexes).
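The relative quantification used in the ICAT examples of this section, in which authentic components are recognized by their higher abundance in the specific purification than in the control, can be sketched as follows. This is a minimal illustration and not the published analysis pipeline: the protein names and intensities are hypothetical, the use of the median over the peptides of a protein is an assumption, and so is the twofold cutoff.

```python
from statistics import median


def protein_ratios(peptide_pairs):
    """Return {protein: median specific/control ratio} over its peptide pairs.

    peptide_pairs: iterable of (protein, intensity_specific, intensity_control).
    """
    by_protein = {}
    for protein, specific, control in peptide_pairs:
        by_protein.setdefault(protein, []).append(specific / control)
    return {protein: median(values) for protein, values in by_protein.items()}


def specific_components(ratios, min_ratio=2.0):
    """Proteins enriched in the specific purification (cutoff is an assumption)."""
    return sorted(p for p, r in ratios.items() if r >= min_ratio)


# Hypothetical signal intensities of labeled peptide pairs.
pairs = [
    ("component_A", 900.0, 100.0),
    ("component_A", 800.0, 100.0),
    ("background_B", 210.0, 200.0),
    ("background_B", 190.0, 200.0),
]

ratios = protein_ratios(pairs)
enriched = specific_components(ratios)
print(ratios)
print(enriched)
```

A copurifying background protein gives a ratio near 1 because it is recovered equally in both purifications, whereas a true component of the complex gives a ratio well above 1.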
Approach Utilizing Stage-Specific Protein Association

Another strategy has been reported for analyzing the dynamic aspect of cellular machinery (multiprotein complexes) (31–34). Although this approach may not be dynamic analysis in the strict sense of the term, it can reconstruct the dynamic aspects of cellular machinery (multiprotein complexes) in a way that differs from time-dependent dynamic analysis. The idea behind this approach is that some of the associated proteins in one initially isolated complex can also be present in other precursor complexes, which would allow the purification of cellular machinery (multiprotein complexes) from different stages of a cellular process. In other words, a given multiprotein complex does not necessarily have an invariable composition, nor are all protein components uniquely associated with that specific complex (31–34). Therefore, with several distinct tagged proteins as entry points to purify a complex, it is possible to isolate functional cellular machinery (multiprotein complexes) formed at specific stages of a cellular process and to identify not only the core components but also the more dynamic and regulatory components that may be present differentially in the studied machinery (multiprotein complex). This approach was developed specifically in machinery [complex (interaction)] proteomics (see Sections 1-3-4 and 1-3-5) and is called reverse-tagging methodology (Fig. 3-1B). If the isolated cellular machinery (multiprotein complex) has features that characterize involvement in a specific stage in a sequential time


order of a studied process, or if a protein of interest associated with machinery (multiprotein complex) at a known stage in a sequential cellular process is studied, the dynamic cellular process can be reconstructed by analyzing it. The reverse tagging approach allows one to isolate protein complexes (machinery) with some differences in a series of snapshots and thus reconstruct sequential snapshots of the biochemical composition of cellular machinery (multiprotein complexes) related to one another or formed in a series of processes in a specific biological event (Fig. 3-1C) (24, 32, 33). The reverse-tagging methodology can be used either in a subtractive approach that distinguishes qualitatively some different states by enumerating differences in protein composition among the different states of protein complexes or in a quantitative approach that allows dynamic characterization of functional protein complexes or machinery in operation during a certain ongoing cellular process by quantifying their differences. These approaches enable us to analyze cellular machinery or multiprotein complexes in operation, which are required for the precise organization of molecules in time and space and for their assembly in a particular order, and whose compositions vary accordingly. This strategy has been adapted to the analyses of ribosome biogenesis (see Section 3-3-1) and mRNA splicing (see Section 1-3-5) (33–35). To choose between these approaches for dynamic analysis, a particular cellular machine (multiprotein complex) of interest must be prepared first. Sample preparation is one of the most critical steps for reconstructing the dynamic process of a particular cellular machine (multiprotein complex) of interest without any misleading interpretation, because current MS-based technologies are sensitive enough to identify even nonspecifically associated proteins in trace contamination. 
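The subtractive use of reverse tagging described above, which enumerates differences in protein composition among complexes isolated with different baits, amounts to set arithmetic. The following sketch assumes that each snapshot has already been reduced to a set of identified proteins; the bait labels and protein names are hypothetical.

```python
def compare_snapshots(snapshots):
    """Split snapshot compositions into shared core and snapshot-specific parts.

    snapshots: {label: set of protein names}, one entry per isolated complex.
    Returns (core, variable), where core is found in every snapshot and
    variable maps each label to the proteins outside the core.
    """
    core = set.intersection(*snapshots.values())
    variable = {label: members - core for label, members in snapshots.items()}
    return core, variable


# Hypothetical compositions of complexes isolated with three tagged baits.
snapshots = {
    "bait 1 (early)": {"P1", "P2", "P3", "P4"},
    "bait 2 (middle)": {"P1", "P2", "P4", "P5"},
    "bait 3 (late)": {"P1", "P2", "P5", "P6"},
}

core, variable = compare_snapshots(snapshots)
print("core:", sorted(core))
for label, proteins in variable.items():
    print(label, sorted(proteins))
```

Proteins shared by consecutive snapshots (P4 and P5 here) suggest the order in which the complexes follow one another; the quantitative variant of the approach would replace set membership with measured abundance ratios.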
3-1-2 Methods for the Isolation of a Cellular Machine/Multiprotein Complex

Affinity-Tag Purification

Affinity purification methods or pull-down methods, coupled with ultracentrifugation in some cases, are used to isolate protein machinery or complexes and are well suited for machinery proteomics or complex (interaction) proteomics under near-physiological conditions, as described in Section 1-3-4 (36, 37). In most of those methods, a specific “tag,” such as an epitope tag or tandem affinity purification (TAP) tag (38), is attached to a target gene to produce a DNA construct encoding a “bait” protein (Tables 3-1 and 3-2). The gene that encodes the bait protein is transfected into cultured cells to allow the tagged protein to be expressed in the cell and to form physiological complexes with its interacting proteins. The entire multiprotein complex can be isolated by the use of an immobilized antibody against the tag. The isolation of protein complexes with epitope tags [such as HA, c-myc, His, FLAG, and VSV-G, Table 3-2 (39, 40)] is based on the highly selective binding between the epitope tag and anti-epitope antibody, and on its specific dissociation with an epitope peptide (Fig. 3-2A). The method can drastically reduce protein contamination in the isolated complexes when compared with conventional affinity purification methods such as those with protein-immobilized or antibody-immobilized beads (41). The protein components in the purified complex


TABLE 3-1. General Approaches for Preparation of Protein Complexes/Machinery
1. Obtain ORFs (genes encoding bait protein).
2. Select epitope tag/affinity tag (epitope tag, TAP tag, GFP tag, etc.).
3. Construct expression vector (gap repair-mediated recombination, ligation-independent cloning, recombination by Gateway cloning system, etc.).
4. Transfect the vector into cells (transient expression, stable expression, inducible expression).
5. Confirm protein expression (Western blotting, immunocytochemical staining).
6. Isolate protein complex (affinity chromatography, two-step affinity chromatography, immunoprecipitation, ultracentrifugation).
7. Identify protein constituents (gel-based separation MS, shotgun method).

are subsequently analyzed by MS-based proteomics technologies (Table 3-1) (32, 38). In some cases, the epitope-tag method results in several specific contaminants in the isolated complexes, probably due to the presence of proteins with a sequence similar to that of the epitope used (42), and raises some concern about overexpression of the bait protein (described later).

Two-Step Affinity Purification

In the TAP method, the bait protein is fused with the TAP tag, which consists of the calmodulin-binding peptide and the IgG-binding unit of protein A of Staphylococcus aureus connected by a tobacco etch virus (TEV) protease cleavage sequence (Glu-Asn-Leu-Tyr-Phe-Gln↓Gly, GAGAATTTGTATTTTCAGGGT). The complex that interacts with the bait protein is recovered from the cell extract by affinity selection on an IgG matrix, released from the IgG matrix by TEV protease (Invitrogen), and trapped again on calmodulin-coated beads in the presence of calcium ion (Fig. 3-2B) (38). The bait-associated complex is then recovered by eluting with EGTA; thus, the TAP method involves two different affinity purification steps and results in highly purified protein complexes. The protein components in the highly purified complex can then be identified by a gel-based MS method (Section 1-2-3) and/or by the shotgun method (Section 2-1). There are other two-step purification methods similar to the TAP method, such as those using the TEV protease recognition sequence between the FLAG tag and myc tag (39, 43) or between the FLAG tag and an in vivo biotinylation tag (24, 30).

To validate the components identified by MS-based analysis of isolated cellular machinery or multiprotein complexes as real entities of the complexes present in vivo, other purification methods, including immunoprecipitation using an antibody against the protein of interest and/or the pull-down method using the prepurified protein immobilized on beads, are performed (described in later sections of this chapter).
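The TEV protease cleavage sequence quoted above for the TAP tag is given both as amino acids and as DNA, and the two can be checked against each other. The sketch below translates the quoted DNA with a codon table restricted to the seven codons that occur in it (an illustrative subset, not a full genetic code) and splits the peptide at the Gln↓Gly bond; the function names are hypothetical.

```python
# Subset of the standard genetic code: only the codons present in the TEV site.
CODONS = {
    "GAG": "E",  # Glu
    "AAT": "N",  # Asn
    "TTG": "L",  # Leu
    "TAT": "Y",  # Tyr
    "TTT": "F",  # Phe
    "CAG": "Q",  # Gln
    "GGT": "G",  # Gly
}


def translate(dna):
    """Translate an in-frame DNA string using the codon subset above."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def tev_cleave(peptide, site="ENLYFQG"):
    """Split a peptide between Gln and Gly of the first TEV site, if present."""
    start = peptide.find(site)
    if start == -1:
        return None
    cut = start + len(site) - 1  # cleavage falls after Gln, before Gly
    return peptide[:cut], peptide[cut:]


tev_dna = "GAGAATTTGTATTTTCAGGGT"
tev_peptide = translate(tev_dna)
print(tev_peptide)              # one-letter form of Glu-Asn-Leu-Tyr-Phe-Gln-Gly
print(tev_cleave(tev_peptide))
```

The translated sequence agrees with the amino acid sequence given in the text, so the quoted DNA does encode the TEV site in frame.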
These approaches try to isolate endogenous complexes containing the protein of interest in vivo and are used in some cases in conjunction with another round of MS-based identification and/or with Western blotting to confirm the first protein identification by MS-based methods (36). Because MS-based technologies are highly sensitive, they easily identify nonspecifically associated proteins or contaminated proteins even in trace amounts in the preparation of multiprotein complexes, and may result

TABLE 3-2. Tags Used for Affinity Purification^a (39, 40)

Tag | Structure/Protein | Attached Site | Applications | Commercial Availability^b

Epitope Tags
FLAG | DYKDDDDK | N, C, I | IP, IC | MA, PE, MA-beads
myc | EQKLISEEDL | N, C, I | IP, IC | MA, PE, MA-beads
HA | YPYDVPDYA | N, C, I | IP, IC | MA, PE, MA-beads
VSV-G | YTDIEMNRLGK | N, C, I | IP, IC | MA, PE, MA-beads
T7 | MASMTGGQQMG | N, C, I | IP, IC | MA, PE, MA-beads
V5 | GKPIPNPLLGLDST | N, C | IP, IC | MA, PE, MA-beads
His | HHHHHH | N, C | IP, PD | MA, PE, Ni²⁺-chelate beads

Other Tags
GST | Glutathione-S-transferase, thrombin/factor Xa/PreScission cleavage sequence | Mainly N | PD | MA, glutathione-beads
MBP | Maltose binding protein | N, C | PD | MA, maltose-beads
GFP | Green fluorescent protein | N, C | IC, TL | MA, MA-beads
Thioredoxin | Thioredoxin | N, C | PD | PA, PA-beads
HSV | Herpes simplex virus protein | Mainly N | PD | PA, PA-beads
Biotinylation signal | C-terminal 524–595 of the Klebsiella pneumoniae oxalacetate decarboxylase α subunit | Mainly N | PD, IC | Streptavidin, streptavidin-beads
GAL4 | GAL4 DNA binding domain | Mainly N | YH2 | PA
VP16 | VP16 activation domain | Mainly N | YH3 | PA

Hybrid Tags
TAP | Protein A (IgG binding domain)-TEV cleavage sequence-CalM-binding peptide | Mainly C | IP | IgG-beads, CalM-beads, TEV protease
CHH | CalM binding peptide-His-HA | Mainly N | IP | CalM-beads, MA-beads for His and HA
MEF | myc-TEV cleavage sequence-FLAG | Mainly N | IP | MA-beads for myc and FLAG, TEV protease

a Abbreviations: N, N-terminus; C, C-terminus; I, internal region; IP, immunoprecipitation; IC, immunocytochemistry; PD, pull-down analysis; MA, monoclonal antibody; PA, polyclonal antibody; PE, epitope peptide; TL, time-lapse analysis; YH2, yeast two-hybrid analysis; YH3, yeast three-hybrid analysis.
b Available from Roche Diagnostics, Santa Cruz Biotechnology, Sigma, and/or Amersham Bioscience.

in the misidentification of those as constituents of a studied protein complex. Therefore, it is necessary to take special care to ensure the purity of multiprotein complex or machinery (see Section 1-3-4). Especially, when an epitope-tagged protein is overexpressed, it may associate nonspecifically with unwanted proteins that are not an intrinsic or physiological partner of the protein used as bait because of the unphysiologically high concentration of the protein expressed. In addition, the overexpressed protein has to compete with the


Fig. 3-2. Schematic representation of affinity purification methods. (A) Affinity purification of protein complexes by using an epitope tag. In machinery (complex) proteomics (see also Chapter 1, Section 1-3-4), a bait protein fused with an epitope tag is first expressed within a cultured cell by transfection of its cDNA construct, and the protein incorporated into the target complex is pulled down by using an anti-epitope antibody immobilized on insoluble beads. After washing the beads with an appropriate buffer solution, the target complex is recovered by elution with a buffer containing an epitope peptide. Typical epitope tags are shown in Table 3-2 with their amino acid sequences. (B) Affinity purification of protein complexes by using a tandem affinity purification (TAP) tag. The TAP tag consists of epitope tags that allow multiple purification steps of target protein complexes. The most widely used TAP tag, shown in this figure, is composed of two epitope tags that provide the protein A- and calmodulin-binding sites, which are connected in tandem through a short stretch of cleavage sequence recognized by TEV protease. A target protein complex incorporating the TAP-tagged bait protein is first purified on immunoglobulin (Ig) G-immobilized beads, released from the beads specifically by TEV protease-mediated proteolysis, and finally purified on calmodulin beads by use of the calcium-dependent interaction between calmodulin and the calmodulin-binding epitope tag. This multistep purification removes nonspecifically bound proteins and ensures the purity of the target complex. (C) Double-tagging approach to purify protein complexes. In this technology, two different bait proteins are selected from a single target complex and are expressed with different epitope tags. The complex is first purified by use of one of those tags and then purified by use of the other, thereby eliminating nontarget complexes that carry neither or only one of the bait proteins. [From Takahashi et al., Mass Spectrom. Rev. 22:287–317 (2003). Copyright John Wiley & Sons Ltd. Reproduced with permission.]


corresponding intrinsic protein in the interaction with the partner proteins, and may cause unknown regulatory effects on the cell. Therefore, the expression level of the epitope-tagged protein should be kept as low as possible or equal to the endogenous level. One way to avoid such unwanted effects is to use an inducible expression system coupled with knockdown of the mRNA of the intrinsic protein by RNA interference (44). Ideally, epitope-tagged bait proteins should be expressed endogenously from their natural chromosomal locations whenever possible. Such a method was used successfully in the systematic analyses of protein expression (see Section 1-2-3), protein localization (see Section 1-3-3), and the protein-complex network (see Section 1-3-4) of yeast cells (http://www.yeastgenome.org/) (45–48). In those studies, the epitope-tagged protein ORF is constructed so that it is expressed from its native promoter at its endogenous chromosomal location and responds to normal regulatory circuitry; thus, the protein complex formed is expected to be equivalent to its endogenous counterpart. Unfortunately, this approach is not always applicable to higher organisms, such as mammals.

Double-Tagging Method

In addition to nonspecific contamination, isolation of protein machinery or multiprotein complexes must take into account the heterogeneity of the complexes isolated with a single bait protein, which can form various protein complexes with different protein compositions. Therefore, affinity isolation is often followed by a purification step, such as size exclusion chromatography or sucrose-gradient ultracentrifugation, to isolate a single particle species of the affinity-purified machine or protein complex in terms of size (49, 50). Machinery [complex (interaction)] proteomics (see Section 1-3-4) offers an additional approach to deal with this: the double-tagging approach, which relies on the simultaneous epitope tagging of two components (Fig. 3-2C) (32).
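The selection logic of the double-tagging approach (Fig. 3-2C) can be sketched as two successive filters over a heterogeneous population of complexes. This is only a schematic model under the assumption that each complex can be represented as a set of its protein members; the bait and protein names are hypothetical.

```python
def sequential_purification(complexes, first_bait, second_bait):
    """Keep only complexes that survive both affinity steps.

    Step 1 retains complexes containing the first tagged bait; step 2 retains,
    from those, the complexes that also contain the second tagged bait.
    """
    after_first = [c for c in complexes if first_bait in c]
    return [c for c in after_first if second_bait in c]


# Hypothetical population of complexes present in a cell extract.
complexes = [
    frozenset({"BaitA", "BaitB", "X", "Y"}),  # target complex with both baits
    frozenset({"BaitA", "Z"}),                # contains only the first bait
    frozenset({"BaitB", "W"}),                # contains only the second bait
    frozenset({"V", "W"}),                    # contains neither bait
]

purified = sequential_purification(complexes, "BaitA", "BaitB")
print([sorted(c) for c in purified])
```

A single-bait purification with BaitA alone would also have recovered the second complex in the list, which is the heterogeneity that the second step removes.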
The approach was used for isolating the large U3 snoRNP complex involved in ribosome biogenesis (51). In this approach, two different tags, such as the TAP tag and the FLAG tag, are fused to two different proteins, which are coexpressed in the cell so that the two modified proteins form physical complexes with other proteins. The first isolation step immobilizes a mixture of protein complexes that contain, for instance, the TAP-tagged protein through the protein-A portion of the TAP tag; the complexes are then released by TEV protease. In the second step, only the protein complexes that contain the other tag, for example, the FLAG-tagged protein, are trapped on an anti-FLAG antibody column and released with an excess of FLAG peptide. This affinity purification method with two distinct bait proteins selects only the protein complexes that contain both bait proteins; thus, one can eliminate other protein complexes that are formed by interactions with only one of the bait proteins (32).

Affinity Chromatography with an Antibody- or a Protein-Immobilized Column

Because it is quite difficult to express an epitope-tagged protein ORF from its endogenous chromosomal location under normal regulatory circuitry in cells from higher organisms, such as mammalian cells, exogenous expression of tagged proteins is often used to isolate protein complexes. However, some


difficulties in expressing tagged bait proteins are observed with considerable frequency, especially in mammalian cells; that is, some bait proteins are expressed at levels too low to recover their associated complexes, for unknown reasons, while expression of others is toxic to the cell (32, 38). In addition, there is always some concern about the disadvantages of the tag-expression method (e.g., protein overexpression may not be possible for heteromeric complexes of unknown composition or may lead to the assembly of the overexpressed protein into nonphysiological complexes). These concerns must be considered for each individual case (32, 38). Therefore, in some cases, it is necessary to use approaches other than the affinity-tag expression method for the successful isolation of protein complexes.

Immunoaffinity Purification Using Antibody-Fixed Beads

To overcome the above difficulties and also to solve the problems associated with overexpression, an antibody against an intrinsic protein can be used as bait to isolate its associated protein complex if an appropriate antibody is available (Fig. 3-3A) (52–55). However, using an antibody to pull down its binding protein from cell extracts inevitably leads to contamination by proteins that associate with the antibody itself, which has a high molecular weight of more than 150 kDa and carries carbohydrate moieties (56). In addition, another protein, protein G or protein A, is often placed between the antibody and the immobilizing bead to obtain maximum binding of the antibody to its target protein by keeping the antigen-binding site of the antibody free on the bead (Table 3-3). Protein G or protein A binds to the Fc region in the C-terminal half of the antibody molecule, but not to the Fab region in the N-terminal half, which carries the antigen-binding site; it thus fixes the orientation of the antibody and preserves its antigen-binding capability even when the antibody is bound to protein A- or protein G-immobilized beads (Fig. 3-3A).
This protein G or protein A also contributes to unwanted contamination. Pretreatment of protein samples or cell extracts with bead and nonspecific antibody-fixed protein G (or protein A)-immobilized beads before the affinity purification using antibody-immobilized beads drastically removes those contaminations (Fig. 3-3A). Although protein complex pull-down can be eluted from antibody-immobilized beads typically at acidic and high salt conditions, this elution causes dissociation of the antibody from protein G- or protein A-immobilized beads. This can be avoided by linking covalently between antibody and protein G or protein A with a cross-linking

TABLE 3-3. Compatibility of Protein G and Protein A to Antibody Class and Species Raised

Protein | Monoclonal Antibody | Polyclonal Antibody
Protein G Sepharose | Mouse IgG1, rat | Mouse, rat, sheep, horse, donkey, cow, goat
Protein A Sepharose | Mouse IgG2a, IgG2b, IgG3 | Rabbit, human, pig, guinea pig, dog, cat


Fig. 3-3. Isolation of cellular machinery (protein complex) with antibody-fixed beads. (From Ref. 55.) (A) A schematic representation of antibody-fixed affinity bead preparation and immunoaffinity purification. (See details in the text.) (B) Confirmation of cross-linking between anti-RecQ5 antibody and protein G-immobilized Sepharose beads by SDS-PAGE. Check-sample 1, before cross-linking; check-sample 2, after cross-linking. (C) Removal of nonspecifically associated proteins by preincubation of cell extract with Sepharose beads (check-sample 3) and unspecified monoclonal IgG1-fixed protein G–Sepharose beads (check-sample 4). After pretreatment with unspecified monoclonal IgG1-fixed protein G–Sepharose beads, the unspecified monoclonal IgG1-fixed protein G–Sepharose beads no longer bind to any proteins in the cell extract (check-sample 5); however, the anti-RecQ5 antibody-fixed protein G–Sepharose beads associate specifically with a number of proteins including RecQ5 (check-sample 6). (D) RecQ5-associated proteins are eluted with the acidic buffers indicated. S.B indicates that SDS-sample buffer was added in the indicated acidic buffer to elute RecQ5-associated proteins completely from the beads.

reagent, such as dimethyl pimelimidate (DMP) before mixing the antibody-fixed beads with protein samples or cell extracts (52, 55). Then, MS-based protein identification methods determine qualitatively or quantitatively the protein constituents of the protein complex eluted from the antibody-fixed beads. This approach allows the isolation of intrinsic protein complexes associated with a protein of


interest without any cellular perturbation due to exogenous protein expression in the cell. This approach certainly requires careful and laborious preparation of both the antibody-fixed beads and the protein samples; however, it is the most reliable method for isolating a protein complex (machinery) as it naturally occurs in the cell, provided an appropriate antibody is available. This approach may also be used to validate results obtained by using the affinity-tag approach.

▶ Experimental Example 3-1 Isolation of RecQ RNA helicase-associated complex using a monoclonal antibody against RecQ protein (55).

MATERIALS
• Human kidney cell line, 293EBNA cells (Invitrogen, Groningen, The Netherlands).
• Dulbecco’s modified Eagle’s medium (DMEM) (Sigma-Aldrich Chemical, Steinheim, Germany).
• Mouse monoclonal anti-RecQ5 antibody (IgG1) (57).
• Unspecified monoclonal IgG1 (Sigma-Aldrich Chemical, Steinheim, Germany).
• Protein G–Sepharose 4 Fast Flow (Amersham Bioscience).
• PBS (phosphate-buffered saline, pH 7.4); 8.0 g NaCl, 2.9 g Na2HPO4·12H2O, 0.2 g KCl, and KH2PO4 dissolved in 1 liter of water and adjusted to pH 7.4.


• 0.15 M sodium borate buffer pH 9.0; 3.02 g sodium borate and 0.5 mL IGEPAL CA-630 [(octylphenoxy)polyethoxyethanol] (Sigma), dissolved in 100 mL of water.
• DMP: dimethyl pimelimidate (Pierce).
• 0.2 M ethanolamine (pH 8.0); 1.22 g ethanolamine (Wako, Tokyo, Japan) dissolved in 100 mL of water.
• Lysis buffer; 50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 0.5% IGEPAL CA-630 containing 1 mM phenylmethylsulfonyl fluoride (Wako, Tokyo, Japan), 2 μg/mL aprotinin (Wako), 2 μg/mL pepstatin A (Peptide Institute, Osaka, Japan), and 2 μg/mL leupeptin (Peptide Institute).
• SDS sample buffer; 0.5 M Tris-HCl pH 6.8, 2 mL (0.1 M); 10% SDS, 4 mL (4%); β-mercaptoethanol, 1.2 mL (12%); glycerol, 2 mL (20%); water, 0.8 mL; and bromophenol blue, several drops.
• 0.1 M glycine buffer; 0.1 M glycine adjusted to pH 1.0, 1.5, 2.0, 2.5, and 3.0 with HCl.
• 1 M Tris-HCl, pH 9.0.

APPARATUS AND APPARATUS SETUP
• MALDI-ToF mass spectrometer Voyager DE-STR (PE Biosystems, Foster City, CA, USA). MALDI-ToF/MS is operated (see Section 2-1-2, Experimental Example 2-4) (1) in positive-ionization mode with reflectron optics, (2) under delayed extraction conditions in reflectron mode, and (3) with delay time of 190 ns and grid voltage at 66–70% of full acceleration voltage (20–25 kV). For linear mode experiments, the delay time is 100 ns and the grid voltage is 93.4% of the acceleration voltage. (4) The database is searched for proteins with a mass range of 0–30 kDa or 10–100 kDa for smaller proteins (apparent Mr > 20 kDa by SDS-PAGE), and 50–200 kDa or 100–300 kDa for larger proteins, with constraint on human origin.
• Equipment setup for 1D-RPLC-MS/MS; the same as that used in Experimental Example 2-1 (see Section 2-1-1).
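The quantities in the materials list can be checked with simple concentration arithmetic. The sketch below recomputes the molarity of the ethanolamine solution and of the fully specified PBS components from the listed masses, and the final concentrations in the SDS sample buffer from the listed volumes. The molar masses are standard values supplied here as assumptions (they are not given in the text), and components whose amount or chemical form is not specified in the list are left out.

```python
# Molar masses in g/mol: standard values, assumed here rather than taken from the text.
MOLAR_MASS = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "Na2HPO4.12H2O": 358.14,
    "ethanolamine": 61.08,
}


def molarity(grams, compound, volume_l):
    """Molar concentration obtained by dissolving `grams` in `volume_l` liters."""
    return grams / MOLAR_MASS[compound] / volume_l


# 0.2 M ethanolamine: 1.22 g in 100 mL.
ethanolamine_m = molarity(1.22, "ethanolamine", 0.1)

# PBS components with stated masses, per 1 liter, converted to mM.
pbs_mm = {
    "NaCl": 1000 * molarity(8.0, "NaCl", 1.0),
    "Na2HPO4.12H2O": 1000 * molarity(2.9, "Na2HPO4.12H2O", 1.0),
    "KCl": 1000 * molarity(0.2, "KCl", 1.0),
}

# SDS sample buffer: listed volumes (mL) and the resulting final concentrations.
volumes_ml = {"tris_0.5M": 2.0, "sds_10pct": 4.0, "bme": 1.2, "glycerol": 2.0, "water": 0.8}
total_ml = sum(volumes_ml.values())
final = {
    "Tris (M)": 0.5 * volumes_ml["tris_0.5M"] / total_ml,
    "SDS (%)": 10.0 * volumes_ml["sds_10pct"] / total_ml,
    "beta-mercaptoethanol (%)": 100.0 * volumes_ml["bme"] / total_ml,
    "glycerol (%)": 100.0 * volumes_ml["glycerol"] / total_ml,
}

print(f"ethanolamine: {ethanolamine_m:.3f} M")
print({name: round(value, 1) for name, value in pbs_mm.items()})
print(f"total volume: {total_ml:.1f} mL", {k: round(v, 2) for k, v in final.items()})
```

The listed volumes of the SDS sample buffer add up to 10 mL, so the final concentrations given in parentheses in the recipe (0.1 M, 4%, 12%, 20%) follow directly from the stock concentrations.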

PROCEDURE

Preparation of RecQ5-Specific Antibody-Fixed Protein G Beads
1. Mix RecQ5 monoclonal antibody (IgG1; 2 mg) with 1 mL (wet volume) of protein G–Sepharose 4 Fast Flow in 10 mL PBS and incubate at room temperature for 1 h with gentle mixing.
2. Centrifuge the mixture at 2000 rpm for 5 min at 4 °C, remove the supernatant, and wash the beads twice with 10 mL of 0.15 M sodium borate buffer (pH 9.0).


3. Collect the beads by centrifugation at 3000g for 5 min, and resuspend in 10 mL of the borate buffer (keep 10 μL as check-sample 1).
4. Add DMP to the protein G–bound anti-RecQ5 antibody bead suspension at a final concentration of 20 mM, and incubate for 30 min at room temperature with gentle mixing to cross-link protein G and the anti-RecQ5 antibody covalently.
5. Centrifuge the reaction suspension at 2000 rpm for 5 min at 4 °C, remove the supernatant, and wash with 5 mL of 0.2 M ethanolamine (pH 8.0).
6. After removal of the wash supernatant, add another 5 mL of 0.2 M ethanolamine (pH 8.0), and incubate for 2 h at room temperature with gentle rotation to stop the cross-linking reaction.
7. Centrifuge at 2000 rpm for 5 min at 4 °C, remove the supernatant, and store in PBS containing 0.05% sodium azide until use (check-sample 2).

Preparation of Unspecified Monoclonal IgG1-Fixed Protein G Beads
8. Mix unspecified monoclonal IgG1 (IgG1; 2 mg) with 1 mL (wet volume) of protein G–Sepharose 4 Fast Flow in 10 mL PBS and incubate at room temperature for 1 h with gentle mixing.
9. Repeat steps 2–7.

Confirmation of Antibody Immobilization on Sepharose Beads
10. Add 10 μL of SDS-PAGE sample buffer to 10 μL each of check-samples 1 and 2, and boil for 5 min at 100 °C.
11. Analyze by SDS-PAGE with a 12% polyacrylamide gel. For check-sample 1, both heavy (50 kDa) and light (25 kDa) chains can be detected if the binding of antibody to protein G- or protein A-immobilized Sepharose beads is successful, while only light chains can be detected for check-sample 2 after a successful cross-linking reaction (Fig. 3-3B).

Preparation of Cell Extract for Immunoaffinity Purification
12. Remove the culture medium from each of seven 90-mm dishes of human 293EBNA cells grown to confluence, gently add 10 mL of PBS to each dish, and remove the PBS.
13. Add 10 mL of PBS to each dish, harvest the confluent cells from the seven 90-mm dishes using a cell scraper, and transfer the harvested cells to a centrifugation tube.
14. Centrifuge at 1000 rpm for 5 min at 4 °C, and remove the supernatant. Repeat this step twice.
15. Add 7 mL of lysis buffer to the precipitated cells, and incubate for 30 min at 4 °C.
16. Centrifuge at 15,000 rpm for 30 min at 4 °C to remove insoluble residue and collect the soluble cell lysate into a centrifugation tube.

17. Mix the soluble cell lysate with 0.7 mL of Sepharose CL-6B beads suspended in an equal volume of lysis buffer (v/v) for 1 h at 4 °C with gentle rotation.
18. Centrifuge the mixture at 1000 rpm for 5 min at 4 °C, and transfer the supernatant to a new centrifugation tube. Add SDS-sample buffer to the remaining Sepharose CL-6B and collect the solubilized solution (check-sample 3).
19. Add 0.1 mL of unspecified monoclonal IgG1-fixed protein G–Sepharose beads to the supernatant, and incubate for 2 h at 4 °C with gentle rotation.
20. Centrifuge the mixture at 1000 rpm for 5 min at 4 °C, and transfer the supernatant to a new centrifugation tube. Use the supernatant as cell extract for further immunopurification. Add SDS-sample buffer to the remaining unspecified monoclonal IgG1-fixed protein G–Sepharose beads and collect the solubilized solution (check-sample 4). [To check the removal of proteins that might associate nonspecifically with the RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads, the cell extract was mixed with new unspecified monoclonal IgG1-fixed protein G–Sepharose beads. After removal of the supernatant, SDS-sample buffer was added to the remaining beads and the solubilized solution was collected (check-sample 5).]
Immunoprecipitation Using Antibody-Fixed Protein G–Sepharose Beads
21. Add 20 μL of RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads to 1 mL of the cell extract prepared in step 20, and incubate for 4 h or overnight at 4 °C with gentle rotation.
22. Centrifuge at 1000 rpm for 5 min at 4 °C, and remove the supernatant carefully.
23. Add 1 mL of lysis buffer to the RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads, mix well, and centrifuge at 1000 rpm for 2 min at 4 °C.
24. Remove the supernatant, and repeat step 23 four more times. Take an aliquot of the remaining beads, add SDS-sample buffer, and collect the solubilized solution (check-sample 6).
25. Wash the RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads with 1 mL of 50 mM Tris-HCl (pH 8.0) containing 150 mM NaCl to remove the detergent (IGEPAL CA-630) thoroughly.
Elution of the RecQ5-Associated Protein Complex with Acidic Buffer
26. Add 20 μL of 0.1 M glycine buffer at pH 1.0, 1.5, 2.0, 2.5, or 3.0 to the RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads prepared in step 25, and incubate for 1 h at 4 °C with occasional mixing.
27. Centrifuge at 1000 rpm for 5 min at 4 °C, transfer the supernatant to a new Eppendorf tube, and neutralize it by adding several microliters of 1 M Tris-HCl (pH 9.0). Add SDS-sample buffer to the remaining RecQ5-specific monoclonal antibody-fixed protein G–Sepharose beads, and collect the solubilized solution to check for proteins that are not eluted by the acidic buffer.
PROTEIN IDENTIFICATION BY MS-BASED METHODS
PMF Method Using MALDI-ToF MS
28. Perform in-gel protease digestion by following Protocol 1.1 (see Section 1-2-3, Fig. 1-4B).
29. Mix peptides purified with ZipTip C18 with an equal volume of 10 mg/mL α-cyano-4-hydroxycinnamic acid in 50% acetonitrile, 0.1% trifluoroacetic acid, apply 0.5 μL aliquots onto a target disk, and allow to air dry.
30. Set the target disk in the MALDI-ToF mass spectrometer and acquire the MS spectrum.
31. Use the database-fitting program MS-Fit, available at the University of California at San Francisco Web site (http://prospector.ucsf.edu/ucsfhtml3.4/msfit.htm), to interpret MS spectra of protein digests. The following criteria are used for positive identification of proteins:
• A minimum of five measured peptide masses must match tryptic peptide masses calculated for an individual protein in the database, with <50 ppm average deviation in mass between measured and calculated values.
• The peptides identified by these matches must provide at least 15% sequence coverage of the identified protein. Incomplete tryptic cleavage and peptide modifications that may alter the peptide masses, such as oxidized methionine or carbamidomethyl cysteine, are calculated for the putatively identified protein and compared with the measured masses. The modified peptides identified in the search are added to the list to increase the number of matching peptides and sequence coverage.
• The protein with the highest-scoring PMF match in the database search is retrieved as the identified protein.
SHOTGUN METHOD USING 1D-RPLC-MS/MS
Protease Digestion in Solution
32. Precipitate 0.2 μg of the isolated RecQ5-associated complex using 20 μL of a methanol and chloroform mixture (1:1 v/v).
33. Centrifuge at 15,000 rpm for 1 min to collect the precipitate, and remove the supernatant.
34. After vacuum drying, digest the precipitate overnight at 37 °C with 5 μL of Achromobacter protease I (40 pM, substrate-to-enzyme ratio = 50:1) dissolved in Tris buffer (50 mM Tris-HCl, 6 M urea, 0.005% n-octylglucopyranoside, pH 9.0).

LC-MS/MS Analysis
35. Analyze the peptide mixture using a DNLC system connected to an ESI-MS/MS spectrometer and identify proteins as described in Section 2-1-1, Experimental Example 2-1.
RESULT
The first pretreatment of the cell extract showed that Sepharose beads trapped a large number of proteins (Fig. 3-3C, check-sample 3), indicating that this step is very effective in removing potential contaminants that would otherwise be recovered in the subsequent immunopurification. The second pretreatment showed that a number of proteins were still adsorbed on unspecified monoclonal IgG1-fixed protein G–Sepharose beads (Fig. 3-3C, check-sample 4), and that it removed potential contaminants from the cell extract almost completely (Fig. 3-3C, check-sample 5). These pretreatments ensure that the proteins immunoprecipitated with RecQ5-specific antibody-fixed protein G beads were, at the least, specifically associated with the anti-RecQ5 antibody (Fig. 3-3C, check-sample 6). The results in turn indicate that, without these pretreatments of the cell extract, immunopurification with RecQ5-specific antibody-fixed protein G beads would be contaminated by a large number of proteins. The recovery of the RecQ5-associated complex from the antibody-fixed beads was examined by varying the pH of the elution buffer from 1.0 to 3.0 in increments of 0.5. The highest recovery was obtained by elution at pH 1.5 (Fig. 3-3D). Compared with elution using SDS-sample buffer containing a reducing reagent such as DTT, acidic elution avoided the release of a large amount of light chains originating from the anti-RecQ5 antibody; it therefore purified the RecQ5-associated complex further and increased the efficiency of protein identification, especially by subsequent shotgun analysis (Fig. 3-3D). The PMF and shotgun analyses identified about 100 proteins in the RecQ5-associated complex.
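The PMF acceptance criteria listed in step 31 lend themselves to a simple programmatic check. The following is a minimal sketch and not the MS-Fit implementation: the `PeptideMatch` class, the function names, and the use of residue positions to compute coverage are illustrative assumptions. Only the three thresholds (five peptides, <50 ppm average deviation, 15% coverage) are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the procedure (step 31).
MIN_MATCHES = 5        # minimum number of matching peptide masses
PPM_LIMIT = 50.0       # average mass deviation must be below this value
MIN_COVERAGE = 0.15    # minimum fraction of the protein sequence covered


@dataclass
class PeptideMatch:
    """Hypothetical record of one measured mass matched to a calculated peptide."""
    measured_mass: float
    calculated_mass: float
    start: int  # first residue of the peptide in the protein (1-based, inclusive)
    end: int    # last residue of the peptide in the protein (inclusive)


def ppm_error(measured: float, calculated: float) -> float:
    """Absolute mass deviation in parts per million."""
    return abs(measured - calculated) / calculated * 1e6


def sequence_coverage(matches: List[PeptideMatch], protein_length: int) -> float:
    """Fraction of residues covered by at least one matched peptide."""
    covered = set()
    for m in matches:
        covered.update(range(m.start, m.end + 1))
    return len(covered) / protein_length


def passes_pmf_criteria(matches: List[PeptideMatch], protein_length: int) -> bool:
    """Apply the three acceptance criteria to one candidate protein."""
    if len(matches) < MIN_MATCHES:
        return False
    mean_ppm = sum(ppm_error(m.measured_mass, m.calculated_mass)
                   for m in matches) / len(matches)
    if mean_ppm >= PPM_LIMIT:
        return False
    return sequence_coverage(matches, protein_length) >= MIN_COVERAGE
```

In practice the candidate with the highest search score among those passing these filters would be reported, as the last criterion in the text states; the scoring itself is left to the search program and is not modeled here.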
Pull-Down Purification Using Immobilized Protein Beads
Prepurified protein can be used to pull down its associated protein complex from a cell or tissue extract. This method is based on the principle that the prepurified protein used as bait can replace the corresponding intrinsic protein present in the protein complex through affinity exchange between the endogenous and prepurified bait proteins. Although the prepurified bait protein must be used in excess over the endogenous protein, this approach can complement affinity-tag expression or immunopurification methods, especially when the bait protein is expressed, for unknown reasons, in too small an amount in the cell expression system to recover its associated complex, when expression of the bait protein is toxic to the cell, or when an appropriate antibody is not available. In the pull-down approach, a prepurified bait protein is immobilized either by covalent linking directly to matrix beads, such as NHS- or BrCN-activated Sepharose beads (Amersham Bioscience), or, if an affinity-tagged bait protein is used, by noncovalent binding between the affinity tag and its binding partner (e.g., ligand, binding domain; Table 3-2) that is preimmobilized on the matrix beads. Affinity beads prepared by the former method generally have a reduced binding capacity and efficiency, because random covalent linking of the bait protein molecules to the bead matrix can mask the binding site of the bait protein. In addition, specific dissociation of the protein complex from the immobilized bait protein is not possible unless a specific inhibitor or antagonist of the interaction between the bait protein and the bound protein complex is available. The latter approach, in contrast, relies on the highly selective binding between the epitope or affinity tag and the anti-epitope antibody or binding partner, respectively, and on specific dissociation with an epitope peptide, antagonist, or other interaction disrupter, which is available in most cases. The latter approach therefore generally offers high binding capacity, specificity, and recovery, and can drastically reduce protein contamination in the isolated complexes compared with the former approach. However, the extra affinity tag may itself bind additional proteins, and a tag with a high molecular weight carries a high risk of unwanted contamination. To overcome this drawback, a protease-specific recognition sequence can be introduced between the affinity tag and the bait protein, in a way similar to the TAP tag, so that the protein complex associated with the bait protein can be released specifically from the beads by the action of the protease.
Various expression systems that produce such epitope-fused (Table 3-2) or affinity-tag-fused proteins with a protease recognition sequence are commercially available (e.g., pGEX-4T has the CTCGTTCCGCGTGGATCT sequence, which encodes Leu-Val-Pro-Arg-Gly-Ser for thrombin cleavage between GST and the bait protein; pGEX-3X or 5X encodes the Factor Xa recognition sequence, Ile-Glu-Gly-Arg↓; and pGEX-6P encodes the PreScission recognition sequence, Leu-Glu-Val-Leu-Phe-Gln↓Gly-Pro; Amersham Bioscience). Among those expression systems, a bacterial or in vitro protein expression system is widely used to produce affinity-tagged bait proteins; for example, an E. coli expression system is the most commonly used to produce protein fused to a glutathione-S-transferase (GST) tag, maltose-binding protein, or thioredoxin, because the fusion improves the efficiency and yield of protein expression. With such expression systems, one can use commercially available affinity beads for prepurification. As an experimental example, we describe a pull-down purification using GST with a thrombin cleavage site, incorporated into human Parvulin (hParvulin), which belongs to the third family of peptidyl prolyl cis–trans isomerases that exhibit an enzymatic activity of interconverting the cis–trans conformation of the prolyl peptide bond but have no known function (58, 59). In this example, thrombin cleavage is used to elute the protein complex gently from the glutathione beads without unwanted contamination by GST-associated proteins and/or glutathione-bead-bound proteins (Fig. 3-4A). The example shows that the approach can isolate a protein complex even when the exogenously expressed protein behaves differently from its endogenous counterpart.
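The stated correspondence between the pGEX-4T nucleotide sequence and the thrombin recognition peptide can be checked by translating the codons. The sketch below uses the standard genetic code; the function names are illustrative. The tandem-repeat helper simply concatenates the 18-nucleotide unit, which is an assumption about how additional cleavage sites might be built and not a description of any actual vector.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter codes, for comparison with the text.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# Sequence quoted in the text for the pGEX-4T thrombin site.
THROMBIN_SITE_DNA = "CTCGTTCCGCGTGGATCT"


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def to_three_letter(peptide: str) -> str:
    """Render a one-letter peptide as hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER[aa] for aa in peptide)


def tandem_sites(n_repeats: int) -> str:
    """Hypothetical helper: n in-frame copies of the 18-nt thrombin site."""
    return THROMBIN_SITE_DNA * n_repeats
```

Because the unit is 18 nucleotides long (six codons), any whole number of repeats keeps the downstream bait protein in frame, which is why the repeat-insertion remedy does not require a frame correction.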

Fig. 3-4. Pull-down isolation of protein complex using GST-bait protein-GST-Sepharose and thrombin cleavage. (A) Schematic representation of isolation of GST-associated protein complex by affinity pull-down and thrombin cleavage elution. (See details in the text.) (Adapted from Ref. 58.) (B) Time course of hParvulin release from hParvulin-fixed GST-Sepharose beads by thrombin cleavage. Molecular weight markers (MW, kDa) at left side, incubation time (h) at upper side, and proteins (GST-hParvulin, hParvulin, and GST) at right side are indicated. (From Ref. 58.) (C) Time course of Pin1 release from Pin1-fixed GST-Sepharose beads by thrombin cleavage. Time course of Pin1 release from GST-Pin1-fixed GST-Sepharose is shown in five lanes at left side. Pin1-Thro-GST indicates that GST-fused Pin1 protein with two repeats of thrombin recognition sequence was cleaved with thrombin for the time indicated. O/N, overnight. (From Ref. 58.) (D) Specific isolation of hParvulin-associated complex by thrombin cleavage method. Lane 1, proteins associated with GST were eluted with SDS-containing buffer; lane 2, proteins associated with GST-hParvulin were eluted with SDS-containing buffer; lane 3, proteins associated with GST were eluted by thrombin cleavage method; lane 4, proteins associated with GST-hParvulin were eluted by thrombin cleavage method. (From Ref. 58.) (E) The presence of hParvulin-associated complex in the nucleolus. The cytosol fraction was used for pull-down analysis with GST (lane 1) and GST-hParvulin (lane 2). The nuclear fraction was used for pull-down analysis with GST (lane 3) and hParvulin (lane 4). (From Ref. 58.)

Experimental Example 3-2 Isolation of protein complex using GST–hParvulin–glutathione–Sepharose and thrombin cleavage method (58, 59).

MATERIALS
• Bacto Tryptone and Bacto Yeast Extract (DIFCO, Detroit, MI, USA).
• Escherichia coli BL21 (DE3).
• Expression vector pGEX-4T-1 (Amersham Bioscience).
• IPTG (isopropyl β-D-thiogalactopyranoside).
• Mouse fibroblast cell line L929 cells, human kidney cell line 293EBNA, and human HeLa cells (Invitrogen, Groningen, The Netherlands).
• RPMI 1640 medium (Nissui Pharmaceutical Co., Ltd., Tokyo, Japan).
• DMEM (Dulbecco’s modified Eagle’s medium, Sigma-Aldrich Chemical, Steinheim, Germany).
• DAPI (4′,6-diamidino-2-phenylindole, Sigma-Aldrich Chemical, Steinheim, Germany).
• Anti-FLAG monoclonal antibody (Sigma-Aldrich Chemical, Steinheim, Germany).
• Glutathione–Sepharose 4B, Sepharose CL-6B (Amersham Pharmacia Biotech AB, Uppsala, Sweden).
• Thrombin protease (Amersham Pharmacia Biotech AB, Uppsala, Sweden).
• Trypsin (sequence grade, Promega, Madison, WI, USA).
• Piperazine diacrylamide (Bio-Rad Laboratories, Hercules, CA, USA).
• ZipTip C18 (Millipore, Bedford, MA, USA).
• 0.45 μm pore size filter unit (Millipore, Bedford, MA, USA).
• LipofectAMINE (Gibco BRL, Grand Island, NY, USA).
• OPTI-MEM medium (Gibco BRL, Grand Island, NY, USA).
• Alpha-cyano-4-hydroxycinnamic acid (Sigma-Aldrich Chemical, Steinheim, Germany).
• Nonionic detergent IGEPAL CA-630 (Sigma-Aldrich Chemical, Steinheim, Germany).
• All other reagents (Wako Pure Chemical Industries, Osaka, Japan).
• PBST (phosphate buffer containing 0.1% Triton X-100); NaCl 8.0 g, Na2HPO4•12H2O 2.9 g, KCl 0.2 g, and KH2PO4 0.2 g in 1 liter of water plus 1 mL of Triton X-100.
• Lysis buffer; 1.0 mL of 1 M Hepes (10 mM) and 1.0 mL of 1 M KCl (10 mM) in 100 mL of water, adjusted to pH 7.8 with 1 M NaOH.
• 10% IGEPAL CA-630 solution; IGEPAL CA-630 100 μL, lysis buffer 900 μL.

• Wash buffer; 50 mM Tris-HCl, pH 7.8 containing 150 mM NaCl and 1% Triton X-100 (Tris 3.029 g, NaCl 4.383 g, Triton X-100 5 mL in 500 mL of water, adjusted to pH 7.8 with HCl).
• Thrombin cleavage buffer; wash buffer containing 0.5 mM EDTA, 0.5 mM EGTA, 1 mM DTT, 50 mM NaF, 40 mM glycerol phosphate, 10 mM Na3VO4, 2 μg/mL pepstatin A.
• SDS sample buffer; 0.5 M Tris-HCl pH 6.8, 2 mL (0.1 M); 10% SDS 4 mL (4%); β-mercaptoethanol 1.2 mL (12%); glycerol 2 mL (20%); water 0.8 mL; and 1% bromophenol blue, a few drops.
APPARATUS AND APPARATUS SETUP
• Incubator.
• Disk rotor (Bio CRAFT).
• Disk filter (0.8 μm pore size, Millipore).
• Syringe.
• Ultrafree filter (0.22 μm pore size, Millipore).
• PE Biosystems MALDI-ToF mass spectrometer (Voyager DE-STR, Foster City, CA, USA). (For operation of MALDI-ToF/MS, see Section 2-1-2, Experimental Example 2-4, and Experimental Example 3-1.)
• Equipment setup for 1D-RPLC-MS/MS; the same as that used in Experimental Example 2-1 (see Section 2-1-1).
PROCEDURE
Preparation of GST-Fused hParvulin
1. Transform E. coli BL21 (DE3) cells with the expression vector (pGEX-4T-1) containing the gene coding for hParvulin fused at its amino-terminal end to GST through a short sequence (or sequence repeats) with a thrombin cleavage site (or multiple cleavage sites).
2. Grow in LB medium (1.0% Bacto Tryptone, 0.5% Bacto Yeast Extract, 1.0% NaCl, and 1 mM NaOH), containing ampicillin (50 μg/mL).
3. Induce protein expression at A600 = 0.7 with 0.1 mM IPTG, and grow the BL21 (DE3) cells for a further 4 h at 37 °C.
4. Harvest the grown cells by centrifugation at 8000 rpm for 30 min at 4 °C.
5. Lyse the harvested cells in sodium phosphate buffer–saline (PBS, pH 7.4), containing 2 mM EDTA, 0.1 mM phenylmethanesulfonyl fluoride (PMSF, Wako, Tokyo, Japan), and 0.1% 2-mercaptoethanol (2-ME); sonicate and centrifuge at 16,000 rpm for 30 min at 4 °C.
6. Apply the supernatant to a glutathione–Sepharose column and elute with reduced glutathione as described in the manufacturer’s instruction manual (Pharmacia).

7. Add ammonium sulfate to saturate the eluate from the glutathione–Sepharose column and let stand for at least 4 h at 4 °C.
8. Centrifuge at 20,000 rpm for 30 min at 4 °C and remove the supernatant.
9. Dissolve the precipitate in a minimum volume of PBS (pH 7.4) and dialyze twice against 1 liter of PBS (pH 7.4) containing 0.1% 2-ME.
Determination of Conditions of Protease Cleavage on GST-Fused Bait Protein Immobilized Beads
10. Mix 1 mL of PBST with 100 μL of new glutathione–Sepharose 4B by inversion.
11. Centrifuge at 1500 rpm for 4 min at 4 °C, and remove the supernatant carefully.
12. Repeat steps 10 and 11 three times to remove completely the ethanol contained in the storage solution of the Sepharose 4B beads.
13. Add 0.5 mg of purified GST–hParvulin to the washed glutathione–Sepharose 4B beads on ice and let stand for 30 min with occasional mixing.
14. Centrifuge at 1500 rpm for 1 min at 4 °C, and remove the supernatant carefully.
15. Add 1 mL of PBST, mix, centrifuge at 1500 rpm for 1 min at 4 °C, and remove the supernatant.
16. Repeat step 15 twice.
17. Divide the GST–hParvulin-bound Sepharose 4B beads into five tubes.
18. Add 2 units of thrombin in 20 μL to each tube.
19. Add 20 μL of SDS sample buffer to one of the tubes immediately after the thrombin is added (control sample for thrombin cleavage, Fig. 3-4B).
20. Incubate the remaining four tubes at 16 °C with occasional mixing, and add 20 μL of SDS sample buffer to one of the tubes every 1 h to stop the cleavage reaction (Fig. 3-4B).
21. Analyze by SDS-PAGE, and determine the appropriate reaction time. If cleavage does not proceed appropriately, increase the amount of thrombin added, the reaction time, or the reaction temperature, and repeat steps 10–21 with appropriate modification. If efficient cleavage is still not obtained, one remedy is to insert additional repeats of the thrombin recognition sequence between GST and the bait protein; that is, insert one or two repeats of the CTCGTTCCGCGTGGATCT sequence next to the original nucleotide sequence encoding the thrombin recognition sequence in the pGEX-4T-1 vector (see GST-Pin1 in Fig. 3-4C). Another remedy is to use other proteases and cleavage sequences as described later. If thrombin cleavage is observed within the bait protein molecule, use another expression vector, such as pGEX-3X, pGEX-5X, or pGEX-6P, to produce the bait protein.
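Most centrifugation steps in these procedures are specified in rpm, whereas the force actually applied to the beads or cells depends on the rotor radius. The standard relation is RCF = 1.118 × 10⁻⁵ × r × N², with r the rotor radius in cm and N the speed in rpm. The sketch below applies it in both directions; the function names are illustrative, and because the source does not state which rotors were used, any radius passed in is the reader's own value, not one taken from the protocol.

```python
import math

# Constant in the standard rpm-to-RCF relation (radius in cm, speed in rpm).
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))
```

As a usage example under a purely hypothetical rotor radius of 8 cm, `rcf_from_rpm(1500, 8.0)` gives the force corresponding to the 1500 rpm bead-washing spins, and `rpm_from_rcf(3000, 8.0)` gives the speed that would reproduce a step quoted as 3000g. Recording the result alongside the rpm value makes a protocol transferable between centrifuges.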

Preparation of Nuclear Extract from Culture Cells
22. Grow mammalian cells in RPMI 1640 or DMEM medium supplemented with 10% heat-inactivated fetal calf serum, 2 mM L-glutamine, 160 U/mL penicillin G, and 0.1 mg/mL streptomycin at 37 °C in an incubator with 5% CO2.
23. When the cells have grown to confluence, remove the culture medium from the dishes, harvest the cells, and transfer them to a centrifugation tube.
24. Centrifuge at 1000 rpm for 5 min at 4 °C, and remove the supernatant.
25. Add PBS-saline, centrifuge at 1000 rpm for 5 min at 4 °C, and remove the supernatant. Repeat this step twice.
26. Add lysis buffer containing 2 mM MgCl2, 0.1% IGEPAL CA-630, 1 mM DTT, 50 mM NaF, 40 mM glycerol phosphate, 10 mM Na3VO4, 5 μg/mL PMSF, and 2 μg/mL each of aprotinin (Wako), pepstatin A (Peptide Institute, Osaka, Japan), and leupeptin (Peptide Institute), and incubate on ice for 15 min.
27. Centrifuge the lysate at 1000 rpm for 3 min at 4 °C. The supernatant is used as the cytosolic fraction and the precipitate as the nuclear pellet.
28. Confirm the purity of the nuclear pellet after DAPI staining by observation under a fluorescence microscope (BX50 FLA, KS Olympus, Tokyo, Japan).
29. Sonicate the nuclear pellet on ice in lysis buffer containing 1 mM DTT, 50 mM NaF, 40 mM glycerol phosphate, 10 mM Na3VO4, 5 μg/mL PMSF, 2 μg/mL aprotinin, 2 μg/mL pepstatin A, plus 1% IGEPAL CA-630 and 1% sodium deoxycholate (Wako, Osaka, Japan).
30. Centrifuge at 16,000 rpm for 5 min at 4 °C. The supernatant is used as the nuclear extract.
Isolation of GST–hParvulin-Associated Complex
31. Mix the nuclear extract or the cytosolic fraction with Sepharose CL-6B for 1 h at 4 °C on a disk rotor, and centrifuge at 14,000 rpm for 5 min at 4 °C.
32. Filter the supernatant through a membrane with a 0.45 μm pore size.
33. Incubate the filtered supernatant with 50 μL of glutathione–Sepharose 4B beads preincubated with GST–hParvulin for 1 h at 4 °C.
34. Centrifuge at 1500 rpm for 1 min at 4 °C, and remove the supernatant.
35. Add 1 mL of wash buffer containing 0.5 mM EDTA, 0.5 mM EGTA, 1 mM DTT, 50 mM NaF, 40 mM glycerol phosphate, 10 mM Na3VO4, 5 μg/mL PMSF, 2 μg/mL aprotinin, and 2 μg/mL pepstatin A, mix, centrifuge at 1500 rpm for 1 min at 4 °C, and remove the supernatant.
36. Repeat step 35 four times.
37. Add 1 mL of thrombin cleavage buffer, mix, centrifuge at 1500 rpm for 1 min at 4 °C, and remove the supernatant.
38. Repeat step 37 once more.

39. Release the hParvulin-associated complexes from the glutathione–Sepharose beads by cleaving between GST and hParvulin with 5 units of thrombin in 50 μL of thrombin cleavage buffer for 2 h at 16 °C.
40. Repeat steps 33–39 for GST bound to glutathione–Sepharose 4B beads as a control.
41. Centrifuge at 1500 rpm for 1 min at 4 °C and collect the supernatant (hParvulin-associated complex fraction).
42. Filter the supernatant with an ultrafree filter.
43. Add 50 μL of thrombin cleavage buffer to the precipitate (GST-bound glutathione–Sepharose beads), mix, and centrifuge at 1500 rpm for 1 min at 4 °C.
44. Filter the supernatant with an ultrafree filter and combine it with the hParvulin-associated complex fraction.
PROTEIN IDENTIFICATION BY MS-BASED METHODS
PMF Method Using MALDI-ToF MS
45. Perform in-gel protease digestion and MALDI-ToF MS analysis as described in Experimental Example 3-1.
Shotgun Method Using 1D-RPLC-MS/MS
46. Concentrate the hParvulin-associated complex with methanol and chloroform (1:1 v/v), digest it in solution by the method described in Experimental Example 3-1, and analyze it by LC-MS/MS using a DNLC system connected to an ESI-MS/MS spectrometer (Q-ToF2, Micromass) to identify proteins as described in Section 2-1-1, Experimental Example 2-1.
RESULT
Release of hParvulin from GST–hParvulin-bound glutathione–Sepharose beads was already extensive after 1 h of incubation with thrombin and increased gradually with longer incubation (Fig. 3-4B). Although the release was complete after a 3 h incubation, the incubation time for the release of the hParvulin-associated complex was set at 2 h to avoid possible degradation of constituents of the complex. When the same conditions were applied to GST–Pin1, a fusion with another human peptidyl-prolyl isomerase, immobilized on glutathione–Sepharose beads, however, the cleavage was not as efficient as that of GST–hParvulin (Fig. 3-4C).
Addition of an extra thrombin recognition sequence between GST and the Pin1 protein improved the cleavage, demonstrating its effectiveness for protein release from the glutathione–Sepharose beads (Fig. 3-4C, GST-Thro-Pin1). When the beads had been mixed with cell extract, thrombin release of hParvulin from the glutathione–Sepharose beads clearly decreased the nonspecific protein background that originated from association with GST-bound glutathione–Sepharose beads, as well as the contamination of GST protein in the eluate containing the hParvulin-associated complex (Fig. 3-4D). Although hParvulin-associated proteins were expected in both the cytosolic fraction and the nuclear extract, because hParvulin was reported to be present in the nucleus as well as the cytoplasm, they were obtained mostly from the nuclear extract (Fig. 3-4E). PMF and shotgun analyses identified 140 protein components in the isolated hParvulin-associated complex, including 48 ribosomal proteins and 55 possible trans-acting factors whose yeast homologs are found in preribosomal ribonucleoprotein (pre-rRNP) complexes formed at various stages of ribosome biogenesis. The MS-based protein identification thus leads to the unbiased prediction that hParvulin is probably a component of the pre-rRNP complex involved in ribosome biogenesis in mammalian cells. Other biochemical and cell biological evidence, including nucleolar localization, the presence of preribosomal RNA species in the isolated complex, and the presence of hParvulin in pre-40S and pre-60S intermediate fractions, supported the prediction. On the other hand, when hParvulin was fused to the FLAG epitope and expressed in human cells, it tended to be excluded from the nucleolus and dispersed throughout the nucleoplasm, whereas endogenous hParvulin was clearly concentrated in the nucleolus (data not shown). In addition, immunoprecipitation of FLAG–hParvulin using anti-FLAG antibody-conjugated beads did not show any association with other proteins. This result may reflect an altered specificity of FLAG–hParvulin; exogenously expressed hParvulin certainly differed from the endogenous protein in cellular localization and preferential binding partners in the cell. Thus, exogenous expression of tagged hParvulin probably induced some form of cellular stress, thereby causing qualitative changes in pre-rRNP complexes and/or nucleolar structure.
This example calls for caution when using exogenous expression of a tagged bait protein, and in turn shows that the method described here can serve as an alternative approach for isolating a protein complex when expression of a tagged protein by transfection of its expression vector into cells is not successful. In addition, the prepurified tagged protein can be used as the second bait protein for the “double-tagging methodology.” In another independent experiment, a new large human coactivator complex necessary for estrogen receptor alpha (ER alpha) transactivation, which contains GCN5, HAT, c-Myc-interacting protein TRRAP/PAF400, TAF (II) 30, and other subunits, was purified by the double-tagging methodology (60). In this case, the coactivator complex was first pulled down with prepurified GST-ER alpha from the extract of HeLa cells transfected with the FLAG-tagged GCN5 gene; it was then pulled down with FLAG antibody beads and eluted with FLAG peptides (60).

3-1-3 Cellular Machinery (Multiprotein Complex)
Most cellular activities are executed by multiprotein complexes, which form the basic functional modules of the molecular machinery. The full use of the methodologies currently available in proteomics, combined with well-established biochemical and cell biological approaches, has made it possible to isolate and characterize almost any protein complex formed in the cell, so that the analysis of cellular processes no longer seems to be limited by an inability to isolate the synthetic intermediates of functional protein complexes or cellular machinery. In fact, proteomic methodologies have characterized a number of fundamental cellular machines composed of many protein components, including transcription machinery (61–65), the spliceosome (35, 49, 50, 66), the nuclear pore complex (67, 68), the anaphase-promoting complex (53, 69, 70), the inward rectifier potassium channel (Kir2.x)-associated complex (71), the cullin-RING ubiquitin ligase complex (DDB1-CUL4-ROC1 complex) (72), the ribosome (73–75), the RNA polymerase II preinitiation complex (76), the mammalian mediator complex of RNA polymerase II (77), and the RNA polymerase II elongation complex (78). Other important multiprotein complexes analyzed by proteomic approaches include those associated with the transcription factor and stem cell regulator CCAAT/enhancer binding protein α (C/EBPα) (79), B-cell lymphoma 6 (BCL6) transcription factor (80), glucocorticoid receptor (81), transcriptional corepressor mSin3-histone deacetylases (HDACs) (82), histone H2B monoubiquitinating Rad6/Bre1 (83), and chromatin at a boundary (84). The dynamic nature of various cellular machinery has also been reported, including that of the human translation initiation cap-binding eIF4F complex in cells grown under different conditions (85) and that of the thyroid hormone receptor coactivation complex, which upon binding of thyroid hormone undergoes a conformational change that allows the interaction of coactivating proteins necessary for gene transcription (86). In addition, by using a quantitative double-tagging proteomics approach that integrates mass spectrometry, stable isotope labeling, and affinity purification, in conjunction with DNA nuclease digestion, the dynamic nature of the histone H2AX-associating protein complex in mammalian chromatin in response to ionizing radiation has been characterized (87).
The analysis monitored the dynamic changes of the complex for 2 h after the cell was exposed to ionizing radiation, and showed that the H2AX complex undergoes dynamic changes upon induction of DNA damage and during DNA repair. Proteomics technologies coupled with affinity purifications have also analyzed intracellular signaling cascades that are linked to cell surface receptor and associated proteins, such as N-methyl-D-aspartate (NMDA) receptor-adhesion protein (36), EGF-receptor-MAP kinase (13, 88, 89), presynaptic MALS (mammalian LIN7)-CASK-liprin-α (90), CD4/lck receptor (91), metabotropic glutamate receptor 5 protein (92), GluR6 kainate receptor (93), and Wnt signaling complex containing beta-Catenin and 14-3-3 zeta (94). Their analyses have revealed even an entire network of protein complexes involved in the intracellular signal transduction pathway; for example, the tumor necrosis factor (TNF)-α-transcription factor NF-κB signaling pathway in human cells (95). In the proteomic analysis of the TNF-α/NF-κB pathway, the TAP strategy was used to systematically isolate cellular complexes around all 32 known and candidate components involved in the pathway, including the TNF receptors, receptor proximal components, MAP3K signal relay kinase, the IKKα, β, and γ subunits, all NF-κB transcription factor subunits, and the IκB inhibitor subunits, to screen in vivo for constitutive and signal-induced protein interactions. The TAP constructs were integrated stably into TNF-α-responsive HEK293 cells by retrovirus-mediated gene transfer and expressed the tagged proteins at close to endogenous levels by adjusting the multiplicity of infection. Each tagged protein


was purified in the complex with its endogenous associated proteins from uninduced and TNF-α-induced cells. In total, 237 TAP purifications (at least four purifications per TAP-tagged component) were made. Their LC-MS/MS analyses yielded 680 nonredundant protein hits and recovered 171 of the 241 interactions previously reported in the literature for the 32 pathway components chosen, indicating a benchmark success rate of about 70% (Fig. 3-5). Failure to identify the remaining interactions is attributed to the transient or low-affinity nature of the associated proteins, dependence on stimuli other than TNF-α, steric interference by the tag, and low-level expression and tissue specificity. The false-positive rate of the TAP-MS approach in mammalian cells is estimated to be below 20% on the basis of a genome-wide saturation screen in S. cerevisiae. With those limitations in mind, the analysis identified in total 33 interactions that are dependent on a stimulus of TNF-α or NIK (Fig. 3-5). In addition, it identified a number of changes in ubiquitination upon TNF-α stimulation, including those of the IκB and NF-κB precursors p105 and p100 (95) (http://tnf.cellzome.com). As described in Chapter 1 (see Section 1-3-4), far more ambitious work on the genome-wide network of multiprotein complexes has been done in yeast cells using TAP methodology by two large groups (47, 48), thus accelerating the analysis of the dynamic aspect of the network in this model organism. In fact, systems biology is generating new kinds of studies, such as comparative approaches to interpreting these networks that contrast networks of different species and molecular types and under varying conditions (96). Those studies are intended to elucidate cellular machinery and to predict protein function and interaction in terms of mathematical formulations.
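The benchmark figure quoted above for the TNF-α/NF-κB screen follows directly from the counts given in the text. The short Python sketch below only restates that arithmetic; the variable and function names are illustrative and are not taken from the original study.

```python
# Counts quoted in the text for the TNF-alpha/NF-kappaB TAP-MS screen.
REPORTED_INTERACTIONS = 241   # literature interactions for the 32 chosen components
RECOVERED_INTERACTIONS = 171  # of those, recovered by TAP purification and LC-MS/MS
TAP_PURIFICATIONS = 237       # total purifications performed
TAGGED_COMPONENTS = 32        # TAP-tagged pathway components


def recovery_rate(recovered: int, reported: int) -> float:
    """Fraction of previously reported interactions that were recovered."""
    if reported <= 0:
        raise ValueError("reported must be a positive count")
    return recovered / reported


rate = recovery_rate(RECOVERED_INTERACTIONS, REPORTED_INTERACTIONS)
mean_purifications = TAP_PURIFICATIONS / TAGGED_COMPONENTS

print(f"benchmark recovery rate: {rate:.1%}")
print(f"mean purifications per tagged component: {mean_purifications:.1f}")
```

The computed rate is slightly above 70%, consistent with the "about 70%" stated in the text; the mean number of purifications per component is given only for orientation, since the text specifies a minimum of four rather than an average.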
However, none of those systems biology approaches can be accepted until they are supported by experimental evidence. Proteomics is now providing experimental datasets for understanding cellular function on a far larger scale, and faster, than other conventional technologies.

3-2 DYNAMICS OF RIBOSOME BIOGENESIS

One of the biggest advances that proteomics has brought to the understanding of a specific cellular function is in ribosome biogenesis, the process of making ribosomes in the cell. Ribosome biogenesis is essential for cell growth, proliferation, and adaptation and accounts for up to 80% of the energy consumption of dividing cells (97). Therefore, disturbances in the ribosome synthesis pathway must be detected and coupled with cell cycle progression to prevent premature cell divisions. Over three decades, many studies have been done on yeast cells, and a great deal of knowledge has been gained about the processing of preribosomal RNA (pre-rRNA) and the many factors involved in ribosome biogenesis. Ribosome biogenesis starts with transcription of the rRNA precursor in the nucleolus. Association of ribosomal proteins with rRNA is believed to begin on the nascent pre-rRNA, and most of the 80 known ribosomal proteins are already bound to the rRNA before transport of the ribosomal subunits (i.e., the small 40S subunit and the large 60S subunit) to the cytoplasm.


Fig. 3-5. Connectivity map of the TNF-α/NF-κB signal transduction pathway. The pathway is visualized as a network in which proteins are represented as shapes and colors that indicate a functional category (for the complete Alliance for Cellular Signaling convention, see http://www.signalling-gateway.org) and lines between proteins indicate that these proteins copurified. TAP-tagged pathway components are grouped in boxes. [Reprinted by permission from Macmillan Publishers Ltd.; Bouwmeester et al., Nat. Cell Biol. (Ref. 95) (2004).]


During this assembly process of ribosomal proteins, a number of trans-acting factors are associated with the pre-rRNAs and their processing products in the form of preribosomal ribonucleoprotein (pre-rRNP) complexes (98–102). At least three major types of pre-rRNP complexes, the 90S, pre-40S, and pre-60S particles, are formed during ribosome biogenesis (103). The earliest pre-rRNP complex in ribosome biogenesis is pre-90S, which contains the nascent transcript of preribosomal RNA (pre-rRNA). It is separated into 40S and 60S preribosomal particles (pre-40S and pre-60S, respectively), which then mature independently. During maturation, the preribosomes move from the nucleolus through the nucleoplasm toward the nuclear pore, and the primary RNA transcript is cleaved at specific sites in a strictly defined order (32, 102). Many trans-acting factors, including those involved in RNA methylation and pseudouridylation, work on pre-rRNP complexes one after another during this process to produce mature rRNAs (i.e., 18S, 5.8S, and 25S or 28S ribosomal RNAs) and to assemble ribosomal proteins. Another constituent of the large subunit (5S ribosomal RNA) is synthesized independently of this pathway, assimilated into the process, and eventually assembled into the 60S subunit. The final maturation occurs after passage through the nuclear pore, and the two subunits assemble into a functional mature 80S ribosome in the cytoplasm (104). It is clear from this synthetic pathway that a dynamic process must exist for the subunits’ ordered assembly and disassembly from the pre-rRNAs and ribosomal RNAs (rRNAs) (98). Despite this well-described scenario of ribosome biogenesis, however, there was very little information concerning the nature of pre-rRNP complexes as synthetic intermediates of ribosomes, in which the assembly of ribosomal proteins onto pre-rRNAs takes place with the help of a number of trans-acting factors.
Each trans-acting factor is associated with the pre-rRNP complex only transiently to perform its action at the right place and at the right time. In addition, its action has to be coordinated with the actions of other constituents on the appropriate pre-rRNP complex (107). Because the trans-acting factors act so quickly, it was believed that isolating each pre-rRNP complex as a synthetic intermediate of the ribosome was not possible. In fact, until recently, many processing and assembly factors in eukaryotes were identified on the basis of individual biochemical and genetic studies done mostly on yeast. Current proteomic approaches, however, have made it possible to isolate and characterize various pre-rRNP complexes (33, 108–112). Ribosome assembly is now best understood in yeast, where a large number of ribosome precursors have been isolated and their components identified. The proteomic strategy, mostly using affinity purification and MS-based protein identification, has dramatically changed the understanding of ribosome assembly. It is now possible to outline ribosome assembly in the light of the preribosomal particles formed during ribosome biogenesis, even for mammals. The formation of the pre-rRNP complexes is highly dynamic and involves ∼80 ribosomal proteins, preribosomal RNAs (pre-rRNAs), over a hundred small nucleolar RNAs (snoRNAs), and at least 170 trans-acting proteins that have been identified in yeast and are currently implicated in post-transcriptional ribosome synthesis (101, 105, 106).


3-2-1 Snapshot Analysis of Preribosomal Particles in Yeast

The strategy for analyzing the pathway of ribosome assembly in yeast cells utilized the reverse-tagging approach described in Section 3-1-1 (the approach utilizing stage-specific association). This strategy was based on results obtained by early analyses of several pre-rRNP complexes isolated by affinity purification using TAP-tagged trans-acting factors: Nug1p (108), Nog2p (113), Nop7p (110), and Ssf1p (114). Those analyses showed that (1) several stable pre-60S particles (109) were successfully isolated despite the speed with which the cell modifies and processes the pre-rRNA, and (2) those particles shared some proteins but differed in other proteins and in RNA composition and appeared to reflect the presence of a series of distinct nucleolar and/or nucleoplasmic pre-60S particles. These results suggested that some of the proteins associated with one pre-rRNP complex could also be present in other precursor particles; thus, pre-rRNP particles from different stages of maturation can be purified (Fig. 3-6A) (32, 34). The first reverse-tagging strategy was carried out using seven trans-acting factors (Nsa3p, Nop7p, Sda1, Rix1, Arx1p, Kre35p, and Nug1p) as affinity baits. Each was fused to the TAP tag in a DNA construct so that it was expressed from its native chromosomal position in yeast cells, and its associated pre-rRNP complex was isolated by two-step affinity purification. Five of those (Nsa3p, Nop7p, Sda1, Rix1, and Arx1p) were components of the pre-rRNP complex associated with Nug1p, a nuclear GTPase that is involved in pre-60S export from the nucleoplasm to the cytoplasm (108).
The isolated Nug1p-associated pre-rRNP complex had a size of around 60S, as shown by sucrose gradient ultracentrifugation, and contained mostly late precursors to the 25S and 5.8S rRNAs (i.e., 5′-extended 25S, 27SB, and 7S pre-rRNAs) but not precursors to the 18S rRNA or the 35S primary transcript, as shown by Northern blot analysis, indicating that the pre-rRNP complex is formed at late stages of 60S biogenesis. Each of the other six pre-rRNP complexes isolated was characterized in terms of size and pre-rRNA species by similar analyses. The protein components were then identified by PMF and MS/MS analyses after in-gel protease digestion of the SDS-PAGE-separated proteins. In addition, those factors were tagged with the green fluorescent protein (GFP) and examined for their subcellular localization. The species and sizes of the pre-rRNAs present in a pre-rRNP complex were used as indicators of the stage at which that complex is stably present in the sequential order of the processing pathway (Fig. 3-6B). They can be examined by Northern blot, primer extension, or pulse chase experiments. The exact processing step at which a given trans-acting factor acts in vivo can also be examined by either Northern blot or pulse chase experiments in combination with a loss-of-function experiment using disruption of the corresponding gene, a temperature-sensitive mutant, or small interfering RNA (siRNA). For example, in the reverse-tagging analysis described earlier, primer extension and Northern analyses detected predominantly the 27SA2 and 27SB pre-rRNAs and 7S rRNA but not 18S rRNA in the Nsa3 particle, which is unique in containing the U3 snoRNA but not the U14 or snR10 snoRNA, assigning the pre-rRNP complex associated with Nsa3 as the earliest pre-60S ribosomal particle. Similar analysis showed that both Nop7 and Nug1 contained mainly 27SB, 7S pre-rRNA, and 5S rRNA, but less of the 27SA2 precursor RNA and


Fig. 3-6. (A) Assembly snapshot analysis of pre-rRNP complexes formed at various stages of ribosome biogenesis. Each of the pre-rRNP complexes formed at various stages of ribosome biogenesis can be isolated by using an appropriate protein as affinity bait. (See insert for color representation.)

a higher ratio of mature 25S rRNA; thus, the pre-rRNP complexes associated with those factors are formed at stages later than that at which the Nsa3-associated pre-60S is formed. On the other hand, pre-rRNP complexes associated with Rix1, Arx1, and Kre35 contained mostly mature 25S, 5.8S, and 5S with smaller quantities of 27S and 7S precursors, while Sda1 did not detectably coprecipitate any pre-rRNAs. These RNA analyses made it possible to order the isolated pre-rRNP complexes in the processing pathway of ribosome biogenesis. Another indicator for ordering the pre-rRNP complexes in the assembly pathway is the size of the pre-rRNP complex (pre-90S, pre-60S, and/or pre-40S) associated with a trans-acting factor, which can be examined by sucrose density gradient ultracentrifugation or size-exclusion chromatography. The reverse-tagging analysis described showed that each bait protein coprecipitated with pre-rRNP complexes, but with markedly different sedimentation profiles. For example, Nsa3 displayed a complicated sedimentation pattern with one peak at ∼40S, a second broader peak from 60S to 90S, and some partitioning into fractions below 90S. In contrast, Sda1, Arx1, and Kre35 showed a distinct and confined peak at ∼60S. Nop7 and Rix1 revealed an “intermediate” sedimentation pattern, with a pronounced peak at 60S and a second broader peak below 80S (34). These results indicated that the bait proteins analyzed are associated with different 60S preribosomal particles and in some cases with more than one particle. Cellular localization can serve as an additional indicator for ordering the pre-rRNP complexes in the assembly pathway. In the same reverse-tagging analysis described earlier, the steady-state location of preribosomal particles in living cells was determined by examining yeast strains expressing GFP-tagged Nsa3, Nop7, Rix1, Arx1, or Kre35 by fluorescence microscopy.
This examination showed varied cellular localization for each of the trans-acting factors: that is, both nucleolar and nucleoplasmic


[Fig. 3-6 (B) diagram: the rDNA repeat unit (5′ ETS, 18S, ITS1, 5.8S, ITS2, 25S, 3′ ETS, and 5S) and the processing steps leading from the 35S primary RNA transcript through the 33S, 32S, 20S, 27SA2, 27SA3, 27SBS/27SBL, and 7SS/7SL intermediates to the mature 18S, 5.8S, and 25S rRNAs.]

Fig. 3-6. (B) The processing pathway of the primary 35S pre-rRNA in yeast. The transcript of the 35S pre-rRNA operon is released from the transcriptional machinery by cotranscriptional cleavage at a site within the 3′ ETS by endonuclease Rnt1p. The initial processing site of the 35S at A0 yields 33S, which is further cleaved at site A1 to produce 32S. By the cleavage at A2, the 32S is split into 20S and 27SA2, which are destined for the 40S and 60S ribosomal subunits, respectively. 20S is processed by cleavage at D to produce the mature small subunit 18S rRNA species. 27SA2 is processed by two alternative pathways (major and minor) to result ultimately in mature 5.8S and 25S for the 60S subunit. In the major pathway, cleavage at site A3 yields 27SA3, which is converted to 27SBS by exonucleolytic digestion. In the minor pathway, 27SA2 is cleaved at B1L instead of A3 to produce 27SBL. Identical steps further process the 27SBS and 27SBL species; these precursors are cleaved at C2 and processed to C1 to release mature 25S and either 7SS or 7SL, which is processed from C2 to the 3′ end of 5.8S by exonuclease digestion to generate 5.8SS or 5.8SL. Endonucleolytic cleavage sites are indicated by ↑. Sites and directions of exonucleolytic processing are indicated by → for 5′ to 3′ exonucleolytic processing, and by ← for 3′ to 5′ exonucleolytic processing. Functional complexes (SSU processome, C/D-box snoRNPs, H/ACA-box snoRNPs, exosome, and RNase MRP) are indicated at the steps they work, or are expected to work. [From Takahashi et al., Mass Spectrom. Rev. 22:287-317 (2003). Copyright John Wiley & Sons, Ltd. Reproduced with permission.]
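The processing pathway described in the caption of Fig. 3-6B can be written down as a small lookup table, which makes it easy to ask which mature rRNAs derive from a given precursor. The following Python sketch is only an illustration of that idea: the table restates the steps named in the caption, while the dictionary layout and the function name are assumptions introduced here.

```python
# Precursor -> list of (processing step, products), restating the Fig. 3-6B caption.
PROCESSING_STEPS = {
    "35S": [("cleavage at A0", ["33S"])],
    "33S": [("cleavage at A1", ["32S"])],
    "32S": [("cleavage at A2", ["20S", "27SA2"])],
    "20S": [("cleavage at D", ["18S"])],
    "27SA2": [
        ("cleavage at A3 (major pathway)", ["27SA3"]),
        ("cleavage at B1L (minor pathway)", ["27SBL"]),
    ],
    "27SA3": [("exonucleolytic digestion", ["27SBS"])],
    "27SBS": [("cleavage at C2 and processing to C1", ["25S", "7SS"])],
    "27SBL": [("cleavage at C2 and processing to C1", ["25S", "7SL"])],
    "7SS": [("exonuclease digestion", ["5.8SS"])],
    "7SL": [("exonuclease digestion", ["5.8SL"])],
}


def mature_products(species, steps=PROCESSING_STEPS):
    """Return the set of end products reachable from a given RNA species."""
    outgoing = steps.get(species)
    if not outgoing:
        # No further processing step is listed: treat the species as mature.
        return {species}
    products = set()
    for _step, children in outgoing:
        for child in children:
            products |= mature_products(child, steps)
    return products


print(sorted(mature_products("35S")))    # all mature rRNAs from the primary transcript
print(sorted(mature_products("27SA2")))  # products of the 60S branch only
```

Because the major and minor pathways are both listed under 27SA2, the traversal returns the short and long forms of 5.8S together; a table restricted to one pathway would return only the corresponding form.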

localization of Nug1-GFP, nucleolar localization of Nsa3 and Nop7, nuclear localization of Rix1-GFP, nucleoplasmic localization of Sda1, dual localization of Arx1-GFP in the nucleoplasm and in the cytoplasm, and exclusive cytoplasmic distribution of Kre35. It revealed a pathway for pre-60S maturation, from the predominantly nucleolar particles associated with Nsa3 and Nop7, to nucleolar/nucleoplasmic particles (Nug1),


Fig. 3-6. (C) The 90S preribosomal complex is proposed to contain the 35S rRNA, the U3 snoRNA, and some 40S processing factors. The early pre-rRNA is cleaved at A0, A1, and A2, and the processed products are eventually incorporated into mature 40S and 60S particles. (From Ref. 116.) (D) Outline pathway of ribosome synthesis in yeast. The pre-90S particle is proposed to contain the 35S/33S/32S rRNA and many trans-acting factors including the SSU processome (indicated in square). The early pre-rRNA cleavages [at sites A0 to A2 in part (C)] lead to the separation of the pre-40S and pre-60S particles. In both pathways, a series of intermediates is formed (pre-60S E0, pre-60S E1, pre-60S E2, pre-60S M, pre-60S L, and pre-60S C for intermediates of the large 60S subunit, and pre-40S E and pre-40S L for intermediates of the small 40S subunit), named according to their position on the proposed pathway. Each of the intermediates is proposed to contain the pre-rRNAs indicated. Trans-acting factors are indicated at both sides of the intermediates and boxed with an arrow bar to show their presence on the corresponding intermediates. Probable cellular localization of the intermediates is also indicated. Final maturation of the large and small subunits occurs in the cytoplasm, and the mature subunits are also included (matured 60S and matured 40S). Note that it is very probable that other preribosomal complexes exist in addition to those shown, and it is not clear in what order the components are gained and lost between the complexes. (From Ref. 116.) (See insert for color representation.)

nucleoplasmic particles (Rix1 and Sda1), nucleoplasmic/cytoplasmic particles (Arx1), and cytoplasmic Kre35-associated particles. Based on those results and others, the pre-rRNP complexes isolated by the reverse-tagging approach using seven trans-acting factors as baits were successfully ordered in their processing stages from early stage in the nucleolus, to middle to late stage


in the nucleus, and to mature stage in the cytoplasm. Thus, those analyses provided sequential “assembly snapshots” of the biochemical composition of the isolated pre-rRNP complexes along the maturation and export pathway from nucleolar processing through nuclear export to cytoplasmic maturation (Fig. 3-6B). At the same time, those analyses showed that a given trans-acting factor used as bait occurred in different 60S pre-rRNP particles formed at various stages (early, middle, and late) of ribosome biogenesis, but when purified, it was preferentially associated with pre-60S particles formed at one of those stages. For example, in the analysis described, Arx1 was found associated with pre-rRNP complexes formed at each of the early, middle, and late stages of ribosome biogenesis; however, the isolated pre-rRNP complex associated with Arx1p was characterized as one formed at late stages occurring in the nucleus/cytoplasm. This may be because the concentration of a given bait protein differs among the various precursor particles at steady state. Thus, the result for Arx1p probably reflects that Arx1 is at its highest concentration in late nuclear/cytoplasmic particles and occurs in lower amounts in earlier particles. Similarly, Nsa3 was found associated with the earlier pre-60S particles, which had many known 60S biogenesis factors but also contained a few components that also play a role in 40S subunit biogenesis, such as core factors of both box C/D and box H/ACA snoRNAs that are required for post-transcriptional modification of the pre-rRNA. Because most snoRNA-directed modification takes place on the 35S rRNA, the presence of the snoRNP proteins in the pre-60S particle is an indication that the early cleavage of the primary 35S rRNA transcript is not fully completed in the Nsa3-associated pre-rRNP complexes.
In addition to the size of the pre-rRNP complex, the cellular localization as well as the compositions of pre-rRNA species and trans-acting factors can be used as indicators for ordering the stages at which pre-rRNP complexes are formed in the sequential process of ribosome biogenesis. Thus, the reverse-tagging approach made it possible to reconstruct a dynamic assembly process, as it would appear when pre-60S particles are matured and transported to the nucleoplasm, by specifying which trans-acting factors are removed, which others remain attached, and which additional factors transiently associate, and provided an initial biochemical map of 60S ribosomal subunit formation on its path from the nucleolus to the cytoplasm. The reverse-tagging strategy has also been applied to the characterization of early preribosomal particles, using twelve nucleolar trans-acting factors (Pwp2p, Rrp9p, YDR449c, Krr1p, Noc4p, Kre31p, Bud21p, YHR196w, YGR090w, Enp1p, YJL109c, and Nop14p) as TAP-tagged affinity baits (33). Each of the pre-rRNP complexes isolated contained principally 35S pre-rRNA and components of the U3 snoRNP specifically required for 40S subunit synthesis. In addition, all of the complexes isolated in this study had a sedimentation coefficient of approximately 90S upon sucrose density gradient ultracentrifugation, indicating that the major form of the isolated complex represented the 90S preribosomes (115). The analysis of those complexes by MS-based methods revealed that 90S preribosome particles are associated with many trans-acting factors involved in 40S synthesis but not with those involved in 60S synthesis. This result indicates a remarkable dichotomy of ribosome biogenesis in that assembly of 40S-processing factors onto the 35S pre-rRNA occurs prior to assembly


of the 60S synthesis machinery (Fig. 3-6C) (116). The isolated 90S preribosome particles contained most of the constituents of the large U3-snoRNA-associated processing machinery, called the processome, that is responsible for the processing of 18S pre-rRNA and that, together with other trans-acting factors found in the complex, probably forms the terminal knobs at the 5′ end of nascent pre-rRNA transcripts (33). The processome was isolated by the double-tagging approach described in Section 3-1-2 (see Affinity-Tag Purification and Double-Tagging Method) (51). The analysis also observed some variations among the isolated pre-rRNP complexes in the composition of pre-rRNA species other than 35S pre-rRNA and of nonribosomal proteins. These variations might reflect genuine differences, probably indicative of an ordered assembly during early stages of the pathway involving 90S particles. However, they also point to some limitations of the approach used, because (1) the efficiency of TAP coprecipitation may be influenced by the properties of the individual bait proteins, and (2) qualitative MS-based analysis makes the identification of small proteins and low-abundance proteins less reliable. Basically, the same strategy has been used in many other works, and about 50 pre-rRNP complexes in total in yeast have been analyzed to date (105). Almost all analyses relied on affinity-tag purification, 1D-SDS-PAGE separation of protein components, and PMF and/or LC-MS/MS protein identification. Those analyses, including the early monumental work described earlier, have so far identified around 110 proteins in total that were found associated with the bait proteins used and assigned them to three major ribosome intermediates (about 40 proteins to pre-90S, about 10 to pre-40S, and 60 to pre-60S particles), although some of those proteins were commonly found among the three intermediates (102, 105).
For the pre-60S intermediate, further subgroups are being defined on the basis of particle localization and pre-rRNA and/or protein composition; namely, at least six intermediates designated as early (pre-60S E0, pre-60S E1, and pre-60S E2), middle (pre-60S M), late (pre-60S L), and cytoplasmic (pre-60S C) are proposed on a synthetic pathway (Fig. 3-6D). In addition to those intermediates, other preribosomal complexes probably exist; however, it is not clear in what order the components are gained and lost between the complexes (32, 102, 105). On the other hand, the 90S and pre-40S particles are less well characterized. This is possibly because of their extremely short lifetime or their structural instability in the cell; for example, their formation is coupled to transcription of pre-rRNA, so that the purification of those intermediates requires a dissociation step from DNA, such as DNase treatment, which may disrupt the structure of the primary intermediates. The trans-acting factors associated with preribosomal particles are certainly essential for ribosome biogenesis. In contrast, many other factors involved in ribosome biogenesis have been discovered by genetic screens; however, whether they associate stably with preribosome intermediates remains unclear. Many of those not found associated with pre-rRNP complexes are ones with enzymatic activity or ones involved in RNA cleavage and trimming, and post-transcriptional/post-translational modification (pseudouridylation, phosphorylation, and RNA helication) (105). In general, enzymes interact with their substrates but release them soon after completing their actions, so that those with enzymatic action may remain on pre-rRNP complexes only for a moment because of their intrinsic nature. One possible methodology


to analyze such interaction is QTAX (29) described in Section 3-1-1 (Approach for Collecting Time-Dependent Data). This method involves in vivo formaldehyde crosslinking to freeze both stable and transient interactions occurring in intact cells prior to performing isolation steps for the complex under study. Adaptation of such methodology will be the next important step in the dynamic analysis of ribosome biogenesis. Surprisingly, despite the fact that assembly of ribosomal proteins onto rRNAs is the main event of ribosome biogenesis, no systematic analysis has been done in yeast because of a lack of quantitative analysis in the studies of ribosome biogenesis to date. Almost all work done for pre-rRNP complexes isolated from yeast cells involved a subtractive approach based on qualitative characterization of protein constituents of isolated pre-rRNP complexes. Ribosomal proteins are often detected and identified even in affinity-purified protein complexes not related to ribosome biogenesis, because ribosomal proteins are the most abundant proteins among those present in almost all cells. Therefore, a qualitative approach has difficulty in validating whether those identified ribosomal proteins are real entities or just contamination. Introduction of a quantitative methodology will be needed to analyze the dynamic nature of the associations. Currently, a new method involving pulse chase monitored by quantitative MS (PC/QMS) has been developed to follow the assembly of the 20 ribosomal proteins with 16S rRNA during formation of the functional 30S subunit of E. coli in vitro. In this method, PC/QMS took advantage of the ability of MS to quantify large numbers of proteins relative to stable isotope-labeled species (see Sections 2-2-1 and 2-2-2). Assembly of 30S subunits is initiated by incubating the E. coli 16S rRNA with a mixture of uniformly 15N-labeled 30S proteins. 
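As an illustration of how pulse-chase isotope data of this kind could be turned into a progress curve, the Python sketch below converts 15N:14N ratios into the labeled fraction and estimates a binding rate constant. It is a minimal sketch under stated assumptions, not the analysis used in Ref. 117: the single-exponential binding model, the chase times, and the simulated ratios are all hypothetical and serve only to show the calculation.

```python
import math


def fraction_labeled(ratio_15n_to_14n: float) -> float:
    """Convert a 15N:14N ratio R into the 15N fraction R / (1 + R)."""
    return ratio_15n_to_14n / (1.0 + ratio_15n_to_14n)


def fit_first_order_rate(times, fractions):
    """Least-squares slope through the origin of -ln(1 - f) versus t.

    Assumes single-exponential binding, f(t) = 1 - exp(-k * t). This model
    is an illustrative simplification, not a claim taken from the source.
    """
    numerator = sum(t * -math.log(1.0 - f) for t, f in zip(times, fractions))
    denominator = sum(t * t for t in times)
    return numerator / denominator


# Hypothetical chase times (min) and ratios simulated with k = 0.5 per min.
chase_times = [0.5, 1.0, 2.0, 4.0]
simulated_k = 0.5
ratios = [
    (1.0 - math.exp(-simulated_k * t)) / math.exp(-simulated_k * t)
    for t in chase_times
]

# Each point of the progress curve is the labeled fraction at one chase time.
progress_curve = [fraction_labeled(r) for r in ratios]
k_estimate = fit_first_order_rate(chase_times, progress_curve)

for t, f in zip(chase_times, progress_curve):
    print(f"chase at {t:4.1f} min: labeled fraction {f:.3f}")
print(f"estimated rate constant: {k_estimate:.3f} per min")
```

The labeled fraction of a protein in the completed subunits is read here as the fraction that had already bound when the chase began, so a protein that binds early approaches a fraction of 1 at short chase times, whereas a late-binding protein stays near 0 for longer. Real data would require a model chosen to fit the observed curve shape.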
At various time points, the binding of the 15N proteins is chased with an excess of unlabeled 14N proteins. Completely formed 30S subunits are purified, and the 15N:14N ratio for each protein is determined by MALDI-ToF. The 15N:14N ratios can be quantified accurately, as judged by standard curves collected on known mixtures of labeled and unlabeled proteins, and the majority of the 30S proteins are observed in a single scan. Plotting the fractional isotope ratios for a given protein as a function of time produces a progress curve for the binding of that protein during assembly of the whole subunit. In this way, the binding kinetics of all the ribosomal proteins can be determined in a single experiment (117). Although this method cannot be applied directly to the analysis of the dynamic assembly of ribosomal proteins in vivo, it may suggest an approach to the quantitative kinetic characterization of the assembly of pre-rRNP complexes formed during ribosome biogenesis in the cell. Finally, it should be noted that, despite the importance of post-translational modification in the regulation of ribosome biogenesis, no strategy used in modification proteomics has yet been applied to the analysis of pre-rRNP complexes. Methodologies used in modification proteomics could also be applied to the analysis of pre-rRNP complexes from not only a dynamic but also a regulatory point of view during ribosome biogenesis in yeast cells.

3-2-2 Snapshot Analysis of Preribosomal Particles in Mammals

The results obtained from genetic and proteomic analyses of yeast ribosome biogenesis should be useful in understanding higher eukaryotic systems, such as those


of the human. However, the strategies for regulating ribosome biogenesis must be much more diversified because higher eukaryotic organisms such as humans are constructed from a number of differentiated cell types (32, 118). In fact, differences in the order of processing events and intermediates of the rRNA have been reported for different mammalian cell types (119, 120). In addition, defects in human genes that encode proteins involved in ribosome biogenesis cause various diseases. Examples include dyskeratosis congenita, which is caused by mutations in the gene encoding dyskerin (a pseudouridine synthetase that catalyzes post-transcriptional modification of rRNA) and is characterized by premature aging and an increased susceptibility to cancer; Diamond–Blackfan anemia, which is characterized by increased susceptibility to cancer and is caused by mutations in genes that encode ribosomal proteins S19 and S24 (121); and Treacher Collins syndrome, which is caused by a defect in a gene responsible for craniofacial disorder (122) and is also connected to deregulation of human ribosome biogenesis (123, 124). Those examples probably represent deregulation of only a few of the diversified mechanisms by which humans as well as other mammals produce ribosomes in different cells. Furthermore, an increasing body of evidence suggests that deregulation of ribosome biogenesis is associated with malignancy; that is, several tumor suppressors and proto-oncogenes, such as Arf and cMyc, were found to affect human ribosome biogenesis, suggesting that they might regulate malignant progression by altering the protein synthesis machinery (121). Those examples, in turn, strongly suggest that ribosome biogenesis of higher eukaryotes, especially humans, is closely linked to higher-order cellular function and that there exists a regulatory mechanism specific to ribosome biogenesis associated with those functions.
It is therefore clear that the detailed molecular mechanisms that underlie rRNA processing and ribosome assembly in each eukaryotic species, or even in specific cells, must be deciphered independently of yeast to elucidate their unique features and regulation. Despite its importance, however, analysis of ribosome biogenesis has been largely limited to yeast, and only a few small studies have been done for humans and other mammals (32). Only eight trans-acting factors [nucleolin (NCL), nucleophosmin (B23), fibrillarin (FIB), Mpp10, Bop1, Pes1, Wdr1, and dyskerin] involved in mammalian ribosome biogenesis had been identified by conventional biochemical and cell biological analyses before the proteomic approach was introduced (44, 125–132). Yeast counterparts of NCL, FIB, Mpp10, Bop1, Pes1, Wdr1, and dyskerin have been assigned to Nsr1p (also known as She5p/Pab1p), Nop1p, Mpp10p, Erb1p, Nop7p, Ytm1p, and Cbf5p, respectively, whereas B23 has no yeast counterpart. Although a pre-rRNP complex from human HeLa cells had been isolated by immunoprecipitation with an anti-NCL antibody, only NCL, B23, FIB, and S6 were identified in the isolated pre-rRNP complex by a discriminatory method, that is, Western blotting biased by the selection of antibody (133). Thus, knowledge about mammalian ribosome biogenesis is quite limited compared to that of yeast and remains to be explored for the most part. The major obstacle to studying ribosome biogenesis in mammals, especially humans, is the difficulty of doing genetic analysis and of obtaining ribosome precursors in sufficient amount and with high purity. To overcome those problems, an informatics approach has tried to assign function to uncharacterized proteins in about


700 nucleolar proteins by data integration coupled to a machine-learning method, and has proposed a draft of the human ribosome biogenesis pathway encompassing 74 proteins, of which 49 were claimed to be previously uncharacterized proteins. In this approach, possible protein complexes formed in the nucleolus were predicted by integrating various existing data, including those for protein–protein interaction, known functional protein complexes, cellular localizations, protein expression patterns, and nucleolus protein dynamics across species (134). This kind of approach is certainly useful in creating a working hypothesis; however, the hypothesis cannot be proved without experimental evidence. The recent advances in proteomics technologies can now overcome many of the difficulties associated with the analysis of human and/or mammalian ribosome biogenesis and can provide experimental evidence for the predictions obtained from the informatics point of view.

Strategy Using Highly Sensitive Nano-LC-MS/MS for the Analysis of Pre-rRNP Complexes Formed at Various Stages of Human/Mammalian Ribosome Biogenesis

At present, only one group has attempted to apply a reverse-tagging methodology to isolate human and other mammalian pre-rRNP complexes (32).
The strategy used was as follows:
(1) NCL tagged with FLAG was expressed transiently in cells and used as affinity bait to isolate the pre-rRNP complex at the beginning of the reverse-tagging analysis (135);
(2) the purity of the isolated NCL-associated pre-rRNP complex was confirmed by specific dissociation from NCL using an RNA oligonucleotide that corresponds to the nucleolin recognition element (NRE) sequence and by isolating the pre-rRNP complex using an NRE binding domain mutant of NCL;
(3) the protein components of the pre-rRNP complex were identified by PMF and shotgun analysis;
(4) some of the identified protein components that are known trans-acting factors, including B23 and FIB, or that have amino acid sequence homology with yeast trans-acting factors involved in ribosome biogenesis were used as second-step affinity bait to isolate other pre-rRNP complexes;
(5) the size of the pre-rRNP complex associated with each of the bait proteins was determined by Western blot analysis with anti-FLAG antibody after fractionation of the nuclear extract obtained from FLAG-fused bait-expressing cells by sucrose density gradient ultracentrifugation;
(6) the species of pre-rRNA and rRNAs associated with each of the isolated pre-rRNP complexes were determined by Northern blot analysis using synthetic oligonucleotides corresponding to the regions in 5′ ETS, 18S, ITS1, ITS2, 5.8S, 28S, and 3′ ETS;
(7) the cellular localization of each of the bait proteins was examined by immunocytochemical analysis;
(8) each of the isolated pre-rRNP complexes with and without RNase treatment was analyzed mainly by the shotgun method using highly sensitive nano-LC-MS/MS equipment (see Section 2-1-1, Experimental Example 2-1) and occasionally in conjunction with a gel-based method (32, 42, 123, 132);
(9) some of the newly identified proteins were used as the third-step affinity bait to isolate additional pre-rRNP complexes and the analyses of steps (5)–(8) were performed;
(10) the identified proteins were compared with trans-acting factors or possible trans-acting factors identified in the isolated yeast pre-rRNP complexes; and
(11) the outline of human/mammalian ribosome biogenesis was drawn based on the results obtained from the analyses described above.
No less than 30 proteins, including NCL, FIB, B23, Nop56, Nop52 (NNP1), hnRNPU (HN), treacle (TC), hParvulin (Par14), Bop1, Arf, dyskerin, SF2Ap32,


PBK1, RNA helicase Gu (DDX21), DDX47, DDX27, DDX18, DDX9, DDX5, EBNAbp1, and Nop132, have so far been used as affinity baits to isolate their associated protein complexes. Some representatives of the isolated complexes are shown in Fig. 3-7A; most of their components dissociated upon RNase treatment. Northern blot analysis showed that they contained pre-rRNAs and mature rRNAs, indicating that they were pre-rRNP complexes. Nano-LC-MS/MS analyses of each of the isolated pre-rRNP complexes generated 1500–3000 MS/MS spectra that were attributed to 700–1200 peptides and assigned to 100–200 proteins. From a technical point of view, those analyses taught us the following:
1. Although the anti-FLAG antibody-fixed beads currently available almost always bind protein arginine methyltransferase 5 (PRMT5), protein phosphatase 2C, methylosome protein 50 (MEP50), and kinesin-like spindle protein (hKSP), which are eluted by FLAG peptides (those proteins should be excluded from the protein identification list unless a quantitative increase is proved), the FLAG-tag method is very useful for isolating pre-rRNP complexes from human cells (42).
2. Although a gel-based MS method could uniquely identify some protein components, the shotgun method using nano-LC-MS/MS identified approximately twice as many proteins in each isolated protein complex as gel-based MS did (32, 74).
3. Most of the proteins that have homology to the trans-acting factors involved in yeast ribosome biogenesis were identified by nano-LC-MS/MS analysis only at an extremely low flow rate (∼50 nL/min; the efficiency of protein identification was dependent on flow rate) (42, 74, 123), suggesting the necessity of highly sensitive technology for protein identification of human/mammalian pre-rRNP complexes.
4. The shotgun method of protein identification analysis could be repeated three or more times per isolated pre-rRNP complex and would allow one to get reproducible results in a short period of time.
5. Because of its high sensitivity, the nano-LC-MS/MS method could identify a number of proteins even in a control sample prepared from mock- or empty/FLAG vector-transfected cells; care should therefore be taken during the affinity purification process when isolating pre-rRNP complexes.
6. RNase treatment of an isolated pre-rRNP complex can give information about proteins that associate with the bait protein independently of RNA integrity and is probably useful for analyzing protein–protein interactions within the pre-rRNP complex (42).
7. The determination of species and sizes of pre-rRNAs was sometimes disturbed by degradation of some isolated pre-rRNP complexes, possibly due to conformational instability; thus, extreme care is required to avoid RNA degradation during the preparation of pre-rRNP complexes.
Reverse-tagging analysis applied to human pre-rRNP complexes has so far identified about 90 probable human/mammalian trans-acting factors


Fig. 3-7. Snapshot analysis of human pre-rRNP complexes isolated by reverse-tagging approach.
(A) Examples of human preribosomal particles isolated by reverse-tagging method. The pre-rRNP complexes affinity-purified by using the human trans-acting factors (NCL, nucleolin; B23, nucleophosmin; FIB, fibrillarin; NNP, Nop52; Nop56; TC, treacle; HN, hnRNP U) as bait were subjected to SDS-PAGE and visualized by silver staining. Arrows indicate bait proteins. +, with RNase treatment; −, without RNase treatment.
(B) Model for coupling between rDNA transcription, pre-rRNA modification, and processing. [Reprinted from Ref. 150. Copyright (2005) with permission from Elsevier.] In this model, Pol I is associated with a large number of processing factors at the promoter region, forming a huge Pol I holoenzyme. After initiation of transcription, numerous pre-rRNA processing factors, like modification guide snoRNPs and t-Utp complex, are recruited to the pre-rRNA. Since the t-Utp complex can bind the pre-rRNA in the absence of the SSU processome, it is possible that the t-Utp complex binds the pre-rRNA very early, thereby nucleating the assembly of additional pre-rRNA processing factors, including the U3 snoRNP, leading to the formation of the SSU processome. This is visualized as the large terminal knob on the 5′ end of the pre-rRNA. Cotranscriptional cleavage can then occur in ITS1. A smaller knob is visualized on the pre-rRNA, which becomes larger and more compact when Pol I nears the 3′ end of the rDNA. This knob (termed the LSU knob) may contain factors required for the synthesis of 5.8S and 25S and thus the 60S ribosomal subunit. When elongating Pol I reaches the 3′ ETS, the RNase III endonuclease, Rnt1, is loaded onto the pre-rRNA, and cleavage in the 3′ ETS proceeds cotranscriptionally.
(C) Separation of the FIB-associated subcomplex and RNP complexes by ultracentrifugation through a sucrose gradient. FLAG-tagged FIB gene-transfected 293EBNA cells were subjected to immunoprecipitation using anti-FLAG. The immunoprecipitate was fractionated into 18 fractions by ultracentrifugation on a 12–50% sucrose gradient. The arrow below the SDS-PAGE gel indicates the fraction containing the subcomplex. Molecular weights are indicated to the left. M, molecular weight markers; FIB full, components of the immunoprecipitate loaded onto the gradient; PRMT1 and 5, protein arginine methyltransferase 1 and 5; SF2Ap32, splicing factor 2-associating protein 32; FIB, fibrillarin. [From Yanagida et al. (Ref. 42) (2003). Reprinted with permission from the American Society for Biochemistry and Molecular Biology, Inc.]


that are homologs of yeast factors involved in ribosome biogenesis (Table 3-4), as well as about 90 nonribosomal proteins that have no known function in ribosome biogenesis or that have no yeast counterparts (42, 59, 123, 135–137; Kawasaki et al., manuscript in preparation; Yoshikawa et al., manuscript in preparation). In addition, each of the isolated pre-rRNP complexes contained 40–60 ribosomal proteins from small and large subunits. Detailed comparison with known yeast trans-acting factors assigned around 30 probable human trans-acting factors in pre-90S particles, of which 12 are components of the processome, including four proteins

TABLE 3-4. Probable trans-Acting Factors Involved in Human/Mammalian Ribosome Biogenesis

| Number | Protein Name | Yeast Homolog | Remark |
|---|---|---|---|
| 1 | Nucleolin (NCL) | Nsr1/She5 | |
| 2 | RSL1D1/DKFZP564M182 protein | Cic1/Nsa3 | |
| 3 | PWP1 | Pwp1 | |
| 4 | CBF2 | Mak21/Noc1 | Noc1-Noc2 |
| 5 | NOC2L/DKFZP564C186 protein | Noc2 | Noc1-Noc2 |
| 6 | Dkc1 | Cbf5 | Box H/ACA snoRNP |
| 7 | Fibrillarin (FIB) | Nop1 | Box C/D U3 snoRNP |
| 8 | Nop5/58 | Nop5/58 | Box C/D U3 snoRNP |
| 9 | NOL5A/Nop56 | Nop56/Sik1 | Box C/D U3 snoRNP |
| 10 | 15.5K | Snu13 | Box C/D U3 snoRNP |
| 11 | M-phase phosphoprotein 10 | Mpp10 | Processome |
| 12 | TBL3/transducin (beta)-like 3 | Utp13 | Processome |
| 13 | DKFZP564O0463 protein | Sof1 | Processome |
| 14 | CGI-94 protein | Utp11 | Processome |
| 15 | Nucleolar protein family 6 | Utp22 | Processome |
| 16 | CIRH1A | Utp4 | Processome |
| 17 | HEATR1 | Utp10 | Processome |
| 18 | WDR46 | Utp7 | Processome |
| 19 | KIAA0690 | Rrp12 | |
| 20 | SART3 | Nsr1 | |
| 21 | RNA binding motif protein 39 | Nsr1 | |
| 22 | KIAA1470 | Srm1 | GDP/GTP exchange factor |
| 23 | KIAA1709 | Kre33 | |
| 24 | BMS1L | Bms1 | |
| 25 | C4orf9 | Utp2/Nop14 | |
| 26 | HRB2/HIV-1 rev binding protein 2 | Krr1 | |
| 27 | PDCD11 | Rrp5 | |
| 28 | Ddx10 | Hca4 | DEAD box |
| 29 | Hypothetical protein FLJ10534 | Tsr1 | |
| 30 | Nop10 | Nop10 | Box H/ACA snoRNP |
| 31 | FLJ12949 | Kri1 | 40S processing factor |
| 32 | GTPBP4/GTP binding protein 4 | Nog1 | Putative GTPase |
| 33 | BXDC2/BRIX | Brx1 | |
| 34 | Ddx27 | Drs1 | DEAD box |
| 35 | Ddx24 | Mak5 | DEAD box |
| 36 | FTSJ3/FtsJ homolog 3 (E. coli) | Spb1 | Methyltransferase |
| 37 | NOL1/proliferating cell nuclear protein | Nop2/Yna1 | Methyltransferase |
| 38 | RBM28/hypothetical protein FLJ10377 | Nop4/Nop77 | |
| 39 | PPAN/peter pan homolog | Ssf1 | |
| 40 | EBNA1BP2/EBNA1 binding protein 2 | Ebp2 | |
| 41 | RRS1 | Rrs1 | |
| 42 | MKI67IP/NIFK | Nop15 | |
| 43 | NOC3L/AD24 protein | Noc3 | Noc2-Noc3 |
| 44 | Pes1 | Nop7/Yph1 | Bop1-Pes-GRWD complex |
| 45 | Bop1 | Erb1 | Bop1-Pes-GRWD complex |
| 46 | GRWD repeat domain 12 | Ytm1 | Bop1-Pes-GRWD complex |
| 47 | Ddx5 | Dbp2 | DEAD box |
| 48 | Ddx17 | Dbp2 | DEAD box |
| 49 | Ddx18 | Has1 | DEAD box |
| 50 | CGI-37 | Nip7 | |
| 51 | RNA binding protein | Mak16 | |
| 52 | RNA processing factor 1 | Rpf1 | |
| 53 | Brix domain containing 1 | Rpf2 | |
| 54 | XTP5 | Puf6 | |
| 55 | Nop52/NNP1 | Rrp1 | |
| 56 | TGF beta-inducible nuclear protein 1 | Nsa2 | |
| 57 | Chromosome 1 open reading frame 33 | Mrt4 | |
| 58 | GNL3/nucleostemin | Nug1 | Putative GTPase |
| 59 | GNL2/nucleolar GTPase | Nog2 | Putative GTPase |
| 60 | GRWD1/glutamate-rich WD repeat containing 1 | Rrb1 | |
| 61 | AD034/RIO kinase 1 (yeast) | Rio1/Rrp10 | |
| 62 | KIAA0179 protein | Rrp1 | |
| 63 | Splicing factor RS-6 | Nop3 | |
| 64 | XRN2 | Rat1 | |
| 65 | WDR57 | Rsa4 | |
| 66 | SDAD | Sda1 | |
| 67 | Polymyositis/scleroderma autoantigen 2, 100 kDa | Rrp6 | Exosome |
| 68 | Polymyositis/scleroderma autoantigen 1 | Rrp45 | Exosome |
| 69 | PRP4 pre-mRNA processing factor 4 | Rrp4 | Exosome |
| 70 | Opa-interacting protein 2 | Rrp43 | Exosome |
| 71 | Exosomal core protein CSL4 | Csl4 | Exosome |
| 72 | KIAA0116 protein | Rrp45 | Exosome |
| 73 | Exosome component Rrp46 | Rrp46 | Exosome |
| 74 | Nucleophosmin (B23) | — | |
| 75 | hnRNP U | Nop3 | |
| 76 | Ddx21 (RNA helicase II/Gu) alpha | Dbp1/Lph8 | DEAD box |
| 77 | Ddx9 (RNA helicase A, leukophysin) | YLR419W | DEAD box |
| 78 | Ddx15 | Prp43 | DEAD box |
| 79 | Ddx3 | Ded1 | DEAD box |
| 80 | Ddx30 | Prp2/Rna2 | DEAD box |
| 81 | Similar to Ddx36 | — | DEAD box |
| 82 | Ddx50 (RNA helicase II/Gu) beta | Dbp1/Lph8 | DEAD box |
| 83 | Ddx48 | Tif1 | DEAD box |
| 84 | Ddx47 | Rrp3 | DEAD box |
| 85 | UBF1 | — | Transcriptional factors |
| 86 | Ss-B | Lhp1/Lah1 | |
| 87 | p160 Myb-binding protein | Pol5 | |
| 88 | Nopp140 | Srp40 | |
| 89 | Treacle | — | |
| 90 | ES/130 | Nup116 | |
| 91 | Nop132 | Nol8 | |

Protein names are given according to the HUGO and/or NCBI database. Corresponding yeast trans-acting factors are also indicated. If functions or functional complexes of yeast trans-acting factors are known, they are given in Remark. When a protein identified in the pre-rRNP complex associated with the indicated bait protein [Par14, DDX47, FIB, Nop56, NCL, NNP1, B23, Gu (DDX21), DDX18, and Nop132] is homologous to a yeast trans-acting factor, it is marked by a circle in the corresponding position in the table. When the preribosome particles with which yeast trans-acting factors associate are known, they are also marked by a circle in the corresponding positions in the table.

[The bait columns (Par14, DDX47, FIB, Nop56, NCL, NNP1, B23, Gu, DDX18, Nop132) and the yeast preribosome columns (90S, 40S, 60S) are omitted because their circle marks could not be recovered from the source layout. Three further entries among Nos. 86–91 carry the remark "Transcriptional factors", but their rows could not be determined.]

of box C/D snoRNPs (Table 3-4). More than half of them are found in the DDX47- and Par14-associated pre-rRNP complexes (137), suggesting that these complexes may be pre-90S particles formed at very early stages of human/mammalian ribosome biogenesis. The presence of UBF1 and several other transcription factors in the Par14-associated pre-rRNP complex supports this (136; Fujiyama et al., manuscript in preparation). The FIB-, Nop56-, and NCL-associated complexes contained at least 12, 7, and 6 of those factors, respectively, as well as a few transcription factors, and may also constitute pre-90S particles formed at stages different from those at which the Par14- and DDX47-associated pre-rRNP complexes are formed. Nop56 was found associated with treacle, a product of the gene responsible for Treacher Collins syndrome (123), which is involved in ribosomal DNA transcription by interacting with UBF1 (124). This result also supports involvement of the Nop56-associated pre-rRNP complex in early stages of pre-90S biogenesis. Over 35 factors, including components of the Bop1-Pes-1-GRWD complex, are assigned to the pre-60S particles isolated by the reverse-tagging approach. Two-thirds of those are also found in the Par14- and DDX47-associated pre-rRNP complexes. This result probably reflects that the roles of Par14 and DDX47 persist for long periods, from early pre-90S to pre-60S formation, during ribosome biogenesis. The other pre-rRNP complexes, such as those associated with NNP1/Nop52 and B23, lack most of the pre-90S processing factors (or have only a few) and have mainly pre-60S processing factors, many of which have not been found in any of the pre-rRNP complexes isolated in yeast (Table 3-4). In fact, about 25 proteins identified by this reverse-tagging analysis for human trans-acting factors are homologs of yeast trans-acting factors that have not yet been found in isolated yeast pre-rRNP complexes (104, 105).
Of the 25 proteins, 6 are the components of exosome and 9 are DEAD box RNA helicases; they are enzymes that generally dissociate soon after their actions are completed. Those results emphasize the need to use efficient protein identification methods, such as the shotgun method, using highly sensitive direct nano-LC-MS/MS. Interestingly, most of the identified exosome components are found associated with NCL, suggesting that exosome may function mainly on the NCL-associated pre-rRNP complex. However, the analysis identified only a few pre-40S processing factors, whereas reverse-tagging analysis of yeast pre-rRNP complexes has identified a dozen pre-40S processing factors (104, 105). Those results altogether indicate that a large proportion of the trans-acting factors described in yeast have counterparts in humans/mammals; thus, the fundamental mechanisms that underlie ribosome biogenesis have been conserved throughout evolution. In fact, reverse-tagging analysis revealed the presence of a series of distinctly different intermediates of ribosome particles in the nucle(ol)us of human/mammalian cells and could correlate those intermediates with the corresponding yeast preribosome particles (Table 3-4) (32). (More detailed comparison of pre-rRNP complexes between yeast and humans/mammals in terms of functional aspects will be described elsewhere.) However, in the isolated human/mammalian pre-rRNP complexes, a number of nonribosomal proteins, which do not have a counterpart in the yeast pre-rRNP complexes, have been identified by this analysis. Those proteins could function uniquely in the process of human/mammalian ribosome biogenesis


and/or its related processes, and might be downstream targets of some of the oncogenes and growth-factor receptors, which are shown to be major regulators of the protein synthesis machinery (32, 138–140). In fact, B23, which has no known yeast counterpart, is an abundant nucleolar endoribonuclease required for the maturation of 28S rRNA (141) and is regulated by its homolog nucleophosmin 3 (NPM3), which also has no known homolog in yeast (127), and tumor suppressor Arf (142). NPM3 localizes dependently on active rRNA transcription, interacts with B23 in the nucleolus, and inhibits ribosome biogenesis. On the other hand, Arf is regulated in its nucleolar targeting by B23 (143) and, conversely, also regulates the role of B23 in ribosome biogenesis through ubiquitination of B23 and interaction with B23 independently of the presence of rRNA within the nucleolar 60S preribosomal particle (144, 145). Those regulations are obviously unique to human/mammalian ribosome biogenesis. Thus, the regulatory mechanisms that underlie ribosome biogenesis of human/mammalian cells are much more diversified than those of yeast cells, because adaptation, growth, and proliferation of distinctly different cell types, each of which is dependent on different growth factors and other stimuli, are directly coupled with the regulatory mechanisms of ribosome biogenesis (97, 121, 138–140, 146). Failure of some of those regulatory mechanisms may result in some specific disorders that are only observed in humans/mammals. In fact, during the course of our analyses, the function of a protein that is responsible for the cause of a genetic craniofacial disorder was found connected to ribosome biogenesis (123, 124). 
Because the synthetic process and its regulatory mechanism of human/mammalian ribosome biogenesis are mostly unexplored, systematic proteomic analyses, such as those described here, may reveal the whole picture of human/mammalian ribosome biogenesis and open up a new dimension in understanding human diseases, associated with differentiation, cell growth, and proliferation. One important aspect obtained from reverse-tagging analysis is that some of the pre-rRNP complexes isolated by using affinity tag contained several transcription factors, including UBF1, essential for transcription of rDNA in humans/mammals. Previously, it was thought that rRNA transcription and pre-rRNA cleavage in eukaryotes were separable steps in gene expression; however, recent findings suggest that these two steps in gene expression can be concurrent and are coregulated (Fig. 3-7B) (149, 150). In fact, optimal rDNA transcription requires the presence of a defined subset of components of the pre-rRNA processing machinery (151), and even pre-rRNA processing can occur before the completion of transcription (152). These findings also suggest that pre-90S and pre-60S are formed in the state attached to the transcription machinery that is transcribing rDNA (Fig. 3-7B). This is very similar to how RNA polymerase II-dependent transcription is coupled with spliceosome-dependent splicing. The presence of transcription factors in some of the pre-rRNP complexes isolated by affinity purification probably reflects this phenomenon (Table 3-4) and, in turn, suggests that it is possible to isolate the entire machinery complex that contains rDNA transcription and pre-rRNA processing machinery in operation. Next, we examine how human/mammalian pre-rRNP complexes are isolated by the reverse-tagging approach.


▶ Experimental Example 3-3 Reverse-tagging and shotgun analyses of pre-rRNP complex by using 1D-RPLC-MS/MS (42, 74, 123, 135).

MATERIALS
• Human kidney cell line, 293EBNA cells (Invitrogen, Groningen, The Netherlands).
• Dulbecco’s modified Eagle’s medium (DMEM) (Sigma-Aldrich Chemical, Steinheim, Germany).
• Anti-FLAG peptide antibody M2-conjugated agarose beads.
• Anti-FLAG peptide antibody M2.
• FLAG peptide.
• Alpha-cyano-4-hydroxycinnamic acid.
• Nonionic detergent IGEPAL CA-630 [(octylphenoxy)polyethoxyethanol].
• CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propane-sulfonate) (Dojindo Laboratories, Kumamoto, Japan).
• Piperazine diacrylamide (Bio-Rad Laboratories, Hercules, CA, USA).
• Tween 20.
• Protease inhibitor cocktail Complete Mini (Roche Diagnostics).
• NBT/BCIP.
• LipofectAMINE (Gibco BRL).
• OPTI-MEM.
• Achromobacter protease I (Lysylendopeptidase, Lys-C).
• Trypsin (sequence grade) (Promega, Madison, WI, USA).
• Modified trypsin (sequence grade) (Boehringer-Mannheim, Framingham, MA, USA).
• HPLC-grade acetonitrile (Waken Chemical, Tokyo, Japan).
• HPLC-grade formic acid (Waken Chemical, Tokyo, Japan).
• Reversed-phase packing material (Mightysil C18; particle size, 3 μm) (Kanto Chemical, Tokyo, Japan).

APPARATUS, SOFTWARE TOOL, AND APPARATUS SETUP
See Chapter 2, Section 2-1-1, Experimental Example 2-1.

PROCEDURE

Isolation of Pre-rRNP Complex Associated with trans-Acting Factors Involved in Human Ribosome Biogenesis
1. Construct the expression vector: amplify the cDNA fragment encoding each human trans-acting factor of interest with primers carrying NheI and XhoI restriction sites, digest it with those enzymes, and subclone it downstream of the sequence encoding the N-terminal FLAG epitope tag in a pcDNA3-based vector.
2. Maintain human 293EBNA cells in DMEM supplemented with 10% heat-inactivated fetal calf serum and culture at 37 °C in an incubator with 5% CO2.
3. Transfect 10 μg of the expression plasmid DNA into cells at 70% confluence cultured in 100 mm dishes using LipofectAMINE (Gibco BRL) and grow the transfected cells for 48 h at 37 °C.
4. Harvest the transfected cells and wash with phosphate-buffered saline (PBS).
5. Lyse in 1 mL of lysis buffer [50 mM Tris-HCl (pH 8.0), 150 mM NaCl, 0.5% IGEPAL CA-630, containing protease inhibitor cocktail].
6. Centrifuge the cell lysate at 15,000 rpm for 30 min at 4 °C.
7. Incubate the supernatant with 20 μL of M2-agarose beads overnight at 4 °C for immunoprecipitation.
8. Wash the protein-bound agarose beads five times with the lysis buffer.
9. Elute the proteins with 20 μL of 50 mM Tris-HCl, 150 mM NaCl containing 500 μg/mL of FLAG peptide.
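The buffer compositions in steps 5 and 9 are given only as final concentrations. The sketch below shows the dilution arithmetic (C1·V1 = C2·V2) for preparing 1 mL of lysis buffer and the amount of FLAG peptide in one 20 μL elution. The stock concentrations (1 M Tris-HCl, 5 M NaCl, 10% IGEPAL CA-630) and all function names are assumptions made for illustration; they are not stated in the protocol, and the volume of the protease inhibitor cocktail is ignored.

```python
"""Dilution arithmetic for the lysis buffer (step 5) and the FLAG-peptide elution (step 9)."""

# Final concentrations stated in the protocol (step 5).
TARGETS = {
    "Tris-HCl pH 8.0 (mM)": 50.0,
    "NaCl (mM)": 150.0,
    "IGEPAL CA-630 (%)": 0.5,
}

# Hypothetical stock solutions, assumed for illustration only.
STOCKS = {
    "Tris-HCl pH 8.0 (mM)": 1000.0,
    "NaCl (mM)": 5000.0,
    "IGEPAL CA-630 (%)": 10.0,
}


def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share one unit."""
    if stock_conc <= 0 or not 0 <= final_conc <= stock_conc:
        raise ValueError("need stock_conc > 0 and 0 <= final_conc <= stock_conc")
    return final_conc / stock_conc * final_volume_ul


def lysis_buffer_recipe(final_volume_ul=1000.0):
    """Volume (uL) of each assumed stock plus water; inhibitor cocktail not included."""
    recipe = {
        name: stock_volume_ul(TARGETS[name], STOCKS[name], final_volume_ul)
        for name in TARGETS
    }
    recipe["water"] = final_volume_ul - sum(recipe.values())
    return recipe


def peptide_mass_ug(volume_ul, conc_ug_per_ml):
    """Mass of FLAG peptide contained in a given elution volume."""
    return volume_ul / 1000.0 * conc_ug_per_ml


if __name__ == "__main__":
    for component, volume in lysis_buffer_recipe().items():
        print(f"{component}: {volume:.1f} uL")
    print(f"FLAG peptide per elution: {peptide_mass_ug(20.0, 500.0):.1f} ug")
```

With the assumed stocks, 1 mL of lysis buffer takes 50 μL of Tris-HCl, 30 μL of NaCl, and 50 μL of IGEPAL CA-630 stock, and each 20 μL elution at 500 μg/mL contains 10 μg of FLAG peptide.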

Ultracentrifugation of Pre-rRNP Complexes
10. Load the pre-rRNP complex eluted from the M2-agarose beads with FLAG peptide onto a 12–50% sucrose gradient in 50 mM Tris (pH 7.5), 25 mM KCl, 5 mM MgCl2 formed in a 5 mL ultracentrifugation tube.
11. Centrifuge in an SW65 rotor (Beckman) at 50,000 rpm (180,000g) for 3 h at 4 °C.
12. Collect 18 fractions of 300 μL each.
13. Determine the migration of 40S/60S/80S by comparison with the ultraviolet absorption profile at 254 nm of cytosolic ribosomes fractionated by ultracentrifugation under conditions identical to those used for the pre-rRNP complex.

Immunocytochemical Analysis of FLAG-Tagged trans-Acting Factors
14. Culture human 293EBNA cells on 8-well culture slides (Biocoat, Becton Dickinson Labware).
15. Transfect with the expression plasmid DNA bearing each of the trans-acting factor genes using LipofectAMINE.
16. Fix the transfected cells or intact cells with 3.7% formaldehyde in PBS.
17. After washing with PBST (PBS containing 0.05% Tween 20), permeabilize the cells with 0.1% Triton X-100 for 5 min.
18. Treat with 3% (w/v) skim milk in PBS at room temperature.
19. Incubate the cells with mouse anti-FLAG IgG (10 μg/mL) as the primary antibody overnight at 4 °C. Use other antibodies for localization markers.


20. After another wash with PBST, incubate the cells further with Alexa Fluor 488 conjugated anti-mouse IgG as the secondary antibody for 1 h at room temperature.
21. After incubation, wash the cells and mount with Vectashield (Vector Laboratories, Burlingame, CA, USA), and visualize fluorescent images with Axiovert 200M microscope (Carl Zeiss, Germany).

RNase Treatment
22. Incubate the cell lysate with anti-FLAG antibody-conjugated beads at 4 °C for 4 h.
23. After washing five times, incubate pre-rRNP complex on the anti-FLAG antibody-conjugated beads with the lysis buffer containing 1 μg/mL RNaseA at 37 °C for 10 min.
24. Wash twice with the lysis buffer, and elute with 20 μL of 500 μg/mL FLAG peptide.
25. Load 0.5 μg of the eluted proteins on an 11% SDS–polyacrylamide gel, and follow by silver staining.

Protease Digestion of the Isolated Pre-rRNP Complex
26. Precipitate 0.2 μg of the isolated complex using 20 μL of mixed methanol and chloroform (1:1 v/v).
27. After vacuum drying, dissolve the precipitate in Tris buffer (50 mM Tris-HCl, 6 M urea, 0.005% n-octylglucopyranoside, pH 9.0).
28. Digest the precipitate with 5 μL of Achromobacter protease I (40 pM, substrate-to-enzyme ratio = 50:1) overnight at 37 °C.
29. Inject the digestion mixture directly onto the nano-ESI column via a 5 μL sample loop.
30. Analyze the peptide mixture according to the conditions described in Section 2-1-1, Experimental Example 2-1.

RESULT
Typical staining patterns of the immunoprecipitates obtained from whole cell extract on SDS-PAGE gel are shown in Fig. 3-7A. A number of protein bands on a silver-stained SDS-PAGE gel for each of the immunoprecipitates were detected. In contrast, only several protein bands were stained in mock immunoprecipitate prepared from control cells that were transfected with the expression vector lacking the protein genes used.
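The same comparison against a mock control applies to the protein lists from nano-LC-MS/MS, where the text notes that PRMT5, protein phosphatase 2C, MEP50, and hKSP bind the anti-FLAG beads and should be excluded unless a quantitative increase is proved. The sketch below shows one simple way to express that filtering as a set operation. It is an illustrative policy, not the authors' stated procedure; the function name, the short label "PP2C", and the example identifier "KRT1" are hypothetical.

```python
"""Filter a list of identified proteins against a mock control and known bead binders."""

# Proteins that the text reports as binding anti-FLAG beads and eluting with FLAG peptide.
# "PP2C" is a shorthand label (assumed here) for protein phosphatase 2C.
BEAD_BINDERS = frozenset({"PRMT5", "PP2C", "MEP50", "hKSP"})


def filter_identifications(sample_ids, mock_ids, quantitatively_enriched=()):
    """Return the sorted sample identifications that survive background removal.

    A protein is removed if it was also identified in the mock control or is a
    known bead binder, unless it is listed in `quantitatively_enriched`
    (that is, a quantitative increase over the control has been shown).
    """
    background = (set(mock_ids) | BEAD_BINDERS) - set(quantitatively_enriched)
    return sorted(set(sample_ids) - background)


if __name__ == "__main__":
    # Hypothetical identification lists, for illustration only.
    sample = ["NCL", "B23", "FIB", "PRMT5", "MEP50", "KRT1"]
    mock = ["KRT1"]
    print(filter_identifications(sample, mock))
    print(filter_identifications(sample, mock, quantitatively_enriched=["PRMT5"]))
```

The `quantitatively_enriched` argument reflects the caveat in the text: a bead-binding protein such as PRMT5 stays in the list only when its enrichment over the control has been demonstrated separately.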
In addition, when other FLAG-tagged proteins unrelated to ribosome biogenesis were used, an entirely different pattern of protein bands was obtained on SDS-PAGE gels. Thus, these results indicate that most of the proteins stained on SDS-PAGE gels are specifically associated with each of the baits used. RNA integrity for the NCL, B23, FIB, NNP1/Nop52, and Nop56-associated proteins is demonstrated in Fig. 3-7A. In the case of B23, for example, all of the proteins


visible by silver staining on SDS-PAGE gel (except one additional protein) were dissociated from FLAG-B23 after RNase treatment. This additional protein band was B23 as determined by the peptide mass fingerprinting (PMF) method. This indicates that endogenous B23 interacted with FLAG-tagged B23 even after the RNase treatment, and formed an oligomer. Meanwhile, as in the cases of NCL and B23, most of the FIB-associated proteins were also dissociated from FLAG-FIB. However, five proteins that were strongly stained on SDS-PAGE gel were retained with FLAG-FIB even after the RNase treatment. Their staining intensities on SDS-PAGE gel suggest that they are present in approximately stoichiometric amounts and are more abundant than the constituents whose association in the isolated complex depends on RNA integrity. These results indicate that at least two different sets of assembly are present in the complex obtained by the use of FLAG-FIB: one requires RNA integrity to hold the associating proteins and the other does not. Thus, RNA integrity is necessary for the association of most of the proteins present in each of the complexes prepared with the FLAG-tagged proteins indicated, suggesting that the complexes are principally RNP complexes. To estimate the size, the immunoisolated RNP complexes were fractionated by ultracentrifugation through a sucrose gradient and collected into 18 fractions, each of which was analyzed by SDS-PAGE. As an example, the result for the FIB-associated pre-rRNP complex is shown in Fig. 3-7C. While FLAG-FIB was observed in nearly all the fractions, the majority of FIB was present in the 40S fraction and the largest RNP complexes in the gradient (fractions 9–14). The proteins that were shown to remain associated with FIB after RNase treatment were found mostly in the subcomplex fractions (fractions 1–4), indicating that these RNA-independent FIB-associated proteins constitute an independent module.
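Two small calculations help relate these fractions to the run conditions in the procedure (50,000 rpm, 180,000g, 12–50% sucrose, 18 fractions). The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × rpm² to back-calculate the rotor radius implied by the stated rpm and g values, and estimates the nominal sucrose concentration at the midpoint of each fraction. The second part assumes a linear gradient, equal-volume fractions, and numbering from the top of the tube; none of these is stated in the text, and the function names are hypothetical.

```python
"""Relative centrifugal force and nominal sucrose concentration per gradient fraction."""

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def implied_radius_cm(rpm, rcf_g):
    """Radius (cm) at which the given speed produces the given RCF."""
    return rcf_g / (RCF_CONSTANT * rpm ** 2)


def nominal_sucrose_percent(fraction, n_fractions=18, top=12.0, bottom=50.0):
    """Midpoint sucrose % of a fraction.

    Assumes a linear gradient, equal-volume fractions, and fraction 1 at the top.
    """
    if not 1 <= fraction <= n_fractions:
        raise ValueError("fraction out of range")
    return top + (fraction - 0.5) / n_fractions * (bottom - top)


if __name__ == "__main__":
    radius = implied_radius_cm(50_000, 180_000)
    print(f"Radius implied by 50,000 rpm and 180,000 g: {radius:.2f} cm")
    for fraction in (1, 4, 9, 14, 18):
        print(f"fraction {fraction:2d}: ~{nominal_sucrose_percent(fraction):.1f}% sucrose")
```

Under these assumptions the stated speed and force correspond to a radius of about 6.4 cm, and the subcomplex fractions (1–4) sit near the light end of the gradient while fractions 9–14 span its middle and denser part.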
Thus, FIB forms a subcomplex that is primarily independent of the rest of the FIB-associated pre-rRNP complex. Each of the Lys-C digests of the isolated pre-rRNP complexes was analyzed at least three times by nano-LC-MS/MS, and the CID spectra obtained were searched against a human sequence database. For example, analysis of the NCL-associated pre-rRNP complex resulted in a total of 1080 identified peptides, and 134 proteins met the identification criteria. Of these 134 proteins, 65 were ribosomal proteins (41 for large subunit and 24 for small subunit). Nano-LC-MS/MS analysis identified 14 more ribosomal proteins of the large 60S subunit and 11 more of the small 40S subunit than gel-based PMF analysis had, and covered all the ribosomal proteins found by PMF except L10 (135). The remaining 69 proteins were nonribosomal proteins. The total number identified was 134, which was 74 more than the PMF method had identified and covers no less than 90% of the total number of proteins identified by the PMF method. Some discrepancies in identification between the PMF method and the LC-MS/MS method were observed, possibly in part because the NCL pre-rRNP complex was not necessarily a complex of stoichiometric constituents, owing to its transient nature as an intermediate particle of the mature ribosome. Some nonribosomal proteins might be low in abundance in the intermediate particles and might not always be detected in every preparation. In addition, some proteins might produce only highly hydrophobic peptides and/or very small hydrophilic peptides by Lys-C digestion, which were recovered only in very low yield and/or passed


through without any separation from a reversed-phase column, respectively. Those proteins might escape identification by the LC-based method. Furthermore, some proteins might have been more resistant to Lys-C digestion than the other proteins in a stable large RNP complex, which might result in generation of a number of larger and more hydrophobic peptides in extremely low yields. Nonetheless, the shotgun method using highly sensitive nano-LC-MS/MS is an automated 90 min process and has some advantages in terms of high throughput and identification accuracy over the gel-based PMF analysis, which is relatively time and labor intensive and requires multistep manual handling. The same shotgun method was used to identify the protein constituents of the other isolated pre-rRNP complexes. It identified 924 peptides in the B23-associated pre-rRNP complex and assigned 112 identified proteins, of which 71 were ribosomal proteins (44 from the large subunit and 27 from the small subunit) and 41 were nonribosomal proteins. Meanwhile, the FIB-associated pre-rRNP complex yielded a total of 1218 identified peptides and 161 identified proteins, of which 71 were ribosomal proteins (43 from the large subunit and 28 from the small subunit) and the remaining 90 were nonribosomal proteins (42). In the case of FIB, the number of identified proteins included those retained with FLAG-FIB after RNase treatment of the FIB-associating RNP complex. The proteins associated with FIB in the absence of rRNA include protein arginine methyltransferase, p32 splicing factor 2 associating proteins, SKB1 (S. cerevisiae) homolog, tubulin alpha 3, and tubulin beta 1. The identification of protein components of the other pre-rRNP complexes is not described here, but some of the results are shown in Table 3-4. ◀

3-2-3 Quantitative (Dynamic) Analysis Using Isotope-Labeled Reagents

The reverse-tagging approach coupled with shotgun analysis is very useful to isolate a series of protein complexes that are related functionally to one another; however, it cannot characterize dynamic aspects of the protein components commonly found among the isolated protein complexes. For such components, a quantitative approach is absolutely required. The differential protein analysis using LC-MS coupled with stable isotope labeling of proteins in vivo or in vitro accelerates analysis of the dynamic aspects of protein expression, post-translational modifications, protein–protein interactions, and the molecular dynamics of functional multiprotein cellular components and thereby allows moment-by-moment snapshot analyses of cellular functions (see Chapter 2, Section 2-2). Various strategies based on stable isotope coding and MS are developed as described in Chapter 2, Section 2-2-1. In those strategies, differentially labeled peptides are analyzed by LC-MS/MS in a data-dependent manner, protein identification is based on MS/MS data, and quantification is based directly on MS data or XICs of the MS signal. Protein ratios can be determined by comparing the relative intensities of MS signals (or XICs) from differentially labeled peptides (see Chapter 2, Section 2-2-2). Experimental Example 3-4 shows the analysis of the relative abundance of all the ribosomal proteins present in NCL-, FIB-, and B23-associated pre-rRNP complexes (described in


the previous section) using isotopically labeled reagents. In this method, each of the isolated pre-rRNP complexes was digested with trypsin and labeled with an isotope-coded reagent. The NCL-associated pre-rRNP complex was labeled with light reagent; another complex, the fibrillarin- or B23-associated pre-rRNP complex, was labeled with heavy reagent (Fig. 3-8A). The two differently labeled peptide mixtures were then combined and analyzed by an LC-MS/MS system. Relative quantitation was done by MS analysis, and the corresponding peptide was identified by subsequent MS/MS analysis (Fig. 3-8A).

▶ Experimental Example 3-4 Relative quantification using O-methylisourea and LC-MS (Yamauchi et al., manuscript in preparation).

MATERIALS
• O-methylisourea (S-methylisothiourea hemisulfate salt, Sigma-Aldrich, St. Louis, MO, USA).
• Isotope labeled O-methylisourea-1-13C,15N2HCl [Sigma-Aldrich (Isotech), St. Louis, MO, USA].
• Achromobacter protease 1 (Lysylendopeptidase, Lys-C).
• Trypsin (sequence grade) (Promega, Madison, WI, USA).

APPARATUS
Chapter 2, Experimental Examples 2-1 and 2-2.

SOFTWARE TOOLS
• MassLynx.
• Mascot.
• STEM 13 (available from our website; http://www.sci.metro-u.ac.jp/proteomicslab/).

System Requirements for STEM. STEM requires a Windows NT4.0-, 2000-, or XP-based PC equipped with a minimum of a 600 MHz Pentium processor and ∼128 MB of RAM. Some of the STEM commands use internal functions of Microsoft Excel; therefore, Excel 2000, XP, or 2003 is also required. For example, a Windows 2000-based DELL Dimension 4300S (Dell Inc., Texas, USA) equipped with an Intel Pentium 4 1.9-GHz processor and 256 MB SDRAM can be used.

PROCEDURE
Trypsin Digestion
1. Dissolve the protein mixture (if alkylation is necessary, follow Experimental Example 2-1, Procedure 4-7) in 100 mM Tris-HCl, pH 8.5 containing 1


Fig. 3-8. Quantitative analysis of affinity-purified multiprotein complexes using isotope-labeled O-methylisourea and LC-MS/MS. (A) In this technology, a pair of target protein complexes are digested with a protease such as trypsin, and the resulting peptide mixtures are differentially labeled in vitro with “light” (normal) or “heavy” (13C, 15N-labeled) O-methylisourea. The N-guanidinated peptide mixtures are then combined and analyzed by an LC-MS/MS system in a data-dependent mode, in which the relative abundance of each peptide pair is quantitated from the relative intensities of their MS signals while the peptide is identified from the MS/MS signals. (See insert for color representation.) (B) Validation of mass spectrometric quantification by N-guanidination with O-methylisourea. Rat liver ribosome isolated by sucrose density gradient ultracentrifugation was digested with lysylendopeptidase and labeled with “heavy” (13C, 15N-labeled) or “light” O-methylisourea, respectively. The differentially labeled preparations are mixed at various ratios (1:1, 1:2, 1:10) and analyzed by the LC-MS system for quantitative identification of the ribosome components. The figure indicates that the relative abundances of most of the approximately 80 protein components of mature mammalian ribosome are estimated, with standard deviations of 1.0 ± 0.11. (Yamauchi et al., manuscript in preparation). (C) Validation of mass spectrometric quantification by comparing with SDS-PAGE gel staining. The pre-rRNP complexes isolated by epitope-tag affinity purification are visualized by gel staining. High contents of at least five proteins (PRMT5, tubulin-α and β, PRMT-1, SF2Ap32) that remained associated with bait-FIB after RNase treatment are clearly shown. Those proteins are contained in the FIB-associated pre-rRNP complex at the quantitative ratios indicated on the right side as the relative values for those present in the NCL-associated pre-rRNP complex.
On the other hand, ribosomal proteins RPL11 and RPL27a, which stained almost equally among the pre-rRNP complexes, gave relative values close to 1 (0.82 and 0.92, respectively).


[Fig. 3-8B, plot: relative abundance (y-axis, 0 to 1.5) of individual ribosomal proteins (x-axis: large-subunit proteins L3 to L40, small-subunit proteins S2 to SA, and P0 to P2).]

Fig. 3-8. (Continued) (D) Semiquantitative mass spectrometric analysis of pre-rRNP complexes formed during ribosome biogenesis. The pre-rRNP complexes are immunopurified using three “trans-acting” factors (FIB, B23, and NCL) as baits. After lysylendopeptidase digestion of the complexes, the peptide mixtures are differentially labeled with “light” or “heavy” O-methylisourea as in part (A) and subjected to semiquantitative mass spectrometric analysis. In this figure, the relative abundances of 50 ribosomal proteins from the FIB complexes and the B23 complexes are normalized to those in the NCL complexes. The results indicate that the FIB-associated pre-rRNP complex contains at least twice the amount of each protein component of the small ribosomal subunit as compared to the others. (Provided by Dr. Yamauchi.)

[Fig. 3-8D, plot: relative abundance (y-axis, 0 to 3.5) of individual ribosomal proteins (x-axis: large-subunit proteins L3 to L39, small-subunit proteins S2 to S28, and P0 to P2).]

mM CaCl2 and 8 M urea and remove insoluble material by centrifugation at 20,000g.
2. Dilute the supernatant with 100 mM ammonium bicarbonate, pH 8.5 containing 1 mM CaCl2 to 2 M urea.
3. Add trypsin to the protein sample solution at an enzyme-to-substrate ratio of 1:50 (w/w) and incubate overnight at 37 °C with tumbling.

Peptide Modification
4. Add solid O-methylisourea or isotope-labeled O-methylisourea to 1.5 M final, adjust the pH to 10.0 with NaOH (∼0.5 M final), and incubate at 37 °C overnight in a draft hood (the reaction is malodorous) to modify the peptides.
5. Mix the two peptide samples that have been modified with isotope-unlabeled and isotope-labeled O-methylisourea reagents.
6. Acidify the digest to pH 2 by adding an aliquot of 6 M HCl and, where necessary, remove any precipitates by centrifugation.
7. Apply directly to 1D-RPLC-MS/MS; or for analysis by 2D-LC-MS/MS, neutralize the supernatant with aqueous ammonia to pH 8, dilute the peptide mixture with an equal volume of water, and apply to the 2D-LC-MS/MS system.

LC-MS/MS Analysis of Peptides
8. A DNLC-MS/MS system or an automated microscale 2D-LC-MS/MS system using Q-ToF2 or Ultima (Micromass, UK), according to Experimental Examples 2-1 or 2-2, respectively, was used to analyze the mixed tryptic digest.
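The quantities in steps 2 to 4 follow from simple proportions, which can be sketched as below. This is a minimal illustration, not part of the published protocol: the function names are hypothetical, the reagent molecular weight must be supplied by the user for the salt actually used, and the reagent calculation ignores the volume contributed by the dissolved solid.

```python
def buffer_to_add_ul(sample_ul, urea_start_m=8.0, urea_target_m=2.0):
    """Volume of urea-free buffer that dilutes the sample to the target urea molarity."""
    if not 0 < urea_target_m <= urea_start_m:
        raise ValueError("target molarity must be positive and not above the start")
    final_ul = sample_ul * urea_start_m / urea_target_m
    return final_ul - sample_ul


def trypsin_ug(protein_ug, substrate_per_enzyme=50.0):
    """Trypsin mass for an enzyme-to-substrate ratio of 1:substrate_per_enzyme (w/w)."""
    return protein_ug / substrate_per_enzyme


def reagent_mg(final_ul, mol_weight_g_per_mol, target_m=1.5):
    """Solid reagent mass for target_m in final_ul (volume change from the solid ignored)."""
    return target_m * (final_ul * 1e-6) * mol_weight_g_per_mol * 1e3


if __name__ == "__main__":
    # Hypothetical sample: 100 uL in 8 M urea containing 100 ug of protein.
    print(buffer_to_add_ul(100.0))   # buffer volume (uL) needed to reach 2 M urea
    print(trypsin_ug(100.0))         # trypsin (ug) at 1:50 (w/w)
    # 100.0 g/mol is a placeholder; use the molecular weight of the reagent salt in hand.
    print(reagent_mg(400.0, 100.0))
```

Diluting from 8 M to 2 M urea is a fourfold dilution, so three sample volumes of buffer are added before trypsin.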


Protein Identification and Data Analyses
9. See Experimental Examples 2-1 and 2-2.

Protein Quantification Using STEM 13 Software (153)
10. Convert MS spectra to an ASCII file using MassLynx (Micromass) to obtain the dataset used for quantitative analysis in CQ-mode (in C-mode) of STEM 13, which analyzes mass spectral data for two peptide mixtures labeled with different stable isotopes. Since STEM also extracts the identified peptide information from the DAT file generated by the V-mode and then outputs a list of identified proteins, the mass spectra in ASCII format are searched for the mass spectral peaks of identified proteins. CQ-mode searches further for mass peaks of peptides labeled with light and heavy isotopes by utilizing the difference in mass to find pairs of labeled peptides. The relative abundance of peptides is determined by comparing the peak intensities of these pairs, and the results are output as a list of identified peptides.

When the peptides are labeled by a guanidination reaction with 12C/14N O-methylisourea (light) or 13C/15N O-methylisourea (heavy), the “light” or “heavy” reagent introduces an additional 43 or 46 atomic mass units (amu), respectively, into the epsilon-amino group of a lysine residue, such that the modification causes a difference of 3 amu/peptide in each peptide pair (see Chapter 2, Section 2-2-2). The relative abundance of each protein pair is quantitatively measured by comparing two different isotope mass signals obtained by LC-MS analysis (see Section 2-2-2). STEM is also compatible with isotope-labeling reagents with various mass differences, such as H2 16O/H2 18O differential labeling with a mass difference of 4 amu/peptide (154). If the mass peaks overlap as a consequence of a small mass difference between labeling agents, the natural peptide peak ratio is theoretically calculated using the natural isotope abundance ratio (see Section 2-2-2, Fig. 2-16D). The observed peak ratio is corrected using the calculated value.
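The pairing and correction logic described above can be sketched as follows. This is an illustrative sketch only and is not the STEM implementation: the function names, the tolerance, and the example peak list are hypothetical, the 3 amu spacing is the nominal value quoted in the text, and the overlap correction assumes that the light peptide's natural M+3 isotope peak is the only contribution to be subtracted from the heavy signal.

```python
def find_label_pairs(peaks, charge=1, n_labels=1, delta_amu=3.0, tol=0.02):
    """Return (light_mz, heavy_mz, light_intensity, heavy_intensity) for peak pairs
    whose m/z spacing matches the light/heavy label difference."""
    spacing = delta_amu * n_labels / charge  # m/z spacing of a labeled pair
    pairs = []
    for light_mz, light_int in sorted(peaks):
        for heavy_mz, heavy_int in sorted(peaks):
            if abs((heavy_mz - light_mz) - spacing) <= tol:
                pairs.append((light_mz, heavy_mz, light_int, heavy_int))
    return pairs


def heavy_to_light_ratio(light_int, heavy_int, m3_fraction=0.0):
    """Heavy/light ratio after removing the light peptide's natural M+3 isotope
    contribution (m3_fraction = theoretical M+3 / monoisotopic intensity)."""
    corrected_heavy = heavy_int - light_int * m3_fraction
    return corrected_heavy / light_int


if __name__ == "__main__":
    # Hypothetical centroided peaks (m/z, intensity) for a doubly charged peptide.
    peaks = [(500.00, 1000.0), (501.50, 2100.0), (620.30, 50.0)]
    for light_mz, heavy_mz, light_int, heavy_int in find_label_pairs(peaks, charge=2):
        print(light_mz, heavy_mz, heavy_to_light_ratio(light_int, heavy_int, 0.125))
```

The spacing scales with the number of labeled lysines and inversely with the charge state, so a peptide carrying one label appears as a pair 1.5 m/z units apart at charge 2.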
In order to evaluate the quantitative results of the method, Pptn is defined as

$$\mathrm{Pptn} = \left[ \frac{1}{n} \sum_{k=2}^{n} \left( \frac{\mathrm{Obs}_k}{\mathrm{Obs}_1} - \frac{T_k}{T_1} \right)^{2} \right]^{1/2} \qquad (3\text{-}1)$$

where Obs is the observed height of the isotope mass peak, and T is a theoretical abundance ratio calculated using the natural isotope abundance ratio. The numerical suffixes, 1, 2, 3…, to Obs and T are in the order of isotope molecular mass, beginning with the smallest. The level of concomitant noise is considered to be insignificant if the value of Pptn is less than 0.4. When the value is greater than 0.4, the mass spectra are visually inspected to verify the reliability of the quantitation. To facilitate visual inspection, use a graphical


interface in the STEM 13 program, which displays a portion of the mass spectra. If a single protein is quantitated by multiple peptides derived from different portions of the polypeptide, an average value is calculated and displayed by STEM.

RESULT
To validate the method, a ribosome particle was isolated from rat liver by conventional sucrose density gradient ultracentrifugation; each of the divided ribosome samples was digested with lysylendopeptidase (Lys-C) and labeled with “heavy” or “light” O-methylisourea. The differentially labeled preparations were then mixed at various ratios and analyzed by the LC-MS system for quantitative identification of the ribosome components. It is generally accepted that the functional mammalian ribosome consists of 80 protein components, 47 from the large 60S subunit and 33 from the small 40S subunit, as well as ribosomal RNAs. By using LC-MS analysis, most of these proteins were identified and the relative abundance of each component was estimated for the three different preparations (Yamauchi et al., manuscript in preparation, Fig. 3-8B), except for several proteins that produced a small number of multiply charged peptide ions within the range of mass spectrometric detection. The result indicates that the quantitation is reasonably accurate within the range of relative abundance, with standard deviations of 1.0 ± 0.11. This method was also applied to the analysis of the dynamics of pre-rRNP complexes formed at various stages of ribosome biogenesis in human cells. Three well-known trans-acting factors (NCL, FIB, and B23) involved in human ribosome biogenesis were used to analyze quantitative changes in their protein components (135, 155). The protein components present in the three distinctive pre-rRNP complexes were first analyzed by SDS-PAGE, which confirmed that many of the protein bands are commonly found in the isolated protein complexes and revealed some bands that are unique to each protein complex (Fig.
3-8C). In all cases, RNase treatment prompted the dissociation of the majority of the protein components of all isolated pre-rRNP complexes, indicating that association of most of the proteins with the three trans-acting factors is RNA dependent. However, at least five protein-staining bands remained associated with FLAG-FIB after RNase treatment (Fig. 3-8C). FIB and these proteins (including splicing factor 2-associated p32, protein arginine methyltransferases, and tubulins) formed a subcomplex that was independent of its association with the pre-rRNP complex, as described in the previous section (42); thus, the protein components of the subcomplex were representative of more abundant proteins uniquely present in the protein complexes associated with FIB. As expected, quantitative analysis identified all of the protein components of the subcomplex and showed that they were nine- to elevenfold more abundant than those present in the NCL-associated pre-rRNP complex (Fig. 3-8C). In addition, the analysis revealed that the exogenously expressed FIB was recovered at a level 48-fold higher than that of endogenous FIB present in the NCL-associated pre-rRNP complex. On the other hand, protein bands stained equally in both complexes on an SDS-PAGE gel gave


almost equal values of mass intensity. For instance, ribosomal proteins RPL11 and RPL27a, which were indistinguishable between FIB- and NCL-associated pre-rRNP complexes in terms of SDS-PAGE staining (Fig. 3-8C), were quantified as relative ratios of 0.82 and 0.92, respectively (mass intensity of the ribosomal protein present in the FIB-associated pre-rRNP complex divided by that present in the nucleolin-associated pre-rRNP complex), by quantitative mass analysis. Thus, the results obtained with the present quantitative method are consistent with those obtained using the SDS-PAGE staining method (42), validating that the method is reliable for quantitative comparison of pre-rRNP complexes formed at different stages of ribosome biogenesis. One of the most intriguing results was obtained when abundances of ribosomal proteins among the pre-rRNP complexes associated with NCL, FIB, and B23 were compared. Although differences in protein composition can easily be distinguished by qualitative analysis, differences in the abundance of shared components cannot. It has been shown by qualitative LC-MS/MS analysis that while the three trans-acting factors were associated with different sets of nonribosomal proteins, they were associated with almost identical sets of ribosomal proteins from the large and small subunits (42, 135). By using the quantitative analytical method, we found that while ribosomal proteins from the large subunit were present in equal amounts, almost all of those from the small subunit were twice as abundant in the FIB-associated pre-rRNP complex as in the NCL-associated pre-rRNP complex (Fig. 3-8D). However, no such difference was detected between the pre-rRNP complexes associated with NCL and B23; ribosomal proteins from both small and large subunits were present equally in those complexes (Fig. 3-8D).
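The subunit-level comparison described here amounts to grouping per-protein ratios by subunit and summarizing each group. The sketch below is a hypothetical illustration: the function name and the use of the median are assumptions, and only the RPL11 and RPL27a ratios are taken from the text, while the small-subunit values are placeholders chosen to mimic the roughly twofold enrichment reported for the FIB-associated complex.

```python
from statistics import median


def subunit_medians(ratios):
    """Median FIB/NCL ratio for large-subunit (RPL*) and small-subunit (RPS*) proteins."""
    groups = {"large": [], "small": []}
    for name, ratio in ratios.items():
        if name.startswith("RPL"):
            groups["large"].append(ratio)
        elif name.startswith("RPS"):
            groups["small"].append(ratio)
    return {subunit: median(values) for subunit, values in groups.items() if values}


if __name__ == "__main__":
    ratios = {
        "RPL11": 0.82,   # value quoted in the text
        "RPL27a": 0.92,  # value quoted in the text
        "RPS3": 2.1,     # placeholder value for illustration only
        "RPS9": 1.9,     # placeholder value for illustration only
    }
    print(subunit_medians(ratios))
```

A median is used in this sketch because a few poorly quantified peptides would otherwise dominate a mean; any robust summary would serve the same purpose.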
Based on the composition of the trans-acting factors involved or expected to be involved in ribosome biogenesis in the pre-rRNP complexes, the FIB-associated pre-rRNP complex was found to be formed at an earlier stage of ribosome biogenesis than the NCL-associated pre-rRNP complex (32, 155). These results indicate that FIB is associated preferentially with the pre-rRNP complex involved in assembly of the 40S small subunit and support the dichotomy hypothesis of ribosome synthesis in the nucleolus, in that the 40S small subunit is assembled at an initial stage, whereas the 60S large subunit is assembled later. The isolated B23-associated pre-rRNP complex is very similar to the NCL-associated pre-rRNP complex in terms of ribosomal protein composition as well as nonribosomal protein composition, suggesting that these isolated complexes are formed at very close stages of ribosome biogenesis. These results indicate the usefulness of the present quantitative method for the analysis of the dynamics of pre-rRNP complexes formed at various stages of ribosome biogenesis in human cells. ◀

3-2-4 Outline of Human/Mammalian Ribosome Biogenesis

Based on the results of reverse-tagging analysis described earlier and the other results, an outline of the dynamic process of human/mammalian ribosome biogenesis can be described as follows. Human/mammalian ribosome biogenesis is a series of dynamic processes in that many trans-acting factors and ribosomal proteins are associated with pre-rRNAs and their processing products in the form of pre-rRNP


complexes, and each trans-acting factor is associated with the pre-rRNP complex only transiently, while it is carrying out its given action. Since assembly of the small subunit takes place first, 40S trans-acting factors must be assembled in the pre-rRNP complexes in the very early stages of ribosome biogenesis. Although pre-40S particles that contain a number of 40S trans-acting factors have not yet been isolated, Par14, DDX47, FIB, and NNP1/Nop56-associated pre-rRNP complexes that contain a part of the processome machinery have been successfully isolated. The earliest human/mammalian pre-rRNP complex is possibly that associated with Par14, which probably constitutes pre-90S particles. The DDX47-associated complex comes next, and the FIB-associated pre-rRNP complex is probably formed at later stages than the Par14- and DDX47-associated pre-rRNP complexes. The FIB-associated pre-rRNP complex is associated preferentially with a number of ribosomal proteins for the small subunit when compared with those associated with NCL or B23. Thus, the FIB-associated complex may be late pre-90S and may be transformed into the B23-associated complex through the formation of NCL- and/or NNP1/Nop52-associated complexes. During this process, the exosome may work on the NCL-pre-rRNP complex. Some of the isolated pre-rRNP complexes are distinctively different from one another, and each pre-rRNP complex probably is checked at a particular stage for the proper processing of pre-rRNA and assembly of ribosomal proteins on the appropriate pre-rRNP complex. There may be some strict checkpoints to control the quality of the extremely complex products, that is, points where the pre-rRNP complex is stable during the synthetic process. Supporting this assumption, the formation of the B23-associated complex may be monitored by the tumor suppressor Arf, which has the ability to control ubiquitination of B23.
Overexpression of Arf induces degradation of B23; thus, it is very intriguing to speculate that expression of Arf may inhibit the formation of a stable B23-associated complex and conversely cause accumulation of NCL- or Nop52-associated complexes. It will also be very interesting to know whether other checkpoints are present between the pre-rRNP complexes. Currently, involvement of the ubiquitin–proteasome system in ribosome biogenesis has been demonstrated by showing changes in the dynamic behavior of some late processing trans-acting factors in human cells upon treatment with proteasome inhibitors and by the presence of ubiquitinated proteins in pre-rRNP complexes (156). Thus, it is possible that the ubiquitin–proteasome system has a role in quality control of the ribosome product at various stages of ribosome biogenesis or may actively participate in the process of ribosome synthesis. What we don’t know at present is how each ribosomal protein is supplied at the right place, at the right time, and to the appropriate pre-rRNP complex in the cell. There must be a strict regulatory mechanism by which each ribosomal protein is supplied appropriately to the nucle(ol)ar site where ribosome synthesis takes place.

3-3 DYNAMIC ANALYSIS OF SUBCELLULAR STRUCTURES

Current proteomics technology based on LC-MS has now begun to be applied to the dynamic analysis of an entire subcellular structure containing hundreds of proteins


isolated biochemically (see Chapter 1, Section 1-3-5, and Section 3-1) (10). A monumental work for the analysis of subcellular dynamics is that of the human nucleolus using in vivo SILAC labeling and LC-MS/MS methodologies (as mentioned in Section 1-3-3 and described in more detail later) (25). A similar approach has been taken in the dynamic analysis of the nuclear proteome during apoptosis of human T cells (157). In this analysis, the nuclei of human T leukemia cells were labeled with SILAC and isolated by sequential extraction with three different buffer conditions to uncover quantitative changes during apoptosis. LC-MS/MS analyses of three nuclear fractions extracted from naive and apoptotic cells resulted in the identification and quantification of 1174 putative nuclear proteins and revealed a dynamic recruitment of mitochondria into nuclear invaginations during apoptosis. A quantitative method without labeling has also been applied in comparing mitochondrial proteomes from rat muscle, heart, and liver (158). This method quantified proteins based on extracted ion chromatogram (XIC) values of mass intensity obtained by LC-MS analysis of in-solution digests of mitochondria (see Chapter 2, Section 2-2-4) (159). More specifically, the ion intensity of each peptide, correlated with the elution time of the sequenced peptide, was used as a measure of the relative abundance between the three tissue samples; that is, the sum of the three (or two) highest intensity values within each protein's set of peptides (xPAI) was divided by a total sum representing the global intensity of all the proteins identified in the tissue to obtain a relative percentage of each protein with respect to the others in the same sample (158).
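The label-free calculation just described can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the published implementation: the function name and example intensities are hypothetical, and the "global intensity" is assumed here to be the sum of the per-protein top-N sums within one sample.

```python
def relative_percentages(xic_by_protein, top_n=3):
    """Percentage of each protein, using the sum of its top_n peptide XIC intensities
    divided by the total of those sums over all proteins in the sample."""
    scores = {
        protein: sum(sorted(intensities, reverse=True)[:top_n])
        for protein, intensities in xic_by_protein.items()
        if intensities
    }
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {protein: 100.0 * score / total for protein, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical peptide XIC intensities for three proteins in one tissue sample.
    sample = {
        "protein_A": [50.0, 30.0, 20.0, 10.0],  # only the top three are used
        "protein_B": [60.0, 40.0],               # fewer than three peptides available
        "protein_C": [200.0],
    }
    print(relative_percentages(sample))
```

Because each sample is normalized to its own total, the resulting percentages can be compared across tissues without an internal standard, at the cost of being relative rather than absolute amounts.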
Comparison of the relative percentages of 689 mitochondrial proteins among three tissues provided new insight into the tissue-specific expression of mitochondrial proteins and revealed a number of candidates uniquely expressed in skeletal muscle and liver mitochondria. Although there are many other examples of quantitative proteomics approaches applied to the dynamic analysis of subcellular structures or organelles, the next section outlines the proteome dynamics of the nucleolus because the method has applicability to not only substructures but also multiprotein complexes and post-translational modifications.

3-3-1 Proteome Dynamics of the Nucleolus

The best characterized organelle that has been analyzed by subcellular proteomics is the nucleolus isolated from HeLa cells (25, 160–163). The nucleoli are plurifunctional nuclear domains involved in the regulation of several major cellular processes, such as ribosome biogenesis, the biogenesis of nonribosomal ribonucleoprotein complexes, the cell cycle, and cellular aging. The first stage is the qualitative proteome analysis of HeLa cell nucleoli. In this analysis, the nucleoli are isolated in high purity by density gradient fractionation (164), and their quality is confirmed by their homogeneity and preservation of internal morphology (by electron microscopy), transcriptional activity [by 5-bromo-UTP (Br-UTP) incorporation], and inhibitory effect of actinomycin D on transcription (by 5-fluorouridine incorporation analysis of control and actinomycin D-treated HeLa cells). Protein mixtures from individual 1D/2D-PAGE gel slices are then in-gel digested with either trypsin or endoproteinase Lys-C. The resulting peptide mixtures are analyzed in many runs of MALDI-ToF and nanoscale


LC-MS/MS on a Q-ToF and an ion trap FT-ICR. To reduce false-positives caused by random impurities, independently prepared nucleoli from two laboratories are analyzed and only proteins identified in both preparations are included in the final nucleolar proteome. Tandem mass spectra are searched against the human sequence database, and only those peptides are considered that conform to full trypsin or Lys-C specificity and whose mass matches the calculated mass within 3 ppm. Unique analyzed peptide sequences are unambiguously matched to human genes with an average mass accuracy of 0.7 ppm. These peptides are used to identify proteins in the purified nucleoli with high stringency, requiring at least two high-scoring peptides per protein. Under these conditions, the false-positive rate for protein identification is estimated to be below 0.1%. The dataset from all those nucleolar MS analyses defines an updated group of 692 proteins that reproducibly copurify with human nucleoli (available at http://www.lamondlab.com/NOPdb/). Bioinformatic classification demonstrates the functional diversity of the nucleolar proteome and shows that approximately one-third of the proteins have no previous functional information, although many proteins that stably copurify with nucleoli are also present at other cellular locations and some only accumulate transiently in nucleoli (25, 163). The results demonstrate that the approaches used in subcellular proteomics for functional classification of proteins are extremely useful. Many of the nucleolar proteins are highly mobile and often exchange rapidly between nuclear bodies and the surrounding nucleoplasm, as shown by techniques such as fluorescence recovery after photobleaching (FRAP) or fluorescence loss in photobleaching (FLIP) (1, 165). Thus, in the second stage of analysis, the quantitative changes of protein components in the nucleoli upon environmental perturbations are analyzed by a quantitative proteomic method.
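The acceptance rules stated above (mass error within 3 ppm, at least two peptides per protein, identification in both preparations) can be sketched as a simple filter. The sketch is illustrative only: the data layout, function names, and example values are hypothetical, and the search-engine score and enzyme-specificity checks mentioned in the text are omitted.

```python
def ppm_error(observed_mass, calculated_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - calculated_mass) / calculated_mass * 1e6


def confident_proteins(matches, max_ppm=3.0, min_peptides=2):
    """Proteins supported by at least min_peptides distinct peptides within max_ppm.

    matches: iterable of (protein, peptide, observed_mass, calculated_mass).
    """
    peptides_by_protein = {}
    for protein, peptide, observed, calculated in matches:
        if abs(ppm_error(observed, calculated)) <= max_ppm:
            peptides_by_protein.setdefault(protein, set()).add(peptide)
    return {p for p, peps in peptides_by_protein.items() if len(peps) >= min_peptides}


def reproducible_proteins(preparation_a, preparation_b):
    """Proteins that pass the filter in both independent preparations."""
    return confident_proteins(preparation_a) & confident_proteins(preparation_b)


if __name__ == "__main__":
    # Hypothetical peptide matches from two independently prepared samples.
    lab_a = [
        ("P1", "PEPA", 1000.001, 1000.0),
        ("P1", "PEPB", 1500.002, 1500.0),
        ("P2", "PEPC", 1200.001, 1200.0),   # only one peptide: rejected
        ("P3", "PEPD", 900.0005, 900.0),
        ("P3", "PEPE", 1100.02, 1100.0),    # about 18 ppm off: rejected
    ]
    lab_b = [
        ("P1", "PEPA", 1000.002, 1000.0),
        ("P1", "PEPF", 800.001, 800.0),
        ("P3", "PEPD", 900.0, 900.0),
        ("P3", "PEPE", 1100.001, 1100.0),
    ]
    print(reproducible_proteins(lab_a, lab_b))
```

Requiring agreement between preparations removes proteins that pass the per-run criteria in only one sample, which is the stated purpose of the two-laboratory design.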
To analyze the dynamic change of the nucleolar proteome during environmental perturbations with several inhibitors, including actinomycin D (inhibitor of RNA polymerase I), 5,6-dichlorobenzimidazole riboside (DRB, inhibitor of RNA polymerase II), and MG132 (proteasome inhibitor), a series of dynamic proteomic studies on nucleoli isolated either from control cells or from cells treated with each of those inhibitors is performed by using stable isotope labeling by amino acids in cell culture (SILAC) (25, 166). The experimental processes used for this study are:
1. Metabolic labeling of cultured cells with either normal arginine (12C6 14N4-Arg, Arg0), carbon-substituted arginine (13C6 14N4-Arg, Arg6), or carbon-plus-nitrogen-substituted arginine (13C6 15N4-Arg, Arg10). This triple labeling allows three cell states to be measured in one experiment: after proteolytic digestion of the proteins, peptides derived from the three cell states are distinguished in the mass spectrometer by their offsets of zero, six, or ten mass units (25).
2. Treatment of the three cell populations with an inhibitor for three different lengths of time (inhibition of transcription with the drug is assayed by Br-UTP incorporation).
3. Isolation of nucleoli directly from a cell pool that is a mix of equal amounts of cells from each time point.
4. Analysis of nucleolar proteins by LC-MS/MS.
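The mass offsets in step 1 translate into predictable m/z positions for each arginine-containing peptide, as sketched below. This is a minimal sketch: the function names and example intensities are hypothetical, and the offsets are the nominal values of zero, six, and ten mass units given in the text rather than exact isotope masses.

```python
def silac_triplet_mz(light_mz, charge, n_arg=1, offsets=(0.0, 6.0, 10.0)):
    """Expected m/z of the Arg0, Arg6, and Arg10 forms of a peptide with n_arg arginines."""
    return [light_mz + offset * n_arg / charge for offset in offsets]


def fold_changes(intensities):
    """Intensity of each labeled form relative to the first (Arg0, zero time point)."""
    reference = intensities[0]
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return [value / reference for value in intensities]


if __name__ == "__main__":
    # Hypothetical doubly charged peptide with one arginine and light form at m/z 500.0.
    print(silac_triplet_mz(500.0, charge=2))
    # Hypothetical peak intensities for the Arg0, Arg6, and Arg10 forms.
    print(fold_changes([200.0, 300.0, 500.0]))
```

Because the three forms coelute and are measured in the same spectrum, their intensity ratios give the relative amount of the protein at the three time points without any run-to-run normalization.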


Because every arginine-containing tryptic peptide occurs in three isotopic forms, the intensities of these three mass spectrometry peaks directly reveal the relative ratios of the corresponding protein in the nucleolus at each of the three time points (Fig. 3-9). The analysis can be repeated with a common zero point and additional time points of transcription inhibition to achieve higher time resolution. Two or four sets of the experiments with different time intervals, for example, reveal the relative ratios of the corresponding protein at each of five or nine time points (25). The dynamic change in relative ratios of proteins obtained by this analysis can be validated by kinetic experiments using cell lines that express, for example, the corresponding yellow fluorescent protein (YFP)-tagged proteins. A series of SILAC analyses, in the case of actinomycin D treatment, monitored 489 of the 692 detected nucleolar proteins: the relative levels of 489 proteins were quantified

(B) Time 2D min

+ Heavy 1

Time 2D min, Act D

nca 14N4–Arg (Arg5) +

Heavy 2

4 Counts x 100

nca 14N4–Arg (Arg0)

nca 15N4–Arg (Arg10)

2

0 min Arg0

1

400

4 LLQLVEDR

Mix orl by 1:1:1

3

0 min Arg0

2

Identify and quantity protectrs by nunoLCMS/MS

Normal fold charge

Digest with trypain

501

40 min Arg6

100 min Arg10

1 0 401

400

(D) Isolate nuclmol

20 min Arg6

m/z

(C)

Time 3D min, Act D

50 min Arg10

LLQLVEDR

3

0 401

Counts x 100

Light

501

m/z

2 1 0 0

50

100

150

Act D (min)

Fig. 3-9. Determination of nucleolar protein dynamics by stable isotope labeling by amino acids in cell culture (SILAC). [Reprinted by permission from Macmillan Publishers Ltd.; Andersen et al., Nature (Ref. 25) (2005).] (A) The proteomes in three cell populations are metabolically labeled with stable isotope derivatives of arginine (Arg0, Arg6, and Arg10) by the technology called SILAC, and each cell population with a differential isotope label is treated with actinomycin D for 0, 20, and 80 min, respectively. Cells are mixed, and nucleoli are purified and analyzed by MS. (B) and (C) MS spectra of the peptide LLQLVEDR derived from p68, which indicate that increasing amounts of p68 are recruited to the nucleolus following the actinomycin D treatment. (D) Time course of the p68 level (y-axis is in units of normalized fold change of p68).


DYNAMICS OF FUNCTIONAL CELLULAR MACHINERY

at five or nine separate time points after inhibiting transcription. Detailed informatics analysis of the dynamic data revealed not only the previously described effects of cellular growth and environmental conditions on nucleolar morphology and ribosome synthesis but also new dynamic aspects of the nucleolus during environmental perturbation by inhibitors. In other words, these analyses provide more detailed and quantitative insight into how the cellular response to environmental stress and growth conditions affects the nucleolus on the proteome scale. Currently, functional annotations have been proposed for one-third of the functionally uncharacterized proteins among the identified human nucleolar proteins by an informatics approach in which possible protein complexes are assembled from existing data, including the nucleolar protein catalog, protein–protein interactions, the protein complex network, and nucleolar dynamics. This approach provided a draft of human ribosome biogenesis and predicted a number of protein complexes with functions beyond ribosome biogenesis in the nucleolus. Thus, integrating different types of data with high-throughput proteomics, followed by detailed biological analysis and experimental validation, may also be useful for other organelles or systems wired in a similar way (134). In any case, the importance of proteomic analysis cannot be overstressed.

REFERENCES

1. Misteli, T. (2001). Protein dynamics: implications for nuclear architecture and gene expression. Science 291:842–847.
2. Kitano, H. (2002). Systems biology: a brief overview. Science 295:1662–1664.
3. Ideker, T., Thorsson, V., Ranish, J. A., Christmas, R., Buhler, J., Eng, J. K., Bumgarner, R., Goodlett, D. R., Aebersold, R., and Hood, L. (2001). Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292(5518):929–934.
4. Lin, B., White, J. T., Lu, W., Xie, T., Utleg, A. G., Yan, X., Yi, E. C., Shannon, P., Khrebtukova, I., Lange, P. H., Goodlett, D. R., Zhou, D., Vasicek, T. J., and Hood, L. (2005). Evidence for the presence of disease-perturbed networks in prostate cancer cells by genomic and proteomic analyses: a systems approach to disease. Cancer Res. 65(8):3081–3091.
5. Zheng, P. Z., Wang, K. K., Zhang, Q. Y., Huang, Q. H., Du, Y. Z., Zhang, Q. H., Xiao, D. K., Shen, S. H., Imbeaud, S., Eveno, E., Zhao, C. J., Chen, Y. L., Fan, H. Y., Waxman, S., Auffray, C., Jin, G., Chen, S. J., Chen, Z., and Zhang, J. (2005). Systems analysis of transcriptome and proteome in retinoic acid/arsenic trioxide-induced cell differentiation/apoptosis of promyelocytic leukemia. Proc. Natl. Acad. Sci. USA 102(21):7653–7658.
6. Shiio, Y., Suh, K. S., Lee, H., Yuspa, S. H., Eisenman, R. N., and Aebersold, R. (2006). Quantitative proteomic analysis of myc-induced apoptosis: a direct role for Myc induction of the mitochondrial chloride ion channel, mtCLIC/CLIC4. J. Biol. Chem. 281(5):2750–2756.
7. Beck, H. C., Nielsen, E. C., Matthiesen, R., Jensen, L. H., Sehested, M., Finn, P., Grauslund, M., Hansen, A. M., and Jensen, O. N. (2006). Quantitative proteomic analysis of posttranslational modifications of human histones. Mol. Cell. Proteomics 5(7):1314–1325.


8. Neher, S. B., Villen, J., Oakes, E. C., Bakalarski, C. E., Sauer, R. T., Gygi, S. P., and Baker, T. A. (2006). Proteomic profiling of ClpXP substrates after DNA damage reveals extensive instability within SOS regulon. Mol. Cell 22(2):193–204.
9. Armstrong, J. D., Pocklington, A. J., Cumiskey, M. A., and Grant, S. G. (2006). Reconstructing protein complexes: from proteomics to systems biology. Proteomics 6(17):4724–4731.
10. Andersen, J. S., and Mann, M. (2006). Organellar proteomics: turning inventories into insights. EMBO Rep. 7(9):874–879.
11. Souchelnytskyi, S. (2005). Bridging proteomics and systems biology: What are the roads to be traveled? Proteomics 5:4123–4137.
12. Noble, D. (2002). Modelling the heart: insights, failures and progress. Bioessays 24(12):1155–1163.
13. Schoeberl, B., Eichler-Jonsson, C., Gilles, E. D., and Muller, G. (2002). Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors. Nat. Biotechnol. 20(4):370–375.
14. Ballif, B. A., Roux, P. P., Gerber, S. A., MacKeigan, J. P., Blenis, J., and Gygi, S. P. (2005). Quantitative phosphorylation profiling of the ERK/p90 ribosomal S6 kinase-signaling cassette and its targets, the tuberous sclerosis tumor suppressors. Proc. Natl. Acad. Sci. USA 102(3):667–672.
15. Papin, J. A., Hunter, T., Palsson, B. O., and Subramaniam, S. (2005). Reconstruction of cellular signalling networks and analysis of their properties. Nat. Rev. Mol. Cell Biol. 6:99–111.
16. Blagoev, B., Ong, S.-E., Kratchmarova, I., and Mann, M. (2004). Temporal analysis of phosphotyrosine-dependent signaling networks by quantitative proteomics. Nat. Biotechnol. 22:1139–1145.
17. Kratchmarova, I., Blagoev, B., Haack-Sorensen, M., Kassem, M., and Mann, M. (2005). Mechanism of divergent growth factor effects in mesenchymal stem cell differentiation. Science 308(5727):1472–1477.
18. Pomerening, J. R., Sontag, E. D., and Ferrell, J. E. (2003). Building a cell cycle oscillator: hysteresis and bistability in the activation of Cdc2. Nat. Cell Biol. 5:346–351.
19. Bornheimer, S. J., Maurya, M. R., Farquhar, M. G., and Subramaniam, S. (2004). Computational modeling reveals how interplay between components of a GTPase-cycle module regulates signal transduction. Proc. Natl. Acad. Sci. USA 101:15899–15904.
20. Hasty, J., McMillen, D., and Collins, J. J. (2002). Engineered gene circuits. Nature 420:224–230.
21. Guido, N. J., Wang, X., Adalsteinsson, D., McMillen, D., Hasty, J., Cantor, C. R., Elston, T. C., and Collins, J. J. (2006). A bottom–up approach to gene regulation. Nature 439(7078):856–860.
22. Rabut, G., Doye, V., and Ellenberg, J. (2004). Mapping the dynamic organization of the nuclear pore complex inside single living cells. Nat. Cell Biol. 6(11):1114–1121.
23. de Hoog, C., and Mann, M. (2004). Proteomics. Annu. Rev. Genomics Hum. Genet. 5:267–293; Gorg, A., Weiss, W., and Dunn, M. J. (2004). Current two-dimensional electrophoresis technology for proteomics. Proteomics 4:3665–3685.
24. Hayano, T., and Takahashi, N. (2004). A method for the analysis of functional protein complexes using SPR-MS, assembly snapshot approach for dynamic analysis. In
Experimental Manuals for Proteome Analyses, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 214–224 (in Japanese).
25. Andersen, J. S., Lam, Y. W., Leung, A. K., Ong, S. E., Lyon, C. E., Lamond, A. I., and Mann, M. (2005). Nucleolar proteome dynamics. Nature 433(7021):77–83.
26. Ranish, J. A., Yi, E. C., Leslie, D. M., Purvine, S. O., Goodlett, D. R., Eng, J., and Aebersold, R. (2003). The study of macromolecular complexes by quantitative proteomics. Nat. Genet. 33:349–355.
27. Ranish, J. A., Yudkovsky, N., and Hahn, S. (1999). Intermediates in formation and activity of the RNA polymerase II preinitiation complex: holoenzyme recruitment and a postrecruitment role for the TATA box and TFIIB. Genes Dev. 13:49–63.
28. Brand, M., Ranish, J. A., Kummer, N. T., Hamilton, J., Igarashi, K., Francastel, C., Chi, T. H., Crabtree, G. R., Aebersold, R., and Groudine, M. (2004). Dynamic changes in transcription factor complexes during erythroid differentiation revealed by quantitative proteomics. Nat. Struct. Mol. Biol. 11(1):73–80.
29. Guerrero, C., Tagwerker, C., Kaiser, P., and Huang, L. (2006). An integrated mass spectrometry-based proteomic approach: quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX) to decipher the 26S proteasome-interacting network. Mol. Cell. Proteomics 5(2):366–378.
30. Rodriguez, P., Braun, H., Kolodziej, K. E., de Boer, E., Campbell, J., Bonte, E., Grosveld, F., Philipsen, S., and Strouboulis, J. (2006). Isolation of transcription factor complexes by in vivo biotinylation tagging and direct binding to streptavidin beads. Methods Mol. Biol. 338:305–323.
31. Kumar, A., and Snyder, M. (2002). Protein complexes take the bait. Nature 415:123–124.
32. Takahashi, N., Yanagida, M., Fujiyama, S., Hayano, T., and Isobe, T. (2003). Proteomic snapshot analysis of preribosomal ribonucleoprotein complexes formed at various stages of ribosome biogenesis in yeast and mammalian cells. Mass Spectrom. Rev. 22:287–317.
33. Grandi, P., Rybin, V., Bassler, J., Petfalski, E., Strauss, D., Marzioch, M., Schaefer, T., Kuster, B., Tschochner, H., Tollervey, D., Gavin, A. C., and Hurt, E. (2002). 90S pre-ribosomes include the 35S pre-rRNA, the U3 snoRNP, and 40S subunit processing factors but predominantly lack 60S synthesis factors. Mol. Cell 10:105–115.
34. Nissan, T. A., Bassler, J., Petfalski, E., Tollervey, D., and Hurt, E. (2002). 60S preribosome formation viewed from assembly in the nucleolus until export to the cytoplasm. EMBO J. 21:5539–5547.
35. Makarov, E. M., Makarova, O. V., Urlaub, H., Gentzel, M., Will, C. L., Wilm, M., and Luhrmann, R. (2002). Small nuclear ribonucleoprotein remodeling during catalytic activation of the spliceosome. Science 298(5601):2205–2208.
36. Husi, H., Ward, M. A., Choudhary, J. S., Blackstock, W. P., and Grant, S. G. N. (2000). Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nat. Neurosci. 3:661–669.
37. Kemmeren, P., van Berkum, N. L., Vilo, J., Bijma, T., Donders, R., Brazma, A., and Holstege, F. C. (2002). Protein interaction verification and functional annotation by integrated analysis of genome-scale data. Mol. Cell 9:1133–1143.
38. Rigaut, G., Shevchenko, A., Rutz, B., Wilm, M., Mann, M., and Seraphin, B. (1999). A generic protein purification method for protein complex characterization and proteome exploration. Nat. Biotechnol. 17:1030–1032.


39. Ichimura, T., Kakiuchi, K., Sasamoto, K., and Isobe, T. (2004). Methods for epitope-tag expression. In Experimental Manuals for Proteome Analyses, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 168–179 (in Japanese).
40. Shimamoto, A., and Yanagida, M. (2000). Method for interaction analysis using epitope-tag immunoprecipitation. In Analytical Methods in Proteomics, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 166–174 (in Japanese).
41. Ho, Y., Gruhler, A., Heilbut, A., Bader, G. D., Moore, L., Adams, S. L., Millar, A., Taylor, P., Bennett, K., Boutilier, K., Yang, L., Wolting, C., Donaldson, I., Schandorff, S., Shewnarane, J., Vo, M., Taggart, J., Goudreault, M., Muskat, B., Alfarano, C., Dewar, D., Lin, Z., Michalickova, K., Willems, A. R., Sassi, H., Nielsen, P. A., Rasmussen, K. J., Andersen, J. R., Johansen, L. E., Hansen, L. H., Jespersen, H., Podtelejnikov, A., Nielsen, E., Crawford, J., Poulsen, V., Sørensen, B. D., Matthiesen, J., Hendrickson, R. C., Gleeson, F., Pawson, T., Moran, M. F., Durocher, D., Mann, M., Hogue, C. W. V., Figeys, D., and Tyers, M. (2002). Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415:180–183.
42. Yanagida, M., Hayano, T., Yamauchi, Y., Shinkawa, T., Natsume, T., Isobe, T., and Takahashi, N. (2004). Human fibrillarin forms a sub-complex with splicing factor 2 associated p32, protein/arginine methyltransferases, tubulin α3 and β1, which is independent of its association with preribosomal ribonucleoprotein complexes. J. Biol. Chem. 279:1607–1614.
43. Ichimura, T., Yamamura, H., Sasamoto, K., Tominaga, Y., Taoka, M., Kakiuchi, K., Shinkawa, T., Takahashi, N., Shimada, S., and Isobe, T. (2005). 14-3-3 proteins modulate the expression of epithelial Na+ channels by phosphorylation-dependent interaction with Nedd4-2 ubiquitin ligase. J. Biol. Chem. 280:13187–13194.
44. Hölzel, M., Rohrmoser, M., Schlee, M., Grimm, T., Harasim, T., Malamoussi, A., Gruber-Eber, A., Kremmer, E., Hiddemann, W., Bornkamm, G. W., and Eick, D. (2005). Mammalian WDR12 is a novel member of the Pes1-Bop1 complex and is required for ribosome biogenesis and cell proliferation. J. Cell Biol. 170(3):367–378.
45. Ghaemmaghami, S., Huh, W.-K., Bower, K., Howson, R. W., Belle, A., Dephoure, N., O’Shea, E. K., and Weissman, J. S. (2003). Global analysis of protein expression in yeast. Nature 425:737–741.
46. Kumar, A., Agarwal, S., Heyman, J. A., Matson, S., Heidtman, M., Piccirillo, S., Umansky, L., Drawid, A., Jansen, R., Liu, Y., Cheung, K. H., Miller, P., Gerstein, M., Roeder, G. S., and Snyder, M. (2002). Subcellular localization of the yeast proteome. Genes Dev. 16(6):707–719.
47. Gavin, A.-C., Aloy, P., Grandi, P., Krause, R., Boesche, M., Marzioch, M., Rau, C., Jensen, L. J., Bastuck, S., Dümpelfeld, B., Edelmann, A., Heurtier, M., Hoffman, V., Hoefert, C., Klein, K., Hudak, M., Michon, A. M., Schelder, M., Schirle, M., Remor, M., Rudi, T., Hooper, S., Bauer, A., Bouwmeester, T., Casari, T., Drewes, G., Neubauer, G., Rick, J. M., Kuster, B., Bork, P., Russell, R. B., and Superti-Furga, G. (2006). Proteome survey reveals modularity of the yeast cell machinery. Nature 440(7084):631–636.
48. Krogan, N. J., Cagney, G., Yu, H., Zhong, G., Guo, X., Ignatchenko, A., Li, J., Pu, S., Datta, N., Tikuisis, A. P., Punna, T., Peregrín-Alvarez, J. M., Shales, M., Zhang, X., Davey, M., Robinson, M. D., Paccanaro, A., Bray, J. E., Sheung, A., Beattie, B., Richards, D. P., Canadien, V., Lalev, A., Mena, F., Wong, P., Starostine, A., Canete, M. M., Vlasblom, J., Wu, S., Orsi, C., Collins, S. R., Chandran, S., Haw, R., Rilstone, J. J., Gandi, K., Thompson, N. J., Musso, G., Onge, P. S., Ghanny, S., Lam, M. H. Y., Butland, G., Altaf-Ul, A. M., Kanaya, K. S., Shilatifard, A., O’Shea, E., Weissman, J.
S., Ingles, C. J., Hughes, T. R., Parkinson, J., Gerstein, M., Wodak, S. J., Emili, A., and Greenblatt, J. F. (2006). Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 440(7084):637–643.
49. Neubauer, G., King, A., Rappsilber, J., Calvio, C., Watson, M., Ajuh, P., Sleeman, J., Lamond, A., and Mann, M. (1998). Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat. Genet. 20:46–50.
50. Zhou, Z., Licklider, L. J., Gygi, S. P., and Reed, R. (2002). Comprehensive proteomic analysis of the human spliceosome. Nature 419:182–185.
51. Dragon, F., Gallagher, J. E. G., Compagnone-Post, P. A., Mitchell, B. M., Porwancher, K. A., Wehner, K. A., Wormsley, S., Settlage, R. E., Shabanowitz, J., Osheim, Y., Beyer, A. L., Hunt, D. F., and Baserga, S. J. (2002). A large nucleolar U3 ribonucleoprotein required for 18S ribosomal RNA biogenesis. Nature 417:967–970.
52. Harlow, E., and Lane, D. (1999). Using Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
53. Peters, J.-M., King, R. W., Hoog, C., and Kirschner, M. W. (1996). Identification of BIME as a subunit of the anaphase-promoting complex. Science 274:1199–1201.
54. Houry, W. A., Frishman, D., Eckerskorn, C., Lottspeich, F., and Hartl, F. U. (1999). Identification of in vivo substrates of the chaperonin GroEL. Nature 402:147–154.
55. Izumikawa, K., and Takahashi, N. (2004). Method on the isolation of endogenous protein complexes using antibody-immobilized affinity beads. In Experimental Protocols for Proteomics, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 191–202 (in Japanese).
56. Putnam, F. W. (1987). Immunoglobulins: structure, function, and genes. In The Plasma Proteins, Vol. V, F. W. Putnam (Ed.), Academic Press, New York, pp. 49–140.
57. Kawabe, T., Tsuyama, N., Kitao, S., Nishikawa, K., Shimamoto, A., Shiratori, M., Matsumoto, T., Anno, K., Sato, T., Mitsui, Y., Seki, M., Enomoto, T., Goto, M., Ellis, N. A., Ide, T., Furuichi, Y., and Sugimoto, M. (2000). Differential regulation of human RecQ family helicases in cell transformation and cell cycle. Oncogene 19(41):4764–4772.
58. Fujiyama, S., and Takahashi, N. (2004). Affinity purification. In Experimental Protocols for Proteomics, T. Isobe and N. Takahashi (Eds.), Yodo-sha, Tokyo, Japan, pp. 180–190 (in Japanese).
59. Fujiyama, S., Yanagida, M., Hayano, T., Miura, Y., Uchida, T., Fujimori, F., Isobe, T., and Takahashi, N. (2002). Isolation and proteomic characterization of human parvulin-associating preribosomal ribonucleoprotein complexes. J. Biol. Chem. 277:23773–23780.
60. Yanagisawa, J., Kitagawa, H., Yanagida, M., Wada, O., Ogawa, S., Nakagomi, M., Oishi, H., Yamamoto, Y., Cole, M. D., Tora, L., Takahashi, N., and Kato, K. (2002). Nuclear receptor function requires a TFTC-type histone acetyl transferase complex. Mol. Cell 9:553–562.
61. Naar, A. M., Beaurang, P. A., Zhou, S., Abraham, S., Solomon, W., and Tjian, R. (1999). Composite co-activator ARC mediates chromatin-directed transcriptional activation. Nature 398:828–832.
62. Rachez, C., Lemon, B. D., Suldan, Z., Bromleigh, V., Gamble, M., Naar, A. M., Erdjument-Bromage, H., Tempst, P., and Freedman, L. P. (1999). Ligand-dependent transcription activation by nuclear receptors requires the DRIP complex. Nature 398:824–828.


63. Ryu, S., and Tjian, R. (1999). Purification of transcription cofactor complex CRSP. Proc. Natl. Acad. Sci. USA 96:7137–7142.
64. Levine, M., and Tjian, R. (2003). Transcription regulation and animal diversity. Nature 424:147–151.
65. Sanders, S. L., Jennings, J., Canutescu, A., Link, A. J., and Weil, P. A. (2002). Proteomics of the eukaryotic transcription machinery: identification of proteins associated with components of yeast TFIID by multidimensional mass spectrometry. Mol. Cell. Biol. 22:4723–4738.
66. Chan, S. P., Kao, D. I., Tsai, W. Y., and Cheng, S. C. (2003). The Prp19p-associated complex in spliceosome activation. Science 302(5643):279–282.
67. Cronshaw, J. M., Krutchinsky, A. N., Zhang, W., Chait, B. T., and Matunis, M. J. (2002). Proteomic analysis of the mammalian nuclear pore complex. J. Cell Biol. 158(5):915–927.
68. Li, T., Evdokimov, E., Shen, R. F., Chao, C. C., Tekle, E., Wang, T., Stadtman, E. R., Yang, D. C., and Chock, P. B. (2004). Sumoylation of heterogeneous nuclear ribonucleoproteins, zinc finger proteins, and nuclear pore complex proteins: a proteomic analysis. Proc. Natl. Acad. Sci. USA 101:8551–8556.
69. Zachariae, W., Shin, T. H., Galova, M., Obermaier, B., and Nasmyth, K. (1996). Identification of subunits of the anaphase-promoting complex of Saccharomyces cerevisiae. Science 274:1201–1204.
70. Grossberger, R., Gieffers, C., Zachariae, W., Podtelejnikov, A. V., Schleiffer, A., Nasmyth, K., Mann, M., and Peters, J. M. (1999). Characterization of the DOC1/APC10 subunit of the yeast and the human anaphase-promoting complex. J. Biol. Chem. 274:14500–14507.
71. Leonoudakis, D., Conti, L. R., Anderson, S., Radeke, C. M., McGuire, L. M., Adams, M. E., Froehner, S. C., Yates, J. R. III, and Vandenberg, C. A. (2004). Protein trafficking and anchoring complexes revealed by proteomic analysis of inward rectifier potassium channel (Kir2.x)-associated proteins. J. Biol. Chem. 279(21):22331–22346.
72. Angers, S., Li, T., Yi, X., MacCoss, M. J., Moon, R. T., and Zheng, N. (2006). Molecular architecture and assembly of the DDB1-CUL4A ubiquitin ligase machinery. Nature 443(7111):590–593.
73. Link, A. J., Eng, J., Schieltz, D. M., Carmack, E., Mize, G. J., Morris, D. R., Garvik, B. M., and Yates, J. R. (1999). Direct analysis of protein complexes using mass spectrometry. Nat. Biotechnol. 17:676–682.
74. Natsume, T., Yamauchi, Y., Nakayama, H., Shinkawa, T., Yanagida, M., Takahashi, N., and Isobe, T. (2002). Direct nano flow liquid chromatography–tandem mass spectrometry system for functional proteomics. Anal. Chem. 74:4725–4733.
75. Lee, S., Berger, S. J., Martinovic, S., Pasa-Tolic, L., Anderson, G. A., Shen, Y., Zhao, R., and Smith, R. D. (2002). Direct mass spectrometric analysis of intact proteins of the yeast large ribosomal subunit using capillary LC/FTICR. Proc. Natl. Acad. Sci. USA 99:5942–5947.
76. Ranish, J. A., Hahn, S., Lu, Y., Yi, E. C., Li, X. J., Eng, J., and Aebersold, R. (2004). Identification of TFB5, a new component of general transcription and DNA repair factor IIH. Nat. Genet. 36(7):707–713.
77. Sato, S., Tomomori-Sato, C., Parmely, T. J., Florens, L., Zybailov, B., Swanson, S. K., Banks, C. A., Jin, J., Cai, Y., Washburn, M. P., Conaway, J. W., and Conaway, R. C.
(2004). A set of consensus mammalian mediator subunits identified by multidimensional protein identification technology. Mol. Cell 14(5):685–691.
78. Zhang, Z., Wu, C. H., and Gilmour, D. S. (2004). Analysis of polymerase II elongation complexes by native gel electrophoresis. Evidence for a novel carboxyl-terminal domain-mediated termination mechanism. J. Biol. Chem. 279(22):23223–23228.
79. Trivedi, A. K., Bararia, D., Christopeit, M., Peerzada, A. A., Singh, S. M., Kieser, A., Hiddemann, W., Behre, H. M., and Behre, G. (2006). Proteomic identification of C/EBP-DBD multiprotein complex: JNK1 activates stem cell regulator C/EBPalpha by inhibiting its ubiquitination. Oncogene 26(12):1789–1801.
80. Miles, R. R., Crockett, D. K., Lim, M. S., and Elenitoba-Johnson, K. S. (2005). Analysis of BCL6-interacting proteins by tandem mass spectrometry. Mol. Cell. Proteomics 4(12):1898–1909.
81. Hedman, E., Widen, C., Asadi, A., Dinnetz, I., Schroder, W. P., Gustafsson, J. A., and Wikstrom, A. C. (2006). Proteomic identification of glucocorticoid receptor interacting proteins. Proteomics 6(10):3114–3126.
82. Shiio, Y., Rose, D. W., Aur, R., Donohoe, S., Aebersold, R., and Eisenman, R. N. (2006). Identification and characterization of SAP25, a novel component of the mSin3 corepressor complex. Mol. Cell. Biol. 26(4):1386–1397.
83. Wood, A., Schneider, J., Dover, J., Johnston, M., and Shilatifard, A. (2005). The Bur1/Bur2 complex is required for histone H2B monoubiquitination by Rad6/Bre1 and histone methylation by COMPASS. Mol. Cell 20(4):589–599.
84. Tackett, A. J., Dilworth, D. J., Davey, M. J., O’Donnell, M., Aitchison, J. D., Rout, M. P., and Chait, B. T. (2005). Proteomic and genomic characterization of chromatin complexes at a boundary. J. Cell Biol. 169:35–47.
85. Fierro-Monti, I., Mohammed, S., Matthiesen, R., Santoro, R., Burns, J. S., Williams, D. J., Proud, C. G., Kassem, M., Jensen, O. N., and Roepstorff, P. (2006). Quantitative proteomics identifies Gemin5, a scaffolding protein involved in ribonucleoprotein assembly, as a novel partner for eukaryotic initiation factor 4E. J. Proteome Res. 5(6):1367–1378.
86. Moore, J. M., Galicia, S. J., McReynolds, A. C., Nguyen, N. H., Scanlan, T. S., and Guy, R. K. (2004). Quantitative proteomics of the thyroid hormone receptor–coregulator interactions. J. Biol. Chem. 279(26):27584–27590.
87. Du, Y. C., Gu, S., Zhou, J., Wang, T., Cai, H., Macinnes, M. A., Bradbury, E. M., and Chen, X. (2006). The dynamic alterations of H2AX complex during DNA repair detected by a proteomic approach reveal the critical roles of Ca(2+)/calmodulin in the ionizing radiation-induced cell cycle arrest. Proteomics 5(6):1033–1044.
88. Lewis, T. S., Hunt, J. B., Aveline, L. D., Jonscher, K. R., Louie, D. F., Yeh, J. M., Nahreini, T. S., Resing, K. A., and Ahn, N. G. (2000). Identification of novel MAP kinase pathway signaling targets by functional proteomics and mass spectrometry. Mol. Cell 6(6):1343–1354.
89. Blagoev, B., Kratchmarova, I., Ong, S. E., Nielsen, M., Foster, L. J., and Mann, M. (2003). A proteomics strategy to elucidate functional protein–protein interactions applied to EGF signaling. Nat. Biotechnol. 21:315–318.
90. Olsen, D., Moore, K. A., Fukata, M., Kazuta, T., Trinidad, J. C., Kauer, F. W., Streuli, M., Misawa, H., Burlingame, A. L., Nicoll, R. A., and Bredt, D. S. (2005). Neurotransmitter release regulated by a MALS–liprin–presynaptic complex. J. Cell Biol. 170:1127–1134.


91. Bernhard, O. K., Cunningham, A. L., and Sheil, M. M. (2004). Analysis of proteins copurifying with the CD4/lck complex using one-dimensional polyacrylamide gel electrophoresis and mass spectrometry: comparison with affinity-tag based protein detection and evaluation of different solubilization methods. J. Am. Soc. Mass Spectrom. 15(4):558–567.
92. Farr, C. D., Gafken, P. R., Norbeck, A. D., Doneanu, C. E., Stapels, M. D., Barofsky, D. F., Minami, M., and Saugstad, J. A. (2004). Proteomic analysis of native metabotropic glutamate receptor 5 protein complexes reveals novel molecular constituents. J. Neurochem. 91(2):438–450.
93. Coussen, F., Perrais, D., Jaskolski, F., Sachidhanandam, S., Normand, E., Bockaert, J., Marin, P., and Mulle, C. (2005). Co-assembly of two GluR6 kainate receptor splice variants within a functional protein complex. Neuron 47(4):555–566.
94. Tian, Q., Feetham, M. C., Tao, W. A., He, X. C., Li, L., Aebersold, R., and Hood, L. (2004). Proteomic analysis identifies that 14-3-3zeta interacts with beta-catenin and facilitates its activation by Akt. Proc. Natl. Acad. Sci. USA 101(43):15370–15375.
95. Bouwmeester, T., Bauch, A., Ruffner, H., Angrand, P.-O., Bergamini, G., Croughton, K., Cruciat, C., Eberhard, D., Gagneur, J., Ghidelli, S., Hopf, C., Huhse, B., Mangano, R., Michon, A.-M., Schirle, M., Schlegl, J., Schwab, M., Stein, M. A., Bauer, A., Casari, G., Drewes, G., Gavin, A. C., Jackson, D. B., Joberty, G., Neubauer, G., Rick, J., Kuster, B., and Superti-Furga, G. (2004). A physical and functional map of the human TNFα/NF-κB signal transduction pathway. Nat. Cell Biol. 6(2):97–105.
96. Sharan, R., and Ideker, T. (2006). Modeling cellular machinery through biological network comparison. Nat. Biotechnol. 24(4):427–433.
97. Thomas, G. (2000). An encore for ribosome biogenesis in the control of cell proliferation. Nat. Cell Biol. 2:E71–E72.
98. Tollervey, D. (1996). Trans-acting factors in ribosome synthesis. Exp. Cell Res. 229:226–232; Lafontaine, D. L. J., and Tollervey, D. (1998). Birth of the snoRNPs: the evolution of the modification-guide snoRNAs. Trends Biochem. Sci. 23:383–388.
99. Warner, J. R. (1990). The nucleolus and ribosome formation. Curr. Opin. Cell Biol. 2:521–527.
100. Piñol-Roma, S. (1999). Association of non-ribosomal nucleolar proteins in ribonucleoprotein complexes during interphase and mitosis. Mol. Biol. Cell 10:77–90.
101. Warner, J. R. (2001). Nascent ribosome. Cell 107:133–136.
102. Fatica, A., and Tollervey, D. (2002). Making ribosomes. Curr. Opin. Cell Biol. 14:313–318.
103. Warner, J. R., and Soeiro, R. (1967). Nascent ribosomes from HeLa cells. Proc. Natl. Acad. Sci. USA 58(5):1984–1990.
104. Venema, J., and Tollervey, D. (1999). Ribosome synthesis in Saccharomyces cerevisiae. Annu. Rev. Genet. 33:261–311.
105. Fromont-Racine, M., Senger, B., Saveanu, C., and Fasiolo, F. (2003). Ribosome assembly in eukaryotes. Gene 313:17–42.
106. Milkereit, P., Kuhn, H., Gas, N., and Tschochner, H. (2003). The pre-ribosomal network. Nucleic Acids Res. 31:799–804.
107. Leary, D. J., and Huang, S. (2001). Regulation of ribosome biogenesis within the nucleolus. FEBS Lett. 509(2):145–150.
108. Baßler, J., Grandi, P., Gadal, O., Leßmann, T., Petfalski, E., Tollervey, D., Lechner, J., and Hurt, E. (2001). Identification of a 60S preribosomal particle that is closely linked to nuclear export. Mol. Cell 8:517–529.


109. Bassler, J., Grandi, P., Gadal, O., Lessmann, T., Petfalski, E., Tollervey, D., Lechner, J., and Hurt, E. (2001). Identification of a 60S preribosomal particle that is closely linked to nuclear export. Mol. Cell 8:517–529.
110. Harnpicharnchai, P., Jakovljevic, J., Horsey, E., Miles, T., Roman, J., Rout, M., Meagher, D., Imai, B., Guo, Y., Brame, C. J., Shabanowitz, J., Hunt, D. F., and Woolford, J. L. Jr. (2001). Composition and functional characterization of yeast 66S ribosome assembly intermediates. Mol. Cell 8:505–515.
111. Savino, T. M., Gébrane-Younés, J., Mey, J. D., Sibarita, J. B., and Hernandez-Verdun, D. (2001). Nucleolar assembly of the rRNA processing machinery in living cells. J. Cell Biol. 153:1097–1110.
112. Horsey, E. W., Jakovljevic, J., Miles, T. D., Harnpicharnchai, P., and Woolford, J. L. Jr. (2004). Role of the yeast Rrp1 protein in the dynamics of pre-ribosome maturation. RNA 10(5):813–827.
113. Saveanu, C., Bienvenu, D., Namane, A., Gleizes, P. E., Gas, N., Jacquier, A., and Fromont-Racine, M. (2001). Nog2p, a putative GTPase associated with pre-60S subunits and required for late 60S maturation steps. EMBO J. 20:6475–6484.
114. Fatica, A., Cronshaw, A. D., Dlakic, M., and Tollervey, D. (2002). Ssf1p prevents premature processing of an early pre-60S ribosomal particle. Mol. Cell 9:341–351.
115. Udem, S. A., and Warner, J. R. (1972). Ribosomal RNA synthesis in Saccharomyces cerevisiae. J. Mol. Biol. 65:227–242.
116. Hayano, T., and Takahashi, N. (2006). Snapshot analysis of functional protein complexes: proteomic analysis of ribosome synthesis pathway. Saibou kougaku 25:624–629 (in Japanese).
117. Talkington, M. W. T., Siuzdak, G., and Williamson, J. R. (2005). An assembly landscape for the 30S ribosomal subunit. Nature 438(7068):628–632.
118. Larson, D. E., Zahradka, P., and Sells, B. H. (1991). Control points in eukaryotic ribosome biogenesis. Biochem. Cell Biol. 69(1):5–22.
119. Bowman, L. H., Rabin, B., and Friesen, J. D. (1981). Multiple ribosomal RNA cleavage pathways in mammalian cells. Nucleic Acids Res. 9:4951–4966.
120. Hadjiolova, K. V., Nicoloso, M., Mazan, S., Hadjiolov, A. A., and Bachellerie, J. P. (1993). Alternative pre-rRNA processing pathways in human cells and their alteration by cycloheximide inhibition of protein synthesis. Eur. J. Biochem. 212:211–215.
121. Ruggero, D., and Pandolfi, P. P. (2003). Does the ribosome translate cancer? Nat. Rev. Cancer 3:179–192.
122. Dixon, J., Edwards, S. J., Gladwin, A. J., Dixon, M. J., Loftus, S. K., Bonner, C. A., Koprivnikar, K., and Wasmuth, J. J. (1996). Positional cloning of a gene involved in the pathogenesis of Treacher Collins syndrome. Nat. Genet. 12:130–136.
123. Hayano, T., Yanagida, M., Yamauchi, Y., Shinkawa, T., Isobe, T., and Takahashi, N. (2003). Proteomic analysis of human Nop56p-associated pre-ribosomal ribonucleoprotein complexes: possible link between Nop56p and the nucleolar protein treacle responsible for Treacher Collins syndrome. J. Biol. Chem. 278(36):34309–34319.
124. Valdez, B. C., Henning, D., So, R. B., Dixon, J., and Dixon, M. J. (2004). The Treacher Collins syndrome (TCOF1) gene product is involved in ribosomal DNA gene transcription by interacting with upstream binding factor. Proc. Natl. Acad. Sci. USA 101(29):10709–10714.


125. Ginisty, H., Amalric, F., and Bouvet, P. (1998). NCL functions in the first step of ribosomal RNA processing. EMBO J. 17:1476–1486.
126. Yu, Y., Maggi, L. B. Jr., Brady, S. N., Apicelli, A. J., Dai, M. S., Lu, H., and Weber, J. D. (2006). Nucleophosmin is essential for ribosomal protein L5 nuclear export. Mol. Cell. Biol. 26(10):3798–3809.
127. Huang, N., Negi, S., Szebeni, A., and Olson, M. O. (2005). Protein NPM3 interacts with the multifunctional nucleolar protein B23/nucleophosmin and inhibits ribosome biogenesis. J. Biol. Chem. 280(7):5496–5502.
128. Jansen, R. P., Hurt, E. C., Kern, H., Lehtonen, H., Carmo-Fonseca, M., Lapeyre, B., and Tollervey, D. (1991). Evolutionary conservation of the human nucleolar protein fibrillarin and its functional expression in yeast. J. Cell Biol. 113(4):715–729.
129. Westendorf, J. M., Konstantinov, K. N., Wormsley, S., Shu, M.-D., Matsumoto-Taniura, N., Pirollet, F., Klier, F. G., Gerace, L., and Baserga, S. J. (1998). M phase phosphoprotein 10 is a human U3 small nucleolar ribonucleoprotein component. Mol. Biol. Cell 9:437–449.
130. Strezoska, Z., Pestov, D. G., and Lau, L. F. (2000). Bop1 is a mouse WD40 repeat nucleolar protein involved in 28S and 5.8S rRNA processing and 60S ribosome biogenesis. Mol. Cell. Biol. 20:5516–5528.
131. Lapik, Y. R., Fernandes, C. J., Lau, L. F., and Pestov, D. G. (2004). Physical and functional interaction between Pes1 and Bop1 in mammalian ribosome biogenesis. Mol. Cell 15:17–29.
132. Ruggero, D., Grisendi, S., Piazza, F., Rego, E., Mari, F., Rao, P. H., Cordon-Cardo, C., and Pandolfi, P. P. (2003). Dyskeratosis congenita and cancer in mice deficient in ribosomal RNA modification. Science 299:259–262.
133. Piñol-Roma, S. (1999). Association of nonribosomal nucleolar proteins in ribonucleoprotein complexes during interphase and mitosis. Mol. Biol. Cell 10:77–90.
134. Hinsby, A. M., Kiemer, L., Karlberg, E. O., Lage, K., Fausboll, A., Juncker, A. S., Andersen, J. S., Mann, M., and Brunak, S. (2006). A wiring of the human nucleolus. Mol. Cell 22(2):285–295.
135. Yanagida, M., Shimamoto, A., Nishikawa, K., Furuichi, Y., Isobe, T., and Takahashi, N. (2001). Isolation and proteomic characterization of the major proteins of the nucleolin-binding ribonucleoprotein complexes. Proteomics 1:1390–1404.
136. Fujiyama, S. (2004). PhD dissertation, Tokyo University of Agriculture and Technology.
137. Sekiguchi, T., Hayano, T., Yanagida, M., Takahashi, N., and Nishimoto, T. (2006). NOP132 is required for proper nucleolus localization of DEAD-box RNA helicase DDX47. Nucleic Acids Res. 34(16):4593–4608.
138. Boon, K., Caron, H. N., van Asperen, R., Valentijn, L., Hermus, M. C., van Sluis, P., Roobeek, I., Weis, I., Voute, P. A., Schwab, M., and Versteeg, R. (2001). N-myc enhances the expression of a large set of genes functioning in ribosome biogenesis and protein synthesis. EMBO J. 20(6):1383–1393.
139. Nelson, S. A., Aris, J. P., Patel, B. K., and LaRochelle, W. J. (2000). Multiple growth factor induction of a murine early response gene that complements a lethal defect in yeast ribosome biogenesis. J. Biol. Chem. 275(18):13835–13841.
140. Kroll, S. L., Barth-Baus, D., and Hensold, J. O. (2001). The carboxyl-terminal domain of the granulocyte colony-stimulating factor receptor uncouples ribosomal biogenesis from cell cycle progression in differentiating 32D myeloid cells. J. Biol. Chem. 276(52):49410–49418.


141. Savkur, R. S., and Olson, M. O. (1998). Preferential cleavage in pre-ribosomal RNA by protein B23 endoribonuclease. Nucleic Acids Res. 26:4508–4515.
142. Sugimoto, M., Kuo, M.-L., Roussel, M. F., and Sherr, C. J. (2003). Nucleolar Arf tumor suppressor inhibits ribosomal RNA processing. Mol. Cell 11:415–424.
143. Korgaonkar, C., Hagen, J., Tompkins, V., Frazier, A. A., Allamargot, C., Quelle, F. W., and Quelle, D. E. (2005). Mol. Cell. Biol. 25:1258–1271.
144. Itahana, K., Bhat, K. P., Jin, A., Itahana, Y., Hawke, D., Kobayashi, R., and Zhang, Y. (2003). Tumor suppressor ARF degrades B23, a nucleolar protein involved in ribosome biogenesis and cell proliferation. Mol. Cell 12:1151–1164.
145. Rizos, H., McKenzie, H. A., Ayub, A. L., Woodruff, S., Becker, T. M., Scurr, L. L., Stahl, J., and Kefford, R. F. (2006). Physical and functional interaction of the p14Arf tumour suppressor with ribosome. J. Biol. Chem. 281(49):38080–38088.
146. Volarevic, S., Stewart, M. J., Ledermann, B., Zilberman, F., Terracciano, L., Montini, E., Grompe, M., Kozma, S. C., and Thomas, G. (2000). Proliferation, but not growth, blocked by conditional deletion of 40S ribosomal protein S6. Science 288(5473):2045–2047.
147. Jorgensen, P., Nishikawa, J. L., Breitkreutz, B. J., and Tyers, M. (2002). Systematic identification of pathways that couple cell growth and division in yeast. Science 297(5580):395–400.
148. Sudbery, P. (2002). Cell biology. When wee meets whi. Science 297:351–352.
149. Fath, S., Milkereit, P., Podtelejnikov, A. V., Bischler, N., Schultz, P., Bier, M., Mann, M., and Tschochner, H. (2000). Association of yeast RNA polymerase I with a nucleolar substructure active in rRNA synthesis and processing. J. Cell Biol. 149:575–590.
150. Granneman, S., and Baserga, S. J. (2005). Crosstalk in gene expression: coupling and co-regulation of rDNA transcription, pre-ribosome assembly and pre-rRNA processing. Curr. Opin. Cell Biol. 17:281–286.
151. Gallagher, J. E., Dunbar, D. A., Granneman, S., Mitchell, B. M., Osheim, Y., Beyer, A. L., and Baserga, S. J. (2004). RNA polymerase I transcription and pre-rRNA processing are linked by specific SSU processome components. Genes Dev. 18:2506–2517.
152. Osheim, Y. N., French, S. L., Keck, K. M., Champion, E. A., Spasov, K., Dragon, F., Baserga, S. J., and Beyer, A. L. (2004). Pre-18S ribosomal RNA is structurally compacted into the SSU processome prior to being cleaved from nascent transcripts in Saccharomyces cerevisiae. Mol. Cell 16:943–954.
153. Shinkawa, T., Taoka, M., Yamauchi, Y., Ichimura, T., Kaji, H., Takahashi, N., and Isobe, T. (2005). STEM: a software tool for large-scale proteomic data analyses. J. Proteome Res. 4(5):1826–1831.
154. Qian, W. J., Monroe, M. E., Liu, T., Jacobs, J. M., Anderson, G. A., Shen, Y., Moore, R. J., Anderson, D. J., Zhang, R., Calvano, S. E., Lowry, S. F., Xiao, W., Moldawer, L. L., Davis, R. W., Tompkins, R. G., Camp, D. G., 2nd, and Smith, R. D. (2005). Quantitative proteome analysis of human plasma following in vivo lipopolysaccharide administration using 16O/18O labeling and the accurate mass and time tag approach. Mol. Cell. Proteomics 4:700–709.
155. Yanagida, M. (2003). PhD dissertation, Tokyo University of Agriculture and Technology.
156. Stavreva, D. A., Kawasaki, M., Dundr, M., Koberna, K., Mueller, W. G., Tsujimura-Takahashi, T., Komatsu, W., Hayano, T., Isobe, T., Raska, I., Misteli, T., Takahashi,

N., and McNally, J. G. (2006). Potential roles for ubiquitin and the proteasome during ribosome biogenesis. Mol. Cell. Biol. 26(13):5131–5145.
157. Hwang, S. I., Lundgren, D. H., Mayya, V., Rezaul, K., Cowan, A. E., Eng, J. K., and Han, D. K. (2006). Systematic characterization of nuclear proteome during apoptosis: a quantitative proteomic study by differential extraction and stable isotope labeling. Mol. Cell. Proteomics 5(6):1131–1145.
158. Forner, F., Foster, L. J., Campanaro, S., Valle, G., and Mann, M. (2006). Quantitative proteomic comparison of rat mitochondria from muscle, heart, and liver. Mol. Cell. Proteomics 5(4):608–619.
159. Ong, S. E., and Mann, M. (2005). Mass spectrometry-based proteomics turns quantitative. Nat. Chem. Biol. 1(5):252–262.
160. Andersen, J. S., Lyon, C. E., Fox, A. H., Leung, A. K. L., Lam, W. W., Steen, H., Mann, M., and Lamond, A. I. (2002). Directed proteomic analysis of the human nucleolus. Curr. Biol. 12:1–11.
161. Scherl, A., Coute, Y., Déon, C., Calle, A., Kindbeiter, K., Sanchez, J.-C., Greco, A., Hochstrasser, D., and Diaz, J.-J. (2002). Functional proteomic analysis of human nucleolus. Mol. Biol. Cell 13:4100–4109.
162. Leung, A. K. L., Andersen, J. S., Mann, M., and Lamond, A. I. (2003). Bioinformatic analysis of the nucleolus. Biochem. J. 376:553–569.
163. Couté, Y., Burgess, J. A., Diaz, J.-J., Chichester, C., Lisacek, F., Greco, A., and Sanchez, J.-C. (2006). Deciphering the human nucleolar proteome. Mass Spectrom. Rev. 25:215–234.
164. Muramatsu, M., Smetana, K., and Busch, H. (1963). Quantitative aspects of isolation of nucleoli of the Walker carcinosarcoma and liver of the rat. Cancer Res. 25:693–697.
165. Lamond, A. I., and Sleeman, J. E. (2003). Nuclear substructure and dynamics. Curr. Biol. 13:R825–R828; Leung, A. K., and Lamond, A. I. (2002). In vivo analysis of NHPX reveals a novel nucleolar localization pathway involving a transient accumulation in splicing speckles. J. Cell Biol. 157:615–629.
166. Ong, S. E., Blagoev, B., Kratchmarova, I., Kristensen, D. B., Steen, H., Pandey, A., and Mann, M. (2002). Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1:376–386.

INDEX

18S pre-rRNA, 206
30S pre-rRNA, 206
40S subunit synthesis, 205
60S biogenesis factors, 205
90S preribosomal complex, 204
90S preribosome particles, 205
Absolute quantification (AQUA), 148, 150
Acetylation, 17, 23
Achromobacter protease I, 89
Acid-cleavable linker, 136
Actinomycin D (inhibitor of RNA polymerase I), 170, 233, 234
Activity-based protein profiling (ABP), 16
Affinity purification, 177
Affinity purification-tags, 176
Affinity-tag-fused protein, 188
Affinity-tag purification, 174
All-in-one capillary column, 73
All-in-one ESI column, 74
Aminoethylcysteine, 111, 114
Aminoethylcysteine modification, 113
Anaphase-promoting complex, 197
Anion exchange (AE), 72
Antibody-fixed beads, 180, 181
Antibody-fixed protein G-Sepharose beads, 185

Anti-FLAG antibody-fixed beads, 210
Arf, 208
ASAPRatio, 143
Assembly snapshot, 205
Assembly snapshot analysis, 171, 202
Automated multidimensional LC-MS/MS, 197
Automated quantification, 144
Avidin, 106
b series ion, 11, 12
B23, 219
B23-associated pre-rRNA complex, 224
B23-associated pre-rRNP complex, 231
B-cell lymphoma 6 (BCL6) transcription factor, 197
BEMAD, 22
Benchmark success rate, 198
Biomarker, 13
Biomarker discovery, 145
Biotinylated peptide, 106
Biphasic capillary column, 65
Bottom-up approach, 9, 70
Box C/D snoRNAs, 205
Box C/D snoRNPs, 218
5-Bromo-UTP (Br-UTP) incorporation, 233

Proteomic Biology Using LC-MS: Large Scale Analysis of Cellular Dynamics and Function By Nobuhiro Takahashi and Toshiaki Isobe Copyright © 2008 John Wiley & Sons, Inc.


C. elegans, 96, 98, 123
Capillary column, 74
13C4-succinic anhydride, 140
α-Casein, 116
β-Casein, 116
Cataloging proteomics, 15
CD4/lck receptor, 197
Cell-surface proteins, 102
Cellular localization, 204
Cellular machinery, 196
Chaperonin (GroEL), 15
Cleavable stable isotope-labeled synthetic peptide, 149
cMyc, 208
Coding sequence (CDS) identifier, 97
Codon adaptation index (CAI), 99, 100
Collision-induced dissociation (CID), 10, 11
Column packing, 76, 77
Complex-interaction proteomics, 16
Con-A agarose column, 121
Coomassie blue, 4
cprotein complex, 168
cullin-RING ubiquitin ligase complex, 197
Culture-derived isotope tag (CDIT), 134
α-Cyano-4-hydroxycinnamic acid, 7
Cytoplasm, 204
DAT file, 88
Database search, 88
Data-dependent collision-induced dissociation MS/MS, 84
Data-dependent tandem MS, 66
DDX47-associated pre-rRNP complexes, 218, 232
Descriptive proteomics, 15
Diagnosis, 13
Diamond-Blackfan anemia, 208
5,6-Dichlorobenzimidazole riboside (DRB, inhibitor of RNA polymerase II), 234
Different LC-MS, 146
Dimethyl pimelimidate (DMP), 181
Direct analysis of large protein complexes (DALPC), 68
Direct delivery pump, 79
Direct nano LC (DNLC) system, 80, 83
Double-tagging approach, 177, 179
Double-tagging methodology, 196
Dynamic analysis, 224
Dynamic range of protein abundance, 14
Dyskeratosis congenita, 208
Edman sequencing, 5
EGF-receptor-MAP kinase, 197
Electrical switching valve, 81

Electrospray ionization (ESI), 6, 7
Electrospray ionization apparatus, 72
Electrospray ionization interface, 73
Electrospray ionization-MS/MS, 12
Electrospray ionization spray needle, 72
Electrospray ionization-ToF MS, 65
β-Elimination, 18, 110, 113
emPAI, 91, 152
Endopeptidase Lys-C, 8
Enhancer binding protein α (EBPα), 197
Epitope-tag, 174
Epitope-tagged ubiquitin, 127
Escherichia coli, 98
Estrogen receptor, 196
Exosome, 218
Exponentially modified version of PAI (emPAI), 152
Expression proteomics, 15
Extracted ion chromatograms (XICs), 140
Factor Xa recognition sequence, 188
Fast atom bombardment (FAB), 6
Fibrillarin, 208
Fibrillarin (FIB)-associated pre-rRNP complex, 223, 231
FLAG epitope, 196
FLAG-tag, 176, 179
Fluorescence loss in photobleaching (FLIP), 234
Fluorescence recovery after photobleaching (FRAP), 234
Fluorescent dye, 4
Focused proteomics, 16
Fourier transformed ion cyclotron resonance (FT-ICR), 6
Fraction-splitting phenomenon, 92
Fragment ion mass spectrum, 10
FT-ICR-MS, 70
Functional annotation, 236
Functional proteomics, 15, 101
Gas phase protein sequencer, 5
Gel-based technique, 10
Gel electrophoresis, 63
Gene ontology (GO) database, 108
Genome, 2
Global analysis of protein expression, 14
Global proteome machine, 71
Glucocorticoid receptor, 197
GluR6 kainate receptor, 197
Glutathione-S-transferase (GST)-tag, 176, 188
Glycosylation, 17, 21
Glycine–glycine (GG) modification, 125
Gradient device, 80

GST-fused hParvulin (GST-hParvulin), 192
GST-hParvulin, 189
GST-Sepharose, 189
H/ACA snoRNAs, 205
HA-tag, 176
High performance liquid chromatography (HPLC), 3
His-tag, 176
Histone H2AX-associating protein complex, 197
Histone H2B monoubiquitinating Rad6/Bre1, 197
Horizontally transferred gene, 100
hParvulin-associated complex, 194
HSA, 89
Human nucleolar proteome, 236
Human plasma, 5
Human pre-rRNP complexes, 211
Human ribosome biogenesis, 209
Human serum albumin (HSA), 86
Human trans-acting factors, 211
Human/mammalian ribosome biogenesis, 231
Hydrophilic interaction chromatography, 122
Hydrophobicity, 90
ICAT, see Isotope-coded affinity tag
ICAT reagents, 134
IGOT, see Isotope-coded glycosylation-site specific tagging
IκB, 198
Image analyzing apparatus, 5
Immobilized pH gradient, 5
Immobilized protein beads, 187
Immunoaffinity purification, 180, 184
Immunocytochemical analysis, 221
Immunoprecipitation, 208
In vitro labeling, 132
In vivo labeling, 132
In-gel protease digestion, 9
In-gel protease digestion (Protocol), 10
Integrated capillary column, 73
Interaction proteomics, 16
Interactome, 16
Internal standard, 148
Inward rectifier potassium channel (Kir2.x)-associated complex, 197
Ion-exchange column, 71, 78
Ion source, 6
Ion-trap (IT), 6
Ion trap FT-ICR, 234
Ionization, 6
Isobaric tag, 136
Isoelectric point (pI), 3, 99

Isolation of pre-rRNP complexes, 220
Isotope-coded affinity tag (ICAT), 69, 127, 170
Isotope-coded glycosylation-site specific tagging (IGOT), 21, 118, 123
Isotope-labeled O-methylisourea, 226, 228
Isotope-labeled reagents, 224
iTRAQ, 138
iTRAQ reagents, 135
Keystone style arch, 75
Label-free quantification, 145, 146
Large scale identification technology (LSIT) with 2D-LC-MS/MS, 68
Large scale immunoprecipitation, 15
Laser-capture microdissection, 14
LC-FTICR MS/MS, 147
Lectin column, 124
Liquid chromatography, 63
Loading capacity, 91
LSU knob, 211
Machinery proteomics, 16
MALDI-ToF, 8
MALDI-ToF mass spectrometer, 183
Malignant progression, 208
Mammalian ribosome biogenesis, 209
Mascot, 12, 66, 68, 88, 143
Mass analyzer, 6
Mass spectrometry (MS), 6
Mass-to-charge ratio (m/z), 6
Mass-coded abundance tagging (MCAT), 139
MassLynx, 88
Metabolome, 2
α-Mating factor, 172
Matrix molecule, 7
Matrix-assisted laser desorption (MALDI), 6, 7
Metabolic labeling, 234
Metabotropic glutamate receptor 5 protein, 197
β-Methylaminoethylcysteine, 114
Methylation, 17, 22
N-methyl-D-aspartate (NMDA) receptor-adhesion protein, 197
MG132 (proteasome inhibitor), 234
Michael addition, 120
Micropipette puller, 75
Mitochondria proteome, 233
Modification specific proteomics, 16, 17, 109
Molecular weight (mass) (Mr), 3, 9, 99
Moment-by-moment snapshot analysis, 224
Monoclonal IgG1-fixed protein-G beads, 184
Monolith column, 79
Mouse embryonic stem (ES) cell, 104

MS/MS instrument, 10
MS/MS ion search, 12
MS2Ratio, 144
MS-based quantification, 131
MS-Fit, 9, 186
MSQuant, 152
Multidimensional LC-MS/MS, 64
Multidimensional liquid chromatography (multi-LC), 64
Multidimensional protein identification technology (MudPIT), 68
Multidimensional separation, 92
Multi-protein complexes, 167, 196
myc-tag, 176
mzXML, 71
Nanoflow LC system, 80
Nanoflow pump, 83
Nano-LC-MS/MS, 209
Natural isotope abundance, 229
Natural isotope abundance ratio, 141
Natural isotopes, 141
NCBInr, 88
NF-E2p18/MafK, 173
NF-κB, 198
N-glycosylation site, 118
N-glycosylation site mapping, 117
Nitration, 28
2-Nitrobenzenesulfenyl (NBS)-Cl13C6, 138
2-Nitrobenzenesulfenyl (NBS) reagents, 135
S-Nitrosylation, 27
Nop56-associated pre-rRNP complex, 232
Northern blot analysis, 201
Nuclear pore complex, 169, 197
Nucleolar proteome, 234
Nucleolin, 208
Nucleolin (NCL)-associated pre-rRNP complex, 223, 230
Nucleolus, 170, 233
Nucleophosmin (B23), 208
Nucleophosmin 3 (NPM3), 219
Nucleoplasm, 204
Nucleus/cytoplasm, 205
One-dimensional (1D)-RPLC-MS/MS, 84
18O-labeled peptide, 123
18O-labeling, 138
18O-water, 138
O-glycosylation, 120
O-methylisourea, 135, 139, 225
Organellar proteomics, 16, 101
Outline pathway of ribosome synthesis in yeast, 204

Packing materials, 78
Par14-associated pre-rRNP complexes, 218, 232
Particle size, 78
Peptide mass fingerprinting (PMF), 8, 186
Peptide sequence tag, 11, 12
Peptide-N-glycosidase (PNGase), 120
Peptidyl prolyl cis-trans isomerase, 188, 195
Perl language, 144
Phosphopeptide, 110
Phosphoproteomics, 110
Phosphorylation, 17, 18
Phosphorylation site, 116
Phosphorylation site mapping, 109, 114, 115
Phosphoserine, 114
Phosphospecific proteolysis, 115
Phosphothreonine, 114
pI, see Isoelectric point
Pin1, 189
Plasma membrane, 102
Playing hands style arch, 75
PMF method, see Peptide mass fingerprinting
Polyubiquitin, 124
Porosity, 78
Post-transcriptional modification, 206
Post-translational modification (PTM), 17, 109, 168, 206
Pre-40S particles, 206
Pre-60 particles, 201
Pre-60S processing factors, 218
Pre-90S processing factors, 218
Precursor ion scanning, 18
Preparation of nuclear extract, 194
Preribosomal ribonucleoprotein (pre-rRNP) complex, 196, 199, 201, 209
Preribosomal RNA (pre-rRNA), 198
Pre-rRNA processing machinery, 219
PreScission recognition sequence, 188
Presynaptic MALS-CASK-liprin-α, 197
Processing pathway of the primary 35S pre-rRNA, 203
Processome, 206
Protease digestion, 222
Protease digestion in solution, 186
Proteasome, 125
Protein abundance index (PAI), 151
Protein chemistry, 2
Protein cleavage-isotope dilution mass spectrometry (PC-IDMS), 149
Protein content, 152
Protein glycosylation, 117
Protein identification, 107
Protein phosphorylation, 109
Protein profiling, 65

Protein-A, 180
Protein-G, 180
ProteinProspector, 141
Protein-protein interaction, 168
Proteome, 1
Proteome analysis, 1
Proteome-browsing technology, 150
Proteomics, 1
PTMs, see Post-translational modification
Pull-down purification, 187
Pulse chase, 201
Pulse chase monitored by quantitative MS (PC/QMS), 207
QTAX, see Quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes
Quantitative analysis, 224
Quantitative analysis of tandem affinity-purified in vivo cross-linked protein complexes (QTAX), 173, 207
Quantitative proteomics, 131, 140, 170
RCA120, 124
rDNA, 219
rDNA transcription, 219
RecQ5-specific antibody-fixed protein-G beads, 183
Reference peptides, 150
RelEx, 143
ReNCon device, 81
ReNCon system, 80
Retention time, 90
Reversed-phase (RP) column, 71, 72
Reverse-tagging, 171
Reverse-tagging analysis, 210, 220
Reverse-tagging approach, 211
Reverse-tagging methodology, 174
Reverse-tagging strategy, 201
Ribosomal protein, 131
Ribosome biogenesis, 169, 198
Ribosome RNA (rRNA), 200
RNA interference, 179
RNA polymerase (RNAP) II, 170
RNA polymerase II preinitiation complex, 172, 197
RNase treatment, 222, 230
Sedimentation profile, 203
Semiquantitative mass spectrometric analysis, 226
SEQUEST, 12, 66, 68, 143
Shotgun, 102
Shotgun analysis, 68, 220
Shotgun approach, 130

Shotgun method, 186, 210
SILAC, see Stable isotope labeling by amino acids in cell culture
Silver staining, 4
Sin3-histone deacetylase (HDAC), 197
Size exclusion (SE), 72
Size of pre-rRNAs, 201
Small nucleolar RNA (snoRNA), 200
Small subunit (SSU) processome, 211
Small ubiquitin-like modifier (SUMO), 126
Snapshot analysis, 201, 211
Sodium dodecyl sulfate (SDS), 3
Sonar, 143
SpecArray, 147
Spliceosome, 197
Split flow system, 79
Splitter, 79
Stable isotope-labeled reference peptide, 148
Stable isotope labeling, 224
Stable isotope labeling by amino acids in cell culture (SILAC), 133, 134, 168, 233
Stable isotope tagging, 122
Stable-isotope dilution, 132
Stage-specific protein association, 173
STEM, 13, 69, 88, 143, 225, 229
Strong cation exchange (SCX), 72
Strong cation-exchange (SCX) column, 91
Subcellular proteomics, 16, 101
Sulfo-N-hydroxysuccinimide (NHS)-LC-biotin, 98, 102, 108
Sumoylation, 17, 24
System biology, 168
Tobacco etch virus (TEV)-protease, 175
Tandem affinity purification (TAP)-tag, 177
Tandem MS (MS/MS), 10, 11
TAP strategy, 197
TAP-MS approach, 198
TAP-tag, 175, 176, 179
TAP-tagged trans-acting factors, 201
TATA-binding protein, 171
Taylor cone, 74
Thrombin, 193
Thrombin cleavage, 188, 189
Time of flight (ToF), 6
TNF-α/NF-κB pathway, 197
TNF-α/NF-κB signal transduction pathway, 199
ToF-ToF, 8
Top-down approach, 10, 70
Trans-acting factor, 200
Trans-acting factors involved in human/mammalian ribosome biogenesis, 213
Transcription, 169


Transcription machinery, 197
Transcriptome, 2
Translation, 169
Treacher Collins syndrome, 208, 218
Treacle, 218
Triple-stage quadrupole (TSQ), 6
Trypsin, 8
Tumor suppressor Arf, 219, 232
2D-LC system, 90
2D-PAGE, 17, 64
Two-dimensional differential image gel electrophoresis (2D-DIGE), 4, 131
Two-dimensional electrophoresis (2DE), 3, 5
Two-dimensional electrophoresis (2DE) catalog database, 13
Two-dimensional electrophoresis (2DE) gel-based protein profiling, 13
Two-dimensional image-converted analysis of liquid chromatography and mass spectrometry (2D-ICAL), 147
Two-step affinity purification (TAP), 175
U3 snoRNA, 201
U3-snoRNA-associated processing machinery, 206

U3 snoRNP, 205
U3 snoRNP complex, 179
UBF1, 219
Ubiquitin E3 ligase, 125
Ubiquitinated protein, 129
Ubiquitination, 17, 23, 124
Ubiquitin-proteasome system, 232
Ubiquitin-protein conjugate, 130
Ubiquitin-related protein, 129
Ubiquitin-ubiquitin linkage, 126
Ultracentrifugation of pre-rRNP complexes, 221
Universal signal response factor, 153
Wnt signaling complex, 197
XIC, 233
xPAI, 233
XPRESS, 143
y series ion, 11, 12
Yeast cell fusion library, 14
Yeast transcription factor STE12, 172
Yellow fluorescent protein (YFP), 235

Fig. 1-4. Protein identification by in-gel protease digestion and peptide mass fingerprinting. In this method, a target protein, such as one separated by two-dimensional electrophoresis, is digested in the gel with a sequence-specific protease, usually trypsin, and the resulting peptide mixture is analyzed by mass spectrometry, thus generating a peptide mass fingerprint. Experimentally determined peptide masses are compared with those obtained theoretically by a search algorithm such as MS-Fit or Mascot.
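The comparison of experimental and theoretical peptide masses described in this caption can be sketched in a few lines of Python. This is only an illustration, not how MS-Fit or Mascot score candidates: the function names, the 0.2 Da tolerance, and the simplifications (no missed cleavages, no modifications, neutral monoisotopic masses) are all assumptions made for the example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, sequence, tol=0.2):
    """Number of observed masses within tol (Da) of any theoretical peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed_masses
    )


# Hypothetical two-entry "database"; the candidate with more matches ranks first.
database = {"protA": "GASKVTR", "protB": "LLLKEEER"}
observed = [361.196, 374.228]
ranking = sorted(database, key=lambda n: count_matches(observed, database[n]),
                 reverse=True)
```

Real search engines add probabilistic scoring, missed cleavages, and modification handling on top of this basic mass comparison.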

Fig. 1-5. Protein identification by tandem, or MS/MS, mass spectrometry. In this method, individual peptides from a peptide mixture are isolated in the first step in the mass spectrometer and fragmented by collision-induced dissociation (CID), that is, by collision with an inert gas in a collision cell, during the second step in order to obtain the structural information of the peptide.

[Fig. 2-1, panel (C) labels: (1) Ion exchange chromatography, (2) Reversed phase chromatography, (3) Mass spectrometry (MS), (4) Tandem MS; separation properties: Charge, Hydrophobicity, Mass, Structural information.]

Fig. 2-1. (A) Schematic diagram of an automated multidimensional LC-MS/MS system. The system is composed of two independent LC assemblies equipped with an ion-exchange or a reversed-phase microcapillary column connected in tandem through a column-switching valve and a reversed-phase “trap” column-based solvent desalting unit, and a mass spectrometer with electrospray source. The large amount of MS and MS/MS data collected is then processed by a series of data processing and database-searching programs (e.g., Mascot or SEQUEST) for automatic assignment of peptides and proteins on a genome/protein sequence database. [From R. J. Simpson (Ed.), Proteins and Proteomics: A Laboratory Manual, Cold Spring Harbor Laboratory Press (2002).] In another system, two microcapillary columns are replaced by a single biphasic column packed with an ion-exchange and a reversed-phase packing material in series (inset). [Adapted by permission from Macmillan Publishers Ltd.; Link et al., Nat. Biotechnol. (Ref. 6) (1999). Reprinted from Issaq et al., J. Chromatogr. B. (Ref. 7) (2005), with permission from Elsevier.] (C) Hierarchy of peptide separation and MS-based structural analysis in multidimensional LC-MS/MS technology. The peptides in a complex biological mixture are first separated by ion-exchange chromatography according to their charge characteristics, and subsets of peptides eluted from the ion-exchange column are further separated by reversed-phase chromatography according to their hydrophobicity. The eluate is continuously sprayed into the mass spectrometer through an electrospray interface for “data-dependent” tandem MS, in which MS and MS/MS data of each peptide are automatically collected for subsequent structural analysis.

Fig. 2-2. “Shotgun” analysis by 1D-LC- or multidimensional LC-MS/MS. In this analysis, numerous peptides produced by protease digestion of complex biological samples, such as crude cell extracts or protein complexes, are continuously sprayed, like a shotgun, from the LC spray-tip into the mass spectrometer, and a large volume of generated MS and MS/MS data are processed by a data retrieval system for automatic assignment of peptides and proteins on a genome/protein sequence database.

[Fig. 2-9 axes: x-axis, pI calculated (3 to 13); y-axis, Mr calculated (kDa), logarithmic scale from 1 to 1000.]

Fig. 2-9. Two-dimensional display of the proteome that was experimentally obtained by the shotgun analysis (orange: 1480 proteins) or predicted from the genome sequence (m52p) (yellow: 4280 entries), and the proteome predicted from the genes encoded within horizontally transferred genes called “k-loop” (blue: 490 proteins). Mr and pI of the entire proteome were calculated from the amino acid sequences without considering post-translational modifications. The y-axis is presented as a logarithmic scale.
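As a rough illustration of how a pI can be calculated from an amino acid sequence alone, the sketch below finds the pH at which the Henderson-Hasselbalch net charge is zero by bisection. The pKa values are one commonly used set and are an assumption here; the caption does not say which values or which program were used for the figure, and different tools give slightly different results.

```python
# Assumed side-chain and terminal pKa values (one common set; tools differ).
PKA_POSITIVE = {"nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of an unmodified protein at a given pH."""
    # Free N-terminus (positive when protonated) and C-terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["nterm"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["cterm"] - ph))
    for aa in sequence:
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection on pH; the net charge decreases monotonically with pH."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0
```

Because post-translational modifications are ignored, as in the figure, values computed this way can differ from experimentally observed pI values.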

Fig. 2-10. Subcellular localization of FLAG-tagged proteins expressed in D3 cells. Panels (top to bottom): RIKEN cDNA B430119L13, trophoblast glycoprotein, glycoprotein A33, hypothetical protein D7Ertd458e, and empty vector. Cells are stained with anti-FLAG, anti-CD9 (an ES cell plasma membrane marker), and Hoechst 33342.

Fig. 2-16. Schematic view of different strategies for protein quantification, intensity corrected. In the isotope labeling with O-methylisourea, the resulting mass signals of a peptide pair labeled with the heavy or light reagent overlap as a consequence of a small mass difference between labeling agents. Thus, the observed signal ratio is corrected by calculating the natural peptide signal ratio that is estimated theoretically from the natural isotope abundance ratio.
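The correction described in this caption can be sketched as follows: estimate, from natural isotope abundances, how much of the light peptide's isotope envelope falls on the heavy peptide's monoisotopic peak, and subtract that contribution before taking the ratio. The function names, the approximate abundance table, and the `shift` parameter (the nominal mass difference between the labeled forms) are assumptions for illustration; the caption does not give the actual mass difference or the exact procedure used.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425],
}


def convolve(a, b, n):
    """Product of two abundance distributions, truncated to n peaks."""
    out = [0.0] * n
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n:
                out[i + j] += x * y
    return out


def isotope_envelope(composition, n_peaks=8):
    """Relative abundance of the M, M+1, ... peaks for an elemental composition."""
    envelope = [1.0] + [0.0] * (n_peaks - 1)
    for element, count in composition.items():
        for _ in range(count):
            envelope = convolve(envelope, ISOTOPES[element], n_peaks)
    return envelope


def corrected_ratio(light_mono, heavy_observed, composition, shift):
    """Heavy/light ratio after removing the light envelope's M+shift peak."""
    envelope = isotope_envelope(composition, n_peaks=shift + 1)
    overlap = light_mono * envelope[shift] / envelope[0]
    return (heavy_observed - overlap) / light_mono


# Hypothetical peptide composition and a nominal shift of 3, for illustration.
composition = {"C": 50, "H": 80, "N": 14, "O": 15}
ratio = corrected_ratio(1000.0, 2100.0, composition, shift=3)
```

The sketch compares monoisotopic peaks only and assumes complete labeling; the smaller the mass difference, the larger the overlap and the more the correction matters.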

Fig. 2-18. Absolute quantification by MS-based technology using reference peptides. To quantify the absolute amount of a particular peptide in a sample mixture, known amounts of the isotopically labeled peptide are spiked into the mixture as a reference standard and subjected to LC-MS analysis. The amount of the peptide present in the sample mixture is estimated by comparing the MS signal intensity with that of the reference peptide. A typical calibration standard curve is illustrated in the figure.
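The arithmetic behind this caption is simple enough to sketch. With a single spike level, the analyte amount is the spiked amount scaled by the intensity ratio; with several spike levels, a straight line fitted to the standards can be inverted. The function names and the ordinary least-squares fit are assumptions for illustration, and the numbers in the usage lines are made up.

```python
def amount_from_ratio(analyte_intensity, reference_intensity, reference_amount):
    """Single-point estimate: analyte amount = spiked amount x intensity ratio."""
    return reference_amount * analyte_intensity / reference_intensity


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_curve(response, slope, intercept):
    """Invert the calibration line to read an amount from a measured response."""
    return (response - intercept) / slope


# Made-up calibration standards: known amounts versus measured signal ratios.
known_amounts = [1.0, 2.0, 4.0, 8.0]
signal_ratios = [0.5, 1.0, 2.0, 4.0]
slope, intercept = fit_line(known_amounts, signal_ratios)
unknown_amount = amount_from_curve(1.5, slope, intercept)
```

Both estimates assume that the labeled reference and the native peptide ionize equally well and that the response is linear over the range of the standards.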

Fig. 3-1. Proteomics approach to the dynamic analysis of affinity-purified cellular machinery (multiprotein complexes). (B) Reverse-tagging approach to isolate protein complexes. In this technology, a target protein complex is first isolated using a first bait protein with an epitope tag. After proteomic characterization, another protein component identified in the complex is selected as a second bait to fish protein complexes from the cell. (From Ref. 32.) (C) Assembly snapshot analysis of cellular machinery (multiprotein complex). The approach can be used to isolate protein complexes (machinery) with some differences as sequential snapshots of the biochemical composition of cellular machinery (multiprotein complexes) related to one another or formed in a series of processes in a specific biological event. (From Ref. 24.)


Fig. 3-6. (A) Assembly snapshot analysis of pre-rRNP complexes formed at various stages of ribosome biogenesis. Using an appropriate protein as affinity bait, each of the pre-rRNP complexes formed at various stages of ribosome biogenesis can be isolated. (C) The 90S preribosomal complex is proposed to contain the 35S rRNA, the U3 snoRNA, and some 40S processing factors. The early pre-rRNA is cleaved at A0, A1, and A2, and the processed products are eventually incorporated into mature 40S and 60S particles. (From Ref. 116.) (D) Outline pathway of ribosome synthesis in yeast. The pre-90S particle is proposed to contain the 35S/33S/32S rRNA and many trans-acting factors including SSU processome (indicated in square). The early pre-rRNA cleavages [at sites A0 to A2 in part (C)] lead to the separation of the pre-40S and pre-60S particles. In both pathways, a series of intermediates are formed (pre-60S E0, pre-60S E1, pre-60S E2, pre-60S M, pre-60S L, and pre-60C for intermediates of large 60S subunit, and pre-40S E and pre-40S L for intermediates of small 40S subunit) according to their position on the proposed pathway. Each of the intermediates is proposed to contain the pre-rRNAs indicated. Trans-acting factors are indicated at both sides of the intermediates and boxed with an arrow bar to show their presence on the corresponding intermediates. Probable cellular localization of the intermediates is also indicated. Final maturation of large and small subunits occurs in the cytoplasm, and they are also included (matured 60S and matured 40S). Note that it is very probable that other preribosomal complexes exist in addition to those shown, and it is not clear in what order the components are gained and lost between the complexes.

Fig. 3-8. Quantitative analysis of affinity-purified multiprotein complexes using isotope-labeled O-methylisourea and LC-MS/MS. (A) In this technology, a pair of target protein complexes are digested with a protease such as trypsin, and the resulting peptide mixtures are differentially labeled in vitro with “light” (normal) or “heavy” (13C, 15N-labeled) O-methylisourea. The N-guanidinated peptide mixtures are then combined and analyzed by an LC-MS/MS system in a data-dependent mode, in which the relative abundance of each peptide pair is quantitated from the relative intensities of their MS signals while identifying the peptide by the MS/MS signals.

WILEY-INTERSCIENCE SERIES IN MASS SPECTROMETRY

Series Editors
Dominic M. Desiderio, Departments of Neurology and Biochemistry, University of Tennessee Health Science Center
Nico M. M. Nibbering, Vrije Universiteit Amsterdam, The Netherlands

John R. de Laeter: Applications of Inorganic Mass Spectrometry
Michael Kinter and Nicholas E. Sherman: Protein Sequencing and Identification Using Tandem Mass Spectrometry
Chhabil Dass: Principles and Practice of Biological Mass Spectrometry
Mike S. Lee: LC/MS Applications in Drug Development
Jerzy Silberring and Rolf Ekman: Mass Spectrometry and Hyphenated Techniques in Neuropeptide Research
J. Wayne Rabalais: Principles and Applications of Ion Scattering Spectrometry: Surface Chemical and Structural Analysis
Mahmoud Hamdan and Pier Giorgio Righetti: Proteomics Today: Protein Assessment and Biomarkers Using Mass Spectrometry, 2D Electrophoresis, and Microarray Technology
Igor A. Kaltashov and Stephen J. Eyles: Mass Spectrometry in Biophysics: Conformation and Dynamics of Biomolecules
Isabella Dalle-Donne, Andrea Scaloni, and D. Allan Butterfield: Redox Proteomics: From Protein Modifications to Cellular Dysfunction and Diseases
Silas G. Villas-Boas, Ute Roessner, Michael A.E. Hansen, Jorn Smedsgaard, and Jens Nielsen: Metabolome Analysis: An Introduction
Mahmoud H. Hamdan: Cancer Biomarkers: Analytical Techniques for Discovery
Chhabil Dass: Fundamentals of Contemporary Mass Spectrometry
Kevin M. Downard (Editor): Mass Spectrometry of Protein Interactions
Nobuhiro Takahashi and Toshiaki Isobe: Proteomic Biology Using LC-MS: Large Scale Analysis of Cellular Dynamics and Function


