Metabolomics: The Frontier of Systems Biology 4431251219, 9784431251217

Metabolism is the sum of the chemical reactions in cells that produce life-sustaining chemical energy and metabolites. I


English Pages 256 [267] Year 2005


M. Tomita T. Nishioka (Eds.) Metabolomics The Frontier of Systems Biology


With 112 Figures, Including 4 in Color

Springer

Masaru Tomita, Ph.D. Professor and Director Institute for Advanced Biosciences Keio University Tsuruoka 997-0035, Japan

Takaaki Nishioka, Ph.D. Professor Graduate School of Agriculture Kyoto University Kyoto 606-8502, Japan

This book is based on the Japanese original, M. Tomita, T. Nishioka (Eds.), Metabolome Kenkyu no Saizensen, Springer-Verlag Tokyo, 2003. Library of Congress Control Number: 2005928331 ISBN 4-431-25121-9 Springer-Verlag Tokyo Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Product liability: The publishers cannot guarantee the accuracy of any information about dosage and application contained in this book. In every individual case the user must check such information by consulting the relevant literature. Springer is a part of Springer Science+Business Media springeronline.com

© Springer-Verlag Tokyo 2005 Printed in Japan Typesetting: Camera-ready by the editor. Printing and binding: Nikkei Printing, Japan Printed on acid-free paper

Preface

The aim of this book is to review metabolomics research. The information is presented in a way that allows the reader to view the subject of metabolomics from a broad perspective. Creative and progressive research on metabolomes began in Japan and Germany in the 1990s and ranged from the development of specialized chemical analytical techniques to the construction of databases and methods for metabolic simulation. The authors have been directly involved in the development of all the subject areas that are discussed in this book, including research related to capillary electrophoresis, liquid chromatography, mass spectrometry, metabolic databases, and metabolic simulation. As the title suggests, the latest cutting-edge research projects are presented here. In addition, a selected group of applied cases, representative of likely future scenarios, is presented.

It is our hope that this book will generate further metabolomic research across a broad range of life-science disciplines, and that hitherto unforeseen applications and innovative technologies will arise from such efforts. It is especially important that medical institutions and venture enterprises actively participate in metabolomic research, thus ensuring that it matures into a discipline offering practical medical benefits. To promote the application of metabolomic research, a number of key issues will require breakthroughs; these include the popularization of chemical analytical techniques, the development of simple but stable specialized analyzers, the parallel processing and miniaturization of such devices, and the advancement of metabolic systems biology. Researchers, technicians, and university students are urged to take on these challenges to advance metabolomic research.

We would like to thank the staff of Springer-Verlag, Tokyo, for their help in bringing this book to fruition.

Masaru Tomita
Institute for Advanced Biosciences, Keio University

Takaaki Nishioka
Division of Applied Biosciences, Graduate School of Agriculture, Kyoto University

Contents

Preface
Color Plates

Part I. Introduction

Chapter 1: Overview
M. Tomita

Part II. Analytical Methods for Metabolome Sciences

Chapter 2: Development and Application of Capillary Electrophoresis-Mass Spectrometry Methods to Metabolomics
T. Soga

Chapter 3: Application of Electrospray Ionization Mass Spectrometry for Metabolomics
R. Taguchi

Chapter 4: High-Performance Liquid Chromatography and Liquid Chromatography/Mass Spectrometry Analyses of Metabolites in Microorganisms
H. Miyano

Chapter 5: Metabolome Profiling of Human Urine with Capillary Gas Chromatography/Mass Spectrometry
T. Kuhara

Chapter 6: Metabolic Profiling by Fourier-Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR-MS) and Electrospray Ionization Quadrupole Time-Of-Flight Mass Spectrometry (ESI-Q-TOF-MS)
K. Hirayama

Chapter 7: Metabolome Analysis by Capillary Electrophoresis
L. Jia and S. Terabe

Chapter 8: High-Performance Liquid Chromatography for Metabolomics: High-Efficiency Separations Utilizing Monolithic Silica Columns
T. Ikegami, H. Kobayashi, H. Kimura, V.V. Tolstikov, O. Fiehn, and N. Tanaka

Part III. Applications of Metabolome Analysis to Biosciences

Chapter 9: Combined Analysis of Metabolome and Transcriptome: Catabolism in Bacillus subtilis
T. Nishioka, K. Matsuda, and Y. Fujita

Chapter 10: Metabolomics in Arabidopsis thaliana
K. Saito

Chapter 11: Lipidomics: Metabolic Analysis of Phospholipids
R. Taguchi

Chapter 12: Chemical Diagnosis of Inborn Errors of Metabolism and Metabolome Analysis of Urine by Capillary Gas Chromatography/Mass Spectrometry
T. Kuhara

Part IV. Metabolome Informatics

Chapter 13: Introduction to the ARM Database: Database on Chemical Transformations in Metabolism for Tracing Pathways
M. Arita

Chapter 14: The Genome-Based E-CELL Modeling (GEM) System
K. Arakawa, Y. Yamada, K. Shinoda, Y. Nakayama, and M. Tomita

Chapter 15: Hybrid Dynamic/Static Method for Large-Scale Simulation of Metabolism and its Implementation to the E-CELL System
Y. Nakayama

Part V. Metabolomics and Medical Sciences

Chapter 16: Metabolomics and Medical Sciences
T. Nishioka

Index

[Figure 6: stacked selected-ion electropherograms; the recoverable trace labels are m/z 87, 89, 191, 195, 229, and 259; horizontal axis: migration time (min)]

Fig. 6. CE/MS electropherograms for a standard mixture (100 µM each) of 25 metabolites of the glycolytic, TCA, and pentose phosphate pathways. PEP, phosphoenolpyruvate; DHAP, dihydroxyacetone phosphate; 3PG, 3-phosphoglycerate; E4P, erythrose 4-phosphate; Ru5P, ribulose 5-phosphate; R5P, ribose 5-phosphate; G1P, glucose 1-phosphate; F6P, fructose 6-phosphate; G6P, glucose 6-phosphate; 2,3DPG, 2,3-diphosphoglycerate; F16P, fructose 1,6-diphosphate (reproduced from Ref. [15] with permission from the American Chemical Society)

Figure 6 shows the electropherograms obtained by CE/MS for a standard mixture of 25 anions of the glycolytic, TCA, and pentose phosphate pathways. Since the deprotonated molecular ion, [M-H]-, dominated the mass spectrum for each compound, anions were detected at their deprotonated molecular weights [15]. Although the migration times of succinate, malate, 2-oxoglutarate, and phosphoenolpyruvate are very close, they were selectively detected by MS. Even isomers such as ribulose 5-phosphate (Ru5P) and ribose 5-phosphate (R5P), and glucose 1-phosphate (G1P), fructose 6-phosphate (F6P), and glucose 6-phosphate (G6P) were resolved. A total of 124 anionic metabolite standards, including carboxylic acids, phosphorylated carboxylic acids, phosphorylated sugars, and other phosphorylated compounds, were determined by this method [10].
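As a rough illustration of why anions appear at their deprotonated masses, and why isomers cannot be told apart by m/z alone, the following minimal sketch computes the m/z of the singly charged [M-H]- ion from an elemental formula. The element masses, the proton mass, and the molecular formulae (C5H11O8P for the pentose phosphates, C6H13O9P for the hexose phosphates) are standard values supplied here as assumptions; they are not taken from the chapter, and the function names are hypothetical.

```python
# Monoisotopic masses (u) of the elements used below, and the proton mass.
# These constants are assumptions added for illustration.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "P": 30.97376200}
PROTON_MASS = 1.00727647


def monoisotopic_mass(formula):
    """Sum element masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())


def deprotonated_mz(formula):
    """m/z of the singly charged [M-H]- ion (neutral mass minus one proton)."""
    return monoisotopic_mass(formula) - PROTON_MASS


# Isomers share a formula, so they share an m/z value.
PENTOSE_PHOSPHATE = {"C": 5, "H": 11, "O": 8, "P": 1}  # Ru5P, R5P
HEXOSE_PHOSPHATE = {"C": 6, "H": 13, "O": 9, "P": 1}   # G1P, F6P, G6P

if __name__ == "__main__":
    for names, formula in [("Ru5P / R5P", PENTOSE_PHOSPHATE),
                           ("G1P / F6P / G6P", HEXOSE_PHOSPHATE)]:
        print(f"{names}: [M-H]- at m/z {deprotonated_mz(formula):.3f}")
```

Under these assumptions the two groups fall at nominal m/z 229 and 259, so the mass spectrometer alone cannot separate the members of each group; in the chapter's method it is the difference in migration time that resolves them.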

Development and Application of CE/MS 17

3.4. Nucleotide- and Coenzyme A (CoA)-Related Compound Analysis

When using the above anion analysis configuration, significant adsorption of multivalent ions (e.g., nucleotides and CoA derivatives) on the cationic-coated capillary occurs (Fig. 7a) [16]. To prevent adsorption and allow precise quantification, a pressure-assisted CE/MS technique (to counteract the EOF) using a noncharged polymer-coated capillary (Fig. 7b) was developed [16]. Although the migration times for most of the compounds were very close, 41 different nucleotides including cyclic nucleotides, deoxynucleotides, and CoA derivatives were simultaneously determined (Fig. 8) [10]. Altogether, the three different CE/MS configurations allowed the analysis of a total of 352 metabolic standards [10].

Fig. 7. Schematic of pressure-assisted CE/MS system for the analysis of nucleotide and coenzyme A (CoA)-related compounds. a Multivalent anions such as nucleotides and CoA-related compounds readily adsorbed on the SMILE(+) cationic polymer-coated capillary which was used for anion analysis by CE/MS. b To avoid adsorption, a noncharged polymer-coated capillary was employed instead, and constant flow of mobile phase toward the mass spectrometer was driven by applying air pressure to the capillary inlet vial to prevent current drop


[Figure 8: stacked selected-ion electropherograms, one trace per m/z value from 303 to 866; legible peak labels include dUMP, dCDP, CDP, UDP, dADP, ADP, dGDP, GDP, dCTP, dUTP, TTP, CTP, UTP, dATP, ATP, dGTP, GTP, NAD, NADH, NADP, NADPH, CoA, acetyl CoA, and succinyl CoA; horizontal axis: migration time (min), with ticks at 10, 15, and 20]

Fig. 8. Analysis of a standard mixture (100 µM each) of 41 nucleotide and CoA compounds by pressure-assisted CE/MS. The migration times of a number of nucleotides were close, but they could be easily differentiated by MS (reproduced from Ref. [10] with permission from the American Chemical Society)


3.5. Comprehensive Metabolome Analysis by CE/MS and Application to B. subtilis Extracts

In this section, comprehensive methods for metabolome analysis using CE/MS are described. As mentioned before, CE/MS methods permit infusion of any charged species into the mass spectrometer. Therefore, monitoring eluting ions over a wide range of m/z values by MS enables the comprehensive analysis of charged metabolites. The CE/MS methods described above were applied to the analysis of metabolites with m/z values between 70 and 1000 in B. subtilis extracts. Since a large number of metabolites are present at different concentrations in B. subtilis, it was necessary to limit the range in single ion monitoring (SIM) mode to a window of 30 different m/z values to maximize detection sensitivity. This narrow m/z scanning technique allowed a several-fold increase in sensitivity and the detection of a large number of metabolites. To cover the necessary mass range (70-1000), each sample was analyzed successively 33 times using an automatic injection sequence while varying the m/z monitoring range between 70 and 1000 in both cation and anion modes [10].

Figure 9 shows the results obtained by this method for cations extracted from exponentially growing B. subtilis cells (T-0.5) in the 101-130 m/z range. The peak contents were elucidated by comparing the components' molecular weights and migration times with those of metabolite standards. Over the range of 101-130 m/z, and among the corresponding 62 peaks, we could identify 18 metabolites such as cadaverine, GABA, N,N-dimethylglycine, diethanolamine, and serine. Although complete analysis took over 16 hours, the whole procedure is highly automated and could reveal up to 1053 cationic metabolites from exponentially growing B. subtilis cells, including 70 clearly identified ions. For anions, the CE/MS method yielded 637 anions, including 78 important metabolites involved in glycolysis, the TCA cycle, and the pentose phosphate pathway.

Finally, several nucleotides and CoA-related compounds were detected using the third CE/MS method. Overall, a total of 1692 metabolites, including 150 positively identified compounds from exponentially growing B. subtilis cell extracts, were determined using the three different CE/MS methods [10]. The sensitivity of these methods was extremely high, revealing the presence of as little as 40 zeptomoles of adenine and 350 attomoles of glutamate per cell. The relative standard deviation (%RSD) (n=5) of metabolite quantification for the whole procedure, from metabolite extraction through CE/MS analysis, for 63 identified metabolites was generally between 2% and 40% for peak areas, except for the smallest peaks, where the %RSD was larger. For relative migration time, the %RSD for identified metabolites was better than 2.9% (Table 1). These results thus indicate that the relative migration times can help define unidentified peaks.
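The two reproducibility measures used above can be sketched in a few lines. This is a minimal illustration, assuming %RSD means 100 times the sample standard deviation divided by the mean, and that a relative migration time is the analyte's migration time divided by that of the internal standard (methionine sulfone in Fig. 9). The function names and all numeric values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    average = mean(values)
    if average == 0:
        raise ValueError("%RSD is undefined when the mean is zero")
    return 100.0 * stdev(values) / average


def relative_migration_time(analyte_time, internal_standard_time):
    """Migration time normalized to the internal standard in the same run."""
    return analyte_time / internal_standard_time


if __name__ == "__main__":
    # Hypothetical peak areas from five replicate runs (n=5).
    peak_areas = [100.0, 110.0, 90.0, 105.0, 95.0]
    print(f"peak area %RSD: {percent_rsd(peak_areas):.2f}")

    # Hypothetical (analyte, internal standard) migration times in minutes.
    runs = [(12.0, 10.0), (12.3, 10.2), (11.8, 9.9)]
    ratios = [relative_migration_time(a, s) for a, s in runs]
    print(f"relative migration time %RSD: {percent_rsd(ratios):.2f}")
```

Normalizing to an internal standard cancels much of the run-to-run drift in absolute migration time, which is consistent with the smaller %RSD the chapter reports for relative migration times than for peak areas.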

[Figure 9: stacked selected-ion electropherograms, one trace per m/z value from 101 to 130; legible peak labels include cadaverine, diethanolamine, GABA, N,N-dimethylglycine, Ser, cytosine, creatine, Pro, Thr, 5-methylcytosine, imidazole-4-acetate, octylamine, and taurine; numbers printed above unidentified peaks are relative migration times; horizontal axis: migration time (min)]

Fig. 9. Selective ion electropherograms for cationic metabolites in exponentially growing (T-0.5) B. subtilis 168 cells in the range of 101 to 130 m/z. The numbers on top of unidentified peaks are relative migration times normalized with methionine sulfone (internal standard) (reproduced from Ref. [10] with permission from the American Chemical Society)

3.6. Comparative Metabolome Analysis During B. subtilis Sporulation

Nutritional limitation leads Bacillus species such as B. subtilis and Bacillus anthracis to produce a dormant, environmentally resistant spore [17]. This phenomenon is universally accepted as a basic model of bacterial differentiation. The complex morphological changes that occur during sporulation are thought to be tightly controlled by metabolic networks [18]. However, no comprehensive approach has been used to demonstrate alterations in metabolite profiles. We thus took advantage of the CE/MS methods described above to perform metabolite profiling in B. subtilis cells at different time points before and during spore formation. The levels of both cationic and anionic metabolites at T-0.5 (0.5 h before T0), T0 (corresponding to the end of exponential growth), and T2 (2 h after T0) were measured by CE/MS, and changes in metabolite levels were compared (Fig. 10) [10].

B. subtilis cells undergo sporulation under conditions of glucose deprivation [19]. Interestingly, the level of most metabolites in the glycolytic, pentose phosphate, and TCA pathways markedly decreased in the early stage of sporulation (Fig. 10a). In particular, the level of fructose 1,6-diphosphate (F16P) rapidly dropped more than 100-fold during sporulation. Since F16P is a key factor in catabolite repression mediated by the transcriptional factors CcpA and CcpC, it is possible that the decrease in F16P results in suppression of catabolite control and a subsequent expression of catabolic genes involved in sporulation [18]. Sonenshein and coworkers previously showed that all inactivating mutations in the B. subtilis TCA cycle genes cause a defect in sporulation, thus suggesting that activation of the TCA cycle is indispensable for sporulation [18]. In our experiments, both cis-aconitate and isocitrate, intermediates in the TCA cycle, were shown to accumulate at T0 (Fig. 10a). Subsequently, the concentrations of these metabolites, malate, and 2-oxoglutarate decreased, while acetyl CoA and succinyl CoA increased at T2 (Fig. 10b) [10]. These findings are in good agreement with previously reported studies regarding changes in enzyme and metabolite levels in the TCA pathway, demonstrating the power of our CE/MS methods to monitor these metabolic responses simultaneously [20].

Transcriptional alterations of gene expression during sporulation have been previously measured using DNA array techniques [21,22]. The expression of most genes involved in these metabolic pathways was found to decrease during sporulation. On the other hand, the levels of several metabolites such as cis-aconitate, isocitrate, CoA, acetyl CoA, succinyl CoA, lysine, and β-alanine were found to increase in our study. These results suggest that the metabolic consequences of gene expression changes cannot always be correctly predicted from transcriptome analysis, most likely because metabolism may be regulated at other levels such as posttranscriptional control and/or modification of enzyme activity. Further metabolome research will thus be necessary to better characterize these complex biological phenomena.


Fig. 10. Metabolic profiling upon onset of sporulation, based on the simultaneous analysis of charged metabolites by CE/MS. See Color Plate 1. a Changes in metabolite levels during the late logarithmic growth phase (T0 vs T-0.5). b Changes in metabolite levels during the early stage of sporulation (T2 vs T-0.5). Magenta and red boxes indicate metabolites whose levels increased 2- to 10-fold and more than 10-fold, respectively. Light blue and indigo boxes indicate the metabolites whose level was decreased to 0.1-0.5 and less than 0.1 of the original level, respectively. White boxes represent the metabolites whose levels remain approximately the same. Black boxes with white lettering indicate the undetected metabolites. The B. subtilis metabolic map was constructed based on the ARM database [24] (see www.metabolome.jp/arm.html) (reproduced from Ref. [10] with permission from the American Chemical Society)
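The color coding described in the caption of Fig. 10 amounts to binning the ratio of a metabolite's level at a later time point to its level at T-0.5. A minimal sketch of that binning follows. The function name is hypothetical, and the handling of the exact boundaries (for example, whether a ratio of exactly 2 counts as an increase) is an assumption, since the caption does not specify it.

```python
def classify_fold_change(ratio):
    """Map a level ratio (later time point / T-0.5) to the Fig. 10 box color.

    `ratio` is None when the metabolite was not detected.
    Boundary handling is an assumption made for this sketch.
    """
    if ratio is None:
        return "black"        # undetected
    if ratio > 10:
        return "red"          # increased more than 10-fold
    if ratio >= 2:
        return "magenta"      # increased 2- to 10-fold
    if ratio < 0.1:
        return "indigo"       # decreased to less than 0.1 of the original level
    if ratio <= 0.5:
        return "light blue"   # decreased to 0.1-0.5 of the original level
    return "white"            # approximately unchanged


if __name__ == "__main__":
    # Hypothetical ratios; only the F16P entry reflects a statement in the
    # text (a drop of more than 100-fold, i.e. a ratio below 0.01).
    examples = {"F16P": 0.005, "metabolite A": 3.0, "metabolite B": 1.1, "metabolite C": None}
    for name, ratio in examples.items():
        print(f"{name}: {classify_fold_change(ratio)}")
```

With this scheme the drop of more than 100-fold reported for F16P falls in the indigo class.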

4. Conclusion

The CE/MS techniques described here enabled the comprehensive, direct, and sensitive analysis of charged species, and revealed the presence of 1692 compounds in B. subtilis cells, including 150 that could be positively identified. The methods were applied not only to bacteria but also, more recently, to plant and mammalian cells, yielding quantitative values for a considerable number of metabolites [23]. However, about 90% of the metabolites detected in complex extracts could not yet be identified. It is thus imperative to also develop powerful methods that will allow the identification of uncharacterized compounds. Toward this goal, CE time-of-flight mass spectrometry (CE/TOF-MS) might help to determine the chemical formulae of unknown compounds, and CE/MS/MS could provide structural information. Since the proposed CE/MS methods greatly facilitate the global determination of charged species, they can be used as universal tools for quantitative metabolome analysis. Metabolome data, together with transcriptome and proteome analysis, will provide important new information to elucidate the biological functions of uncharacterized cellular components.

References

1. Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner R, Goodlett DR, Aebersold R, Hood L (2001) Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 292:929-934
2. Raamsdonk LM, Teusink B, Broadhurst D, Zhang N, Hayes A, Walsh MC, Berden JA, Brindle KM, Kell DB, Rowland JJ, Westerhoff HV, van Dam K, Oliver SG (2001) A functional genomics strategy that uses metabolome data to reveal the phenotype of silent mutations. Nat Biotechnol 19:45-50
3. Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling for plant functional genomics. Nat Biotechnol 18:1157-1161
4. Fiehn O, Kopka J, Trethewey RN, Willmitzer L (2000) Identification of uncommon plant metabolites based on calculation of elemental compositions using gas chromatography and quadrupole mass spectrometry. Anal Chem 72:3573-3580
5. Reo NV (2002) NMR-based metabolomics. Drug Chem Toxicol 25:375-382
6. Aharoni A, Ric de Vos CH, Verhoeven HA, Maliepaard CA, Kruppa G, Bino R, Goodenowe DB (2002) Nontargeted metabolome analysis by use of Fourier Transform Ion Cyclotron Mass Spectrometry. OMICS 6:217-234
7. Castrillo JI, Hayes A, Mohammed S, Gaskell SJ, Oliver SG (2003) An optimized protocol for metabolome analysis in yeast using direct infusion electrospray mass spectrometry. Phytochemistry 62:929-937
8. Stenson AC, Landing WM, Marshall AG, Cooper WT (2002) Ionization and fragmentation of humic substances in electrospray ionization Fourier transform-ion cyclotron resonance mass spectrometry. Anal Chem 74:4397-4409
9. Avery MJ (2003) Ionization and fragmentation of humic substances in electrospray ionization Fourier transform-ion cyclotron resonance mass spectrometry. Rapid Commun Mass Spectrom 17:197-201
10. Soga T, Ohashi Y, Ueno Y, Naraoka H, Tomita M, Nishioka T (2003) Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. J Proteome Res 2:488-494
11. Li SFY (1993) Capillary electrophoresis: principles, practice and applications. J Chromatogr Library vol. 52. Elsevier, Amsterdam
12. Soga T, Heiger DN (2000) Amino acid analysis by capillary electrophoresis electrospray ionization mass spectrometry. Anal Chem 72:1236-1241
13. Lukacs KD, Jorgenson JW (1987) Capillary zone electrophoresis: effect of physical parameters on separation efficiency and quantitation. J High Res Chromatogr 10:622-624
14. Katayama H, Ishihama Y, Asakawa N (1998) Stable cationic capillary coating with successive multiple ionic polymer layers for capillary electrophoresis. Anal Chem 70:5272-5277
15. Soga T, Ueno Y, Naraoka H, Ohashi Y, Tomita M, Nishioka T (2002) Simultaneous determination of anionic intermediates for Bacillus subtilis metabolic pathways by capillary electrophoresis electrospray ionization mass spectrometry. Anal Chem 74:2233-2239
16. Soga T, Ueno Y, Naraoka H, Matsuda K, Tomita M, Nishioka T (2002) Pressure-assisted capillary electrophoresis electrospray ionization mass spectrometry for analysis of multivalent anions. Anal Chem 74:6224-6229
17. Errington J (1993) Bacillus subtilis sporulation: regulation of gene expression and control of morphogenesis. Microbiol Rev 57:1-33
18. Sonenshein AL, Hoch JA, Losick R (2002) Bacillus subtilis and its closest relatives. ASM Press, Washington DC, pp 129-162
19. Grossman AD (1995) Genetic networks controlling the initiation of sporulation and the development of genetic competence in Bacillus subtilis. Annu Rev Genet 29:477-508
20. Uratani-Wong B, Lopez JM, Freese E (1981) Induction of citric acid cycle enzymes during initiation of sporulation by guanine nucleotide deprivation. J Bacteriol 146:337-344
21. Fawcett P, Eichenberger P, Losick R, Youngman P (2000) The transcriptional profile of early to middle sporulation in Bacillus subtilis. Proc Natl Acad Sci USA 97:8063-8068
22. Britton RA, Eichenberger P, Gonzalez-Pastor JE, Fawcett P, Monson R, Losick R, Grossman AD (2002) Genome-wide analysis of the stationary-phase sigma factor (sigma-H) regulon of Bacillus subtilis. J Bacteriol 184:4881-4890
23. Sato S, Soga T, Nishioka T, Tomita M (2004) Simultaneous determination of main metabolites in rice leaf using capillary electrophoresis mass spectrometry and capillary electrophoresis diode array detector. Plant J 40:151-163
24. Arita M (2003) In silico atomic tracing by substrate-product relationships in Escherichia coli intermediary metabolism. Genome Res 13:2455-2466

Chapter 3: Application of Electrospray Ionization Mass Spectrometry for Metabolomics

Ryo Taguchi

Department of Metabolome, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

1. Introduction

In parallel with the current progress in genomics and proteomics, metabolomics is also recognized as very important for post-genome studies (Fig. 1). In metabolomics, mass spectrometry (MS) is considered one of the most important tools. Furthermore, recent advances in mass spectrometry have made it possible to perform comprehensive analyses of metabolites and, in particular, to elucidate the precise functions of individual proteins that are differentially expressed in cells. To understand the actual physiological and biological functions of individual proteins, metabolomics is essential in addition to genomics, transcriptomics, and proteomics. In this process, comprehensive profiling of the metabolites in cells is indispensable. Metabolomics by mass spectrometry is very useful for identifying the actual substrates of enzyme proteins, the low-molecular-weight ligands of receptor proteins, and the low-molecular-weight metabolites bound by carrier proteins.

Another aim of metabolomics is to identify biological metabolites from MS data and to obtain profiles of how these metabolites change under specific circumstances. Such profiling can elucidate an unknown pathway or the exact substrate specificity of a new enzyme protein (Fig. 1). At times, even a new hypothesis can be verified in this way. Recently, combining studies of genetic alterations with metabolome studies has been recognized as very effective for linkage analyses of metabolic pathways.

Fig. 1. Basic techniques used in genome, transcriptome, proteome, and metabolome. Mass spectrometry (MS) has become a most popular method in proteomics and metabolomics

The discovery of soft ionization in mass spectrometry has brought about a paradigm change in the application of mass spectrometry to research [1]. For example, new findings about metabolites, or about their specific alteration in a particular biological phenomenon, make it possible to deduce a specific hypothesis about the metabolites that underlie those findings. Thus, comprehensive analyses of metabolites under genetically, environmentally, or physiologically different conditions are very important (Fig. 2).

Electrospray ionization (ESI) [1] and matrix-assisted laser desorption/ionization (MALDI) are very mild ionization methods compared with earlier ionization methods. The individual molecular structures of the metabolites targeted in metabolome studies are typically quite common, and the structural and metabolic relationships between metabolites are well studied. Thus we can readily infer their metabolic linkage from existing knowledge. Under these circumstances, comprehensive mass spectrometric analysis of metabolites can be effective in elucidating new functions of enzyme proteins, including substrate specificities.

ESI-MS allows individual metabolites in a mixture to be analyzed effectively [2-10]. Further, connecting a high-performance liquid chromatography (HPLC) system to a mass spectrometer enables more than several hundred, or even several thousand, metabolites to be identified at once [10]. By using Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), more than several hundred different metabolites in a mixture eluted at the same retention time can be effectively and separately identified owing to its high resolution and mass accuracy [4]. By using MS/MS-type instruments, individual metabolites of a specific m/z value in the mixture can be identified. LC/ESI-MS and MS/MS are very powerful methods both in basic life science and in applied research such as drug discovery [6]. Two features of LC/ESI-MS in metabolomics are that the data from separation by LC and separation by the mass spectrometer can be used in pseudo-two-dimensional analyses, and that databases and search tools such as those already popular in genomics and proteomics can be constructed.

Fig. 2. Strategies in metabolomics. Linkage analyses of expression of protein and metabolome are very important in functional studies of enzyme proteins

Techniques in this field are progressing very rapidly, along with soft ionization methods such as MALDI and ESI. Recent advances have led to many hybrid systems, such as ion trap MS (ITMS) and time-of-flight MS (TOF-MS). The most important feature of these ionization techniques is that individual naturally occurring metabolites can be ionized without any collision [1]. Further, they enable very sensitive measurements of molecular amounts at the pico- or femtomolar level. Thus, these methods are suitable for determining very small amounts of biological metabolites.

With conventional ionization methods such as electron ionization (EI) and chemical ionization (CI), it is very difficult to obtain molecular-related ions without any collisions. The fragment pattern of each metabolite is basically used as the criterion for identification. For this reason, these methods were used exclusively for identifying a purified single metabolite. For mixtures, gas chromatography/mass spectrometry (GC/MS) [5] was used after derivatization to improve separation and sensitivity. Thermospray ionization and atmospheric pressure chemical ionization (APCI) have also been used in combination with HPLC separation. However, GC/MS was not applicable to metabolites that are difficult to ionize. Fast atom bombardment (FAB) ionization was reported as a partially effective method of obtaining molecular-related ions until ESI and MALDI came into common use.

ESI and MALDI enable researchers to detect biological metabolites at very low levels. As a result of the development of ESI and MALDI, mass spectrometry has become an essential tool for biological researchers. Until very recently, these methods were regarded mainly as tools for confirming newly synthesized metabolites, and especially as tools for experienced analysts. Within the past ten years, many improvements such as TOF-MS and ITMS have been made, and these MS techniques have spread remarkably quickly.

MALDI is essentially used as an off-line method. ESI, on the other hand, can be used in a flow system and is easily combined with on-line separation methods such as HPLC or capillary electrophoresis (CE) [8]. The sensitivity of detection by ESI-MS depends mainly on the concentration of metabolites in the sample solution. Thus, to obtain the highest sensitivity it is very important to use a low elution rate with a small column, and for this purpose capillary or nano-LC systems combined with ESI-MS have been used. For proteome studies, MALDI systems are also very popular in combination with one- or two-dimensional gel electrophoresis. Recently, MALDI-TOF-MS and MALDI-IT-TOF-MS have been applied to metabolic studies in combination with separations on polymer-based or other support membranes. These types of mass spectrometry allow high-throughput structural identification of individual metabolites by their unique MS/MS methods.


Table 1. Types of mass spectrometer used with an ESI source

Type of mass spectrometer        Features
Triple quadrupole MS             Multiple LC/MS/MS mode
Quadrupole ion-trap MS           Data dependent scan, MS^n
Quadrupole time-of-flight MS     High resolution and high accuracy
FT-ICR-MS                        Extremely high resolution and high accuracy, MS^n

ESI, electrospray ionization; LC, liquid chromatography; FT-ICR-MS, Fourier-transform ion cyclotron resonance mass spectrometry

2. Mass Spectrometries Used with ESI

Table 1 lists several features of the individual types of mass spectrometer that can be used in combination with ESI.

2.1. Single-stage Quadrupole Mass Spectrometry

This instrument is essentially used with HPLC, because it lacks an MS/MS measurement capability. In this system, rapid switching between positive and negative ion modes is very important. Fragment ions of the molecular-related ions observed at each retention time can be obtained by in-source collision, by switching from low- to high-energy ionization conditions.

2.2. Triple-Stage Quadrupole MS (Tandem Mass, MS/MS)

With triple-stage quadrupole MS (Fig. 3), surveys of precursor ions and fragment ions are easily obtained simultaneously. Analyses such as product ion scanning, precursor ion scanning, neutral loss scanning, and multiple reaction monitoring (MRM) are possible. Among these methods, precursor ion scanning and neutral loss scanning have begun to be used in metabolomics, and they will become much more popular in the near future. MRM, also called selected reaction monitoring (SRM), is commonly used in combination with HPLC in pharmaceutical studies.


R. Taguchi

Fig. 3. A triple-staged quadrupole MS. A figure modeled by Quattro II (from a catalog by Micromass)

2.3. Quadrupole Ion-Trap MS

Quadrupole ion-trap MS (Fig. 4) is very useful for structural analysis because MS/MS/MS is available in addition to MS/MS. Data-dependent MS/MS acquisition is a very popular and effective method for shotgun proteomics. In metabolomics as well, data-dependent MS/MS acquisition will become more popular in the near future.

2.4. Single-stage TOF-MS

High resolution, high accuracy, and high-speed acquisition are the features of ESI-TOF-MS. However, the relatively slow switching between positive and negative ion modes is a weak point of this type of MS. Recently, some venture companies in metabolomics have tried using two single-stage TOF-MS systems with four HPLCs, monitoring positive and negative ions simultaneously. In their system, the eluates from the four HPLC systems were introduced into an ESI source through a multiplexing system that switched between them at intervals of one second for high-throughput screening.


Fig. 4. A quadrupole ion-trap MS. A figure modeled by Esquire 3000 (from a catalog by Bruker)

2.5. Quadrupole Hybrid TOF-MS

Quadrupole TOF-MS (Fig. 5) is a hybrid of quadrupole and TOF-MS. In this system, molecular-related ions are first detected by TOF-MS. Then, to obtain MS/MS data, the precursor ions are selected by the quadrupole and their fragment ions are detected in the TOF analyzer. Mass accuracy and resolution are very high for both molecular-related ions and fragments. An especially high signal-to-noise (S/N) ratio in the spectrum of precursor ions is its main feature.

2.6. FT-ICR-MS

With FT-ICR-MS, more than 1000 substances can be analyzed separately in a single flow injection, without separation by HPLC, because of its high resolution. Namely, two metabolites that differ by as little as 0.01 amu can be effectively separated and identified. Theoretically, more than 100 different substances can be detected within 1 amu, meaning that more than 10 000 peaks can exist within 100 amu. This type of MS has begun to be used in metabolomics because the elemental composition of each metabolite can be effectively identified with a mass accuracy of less than 2 ppm.
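The resolution and accuracy figures quoted for FT-ICR-MS can be checked with simple arithmetic. The following Python sketch is only an illustration: the function names and the example m/z of 500 are assumptions made for this example and do not come from any instrument software.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass error of a measurement, in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_window(mz: float, tolerance_ppm: float) -> float:
    """Half-width (in m/z units) of a +/- tolerance_ppm window around mz."""
    return mz * tolerance_ppm * 1e-6


def resolvable_peaks(mass_range: float, min_separation: float) -> int:
    """Upper bound on distinct peaks if peaks must differ by min_separation."""
    return round(mass_range / min_separation)


if __name__ == "__main__":
    print(resolvable_peaks(1.0, 0.01))     # 100 peaks within 1 amu
    print(resolvable_peaks(100.0, 0.01))   # 10000 peaks within 100 amu
    # Example m/z of 500 is hypothetical; 2 ppm is the accuracy quoted in the text.
    print(f"{mass_window(500.0, 2.0):.4f} m/z units at m/z 500 for 2 ppm")
```

At an m/z of 500, a 2 ppm tolerance corresponds to a window of about 0.001 m/z units, which is the scale at which candidate elemental compositions have to be distinguished.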


Fig. 5. A quadrupole time-of-flight (TOF) hybrid MS. A figure modeled by Q-TOF (from a catalog by Micromass)

2.7. Other Types of MS

Recently, several different types of hybrid MS have become commercially available. These hybrids, such as a hybrid of quadrupole and linear ion trap, a hybrid of ion trap and TOF, and a hybrid of linear ion trap and FT-ICR, each have different features suited to particular targets, such as highly sensitive identification or structural studies. On the other hand, even mass spectrometers of similar types differ considerably in their features. Thus, it is very important to choose a mass spectrometer that suits the specific research project.

3. Methods of Sample Injection Used in ESI-MS

3.1. Using a Syringe Pump

This system is commonly supplied by MS manufacturers as a basic sample injection system and is used for introducing standard samples for calibration of MS systems. When more than 100 µl of total volume is available for the analysis, this method is also applicable to normal sample analysis.

3.2. Flow Injection

This method is essentially used with an injector valve; samples previously loaded into a sample loop with a microsyringe are introduced into the MS by the flow from the pump system. With this method, less than 1 µl of sample can be effectively analyzed at a flow rate of several microliters or tens of nanoliters per minute. Analysis at nano flow is especially useful for structural studies with MS/MS methods such as precursor ion scanning or neutral loss scanning. At this small scale of analysis, contamination within the connecting lines of the flow injection system should be avoided.

3.3. One-Shot Nano Flow Probe

This method uses a disposable nano-chip. The sample is loaded into a nano flow chip and introduced into the MS by natural aspiration under the low vacuum in the MS system. Usually a flow of 10 to 100 nl/min can be obtained. In this system, the problem of contamination is greatly decreased, but each sample must be introduced manually.

3.4. Using LC Systems for Sample Injection

This system is essentially the same as a flow injection system, except that a separation column is placed between the injector and the MS. Particularly in a nano flow experiment, it is recommended to place the column as close to the MS system as possible. A gradient LC system is needed for gradient elution; quadrupole, ITMS, TOF-MS, and FT-ICR-MS systems can all be combined with LC systems. With ITMS or TOF-MS, high-throughput analysis has been performed in proteomics with a short column of 10-60 mm length. In ESI analysis, fewer than 30 components eluted at the same retention time can be easily and separately analyzed by MS.
In the analysis of small amounts of target metabolites, the most important aim of HPLC separation is to remove nonvolatile ions and contaminant ions that are detected with high ionization efficiency and cause ion suppression of the target metabolites. In this case, the reproducibility of retention time is less important for identification. Accurate quantitation is very difficult because ionization efficiency is influenced by contaminant ions, whose relative contents in the samples vary easily [3,5]. But even at a semiquantitative level, the differences in profiles are informative for forming useful hypotheses.

Columns used for ESI-MS vary widely in size, from conventional to nano. Given the mechanism of ESI, the optimal flow is 2-10 µl/min. To avoid contamination within the MS system, however, a flow from several µl/min to less than 100 µl/min is desirable. If the flow is higher than this value, part of the eluent should be introduced into the MS after splitting. To analyze samples in proteomics or metabolomics, at least capillary or nano LC is required. Effective analysis with a nano LC/MS system requires specialized techniques at many stages of the system, such as the pump system, the column, the connection with the MS, and the nano spray tip. Many innovations have been reported in these areas.

The selection of columns for the separation of metabolites depends on the chemical features of each metabolite. For the analysis of peptides and many drugs, the C18 reversed-phase column has been the most popular. In this case, most nonvolatile salts are eluted at an early stage of elution without any adsorption. In the analysis of lipids we commonly use a normal-phase column. When an ion-exchange column is used, a volatile acid and a volatile base should both be selected.
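The splitting rule described above can be sketched as a small calculation. This is a minimal Python illustration, assuming that the 100 µl/min upper limit from the text is used as the cap; the function name and the example column flow of 1000 µl/min are hypothetical.

```python
def split_fraction(column_flow_ul_min: float,
                   max_ms_flow_ul_min: float = 100.0) -> float:
    """Fraction of the eluent to send to the ESI source.

    Returns 1.0 when the column flow is already at or below the cap,
    otherwise the fraction that brings the MS flow down to the cap.
    """
    if column_flow_ul_min <= 0:
        raise ValueError("flow must be positive")
    return min(1.0, max_ms_flow_ul_min / column_flow_ul_min)


if __name__ == "__main__":
    # Hypothetical conventional column running at 1000 ul/min.
    print(f"conventional column: send {split_fraction(1000.0):.0%} to the MS")
    # A capillary column at 5 ul/min needs no split.
    print(f"capillary column: send {split_fraction(5.0):.0%} to the MS")
```

A capillary or nano LC system already runs within the desirable range, which is one reason why the text recommends it for proteomics and metabolomics samples.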

4. Conclusion

Focused analysis of metabolites obtained by a rough separation based on differences in solubility in the extraction solvent is very effective, because conditions for further separation, such as LC or CE, that suit the chemical properties of these metabolites are then more easily selected. This is true for all metabolomics studies, especially for the detection of minor components. By using differences in solubility, metabolites can be categorized into groups of different chemical and physiological nature. For each category of metabolites, a specialized field of metabolomics can exist, such as the peptidome, glycome, and lipidome. It is very important that new analytical strategies and databases are created for each of these categories of metabolites.

References

1. Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM (1989) Electrospray ionization for mass spectrometry of large biomolecules. Science 246:64-71
2. Mashego MR, Wu L, Van Dam JC, Ras C, Vinke JL, Van Winden WA, Van Gulik WM, Heijnen JJ (2004) MIRACLE: Mass isotopomer ratio analysis of U-13C-labeled extracts. A new method for accurate quantification of changes in concentrations of intracellular metabolites. Biotechnol Bioeng 85(6):620-628
3. Lafaye A, Junot C, Ramounet-Le Gall B, Fritsch P, Tabet JC, Ezan E (2003) Metabolite profiling in rat urine by liquid chromatography/electrospray ion trap mass spectrometry. Application to the study of heavy metal toxicity. Rapid Commun Mass Spectrom 17(22):2541-2549
4. Aharoni A, Ric de Vos CH, Verhoeven HA, Maliepaard CA, Kruppa G, Bino R, Goodenowe DB (2002) Nontargeted metabolome analysis by use of Fourier Transform Ion Cyclotron Mass Spectrometry. OMICS 6(3):217-234
5. Katz JE, Dumlao DS, Clarke S, Hau J (2004) A new technique (COMSPARI) to facilitate the identification of minor compounds in complex mixtures by GC/MS and LC/MS: tools for the visualization of matched datasets. J Am Soc Mass Spectrom 15(4):580-584
6. Strauss AW (2004) Tandem mass spectrometry in discovery of disorders of the metabolome. J Clin Invest 113(3):354-356
7. Castrillo JI, Hayes A, Mohammed S, Gaskell SJ, Oliver SG (2003) An optimized protocol for metabolome analysis in yeast using direct infusion electrospray mass spectrometry. Phytochemistry 62(6):929-937
8. Soga T, Ohashi Y, Ueno Y, Naraoka H, Tomita M, Nishioka T (2003) Quantitative metabolome analysis using capillary electrophoresis mass spectrometry. J Proteome Res 2(5):488-494
9. Jiao Z, Baba T, Mori H, Shimizu K (2003) Analysis of metabolic and physiological responses to gnd knockout in Escherichia coli by using C-13 tracer experiment and enzyme activity measurement. FEMS Microbiol Lett 220(2):295-301
10. Taguchi R, Hayakawa J, Takeuchi Y, Ishida M (2000) Two-dimensional analysis of phospholipids by capillary liquid chromatography/electrospray ionization mass spectrometry. J Mass Spectrom 35(8):953-966

Chapter 4: High-Performance Liquid Chromatography and Liquid Chromatography/Mass Spectrometry Analyses of Metabolites in Microorganisms

Hiroshi Miyano

Institute of Life Sciences, Ajinomoto Co. Inc., 1-1 Suzuki-cho, Kawasaki-ku, Kawasaki, Kanagawa 210-8681, Japan

1. Introduction

The metabolome is defined as the quantitative complement of all of the low molecular weight endogenous metabolites and their intermediates, including substrates, products, and regulatory factors, in a particular physiological or developmental state. Metabolomics is one of the important research areas in the post-genome era, like proteomics and transcriptomics, because endogenous metabolites are the products resulting from intracellular regulation. Thus, the concentrations of intracellular metabolites reflect the final response of a biological system to heritable genetic modifications or environmental variations. A prevailing approach of metabolomics is the determination of identified intracellular metabolites, including amino acids, intermediates in the tricarboxylic acid cycle (TCA cycle), intermediates in glycolysis, intermediates in the pentose phosphate pathway, nucleotides, and other endogenous metabolites. The approach also includes the isotopic distribution analysis of metabolites in cells cultured with a substrate bearing a stable isotope, e.g., 13C-labeled glucose, for interpretation of the complex time-related concentration, activity, and flux of intracellular metabolites in cells. The results are useful for metabolic engineering. Another important approach of metabolomics is metabolic profiling, in which all of the signals of intracellular metabolites obtained by nuclear magnetic resonance (NMR) and Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) are observed [1]. The signals derived from both identified and unidentified metabolites are statistically compared by multivariate analysis. The unidentified metabolites singled out by the statistical analysis can then be identified as key metabolites.


H. Miyano

Genetic research dramatically developed in the 1990s because of the revolutionary progress in analytical methods and improvements in analytical equipment. Comprehensive protein analyses have been significantly developed because of the progress in analytical technology, such as mass spectrometry (MS) and improvements in protein databases. In addition, NMR techniques and protein molecular modeling have assisted structural-functional analyses of proteins. Analyses of intracellular metabolites, however, have not been developed, although many specific analytical methods for certain metabolites were developed over the last century. The reason is that metabolites do not directly bear relevance to the central dogma from DNA to protein via mRNA. Another reason is that the endogenous metabolites in cells have multitudes of properties and functional groups, such as amino acids, organic acids, sugars, sugar phosphates, nucleotides, lipids, coenzymes, and inorganic ions. Furthermore, a wide range of concentration levels of intracellular metabolites exists in cells, and few methods exist for analyses of multiple compounds simultaneously. This chapter describes the analyses of amino acids, organic acids including intermediates in the TCA cycle, and phosphate esters, including intermediates in glycolysis and intermediates in the pentose phosphate pathway, by high-performance liquid chromatography (HPLC) and LC/MS, which comprise the optimal equipment for high-performance methods of separation analysis. Many HPLC applications for biological analyses and drug kinetics have been reported, because the advantage of the HPLC method is that researchers can select an appropriate method from many different separation and detection modes. HPLC is better for structure-related separation and detection than other methods. 
Methods for the inactivation of metabolism and the extraction of metabolites from cells are also reported, because the preparation procedures before measurement are important for analyses of time-related concentrations and activities of intracellular metabolites.

2. Inactivation of Metabolism and Extraction of Metabolites from Cells

The measurements of intracellular metabolites are divided into three steps: (1) inactivation of metabolism in cells, (2) extraction of metabolites from cells, and (3) measurement of metabolites. Researchers have to consider both the biological and chemical stability of individual metabolites.

HPLC and LC/MS analyses for metabolome


Metabolic reactions in microorganisms, especially catabolic pathways and energy metabolism reactions, have high turnover rates. Cytosolic glucose is converted at approximately 1 mM s⁻¹ and cytosolic ATP at approximately 1.5 mM s⁻¹ [2]. The turnover times reported for amino acids were also in the range of seconds [3]. For reliable measurements of the intracellular concentrations of metabolites, the metabolism in the sampled cells should be inactivated rapidly relative to the metabolic reaction rates, to avoid uncontrolled reactions, and the sampling rate should be high enough to study rapid and dynamic metabolic reactions [2]. Buchholz et al. developed equipment for routine rapid sampling and inactivation of metabolism that allows 4-5 samples to be obtained per second, continuously during the fermentation process [4]. Incubation mixtures including microorganisms are sprayed into cold solvent at a temperature of -20°C to -50°C, which immediately inactivates metabolism. Other groups also developed automated sampling devices coupled to fermentation tank reactors [2,5-7]. Extraction of the metabolites from cells was also carried out at sub-zero temperatures to prevent re-activation of enzymes. Hans et al. demonstrated the necessity of immediate termination of metabolic activities for the analysis of intracellular amino acids. They compared the analytical values of amino acids in exponentially growing yeast cells that were quenched in cold methanol (below -20°C), before cell harvest by centrifugation and subsequent extraction, with those of cells cooled in cold water (+4°C) [3]. Significant concentration differences were observed between the amino acids prepared in cold methanol and those prepared in cold water (Fig. 1).

Fig. 1. Comparison of observed intracellular amino acid contents of yeast cells that were extracted after metabolism was inactivated at different temperatures. The vertical axis is the ratio of the observed value of individual amino acid concentrations prepared in cold water (+4°C) to those prepared in cold methanol (-20°C). Data adapted from Hans et al. [3]
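The need for rapid inactivation follows directly from the turnover rates quoted above. The Python sketch below is a rough illustration only: it assumes that conversion continues at a constant rate until metabolism is stopped, and the delay values are hypothetical examples rather than measured quenching times.

```python
def converted_mM(turnover_mM_per_s: float, delay_s: float) -> float:
    """Concentration (mM) converted if metabolism runs on for delay_s seconds."""
    if delay_s < 0:
        raise ValueError("delay must be non-negative")
    return turnover_mM_per_s * delay_s


def sampling_interval_s(samples_per_second: float) -> float:
    """Time between consecutive samples at a given sampling rate."""
    return 1.0 / samples_per_second


if __name__ == "__main__":
    rates = {"glucose": 1.0, "ATP": 1.5}   # mM per second, as quoted in the text [2]
    for delay in (0.1, 1.0, 5.0):          # hypothetical delays before inactivation
        for name, rate in rates.items():
            print(f"{name}: {converted_mM(rate, delay):.2f} mM converted in {delay} s")
    # 4-5 samples per second, as reported for the rapid sampling device [4]
    print(f"{sampling_interval_s(4):.2f} s and {sampling_interval_s(5):.2f} s between samples")
```

Under this constant-rate assumption, a delay of a few seconds converts several millimolar of glucose or ATP, which is why quenching has to be fast compared with the reaction rates.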


Several metabolites are known to be chemically unstable in acidic or basic solutions. Adenine nucleotides, including adenosine triphosphate (ATP), are unstable in acidic solutions at room temperature. The reduced form of nicotinamide adenine dinucleotide is unstable under acidic conditions, whereas the oxidized form is degraded under basic conditions [6]. β-Keto acids are nonenzymatically degraded to the decarboxylated form under both acidic and basic conditions. Oxaloacetic acid in the TCA cycle, for instance, is easily converted to pyruvic acid, which is the final metabolite in glycolysis and is transformed to acetyl CoA, a TCA cycle substrate, by pyruvate dehydrogenase. Although amino acids are known to be comparatively stable metabolites, tryptophan is easily decomposed under basic conditions and glutamine is cyclized by intramolecular condensation. Researchers should not select a procedure for metabolic inactivation and extraction from cells solely by considering the largest number of intracellular metabolites that can be collected. It is important to select appropriate procedures that protect the target metabolites from catabolism and chemical degradation, by considering the properties of the compounds. Maharjan and Ferenci recently reported the influence of extraction methodology on the metabolome profiling of Escherichia coli [8].

3. Free Amino Acid Analysis

3.1. History of Amino Acid Analysis

High-performance liquid chromatography has been widely used for quantitative analyses of amino acids. The ninhydrin reagent is used in systems based on postcolumn derivatization. Stein, Moore, and Spackman developed an amino acid analyzer for the colorimetric determination of amino acids: the amino acids are separated stepwise on a cation-exchange resin while the pH of the citrate buffer is raised, and are then reacted with the ninhydrin reagent, first reported by Ruhemann in 1911, to produce a purple color. With the progressive improvement of HPLC pumping performance and the development of more effective column resins, the analyzer is now capable of assaying not only protein hydrolysates but also amino acids in biological fluids. Apart from the ninhydrin method, a variety of other techniques for converting amino acids to sensitively detectable fluorescent derivatives (derivatization) have been developed since the late 1970s [9,10]. Sensitive methods for the detection of intracellular amino acids are useful because smaller numbers of cells are preferable for the rapid inactivation of metabolism and the efficient extraction of metabolites. The reagents used to produce fluorescent derivatives include ortho-phthalaldehyde (OPA) (λex 340-345 nm, λem 455 nm), 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) (λex 245 nm, λem 395 nm), fluorescein isothiocyanate (FITC) (λex 490 nm, λem 515 nm), and 4-fluoro-7-nitrobenzofurazan (NBD-F) (λex 470 nm, λem 530 nm). The detection sensitivity ranges from sub-picomole to femtomole levels. To detect lower amounts of fluorescent derivatives of amino acids, the fluorescence excitation wavelength is brought closer to that of laser light.

3.2. Free Intracellular Amino Acid Analysis

Quantitative analyses of free intracellular amino acids have been reported in a few papers [3,11-16]. Hans et al. determined the dynamics of intracellular amino acid pools in different growth phases by a precolumn fluorescent derivatization method with AQC [3], and those during autonomous oscillations of Saccharomyces cerevisiae ATCC 32167 in batch cultures by a precolumn fluorescent derivatization method with OPA [16]. After separation of cells from the medium and inactivation of metabolism at temperatures below freezing, the intracellular amino acids were extracted with either boiling buffered ethanol (75% ethanol/0.25 M 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES; pH 7.5)) or boiling water. Extraction with boiling water is preferable to that with boiling buffered ethanol for the precolumn derivatization method, because salts such as HEPES adversely affect the derivatization rate, yield, and reproducibility of the HPLC retention time. Although AQC and OPA are not significantly different as derivatization reagents for primary amino acids, these authors reported that histidine/glutamine and serine/asparagine derivatized with AQC could not be separated. We determined the free intracellular amino acids in E.
coli K12 MG1655 by fluorescence detection and mass detection after derivatization and column separation. The cultivation medium consisted of 8 g of Bacto-Tryptone, 5 g of Bacto-Yeast Extract, and 1.25 g of sodium chloride in 250 ml of distilled water at pH 7, adjusted with a 1 M sodium hydroxide solution. Samples were collected after a 4-h incubation (log phase) and after a 7-h incubation (stationary phase). A 20-ml sample of the culture broth was drawn from the culture flask and rapidly mixed with 30 ml of 60% methanol containing 70 mM HEPES, which had been precooled to -70°C (inactivation step). The mixtures were centrifuged at -20°C, and the supernatant was discarded. The pellets were resuspended in 1 ml of 50% methanol, and the intracellular metabolites were extracted from the cells by a freeze-thaw method [17]. The suspensions were then extracted with 0.25 ml of chloroform, and the hydrophilic metabolites in the aqueous phase, including free intracellular amino acids, were collected (extraction step). Extracts were lyophilized after ultrafiltration (10 kDa) and were stored at -78°C until analysis. An extraction procedure that prevents salt from being mixed into the sample is also useful for the precolumn derivatization HPLC method.

Figure 2 shows the chromatogram of the free intracellular amino acids extracted from E. coli, derivatized with NBD-F and detected by fluorescence with an excitation wavelength of 470 nm and an emission wavelength of 530 nm. Tagging biomolecules with benzofurazan reagents such as NBD-F avoids the interfering fluorescence of around 400 nm emitted by biological substances and thus allows selective and sensitive detection [9,18,19]. Figure 3 shows the LC/MS chromatogram of the same sample detected in single ion monitoring (SIM) mode. Since NBD-F specifically reacts with an amino group to produce a fluorescent adduct, almost all amines, including free intracellular amino acids, are detected on the chromatogram (Fig. 2). Although this method is not comprehensively applicable to all metabolites, it is useful for detecting compounds with amine functional groups. In contrast, individual amino acids can be separately detected by LC/MS, which is widely used in pharmacokinetics because of its higher selectivity. The precolumn LC/MS method will be quite useful for the quantification of known amino acids. The concentrations of the intracellular amino acids in E. coli K12 at log phase and stationary phase are summarized in Table 1. We have to mention that the values are not absolute, because the intracellular concentrations of metabolites vary widely with the culture conditions.
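The quantities in the culture and inactivation steps above can be converted into concentrations with a short calculation. The Python sketch below is an illustration under two stated assumptions: volumes are treated as additive on mixing (methanol and water actually contract slightly), and the culture broth is taken to contain no methanol or HEPES. The function names are hypothetical.

```python
def grams_per_liter(mass_g: float, volume_ml: float) -> float:
    """Convert a mass dissolved in a volume (ml) to g/l."""
    return mass_g * 1000.0 / volume_ml


def mixed_concentration(vol_a_ml: float, conc_a: float,
                        vol_b_ml: float, conc_b: float) -> float:
    """Concentration after mixing two solutions, assuming additive volumes."""
    return (vol_a_ml * conc_a + vol_b_ml * conc_b) / (vol_a_ml + vol_b_ml)


if __name__ == "__main__":
    medium_g = {"Bacto-Tryptone": 8.0, "Bacto-Yeast Extract": 5.0,
                "sodium chloride": 1.25}          # per 250 ml, from the text
    for name, grams in medium_g.items():
        print(f"{name}: {grams_per_liter(grams, 250.0):.1f} g/l")

    # 20 ml of broth mixed with 30 ml of 60% methanol containing 70 mM HEPES
    methanol = mixed_concentration(20.0, 0.0, 30.0, 60.0)
    hepes = mixed_concentration(20.0, 0.0, 30.0, 70.0)
    print(f"final methanol: {methanol:.0f}%  final HEPES: {hepes:.0f} mM")
```

Under these assumptions the quenched mixture contains about 36% methanol and 42 mM HEPES, and the medium corresponds to 32 g/l tryptone, 20 g/l yeast extract, and 5 g/l sodium chloride.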

Fig. 2. High-performance liquid chromatography (HPLC) chromatogram of Escherichia coli intracellular amino acids at log phase, derivatized with 4-fluoro-7-nitrobenzofurazan (NBD-F), with fluorescent detection

Fig. 3. Liquid chromatography/mass spectrometry (LC/MS) chromatogram of E. coli intracellular amino acids at log phase, derivatized with NBD-F, by single ion monitoring mode (mass spectrometer: PE Sciex API365)

4. Analysis of TCA Cycle Intermediates

As with amines, there are many different organic acids in cells. In particular, the organic acids in the TCA cycle (citric acid cycle, Krebs cycle) are essential to cellular activity. The TCA cycle plays two important roles in cells. One is catabolism, which is concerned with energy production. Adenosine triphosphate is generated in the process of 2-oxoglutaric acid oxidation to succinic acid. The other is anabolism, in which the intermediates in the TCA cycle are the starting materials for amino acid biosynthesis. Thus, determination of the intermediates in the TCA cycle is also important in physiological and biochemical studies. In particular, changes in the flux and the absolute levels of intermediates in the TCA cycle may reflect defects in the substrate flow into mitochondria or the discharge of products, and changes in energy metabolism or the cellular redox state. Intermediates in the TCA cycle include citric acid, isocitric acid, 2-oxoglutaric acid, succinic acid, fumaric acid, malic acid, and oxaloacetic acid. Pyruvic acid should also be monitored, because it is a key intermediate between the glycolytic pathway and the TCA cycle.

Hydrophilic organic acids, including intermediates in the TCA cycle, can be determined by HPLC. The intracellular concentrations of TCA cycle intermediates in a microbial cell extract can be analyzed by anion-exchange HPLC with a basic solution as the eluent [20,21]. Groussac and colleagues separated organic acids on an IonCarboPac AS11 (Dionex) column using elution with 0.5-35 mM NaOH [21]. Five organic acids in the TCA cycle, namely succinic acid, malic acid, fumaric acid, 2-oxoglutaric acid, and citric acid, were identified within 20 min. Detection limits in the range of milligrams per liter were achieved by conductivity detection. They applied the method to a dynamic analysis of TCA cycle intermediates in S. cerevisiae in response to a glucose pulse.

As in the amino acid analysis mentioned above, derivatization of a carboxyl group with a fluorescent reagent before column separation is also useful for selective and sensitive detection [9]. The 4-(N,N-dimethylaminosulfonyl)-7-piperazino-2,1,3-benzoxadiazole (DBD-PZ) reagent, which has a benzofurazan skeleton like NBD-F, emits strong fluorescence at emission wavelengths around 560 nm [22]. Each intermediate in the TCA cycle has two or more carboxyl groups per molecule: 2-oxoglutaric acid, succinic acid, fumaric acid, malic acid, and oxaloacetic acid have two carboxyl groups, and isocitric acid and citric acid have three. Thus, multiple derivatives could be generated from each intermediate in the TCA cycle by the reaction with DBD-PZ. As it is preferable to detect each analyte as a single peak on the chromatogram, the parameters investigated were the types of condensing reagents and base catalysts; the concentrations of condensing reagents, base catalysts, and DBD-PZ; the reaction time; the reaction temperature; and the dissolving solvent.
The structures of these DBD-PZ derivatives were confirmed by LC/MS, which proved that all of the carboxyl groups were completely labeled with DBD-PZ under the optimal conditions, except for oxaloacetic acid, which was converted to pyruvic acid during derivatization [23]. The limits of fluorescence detection for all adducts were between 2 and 100 fmol at a signal-to-noise ratio of 3. Among the organic acids examined, the citric acid derivative with DBD-PZ showed the lowest detection limit of 2 fmol, indicating that the method has the merit of high sensitivity. Figure 4 shows the chromatogram of free intracellular organic acids in E. coli K12 MG1655 obtained by fluorescence detection after derivatization with DBD-PZ and column separation. Intracellular organic acids, including intermediates in the TCA cycle, are detected on the chromatogram because DBD-PZ specifically reacts with carboxyl groups to produce the fluorescent adducts. Although the method is not comprehensive for all metabolites, it is useful for detecting compounds with carboxyl functional groups.

Fig. 4. HPLC chromatogram of E. coli intracellular organic acids at log phase, derivatized with 4-(N,N-dimethylaminosulfonyl)-7-piperazino-2,1,3-benzoxadiazole, with fluorescent detection

Figure 5 shows the mass chromatogram of the TCA cycle intermediates that were derivatized with DBD-PZ. Detection was carried out in selected reaction monitoring (SRM) mode, using a triple-quadrupole mass spectrometer. The precursor-product transitions were determined from the mass spectra of the derivatives. The details are described in Fig. 5. The LC/MS method combined with precolumn derivatization will be quite useful for the selective detection of not only TCA cycle intermediates but also other known organic acids. The intracellular concentrations of TCA cycle intermediates in E. coli K12 at log phase and stationary phase are summarized in Table 1. As with the amino acids, we have to mention that the values are not absolute, because the intracellular concentrations of the metabolites change greatly with the culture conditions.

Fig. 5. LC/MS/MS chromatogram of E. coli intracellular intermediates in the tricarboxylic acid (TCA) cycle at log phase, derivatized with DBD-PZ, by selected reaction monitoring mode (mass spectrometer: PE Sciex API365). The extracted m/z values of the first mass and the second mass (first MS/second MS) shown on the right side of each chromatogram are:

citric acid/isocitric acid       1073.3/761.0
2-oxoglutaric acid               733.1/670.0
succinic acid                    705.1/394.0
fumaric acid                     703.1/640.2
malic acid                       721.1/410.0
pyruvic acid/oxaloacetic acid    382.1/336.8

Table 1. Intracellular concentrations of amino acids and intermediates in the tricarboxylic acid (TCA) cycle in Escherichia coli K12

Amino acids

Log phase Above 1 mM Above 100 |iM

Above 10 ^M

Above 1 joM

Lysine Glycine, alanine, y-aminobutyric acid, ornithine, glutamic acid, histidine, phenylalanine, arginine Serine, proline, valine, threonine, isoleucine, leucine, aspartic acid, glutamine

Intermediates1 in TCA cycle

Stationary phase Lysine Alanine, y-aminobutyric acid, glutamic acid, histidine

Log phase

Stationary phase

Glycine, valine, isoleucine, leucine, ornithine, aspartic acid, glutamine, methionine, phenylalanine, arginine Serine, proline. threonine

Citric acid

Succinic acid

Fumaric acid. isocitric acid. malic acid, 2-oxoglutaric acid

Citric acid. fumaric acid. isocitric acid malic acid, 2-oxoglutaric acid

Succinic acid

^The values are not absolute, because the intracellular concentrations of metabolites vary greatly under the culture conditions. Culture conditions are described in the text

HPLC and LC/MS analyses for metabolome


In carboxyl group derivatization, the carboxyl group of an organic acid reacts with the amino group of the reagent by a condensation reaction. A large amount of amino acids would interfere with this reaction, because the carboxyl group of one amino acid can react with the amino group of another amino acid. Precolumn derivatization methods for organic acids therefore have to accommodate the possible presence of large amounts of amino acids, especially when determining the TCA cycle intermediates in industrial microorganisms involved in amino acid production.

5. Analysis of Glycolysis and Pentose Phosphate Pathway Intermediates

Glycolysis (Embden-Meyerhof pathway, Embden-Meyerhof-Parnas pathway) is the basic energy metabolic system for almost all organisms. In glycolysis, the conversion of one mole of glucose to two moles of pyruvic acid, which is a TCA cycle substrate, is accompanied by the net production of two moles each of ATP and the reduced form of nicotinamide adenine dinucleotide (NADH). The pentose phosphate pathway is also important in glucose metabolism for generating NADPH (reduced NAD phosphate) for biosynthetic reactions and pentose sugars for nucleotide biosynthesis. Intermediates in both pathways are grouped as phosphate esters of sugars or sugar alcohols (hereafter collectively called sugar phosphates).

Buchholz et al. developed a novel LC/MS method for the quantification of intracellular concentrations of phosphate esters, using a cyclodextrin-bonded phase column eluted with aqueous ammonium acetate and methanol [24]. Intracellular intermediates such as glycolysis intermediates, nucleotides, and cofactors in E. coli K12 could be quantified, with detection limits from 0.02 to 0.50 mM. Although isobaric substances, such as glucose-6-phosphate/fructose-6-phosphate and 3-phosphoglycerate/2-phosphoglycerate, were not separated under these conditions, these intermediates can be separated on a porous graphitic carbon column.

Sugar phosphates can also be analyzed by anion-exchange chromatography, as with organic acids [20,21]. Conductometry or pulsed amperometry is generally used for detection, because sugar phosphates lack a characteristic ultraviolet absorption and there are no specific reagents for phosphate esters. An anion-exchange chromatograph/mass spectrometer can successfully measure sugar phosphates eluted in a solution of sodium hydroxide, with postcolumn removal of sodium ions using a commercially available ion suppressor before sample introduction into the mass spectrometer [25,26]. The effect of the suppression is shown in Fig. 6. The flow channel of the eluent is placed between cation-exchange membranes, and the flow channels of the regenerant solution run along the other sides of the membranes. Electrodes for electrolysis are fixed outside the regenerant channels, and hydrogen ions are generated from water at the positive electrode. The suppressor behaves like a cation exchanger and replaces the sodium counterions with hydronium ions. Thus, when the analytes leave the suppressor, they are in a water solution.

The analysis of sugar phosphates by an anion-exchange chromatograph/mass spectrometer with an anion suppressor combines the high separation ability of anion-exchange chromatography with the high selectivity of mass spectrometry. Sugar phosphates are detected as molecular ions [M-H]⁻. After collision activation, the most common first fragmentation step is the removal of a sugar or alcohol moiety, and thus the specific daughter fragment ion [H2PO4]⁻ is observed. This characteristic cleavage can be used for the selective detection of sugar phosphates by LC/MS/MS.

The chromatograms of E. coli K12 extracts obtained by anion-exchange chromatography-suppressor-MS/MS are shown in Fig. 7, which also shows a chromatogram with pulsed amperometric detection as a reference. The selected reaction monitoring mode was used for the detection of sugar phosphates. Molecular ions [M-H]⁻ were selected by the first MS, while m/z 97 [H2PO4]⁻ was detected by the second MS. The detection limits are 0.1 to 5 µM. The demonstration shows that pulsed amperometric detection is useful for the metabolic profiling of intermediates with phosphate esters, while LC/MS/MS is quite useful for the specific detection of sugar phosphates in glycolysis and the pentose phosphate pathway.
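The selected reaction monitoring transitions described above can be checked with a little arithmetic. The following is a minimal sketch, assuming standard monoisotopic atomic masses and the usual neutral compositions of a hexose monophosphate (C6H13O9P) and fructose-1,6-bisphosphate (C6H14O12P2); neither the constants nor the helper names come from the chapter. It computes the [M-H]⁻ precursor and the [H2PO4]⁻ product ion, which should land at the nominal m/z values 259, 339, and 97 quoted for Fig. 7.

```python
# Monoisotopic atomic masses in unified atomic mass units (standard values,
# assumed here; they are not taken from the chapter).
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.99491462, "P": 30.97376200}
PROTON = 1.00727647    # mass of H+ (u)
ELECTRON = 0.00054858  # mass of an electron (u)


def neutral_mass(formula):
    """Monoisotopic mass of a neutral species given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def deprotonated_mz(formula):
    """m/z of the singly charged [M-H]- ion of a neutral molecule M."""
    return neutral_mass(formula) - PROTON


# Neutral compositions (assumed from standard chemistry, hypothetical names).
HEXOSE_PHOSPHATE = {"C": 6, "H": 13, "O": 9, "P": 1}
FRUCTOSE_BISPHOSPHATE = {"C": 6, "H": 14, "O": 12, "P": 2}

# Product ion [H2PO4]-: the atoms H2PO4 plus one extra electron.
PRODUCT_MZ = neutral_mass({"H": 2, "P": 1, "O": 4}) + ELECTRON

# Each entry maps a compound class to (first MS m/z, second MS m/z).
TRANSITIONS = {
    "hexose monophosphate": (deprotonated_mz(HEXOSE_PHOSPHATE), PRODUCT_MZ),
    "fructose-1,6-bisphosphate": (deprotonated_mz(FRUCTOSE_BISPHOSPHATE), PRODUCT_MZ),
}

for name, (first_ms, second_ms) in TRANSITIONS.items():
    print(f"{name}: {first_ms:.4f} -> {second_ms:.4f}")
```

Because all hexose monophosphates share one composition, they share one transition, which is why the chromatographic separation is still needed to tell the isomers apart.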

Fig. 6. Inner structure and anion exchange mechanism within suppressor

[Figure 7: three traces labeled MS/MS (339.1/97.1), MS/MS (259.1/97.1), and IPAD; x-axis ticks at 10, 20, and 30]

Fig. 7. LC/MS/MS chromatogram of E. coli intracellular sugar phosphates, by selected reaction-monitoring mode (mass spectrometer: PE Sciex API365). Extracted m/z=259 of the first mass and extracted m/z=97 of the second mass are detected as galactose-1-phosphate, glucose-1-phosphate, galactose-6-phosphate, glucose-6-phosphate, fructose-6-phosphate, and mannose-6-phosphate by order of elution. Extracted m/z=339 of the first mass and extracted m/z=97 of the second mass are detected as fructose-1,6-bisphosphate. IPAD, integrated pulsed amperometric detection

6. Conclusion

This chapter has described analytical methods for metabolites by HPLC and LC/MS, organized by functional group. Amino acids and organic acids can be detected by using specific derivatization reagents for amino groups and carboxyl groups, respectively. Sugar phosphates are detected by selected reaction monitoring, using the characteristic cleavage between the phosphate and sugar moiety, or by pulsed amperometry using a reducing sugar. It is quite difficult to perform comprehensive and simultaneous analyses of a variety of intracellular metabolites by HPLC, as well as by other methods. We propose that the intracellular metabolites should first be classified by functional groups. A hundred amines, several dozen organic acids, and several dozen phosphate esters can be observed as peaks by HPLC. Combining these profiles yields a comprehensive analysis of the intracellular metabolites.


H. Miyano

The HPLC method based on functional groups is also useful for the determination of intracellular metabolites, especially by LC/MS/MS. The methods are applicable for the specific determination of amino acids, TCA cycle intermediates, and glycolysis and pentose phosphate pathway intermediates, which are the most fundamental metabolites and intermediates in cells. Integration of the method and instrument development are essential for a significant advancement of metabolomics, and we believe that the methods mentioned in this chapter provide useful tips toward this end.

References

1. Hirayama K (2005) Metabolome profiling by using FT-ICR-MS and ESI-Q-TOF-MS. In: Tomita M, Nishioka T (eds) Metabolomics: The Frontier of Systems Biology. Springer, Tokyo, pp 75-90
2. Schaefer U, Boos W, Takors R, Weuster-Botz D (1999) Automated sampling device for monitoring intracellular metabolite dynamics. Anal Biochem 270:88-96
3. Hans M, Heinzle E, Wittmann C (2001) Quantification of intracellular amino acids in batch-cultures of Saccharomyces cerevisiae. Appl Microbiol Biotechnol 56:776-779
4. Buchholz A, Hurlebaus J, Wandrey C, Takors R (2002) Metabolomics: quantification of intracellular metabolite dynamics. Biomol Eng 19:5-15
5. Theobald U, Mailinger W, Reuss M, Rizzi M (1993) In vivo analysis of glucose-induced fast changes in yeast adenine nucleotide pool applying a rapid sampling technique. Anal Biochem 214:31-37
6. Theobald U, Mailinger W, Baltes M, Rizzi M, Reuss M (1997) In vivo analysis of metabolic dynamics in Saccharomyces cerevisiae: I. Experimental observations. Biotechnol Bioeng 55:305-316
7. Weuster-Botz D (1997) Sampling tube device for monitoring intracellular metabolite dynamics. Anal Biochem 246:225-233
8. Maharjan RP, Ferenci T (2003) Global metabolite analysis: the influence of extraction methodology on metabolome profiles of Escherichia coli. Anal Biochem 313:145-154
9. Uchiyama S, Santa T, Okiyama N, Fukushima T, Imai K (2001) Fluorogenic and fluorescent labeling reagents with a benzofurazan skeleton. Biomed Chromatogr 15:295-318
10. Fukushima T, Usui N, Santa T, Imai K (2003) Recent progress in derivatization methods for LC and CE analysis. J Pharm Biomed Anal 30:1655-1687
11. Ohsumi Y, Kitamoto K, Anraku Y (1988) Changes induced in the permeability barrier of the yeast plasma membrane by cupric ion. J Bacteriol 170:2676-2682
12. Amezaga M, Davidson I, McLaggan D, Verheul A, Abee T, Booth I (1995) The role of peptide metabolism in the growth of Listeria monocytogenes ATCC 23074 at high osmolarity. Microbiology 141:41-49
13. Martinez-Force E, Benitez T (1995) Effects of varying media, temperature, and growth rates on the intracellular concentrations of yeast amino acids. Biotechnol Prog 11:386-392
14. Gent D, Slaughter J (1998) Intracellular distribution of amino acids in an slp1 vacuole-deficient mutant of the yeast Saccharomyces cerevisiae. J Appl Microbiol 84:752-758
15. Roe A, McLaggan D, Davidson I, O'Byrne C, Booth I (1998) Perturbation of anion balance during inhibition of growth of Escherichia coli by weak acids. J Bacteriol 180:767-772
16. Hans M, Heinzle E, Wittmann C (2003) Free intracellular amino acid pools during autonomous oscillations in Saccharomyces cerevisiae. Biotechnol Bioeng 82:143-151
17. de Koning W, van Dam K (1992) A method for the determination of changes of glycolytic metabolites in yeast on a subsecond time scale using extraction at neutral pH. Anal Biochem 204:118-123
18. Watanabe Y, Imai K (1981) High-performance liquid chromatography and sensitive detection of amino acids derivatized with 7-fluoro-4-nitrobenzo-2-oxa-1,3-diazole. Anal Biochem 116:471-472
19. Watanabe Y, Imai K (1984) Sensitive detection of amino acids in human serum and dried blood disc of 3 mm diameter for diagnosis of inborn errors of metabolism. J Chromatogr 309:279-286
20. Bhattacharya M, Fuhrman L, Ingram A, Nickerson K, Conway T (1995) Single-run separation and detection of multiple metabolic intermediates by anion-exchange high-performance liquid chromatography and application to cell pool extracts prepared from Escherichia coli. Anal Biochem 232:98-106
21. Groussac E, Ortiz M, François J (2000) Improved protocols for quantitative determination of metabolites from biological samples using high performance ionic-exchange chromatography with conductimetric and pulsed amperometric detection. Enzyme Microb Technol 26:715-723
22. Toyo'oka T, Ishibashi M, Takeda Y, Nakashima K, Akiyama S, Uzu S, Imai K (1991) Precolumn fluorescence tagging reagent for carboxylic acids in high-performance liquid chromatography: 4-substituted-7-aminoalkylamino-2,1,3-benzoxadiazoles. J Chromatogr 588:61-71
23. Kubota K, Fukushima T, Miyano H, Hirayama K, Imai K (2002) HPLC-fluorescence determination method for carboxylic acids related to TCA cycle as a tool for metabolome. 26th International symposium on high performance liquid phase separations related techniques (HPLC2002) (Montreal) Abstracts, p 55
24. Buchholz A, Takors R, Wandrey C (2001) Quantification of intracellular metabolites in Escherichia coli K12 using liquid chromatographic-electrospray ionization tandem mass spectrometric techniques. Anal Biochem 295:129-137
25. Conboy J, Henion J (1992) High-performance anion-exchange chromatography coupled with mass spectrometry for the determination of carbohydrates. Biol Mass Spectrom 21:397-407
26. Gardner M, Voyksner R, Haney C (2000) Analysis of pesticides by LC-electrospray-MS with postcolumn removal of nonvolatile buffers. Anal Chem 72:4659-4666

Chapter 5: Metabolome Profiling of Human Urine with Capillary Gas Chromatography/Mass Spectrometry

Tomiko Kuhara

Division of Human Genetics, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada-machi, Kahoku-gun, Ishikawa 920-0293, Japan

1. Urine Can Provide Considerable Biological Information

Metabolites are end products of cellular processes, and their levels reflect the response of biological systems at the systems level. Metabolic profiling is a high-throughput approach to measuring and interpreting complex metabolic parameters in biosamples such as urine, blood, cells, and tissues. Human urine contains many classes of compounds, including organic acids, amino acids, purines, pyrimidines, sugars, sugar alcohols, sugar acids, amines, and other compounds, at a variety of concentrations. Measuring changes in metabolite concentrations is a powerful approach for assessing gene function. In the urine of a patient with a deficiency of an enzyme or its cofactor, the enzyme's substrate accumulates and/or there is a marked increase in metabolites that are formed secondarily via side paths, owing to the accumulation of the substrate. In some cases, instead of the substrate or its secondary metabolites, the level of the substrate precursor increases, owing to the derepression of end-product inhibition. Therefore, human urine can provide the necessary evidence to diagnose inborn errors of metabolism (IEMs). Besides IEMs, acquired metabolic disorders can be detected by metabolome analysis.

2. Urine is Superior to Blood

Urine is superior to blood for metabolic profiling. We previously reported the use of urine to identify individuals with methylmalonic acidemia or propionic acidemia [1]. Chamberlin and Sweeley reported that urine on filter paper is generally more useful for making diagnoses than a blood spot on filter paper, except for diseases where very hydrophobic compounds accumulate [2]. We also compared capillary gas chromatography-mass spectrometry (GC/MS) analysis using urine on filter paper with results using serum or blood on filter paper. As shown in Fig. 1, urine from a patient with ornithine transcarbamylase deficiency showed a marked increase in orotate and uracil, but serum did not. Urine is also preferable to blood for screening for pyrimidine degradation disorders [3,4]. For IEMs that cause anuria due to kidney dysfunction, serum or plasma is used after the onset of the dysfunction. Special attention is required for sample preparation for the chemical diagnosis of primary hyperoxaluria types I and II, and for the monitoring after liver transplantation in type I [5].

3. Timing of Sampling is Important for the Chemical Diagnosis of Some IEMs but Not All

Metabolite levels reflect the response of biological systems. Most defects in gluconeogenesis and fatty acid β-oxidation are easily detected during fasting. The gluconeogenesis disorders, glucose-6-phosphatase deficiency, fructose-1,6-diphosphatase deficiency and pyruvate carboxylase deficiency, can only be chemically diagnosed during fasting (see fructose-1,6-diphosphatase deficiency example in Fig. 2). Fructose-1,6-diphosphatase (D-fructose-1,6-diphosphate 1-phosphohydrolase; EC 3.1.3.11) is a key enzyme of gluconeogenesis. Deficiency of fructose-1,6-diphosphatase (MIM 229700), originally described in 1970, therefore causes severe lactic acidemia and hypoglycemia during fasting conditions, and increased glycerol excretion during fasting [6-8]. During remissions, the urinary metabolic profiles appear normal, compared with control samples. Figure 2 shows the total ion chromatograms (TIC) of trimethylsilyl (TMS) derivatives of metabolites in urine from a patient with fructose-1,6-diphosphatase deficiency, prepared by the simplified urease-pretreatment procedure, during an episode (upper panel) and a remission (lower panel). During a hypoglycemic episode, the metabolic profile changes dramatically: lactate, glycerol, and glycerol-3-phosphate levels all markedly increase in the urine. Lactate and glycerol are, however, not specific markers, as the former increases under a variety of disease conditions and the latter as a result of the glycerol infusion treatment (see Section 2.2 in Chapter 13 for details). Glycerol also increases markedly in glycerol kinase deficiency.

Fig. 1. Mass chromatograms of trimethylsilyl (TMS) derivatives of metabolites in urine (upper) and serum (lower) from a patient with ornithine transcarbamylase deficiency. Both samples were prepared by the simplified urease pretreatment. The ions of m/z 241, [M-15]⁺ at 6.5 min and of m/z 254, [M-HCOOTMS]⁺ at 9.43 min are due to uracil (di-TMS) and orotate (tri-TMS), respectively. The ion of m/z 327 [M-15]⁺ at 11.8 min is due to n-heptadecanoate (mono-TMS) used as an internal standard (IS): 0.1 ml urine or serum was spiked with 50 nmol n-heptadecanoate. TIC, total ion current chromatogram

Fig. 2. Total ion chromatograms (TIC) of trimethylsilyl derivatives of metabolites from the urine of a patient with a fructose-1,6-diphosphatase deficiency. The samples were prepared using the simplified urease pretreatment and were obtained during a hypoglycemic episode (upper) and during a remission (lower). The ions targeted were m/z 231 for 2,2-dimethylsuccinate (2,2-DMS, IS1), m/z 229 for 2-hydroxyundecanoate (2HUD, IS2), m/z 329 for creatinine, m/z 357 for glycerol-3-phosphate (G-3-P), m/z 205 for glycerol, and m/z 191 for lactate (Lac). During the episode, 3-hydroxybutyrate (βHB) also increased. Glycerol-3-phosphate was markedly reduced but still detectable in the samples from the patient during remission and from the control, even in the scanning mode


4. Metabolic Profiling of Organic Acids

The profiling of urinary organic acids by GC/MS is used for diagnosing organic acidemias. Since the discovery of isovaleric acidemia in 1966, IEMs classified as organic acidemias, in which organic acids accumulate in the urine, have been discovered by GC/MS [9]. Because of its high chromatographic performance and its sensitive and specific identification and quantification, GC/MS is indispensable for the chemical diagnoses of these disorders. For GC/MS analysis, urinary organic acids are extracted with ethyl ether and/or ethyl acetate under acidic conditions with or without adding sodium chloride, and are then dehydrated with sodium sulfate and evaporated to dryness; the residues are derivatized to increase their volatility and therefore their suitability for GC/MS analyses. The derivatization and/or silylation is performed with or without prior oximation [10-13].

Some polar acids are useful for making diagnoses: orotate is the most useful target for the screening of six primary hyperammonemias and orotic aciduria; methylcitrate is the key target for propionic acidemia; and glycerol-3-phosphate is the target for diagnosing fructose-1,6-diphosphatase deficiency [14]. To measure these polar acids quantitatively, they were extracted by DEAE-Sephadex ion exchange [13,15-17]. Glycerol-3-phosphate is poorly recovered by solvent extraction, and its quantitative analysis is difficult without the respective stable isotope-labeled internal standard (IS). Extraction with DEAE-Sephadex significantly improves the recovery of glycerol-3-phosphate, but this procedure takes several hours and also extracts inorganic acids such as phosphate or sulfate, which interfere with the subsequent GC/MS analysis. Furthermore, glycerol is not recovered by DEAE-Sephadex extraction [16,18].

Metabolic profiling of the organic acids in urine is not a novel technology. The procedure described by Hoffmann et al.
is more quantitative but not widely applied, because it requires extensive sample preparation [19]. Most often, laboratories measure organic acids neither quantitatively nor semiquantitatively in terms of IS equivalents, because measurement varies from laboratory to laboratory. Currently, errors in quantitative results as great as 50% are acceptable for the diagnosis of inherited disorders, but in follow-up, the error for organic acids of clinical interest should be - Mass Spectrum

Fig. 2. Principles of FT-ICR-MS

Fig. 3. External appearance of FT-ICR-MS

Fig. 4. Scheme of FT-ICR-MS

Metabolomic Profiling by FT-ICR-MS and ESI-Q-TOF-MS


4.2. ESI-Q-TOF-MS

This is an MS/MS system in which a quadrupole mass spectrometer is connected directly to a time-of-flight (TOF) mass spectrometer. Ions are generated in the ion source, as shown in Fig. 5, and are separated with the quadrupole. The ion with a specific m/z is selected and introduced into the collision chamber, where it is fragmented into product ions by collision with an inert gas such as Ar or He. The product ions are analyzed with TOF-MS. Figure 6 shows the appearance of the device. The features of this instrument are described below.

1. The resolution is comparatively high (10 000).
2. The mass accuracy is comparatively high.
3. MS/MS can be measured.

Fig. 5. Principles of ESI-Q-TOF-MS

Fig. 6. External appearance of ESI-Q-TOF-MS


K. Hirayama

5. Conditions

We describe the measurement conditions for the intracellular metabolites of the E. coli K-12 strain, using FT-ICR-MS and ESI-Q-TOF-MS.

5.1. FT-ICR-MS

Mass spectrometer: ApexII 7T (Active Shielded) (Bruker Daltonics)
Mass calibration: Mixture of PEG200, PEG400, and PEG600
Ionization: ESI
Measurement ion: Positive ion
Flow rate: 100 µl/h
Ion source temperature: 150°C
Sample: 500-fold dilution with 1% acetic acid/50% methanol
Sample introduction method: Infusion
Range of scanning: m/z 100-500 and m/z 500-1000

5.2. ESI-Q-TOF-MS

5.2.1. Profile Data

Mass spectrometer: Q-Tof-2 (Micromass)
Mass calibration: Mixture of PEG200, PEG400, and PEG600
Ionization: ESI
Measurement ion: Positive ion
Flow rate: 300 µl/h
Ion source temperature: 80°C
Sample: 500-fold dilution with 0.1% formic acid/50% acetonitrile
Sample introduction method: Infusion
Range of scanning: m/z 50-600

5.2.2. Collision-Induced Dissociation (CID) Using Ar Gas

Capillary voltage: 3200 V
Cone voltage: 20 V
Temperature of source block: 80°C
Desolvation gas temperature: 120°C
Collision voltage: 12 V
RF setting: 0.50

[Figure 7: panels (a) and (b); y-axis: relative intensity (r.i.); x-axis: m/z. Labeled peaks: m/z 147.1134, 239.1064, 308.0920, 477.2046]

Fig. 7. FT-ICR mass spectra (a) and expanded section (b) of the Escherichia coli K-12 culture. After culturing the E. coli for 7.2 h, the sample is obtained through the sampling → quenching → extraction procedure

6. Metabolomic Profiling by the Infusion Method

6.1. Analysis of the Intracellular Metabolites of the E. coli K-12 Strain by FT-ICR-MS

After the E. coli K-12 strain is cultured for 7.2 h, the sample is obtained by the procedure of sampling → quenching → extraction. The FT-ICR mass spectrum of the sample and its close-up are shown in Fig. 7a and b, respectively. The m/z 239.1064 and m/z 477.2046 ions originated from the Hepes used for the pretreatment. The theoretical [M+H]⁺ value of Hepes is 239.1065, which was calculated by adding 1.0078 (mass of hydrogen) to 238.0987, calculated from the molecular formula of Hepes, C8H18N2O4S. The difference between the observed and theoretical values (only 0.0001 u) demonstrates the high mass accuracy of FT-ICR-MS.

Some of the metabolite ions observed in Fig. 7b can be identified because of the high mass accuracy of FT-ICR-MS. Here, we explain the identification procedure for two metabolites that yielded m/z 147.1134 and m/z 308.0920 in Fig. 7b. When the x-axis is expanded further around the ions of m/z 147.1134 and m/z 308.0920, it is apparent that these two are singly charged ions, because the corresponding isotope peaks appear one mass unit above the main peaks. Tables 1 and 2 show the molecular formulae estimated from the accurate mass values of m/z 147.1134 and m/z 308.0920, calculated by the data processing software of the mass spectrometer.

It is possible to search for metabolites from the estimated chemical formula by using KEGG on the Internet (http://genome.jp/kegg/). The method is easy. After opening KEGG (URL: http://genome.jp/kegg/) → Open KEGG (Table of Contents) → 1-2. Hierarchical Classification → Structure Search → COMPOUND, the compounds within the KEGG database can be searched by submitting the molecular formula. Because the table lists the molecular formula of [M+H]⁺, the search must use the molecular formula of M, which has one hydrogen fewer. When the compounds in the KEGG database were searched by using the molecular formulae of M corresponding to the three highest ranks in Table 1, C6H14N2O2, C4H12N5O, and C2H10N8, no compounds corresponded to C4H12N5O or C2H10N8, but the five compounds in Fig. 8 corresponded to C6H14N2O2. Among them, the m/z 147.1134 observed by FT-ICR-MS can be concluded to be the [M+H]⁺ of L-lysine (Lys), because it is present in vivo. Likewise, for the three highest ranks in Table 2, no compounds corresponded to C18H13NO4 or C3H17N9O4S2, and only reduced GSH (glutathione), shown in Fig. 9, corresponded to C10H17N3O6S. Since a lot of reduced GSH exists in vivo, m/z 308.0920 can be concluded to be the [M+H]⁺ of reduced GSH.

Figure 10 shows the FT-ICR mass spectra of the region of m/z 250-350 of the intracellular elements of E. coli K-12, after 7.2 h (a) and 3.8 h (b) of cultivation. The amounts of the elements differ between 7.2 and 3.8 h. The differential spectrum is effective for clarifying the differences in the elements between the incubation times. Figure 11 shows the differential spectra from m/z 250 to 290. Spectrum (a) is that of 3.8 h minus 7.2 h, and (b) is that of 7.2 h minus 3.8 h.
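The accurate-mass arithmetic used in this section can be sketched in a few lines. The example below is illustrative only: it assumes standard monoisotopic atomic masses, follows the chapter's convention of adding the mass of a hydrogen atom to M to obtain [M+H]⁺ (strictly, the proton mass would apply), and uses hypothetical helper names. It reproduces the theoretical values for Hepes, Lys, and reduced GSH to within rounding, and ranks the candidate formulae of Table 1 by their distance from the observed m/z 147.1134.

```python
import re

# Monoisotopic atomic masses (u); standard values assumed, not from the chapter.
MONO = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Turn a formula string such as 'C6H14N2O2' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def mono_mass(formula):
    """Sum of monoisotopic atomic masses for a formula string."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


def mh_plus(formula_of_m):
    """Theoretical [M+H]+ as in the text: mass of M plus one hydrogen atom."""
    return mono_mass(formula_of_m) + MONO["H"]


def rank_candidates(observed_mz, ion_formulas):
    """Sort candidate [M+H]+ formulae by absolute deviation from observed_mz."""
    return sorted(ion_formulas, key=lambda f: abs(mono_mass(f) - observed_mz))


# Candidate [M+H]+ formulae as listed in Table 1.
TABLE_1 = ["C8H19S", "C7H17NS", "C11H15", "C2H11N8", "C4H13N5O", "C6H15N2O2"]

for name, formula in [("Hepes", "C8H18N2O4S"),
                      ("Lys", "C6H14N2O2"),
                      ("reduced GSH", "C10H17N3O6S")]:
    print(f"{name}: [M+H]+ = {mh_plus(formula):.4f}")
print(rank_candidates(147.1134, TABLE_1))
```

The ranking step only narrows the list of formulae; deciding which formula corresponds to a real metabolite still requires the database search described above.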

Table 1. Molecular formulae estimated from the accurate mass of m/z 147.1134

Calculated m/z    Molecular formula of [M+H]⁺
147.1134          C6H15N2O2
147.1120          C4H13N5O
147.1107          C2H11N8
147.1174          C11H15
147.1082          C7H17NS
147.1207          C8H19S

Table 2. Molecular formulae estimated from the accurate mass of m/z 308.0920

Calculated m/z    Molecular formula of [M+H]⁺
308.0923          C18H14NO4
308.0923          C3H18N9O4S2
308.0916          C10H18N3O6S
308.0916          C9H12N10OS
308.0925          C11H22N3OS3
308.0914          C2H14N9O9
308.0928          C4H16N6O10
308.0912          C9H20N6S3

Fig. 8. Result of the KEGG database search with the molecular formula C6H14N2O2. Five compounds were identified, but only some are shown because of space constraints


Fig. 9. Results of the KEGG database search with the molecular formula C10H17N3O6S

[Figure 10: panels (a) and (b); y-axis: relative intensity (r.i.); x-axis: m/z. Labeled peaks: m/z 308.0920 in (a) and m/z 308.0927 in (b)]

Fig. 10. FT-ICR mass spectra at the m/z 250-350 region of the intracellular elements of E. coli K-12. a After 7.2 h of cultivation, b After 3.8 h of cultivation

[Figure 11: panels (a) and (b); y-axis: relative intensity (r.i.), from -0.10 to 0.10. Labeled peaks: m/z 254.1634, 256.2649, 269.1095, 282.2805, 284.2964]

Fig. 11. Differential spectra from m/z 250 to 290 of the intracellular elements of E. coli K-12. Differential spectra of a 3.8 h minus 7.2 h, and b 7.2 h minus 3.8 h
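A differential spectrum of the kind shown in Fig. 11 can be sketched as follows. This is a minimal illustration, assuming centroided peak lists and a fixed m/z bin width of 0.01; the chapter does not state how the instrument software aligns the two spectra, and the intensities below are hypothetical placeholders rather than measured values.

```python
def bin_spectrum(peaks, bin_width=0.01):
    """Sum peak intensities into fixed-width m/z bins keyed by bin index."""
    binned = {}
    for mz, intensity in peaks:
        index = round(mz / bin_width)
        binned[index] = binned.get(index, 0.0) + intensity
    return binned


def differential_spectrum(spectrum_a, spectrum_b, bin_width=0.01):
    """Return {bin m/z: intensity of A minus intensity of B} over all bins."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    return {
        round(index * bin_width, 6): a.get(index, 0.0) - b.get(index, 0.0)
        for index in sorted(set(a) | set(b))
    }


# m/z labels taken from Figs. 10 and 11; intensities are hypothetical.
culture_3_8_h = [(282.2805, 0.10), (308.0927, 0.05)]
culture_7_2_h = [(308.0920, 0.15)]

# Positive values: more abundant at 3.8 h; negative values: more abundant at 7.2 h.
difference = differential_spectrum(culture_3_8_h, culture_7_2_h)
for mz, delta in difference.items():
    print(f"m/z {mz:.2f}: {delta:+.2f}")
```

The bin width has to be wider than the scan-to-scan mass drift (here 308.0920 versus 308.0927) so that the same ion falls into the same bin in both spectra.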

6.2. Analysis of the Intracellular Elements of E. coli K-12 by ESI-Q-TOF-MS

After the E. coli K-12 strain is cultured for 7.2 h, the sample is obtained through the sampling → quenching → extraction procedure. The ESI-Q-TOF mass spectrum of the sample and its close-up are shown in Fig. 12a and b, respectively. Ions m/z 239.1152, m/z 268.1189, m/z 477.2323, and m/z 506.2333 originated from the Hepes used for the pretreatment. Figures 13a and b are expansions of m/z 140-160 and m/z 300-320 of Fig. 12. Since they are intracellular components of E. coli K-12, m/z 147.1222 and m/z 308.1067 are thought to be Lys and reduced GSH, respectively.

When the accurate masses of these ions are required, they can be recalculated relative to an ion of known composition in the same spectrum, here the theoretical [M+H]⁺ value of Hepes. When the mass of H, 1.0078, is added to the mass calculated from the molecular formula C8H18N2O4S of Hepes, 238.0987, it becomes 239.1065. This is the theoretical m/z of the [M+H]⁺ of Hepes. If the observed [M+H]⁺ values of Lys and reduced GSH are recalculated based on this value, they become 147.1146 and 308.0939, respectively. On the other hand, the [M+H]⁺ values calculated from the molecular formulae C6H14N2O2 of Lys and C10H17N3O6S of reduced GSH are 147.1134 and 308.0916, respectively. The m/z 147.1222 and 308.1067 values observed in the spectrum are therefore presumed to be the [M+H]⁺ values of Lys and reduced GSH, respectively.

To confirm this, one only has to measure the CID (collision-induced dissociation) spectrum (colloquially called the MS/MS spectrum) with each [M+H]⁺ as the precursor ion. The CID spectrum from the [M+H]⁺ of the authentic Lys sample is shown in Fig. 14a, and the CID spectrum from m/z 147.1222 of the intracellular elements of the E. coli K-12 strain is shown in Fig. 14b. Moreover, the CID spectrum from the [M+H]⁺ of the authentic reduced GSH sample is shown in Fig. 15a, and the CID spectrum from m/z 308.1067 of the intracellular elements of the E. coli K-12 strain is shown in Fig. 15b. From the spectra in Figs. 14 and 15, it can be concluded that the compounds are Lys and reduced GSH, respectively.

The ESI-Q-TOF mass spectra of the region of m/z 100-200 of the intracellular elements of E. coli K-12, after 3.8 h (a) and 7.2 h (b) of cultivation, are shown in Fig. 16. When a quantitative comparison of the intracellular elements is performed, an internal standard substance is generally added to the analyte. For the analysis shown in Fig. 16, the amount of Lys, which gives m/z 147.1222, in the culture at 7.2 h is twice that at 3.8 h, even though no internal standard was used.
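The agreement between observed and theoretical masses is often expressed as a relative error in parts per million (ppm). The sketch below applies that standard definition to the m/z values quoted in this chapter; the function name is hypothetical, and the chapter itself reports only absolute differences.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# (label, observed or recalculated m/z, theoretical [M+H]+), all from the text.
MEASUREMENTS = [
    ("Hepes, FT-ICR-MS", 239.1064, 239.1065),
    ("Lys, ESI-Q-TOF-MS as observed", 147.1222, 147.1134),
    ("Lys, ESI-Q-TOF-MS recalculated against Hepes", 147.1146, 147.1134),
    ("GSH, ESI-Q-TOF-MS as observed", 308.1067, 308.0916),
    ("GSH, ESI-Q-TOF-MS recalculated against Hepes", 308.0939, 308.0916),
]

for label, observed, theoretical in MEASUREMENTS:
    print(f"{label}: {ppm_error(observed, theoretical):+.1f} ppm")
```

On these numbers the recalculation against Hepes brings the ESI-Q-TOF-MS values from tens of ppm to below 10 ppm, which is why the CID comparison with authentic samples is still used for confirmation.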

[Figure 12: panels (a) and (b); y-axis: relative intensity (%); x-axis: m/z 50-600. Labeled peaks include m/z 173.0982, 239.1152, 268.1189, 308.1067, 405.2203, 477.2323, 506.2333, 546.2128, and 576.4730]

Fig. 12. ESI-TOF mass spectra (a) and its expanded version (b) of the E. coli K-12 culture. After 7.2 h of culture, the sample was obtained through the sampling → quenching → extraction procedure

[Figure 13: panels (a) and (b); y-axis: relative intensity (%); x-axis: m/z. Labeled peaks include m/z 146.1821, 151.1037, 152.0696, and 156.0965 in (a), and m/z 309.1188 and 318.1860 in (b)]

Fig. 13. ESI-TOF mass spectra of the intracellular elements of E. coli K-12. After 7.2 h of culture, the sample was obtained through the sampling → quenching → extraction procedure. a m/z 140-160. b m/z 300-320

Fig. 14. a Collision-induced dissociation (CID) spectrum from [M+H]+ of Lys and b that from m/z 147.1222 of the intracellular elements of E. coli K-12

Fig. 15. a CID spectrum from [M+H]+ of reduced glutathione (GSH) and b that from m/z 308.1067 of the intracellular elements of E. coli K-12


K. Hirayama

Fig. 16. ESI-TOF mass spectra at the region of m/z 100-200 of the intracellular elements of E. coli K-12. a After 3.8 h of cultivation. b After 7.2 h of cultivation

7. Conclusion

Metabolic profiling by FT-ICR-MS and ESI-Q-TOF-MS is an effective way to capture a rough picture of metabolism. When an individual metabolite is identified from its accurate mass by FT-ICR-MS, the identification should be made only after careful calibration and after the mass accuracy has been confirmed with an authentic sample. When ESI-Q-TOF-MS is used, the CID spectrum should also be measured and compared with that of the authentic sample.

References

1. Aharoni A, Ric De Vos CH, Verhoeven HA, Maliepaard CA, Kruppa G, Bino R, Goodenowe DB (2002) Nontargeted metabolome analysis by use of Fourier transform ion cyclotron mass spectrometry. J Integr Biol 6:217-234
2. Maharjan R, Ferenci T (2003) Global metabolite analysis: the influence of extraction methodology on metabolome profiles of Escherichia coli. Anal Biochem 313:145-154

Chapter 7: Metabolome Analysis by Capillary Electrophoresis

Li Jia and Shigeru Terabe

Graduate School of Material Science, University of Hyogo, Kamigori, Hyogo 678-1297, Japan

1. Introduction

Separations by capillary electrophoresis (CE) are based on differences in the electrophoretic mobilities of ions in electrophoretic media inside narrow-bore capillaries (less than 100 µm i.d.). The ability of CE to deliver high separation efficiencies was highlighted in the early 1980s, and the late 1980s and early 1990s saw the advent of commercially available equipment. In the late 1990s, the range of separation mechanisms broadened, instrumentation was developed to address practitioners' demands, and applications expanded into a wide range of fields. Compared to high-performance liquid chromatography (HPLC), CE offers a number of advantages, including reduced method development time, reduced running costs, almost no solvent consumption, and up to two orders of magnitude higher separation efficiency. The principal disadvantage of CE is its relatively low concentration sensitivity with conventional absorbance detectors. However, several approaches have now been developed to overcome this difficulty, including modified capillary dimensions and on-line sample preconcentration techniques [stacking, sweeping, transient isotachophoresis (tITP), dynamic pH junction, and dynamic pH junction-sweeping]. There are several separation modes in CE: capillary zone electrophoresis (CZE), capillary gel electrophoresis, micellar electrokinetic chromatography (MEKC), capillary electrochromatography, capillary isoelectric focusing, and capillary isotachophoresis. Of these, CZE and MEKC are the most popular and the most suitable for metabolome analysis, since the metabolome usually consists of small molecules. Hence, in this chapter we introduce just these two modes of CE and their applications in metabolome analysis. For those who want to learn CE, a textbook is cited [1].


2. Instrumentation

Figure 1 shows a schematic diagram of the basic CE instrumental set-up, which consists of an injection system, a high-voltage power supply, two buffer reservoirs, a capillary, and a detector. Commercial CE instruments are additionally equipped with an autosampler that allows serial analysis, capillary thermostating, and a computer for instrument control and data acquisition. Different modes of CE separation can be performed with the same instrument. Typical voltages are in the range of 5-30 kV, which results in currents in the range of 10-100 µA. The capillary is a key element of the CE separation. Cylindrical polyimide-coated fused-silica capillaries with a narrow diameter (50-75 µm) are the most often used today. The external polyimide coating increases the mechanical strength of the capillary, as bare fused silica is extremely fragile. The widespread use of fused silica is due to its intrinsic properties, which include transparency over a wide range of wavelengths and high thermal conductance. The narrow capillary diameter facilitates the dissipation of Joule heat. Electro-osmotic flow (EOF) is generated inside the capillary when a voltage is applied between the two ends of a capillary filled with a running solution. It originates from the negative charges produced by ionization of the silanol groups on the inner wall of the capillary. A key feature of EOF is its flat flow profile, which minimizes zone broadening and leads to high separation efficiencies. The strength of EOF depends on several factors: the surface charge on the capillary wall, the viscosity and permittivity of the solution, and the temperature. It should be noted that the surface charge is significantly affected by pH: EOF is strong under alkaline or neutral conditions, reduced below pH 5, and almost suppressed below pH 2.
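To give a feel for why heat dissipation matters, the electrical power turned into Joule heat is simply the product of voltage and current. The sketch below evaluates it at the two ends of the voltage and current ranges quoted above. The capillary length of 0.5 m is a hypothetical value chosen only to express the result per unit length, and pairing the lowest voltage with the lowest current is likewise an assumption made for illustration.

```python
def joule_power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power dissipated as heat in the capillary (P = V * I)."""
    return voltage_v * current_a


def field_strength_v_per_m(voltage_v: float, capillary_length_m: float) -> float:
    """Average electric field along the capillary."""
    return voltage_v / capillary_length_m


CAPILLARY_LENGTH_M = 0.5  # hypothetical capillary length, not from the text

# Corners of the typical operating ranges quoted in the text.
for voltage_v, current_a in [(5e3, 10e-6), (30e3, 100e-6)]:
    power_w = joule_power_w(voltage_v, current_a)
    field = field_strength_v_per_m(voltage_v, CAPILLARY_LENGTH_M)
    print(
        f"{voltage_v / 1e3:.0f} kV, {current_a * 1e6:.0f} uA: "
        f"{power_w:.2f} W total, {power_w / CAPILLARY_LENGTH_M:.2f} W/m, "
        f"field {field / 1e3:.0f} kV/m"
    )
```

Under these assumptions the dissipated power spans about 0.05 W to 3 W, which a narrow capillary with a large surface-to-volume ratio can shed far more easily than a wide tube.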
Sample injection is performed by temporarily replacing one of the buffer reservoirs with a sample vial. Typical injection volumes range from picoliters to nanoliters. Two injection methods are commonly used in CE: hydrodynamic and electrokinetic. Hydrodynamic injection is accomplished by applying a pressure difference between the two ends of the capillary. The amount of sample injected can be adjusted by varying the injection time and the pressure difference. A major limitation of hydrodynamic injection is that it is not suitable for highly viscous samples. Electrokinetic injection is performed by applying a voltage at the sample vial for a certain period of time, which transports sample into the capillary by electromigration; this includes contributions from both the electrophoretic migration of sample ions and electro-osmotic flow. The amount of sample injected can be controlled by varying the injection time and the applied voltage. Two biases occur in electrokinetic injection. One is discrimination among the injected sample components due to differences in analyte mobility. The other is a change in the absolute amount injected into the capillary due to differences in the conductivity of the sample solution.

With some modifications, most HPLC detection modes can be applied to CE. Among them, on-column UV absorption detection, fluorescence detection, and mass spectrometry (MS) are very useful for metabolome analysis. Capillary electrophoresis/MS is not described in this chapter (see Chapter 2). On-column UV absorbance is currently the most widely accepted detection method because of its relatively universal detection capability, simple adaptation, and low cost. The capillary itself acts as the on-column detector cell, which is made by removing the protective polyimide coating from a small section of the fused-silica capillary. One of the main issues with UV absorbance detection is insufficient sensitivity, owing to the small inside diameter of the capillary and the low injection volume. Generally, the concentration detection limits are of the order of 10~^ M for most analytes with chromophores. There are two ways to enhance performance in absorption detection. One is to increase the optical path length; Z-shaped and bubble cells are commercially available extended-path-length absorbance cells. The other is on-column preconcentration, which increases the injection volume and is discussed below. The majority of instruments also offer UV diode array detectors, which help in identifying unknown compounds and examining peak purity by providing spectral information.

Laser-induced fluorescence (LIF) detection is a highly sensitive detection method in CE. The concentration detection limits of LIF detection are of the order of 10'^ M for analytes with fluorophores. Unfortunately, laser sources are expensive and the available excitation wavelengths are rather limited. Moreover, very few compounds are natively fluorescent. Hence, pre- or postcolumn derivatization with a fluorophore is needed to extend the application of LIF. Indirect detection can be employed for compounds that neither absorb UV light nor fluoresce. In indirect detection, the background electrolyte contains a UV-absorbing or fluorescent constituent that provides a stable baseline signal. As analyte zones migrate through the detector, they displace the absorbing constituent and reduce the background signal, typically producing negative peaks. The sensitivity of indirect detection is slightly lower than that of its direct counterpart, and the linear dynamic range is also narrower.
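Returning to hydrodynamic injection, the injected volume can be estimated with the Hagen-Poiseuille relation for laminar flow in a tube. This relation is not stated in the chapter but is a standard result. The sketch below uses the capillary dimensions and injection settings quoted in the caption of Fig. 7 (50 µm i.d., 56 cm, 10 s at 5 kPa); the viscosity of 1.0 mPa·s, roughly that of water, is an assumption.

```python
import math


def hydrodynamic_injection_volume_m3(
    pressure_pa: float,
    inner_diameter_m: float,
    time_s: float,
    viscosity_pa_s: float,
    capillary_length_m: float,
) -> float:
    """Injected volume from the Hagen-Poiseuille relation: dP*pi*d^4*t / (128*eta*L)."""
    return (
        pressure_pa * math.pi * inner_diameter_m**4 * time_s
        / (128.0 * viscosity_pa_s * capillary_length_m)
    )


def plug_length_m(volume_m3: float, inner_diameter_m: float) -> float:
    """Length of capillary occupied by the injected plug."""
    cross_section_m2 = math.pi * inner_diameter_m**2 / 4.0
    return volume_m3 / cross_section_m2


INNER_DIAMETER_M = 50e-6   # from the Fig. 7 conditions
CAPILLARY_LENGTH_M = 0.56  # from the Fig. 7 conditions
PRESSURE_PA = 5e3          # from the Fig. 7 conditions
TIME_S = 10.0              # from the Fig. 7 conditions
VISCOSITY_PA_S = 1.0e-3    # assumed, approximately water

volume = hydrodynamic_injection_volume_m3(
    PRESSURE_PA, INNER_DIAMETER_M, TIME_S, VISCOSITY_PA_S, CAPILLARY_LENGTH_M
)
length = plug_length_m(volume, INNER_DIAMETER_M)
print(f"injected volume: {volume * 1e12:.1f} nL")
print(f"plug length: {length * 1e3:.1f} mm "
      f"({100 * length / CAPILLARY_LENGTH_M:.1f}% of the capillary)")
```

Under these assumptions the injected volume is on the order of ten nanoliters, consistent with the range given in the text.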


Fig. 1. Schematic of capillary electrophoresis instruments. 1, coolant; 2, cassette; 3, high-voltage (HV) power supply; 4, capillary; 5, electrodes; 6, detector; 7, reservoirs; 8, carousel for sample solutions and running buffers

3. Separation Modes and Principles

3.1. Capillary Zone Electrophoresis

Capillary zone electrophoresis is the simplest and most widely used separation mode in CE. Figure 2 depicts the principle of the CZE separation. The separation is based on differences in the electrophoretic mobilities of solutes, which are characteristic properties of an analyte ion in a given medium at a given temperature. Therefore, only charged compounds or ions can be separated by this method. An uncoated fused-silica capillary is typically used. Separation is optimized by choosing an electrolyte system with suitable pH, ionic strength, and composition. The pH of the electrolyte solution is the most important separation parameter because it influences the dissociation of weakly acidic, basic, or zwitterionic analytes. The use of additives such as organic solvents and complexing agents (cyclodextrins, crown ethers) is also an effective way to improve resolution.
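The mobilities on which CZE is based can be extracted from an electropherogram. A minimal sketch follows, assuming the usual definitions: velocity is the length to the detector divided by the migration time, the field is the voltage divided by the total length, and the electrophoretic mobility is the apparent mobility minus that of a neutral EOF marker. The capillary geometry and voltage are borrowed from the Fig. 7 conditions, while both migration times are hypothetical.

```python
def apparent_mobility(
    total_length_m: float,
    length_to_detector_m: float,
    voltage_v: float,
    migration_time_s: float,
) -> float:
    """Apparent mobility in m^2 V^-1 s^-1: migration velocity divided by field strength."""
    velocity = length_to_detector_m / migration_time_s  # m/s
    field = voltage_v / total_length_m                   # V/m
    return velocity / field


TOTAL_LENGTH_M = 0.56        # from the Fig. 7 conditions
LENGTH_TO_DETECTOR_M = 0.45  # from the Fig. 7 conditions
VOLTAGE_V = 20e3             # from the Fig. 7 conditions
T_EOF_S = 300.0              # hypothetical migration time of a neutral marker
T_ANALYTE_S = 240.0          # hypothetical migration time of a cationic analyte

mu_eof = apparent_mobility(TOTAL_LENGTH_M, LENGTH_TO_DETECTOR_M, VOLTAGE_V, T_EOF_S)
mu_app = apparent_mobility(TOTAL_LENGTH_M, LENGTH_TO_DETECTOR_M, VOLTAGE_V, T_ANALYTE_S)
mu_ep = mu_app - mu_eof  # positive for a cation, negative for an anion

print(f"mu_eof = {mu_eof:.3e}, mu_app = {mu_app:.3e}, mu_ep = {mu_ep:.3e} m^2/(V s)")
```

An analyte that reaches the detector before the neutral marker has a positive electrophoretic mobility, that is, it behaves as a cation under these conditions.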


Fig. 2. Separation principle of capillary zone electrophoresis. +, cation; -, anion; N, neutral analyte; EOF, electro-osmotic flow

Fig. 3. Separation principle of micellar electrokinetic chromatography (MEKC). S, analyte; ~, anionic surfactant; EOF, electro-osmotic flow


3.2. Micellar Electrokinetic Chromatography

Micellar electrokinetic chromatography was first introduced by Terabe and coworkers in 1984, and it is particularly useful for the separation of small molecules, including neutral analytes. Figure 3 depicts the principle of the MEKC separation. The separation is based on the partitioning of analytes between the micellar phase and the aqueous phase. An ionic micellar solution is employed as the separation solution, and under capillary electrophoretic conditions the ionic micelle migrates at a velocity different from that of the bulk solution because the micelle is subject to electrophoretic migration. The micelle corresponds to the stationary phase in chromatography and is often called the pseudostationary phase. The fraction of the analyte incorporated into the micelle migrates at the velocity of the micelle, while the rest of the analyte, free from the micelle, migrates at the EOF velocity. Under neutral or alkaline conditions, the electro-osmotic velocity is faster than the electrophoretic velocity of the micelle in the opposite direction, and hence the micelle also migrates in the same direction as the EOF. When an anionic micelle such as sodium dodecyl sulfate (SDS) is employed, all the neutral analytes migrate toward the cathode because of the strong EOF. The less-incorporated, hydrophilic analytes migrate faster than the more-incorporated, hydrophobic analytes. For ionic compounds, charge-to-size ratio, hydrophobicity, and charge interactions at the surface of the micelles combine to influence the separation. Since MEKC is a chromatographic technique, the separation selectivity can be manipulated by chromatographic considerations. The choice of surfactant, the pH and composition of the running solution, and the use of additives are important factors in manipulating selectivity. The chemical structure of the surfactant, in particular that of the polar group, affects selectivity significantly. To resolve highly hydrophobic compounds by MEKC, several modifiers (cyclodextrins, organic solvents, urea, or glucose) have been used to reduce the fraction of analyte incorporated into the micelle.
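The partitioning picture above is usually quantified by the retention factor k, the ratio of analyte in the micelle to analyte in the aqueous phase. For a neutral analyte the standard MEKC expression is k = (tR - t0) / (t0 (1 - tR/tmc)), where t0 is the migration time of an unretained marker and tmc that of the micelle. The expression is not written out in this chapter, and the migration times used below are hypothetical.

```python
def mekc_retention_factor(t_r: float, t_0: float, t_mc: float) -> float:
    """Retention factor of a neutral analyte in MEKC.

    t_r  : migration time of the analyte
    t_0  : migration time of an unretained (EOF) marker
    t_mc : migration time of the micelle
    All three must be in the same unit, with t_0 <= t_r < t_mc.
    """
    if not (0.0 < t_0 <= t_r < t_mc):
        raise ValueError("expected 0 < t_0 <= t_r < t_mc")
    return (t_r - t_0) / (t_0 * (1.0 - t_r / t_mc))


# Hypothetical migration times in minutes.
T_0_MIN = 5.0
T_MC_MIN = 20.0

for t_r in (5.0, 10.0, 15.0, 19.0):
    k = mekc_retention_factor(t_r, T_0_MIN, T_MC_MIN)
    print(f"t_R = {t_r:4.1f} min -> k = {k:.2f}")
```

Because every neutral analyte must elute between t0 and tmc, k rises steeply as tR approaches tmc, which is why modifiers that lower k help resolve highly hydrophobic compounds.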

4. On-Line Sample Preconcentration Methods

One approach to improving the concentration detection sensitivity is on-line sample preconcentration, which is performed by injecting a large sample volume and electrokinetically focusing the analyte zones prior to separation. To date, five major on-line preconcentration techniques have been reported in CE: field-enhanced sample stacking, sweeping, dynamic pH junction, dynamic pH junction-sweeping, and transient isotachophoresis. The first three are explained below.

4.1. Field-Enhanced Sample Stacking

Field-enhanced sample stacking utilizes the high electric field generated in the sample zone when the sample solution is prepared in a matrix of low electric conductivity [2]. Since the electrophoretic velocity is proportional to the field strength, analyte ions migrate much faster in the sample zone than in the separation zone and stack at the boundary between the two zones (Fig. 4). Theoretically, the degree of sample stacking is proportional to the ratio of the resistivities of the sample solution and the background solution. However, the concentration efficiency is degraded by a mismatch of the EOF. The electro-osmotic velocity is also proportional to the field strength and would therefore differ between the two zones. Owing to the continuity of the solution, however, the bulk electro-osmotic velocity must be constant throughout the capillary, so mixing must occur at the boundary of the two zones. This mismatch is minimized when the EOF is suppressed. It should be noted that although neutral analytes are not concentrated by this stacking technique alone, the technique is also applicable in MEKC, provided the neutral analytes are incorporated by the micelle.
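The field amplification described above can be sketched by treating the sample plug and the background solution as two resistors in series. The model assumes a uniform capillary cross-section and a sharp boundary between the zones; the plug fraction and resistivity ratio below are hypothetical, and the voltage and length are taken from the Fig. 7 conditions purely for scale.

```python
def stacking_fields(
    voltage_v: float,
    capillary_length_m: float,
    plug_fraction: float,
    resistivity_ratio: float,
) -> tuple[float, float]:
    """Return (field in sample zone, field in background zone) in V/m.

    plug_fraction     : fraction of the capillary filled with sample
    resistivity_ratio : resistivity of sample / resistivity of background
    The same current flows through both zones, so the local field scales
    with the local resistivity.
    """
    weighted_length = capillary_length_m * (
        resistivity_ratio * plug_fraction + (1.0 - plug_fraction)
    )
    field_background = voltage_v / weighted_length
    field_sample = resistivity_ratio * field_background
    return field_sample, field_background


VOLTAGE_V = 20e3           # from the Fig. 7 conditions
CAPILLARY_LENGTH_M = 0.56  # from the Fig. 7 conditions
PLUG_FRACTION = 0.05       # hypothetical
RESISTIVITY_RATIO = 10.0   # hypothetical (sample ten times more resistive)

e_sample, e_background = stacking_fields(
    VOLTAGE_V, CAPILLARY_LENGTH_M, PLUG_FRACTION, RESISTIVITY_RATIO
)
print(f"field in sample zone:     {e_sample / 1e3:.1f} kV/m")
print(f"field in background zone: {e_background / 1e3:.1f} kV/m")
print(f"velocity ratio across the boundary: {e_sample / e_background:.1f}")
```

In this idealized picture the ion velocity drops by the resistivity ratio at the boundary, which is the theoretical stacking factor mentioned in the text. The EOF mismatch discussed above reduces the gain achieved in practice.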

Fig. 4. Schematic of field-enhanced sample stacking. The right-hand side shows the high electrical conductivity (low electric field) zone and the left-hand side the low electrical conductivity (high electric field) zone. Charged ions migrate fast in the left zone and slow down at the boundary


4.2. Dynamic pH Junction

Dynamic pH junction is an efficient preconcentration technique for weakly ionic analytes, provided that the difference in pH between the sample matrix and the background solution causes significant changes in their mobilities [3]. Generally, the sample is prepared in a buffer in which the mobility of the analyte is zero (about 1 pH unit away from the pKa). Focusing is hypothesized to be caused by the formation of a transient pH titration within the sample zone, which results in rapid focusing of analytes that undergo velocity changes in the selected pH range (Fig. 5). Single or mixed buffer types can be used to generate an appropriate dynamic pH junction. The sample may consist of the same buffer as the background solution or of a different electrolyte type, so as to optimize the pH junction range for focusing weakly acidic, basic, or zwitterionic analytes (whose mobility is pH dependent) on the basis of their pKa and/or pI. Dynamic pH junction is a powerful technique for metabolome analysis because most metabolites are weakly acidic or basic.
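The pH dependence of mobility that drives this focusing can be illustrated with the Henderson-Hasselbalch relation: the effective mobility of a monoprotic weak acid is the mobility of its anion multiplied by the ionized fraction. The pKa, the anion mobility, and the two pH values below are hypothetical, and the monoprotic model is a simplification that does not cover zwitterions.

```python
def ionized_fraction_weak_acid(ph: float, pka: float) -> float:
    """Fraction of a monoprotic weak acid present as the anion (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def effective_mobility(ph: float, pka: float, anion_mobility: float) -> float:
    """Effective mobility = ionized fraction x mobility of the fully ionized form."""
    return ionized_fraction_weak_acid(ph, pka) * anion_mobility


PKA = 4.8                # hypothetical weak acid
ANION_MOBILITY = -3.0e-8  # hypothetical, m^2 V^-1 s^-1 (negative for an anion)
SAMPLE_PH = 3.8          # hypothetical sample matrix, one unit below the pKa
BACKGROUND_PH = 9.0      # hypothetical background solution

for label, ph in (("sample zone", SAMPLE_PH), ("background", BACKGROUND_PH)):
    alpha = ionized_fraction_weak_acid(ph, PKA)
    mu = effective_mobility(ph, PKA, ANION_MOBILITY)
    print(f"{label}: pH {ph:.1f}, ionized fraction {alpha:.3f}, mobility {mu:.2e}")
```

In this example the analyte is only about 9% ionized in the sample zone but almost fully ionized in the background solution, so its velocity changes sharply as the pH boundary passes through the plug.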

Fig. 5. Schematic illustration of dynamic pH junction. EOF, electro-osmotic flow


4.3. Sweeping

Sweeping is performed by injecting a long plug of sample solution, which has an electric conductivity similar to that of the separation solution but is devoid of the pseudostationary phase, and then applying a voltage with the separation solution in the inlet vial [4]. Sweeping is defined as the picking up and accumulation of analytes by the charged pseudostationary phase that fills or penetrates the sample zone during application of the voltage (Fig. 6). It is based on the partitioning mechanism, and the concentration efficiency depends on the retention factor of the analyte: the higher the retention factor, the higher the concentration efficiency. An advantage of sweeping is that the sample matrix can contain relatively high concentrations of electrolytes, since low conductivity is not required. Sweeping is also effective in the presence of a strong EOF, although the concentration efficiency is higher under a suppressed EOF. Both charged and uncharged metabolites that possess high retention factors can be effectively preconcentrated by sweeping. Chemical derivatization of hydrophilic metabolites with a hydrophobic probe can be an effective way to enhance sweeping performance. Combining different on-line sample preconcentration techniques, e.g., electrokinetic stacking with sweeping or dynamic pH junction with sweeping, can enhance the concentration efficiency or expand the range of analytes that are effectively concentrated.
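The dependence on the retention factor can be made concrete with an idealized relation often quoted for sweeping, in which the swept zone length equals the injected plug length divided by (1 + k). This relation is not given in the chapter; it assumes a homogeneous electric field and neglects dispersion, and the plug length and k values below are hypothetical.

```python
def swept_zone_length_mm(injected_length_mm: float, retention_factor: float) -> float:
    """Idealized length of the analyte zone after sweeping: l_inj / (1 + k)."""
    if retention_factor < 0:
        raise ValueError("retention factor must be non-negative")
    return injected_length_mm / (1.0 + retention_factor)


INJECTED_LENGTH_MM = 50.0  # hypothetical injected plug length

for k in (0.0, 1.0, 9.0, 99.0):
    swept = swept_zone_length_mm(INJECTED_LENGTH_MM, k)
    print(f"k = {k:5.1f}: swept zone {swept:6.2f} mm, "
          f"narrowing factor {INJECTED_LENGTH_MM / swept:.0f}")
```

In this model an analyte with k = 0 is not concentrated at all, whereas one with k = 99 is narrowed a hundredfold, which is the rationale for derivatizing hydrophilic metabolites with a hydrophobic probe.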

5. Example of Metabolome Analysis by CE

5.1. Analysis of Amino Acids and Amines with LIF Detection

Amino acids are important metabolites in the cell, but most amino acids do not have strong chromophores. Therefore, derivatization of amino acids with fluorescent or UV-absorbing probes is required to enhance detection sensitivity. In our lab, LIF detection with an argon ion laser (488 nm) as the excitation source was employed, and 4-fluoro-7-nitrobenz-2-oxa-1,3-diazole (NBD-F) was used as the fluorescence-labeling reagent [5]. Because of the high hydrophobicity of the derivatized amino acids, an MEKC method was developed to analyze amino acids in the cell extract of Bacillus subtilis, as shown in Fig. 7. The concentrations of major amino acids in the cell extract were estimated to range from sub-µM to tens of mM, as shown in Table 1.


Fig. 6. Schematic illustration of sweeping

Fig. 7. Separation of NBD-F derivatized amino acids in the cell extract of B. subtilis. SAH, S-adenosylhomocysteine; GABA, γ-aminobutyric acid. Conditions: capillary, 50 µm i.d. × 56 cm (45 cm to the detector); running solution, 50 mM sodium dodecyl sulfate (SDS)-2 M urea-40 mM Brij 35 in 100 mM borate buffer (pH 9.0); injection, hydrostatic 10 s at 5 kPa; applied voltage, 20 kV; detection, laser-induced fluorescence (LIF) with Ar ion laser (488 nm)


Table 1. Quantitative analysis of some amino acids in Bacillus subtilis cell extract

Amino acid       Concentration/M (S-6 glucose medium)   Concentration/M (S-6 malate medium)
Alanine          3.14X10'                               1.15X10^
Glutamic acid    1.85X10^                               2.54X10^
Aspartic acid    4.2x10^                                9.1x10^

Capillary electrophoresis conditions are given in Fig. 7

5.2. Analysis of Purines by CE with UV Detection

New separation platforms for high-throughput analysis based on multiplexed CE (capillary array format) promise rapid and highly efficient separations, as highlighted by the important role of this format in the rapid DNA sequencing used in the Human Genome Project. In our research group, a multiplexed CE system with UV detection in conjunction with dynamic pH junction was demonstrated as a novel method for the sensitive and high-throughput analysis of purine metabolites [6]. The optimization of purine focusing can be rapidly assessed by systematically altering the sample matrix properties, such as the buffer co-ion, pH, and ionic strength, using a 96-capillary array format. The method permits focusing of large sample injection volumes, resulting in more than a 50-fold enhancement in concentration sensitivity compared to conventional injections. The technique also demonstrated excellent intercapillary precision and linearity in terms of normalized migration times and peak areas.

5.3. Analysis of Nucleotides by CE with UV Detection

The pyridine nucleotides (NAD, NADP, NADH, and NADPH) represent a class of coenzymes involved in a number of critical catabolic and anabolic pathways in living organisms. The adenine nucleotides (AMP, ADP, and ATP) also play an important role as physiological signaling molecules that bind to membrane purine receptors. Figure 8 presents electropherograms from the analysis of extracts of B. subtilis cells grown in culture media containing glucose or malate [7]. Nanomolar (nM) detectability of analytes by CE with UV photometric detection is achieved through effective focusing of a large sample plug (about 10% of the capillary length) using sweeping by the borate complexation method, reflected by an LOD of about 2x10'^ M. Concentrations of pyridine and adenine nucleotides in a single cell were estimated to be at the millimolar level. The concentrations of the analytes were also found to differ between cell extracts derived from glucose and malate culture media.

5.4. Analysis of Flavins by CE with LIF Detection

The flavins, riboflavin (RF), flavin mononucleotide (FMN), and flavin adenine dinucleotide (FAD), represent an important class of metabolites in the cell and are natively fluorescent. A CE method with LIF detection was developed to analyze trace amounts of flavins in different types of biological samples (including bacterial cell extracts, recombinant protein, pooled human plasma, and urine) using dynamic pH junction-sweeping as an on-line preconcentration technique [8]. Picomolar detectability of flavins by CE-LIF detection was realized with on-line preconcentration (up to 15% of the capillary length used for injection) by dynamic pH junction-sweeping, resulting in an LOD of about 4.0 pM for the flavin coenzymes FAD and FMN. More than a 1200-fold improvement in concentration sensitivity was demonstrated compared to conventional injections. Submicromolar amounts of flavin coenzymes were measured directly in formic acid cell extracts of B. subtilis. Figure 9 shows electropherograms from the analysis of flavin coenzymes in cell extracts of B. subtilis. Significant differences in flavin concentration were measured between cell extracts derived from cultures with glucose and with malate as the carbon source.

Fig. 2. Carbon sources have different access to the metabolic pathway network of Bacillus subtilis. The central carbon flow on the Embden-Meyerhof pathway is glycolytic or gluconeogenetic; glucose, fructose, glycerol, and gluconate are glycolytic, whereas malate is gluconeogenetic. The carbon sources are transported from the culture medium to the cells as glucose-6-phosphate (G-6P), fructose-6-phosphate (F-6P), glycerol-3-phosphate (glycerol-3P), gluconate, and malate. These are the "starting metabolites" for the catabolism

2. Proposed Mechanism of Catabolite Repression in B. subtilis

Microorganisms grow on various organic substances as carbon sources. When a mixture of different organic substances is available in the growth medium, they prefer some carbon sources over others. Only after consuming the favored carbon sources do they catabolize the others; until then, the cells repress the enzyme activities that are necessary for catabolizing the other carbon sources. This is called "catabolite repression." Glucose is often the most effective carbohydrate in causing repression. In E. coli and B. subtilis, catabolite repression has been studied by measuring gene expression with DNA microarrays [1,2]. The mechanisms proposed for the repression are completely different between the two microorganisms.


T. Nishioka et al.

In E. coli cells growing on glucose as a carbon source, the intracellular level of cyclic AMP (cAMP) is quite low. This suppresses the expression of enzyme genes that are necessary for catabolizing the other carbon sources. When growing on lactose as a carbon source, E. coli cells increase cAMP, which forms a complex with its receptor protein, the cAMP receptor protein (CRP). The cAMP-CRP complex binds to the promoter region of the lactose operon, leading to expression of the genes for the enzymes that are necessary to catabolize lactose [3,4]. This mechanism, however, is inapplicable to B. subtilis, because no cAMP exists in B. subtilis cells. Fujita et al. [5] proposed a different mechanism for catabolite repression in B. subtilis (Fig. 1). On glucose or fructose as a carbon source, fructose 1,6-bisphosphate (F16P) in B. subtilis cells increases and activates HPr kinase, which phosphorylates the HPr protein. Phosphorylated HPr forms a complex with a transcription factor, CcpA. By binding to a catabolite-responsive element, cre, the complex suppresses the expression of genes for enzymes that are necessary for the metabolism of other carbon sources.

3. Experimental Methods

Bacillus subtilis strain 168 was cultured at 37°C on modified 2×SG medium containing glucose, fructose, gluconate, malate, or glycerol as a carbon source (Fig. 2). The concentration of each carbon source in the culture medium was adjusted to give the same total of 150 mM carbon atoms: 25 mM for glucose, fructose, and gluconate, 37.5 mM for malate, and 50 mM for glycerol. Yeast extract was not added to the culture medium, because its addition induced the expression of more genes on the DNA microarray chips prepared for B. subtilis. Growth was monitored by optical density at 600 nm (OD600). When the growth reached the middle of the logarithmic phase (OD600 = 0.85) in each culture, we collected the B. subtilis cells in 10 ml of culture medium on a glass filter and quickly extracted the metabolites by immersing the filter in 1 ml of methanol containing methionine sulfone and PIPES as cationic and anionic internal standards, respectively. Phospholipids and biopolymers, which adsorb on the inner wall of a glass capillary and interfere with analysis, were removed from the methanol extracts by two methods: partitioning with a chloroform-water solution and microfiltration. The filtrates from the microfiltration were lyophilized and stored at -80°C. Lyophilized samples were dissolved in water before analysis.
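The carbon normalization described above is a simple division of the target carbon concentration by the number of carbon atoms per molecule. The short sketch below reproduces the concentrations given in the text; the carbon counts are the standard ones for these compounds, and the variable names are illustrative.

```python
# Carbon atoms per molecule of each carbon source.
CARBON_ATOMS = {
    "glucose": 6,
    "fructose": 6,
    "gluconate": 6,
    "malate": 4,
    "glycerol": 3,
}

TOTAL_CARBON_MM = 150.0  # target concentration of carbon atoms, from the text


def source_concentration_mm(carbon_atoms: int) -> float:
    """Concentration of a carbon source that supplies TOTAL_CARBON_MM of carbon atoms."""
    return TOTAL_CARBON_MM / carbon_atoms


concentrations = {
    name: source_concentration_mm(n) for name, n in CARBON_ATOMS.items()
}

for name, conc in concentrations.items():
    print(f"{name:10s} {conc:5.1f} mM")
```

The results match the 25 mM, 37.5 mM, and 50 mM values stated in the text.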

Combined Analysis of Metabolome and Transcriptome


Each sample was analyzed with three capillary electrophoresis/mass spectrometry (CE/MS) systems that were conditioned for the separation of cationic metabolites, anionic metabolites, and mononucleotides, respectively [6,7]. We identified metabolites by matching their migration times on CE and m/z values on MS to those measured with a total of 352 metabolic standards. Among the metabolic intermediates of energy metabolism, glyceraldehyde-3-phosphate and oxaloacetate were not measured because they are unstable and decompose during the extraction procedure. The number of cells in a 10-ml sample and the volume of a cell were estimated as 4x10^ cells/ml at OD600 = 1.0 and 2.75x10"^^ l, respectively [8].

4. Results 4.1. Growth Curves We cultured B. subtilis on each of the five different carbon sources: glucose, fructose, gluconate, malate, and glycerol. Glucose and fructose have been reported as carbon sources that suppress the catabolism of the other three carbon sources [5]. Bacillus subtilis cells grew at almost the same rate on the different carbon sources; the mean of doubling times was about 1.4 h. When compared on the basis of the growth curves, the logarithmic phase on glycerol or gluconate was delayed by 1.5 h from that on glucose or fructose. At the middle of the logarithmic phase in each culture, OD6oo=0.85, we collected the cells on a glass filter and quickly extracted metabolites from the cells on the filter with methanol. The intracellular amount of metabolites in the extract was measured with CE/MS. 4.2. Metabolite Profiles of the 8. subtilis Cells Grown on Different Carbon Sources We measured the intracellular amounts of 88 metabolites in the B. subtilis cells grown on five different carbon sources. The measured metabolites include amino acids and metabolic intermediates in glycolytic and Entner-Doudoroff pathways and tricarboxylic acid (TCA) cycles. Figure 3a shows the metabolite profile in the cells grown on glucose. In the profile, circles show metabolites and their sizes are proportional to the concentrations of metabolites. Metabolites are plotted on the metabolic pathway network of B. subtilis predicted by ARM [9,10]. Intracellular amounts of the metabolic intermediates are low and not accumulated in glycolytic and

132

T. Nishioka et al.

Entner-Doudoroff pathways, which are a major route of the carbon flux. Citrate and succinate are higher in the TCA cycle. The large amount of lactate was probably produced under anaerobic conditions: aeration of the culture medium might not have been sufficient near the mid-logarithmic phase, when the cells consumed oxygen at the maximum rate. Figure 3b shows the metabolite profile observed in the cells grown on fructose. This profile is quite similar to the one observed in the cells grown on glucose. Fructose and glucose are the favored carbon sources for B. subtilis. Figure 3c-e shows the metabolite profiles in the cells grown on gluconate, glycerol, and malate, respectively, whose catabolism is suppressed by glucose and fructose. These three profiles share almost the same features as those of the cells grown on glucose and fructose. Although the five metabolite profiles appeared similar, we found a specific and common difference between the profiles of the suppressed and the suppressing carbon sources. In the three profiles of the suppressed carbon sources, gluconate in Fig. 3c, glycerol-3-phosphate in Fig. 3d, and malate in Fig. 3e have a much greater intracellular amount than in the other profiles. These three metabolites are the starting metabolites in the catabolism of gluconate, glycerol, and malate, respectively. In contrast, the starting metabolites in the two profiles of the suppressing carbon sources are glucose-6-phosphate and fructose-1-phosphate, which are present in lower amounts (Fig. 3a,b). In other words, starting metabolites accumulated in the cells growing on carbon sources whose catabolism is suppressed, but not in those growing on carbon sources that repress others. The intracellular amount of a starting metabolite theoretically depends on the difference between two fluxes: the uptake of a carbon source as a starting metabolite and the subsequent transformation of the starting metabolite into other metabolic intermediates.
Accumulation of a starting metabolite indicates that the uptake might be larger than the transformation. In B. subtilis, the rate-limiting step is therefore most likely the intracellular transformation on the repressed carbon sources and the uptake on the repressing carbon sources.
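
The flux argument above can be illustrated with a small numerical sketch. It assumes, purely for illustration, a constant uptake flux and a first-order transformation step, so that the pool S obeys dS/dt = v_uptake - k_transform * S. The function name, the kinetic form, and all numbers are hypothetical and are not taken from the measurements in this chapter.

```python
def simulate_pool(v_uptake, k_transform, hours=10.0, dt=0.001):
    """Euler integration of dS/dt = v_uptake - k_transform * S.

    v_uptake    : constant uptake flux (hypothetical units, mM/h)
    k_transform : first-order rate constant of the downstream step (1/h);
                  first-order kinetics is an assumption made for illustration
    """
    s = 0.0
    for _ in range(int(hours / dt)):
        s += (v_uptake - k_transform * s) * dt
    return s


# Same uptake flux, but a slow versus a fast downstream transformation.
slow = simulate_pool(v_uptake=1.0, k_transform=0.5)    # transformation-limited
fast = simulate_pool(v_uptake=1.0, k_transform=10.0)   # uptake-limited

print(f"slow transformation: pool ~ {slow:.2f} mM")
print(f"fast transformation: pool ~ {fast:.2f} mM")
```

Under these assumptions the pool approaches v_uptake / k_transform, so a slow transformation step leaves a large pool of the starting metabolite, while a fast one keeps the pool small.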

Fig. 3. Metabolite profiles of Bacillus subtilis cells growing on five different carbon sources. a-e Metabolite profiles of the cells growing on glucose, fructose, glycerol, gluconate, and malate, respectively. Metabolites were extracted from the cells at the maximum growth rate in each culture. Circles are metabolites and are located on the metabolic pathways of glycolysis, pentose phosphate cycle, and tricarboxylic acid (TCA) cycle. The size of each circle corresponds to the intracellular amount of the metabolite. The number shown on each circle is the intracellular concentration of the metabolite in mM

4.3. Physiological Meanings of Similar Metabolite Profiles

The metabolite profiles we observed were those of B. subtilis cells growing at the maximum growth rate. The maximum growth rates on the different carbon sources were the same, and the metabolite profiles were likewise similar regardless of the carbon source. This suggests that the similar profiles might form the biochemical basis of the same maximum growth rates. In other words, B. subtilis cultured on different carbon sources globally regulated its metabolism to maintain a metabolite profile optimized for the maximum growth rate. The optimized metabolite profile should be close to the average of the five metabolite profiles we observed. Bacillus subtilis might use this profile as a template for metabolic regulation. Every living organism presumably has its own template, because each has specific nutritional requirements and metabolic pathway networks.
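
As a minimal sketch of the "template" idea, the average profile can be computed across carbon sources and each profile then expressed relative to it. The concentrations below are hypothetical placeholders, not the values shown in Fig. 3, and the twofold cutoff used for flagging is an assumption.

```python
def mean_profile(profiles):
    """Average each metabolite across all carbon sources."""
    metabolites = next(iter(profiles.values())).keys()
    n = len(profiles)
    return {m: sum(p[m] for p in profiles.values()) / n for m in metabolites}


def fold_deviation(profile, template):
    """Ratio of each metabolite level to its template value."""
    return {m: profile[m] / template[m] for m in template}


# Hypothetical concentrations in mM (NOT the measured values of this chapter).
profiles = {
    "glucose":  {"citrate": 2.0, "succinate": 1.0, "malate": 0.2},
    "fructose": {"citrate": 2.2, "succinate": 0.9, "malate": 0.2},
    "malate":   {"citrate": 1.8, "succinate": 1.1, "malate": 2.6},
}

template = mean_profile(profiles)
for source, profile in profiles.items():
    deviation = fold_deviation(profile, template)
    accumulated = [m for m, ratio in deviation.items() if ratio > 2.0]
    print(source, "accumulated relative to template:", accumulated)
```

With these placeholder numbers only the starting metabolite of the malate culture stands out from the template, which mirrors the local perturbation described in Section 4.2.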

4.4. Fructose-1,6-Bisphosphate (F16P), a Key Metabolite in the Uptake of Glucose and Fructose

Three types of transport systems are known for the uptake of carbon sources in B. subtilis: phosphotransferase systems (PTSs), channels, and active transporters [11]. Glucose and fructose are transported by different PTSs, PtsG and FruA, respectively. Glycerol and gluconate are transported by a channel protein, GlpF, and an active transporter protein, GntP, respectively. The transport system for malate has not been identified. The two PTSs first phosphorylate glucose and fructose in the culture medium to the corresponding phosphates, glucose-6-phosphate and fructose-1-phosphate, respectively, and then transport the phosphorylated sugars into the cell (Fig. 1). The proposed mechanism of catabolite repression in B. subtilis supposes that the transport of glucose and fructose by the PTSs and the subsequent catabolism of their phosphates increase the intracellular concentration of fructose-1,6-bisphosphate (F16P). F16P is assumed to be a key metabolic intermediate whose increase induces catabolite repression. However, the measured amount of F16P in the cells growing on the suppressing carbon sources (Fig. 3a,b) was as low as that in the cells grown on the suppressed carbon sources (Fig. 3c-e). These profiles are not consistent with the proposed mechanism; F16P is not the sole factor that induces catabolite repression in B. subtilis.

4.5. Combining a Metabolite Profile with a Gene Expression Profile

Fujita et al. cultured B. subtilis on malate and glucose, extracted mRNAs from the cells at the middle logarithmic phase of each culture (OD600 = 0.8), and measured gene expression by using B. subtilis DNA microarrays that contained 4055 protein genes [12]. The complete microarray data of their study are available on the KEGG web site, http://www.genome.jp/kegg/expression. The ratio (malate/glucose) of gene expression in the cells growing on malate to that on glucose reflects the sum of two differences: that between the suppressed and the suppressing carbon sources, and that between gluconeogenesis and glycolysis. The ratios (malate/glucose) of the enzyme genes are overlaid on the metabolite profile of the cells growing on malate (Fig. 4). In Fig. 4, the names of genes are in boxes when their expression increased or decreased more than twofold on malate.

The catabolism of malate and that of glucose share the Embden-Meyerhof pathway, although their carbon atoms flow in opposite directions: malate is metabolized by gluconeogenesis, whereas glucose is metabolized by glycolysis. In addition, we observed that the metabolite profiles and maximum cell growth rates on the two carbon sources were similar. These facts led us to expect that the gene expression profiles of cells growing on malate and on glucose would also be similar. However, the observed gene transcripts of the Embden-Meyerhof pathway were significantly reduced on malate, because the five enzyme genes in the pathway, eno, pgm, tpiA, pgk, and gapA, form an operon on the B. subtilis genome whose expression is regulated by glucose (Fig. 5). In spite of the reduced ratio (malate/glucose) in the central carbon flow pathway, the metabolite profile and maximum cell growth rate on malate remained similar to those on glucose. This suggests that the enzymes of the Embden-Meyerhof pathway might still be present in amounts sufficient for gluconeogenesis from malate to support the maximum growth rate, because a physiological role of the pathway is to carry the central carbon flow.

Among the enzyme reactions in the Embden-Meyerhof pathway in B. subtilis, the sole irreversible reaction is the transformation between glycerate-1,3-diphosphate and glyceraldehyde-3-phosphate. GapA and GapB irreversibly catalyze this reaction in glycolysis and gluconeogenesis, respectively. Because GapB is indispensable for the catabolism of malate, gapB expression was 61 times higher on malate than on glucose. Malate as the starting metabolite could be catabolized by three possible pathways: from malate to pyruvate, from oxaloacetate to phosphoenolpyruvate, and from acetyl-CoA to pyruvate. The ratio (malate/glucose) of all the enzyme genes in the three pathways was greater than 2. In spite of this upregulation, malate accumulated in the cells, suggesting that the three pathways are rate-limiting in the catabolism of malate.

A bypass for the gluconeogenesis of malate is the Entner-Doudoroff pathway, which produces 6-phosphogluconate via 2-dehydro-3-deoxygluconate from pyruvate. The enzyme genes in this pathway were upregulated 2- to 4-fold on malate. DNA microarrays thus predicted the major metabolic pathways used for the catabolism of malate: the ratio (malate/glucose) was increased for the enzyme genes within two or three reaction steps of malate.
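
The ratio (malate/glucose) and the twofold criterion used for Fig. 4 can be sketched as follows. The signal intensities and the gene name `geneX` are hypothetical. Only the 61-fold ratio for gapB comes from the text, and the value used for gapA is a placeholder for a downregulated gene.

```python
import math


def classify(ratio, threshold=2.0):
    """Label a malate/glucose expression ratio using a twofold cutoff."""
    if ratio > threshold:
        return "up"
    if ratio < 1.0 / threshold:
        return "down"
    return "unchanged"


# (signal on malate, signal on glucose); intensities are hypothetical.
signals = {
    "gapB":  (6100.0, 100.0),   # chosen to reproduce the 61-fold ratio in the text
    "gapA":  (150.0, 600.0),    # placeholder for a gene reduced on malate
    "geneX": (210.0, 200.0),    # hypothetical unchanged gene
}

for gene, (on_malate, on_glucose) in signals.items():
    ratio = on_malate / on_glucose
    print(gene, f"ratio={ratio:.2f}", f"log2={math.log2(ratio):.2f}", classify(ratio))
```

Working with log2 ratios makes up- and downregulation symmetric around zero, which is why the same cutoff can be applied in both directions.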

Fig. 4. Combined profiles of metabolites and gene expressions of Bacillus subtilis. See Color Plate 2. The gene expression profile was overlaid on the metabolite profile of the cells grown on malate (Fig. 3e). Letters in italic are the names of enzyme genes. The number in parentheses after each gene name shows the ratio of gene expression in the cells growing on malate to that on glucose, i.e., the ratio (malate/glucose). Genes in red were upregulated on malate, whereas those in blue were downregulated. Malate increased the expression of the enzyme genes for gluconeogenesis

Fig. 5. Operon composed of the five enzyme genes on Embden-Meyerhof pathway. From the KEGG database

5. Conclusions

The metabolite profiles of B. subtilis cells are similar regardless of the carbon source, whether it suppresses the catabolism of others or is suppressed by others. All the similar profiles were measured at the maximum growth rate, suggesting that B. subtilis has a predetermined metabolite profile optimized for the maximum growth rate. Differences in carbon sources induced local perturbations in the predetermined profile. One such perturbation was the accumulation of the starting metabolites of the suppressed carbon sources. Combined analysis of the metabolite profile and DNA microarrays revealed that the first reaction in the catabolism was rate-limiting when B. subtilis was grown on suppressed carbon sources, although the enzyme genes of the reactions were upregulated. The present analysis suggests that a decrease or increase in the expression of an enzyme gene does not always result in the accumulation or depletion of its substrates or products, because of the multiplicity of metabolic pathway networks. Metabolome and transcriptome data supplement each other and provide much information for studying the global regulation of metabolism.

References

1. Yoshida K, Kobayashi K, Miwa Y, Kang CM, Matsunaga M, Yamaguchi H, Tojo S, Yamamoto M, Nishi R, Ogasawara N, Nakayama T, Fujita Y (2001) Combined transcriptome and proteome analysis as a powerful approach to study genes under glucose repression in Bacillus subtilis. Nucleic Acids Res 29:683-692
2. Zheng D, Constantinidou C, Hobman JL, Minchin SD (2004) Identification of the CRP regulon using in vitro and in vivo transcriptional profiling. Nucleic Acids Res 32:5874-5893
3. Kolb A, Busby S, Buc H, Garges S, Adhya S (1993) Transcriptional regulation by cAMP and its receptor protein. Annu Rev Biochem 62:749-795
4. Saier M, Ramseier T, Reizer J (1996) Regulation of carbon utilization. In: Neidhardt F, Curtiss R, Ingraham J, et al (eds) Escherichia coli and Salmonella: cellular and molecular biology, vol. 1. ASM Press, Washington, DC, pp 1325-1343
5. Fujita Y, Miwa Y, Galinier A, Deutscher J (1995) Specific recognition of the Bacillus subtilis gnt cis-acting catabolite-responsive element by a protein complex formed between CcpA and seryl-phosphorylated HPr. Mol Microbiol 17:953-960
6. Soga T, Ueno Y, Naraoka H, Matsuda K, Tomita M, Nishioka T (2002) Pressure-assisted capillary electrophoresis electrospray ionization mass spectrometry for analysis of multivalent anions. Anal Chem 74:6224-6229
7. Soga T, Ueno Y, Naraoka H, Ohashi Y, Tomita M, Nishioka T (2002b) Simultaneous determination of anionic intermediates for Bacillus subtilis metabolic pathways by capillary electrophoresis electrospray ionization mass spectrometry. Anal Chem 74:2233-2239
8. Fujita Y, Freese E (1979) Purification and properties of fructose-1,6-bisphosphatase of Bacillus subtilis. J Biol Chem 254:5340-5349
9. Arita M (2003) In silico atomic tracing by substrate-product relationships in Escherichia coli intermediary metabolism. Genome Res 13:2455-2466
10. Arita M (2004) The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci USA 101:1543-1547
11. Deutscher J, Galinier A, Martin-Verstraete I (2002) Carbohydrate uptake and metabolism. In: Sonenshein A, Hoch J, Losick R (eds) Bacillus subtilis and its closest relatives from genes to cells. ASM Press, Washington, DC, pp 129-162
12. Doan T, Servant P, Tojo S, Yamaguchi H, Lerondel G, Yoshida K-I, Fujita Y, Aymerich S (2003) The Bacillus subtilis ywkA gene encodes a malic enzyme and its transcription is activated by the YufL/YufM two-component system in response to malate. Microbiology 149:2331-2343

Chapter 10: Metabolomics in Arabidopsis thaliana

Kazuki Saito

Graduate School of Pharmaceutical Sciences, Chiba University, CREST of Japan Science and Technology Agency, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan and RIKEN Plant Science Center, Tsurumi-ku, Yokohama 230-0045, Japan

1. Introduction

Plant science has stepped forward in earnest into the "post-genome (sequence) era" with the completion of the whole genome sequences of Arabidopsis thaliana and rice. Plant metabolomics has recently emerged as an important field of post-genome science. Even if there is no visible change in cells and individual plants, metabolomics, which allows phenotyping by exhaustive metabolic profiling, can show precisely how the cells respond as a system. Arabidopsis thaliana, a model plant for modern plant science, offers an advantage because the resources related to its genome sequence can be fully applied. This chapter describes metabolomics studies with A. thaliana. For more detailed discussion, the reader is referred to several reviews and commentary articles [1-8].

2. The Impact of Metabolomics Study with a Model Plant for Genomics

Understanding an entity as a system through its comprehensive analysis is characteristic of post-genome (sequence) science. In other words, the fields of genomics (all genome sequences), transcriptomics (all cellular transcripts), proteomics (all cellular proteins), and metabolomics (all cellular low-molecular-weight metabolites) comprise post-genome science. The general idea of "metabolomics" or "metabolome" was first defined several years ago in the field of microbiology [9], and its importance in plant science was pointed out immediately after that [1]. However, it has been overlooked in comparison with the other so-called "-ome sciences" even recently. For example, in a PubMed search in June 2004, the number of "genome or genomics" related articles amounts to 97 420; "transcriptome or transcriptomics," 820; "proteome or proteomics,"

5393; however, the number for "metabolome or metabolomics" totals only 163. Nevertheless, since the utility of crops, medicinal plants, industrial plants, etc., is due to the variety and productivity of their metabolites, metabolomics deserves even more emphasis in plants than in microorganisms and animals. The number of metabolites in the plant kingdom has been estimated at 200 000 or even more, up to 1 000 000 [7,10]. These numbers are significantly greater than those for microorganisms and animals.

Metabolomics is based upon the nontargeted comprehensive analysis of total metabolites, whereas the usual "metabolite analysis" or "phytochemical analysis" is based upon the targeted analysis of a particular group of compounds. In addition, metabolomics is integrated with the other -ome sciences with the aid of bioinformatics, as shown in Fig. 1. In other words, the whole metabolic change can be related to genome, transcriptome, and protein functions for a better understanding of an entity as a system. Although the term "metabolite profiling" or "metabolic profiling" has been used for many years to denote comprehensive profiling of metabolic change, metabolomics differs from classical "metabolite profiling" in its integration with the other genome sciences. The following four classifications are defined by Fiehn as strictly relevant to research related to metabolomics [2]:

- Target analysis
- Metabolic profiling
- Metabolomics
- Metabolic fingerprinting

3. The Elements Composing Metabolomics

Metabolomics comprises a chemical analysis of metabolites and an in silico analysis of the data. Chemical analysis is the basis of metabolome analysis, and it is used for the comprehensive identification and quantitation of all metabolites. This requires extraction methods that do not change the metabolite profile and appropriate data acquisition technology. At present, mass spectrometry (MS) and nuclear magnetic resonance spectrometry (NMR) are mainly used for metabolome analysis. MS is often hyphenated with one of several separation methods, e.g., gas chromatography-mass spectrometry (GC/MS), high-performance liquid chromatography-mass spectrometry (LC/MS), and recently, capillary electrophoresis-mass spectrometry (CE/MS). The analytical methods without preseparation by chromatography, for example,

Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) and time-of-flight mass spectrometry (TOF-MS), are also in practical use. NMR analysis of crude cell extracts and on-line NMR analysis by the LC stopped-flow method have also been reported. For data analysis, multivariate methods such as principal component analysis (PCA) and hierarchical cluster analysis (HCA), as well as self-organization mapping (SOM), are used for data mining. Further, metabolome data are integrated with other -ome data to identify gene/protein functions, which eventually leads to metabolic and cellular simulation in silico. Metabolite profiling can be applied to the identification of mutants and gene functions. This technology, called "chemical bio-panning," is a branch of metabolomics.
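
To make the idea of hierarchical cluster analysis concrete, the following minimal sketch clusters a few samples by their metabolite levels using average linkage and Euclidean distance. Both choices are assumptions for illustration; the chapter does not specify a linkage or distance. The sample names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two metabolite profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(samples):
    """Agglomerative clustering; returns the merge history.

    samples: dict mapping sample name -> list of metabolite levels.
    Each merge is a tuple (members_a, members_b, distance).
    """
    clusters = [(name,) for name in samples]
    merges = []

    def dist(c1, c2):
        total = sum(euclidean(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        # Pick the two clusters with the smallest average pairwise distance.
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j], dist(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical levels of three metabolites in four samples.
samples = {
    "leaf_control": [5.0, 1.0, 0.5],
    "leaf_stress":  [4.5, 1.2, 0.9],
    "root_control": [1.0, 3.0, 2.0],
    "root_stress":  [1.1, 2.8, 2.6],
}

merges = average_linkage(samples)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

In this toy data set the samples from the same organ merge first and the two organs are joined only at the last step, the kind of structure a dendrogram is meant to reveal.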

Fig. 1. Outline of post-genome sciences. Genomics, transcriptomics, proteomics, and metabolomics are integrated with the aid of bioinformatics. Complexity increases along the line from genomics to metabolomics

Table 1. Genome-related resources of Arabidopsis thaliana

Resource                                  Number
Annotated genes                           28 974
EST clones                                178 000
Full-length cDNA clones                   27 000
T-DNA/transposon inserted mutant lines    140 000
Single nucleotide polymorphisms           57 000

Up to May 2003. EST, expressed sequence tag

4. Genome-Related Resources of Arabidopsis thaliana

Arabidopsis thaliana is highly suitable for carrying out such research effectively, because its genome-related resources are extensive. The resources related to the Arabidopsis genome that are useful for metabolomics research are summarized in Table 1. Moreover, AraCyc (http://www.arabidopsis.org/tools/aracyc/), a database of the metabolic pathways of A. thaliana, is useful for metabolomics research on A. thaliana [11]. By June 2004, 186 metabolic pathways and 1161 reactions had been covered, and 53% of these reactions were annotated with enzymes and genes. This database will be improved from time to time.

5. Examples of Metabolomics Research of Arabidopsis

We are conducting research on the comprehensive analysis of the gene expression-metabolic pathway network in plant cells. The responses to nutritional stress of sulfur and nitrogen and the effects of a single Myb transcription factor have been investigated through transcriptome and metabolome analyses.

5.1. Change of Metabolome and Transcriptome by Nutrition Stresses

Arabidopsis plants were grown by hydroponic culture under nutritional stress of sulfur and/or nitrogen for 3 weeks (long-term starvation experiments). The plants were also grown on agar plates with sufficient nutrition for 3 weeks and then subjected to sulfur starvation or addition of O-acetylserine (OAS), a key intermediary metabolite of sulfur assimilation,

for 2 days (short-term experiments) (Fig. 2). Transcriptome analysis with DNA microarrays was carried out for the leaves and roots of about 14 samples. With the same samples, metabolomics was investigated by combining nontargeted comprehensive FT-MS analysis with targeted analysis of amino acids by capillary electrophoresis and HPLC [12-15]. Data mining was conducted by pairwise one-to-one correlation, hierarchical cluster analysis, and self-organization mapping. The clusters of the experimental groups obtained by metabolomics corresponded to those obtained by transcriptomics (Fig. 3). In other words, both metabolome and transcriptome were influenced most strongly by the difference of organ (leaf and root), followed by the difference of cultivation conditions (long-term hydroponic culture or short-term agar plate culture) and then by the treatments (sulfur and nitrogen starvation). Furthermore, addition of OAS caused a transcriptome change nearly identical to that caused by sulfur deficiency, even under the sulfur-sufficient condition. Metabolome changes showed similar trends, though less evidently than the transcriptome. These results indicate that OAS acts as a positive regulatory factor for transcriptome and metabolome, in addition to being a key metabolic intermediate in the sulfur assimilation pathway.

Self-organization mapping of transcriptome and metabolome revealed groups of genes and metabolites whose patterns of change are similar under given conditions. For example, the genes involved in photosynthesis form one cluster, and the genes involved in the pentose phosphate pathway form another. In the same way, a group of metabolites or ion peaks of FT-MS exhibits similar patterns of change. The glucosinolate-related metabolites and the genes involved in glucosinolate metabolism are integrated in a metabolic map.
Further time-course experiments show the coordinated modulation of glucosinolates, their degradation products, and the expression of genes involved in glucosinolate metabolism. These integrated analyses of metabolome and transcriptome allowed us to identify unknown gene functions in Arabidopsis; for examples, see references [16,17]. Taken together, the networks of auxin and methyl jasmonate signaling, in addition to glucosinolate metabolism, were modulated by sulfur starvation [15].
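
The pairwise one-to-one correlation step can be sketched as a Pearson correlation between each gene's expression pattern and each metabolite's abundance pattern across the same samples. The gene and metabolite names, the values, and the 0.9 cutoff are hypothetical and are used only to show the mechanics.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_pairs(genes, metabolites, cutoff=0.9):
    """Return (gene, metabolite, r) for every pair with |r| >= cutoff."""
    pairs = []
    for gene, expression in genes.items():
        for metabolite, level in metabolites.items():
            r = pearson(expression, level)
            if abs(r) >= cutoff:
                pairs.append((gene, metabolite, round(r, 3)))
    return pairs


# Hypothetical values measured across the same five samples.
genes = {
    "gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_B": [2.0, 1.0, 2.5, 1.5, 2.0],
}
metabolites = {
    "met_X": [0.5, 1.1, 1.4, 2.1, 2.4],
    "met_Y": [9.0, 7.2, 5.1, 2.8, 1.0],
}

pairs = correlated_pairs(genes, metabolites)
for gene, metabolite, r in pairs:
    print(gene, metabolite, r)
```

Both strongly positive and strongly negative correlations are kept, since a metabolite may be a product or a substrate of the reaction controlled by the gene.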

Fig. 2. Outline of nutritional stress experiments in Arabidopsis. Long-term starvation experiments: Arabidopsis plants were grown by hydroponic culture under nutritional stress of sulfur and/or nitrogen for 3 weeks. Short-term experiments: the plants were also grown on agar plates with sufficient nutrition for 3 weeks and then subjected to sulfur starvation or OAS addition for 2 days. OAS, O-acetylserine

Fig. 3. Hierarchical clustering analysis of transcriptome and metabolome data under nutritionally stressed conditions. See Color Plate 4. Names of experimental groups are listed in Fig. 2

5.2. Secondary Metabolism Controlled by PAP1 Transcriptional Factor

It is known that anthocyanins are highly accumulated when the PAP1 gene encoding a Myb-like transcription factor is constitutively expressed [18]. DNA microarray and metabolome analyses of the activation-tagged line and the cDNA-overexpressing line clarified the holistic effects of ectopic expression of a single Myb-like factor [19]. Metabolic profiling of flavonoids by LC/MS was combined with comprehensive nontargeted analysis by FT-MS. The effect of PAP1 gene expression was specific to increasing the accumulation of anthocyanins. Several new anthocyanins were tentatively identified from PAP1-overexpressing lines (Fig. 4). Expression of the genes involved in anthocyanin production was upregulated along with these changes in anthocyanin metabolites. Among the paralogous members of a gene family, the particular gene whose expression was induced by PAP1 was presumed to be actually responsible for the production of anthocyanins (Fig. 5).

Fig. 4. High-performance liquid chromatography/photodiode array/mass spectrometry (HPLC-PDA-MS) profile of the anthocyanin fraction of the wild-type (a) and pap1-D mutant (b) Arabidopsis. 1, cyanidin-3-Glc-(Xyl)-(coumaroyl-Glc)-5-Glc; 2, cyanidin-3-Glc-(Xyl)-(coumaroyl-Glc)-5-Glc-(malonyl); 3, cyanidin-3-Glc-(Xyl-sinapoyl)-(coumaroyl-Glc)-5-Glc; 4, cyanidin-3-Glc-(Xyl-sinapoyl)-(coumaroyl-Glc)-5-Glc-(malonyl); 5, cyanidin-3-Glc-(Xyl)-(coumaroyl)-5-Glc-(malonyl); 6, cyanidin-3-Glc-(Xyl-sinapoyl)-(coumaroyl)-5-Glc-(malonyl) (Tohge et al. [19])

Fig. 5. Integration of transcriptome and metabolome data on the biosynthetic pathway of flavonoids. Arrows indicate the paralogous genes in Arabidopsis. Black arrows indicate the genes upregulated in the pap1-D mutant, and dashed arrows indicate the downregulated genes. The levels of cyanidin derivatives and quercetin derivatives increased, and that of kaempferol derivatives decreased. PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate coenzyme A ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; FLS, flavonol synthase; FGT, flavonoid glycosyltransferase; F3'H, flavonoid 3'-hydroxylase; DFR, dihydroflavonol reductase; ANS, anthocyanidin synthase; AAT, anthocyanin acyltransferase; GST, glutathione-S-transferase

These assumptions allowed the tentative identification of specific genes responsible for the biosynthesis of anthocyanin. For example, 110 glycosyltransferase-family genes, 50 acyltransferase-family genes, and 30 glutathione-S-transferase-family genes are encoded in the genome of A. thaliana. By integrating metabolome and transcriptome data on some reactions, we were able to comprehensively predict the function of the related genes. Thereafter, some functions were experimentally identified by classical analysis of T-DNA knock-out mutant lines and by in vitro enzymatic studies using recombinant proteins. Not only the genes coding for the biosynthetic enzymes, but also the genes for transporter-like proteins and regulatory proteins were predicted to be involved from the results of PAP1 gene overexpression. These results show the usefulness of integrated analysis of metabolome and transcriptome for functional genomics of Arabidopsis [16]. The protocol of integrating metabolome and transcriptome for gene identification is applicable not only to Arabidopsis but also to useful exotic plants. Perilla frutescens, an herbal plant producing anthocyanin, and Ophiorrhiza pumila, a plant producing the antitumor alkaloid camptothecin, are being investigated by comprehensive metabolite profiling with both targeted and nontargeted analysis and by extensive gene expression profiling using polymerase chain reaction-select differential screening, expressed sequence tag analysis, and cDNA differential display [20,21].

5.3. The Metabolome Analysis of Arabidopsis Ecotypes

The group led by L. Willmitzer at the Max Planck Institute of Molecular Plant Physiology in Germany has conducted pioneering work on plant metabolomics. They used GC/MS for nontargeted analysis of hydrophilic compounds (amino acids, organic acids, sugars, sugar alcohols, and amines) after methoxylation and silylation. About 330 metabolites were detected in A. thaliana leaf extracts, and about half of them were identified [22-24]. Comparison of four Arabidopsis genotypes (two homozygous ecotypes, Col-2 and C24, and a mutant of each ecotype) indicated that each genotype possessed a distinct metabolic phenotype. Principal component analysis of those four metabolic phenotypes indicated that the metabolic phenotypes of the two ecotypes were more divergent from each other than those of each mutant and its parent ecotype [25]. Fiehn and coworkers are further investigating several different ecotypes and their recombinant inbred lines [26].
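
The kind of comparison described above can be sketched with a small principal component analysis. The example below computes only the first principal component by power iteration on the covariance matrix, a simple method chosen for this sketch and not necessarily the one used in the cited work. The genotype labels follow the text, but all metabolite values are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loading vector, sample scores) for the first principal component.

    rows: list of samples, each a list of metabolite levels.
    Uses power iteration on the sample covariance matrix.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Non-uniform start vector, to avoid starting orthogonal to the answer.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores


# Hypothetical levels of three metabolites in four genotypes.
names = ["Col-2", "Col-2 mutant", "C24", "C24 mutant"]
rows = [
    [10.0, 2.0, 5.0],
    [10.5, 2.2, 4.0],
    [4.0, 6.0, 5.2],
    [4.4, 6.3, 4.1],
]

loadings, scores = first_principal_component(rows)
for name, score in zip(names, scores):
    print(f"{name}: PC1 score {score:.2f}")
```

With these placeholder values each mutant lies close to its parent ecotype along the first component, while the two ecotypes are far apart, which is the pattern reported in the text.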

5.4. Secondary Metabolites by Capillary LC/ESI-MS Capillary LC/ESI quadrupole time-of-flight mass spectrometry was applied to comprehensive analysis of Arabidopsis extracts [27]. About 2000 mass signals (1400 in leaves and 800 in roots) were detected, and most of them were presumed to be secondary products. As a case study, this analytical method was applied to the tt4 mutant lacking a functional chalcone synthase. In addition to the expected disappearance of kaempferol and its glycosides in the mutant, the levels of some indole-derived metabolites were reduced, suggesting the connection of the flavonoid metabolism to the indole metabolism such as auxin.

5.5. 1H-NMR for Arabidopsis Metabolomics

1H-NMR spectroscopy has also been used for metabolomics of Arabidopsis. Lyophilized Arabidopsis tissue was extracted with deuterated H2O-MeOH and analyzed directly by 1H-NMR for metabolic fingerprinting [28]. Nine ecotypes of Arabidopsis were clustered by principal component analysis. The response to salt stress in Arabidopsis was also analyzed by 1H-NMR fingerprinting [29].

6. Chemical Bio-panning with Arabidopsis T-DNA Mutants

Nontargeted metabolite profiling can be applied to screen for mutants in which accumulation of particular metabolites is expected. This is referred to as "chemical bio-panning" [30]. The screening of Arabidopsis mutants exhibiting different accumulation patterns of flavonoids led to the selection of interesting loss-of-function mutants in 1991 [31]. However, it was not easy to identify the gene(s) responsible for a mutation caused by chemical mutagenesis. Nowadays, gain-of-function mutant lines of Arabidopsis are available through T-DNA activation technology. Combining T-DNA activation-tagged mutants of Arabidopsis with this screening program allows rapid identification of genes involved in particular metabolic pathways. Screening can be carried out by nontargeted chemical analysis using GC/MS, HPLC, and CE, or by targeted analysis of a specific group of compounds such as anthocyanins and thiol compounds (which are easy to detect by HPLC with fluorescence detection). Positive selection against growth inhibitors (heavy metals, metabolite analogues, herbicides, etc.) is also useful for large-scale screening.
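
A screening step of this kind could be sketched as follows: each mutant line is compared with wild-type replicates, and lines whose level of a target metabolite deviates strongly are selected. The z-score criterion, the cutoff of 3, the line names, and all values are assumptions made for illustration and do not describe the procedure of the cited studies.

```python
import statistics


def screen_mutants(wild_type, mutants, z_cutoff=3.0):
    """Return mutant lines whose metabolite level deviates from wild type.

    wild_type: replicate measurements of one metabolite in wild-type plants
    mutants:   dict mapping line name -> measured level in that line
    """
    mean = statistics.mean(wild_type)
    sd = statistics.stdev(wild_type)
    hits = {}
    for line, level in mutants.items():
        z = (level - mean) / sd
        if abs(z) >= z_cutoff:
            hits[line] = round(z, 1)
    return hits


# Hypothetical relative anthocyanin levels.
wild_type = [1.0, 1.1, 0.9, 1.0, 1.05]
mutants = {"line_001": 1.02, "line_002": 3.4, "line_003": 0.95}

hits = screen_mutants(wild_type, mutants)
print("candidate lines:", hits)
```

In an activation-tagged population, the T-DNA insertion site of each selected line would then point to the candidate gene.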

Acknowledgments

The author would like to thank his colleagues who shared unpublished data. The author's original studies described in this chapter were supported in part by Grants-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and by CREST of Japan Science and Technology Agency (JST), Japan.

References 1. Trethewey RN, Krotzky AJ, Willmitzer L (1999) Metabolic profiling: a Rosetta stone for genomics? Curr Opin Plant Biol 2:83-85 2. Fiehn O (2002) Metabolomics—the link between genotypes and phenotypes. Plant MoL Biol 48:155-171 3. Weckwerth W, Fiehn O (2002) Can we discover novel pathways using metabolomics analysis? Curr Opin Biotech 13:156-160 4. Roessner U, Willmitzer L, Femie AR (2002) Metabolic profiling and biochemical phenotyping of plant systems. Plant Cell Rep 21:189-196 5. Femie AR (2003) Metabolome characterization of plant system analysis. Funct Plant Biol 30:111-120 6. Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the functional genomics era. Phytochemistry 62:817-836 7. Trethewey R (2004) Metabolite profiling as an aid to metabolic engineering in plants. Curr Opin Plant Biol 7:196-201 8. Kopka J, Femie A, Weckwerth W, Gibon Y, Stitt M (2004) Metabolite profiling in plant biology: platforms and destinations. Genome Biol 5:109 9. Tweeddale H, Notley-McRobb L, Ferenci T (1998) Effect of slow growth on metabolism of Escherichia coli, as revealed by global metabolite pool ("Metabolome") analysis. J Bacteriol 180:5109-5116 10. Dixon RA, Strack D (2003) Phyotchemistry meets genome analysis, and beyond. Phytochemistry 62:815-816 11. Mueller LA, Zhang P, Rhee SY (2003) AraCyc. A biochemical pathway database for Arabidopsis. Plant Physiol 132:453^60 12. Aharoni A, de Vos CH, Verhoeven HA, Mailiepaard CA, Kruppa G, Bino RJ, Goodenowe DB (2002) Nontargeted metabolome analysis by use of Fourier transform ion cyclotron mass spectrometry. OMICS J Integr Biol 6:217-234 13. Hirai MY, Fujiwara T, Awazuahara M, Kimura T, Noji M, Saito K (2003) Global expression profiling of sulfur-starved Arabidopsis by DNA macroarray reveals the role of 0-acetyl-L-serine as a general regulator of gene expression in response to sulfur nutrition. Plant J 33:651-663 14. 
Hirai MY, Yano M, Goodenowe D, Kanaya S, Kimura T, Awazuhara M, Arita M, Fujiwara T, Saito K (2004) Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proc Natl Acad Sei USA 101:10205-10210 15. Hirai MY, Saito K (2004) Post-genomics approaches for the elucidation of plant adaptive mechanisms to sulphur deficiency. J Exp Bot 55:1871-1879 16. Saito K (2004) Functional genomics through integration of transcriptomics and metabolomics in y^rafe/c/op^w thaliana- Third Intemational Conference on Plant Metabolomics, Ames, Iowa, 3-6 June 2004, Abstracts, p 19 (http ://w WW. metabolomic s.nl/) 17. Nikiforova V, Freitag J, Kempa S, Adamik M, Hesse H, Hoefgen R (2003) Transcriptome analysis of sulfur depletion in Arabidopsis thaliana'- interlacing of biosynthetic pathways provides response specificity. Plant J 33:633-650


18. Borevitz JO, Xia Y, Blount J, Dixon RA, Lamb C (2000) Activation tagging identifies a conserved MYB regulator of phenylpropanoid biosynthesis. Plant Cell 12:2383-2393
19. Tohge T, Nishiyama Y, Hirai MY, Yano M, Nakajima J, Awazuhara M, Inoue E, Takahashi H, Goodenowe DB, Kitayama M, Noji M, Yamazaki A, Saito K (2005) Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants overexpressing an MYB transcription factor. Plant J 42:218-235
20. Yamazaki M, Nakajima J-I, Yamanashi M, Sugiyama M, Makita Y, Springob K, Awazuhara M, Saito K (2003) Metabolomics and differential gene expression in anthocyanin chemo-varietal forms of Perilla frutescens. Phytochemistry 62:987-995
21. Yamazaki Y, Urano A, Sudo H, Kitajima M, Takayama H, Yamazaki M, Aimi N, Saito K (2003) Metabolite profiling of alkaloids and strictosidine synthase activity in camptothecin producing plants. Phytochemistry 62:461-470
22. Fiehn O, Kopka J, Trethewey RN, Willmitzer L (2000) Identification of uncommon plant metabolites based on calculation of elemental compositions using gas chromatography and quadrupole mass spectrometry. Anal Chem 72:3573-3580
23. Roessner U, Wagner C, Kopka J, Trethewey RN, Willmitzer L (2000) Simultaneous analysis of metabolites in potato tuber by gas chromatography-mass spectrometry. Plant J 23:131-142
24. Roessner U, Luedemann A, Brust D, Fiehn O, Linke T, Willmitzer L, Fernie AR (2001) Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell 13:11-29
25. Fiehn O, Kopka J, Doermann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling for plant functional genomics. Nat Biotechnol 18:1157-1161
26. Fiehn O (2004) High quality automated analyses of Arabidopsis metabolic phenotypes: Ecotypes, environmental stress and crosses. Third International Conference on Plant Metabolomics, Ames, Iowa, 3-6 June 2004, Abstracts, p 8 (http://www.metabolomics.nl/)
27. Von Roepenack-Lahaye E, Degenkolb T, Zerjeski M, Franz M, Roth U, Wessjohann L, Schmidt J, Scheel D, Clemens S (2004) Profiling of Arabidopsis secondary metabolites by capillary liquid chromatography coupled to electrospray ionization quadrupole time-of-flight mass spectrometry. Plant Physiol 134:548-559
28. Ward JL, Harris C, Lewis J, Beale MH (2003) Assessment of 1H NMR spectroscopy and multivariate analysis as a technique for metabolite fingerprinting of Arabidopsis thaliana. Phytochemistry 62:949-957
29. Kikuchi J, Shinozaki K, Hirayama T (2004) Stable isotope labeling of Arabidopsis thaliana for an NMR-based metabolomics approach. Plant Cell Physiol 45:1099-1104
30. Dixon RA (2001) Phytochemistry in the genomics and post-genomics eras. Phytochemistry 57:145-148


31. Graham TL (1991) A rapid, high resolution high performance liquid chromatography profiling procedure for plant and microbial aromatic secondary metabolites. Plant Physiol 95:584-593

Chapter 11: Lipidomics: Metabolic Analysis of Phospholipids

Ryo Taguchi

Department of Metabolome, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

1. Introduction

In our long-running study of phospholipases, we found that soft-ionization mass spectrometry by electrospray ionization (ESI-MS) is effective for elucidating their substrate specificity at the level of individual molecular species of phospholipids, as well as their physiological functions [1-3]. We therefore began a comprehensive analysis with ESI-MS. ESI-MS offers several advantages for the analysis of phospholipids. First, ESI is a milder ionization method than matrix-assisted laser desorption/ionization (MALDI), so it is well suited to phosphoethers and lipid esters, which have relatively weak intramolecular bonds. Second, phospholipids are easily ionized by ESI because of their phosphoryl group and polar head group. Third, in lipid analysis most nonvolatile ions (salts) are removed during extraction with organic solvents, which is convenient when the analysis must be done without these salts. In addition, lipids, unlike peptides, include molecular species that differ from one another only in the number of unsaturated bonds. Liquid chromatography-mass spectrometry (LC/MS), which combines separation by high-performance liquid chromatography (HPLC) with separation by mass, was found to be effective [4]. Recent research has also shown that high-resolution, high-mass-accuracy analysis by Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) is effective for lipid analysis because it can distinguish each molecular species by its atomic composition [5]. We have performed various types of experiments, such as combined analysis with HPLC, analysis under different ionization conditions in positive and negative ion modes [4], high-resolution analysis at the level of 1 to 2 ppm [5], tandem mass analysis with fragment data, and quantitative analyses with multiple reaction monitoring and selected reaction monitoring. Among the many experimental modes, an appropriate analytical method should be chosen according to the purpose of each analysis and the accuracy it requires. We are now working to construct a lipid database and a search engine that will be useful for analyzing the mass data obtained from each analytical method.

2. Analysis by Mass Spectrometry for Lipid Metabolomes

In our studies, we selected several different approaches to mass spectrometric analysis of lipids (Table 1) [4-7]. The methods most widely used in metabolic analysis have been selected ion monitoring (SIM) and multiple reaction monitoring (MRM). These methods are normally used in combination with HPLC, as LC/MS. Individual metabolites are identified from their retention time and m/z value. MRM essentially relies on detecting the combination of a precursor ion and its major fragment ions. Even in this type of analysis, ESI-MS makes it possible to detect more than ten metabolites in a single LC run. MRM is commonly used in quantitative analysis by mass spectrometry. In both SIM and MRM, however, the target metabolites must be defined in advance, and their molecular masses and fragments must be known in order to set the analytical conditions. On the other hand, comprehensive analysis by soft ionization is used mainly for crude mixtures containing many different metabolites, with the aim of identifying as many of the metabolites in the sample as possible. Even without prior knowledge of a specific metabolite, significant differences in metabolite profiles can be obtained by mass analysis. In this case, however, focusing on a specific category of metabolites can help a researcher detect an important factor that is present only in a small amount. For this purpose, precursor ion scanning and neutral loss scanning are used for comprehensive analysis of focused categories of structurally similar metabolites [6]. With these methods, the detection sensitivity for individual metabolites is improved up to 100-fold. Focusing on a limited category of metabolites thus greatly improves the detection limit and makes it possible to detect minor but important metabolites. We are now attempting to optimize the collision conditions for individual metabolites so that these methods can be used to detect a specified class of phospholipids.
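The logic of precursor ion scanning and neutral loss scanning can be illustrated by filtering already acquired MS/MS spectra in software. The following is only a minimal sketch under stated assumptions: the class `Ms2Spectrum`, the function names, the tolerance, and the example spectra are hypothetical, and the two head-group values (the m/z 184.07 phosphocholine fragment and the 141.02 Da phosphoethanolamine loss) are standard values that are not quoted in this chapter. On a real instrument the scan is performed by the mass analyzers themselves, not after the fact.

```python
from dataclasses import dataclass, field

# Illustrative constants (standard values, not taken from this chapter):
PHOSPHOCHOLINE_FRAGMENT_MZ = 184.0733  # head-group fragment of PC and SM, positive mode
PE_HEAD_GROUP_LOSS = 141.0191          # neutral phosphoethanolamine lost from [M+H]+ of PE


@dataclass
class Ms2Spectrum:
    """Hypothetical container: one precursor and its fragment m/z values."""
    precursor_mz: float
    fragment_mzs: list = field(default_factory=list)


def precursor_ion_scan(spectra, fragment_mz, tol=0.3):
    """Return precursor m/z values whose MS/MS spectrum contains fragment_mz."""
    return [
        s.precursor_mz
        for s in spectra
        if any(abs(mz - fragment_mz) <= tol for mz in s.fragment_mzs)
    ]


def neutral_loss_scan(spectra, loss, tol=0.3):
    """Return precursor m/z values that show a fragment at (precursor - loss)."""
    return [
        s.precursor_mz
        for s in spectra
        if any(abs(mz - (s.precursor_mz - loss)) <= tol for mz in s.fragment_mzs)
    ]


# Hypothetical spectra, for illustration only.
spectra = [
    Ms2Spectrum(760.585, [184.073, 496.340]),  # behaves like a PC
    Ms2Spectrum(744.554, [603.535]),           # behaves like a PE
    Ms2Spectrum(500.000, [250.000]),           # unrelated ion
]
print(precursor_ion_scan(spectra, PHOSPHOCHOLINE_FRAGMENT_MZ))  # [760.585]
print(neutral_loss_scan(spectra, PE_HEAD_GROUP_LOSS))           # [744.554]
```

Restricting attention to one structural class in this way is what removes most of the chemical background, which is consistent with the sensitivity gain described above.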


Table 1. Several analytical methods for lipid metabolome by mass spectrometry

Comprehensive analysis focused on whole phospholipids
    LC/MS and cycle sequence
    High-resolution analysis by FT-ICR-MS
Group-specific analysis
    Neutral loss scanning
    Precursor ion scanning (class-specific or fatty acid-specific analysis)
Molecular-specific analysis
    SRM, selected reaction monitoring
    MRM, multiple reaction monitoring (quantitative analysis; combination of molecular-related ions and their fragments)

On the other hand, when MS/MS is not used, the combination of differences in elution time, as determined by HPLC separation, and mass resolution in terms of m/z values provides the key data for detecting individual metabolites. In this system, separation methods such as HPLC and capillary electrophoresis (CE) are used to reduce the number of metabolites eluting at the same retention time. Further, in FT-ICR-MS analysis, when a limited mass range of metabolites is selected, very small mass differences such as 0.001 can be effectively detected with a high resolution of 10^. Even with broadband analysis, a mass resolution of more than 100 000 and a mass accuracy of better than 2 ppm can be obtained [5]. At this resolution, the elemental composition of individual metabolites can be determined from their accurate mass values. Recently, untargeted metabolic analysis by FT-ICR-MS has been applied to metabolomics by several venture companies. We are planning to construct several different lipid databases and search tools for mass data obtained by different analytical methods, such as comprehensive analysis of whole phospholipids, focused analysis for a specific fatty acid or a specific polar head, or an identification method for the detection of a single individual metabolite.
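To make the relation between mass accuracy, mass difference, and resolution concrete, the short sketch below converts a ppm tolerance into an m/z window and estimates the resolving power needed to separate two peaks, using the common definition m/Δm. The function names and the example value of m/z 800 are assumptions chosen for illustration; they are not taken from the chapter.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(mz, ppm):
    """Lower and upper m/z bounds for a given ppm tolerance."""
    half_width = mz * ppm * 1e-6
    return mz - half_width, mz + half_width


def resolving_power_needed(mz, delta_mz):
    """Approximate resolving power (m / delta m) to separate two peaks."""
    return mz / delta_mz


# Illustrative numbers for an ion near m/z 800 (an assumed value).
low, high = tolerance_window(800.0, 2.0)
print(f"2 ppm window at m/z 800: {low:.4f} to {high:.4f}")
print(f"Resolving power for a 0.001 difference: {resolving_power_needed(800.0, 0.001):.0f}")
print(f"Error of 800.0012 vs 800.0000: {ppm_error(800.0012, 800.0):.1f} ppm")
```

At m/z 800, a 2 ppm accuracy corresponds to a window of about ±0.0016, which is why accurate mass alone can restrict the candidate elemental compositions so strongly.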

3. Features of Lipid Analysis by ESI-MS

In ionization by ESI, the observed ions vary according to the polar head group of the phospholipid. In addition, neutral lipids such as triglycerides can also be effectively ionized as ammonium-adduct ions in the presence of ammonium formate. The m/z values in both positive and negative ion modes are very important for identifying phospholipid classes. The type of polar head group and the hydrophilic-hydrophobic balance greatly influence ionization efficiency. For quantification, it is therefore necessary to correct for differences in ionization caused by the polar head group, fatty acyl chain length, and number of double bonds of individual metabolites. Using triple-stage quadrupole MS with a 1-h cycle sequence for LC/MS, most of the major phospholipids are roughly separated by polar head group (Fig. 1) [4]. In this analysis, in addition to normal detection of positive and negative ions, detection by in-source fragmentation at a higher collision energy was used. Individual molecular species of phospholipids were identified by pseudo-two-dimensional analysis of retention time and m/z value (Fig. 2).
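Because the observed ion depends on the lipid class and ion mode, it can help to tabulate the expected m/z for each adduct of a given neutral mass. The sketch below assumes four common adducts and uses standard mass shifts; the adduct list, the function name, and the two example lipids (triolein and PC 16:0/18:1 with their usual monoisotopic masses) are illustrative choices, not values stated in the chapter.

```python
PROTON = 1.007276  # mass of a proton (H minus one electron)

# Mass shift of each adduct relative to the neutral monoisotopic mass.
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+NH4]+": 18.033826,   # ammonium adduct, e.g. for neutral lipids
    "[M-H]-": -PROTON,
    "[M+HCOO]-": 44.998203,  # formate adduct in negative ion mode
}


def adduct_mz(neutral_mass, adduct):
    """Expected m/z of a singly charged adduct ion."""
    return neutral_mass + ADDUCT_SHIFTS[adduct]


# Illustrative neutral monoisotopic masses (assumed examples).
TRIOLEIN = 884.7833       # C57H104O6
PC_16_0_18_1 = 759.5778   # C42H82NO8P

print(f"Triolein [M+NH4]+ : {adduct_mz(TRIOLEIN, '[M+NH4]+'):.4f}")
print(f"PC 34:1  [M+H]+   : {adduct_mz(PC_16_0_18_1, '[M+H]+'):.4f}")
print(f"PC 34:1  [M+HCOO]-: {adduct_mz(PC_16_0_18_1, '[M+HCOO]-'):.4f}")
```

Observing the expected pair of positive-mode and negative-mode ions for the same retention time is one way to support the class assignment discussed above.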

Fig. 1. A two-dimensional map of phospholipids by mass spectrometry. PC, phosphatidylcholine; PE, phosphatidylethanolamine; SM, sphingomyelin; lysoPC, lyso phosphatidylcholine

[Figure: chromatogram traces; x-axis, retention time (min); peak labels include DG, MG, PC, lyso PG, lyso PI, lyso PE, lyso PA, SM, lysoPC, and PAF]

Fig. 2. Elution profile of phospholipids by liquid chromatography-electrospray ionization mass spectrometry with a silica column. PC, phosphatidylcholine; PE, phosphatidylethanolamine; PI, phosphatidylinositol; PS, phosphatidylserine; PG, phosphatidylglycerol; SM, sphingomyelin; MG, monoacylglycerol; DG, diacylglycerol; TG, triacylglycerol; FA, fatty acid; PAF, platelet activating factor

4. Several Practical Approaches to Metabolic Analyses of Lipids

4.1. Comprehensive Analysis of Phospholipids from Whole Cells by LC/MS

The most abundant molecular species of phosphatidylcholine (PC), diacyl 16:0/18:1 PC, can be effectively identified with a total of 10^-10^ cells. However, more than 10^ cells are required to detect much less abundant metabolites (about one-thousandth or less). In a typical analysis, a lipid mixture is extracted from 10^-10^ cells by the method of Bligh and Dyer. We analyzed the extract by liquid chromatography-electrospray mass spectrometry (LC/ESI-MS). For the identification of phospholipids, an effective database and search tool comparable to Mascot in proteomics does not yet exist. We are therefore constructing tools for identifying individual metabolites from their mass spectra, using our own experimental data and expanded theoretical data containing molecular-related ions and their fragment ions, along with a search engine named "Lipid Search" that has been available on our web site since 2003.

4.2. Analysis of Phospholipids of Human Serum

In the analysis of phospholipids from human serum by LC/ESI-MS, lyso phosphatidylethanolamine (PE) and lyso phosphatidylcholine (PC) were analyzed at the level of their molecular species. Among the molecular species of lyso phosphatidic acid (PA), both 1-acyl and 2-acyl types were detected. Further, for cyclic PA, a small amount of the alkyl type was also observed. We confirmed that the neutral loss scanning method is very effective for detecting these minor molecular species.

4.3. High-Resolution Analysis by FT-ICR-MS

An isotopic peak of PC containing one 13C atom was effectively separated and distinguished from a monoisotopic peak of sphingomyelin by practical broadband analysis with nano ESI-FT-ICR-MS. In FT-ICR-MS analysis, a mass difference of 0.01 amu can be easily detected at a resolution of 10^. As shown in Fig. 3, PC and sphingomyelin (SM) were identified without separation by HPLC. Also, in FT-ICR-MS analysis of PE from Caenorhabditis elegans, PE species with odd-numbered fatty acids were separated and distinguished from alkyl-acyl PE with even-numbered fatty acids [5]. Oxidized phospholipids are formed under oxidative stress, and these metabolites seem to be major components causing arteriosclerosis. Oxidized PCs from soybean were analyzed in the broadband analysis mode of nano ESI-FT-ICR-MS. Individual molecular species of oxidized PC, such as 34:3 diacyl PC with peroxide (+2O) (m/z 788.544) or 34:2 diacyl PC with peroxide (+2O) (m/z 790.560), were separated from their nonoxidized phospholipids [5]. With FT-ICR-MS, a resolution of more than 80 000 can be easily obtained even in broadband analysis mode. We assumed that individual molecular species of PC and PE with mass differences of 0.01-0.05 can be identified by high-resolution analysis using FT-ICR-MS. This method is extremely effective for identifying diacyl, alkyl-acyl, or alkenyl-acyl subclasses with very close m/z values, within 0.04. Several recent papers have indicated the existence of significant amounts of phospholipid molecular species with an odd-numbered acyl chain even in normal mammalian cells. For exact qualitative and quantitative determination of odd-chain phospholipids, both clear separation of the mass peaks of individual molecular species with FT-ICR-MS and determination of fatty acid fragments by MS/MS analysis are needed. Furthermore, the highly accurate, high-resolution mass values obtained by FT-ICR-MS are also expected to reveal the existence of unexpected or unknown metabolites or derivatives.
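The mass values quoted above can be checked from elemental compositions. The following sketch computes monoisotopic masses from a formula string and reproduces two points from the text: the protonated m/z of peroxidized 34:3 diacyl PC, and the small difference between a diacyl species with an odd total carbon number and an alkyl-acyl species with one more carbon. The formulas are assumptions derived from standard lipid structures (the PE pair is a hypothetical example), and the parser handles only simple formulas without parentheses.

```python
import re

# Monoisotopic masses of the elements needed for phospholipids.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
}
PROTON = 1.00727647


def monoisotopic_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C42H78NO8P'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


# 34:3 diacyl PC is C42H78NO8P; adding two oxygens gives C42H78NO10P.
oxidized_pc_34_3 = monoisotopic_mass("C42H78NO10P") + PROTON
print(f"[M+H]+ of 34:3 diacyl PC +2O: {oxidized_pc_34_3:.3f}")  # 788.544

# Hypothetical pair: diacyl PE 35:1 (C40H78NO8P) vs alkyl-acyl PE with
# 36 carbons and one double bond (C41H82NO7P). They differ by CH4 versus O.
difference = monoisotopic_mass("C41H82NO7P") - monoisotopic_mass("C40H78NO8P")
print(f"alkyl-acyl minus odd-chain diacyl: {difference:.4f}")  # 0.0364
```

The CH4-versus-O difference of about 0.036 is the kind of spacing referred to above as "within 0.04", and at m/z near 750 it calls for a resolving power of roughly 20 000 or more.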

[Figure: mass spectrum; x-axis, m/z (about 810 to 816); y-axis, intensity; labeled peaks include PC 38:5, PC 38:4, and SM 24:0]

Fig. 3. High-resolution analysis and identification of phosphatidylcholine and sphingomyelin by Fourier-transform ion cyclotron resonance-mass spectrometry


4.4. Other Applications

By applying the methods described above, we have obtained, through collaborations, experimental results suggesting new functions of phospholipids. These include identifying molecular species of phospholipids with very long fatty acyl chains [8], confirming the existence of cyclic PA in human serum albumin fractions [9], and determining the substrate specificities of two phospholipases A1 [10,11]. Further, a study was conducted to analyze phospholipids localized in a specific membrane domain in order to elucidate the specific function of each molecular species of phospholipids. With regard to membrane localization, we determined the concentrations of specific phospholipid molecules in lipid droplets [12]. The results indicated the existence of a special membrane organization in lipid droplets. Further, by using these methods we were able to find new biological substrates in addition to the known substrates determined by in vitro activities. By applying these techniques to cells that highly express an enzyme, in combination with some form of stimulation of the precursors in the system, the key roles of lipid-metabolizing enzymes in some physiological functions can be effectively elucidated. Thus we would like to emphasize the important role of metabolome analysis in the functional study of lipid-metabolizing enzymes.

5. A Search Tool for Lipid Metabolome

"Lipid Search" is a tool for identifying phospholipid molecular species from experimental mass data; it can be accessed from our home page (http://metabo.umin.ac.jp) [13]. The database for this search was constructed from essential core mass data of phospholipids commonly detectable in biological sources. Three different search windows are available for the identification of molecular species of phospholipids by "Lipid Search" (Fig. 4). The first window uses only molecular-related mass values in positive or negative ion mode. Here, the mass accuracy determines the reliability of the identification results. The mass tolerance in the search conditions is set according to the type of mass spectrometer used in each experiment. By pasting a table of mass values and their corresponding ion intensities into the search table, the user obtains the most probable lipid molecules and their corresponding compensated peak intensities. Furthermore, the candidate molecules can be narrowed down by selecting a possible class of phospholipid in advance, using data obtained from the LC retention time, or from precursor ion scanning or neutral loss scanning of the polar head group of phospholipids. The second search window is for identification from mass profile data obtained by precursor ion scanning of specific fragments of individual fatty acyl chains. Here almost all mass peaks of the lipid molecular species containing the fatty acid of interest can be obtained immediately. The third window requires data detected by MS/MS. The mass data for this window were constructed from combinations of the m/z value of a molecular-related ion and the m/z values of its fragment ions. In the third search window, as in the second, individual pairs of fatty acyl chains at the sn-1 and sn-2 positions can be obtained. Here, data for at least one fragment ion of a fatty acyl chain are required. The three windows can be selected from the main page of "Lipid Search."

[Figure: screenshot of the "Lipid Search" LS/MS (Lipid Search for Mass Spectrometry) search window, with a lipid class selector, a precursor m/z input table, a mass tolerance (Da) field, and a polarity (positive/negative) selector. Copyright (2003) Department of Metabolome, Graduate School of Medicine, the University of Tokyo.]

Fig. 4. Search window 1 in "Lipid Search" [13]. Identification results can be obtained by pasting a data table containing m/z values and ion intensities into this window. From May 2005, a revised version of Lipid Search that can be applied directly to raw mass data will be available
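The idea behind the first search window, matching an observed m/z against stored values within a mass tolerance and optionally restricting the lipid class, can be sketched as follows. This is not the actual "Lipid Search" implementation or its data: the table `LIPID_TABLE`, the function `search`, and the four [M+H]+ entries are hypothetical and were chosen only to show the matching step.

```python
# Hypothetical mini-database of [M+H]+ values: (m/z, class, name).
# These entries are illustrative and are not the "Lipid Search" data.
LIPID_TABLE = [
    (703.5749, "SM", "SM d18:1/16:0"),
    (744.5538, "PE", "PE 36:2"),
    (760.5851, "PC", "PC 34:1"),
    (786.6007, "PC", "PC 36:2"),
]


def search(observed_mz, tolerance_da, classes=None):
    """Return candidate names within tolerance_da, closest match first.

    classes: optional set of class labels used to narrow the candidates,
    for example when the class is already known from the retention time.
    """
    hits = []
    for mz, lipid_class, name in LIPID_TABLE:
        if classes is not None and lipid_class not in classes:
            continue
        error = abs(observed_mz - mz)
        if error <= tolerance_da:
            hits.append((error, name))
    return [name for _, name in sorted(hits)]


print(search(760.60, 0.3))                  # ['PC 34:1']
print(search(760.60, 0.3, classes={"PE"}))  # []
```

A wider tolerance, as would suit a low-resolution instrument, returns more candidates per peak, which is why the class restriction becomes more useful as mass accuracy decreases.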


6. Conclusions

At present, we are developing basic analytical methods, a database, and search tools for lipidomics, i.e., metabolomics focusing on lipids. We believe that our efforts to develop new analytical methods and identification tools will succeed only through collaboration with numerous cooperating researchers and with application personnel of MS manufacturers. Analysis by ESI-MS can be applied to elucidate lipid mediators in cells, as well as to detect, quantify, and localize minor lipid metabolites in cells. Bioinformatics studies based on both proteome data for enzymes and receptors and metabolome data for lipids will make it possible to acquire new knowledge of the metabolic and signaling pathways of lipid metabolism.

References

1. Han X, Zupan LA, Hazen SL, Gross RW (1992) Semisynthesis and purification of homogeneous plasmenylcholine molecular species. Anal Biochem 200:119-124
2. Murphy RC (1993) Mass spectrometry of lipids. Plenum Press, New York, pp 71-282
3. Kim HY, Salem N (1993) Liquid chromatography-mass spectrometry of lipids. Prog Lipid Res 32:221-245
4. Taguchi R, Hayakawa J, Takeuchi Y, Ishida M (2000) Two-dimensional analysis of phospholipids by capillary liquid chromatography/electrospray ionization mass spectrometry. J Mass Spectrom 35:953-966
5. Ishida M, Yamazaki T, Houjou T, Imagawa M, Harada A, Inoue K, Taguchi R (2004) High-resolution analysis by nano-electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry for the identification of molecular species of phospholipids and their oxidized metabolites. Rapid Commun Mass Spectrom 18:2486-2494
6. Houjou T, Yamatani K, Nakanishi H, Imagawa M, Shimizu T, Taguchi R (2004) Rapid and selective identification of molecular species in phosphatidylcholine and sphingomyelin by conditional neutral loss scanning and MS3. Rapid Commun Mass Spectrom 18:3123-3130
7. Houjou T, Yamatani K, Imagawa M, Shimizu T, Taguchi R (2005) A shotgun tandem mass spectrometric analysis of phospholipids with normal-phase and/or reverse-phase liquid chromatography/electrospray ionization mass spectrometry. Rapid Commun Mass Spectrom 19:654-666
8. Yokoyama K, Saitoh S, Ishida M, Yamakawa Y, Nakamura K, Inoue K, Taguchi R, Tokumura A, Nishijima M, Yanagida M, Setaka M (2001) Very-long-chain fatty acid-containing phospholipids accumulate in fatty acid synthase temperature-sensitive mutant strains of the fission yeast Schizosaccharomyces pombe fas2/lsd1. Biochim Biophys Acta 1532:223-233
9. Kobayashi T, Tanaka-Ishii R, Taguchi R, Ikezawa H, Murakami-Murofushi K (1999) Existence of a bioactive lipid, cyclic phosphatidic acid, bound to human serum albumin. Life Sci 65:2185-2191
10. Hosono H, Aoki J, Nagai Y, Bandoh K, Ishida M, Taguchi R, Arai H, Inoue K (2001) Phosphatidylserine-specific phospholipase A1 stimulates histamine release from rat peritoneal mast cells through production of 2-acyl-1-lysophosphatidylserine. J Biol Chem 276:29664-29670
11. Sonoda H, Aoki J, Hiramatsu T, Ishida M, Bandoh K, Nagai Y, Taguchi R, Inoue K, Arai H (2002) A novel phosphatidic acid-selective phospholipase A1 that produces lysophosphatidic acid. J Biol Chem 277:34254-34263
12. Tauchi-Sato K, Ozeki S, Houjou H, Taguchi R, Fujimoto T (2002) The surface of lipid droplets is a phospholipid monolayer with a unique fatty acid composition. J Biol Chem 277:44507-44512
13. Lipid Search: http://lipidsearch.jp and http://metabo.umin.ac.jp

Chapter 12: Chemical Diagnosis of Inborn Errors of Metabolism and Metabolome Analysis of Urine by Capillary Gas Chromatography/Mass Spectrometry

Tomiko Kuhara

Division of Human Genetics, Medical Research Institute, Kanazawa Medical University, 1-1 Daigaku, Uchinada-machi, Kahoku-gun, Ishikawa 920-0293, Japan

1. Use of Metabolite Analysis to Evaluate Enzymatic Reactions: Enzyme Deficiency, Lack of Coenzyme, or Substrate Overload

Urinary metabolite analysis can be used to evaluate enzyme deficiency, lack of coenzymes, or substrate overload by comparing the data of patients with those of age-matched, healthy controls. In the absence of a special diet or drugs, metabolite analysis can detect, almost comprehensively, enzyme dysfunctions that are caused by genetic abnormalities. Almost all mutations (known or unknown, common or uncommon) that result in a significant reduction in enzyme activity can be detected. Enzyme dysfunction can be due to an abnormal structure of an enzyme/apoenzyme, a reduced quantity of a normal enzyme/apoenzyme, or a lack of a coenzyme. It can also result from an abnormal regulatory gene, abnormal subcellular localization, or abnormal post-transcriptional or post-translational modification [1]. Urine contains numerous metabolites. Metabolome analysis of urine by the combined use of urease pretreatment, stable isotope dilution, gas chromatography/mass spectrometry (GC/MS), mass chromatography, and comparison with age-matched controls offers reliable data for the simultaneous screening or molecular diagnosis of over 130 inborn errors of metabolism (IEMs). These IEMs include hyperammonemias, lactic acidemias, organic acidemias, and IEMs of amino acids, pyrimidines, purines, carbohydrates, and other metabolites.
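The comparison of a patient's urinary metabolite levels with those of age-matched controls can be sketched as a simple deviation score. The chapter does not specify the statistical procedure, so everything here is an assumption made for illustration: the log10 transformation, the use of standard-deviation units, the threshold of 3, the function names, and the invented control and patient values (arbitrary units).

```python
import math
import statistics


def log_z_score(patient_value, control_values):
    """Deviation of a patient value from controls, in SD units of log10 values."""
    logs = [math.log10(v) for v in control_values]
    mean = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return (math.log10(patient_value) - mean) / sd


def flag_abnormal(patient, controls, threshold=3.0):
    """Return metabolites whose absolute score reaches the (assumed) threshold."""
    flagged = {}
    for metabolite, value in patient.items():
        z = log_z_score(value, controls[metabolite])
        if abs(z) >= threshold:
            flagged[metabolite] = round(z, 1)
    return flagged


# Invented values for illustration only (arbitrary units).
controls = {
    "orotate": [10, 12, 9, 11, 10],
    "uracil": [5, 6, 5, 4, 5],
}
patient = {"orotate": 100, "uracil": 5}
print(sorted(flag_abnormal(patient, controls)))  # ['orotate']
```

In practice the control group would be much larger and matched by age, as the text emphasizes, because normal urinary metabolite levels change markedly during development.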


2. The Role of Metabolome Analysis in Early, Rapid, and Differential Diagnosis

Emergency diagnostic laboratory evaluation should cover all differential diagnoses that are therapeutically relevant, and should always include ammonia, glucose, lactate, and acid-base status as well as a urine test for ketones. These tests are indispensable to the planning and execution of the first steps of metabolic emergency treatment, and the results should be available within 30 min [2]. Information on blood pH, ammonia, lactate, glucose, acid-base status, and ketones is important but does not identify the etiology of the patient's illness. For hyperammonemia there are more than 42 etiologies; for lactic acidemia, more than 24; and for IEMs of methylmalonate, sulfur-containing amino acids, vitamin B12, and folate, more than 22. Metabolome analysis of urine by urease pretreatment of urine or of eluates from urine dried on filter paper, stable-isotope dilution, and GC/MS enables the simultaneous screening and molecular diagnosis of numerous IEMs. Early, rapid, and differential diagnosis is most effectively achieved by this procedure.
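The quantification step in stable-isotope dilution follows a simple ratio: a known amount of isotope-labeled standard is added to the sample, and the analyte amount is obtained from the ratio of the two peak areas. The sketch below shows only this general relation. The function name, the default response factor of 1.0, and the example numbers are assumptions for illustration; a real assay would use a calibrated response factor and is often normalized further, which is not shown here.

```python
def isotope_dilution_amount(analyte_area, standard_area, standard_amount,
                            response_factor=1.0):
    """Estimate the analyte amount from peak areas and a labeled internal standard.

    response_factor is the analyte/standard response ratio at equal amounts;
    1.0 is an assumed default, not a measured value.
    """
    if standard_area <= 0:
        raise ValueError("internal standard peak area must be positive")
    return analyte_area / standard_area * standard_amount / response_factor


# Invented example: 50 nmol of labeled standard added to the sample.
amount = isotope_dilution_amount(analyte_area=2000.0, standard_area=1000.0,
                                 standard_amount=50.0)
print(amount)  # 100.0
```

Because the labeled standard passes through the same pretreatment and derivatization as the analyte, losses during sample preparation largely cancel in the area ratio.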

2.1. Differential Diagnosis of Hyperammonemia

As shown in Table 1, there are more than 42 etiologies that give rise to hyperammonemia. Primary hyperammonemia is caused by disorders in any of six urea cycle enzymes and two membrane transport systems; the two transport disorders are known as hyperornithinemia-hyperammonemia-homocitrullinuria (HHH) syndrome (MIM 258870) and lysinuric protein intolerance (LPI, MIM 247900) [3-5]. Of these eight disorders, deficiencies of carbamoylphosphate synthetase and N-acetylglutamate synthetase do not elevate the urinary levels of uracil and/or orotate [6]. In the remaining six disorders, the utilization of carbamoylphosphate is impaired, and carbamoylphosphate accumulates in the mitochondria. The activity of carbamoylphosphate synthase is 100 times lower in the cytosol than in the mitochondria, but carbamoylphosphate accumulation in the mitochondria causes an increase in the de novo synthesis of pyrimidine in the cytosol, inducing a marked increase in uracil and/or orotate. Thus, orotate and uracil are specific indicators for the screening and diagnosis of these six primary hyperammonemias. The absence of an increase in these two indicators, together with rather persistent hyperammonemia, hyperglutaminuria, pyroglutamic aciduria, and alaninuria, strongly suggests a primary hyperammonemia caused by one of the first two urea cycle disorders (primary hyperammonemia in Table 1).


Ornithine transcarbamylase (EC 2.1.3.3) is a hepatic mitochondrial protein that catalyzes the formation of citrulline from ornithine and carbamoylphosphate. An ornithine transcarbamylase deficiency (MIM 311240) does not always show an increase in specific amino acids except for pyroglutamate, glutamine, alanine, or proline. Argininosuccinate synthase (EC 6.3.4.5), the rate-limiting enzyme in the urea cycle, is located in the hepatic cytosol. It catalyzes the formation of argininosuccinate from citrulline and aspartate. A deficiency of argininosuccinate synthase causes citrullinemia (MIM 215700). Argininosuccinate lyase (EC 4.3.2.1) catalyzes the formation of arginine. A deficiency of this enzyme (MIM 207900) causes argininosuccinic acidemia. Arginase (EC 3.5.3.1), a cytosolic enzyme, releases urea and ornithine from arginine. An arginase deficiency (MIM 207800) causes hyperargininemia. In the HHH syndrome, ornithine and homocitrulline increase, and in LPI, lysine markedly increases in the urine but not in the serum. Therefore, to prevent the misdiagnosis of LPI, examination of the urine is critical. For patients with orotic aciduria and/or uraciluria, the levels of citrulline, arginine, and homocitrulline are further examined using a conventional amino acid autoanalyzer or soft ionization MS, such as fast atom bombardment or electrospray ionization (not GC/MS). The patients are thereby differentiated among ornithine transcarbamylase deficiency, citrullinemia, argininosuccinic aciduria, arginase deficiency, LPI, and HHH syndrome. In the urease-pretreatment procedure, orotate and uracil are determined sensitively and quantitatively by the stable isotope dilution method, and lysine can be quantified and ornithine measured semiquantitatively using d4-lysine as an internal standard [7]. Recently, liver transplantation has become available as a treatment for IEMs. The urinary levels of orotate and uracil returned to normal after a female patient with ornithine transcarbamylase deficiency received a living-related liver transplant (not shown). Thus, the stable isotope dilution method is used not only to make a diagnosis but also to monitor and/or evaluate treatment, including liver transplantation, for disorders of carbamoylphosphate utilization. It should also be applicable to experiments using animal models of these diseases, especially in the development of new therapies, including gene therapy [8]. Secondary hyperammonemia is caused by several organic acidemias and by other IEMs that cause hepatic dysfunction, such as tyrosinemia or Wilson's disease. Eleven organic acidemias, galactosemia, and hereditary fructose intolerance can be chemically diagnosed [9-11]. Fatty acid oxidation defects associated with rather persistent 3-hydroxydicarboxylic aciduria can also be screened [12].
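The reasoning used above to narrow down primary hyperammonemia can be written as a short sequence of rules. The sketch below is only an illustration of that reasoning, not a clinical tool: the function name, the marker labels, the order in which the rules are tested, and the use of argininosuccinate as a marker are assumptions, and real findings are quantitative and often overlap.

```python
def primary_hyperammonemia_candidates(orotate_or_uracil_elevated, elevated):
    """Suggest candidate disorders from a set of elevated marker names.

    orotate_or_uracil_elevated: True if urinary orotate and/or uracil is raised.
    elevated: set of marker labels (hypothetical labels used in this sketch).
    """
    if not orotate_or_uracil_elevated:
        # Carbamoylphosphate is not accumulating upstream of its utilization.
        return [
            "carbamoylphosphate synthetase deficiency",
            "N-acetylglutamate synthetase deficiency",
        ]
    if "urinary lysine" in elevated:
        return ["lysinuric protein intolerance (LPI)"]
    if {"ornithine", "homocitrulline"} <= elevated:
        return ["HHH syndrome"]
    if "argininosuccinate" in elevated:
        return ["argininosuccinate lyase deficiency"]
    if "citrulline" in elevated:
        return ["citrullinemia (argininosuccinate synthase deficiency)"]
    if "arginine" in elevated:
        return ["arginase deficiency"]
    # No specific amino acid marker: consistent with the description of
    # ornithine transcarbamylase deficiency in the text.
    return ["ornithine transcarbamylase deficiency"]


print(primary_hyperammonemia_candidates(False, set()))
print(primary_hyperammonemia_candidates(True, {"citrulline"}))
print(primary_hyperammonemia_candidates(True, {"glutamine", "alanine"}))
```

The first branch reflects the central point of this section: orotate and uracil separate the six disorders of carbamoylphosphate utilization from the two disorders upstream of it.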


Table 1. There are more than 42 etiologies in hyperammonemia

I. Primary hyperammonemia (8)
  (1) Urea cycle disorder
    • 1 Carbamoylphosphate synthase def.
    • 2 N-Acetylglutamate synthase def.
    ☆ 3 Ornithine transcarbamylase def.
    ☆ 4 Argininosuccinate synthase def.
    ☆ 5 Argininosuccinate lyase def.
    ☆ 6 Arginase def.
  (2) Membrane transport disorder
    ☆ 7 Lysinuric protein intolerance (LPI)
    ☆ 8 Hyperornithinemia-hyperammonemia-homocitrullinemia (HHH) syndrome
II. Secondary hyperammonemia (34)
  (1) Organic acidemia (11)
    * 1 Branched chain keto acid DH def. (MSUD)
    * 2 Isovaleryl-CoA DH def.
    * 3 Multiple carboxylase def.
    * 4 β-Methylcrotonyl-CoA carboxylase def.
    * 5 β-Ketothiolase def.
    * 6 Propionyl-CoA carboxylase def.
    * 7 Methylmalonyl-CoA mutase def.
    * 8 β-Hydroxy-β-methylglutaryl-CoA lyase def.
    * 9 Multiple acyl-CoA DH def.
    * 10 Dihydrolipoyl DH (E3) def.
    ☆ 11 Tyrosinemia type I
  (2) Lactic acidemia (8)
    • 1 Pyruvate DH (E1) def.
    • 2 Dihydrolipoyl transacetylase (E2) def.
    • 3 Pyruvate carboxylase def.
    • 4 Oxidative phosphorylation defect (I-V) (5)
  (3) Fatty acid oxidation defects (7)
    ☆ TFP, ☆ LCHAD, • MCAD, • MCKT, VLCAD, CAT, CPTII
  (4) Hepatic dysfunction (inherited) (5)
    ☆ 1 Galactosemia type I


Table 1. (continued)

    ☆ 2 Hereditary fructose intolerance
    3 Glycogen storage disease type IV
    • 4 Citrin deficiency
    5 Hemochromatosis
  (5) Other causes (4)
    1 Transient neonatal hyperammonemia
    2 Hepatic failure
    3 Portacaval shunt
    4 Obstructive urinary tract infection

*, Differential molecular diagnosis (DMD) [13]; DMD: ☆, screened (SC) [8], orotate ↑ (6), polar 3-hydroxydicarboxylate ↑ (2); •, screened (SC) but not always [13]
def., deficiency; DH, dehydrogenase

2.2. Differential Diagnosis of Lactic Acidemia

In lactic acidemia, the serum lactate concentration is higher than 2 mM. More than 24 etiologies are known for lactic acidemia (Table 2). Primary lactic acidemia includes pyruvate dehydrogenase complex disorders, oxidative phosphorylation (OXPHOS) disorders, gluconeogenesis disorders, and membrane transport defects [13]. In OXPHOS disorders, mutations in both nuclear and mitochondrial DNA are involved, except for mutations that involve complex II. Nuclear DNA encodes 70 OXPHOS subunits. Mitochondrial DNA encodes 12 OXPHOS-associated subunits, 22 tRNAs, and 2 rRNA subunits. Mutations in mitochondrial DNA cause a complex phenotypic expression, both metabolically and clinically. Mitochondrial DNA is susceptible to mutation, and a large number of mutations accumulate at a high incidence. Eight organic acidemias are secondary lactic acidemias. These organic acidurias are differentially and chemically diagnosed. Lactic acidemia and lactic aciduria are observed during hypoglycemic episodes in patients with gluconeogenesis disorders. In addition, the sensitive and specific indicators glycerol-3-phosphate and glycerol increase in fructose-1,6-diphosphatase deficiency. Glucose-6-phosphatase deficiency gives specific metabolic profiles during hypoglycemic episodes. Of the 24 etiologies, all can be screened except for very mild lactic acidemia that causes no lactic aciduria. Differential molecular diagnosis is applicable to 8 disorders, or to 10 during episodes in gluconeogenesis disorders.
During remission, the profiles for patients with gluconeogenesis disorders show no abnormality.

Table 2. There are more than 24 etiologies in lactic acidemia

I. Pyruvate dehydrogenase complex disorders (4)
    1 Pyruvate DH (E1) def.
    2 Dihydrolipoyl transacetylase (E2) def.
      Dihydrolipoyl DH (E3) def.
    3 Pyruvate DH phosphatase def.
    4 E3 binding protein def.
II. Oxidative phosphorylation (OXPHOS) (5)^a
    1 Complex I
    2 Complex II
    3 Complex III
    4 Complex IV
    5 Complex V
III. Gluconeogenesis disorders (4)
    1 Pyruvate carboxylase def.
    2 Phosphoenolpyruvate carboxykinase def.
    ☆ 3 Fructose-1,6-bisphosphatase def.
    ☆ 4 Glucose-6-phosphatase def.
IV. Membrane transport (3)
    1 Pyruvate: mt
    2 NADH: mt
    3 Lactate: cytoplasmic
V. Organic acidurias (8)
    * 1 Branched chain keto acid DH def. (MSUD)
    * 2 Lipoamide DH (E3) def.
    * 3 Isovaleryl-CoA DH def.
    * 4 Multiple carboxylase def.
    * 5 β-Methylcrotonyl-CoA carboxylase def.
    * 6 β-Ketothiolase def.
    * 7 Propionyl-CoA carboxylase def.
    * 8 Methylmalonyl-CoA mutase def.

All screened (SC) except for very mild lactic acidemia and during remission in gluconeogenesis. Differential molecular diagnosis (DMD) 8 (*), but 10 (☆) during episode in gluconeogenesis
def., deficiency; DH, dehydrogenase
^a Nuclear DNA encodes 70 OXPHOS subunits. mt DNA encodes 12 OXPHOS, 22 tRNA, and 2 rRNA subunits


2.3. Inborn Errors of Metabolism of Sulfur-Containing Amino Acids, Cobalamin, and Folate, and Methylmalonic Acidemia

There are more than 22 etiologies of methylmalonic aciduria and disorders of the metabolism of sulfur-containing amino acids, cobalamin, and folate. The trans-sulfuration pathway converts the sulfur atom of methionine into the sulfur atom of cysteine, and produces more methionine by the methylation of homocysteine. Homocystinuria types I, II, and III are characterized by different etiologies, biochemical abnormalities, and therapeutic measures. In type I, a deficiency of cystathionine β-synthase (L-serine hydro-lyase [adding homocysteine]; EC 4.2.1.22), homocysteine increases, resulting in methionine overproduction. A simple treatment with pyridoxine for the pyridoxine-responsive type, or a dietary restriction of methionine and supplementation with cystine for the pyridoxine-unresponsive type, greatly improves the outcome of affected infants [14]. In type II, in which remethylation is defective owing to a deficiency of N5,N10-methylenetetrahydrofolate reductase (5-methyltetrahydrofolate: [acceptor] oxidoreductase; EC 1.1.99.15) (MTHFR, EC 1.1.1.68), folate and betaine may have the advantage of lowering the homocysteine levels and increasing the methionine levels [15]. Recently, the importance of the role of folate and of early detection of type II patients has been stressed [16,17]. Type III is caused by a deficiency of N5-methyltetrahydrofolate homocysteine methyltransferase (S-adenosyl-L-methionine: L-homocysteine S-methyltransferase, EC 2.1.1.10), resulting in the defective synthesis of methylcobalamin and deoxyadenosylcobalamin. This condition, caused by a genetic mutation or by nutritional vitamin B12 deficiency, is accompanied by combined homocystinuria and methylmalonic aciduria [18].
If the neonatal screening method for homocystinuria targets methionine in filter-paper blood spots, type II is not detected, because it causes low or relatively normal levels of plasma methionine with moderate homocystinuria. Furthermore, isolated hypermethioninemia due to a deficiency of hepatic methionine adenosyltransferase (S-adenosylmethionine synthetase; ATP: L-methionine S-adenosyltransferase; EC 2.5.1.6) is detected as well; the latter is thought to be free of clinical symptoms in most cases, indicating that the accumulation of methionine in the body is not harmful. The simplified urease pretreatment is able to differentiate the three types of homocystinuria by the simultaneous measurement of methionine, homocystine, methylmalonate, uracil, and creatinine in filter-paper urine samples, when each respective stable isotope-labeled compound is used [19]. The total ion current (TIC) and mass chromatograms of the trimethylsilyl derivatives of metabolites from a male patient with type I are shown in Figs. 1 and 2.


Fig. 1. Total ion chromatogram (TIC) of the trimethylsilyl derivatives of metabolites from the urine of a patient with homocystinuria type I during transient megaloblastic anemia due to folate and vitamin B12 deficiency. The major components of the peaks are: 1, glycine; 2, β-aminoisobutyrate; 3, phosphate and leucine; 4, erythritol; 5, threitol; 6, methionine and internal standard (IS); 7, tetronate; 8, creatinine and IS; A, orotate and 15N2-orotate (IS); 9, mannitol; 10, urate; 11, n-heptadecanoate spiked; 12, cystine and IS; 13, pseudouridine; B, homocystine-d8 (IS); C, homocystine


Fig. 2. Part of the total ion current (TIC) and mass chromatograms of Fig. 1. a m/z 254 (256) and 357 (359) for orotate. b m/z 278 (282) and 128 (131) for homocystine


Urinary metabolite levels determined using the simplified urease procedure were compared before and after treatment with folate in a patient who had temporarily developed megaloblastic anemia and whose serum folate had been below the normal range. With this treatment, the megaloblastic anemia disappeared, the level of orotate decreased to the normal range, the methionine level doubled, and the homocystine level decreased markedly but remained distinctly higher than the control range. Thymidylate synthase, which catalyzes the conversion of deoxyuridine monophosphate (dUMP) to deoxythymidine monophosphate (dTMP), is folate-dependent, and pyrimidine biosynthesis is regulated by end-product inhibition. Folate deficiency thus causes impaired DNA synthesis, enhanced pyrimidine biosynthesis, megaloblastic anemia, and orotic aciduria. Folate supplementation significantly reduced the level of homocystine and dramatically increased that of methionine. Therefore, this simple yet sophisticated diagnostic procedure has proved useful for monitoring the biochemical and nutritional conditions of patients, especially for acquired deficiency of folate and vitamin B12, as well as for evaluating the efficacy of treatments [19].

3. Tailor-Made Medicine and Disease Diagnosis

3.1. Inborn Errors of Pyrimidine and Purine Metabolism

Inherited enzyme defects in the de novo synthesis of purines and pyrimidines or in their salvage and catabolism can cause alterations in cellular nucleotide patterns and the accumulation of normal or abnormal purines, pyrimidines, and their degradation products in body fluids. Twenty-four disorders have now been recognized. These defects manifest clinically with a broad spectrum of symptoms, including severe neurological abnormalities, fatal immunodeficiency, anemia, or urolithiasis. Inborn errors of purine and pyrimidine metabolism still present a diagnostic problem [20]. To date, several methods have been reported for this diagnosis, including high-performance liquid chromatography (HPLC), thin-layer chromatography, capillary electrophoresis, GC/MS, and nuclear magnetic resonance [21-26]. These methods, however, analyze only a part of the index metabolites in urine, cannot readily be used for quantification, lack specificity or sensitivity, or require further analysis for a differential diagnosis [27]. Recently, high-performance liquid chromatography-electrospray ionization-MS/MS (LC/ESI-MS/MS) methods have been developed for this purpose, but they still cannot be used for the accurate and concurrent determination of several metabolites [28,29]. Thus, metabolome analysis should be employed to screen and make a chemical diagnosis of these defects. Here, we describe a rapid and specific procedure for the chemical diagnoses of pyrimidine degradation disorders, Lesch-Nyhan syndrome, and adenine phosphoribosyl transferase deficiency using GC/MS.

3.2. Deficiencies of Pyrimidine Degradation

The pyrimidine nucleic acid bases, thymine (T) and uracil (U), are degraded by four successive enzyme reactions in humans, as shown in Fig. 3. The degradation is catalyzed by dihydropyrimidine dehydrogenase (5,6-


Fig. 4. Tricarboxylic acid (TCA) cycle. NAD, nicotinamide adenine dinucleotide; FAD, flavin adenine dinucleotide; NADH, FADH, reduced NAD, FAD; CoA, coenzyme A


Fig. 5. Examples of chirality- or aromaticity-dependent symmetry. Arabitol and xylitol have different chiralities of carbon atoms, and only arabitol is stereochemically symmetric. Likewise, carboxylic acid is symmetric


Fig. 6. Amino-transfer reaction. The amino group of L-serine is transferred to L-alanine, and the rest of the structure to hydroxypyruvate. Since the structures of these molecules are all similar, it is impossible to predict the atomic correspondence from their structures alone


2.4. Nitrogen Metabolism

The major part of the traditional metabolic map describes carbon metabolism, and the metabolism of other elements (e.g., nitrogen or sulfur) is comparatively less well treated. For example, the relationship of the urea cycle with other amino-transfer reactions is not explicitly shown in the metabolic map. Figure 6 shows a typical amino-transfer reaction in which the amino group in L-serine is transferred to L-alanine. In nitrogen metabolism, this reaction constitutes a pathway between L-serine and L-alanine, whereas in carbon metabolism, the same reaction forms a pathway between L-serine and hydroxypyruvate. From this example, it is clear that the notion of pathways depends on the atomic element under focus.

In summary, the following data are required to correctly trace metabolic pathways:
• Structures of metabolites, including information on stereochemistry, aromaticity, and tautomerization
• The atom-to-atom correspondence between metabolites in each catalytic transformation

These indispensable data for computing pathways are not provided by traditional metabolic databases. So far, metabolic databases have been designed to display prepared (or fixed) views to users, and do not support functions to search or reconstruct pathways on demand.

3. ARM Database

The Atomic Reconstruction of Metabolism (ARM) Database is designed to search pathway models in metabolism [3].

3.1. What is a Pathway Model?

The ideal functionality of metabolic databases is often described as the prediction of unknown pathways or the design of new metabolic maps. Such ideas are generalized to the notion of "pathway models." A model is an abstract description of experimental observation. In the study of metabolism, a model may refer to a set of differential equations used in quantitative simulations, or a set of biochemical reaction formulas for qualitative simulations. The ability to select a model flexibly is an important prerequisite of biological simulation. Few software systems, however, have


been designed for the flexible selection of models. For example, most software tools for metabolic simulation aim to reproduce biological mechanisms (by changing parameters) on a fixed model. Most optimization tools output only a limited number of optimal models. From a practical standpoint, it is crucial that they provide multiple options so that each user can select the most persuasive, elegant model to explain the given biological mechanism. Database designers should be aware of how model selection is supported in their databases. In metabolic study, the fundamental step in model selection is tracing (or searching) pathways. The ARM database uses a graph representation of the entire metabolic network to achieve this process.

3.2. Graph Representation of the Metabolic Network

A "graph" is defined by a set of nodes and a set of edges, each connecting two nodes as in a railroad map. For the basic terms of graph theory, refer to standard textbooks [4]. To transform metabolic reactions into a graph, the software tool in ARM first detects the structural correspondence in each reaction formula at the atomic scale. Figure 7 shows the transformation corresponding to the reactions in Figs. 3 and 6. The set of biochemical reactions is then transformed into a graph where nodes and edges represent metabolites and their substructural correspondences, respectively. In this graph, any metabolic pathway forms a graph path. Note that the reverse does not always hold. The software tool in ARM computes a metabolic pathway between two given compounds as follows.

The algorithm to search a pathway
1. Let an integer value k = 0.
2. Find the kth shortest graph path P between the given nodes.
3. If P does not represent a valid pathway, increment k and go to Step 2.
4. Output P.

Since not all graph paths are valid metabolic pathways, an algorithm to find the kth shortest path for any k is necessary [5].
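
The search loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ARM implementation: paths are enumerated by a simple best-first search over simple paths (a stand-in for the k-shortest-paths algorithm of [5]), every edge carries a hypothetical atom mapping from substrate atom indices to product atom indices, and a path counts as "valid" when at least one atom of the start compound can be traced to the end compound. The names `paths_by_length`, `traced_atoms`, `find_pathway`, and the toy graph are all invented for this example.

```python
import heapq
from itertools import count


def paths_by_length(graph, source, target):
    """Yield simple paths from source to target, shortest (fewest edges) first."""
    tie = count()  # tie-breaker so the heap never compares two path lists
    heap = [(0, next(tie), [source])]
    while heap:
        length, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == target:
            yield path
            continue
        for nxt in graph.get(node, {}):
            if nxt not in path:  # keep paths simple (no repeated metabolite)
                heapq.heappush(heap, (length + 1, next(tie), path + [nxt]))


def traced_atoms(graph, path):
    """Compose the per-edge atom mappings along a path.

    Returns {atom index in path[0]: atom index in path[-1]}; an empty dict
    means that no atom is carried through the whole path.
    """
    mapping = None
    for a, b in zip(path, path[1:]):
        step = graph[a][b]
        if mapping is None:
            mapping = dict(step)
        else:
            mapping = {src: step[mid] for src, mid in mapping.items() if mid in step}
    return mapping or {}


def find_pathway(graph, source, target):
    """Steps 1-4 of the search: return (k, P) for the first valid path P."""
    for k, path in enumerate(paths_by_length(graph, source, target)):
        if traced_atoms(graph, path):
            return k, path
    return None


# Hypothetical toy network: graph[substrate][product] = atom mapping.
# A-B-C is the shortest graph path, but the atom that enters B (index 0)
# is not the atom that B passes on to C (index 1), so it is not a pathway.
toy_graph = {
    "A": {"B": {0: 0}, "D": {0: 0}},
    "B": {"C": {1: 0}},
    "D": {"E": {0: 1}},
    "E": {"C": {1: 2}},
}

result = find_pathway(toy_graph, "A", "C")
```

In this toy network the shortest graph path (k = 0) is rejected and the longer route through D and E is returned, which is the situation the note "the reverse does not always hold" describes.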
To check the validity of a pathway, reachability between the given compounds must be verified.

3.3. Digitization of Molecular Structures

Molecular structures are described in either the MOL or the SMILES format [6,7]. In the ARM database, molecular structures are represented as graphs


where nodes and edges correspond to atoms and their bonds, respectively. Aromatic rings are detected by applying the Hueckel rule to all cyclic parts in the structure [8]. By this rule, nucleic acids and heme are found to contain aromatic rings.

The Hueckel Rule
A ring structure is aromatic if it is physically planar and contains (4n+2) π electrons (n: non-negative integer). Each double bond in the conjugation (alternating single and double bonds) contributes two π electrons, and each nitrogen, oxygen, or sulfur that is not doubly bonded contributes two π electrons.

A classic algorithm called the Morgan method is used to detect the symmetry of molecular graphs. In the Morgan method, each graph node is labeled with an integer value that is iteratively updated so that topologically equivalent nodes receive the same integer label [9].

The Morgan Algorithm
1. For each graph node, initialize its label as its node degree.
2. For each node, update its label as the sum of all the integer labels of its adjacent nodes and its current label.
3. While the updating procedure results in a finer classification of nodes, repeat Step 2.
4. Consider nodes with the same integer labels as topologically equivalent.

The algorithm fails to detect the symmetry of a graph in which all nodes share the same degree (i.e., a regular graph). Theoretically, the problem of finding topological symmetry is related to the graph isomorphism problem [10]. For molecular structures in metabolism, however, such difficulty does not arise, and a variant of the Morgan method that considers atomic number and chirality suffices [11].
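
The four steps of the Morgan algorithm translate almost directly into code. The following is a minimal sketch of the plain, degree-based version listed above, not the ARM variant that also considers atomic number and chirality; the function name `morgan_classes` and the adjacency-dictionary input are assumptions made for this example.

```python
def morgan_classes(adjacency):
    """Group nodes into topological equivalence classes (plain Morgan method).

    adjacency maps each node to the list of its neighbours (undirected graph).
    """
    # Step 1: the initial label of a node is its degree.
    labels = {v: len(nbrs) for v, nbrs in adjacency.items()}
    n_classes = len(set(labels.values()))
    while True:
        # Step 2: new label = current label + labels of all adjacent nodes.
        updated = {
            v: labels[v] + sum(labels[u] for u in adjacency[v]) for v in adjacency
        }
        n_updated = len(set(updated.values()))
        # Step 3: stop as soon as the classification no longer becomes finer.
        if n_updated <= n_classes:
            break
        labels, n_classes = updated, n_updated
    # Step 4: nodes sharing a label are considered topologically equivalent.
    groups = {}
    for v, label in labels.items():
        groups.setdefault(label, []).append(v)
    return sorted(sorted(group) for group in groups.values())


# Carbon skeleton of n-pentane, C1-C2-C3-C4-C5.
pentane = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
pentane_classes = morgan_classes(pentane)

# A six-membered ring is a regular graph: every node keeps the same label.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring_classes = morgan_classes(ring)
```

For the pentane skeleton the two terminal carbons, the two neighbouring carbons, and the central carbon fall into three classes. For the ring the labels never separate, which is harmless here because all ring atoms really are equivalent, but it illustrates why regular graphs give the method nothing to work with.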


Fig. 7. Graphic representation of biochemical reactions. Metabolites and their (sub)structural relationships become nodes and edges, respectively. The figure shows carbon mappings


3.4. Drawing Molecular Structures

Since the SMILES format does not provide the coordinate information of molecules, an algorithm to draw structures is required for their graphical display. Since most molecular structures in metabolism are simple, the software tool in ARM uses the following procedure.

The algorithm to draw structures
1. Find biconnected components (rings) of the structure.
2. For each biconnected component:
   A) Find its cycle basis C so that the sum of its component-cycle sizes in C is the minimum of all cycle bases.
   B) Draw the largest cycle in C using an equilateral polygon.
   C) Draw the cycles adjacent to the most recently drawn cycle so that they spread outward.
3. Sort all the biconnected components, and consider the entire structure as a tree with the largest component at its root. From the root, place each component breadth-first, and connect them by drawing branches. Branches are always drawn outward.
4. After the layout of all components, detect any pairs of nodes that are too close.
5. For all edges in the shortest path between a detected close pair, adjust the length and the angle so that no node pairs will be too close.

3.5. Digitization of Enzymatic Reactions

If the mass balance of a metabolic reaction is equilibrated, all atoms on one side (say, substrates) must correspond one-to-one to the atoms on the other side (products). In the ARM database, such structural relationships at the atomic scale are digitized as a set of atomic correspondents, i.e., atomic position-pairs between substrates and products. Atomic correspondents can be grouped for each substrate-product pair of molecules, as shown in Figs. 3 and 6. Each group is called an atomic mapping [3]. Structural comparison is applied molecule-wise for both sides of a reaction: first substrate and first product, second substrate and second product, and so on. In each comparison, the largest common substructure between two metabolites is computed and registered as one mapping.
Then, leftover structures are collected and compared again until all atoms are matched. For example, in the reaction EC 2.2.1.1: Rib5P+Xyl5P=Sed7P+GA3P


Rib5P and Sed7P, and Xyl5P and GA3P are compared with priority. For this purpose, all reaction formulas in the ARM database were rearranged so that the molecular orders on the left and right sides roughly correspond one-to-one. Reactions may contain generic names such as "alcohol." Such abstract names were substituted with their corresponding specific names: e.g., methanol, ethanol, propanol, and butanol. For example, the reaction by alcohol dehydrogenase (EC 1.1.1.1):
(Original) alcohol+NAD=aldehyde+NADH
was rewritten as
(Curated) methanol+NAD=formaldehyde+NADH
          ethanol+NAD=acetaldehyde+NADH

In a polymerization reaction, it is inherently impossible to compute a one-to-one atomic correspondence because the polymer size is indeterminate. As a compromise, variable regions of polymers were substituted with their corresponding monomers or dimers in the ARM database. Consequently, the notation for polymers, typically described as (...)n, was changed to its corresponding monomeric or dimeric expression. For example, the reaction by DNA ligase (EC 6.5.1.1) was rewritten as follows:
(Original) ATP+(DNA)n+(DNA)m=AMP+PPi+(DNA)n+m
(Curated) ATP+DNA+H2O=AMP+DNA+PPi
For this reason, the function of a ligase or protease is not explicitly described in the ARM database, and consequently a water molecule (H2O) is artificially introduced to balance the number of atoms. For the detection of a common substructure, the software tool in ARM uses the Morgan method. By applying the method for only a few steps, it is possible to detect topologically equivalent, small substructures. By extending computed substructures in this manner, the software tool detects the largest common substructure between two molecules under comparison [3].
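
Because a one-to-one atomic correspondence only exists when both sides of a reaction contain the same atoms, a mass-balance check is a natural first step before any mapping is attempted. The sketch below is an illustration only: the helper names `atom_counts` and `is_balanced` are invented here, the parser handles only flat formulas without brackets or charges, and the molecular formulas (neutral acid forms) are supplied for the example rather than taken from the ARM database.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a flat molecular formula such as 'C5H11O8P'."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(substrates, products):
    """True if both sides of a reaction contain exactly the same atoms."""
    left = sum((atom_counts(f) for f in substrates), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right


# Formulas assumed for this example (neutral acid forms).
FORMULAS = {
    "Rib5P": "C5H11O8P",
    "Xyl5P": "C5H11O8P",
    "Sed7P": "C7H15O10P",
    "GA3P": "C3H7O6P",
}

# EC 2.2.1.1: Rib5P+Xyl5P=Sed7P+GA3P
transketolase_ok = is_balanced(
    [FORMULAS["Rib5P"], FORMULAS["Xyl5P"]],
    [FORMULAS["Sed7P"], FORMULAS["GA3P"]],
)

# Ethanol to acetaldehyde with the cofactors left out: two hydrogens are missing.
incomplete_ok = is_balanced(["C2H6O"], ["C2H4O"])
```

The second case shows why curated formulas must list every participant explicitly: an unbalanced reaction cannot be given a complete set of atomic correspondents.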

4. Pathway Statistics

The ARM database provides data for the basic metabolism in Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae. The numbers of metabolite structures and biochemical reactions are about 2500 and 2000, respectively. Together they constitute one large graph representing the entire metabolism [3]. Hereafter the network is referred to as the metabolic graph. For a flexible network analysis, any positive weight can be assigned to edges in the metabolic graph. Because pathways are computed by the k-shortest paths algorithm, by changing edge weights we can search pathways for different purposes: typical examples include pathways preferring or avoiding specified metabolites, or pathways containing as many specified reaction types as possible. The function to trace specific atoms (carbon, nitrogen, or sulfur) using the mapping information at the atomic scale is also provided.

4.1. Analysis of E. coli Metabolism

About 850 annotations for E. coli metabolic genes accounted for roughly 1000 reaction formulas in about 600 EC enzyme subsubclasses [12,13]. These reactions are converted into 1230 atomic mappings, without duplication, among 906 metabolites for the carbon and nitrogen metabolism. Of the 1230 mappings, 1179 accounted for carbon-carbon relationships among 905 metabolites (the only excluded metabolite was ammonia) [14]. Since most reactions contain three or more mappings, the statistic indicates the presence of frequently used atomic mappings. One such example is the mapping between ATP and ADP, which appears in most phosphorylation reactions. With our current dataset, each metabolite exhibits fewer than 1.5 patterns of structural transformation on average. Table 2 shows the metabolites whose structures are most diversely transformed in currently identified biochemical reactions. Carbon dioxide is the most variously transformed metabolite because it is involved in all decarboxylation reactions. Next in order of the number of transformations are acetyl CoA and pyruvate, which play central roles in metabolism.
ATP is also highly convertible, not only because of phosphorylation but also because of adenylation to form FAD or NAD. S-Adenosyl methionine is convertible because it transfers a methyl group to different acceptors. In contrast, Table 3 shows the metabolites that frequently appear in biochemical reactions. Water, orthophosphate, and other inorganics without carbon atoms are excluded from Table 3. Most of the high-ranking metabolites are cofactors. L-Glutamate and keto-glutarate also rank high because of the amino-transfer reactions in which they participate. On the other hand, central metabolites such as acetyl CoA and pyruvate were not highly ranked.


Table 2. Top 20 of the most diversely transformed metabolites

 1  Carbon dioxide
 2  Acetyl-CoA
 3  Pyruvate
 4  CoA
 5  ATP
 6  S-Adenosyl-L-methionine
 7  L-Glutamate
 8  D-Galactose
 9  D-5-Phospho-ribosyl 1-diphosphate
10  D-Glucose
11  D-Fructose 6-phosphate
12  L-Aspartate
13  Dihydroxy-acetone phosphate
14  UDP-D-Galactose
15  Acetate
16  Glycine
17  Oxaloacetate
18  L-Serine
19  UDP-N-Acetyl-D-glucosamine
20  Keto-glutarate

Table 3. Top 20 of the most frequently appearing metabolites

 1  ATP
 2  NAD
 3  NADH
 4  ADP
 5  CO2
 6  NAD phosphate
 7  CoA
 8  NADH phosphate
 9  L-Glutamate
10  AMP
11  Pyruvate
12  Keto-glutarate
13  Acetyl-CoA
14  D-Galactose
15  Acetate
16  L-Methionine
17  D-5-Phospho-ribosyl 1-diphosphate
18  L-Homocysteine
19  D-Glucose
20  Phosphoenol pyruvate


Table 4. Top 20 metabolites ranked by the score of the number of appearances divided by the number of conversions

 1  NADH
 2  NAD phosphate
 3  NADH phosphate
 4  NAD
 5  ADP
 6  L-Homocysteine
 7  Protein N(pi)-phospho-L-histidine
 8  ATP
 9  L-Glutamine
10  L-Methionine
11  Protein L-histidine
12  AMP
13  Inosine diphosphate
14  S-Adenosyl-L-homocysteine
15  Pyridoxal
16  L-Glutamate
17  L-Tetrahydrofolate
18  Deoxy ATP
19  CoA
20  Flavin mononucleotide
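
The score behind Table 4 is a simple ratio, and a short sketch may make the ranking rule concrete. The function name `cofactor_ranking` is invented for this example, and the counts below are made-up numbers chosen only to show the effect; they are not values from the ARM database.

```python
def cofactor_ranking(appearances, transformations):
    """Rank metabolites by (number of appearances) / (number of transformations)."""
    scores = {
        name: appearances[name] / transformations[name]
        for name in appearances
        if transformations.get(name)  # skip metabolites with no recorded conversion
    }
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda name: (-scores[name], name))


# Illustrative counts only (assumed for this example).
appearances = {"NADH": 100, "ATP": 200, "Pyruvate": 30, "Acetyl-CoA": 28}
transformations = {"NADH": 1, "ATP": 8, "Pyruvate": 15, "Acetyl-CoA": 20}

ranking = cofactor_ranking(appearances, transformations)
```

A metabolite that appears often but is converted in only a few ways, as NADH is here, rises to the top, whereas a frequently appearing but diversely transformed metabolite such as pyruvate drops.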


Fig. 8. The distribution of pathway lengths in the basic metabolism of Escherichia coli. The average path length is around 8. Biochemical reactions are considered directed, based on the Biochemical Pathways wall-chart by Roche Applied Sciences


A class of metabolites called cofactors can be characterized by focusing on the pattern of structural transformations. That is, we can define cofactors as frequently appearing metabolites whose structures are little transformed. Table 4 shows the metabolites ranked by the score of the number of appearances divided by the number of structural transformations. In Table 4, acetyl CoA and pyruvate do not appear, and cofactors occupy the high ranks. Among high-ranking metabolites, protein (phospho)-L-histidine functions as a cofactor in sugar transfers, and methionine and homocysteine function as cofactors in methyl transfers. These statistics depend on the chosen set of biochemical reactions, and the precise rankings in Tables 2-4 per se do not give much biological insight. Still, such global analysis suggests the characteristics of the entire network. For example, metabolic networks were reported to satisfy small-worldness. The definition of a small-world network is: (i) most nodes (metabolites) have a low connection degree and the degree distribution follows a power law; (ii) high-degree nodes, called hubs, dominate the network, and most nodes are clustered around hubs; and (iii) the average path length (i.e., the average of the shortest path length over all pairs of nodes in the network) remains at the theoretical minimum, that of a random graph. Many naturally developed networks are known to satisfy these properties [15]. According to recent reports, the metabolic network of bacteria forms a small world of average path length (AL) ≈ 3 [16,17]. This means that any pair of metabolites is connected through a few reaction steps. However, previous analyses did not verify the reachability of atoms through reaction steps, and therefore included unrealistic pathways through which no atoms are actually transferred. Such overestimation increases the possibility of false connections and consequently reduces the AL.
When the AL of pathways that conserve at least one carbon atom is computed, the distribution of pathway lengths becomes as shown in Fig. 8. The distribution may slightly differ depending on the reversibility of reactions, but the AL cannot be as small as 3. Moreover, since each reaction represents a unique biochemical step, its function cannot be compensated by others. For these reasons, it is unlikely that a metabolic network satisfies characteristics of a small-world network [14].
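
The effect of false connections on the AL can be reproduced on a toy example. The sketch below is an illustration under stated assumptions: the AL is taken as the mean breadth-first-search distance over all ordered pairs of nodes that are connected by a directed path, the two small networks are invented, and the function names are not part of the ARM software.

```python
from collections import deque


def shortest_lengths(graph, source):
    """Breadth-first search: number of steps from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over all ordered, connected pairs of distinct nodes."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    total = pairs = 0
    for source in nodes:
        for target, d in shortest_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


# Toy chain in which every step really transfers an atom: A -> B -> C -> D.
atom_conserving = {"A": ["B"], "B": ["C"], "C": ["D"]}

# The same chain plus a shortcut through a cofactor that shares reactions
# with A and D but carries no atom from A to D (a false connection).
with_cofactor_hub = {"A": ["B", "ATP"], "B": ["C"], "C": ["D"], "ATP": ["D"]}

al_conserving = average_path_length(atom_conserving)
al_with_hub = average_path_length(with_cofactor_hub)
```

Even in this four-step chain, a single shortcut through a cofactor lowers the AL, which is the direction of bias described for analyses that do not check atomic reachability.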


5. Future Applications

5.1. Degradation Pathways of Xenobiotics

As environmental issues attract increasing attention, bacteria that can degrade persistent organic pollutants such as polychlorinated biphenyl (PCB) are receiving more interest. There exists an on-line database that specializes in degradation pathways of xenobiotics [18]. One major theme in future biotechnology is the systematic prediction of degradation pathways achieved by a combination of microorganisms or by simple genetic engineering. The hurdles for this prediction are that (i) the target compound is unspecified and that (ii) prediction of uncharacterized reactions is necessary. The former problem can be solved by radially computing all pathways from a given compound. The latter problem is more challenging. Since catalytic transformations in uncharacterized pathways are assumed to be similar to those of known ones, structure-based classification of reactions is indispensable. One solution is to automatically apply basic reaction steps, such as oxidation or reduction, classified in this manner, and to test all possible biochemical transformations.

5.2. Prediction of Drug Metabolism

A similar approach can be applied to the prediction of drug metabolism or drug degradation. The difficult part is that a drug molecule is not always broken down to be excreted but may instead form a conjugate (e.g., glucuronide conjugation). To cope with such structural changes, the application of known, basic reaction steps is informative. The software tool in ARM is able to apply such characteristic conversions to compute all possible degradation pathways, but the conversions listed in textbooks are too general to be used for this purpose. For example, a monoamine oxidase can substitute an amino group with a hydroxyl group, but the substitution is not applicable to all amino groups. Therefore, more specific degradation conditions need to be clarified, preferably for each species under investigation, to achieve realistic prediction using the ARM database.


6. Conclusion

The ARM database provides a unique function to digitize the atom-to-atom correspondences between metabolites in identified biochemical reactions. Currently, no other metabolic database can trace biochemical pathways at the atomic scale. The technique used can be extended to other pathway searches in which molecular structures are transformed stepwise. Such applications include the design of bio-processes for useful materials such as amino acids or plastics. In a train network, the primary criterion for selecting a route is distance (i.e., shortest paths are chosen). In the evolutionary process of metabolic networks, however, it seems that this criterion has not been much exercised: quite a few metabolic pathways include redundant steps and are not optimized in terms of length or robustness. The systems biology of metabolic networks is still at its earliest stage, and will become more important in future biological research.

References 1. Michal G (ed) (1999) Biochemical pathways. Wiley/Spektrum Akademischer Verlag, New York 2. Alberts B, Bray D, Johnson A, Lewis J, Raff M, Roberts K, Walter P (1998) Essential cell biology. Garland, New York. 3. Arita M (2003) In silico atomic tracing by substrate-product relationships in Escherichia co//intermediary metaboUsm. Genome Res 13(ll):2455-2466 4. Bondy JA, Murty USR (1976) Graph theory with applications. Elsevier North-Holland, Amsterdam. 5. Eppstein D (1998) Finding the k shortest paths. SIAM J Comput 28(2):652-673 6. MOL Format: MDL Information Systems. Description downloadable from http://www.mdli.com 7. SMILES Format: Daylight Chemical Information Systems. Description downloadable from http://www.daylight.com 8. VoUhardt KPC, Schore NE (1998) Organic chemistry: structure and function, 3rd edn. Freeman, New York 9. Wipke WT, Dyott TM (1974) Stereochemically unique naming algorithm. J Am Chem Soc 96:4834-4842 10. Koebler J, Schoening U, Toran J (1993) The graph isomorphism problem. Birkhauser, Boston 11. Arita M (2000) Metabolic reconstruction using shortest paths. Simulation Pract Theory 8(2): 109-125 12. Karp PD, Riley M, Saier M, Paulsen IT, Paley S, Pellegrini-Toole A (2002) The Ecocyc database. Nucleic Acids Res 30(l):56-58


13. Ouzounis CA, Karp PD (2000) Global properties of the metabolic map of Escherichia coli. Genome Res 10(4):568-576
14. Arita M (2004) The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci USA 101(6):1543-1547
15. Strogatz SH (2001) Exploring complex networks. Nature 410(6825):268-276
16. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabasi AL (2000) The large-scale organization of metabolic networks. Nature 407(6804):651-654
17. Fell DA, Wagner A (2000) The small world of metabolism. Nat Biotechnol 18(11):1121-1122
18. Ellis LB, Hou BK, Kang W, Wackett LP (2003) The University of Minnesota Biocatalysis/Biodegradation Database. Nucleic Acids Res 31(1):262-265

Chapter 14: The Genome-Based E-CELL Modeling (GEM) System Kazuharu Arakawa, Yohei Yamada, Kosaku Shinoda, Yoichi Nakayama, and Masaru Tomita 1311 Laboratory for Bioinformatics, Institute for Advanced Biosciences, Keio University, Endo 5322, Fujisawa 252-8520, Japan

1. Introduction Rapid advances in sequencing technologies have brought forth a vast amount of genome data. The GenBank database at the National Center for Biotechnology Information now distributes more than 100 complete genomes [1]. Data are accumulating just as quickly in other fields of molecular biology, including the transcriptome, proteome, and metabolome. With such a large amount of data available, understanding the cell requires a "systems biology" approach that views its dynamic behavior as a complex system [2]. Several successful attempts at this through cell simulation have already been reported [3-5]. Dynamic simulation provides an integrative view through systematic modeling of the reaction networks within a cell from quantitative data. Each reaction must be expressed as an accurate rate equation with precise parameters, and a complete set of these is difficult to obtain. Therefore, large-scale modeling of cells in silico demands a novel, high-throughput approach. If successfully integrated, the large amount of available genome sequences, transcript and expression data, enzyme reaction data, metabolic pathway maps, and data on metabolites in cells will create a strong base for a whole-cell model. In this chapter we discuss a genome-based large-scale modeling approach, implemented as the Genome-based E-CELL Modeling System (GEM System) on the generic bioinformatics analysis workbench, G-language Genome Analysis Environment (G-language GAE) [6]. Based on the input genome sequence, the GEM System automatically generates a cell-wide metabolic pathway model ready for pseudodynamic simulation in the E-CELL Simulation Environment. A Graphical User Interface (GUI) is provided via G-language GAE for easy manipulation.


2. Integration of Public Databases In whole-cell modeling, integration of different biological databases is a challenging task because the target field is broad and the subject of each database is not the same. For example, it is difficult to link a proteome database to the transcriptome database because this step requires a reference of protein to mRNA. Moreover, even if the subject is the same, the names of genes and proteins are often ambiguous and are thus difficult to match. However, because of the rapid advance in sequencing technology and the abundance of genome sequence data, most databases contain links to the genome sequences regardless of the subject, so that it is possible to link the large amount of biological information by following the Central Dogma. Automation of this integration process by bioinformatics realizes high-throughput extraction of data. The GEM System contains a set of internal databases that integrates public biological databases such as EMBL, SWISS-PROT, KEGG, ARM, BRENDA, and WIT [7-12]. Two main databases exist: one is the "Variable" database, which contains the unified names of genes, metabolites, and enzymes, and the other is the "Process" database which stores enzyme reaction stoichiometry that is checked for atomic consistency using chemical formulae of the substrates. After matching the genome sequence to the corresponding enzyme, the entire metabolic network is reconstructed and generated from these two databases (Fig. 1).
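The atomic consistency check applied to the "Process" database can be illustrated with a minimal sketch. The text does not describe how the GEM System implements this check, so the function names, the formula parser (which handles only simple formulae without parentheses or charges), and the example reaction below are assumptions made for illustration.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6'.
    Assumption: no parentheses, charges, or hydrates."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def side_atoms(side):
    """Sum atoms over one side of a reaction given as [(coefficient, formula), ...]."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total

def is_atomically_balanced(substrates, products):
    """True when every element occurs equally often on both sides."""
    return side_atoms(substrates) == side_atoms(products)

# Illustrative reaction: glucose + ATP -> glucose 6-phosphate + ADP
# (neutral-molecule formulae; chosen for this example, not taken from the GEM System)
substrates = [(1, "C6H12O6"), (1, "C10H16N5O13P3")]
products = [(1, "C6H13O9P"), (1, "C10H15N5O10P2")]

balanced = is_atomically_balanced(substrates, products)
# Dropping ADP from the product side leaves atoms unaccounted for.
broken = is_atomically_balanced(substrates, [(1, "C6H13O9P")])
```

A check of this kind catches stoichiometric equations that were transcribed with a missing cofactor or a wrong coefficient before they enter the model.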

3. Prediction of Coding Regions The GEM System accepts the genome sequence as its input and automatically generates a simulation model based on the sequence data. The location of each coding region is available if the input genome sequence is in an annotated database format such as GenBank or EMBL, but genomes sequenced in-house or supplied in raw sequence formats must first be scanned for potential open reading frames. In the GEM System, Glimmer2 is employed for prokaryotes and GlimmerM for eukaryotes for this purpose [13,14]. Glimmer is known to have a high rate of false positives but a low rate of false negatives. Because the subsequent steps in the GEM System filter out the false-positive hits, this trade-off suits the pipeline well.
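To make the idea of scanning a raw sequence for potential open reading frames concrete, here is a deliberately naive sketch. It is not Glimmer, which relies on interpolated Markov models [13,14]; it only looks for an ATG followed in frame by a stop codon on both strands. The function names and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=2):
    """Return (strand, start, end) tuples for ATG...stop ORFs.

    Coordinates are 0-based on the scanned strand, end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs

toy_sequence = "CCATGAAATTTTAGGG"
toy_orfs = find_orfs(toy_sequence)
```

A scan like this over-predicts heavily on real genomes, which is the same false-positive behavior the later matching steps are meant to filter out.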


Fig. 1. Public biological databases are stored internally in a relational database for quick access and data retrieval. From this internal database, a database containing consistent metabolite names and another database containing consistent enzyme stoichiometry are derived as the Variable and Process databases. The simulation model is generated using these internal databases

Fig. 2. Genes are matched to their enzyme products through three levels of matching. Level 1 directly refers to the annotation in the input genome database file, and Level 2 performs homology search against the SWISS-PROT database for those left unmatched. Lastly, all remaining genes are searched by orthology in Level 3


4. Matching Genes to Enzymes In our top-down approach, enzymes are not matched from the annotation but from the sequence itself. Current approaches in modeling usually rely on sequence searches by homology; however, homology-only methods are usually insufficient in functional genomics. Lester and Hubbard suggest that the similarity obtained by interspecies homology search for orthologs is "generally poor" [15]. We have conducted a multiple alignment test for amino acid sequences in SWISS-PROT having the same EC number, and also found that distant species have very weak similarity even though their biological functions are analogous (data not shown). However, homology search is effective between related species, and between sequences having high similarity. Therefore, genome annotation techniques usually screen open reading frames by homology, and then combine several methods to annotate sequences with low homology. In the GEM System, the required function is to connect an amino acid sequence to stoichiometric equations. Searching for orthology and identifying the EC number for the sequence may achieve this, since the reaction mechanism is conserved between orthologs. Among the several databases of orthologous genes, we use the "cognitor" program provided with the expert-curated COG database in the GEM System [16-18]. The "cognitor" program assigns a COG id to a given amino acid sequence, and a COG id is allocated to represent one gene product. We assigned COG ids to all SWISS-PROT entries, and found that of the monomer enzymes for which EC numbers are assigned, 83.3% have a one-to-one relationship between the EC number and the COG id. It is worth noting that the remaining 16.7% are not completely random but show, on average, roughly a one-to-three relationship, which can be narrowed to a one-to-one relationship by combining homology information. Therefore a sequence can be matched to an EC number using a hybrid method of homology and orthology.
Using the above approach, three levels of matching the sequences to stoichiometric equations are implemented as shown in Fig. 2. To obtain the most reliable matches, level 1 uses the reference in the EMBL genome database that links a gene sequence to a SWISS-PROT accession number, and level 2 performs a BLASTP homology search against the SWISS-PROT database with relatively high cutoff values [19]. In the GEM System, the default E-value cutoff is 1e-05. When the matched SWISS-PROT data do not contain the "CATALYTIC ACTIVITY" line with the stoichiometric equation, the system searches for SWISS-PROT accession numbers in the same cluster category in the WIT database. This provides a list of analogous enzymes with their stoichiometry.


For sequences with low similarity, an orthology search is performed on level 3 of the system. About 70% of most microbial sequences have already-assigned COG ids in the PTT database, and the unassigned genes go through the online "cognitor" and the offline "dignitor" programs. Then a BLASTP homology search is performed against SWISS-PROT sequences assigned to the same COG id, to obtain the best one-to-one match. A WIT cluster search is additionally performed for stoichiometric equations where necessary. At this level of search, the KEGG Enzyme database can also be used directly from the EC number matched by orthology. All of the information retrieved above is then checked against the pathway databases for connectivity. Because every gene is matched to its enzyme product individually, the system cannot distinguish isozymes from enzymes with multiple subunits; moreover, there may be false negatives during the matching process. The stoichiometry of isozymes responsible for a specialized part of a pathway is also difficult to obtain. Therefore, all extracted stoichiometry is checked for connectivity based on KEGG and BioCyc reference pathways. The resulting output file is compiled into an E-CELL Simulation Environment (ERI) format file, ready for large-scale qualitative simulation of the entire cell. The significance of these methods is that the only required information is the genome sequence, so that a model can be constructed for any organism whose complete genome is available, regardless of the accuracy and progress of its annotation.
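The three-level cascade described above can be sketched as a simple fall-through lookup. This is only an outline under stated assumptions: the dictionaries stand in for the EMBL cross-references, BLASTP results, and COG-based search, and every gene name, accession label, and function name here is a hypothetical placeholder rather than part of the GEM System.

```python
E_VALUE_CUTOFF = 1e-5  # default cutoff quoted in the text

# Hypothetical stand-ins for the real data sources.
annotation_xref = {"geneA": "SP_A"}            # level 1: annotation cross-reference
blast_hits = {"geneB": ("SP_B", 1e-30),        # level 2: best hit and its E-value
              "geneC": ("SP_C", 0.5)}
cog_matches = {"geneC": "SP_D"}                # level 3: best hit within the same COG

def homology_search(gene, cutoff=E_VALUE_CUTOFF):
    """Return the best hit only if its E-value passes the cutoff."""
    hit = blast_hits.get(gene)
    if hit is not None and hit[1] <= cutoff:
        return hit[0]
    return None

def orthology_search(gene):
    """Return the best hit among sequences sharing the gene's COG id, if any."""
    return cog_matches.get(gene)

def match_gene_to_enzyme(gene):
    """Return (level, accession) from the first level that yields a match."""
    accession = annotation_xref.get(gene)
    if accession is not None:
        return 1, accession
    accession = homology_search(gene)
    if accession is not None:
        return 2, accession
    accession = orthology_search(gene)
    if accession is not None:
        return 3, accession
    return None, None  # left unmatched

results = {g: match_gene_to_enzyme(g) for g in ("geneA", "geneB", "geneC", "geneD")}
```

Ordering the levels from most to least reliable means a gene is only passed to the orthology search when neither the annotation nor a confident homology hit is available.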

5. Hybrid Algorithm The reaction list generated by the GEM System is a list of enzyme stoichiometry for static simulation, and cannot by itself be applied to the study of dynamic behavior. A key to this problem is the hybrid algorithm of dynamic and static simulation methods (for details see Chapter 15 by Y. Nakayama). With this hybrid algorithm, when all rate-limiting reactions are dynamically represented, every other reaction can be statically represented with the same accuracy as a completely dynamic simulation. This algorithm reduces the number of parameters needed for dynamic simulation by as much as 80%. The static part of the simulation, in other words as much as 80% of the dynamic cell model, can then be generated directly from GEM System stoichiometry. Moreover, using the G-language GAE functions, it is possible to predict the rate-limiting enzymes. Allosteric enzymes are often responsible for the rate-limiting step, and known allosteric enzymes can be specified by homology searches against protein databases such as SWISS-PROT. Then the literature search module in the GEM System can be automated to search for the parameters needed to model the rate-limiting reactions dynamically. Obviously, the whole modeling process cannot be automated, and many parameters can only be obtained by biological experiments; nevertheless, the GEM System with the hybrid algorithm will greatly reduce the cost of large-scale modeling.
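As a small illustration of what a stoichiometry-only reaction list supports, the sketch below turns such a list into a stoichiometric matrix and evaluates the net production of each metabolite for a given set of fluxes. It is not the hybrid algorithm itself, which is described in Chapter 15; the toy pathway, the function names, and the flux values are assumptions chosen for the example.

```python
def stoichiometric_matrix(reactions):
    """Build a matrix (rows = metabolites, columns = reactions).

    reactions maps a reaction name to {metabolite: coefficient},
    with negative coefficients for consumed metabolites.
    """
    metabolites = sorted({m for coeffs in reactions.values() for m in coeffs})
    names = list(reactions)
    matrix = [[reactions[r].get(m, 0) for r in names] for m in metabolites]
    return metabolites, names, matrix

def net_production(matrix, fluxes):
    """Net rate of change of each metabolite: matrix times flux vector."""
    return [sum(c * v for c, v in zip(row, fluxes)) for row in matrix]

# Hypothetical toy pathway: uptake -> A, A -> 2 B, B -> C, C -> export
reactions = {
    "uptake": {"A": 1},
    "R1": {"A": -1, "B": 2},
    "R2": {"B": -1, "C": 1},
    "export": {"C": -1},
}
metabolites, names, N = stoichiometric_matrix(reactions)

# Fluxes that respect the stoichiometry leave every metabolite unchanged ...
steady = net_production(N, [1.0, 1.0, 2.0, 2.0])
# ... whereas equal fluxes through all steps let B accumulate.
unbalanced = net_production(N, [1.0, 1.0, 1.0, 1.0])
```

No kinetic parameters appear anywhere in this calculation, which is why a static representation needs far less data than rate equations do.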

6. Results and the Modeling Environment Using the GEM System package, several whole-cell simulation models have been generated, including a virtual Escherichia coli model consisting of 2264 reactions and 1682 metabolites, which is the largest model ever created with E-CELL (see Fig. 3) [20]. It is worth noting that the whole conversion process takes only about 30 seconds on a personal computer when a GenBank or EMBL format genome sequence is supplied to the GEM System, and that the whole process can be controlled through a friendly GUI (see Fig. 4).

Fig. 3. Dynamic changes in quantities of metabolites are visualized with the .e (dot e) viewer, which receives the simulation results from the E-CELL Simulation Environment. The metabolic pathway model generated from the Escherichia coli K12 MG1655 genome consists of 2264 reactions and 1682 metabolites


Fig. 4. The GEM System is equipped with a user-friendly graphic interface for easy access and manipulation. Generation of a simulation model is an automated procedure requiring only the genome sequence as its input. Users can modify numerous parameters of the system through this interface, which is powered by the G-language GAE

Although this simulation model is only qualitative and cannot represent dynamic behavior, the GEM System is flexible enough to build on the generated simulation model. Because the generated information contains links to the biological databases used in the GEM System, additional information can be loaded onto the simulation. One example of such an extension to a model is protein localization. It is necessary to understand the specific localization of proteins, especially in eukaryotes, where organelles make up the complex system of the cell. In e-Rice, a virtual cell model of rice, enzyme reactions are separated into the compartments cytoplasm, chloroplast, cytoskeleton, endoplasmic reticulum, extracell, Golgi apparatus, lysosome, mitochondria, nucleus, peroxisome, plasma membrane, and vacuole according to the classification by Chou et al. [21]. The GEM System can also take advantage of the G-language GAE analysis functions for nucleotide and peptide sequences. Tissue-specific expression of genes can be modeled by mapping the cDNA and EST sequences available in the UniGene database (http://www.ncbi.nih.gov/UniGene/) using the cDNA Analysis System (CASYS) of G-language GAE. CASYS contains an automated mapping module, with which the GEM System can model cells of a specific tissue. The wealth of microarray data available on the Internet is another source for this purpose. The GEM System also includes a gene expression prediction module for the estimation of enzyme quantity, a text search and retrieval module for literature search over Medline (http://www.ncbi.nih.gov/entrez/), a pathway checking module for finding substrates without input, and a pathway viewer using "a Java applet for visualizing protein-protein interaction" and BioLayout [22,23]. Shown in Fig. 5 is the glycolysis pathway of the Escherichia coli genome.

Fig. 5. As graphically represented using "a Java applet for visualizing protein-protein interaction," the Escherichia coli model generated by the GEM System effectively reconstructs metabolic pathways of the organism based on the genome sequence. Shown here is the glycolysis pathway. Generated pathways are also connected by the internal pathway check procedure


7. Discussion The GEM System realizes rapid and automatic generation of a large-scale static metabolic pathway simulation model from the genome sequence. With the hybrid algorithm, this model can serve as the draft of a dynamic model once the rate-limiting enzymes are modeled with dynamic rate equations. This enables observation and study of the dynamic behavior of cells in silico, which is useful not only for the study of life but also for metabolic engineering and pharmaceutical experiments. Once a dynamic model of a virtual cell with its enzyme reactions is achieved, the next goal is to model the signal transduction pathways and gene expression, including transcription, translation, and degradation processes. The GEM System is a strong platform for this purpose because it is built on the G-language GAE. Direct access to the G-language GAE and references to the biological databases link the model to genome analyses. Required parameters may be calculated directly from the genome sequence as the simulation runs, and this will truly become simulation from the genome.

References
1. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL (2003) GenBank. Nucleic Acids Res 31:23-27
2. Kitano H (2002) Computational systems biology. Nature 420:206-210
3. Takahashi K, Yugi K, Hashimoto K, Yamada Y, Pickett CJF, Tomita M (2002) Computational challenges in cell simulation. IEEE Intelligent Systems 17:64-71
4. Tomita M, Hashimoto K, Takahashi K, Shimizu TS, Matsuzaki Y, Miyoshi F, Saito K, Tanida S, Yugi K, Venter JC, Hutchison CA (1999) E-CELL: software environment for whole-cell simulation. Bioinformatics 15:72-84
5. Tomita M (2001) Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol 19:205-210
6. Arakawa K, Mori K, Ikeda K, Matsuzaki T, Kobayashi Y, Tomita M (2003) G-language Genome Analysis Environment: a workbench for nucleotide sequence data mining. Bioinformatics 19:305-306
7. Brooksbank C, Camon E, Harris MA, Magrane M, Martin MJ, Mulder N, O'Donovan C, Parkinson H, Tuli MA, Apweiler R, Birney E, Brazma A, Henrick K, Lopez R, Stoesser G, Stoehr P, Cameron G (2003) The European Bioinformatics Institute's data resources. Nucleic Acids Res 31:43-50
8. Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31:365-370
9. Kanehisa M, Goto S, Kawashima S, Nakaya A (2002) The KEGG databases at GenomeNet. Nucleic Acids Res 30:42-46
10. Arita M (2000) Metabolic reconstruction using shortest paths. Simulat Pract Theory 8:109-125
11. Schomburg I, Chang A, Hofmann O, Ebeling C, Ehrentreich F, Schomburg D (2002) BRENDA: a resource for enzyme data and metabolic information. Trends Biochem Sci 27:54-56
12. Overbeek R, Larsen N, Pusch GD, D'Souza M, Selkov E Jr, Kyrpides N, Fonstein M, Maltsev N, Selkov E (2000) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res 28:123-125
13. Salzberg SL, Delcher AL, Kasif S, White O (1998) Microbial gene identification using interpolated Markov models. Nucleic Acids Res 26:544-548
14. Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res 27:4636-4641
15. Lester PJ, Hubbard SJ (2002) Comparative bioinformatics analysis of complete proteomes and protein parameters for cross-species identification in proteomics. Proteomics 2:1392-1405
16. Koonin EV, Tatusov RL, Galperin MY (1998) Beyond complete genomes: from sequence to structure and function. Curr Opin Struct Biol 8:355-363
17. Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278:631-637
18. Tatusov RL, Natale DA, Garkavtsev IV, Tatusova TA, Shankavaram UT, Rao BS, Kiryutin B, Galperin MY, Fedorova ND, Koonin EV (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res 29:22-28
19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-410
20. Blattner FR, Plunkett G 3rd, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF, Gregor J, Davis NW, Kirkpatrick HA, Goeden MA, Rose DJ, Mau B, Shao Y (1997) The complete genome sequence of Escherichia coli K-12. Science 277:1453-1474
21. Chou KC, Elrod DW (1999) Protein subcellular location prediction. Protein Eng 12:107-118
22. Mrowka R (2001) A Java applet for visualizing protein-protein interaction. Bioinformatics 17:669-671
23. Enright AJ, Ouzounis CA (2001) BioLayout: an automatic graph layout algorithm for similarity visualization. Bioinformatics 17:853-854

Chapter 15: Hybrid Dynamic/Static Method for Large-Scale Simulation of Metabolism and its Implementation to the E-CELL System Yoichi Nakayama Institute for Advanced Biosciences, Keio University, Nipponkoku 403-1, Daihouji, Tsuruoka, Yamagata 997-0017, Japan

1. Introduction A significant problem in the development of dynamic metabolic simulation models is a lack of data on the dynamic characteristics of reactions. Using the Genome-based E-CELL Modeling (GEM) system (Chapter 14) and other databases, it has become possible to express metabolic pathways and the stoichiometry of the reactions they comprise. However, the construction of dynamic models requires differential equations, and their parameters, that represent the dynamic characteristics of reactions. This chapter introduces a new simulation method called the hybrid dynamic/static simulation method, and explains how it reduces the amount of dynamic data required. Cell simulation is the reconstruction of intracellular reactions on a computer based on quantitative data, in an attempt to analyze the systematic characteristics of the cell and predict unknown pathways and mechanisms. We have developed a general simulation software package called the E-CELL system for reproducing the entire set of processes within a cell [1-3]. This system is flexibly structured to enable not only dynamic simulation based on rate equations, but also the simulated reproduction of virtually all configurations, including the stochastic algorithms of Gillespie [4] and StochSim [5], as well as the S-System [6] and generalized mass action (GMA). We have constructed a simulation model of normal erythrocytes using the E-CELL system, and this model has been employed to analyze glucose-6-phosphate dehydrogenase (G6PD) deficiency. The results of this simulation indicated that the pathways for glutathione synthesis partially compensate for the deficiency in reduced glutathione associated with a loss of G6PD [7]. This example shows that, to construct a general-purpose simulation model that enables the prediction of abnormal conditions such as enzyme deficiency, it is essential to cover all the metabolic pathways.


2. Principles of Hybrid Dynamic/Static Simulation Method In general, enzyme rate equations are used in continuous and dynamic simulation models. Various types of these ordinary differential equations exist, corresponding to the relevant reaction mechanism, but their common characteristic is that they all calculate the reaction rate from the amounts of the substances concerned. The substances concerned are proteins that catalyze reactions, such as enzymes, together with substrates, products, and effectors. In addition, the rate equation requires parameters such as the rate constants it contains. The Michaelis-Menten equation, which is a typical rate equation, is still used in many cell simulations, but it was originally devised in the field of enzymology for analyzing the initial velocity, a state in which only substrates and no products are present. In dynamic simulation of living cells, which undergo various changes in state, the conditions assumed by this equation (that the amount of enzyme-substrate complex is always at a steady state and that the reaction is irreversible) become inappropriate in most cases. If the reaction is reversible, the Michaelis-Menten equation cannot account for the equilibrium between substrate and products. The development of an accurate simulation model that is applicable for general purposes therefore requires examining the reaction mechanism of each enzyme, formulating a detailed equation by, for example, the King-Altman method [8], the MWC model [9], or the KNF model [10], and then approximating each rate constant from experimental data. The biggest issues in constructing a simulation model have thus become the rate equation that depends on the reaction mechanism, the parameters used in this equation, and the concentrations of metabolites.
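The limitation of the irreversible Michaelis-Menten equation can be seen in a short numerical sketch. The reversible uni-uni form used for comparison is a standard textbook rate law brought in here for illustration, not an equation given in this chapter, and all parameter values are arbitrary assumptions.

```python
def michaelis_menten(s, vmax, km):
    """Irreversible Michaelis-Menten rate: depends on substrate only."""
    return vmax * s / (km + s)

def reversible_michaelis_menten(s, p, vf, vr, ks, kp):
    """Reversible one-substrate, one-product rate law.

    vf, vr: maximal forward and reverse rates; ks, kp: half-saturation
    constants for substrate and product. The net rate is zero when
    p / s equals (vf * kp) / (vr * ks).
    """
    return (vf * s / ks - vr * p / kp) / (1.0 + s / ks + p / kp)

# Arbitrary illustrative parameters: equilibrium ratio p/s = (10*2)/(5*1) = 4
vf, vr, ks, kp = 10.0, 5.0, 1.0, 2.0

# With no product present, both forms agree (the initial-velocity condition).
initial_irreversible = michaelis_menten(1.0, vf, ks)
initial_reversible = reversible_michaelis_menten(1.0, 0.0, vf, vr, ks, kp)

# At equilibrium the reversible form gives no net flux,
# while the irreversible form still predicts a forward rate.
equilibrium_reversible = reversible_michaelis_menten(1.0, 4.0, vf, vr, ks, kp)
equilibrium_irreversible = michaelis_menten(1.0, vf, ks)
```

The two forms coincide only in the initial-velocity state, which is why the simpler equation becomes unreliable once products accumulate during a simulation.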
A high-throughput method is being established that measures the concentrations of metabolites en masse using capillary electrophoresis/mass spectrometry (CE/MS), but for the enzyme-related data, the enzyme purification involved has become a major problem, with no high-throughput assay yet established. To provide a solution to these problems from the perspective of the simulation algorithm, we have developed the hybrid dynamic/static simulation method. This method is a combination of metabolic flux analysis [11-14], a static method developed in the field of metabolic engineering, and conventional dynamic simulation. Metabolic flux analysis is constructed using stoichiometric matrices that describe relationships only by the stoichiometric coefficients. The principle of this


[Figure: example reaction system. Reaction X: S1 → 2S2; Reaction Y: S2 → S3; Reaction Z: 2S3 → 3S1; Influx: → A; Efflux: C →. Labels: Stoichiometric system → Dynamic system]