Biotechnology Annual Review 14 [1 ed.] 978-0-444-53226-8

Biotechnology is a diverse, complex, and rapidly evolving field. Students and experienced researchers alike face the cha

English Pages 1-475 [481] Year 2008

Table of contents:
Biotechnology Annual Review
Page iii

Copyright page
Page iv

Preface
Page v
M. Raafat El-Gewely

Editorial board
Pages vii-viii

List of contributors
Pages ix-xii

The social network of a cell: Recent advances in interactome mapping Review Article
Pages 1-28
Sebastian Charbonnier, Oriol Gallego, Anne-Claude Gavin

Gene expression microarray data analysis demystified Review Article
Pages 29-61
Peter C. Roberts

UCSC Genome Browser: Deep support for molecular biomedical research Review Article
Pages 63-108
Mary E. Mangan, Jennifer M. Williams, Scott M. Lathe, Donna Karolchik, Warren C. Lathe III

Potential implications of availability of short amino acid sequences in proteins: An old and new approach to protein decoding and design Review Article
Pages 109-141
Joji M. Otaki, Tomonori Gotoh, Haruhiko Yamamoto

Network models in drug discovery and regenerative medicine Review Article
Pages 143-170
David A. Winkler

Use of the cauliflower Or gene for improving crop nutritional quality Review Article
Pages 171-190
Xiangjun Zhou, Joyce Van Eck, Li Li

How to predict and prevent the immunogenicity of therapeutic proteins Review Article
Pages 191-202
Huub Schellekens

Protein arginine methylation in health and disease Review Article
Pages 203-224
John M. Aletta, John C. Hu

Identification and characterization of a novel cytotoxic protein, parasporin-4, produced by Bacillus thuringiensis A1470 strain Review Article
Pages 225-252
Shiro Okumura, Hiroyuki Saitoh, Tomoyuki Ishikawa, Eiichi Mizuki, Kuniyo Inouye

G protein-independent cell-based assays for drug discovery on seven-transmembrane receptors Review Article
Pages 253-274
Folkert Verkaar, Jos W.G. van Rosmalen, Marion Blomenröhr, Chris J. van Koppen, W. Matthijs Blankesteijn, Jos F.M. Smits, Guido J.R. Zaman

The application of low shear modeled microgravity to 3-D cell biology and tissue engineering Review Article
Pages 275-296
Stephen Navran

Ethnomedicines and ethnomedicinal phytophores against herpesviruses Review Article
Pages 297-348
Debprasad Chattopadhyay, Mahmud Tareq Hassan Khan

Free radical processes in green tea polyphenols (GTP) investigated by electron paramagnetic resonance (EPR) spectroscopy Review Article
Pages 349-401
K.F. Pirker, J. Ferreira Severino, T.G. Reichenauer, B.A. Goodman

Critical review and appraisal of published clinical literature: Useful skill in biotechnology product development Review Article
Pages 403-410
MaryAnn Foote

The current status and future potential of personalized diagnostics: Streamlining a customized process Review Article
Pages 411-422
Terri D. Richmond

Recent advances in the development of transgenic papaya technology Review Article
Pages 423-462
Evelyn Mae Tecson Mendoza, Antonio C. Laurena, José Ramón Botella

Index of authors
Page 463

Keywords index
Pages 465-475

Biotechnology Annual Review Volume 14

Chief Editor M. Raafat El-Gewely Department of Biotechnology University of Tromsø Tromsø, Norway

Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego • San Francisco • Singapore • Sydney • Tokyo

Elsevier
Radarweg 29, PO Box 211, 1000 AE Amsterdam, The Netherlands
Linacre House, Jordan Hill, Oxford OX2 8DP, UK

First edition 2008

Copyright © 2008 Elsevier B.V. All rights reserved

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher.

Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online by visiting the Elsevier web site at http://www.elsevier.com/locate/permissions, and selecting Obtaining permission to use Elsevier material.

Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-444-53226-8
ISSN: 1387-2656

For information on all Elsevier publications visit our website at elsevierdirect.com

Printed and bound in Hungary
08 09 10 11 12 10 9 8 7 6 5 4 3 2 1

Preface

The Biotechnology Annual Review Series is now fourteen volumes strong. A great part of the success of this publication is due to the ever-expanding awareness of what biotechnology can do in different areas of application, from agriculture to therapeutics and from industrial uses to vaccines, and to its frequent coverage of new tools of the art. We aim in this series to provide comprehensive chapters dealing with current topics of interest, with all needed illustrations and references. This forum makes it easier for scholars, investigators, students and educators as well as decision makers to go in depth into a given topic.

Starting from Volume 15 this series will be linked to “New Biotechnology” by Elsevier and will be published as a special issue (http://www.elsevier.com/wps/find/journaldescription.cws_home/713354/description#description). Moreover, we will be open to thematic volumes focusing on one theme for a given volume. We welcome new contributions that would meet such criteria. Please contact me or any member of the Editorial Board with any suggestion.

M. Raafat El-Gewely
Chief Editor

Editorial board

Chief Editor
M. Raafat El-Gewely, Department of Molecular Biotechnology, Gene Systems Group, Institute of Medical Biology, University of Tromsø, 9037 Tromsø, Norway, E-mail: [email protected]

Editors
Marin Berovic, Faculty of Chemistry and Chemical Engineering, Department of Chemical and Biochemical Engineering, University of Ljubljana, Aškerčeva 5, 1001 Ljubljana, Slovenia, E-mail: [email protected]
Thomas M.S. Chang, Artificial Cells & Organs Research Centre, McGill University, 3655 Drummond St., Room 1005, Montreal, Quebec, Canada H3G 1Y6, E-mail: [email protected]
Thomas T. Chen, University of Connecticut, Molecular & Cell Biology, TLS 415, 75 North Eagleville Road, Unit 3125, Storrs, CT 06269-3125, USA, E-mail: [email protected]
Frank Desiere, F. Hoffmann–La Roche Ltd, Diagnostics Division, Bldg. 52/1607, CH-4070 Basel, Switzerland, E-mail: [email protected]
Franco Felici, Department of Microbiological, Genetic and Molecular Science, University of Messina, Salita Sperone 31, 98166 Messina, Italy, E-mail: franco. [email protected]
MaryAnn Foote, MaryAnn Foote Associates, 4284 Par Five Court, Westlake Village, California 91362, USA, E-mail: [email protected]
Leodevico (Vic) L. Ilag, Cryptome Pharmaceuticals Ltd, Level 1, Baker Heart Research Institute Bldg, Commercial Road, Melbourne, VIC 3004, Australia, and PO Box 6492, St Kilda Road Central, Melbourne, VIC 8008, Australia, E-mail: [email protected]
Kuniyo Inouye, Laboratory of Enzyme Chemistry, Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Sakyo-ku, Kyoto 606-8502, Japan, E-mail: [email protected]
Guido Krupp, AmpTec GmbH, Koenigstr. 4A, D-22767 Hamburg, Germany, E-mail: [email protected]
Alfons Lawen, Monash University, Clayton Campus, Department of Biochemistry and Molecular Biology, Room 312, Building 13D, Clayton, Victoria 3800, Australia, E-mail: [email protected]
K. John Morrow Jr., 625 Washington Avenue, Newport, KY 41071, USA, E-mail: [email protected]
Jocelyn H. Ng, 2/6 Jurang Street, Balwyn VIC 3103, Australia, E-mail: j.ng@bigpond.net.au
Eric Olson, Vertex Pharmaceuticals, 130 Waverly Street, Cambridge, MA 02139, USA, E-mail: [email protected]
Vincenzo Romano-Spica, University Institute of Motor Science, IUSM, P.zza Lauro e Bosis 15, 00194 Rome, Italy, E-mail: [email protected]

List of contributors

John M. Aletta, CH3 BioSystems, LLC, University Commons (Suite 8), 1416 Sweet Home Road, Amherst, New York 14228, USA, and New York State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, New York, USA
W. Matthijs Blankesteijn, Department of Pharmacology and Toxicology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, Maastricht, The Netherlands
Marion Blomenröhr, Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands
José Ramón Botella, School of Integrative Biology, Faculty of Biological and Chemical Sciences, University of Queensland, Brisbane, Queensland 4072, Australia
Sebastian Charbonnier, EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
Debprasad Chattopadhyay, ICMR Virus Unit, I.D. & B.G. Hospital Campus, General Block: 4, First floor, 57, Dr. Suresh C. Banerjee Road, Beliaghata, Kolkata 700 010, India
MaryAnn Foote, MA Foote Associates, Westlake Village, CA 91362, USA
Oriol Gallego, EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
Anne-Claude Gavin, EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
B.A. Goodman, Department of Environmental Research, Austrian Research Centers GmbH – ARC, 2444 Seibersdorf, Austria, and Department of Chemistry & Chemical Engineering, University of Guangxi, Nanning 530005, Guangxi, People’s Republic of China
Tomonori Gotoh, Department of Information Science, Kanagawa University, Hiratsuka, Kanagawa 259-1293, Japan
John C. Hu, Biomedical Sciences Program, University at Buffalo, State University of New York, Buffalo, New York, USA
Kuniyo Inouye, Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan
Tomoyuki Ishikawa, Fukuoka Industrial Technology Centre, Kurume, Fukuoka 839-0861, Japan
Donna Karolchik, UCSC Genome Bioinformatics Group: Center for Biomolecular Science & Engineering, CBSE/ITI, 501D Engineering II Building, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95064, USA
Mahmud Tareq Hassan Khan, Department of Pharmacology, Institute of Medical Biology, University of Tromsø, 9037 Tromsø, Norway
Warren C. Lathe III, OpenHelix, LLC, 12600 SE 38th Street, Suite 230, Bellevue, WA 98006, USA
Scott M. Lathe, OpenHelix, LLC, 12600 SE 38th Street, Suite 230, Bellevue, WA 98006, USA
Antonio C. Laurena, Institute of Plant Breeding, Crop Science Cluster, College of Agriculture, University of the Philippines Los Baños, College, Laguna, Philippines
Li Li, U.S. Department of Agriculture-Agricultural Research Service, Plant, Soil and Nutrition Laboratory, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA
Mary E. Mangan, OpenHelix, LLC, 12600 SE 38th Street, Suite 230, Bellevue, WA 98006, USA
Evelyn Mae Tecson Mendoza, Institute of Plant Breeding, Crop Science Cluster, College of Agriculture, University of the Philippines Los Baños, College, Laguna, Philippines
Eiichi Mizuki, Fukuoka Industrial Technology Centre, Kurume, Fukuoka 839-0861, Japan
Stephen Navran, Synthecon, Inc., 8042 El Rio, Houston, Texas 77054, USA
Shiro Okumura, Fukuoka Industrial Technology Centre, Kurume, Fukuoka 839-0861, Japan
Joji M. Otaki, The BCPH Unit, Laboratory of Cell and Functional Biology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
K.F. Pirker, Department of Environmental Research, Austrian Research Centers GmbH – ARC, 2444 Seibersdorf, Austria
T.G. Reichenauer, Department of Environmental Research, Austrian Research Centers GmbH – ARC, 2444 Seibersdorf, Austria
Terri D. Richmond, School of Pharmacy, Regulatory Science Program, University of Southern California, 1540 Alcazar St. CHP G32, Los Angeles, CA 90033, USA
Peter C. Roberts, VizX Labs, 200 West Mercer Street, Suite 500, Seattle, WA 98119, USA
Hiroyuki Saitoh, Fukuoka Industrial Technology Centre, Kurume, Fukuoka 839-0861, Japan
Huub Schellekens, Utrecht University, Faculty of Science, Department of Pharmaceutical Sciences, P.O. Box 80.082, 3508 TB Utrecht, The Netherlands
J. Ferreira Severino, Department of Environmental Research, Austrian Research Centers GmbH – ARC, 2444 Seibersdorf, Austria
Jos F.M. Smits, Department of Pharmacology and Toxicology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, Maastricht, The Netherlands
Joyce Van Eck, Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA
Chris J. van Koppen, Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands
Jos W.G. van Rosmalen, Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands, and Department of Pharmacology and Toxicology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, Maastricht, The Netherlands
Folkert Verkaar, Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands, and Department of Pharmacology and Toxicology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, Maastricht, The Netherlands
Jennifer M. Williams, OpenHelix, LLC, 12600 SE 38th Street, Suite 230, Bellevue, WA 98006, USA
David A. Winkler, CSIRO Molecular and Health Technologies, Clayton 3168, Australia
Haruhiko Yamamoto, Professor Emeritus, Kanagawa University, Yokohama, Kanagawa 221-8790, Japan
Guido J.R. Zaman, Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands
Xiangjun Zhou, U.S. Department of Agriculture-Agricultural Research Service, Plant, Soil and Nutrition Laboratory, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA

The social network of a cell: Recent advances in interactome mapping

Sebastian Charbonnier, Oriol Gallego and Anne-Claude Gavin
EMBL, Structural and Computational Biology Unit, Meyerhofstrasse 1, D-69117 Heidelberg, Germany

Abstract. Proteins very rarely act in isolation. Biomolecular interactions are central to all biological functions. In humans, for example, interference with biomolecular networks often leads to disease. Protein–protein and protein–metabolite interactions have traditionally been studied one by one. Recently, significant progress has been made in adapting suitable tools for the global analysis of biomolecular interactions. Here we review this suite of powerful technologies, which are generating an exponentially growing number of large-scale interaction datasets. These new technologies have already contributed to a more comprehensive cartography of several pathways relevant to human pathologies, offering a broader choice of therapeutic targets. Genome-wide analyses in model organisms reveal general organizational principles of eukaryotic proteomes. We also review the biochemical approaches that have been used in the past on a smaller scale to quantify the binding constants and thermodynamic parameters governing biomolecular interactions. Adapting these technologies to the large-scale measurement of biomolecular interactions in (semi-)quantitative terms represents an important challenge.

Keywords: yeast-two hybrids, TAP/MS, protein complexes, affinity constant, affinity purification, pulldown, phage display, ITC, SPR, metabolites, small-molecules, interaction.

Corresponding author: Tel.: +49-0-6221-387-8816. E-mail: [email protected] (A.-C. Gavin).
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14, ISSN 1387-2656, DOI: 10.1016/S1387-2656(08)00001-X
© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Since the sequencing of the first eukaryotic genome, Saccharomyces cerevisiae, some 10 years ago [1], our understanding of the basic building blocks that make up a cell has improved spectacularly. The explosion of new analytical tools in the fields of genomics, proteomics and metabolomics is producing ever-growing molecular repertoires of the cell. However, biology does not rely on biomolecules acting in isolation. Biological function usually depends on the concerted action of molecules acting in protein complexes, metabolic or signaling pathways, or networks. Traditionally, assays were designed to study a few selected gene products and their interactions in a defined number of chosen biological contexts. These decades of one-by-one studies have contributed a wealth of knowledge on how cells sense, store and transduce information through a defined number of signaling routes or pathways [2]. In humans, impaired or deregulated protein–protein or protein–metabolite interactions often lead to disease. Also, the majority of targets of current therapeutics are part of a limited number of cellular pathways [3]. A precise cartography of these pathways is believed to contribute to the identification of drug targets, and to help in understanding the mechanism of action and side effects of therapeutic compounds. One of the goals of this review is to illustrate the importance of protein–protein, but also protein–small-molecule (metabolite), interactions for human biology and pathology.

Recently, the study of protein–protein, and more broadly biomolecular, interactions has taken center stage. New strategies have been designed that allow interactions to be studied more globally, at the level of entire biological systems. A section of this review is dedicated to the description of the technologies adapted to chart biomolecular interactions on a systems-wide scale. Their respective advantages, shortcomings and limitations are presented. These methods have already contributed molecular maps of several pathways involved in human pathologies. More global, genome-scale protein–protein interaction screens performed in a model organism, S. cerevisiae, have provided a molecular network that serves as a basis for the interpretation of simple genetic data such as gene essentiality: proteins that are central in the network are more likely to be lethal when deleted [4].

Biomolecular interactions in human diseases

The spatial and temporal coordination of the many cellular enzyme activities through extensive and highly regulated protein–protein interaction networks bears remarkable functional relevance. The specificity of protein–protein recognition is believed to rely essentially on two distinct mechanisms. In some cases, specialized domains or binding sites accommodate smaller determinants or peptides present on the interaction partner [2]. For example, Src homology 2 (SH2) domains specifically interact with small peptides containing a phosphotyrosyl residue.
PSD95-DLG-ZO1 (PDZ) domains target small (4 amino acid long) consensus binding motifs located at the C-terminus of the interaction partners. These short linear motifs are critical to many biological processes. They often show medium to low affinities (0.5–10 μM) and thus tend to mediate transient interactions, especially in signaling networks. The plasticity derived from these weak affinities, together with the often low conservation of the motifs, might be an advantage for the fast adaptation of networks to a changing environment [5–7]. In other cases, protein–protein interaction involves much larger interfaces. This mode of recognition mostly requires folded domains and occurs with higher affinities, in the low nanomolar or even picomolar range. A well-known example is the family of proteins containing a leucine zipper, in which α-helices interact tightly and fit together [2,8].

Very much like the associations taking place between proteins, interactions between small-molecule metabolites and proteins fulfil key biological functions. The relationship between metabolomes and proteomes is not limited to enzyme–substrate or enzyme–product interactions. Many metabolites have well-known signaling roles as second messengers. Others, such as succinate and α-ketoglutarate, two intermediates of the citric acid cycle, are ligands of the mammalian G protein-coupled receptors GPR91 and GPR99 [9]. Many metabolic enzymes and many signaling proteins are allosterically modulated by metabolites. Binding to small molecules is often mediated by a variety of more than a thousand specific domains (Protein Data Bank: www.rcsb.org/pdb/).

Among the 2,000 monogenic syndromes with a known molecular basis (Online Mendelian Inheritance in Man, 2000), mutations that affect biomolecular interactions are not uncommon (Table 1). Noteworthy examples are mutations in receptors that affect their ability to interact with their cognate ligands. Apert syndrome, characterized by skull malformation, syndactyly and often mental deficiency, is caused by mutations in the fibroblast growth factor receptor 2 (FGFR2) that selectively increase the affinity for FGF2 [10]. Enzymes are also extensively involved in protein–protein interactions. For instance, DNMT3B is a DNA methyltransferase implicated in the immunodeficiency, centromeric instability, facial anomalies (ICF) syndrome, a rare autosomal recessive disorder. Missense mutations characterized in ICF patients map not only within the catalytic site but also affect an N-terminal PWWP domain involved in protein–protein interactions [11]. Mutations have also been characterized that prevent the assembly of a functional multiprotein complex. A good example is an RFXANK gene mutant that fails to assemble the regulatory factor X (RFX) complex required for the expression of MHC class II genes. This leads to the bare lymphocyte syndrome [12]. Finally, a variety of abnormal or erroneous interactions between brain proteins can result in the formation of toxic aggregates of proteinaceous fibrils.
These “fatal attractions” [13] are the apparent cause of a variety of neurodegenerative disorders, such as sporadic and familial Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis and prion encephalopathies.

As with protein–protein interactions, deregulation of protein–metabolite interactions leads to many pathological states in humans (Table 1). For example, the Bannayan-Riley-Ruvalcaba syndrome, characterized by macrocephaly, multiple lipomas and hemangiomas [14], is caused by mutations in the phosphatase PTEN. Different mutations map in a protein kinase C conserved region 2 (C2) domain that has relatively broad specificities for phospholipids. Pleckstrin-homology (PH) domains (http://smart.embl-heidelberg.de/), the 11th most frequent domain in humans, mediate the specific recruitment of key signaling proteins to the membrane pools of phosphatidylinositol phosphates (PIPs). PH domains share an extremely conserved fold, despite divergent primary sequences. X-linked agammaglobulinaemia (XLA), characterized by the absence of mature B lymphocytes and of all immunoglobulin isotypes, is caused by mutations in Bruton’s protein tyrosine kinase (Btk) (Table 1) [15]. Many mutations have been reported that cluster within the tyrosine kinase domain and also in the amino-terminal PH domain of Btk [15,16].

The examples listed in Table 1 do not represent a comprehensive inventory. They illustrate that the spatial and temporal orchestration of the many enzyme activities through extensive and highly regulated biomolecular interaction networks bears remarkable functional relevance. Mutational lesions or environmental factors that impair the pathway flow or deregulate connections lead to pathology as surely as interference with the catalytically active sites does.

Table 1. Altered biomolecular interaction in human diseases.
Disease and syndrome | Mutated gene product | Interacting partner | References
Protein–protein interaction
Apert syndrome | Fibroblast growth factor receptor 2: FGFR2 | FGF | [10]
Familial melanoma | Tumor suppressor gene: p16(INK4) | Cyclin-dependent kinases (CDK4, CDK6) | [110,111]
CADASIL | NOTCH3 | NOTCH3, Fringe | [112,113]
Bare lymphocyte syndrome | RFXANK | RFX complex | [12]
Branchio-oto-renal/Branchio-otic syndromes | SIX1 | EYA1 | [114]
Adrenoleukodystrophy | ATP-binding cassette transporter: ABCD1 | ABCD1 | [115]
Holt-Oram syndrome | Tbx5 | NKX2.5 | [116]
ICF syndrome | Methyltransferase gene: DNMT3B | Unknown | [117]
Giant axonal neuropathy | Gigaxonin | MAP1B-LC | [118]
Hereditary nonpolyposis colorectal cancer | Mismatch repair gene: MLH1 | PMS2 | [119]
Protein–metabolite interaction
Bannayan-Riley-Ruvalcaba syndrome | PTEN, C2 | Phospholipids | [14]
X-linked agammaglobulinemia | BTK, PH | PI3,4,5P | [15,16]
Pseudoxanthoma elasticum | ABCC6, NBF (nucleotide binding fold) | ATP | [120]
Vitamin D-dependent Rickets, Type II | VDR, Steroid hormone-binding | 1,25-dihydroxyvitamin D3 | [121]
Sporadic colon cancers, Familial Partial Lipodystrophy Type 3 | PPARG, Ligand-binding | Broad variety of ligand | [122]
Human breast, colorectal and ovarian cancers | AKT1, PH | PI3,4,5P, PI3,4P | [123]

Charting biomolecular interactions

The sequencing of the full repertoire of genes in several organisms and significant breakthroughs in the field of proteomics opened the way to more holistic protein–protein interaction analyses. Two main types of approaches have been adapted: the yeast two-hybrid system, which allows the mapping of pairwise associations, and affinity purification methods coupled to mass spectrometry (MS)-based protein identification, designed for the characterization of protein complexes (Table 2). Alternative strategies that are emerging, such as pulldown coupled to MS or microarrays, will also be presented. These screening methods are generally limited to qualitative measurements.

Table 2. Overview of large-scale protein–protein interaction studies.
Method | Organism | Interactions | References
Yeast two-hybrid (Y2H)
Y2H | H. pylori | ~1,520 | [47]
Y2H | S. cerevisiae | ~4,500 | [124]
Y2H | S. cerevisiae | ~1,000 | [48]
Y2H | C. elegans | ~5,000 | [51]
Y2H | D. melanogaster | ~20,400 | [49]
Y2H | D. melanogaster | ~2,300 | [50]
Y2H | H. sapiens | ~2,800 | [52]
Y2H | H. sapiens | ~3,200 | [53]
Y2H | Kaposi sarcoma-associated herpes virus | ~120 | [102]
Y2H | Varicella-zoster virus | ~170 | [102]
Y2H | Epstein-Barr virus | ~40 | [101]
Y2H | C. jejuni | ~11,600 | [125]
Affinity chromatography
TAP/mass spectrometry | S. cerevisiae | ~4,100 | [28]
TAP/mass spectrometry | S. cerevisiae | | [29]
TAP/mass spectrometry | S. cerevisiae | | [30]
TAP/mass spectrometry | E. coli | | [27]
TAP/mass spectrometry | H. sapiens | | [32]
TAP/mass spectrometry | O. sativa | | [126]
Immuno-affinity purification/mass spectrometry | S. cerevisiae | | [24]
Others
Protein microarrays | S. cerevisiae | ~40 | [77]
Note: three further interaction counts (~5,250, ~220 and ~3,620) belong to the affinity chromatography rows after the first; the row each one applies to cannot be recovered from the flattened layout.

Fig. 1. Main methodologies in the study of biomolecular interactions. The approaches are grouped according to their aim. (The color version of this figure is hosted on Science Direct.)
[Figure panels: Biomolecular networks (biochemical: AP/TAP-MS, pulldown; genetic: yeast two-hybrid, phage display; arrays). Molecular recognition (structures and modeling approaches). Dynamics, quantitative measurements (SPR, ITC, equilibrium dialysis, hold-up and stopped flow), illustrated by the equilibrium between [P] + [B] and [PB] with rate constants Kon and Koff. Designer: Petra Riedinger.]
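The binding scheme drawn in Fig. 1 can be written out explicitly. The following is a sketch only, assuming simple reversible 1:1 binding between a protein P and a binder B; the lowercase rate-constant symbols and the dissociation constant K_d follow common textbook notation rather than the figure's own labels.

```latex
P + B \;\underset{k_{\mathrm{off}}}{\overset{k_{\mathrm{on}}}{\rightleftharpoons}}\; PB,
\qquad
K_d \;=\; \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} \;=\; \frac{[P]\,[B]}{[PB]}
```

Under this assumption, a smaller K_d corresponds to a tighter complex, and at equilibrium the rate of association, k_on [P][B], equals the rate of dissociation, k_off [PB].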

We also review the arsenal of quantitative approaches that are currently amenable to smaller-scale work (Fig. 1). Finally, a number of databases have been developed that integrate biomolecular interaction data from various origins (large-scale physical or genetic interactions, literature mining) and provide a very rich source of information (Table 3). The need to integrate the exponentially growing number of protein interaction datasets being generated recently led to the development of “The Minimum Information required for reporting a Molecular Interaction experiment (MIMIx)”, a community standard for the representation of protein interaction data [17].

Table 3. Protein–protein interaction databases available on the Internet and their URL addresses. See also The Jena Center for Bioinformatics Protein–Protein Interaction Website (http://www.imb-jena.de/jcb/ppi/).
Database | Internet URL address (http://) | Species
Experimental datasets for protein–protein interactions:
The GRID | www.thebiogrid.org | C. elegans, D. melanogaster, S. cerevisiae
SGD | www.yeastgenome.org/ | S. cerevisiae
CYGD | mips.gsf.de/genre/proj/yeast/ | S. cerevisiae
BOND | bond.unleashedinformatics.com/Action? | A. thaliana, B. taurus, C. elegans, D. melanogaster, G. gallus, H. pylori, HIV1, H. sapiens, R. norvegicus, S. cerevisiae, M. musculus
HPRD | www.hprd.org/ | H. sapiens
DIP | dip.doe-mbi.ucla.edu/ | C. elegans, D. melanogaster, E. coli, H. pylori, H. sapiens, M. musculus, R. norvegicus, S. cerevisiae
MINT | mint.bio.uniroma2.it/mint/Welcome.do | B. taurus, C. elegans, D. melanogaster, E. coli, H. pylori, H. sapiens, M. musculus, R. norvegicus, S. cerevisiae
IntAct | www.ebi.ac.uk/intact/index.jsp | C. elegans, D. melanogaster, E. coli, H. sapiens, M. musculus, S. cerevisiae
Biocarta | www.biocarta.com/ | H. sapiens, M. musculus
MPPI | mips.gsf.de/proj/ppi/ | Mammals
PDB | www.rcsb.org/pdb/ | Archaea, Bacteria, Eucaryota, Viruses
Experimental datasets for protein–small-molecule interactions:
KDBI | xin.cz3.nus.edu.sg/group/kdbi/kdbi.asp |
AffinDB | www.agklebe.de/affinity |
LPDB | lpdb.chem.lsa.umich.edu |

Large-scale approaches: Interactome mapping

Biochemical approaches

Analysis of protein complexes: Affinity purification-mass spectrometry

The emergence of sensitive and high-throughput MS methods has fuelled the development of methods employing the biochemical purification of whole cellular assemblies. Pioneering work combined MS with specific antibodies directed against epitopes present on endogenous protein complexes, or with recombinant specific interaction domains, to identify protein complexes. This approach has been successfully applied to the proteins assembled around the neurotransmitter receptors N-methyl-D-aspartate (NMDA) and 5-hydroxytryptamine 2c (5-HT2C) [18,19]. The main advantage is that endogenously expressed, native proteins are retrieved from cells or even tissues, which is closest to physiological conditions. The availability of specific antibodies or other affinity capturing agents remains a major limitation.

More generic approaches that exploit known high-affinity interactions have been widely developed, in which the proteins of interest (“baits”) are fused to an affinity tag that is eventually captured on a suitable affinity chromatography resin. Many different genes can be fused to the same tag in parallel, expressed in the appropriate cell type and isolated using the same affinity resins. The components of the purified protein complexes are identified by MS either directly (shot-gun sequencing approaches) or after one-dimensional gel electrophoresis. The tag is often an epitope-tag: an antibody directed against the tag, instead of the bait protein itself, is used for protein complex purification. A broad variety of epitope-tags are available, such as Myc, HA, Flag and KT3. Other tagging systems have also been developed that exploit enzyme–substrate interaction, for example between glutathione S-transferase (GST) and glutathione (GSH), or the use of strep tags [20]. A variety of tag-affinity resin pairs are currently available. Their respective performances in a variety of expression systems have been extensively reviewed [21–23]. In the past, mono-affinity approaches have been adapted to the large-scale purification of yeast protein complexes assembled around more than 700 baits involved in cell signaling and in the DNA damage response [24] (Table 2).

Further improvements aimed at a higher discrimination against unspecific protein background while retaining the essential components of the protein complex. More stringent washing steps in “traditional” mono-affinity purification schemes are not always compatible with the preservation of protein complex integrity. The tandem affinity purification (TAP) protocol sequentially uses two epitope-tags, Protein A and calmodulin binding peptide (CBP), instead of one, and addresses some of the signal-to-noise issues [25,26]. The TAP-fusion protein is expressed in cells, and a protein complex can assemble under physiological conditions with the endogenous components. The tagged protein, along with its associated partners, is retrieved by two steps of affinity purification (Fig. 2A). First, the Protein A tag is immobilized on immunoglobulin resin.
The protein complex is then specifically eluted by protease cleavage, using Tobacco Etch Virus (TEV) protease. The TEV protease very specifically cleaves a seven-amino-acid sequence that has been introduced between the Protein A and the CBP tags. The TEV cleavage sequence is found in only a few human, mouse or yeast proteins, ensuring that the retrieved complexes are not digested. In a second affinity step, the complex is immobilized on calmodulin-coated beads via the CBP tag. This step removes the TEV protease and further contaminants. As the CBP–calmodulin interaction is calcium dependent, a second specific elution step is achieved through the removal of calcium with a chelating agent (EGTA).

The TAP/MS protocol has been rapidly adapted to high-throughput analyses of protein complexes in a variety of organisms, including the bacterium Escherichia coli [27], S. cerevisiae [28–30], plants [31] and human [32] (Table 2). Proteome-wide screens in yeast, including more than 2,000 baits, provided the largest repertoire of eukaryotic protein complexes so far [29,33]. The TAP/MS approach is not limited to one cell type; it is possible to monitor and quantify changes in complex composition in different cell lines, during development or following various cell stimulations [34–36]. Only one protein is cloned and tagged; all other components of the complex are native proteins and reflect the natural diversity of protein isoforms arising, for example, from alternative splicing and post-translational modifications. Because of its two purification steps, the protocol generally reduces the unspecific protein background efficiently and, most importantly, stringent purification conditions can be avoided. The protein complexes can be kept under “native” conditions throughout the purification procedure. For example, TAP-purified complexes have been successfully used for electron microscopy studies [37,38]. A variety of tag combinations optimized for expression in mammalian or insect cells is now also available [39,40].

The affinity purification/MS methods are not generally designed to monitor very labile or transient interactions (generally, interactions with a Kd up to the mid-nanomolar range are captured; unpublished data). In addition, the fusion with an epitope-tag may sometimes interfere with the biological function of the tagged protein, as it may impair its folding, its recruitment within a protein complex or its sub-cellular localization. These risks can be significantly reduced by creating and analyzing both N-terminal and C-terminal fusions in parallel. Finally, over-expression can also lead to aberrant localization, protein aggregation or toxicity. Generally, tight controls over the expression levels and the localization of the bait protein should be included in any systematic screen setting [32].

Protein or small molecule pulldown

The pulldown assay is probably one of the oldest and most widespread techniques to identify biomolecular interactions [41]. The assay monitors the ability of a ligand (bait), for example a recombinant protein, a domain, a peptide or a metabolite bound to a matrix, to specifically capture proteins from a complex cell extract (Fig. 2B).
The binding of the bait to the matrix can be achieved by chemical cross-linking. Alternatively, the bait can be expressed as a tag-fusion, for example a GST-fusion, with specificity for a particular affinity resin, for example glutathione sepharose (GST-pulldown). Analyte proteins (preys), typically a cellular extract, are incubated with the bound bait and non-interacting proteins are washed away under mild washing conditions. Protein interactors can then be eluted by high-salt conditions, cofactors, competitors, chaotropic agents or sodium dodecyl sulfate (SDS) and are identified with specific antibodies (western blot) or by MS. In contrast to the classical affinity chromatography methods described above, pulldown is less physiological, as the binding happens in vitro on a solid support. The assay, however, is generally very sensitive. With appropriate concentrations of the immobilized ligand, i.e., well above the Kd of the interaction, interactions with dissociation constants as weak as 10⁻⁵ M can be detected. This high sensitivity, however, comes at the cost of a relatively low specificity and a high rate of false positives. Generally, adequate controls, for example inactive mutants or analogs, must be carefully designed to discriminate artifactual binding.

Mann’s group used synthetic peptides from the four members of the ErbB-receptor family, either tyrosine phosphorylated or non-phosphorylated, in pulldown experiments. By quantitative MS they characterized the phosphotyrosine-dependent interactions induced by growth factor stimulation. The analysis recapitulated almost all previously known epidermal growth factor receptor substrates and revealed 31 novel effectors [42]. Another interesting area of pulldown application is the monitoring of small molecule- or metabolite-binding profiles in complex proteomes. Such approaches have been broadly used to study protein–lipid interactions using, for example, biotinylated liposomes of varying lipid composition or concentration [43]. More recently, pulldown approaches coupled to MS have elegantly been applied to the proteome-wide charting of cAMP/cGMP- [44] and purine-binding proteomes [45].

Fig. 2. Technologies for the large-scale charting of biomolecular interactions. (A) Tandem affinity purification (TAP): A bait protein is fused to a TAP-tag built of protein A coupled via a TEV protease-sensitive linker to a calmodulin-binding peptide. The TAP-bait fusion is expressed at endogenous levels and can form a functional complex in native-like conditions. The protein complex is isolated via two successive chromatographic purification steps, the first involving purification on IgG beads followed by elution via TEV cleavage and the second on calmodulin beads followed by EGTA elution. Individual protein complex subunits can be visualized on SDS-PAGE and identified by mass spectrometry. X, bait protein; Y and Z, natural complex subunits (preys); CBP, calmodulin-binding peptide. (B) Pulldown: GST pulldown is shown here as an example of chromatographic affinity protein or small molecule pulldown. The bait protein is fused to an affinity tag (GST), which can bind an affinity chromatography resin (Glu, glutathione sepharose beads). It is incubated with prey proteins and unspecific proteins are washed away, while interacting proteins remain bound to the bait. Binders can be visualized by SDS-PAGE, western blot or any other suitable method. (C) Yeast two-hybrid: A bait protein is fused to a DNA-binding domain that recognizes a specific promoter, which is coupled to a reporter gene. A bank of prey proteins fused to an activation domain is cloned into the yeast cells such that each yeast clone expresses only one prey protein. If an interaction between bait and prey takes place, the reporter gene is activated and an expression phenotype can be visualized. DBD, DNA-binding domain; AD, activation domain; X, bait protein; Y and Z, prey proteins. (D) Phage display: A phage library, in which individual phages express different prey proteins, is incubated with a bait protein that is immobilized on a solid surface. Non-specific phages are washed away and specifically interacting phages are titrated and amplified. Specifically binding phages are amplified in 4–5 rounds of panning and the binding preys are identified by sequencing of the phage DNA. (E) Protein or small molecule microarray: A bank of prey proteins or small molecules is spotted on a solid surface. The array is then incubated with labeled or tagged bait. Labels can be visualized either directly (GFP, radioactivity) or indirectly via antibodies. (The color version of this figure is hosted on Science Direct.)

Genetic approaches

Monitoring binary interaction; the yeast two-hybrid system

The yeast two-hybrid system is a genetic, ex vivo assay that allows the charting of binary interactions. Its principle relies on the modular nature of transcription factors, which contain both a site-specific DNA-binding domain (DBD) and a transcriptional activation domain (AD) that recruits the transcriptional machinery to the promoters. The interaction between a “bait” fusion (protein X-DBD hybrid protein) and a “prey” fusion (protein Y-AD hybrid protein) reconstitutes a functional transcription factor, which turns on the expression of reporter genes or selection markers [46] (Fig. 2C). The system is readily scalable and has very rapidly evolved into genome-wide strategies that have been broadly applied to the charting of protein–protein interactions in a variety of organisms (Table 2), including Helicobacter pylori (Rain et al. [47]), budding yeast [48], Drosophila melanogaster [49,50], Caenorhabditis elegans [51] and Homo sapiens [52,53].

The two-hybrid system is a sensitive assay suitable for the detection of weak and/or very transient interactions. Dissociation constants as weak as 10⁻⁶ M, corresponding to the range of the weakest interactions occurring in the cell, can be detected this way. The system also has drawbacks, mainly related to its ex vivo nature. Expressed fusion proteins are forced to the nucleus, which may not be their natural location. For example, membrane proteins are usually not compatible with such a nuclear-based assay. Ectopically expressed proteins may not undergo the appropriate post-translational modifications. Similarly, interactions often involve cooperative, allosteric events or chaperone-assisted assembly that may not occur in the nucleus.
Finally, transcription factors, as well as other proteins (about 5–10% of gene products), can auto-activate transcription of the reporter genes, making them unsuitable for this approach.

Several modified versions of the yeast two-hybrid system have been developed that address some of these limitations. They involve the reconstitution of modular proteins other than transcription factors and enable the analysis of proteins not amenable to the “classical” two-hybrid assay (essentially membrane proteins and transcription factors). They include the SOS [54] and Ras recruitment systems [55], the G-protein-based screening assay [56], the split-ubiquitin system [57] and the mammalian protein–protein interaction trap (MAPPIT), based on the complementation of signaling-deficient type I cytokine receptors [58]. Although some of these assays are apparently robust [59], none of them has yet been used in high-throughput proteome-wide screens.

Reverse versions of the two-hybrid system have been developed, in which the disruption of a given protein–protein interaction generates a signal. These reverse approaches, initially developed in yeast [60], have matured into an arsenal of assays that allow the screening for small molecules disrupting selected protein–protein interactions [61,62]. Finally, the most recent application of the two-hybrid principle is MASPIT, a three-hybrid trap for quantitative screening of small molecule–protein interactions in mammalian cells [63]. Using MASPIT, Caligiuri et al. [63] could show that, besides its well-known inhibitory action on the SRC kinase, the pyrido[2,3-d]pyrimidine PD173955 is also a potent inhibitor of several ephrin receptor tyrosine kinases.

Phage display

Smith, Scott and colleagues first proposed a way of displaying polypeptides on the surface of filamentous M13-derived bacteriophages [64]. Polypeptides (preys) are expressed as fusions with the phage coat protein pIII [64,65]. During the phage assembly process, the resulting fusion proteins are transported to the bacterial cell membrane and are incorporated into the phage particle along with the single-stranded DNA (ssDNA) encoding the displayed fusion protein [65]. Phage libraries displaying a large diversity of preys (10⁹ unique sequences) can be created, amplified and screened for specific binding to an immobilized target protein (bait). Usually three to five rounds of panning are sufficient to enrich for phages expressing a peptide sequence interacting with the bait. The identity of the polypeptide binder is deduced by sequencing the corresponding phage DNA (Fig. 2D). When necessary, affinity maturation of individual clones can be performed through the generation of secondary libraries of mutated peptides [66–68]. Phage display is suitable for the charting of interactions with affinity constants in the micromolar to nanomolar range.
To fine-tune the sensitivity of the assay to a particular bait, alternative systems have been engineered. They use different viral coat proteins that are expressed and displayed at the phage surface with different stoichiometry [69]. The system has clear limitations, as many proteins do not fold properly in the bacterial periplasm. To circumvent this problem, new strategies use lytic bacteriophages, such as T4 [70], T7 [71] or P4 [72,73], that assemble their capsids in the cytoplasm. Phage display has proven particularly powerful for the selection of relatively small peptides. Using phage display, Robinson et al. [74] screened a peptide library for the selection of nonameric sequences that specifically bind and target human papillomavirus (HPV)-transformed cells. They identified three different consensus tumor-targeting sequences that could be employed for the selective delivery of therapeutics. Other illustrative examples are recent epitomics screens in which peptide libraries were selected for specific binding to antibodies from the serum of cancer patients. The selected peptides were used to develop a peptide array for the diagnosis of cancer [75,76].

Protein and small molecule microarrays

It has become routine to use DNA microarrays to probe the expression of thousands of genes in parallel. Similarly, protein or small molecule microarrays have been developed. They provide a means for the rapid and parallel screening of large numbers of proteins for biochemical activities and for protein–protein, protein–lipid, protein–nucleic acid and protein–small molecule interactions (Fig. 2E). The first high-density proteome microarray consisted of 5,800 GST-HisX6 tag-fusion yeast proteins [77]. Such protein microarrays are now commercially available. Details on the procedure have been extensively reviewed [78,79]. Protein arrays are in principle amenable to proteome-wide screens for protein–protein interactions. So far, though, they have been used only on a limited scale to identify new calmodulin- and phospholipid-binding proteins as well as to monitor domain–domain and antigen–antibody interactions [77,80–82]. Similarly, a variety of small molecule arrays have been developed that involve the covalent coupling of entire collections of synthetic small molecules [83,84] or natural products [85] to a solid surface. Small molecule microarrays have been used in a broad range of applications, such as the determination of protease activity profiles in cell lysates [86], the study of the ligand-binding specificity of proteins and domains [87] and the identification of protein modulators [88]. For instance, using a small molecule microarray containing a selection of 3,800 compounds, Kuruvilla et al. [88] identified 8 small molecules that selectively bind the yeast transcriptional regulator Ure2.

Measurement of the dynamics of biomolecular interactions: The quantitative methods

Generally, the information contributed by large-scale studies of biomolecular interactions is static.
Indeed, the methods designed for the charting of networks on a large scale fail to capture the dynamic aspect of recognition that is central to the functioning of the whole cell. Many of the large-scale methods involve the expression of proteins under non-physiological conditions, and in these ex vivo or in vitro systems the regulation and fine-tuning of molecular interactions at the cellular and physiological levels are usually lost. To fill this gap, approaches have emerged that are based on the affinity purification of molecular assemblies formed inside the cell and on the measurement, by quantitative MS, of changes in complex stoichiometry following various cell stimulations [34,36]. In addition, in silico strategies integrate interaction data with global expression data [89–91]. Pseudo-affinity scores have been developed that approximate the tendency of protein pairs to associate and form direct physical contacts [29]. In this section, we review several biochemical approaches that have been used in the past on a smaller scale to quantify the binding constants and the thermodynamic parameters governing biomolecular interactions. Some of the methods might be amenable to more global strategies (Fig. 3). Fluorescence-based assays such as fluorescence resonance energy transfer (FRET) also hold great promise. They allow the monitoring and quantification of protein–protein interactions inside a living cell. These technologies are outside the scope of this review and will not be discussed; they have been extensively reviewed elsewhere [92,93].
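As a rough illustration of what a binding constant implies in practice, the following sketch relates a dissociation constant to the fraction of prey bound and to the standard free energy of binding. It is not taken from any of the cited studies. It assumes a simple 1:1 interaction, a bait present in large excess (so that free and total bait concentrations are approximately equal) and a 1 M standard state; the function names and example values are hypothetical.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J/(mol*K)


def fraction_bound(free_bait_molar: float, kd_molar: float) -> float:
    """Fraction of prey in complex for a 1:1 interaction at equilibrium.

    Assumes the bait is in large excess, so its free concentration is
    approximately equal to its total concentration.
    """
    return free_bait_molar / (free_bait_molar + kd_molar)


def standard_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of binding in J/mol (1 M standard state)."""
    return R_GAS * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    kd = 1e-5  # illustrative weak interaction (10 micromolar)
    for bait in (1e-6, 1e-5, 1e-4):
        print(f"bait {bait:.0e} M -> fraction bound {fraction_bound(bait, kd):.2f}")
    print(f"dG0 = {standard_free_energy(kd) / 1000:.1f} kJ/mol")
```

Under these assumptions, half of the prey is bound when the bait concentration equals the Kd, and roughly 90% is bound at a tenfold excess over the Kd, which is why weak interactions are only detected when the ligand concentration is well above the Kd.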

Fig. 3. Recapitulative plot of the described methods. The methods cited in the paper are plotted according to their features. Qualitative methods are depicted on the left side, quantitative on the right side. The Y-axis codes compatibility for high throughput. The X-axis codes for the capacity to detect and analyze weak or strong interactions. In case Kd can be quantitatively described, the working range is indicated as a double-headed arrow, ranging from sub-micromolar to nanomolar. Black boxes are indicative for techniques which allow working in solution; grey boxes for techniques, where one partner is bound to a solid matrix.

Surface plasmon resonance: Equilibrium and competition in solution

Almost 15 years after the development of the first biosensor relying on the principle of surface plasmon resonance (SPR) [94], SPR-based strategies have gained popularity in the field of interactomics because of their high accuracy, reliability and sensitivity in reporting binding rates. The method measures an electromagnetic wave, the so-called surface plasma wave (SPW), which propagates along the sensor surface at the interface between a dielectric and a metal (usually gold). Biacore and other SPR-based instruments use an optical method to measure the refractive index near the sensor surface (within 300 nm). The sensor surface is integrated in a flow cell (IFC) through which an aqueous solution passes under continuous flow (1–100 µl/min). The ligand (bait) is immobilized by chemical cross-linking onto the sensor surface, either directly or on different carboxymethylated dextran matrices. The binding of the analyte (prey) results in an increase in local density at the surface of the sensor. This change is measured in real time and expressed in resonance units (RU). One RU represents the binding of approximately 1 pg protein/mm².

SPR has been broadly used to study the binding behavior of macromolecules, such as recombinant proteins, with their natural ligands. It generates real-time data and is well suited to the analysis of binding kinetics. An interesting extension is the possibility of designing experiments in which measurements are made at different temperatures. This allows the monitoring of thermodynamic parameters (entropy, etc.). SPR is suited to the measurement of interactions that cover a broad range of Kd, from nM to the high mM range. An important limitation to kinetic analysis is the effect of mass transport, which affects interactions with fast kon values. At kon faster than about 10⁶–10⁷ M⁻¹ s⁻¹, the measurements lose significant accuracy.
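To make the link between binding rates, Kd and the measured response more concrete, the sketch below simulates the association phase of a sensorgram. The text above does not specify a binding model, so the code assumes the commonly used 1:1 (Langmuir) interaction model with a constant analyte concentration and no mass-transport limitation. All function names and parameter values are hypothetical.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant Kd = koff / kon, in M."""
    return k_off / k_on


def equilibrium_response(conc: float, k_on: float, k_off: float, r_max: float) -> float:
    """Plateau response (RU) at a constant analyte concentration conc (M)."""
    kd = dissociation_constant(k_on, k_off)
    return r_max * conc / (conc + kd)


def association_response(t: float, conc: float, k_on: float, k_off: float,
                         r_max: float) -> float:
    """Response (RU) after t seconds of analyte injection, starting from R = 0.

    Closed-form solution of dR/dt = kon*C*(Rmax - R) - koff*R (1:1 model).
    """
    k_obs = k_on * conc + k_off  # observed rate constant, s^-1
    r_eq = equilibrium_response(conc, k_on, k_off, r_max)
    return r_eq * (1.0 - math.exp(-k_obs * t))


if __name__ == "__main__":
    # Made-up values: kon in M^-1 s^-1, koff in s^-1, Rmax in RU.
    k_on, k_off, r_max = 1e5, 1e-3, 100.0
    conc = 1e-8  # 10 nM analyte
    print("Kd (M):", dissociation_constant(k_on, k_off))
    for t in (0, 60, 300, 1800):
        ru = association_response(t, conc, k_on, k_off, r_max)
        print(f"t = {t:>4} s: {ru:.1f} RU")
```

In this simplified model the plateau response at an analyte concentration equal to the Kd is half of the maximal response. Real sensorgrams are usually fitted globally over several analyte concentrations, and the model breaks down when mass transport becomes limiting.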
Stopped flow

Stopped flow experiments [95,96] are designed to measure interactions that have very fast kinetics (rapid kon and koff) and that are usually not amenable to SPR or other methods. The principle relies on the very fast mixing of bait and prey solutions. The time required for mixing the two solutions, the “dead time,” is the shortest measurable time point in a stopped flow experiment. It ranges from 500 to 40 ms for the latest generations of stopped flow instruments. Fast reactions also require fast measurement methods. The mixing chamber is usually coupled to an external device for measuring the binding reaction, such as a UV or visible spectrometer, a circular dichroism spectrometer, a fluorescence spectrometer or an electrical conductivity detector. Besides the possibility of monitoring fast kinetics, another advantage is that the measurement takes place in solution. Consequently, there is no need to couple the bait or the prey to a solid surface. Finally, only low amounts of biological material are needed.
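The importance of the dead time can be illustrated with a small calculation. The sketch below estimates which fraction of a binding signal is already over before the first reading. It assumes pseudo-first-order conditions (one partner in large excess) and a single-exponential approach to equilibrium; neither assumption comes from the text above, and the rate constants and dead times are made-up values rather than instrument specifications.

```python
import math


def observed_rate(k_on: float, k_off: float, excess_conc: float) -> float:
    """Pseudo-first-order rate constant (s^-1) with one partner in large excess."""
    return k_on * excess_conc + k_off


def fraction_missed(k_obs: float, dead_time_s: float) -> float:
    """Fraction of the total signal change that occurs within the dead time."""
    return 1.0 - math.exp(-k_obs * dead_time_s)


if __name__ == "__main__":
    # Made-up values: kon in M^-1 s^-1, koff in s^-1, concentration in M.
    k_obs = observed_rate(k_on=1e7, k_off=50.0, excess_conc=1e-5)
    for dead_time in (5e-3, 1e-3, 1e-4):
        missed = fraction_missed(k_obs, dead_time)
        print(f"dead time {dead_time:.0e} s: {missed:.0%} of the signal missed")
```

The shorter the dead time relative to 1/k_obs, the larger the part of the reaction that remains observable, which is why faster mixing extends the range of measurable kinetics.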

Calorimetry – isothermal titration calorimetry (ITC)

Isothermal titration calorimetry (ITC) is a quantitative technique designed to measure the binding affinity (Ka), the enthalpy change (ΔH) and the binding stoichiometry (n) between two or more molecules interacting in solution. An isothermal titration calorimeter is composed of two identical cells made of a highly efficient thermal conducting material and surrounded by an adiabatic jacket. Sensitive thermocouple circuits are used to detect temperature differences between a buffer-filled reference cell and a sample cell containing the interacting macromolecule. During the experiment, the ligand (bait) is injected into the sample cell containing a prey. An exothermic reaction releases heat, whereas an endothermic reaction absorbs it. A power is applied to the reference cell and is coupled to a feedback circuit, activating a heater located on the sample cell. The calorimeter measures the power needed over time to maintain the reference and the sample cell at an identical temperature. ITC measures interactions directly in solution and does not require any modification or immobilization of bait or prey. ITC directly measures the heat change during complex formation and, in addition to the binding constants, it also measures the thermodynamic parameters governing the interaction [97,98]. The main limitation remains the need for relatively large amounts of sample (in the order of milligrams).

Equilibrium dialysis

Equilibrium dialysis is probably one of the simplest, yet effective, assays for the study of interactions between molecules. The experimental setup is based on two chambers separated by a dialysis membrane. The molecular weight cut-off of this membrane is chosen such that it will retain the ligand (bait) while the analyte (prey) diffuses freely. A known concentration and volume of prey is placed into one of the chambers.
Its diffusion across the membrane and binding to the bait take place until equilibrium has been reached. At equilibrium, the concentration of prey in free solution is the same in both chambers. In the bait chamber, however, the overall concentration is higher due to the presence of bait–prey complexes. The equilibrium binding at various concentrations of prey and bait can be used to determine the Kd as well as the number of binding sites on the bait. Equilibrium dialysis also offers the ability to study low-affinity interactions that are undetectable using other methods. It has been used for the detailed study of antigen–antibody interactions [99].

Hold up

The hold up assay, also called “comparative chromatography retention assay,” is based, like the pulldown assay, on the reversible binding of a bait to an affinity resin. The main difference is that the hold up assay does not include washing steps and directly measures the amount of prey remaining in solution upon exposure to the resin-bound bait. Therefore, in contrast to pulldown experiments, the hold up assay gives access to the visualization of complexes under equilibrium conditions. Because it measures interactions at equilibrium, it allows the detection of fast-exchanging protein complexes. The equilibrium dissociation constants (Kd) measured are comparable to those obtained using SPR [100]. The hold up assay is specifically adapted to monitor weak protein interactions, where high concentrations of bait are necessary. The method is extremely simple and the general experimental setup is amenable to automation. Work on this aspect is currently in progress.

Conclusions and perspectives

Nowadays, a growing choice of technologies is available to scientists for the global charting of biomolecular interactions. These large-scale approaches have already contributed comprehensive cartographies of the proteins functionally involved in various human pathways that underlie pathologies. For instance, in human, the systematic mapping, by TAP/MS, of the protein interaction network around 32 components of the proinflammatory TNF-alpha/NF-kappa B signaling cascade led to the identification of 221 molecular associations. The analysis of the network and directed functional perturbation studies using RNA interference highlighted 10 new functional modulators that provided significant insight into the logic of the pathway as well as new candidate targets for pharmacological intervention [32]. Recently, global analysis of the interaction between a variety of viruses and their hosts provided new hypotheses on viral strategies for replication and persistence [101–103].
Generally, the elucidation of pathways or cellular processes important to human diseases is expected to contribute alternative therapeutic targets with better chemical tractability and also to provide a molecular frame for the interpretation of genetic links.

In the simple eukaryote S. cerevisiae, more global, genome-wide screens for protein complexes raised the interesting view that protein multifunctionality may be a more general attribute than initially anticipated [28,104]. Different protein complexes very often use the same protein to exert their different biological functions. About 37% of the proteins were found to be part of more than one protein complex [105]. Proteins, similar to the globular domains they are made of, are used in a combinatorial manner and contribute to the assembly of a variety of “molecular machines.” Protein modularity or multifunctionality has been proposed to support parsimonious increases in organismal complexity, i.e., with a relatively constant number of genes. The understanding of protein modularity in higher eukaryotes may not only provide a molecular frame for the explanation of genetic traits such as genetic pleiotropy, but is also expected to contribute to the selection of more specific and “safer” drug targets.

The systematic and unbiased charting of biomolecular networks in a variety of organisms contributes to our understanding of the sequences, motifs and structural folds involved in the processes of molecular recognition [5]. These recent advances opened new avenues for the identification of leads that specifically abrogate or modulate disease-relevant interactions. Promising successes include FTY720 (fingolimod; 2-amino-2-[2-(4-octylphenyl)ethyl]-1,3-propanediol, Novartis), a sphingosine-1-phosphate (S1P) analog that binds four of the S1P receptors [106], disruptors of the interaction between p53 and murine double minute 2 (MDM2) [107], compounds that interfere with the interaction between Bcl-2 and Bak [108] and small-molecule inhibitors of SH3-mediated interactions [109]. Finally, the adaptation of existing technology to the large-scale measurement of biomolecular interactions in (semi-)quantitative terms represents an important challenge.

Acknowledgments

Our work is supported by the EMBL and the EU-grant “3D repertoire.” We would like to thank Petra Riedinger for help with the figures. We are grateful to Christoph Müller, Vladimir Rybin, Gilles Travé and Yves Nominé for critical reading of the manuscript.

References

1. Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M, Louis EJ, Mewes HW, Murakami Y, Philippsen P, Tettelin H and Oliver SG. Life with 6000 genes. Science 1996;274:546, 563–567.
2. Pawson T, Raina M and Nash P. Interaction domains: from simple binding events to complex cellular behavior. FEBS Lett 2002;513:2–10.
3. Brown D and Superti-Furga G. Rediscovering the sweet spot in drug discovery. Drug Discov Today 2003;8:1067–1077.
4. Jeong H, Mason SP, Barabasi AL and Oltvai ZN. Lethality and centrality in protein networks. Nature 2001;411:41–42.
5. Neduva V, Linding R, Su-Angrand I, Stark A, de Masi F, Gibson TJ, Lewis J, Serrano L and Russell RB. Systematic discovery of new recognition peptides mediating protein interaction networks. PLoS Biol 2005;3:e405.
6. Neduva V and Russell RB. DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res 2006;34:W350–W355.
7. Neduva V and Russell RB. Peptides mediating interaction networks: new leads at last. Curr Opin Biotechnol 2006;17:465–471.
8. Landschulz WH, Johnson PF and McKnight SL. The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science 1988;240:1759–1764.

9. He W, Miao FJ, Lin DC, Schwandner RT, Wang Z, Gao J, Chen JL, Tian H and Ling L. Citric acid cycle intermediates as ligands for orphan G-protein-coupled receptors. Nature 2004;429:188–193.
10. Anderson J, Burns HD, Enriquez-Harris P, Wilkie AO and Heath JK. Apert syndrome mutations in fibroblast growth factor receptor 2 exhibit increased affinity for FGF ligand. Hum Mol Genet 1998;7:1475–1483.
11. Shirohzu H, Kubota T, Kumazawa A, Sado T, Chijiwa T, Inagaki K, Suetake I, Tajima S, Wakui K, Miki Y, Hayashi M, Fukushima Y and Sasaki H. Three novel DNMT3B mutations in Japanese patients with ICF syndrome. Am J Med Genet 2002;112:31–37.
12. Wiszniewski W, Fondaneche MC, Louise-Plence P, Prochnicka-Chalufour A, Selz F, Picard C, Le Deist F, Eliaou JF, Fischer A and Lisowska-Grospierre B. Novel mutations in the RFXANK gene: RFX complex containing in-vitro-generated RFXANK mutant binds the promoter without transactivating MHC II. Immunogenetics 2003;54:747–755.
13. Trojanowski JQ and Lee VM. “Fatal attractions” of proteins: a comprehensive hypothetical mechanism underlying Alzheimer’s disease and other neurodegenerative disorders. Ann N Y Acad Sci 2000;924:62–67.
14. Eng C. PTEN: one gene, many syndromes. Hum Mutat 2003;22:183–198.
15. Hagemann TL, Chen Y, Rosen FS and Kwan SP. Genomic organization of the Btk gene and exon scanning for mutations in patients with X-linked agammaglobulinemia. Hum Mol Genet 1994;3:1743–1749.
16. Hyvonen M and Saraste M. Structure of the PH domain and Btk motif from Bruton’s tyrosine kinase: molecular explanations for X-linked agammaglobulinaemia. Embo J 1997;16:3396–3404.
17. Orchard S, Salwinski L, Kerrien S, Montecchi-Palazzi L, Oesterheld M, Stumpflen V, Ceol A, Chatr-Aryamontri A, Armstrong J, Woollard P, Salama JJ, Moore S, Wojcik J, Bader GD, Vidal M, Cusick ME, Gerstein M, Gavin AC, Superti-Furga G, Greenblatt J, Bader J, Uetz P, Tyers M, Legrain P, Fields S, Mulder N, Gilson M, Niepmann M, Burgoon L, Rivas Jde L, Prieto C, Perreau VM, Hogue C, Mewes HW, Apweiler R, Xenarios I, Eisenberg D, Cesareni G and Hermjakob H. The minimum information required for reporting a molecular interaction experiment (MIMIx). Nat Biotechnol 2007;25:894–898.
18. Husi H, Ward MA, Choudhary JS, Blackstock WP and Grant SG. Proteomic analysis of NMDA receptor-adhesion protein signaling complexes. Nat Neurosci 2000;3:661–669.
19. Becamel C, Alonso G, Galeotti N, Demey E, Jouin P, Ullmer C, Dumuis A, Bockaert J and Marin P. Synaptic multiprotein complexes associated with 5-HT(2C) receptors: a proteomic approach. Embo J 2002;21:2332–2342.
20. Junttila MR, Saarinen S, Schmidt T, Kast J and Westermarck J. Single-step Strep-tag purification for the isolation and identification of protein complexes from mammalian cells. Proteomics 2005;5:1199–1203.
21. Korf U, Kohl T, van der Zandt H, Zahn R, Schleeger S, Ueberle B, Wandschneider S, Bechtel S, Schnolzer M, Ottleben H, Wiemann S and Poustka A. Large-scale protein expression for proteome research. Proteomics 2005;5:3571–3580.
22. Lichty JJ, Malecki JL, Agnew HD, Michelson-Horowitz DJ and Tan S. Comparison of affinity tags for protein purification. Protein Expr Purif 2005;41:98–105.
23. Terpe K. Overview of tag protein fusions: from molecular and biochemical fundamentals to commercial systems. Appl Microbiol Biotechnol 2003;60:523–533.
24. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems H, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D and Tyers M. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002;415:180–183.
25. Rigaut G, Shevchenko A, Rutz B, Wilm M, Mann M and Seraphin B. A generic protein purification method for protein complex characterization and proteome exploration. Nat Biotechnol 1999;17:1030–1032.
26. Puig O, Caspary F, Rigaut G, Rutz B, Bouveret E, Bragado-Nilsson E, Wilm M and Seraphin B. The tandem affinity purification (TAP) method: a general procedure of protein complex purification. Methods 2001;24:218–229.
27. Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, Canadien V, Starostine A, Richards D, Beattie B, Krogan N, Davey M, Parkinson J, Greenblatt J and Emili A. Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 2005;433:531–537.
28. Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, Remor M, Hofert C, Schelder M, Brajenovic M, Ruffner H, Merino A, Klein K, Hudak M, Dickson D, Rudi T, Gnau V, Bauch A, Bastuck S, Huhse B, Leutwein C, Heurtier MA, Copley RR, Edelmann A, Querfurth E, Rybin V, Drewes G, Raida M, Bouwmeester T, Bork P, Seraphin B, Kuster B, Neubauer G and Superti-Furga G. Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature 2002;415:141–147.
29. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB and Superti-Furga G.
Proteome survey reveals modularity of the yeast cell machinery. Nature 2006; Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrin-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MH, Butland G, Altaf-Ul AM, Kanaya S, Shilatifard A, O’Shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A and Greenblatt JF. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006;440:637–643. Rohila JS, Chen M, Chen S, Chen J, Cerny R, Dardick C, Canlas P, Xu X, Gribskov M, Kanrar S, Zhu JK, Ronald P and Fromm ME. Protein–protein interactions of tandem affinity purification-tagged protein kinases in rice. Plant J 2006;46:1–13. Bouwmeester T, Bauch A, Ruffner H, Angrand PO, Bergamini G, Croughton K, Cruciat C, Eberhard D, Gagneur J, Ghidelli S, Hopf C, Huhse B, Mangano R, Michon AM, Schirle M, Schlegl J, Schwab M, Stein MA, Bauer A, Casari G, Drewes G, Gavin AC, Jackson DB, Joberty G, Neubauer G, Rick J, Kuster B and Superti-Furga G. A physical and functional map of the human TNF-alpha/NF-kappa B signal transduction pathway. Nat Cell Biol 2004;6:97–105. Krogan NJ, Peng WT, Cagney G, Robinson MD, Haw R, Zhong G, Guo X, Zhang X, Canadien V, Richards DP, Beattie BK, Lalev A, Zhang W, Davierwala AP, Mnaimneh S, Starostine A, Tikuisis AP, Grigull J, Datta N, Bray JE, Hughes TR, Emili A and

22

34. 35. 36. 37.

38.

39.

40.

41. 42. 43.

44.

45.

46. 47.

48.

49.

Greenblatt JF. High-definition macromolecular composition of yeast RNA-processing complexes. Mol Cell 2004;13:225–239. Ranish JA, Yi EC, Leslie DM, Purvine SO, Goodlett DR, Eng J and Aebersold R. The study of macromolecular complexes by quantitative proteomics. Nat Genet 2003;33:349–355. Gingras AC, Aebersold R and Raught B. Advances in protein complex analysis using mass spectrometry. J Physiol 2005;563:11–21. Gingras AC, Gstaiger M, Raught B and Aebersold R. Analysis of protein complexes using mass spectrometry. Nat Rev Mol Cell Biol 2007;8:645–654. Aloy P, Ciccarelli FD, Leutwein C, Gavin AC, Superti-Furga G, Bork P, Bottcher B and Russell RB. A complex prediction: three-dimensional model of the yeast exosome. EMBO Rep 2002;3:628–635. Aloy P, Bottcher B, Ceulemans H, Leutwein C, Mellwig C, Fischer S, Gavin AC, Bork P, Superti-Furga G, Serrano L and Russell RB. Structure-based assembly of protein complexes in yeast. Science 2004;303:2026–2029. Burckstummer T, Bennett KL, Preradovic A, Schutze G, Hantschel O, Superti-Furga G and Bauch A. An efficient tandem affinity purification procedure for interaction proteomics in mammalian cells. Nat Methods 2006;3:1013–1019. Yang P, Sampson HM and Krause HM. A modified tandem affinity purification strategy identifies cofactors of the Drosophila nuclear receptor dHNF4. Proteomics 2006;6:927–935. Ratner D. The interaction bacterial and phage proteins with immobilized Escherichia coli RNA polymerase. J Mol Biol 1974;88:373–383. Schulze WX, Deng L and Mann M. Phosphotyrosine interactome of the ErbB-receptor kinase family, Mol Syst Biol 2005;1:2005.0008. Davidson WS, Ghering AB, Beish L, Tubb MR, Hui DY and Pearson K. The biotincapture lipid affinity assay: a rapid method for determining lipid binding parameters for apolipoproteins. J Lipid Res 2006;47:440–449. Scholten A, Poh MK, van Veen TA, van Breukelen B, Vos MA and Heck AJ. 
Analysis of the cGMP/cAMP interactome using a chemical proteomics approach in mammalian heart tissue validates sphingosine kinase type 1-interacting protein as a genuine and highly abundant AKAP. J Proteome Res 2006;5:1435–1447. Bantscheff M, Eberhard D, Abraham Y, Bastuck S, Boesche M, Hobson S, Mathieson T, Perrin J, Raida M, Rau C, Reader V, Sweetman G, Bauer A, Bouwmeester T, Hopf C, Kruse U, Neubauer G, Ramsden N, Rick J, Kuster B and Drewes G. Quantitative chemical proteomics reveals mechanisms of action of clinical ABL kinase inhibitors. Nat Biotechnol 2007;25:1035–1044. Fields S and Song O. A novel genetic system to detect protein–protein interactions. Nature 1989;340:245–246. Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S, Lenzen G, Petel F, Wojcik J, Schachter V, Chemama Y, Labigne A and Legrain P. The protein–protein interaction map of Helicobacter pylori. Nature 2001;409:211–215. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S and Rothberg JM. A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature 2000;403:623–627. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E,

23

50.

51.

52.

53.

54.

55. 56.

57.

58.

59.

Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL, Jr., White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J and Rothberg JM. A protein interaction map of Drosophila melanogaster. Science 2003;302:1727–1736. Formstecher E, Aresta S, Collura V, Hamburger A, Meil A, Trehin A, Reverdy C, Betin V, Maire S, Brun C, Jacq B, Arpin M, Bellaiche Y, Bellusci S, Benaroch P, Bornens M, Chanet R, Chavrier P, Delattre O, Doye V, Fehon R, Faye G, Galli T, Girault JA, Goud B, de Gunzburg J, Johannes L, Junier MP, Mirouse V, Mukherjee A, Papadopoulo D, Perez F, Plessis A, Rosse C, Saule S, Stoppa-Lyonnet D, Vincent A, White M, Legrain P, Wojcik J, Camonis J and Daviet L. Protein interaction mapping: a Drosophila case study. Genome Res 2005;15:376–384. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, HirozaneKishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Van Den Heuvel S, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE and Vidal M. A map of the interactome network of the metazoan C. elegans. Science 2004;303:540–543. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP and Vidal M. 
Towards a proteome-scale map of the human protein– protein interaction network. Nature 2005; Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksoz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H and Wanker EE. A human protein–protein interaction network: a resource for annotating the proteome. Cell 2005;122:957–968. Aronheim A, Zandi E, Hennemann H, Elledge SJ and Karin M. Isolation of an AP-1 repressor by a novel method for detecting protein–protein interactions. Mol Cell Biol 1997;17:3094–3102. Broder YC, Katz S and Aronheim A. The ras recruitment system, a novel approach to the study of protein–protein interactions. Curr Biol 1998;8:1121–1124. Ehrhard KN, Jacoby JJ, Fu XY, Jahn R and Dohlman HG. Use of G-protein fusions to monitor integral membrane protein–protein interactions in yeast. Nat Biotechnol 2000;18:1075–1079. Stagljar I, Korostensky C, Johnsson N and te Heesen S. A genetic system based on splitubiquitin for the analysis of interactions between membrane proteins in vivo. Proc Natl Acad Sci USA 1998;95:5187–5192. Eyckerman S, Verhee A, der Heyden JV, Lemmens I, Ostade XV, Vandekerckhove J and Tavernier J. Design and application of a cytokine-receptor-based interaction trap. Nat Cell Biol 2001;3:1114–1119. Iyer K, Burkle L, Auerbach D, Thaminy S, Dinkel M, Engels K and Stagljar I. Utilizing the split-ubiquitin membrane yeast two-hybrid system to identify protein–protein interactions of integral membrane proteins. Sci STKE 2005;2005:pl3.

24 60. Vidal M, Brachmann RK, Fattaey A, Harlow E and Boeke JD. Reverse two-hybrid and one-hybrid systems to detect dissociation of protein–protein and DNA-protein interactions. Proc Natl Acad Sci USA 1996;93:10315–10320. 61. Kato-Stankiewicz J, Hakimi I, Zhi G, Zhang J, Serebriiskii I, Guo L, Edamatsu H, Koide H, Menon S, Eckl R, Sakamuri S, Lu Y, Chen QZ, Agarwal S, Baumbach WR, Golemis EA, Tamanoi F and Khazak V. Inhibitors of Ras/Raf-1 interaction identified by two-hybrid screening revert Ras-dependent transformation phenotypes in human cancer cells. Proc Natl Acad Sci USA 2002;99:14398–14403. 62. Eyckerman S, Lemmens I, Catteeuw D, Verhee A, Vandekerckhove J, Lievens S and Tavernier J. Reverse MAPPIT: screening for protein-protein interaction modifiers in mammalian cells. Nat Methods 2005;2:427–433. 63. Caligiuri M, Molz L, Liu Q, Kaplan F, Xu JP, Majeti JZ, Ramos-Kelsey R, Murthi K, Lievens S, Tavernier J and Kley N. MASPIT: Three-hybrid trap for quantitative proteome fingerprinting of small molecule–protein interactions in mammalian cells. Chem Biol 2006;13:711–722. 64. Smith GP. Filamentous fusion phage: novel expression vectors that display cloned antigens on the virion surface. Science 1985;228:1315–1317. 65. Kehoe JW and Kay BK. Filamentous phage display in the new millennium. Chem Rev 2005;105:4056–4072. 66. Irving MB, Pan O and Scott JK. Random-peptide libraries and antigen-fragment libraries for epitope mapping and the development of vaccines and diagnostics. Curr Opin Chem Biol 2001;5:314–324. 67. Groves M, Lane S, Douthwaite J, Lowne D, Rees DG, Edwards B and Jackson RH. Affinity maturation of phage display antibody populations using ribosome display. J Immunol Methods 2006;313:129–139. 68. Ho M, Nagata S and Pastan I. Isolation of anti-CD22 Fv with high affinity by Fv display on human cells. Proc Natl Acad Sci USA 2006;103:9637–9642. 69. Sergeeva A, Kolonin MG, Molldrem JJ, Pasqualini R and Arap W. 
Display technologies: application for the discovery of drug and gene delivery agents. Adv Drug Deliv Rev 2006;58:1622–1654. 70. Jiang J, Abu-Shilbayeh L and Rao VB. Display of a PorA peptide from Neisseria meningitidis on the bacteriophage T4 capsid surface. Infect Immun 1997;65:4770–4777. 71. Danner S and Belasco JG. T7 phage display: a novel genetic selection system for cloning RNA-binding proteins from cDNA libraries. Proc Natl Acad Sci USA 2001;98:12954–12959. 72. Castagnoli L, Zucconi A, Quondam M, Rossi M, Vaccaro P, Panni S, Paoluzi S, Santonico E, Dente L and Cesareni G. Alternative bacteriophage display systems. Comb Chem High Throughput Screen 2001;4:121–133. 73. Malys N, Chang DY, Baumann RG, Xie D and Black LW. A bipartite bacteriophage T4 SOC and HOC randomized peptide display library: detection and analysis of phage T4 terminase (gp17) and late sigma factor (gp55) interaction. J Mol Biol 2002;319: 289–304. 74. Robinson P, Stuber D, Deryckere F, Tedbury P, Lagrange M and Orfanoudakis G. Identification using phage display of peptides promoting targeting and internalization into HPV-transformed cell lines. J Mol Recognit 2005;18:175–182. 75. Wang X, Yu J, Sreekumar A, Varambally S, Shen R, Giacherio D, Mehra R, Montie JE, Pienta KJ, Sanda MG, Kantoff PW, Rubin MA, Wei JT, Ghosh D and Chinnaiyan AM. Autoantibody signatures in prostate cancer. N Engl J Med 2005;353:1224–1235.

25 76. Draghici S, Chatterjee M and Tainsky MA. Epitomics: serum screening for the early detection of cancer on microarrays using complex panels of tumor antigens. Expert Rev Mol Diagn 2005;5:735–743. 77. Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, Bertone P, Lan N, Jansen R, Bidlingmaier S, Houfek T, Mitchell T, Miller P, Dean RA, Gerstein M and Snyder M. Global analysis of protein activities using proteome chips. Science 2001;293:2101–2105. 78. Schweitzer B, Predki P and Snyder M. Microarrays to characterize protein interactions on a whole-proteome scale. Proteomics 2003;3:2190–2199. 79. Poetz O, Schwenk JM, Kramer S, Stoll D, Templin MF and Joos TO. Protein microarrays: catching the proteome. Mech Ageing Dev 2005;126:161–170. 80. Poetz O, Ostendorp R, Brocks B, Schwenk JM, Stoll D, Joos TO and Templin MF. Protein microarrays for antibody profiling: specificity and affinity determination on a chip. Proteomics 2005;5:2402–2411. 81. Espejo A, Cote J, Bednarek A, Richard S and Bedford MT. A protein-domain microarray identifies novel protein–protein interactions. Biochem J 2002;367:697–702. 82. Hiller R, Laffer S, Harwanegg C, Huber M, Schmidt WM, Twardosz A, Barletta B, Becker WM, Blaser K, Breiteneder H, Chapman M, Crameri R, Duchene M, Ferreira F, Fiebig H, Hoffmann-Sommergruber K, King TP, Kleber-Janke T, Kurup VP, Lehrer SB, Lidholm J, Muller U, Pini C, Reese G, Scheiner O, Scheynius A, Shen HD, Spitzauer S, Suck R, Swoboda I, Thomas W, Tinghino R, Van Hage-Hamsten M, Virtanen T, Kraft D, Muller MW and Valenta R. Microarrayed allergen molecules: diagnostic gatekeepers for allergy treatment. Faseb J 2002;16:414–416. 83. Uttamchandani M, Walsh DP, Yao SQ and Chang YT. Small molecule microarrays: recent advances and applications. Curr Opin Chem Biol 2005;9:4–13. 84. MacBeath G, Koehler AN and Schreiber SL. Printing small molecules as microarrays and detecting protein–ligand interactions en masse. J Am Chem Soc 1999;121:7967–7968. 85. 
Schmitz K, Haggarty SJ, McPherson OM, Clardy J and Koehler AN. Detecting binding interactions using microarrays of natural product extracts. J Am Chem Soc 2007;129:11346–11347. 86. Winssinger N, Ficarro S, Schultz PG and Harris JL. Profiling protein function with small molecule microarrays. Proc Natl Acad Sci USA 2002;99:11139–11144. 87. Blixt O, Head S, Mondala T, Scanlan C, Huflejt ME, Alvarez R, Bryan MC, Fazio F, Calarese D, Stevens J, Razi N, Stevens DJ, Skehel JJ, van Die I, Burton DR, Wilson IA, Cummings R, Bovin N, Wong CH and Paulson JC. Printed covalent glycan array for ligand profiling of diverse glycan binding proteins. Proc Natl Acad Sci USA 2004;101:17033–17038. 88. Kuruvilla FG, Shamji AF, Sternson SM, Hergenrother PJ and Schreiber SL. Dissecting glucose signalling with diversity-oriented synthesis and small-molecule microarrays. Nature 2002;416:653–657. 89. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA and Gerstein M. Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 2004;431:308–312. 90. Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP and Vidal M. Evidence for dynamically organized modularity in the yeast protein–protein interaction network. Nature 2004;430:88–93. 91. de Lichtenberg U, Jensen LJ, Brunak S and Bork P. Dynamic complex formation during the yeast cell cycle. Science 2005;307:724–727.

26 92. Presley JF. Imaging the secretory pathway: the past and future impact of live cell optical techniques. Biochim Biophys Acta 2005;1744:259–272. 93. Wallrabe H and Periasamy A. Imaging protein molecules using FRET and FLIM microscopy. Curr Opin Biotechnol 2005;16:19–27. 94. Jonsson U, Fagerstam L, Ivarsson B, Johnsson B, Karlsson R, Lundh K, Lofas S, Persson B, Roos H, Ronnberg I, et al. Real-time biospecific interaction analysis using surface plasmon resonance and a sensor chip technology. Biotechniques 1991;11: 620–627. 95. Gibson QH and Milnes L. Apparatus for rapid and sensitive spectrophotometry. Biochem J 1964;91:161–171. 96. Chance B. The kinetics of the enzyme-substrate compound of peroxidase. 1943. Adv Enzymol Relat Areas Mol Biol 1999;73:3–23. 97. Weber PC and Salemme FR. Applications of calorimetric methods to drug discovery and the study of protein interactions. Curr Opin Struct Biol 2003;13:115–121. 98. Perozzo R, Folkers G and Scapozza L. Thermodynamics of protein–ligand interactions: history, presence, and future aspects. J Recept Signal Transduct Res 2004; 24:1–52. 99. Ji QC, Rodila R, Morgan SJ, Humerickhouse RA and El-Shourbagy TA. Investigation of the immunogenicity of a protein drug using equilibrium dialysis and liquid chromatography tandem mass spectrometry detection. Anal Chem 2005;77:5529–5533. 100. Charbonnier S, Zanier K, Masson M and Trave G. Capturing protein–protein complexes at equilibrium: the holdup comparative chromatographic retention assay. Protein Expr Purif 2006;50:89–101. 101. Calderwood MA, Venkatesan K, Xing L, Chase MR, Vazquez A, Holthaus AM, Ewence AE, Li N, Hirozane-Kishikawa T, Hill DE, Vidal M, Kieff E and Johannsen E. Epstein-Barr virus and virus human protein interaction maps. Proc Natl Acad Sci USA 2007;104:7606–7611. 102. Uetz P, Dong YA, Zeretzke C, Atzler C, Baiker A, Berger B, Rajagopala SV, Roupelieva M, Rose D, Fossum E and Haas J. Herpesviral protein networks and their interaction with the human proteome. 
Science 2006;311:239–242. 103. Brizard JP, Carapito C, Delalande F, Van Dorsselaer A and Brugidou C. Proteome analysis of plant-virus interactome: comprehensive data for virus multiplication inside their hosts. Mol Cell Proteomics 2006;5:2279–2297. 104. Gavin AC and Superti-Furga G. Protein complexes and proteome organization from yeast to man. Curr Opin Chem Biol 2003;7:21–27. 105. Krause R, von Mering C, Bork P and Dandekar T. Shared components of protein complexes-versatile building blocks or biochemical artefacts? Bioessays 2004;26: 1333–1343. 106. Sonoda Y, Yamamoto D, Sakurai S, Hasegawa M, Aizu-Yokota E, Momoi T and Kasahara T. FTY720, a novel immunosuppressive agent, induces apoptosis in human glioma cells. Biochem Biophys Res Commun 2001;281:282–288. 107. Vassilev LT, Vu BT, Graves B, Carvajal D, Podlaski F, Filipovic Z, Kong N, Kammlott U, Lukacs C, Klein C, Fotouhi N and Liu EA. In vivo activation of the p53 pathway by small-molecule antagonists of MDM2. Science 2004;303:844–848. 108. Oltersdorf T, Elmore SW, Shoemaker AR, Armstrong RC, Augeri DJ, Belli BA, Bruncko M, Deckwerth TL, Dinges J, Hajduk PJ, Joseph MK, Kitada S, Korsmeyer SJ, Kunzer AR, Letai A, Li C, Mitten MJ, Nettesheim DG, Ng S, Nimmer PM, O’Connor JM, Oleksijew A, Petros AM, Reed JC, Shen W, Tahir SK, Thompson CB, Tomaselli KJ, Wang B, Wendt MD, Zhang H, Fesik SW and Rosenberg SH. An

27

109.

110.

111.

112. 113.

114.

115.

116. 117.

118.

119. 120.

121.

122.

123.

inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 2005;435:677–681. Inglis SR, Stojkoski C, Branson KM, Cawthray JF, Fritz D, Wiadrowski E, Pyke SM and Booker GW. Identification and specificity studies of small-molecule ligands for SH3 protein domains. J Med Chem 2004;47:5405–5417. Yarbrough WG, Buckmire RA, Bessho M and Liu ET. Biologic and biochemical analyses of p16(INK4a) mutations from primary tumors. J Natl Cancer Inst 1999;91:1569–1574. Cammett TJ, Luo L and Peng ZY. Design and characterization of a hyperstable p16INK4a that restores Cdk4 binding activity when combined with oncogenic mutations. J Mol Biol 2003;327:285–297. Zweifel ME, Leahy DJ, Hughson FM and Barrick D. Structure and stability of the ankyrin domain of the Drosophila Notch receptor. Protein Sci 2003;12:2622–2632. Arboleda-Velasquez JF, Rampal R, Fung E, Darland DC, Liu M, Martinez MC, Donahue CP, Navarro-Gonzalez MF, Libby P, D’Amore PA, Aikawa M, Haltiwanger RS and Kosik KS. CADASIL mutations impair Notch3 glycosylation by Fringe. Hum Mol Genet 2005;14:1631–1639. Ruf RG, Xu PX, Silvius D, Otto EA, Beekmann F, Muerb UT, Kumar S, Neuhaus TJ, Kemper MJ, Raymond RM, Jr., Brophy PD, Berkman J, Gattas M, Hyland V, Ruf EM, Schwartz C, Chang EH, Smith RJ, Stratakis CA, Weil D, Petit C and Hildebrandt F. SIX1 mutations cause branchio-oto-renal syndrome by disruption of EYA1-SIX1-DNA complexes. Proc Natl Acad Sci USA 2004;101:8090–8095. Zhou HX. Improving the understanding of human genetic diseases through predictions of protein structures and protein–protein interaction sites. Curr Med Chem 2004;11:539–549. Fan C, Liu M and Wang Q. Functional analysis of TBX5 missense mutations associated with Holt-Oram syndrome. J Biol Chem 2003;278:8780–8785. Chen T, Tsujimoto N and Li E. The PWWP domain of Dnmt3a and Dnmt3b is required for directing DNA methylation to the major satellite repeats at pericentric heterochromatin. Mol Cell Biol 2004;24:9048–9058. 
Ding J, Liu JJ, Kowal AS, Nardine T, Bhattacharya P, Lee A and Yang Y. Microtubuleassociated protein 1B: a neuronal binding partner for gigaxonin. J Cell Biol 2002;158:427–433. Guerrette S, Acharya S and Fishel R. The interaction of the human MutL homologues in hereditary nonpolyposis colon cancer. J Biol Chem 1999;274:6336–6341. Hu X, Plomp A, Wijnholds J, Ten Brink J, van Soest S, van den Born LI, Leys A, Peek R, de Jong PT and Bergen AA. ABCC6/MRP6 mutations: further insight into the molecular pathology of pseudoxanthoma elasticum. Eur J Hum Genet 2003;11:215–224. Ritchie HH, Hughes MR, Thompson ET, Malloy PJ, Hochberg Z, Feldman D, Pike JW and O’Malley BW. An ochre mutation in the vitamin D receptor gene causes hereditary 1,25-dihydroxyvitamin D3-resistant rickets in three families. Proc Natl Acad Sci USA 1989;86:9783–9787. Barroso I, Gurnell M, Crowley VE, Agostini M, Schwabe JW, Soos MA, Maslen GL, Williams TD, Lewis H, Schafer AJ, Chatterjee VK and O’Rahilly S. Dominant negative mutations in human PPARgamma associated with severe insulin resistance, diabetes mellitus and hypertension. Nature 1999;402:880–883. Carpten JD, Faber AL, Horn C, Donoho GP, Briggs SL, Robbins CM, Hostetter G, Boguslawski S, Moses TY, Savage S, Uhlik M, Lin A, Du J, Qian YW, Zeckner DJ, Tucker-Kellogg G, Touchman J, Patel K, Mousses S, Bittner M, Schevitz R, Lai MH,

28 Blanchard JE and Thomas JE. A transforming mutation in the pleckstrin homology domain of AKT1 in cancer. Nature 2007;448:439–444. 124. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M and Sakaki Y. A comprehensive twohybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001;98:4569–4574. 125. Parrish JR, Yu J, Liu G, Hines JA, Chan JE, Mangiola BA, Zhang H, Pacifico S, Fotouhi F, Dirita VJ, Ideker T, Andrews P and Finley RL, Jr. A proteome-wide protein interaction map for Campylobacter jejuni. Genome Biol 2007;8:R130. 126. Rohila JS, Chen M, Cerny R and Fromm ME. Improved tandem affinity purification tag and methods for isolation of protein heterocomplexes from plants. Plant J 2004;38: 172–181.

29

Gene expression microarray data analysis demystified

Peter C. Roberts
VizX Labs, 200 West Mercer Street, Suite 500, Seattle, WA 98119, USA

Abstract. The increasing use of gene expression microarrays, and depositing of the resulting data into public repositories, means that more investigators are interested in using the technology either directly or through meta analysis of the publicly available data. The tools available for data analysis have generally been developed for use by experts in the field, making them difficult to use by the general research community. For those interested in entering the field, especially those without a background in statistics, it is difficult to understand why experimental results can be so variable. The purpose of this review is to go through the workflow of a typical microarray experiment, to show that decisions made at each step, from choice of platform through statistical analysis methods to biological interpretation, are all sources of this variability.

Keywords: microarray, microarray data analysis, gene expression, normalization, preprocessing, statistical analysis, clustering, cross-platform comparison, correction for multiple testing, pathway, gene ontology.

Tel.: +1-206-283-4363. Fax: +1-206-283-1606. E-mail: [email protected] (P.C. Roberts).
Biotechnology Annual Review, Volume 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00002-1. © 2008 Elsevier B.V. All rights reserved.

Introduction

Since the original description of the use of cDNA microarrays in gene expression analysis in 1995 [1], followed a year later by oligonucleotide arrays [2], the technology has rapidly moved from the domain of specialists to being available to the whole research community through core facilities at most academic research institutions. The arrays themselves have evolved from representing fewer than 50 genes to addressing over 50,000 transcripts on whole genome arrays for complex mammalian genomes, in some cases represented by over 1 million individual features. These arrays have traditionally measured the differential expression of known and putative protein-coding genes.

The gene expression microarray data analysis process can be broken down into three main parts: preprocessing, the conversion of the signal from an array scanner to a normalized value appropriate for comparison of expression across the arrays in a study; comparative statistical analysis, to identify significantly differentially expressed genes or co-expressed genes; and biological interpretation, preferably with a statistical measure of significance.

Early microarray experiments only considered the difference in expression of a gene between two sets of samples, producing highly variable results; thus, it was soon realized that statistical analysis was required to obtain meaningful results. Initially, standard statistical methods were used to identify significantly differentially expressed genes, and these are still widely employed. However, standard statistical methods are poorly suited to the low number of replicates used in a typical microarray study, the variability introduced by the many technical steps in preparing samples, and the large number of individual tests performed on a single array, so new statistical analysis methods have been developed. Many of these have been implemented in the open-source statistical language R [3], usually as part of the Bioconductor project [4]. Unfortunately, these different approaches can produce quite different results; consequently, no ‘best method’ can be identified. Recently it has been realized that there are common themes in these approaches [5], helping to identify the most appropriate methods.

As microarrays have become more widely accessible to the broader research community, they are increasingly being used by investigators with limited knowledge of statistics. These researchers tend to be more interested in biological insights, whereas the emphasis in terms of analysis tools has been on generating statistically robust gene lists, often at the expense of biological interpretation [6]. Also, as gene expression data is now required to be deposited in publicly accessible repositories, researchers are performing meta-analysis of this data. Again, they often lack the knowledge to assess the quality of the data sets of interest to them and to choose an appropriate analysis approach. Little assistance is readily available to these ‘unsophisticated’ users, and most of the analysis tools available to them are designed for statisticians and bioinformaticians and assume a level of a priori knowledge that is needed to use them effectively.

In this review I will endeavor to provide a general overview of the whole gene expression microarray data generation and analysis process.
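As an aside, the multiple-testing concern raised above can be made concrete with a minimal sketch. It is not a method taken from this review: it computes how many false positives would be expected if every target on an array were tested at an uncorrected threshold, and then applies the Benjamini-Hochberg step-up rule, chosen here only as one widely used correction. The function names and the example p-values are illustrative assumptions.

```python
from typing import List


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected false positives when all null hypotheses are true."""
    return n_tests * alpha


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag each p-value as rejected (True) or not, using the
    Benjamini-Hochberg step-up rule at the given false discovery rate."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # 50,000 transcripts tested at an uncorrected p < 0.05 (hypothetical numbers).
    print(expected_false_positives(50_000, 0.05))
    # Hypothetical p-values for four genes.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

Because the rule is step-up, a gene whose p-value misses its own rank threshold can still be rejected when a higher-ranked p-value meets its threshold, as in the second example.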
The focus will be on commercial gene expression microarray platforms, as custom in-house spotted microarrays are increasingly being supplanted by commercial microarrays. In particular, the emphasis will be on the whole genome microarrays that have been designed to address the transcriptome of a given organism. Microarrays have also been developed for genotyping, microRNA measurement, chromosomal region copy number and other applications, which will not be addressed in this review.

Gene expression microarray platforms

Microarray designs have utilized spotted cDNAs [1] or oligonucleotide probes [2]. Depending on the platform, oligonucleotides are either synthesized in situ on glass slides or synthesized, purified and then attached to substrates. In the case of the Illumina platform, the purified oligonucleotides are attached to beads that are randomly dispensed into wells on a slide.

In addition there is the choice of one-color or two-color experimental designs. For one-color systems a single sample is hybridized to the array. For two-color experiments, two RNA samples, labeled with different dyes, are mixed and hybridized on the same microarray. Two-color systems can provide experimental design flexibility, since the second sample can either be an experimental sample or a standard RNA sample applied to all the arrays [7,8]. Although the original idea behind the development of two-color systems was that the competitive hybridization would reduce errors due to slide variation, it has been found, from well-controlled experiments, that analyzing the two samples independently, rather than generating ratios, can increase experimental accuracy [9].

The commercial microarray platforms are predominantly one-color oligonucleotide microarrays. The Agilent platform is one exception: initially designed as a two-color system [10], it has been modified to support the analysis of one-color or two-color experiments. Most companies offer focused arrays for specific research areas and for well-characterized genes and transcripts. They also offer design and manufacturing of custom arrays. Over time the array manufacturers have increased the number of features that can be placed on one slide, and several manufacturers now offer multiplexed whole genome arrays.

Table 1 compares the technologies of the major commercial whole genome microarray platforms. The key difference between platforms lies in the number and length of oligonucleotide probes on the microarrays: either short oligonucleotides (25–30 bases) or long oligonucleotides (50–70 bases). Long oligonucleotides have greater sensitivity and are better for analyzing low copy number mRNAs [11–13]; short oligonucleotides have better specificity, being less likely to cross-hybridize with other RNAs [14]. Most platforms have a single probe for each target, each probe occurring once at a fixed location on the array.
Two platforms have multiple probes per target: Illumina BeadChips have over 20 randomly located technical replicates for each probe; Affymetrix 3′ expression GeneChips have 11 probe pairs per probe set and the recently introduced GeneChip Gene ST arrays have 26 probes per gene. In addition to the differences in technology, the commercial companies differ in the number of species-specific arrays they offer. Affymetrix, through their GeneChip Consortia Program, offers a large number of microarrays to support different eukaryotic genomics projects. Nimblegen has focused on arrays for prokaryotic genomics. All manufacturers have whole genome arrays for human, mouse and rat.

Table 1. Comparison of human whole genome microarray platforms. Data obtained from manufacturers' websites.

Manufacturer: array | Oligonucleotide deposition | Oligo length | Targets | Probes per target | Total features | Arrays per slide | Probe location
Affymetrix: U133 plus 2 | In-situ photolithography | 25 | 54,000 | 11 pairs | 1,300,000 | 1 | 3′ end
Affymetrix: Human gene ST 1.0 | In-situ photolithography | 25 | 28,869 | 26 | 764,885 | 1 | Exons
Agilent: Human 4 × 44k | In-situ ink-jet | 60 | 42,000 | 1 | 44,000 | 4 | 3′ end
Applied Biosystems: Human genome survey v2 | Spotted | 60 | 34,000 | 1 | 35,000 | 1 | 3′ end
Applied Microarrays: CodeLink human whole genome | Spotted | 30 | 57,347 | 1 | 54,841 | 1 | 3′ end
Illumina: Human-6 v2 | Beads | 50 | 48,000 | 1 × >20 | 10,000,000 | 6 | 3′ end
Illumina: HumanRef-8 v2 | Beads | 50 | 22,000 | 1 × >20 | 7,000,000 | 8 | 3′ end
Nimblegen: HG18 | In-situ photolithography | 60 | 47,633 | 8 | 385,000 | 1 | 3′ end
Nimblegen: HG18 4plex | In-situ photolithography | 60 | 24,000 | 3 | 72,000 | 4 | 3′ end
Phalanx Biotech: Human OneArray | Spotted | 60 | 30,968 | 1 | 32,050 | 1 | 3′ end

Gene content

Most commercial microarrays for human, mouse and rat whole genomes share common targets, primarily transcripts from the National Center for Biotechnology Information (NCBI) Reference Sequence collection (RefSeq) [15]. Each manufacturer has tried to expand beyond this basic set of well-defined targets, in an endeavor to address all the protein-coding genes on the genome. In most cases this predated the completion of the sequencing of the human genome [16,17], when it was assumed that there would be many more protein-coding genes than were ultimately found in complex mammalian genomes. UniGene clusters [18] with a limited number of expressed sequence tags (EST), proprietary sequences not available publicly, and predicted genes were all sources of additional target sequences. In most cases, probes were designed to target the 3′ end of individual transcripts.

The ability to map transcripts to the genome, an important modification to UniGene cluster generation, has led to consolidation into fewer transcriptional loci. The public sequence databases are continually being updated and UniGene clusters are revised on a regular basis, resulting in reassignment of some transcripts. One result of this is that probes previously thought to map to different genes have been shown to map to the same gene, in some cases to the same transcript. This redundancy needs to be taken into consideration during biological interpretation. Also, probes previously mapped to a gene can be disassociated from that gene, which can cause confusion when gene lists are reanalyzed. Consequently, before conclusions are drawn from microarray experiments, care should be taken to confirm that the probe target is an authentic transcript of the gene to which it is mapped. The microarray manufacturers do update their probe mapping, but usually not on a consistent basis and not in sync with the public databases. This can lead to ambiguities between public information and the probe annotation supplied by the array manufacturers. Another source of ambiguity, documented for Affymetrix 3′ expression arrays in particular, is probe sequence inaccuracies [19–25], where the probes do not match their stated target sequence. There are also probes that map to more than one gene and are therefore not specific.
Several groups have created modified chip description (CDF) files to eliminate these ambiguous probes from analysis, leading to more reliable results [21,22,25].

Gene expression microarray experiment process

The standard workflow for a microarray experiment is shown in Fig. 1. The key to successful gene expression experiments is good experimental design and attention to detail. There are many technical steps between sample preparation and microarray scanning (Fig. 2), each of which can introduce new sources of error and bias, which can have a profound impact on data analysis and interpretation. To minimize the impact of technical error, it is advised that a single technician process all the samples at the same time and run them on microarrays from the same manufacturing batch [5]. If this is not possible, random samples for each condition should be distributed between either technicians or days to avoid bias [26]. The principle of randomization should also be used with multiplex arrays; in most cases more than one slide will be used and samples should be randomly assigned to the slides.
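The random assignment of samples to multiplex slides described above can be sketched in a few lines. This is a minimal illustration rather than a prescribed procedure: the function name `assign_samples_to_slides`, the sample identifiers and the choice of four arrays per slide are all assumptions made for the example.

```python
import random
from collections import defaultdict

def assign_samples_to_slides(samples, arrays_per_slide, seed=None):
    """Shuffle the samples and fill slides in order of the shuffled list.

    samples: list of (sample_id, condition) tuples.
    Returns a dict mapping slide index to the samples placed on that slide.
    """
    rng = random.Random(seed)          # seeded so the layout can be reproduced
    shuffled = list(samples)
    rng.shuffle(shuffled)
    slides = defaultdict(list)
    for position, sample in enumerate(shuffled):
        slides[position // arrays_per_slide].append(sample)
    return dict(slides)

# Hypothetical experiment: four control and four treated samples on 4-plex slides.
samples = [(f"ctrl_{i}", "control") for i in range(1, 5)]
samples += [(f"trt_{i}", "treated") for i in range(1, 5)]

layout = assign_samples_to_slides(samples, arrays_per_slide=4, seed=1)
for slide, members in sorted(layout.items()):
    print(slide, members)
```

Simple shuffling does not guarantee that each slide receives both conditions; a blocked design, shuffling within each condition and then dealing samples across slides, may be preferable when the number of slides is small.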

[Figure 1: workflow diagram with the following stages.
Experimental design
Sample preparation: RNA extraction, reverse transcription, in vitro transcription
Microarray processing: hybridization, washing, scanning
Data preprocessing: non-specific signal correction, normalization, filtering
Differential expression: comparative statistics, multiple test correction, clustering
Biological interpretation: gene annotation, gene ontologies, pathway analysis]

Fig. 1. Gene expression microarray experiment workflow. The goal of a typical microarray experiment is to identify genes that are statistically significantly differentially expressed and identify the underlying biological processes. The actual methods used at any point will be defined by the experimental design and the microarray platform.

Experimental design

Gene expression microarray experiments are designed for one of two purposes: evaluation of differential gene expression between groups, referred to as class comparison; or classification studies, referred to as class discovery and class prediction [27]. These experiments are expensive and time consuming, and a good experimental design is essential to maximize the return in terms of usable information [26–30]. Experimental design involves a clear scientific hypothesis, an appreciation of the number of factors to be compared and the confidence that can be assigned to the observations: the simplest design is usually the best.

Confidence in the results, the power of the analysis, is derived from using the appropriate number of replicates [31–35]. Microarray studies utilize two types of replicates, technical or biological. Biological replicates, samples from individual subjects, are the only choice for good biological inference [26,36]. Technical replicates, a single or pooled sample applied to multiple microarrays, only measure the consistency of the experimental system and provide limited biological information [26]. In some cases, such as two-color systems when controlling for dye-dependent effects, technical replicates are combined with biological replicates; the average intensity of the technical replicates should be used in subsequent statistical tests.

Due to the amount of data generated from microarray experiments, classic power calculation methods for estimating the number of replicates required in an experiment are inadequate, so microarray-specific methods have been proposed [31–35]. In general a minimum of five replicates per group is recommended. This number will change depending on whether the samples come from inbred or outbred animals, as outbred animals have greater biological variance [34]. The reality of microarray experiments is that investigators tend to run a limited number of replicates due to sample or cost constraints; even so, no fewer than three replicates should be used [6].

Pooling of samples can be used when the cost of samples is much less than the cost of microarrays, or when insufficient RNA is available to run on an array and amplification methods are not desirable [33,37,38]. The same number of samples should be used for each pool and multiple pools should be used for each group. Pooling may introduce biases and does not provide the same statistical power as analyzing individual samples, but it is better than comparing a limited number of samples [5]. Pooling of samples is not appropriate for classification studies, as these rely on inter-individual variation and co-variation [5].

For investigators with limited statistical knowledge considering running microarray experiments, it cannot be overemphasized that time spent in experimental design with a statistician will ensure that valid conclusions can be drawn from the results, especially for more complex experimental designs.

[Figure 2: flow diagram of two labeling routes from purified total RNA to hybridization, 3′ in vitro transcription and random priming, marking the points at which external RNA controls (tERC, cERC) and the Affymetrix control oligo are added for each platform.]

Fig. 2. RNA sample processing. The traditional method for labeling RNA samples for hybridizing to microarrays has used 3′ in vitro transcription to generate labeled cRNA. The microarray manufacturers usually provide kits for this process that contain external RNA controls (ERC). These are added either to the total RNA (tERC), as controls for the reverse transcription and in vitro transcription reactions; or to the cRNA (cERC) prior to or after fragmentation, as controls for the microarray processing steps. Affymetrix whole transcript arrays use a different protocol, using random hexamer priming to generate labeled cDNA that is hybridized to the microarray. This may include an rRNA reduction step, depending on the amount of starting RNA. A control oligo, added at the same time as the cERC, provides an additional control for the microarray processing.
Sample preparation

The steps involved in sample preparation are shown in Fig. 2. There are several different commercial reagents and kits on the market suitable for RNA isolation. The key factor is to make sure that the input total RNA sample is of high integrity. The standard method for assessing RNA quality is to use an Agilent 2100 Bioanalyzer, which can generate an RNA integrity number (RIN) [39]. The microarray manufacturers generally provide or recommend reagent kits for labeling samples, and kits are also available from other commercial sources. These kits usually use a modification of the Eberwine method to produce complementary RNA (cRNA) labeled with a dye or other functional group [40]. The kits for specific platforms often contain external RNA controls (ERC) that are complementary in sequence to control probes on the microarrays and are often present as a concentration series, so they can be used to assess concentration response [41,42]. The ERC are used to monitor both the RNA labeling reactions, by being added to the total RNA sample prior to cDNA synthesis, and the hybridization, washing and scanning process, by being added to the cRNA immediately prior to hybridization. In addition to this standard approach, there are several different methods for amplifying RNA from samples with low RNA concentrations [43]. As with pooling of samples, RNA amplification can introduce biases that should be taken into consideration when designing experiments and analyzing data.

Data preprocessing

The purpose of data preprocessing is to convert the raw signal for labeled RNA hybridized to a probe to a normalized value, adjusted to account for variance from technical rather than biological sources [36]. Many papers have been published on preprocessing of microarray data, especially for two-color systems and Affymetrix arrays. Usually the process involves quantification of the signal from the microarray, background corrections and normalization within and across arrays. In most cases the microarray manufacturers provide software that adequately performs most of these functions. There are also preprocessing packages available in Bioconductor, developed by researchers not satisfied with the methods provided by the array manufacturers.

A logarithmic transformation followed by quantile normalization has become the preferred method of preprocessing for one-color microarrays [44]. Quantile normalization assumes that the arrays have a similar signal distribution, which is typical for most experiments. However, quantile normalization should not be used when comparing tissues with markedly different expression profiles. The purpose of the logarithmic transformation is to stabilize the variance inherent in the microarray data, changing calculations from multiplicative to additive [36]. However, as the intensity values approach zero, this transformation is less effective. To offset this, some algorithms add a constant to the intensity values, so that the variance of low-intensity signals is better stabilized [45].
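The logarithmic transformation and quantile normalization described above can be illustrated with a small, self-contained sketch. It is not the implementation used by any particular package: the `offset` constant stands in for the constant that some algorithms add before taking logarithms, its value is an assumption, and ties are broken by probe position rather than averaged.

```python
import math

def log2_transform(arrays, offset=1.0):
    """Log2-transform intensities after adding a constant (hypothetical value)."""
    return [[math.log2(value + offset) for value in array] for array in arrays]

def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays.

    arrays: list of equal-length lists, one list of probe values per microarray.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(array) for array in arrays]
    # Reference distribution: mean value at each rank across all arrays.
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for array in arrays:
        order = sorted(range(n_probes), key=lambda i: array[i])
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = rank_means[rank]   # replace value by rank mean
        normalized.append(result)
    return normalized

# Three hypothetical arrays measuring the same four probes.
raw = [
    [120.0, 30.0, 900.0, 60.0],
    [200.0, 55.0, 1500.0, 90.0],
    [80.0, 20.0, 700.0, 45.0],
]
normalized = quantile_normalize(log2_transform(raw))
for array in normalized:
    print([round(value, 3) for value in array])
```

After normalization every array contains the same set of values, and only the assignment of those values to probes differs, which is why the method is unsuitable when the true distributions differ markedly between samples.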
A large number of methods have been developed for preprocessing Affymetrix data [45,46]. This was because the standard Affymetrix software, Microarray Suite 5 (MAS5) [47] and earlier versions, uses the difference in signal between paired perfect match (PM) and mismatched (MM) probes to account for nonspecific binding. In some cases the MM probe has a stronger signal than the PM probe, resulting in negative signal values, and on occasion false-negative expression. MAS5 also does not normalize across arrays, instead scaling within arrays to a defined intensity value. The alternate methods generally ignore the MM probes, using global or model-based background correction, and normalizing across arrays. The most significant source of difference between the preprocessing methods is how they perform background corrections [45]. Of the alternative protocols that have been developed, RMA [48] and GCRMA [49] are those most commonly utilized. Affymetrix has also developed a new algorithm, PLIER [50], that uses an improved PM–MM background correction and quantile normalization. These methods are all available in Bioconductor, mostly in the affy package. There is some concern that they have been optimized using a limited data set from a single human array version, so may have different performance with other arrays [5,51]. It is also likely that the preprocessing method of choice will vary with experimental design.

For two-color systems preprocessing involves other factors. It is well known that dyes have individual biases that need to be adjusted for. In particular, the dye cyanine 5 (Cy5), commonly used in two-color microarray experiments, is rapidly degraded by ozone [52], whereas the other commonly used dye, Cy3, is not. Consequently, air quality can become a major factor in two-color analysis. As with the Affymetrix arrays, background correction methods have a marked effect on preprocessing [53]. Local background subtraction can result in negative intensities and should be avoided in favor of model-based methods, where only positive intensity values are returned. There have been arguments for not performing any background correction, but this can also result in problems with downstream analysis methods with cDNA microarrays [53]. For Agilent microarrays, the feature extraction software corrects for both dye biases and background. It has been shown that this results in an increase in the variability in low-intensity data [54], which may be due to the background correction used. In general, logarithmic transformation followed by loess smoothing is the normalization method of choice [55]. These methods are available in the limma Bioconductor package.

The large number of technical replicates for probes on Illumina arrays allows for more robust variance stabilization and normalization, utilizing both quantile and loess normalization [56]. This can be implemented using the lumi Bioconductor package.
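For the two-color case discussed above, the quantities that are usually normalized are the log ratio and the average log intensity of the two channels, conventionally called M and A. The sketch below computes them and applies a global median centering of M. This centering is a deliberately simpler stand-in for the intensity-dependent loess fit and is not equivalent to it; the function names and the example intensities are assumptions.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A): log2 ratio and mean log2 intensity for each feature."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def median_center(m):
    """Subtract the median log ratio, removing a constant dye bias only."""
    center = median(m)
    return [value - center for value in m]

# Hypothetical background-corrected intensities; all values must be positive.
red = [400.0, 100.0, 1600.0]     # e.g. Cy5 channel
green = [100.0, 100.0, 100.0]    # e.g. Cy3 channel

m, a = ma_values(red, green)
centered = median_center(m)
print(m, centered)
```

A loess normalization would instead fit a smooth curve of M against A and subtract the fitted value at each feature, so that the correction can vary with intensity.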
Differential expression analysis

The main goal of microarray experiments is to identify genes that are significantly differentially expressed between two or more experimental conditions, usually by comparing the average intensity values of the replicate samples. Microarray analysis methods attempt to minimize two types of error in measures of differential expression: type 1 or false-positive errors and type 2 or false-negative errors [5]. Controlling type 1 errors is the major goal of many of the statistical methods developed specifically for microarray analysis. Type 2 errors are more likely to be due to properties of the platform being used; a gene may not be detected due to limited sensitivity.

Filtering data using quality values and fold change cutoffs

Because some genes are not expressed in any sample and not all expressed genes are differentially expressed between samples, it makes sense to remove these uninformative data points from the analysis process using filters. These are usually based on either quality flags or fold change cutoffs, though more sophisticated methods have been described [57].

The simplest measure of differential expression is fold change, the magnitude of the difference in the expression of a gene between the conditions, usually reported as a ratio or log-ratio. Using fold change alone as a measure of significant differential expression is not appropriate, as it does not assess the reproducibility of the measurement or confidence in the observation [5,58]. For low-intensity genes, a relatively small change in signal can have a marked effect on fold change, resulting in type 1 errors. A fold change cutoff value can be set for filtering genes, above which genes are considered differentially expressed; however, the cutoff value is arbitrary, and setting it too high can result in type 2 errors.

During the preprocessing of the array data there is usually an assessment of whether a gene is expressed: does the gene have signal significantly above nonspecific signals? This is usually reported as a quality value, which is different for each platform: the Affymetrix MAS5 algorithm flags genes as being present (P), absent (A) or marginal (M), but RMA and GCRMA do not provide a quality metric; the CodeLink platform provides similar flags as measures of signal quality; Illumina BeadStudio software provides a detection p value based on the signal from the replicate beads, which is usually reported as the inverse of the p value; Agilent feature extraction software provides several values but the IsWellAboveBG flag is generally used; and Applied Biosystems uses a signal-to-noise ratio for expression and a flag value for signal quality. Quality values can be used to set quality filters for the data.
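A combined quality and fold change filter of the kind described above might look like the following sketch. The P/A/M flags follow the MAS5 convention mentioned in the text; everything else, including the function names, the two-fold cutoff, the majority rule and the `floor` value used to avoid dividing by zero, is an assumption chosen for illustration.

```python
import math

def log2_fold_change(treated, control, floor=1.0):
    """Log2 ratio of the mean treated to the mean control intensity.

    `floor` is a hypothetical nominal value substituted for means that fall
    below it, so an unexpressed group never produces a division by zero.
    """
    mean_treated = max(sum(treated) / len(treated), floor)
    mean_control = max(sum(control) / len(control), floor)
    return math.log2(mean_treated / mean_control)

def mostly_present(flags):
    """True when more than half of the samples carry the present (P) flag."""
    return sum(flag == "P" for flag in flags) > len(flags) / 2

def passes_filters(gene, min_abs_log2_fc=1.0):
    """Keep genes detected in at least one group and changed at least two-fold."""
    quality_ok = (mostly_present(gene["treated_flags"])
                  or mostly_present(gene["control_flags"]))
    change = log2_fold_change(gene["treated"], gene["control"])
    return quality_ok and abs(change) >= min_abs_log2_fc

# Hypothetical intensities and detection flags for three genes.
genes = {
    "gene_a": {"treated": [800, 820, 780], "control": [200, 190, 210],
               "treated_flags": "PPP", "control_flags": "PPP"},
    "gene_b": {"treated": [110, 100, 90], "control": [100, 95, 105],
               "treated_flags": "PPP", "control_flags": "PPP"},
    "gene_c": {"treated": [400, 420, 380], "control": [100, 95, 105],
               "treated_flags": "AAA", "control_flags": "AMA"},
}
kept = [name for name, gene in genes.items() if passes_filters(gene)]
print(kept)
```

Such a filter only reduces the list of candidates; it says nothing about the reproducibility of the change, which is the role of the statistical tests.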
Some software allows for different filtering options: only analyzing genes with perfect quality scores in all samples; allowing genes to be analyzed where the majority of samples have a perfect quality value; or analyzing all genes regardless of quality flags. In cases where the gene is not expressed in one sample group but is in another sample group, either a nominal positive expression value needs to be used for the unexpressed samples, to avoid division by zero, or the gene should be excluded from analysis, which is usually undesirable.

Comparative statistics

The most common microarray experiment compares expression between just two groups of samples, so standard t tests can be used to assess the statistical significance of the observed change in expression at an individual gene level [58]. The actual t test to be used is dependent on the experimental design. Usually an unpaired Student t test is used, as the samples are considered to have equal variance, being randomly assigned to each group. In some cases, such as before and after drug treatment, a paired t test provides more power. When the samples are not equally variable a Welch's t test is appropriate. The standard t tests report a p value for each comparison at an individual probe level. The p value is the probability of observing a difference at least as large as the one measured if there were no true difference in expression, and p values that fall below a nominal level, usually 0.05, are considered significant.

The number of replicates in a typical microarray experiment is not usually sufficient to make standard t tests robust, as they are sensitive to the effects of outlier values [58]. In addition, the large number of individual tests being run in parallel means that there will be a large number of type 1 errors. Consequently, modified t tests have been developed specifically for microarray analysis, utilizing an approach referred to as variance shrinkage [5]. They are nonparametric, using the variance of all the genes on the array to improve the power of the test. In particular, Bayesian statistical approaches have been used, as they have been found to improve analysis of microarray experiments with a limited number of replicates [59,60]. The significance analysis of microarrays (SAM) is another popular approach [61]. Though these approaches improve on the classic t tests in terms of controlling the false-positive rate, there is still no 'best method' [62].

When samples from more than two conditions are being compared, an analysis of variance (ANOVA) is often used to estimate the relative expression of each gene in each sample [58]. Depending on the experimental design there are different types of ANOVA that can be used. As with the t test, attention needs to be paid to the variance of the samples in a group and between the groups. When only a single factor is present, the one-way ANOVA is used to compare expression at the individual gene level. When two factors are being compared, a two-way ANOVA should be used [6].
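The classical per-gene tests named in this section can be written out directly from their textbook definitions. The sketch below computes the test statistics and degrees of freedom only; converting them to p values requires the t or F distribution, which is left to a statistics library. The function names and the example log2 intensities are assumptions, and none of the variance shrinkage methods are implemented here.

```python
import math
from statistics import mean, variance

def student_t(x, y):
    """Unpaired Student t statistic with pooled (equal) variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, nx + ny - 2

def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def paired_t(before, after):
    """Paired t statistic computed on the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    t = mean(diffs) / math.sqrt(variance(diffs) / len(diffs))
    return t, len(diffs) - 1

def one_way_anova_f(groups):
    """One-way ANOVA F statistic with its two degrees of freedom."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k

# Hypothetical log2 intensities for one gene, five replicates per group.
control = [8.1, 8.4, 7.9, 8.3, 8.2]
treated = [9.0, 9.4, 8.8, 9.1, 9.3]

t_student, df_student = student_t(treated, control)
t_welch, df_welch = welch_t(treated, control)
f_stat, df_between, df_within = one_way_anova_f([treated, control])
print(t_student, df_student, t_welch, df_welch, f_stat)
```

With two groups the one-way ANOVA F statistic equals the square of the Student t statistic, which is a convenient check that the two routes agree. In practice these tests would be run once per probe, which is what creates the multiple testing problem.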
For more complex designs, for instance multiple conditions with biological and technical replicates, multi-way ANOVAs should be used [58,63]. Also, modifications of the standard ANOVA have been described that use global gene variance instead of, or combined with, gene-specific variance, to control type 1 errors [58].

Corrections for multiple testing

The standard statistical approach for controlling the false-positive rate when using multiple comparisons is to use corrections for multiple testing. These adjust the p values based on the total number of tests being performed. There are two approaches that are taken: a family-wise error rate (FWER) control, the simplest being the Bonferroni correction [5,58]; or a false discovery rate (FDR) correction, usually that of Benjamini and Hochberg [64]. Figure 3 shows an example of the effect of different correction methods on the number of significant genes identified.

The FWER corrections adjust the p value to reflect the probability that one or more false-positive errors occur in a list. The Bonferroni correction is very stringent, dramatically reducing or even eliminating lists of differentially expressed genes and therefore increasing the chance of type 2 errors. There are step-down modifications of the Bonferroni correction that are less conservative, including those developed by Holm [65] and Westfall and Young [66]. However, the FWER corrections are a poor choice in discovery research, where a limited number of false-positive results is preferable to eliminating true positives.

The FDR correction is less stringent and adjusts the p value to control the frequency of type 1 errors in the list of significantly differentially expressed genes [67]. Positive FDR (pFDR) applies a factor to the FDR, equivalent to the ratio of nondifferentially expressed genes to the total number of genes, which reduces the correction [68]. This increases the power of the analysis, while not eliminating all false-positive errors [58]. Rather than controlling FDR below a threshold, it has been suggested that FDR estimating procedures are preferable for microarray analysis [5]. These assign a false-positive probability value to each differentially expressed gene. The control and estimation of FDR is an active area of investigation [69–71].

[Figure 3: 5% PCER, 12,143 probes; 5% FDR, 8,053 probes; 5% FWER, 87 probes.]

Fig. 3. Effect of different multiple testing approaches on significantly differentially expressed gene lists. The numbers are from analysis of the CodeLink data set kidney control and aristolochic acid treated kidney sample from the MAQC rat toxicology study [116] (GEO accession: GSE5350) without any additional preprocessing or filtering. The CodeLink rat whole genome array has 33,790 probes. The per comparison error rate (PCER) was from an unpaired Student t test with a p value cutoff of 0.05. The Benjamini and Hochberg correction was used for FDR. The Bonferroni correction was used for the FWER. Data was generated using GeneSifter.
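The two corrections compared in Fig. 3 are short enough to write out in full. The sketch below returns adjusted p values for the Bonferroni and the Benjamini and Hochberg procedures; the ten input p values are invented for the example and are not taken from any data set.

```python
def bonferroni(p_values):
    """FWER control: multiply each p value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """FDR control: step-up adjusted p values of Benjamini and Hochberg."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

def count_significant(values, alpha=0.05):
    return sum(value < alpha for value in values)

# Hypothetical unadjusted p values from ten per-gene tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

counts = (
    count_significant(p_values),                      # per comparison error rate
    count_significant(benjamini_hochberg(p_values)),  # FDR
    count_significant(bonferroni(p_values)),          # FWER
)
print(counts)
```

The ordering of the three counts mirrors the pattern in Fig. 3. For the 33,790 probes of that example, a Bonferroni-adjusted cutoff of 0.05 corresponds to an unadjusted p value of roughly 1.5 × 10^-6, which illustrates how stringent the correction becomes at whole genome scale.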

Cluster analysis

Another commonly used method for statistically analyzing microarray data from multiple conditions is to use clustering. This can be used on normalized data without any other analysis being performed or on a statistically significant list of differentially expressed genes from an ANOVA. Clustering algorithms recognize patterns in the data [72]. Usually the clustering algorithms are unsupervised, the raw data being analyzed with no assumption of underlying structure. Two basic approaches are taken, either the visualization of overall expression patterns or the partitioning of genes into discrete groups. In either case, genes with similar expression profiles are grouped together. It is often assumed that genes that cluster together are co-expressed; however, unsupervised clustering algorithms will always produce clusters based on the parameters that are set; the quality and relevance of the clusters is not a factor. For classification applications, where the quality and relevance of the groups are very important, additional information about the relationship of the samples is used for supervised clustering algorithms.

Hierarchical clustering is the original method used, and is still widely utilized [73]. This is a simple agglomerative clustering method, where genes are sequentially added to the cluster based on the similarity of their expression profile. There are also hierarchical clustering algorithms that take a divisive approach, starting with a single cluster and finishing with the individual genes, but this is more computationally intensive. Hierarchical clustering is a useful tool for visualizing expression patterns in microarray data, the typical output being a dendrogram of the genes, from which clusters of closely matched genes can be identified.
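The agglomerative procedure described above can be sketched with a naive implementation. The choice of average linkage and of one minus the Pearson correlation as the distance is an assumption, as other linkages and distances are equally common; the gene names and profiles are invented, and profiles must not be constant across samples.

```python
import math
from statistics import mean

def correlation_distance(x, y):
    """1 - Pearson correlation; 0 for identical shapes, 2 for opposite ones."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - covariance / (spread_x * spread_y)

def agglomerative_clustering(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict mapping gene name to a list of expression values.
    Returns (clusters, merges); merges records the merge order and distances,
    which is the information a dendrogram displays.
    """
    clusters = [[gene] for gene in profiles]

    def average_linkage(first, second):
        return mean(correlation_distance(profiles[a], profiles[b])
                    for a in first for b in second)

    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = average_linkage(clusters[i], clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        distance, i, j = best
        merges.append((clusters[i], clusters[j], distance))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical expression profiles across four samples.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.5],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_d": [8.0, 6.5, 4.0, 2.0],
}
clusters, merges = agglomerative_clustering(profiles, n_clusters=2)
print(clusters)
```

The nested loops make this approach cubic in the number of genes, so it is only suitable for short gene lists; dedicated libraries use far more efficient algorithms.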
Usually a graphic visualization, referred to as a heatmap, is also generated, where the log ratios of the intensity are usually represented as a color scale, from intense red for the highest positive values, through black for the mean intensity, to intense green for the most negative values. Figure 4A shows an example of the typical output.

Many different algorithms have been used for partitioning microarray data. These include K-means algorithms [74], self-organizing maps (SOM) [75] and partitioning around medoids (PAM) [76]. There is no good method of knowing which is the best algorithm to use for a particular dataset [77,78]. These approaches require that the number of clusters into which the data are partitioned be specified at the outset; however, it is difficult to assess the appropriate number of clusters for a particular dataset. One approach to assessing the appropriate number of clusters is to use silhouette widths [79]. These are a measure of how closely the genes in a cluster match the mean expression profile for the cluster; the larger the overall mean silhouette width, the better the clustering of the data. This requires repeated clustering using different numbers of initial nodes. Figure 4B shows an example of a PAM output with silhouette values. Other methods have also been developed to address the question [80–82]. There is a similar issue with validation of the quality of the clusters [77,82].

Intuitively, genes in the same cluster are co-expressed and therefore share a biological function; however, this assumption is often not borne out by the functional analysis of the genes in a cluster [72]. The reasons for this are: the complexity of the biology underlying a given gene list; the limited number of samples relative to the number of genes on a typical microarray; and the strict assignment of genes to clusters. Standard clustering techniques assign a gene to a single cluster and, once assigned, it cannot be reassigned to a different cluster. Fuzzy clustering has been used to address this problem, by assigning probabilities to genes in clusters; ultimately, the gene is assigned to the cluster where it has the highest probability score [83]. Other approaches, for instance principal component analysis (PCA) [84] and independent component analysis (ICA) [85,86], have used linear models, which allow genes to belong to more than one cluster.

[Figure 4: panel A, heatmap and dendrogram; panel B, PAM silhouette plot. Sample group labels in the figure: NT, S, CC, E1, E2, T, YS.]

Fig. 4. Examples of hierarchical and partition clustering. The data is from a study of male germ cell tumor samples analyzed using Affymetrix human U133A GeneChips (GEO accession: GSE3218) and preprocessed with GCRMA in GeneSifter. (A) Heatmap and dendrogram from hierarchical clustering of genes of the Wnt signaling pathway between the sample groups. (B) Partitioning around medoids (PAM) silhouettes, four clusters from 4,975 significantly differentially expressed genes, identified using a one-way ANOVA and Benjamini and Hochberg FDR correction, with an adjusted p value <0.0001 and at least a four-fold change in expression compared to a normal testis control. (The color version of this figure is hosted on Science Direct.)

Supervised classification

To use microarray data as a phenotype classification tool, it is necessary to identify a set of discriminative genes that can be used to assign samples to pre-defined categories. The original description of the use of gene expression microarray data for cancer classification was that of Golub [87]. Since then there have been many publications describing different approaches to the problem [5,88], utilizing several standard data sets for cancer classifier testing [88]. There are several inherent problems in the development of classifiers, including the presence of redundant transcripts in gene expression data, and the bias introduced by not using completely separate data sets to create and then validate the predictive algorithm. This leads to models that are susceptible to overfitting, performing well with test data but not with new data [5].
It is widely accepted that a smaller set of non-redundant informative genes will provide the most accurate classifiers; however, there is no current method of choice to identify such genes.

Biological interpretation

Once a list of significantly differentially expressed genes has been obtained, the next consideration is the identification of the biological processes represented in the list. The information associated with a particular gene, the annotation, is available from many online sources [89–91]. The NCBI has many annotation resources [92], including Entrez Gene, which integrates much of the gene information for a large number of organisms. There are similar resources at the European Bioinformatics Institute (EBI) [93]. Two sources of extensive gene and genome level annotation for multiple species are Ensembl [94] and the UCSC Genome Browser [95], which has tracks showing the location of Affymetrix probe sets. For organisms other than human, mouse and rat, annotation is generally sparse, although intensively studied organisms, such as yeast and Drosophila, have rich data resources. Many of the organisms that have had their genomes sequenced have very limited annotation, usually in a dedicated database that is difficult to query.

Commonly used sources of functional information associated with genes are the Gene Ontology (GO) database [96] and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database [97]. Useful functional information can also be found at PANTHER [98], which has pathway and ontology information. It is relatively easy to collect information for a single gene, if somewhat time consuming, but it is not easy to identify broad biological themes in this way. Microarray gene lists can have thousands of entries and it can be difficult to query databases to obtain all the related information and mine that information for common themes. For GO, several tools have been developed that can take a list of genes and return lists of ontologies, with statistical measures of their significance [99–101]. Similar reports can be generated for pathways [99]. Statistical significance is assessed using either z scores or p values and FDR corrections. The z score indicates whether the genes associated with an ontological term or pathway are significantly overrepresented (a score >2) or underrepresented (a score <−2) in the list, compared to the normal distribution. The DAVID system integrates many different data sources and provides rich functional reports for microarray gene lists [102].

Gene expression microarray data analysis software

Software is absolutely essential to the analysis of microarray data. However, there are very few software packages that cover all the steps in microarray analysis. This means that data tends to go through a series of individual software applications that mirror the steps in the workflow in Fig. 1.
There is also a limited selection of software resources that provide not only analysis tools but also associated data storage and management capabilities.

Open-source software

Several papers are published each month on some aspect of microarray analysis, and these papers generally include a link to the associated software. Some are set up as websites and some software is available as Microsoft Excel plug-ins [61], but most of the statistical approaches appear as packages in Bioconductor [4]. Bioconductor is a software resource for genomics data analysis aimed at biostatisticians and bioinformatics experts, who appreciate its power and flexibility and do not mind the difficult interface. As mentioned earlier, there are specific packages in Bioconductor for the different commercial microarray platforms. Other packages are more general in nature and gather tools for specific analysis approaches, for instance Bayesian methods. For a casual user the learning curve is steep, because it requires learning the R statistical scripting language [3], on which Bioconductor is based. Some graphical user interfaces have been developed to make Bioconductor more accessible. As previously mentioned, the DAVID Knowledgebase and associated tools are a popular application for biological interpretation [102]. Finding other microarray analysis and interpretation applications is a daunting task [103] because of the sheer number available, though attempts to catalog them have been made [104].

Commercial software

The commercial software available for microarray analysis integrates much of the functionality available in separate Bioconductor packages. The applications often support only one or two microarray platforms. Most have preprocessing, differential expression analysis and clustering tools and allow the integration of R or other scripting languages. There is often an emphasis on data visualization. However, biological interpretation tools are usually not well integrated, unless that is the focus of the application. Table 2 summarizes the capabilities of the most popular commercial applications. Although their user interfaces make them easier to use than most open-source software, they are generally designed for sophisticated users and so can be difficult for nonexperts to learn. The exception is GeneSifter, the only web-based application, which was designed with the philosophy that nonexpert research scientists should be able to perform their own microarray data analysis for common experimental designs. This system has broad microarray platform support, built-in data management, preprocessing, differential expression analysis, clustering tools and integrated biological significance analysis.
A common criticism of commercial software is that it is a ‘black box’, with the actual code being run for a statistical analysis inaccessible to the user. GeneSifter only uses algorithms from R and Bioconductor.

Table 2. Comparison of commercial gene expression microarray data analysis software capabilities.
Columns: Company: product; Microarray platforms; Data storage; Preprocessing; Differential expression; Gene annotation; Functional analysis; Computer platform; Statistics.
Products: Agilent: Genespring GX; Biotique Systems: X-ray; DNAStar: ArrayStar; Genomatix: ChipInspector; Insightful: S+ArrayAnalyzer; Molecular Devices: Acuity; Ocimum Biosolutions: Genowiz; Partek: Genomics Suite; Rosetta Biosoftware: Resolver; SAS: JMP Genomics; Integromics: ArrayHub; TIBCO: Spotfire Decision Site; VizX Labs: GeneSifter; Biodiscovery: GeneDirector; Genologics: Geneus.
[Table body not recoverable.]

Cross-platform comparisons

In spite of attempts to standardize microarray data reporting, led by the Microarray Gene Expression Data Society (MGED) [105] with the Minimum Information About a Microarray Experiment (MIAME) standard [106], most attempts to compare results from different microarray platforms have shown poor concordance [12,23,107–111]. The sources of the inconsistencies were identified as the difficulty of comparing platforms at a gene level [12], the use of different protocols for sample preparation [108,110] and the use of different statistical tests in data analysis [111]. When cDNA and short-oligonucleotide platforms were compared, similar trends in the direction, but not the magnitude, of differential expression were observed [109]. It was also found that, when common samples were run at several sites, results were more variable between laboratories than between platforms [108,110]. When common procedures were implemented, and good consistency between replicate samples was achieved, the consistency between laboratories and platforms improved.

Microarrays were identified as key technologies in future submissions to the U.S. Food and Drug Administration (FDA), and the lack of uniformity between studies and platforms was of major concern. This led to two initiatives, the MicroArray Quality Control Consortium (MAQC) [111] from the FDA and the External RNA Controls Consortium (ERCC) [112] from the National Institute of Standards and Technology (NIST). Both represented government, commercial and academic interests. The MAQC Consortium published a group of papers in September 2006 addressing many of the issues that had been raised, in an attempt to identify the sources of discordance observed in other studies [9,41,113–116]. All the data generated is publicly available and acts as a superb resource for comparing analysis methods. The main study [115] utilized a pooled titration series using different ratios of two reference RNA samples: a Universal Human Reference RNA and a Human Brain Reference RNA. These were assayed on six different commercial microarray platforms and on an in-house spotted oligonucleotide microarray provided by the NCI. Each platform was used at multiple test sites. Each test site ran five replicate assays for each RNA pool. The microarray providers used their own software for intensity signal quantification and to generate a quality measure for each probe on the array. However, this made the resulting analysis more complicated because of the differences in data preprocessing and quality filtering.
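The logic of a titration series can be sketched in a few lines of Python: if two reference samples are mixed in known proportions, the signal for each gene in the pools should fall between the signals of the pure samples and change monotonically with the mixing fraction. The mixing fractions, function names and intensity values below are assumptions chosen for illustration, and the linear mixing model ignores platform-specific effects such as signal compression.

```python
def expected_mixture(signal_a, signal_b, fraction_a):
    """Expected linear-scale signal for a pool containing fraction_a of sample A."""
    return fraction_a * signal_a + (1.0 - fraction_a) * signal_b


def follows_titration(series):
    """True if the signals change monotonically along the titration series."""
    pairs = list(zip(series, series[1:]))
    return all(x >= y for x, y in pairs) or all(x <= y for x, y in pairs)


# Hypothetical linear-scale intensities for one gene in the two pure samples.
pure_a, pure_b = 1000.0, 200.0
fractions = [1.0, 0.75, 0.25, 0.0]  # assumed fraction of sample A in each pool
expected = [expected_mixture(pure_a, pure_b, f) for f in fractions]
print(expected)

observed = [980.0, 790.0, 410.0, 215.0]  # invented measurements for the same gene
print(follows_titration(observed))
```

A gene whose measured signals do not follow the expected order across the pools is a candidate for a platform-specific or site-specific problem, which is what makes such a design useful for comparing analysis methods.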
To make sure that, as far as possible, each platform was actually measuring the same target, a common set of 12,091 probes, mapping to a non-redundant list of genes and transcripts, was identified. The generation of this list was aided by the companies involved making the actual probe sequences available, which allowed more rigorous comparisons than had previously been possible. In general, the detection of these genes varied more between platforms than between test sites within a platform; however, the different platforms appeared to detect similar changes in gene abundance. The optimum method for generating overlapping gene lists between platforms was a ranked fold change analysis, which ignored the absolute degree of observed change. Using standard statistical tests reduced the concordance between platforms. Perhaps not surprisingly, the simple statistical methods chosen have been challenged [117].

The two reference RNAs were also used to compare the concordance of results from one-color and two-color microarray studies [9]. This comparison used three different platforms, hybridizing both one-color and two-color samples to each. This eliminated problems due to different platforms being used for each experimental design. Within each platform there were high correlation coefficients and good concordance between the differentially expressed gene lists for the two approaches. Two-color designs appeared to have slightly better sensitivity but one-color designs had lower compression of the expression values. Using individual intensity values rather than ratios for the two-color design appeared to give the greatest sensitivity.

As well as the artificial RNA sample comparison of the main MAQC study, four platforms were compared using a biologically relevant toxicogenomics data set [116]. This data was derived from four groups of six rats treated with a range of plant-derived toxins [31]. Liver samples were collected for all groups, as well as kidney samples from control animals and those treated with aristolochic acid, a nephrotoxic compound. These six groups of six replicate samples were assayed on five sets of rat whole genome microarrays from four commercial sources. The results validated the finding of the main study that rank fold change was the best method of maximizing gene list overlap between platforms. The fold change ranking also resulted in much better concordance between platforms, based on the biological functions significantly represented in gene lists.

The MAQC study also included validation of the microarray data using quantitative measurement of gene expression [113]. Generally, good concordance was seen between the quantitative assays and the microarray results for genes that were detectable on both platforms. Where discordance was observed, it could be explained by differences in the location of probes, meaning that alternate splice variants could be detected.
Genes found to have low expression by the quantitative assays were those that showed the most variable concordance, both between the quantitative assays and the microarray platforms and between the different microarray platforms in the main MAQC study. This reflected the sensitivity range of the different microarray platforms. Overall, the MAQC study showed that data was consistent at individual test sites, reproducible between test sites and comparable between platforms. However, to ensure the reliability of microarray-based studies, there is still a need for unified metrics and standards to identify poor-quality arrays and monitor performance at microarray facilities.
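The ranked fold change comparison discussed in this section can be illustrated with a minimal Python sketch. It ranks the genes common to two platforms by absolute log2 fold change, takes the top of each ranking and reports the overlap, so that agreement depends on rank rather than on the magnitude each platform reports. The gene identifiers, log ratios and function names are hypothetical, and the overlap measure is one simple choice, not necessarily the metric used in the MAQC papers.

```python
def top_ranked(log2_fold_changes, genes, n):
    """Return the n genes with the largest absolute log2 fold change."""
    ranked = sorted(genes, key=lambda g: abs(log2_fold_changes[g]), reverse=True)
    return set(ranked[:n])


def percent_overlap(set_a, set_b):
    """Shared genes as a percentage of the smaller list."""
    smaller = min(len(set_a), len(set_b))
    return 100.0 * len(set_a & set_b) / smaller if smaller else 0.0


# Invented log2 fold changes: same direction on both platforms, different magnitude.
platform_a = {"G1": 3.1, "G2": -2.4, "G3": 0.2, "G4": 1.8, "G5": -0.1, "G6": 0.9}
platform_b = {"G1": 1.6, "G2": -1.1, "G3": -0.1, "G4": 0.3, "G5": 0.05, "G6": 0.7, "G7": 2.0}

# Only genes measured on both platforms can be compared.
common = sorted(platform_a.keys() & platform_b.keys())
top_a = top_ranked(platform_a, common, 3)
top_b = top_ranked(platform_b, common, 3)
shared = top_a & top_b
same_direction = all((platform_a[g] > 0) == (platform_b[g] > 0) for g in shared)
print(sorted(shared), percent_overlap(top_a, top_b), same_direction)
```

Restricting the comparison to genes present on both platforms mirrors the use of a common probe set described above; a gene measured on only one platform (G7 here) cannot contribute to the overlap.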

Public gene expression data repositories

Many journals require, as a condition of publishing papers that describe gene expression microarray data, that the data be publicly accessible. This is also a condition of federal grant funding agencies. Two major data repositories are the Gene Expression Omnibus (GEO) [118] at the NCBI and ArrayExpress [119] at the EBI. The data submitted to the repositories has to be MIAME-compliant, so all the experimental details are available. Many submitters also include the original raw data files. The two repositories currently contain nearly 300,000 individual samples in approximately 10,000 studies, most of which are from gene expression microarray experiments. This huge volume of data is available for meta-analysis, either to extend or confirm researchers' own results, or for data mining, for instance to identify previously missed gene and disease relationships [120,121]. However, when performing meta-analysis it is important to remember that analysis across microarray platforms is not straightforward.

The future

The original design philosophy for gene expression microarrays was to measure the expression of all protein-coding genes. However, with the refinement of whole genome annotation, it has become clear that the estimates of the number of these genes have been over-optimistic. The latest estimate for humans is approximately 20,500 [122]. This will lead to further consolidation of gene expression microarray platform content, ultimately leading to the same set of genes and transcripts being represented on each platform. As the content goes down and feature densities increase, multiplexing of arrays will increase. At the same time, the number of potential non-coding RNA transcripts has increased dramatically following publication of the results of the ENCODE pilot project [123]. Tiling microarrays were extensively used in this and the follow-up project. It is likely that new microarrays will appear to measure the diverse RNA species that are being identified and also to study transcriptional control elements. This is already occurring with microarrays for microRNAs and recent announcements of the release of microarrays for several platforms containing CpG elements. It is likely that microarray platforms will face increasing competition from high-density RT-PCR-based platforms.

Another technology that will have a profound effect on microarrays is digital gene expression (DGE) [124], using the new massively parallel sequencing systems. DGE is basically an extension of serial analysis of gene expression (SAGE) [125], a tag-counting method of gene expression analysis, first described in the same issue of Science as the original cDNA microarray paper. The problem with SAGE was that it was expensive to sequence enough tags to get true quantitation of gene expression. With the new sequencing technologies, deep sequencing is much cheaper and much faster, and millions, rather than thousands, of tags are sequenced.
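Tag counting itself is simple to sketch. The Python example below counts sequence tags, assigns them to genes through a lookup table and scales the counts to tags per million so that libraries of different depth can be compared. The tag sequences, the tag-to-gene table and the gene names are invented for illustration; real SAGE or DGE pipelines also have to deal with sequencing errors and ambiguous tags, which this sketch ignores.

```python
from collections import Counter


def count_tags_per_gene(tags, tag_to_gene):
    """Count tags per gene; tags absent from the lookup table are skipped."""
    counts = Counter()
    for tag in tags:
        gene = tag_to_gene.get(tag)
        if gene is not None:
            counts[gene] += 1
    return counts


def tags_per_million(counts):
    """Scale raw counts by the number of mapped tags in the library."""
    total = sum(counts.values())
    return {gene: 1_000_000 * n / total for gene, n in counts.items()}


# Hypothetical lookup table: two different tags map to the same gene.
tag_to_gene = {"CATGAAACCT": "geneA", "CATGTTGGCA": "geneB", "CATGCCGTAA": "geneB"}
# Hypothetical library of 12 sequenced tags, 2 of which cannot be assigned.
library = ["CATGAAACCT"] * 6 + ["CATGTTGGCA"] * 3 + ["CATGCCGTAA"] + ["CATGNNNNNN"] * 2

counts = count_tags_per_gene(library, tag_to_gene)
print(dict(counts))
print(tags_per_million(counts))
```

Because the measurement is a count, precision for weakly expressed genes depends directly on how many tags are sequenced, which is why sequencing depth was the limiting factor for SAGE.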
DGE has been touted as the end of microarrays, and has resulted in Applied Biosystems Inc. abandoning their microarray platform in favor of their new sequencing platform. It is more likely that the two technologies will be complementary, with DGE results leading to new microarray designs. As well as new technologies for measuring gene expression, the use of microfluidic devices to study gene expression at a single cell level, and laser microdissection to analyze select sets of cells from tissues, will lead to a greater appreciation of localized gene expression and potentially better classification.

Gene expression microarray data analysis has generally reached a state of consensus. However, developments in microarray technology, new competing technologies and sample preparation will require new analysis methods or refinement of existing methods. Improved classification methods and more biologically meaningful clustering approaches are required. Increased interest in meta-analysis of covariance of genes across studies and across platforms should lead to improvements in data analysis methods for this purpose. Also, gene expression microarray data is increasingly being used with other types of data to obtain a larger biological picture, an approach referred to as systems biology. This is a rapidly expanding field and integration of these disparate data types will be a challenge.

Acknowledgements

The author would like to thank Leah Klein for helpful discussions and critical review of the manuscript.

References

1. Schena M, Shalon D, Davis RW and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995;270:467–470.
2. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, Mittmann M, Wang C, Kobayashi M, Horton H and Brown EL. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996;14:1675–1680.
3. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, 2007.
4. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH and Zhang J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80.
5. Allison DB, Cui X, Page GP and Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet 2006;7:55–65.
6. Olson NE. The microarray data analysis process: from raw data to biological significance. NeuroRx 2006;3:373–383.
7. Kerr KF, Serikawa KA, Wei C, Peters MA and Bumgarner RE. What is the best reference RNA? And other questions regarding the design and analysis of two-color microarray experiments. OMICS 2007;11:152–165.

8. Novoradovskaya N, Whitfield ML, Basehore LS, Novoradovsky A, Pesich R, Usary J, Karaca M, Wong WK, Aprelikova O, Fero M, Perou CM, Botstein D and Braman J. Universal reference RNA as a standard for microarray experiments. BMC Genomics 2004;5:20.
9. Patterson TA, Lobenhofer EK, Fulmer-Smentek SB, Collins PJ, Chu T, Bao W, Fang H, Kawasaki ES, Hager J, Tikhonova IR, Walker SJ, Zhang L, Hurban P, de Longueville F, Fuscoe JC, Tong W, Shi L and Wolfinger RD. Performance comparison of one-color and two-color platforms within the MicroArray Quality Control (MAQC) project. Nat Biotechnol 2006;24:1140–1150.
10. Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, Kobayashi S, Davis C, Dai H, He YD, Stephaniants SB, Cavet G, Walker WL, West A, Coffey E, Shoemaker DD, Stoughton R, Blanchard AP, Friend SH and Linsley PS. Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 2001;19:342–347.
11. Chou C, Chen C, Lee T and Peck K. Optimization of probe length and the number of probes per gene for optimal microarray analysis of gene expression. Nucleic Acids Res 2004;32:e99.
12. Shippy R, Sendera TJ, Lockner R, Palaniappan C, Kaysser-Kranich T, Watts G and Alsobrook J. Performance evaluation of commercial short-oligonucleotide microarrays and the impact of noise in making cross-platform correlations. BMC Genomics 2004;5:61.
13. Ramdas L, Cogdell DE, Jia JY, Taylor EE, Dunmire VR, Hu L, Hamilton SR and Zhang W. Improving signal intensities for genes with low-expression on oligonucleotide microarrays. BMC Genomics 2004;5:35.
14. Barrett JC and Kawasaki ES. Microarrays: the use of oligonucleotides and cDNA for the analysis of gene expression. Drug Discov Today 2003;8:134–141.
15. Pruitt KD, Tatusova T and Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007;35:D61–D65.
16. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump RW, Yang H, Yu J, Wang J, Huang G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S and Chen YJ. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
17. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, Levine AJ, Roberts RJ, Simon M, Slayman C, Hunkapiller M, Bolanos R, Delcher A, Dew I, Fasulo D, Flanigan M, Florea L, Halpern A, Hannenhalli S, Kravitz S, Levy S, Mobarry C, Reinert K, Remington K, Abu-Threideh J, Beasley E, Biddick K, Bonazzi V, Brandon R, Cargill M, Chandramouliswaran I, Charlab R, Chaturvedi K, Deng Z, Di Francesco V, Dunn P, Eilbeck K, Evangelista C, Gabrielian AE, Gan W, Ge W, Gong F, Gu Z, Guan P, Heiman TJ, Higgins ME, Ji RR, Ke Z, Ketchum KA, Lai Z, Lei Y, Li Z, Li J, Liang Y, Lin X, Lu F, Merkulov GV, Milshina N, Moore HM, Naik AK, Narayan VA, Neelam B, Nusskern D, Rusch DB, Salzberg S, Shao W, Shue B, Sun J, Wang Z, Wang A, Wang X, Wang J, Wei M, Wides R, Xiao C, Yan C, Yao A, Ye J, Zhan M, Zhang W, Zhang H, Zhao Q, Zheng L, Zhong F, Zhong W, Zhu S, Zhao S, Gilbert D, Baumhueter S, Spier G, Carter C, Cravchik A, Woodage T, Ali F, An H, Awe A, Baldwin D, Baden H, Barnstead M, Barrow I, Beeson K, Busam D, Carver A, Center A, Cheng ML, Curry L, Danaher S, Davenport L, Desilets R, Dietz S, Dodson K, Doup L, Ferriera S, Garg N, Gluecksmann A, Hart B, Haynes J, Haynes C, Heiner C, Hladun S, Hostin D, Houck J, Howland T, Ibegwam C, Johnson J, Kalush F, Kline L, Koduru S, Love A, Mann F, May D, McCawley S, McIntosh T, McMullen I, Moy M, Moy L, Murphy B, Nelson K, Pfannkoch C, Pratts E, Puri V, Qureshi H, Reardon M, Rodriguez R, Rogers YH, Romblad D, Ruhfel B, Scott R, Sitter C, Smallwood M, Stewart E, Strong R, Suh E, Thomas R, Tint NN, Tse S, Vech C, Wang G, Wetter J, Williams S, Williams M, Windsor S, Winn-Deen E, Wolfe K, Zaveri J, Zaveri K, Abril JF, Guigó R, Campbell MJ, Sjolander KV, Karlak B, Kejariwal A, Mi H, Lazareva B, Hatton T, Narechania A, Diemer K, Muruganujan A, Guo N, Sato S, Bafna V, Istrail S, Lippert R, Schwartz R, Walenz B, Yooseph S, Allen D, Basu A, Baxendale J, Blick L, Caminha M, Carnes-Stine J, Caulk P, Chiang YH, Coyne M, Dahlke C, Mays A, Dombroski M, Donnelly M, Ely D, Esparham S, Fosler C, Gire H, Glanowski S, Glasser K, Glodek A, Gorokhov M, Graham K, Gropman B, Harris M, Heil J, Henderson S, Hoover J, Jennings D, Jordan C, Jordan J, Kasha J, Kagan L, Kraft C, Levitsky A, Lewis M, Liu X, Lopez J, Ma D, Majoros W, McDaniel J, Murphy S, Newman M, Nguyen T, Nguyen N, Nodell M, Pan S, Peck J, Peterson M, Rowe W, Sanders R, Scott J, Simpson M, Smith T, Sprague A, Stockwell T, Turner R, Venter E, Wang M, Wen M, Wu D, Wu M, Xia A, Zandieh A and Zhu X. The sequence of the human genome. Science 2001;291:1304–1351.
18. Schuler GD. Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J Mol Med 1997;75:1432–1440.
19. Mecham BH, Wetmore DZ, Szallasi Z, Sadovsky Y, Kohane I and Mariani TJ. Increased measurement accuracy for sequence-verified microarray probes. Physiol Genomics 2004;18:308–315.
20. Harbig J, Sprinkle R and Enkemann SA. A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array. Nucleic Acids Res 2005;33:e31.
21. Carter SL, Eklund AC, Mecham BH, Kohane IS and Szallasi Z. Redefinition of Affymetrix probe sets by sequence overlap with cDNA microarray probes reduces cross-platform inconsistencies in cancer-associated gene expression measurements. BMC Bioinformatics 2005;6:107.
22. Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, Watson SJ and Meng F. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005;33:e175.
23. Draghici S, Khatri P, Eklund AC and Szallasi Z. Reliability and reproducibility issues in DNA microarray measurements. Trends Genet 2006;22:101–109.
24. Okoniewski MJ and Miller CJ. Hybridization interactions between probesets in short oligo microarrays lead to spurious correlations. BMC Bioinformatics 2006;7:276.
25. Alberts R, Terpstra P, Hardonk M, Bystrykh LV, de Haan G, Breitling R, Nap J and Jansen RC. A verification protocol for the probe sequences of Affymetrix genome arrays reveals high probe accuracy for studies in mouse, human and rat. BMC Bioinformatics 2007;8:132.
26. Kerr MK. Design considerations for efficient and effective microarray studies. Biometrics 2003;59:822–828.
27. Miller LD, Long PM, Wong L, Mukherjee S, McShane LM and Liu ET. Optimal gene expression analysis by microarrays. Cancer Cell 2002;2:353–361.
28. Yang YH and Speed T. Design issues for cDNA microarray experiments. Nat Rev Genet 2002;3:579–588.
29. Zhang S and Gant TW. A statistical framework for the design of microarray experiments and effective detection of differential gene expression. Bioinformatics 2004;20:2821–2828.
30. Hsu JC, Chang J, Wang T, Steingrímsson E, Magnússon MK and Bergsteinsdottir K. Statistically designing microarrays and microarray experiments to enhance sensitivity and specificity. Brief Bioinform 2007;8:22–31.
31. Pan W, Lin J and Le CT. How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biol 2002;3:research0022.
32. Pavlidis P, Li Q and Noble WS. The effect of replication on gene expression microarray experiments. Bioinformatics 2003;19:1620–1627.
33. Han E, Wu Y, McCarter R, Nelson JF, Richardson A and Hilsenbeck SG. Reproducibility, sources of variability, pooling, and sample size: important considerations for the design of high-density oligonucleotide array experiments. J Gerontol A Biol Sci Med Sci 2004;59:306–315.
34. Wei C, Li J and Bumgarner RE. Sample size for detecting differentially expressed genes in microarray experiments. BMC Genomics 2004;5:87.

35. Tsai C, Wang S, Chen D and Chen JJ. Sample size for gene expression microarray experiments. Bioinformatics 2005;21:1502–1508.
36. Kreil DP and Russell RR. There is no silver bullet – a guide to low-level data transforms and normalisation methods for microarray data. Brief Bioinform 2005;6:86–97.
37. Kendziorski C, Irizarry RA, Chen K, Haag JD and Gould MN. On the utility of pooling biological samples in microarray experiments. Proc Natl Acad Sci USA 2005;102:4252–4257.
38. Mary-Huard T, Daudin J, Baccini M, Biggeri A and Bar-Hen A. Biases induced by pooling samples in microarray experiments. Bioinformatics 2007;23:i313–i318.
39. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, Lightfoot S, Menzel W, Granzow M and Ragg T. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Mol Biol 2006;7:3.
40. Van Gelder RN, von Zastrow ME, Yool A, Dement WC, Barchas JD and Eberwine JH. Amplified RNA synthesized from limited quantities of heterogeneous cDNA. Proc Natl Acad Sci USA 1990;87:1663–1667.
41. Tong W, Lucas AB, Shippy R, Fan X, Fang H, Hong H, Orr MS, Chu T, Guo X, Collins PJ, Sun YA, Wang S, Bao W, Wolfinger RD, Shchegrova S, Guo L, Warrington JA and Shi L. Evaluation of external RNA controls for the assessment of microarray performance. Nat Biotechnol 2006;24:1132–1139.
42. Kerr KF. Extended analysis of benchmark datasets for Agilent two-color microarrays. BMC Bioinformatics 2007;8:371.
43. Nygaard V and Hovig E. Options available for profiling small samples: a review of sample amplification technology when combined with microarray profiling. Nucleic Acids Res 2006;34:996–1014.
44. Bolstad BM, Irizarry RA, Astrand M and Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003;19:185–193.
45. Irizarry RA, Wu Z and Jaffee HA. Comparison of Affymetrix GeneChip expression measures. Bioinformatics 2006;22:789–794.
46. Seo J and Hoffman EP. Probe set algorithms: is there a rational best bet? BMC Bioinformatics 2006;7:395.
47. Affymetrix. New statistical algorithms for monitoring gene expression on GeneChip probe arrays. Technical Note, 2001.
48. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U and Speed TP. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003;4:249–264.
49. Wu Z, Irizarry RA, Gentleman R, Martinez-Murillo F and Spencer F. A model-based background adjustment for oligonucleotide expression arrays. J Am Stat Assoc 2004;99:909–917.
50. Affymetrix. Guide to probe logarithmic intensity error (PLIER) estimation. Technical Note, 2005.
51. Shedden K, Chen W, Kuick R, Ghosh D, Macdonald J, Cho KR, Giordano TJ, Gruber SB, Fearon ER, Taylor JMG and Hanash S. Comparison of seven methods for producing Affymetrix expression scores based on false discovery rates in disease profiling data. BMC Bioinformatics 2005;6:26.
52. Fare TL, Coffey EM, Dai H, He YD, Kessler DA, Kilian KA, Koch JE, LeProust E, Marton MJ, Meyer MR, Stoughton RB, Tokiwa GY and Wang Y. Effects of atmospheric ozone on microarray data quality. Anal Chem 2003;75:4672–4675.

53. Ritchie ME, Silver J, Oshlack A, Holmes M, Diyagama D, Holloway A and Smyth GK. A comparison of background correction methods for two-colour microarrays. Bioinformatics 2007;23:2700–2707.
54. Zahurak M, Parmigiani G, Yu W, Scharpf RB, Berman D, Schaeffer E, Shabbeer S and Cope L. Pre-processing Agilent microarray data. BMC Bioinformatics 2007;8:142.
55. Smyth GK and Speed T. Normalization of cDNA microarray data. Methods 2003;31:265–273.
56. Lin SM, Du P and Kibbe WA. Model-based variance-stabilizing transformation for Illumina microarray. Nucleic Acids Res 2008;36:e11.
57. Calza S, Raffelsberger W, Ploner A, Sahel J, Leveillard T and Pawitan Y. Filtering genes to improve sensitivity in oligonucleotide microarray data analysis. Nucleic Acids Res 2007;35:e102.
58. Cui X and Churchill GA. Statistical tests for differential expression in cDNA microarray experiments. Genome Biol 2003;4:210.
59. Baldi P and Long AD. A Bayesian framework for the analysis of microarray expression data: regularized t test and statistical inferences of gene changes. Bioinformatics 2001;17:509–519.
60. Sartor MA, Tomlinson CR, Wesselkamper SC, Sivaganesan S, Leikauf GD and Medvedovic M. Intensity-based hierarchical Bayes method improves testing for differentially expressed genes in microarray experiments. BMC Bioinformatics 2006;7:538.
61. Tusher VG, Tibshirani R and Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA 2001;98:5116–5121.
62. Jeffery IB, Higgins DG and Culhane AC. Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 2006;7:359.
63. Li H, Wood CL, Getchell TV, Getchell ML and Stromberg AJ. Analysis of oligonucleotide array experiments with repeated measures using mixed models. BMC Bioinformatics 2004;5:209.
64. Benjamini Y and Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc: Ser B (Methodological) 1995;57:289–300.
65. Holm S. A simple sequentially rejective multiple test procedure. Scand J Stat 1979;6:65–70.
66. Westfall P and Young S. Resampling-based multiple testing: examples and methods for p-value adjustment, Wiley, 1993.
67. Reiner A, Yekutieli D and Benjamini Y. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 2003;19:368–375.
68. Storey JD. A direct approach to false discovery rates. J R Stat Soc: Ser B (Statistical Methodology) 2002;64:479–498.
69. Lu X and Perkins DL. Re-sampling strategy to improve the estimation of number of null hypotheses in FDR control under strong correlation structures. BMC Bioinformatics 2007;8:157.
70. Ploner A, Calza S, Gusnanto A and Pawitan Y. Multidimensional local false discovery rate for microarray studies. Bioinformatics 2006;22:556–565.
71. Perelman E, Ploner A, Calza S and Pawitan Y. Detecting differential expression in microarray data: comparison of optimal procedures. BMC Bioinformatics 2007;8:28.
72. Boutros PC and Okey AB. Unsupervised pattern recognition: an introduction to the whys and wherefores of clustering microarray data. Brief Bioinform 2005;6:331–343.

57 73. Eisen MB, Spellman PT, Brown PO and Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998;95:14863–14868. 74. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ and Church GM. Systematic determination of genetic network architecture. Nat Genet 1999;22:281–285. 75. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES and Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA 1999;96:2907–2912. 76. Kaufman L and Rousseeuw PJ. Finding groups in data. An introduction to cluster analysis, Wiley, 1990. 77. Garge NR, Page GP, Sprague AP, Gorman BS and Allison DB. Reproducible clusters from microarray research: whither? BMC Bioinformatics 2005;6(Suppl. 2):S10. 78. Datta S and Datta S. Evaluation of clustering algorithms for gene expression data. BMC Bioinformatics 2006;7(Suppl. 4):S17. 79. Rousseeuw P. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987;20:53–65. 80. Tibshirani R, Walther G and Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc: Ser B (Statistical Methodology) 2001;63:411–423. 81. Dudoit S and Fridlyand J. A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biol 2002;3,RESEARCH0036. 82. Handl J, Knowles J and Kell DB. Computational cluster validation in post-genomic data analysis. Bioinformatics 2005;21:3201–3212. 83. Dembe´le´ D and Kastner P. Fuzzy C-means method for clustering microarray data. Bioinformatics 2003;19:973–980. 84. Raychaudhuri S, Stuart JM and Altman RB. Principal components analysis to summarize microarray experiments: application to sporulation time series. Pac Symp Biocomput 2000;5:455–466. 85. Liebermeister W. Linear modes of gene expression determined by independent component analysis. Bioinformatics 2002;18:51–60. 
86. Teschendorff AE, Journe´e M, Absil PA, Sepulchre R and Caldas C. Elucidating the altered transcriptional programs in breast cancer using independent component analysis. PLoS Comput Biol 2007;3:e161. 87. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD and Lander ES. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531–537. 88. Zhang J and Deng H. Gene selection for classification of microarray data based on the Bayes error. BMC Bioinformatics 2007;8:370. 89. Troyanskaya OG. Putting microarrays in a context: integrated analysis of diverse biological data. Brief Bioinform 2005;6:34–43. 90. Teufel A, Krupp M, Weinmann A and Galle PR. Current bioinformatics tools in genomic biomedical research (Review). Int J Mol Med 2006;17:967–973. 91. Quackenbush J. Extracting biology from high-dimensional biological data. J Exp Biol 2007;210:1507–1517. 92. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L and Yaschenko E. Database resources of the national center for biotechnology information. Nucleic Acids Res 2007;35:D5–D12.

58 93. European Biotechnology Institute: http://www.ebi.ac.uk/. 94. Hubbard TJP, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A and Birney E. Ensembl 2007. Nucleic Acids Res 2007;35:D610–D617. 95. Kuhn RM, Karolchik D, Zweig AS, Trumbower H, Thomas DJ, Thakkapallayil A, Sugnet CW, Stanke M, Smith KE, Siepel A, Rosenbloom KR, Rhead B, Raney BJ, Pohl A, Pedersen JS, Hsu F, Hinrichs AS, Harte RA, Diekhans M, Clawson H, Bejerano G, Barber GP, Baertsch R, Haussler D and Kent WJ. The UCSC genome browser database: update 2007. Nucleic Acids Res 2007;35:D668–D673. 96. Gene Ontology Consortium. The Gene Ontology Project in 2008. Nucleic Acids Res 2008;36:D440–D444. 97. Aoki-Kinoshita KF and Kanehisa M. Gene annotation and pathway mapping in KEGG. Methods Mol Biol 2007;396:71–92. 98. Mi H, Guo N, Kejariwal A and Thomas PD. PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways. Nucleic Acids Res 2007;35:D247–D252. 99. Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC and Conklin BR. MAPPFinder: using gene ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol 2003;4:R7. 100. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M, Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss J, Barrett JC and Weinstein JN. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 2003;4:R28. 101. 
Khatri P and Dra˘ghici S. Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005;21:3587–3595. 102. Huang DW, Sherman BT, Tan Q, Kir J, Liu D, Bryant D, Guo Y, Stephens R, Baseler MW, Lane HC and Lempicki RA. DAVID bioinformatics resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic Acids Res 2007;35:W169–W175. 103. Cannata N, Merelli E and Altman RB. Time to organize the bioinformatics resourceome. PLoS Comput Biol 2005;1:e76. 104. Fox JA, McMillan S and Ouellette BFF. Conducting research on the web: 2007 update for the bioinformatics links directory. Nucleic Acids Res 2007;35:W3–W5. 105. Microarray Gene Expression Database Society: http://www.mged.org/. 106. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J and Vingron M. Minimum information about a microarray experiment (MIAME) – toward standards for microarray data. Nat Genet 2001;29: 365–371. 107. Tan PK, Downey TJ, Spitznagel ELJ, Xu P, Fu D, Dimitrov DS, Lempicki RA, Raaka BM and Cam MC. Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res 2003;31:5676–5684.

59 108. Bammler T, Beyer RP, Bhattacharya S, Boorman GA, Boyles A, Bradford BU, Bumgarner RE, Bushel PR, Chaturvedi K, Choi D, Cunningham ML, Deng S, Dressman HK, Fannin RD, Farin FM, Freedman JH, Fry RC, Harper A, Humble MC, Hurban P, Kavanagh TJ, Kaufmann WK, Kerr KF, Jing L, Lapidus JA, Lasarev MR, Li J, Li Y, Lobenhofer EK, Lu X, Malek RL, Milton S, Nagalla SR, O’malley JP, Palmer VS, Pattee P, Paules RS, Perou CM, Phillips K, Qin L, Qiu Y, Quigley SD, Rodland M, Rusyn I, Samson LD, Schwartz DA, Shi Y, Shin J, Sieber SO, Slifer S, Speer MC, Spencer PS, Sproles DI, Swenberg JA, Suk WA, Sullivan RC, Tian R, Tennant RW, Todd SA, Tucker CJ, Van Houten B, Weis BK, Xuan S and Zarbl H. Standardizing global gene expression analysis between laboratories and across platforms. Nat Methods 2005;2:351–356. 109. Petersen D, Chandramouli GVR, Geoghegan J, Hilburn J, Paarlberg J, Kim CH, Munroe D, Gangi L, Han J, Puri R, Staudt L, Weinstein J, Barrett JC, Green J and Kawasaki ES. Three microarray platforms: an analysis of their concordance in profiling gene expression. BMC Genomics 2005;6:63. 110. Wang H, He X, Band M, Wilson C and Liu L. A study of inter-lab and inter-platform agreement of DNA microarray data. BMC Genomics 2005;6:71. 111. Shi L, Tong W, Fang H, Scherf U, Han J, Puri RK, Frueh FW, Goodsaid FM, Guo L, Su Z, Han T, Fuscoe JC, Xu ZA, Patterson TA, Hong H, Xie Q, Perkins RG, Chen JJ and Casciano DA. Cross-platform comparability of microarray technology: intraplatform consistency and appropriate data analysis procedures are essential. BMC Bioinformatics 2005;6(Suppl. 2):S12. 112. 
Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, Elespuru R, Fero M, Foy C, Fuscoe J, Gao X, Gerhold DL, Gilles P, Goodsaid F, Guo X, Hackett J, Hockett RD, Ikonomi P, Irizarry RA, Kawasaki ES, Kaysser-Kranich T, Kerr K, Kiser G, Koch WH, Lee KY, Liu C, Liu ZL, Lucas A, Manohar CF, Miyada G, Modrusan Z, Parkes H, Puri RK, Reid L, Ryder TB, Salit M, Samaha RR, Scherf U, Sendera TJ, Setterquist RA, Shi L, Shippy R, Soriano JV, Wagar EA, Warrington JA, Williams M, Wilmer F, Wilson M, Wolber PK, Wu X and Zadro R. The external RNA controls consortium: a progress report. Nat Methods 2005;2:731–734. 113. Canales RD, Luo Y, Willey JC, Austermiller B, Barbacioru CC, Boysen C, Hunkapiller K, Jensen RV, Knight CR, Lee KY, Ma Y, Maqsodi B, Papallo A, Peters EH, Poulter K, Ruppel PL, Samaha RR, Shi L, Yang W, Zhang L and Goodsaid FM. Evaluation of DNA microarray results with quantitative gene expression platforms. Nat Biotechnol 2006;24:1115–1122. 114. Shippy R, Fulmer-Smentek S, Jensen RV, Jones WD, Wolber PK, Johnson CD, Pine PS, Boysen C, Guo X, Chudin E, Sun YA, Willey JC, Thierry-Mieg J, Thierry-Mieg D, Setterquist RA, Wilson M, Lucas AB, Novoradovskaya N, Papallo A, Turpaz Y, Baker SC, Warrington JA, Shi L and Herman D. Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat Biotechnol 2006;24:1123–1131. 115. MAQC Consortium. Shi L, Reid LH, Jones WD, Shippy R, Warrington JA, Baker SC, Collins PJ, de Longueville F, Kawasaki ES, Lee KY, Luo Y, Sun YA, Willey JC, Setterquist RA, Fischer GM, Tong W, Dragan YP, Dix DJ, Frueh FW, Goodsaid FM, Herman D, Jensen RV, Johnson CD, Lobenhofer EK, Puri RK, Schrf U, Thierry-Mieg J, Wang C, Wilson M, Wolber PK, Zhang L, Amur S, Bao W, Barbacioru CC, Lucas AB, Bertholet V, Boysen C, Bromley B, Brown D, Brunner A, Canales R, Cao XM, Cebula TA, Chen JJ, Cheng J, Chu T, Chudin E, Corson J, Corton JC, Croner LJ,

60

116.

117.

118.

119.

120. 121.

122.

123.

Davies S, Davison TS, Delenstarr G, Deng X, Dorris D, Eklund AC, Fan X, Fang H, Fulmer-Smentek S, Fuscoe JC, Gallagher K, Ge W, Guo L, Guo X, Hager J, Haje PK, Han J, Han T, Harbottle HC, Harris SC, Hatchwell E, Hauser CA, Hester S, Hong H, Hurban P, Jackson SA, Ji H, Knight CR, Kuo WP, LeClerc JE, Levy S, Li Q, Liu C, Liu Y, Lombardi MJ, Ma Y, Magnuson SR, Maqsodi B, McDaniel T, Mei N, Myklebost O, Ning B, Novoradovskaya N, Orr MS, Osborn TW, Papallo A, Patterson TA, Perkins RG, Peters EH, Peterson R, Philips KL, Pine PS, Pusztai L, Qian F, Ren H, Rosen M, Rosenzweig BA, Samaha RR, Schena M, Schroth GP, Shchegrova S, Smith DD, Staedtler F, Su Z, Sun H, Szallasi Z, Tezak Z, Thierry-Mieg D, Thompson KL, Tikhonova I, Turpaz Y, Vallanat B, Van C, Walker SJ, Wang SJ, Wang Y, Wolfinger R, Wong A, Wu J, Xiao C, Xie Q, Xu J, Yang W, Zhang L, Zhong S, Zong Y and Slikker WJ. The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol 2006;24:1151–1161. Guo L, Lobenhofer EK, Wang C, Shippy R, Harris SC, Zhang L, Mei N, Chen T, Herman D, Goodsaid FM, Hurban P, Phillips KL, Xu J, Deng X, Sun YA, Tong W, Dragan YP and Shi L. Rat toxicogenomic study reveals analytical consistency across microarray platforms. Nat Biotechnol 2006;24:1162–1169. Chen J, Hsueh H, Delongchamp R, Lin C and Tsai C. Reproducibility of microarray data: a further analysis of microarray quality control (MAQC) data. BMC Bioinformatics 2007;8:412. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M and Edgar R. NCBI GEO: mining tens of millions of expression profiles – database and tools update. Nucleic Acids Res 2007;35:D760–D765. Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway E, Kolesnykov N, Lilja P, Lukk M, Mani R, Rayner T, Sharma A, William E, Sarkans U and Brazma A. 
ArrayExpress – a public database of microarray experiments and gene expression profiles. Nucleic Acids Res 2007;35:D747–D750. Lu Y, Yi Y, Liu P, Wen W, James M, Wang D and You M. Common human cancer genes discovered by integrated gene-expression analysis. PLoS ONE 2007;2:e1149. English SB and Butte AJ. Evaluation and integration of 49 genome-wide experiments and the prediction of previously unknown obesity-related genes. Bioinformatics 2007;23:2910–2917. Clamp M, Fry B, Kamal M, Xie X, Cuff J, Lin MF, Kellis M, Lindblad-Toh K and Lander ES. Distinguishing protein-coding and noncoding genes in the human genome. Proc Natl Acad Sci USA 2007;104:19428–19433. ENCODE Project Consortium. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo´ R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S, Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SCJ, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermu¨ller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O,

61 Pedersen MC, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung W, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei C, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karao¨z U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Lo¨ytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CWH, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE, Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JNS, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PIW, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrı´ msdo´ttir IB, Huppert J, Zody MC, Abecasis GR, Estivill 
X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VVB, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B and de Jong PJ. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447:799–816. 124. Velculescu VE and Kinzler KW. Gene expression analysis goes digital. Nat Biotechnol 2007;25:878–880. 125. Velculescu VE, Zhang L, Vogelstein B and Kinzler KW. Serial analysis of gene expression. Science 1995;270:484–487.

63

UCSC Genome Browser: Deep support for molecular biomedical research

Mary E. Mangan1,*, Jennifer M. Williams1, Scott M. Lathe1, Donna Karolchik2 and Warren C. Lathe III1

1 OpenHelix, LLC, 12600 SE 38th Street, Suite 230, Bellevue, WA 98006, USA
2 UCSC Genome Bioinformatics Group: Center for Biomolecular Science & Engineering, CBSE/ITI, 501D Engineering II Building, University of California, Santa Cruz, 1156 High St., Santa Cruz, CA 95064, USA

Abstract. The volume and complexity of genomic sequence data, and the additional experimental data required to annotate the genomic context, make display and access a major challenge for biomedical researchers. Genome browsers organize this data and make it available in various ways, so that useful information can be extracted to advance research projects. The UCSC Genome Browser is one of these resources. The official sequence data for a given species forms the framework for displaying many other types of data, such as expression, variation, and cross-species comparisons. Visual representations of the data are available for exploration. Data can be queried with sequences. Complex database queries are also easily achieved with the Table Browser interface. Associated tools permit additional query types or access to additional data sources, such as images of in situ localizations. Support for researchers' questions is provided through active discussion mailing lists and updated training materials. The UCSC Genome Browser provides a source of deep support for a wide range of biomedical molecular research (http://genome.ucsc.edu).

Keywords: UCSC Genome Browser, annotation, genomics, SNP, transcription factor, conservation, sequence analysis, ENCODE, human genome, training, bioinformatics, education.

Corresponding author: Tel.: (425) 401-1400. E-mail: [email protected] (M.E. Mangan).

Biotechnology Annual Review, Volume 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00003-3. © 2008 Elsevier B.V. All rights reserved.

Introduction

In the last few years the amount of genomic sequence data has grown exponentially. The number of databases and tools that organize and analyze this data has grown along with the raw data. Since 1996, Nucleic Acids Research (NAR) has published an annual “database issue” that lists prominent or new databases and tools. The first annual issue, in 1996, included 57 databases [1]. Just over 4 years later, 230 were listed in the online list. Now, more than 4 times that number of databases and tools are listed [2]. And that is only the tip of the iceberg. Few scientists are aware of, or make full use of, all the open-source and public resources available to them through the Internet. Tools and databases continue to grow in number. Among them are “genome browsers,” which help organize all this sequence data and add other data types and analysis tools.

Genome browsers bring together genome data and analysis tools for draft and completed genomes in one accessible view of the genome. They add data types such as single nucleotide polymorphisms (SNPs), domains, repeat elements, and more, to provide context for understanding genomic regions. All genome browsers organize data by chromosomal location, so that users can search, zoom, or walk to different parts of the genome and its chromosomes. Once a user arrives at a region of interest, the graphic chromosomal display offers different types of annotated data as “tracks” of information layered by chromosomal location. Examples of these “annotations” of the chromosome include genes and gene predictions, variation and repeats data, cross-species comparative data, and many more. These graphical tracks usually link out to pages with more information. For example, a gene track page might have information about the gene structure, transcripts, expressed sequence tags (ESTs), SNPs, protein sequence, domains, and more. These detail pages in turn usually link out to outside databases and data. For example, the protein information on a gene details page might link out to a Protein Data Bank (PDB) crystal structure [3]. In summary, genome browsers present the genome data and organize many other data types to provide context for understanding genomic regions of interest.

One of the oldest and most widely accepted genome browsers is the UCSC Genome Browser, which has been providing context for sequence information since the earliest days of the computational display of Human Genome Project (HGP) data [4]. As more and more genomes are sequenced, the same framework has been used to display data for many other species as well.
Additionally, each new type of genome-based data brings further insight into genomic regions, and the UCSC Genome Browser software has been enhanced and extended to provide new and informative views of the data. This tremendous resource is freely available on the web for anyone to use (http://genome.ucsc.edu/). The UCSC Genome Browser is developed and maintained by the Genome Bioinformatics Group, a cross-departmental team within the Center for Biomolecular Science and Engineering (CBSE) at the University of California Santa Cruz (UCSC). This team continues to implement new data and features every day.

In this chapter, we present the basic features of the UCSC Genome Browser and introduce you to obtaining the data and views that can advance your molecular biomedical research projects and deepen your understanding of genomic regions of interest. We will examine the ways to view various data types already available in the browser, and show you how to display your own data as well. We will show you associated tools to move beyond the default displays. We will describe new data types and features that you can expect to see soon. Opportunities to expand your knowledge with additional training will be provided. Our goal is to help you to visualize and analyze molecular data that is crucial for your work, improving the effectiveness and efficiency of your research.

UCSC Genome Browser: The basics – an overview

Different bioinformatics teams have chosen different ways to represent genome data. Some displays are horizontal and some vertical. Different types of data can be selected for display. Fig. 1A gives a conceptual diagram of the organization that the UCSC team has chosen for the viewer. The specific data groups will vary by species, but for any species that you examine in the UCSC Genome Browser you will find a similar organization. The official sequence from the team in charge of the species sequencing project is obtained from the public repository GenBank [5]. This sequence forms the framework on which all of the other data will be displayed.

The official sequence is displayed as the uppermost track in the viewer. At low magnification it appears as a series of periodic tick marks and nucleotide position numbers, but you can zoom in all the way to the base level and see the individual nucleotides. Beneath the official sequence, aligned at the appropriate nucleotides, you can find dozens of different data types that provide context for the genomic sequence regions in which you may be interested. These additional data displays are referred to as “annotation tracks” and offer insights into other features of the region. You can find the genes that are known to be present in a region. You will find variations such as SNPs, evolutionary relationships across multiple species, and many other data types of interest to biomedical researchers as well.
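As a rough mental model of this layout, each annotation track can be thought of as a set of named intervals on the reference sequence, and the viewer as showing the items that overlap the current window. The sketch below is only an illustration under that assumption; the class names, the half-open coordinate convention, and the coordinates are hypothetical and do not describe how the UCSC software is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One annotation item placed on the reference sequence (hypothetical model)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


@dataclass
class Track:
    """A named annotation track: a collection of items of one data type."""
    label: str
    items: list = field(default_factory=list)

    def in_window(self, chrom, start, end):
        """Return the items overlapping the half-open window [start, end) on chrom."""
        return [it for it in self.items
                if it.chrom == chrom and it.start < end and it.end > start]


def visible(tracks, chrom, start, end):
    """Map each track label to the names of its items that overlap the window."""
    return {t.label: [it.name for it in t.in_window(chrom, start, end)]
            for t in tracks}


# Made-up coordinates for illustration only; not real annotation data.
genes = Track("Genes", [Item("geneA", "chr1", 100, 500),
                        Item("geneB", "chr1", 900, 1200)])
snps = Track("SNPs", [Item("snp1", "chr1", 150, 151),
                      Item("snp2", "chr1", 1000, 1001)])

print(visible([genes, snps], "chr1", 0, 600))
```

Because every track is indexed by the same coordinates, moving or zooming the window changes what is shown in all tracks at once, which is the behavior described above.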
The data types that you can find will vary by species, but all of the species will be organized in this manner: the official sequence at the top, annotation tracks below. Any item that you want to learn more about is clickable, and you can go deeper into any area you like (Fig. 1B).

[Fig. 1A panel labels: Official sequence from sequence consortium teams, base positions indicated (Mapping and Sequencing). Annotation Tracks: Phenotype and Disease Associations; Genes and Gene Predictions; mRNAs and ESTs; Expression and Regulation; Comparative Genomics; Variations and Repeats; ENCODE. Links out to more data.]

Fig. 1. The UCSC Genome Browser organizes the official genome sequence and many other data types to provide context for any genomic region of interest. Nucleotide sequence data at the top provides the framework on which all the other data is organized. (A) Conceptual diagram of the browser organization. (B) Sample image of the actual browser with organizational features indicated. Pages with more detailed information are one click away.

Basic searches in the browser

To navigate to the genome locations you want to see in the UCSC Genome Browser, you start at the homepage (http://genome.ucsc.edu). You access the tools we will discuss in this chapter from the blue navigation bars at the top or at the side of the homepage. To get started with a basic search, click the links for Genome or Genome Browser from the homepage to access the Gateway. A sample of the Gateway search form is illustrated in Fig. 2A.

On the Gateway page you need to make a number of choices. First you choose a clade and then a species (genome). You can search one species at a time. Next you choose an “assembly.” Assembly refers to the official backbone genomic sequence that is used to create the framework on which to hang all the other data. UCSC obtains the official assembly and then generates the annotation tracks for that genome. The date you see in the assembly menu refers to when the assembly sequence was frozen by the assembly team. Usually you will want the most current assembly, but sometimes you may want to look back at older data, and you will see that it is still available in the menu. Research papers are sometimes years in the making and may rely on coordinates from older assemblies that you need to evaluate. Even older data is still available in the UCSC archives if you need it, but may not be shown in the menu choices.

Fig. 2. The Gateway search interface provides quick and easy access to the basic genomic query functions. (A) The Gateway interface for searching the genome requires you to select: the genome to search; which official sequence assembly release you want to examine; the region of the genome that you wish to view; and the image width of the resulting view, in pixels. (B) Results of a search in the human genome for the text TP53 show all the records containing the text TP53. Some are probably the gene of interest (as indicated by the gene symbols on the left), others may be interacting or associated proteins (as indicated by the descriptive text on the right). Selecting the link for the appropriate item will take you to the genomic region indicated in the center.

You then specify the position in the genome that you would like to examine. You can enter nucleotide ranges, gene symbols, or keywords. Finally you can set the image width. This is the size (in pixels) of the image that you will see when you examine the genomic region in the viewer. A “configure” button below lets you alter the image in many other ways as well, including changing the font and feature sizes. For training, talks, or publications it may help to adjust these features for the images you will use.
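The Gateway choices described above (assembly, position, and image width) can also be thought of as the parameters of a link to the viewer. The following is a minimal sketch of that idea, not an official client. It assumes that the viewer is served at the `hgTracks` path and accepts `db`, `position`, and `pix` query parameters, so check the current UCSC documentation before relying on these names. The assembly name `hg18`, the default width, and the coordinates are illustrative values only.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint and parameter names (db, position, pix); verify against
# the current UCSC documentation before relying on them.
BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def parse_range(text):
    """Parse a nucleotide range such as 'chr17:7,500,000-7,600,000'.

    Returns (chrom, start, end), or None when the text is not a range
    (for example a gene symbol or a keyword).
    """
    m = re.fullmatch(r"\s*(\w+):(\d[\d,]*)-(\d[\d,]*)\s*", text)
    if m is None:
        return None
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("range start must not exceed range end")
    return m.group(1), start, end


def gateway_url(assembly, position, width_px=800):
    """Build a viewer link from the three Gateway choices (hypothetical helper)."""
    parsed = parse_range(position)
    if parsed is not None:
        # Normalize ranges by dropping thousands separators.
        position = "{}:{}-{}".format(*parsed)
    else:
        # Gene symbols and keywords are passed through unchanged.
        position = position.strip()
    query = urlencode({"db": assembly, "position": position, "pix": width_px})
    return BASE_URL + "?" + query


print(gateway_url("hg18", "TP53"))
print(gateway_url("hg18", "chr17:7,500,000-7,600,000", width_px=1000))
```

A gene symbol or keyword may match several records, so a link built this way can lead to a list of results rather than directly to a region.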

Once you have made your choices in the Gateway, you submit your search to the database. For this chapter, we will illustrate the results of a search (with default settings) for TP53, a well-characterized gene of biomedical interest (Fig. 2B). The result of this search is a large list of items in the database that contain the text TP53 somewhere in their record. Some of them are likely to be the gene itself, as you can see from the gene symbol on the left side. Others may have names that say “interact with TP53,” or have various related information in their gene details pages. In this case, we have to select one of the links to go to the viewer. Items that share the gene symbol usually represent splice variants or other known variations. As a rule of thumb you may want to try selecting a long one. All of them will be visible when we get to the viewer; we just have to pick one at this point to be taken to the right genomic region. Once you get there, you can move around the region: upstream, downstream, zooming in or out.

The basic displays: An overview

When you get to a region of the genome to examine, you will find that UCSC presents a set of default tracks for your view on the long browser page. We will spend a few moments now to understand the organization of this page. The Genome Viewer image is at the top of the page, and the controls for all the tracks are provided at the bottom of the page. An overview of an entire page is provided in Fig. 3. The data in the image part of the viewer is organized into the same sections as the controls. If you are interested in the Mapping and Sequencing group of tracks, you will find them in the uppermost portion of the viewer. If you are interested in Variation and Repeats, those correspond to the lower areas in the viewer.
Each of the available tracks has a link to more information about the data that track contains and a menu to set the display type, which we will describe later.

Fig. 3. Overview of the UCSC Genome Browser viewer, showing the human TP53 region. The viewer at the top shows graphical displays of the data. Controls for groups of similar data are collected in collapsible sections. Display data is layered in the same layout as the controls area, with “Mapping and Sequencing Tracks” near the top of the viewer, and “Variation and Repeats” further down, for example. The specific data types available will vary by species.

Displaying data types: Visual cues

We will now take a closer look at the viewer. There are a number of helpful features of the display that we would like to illustrate. Some items are simple displays of a single location or short segment. These are illustrated as bars or tick marks. A sample of this is the SNP track shown in Fig. 4A. In this case all the SNPs are on one line. Later we will talk about how to change that for a different view of the SNPs.

Another type of display provides information about the gene structure, as we show in Fig. 4B. We focus on the UCSC Genes track here, with those likely isoforms from our original search adjacent to each other. The particular one that we clicked to get here has highlighting around the gene symbol. UCSC Genes represent predictions based on data from well-known and respected repositories of sequence information, such as RefSeq, GenBank, and UniProt. A moderately conservative set of predictions, this track includes protein-coding and putative non-coding transcripts. A series of thresholds of evidence, and quality control checks, have been applied to produce the set of predictions of the likely transcripts that you will see. A detailed explanation of this can be found in a paper from the UCSC team [6].

In this graphical representation of gene structures, full-height boxes represent protein-coding pieces, half-height boxes represent untranslated regions (UTRs), and lines represent intron segments. Arrowheads on the gene structure lines indicate the direction of transcription, which could be in either orientation in this viewer. Be sure to watch for that for the correct interpretation of coding sequences and promoter regions. In Fig. 4B we specifically chose to show a gene that runs from right to left, so that you will be aware of that possibility. In short, gene structure diagrams show you the exons, introns, UTRs, and the direction of transcription of the gene.

If you are looking at the Conservation track (Mammal Cons) you will see a special type of display, which UCSC refers to as a wiggle track (Fig. 4C). The wiggle track is a nice way to display continuous data as a histogram. In this case it displays the likelihood of an evolutionary relationship as a bar with a height that represents the score from a Multiz comparison of many species [7]. Another way to display genome-wide data is a new feature called Genome Graphs, which will help in the display of data from genome-wide association studies.

Fig. 4. Genome viewer graphics indicate important features with the shape and height of the components. Color can provide important cues as well. (A) SNPs are indicated as tick marks, representing single nucleotides or small insertion–deletion (indel) items. Color code is the default setting from the SNP description page. (B) Gene structure is indicated using boxes and lines to represent coding and noncoding regions. Arrowheads on the lines show the direction of transcription of this gene: here we see the 5′ end of the gene is to the right, and transcription would proceed to the left. (C) The conservation track displays the alignment scores across many species as a histogram, with the tallest peaks indicating the most conserved areas. Individual species can be seen below the group comparison. (The color version of this figure is hosted in Science Direct.)
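Because a gene may run in either direction, the coordinate that marks its 5′ end depends on the strand. The following Python fragment is a minimal sketch of that bookkeeping, not part of the browser itself. The function names, the 1-based inclusive coordinates, and the default of 1,000 upstream bases are assumptions made for illustration, and the coordinates in the usage lines are invented.

```python
from typing import Tuple


def five_prime_end(tx_start: int, tx_end: int, strand: str) -> int:
    """Return the coordinate of the transcript's 5' end (1-based, inclusive)."""
    if strand == "+":
        return tx_start  # left-to-right gene: 5' end is the lower coordinate
    if strand == "-":
        return tx_end    # right-to-left gene: 5' end is the higher coordinate
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def promoter_window(tx_start: int, tx_end: int, strand: str,
                    upstream: int = 1000) -> Tuple[int, int]:
    """Return (start, end) of the `upstream` bases that precede the 5' end."""
    tss = five_prime_end(tx_start, tx_end, strand)
    if strand == "+":
        return max(1, tss - upstream), tss - 1
    return tss + 1, tss + upstream


# Invented coordinates: the same span read on each strand.
forward = promoter_window(5000, 9000, "+")
reverse = promoter_window(5000, 9000, "-")
print(forward, reverse)
```

For a gene drawn with arrowheads pointing left, as in Fig. 4B, the window of interest lies to the right of the gene in the viewer, which is the case the second call illustrates.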

Track colors may have meanings. In the case of the UCSC Genes we see in Fig. 4B, colors offer some indication of the confidence in the quality of the sequence information. For example, in the UCSC Genes track, the color black indicates that there is a PDB structure entry for this transcript. Shades of blue indicate status – which may be reviewed or provisional, for example. However, sometimes colors are configurable. A good example of color coding is the situation with SNPs. It is often crucial to know if the variation at a site could cause a change in the amino acid sequence of the resulting protein, which is known as a coding non-synonymous change. We could choose to color those important SNPs differently from others. On the SNP information page you can do this with the filter shown (Fig. 5A). Here we set the SNPs in the display to be colored by function, with non-synonymous SNPs in blue and everything else in black.
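The coloring rule in this example amounts to a lookup from functional class to color with a fallback. A minimal Python sketch of that rule follows; the class labels, SNP names, and function name are placeholders chosen for illustration and are not taken from the browser.

```python
from typing import Dict

# Mirrors the example in the text: coding non-synonymous SNPs in blue,
# everything else in black. The label strings are illustrative placeholders.
DEFAULT_COLOR = "black"
COLOR_BY_FUNCTION: Dict[str, str] = {"coding-nonsynonymous": "blue"}


def snp_color(function_class: str,
              scheme: Dict[str, str] = COLOR_BY_FUNCTION) -> str:
    """Return the display color for a SNP given its functional class."""
    return scheme.get(function_class, DEFAULT_COLOR)


# Placeholder SNP names paired with a functional class.
snps = [("snpA", "intron"),
        ("snpB", "coding-nonsynonymous"),
        ("snpC", "coding-synonymous")]
colors = {name: snp_color(func) for name, func in snps}
print(colors)
```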

Fig. 5. Filters are available for some data types. These filters can be used to customize the color keys to provide additional cues about the data. (A) SNPs can be filtered based on a number of functions, and different types of SNPs can subsequently be colored using a 5-color scheme. For our example we color “coding non-synonymous” SNPs in blue; all other SNPs are black. The resulting SNP track is shown. (B) ESTs can be color coded based on various criteria. In this case we choose to show ESTs found in brain tissue in green. This relies on the information in the GenBank record, which may or may not be present. (The color version of this figure is hosted in Science Direct.)

Another nice example of a way to color the items in a track is the ESTs track: you can choose to color ESTs by tissue type, and look for tissue-specific splice variants (Fig. 5B). For example, you could ask for the ESTs in the TP53 region to be colored green if they are from brain samples. However, this relies on the tissue being recorded in the original GenBank record, and tissue data is not always present in the original submission.

Other tracks will have different color codes or filter options – you need to examine the track pages to find out the color keys, and determine if you can configure those colors yourself. Two ways to get to the track information pages are: (a) click on the “mini button” gray or blue bar on the left edge of the display area and (b) from the track controls area, click the hyperlinked name above the menu. Either action takes you to the track details description and filter options (if they are available; not everything can be filtered).

Setting the layout with menu controls

As you may have noticed, you can also change the layout or “visibility” of the display using the track menu controls: hide, dense, squish, pack, and full (Fig. 6A). Once you find a track of interest, you can set the display by using these menus. Here we will illustrate the different appearances of the menu selections, using the Human ESTs track as an example. We show the same region of human chromosome 17 as our TP53 gene in the Human ESTs section of the viewer, using the different menu options (Fig. 6B–E):

• Hide: completely removes the data from your image.
• Dense: all items are collapsed into a single line. In this case it means that you can see where there is EST coverage, but you do not know anything about individual ESTs in this view (Fig. 6B).
• Squish: each item is drawn separately and efficiently packed, but the graphics are only 50% of their regular height. Here you can see more information about individual ESTs (Fig. 6C).
• Pack: each item is separate, but efficiently stacked like sardines. However, they are full-height diagrams – which makes it different from squish. Here you can see the GenBank accession numbers for the ESTs, which may be useful. In the case of ESTs you can also see arrowheads, which indicate the direction of the sequence “read” provided from the GenBank record (Fig. 6D).
• Full: each item is on its own separate line, all the way down the browser viewer, up to approximately 1,000 rows (Fig. 6E).

If you have too many items in any display you may exceed the capabilities of your Internet browser; in these instances the Genome Browser will automatically revert to a more efficient view. If you still need to examine your data in a more distributed view you can try to zoom in to a smaller nucleotide range that would contain fewer elements.

To choose any of these options, just highlight it in the pulldown menu. To make the changes appear, you must click the “refresh” button that appears in the middle or at the very bottom of the Genome Browser page.

Fig. 6. The “visibility” or layout of the data can be adjusted with the menu for a given track. (A) A typical menu for a data track may have a number of choices for setting the display. This menu shows a complete set of options: hide, dense, squish, pack, and full. Hide removes the track from the viewer completely. (B) Dense compresses all the data on a single line. (C) Squish displays all the data, efficiently packed and 50% of normal height. (D) Pack displays efficient packing, but has full-height graphics. (E) Full puts every item on its own line, up to approximately 1,000 rows if your browser permits. If there are more items the display will reset to a more efficient display.

Super-tracks for large-scale data

Recently, a capability for a new class of “super-tracks” has been added. As more large-scale and genome-wide studies are performed and published there will be new challenges for the display of this data. Certain data tracks of related data may be grouped so that they can be shown or hidden by a single menu control option. Super-tracks are one way to address this scale and complexity, and the UCSC team will continue to develop new ways to view and manage such data.

The best view for a given data type depends on your needs and the kind of data being presented. For example, sometimes it may be enough to know that there are ESTs in a region. Other times you may want to know more about a specific EST, its length, and which exon or intron regions it may possess.

Page-wide display controls

Besides the track menus, there are a number of other controls that permit you to adjust the view of the data in your viewer. Here we see the upper section of the Genome Viewer page, with several controls for adjusting your view of the genome (Fig. 7A). You can use the buttons with the arrowhead indicators to walk left or right along the chromosome in this area. You can take steps along the chromosome in large (signified by a triple arrowhead), medium (double arrowheads), or small (single arrowheads) jumps. These can be very handy if you are interested in what is going on near your search region. You can magnify the image area using the “zoom in” buttons – you can zoom in a little bit, or up to 10-fold.
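Walking and zooming are simple arithmetic on the start and end of the viewing window. The Python sketch below illustrates that arithmetic under stated assumptions. The `Window` class, its method names, the 1-based inclusive coordinates, and the step fractions are hypothetical, the coordinates are placeholders rather than a real gene position, and the step size the browser buttons actually use is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """A viewing window on one chromosome (1-based, inclusive coordinates)."""
    chrom: str
    start: int
    end: int

    @property
    def width(self) -> int:
        return self.end - self.start + 1

    def walk(self, fraction: float) -> "Window":
        """Shift the window by a fraction of its width (negative = left)."""
        shift = int(self.width * fraction)
        shift = max(shift, 1 - self.start)  # never move before base 1
        return Window(self.chrom, self.start + shift, self.end + shift)

    def zoom_in(self, factor: float) -> "Window":
        """Shrink the window around its center by the given factor."""
        new_width = max(1, round(self.width / factor))
        center = (self.start + self.end) // 2
        start = max(1, center - new_width // 2)
        return Window(self.chrom, start, start + new_width - 1)

    def widen(self, flank: int) -> "Window":
        """Add `flank` bases on each side (chromosome length is not checked)."""
        return Window(self.chrom, max(1, self.start - flank), self.end + flank)

    def __str__(self) -> str:
        return f"{self.chrom}:{self.start:,}-{self.end:,}"


view = Window("chr17", 1001, 2000)   # placeholder coordinates
print(view.walk(0.5))                # half a window to the right
print(view.zoom_in(10))              # 10-fold magnification
```

The sketch keeps the window centered when zooming and clamps only at the start of the chromosome, because the chromosome length is not known to it.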
You can also choose “base” to zoom all the way down to the nucleotide level right away. Similarly, you can “zoom out” with a different set of buttons. Alternatively, you can indicate a specific genome coordinate position in the “position/search” box. For example, if we wanted to see more of the possible promoter or downstream regions, we could subtract 1,000 from the start coordinate (left side) and add 1,000 to the end coordinate (right side), and get all of that extra sequence in our view. In addition, you can use this box just like the search box on the Gateway page – you can use it to search for text items if you enter text and hit “jump.” Another handy feature is the automatic zoom and recenter action. If you click your mouse on the nucleotide backbone track at the very top, the browser will automatically recenter the image where you clicked, and zoom in threefold. Finally, you can change the way your viewer looks with the “configure” button. From this button, you will access a page that gives you some choices about how this page should look, including changing the font and graphics sizes.

In the center of the page is another set of controls (Fig. 7B). First, let us focus attention on the control buttons.

• The “default tracks” button will get you back to the default settings – it is like an escape hatch if you made a lot of changes on the image and want to start over.
• The “hide all” button is nice if you want to set up a specific display with only those annotation tracks that you are interested in – it will let you start to build a customized view for yourself with only those things you care about.
• The “add custom tracks” button will enable you to view your own queries or lab-generated data in the viewer, and will be discussed in more detail later.
• The “configure” button – also found near the top of this page, and also on the Gateway page – gives you access to a big web page that will let you make all sorts of changes to the viewer. You will be able to change the font and graphic size here; you can also change the window width (in pixels) from this page. You can make broad changes to all the track menus, which are all together and grouped on this page for quick access to entire sections.
• The “refresh” button must be clicked to apply any of the changes you made to the pulldown menus in the annotation tracks. The changes in the pulldown menus are not made automatically; you have to click this button.

Fig. 7. Controls to adjust the viewer magnification and layout are found on the page. (A) At the top of the page controls to walk left or right along the chromosomal region are shown. You can move in small, medium, or large hops. You can zoom to different levels, and even down to the display of the base nucleotides to see the A, T, G, and C display. Zooming out at different rates is possible as well. A position box can be used to add or subtract nucleotides for your view in the window, or you can search this genome from here by entering text and clicking “jump.” A configure button can be used to access many controls. (B) Mid-page control buttons can reset your view to defaults, hide everything, upload custom tracks, configure (as above), and refresh the image with any changes you specified.

Display choices are stored

One thing that is important to know about changes you have made to the viewer: the browser remembers your locations, changes, and settings until you clear them. A cookie is stored on your browser that remembers where you were looking in the genome and whether you made changes to those menus. As we have discussed, there are a number of changes you can make – to the position, the track displays, and the filter options. These parameters are all saved on the computer you are using. This may be useful if you always want to look at the data the same way. Also, as you move from one tool at this site to another, you “carry” your position with you. But that may not be great if you have forgotten that you filtered out something, or turned off a track. And if you use a shared computer in the lab or a library, you do not know if someone made some changes since you used the browser last. The UCSC team refers to these settings as being stored in your “cart,” almost like a shopping cart. There are a couple of ways to clear out your cart: you can choose the “default tracks” button from the Viewer controls, or the link that says “click here to reset” on the Gateway starting page. If you ever find that your browser is not displaying what you expect, try to clear your cart to remove any stored parameters and start again.

Going beyond the image

A number of features below the surface of the viewer are also of interest. You can see a gene in the viewer display, but perhaps you want to know a lot more about that gene. Click on a specific gene to see more.
A new page will open: an information page that contains a great deal of extra data, references, and sources about that gene (or predicted gene, or SNP, or other item) in the viewer (Fig. 8A). We are going to show just one sample here of the detailed information on the human TP53 UCSC Gene page. A UCSC Gene page has summary information, links to sequences, other resources, microarray experiments, mRNA secondary structure, protein domains, orthologous genes in other species, mRNAs collected from GenBank, pathways, synonyms, and more. This page is actually quite long, and you will not be able to see all the details right now. But later you should go and see for yourself. There is extensive information about this gene, and links to many other resources as well – practically one-stop shopping for well-known genes.


Fig. 8. Clicking on items in the viewer will yield new pages with details about the item that was clicked. (A) Clicking on a TP53 isoform in the viewer brings up a descriptive page with extensive details about this gene, and links to even more resources for supplemental information. In this image we highlight microarray data, domains, and pathways, but other types of data are available as well. (B) Clicking a SNP yields data relevant to SNP investigations, such as reference and variant alleles, functions (if known), and more.

One thing to note: not all genes will have this level of detail, and not every species will have all this information. We have specifically chosen a well-known gene for our example. Some genes will not have protein structures, some will not have pathway information, and some will not have microarray data. But if the data is available, it will be available to you on these detail pages.

Other pages will carry different types of data, of course. Various data types will have different details pages, with relevant types of information about that item. You only have to click on any item in the viewer to get to these details pages. Click on a SNP and you will see details important for SNP data, such as the variant sequence, function, and so on. We show here a small part of a SNP detail page – position, sequence, validation status, function, and more (Fig. 8B). Click on a segment of the conservation track to examine the specific alignments across the species in that area. What you will find when you click depends on the kind of data you are looking at.

What about the sequence?

So far, we have seen visual cues, and lots of text-based data, but one frequently asked question at this point is “where is the sequence data?” We want to spend a couple of moments on that topic so that you will know that you can get to the sequence level data. From the viewer, there are two handy ways to get the sequence information. First, from your TP53 viewer section, you can simply click the DNA link in the blue navigation bar at the top of the page. The link will bring you to a new “Get DNA in Window” web page (Fig. 9A). The position you were looking at in the viewer is carried here, and is specified in the position box. On this page you have several options to format the sequence:

• You can tweak the output by adding some bases upstream or downstream.
• You can get the sequence in upper or lower case.
• You can mask repeated, low complexity regions.
• You can get the reverse strand.

You could just click the “get DNA” button to get the sequence in a new web page; the output will be in FASTA format. The second button option offers even more ways to customize the output DNA sequence (Fig. 9A). If you click the “extended case/color options” button, you will get a new page that lets you change the case of individual items, change their colors, underline specific features, and so on. The choices that you will see in the list are based on the tracks actively shown in the Genome Viewer window you were looking at. If that is too much, go back and turn off some tracks. If you are missing data you want to view, turn it on in the browser. This is a unique way to look at your sequence of interest. As you can see in a sample output, different features look different by color, case, or underlines. This could be copied and pasted to a word-processing document for storage as well.

The two options that we just described deal with getting the whole region of DNA that is currently shown in your viewer. But you have another option – you can get just the sequence you want for any specific item from an annotation track. In this second example of how to get sequence data, we show a screen shot of the TP53 annotation track in the UCSC Genes section. As before, we click on the line to get to the TP53 details page. From the details pages you can get the specific sequence for that item. We are looking at the details for this item, which represents a TP53 isoform. Focus on a part of that details page – the sequence section (Fig. 9B). You can scroll down the details page to find the sequence section. Here you will find links to the genomic sequence (the official or reference sequence for that span), the mRNA sequence (which may differ from the genomic), and the protein sequence (of this isoform). You can use these links to get this specific sequence, plus additional options if you choose the genomic sequence – which is great for promoter studies, intron studies, and so on. So, the sequence data for the items in your viewer is just a couple of clicks away, using either the DNA link at the top to get the whole window, or the links from the information pages to obtain sequence for specific items.

Fig. 9. To get the sequence of a region or an item you have several options. (A) Using the “DNA” link at the top of the browser page you will be able to access the “get DNA in window” feature. It provides the span that was shown in your browser. You can get simple FASTA sequence output, or choose to color, underline, and change the font of the output with the “extended case/color options” feature. We show a sample with some text underlined and case altered. (B) From a gene detail page you can access sequence information about that item. Genomic, mRNA, and protein sequence are available.

Basic sequence searches

So far we have described what would happen if you used a gene symbol or keyword type of text search from the Gateway query option. But there may be times that you want to use a sequence to begin your search.
In the UCSC browser, the tool you will use for sequence searching is called BLAT [8]. You may be familiar with the alignment tool called Basic Local Alignment Search Tool (BLAST or BLAST2), found at the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/BLAST/) among other places [9]. If you have used the NCBI databases and searched for similar sequences, you have probably used BLAST. But BLAT is different – it is the BLAST-like alignment tool. It searches the database slightly differently than BLAST does. BLAT requires an index of the sequences in the database – something like the index in the back of a biochemistry textbook. The BLAT index consists of all the possible unique 11-oligomer sequences in the genome (or 4-mers for protein sequences). Just as you can quickly scan a book index to find the correct word, BLAT scans the index for matching 11-mers, and then builds the rest of the match out from there. It is a very fast way to search the sequences. The outcome will still be a pair of sequences that are lined up with each other so you can compare the matches: yours compared to the official genomic reference sequence. The underlying algorithm that accomplishes it is different from BLAST.

BLAT works best with sequences with high identity, and greater than 21 bases long, but do not let that scare you – you can find more distant matches as well. BLAT rapidly finds nucleotide sequences that have 95% or greater similarity of length 25 bases or longer. For protein queries, the algorithm finds genomic matches to your amino acid sequence submission with 80% or better similarity for stretches of 20 amino acids or more. In general, if the sequences arose within the last 350 million years they can be detected by the BLAT tool. For many people it will be enough to know that there is a means of searching for your region of interest in the database by starting with a sequence. For the more casual BLAT user, the documentation at the UCSC web site gives a little more detail about the way BLAT works, without tremendous amounts of mathematical equations. For a full description of the algorithm, see the publication by Jim Kent that describes BLAT [8].

So now we know a little bit about the BLAT tool. How do we get to it? Let us start at the UCSC Genome Browser homepage. As for most UCSC tools, you use the navigation bars at the top or at the side of the UCSC homepage. Select a link called BLAT to get started. The interface for BLAT is shown in Fig. 10A. We will work our way down this page. As you can see along the top there are a few parameters you can change – some choices you have to make. First, you must choose one species to search. You search one species at a time with this tool. Then you choose an assembly – which we have seen before in the Gateway basic search section. Next, you may let the BLAT tool guess whether you have entered nucleotides or amino acids, or you can tell it which one you are using.
Sort output – on default settings here – will list the best-scoring matches first, but a few other sorting options are also available. Output type specifies whether you want the output to be in the browser form or in files you can use later. Hyperlink is the default, which displays in the browser, and that is what we will be using for this example. The other type, pat space layout (PSL) output styles, is useful for people who want a differently structured, text-based output that can be used for a variety of purposes. There is a big text box where you paste your sequence. You can paste one or more sequences, but there are limits to how much you can submit to BLAT, as it is a large burden on the servers. You can submit up to 25,000 bases or 10,000 amino acids, up to a total of 25 sequences. If you need to run more BLAT searches than that, UCSC asks that you download it and run a local copy. Instructions for this can be found in the documentation.
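Before pasting a large batch of sequences, it can be convenient to check it against these limits. The Python sketch below does that for FASTA text. It is our own illustration and not a UCSC tool: the function names are hypothetical, and it assumes the base and amino acid limits apply to the combined length of all sequences in one submission.

```python
from typing import Dict, List

# Limits quoted in the text for the web BLAT form.
MAX_BASES = 25_000
MAX_AMINO_ACIDS = 10_000
MAX_SEQUENCES = 25


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; the name is the first word after '>'."""
    records: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else f"seq{len(records) + 1}"
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def check_blat_submission(text: str, protein: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the query fits the limits."""
    records = parse_fasta(text)
    limit = MAX_AMINO_ACIDS if protein else MAX_BASES
    unit = "amino acids" if protein else "bases"
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    # Assumption: the length limit is applied to the total over all sequences.
    total = sum(len(seq) for seq in records.values())
    if total > limit:
        problems.append(f"{total} {unit} exceeds the limit of {limit}")
    return problems


query = ">q1\nACGT\nACGT\n>q2\nGGCC\n"
print(check_blat_submission(query))
```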


Fig. 10. Another way to access information in the UCSC Genome Browser is to begin with a sequence. (A) The BLAT tool is used to find matching sequences. Choose the search features, paste sequences, and submit the query to the database. The results will have statistics about the genomic region of the match, the length, the identity, and hyperlinks to view the match in the Browser or as Details. (B) BLAT results from the Browser link take you to the region of the match in the viewer. The Details link offers a page with the sequence alignment itself. (The color version of this figure is hosted in Science Direct.)

Fig. 10A shows two short mRNA sequences that we will use in our example. These are in the common FASTA format, which you have to use if you are going to submit multiple sequences. There is also an option to upload your sequence (or sequences), if you keep a file of them. There are two ways to send your query to the database. A “submit” button is one choice. There is another special button – the “I’m feeling lucky” button. If you click that – just like in a Google search – you will be taken to the position of your best match right away, in the Genome Viewer. But we will be demonstrating the plain “submit” button right now.

We see the results of a BLAT search against the human genome, using the sample human mRNA sequences we showed (Fig. 10A). We have sorted the list by the query and score: the results for the first sequence are together, and the results for the second sequence are together. Best matches for each are at the top of that group. You will see we have a really high-scoring match at the top in each case. After that the matches appear to be weaker – pretty small regions, probably. The percentage identity, the genomic location of the match, and the span of the match in the genomic sequence are provided.

Now, you will remember that we asked for hyperlinked results in our setup. You can see that there are two columns of links for us (Fig. 10A). One says “browser,” the other “details.” Here we demonstrate a click of the “browser” link for the matches (Fig. 10B). This will link to the position of this match in the Genome Viewer. When you link from the BLAT results to the browser, you get a special track appearing in the Viewer. Just down from the top there is a new line on the browser – it says “Your Sequence from Blat Search.” The name of your query sequence is listed over on the left. Here you can examine the genomic context of this sequence match. On the UCSC Genes track, because we are zoomed in to a small region for this match, you will be able to see the methionines indicated in green (stop codons would be indicated in red, if present). Also, note the direction of this gene – it is on the negative strand and therefore runs from right to left in this case, as you can determine from the arrowheads on the gene structure diagram. We also show the outcome if you clicked the “details” link from the BLAT results page (Fig. 10B).
It is impossible to see the whole alignment page clearly – even with our short query sequence this is a large web page. You can see that the page is divided into several parts. The top part shows the query sequence you put in (in this case our human CXCL5 mRNA sequence). The middle part of the page shows the matchup of your sequence (in blue capital letters) to the genomic sequence. This gives you a quick look at the possible exon/intron structure if you have used an mRNA as we have. It is a nice way to see which parts are the likely exons in an mRNA, and the likely introns in black text. The bottom part shows you the actual nucleotide-for-nucleotide matches – this may be more like the BLAST results you are used to seeing. In the side-by-side alignment at the bottom you can see where the query sequence on the upper row (starts with number y001) lines up with the genomic sequence. You may judge the quality of the match yourself in this section. So you can start to search the UCSC Genome Browser data with a sequence, and view the results in either the Genome Viewer browser, or at the level of alignment details, depending on your needs.
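To make the index idea from the start of this section concrete, the Python sketch below builds a k-mer index for a toy sequence, looks up query k-mers in it, and extends each hit while the bases keep matching. It is a toy illustration of the seed-and-extend idea, not BLAT's implementation. The function names, the toy sequences, and the small k are assumptions, exact-match extension stands in for real alignment, and storing only non-overlapping k-mers of the target is a simplification chosen here to keep the index small.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(genome: str, k: int) -> Dict[str, List[int]]:
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query: str, index: Dict[str, List[int]], k: int) -> List[Tuple[int, int]]:
    """Return (query_pos, genome_pos) pairs where a query k-mer is in the index."""
    seeds = []
    for qpos in range(len(query) - k + 1):  # every overlapping k-mer of the query
        for gpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, gpos))
    return seeds


def extend(query: str, genome: str, qpos: int, gpos: int, k: int) -> Tuple[int, int, int]:
    """Grow an exact seed in both directions; return (q_start, g_start, length)."""
    left = 0
    while (qpos - left > 0 and gpos - left > 0
           and query[qpos - left - 1] == genome[gpos - left - 1]):
        left += 1
    right = k
    while (qpos + right < len(query) and gpos + right < len(genome)
           and query[qpos + right] == genome[gpos + right]):
        right += 1
    return qpos - left, gpos - left, left + right


# Toy data (0-based positions); real searches use k = 11 for nucleotides.
genome = "TTGACCATGGCATTACGGATCCGTAAGCTT"
query = "GGCATTACGGATC"
k = 4
index = build_index(genome, k)
seeds = find_seeds(query, index, k)
matches = {extend(query, genome, q, g, k) for q, g in seeds}
print(seeds, matches)
```

Several seeds can grow into the same match, which is why the extended hits are collected in a set.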

Advanced exploration with the Table Browser

The UCSC Genome Browser is a large database with multiple ways to access the data. Underlying the site is a MySQL database that stores all the information (Fig. 11A) [10]. The Genome Viewer we have been discussing is one way to access this data. Another way to access the data is with the Table Browser interface (Fig. 11B). The Genome Browser database is composed of “tables” of data. Some tables are primary and contain positional information – linked back to the genome position. One example is the “knownGene” table, which includes data such as chromosomal position, gene name, exon and intron sizes, and the like. Other tables include “auxiliary” types of data. Some have positional information and others do not, such as one that relates knownGene IDs to the corresponding Ensembl gene IDs, or the tables where author or submitter names are stored, which do not hold a genomic position of their own. These myriad tables of data make up the database. You can examine the data visually with the Viewer, which performs your query and creates the display for you, or you can reach into the database and pull out large swathes of information using the Table Browser in a customized manner. You are taking the same data out – just querying and displaying it in different ways.

The Table Browser allows you to access these same tables of data directly – filtering, manipulating, and downloading the data in a customized and flexible manner that is not possible with the Genome Browser Gateway query. For example, you may want a list of all the SNPs in a given gene, or maybe you want to output all the known genes on Chromosome 21. You may even ask questions genome-wide. You can also do a narrow search of only those SNPs that are non-synonymous coding variations found in disease genes. The possibilities of customized searching and data retrieval are nearly endless with the Table Browser.
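The idea of positional tables that can be combined is easiest to see with a small query. The Python sketch below uses an in-memory SQLite database as a stand-in for the real MySQL database. The table names knownGene and snp126 come from the text, but the reduced column sets, the gene and SNP names, and all coordinates are invented for illustration and should not be read as the actual schema or data.

```python
import sqlite3

# In-memory stand-in for two positional tables (simplified, assumed columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knownGene (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
CREATE TABLE snp126 (name TEXT, chrom TEXT, chromStart INTEGER,
                     chromEnd INTEGER, func TEXT);
INSERT INTO knownGene VALUES
  ('geneA', 'chrX', 1000, 5000),
  ('geneB', 'chrX', 8000, 9000);
INSERT INTO snp126 VALUES
  ('snpA', 'chrX', 1500, 1501, 'intron'),
  ('snpB', 'chrX', 4200, 4201, 'coding-nonsynonymous'),
  ('snpC', 'chrX', 6000, 6001, 'unknown'),
  ('snpD', 'chrX', 8500, 8501, 'coding-synonymous');
""")


def snps_in_gene(gene, func=None):
    """List (name, position, function) for SNPs inside a gene's span."""
    sql = """
        SELECT s.name, s.chromStart, s.func
        FROM snp126 AS s
        JOIN knownGene AS g
          ON s.chrom = g.chrom
         AND s.chromStart >= g.txStart
         AND s.chromEnd <= g.txEnd
        WHERE g.name = ?
    """
    params = [gene]
    if func is not None:          # optional filter on functional class
        sql += " AND s.func = ?"
        params.append(func)
    sql += " ORDER BY s.chromStart"
    return conn.execute(sql, params).fetchall()


print(snps_in_gene("geneA"))
print(snps_in_gene("geneA", func="coding-nonsynonymous"))
```

The Table Browser builds queries of this kind from menu choices, so no SQL has to be written by hand.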

Sample search for all the SNPs in a gene To start an example advanced search, go to the Table Browser by clicking on either of those buttons found on the homepage for ‘‘Tables’’ or ‘‘Table Browser.’’ To introduce you to advanced searching we are going to illustrate a few tasks, each building on the other and allowing us to use and learn different features of the Table Browser. Our first task will be to download the sequences of all the SNPs in a given gene. For this example we will use the human dystrophin gene, or DMD – a huge gene spanning many exons and a large genomic segment. You could look across this region in the viewer, and click all the SNPs (Fig. 12A). But for a gene of this size (with over 2,000 SNPs in this span) this would take a


Fig. 11. The UCSC Genome Browser data can be queried in advanced ways. (A) The MySQL database underlies the UCSC Genome Browser system. The same data can be used to generate the informative displays, or to extract large swathes of data in text or sequence form following complex and creative queries. (B) The Table Browser interface provides user-friendly access to the database. Choosing the types of data you want to examine is simplified with menus and filter options.


Fig. 12. SNPs in a region can be examined visually in the Genome Browser or downloaded in bulk from the Table Browser. (A) The human DMD gene region can be set to display the gene and the SNPs in the region. (B) The exact same region can be queried for the SNPs, and the resulting list can contain all the important details about the SNPs (location, sequence, variant, validation status, function, etc.) as a table. The tabular data can be copied and pasted for more analysis in other tools.

really long time. Instead, we will use the Table Browser to extract the text output of the same information in a few seconds (Fig. 12B). As in the other interfaces in the Genome Browser and related tools, you must choose the genome and assembly that you wish to search in. The first row of choices will set the clade, genome, and assembly – which we are familiar with from the Gateway. Here we will choose ‘‘Vertebrate’’ from the clade menu, ‘‘Human’’ from the genome menu, and ‘‘March 2006’’ from the assembly menu. As before, you can search only one genome at a time. Next you need to choose a type of data to examine. The second row instructs us to choose the Group and Track in which we are interested. The menu for Group will remind you of the organization of the Genome Viewer pages – with groups of data such as ‘‘Mapping and Sequencing’’ or ‘‘Gene and Gene Prediction’’ tracks. In this case we want SNPs, so we will choose the ‘‘Variation and Repeats’’ group. The tracks in that group will then be displayed in the tracks menu. For our example, we will choose SNPs (126). This represents the dbSNP database release number 126 [11]. The third row then offers us the tables from which we can extract the data. You will see that multiple tables contribute to this track. Some names will be clearer than others. In this case the primary or main table is snp126.

But others may contain information you will want eventually. If you have any questions about the type of data in the selected table, you can click the ‘‘describe table schema’’ button, which leads to a new page showing the field names, an example value, the data type, and a definition of each field of the database table. If the database contains other tables that are related to the selected table by a field with shared values, those related tables are listed. If the selected table is the main table for a track, the track description text, a textual description of the data and its source, is included as well.

The fourth row asks you to specify the scope of the query. Do you want to look at the whole genome? Are you interested only in the ENCODE data subset [12]? Or is there a given gene or genomic region that you want? For our example we want the SNPs in a given gene, but as you can see you have various choices for how broad or narrow your query can be. We will type DMD into the position box. We do not know the exact nucleotide coordinates, but there is an easy way to get them. If you click the ‘‘lookup’’ button, the database will run a query for your gene. You will get a list of matching results, and you can click an appropriate link to use as your query range. For this case we will choose the top result (DMD (uc004ddf.1) at chrX:32444634-33267647 - Dystrophin (Muscular dystrophy, Duchenne and Becker types) (DMD protein)). At other times you may want to choose a specific isoform instead. When you click the link, you will find that the position range is automatically inserted back into the Table Browser interface.

At this point you may also notice the ‘‘define regions’’ button on the fifth row, which offers you the chance to paste or upload a list of identifiers. You may find that you want to query across multiple chromosomes, genes, or positions, or you may want to start with a list of IDs you have from some other purpose.
You can override the region choices by using these flexible alternatives. We will come back to the other options later – let us first pull that list of SNPs out and have a look. If you move to the ‘‘output format’’ section you will see there are a number of choices. You can get a look at all the fields in this table. You can choose a subset of them. You can get the sequence of your items. You can get a couple of useful formats for programming tools. You can create a special ‘‘Custom Track’’ of this query, which we will discuss later in more detail. Or you can simply look at the list with links to the browser for easy viewing. For our purpose, let us just look at the full table by selecting ‘‘all fields from selected table.’’ Make that choice, and then click the ‘‘get output’’ button near the bottom of the form. We will re-write the full query here to summarize: we want to see in the human sequence from March 2006, all the SNPs in the position of the DMD gene, output as a table with all the fields (Fig. 11B). Click ‘‘get output’’ to see the results. You will be presented with a lot of text on a new web page (Fig. 12B). This tab-delimited page could be saved in a text editor, uploaded

into a spreadsheet, and used for any number of purposes. All of the SNPs in the genomic span that comprises the DMD gene are listed, in just a few seconds.

Sample search for filtered SNPs

Maybe, though, the only SNPs of interest to you are those that change the amino acid sequence of the protein, and the others are not important to you at this point. There is a way to get just those. If you return to your Table Browser form, your choices should still be in place. Now we can decide to ‘‘filter’’ that set for just the coding, non-synonymous SNPs. In the sixth row you will find the filter ‘‘create’’ button. Click that to see your filtering options (Fig. 13A). A new page will open that lets you specify the kind of data that you will pull from the underlying database. Essentially this is a form that creates an SQL query for you. (At the bottom of the form you can enter a free-form SQL query if you know SQL, but you do not have to know it to use this form.) You can make menu choices and enter your own text in the individual boxes instead. For this example we will focus on the function data parameters, but as you can see you can be really precise about exactly which snp126 data you want to examine. We will choose func: does include coding non-synon, and store that by clicking the submit button. Once you submit, the filter buttons back on the Table Browser main form will change to ‘‘edit’’ and ‘‘clear,’’ reminding you that you have made filter choices. As in the Genome Browser, changes and choices you make in the Table Browser will be ‘‘remembered’’ until further changes are made. If you now get the output of ‘‘all fields from selected table’’ as before, you will get a new list of items (Fig. 13B). This time we get a much-reduced set of SNPs: only those that change the amino acid sequence, according to the dbSNP database.
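The same kind of filtering can be reproduced offline on tab-delimited output that you have saved from the Table Browser. The following is a minimal Python sketch of that idea, not the Table Browser's own implementation. The column names (`chrom`, `chromStart`, `chromEnd`, `name`, `func`), the `coding-nonsynon` spelling, and the sample rows are assumptions made for illustration; check the real field names with the ‘‘describe table schema’’ button.

```python
import csv
import io

# Hypothetical excerpt of tab-delimited Table Browser output.
# Column names and rows are illustrative only, not real dbSNP records.
SAMPLE_OUTPUT = (
    "#chrom\tchromStart\tchromEnd\tname\tfunc\n"
    "chrX\t32450000\t32450001\trs0000001\tintron\n"
    "chrX\t32460000\t32460001\trs0000002\tcoding-nonsynon\n"
    "chrX\t32470000\t32470001\trs0000003\tcoding-synon\n"
    "chrX\t32480000\t32480001\trs0000004\tcoding-nonsynon,intron\n"
)


def filter_by_function(text, wanted="coding-nonsynon"):
    """Return the rows whose comma-separated func field includes `wanted`."""
    # Drop the leading '#' that marks the header line in this sample.
    reader = csv.DictReader(io.StringIO(text.lstrip("#")), delimiter="\t")
    return [row for row in reader if wanted in row["func"].split(",")]


hits = filter_by_function(SAMPLE_OUTPUT)
for row in hits:
    print(row["name"], row["chrom"], row["chromStart"], row["func"])
```

Splitting the `func` field on commas, rather than testing for a substring, keeps a value such as `coding-nonsynon` from being confused with a longer term that merely contains it.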
Sample search for SNPs in transcription factor binding sites

So far we have easily downloaded large swathes of data, and extracted a subset with a filter, in a few simple steps. Let us clear that filter (by clicking the ‘‘clear’’ button) and try something else. We can create even more complex queries: we can ask which SNPs in DMD overlap with other data in the Genome Browser database. For example, one of the data types stored in the database is predicted Transcription Factor Binding Sites (TFBS). If you click the ‘‘create’’ button for ‘‘intersection’’ on the seventh row, you can combine your SNP query with the data from the TFBS dataset (Fig. 14A). In the new interface, select the Expression and Regulation group and then the TFBS Conserved track. The appropriate table will be selected. You can choose the default of all overlap or specify other combinations. For our purpose we will


Fig. 13. Filters can be used to refine the query and focus on important features. (A) The filter for the SNP table allows refinement on many aspects. We highlight the function choice, which will allow us to select only SNPs which are coding nonsynonymous changes to the amino acid sequence of the protein. (B) The results of the filtered query show only SNPs that meet our criteria.

simply choose the defaults – but be aware that you can create the overlap in various ways. When you submit these choices, the Table Browser main interface will reflect this by changing the intersection button to ‘‘edit’’ and ‘‘clear’’ to remind you that changes have been made. This time, let us select the output format as ‘‘hyperlinks to Genome Browser’’ and get the output. To summarize the query: we want to see in the human sequence from March 2006, the SNPs in the position of the DMD gene, but only those that intersect with predicted TFBS, output as hyperlinks to the browser. Click ‘‘get output’’ to see the results. As you can see in Fig. 14B, you can get the list of SNPs that meet our criteria, and use the links to have a look at them in the Genome Browser viewer.
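Conceptually, the intersection keeps each item from the first dataset that overlaps an item in the second. The short Python sketch below illustrates that idea under stated assumptions: coordinates are treated as zero-based, half-open intervals, ‘‘overlap’’ means sharing at least one base, and all coordinates and names are hypothetical. It is not the Table Browser's own code, which also supports other overlap rules.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def intersect(snps, sites):
    """Return the names of SNPs that overlap any site on the same chromosome."""
    kept = []
    for chrom, start, end, name in snps:
        if any(chrom == s_chrom and overlaps(start, end, s_start, s_end)
               for s_chrom, s_start, s_end in sites):
            kept.append(name)
    return kept


# Hypothetical SNP positions and predicted binding sites (illustration only).
snps = [
    ("chrX", 100, 101, "snpA"),
    ("chrX", 250, 251, "snpB"),
    ("chrX", 400, 401, "snpC"),
]
tfbs = [("chrX", 90, 110), ("chrX", 395, 400)]

print(intersect(snps, tfbs))
```

With half-open intervals, a site that ends at position 400 does not cover a SNP that starts at 400, which is why only the first SNP is kept in this example.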


Another feature of the Table Browser is the correlation tool, which we will only briefly introduce here. It is available for data tables that contain genomic positions, and it computes a simple linear regression on the scores in two datasets. If a dataset does not contain a score for each base position, the Table Browser assigns a score of 1 to each position covered by an item in the table, and 0 otherwise. The Table Browser computes the linear regression quickly and then displays several graphs for visualizing the correlation, as well as summary statistics including the correlation coefficient ‘‘r.’’ When datasets and parameters are chosen with some forethought, the correlation feature is a powerful tool for quickly gauging the relationship between two datasets. For example, you might want to determine whether there is any correlation between GC content and chromosome structure, or between certain types of genes and repeats.

As you can see, there are many ways to create and run queries that could be very intricate and informative. We hope this introduction gives you an idea of where you can go with this tool. There are other complexities and details of the Table Browser that we cannot cover here, but we hope this taste will whet your appetite for more.

More details on the Table Browser

If you would like to know more about how the Table Browser was developed, its features, and its technical aspects, the link on the left navigation bar will take you to a list of UCSC Genome Bioinformatics Group publications. One that is particularly pertinent to our discussion is the Table Browser paper from the 2004 database issue of Nucleic Acids Research [10]. The list also includes more recent publications about changes and enhancements to the database.
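To make the correlation feature introduced above more concrete, the following Python sketch scores two unscored datasets as 1 for each covered base and 0 otherwise, then computes a least-squares regression and the correlation coefficient r. It is an illustration only: the function names, the 20-base region, and the item coordinates are hypothetical, and the Table Browser's actual computation may differ in its details.

```python
import math


def coverage(items, region_start, region_end):
    """Score 1 for each base covered by an item in the region, 0 otherwise."""
    scores = [0] * (region_end - region_start)
    for start, end in items:
        for pos in range(max(start, region_start), min(end, region_end)):
            scores[pos - region_start] = 1
    return scores


def linear_regression(x, y):
    """Return least-squares slope, intercept, and correlation coefficient r.

    Assumes both inputs vary; a constant input would divide by zero.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical items in a 20-base region (half-open coordinates).
genes = [(0, 10)]
repeats = [(2, 10)]

x = coverage(genes, 0, 20)
y = coverage(repeats, 0, 20)
slope, intercept, r = linear_regression(x, y)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

Because every base in the region contributes one data point, two datasets that cover largely the same bases give an r close to 1, while datasets that avoid each other give a negative r.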
Superusers who know SQL can even access the underlying data directly (see http://genome.ucsc.edu/FAQ/FAQdownloads#download29) and build their own SQL queries. This is beyond the scope of our introduction, but we wanted you to be aware that it is possible.

Another way to take your queries further is to use Galaxy. Galaxy is a separate resource provided by a team at Penn State [13]. Queries created on the UCSC Genome Browser site can be sent directly to Galaxy, where users can perform additional types of analyses. These features are beyond our discussion here, but we encourage you to explore the Galaxy tools when you are ready.

Fig. 14. Combining datasets is possible with the ‘‘intersection’’ button of the Table Browser. This can increase the complexity of the query, as well as refine the search space. (A) The intersection interface allows you to select which type of data you want to combine with the previous part of your query. It is also possible to specify how much overlap should occur between the sets, and whether the query should be ‘‘and’’ or ‘‘or’’ between the items. (B) Output as ‘‘hyperlink’’ lists your results with direct links back to the browser for viewing.

Custom tracks: Your query as a new track

Once you have created some complex Table Browser queries, you may find that you want to view the output back in the Viewer, or to query it further with the Table Browser itself. If you choose the ‘‘custom track’’ output format from the Table Browser, you can save the work in a number of ways (Fig. 15). If you have spent time building a complex query and may want to revisit it later, we suggest that you choose ‘‘get custom track in file,’’ which saves a file to your desktop that can be uploaded at any time. The other options are only temporary: you can look at the data for 48 hours in the viewer or the Table Browser. With a file, you can view the data again or send it to others for them to see. All you will need to do is use an ‘‘add custom tracks’’ button from the Gateway or a Genome Browser mid-page control button.

Fig. 15. Complex query results from the Table Browser can be saved as custom tracks. These can be stored to use again later, or viewed in the Genome Browser. They can be further queried in the Table Browser. Setup features are available, including the place for a URL that could link to a page you create describing your data.

Custom tracks with your own data

Besides creating displays from complex query results that you have generated, you can also create custom tracks from your own data, obtained in the lab or from other software tools. These are also called ‘‘custom tracks’’ and offer a terrific way to visualize new data types that are important for your research in the context of the other known genomic information. Custom tracks are also becoming a useful way to present data that is submitted for publication.

To access the instructions and documentation for custom tracks, click ‘‘Custom Tracks’’ in the homepage navigation (Fig. 16A). There you will find a link to directions on how to create a custom track from your data, ‘‘Displaying Your Own Annotations in the Genome Browser.’’ We will look briefly at these instructions for a quick overview of how to get your own data into the browser and view it. There is a lot that you can do with a custom track file, but the basics are quite straightforward: essentially you need to specify a genomic location and use a bit of structure, and no programming skills are required.

Generating custom tracks is a four-step process. You can type the information into any text editor or spreadsheet program. The four steps are (Fig. 16B):

1. Define how you want the annotation track to show up by default (colors, visibility, etc.).
2. Define the characteristics you want the Genome Browser to open up to (chromosomal location, pixel size, etc.).
3. Format your data.
4. Upload the file you created and view your track.

Your first step is to set up the track characteristics. A track attribute line must go first in your file. Track attributes are all on one line, with no carriage returns separating them. If a file contains only one custom track, the track line is optional; if there are multiple custom tracks in one file, each must start with a track line.
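Although no programming is required, the first three steps can also be scripted when the data already sit in a program. The Python sketch below assembles a small custom track as text: one browser line, one track line, and minimal four-column BED-style records. Treat it as an illustration under assumptions: the helper name `build_custom_track`, the track name, the description, and the coordinates are hypothetical, and writing the browser line ahead of the track line is an assumption to check against the formatting instructions.

```python
def build_custom_track(name, description, position, items, visibility=2):
    """Return custom track text: a browser line, a track line, then data rows."""
    lines = [
        # Browser attribute: the region the viewer should open to.
        f"browser position {position}",
        # Track attributes all stay on one line.
        f'track name={name} description="{description}" visibility={visibility}',
    ]
    for chrom, start, end, label in items:
        # Minimal tab-delimited record: chromosome, start, end, item name.
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Hypothetical items (illustration only).
items = [
    ("chr22", 20100000, 20100100, "itemA"),
    ("chr22", 20100300, 20100400, "itemB"),
]
text = build_custom_track(
    "myTrack", "My lab data", "chr22:20100000-20100900", items
)
print(text, end="")
```

The resulting text can be saved to a file or pasted into the upload form, which corresponds to the fourth step.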
First the line starts with the word ‘‘track’’; this is followed by an attribute name and equal sign and the attribute characteristics. We show you a few here, but there are many others to choose from; you will need to look at the help section to see what those are. The second characteristics are the browser attributes – how do you want the viewer to set itself up to hold and view your track? Every browser attribute begins with the word ‘‘browser,’’ a space, and then an attribute name. This is followed by a space and then the attribute feature. Each browser attribute must be on a separate line of the file (unlike track characteristics which must be on one line). You can set the default image width of the browser, and you are able to set up the default visibility of the


Fig. 16. Custom tracks can be used to display your own data in the browser, and become available for queries in the Table Browser. (A) Information about the process of creating custom tracks is accessible from the home page. A new page with a link to the formatting instructions as well as examples of community-contributed tracks is at hand. (B) A sample of the formatting of a custom track. It does not require programming, just a little bit of structure to put things in the right rows and columns. This sample code generates a diagram like the one shown below it. This same data also becomes available in the Table Browser for more query options.

other annotation tracks. For example, ‘‘hide’’ followed by a space and ‘‘all’’ will hide all other annotation tracks, while ‘‘full’’ followed by the name of a data table will put the annotation track on full by default. The ‘‘all’’ will override other tracks you might put in. You can add several tracks to each visibility attribute by separating them by a space. One thing to note though: both the browser and the track lines are optional, you are not required to set those up to view a quick track. The next step is to format your data. There are several different types of data formats you can use and the help section on custom tracks has explanations of several, including how to set them up. We are going to

format our data as a Browser Extensible Data (BED) record as an example, since BED is relatively straightforward and is a format used by UCSC. An example of the BED record format is shown in Fig. 16B. A BED record is simply tab-delimited text with the following columns: the chromosome the item resides on, and the chromosomal start and end positions of each item. Optionally, you can name your item, give it a ‘‘score’’ of 0–1,000, and specify its strand, box features to be drawn, color, and more. Other formats have different fields and types of data; see the help section for more explanation of those.

Once you have a custom track file, you can upload it to the browser for viewing and querying. Go to the Genome Browser Gateway by clicking ‘‘genomes’’ in the top navigation bar found on all pages, choose the genome and assembly you wish to add your track to, and then click the button that says ‘‘add your own custom tracks’’ (Fig. 17). This will take you to the form to upload and manage your new track. The file you upload can be either a custom track text file you have created from your own data or a custom track file you saved earlier from a Table Browser search. Alternatively, you can copy and paste the contents of the custom track file, as we have done here with the custom track created in the previous examples. Once you have uploaded or pasted in your custom track file, you

Fig. 17. Uploading and managing custom tracks is straightforward. Clicking a ‘‘manage custom tracks’’ button from a Gateway page yields a ‘‘Manage Custom Tracks’’ interface where these tracks for that species can be uploaded, deleted, and updated.

then click submit and you will be taken to the Genome Browser. Additionally, if you scroll down the page, you will find that the new track is now listed in the controls, and you can change its visibility from the default we had set. The track will also now appear in the Table Browser, allowing you to filter it, intersect it with other tracks, and do further analysis. Custom tracks offer a flexible and simple way to view and analyze your own data along with any other annotated data in the Genome Browser. Together with the Genome Browser Viewer and the Table Browser, custom tracks create a powerful and flexible tool for analysis and discovery.

Community tracks

We have shown you how you can create your own track. Members of the scientific community have created tracks like this and made them available for others to use. Sometimes they are linked from a publication or from a research team’s web site. You can find a list of some of these from the UCSC Genome homepage. Clicking the custom tracks link in the left menu of the homepage will take you to the same Custom Tracks page we described earlier. If you scroll down a bit, you will see that many labs have submitted data in the form of customized annotation tracks for the community to view. Clicking the link for any of these submitted tracks will open the Genome Browser with that track included for viewing, or for querying in the Table Browser.

Saving your views as sessions

After you have been working in the UCSC Genome Browser, running creative and informative queries, building custom tracks, and setting up your view exactly the way you want it, you can save this setup to replicate it exactly in the future, or to share it with others. The UCSC team has included session management functionality in the Genome Browser, which allows users to save and share browser sessions.
Users are able to configure their browsers with specific track combinations, including custom tracks, and save the configuration options. Multiple sessions may be saved for future reference, for comparison of scenarios or for sharing with colleagues. Saved sessions persist for one year after the last access, unless deleted. Custom tracks persist for at least 48 hours after the last time they are viewed. The feature is accessed via the ‘‘Sessions’’ link in the top blue bar in any assembly. To ensure privacy and security, users must login to the genomewiki site and create a username and password. Individual sessions may be designated by the user as either ‘‘shared’’ or ‘‘non-shared’’ to protect the privacy of confidential data. A URL will be generated that you can save or send to others to view that ‘‘session’’ with your settings.

Associated tools at the UCSC site

In addition to the standard visualization and querying that you can perform with the UCSC Genome Browser, there are other tasks, analyses, and data types that you might want to examine around your molecular data. We will briefly introduce you to a few of these handy additional tools here.

Gene Sorter

Have you ever wanted to quickly see whether there are genes similar to your gene of interest (similar expression pattern, protein sequence, domains, functions, etc.)? The Gene Sorter can be a powerful search tool for analyzing your gene in question or for finding genes that might be of interest to your research. The Gene Sorter takes a gene of interest and lists other genes in the genome, sorted by a chosen type of similarity to that reference gene. Each gene listed also has columns of data about that gene, and the genes can be filtered to limit the search. A sample of the Gene Sorter interface is shown in Fig. 18A.

When you first approach the Gene Sorter, you are given several choices. First, you must choose which species you want to search in. At this time, several popular species are available for your queries. Next you will choose which assembly you wish to search; the default will most often be the most recent one. Then you will enter the gene name or accession number of the gene you want to sort by. If you enter a gene name or keyword, the results will list all gene records that contain that name or keyword for you to choose from. Your next major choice is the criterion by which you wish to seek similarity to the gene in question. The menu offers several pre-calculated similarity sorting options, but the specific choices and datasets will vary by species and assembly. They include, for example, similarity in expression patterns, protein homology, Pfam domain structure, gene distance, name, Gene Ontology (GO) similarity, and others.

Your next option is to configure the display to add, remove, and change data columns. Click the ‘‘configure’’ button, and the configuration page will appear (Fig. 18B). Some columns of data are checked and shown by default, and you can change the configuration of the resulting data by adding more columns.
For example, you can add a column of GO terms for each gene in the family, or the best mouse ortholog. You can also change the order in the table that the column appears with the up and down arrows. Some data columns can be configured further. For example, the GNF Atlas 2 expression data column can be configured to show all tissues or a selected set, or have the values to be absolute or a ratio to the mean level of expression. There are a few other things you can change such as the colors


Fig. 18. The Gene Sorter tool can be used to find genes related to a gene of interest by a number of useful metrics, including expression pattern, protein similarity, domain families, and more. (A) The Gene Sorter interface can be used to compare genes in a variety of ways. The specific choices vary by species. The microarray data is available in color in the interface. (The color version of this figure is hosted in Science Direct.) (B) Gene Sorter displays can be configured to output the display to suit your needs. (C) Filtering the data can be a remarkable way to obtain insights, such as looking only for genes highly expressed in a given tissue, or which contain certain domains.

that signify the level of expression in expression data columns and a toggle to show all splice variants. You can hide all data columns, which would be useful if you wanted to choose specific columns and eliminate default or other previous choices. You can choose to show all data columns, or if you need to, return to the default columns. A handy feature that has been added is the ability to add columns of data that are user-generated with ‘‘custom column.’’ This button will take you to a

page to upload your custom column and a link to find out how to format your column. This is beyond the scope of this introduction, but if you are interested, the help section is straightforward and should be simple to follow.

The next option in your search is to filter the resulting genes that are sorted. This is a powerful feature that allows you to find exactly those genes that might be of interest in your research. It is wise to do some filtering, because otherwise you could find yourself with a sorted list of every gene in the database. Clicking the ‘‘filter’’ button will take you to the filter page (Fig. 18C). Here you will find that you can filter on any column of data. Only the first few columns are shown in the figure. If we were able to show the entire list, you would see that there are filter boxes for every column of data, including both those that you have chosen to display in the results and those you have not. For example, you could filter for only genes whose names contain a specific text string, or genes that have a certain minimum or maximum level of expression. For many data column filters, you can paste in or upload a list of values if your filter list is large. You are also able to save your filters for later use. To load your saved filters, just click the ‘‘load’’ button and a list of your saved filters will appear.

Once you have chosen your filters, you can list the resulting genes by name. This gives an alphabetical text list of all the gene names your search will find. It can help you get an idea of how large your results list will be, decide whether you need to tighten or loosen your filters, and make sure you are getting the genes you expect. Sample results based on our reference gene, sorted by protein similarity using BLASTP, configured with the data display we chose, and filtered based on our choices in the filter interface, are shown in Fig. 18A.
In-Silico PCR

Another tool available from the UCSC bioinformatics team is called In-Silico PCR. Clicking the PCR link from the homepage will provide the interface shown in Fig. 19A. In-Silico PCR searches a genomic sequence database with a pair of PCR primers. It uses an indexing strategy of the BLAT algorithm to do this quickly. When you enter a pair of primer sequences, you obtain the corresponding genomic stretch between the two primers. It is not a complicated tool. The first step is to select one genome and assembly. As we saw with Gene Sorter, a subset of the genomes is available at this time. Next, enter a forward primer and a reverse primer, each of which must be at least 15 bases long. You have the option to flip the reverse primer if you need to.
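The underlying idea of locating the stretch between two primers can be illustrated with a naive exact-match search. The Python sketch below is only that: it scans a single strand with plain string matching and does not use the BLAT indexing strategy of the real tool. The function names, the `max_product` limit, and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse, min_primer_len=15, max_product=4000):
    """Return (start, end, product) for exact-match amplicons on one strand."""
    if len(forward) < min_primer_len or len(reverse) < min_primer_len:
        raise ValueError(f"primers must be at least {min_primer_len} bases long")
    template = template.upper()
    forward = forward.upper()
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    reverse_site = reverse_complement(reverse.upper())
    products = []
    start = template.find(forward)
    while start != -1:
        site = template.find(reverse_site, start + len(forward))
        if site != -1:
            end = site + len(reverse_site)
            if end - start <= max_product:
                products.append((start, end, template[start:end]))
        start = template.find(forward, start + 1)
    return products


# Toy template built around two made-up 15-base primer sites.
forward_primer = "ACGTACGTTAGCCTA"
downstream_site = "GATTACAGATTACAG"
reverse_primer = reverse_complement(downstream_site)
template = "GGG" + forward_primer + "T" * 10 + downstream_site + "CCC"

products = in_silico_pcr(template, forward_primer, reverse_primer)
for start, end, product in products:
    print(start, end, len(product))
```

Searching for the reverse complement of the reverse primer is what makes the two primers face each other, so the returned product includes both primer sites and everything between them.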


Fig. 19. In-Silico PCR performs a search for amplification products on genomic DNA, using primers of your design. (A) The simple interface permits the choice of species and assembly, and requires your primers. Some parameters can be adjusted. (B) Results of an In-Silico PCR query with a primer pair that matches two genomic regions. The location of the amplicon, the size, and the sequence are provided, with links back to the genome browser.

The results of a sample In-Silico PCR query using a pair of primer sequences are shown (Fig. 19B). You can see that there is an indication of where your match occurs at the top – which is linked back into the browser so you can see where the match is located. The length of the amplified sequence is provided. The primers that were used are displayed (and the reverse one is flipped if you asked for that). And, you see the sequence you should expect from PCR on genomic DNA samples. Your primers are displayed in the uppercase letters. You will find the melting temperatures of the primers displayed as well. This data is based on code from the Primer3 tool, with some standard settings for salt and oligo concentrations [14]. You can use the In-Silico PCR tool to identify the expected amplification product on genomic DNA between two primers of interest.

Proteome Browser

Another tool available from UCSC is the Proteome Browser [15]. From the homepage, click the Proteome or Proteome Browser link in the navigation bars to access the Proteome Browser interface (Fig. 20A). You start with a gene symbol or protein ID in the Proteome Browser Gateway, and the results give you access to many important characteristics of the protein (Fig. 20B). You can see the intron/exon structure for the protein, view polarity and hydrophobicity, and find the molecular weight and isoelectric point, domains, structural features, pathways, and more. This is a handy summary of many of the features of a protein that you might want to know. You can also reach these pages from the details page of a known gene, where a link to the Proteome Browser is provided.

VisiGene

Another tool associated with the UCSC Genome Browser is the VisiGene Image Browser for in situ image data. The browser serves as a virtual microscope, allowing users to retrieve images that meet specific search criteria and then interactively zoom and scroll across the collection. VisiGene gathers images from a number of sources: the Gene Expression Database at the Jackson Laboratory, the GenSat collection at NCBI, the Mahoney Center for Neuro-Oncology mouse transcription factors, the collection of Xenopus embryos from the National Institute for Basic Biology in Japan, and the Allen Brain Atlas collection. The simple search form, where you can enter a gene symbol, author, body part or stage of development, and several types of IDs, is shown in Fig. 21A. The data is returned as a page of images, which can be zoomed in or out and are all linked to the publications and the appropriate genomic data resources. Shown here is an in situ hybridization for mRNA detection, but there are other types of localization as well, such as antibodies for proteins (Fig. 21B).
For more spatial and temporal information about your genes of interest, and perhaps to identify reagents that might help you in your research, be sure to check out the VisiGene Image Browser.

Additional species browsers

The software that is used to create the UCSC Genome Browser can be used by other groups to display their species of interest. An excellent example of this is a browser devoted to Archaeal species (http://archaea.ucsc.edu/; Fig. 22A) [16]. A malaria browser has been developed by the Ares lab and recently released (http://areslab.ucsc.edu/; Fig. 22B) [17]. A version of the


UCSC Genome Browser with Arabidopsis sequence data is available (http://epigenomics.mcdb.ucla.edu/) [18]. A browser that contains data for the fission yeast Schizosaccharomyces pombe is available (http://pombe.nci.nih.gov/genome/) [19]. Mirror sites for the UCSC Genome Browser have been set up and installed in various locations. A mailing list with a great deal of useful advice has been set up specifically for teams who set up local copies or who would like to use the browser for alternative species. It is also possible for companies to take a copy of the browser behind a firewall for internal corporate use, with the existing species of interest or to add their own. Licensing information for this sort of use is available from the homepage.

Coming attractions

UCSC plans to continue to extend the Genome Browser in the years ahead to include more species and assembly updates, particularly for primates, model organisms, and species of critical importance to evolutionary studies. The sparse ENCODE datasets will be expanded to genome-wide annotations, and many new ENCODE tracks will be added. UCSC may also provide browsers for the low-coverage (2x) assemblies currently included in the Conservation tracks for higher-coverage organisms. The human and mouse UCSC Genes annotations will be updated approximately 3–4 times per year. There are plans to expand the browser’s collection of human variation, disease-related, expression, and genome-wide association data, and to explore the incorporation of federated data into the browser. Among the enhancements planned to the display and data-mining tools are improved use of custom tracks in the Table Browser, extending display features such as the next/previous item navigation to a larger range of tracks, and the addition of a user-annotated wiki track.

Next steps

Once you have begun to delve into the UCSC Genome Browser, you may find you have additional questions.
Or you may find you want to do something more complex than we have illustrated in this introduction. Perhaps you will be curious about what other people are doing with this powerful tool. Maybe you will find you would like others in your lab or department to become more familiar with the resources. There are several ways to move forward and to go deeper.

Fig. 20. The Proteome Browser presents protein-level data about a given isoform. (A) A simple query box offers access to the protein data. (B) A sample Proteome Browser page illustrates exon structure, biochemical properties, domains, pathway links, and sequence data.

Fig. 21. The VisiGene browser is a virtual microscope of spatio-temporal gene expression data. (A) VisiGene can be queried by genes, authors, developmental stages, tissues, and more. (B) Images are displayed for browsing (left) and for closer examination of your selections (right). The browser also provides links back to the genome browser and to other relevant resources relating to the image data, such as clones or probes, literature, and more.

Fig. 22. Other genome data can be displayed in the framework of the UCSC Genome Browser tools. Other groups may use the UCSC Genome Browser framework to display their species of interest. (A) The homepage for the Archaeal Browser, http://archaea.ucsc.edu/. Shown is a sample of the browser display of Escherichia coli K12 from the “Archaeal Browser.” The layout and software are similar to those of the main UCSC Genome Browser. (B) The Ares Lab Malaria Browser, http://areslab.ucsc.edu/, also has a familiar layout, but with a different species of focus.

Get help from the mailing list and documentation

The “Contact Us” link of the UCSC Genome Browser homepage will take you to the page with contact information, address, phone, and email for the browser team. But the best and most active source of information is actually the mailing lists, a discussion group of people who are using the Genome Browser for all sorts of research. You can search the mailing list archives to see if your question has already been answered; if it has not, you can ask the Browser team yourself. There are actually two lists. One is a “low traffic list” of occasional general browser announcements, such as a new software tool or species genome update. The other is an interactive list where you can ask questions about the browser (simple or complex) and see what questions others are asking. The people at the UCSC Genome Browser answer quickly and in the necessary detail. We strongly suggest that you sign up for the mailing list if you use the Genome or Table Browsers on a regular basis.

The UCSC Genome Browser provides many data types, tools, and features that can assist you in searches for information that will enhance your research. However, a single chapter can only provide a snapshot of the tools at a given time. New features will be added frequently. Keeping track of the activities around this resource is made easier with the mailing lists. There is also documentation for each tool from the Help and FAQ navigation links on the homepage, and the genomewiki (http://genomewiki.ucsc.edu/). Citations and publications are accessible from the left navigation area on the homepage, with lists of numerous papers that describe the features.

Get training, or train others

UCSC Genome Browser contracts with OpenHelix to perform outreach activities at conferences and in training workshops. OpenHelix also delivers regularly updated training materials, including online audio-visual tutorials, slides with full script, and exercises to reinforce the concepts developed in the tutorial materials. The materials are modular, so you can review the sections you want to remember, keep up-to-date, or use the material for training others. You can also receive Quick Reference Cards with handy tips on using the UCSC Genome Browser (Fig. 23), or apply to bring a workshop to your institution (http://www.openhelix.com/ucsc). The goal of all these resources is to help researchers become effective and efficient users of the UCSC Genome Browser.


Fig. 23. OpenHelix provides training materials on the UCSC Genome Browser features. These can be used to learn more or refresh your knowledge, or to train your staff and students. Tutorial movies, slides with full script, exercises, and Quick Reference Cards are available free of charge for non-commercial use. http://www.openhelix.com/ucsc. (The color version of this figure is hosted in ScienceDirect.)

Acknowledgements

We acknowledge support from a grant from the NIH (1R43GM073145-01) for early work to develop effective training materials. We would like to thank the entire Genome Bioinformatics Group team members who have developed and maintained this excellent resource (http://genome.ucsc.edu/staff.html). We appreciate the encouragement and contractual support provided by the team leaders Dr. Jim Kent and Dr. David Haussler for training and outreach work performed by OpenHelix.

References

1. Multiple Authors. Nucleic Acids Res 1996;24:1–252.
2. Galperin MY. The molecular biology database collection: 2007 update. Nucleic Acids Res 2007;35:D3–D4.
3. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne PE. The protein data bank. Nucleic Acids Res 2000;28:235–242.
4. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM and Haussler D. The Human Genome Browser at UCSC. Genome Res 2002;12(6):996–1006.
5. International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 2001;409(6822):860–921.
6. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M and Haussler D. The UCSC known genes. Bioinformatics 2006;22(9):1036–1046.
7. Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, Haussler D and Miller W. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res 2004;14(4):708–715.
8. Kent WJ. BLAT – the BLAST-like alignment tool. Genome Res 2002;12(4):656–664.
9. Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215(3):403–410.
10. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D and Kent WJ. The UCSC Table Browser data retrieval tool. Nucleic Acids Res 2004;32(Database issue):D493–D496.
11. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM and Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29(1):308–311.
12. The ENCODE Project Consortium et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 2007;447(7146):799–816.
13. Giardine B, Riemer C, Hardison RC, Burhans R, Elnitski L, Shah P, Zhang Y, Blankenberg D, Albert I, Taylor J, Miller W, Kent WJ and Nekrutenko A. Galaxy: a platform for interactive large-scale genome analysis. Genome Res 2005;15(10):1451–1455.
14. Rozen S and Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000;132:365–386.
15. Hsu F, Pringle TH, Kuhn RM, Karolchik D, Diekhans M, Haussler D and Kent WJ. The UCSC Proteome Browser. Nucleic Acids Res 2005;33(Database issue):D454–D458.
16. Schneider KL, Pollard KS, Baertsch R, Pohl A and Lowe TM. The UCSC Archaeal Genome Browser. Nucleic Acids Res 2006;34:D407–D410.
17. Chakrabarti K, Pearson M, Grate L, Sterne-Weiler T, Deans J, Donohue JP and Ares M, Jr. Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis. RNA 2007;13(11):1923–1939.
18. Zhang X, Clarenz O, Cokus S, Bernatavichute YV, Pellegrini M, Goodrich J and Jacobsen SE. Whole-genome analysis of histone H3 lysine 27 trimethylation in Arabidopsis. PLoS Biol 2007;5(5):e129.
19. Cam HP, Sugiyama T, Chen ES, Chen X, FitzGerald PC and Grewal SIS. Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat Genet 2005;37:809–819.


Potential implications of availability of short amino acid sequences in proteins: An old and new approach to protein decoding and design

Joji M. Otaki1, Tomonori Gotoh2 and Haruhiko Yamamoto3

1 The BCPH Unit, Laboratory of Cell and Functional Biology, Department of Chemistry, Biology and Marine Science, University of the Ryukyus, Nishihara, Okinawa 903-0213, Japan
2 Department of Information Science, Kanagawa University, Hiratsuka, Kanagawa 259-1293, Japan
3 Professor Emeritus, Kanagawa University, Yokohama, Kanagawa 221-8790, Japan

Abstract. The three-dimensional structure of a protein molecule is primarily determined by its amino acid sequence, and thus the elucidation of general rules embedded in amino acid sequences is of great importance in protein science and engineering. To extract valuable information from sequences, we propose an analytical method in which a protein sequence is considered to be constructed by serial superimpositions of short amino acid sequences of n amino acid sets, especially triplets (3-aa sets). Using the comprehensive nonredundant protein database, we first examined the “availability” of all possible combinatorial sets of 8,000 triplet species. The availability score was mathematically defined as an indicator of the relative “preference” or “avoidance” for a given short constituent sequence to be used in a protein chain. Availability scores of real proteins were clearly biased relative to those of randomly generated proteins. We found many triplet species that occurred in the database more than expected or less than expected. Such bias extended to longer sets, and we found that some species of pentats (5-aa sets) that occurred reasonably frequently in the randomly generated protein population did not occur at all in any real proteins known today. The availability score was dependent on species, potentially serving as a phylogenetic indicator. Furthermore, we suggest possibilities of various biotechnological applications of characteristic short sequences, such as human-specific and pathogen-specific short sequences obtained from availability analysis. The availability score was also dependent on secondary structures, potentially serving as a structural indicator. Availability analysis on triplets may be combined with a comprehensive data collection on the φ and ψ peptide-bond angles of the amino acid at the center of each triplet, i.e., a collection of Ramachandran plots for each triplet. These triplet characters, together with other physicochemical data, will provide us with basic information on the relation between protein sequence and structure, by which structure prediction and engineering may be greatly facilitated. Availability analysis may also be useful in identifying word processing units in amino acid sequences based on an analogy to natural languages. Together with other approaches, availability analysis will elucidate general rules hidden in the primary sequences and eventually contribute to rebuilding the paradigm of protein science.

Keywords: amino acid composition, amino acid triplet, availability, nonredundant protein database, secondary structure, sequence similarity search, short constituent sequence, species specificity, structure–function, structure prediction.

Corresponding author: Tel.: +81-98-895-8557. Fax: +81-98-895-8557. E-mail: [email protected] (J.M. Otaki).

BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00004-5

© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Proteins are an interesting and fascinating group of molecules from the viewpoints of basic biology and biotechnology. We first give an overview of general trends in protein sequence analysis from our historical and philosophical perspectives. We then discuss our novel approach to analyzing protein sequences based on short amino acid sequences, using a concept of “availability”. In this way, we attempt to make clear the historical and philosophical status of our approach in protein science. We start off with a well-known question on protein sequences: How were real protein sequences “chosen” out of the astronomical number of possible combinatorial sequences? Although possible answers to this question are necessarily multifaceted, we eventually need mechanistic explanations that are sufficient for rational protein design. Importantly, this question is directly related to the classical paradigm of protein science: sequence determines structure, which determines function. Our approach, an analytical method for protein amino acid sequences, is necessarily related to this paradigm.

Historical and philosophical perspectives on protein sequence information

Necessity and chance in natural “protein design”

A protein molecule is composed of amino acids connected covalently via peptide bonds. A given position in a chain can theoretically be occupied by any one of the 20 amino acids available in a cell. The number of possible amino acid sequences that can be made combinatorially from 20 species of amino acids increases exponentially with the number of amino acids that constitute a given sequence (Table 1). The number of combinatorially possible protein species with a typical length of, say, 350 amino acids is $20^{350}$, which is roughly $2.3 \times 10^{455}$. This is certainly an astronomical number. In reality, however, only a limited number of sequences were “chosen” in proteins. Why and how?
And what kinds of sequences are used and not used in proteins? One aspect associated with this problem is the fact that the number of proteins on the earth is quite small compared to the entire population of combinatorially possible sequences. Suppose an average biological species such as Homo sapiens or Drosophila melanogaster has $10^{4}$ genes and their corresponding $10^{4}$ proteins, ignoring splicing variants, although this may still be an overestimate. In one study, the number of insect species on the earth was estimated to be about 5 million [1]. Since insects are thought to account for about 80% of all biological species, it can be estimated that about 6 million species exist on the earth. For the sake of discussion, suppose here that the number of biological species is $10^{7}$, although this number of species diversity

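Table 1. The number of combinatorially possible sequences ($20^{n}$) with respect to the number of amino acids ($n$) in proteins.

n      $20^{n}$
1      $2.0 \times 10^{1}$
2      $4.0 \times 10^{2}$
3      $8.0 \times 10^{3}$
4      $1.6 \times 10^{5}$
5      $3.2 \times 10^{6}$
10     $1.0 \times 10^{13}$
20     $1.0 \times 10^{26}$
30     $1.1 \times 10^{39}$
50     $1.1 \times 10^{65}$
100    $1.3 \times 10^{130}$
200    $1.6 \times 10^{260}$
300    $2.0 \times 10^{390}$
350    $2.3 \times 10^{455}$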

is probably an overestimate. Thus, the total number of protein species becomes $10^{11}$. To include the number of extinct proteins, suppose all $10^{11}$ proteins have been shaped by natural selection through continuous “minor revision” every year. The earth is about 4.6 billion years old, and an overestimate of $10^{10}$ years will suffice for our purpose. Then, the number of proteins invented on the earth in the past and present becomes $10^{21}$. If every protein sequence has been “revised” every second (which would not be true!), our outrageous estimate becomes approximately $10^{28}$. Still, this number is not even comparable to the number of combinatorially possible protein species (Table 1). It is highly unlikely that the real protein sequences on the earth were determined after a full exploration of the entire population of combinatorially possible protein sequences. It is conceivable that only a tiny fraction of possible protein sequences has been explored in the course of protein evolution. This is probably because the seed sequences for “the tiny fraction” were accidentally determined, at least to some extent, and such seed sequences evolved into related sequences. Given the accidental aspect of biological sequence evolution, we next want a reasonable explanation of why and how such a tiny fraction of real proteins was “chosen” by nature, but not other tiny fractions of similar nonexistent proteins, even from the identical seed sequence. It is easily conceivable that only a limited number of sequences can thermodynamically fold into “proper” three-dimensional structures (and hence execute specific functions according to the structure–function paradigm), which have been naturally selected in the course of protein evolution [2]. The invention of a new gene or protein appears to be carried out by gene fusion, duplication, or similar mechanisms [3–5]. Such evolutionary processes necessarily employ the same sequences repeatedly in different proteins.
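The arithmetic behind Table 1 and the estimate above can be checked with a short calculation. The sketch below is only an illustration: the helper names are our own, and the numbers of proteins, species, and years are the deliberately generous assumptions stated in the text, not measurements. Logarithms are used because $20^{350}$ is far too large for ordinary floating-point numbers.

```python
import math


def log10_sequence_space(n: int, alphabet: int = 20) -> float:
    """log10 of the number of possible sequences of length n (alphabet ** n)."""
    return n * math.log10(alphabet)


def sci(log10_value: float, digits: int = 1) -> str:
    """Format 10 ** log10_value as 'm x 10^e' without building the huge number."""
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return f"{mantissa:.{digits}f} x 10^{exponent}"


# Sequence space for the chain lengths listed in Table 1.
for n in (1, 2, 3, 4, 5, 10, 20, 30, 50, 100, 200, 300, 350):
    print(f"n = {n:>3}: {sci(log10_sequence_space(n))}")

# Generous upper bound on sequences ever "tried", using the text's assumptions:
# 10^4 proteins per species, 10^7 species, 10^10 years, one revision per second.
seconds_per_year = 365.25 * 24 * 3600
log10_explored = 4 + 7 + 10 + math.log10(seconds_per_year)
print("explored (upper bound):", sci(log10_explored))

# Fraction of the 350-residue sequence space that this bound could cover.
log10_fraction = log10_explored - log10_sequence_space(350)
print("fraction of 20^350 explored: about 10^%d" % round(log10_fraction))
```

Even with one revision per second for every protein, the explored set is smaller than the 350-residue sequence space by more than 400 orders of magnitude.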
Thus, protein sequences on the earth are unavoidably “biased” away from the random incorporation of amino acids in chains, via natural selection based on folding structures and functionality. To a first approximation, frequently used sequences are supposed to be the ones that fold into specific structures and show specific functions. In contrast, rare or nonexistent sequences are supposed to be the ones that do not fold well and do not show usable functional properties. Although it appears that this conventional answer has been widely accepted, it is tautological and unsatisfactory. Here we just point out that this is not to say that nonexistent sequences cannot fold or function at all. As discussed already, simply because combinatorially possible protein sequences are tremendously large in number and evolutionary processes are highly conservative, evolution would not have had to fully explore the entire possibility of sequences before the present-day proteins were made. Thus, what we really want to know is to what degree the present real protein sequences are “necessary” for their functions with respect to the entire population of combinatorially possible sequences. To put it differently, how much physicochemical “necessity” is embedded in a given sequence? And how

much did historical “chance” contribute to the determination of that sequence? We need to understand mechanistically what kinds of sequences fold, and how, with regard to both existing and nonexistent ones. Without question, it is not a trivial matter to answer this question at a level of explanation that convinces most protein scientists. If one could meaningfully compare all the real protein sequences to the entire collection of theoretically possible sequences, it would not be conceptually difficult to determine which sequences are used and which are not used in organisms, through which it might be possible to obtain general rules for protein design. Practically, the astronomical number of possible sequences prohibits such an ambitious attempt. But let us look at the bright side, which is the large population of combinatorially possible sequences that nobody, not even Mother Nature, has ever touched. It could serve as one of the theoretical grounds for a biotechnological opportunity in protein engineering in the future. For the time being, it is important for us to put fundamental facts on the real protein sequences in order, which may hint at how to use such knowledge for biotechnological applications. Especially important are the relations among sequence, structure, and function in the real protein population on the earth.

Structure prediction and sequence similarity search

Protein three-dimensional structure is thought to be determined solely by the physicochemical nature of its amino acid sequence [6–8], even though the actual folding process may need chaperones [9]. We still largely believe that the amino acid sequence specifies the three-dimensional structure, and that the three-dimensional structure of a protein molecule specifies its function.
Although some important exceptions have emerged, as will be discussed in the section “Related studies on short amino acid sequences”, this is the age-old paradigm of protein science, and our mechanistic understanding of this paradigm is still far from complete. If we thoroughly understood the physicochemical nature of amino acid sequences and decoded the hidden “words” and “grammar” in them, we should be able to reconstruct the entire three-dimensional structures of proteins in silico solely from the primary sequences, which has not been achieved yet. Indeed, this entire process is so complex that it cannot easily be dealt with all at once. Amenable and widely appreciated research foci that can be investigated practically at this point are how to develop a prediction method for secondary structures, α-helix [10] and β-strand [11], from the primary sequences on the one hand, and how to perform sequence similarity searches and develop alignment methods to compare sequences on the other hand. Fortunately, the amount of data deposited in the databases on protein amino acid sequences and three-dimensional structures has increased rapidly. These databases have been serving as a basis for the recent

progress of bioinformatics on structure prediction and sequence similarity searches. One of the earliest attempts to elaborate such a prediction method was made by Chou and Fasman in the 1970s [12,13], who examined the frequency of each amino acid (i.e., amino acid composition) in α-helix, β-strand, and turn with the limited number of protein three-dimensional structure data available at that time. A more physicochemical approach was taken by Wu and Kabat in the 1970s [14–16], who tried to define the φ and ψ peptide-bond angles of the protein main chain based on the assumption that an amino acid triplet is a unit restricting the φ and ψ peptide-bond angles, because they are mainly determined by the two flanking amino acids. Other representative empirical approaches are the Garnier-Osguthorpe-Robson (GOR) method [17,18] and the stereochemical method [19]. Although these trials for developing a secondary structure prediction method may be the simplest first step toward mechanistically understanding the basic folding process, the prediction accuracy of these methods was generally low, at about 50%. Since then, many types of prediction programs have been developed with the help of the accumulated three-dimensional data, and the prediction accuracy has recently been improved significantly, although it remains below 80% [20,21]. Nowadays, a new protein is often “discovered” as an array of letters from genomic or complementary DNA sequences without any other useful information on its three-dimensional structure and function. Sequence similarity searches are the most popular and obvious first step to characterize such an unknown protein. Indeed, protein sequence databases are most often, and somewhat exclusively, used in this way from a structure–function point of view. In most cases, sequences of at least dozens of amino acids are studied.
The representative sequence similarity search program is BLAST [22,23], which first searches for short sequences called “words” as similarity seeds and expands them to compare longer sequences.

Sequence, structure, and function

Extensive use of protein databases in similarity searches naturally focuses on highly conserved sequences of proteins in multiple alignments. For this reason, common patterns are often searched for based on alignments. In the context of sequence similarity searches, “sequence motif” refers to relatively short patterns of amino acids repeatedly seen in many different proteins. In a more structural context, “structural motif” refers to super-secondary structures composed of a few secondary structures repeatedly found in three-dimensional structures of proteins. There is no doubt that structural motifs are the smallest functional units of proteins, and sequence motifs may well represent some functional units of proteins. It should be emphasized that sequence and structural similarities are strongly correlated with each

other [24–26] and many programs for sequence–structure analyses have been developed successfully [27–34]. However, sequence–structure relations in proteins are not so simple. Although similar sequences are highly likely to form similar three-dimensional structures, it has become apparent that even advanced sequence alignment programs often fail to detect common sequences for similar three-dimensional structures. This cannot always be attributed to technical problems with the detection sensitivity of those programs. Rather, this fact implicitly indicates that similar three-dimensional structures can be formed by highly different sequences. This low sequence similarity with similar three-dimensional structures is believed to be the common feature of G-protein-coupled receptors (GPCRs), also called seven-transmembrane (7TM) receptors [35–38]. They constitute the largest group of eukaryotic proteins, as a result of highly divergent and convergent evolution [35–38]. Their sequences are highly diverse, and their ligand structures are even more diverse, ranging from small ions to proteins [35–38]. Intriguingly, nonalignment-based information can be extracted using many different methods for their functional classification [39–49], suggesting that sequence similarity information reveals only one aspect, although a highly informative one, of the entire protein structure–function relations. A similar situation is also apparent in the secondary structure prediction methods, where the incorporation of distant effects that are not revealed by similarity searches significantly improved the prediction accuracy [21]. Furthermore, from an evolutionary viewpoint, it is not very surprising to find a case where two proteins with unrelated sequences and three-dimensional structures show similar functions as a result of convergent evolution. A fascinating example can be drawn from olfactory receptors of mammals [50–54] and insects [55–59].
Both mammals and insects use olfactory receptors that span the plasma membrane seven times for odorant detection. Their amino acid sequences are totally different from each other, indicating their independent origins. Furthermore, mammalian olfactory receptors are typical GPCRs, whereas insect receptors do not appear to have the typical configuration of GPCRs, as their amino termini are located inside the cell [60–62]. Nonetheless, we believe that both humans and fruit flies can detect the same odorant chemicals, for example, amyl acetate, a smell of bananas, suggesting that both organisms harbor functionally similar but structurally different receptor molecules. As discussed above, the entire collection of combinatorially possible protein sequences is extremely large. It may be the case that only one fraction of the entire possible sequences of amino acids is explored in one phylogenetic group, and another fraction of the possible sequences is explored in another phylogenetic group. To put it differently, independent invention of entirely new sequences of proteins with similar structure or function might have

occurred several times during the course of evolution as a result of convergent evolution in nature. Likewise, independent invention of entirely new sequences of proteins for a particular function should be possible by human biotechnology.

From amino acid composition to short sequences: A database characterization

One of the approaches to characterizing proteins is to analyze amino acid compositions, as discussed already. Suppose a system containing $10^{4}$ sequences, each of which is composed of 350 amino acids. The number of amino acids in this system is $3.5 \times 10^{6}$. If each amino acid is incorporated randomly with an equal frequency of 0.05 at a given position, each amino acid is used 175,000 times in this system and 17.5 times in a given protein. If each amino acid is not randomly incorporated, the amino acid composition, and hence the number of occurrences of each amino acid in the system, will be biased positively or negatively, depending on how often each amino acid is incorporated. This simple thought experiment illustrates that amino acid composition can be a characteristic descriptor of a system or protein. In fact, recent bioinformatics programs often use amino acid composition as a key piece of information for predicting protein characteristics: parallel helix–helix interfaces [63], ligand binding sites [64], secondary structures with hydrophobicity [65], disordered regions with reduced composition [66–69], and candidate DNA/RNA binding proteins from proteome data [70]. The amino acid composition appears to affect the folding property of proteins [71], which may partly explain such a wide range of applications. Interestingly, a proteome-wide composition comparison among species can extract species-specific information [72–75]. These studies collectively demonstrated the importance of each amino acid as the smallest unit in a protein chain. However, it is also true that the compositional analysis does not provide us with any information on the sequences themselves.
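The thought experiment above can be written out in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the alphabet is the 20 standard one-letter codes, and the two short sequences are made-up toy data rather than database entries.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
_STANDARD = set(AMINO_ACIDS)


def expected_uniform_counts(n_sequences: int, length: int, n_letters: int = 20):
    """Expected uses of one amino acid in the whole system and in one protein,
    assuming every amino acid is incorporated with equal frequency."""
    total_residues = n_sequences * length
    return total_residues / n_letters, length / n_letters


def composition(sequences):
    """Observed frequency of each standard amino acid over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Non-standard letters (e.g. X) are ignored in this sketch.
        counts.update(ch for ch in seq.upper() if ch in _STANDARD)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def bias_vs_uniform(freqs):
    """Ratio of observed frequency to the uniform expectation 1/20.
    Values above 1 indicate over-use, values below 1 under-use."""
    return {aa: f * len(AMINO_ACIDS) for aa, f in freqs.items()}


if __name__ == "__main__":
    # The system described in the text: 10^4 sequences of 350 residues each.
    print(expected_uniform_counts(10**4, 350))

    # Toy data for illustration only.
    toy = ["MKVLAAGLLK", "MSTAAELLKK"]
    bias = bias_vs_uniform(composition(toy))
    print(sorted(bias.items(), key=lambda kv: -kv[1])[:3])
```

Because the order of residues is discarded by the counter, two very different sequences with the same letters give identical compositions, which is the limitation noted at the end of the paragraph above.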
The amino acid composition is merely a single descriptor of a protein among others. It is understandable that, by extrapolating this line of thinking, similarity searches with relatively large pieces of sequences that provide us with sequence-specific information have been preferred as a protein characterization tool on most occasions. Our approach to protein characterization, which will be discussed below, can be recognized as an intermediate between a simple compositional analysis and a similarity search, focusing on short amino acid sequences in proteins. On the one hand, from the viewpoint of the compositional analysis, the concept of amino acid composition can be safely expanded to “longer” sequences. To put it differently, a protein database can be characterized by the “compositions” of double, triple, and quadruple (and longer) combinations of amino acids. On the other hand, from the viewpoint of the entire combinatorially possible protein population discussed at the

117 beginning of this chapter, which is too large to handle rationally, a smaller set that is composed of short sequences may be small enough to handle rationally with a modern computational power, and at the same time, may be large enough to find diverse structures that form functional units. Hence, together with a rapid increase of protein sequence data available in databases, the characterization of a large protein database in terms of short combinatorial amino acid sequences is a timely attempt and it is likely to extract useful information on the relations among sequence, structure, and function. It should be noted that studies focusing on short sequences as an important analysis unit are not entirely new. We already know that the functional sites of proteins consist of a few sequences of amino acids, although they have to be supported three-dimensionally by other parts of the chain. Key residues in ‘‘active sites’’ or ‘‘binding sites’’ are generally composed of a few amino acids [76–78]. As discussed above, sequence motifs are usually relatively short sequences of proteins. Additionally, the success of BLAST, which use short sequences as seeds for similarity searches [22,23], implicitly indicates the importance and usefulness of short sequences in proteins. Biotechnologically speaking, it has been shown that small peptides can act as functional protein modulators [79–81]. In the rest of this chapter, we discuss the biological and biotechnological significance of short amino acid sequences in proteins. We first mention rationale and significance of examining the relative frequency of short amino acid sets, together with the definition of ‘‘availability’’. We then discuss availability differences between species and between secondary structures. 
Information extraction from protein sequences using availability scores

Rationale for protein characterization using short sequences of amino acids

We approach the age-old paradigm of protein science discussed in sections “Necessity and chance in natural ‘protein design’” and “Structure prediction and sequence similarity search” by focusing on stretches of short amino acid sequences of two, three, four, five, six, and n “constituent” amino acids, called here “doublets” (2-aa sets), “triplets” (3-aa sets), “quartets” (4-aa sets), “pentats” (5-aa sets), “hexats” (6-aa sets), and “n-aa sets”. As indicated in Table 1, there are 20 singlets, 400 ($= 20^2$) doublets, 8,000 ($= 20^3$) triplets, and so on. In other words, 3-aa sets, for example, are composed of 8,000 different triplet species. We consider these short sequences as building units of protein molecules. That is, protein molecules are considered to be constructed by serial and linear superimposition of short sequences, especially triplets. More concretely, our methodology is based on the simple concept of “counting all possible short amino acid sequence sets in a database” to

elucidate the hidden words and grammars in proteins [82,83]. The number of short amino acid sequences that can be made by random incorporation of n amino acids increases rapidly with n (Table 1). As n increases, sequences that do not exist in databases emerge and become conspicuous among the theoretically possible combinatorial sequences of at least five amino acids. A further increase of n quickly makes nonexistent sequences the dominant proportion of the entire population of combinatorially possible sequences. For example, the number of random sequences with 8 amino acids (i.e., 8-aa sets or “octats”) is larger than the number of all octats in the nonredundant protein database (see section “Simple concept of availability: Counting short amino acid sequences in a database”). Therefore, a significant portion of the combinatorially possible 8-aa sets does not exist in the real protein database.

Then, what is the minimum number of amino acids in these n-aa sets needed to distinguish protein species based on their existence or number of occurrences in proteins? Again, suppose a system containing $10^4$ sequences, each of which has 350 amino acids. Focusing on triplets, the number of triplets in this system can be calculated as:

$$Q = X - 2N \tag{1}$$

where $Q$ is the total number of triplets, $X$ is the total number of amino acids, and $N$ is the number of proteins in the system. This equation can be generalized as follows when we focus on n-aa sets:

$$Q_n = X - (n - 1)N \tag{2}$$

where $Q_n$ is the total number of n-aa constituents, $X$ is the total number of amino acids, and $N$ is the number of proteins in the system. Based on this equation, the number of n-aa constituent sequences in our system is $350 \times 10^4 - (n-1) \times 10^4 = 10^4 \times (350 - (n-1)) = 3.5 \times 10^6 \times (1 - (n-1)/350)$. On the other hand, the number of n-aa constituents that can be made combinatorially is $20^n$. For comparison, we calculated $Q_n$ and $20^n$ for n = 1–6 in this system in Table 2. It is expected that each constituent sequence of the 5–6-aa sets is used only once or not used at all in this system. This discussion implies that the appropriate number of amino acids that can properly serve as a unique descriptor for each protein is at least five and preferably six in this system. This means that each protein in this system can be characterized and specified by a set of combinatorial sequences of five or six amino acids if they are randomly incorporated in protein sequences. In reality, whether or not these aa set constituents are randomly incorporated in real proteins has to be examined, which itself contributes to our understanding of real protein sequences. However, 6-aa sets are

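Table 2. Comparison between the number of n-aa constituent sequences in the system and the number of combinatorially possible sequences of n-aa sets (a).

| n | Constituents in the system | Possible n-aa sets |
|---|---|---|
| 1 | $3.50 \times 10^6$ | $2.0 \times 10^1$ |
| 2 | $3.49 \times 10^6$ | $4.0 \times 10^2$ |
| 3 | $3.48 \times 10^6$ | $8.0 \times 10^3$ |
| 4 | $3.47 \times 10^6$ | $1.6 \times 10^5$ |
| 5 | $3.46 \times 10^6$ | $3.2 \times 10^6$ |
| 6 | $3.45 \times 10^6$ | $6.4 \times 10^7$ |
| n | $3.50 \times 10^6 \times (1 - (n-1)/350)$ | $20^n$ |

(a) The system contains $10^4$ proteins, each of which is composed of 350 amino acids.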

composed of $6.4 \times 10^7$ combinatorial sets, which is practically difficult to comprehend and manage. To circumvent this practical difficulty, smaller sets of at most $10^4$ sequences are desirable, from which we can expand our methodologies and findings to larger sets. In this study, we basically focused on a practically manageable set, the set of triplets (3-aa sets), which is composed of just 8,000 triplet species, as a first step to extract information on protein sequences systematically from databases. If we examine a collection of $10^4$ randomly generated sequences of 350 amino acids each (in which the 20 species of amino acids are used randomly at an equal probability of 0.05), each triplet species will occur approximately 435 times ($= 3.48 \times 10^6 / 8.0 \times 10^3$), as Table 2 suggests. We believe that this number is not daunting and is well within a manageable range for human eyes and comprehension.

In addition, there is a sound reason for analyzing triplets instead of larger sets, at least at this point. Protein three-dimensional structures are defined by the conformation of the protein main chain, which is determined by the φ and ψ angles of the bonds connected to the α-carbon atom at the center of three consecutive amino acids. To put it differently, the φ and ψ angles of a given amino acid are surely influenced, at least partially and at most mainly, by the immediately neighboring amino acids on both sides. Although in reality influences from distant parts of the chain play an important role in structural determination, such influences cannot be systematically examined at this point. This line of thinking considers amino acid triplets as a tentative unit that restricts the rotational freedom of the φ and ψ angles.
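The arithmetic behind Eq. (2), Table 2, and the figure of about 435 occurrences per triplet can be checked with a short sketch. The function and variable names below are illustrative choices, not taken from the original study, and the sketch assumes every protein in the model system is at least n residues long.

```python
def constituent_count(total_residues: int, n_proteins: int, n: int) -> int:
    """Eq. (2): Q_n = X - (n - 1) * N, the number of overlapping n-aa windows.

    Assumes every protein is at least n residues long.
    """
    return total_residues - (n - 1) * n_proteins


def possible_sets(n: int, alphabet_size: int = 20) -> int:
    """Number of combinatorially possible n-aa sets (20^n for 20 amino acids)."""
    return alphabet_size ** n


# Model system from the text: 10^4 proteins of 350 amino acids each.
N_PROTEINS = 10**4
LENGTH = 350
X = N_PROTEINS * LENGTH

rows = []
for n in range(1, 7):
    q_n = constituent_count(X, N_PROTEINS, n)
    possible = possible_sets(n)
    # Mean number of times each n-aa set is used if usage were uniform.
    rows.append((n, q_n, possible, q_n / possible))

for n, q_n, possible, ratio in rows:
    print(f"n={n}  Q_n={q_n:.2e}  20^n={possible:.1e}  mean uses per set={ratio:.3g}")
```

Under uniform random usage the mean number of uses per set falls below one between n = 5 and n = 6, which is the basis for the statement that five or six amino acids suffice as a unique descriptor in this system.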
Once these angles are listed for the 8,000 triplets, this information can be used for the determination of the protein main chain, as envisioned by Wu and Kabat in the 1970s [14–16] (see sections “Availability as an indicator for secondary structures” and “Availability-based structure prediction”).

Simple concept of availability: Counting short amino acid sequences in a database

Our idea is to examine how many times a given triplet occurs in the nonredundant protein database (“nr” database), and to do so comprehensively for all 8,000 possible triplet species. We examined the “nr” database of NCBI (National Center for Biotechnology Information), which is the most comprehensive nonredundant collection of protein sequences. At the time we downloaded the database in September 2003, it contained 1,539,248 sequences and 497,523,470 amino acids. A simple attempt can be made to construct a conceptual bar graph of the 8,000 expected triplet species (on the x-axis) whose counts are shown in absolute numbers (on the y-axis). A protein with n amino acids contains (n − 2) triplets in the chain. If there are N proteins in the nr database, the number of existing triplets to be counted in this system is N(n − 2) when n is taken as the average protein length, which is $Q = X - 2N$ in the notation of Eq. (1). Practically, this

extremely large bar graph cannot be made easily, and it is difficult to examine the whole tendency at once, at least by human eyes. More importantly, this conceptual bar graph should be compared with that of randomized “imaginary” or “ideal” proteins that are conceptually made from a pool of amino acids whose composition is identical to that of the database. This takes into account the nonrandom nature of the amino acid composition. This consideration is necessary because, in reality, the 20 species of amino acids are not used equally frequently to make a chain; the comparison therefore nullifies the effect of the amino acid composition [82,83]. After generating a pool of the “imaginary” proteins, we examined how many times a given triplet occurs in this pool, and this counting was carried out for all 8,000 possible triplet species. Then, a simple attempt can be made to construct another conceptual bar graph of the 8,000 species (on the x-axis) where triplet counts are shown (on the y-axis), although again this bar graph is difficult to “read”. To highlight the difference between the real counts and the probabilistically calculated counts, availability (A), or “relative count”, for each triplet or a given constituent sequence of an n-aa set in a database is defined as follows:

$$A = \frac{R - E}{E} = \frac{R}{E} - 1 \tag{3}$$

where $R$ is the number of times a given triplet or a given constituent sequence of an n-aa set appears in a real protein database (also called the real count), and $E$ is the probabilistically calculated number for that triplet or constituent sequence (also called the expected count). When $R = E$, the availability score is zero. Thus, deviations from zero may be regarded as reflecting the nonrandom nature of the triplet in question. When $R = 0$, the availability score is −1, the minimum score. Using availability as an indicator of deviation from random sequence, it becomes possible to construct a histogram that is manageable to human eyes (Fig. 1A). It may be possible to decipher the biological bias inherent in constructing protein chains in this way [82,83].

Availability bias and its biological and biotechnological implications

As discussed above, the distribution pattern of availability scores for the 8,000 triplet species can be visually presented in a histogram where the x-axis indicates availability scores (i.e., relative triplet counts) and the y-axis the number of triplet species in absolute numbers (Fig. 1A and 1B). It is evident that the distribution pattern of the real proteins (Fig. 1A) and that of the imaginary proteins (Fig. 1B) are highly different. Triplets from the imaginary proteins are distributed normally around zero, and indeed, almost all samples are

Fig. 1. Availability scores of triplets in the nr protein database. (A) Distribution of triplet availability scores (relative triplet counts) in the nr database. This should be compared to B. The x-axis is truncated. (B) Distribution of triplet availability scores in the randomly generated imaginary protein pool whose amino acid composition is identical to that of the nr database. The inset shows the same distribution with a much smaller bar width, indicating its normality. The x-axis is truncated. Note the difference in y-axis scale between A and B. (C) Species dependence of triplet availability scores. The bold line for “nr” indicates the availability distribution of all proteins in the nr database, which is identical to A. Five representative model species were tested: human (Homo sapiens), mouse (Mus musculus), fruit fly (Drosophila melanogaster), nematode (Caenorhabditis elegans), and colon bacterium (Escherichia coli). See also Fig. 2 for further comparison.

located within 0.00 ± 0.01, as expected from the definition of availability for the imaginary proteins. A sharp contrast is observed in the triplet distribution of the real proteins, which is spread over a variety of availability scores, although its peak value is also around zero. Among the 8,000 possible triplets, some triplets are more “available”, or more frequently used than expected, and others are less “available”, or less frequently used than expected. The distribution pattern of availability scores is highly skewed in the positive direction. According to Eq. (3), positive availability scores mean $R > E$, and negative scores mean $R < E$. That is, short constituent sequences with positive availability scores are “preferred” and those with negative availability scores are “avoided” as building sequences in proteins. Only about 20% of triplet species are located in the range 0.00 ± 0.01, which means that only about 20% of triplets are used almost randomly. Including these, most triplets are distributed in the range −0.5 to +1.0. This skewed distribution demonstrates that triplet usage is not random; there would be a biological bias in triplet availability in proteins. That is, most triplets are used nonrandomly in real proteins, although the difference from random usage is not extremely large. This bias could be one of the reasons that the most popular search program, BLAST, has been used successfully; BLAST uses triplets as seeds for similarity searches [22,23] (see sections “Structure prediction and sequence similarity search” and “From amino acid composition to short sequences: A database characterization”).

Is this trend simply an evolutionary and nonfunctional aspect of proteins generated by chance, or is it functionally critical in constructing proteins and generated by necessity? Among the short sequences with highly positive availability scores, repeated sequences and palindromic sequences are abundantly found.
These sequences are especially abundant in eukaryotic proteins and are believed to be functionally important [75,84,85]. Sequences with negative availability scores are more enigmatic. In our previous work, we showed that some of the probabilistically expected pentats (5-aa sets) do not occur at all in the database despite the fact that they are expected to be found reasonably frequently; we call these zero-count pentats [83]. This can be considered an extreme case of negative scores. This low availability in pentats, and similar low availability in triplets, may be explained by several different reasons, and each constituent sequence would have its own reasons for its low availability. One reason may be that these constituent sequences are thermodynamically too unstable to be produced because of, for example, steric hindrance among their amino acid residues. Another reason may be that the biological protein production machinery cannot produce such sequences because of physicochemical properties of the ribosome and its related molecules, i.e., hardware incompatibility of the production machinery.

Against these possibilities, these pentats can be chemically synthesized relatively easily as stable peptides by a conventional method [83]. Furthermore, they can also be synthesized as part of fusion proteins in bacteria at reasonably high yields by a conventional method [83]. Thus, the thermodynamic instability of protein products and the hardware incompatibility of the biological production machinery are unlikely to be the reasons for the negative availability. A similar discussion can be found in a different study in which, based on computer molecular modeling, steric hindrance was ruled out as an explanation for rare triplets [75].

A third possibility is an evolutionary reason. It is likely that if the system needs to make these pentats, it can make them, as indicated by the synthesis experiments mentioned above [83], but evolution never explored this possibility because of the conservative nature of the evolutionary process [3–5]. In other words, there has been no need to use such “new sequences” with unknown risks; the limited explored range of sequence diversity within the entire population of combinatorially possible sequences is already large enough to make any functional protein, even though the unexplored range may be much larger. The negative deviation of availability scores may simply be evolutionary “remnants” (i.e., an accidental signature without any functional significance), but even so, such “remnants” are not evolutionary constraints per se, because they can be changed relatively easily when necessary [83]. “Unexplored vacuum” in the entire population may be a more appropriate description of this possible situation.

A fourth possibility, an alternative to the third, is that the biological system has actively kept away from such sequences through natural selection because of their disadvantageous effects, after full exploration of the combinatorially possible constituent sequences during evolution.
If so, the fact that many constituent sequences have nonrandom availability scores could mean that they have some biological “functions”. An interesting view of protein primary structures then follows: almost all parts of proteins are made of constituent sequences that have a unique biological bias or function. To put it differently, as a result of the functional evolution of proteins, conservative though it is, most parts of protein sequences may have some function when viewed in terms of short constituent sequences. This may be true for small constituent sequences because the number of possible constituent sequences is not extremely large for short ones but is extremely large for longer ones (Table 1). Zero-count constituent sequences will make up the majority of the population among the combinatorial sets with 10 or more amino acids. This could, in turn, mean that the evolutionary and functional unit of sequences is not more than 10 amino acids in length. Protein sequences may have to be a collection of small units whose amino acid combinations are within a manageable range for nature to examine their functionality.

The third and fourth possibilities are not completely mutually exclusive, and depending on the particular constituent sequence, either or both may apply. Whatever the reasons behind the existence of zero-count constituent sequences may be, applications can be envisioned, for example, as novel seeds for biomedical reagents.
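The availability score of Eq. (3), on which this section rests, can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the procedure used in the original study: the expected count E is computed analytically as the number of n-aa windows multiplied by the product of the residue frequencies in the database, instead of by generating a pool of “imaginary” proteins, and the input is assumed to contain only the 20 standard one-letter codes. All function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_sets(sequences, n=3):
    """Real counts R: every overlapping n-aa window in every sequence."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts


def availability(sequences, n=3):
    """Availability A = R / E - 1 for every n-aa set with a nonzero expectation.

    Assumption: E = (total number of windows) * product of residue frequencies,
    an analytic stand-in for a composition-matched random protein pool.
    """
    real = count_sets(sequences, n)
    total_windows = sum(real.values())
    residue_counts = Counter("".join(sequences))
    total_residues = sum(residue_counts.values())
    freq = {aa: residue_counts[aa] / total_residues for aa in AMINO_ACIDS}

    scores = {}
    for combo in product(AMINO_ACIDS, repeat=n):
        key = "".join(combo)
        expected = float(total_windows)
        for aa in combo:
            expected *= freq[aa]
        if expected > 0:  # skip sets containing a residue absent from the data
            scores[key] = real[key] / expected - 1.0
    return scores


# Toy database (illustrative only).
toy_scores = availability(["AAAA", "ACAC"], n=3)
zero_count = sorted(k for k, a in toy_scores.items() if a == -1.0)
print(toy_scores["AAA"], zero_count)
```

Sets that are expected but never observed receive the minimum score of −1, which corresponds to the zero-count case discussed above. For a database the size of nr, the enumeration over all $20^n$ sets remains practical for triplets but grows quickly with n.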

Availability difference between species and its biomedical applications

Thus far, we have discussed the availability bias found across all entries of the nr database. Short sequences that are not found in the nr database are entirely new or very rare to organisms. Accordingly, their biotechnological application in designing new types of proteins, for example, with a new folding pattern and function, may be expected. In addition, it would be fruitful to examine the availability of particular groups of proteins, such as biological species, structural groups, and functional protein groups, in order to identify group-specific constituent sequences. Short sequences identified in this way will provide the biotechnology field with potential seeds for various kinds of applications. From this perspective, we performed comparisons of availability scores between biological species and between secondary structures, although these studies are still preliminary.

The finding that triplet availability in the nr database is biologically biased readily suggests that the degree and manner of such bias differ among biological species. As mentioned previously, species-specific preference for amino acid usage has already been demonstrated [72–75], and indeed, the availability distribution patterns appear to differ from species to species [82,83] (Fig. 1C). The availability distribution of the colon bacterium Escherichia coli is especially different from those of the eukaryotic species, with a more flattened pattern. Also noteworthy is the peak value of the nematode distribution, which is positively shifted from those of the other eukaryotes. We examined correlations of the availability scores between species. High correlation coefficients were obtained between similar species such as human and mouse. The scatter plot for this pair shows that most triplets are located along the diagonal line (Fig. 2A).
This indicates that the availability score for a given triplet is highly similar in both species, reflecting the smaller phylogenetic distance between them. Lower correlations were detected between distant eukaryotes, such as between human and fruit fly (Fig. 2B) and between human and nematode (Fig. 2C). Still lower correlation coefficients were obtained between evolutionarily distant species, such as human and colon bacterium. The scatter plot for this pair shows an almost even distribution from −0.5 to +0.5 around (0, 0) (Fig. 2D). That is, the triplet availability scores of these species are independent of each other, at least in this range.

Fig. 2. Scatter plots of 8,000 triplets in representative biological species. Both axes are shown from −0.5 to +1.0. (A) Human–mouse plot. Spearman correlation coefficient $r_s = 0.880$. (B) Human–fruit fly plot. $r_s = 0.525$. (C) Human–nematode plot. $r_s = 0.483$. (D) Human–bacterium plot. $r_s = 0.285$.
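The Spearman coefficients quoted for Fig. 2 compare the rank order of triplet availability scores in two species. A minimal, dependency-free sketch of that comparison is given below; the function names and the toy scores are illustrative assumptions, and the original analysis may well have used a statistics package instead.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman r_s: Pearson correlation of the rank-transformed values.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def species_correlation(scores_a, scores_b):
    """Compare two {triplet: availability} dictionaries over shared triplets."""
    shared = sorted(set(scores_a) & set(scores_b))
    return spearman([scores_a[t] for t in shared], [scores_b[t] for t in shared])


# Toy availability scores (made up for illustration, not real data).
species_1 = {"AAA": 0.9, "ACD": 0.1, "GGG": -0.2, "WWW": -0.8}
species_2 = {"AAA": 0.8, "ACD": 0.2, "GGG": -0.1, "WWW": -0.9}
print(species_correlation(species_1, species_2))
```

Because only ranks enter the calculation, the coefficient is insensitive to the strong positive skew of the availability distribution.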

Thus, as in the case of amino acid composition [72–75], the pattern of availability scores may be considered a parameter for the evolutionary status of a given species, and this may reflect the finding in molecular evolution that sequence change is roughly proportional to evolutionary time or distance. In other words, availability can be considered a quantitative expression of the evolutionary consequences of necessity and chance that have accumulated in a particular species over time. Accordingly, its application to molecular phylogenetics to clarify phylogenetic relations among species can be expected. This viewpoint has already been confirmed by a proteome-wide species comparison of amino acid singlet, doublet, and triplet “compositions” [75]. Although this availability-based method needs proteome-wide sequence information, it is highly quantitative and probably accurate for comparing phylogenetically distant organisms. Moreover, it is advantageous over the classical methods of multiple alignment because it does not require homologous proteins to be aligned properly from all organisms of interest. This also means that genome-wide availability analysis is free from the problem of serious differences between gene phylogeny and species phylogeny that conventional molecular phylogenetic analysis often encounters.

A more detailed explanation of these scatter plots follows. Triplets located above the diagonal line are used more frequently in the species on the y-axis, and those located below the line are used more frequently in the species on the x-axis. We take the human–bacterium plot (Fig. 2D) as an example here. Triplets located far above the diagonal line have relatively high availability in E. coli and relatively low availability in H. sapiens. Similar species-specific short sequences are likely to be found in pathogenic bacteria. Comprehensive identification of such species-specific sequences awaits further research, and it will open up a new avenue for biotechnological applications. Artificial proteins containing constituent sequences that do not exist in the host could be exploited as a new type of inhibitor or modulator of the biological activities of proteins. To name a few applications, such sequences could be used as diagnostic markers, helping to develop a highly sensitive and specific detection system for pathogenic bacteria. Depending on the function of the proteins containing the species-specific short sequences, anti-pathogenic medicines and immunological preventatives for infection can be expected as effective alternatives to conventional antibiotics and immunization, respectively. A specific pathogen may be medically eliminated from an infected patient with minimal side effects if antibodies raised against the pathogen-specific constituent sequences are administered. A specific vaccination may be possible if a short peptide composed of the pathogen-specific constituent sequence is injected as a preventative measure. This idea is consistent with attempts to use short synthetic peptides as vaccines [86] and with a biotechnological application of random peptide libraries [87].
In contrast, triplets located far below the diagonal line in the human–bacterium plot (Fig. 2D) are relatively rare in E. coli and relatively frequent in H. sapiens. Taking into account their corresponding locations in scatter plots of other biological species such as mouse, fruit fly, and nematode, if those particular points deviate strongly only in mammals, these triplets may be derived from mammal-specific proteins for signal transduction pathways such as ligands, receptors, channels, and signaling enzymes. This line of discussion suggests that such triplets, or their corresponding elongated 4–6-aa sets, may be useful for identifying proteins with mammal-specific sequences, which can then serve as pharmaceutical seeds for novel drug targets.

What about triplets located near the diagonal line? These triplets are used equally well in both H. sapiens and E. coli. If they are used equally frequently in all species, they may be derived from proteins of high importance in all organisms, such as those for energy metabolism and the biosynthesis of basic cellular components. These triplets and their corresponding elongated 4–6-aa sets will provide important information for studying these fundamental proteins and the basic principles of life.

Comprehensive one-by-one examination of all possible 4–6-aa sets is a daunting task: it requires looking up $1.6 \times 10^5$ to $6.4 \times 10^7$ amino acid sets. One way to circumvent this problem is to narrow down candidate sequences by consulting triplet information. For example, with reference to the triplets frequently used in E. coli, one, two, or three amino acids may be added to these triplets, and the availability scores of the corresponding tetrats, pentats, and hexats may again be examined. In this way, the computationally intensive process of finding short sequences of more than three amino acids that are specific to E. coli and never found in H. sapiens will be greatly facilitated.
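The narrowing-down strategy described above, in which seed triplets are extended one residue at a time and only the surviving candidates are examined, can be sketched as follows. This is a simplified illustration with hypothetical function names: it uses plain presence or absence in two sets of sequences in place of availability scores, and it extends seeds at either end.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extend_once(seeds):
    """Add one amino acid to either end of each seed sequence."""
    extended = set()
    for seed in seeds:
        for aa in AMINO_ACIDS:
            extended.add(aa + seed)
            extended.add(seed + aa)
    return extended


def observed_sets(sequences, n):
    """All distinct n-aa windows that occur in the given sequences."""
    return {seq[i:i + n] for seq in sequences for i in range(len(seq) - n + 1)}


def species_specific_candidates(seed_triplets, target_seqs, other_seqs, max_len=6):
    """Return {length: sets found in target_seqs but never in other_seqs}.

    Candidates are pruned only by presence in the target species: a set shared
    with the other species can still have a longer extension that is specific.
    """
    results = {}
    current = set(seed_triplets)
    for n in range(4, max_len + 1):
        current = extend_once(current) & observed_sets(target_seqs, n)
        results[n] = current - observed_sets(other_seqs, n)
    return results


# Toy sequences standing in for two proteomes (illustrative only).
target = ["MKVLAAGT"]
other = ["MKVLSAGT"]
print(species_specific_candidates({"VLA"}, target, other))
```

Each round examines at most 40 extensions per surviving candidate rather than all $20^n$ possible sets, which is where the saving comes from.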

Availability as an indicator for secondary structures

Availability differences could be found not only between different species but also between different secondary structures. To examine this possibility, we accessed the protein three-dimensional structure database, PDB (Protein Data Bank), which harbors about 40,000 protein structure entries. These entries are highly redundant in that, depending on crystallization methods and study details, three-dimensional data on the same sequences are deposited several times as independent entries. To avoid this redundancy, we used public software that searches for representative nonredundant structure entries, PDB-REPRDB, available through the Internet. As a preliminary study, we examined 553 sequences (number of amino acids, 155,843; number of triplets, 101,352) of known three-dimensional structures in terms of triplet availability. These selected “representative” entries showed no large deviations in amino acid composition from the nr database, although they contained somewhat more A, G, D, and Y and somewhat less L, R, and S (data not shown). Based on these representative sequences, we calculated the availability scores of triplets. Then, stretches of α-helices and β-strands in the 553 unrelated protein molecules were identified, and their sequence data were manually collected to make a small “α-helix database” and a “β-strand database”, the latter containing sequences of both parallel and anti-parallel β-sheets. The former contained 87,063 amino acids and 49,873 triplets. The latter contained 55,630 amino acids and 28,977 triplets. Availability scores for the 8,000 triplets were recorded, and their distribution patterns were depicted in histograms (Fig. 3A). The distributions of triplets in these secondary structures show patterns similar to each other, but they are less peaked than that of triplets in the full sequences from which the secondary structure sequences were taken (labeled as “Full sequence” in Fig. 3A).
Triplets with positive and negative scores are relatively frequent in the secondary structures. This means that many triplets occur nonrandomly in

Fig. 3. Triplet availability in secondary structures. Samples were collected from PDB using public software that searches for representative nonredundant structure entries (http://mbs.cbrc.jp/pdbreprdb-cgi/reprdb_menuJ.pl). (A) Distribution of triplet availability scores (relative triplet counts) in α-helices and β-strands. (B) Scatter plot of 8,000 triplets in α-helices and β-strands.

the secondary structures, although further confirmation is necessary to solidify this conclusion. A scatter plot was made in which the x-axis indicates α-helices and the y-axis β-strands, with availability scores from −1 to +6 (Fig. 3B). We found about 100 triplets at (−1, −1), which do not occur in either structure. From this point along the diagonal line to the upper right, there were about 600 triplets with low availability (−1.0 to −0.3), about 900 triplets with nearly random availability (−0.3 to +0.3), and about 600 triplets with high availability (> +0.3). These more than 2,000 triplet species are located at or near the diagonal line, and thus they occur in both structures almost equally well. Furthermore, we found some constituent sequences that appear to be unique to α-helices or β-strands. Triplets located to the upper left of the diagonal line, about 2,000, are used more frequently in β-strands than in α-helices. Conversely, triplets located to the lower right, also about 2,000, are used more frequently in α-helices than in β-strands. We identified about 700 triplets that did not occur in β-sheets. Similarly, we identified about 300 triplets that did not occur in α-helices. Although further characterization of individual triplets is needed, the structure-specific triplets or the availability bias between secondary structures could be used to predict secondary structures from sequences.

We can tentatively conclude that, depending on the triplet species, some triplets are used in only one secondary structure whereas others are used in both secondary structures equally well. The former could constitute part of a hard or stiff protein backbone with restricted φ and ψ angles, and the latter part of a soft or flexible one that adopts various angles. For the functionality of proteins, the hard parts are important in supporting structural backbones, and the soft parts are important in interacting with other molecules. This information may enable us to find active sites, ligand binding sites, or target sequences for mutagenesis and to design new structures for biotechnological purposes. Also, our availability concept can be combined with other approaches to predict secondary structures more accurately, for example, with hydropathy [88,89] or amphiphilicity [90]. Information on amino acids that interrupt the formation of secondary structures, called secondary structure breakers [12,13,91,92], should also be incorporated together with the information on amino acid compositions discussed already (see section “From amino acid composition to short sequences: A database characterization”). Such a hybrid approach may be especially useful for predicting the transmembrane domains of membrane proteins.

Availability-based structure prediction

As discussed before, protein main chain structures are essentially determined by the φ and ψ angles of the bonds connected to α-carbon atoms. Wu and Kabat in the 1970s [14–16] explored the possibility of predicting three-dimensional structures from the φ and ψ angles in triplets, using a limited number of structural data, based on the assumption that the φ and ψ angles are determined only by the two neighboring amino acids in a peptide chain. At that time, triplet data were only partially available among the 8,000 species because of the limited number of structural data, and some brave speculation was unavoidable. Although these angles are influenced not only by the two neighboring amino acids but also by other amino acids, their classical approach should be re-examined, because we now have thousands of known structures with defined sequences and φ and ψ angles in the PDB.
On average, each of the 8,000 triplets has about 100 combinations of the φ and ψ angles in the database at present. It is now possible to reconstruct a “dictionary” for the angles in triplets with more accuracy, which will complement the availability dictionary of triplets. Since the φ and ψ angles can be visually expressed in the classical Ramachandran plot [93], it is highly desirable to make a collection of triplet Ramachandran plots using these structure data. Many such Ramachandran plots would indicate relatively broad ranges of possible angles, although some of them may show quite restricted angles, given the broad distribution in the scatter plot in Fig. 3B. These broad angles are discouraging for any attempt, such as that of Wu and Kabat [14–16], to predict whole protein structures from the angle data alone. Clearly, broad angles make it difficult to determine three-dimensional structures. Nonetheless, when combined with the structural predictions based on similarity searches that are widely and heavily used today, these Ramachandran plots may be helpful in finding appropriate structures more easily for sequences with very low structural similarity to known structures.

In addition, the ranges of the angles in the Ramachandran plots may be weighted against triplet availability scores. These weighted scores could indicate the usage flexibility of angles or the likelihood of being influenced by remote amino acids other than the immediate neighbors. If one prepares a “dictionary” of 8,000 triplets in which a systematic classification of triplet-weighted Ramachandran plots, correlations of triplets between functionally classified protein databases (e.g., GPCR, ion channel, transporter, cytosolic enzyme, and so on), and other physicochemical properties of triplets (e.g., hydrophobicity, ionic charge, physical size, and so on; see section “Availability as an indicator for secondary structures”) are compiled, it would be useful in identifying functionally important parts of protein sequences, such as active sites and ligand binding sites, and in assisting biotechnological applications, especially protein reverse engineering. Such a dictionary should contain comparative results from several different categories in terms of triplets and other n-aa sets: from all proteins on the earth, from a particular species, from a particular functional group of proteins, and from a particular secondary structure. Eventually, we may be able to find a clue for accurately predicting relations between sequence and function in this way.

Revealing biological word processing units

Thus far, we have stressed the possible importance of short amino acid sequences in proteins without paying much attention to biological “word processing” mechanisms or “grammars”. But grammars can also be deciphered, at least partly, from the use of availability.
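Because availability scores serve here both as weights and as a means of deciphering “grammars”, a small sketch of one way to compute them may be helpful. It assumes that availability is the relative deviation of the observed count of an n-aa set from the count expected from single amino acid frequencies, that is, (observed − expected)/expected, which gives −1 for sets that never occur. This definition, the function name `availability_scores`, and the toy sequences are assumptions made for illustration and may differ from the exact procedure used in the original analysis.

```python
from collections import Counter
from math import prod


def availability_scores(sequences, n=3):
    """Return a function that scores any n-aa set against `sequences`.

    Assumed definition: (observed - expected) / expected, where `expected`
    is the number of n-residue windows multiplied by the product of the
    single amino acid frequencies of the set's residues.
    """
    residue_counts = Counter()
    nmer_counts = Counter()
    for seq in sequences:
        residue_counts.update(seq)
        # Overlapping windows of length n along each sequence.
        nmer_counts.update(seq[i:i + n] for i in range(len(seq) - n + 1))

    total_residues = sum(residue_counts.values())
    total_nmers = sum(nmer_counts.values())
    freq = {aa: count / total_residues for aa, count in residue_counts.items()}

    def score(nmer):
        expected = total_nmers * prod(freq.get(aa, 0.0) for aa in nmer)
        if expected == 0:
            return None  # undefined when a residue is absent from the data
        return (nmer_counts[nmer] - expected) / expected

    return score


# Hypothetical toy database of two very short "proteins".
toy_score = availability_scores(["AAAA", "GGGG"], n=3)
print(toy_score("AAA"))  # over-represented relative to composition
print(toy_score("AGA"))  # never observed, so the score is -1.0
```

The same function can be called with `n=4` to score quartets, so triplet and quartet availabilities can be compared on a common footing.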
For this discussion, an analogy between protein chains and English sentences (without spaces between words, without any symbols other than the 26 letters of the alphabet, and without case distinctions) may be fruitful, because the nature of combinatorial sequences discussed in section “Necessity and chance in natural ‘protein design’” reminds us of natural languages. Words are composed of the 26 letters of the alphabet, and sentences are composed of combinations of words. Words that are frequently used (or those with high availability) can be found repeatedly in a variety of sentences if all sentences of a book are statistically analyzed. This is a way to decode English sentences when one has no prior knowledge of English but has plenty of functional sentences (corresponding to protein sequences) in a book (corresponding to a protein database).

Suppose that the triplet “the” has quite high availability in an English book. However, also suppose that the quartet “thex” (where x may be any letter) shows very low availability. This is because “the” is a functional word unit frequently used in English literature, but most “thex” entries are not a part of proper words in English. Some important exceptions are “them”, “then”, and “they”, which are also functional units of English. As in this example, a stretch of letters whose length is just beyond the proper length of a word will have a quite low level of availability. This feature can be used for “word identification” in protein sequences.

More concretely, suppose that we first focus on the 100 triplets with the highest availability. These 100 triplets can be classified into two groups based on the availability of their corresponding 2,000 quartets: one with a dramatic decline and the other with sustained high availability. The triplets with high availability whose corresponding quartets show low availability may be units or “words” used in protein sequences.

Based on this analogy, triplets can be considered as words, and their meaning is expressed in availability, Ramachandran plots, and other physicochemical factors in the context of a whole protein sequence. Their combinations are considered to yield the final outputs of structure and function, as their meanings in sentences. From this point of view, our study to characterize triplets corresponds to an effort to make a preliminary dictionary of a natural language. A dictionary is highly useful in comprehending and constructing sentences, and similarly, we believe that these triplet characteristics are indispensable for understanding proteins and for applications in the biotechnology field. Such a dictionary will be used not only to design artificial proteins but also to assist the process of designing inhibitors and blockers.

Current status of availability analysis and future of protein science

Related studies on short amino acid sequences

We started our research as a conceptual extension of a simple and fundamental analysis of amino acid compositions performed by Chou and Fasman in the 1970s [12,13].
Coincidentally, research on amino acid compositions has already been expanded in several directions, as mentioned in section “From amino acid composition to short sequences: A database characterization”, and indeed our idea of examining short sequences is not entirely new. Similar exploratory studies on the usefulness of constituent sequences have been sporadically reported as tools to examine particular aspects of proteins, without much systematic effort to characterize general features of the constituent sequences. For example, “dipeptide” (i.e., doublet) information [94] and “peptide composition” (i.e., a concept similar to availability) [95] were used for the prediction of membrane protein types. Dipeptide information has also been used to predict gene expression levels [96]. As discussed already, compositions of singlets, doublets, and triplets were demonstrated to be useful for revealing phylogenetic relations between species [75]. One study focused on amino acid quartets (called “4-grams” in the original literature) for protein classification, on the assumption that the same quartets would not be found randomly in unrelated proteins [97]. This study showed that quartet distributions were useful markers for alignment-independent classification of proteins.

Notably, the usefulness of short sequences has already been tested in the field of GPCRs. This is partly because GPCR multiple alignments are notoriously difficult, owing to highly convergent evolution with few conserved sequences (see section “Sequence, structure, and function”), which naturally encourages many researchers to explore unconventional methods of sequence analysis [38–48]. Amino acid composition and “dipeptide composition” have been used for the classification of GPCRs [98,99] and for the characterization of their ligand binding sites [48]. A more general concept of n-aa sets (called “n-tuples” in the original literature) was employed for the classification and identification of GPCRs [100]. Although this characterization procedure requires a pre-determined classification, such searches for n-aa sets that are specific to a certain group of proteins could be a useful tool for identifying functionally important short sequences that are not revealed in a conventional way [101].

Furthermore, similar unconventional approaches may also provide us with insightful information, as suggested in other studies. Identification of low-complexity short sequences was shown to be useful for inferring protein functions [102]. The simple amino acid composition was fortified by gapped combinations of amino acids with one, two, and three amino acid gaps, called “pseudo-amino acid composition” [103], k-spaced amino acid pairs [104], or amino acid coupling patterns [105]. This information was used for the prediction of cellular locations of proteins [103], structural classes [106–108], and transmembrane domains [109]. The concept of examining gapped pairs of amino acids is complementary to our concept of examining consecutive amino acid sets.
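The difference between consecutive sets and gapped pairs can be made concrete with a short sketch. It counts consecutive n-aa sets and, separately, residue pairs separated by exactly k intervening residues. The function names `consecutive_sets` and `k_spaced_pairs` and the toy sequence are hypothetical, and the gapped count is only one plausible reading of the k-spaced pair idea, not the exact feature definitions of the cited studies.

```python
from collections import Counter


def consecutive_sets(sequence, n):
    """Count overlapping runs of n consecutive residues (n-aa sets)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def k_spaced_pairs(sequence, k):
    """Count residue pairs separated by exactly k intervening residues.

    With k = 0 the pairs are adjacent, so the counts match the doublets.
    """
    return Counter(
        (sequence[i], sequence[i + k + 1])
        for i in range(len(sequence) - k - 1)
    )


toy_sequence = "ACDAC"  # hypothetical five-residue example
doublets = consecutive_sets(toy_sequence, 2)
one_gap_pairs = k_spaced_pairs(toy_sequence, 1)
two_gap_pairs = k_spaced_pairs(toy_sequence, 2)
```

In this toy case the doublet count records that A is followed directly by C twice, whereas the two-gap pairs record that A recurs three positions later. The two views capture different regularities of the same chain, which is why they can be regarded as complementary.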
Furthermore, a different look at amino acids as placeholders has been proposed [110], which could corroborate the biological “function” of domain length in GPCRs [40,43,45,60]. These studies, together with ours, demonstrate the biological and biotechnological significance of the constituent sequences in proteins. We propose that characterizing the availability of both consecutive and gapped constituent sequences across entire databases will eventually provide us with biotechnologically practical information on sequence–structure and structure–function relations.

Rebuilding the paradigm

As noted previously, stereotypical assemblies of secondary structures often observed in protein crystal structures are called structural motifs, and they are considered to be functional units (see section “Sequence, structure, and function”). This line of thinking reflects a strong belief that particular structures are directly linked to the functions of proteins. Although this concept is well proven in many cases, it has been pointed out that parts of a protein that do not form a specific structure can play an important role in protein function [66–69,111–116]. These findings lead us to rethink the classical paradigm of protein science on sequence, structure, and function. Consequently, the presumption that protein amino acid sequences on the earth have been evolutionarily selected to form thermodynamically stable structures, as discussed in section “Necessity and chance in natural ‘protein design’” [2], may not be entirely correct.

To better understand proteins, we believe that protein amino acid sequences should be examined from several different aspects beyond the classical paradigm of protein science. The two main approaches to protein decoding, structure prediction methods and sequence similarity searches, are highly informative, but they are not sufficient to understand protein functionality entirely. Analysis of the availability of short amino acid sequences is one approach that is independent of sequence similarity searches and not entirely constrained by the classical paradigm of protein science. Another approach that is essentially free from the conventional view is to develop a method to “read” the entire sequence of a protein, without cutting the continuous sequence into pieces, and to compare it with the rest of the sequences in a database. Such a comparison of entire sequences is not possible for human eyes at all, but it is certainly possible for an artificial intelligence system. Although our availability analysis is based on the presumption that nature prefers to construct proteins using short sequences as building units, “whole sequence reading” may also be achievable, simply because proteins exist as single continuous chains.
We demonstrated that even among GPCRs, which show very limited sequence similarities and few shared sequence motifs, an artificial intelligence program called a self-organizing map can accurately identify and classify them based on the entire sequences without any alignment process [49]. It should be pointed out that other unconventional approaches that “read” the entire sequence at once are also flourishing in the GPCR field, as discussed already [39–49] (see section “Sequence, structure, and function”).

Conclusion

Focusing on short amino acid sequences, especially triplets, we discussed in this chapter the potential implications of availability in protein science and engineering. Our concept of availability is quite simple, yet we expect it to make a significant contribution to the biological sciences. It will contribute to the prediction of protein structure and function and to the clarification of the phylogenetic status of biological species. Importantly, it may also shed light on rebuilding the paradigm of protein science. Moreover, it has numerous potential biotechnological applications in the biomedical field, such as pharmaceutical seeds for new diagnostic reagents and drugs. Therefore, it is highly desirable to put serious effort into this topic to make our perspectives more realistic.

We believe that our approach is highly systematic, both philosophically and methodologically, and in this sense, it is unique. But at the same time, it is true that related, although less systematic, approaches were tried by a few groups in the 1970s and by other groups in the 2000s. Now that more sequence and structure data than ever are available in this post-genome era, together with sophisticated computer technologies, it is time to mine and polish the old and precious treasures that foresighted researchers in the 1970s had “prepared” for future researchers, probably by necessity and by chance.

Acknowledgments

We thank K. Hamano, Y. Imamura, T. Hiramura, and M. Wada for technical help. This work was partly supported by the Grant for the Advancement of Scientific Collaborations from Kanagawa University, by the Takeda Science Foundation, and by the 21st Century COE program of the University of the Ryukyus.

References

1. Novotny V, Basset Y, Miller SE, Weiblen GD, Bremer B, Cizek L and Drozd P. Low host specificity of herbivorous insects in a tropical forest. Nature 2002;416:841–844.
2. Richardson JS. Looking at proteins: representations, folding, packing, and design. Biophys J 1992;63:1186–1290.
3. Rossmann MG and Argos P. Protein folding. Annu Rev Biochem 1981;50:497–532.
4. McCarthy AD and Hardie DG. Fatty acid synthase: an example of protein evolution by gene fusion. Trends Biochem Sci 1984;9:60–63.
5. Britten RJ. Almost all human genes resulted from ancient duplication. Proc Natl Acad Sci USA 2006;103:19027–19032.
6. Anfinsen CB and Redfield RR. Protein structure in relation to function and biosynthesis. Adv Protein Chem 1956;48:1–100.
7. Anfinsen CB. Principles that govern the folding of protein chains. Science 1973;181:223–230.
8. Dobson CM and Karplus M. The fundamentals of protein folding: bringing together theory and experiment. Curr Opin Struct Biol 1999;9:92–101.
9. Georgopoulos C and Welch WJ. Role of the major heat shock proteins as molecular chaperones. Annu Rev Cell Biol 1993;9:601–634.
10. Pauling L and Corey RB. Configurations of polypeptide chains with favored orientations around single bonds: two new pleated sheets. Proc Natl Acad Sci USA 1951;37:729–740.
11. Pauling L, Corey RB and Branson HR. The structure of proteins: two hydrogen-bonded helical configurations of the polypeptide chain. Proc Natl Acad Sci USA 1951;37:205–211.
12. Chou PY and Fasman GD. Prediction of protein conformation. Biochemistry 1974;13:222–245.
13. Chou PY and Fasman GD. Prediction of the secondary structure of proteins from their amino acid sequences. Adv Enzymol Relat Areas Mol Biol 1978;47:45–148.

14. Wu TT and Kabat EA. An attempt to locate the non-helical and permissively helical sequences of proteins: application to the variable regions of immunoglobulin light and heavy chains. Proc Natl Acad Sci USA 1971;68:1501–1506.
15. Kabat EA and Wu TT. The influence of nearest-neighbor amino acids on the conformation of the middle amino acid in proteins: comparison of predicted and experimental determination of β-sheets in concanavalin A. Proc Natl Acad Sci USA 1973;70:1473–1477.
16. Wu TT and Kabat EA. An attempt to evaluate the influence of neighboring amino acid (n−1) and (n+1) on the backbone conformation of amino acid (n) in proteins: use in predicting the three-dimensional structure of the polypeptide backbone of other proteins. J Mol Biol 1973;75:13–31.
17. Garnier J, Osguthorpe DJ and Robson B. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J Mol Biol 1978;120:97–120.
18. Gibrat JF, Garnier J and Robson B. Further developments of protein secondary structure prediction using information theory. New parameters and consideration of residue pairs. J Mol Biol 1987;198:425–443.
19. Lim VI. Algorithms for prediction of α-helical and β-structural regions in globular proteins. J Mol Biol 1974;88:873–894.
20. Huang JT and Wang MT. Secondary structural wobble: the limits of protein prediction accuracy. Biochem Biophys Res Commun 2002;294:621–625.
21. Rost B. Prediction in 1D: secondary structure, membrane helices, and accessibility. In: Structural Bioinformatics, Bourne PE and Weissig H (eds), Hoboken, Wiley-Liss, 2003, pp. 559–587.
22. Altschul SF, Gish W, Miller W, Myers EW and Lipman DJ. Basic local alignment search tool. J Mol Biol 1990;215:403–410.
23. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W and Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–3402.
24. Chothia C and Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J 1986;5:823–826.
25. Wood TC and Pearson WR. Evolution of protein sequences and structures. J Mol Biol 1999;291:977–995.
26. Wilson CA, Kreychman J and Gerstein M. Assessing annotation transfer for genomics: quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores. J Mol Biol 2000;297:233–249.
27. Yang AS and Honig B. An integrated approach to the analysis and modeling of protein sequences and structures. II: On the relationship between sequence and structural similarity for proteins that are not obviously related in sequence. J Mol Biol 2000;301:679–689.
28. Wallqvist A, Fukunishi Y, Murphy LR, Fadel A and Levy RM. Iterative sequence/secondary structure search for protein homologs: comparison with amino acid sequence alignments and application to fold recognition in genome databases. Bioinformatics 2000;16:988–1002.
29. Ausiello G, Via A and Helmer-Citterich M. Query3d: a new method for high-throughput analysis of functional residues in protein structures. BMC Bioinformatics 2005;6(Suppl 4):S5.
30. Kann MG, Thiessen PA, Panchenko AR, Schäffer AA, Altschul SF and Bryant SH. A structure-based method for protein sequence alignment. Bioinformatics 2005;21:1451–1456.

31. Shmygelska A and Hoos HH. An ant colony optimization algorithm for the 2D and 3D hydrophobic polar protein folding problem. BMC Bioinformatics 2005;6:30.
32. Aydin Z, Altunbasak Y and Borodovsky M. Protein secondary structure prediction for a single-sequence using hidden semi-Markov models. BMC Bioinformatics 2006;7:178.
33. Meng EC, Pettersen EF, Couch GS, Huang CC and Ferrin TE. Tools for integrated sequence–structure analysis with UCSF Chimera. BMC Bioinformatics 2006;7:339.
34. Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006;16:374–384.
35. Bockaert J and Pin JP. Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J 1999;18:1723–1729.
36. Schöneberg T. GPCR superfamily and its structural characterization. In: Understanding G Protein-coupled Receptors and their Role in the CNS, Pangalos MN and Davies CH (eds), Oxford, Oxford University Press, 2002, pp. 3–27.
37. Kobilka BK. G protein coupled receptor structure and activation. Biochim Biophys Acta 2007;1768:794–807.
38. Yeagle PL and Albert AD. G-protein coupled receptor structure. Biochim Biophys Acta 2007;1768:808–824.
39. Baldi P and Chauvin Y. Hidden Markov Models of the G-protein-coupled receptor family. J Comput Biol 1994;1:311–336.
40. Otaki JM and Firestein S. Length analyses of mammalian G-protein-coupled receptors. J Theor Biol 2001;211:77–100.
41. Karchin R, Karplus K and Haussler D. Classifying G-protein coupled receptors with support vector machines. Bioinformatics 2002;18:147–159.
42. Lapinsh M, Gutcaits A, Prusis P, Post C, Lundstedt T and Wikberg JE. Classification of G-protein-coupled receptors by alignment-independent extraction of principal chemical properties of primary amino acid sequences. Protein Sci 2002;11:795–805.
43. Sugiyama Y, Polulyakh N and Shimizu T. Identification of transmembrane protein functions by binary topology patterns. Protein Engineering 2003;16:479–488.
44. Bhasin M and Raghava GP. GPCRpred: an SVM-based method for prediction of families and subfamilies of G-protein coupled receptors. Nucleic Acids Res 2004;32:W383–W389.
45. Inoue Y, Ikeda M and Shimizu J. Proteome-wide classification and identification of mammalian-type GPCRs by binary topology pattern. Comput Biol Chem 2004;28:39–49.
46. Huang Y, Cai J, Ji L and Li Y. Classifying G-protein coupled receptors with bagging classification tree. Comput Biol Chem 2004;28:275–280.
47. Wang YF, Chen H and Zhou YH. Prediction and classification of human G-protein coupled receptors based on support vector machines. Genomics Proteomics Bioinformatics 2005;3:242–246.
48. Wistrand M, Kall L and Sonnhammer EL. A general model of G protein-coupled receptor sequences and its application to detect remote homologs. Protein Sci 2006;15:509–521.
49. Otaki JM, Mori A, Itoh Y, Nakayama T and Yamamoto H. Alignment-free classification of G-protein-coupled receptors using self-organizing map. J Chem Inf Model 2006;46:1479–1490.
50. Buck L and Axel R. A novel multigene family may encode odorant receptors: a molecular basis for odor recognition. Cell 1991;65:175–187.
51. Zhao H, Ivic L, Otaki JM, Hashimoto M, Mikoshiba K and Firestein S. Functional expression of a mammalian odorant receptor. Science 1998;279:237–242.

52. Krautwurst D, Yau KW and Reed RR. Identification of ligands for olfactory receptors by functional expression of a receptor library. Cell 1998;95:917–926.
53. Malnic B, Hirono J, Sato T and Buck LB. Combinatorial receptor codes for odors. Cell 1999;96:713–723.
54. Touhara K, Sengoku S, Iaki K, Tsuboi A, Hirono J, Sato T, Sakano H and Haga T. Functional identification and reconstitution of an odorant receptor in single olfactory neurons. Proc Natl Acad Sci USA 1999;96:4040–4045.
55. Vosshall LB, Amrein H, Morozov PS, Rzhetsky A and Axel R. A spatial map of olfactory receptor expression on the Drosophila antenna. Cell 1999;96:725–736.
56. Clyne PJ, Warr CG, Freeman MR, Lessing D, Kim J and Carlson JR. A novel family of divergent seven-transmembrane proteins: candidate odorant receptors in Drosophila. Neuron 1999;22:327–338.
57. Gao Q and Chess A. Identification of candidate Drosophila olfactory receptors from genomic DNA sequences. Genomics 1999;60:31–39.
58. Wetzel CH, Behrendt HJ, Gisselmann G, Stortkuhl KF, Hovemenn B and Hatt H. Functional expression and characterization of a Drosophila odorant receptor in a heterologous cell system. Proc Natl Acad Sci USA 2001;98:9377–9380.
59. Stortkuhl KF and Kettler R. Functional analysis of an olfactory receptor in Drosophila melanogaster. Proc Natl Acad Sci USA 2001;98:9381–9385.
60. Otaki JM and Yamamoto H. Length analyses of Drosophila odorant receptors. J Theor Biol 2003;223:27–37.
61. Benton R, Sachse S, Michnick SW and Vosshall LB. Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo. PLoS Biol 2006;4:e20.
62. Benton R. On the ORigin of smell: odorant receptors in insects. Cell Mol Life Sci 2006;63:1579–1585.
63. Kurochkina N. Amino acid composition of parallel helix-helix interfaces. J Theor Biol 2007;247:110–121.
64. Soga S, Shirai H, Kobori M and Hirayama M. Use of amino acid composition to predict ligand-binding sites. J Chem Inf Model 2007;47:400–406.
65. Kurgan LA, Stach W and Ruan J. Novel scales based on hydrophobicity indices for secondary protein structure. J Theor Biol 2007;248:354–366.
66. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ and Dunker AK. Sequence complexity of disordered protein. Proteins 2001;42:38–48.
67. Weathers EA, Paulaitis ME, Woolf TB and Hoh JH. Insights into protein structure and function from disorder-complexity space. Proteins 2007;66:16–28.
68. Weathers EA, Paulaitis ME, Woolf TB and Hoh JH. Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein. FEBS Lett 2004;576:348–352.
69. Han P, Zhang X, Norton RS and Feng ZP. Predicting disordered regions in proteins based on decision trees of reduced amino acid composition. J Comput Biol 2006;13:1723–1734.
70. Fujishima K, Komasa M, Kitamura S, Suzuki H, Tomita M and Kanai A. Proteome-wide prediction of novel DNA/RNA-binding proteins using amino acid composition and periodicity in the hyperthermophilic archaeon Pyrococcus furiosus. DNA Res 2007;14:91–102.
71. Ma BG, Chen LL and Zhang HY. What determines protein folding types? An investigation of intrinsic structural properties and its implications for understanding folding mechanisms. J Mol Biol 2007;370:439–448.

72. Tekaia F, Yeramian E and Dujon B. Amino acid composition of genomes, lifestyles of organisms, and evolutionary trends: a global picture with correspondence analysis. Gene 2002;297:51–60.
73. Tekaia F and Yeramian E. Evolution of proteomes: fundamental signatures and global trends in amino acid compositions. BMC Genomics 2006;7:307.
74. Bogatyreva NS, Finkelstein AV and Galzitskaya OV. Trend of amino acid composition of proteins of different taxa. J Bioinform Comput Biol 2006;4:597–608.
75. Pe’er I, Felder CE, Man O, Silman I, Sussman JL and Beckmann JS. Proteomic signatures: amino acid and oligopeptide compositions differentiate among phyla. Proteins 2004;54:20–40.
76. Chothia C, Levitt M and Richardson D. Helix-to-helix packing in proteins. J Mol Biol 1981;145:215–250.
77. Chothia C. Principles that determine the structure of proteins. Annu Rev Biochem 1984;53:537–572.
78. Reichmann D, Rahat O, Cohen M, Neuvirth H and Schreiber G. The molecular architecture of protein–protein binding sites. Curr Opin Struct Biol 2007;17:67–76.
79. Matsuura T, Miyai K, Trakulnaleamsai S, Yomo T, Shima Y, Miki S, Yamamoto K and Urabe I. Evolutionary molecular engineering by random elongation mutagenesis. Nat Biotechnol 1999;17:58–61.
80. Fenton C, Hansen A and El-Gewely MR. Modulation of the Escherichia coli tryptophan repressor protein by engineered peptides. Biochem Biophys Res Commun 1998;242:71–78.
81. Cull MG, Miller JF and Schatz PJ. Screening for receptor ligands using large libraries of peptides linked to the C terminus of the lac repressor. Proc Natl Acad Sci USA 1992;89:1865–1869.
82. Otaki JM, Gotoh T and Yamamoto H. Frequency distribution of the number of amino acid triplets in the non-redundant protein database. J Jpn Soc Info Knowledge 2003;13:25–38.
83. Otaki JM, Ienaka S, Gotoh T and Yamamoto H. Availability of short amino acid sequences in proteins. Protein Sci 2005;14:617–625.
84. Karlin S, Brocchieri L, Bergman A, Mrazek J and Gentles AJ. Amino acid runs in eukaryotic proteomes and disease associations. Proc Natl Acad Sci USA 2002;99:333–338.
85. Veitia RA. Amino acid runs and genomic compositional biases in vertebrates. Genomics 2004;83:502–507.
86. Hans D, Young PR and Fairlie DP. Current status of short synthetic peptides as vaccines. Med Chem 2006;2:627–646.
87. Scott JK and Craig L. Random peptide libraries. Curr Opin Biotechnol 1994;5:40–48.
88. Kyte J and Doolittle RF. A simple method for displaying the hydropathic character of a protein. J Mol Biol 1982;157:105–132.
89. Engelman DM, Steitz TA and Goldman A. Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu Rev Biophys Biophys Chem 1986;15:321–353.
90. Mitaku S, Hirokawa T and Tsuji T. Amphiphilicity index of polar amino acids as an aid in the characterization of amino acid preference at membrane–water interfaces. Bioinformatics 2002;18:608–616.
91. Levitt M. Conformational preferences of amino acids in globular proteins. Biochemistry 1978;17:4277–4285.

92. Imai K and Mitaku S. Mechanisms of secondary structure breakers in soluble proteins. Biophysics 2005;1:55–65.
93. Ramachandran GN and Sasisekharan V. Conformation of polypeptides and proteins. Adv Protein Chem 1968;28:283–437.
94. Pu X, Guo J, Leung H and Lin Y. Prediction of membrane protein types from sequences and position-specific scoring matrices. J Theor Biol 2007;247:259–265.
95. Yang XG, Luo RY and Feng ZP. Using amino acid and peptide composition to predict membrane protein types. Biochem Biophys Res Commun 2007;353:164–169.
96. Raghava GP and Han JH. Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein. BMC Bioinformatics 2005;6:59.
97. Vries JK, Munshi R, Tobi D, Klein-Seetharaman J, Benos PV and Bahar I. A sequence alignment-independent method for protein classification. Appl Bioinformatics 2004;3:137–148.
98. Gao QB and Wang ZZ. Classification of G-protein coupled receptors at four levels. Protein Eng Des Sel 2006;19:511–516.
99. Elrod DW and Chou KC. A study on the correlation of G-protein-coupled receptor types with amino acid composition. Protein Eng 2002;15:713–715.
100. Daeyaert F, Moereels H and Lewi PJ. Classification and identification of proteins by means of common and specific amino acid n-tuples in unaligned sequences. Comput Methods Programs Biomed 1998;56:221–233.
101. Hobohm U and Sander C. A sequence property approach to searching protein databases. J Mol Biol 1995;251:390–399.
102. Li X and Kahveci T. A novel algorithm for identifying low complexity regions in a protein sequence. Bioinformatics 2006;22:2980–2987.
103. Chou K-C. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins 2001;43:246–255.
104. Chen K, Kurgan LA and Ruan J. Prediction of flexible/rigid regions from protein sequences using k-spaced amino acid pairs. BMC Struct Biol 2007;7:25.
105. Liang HK, Huang CM, Ko MT and Hwang JK. Amino acid coupling patterns in thermophilic proteins. Proteins 2005;59:58–63.
106. Lin H and Li QZ. Using pseudo amino acid composition to predict protein structural class: approached by incorporating 400 dipeptide components. J Comput Chem 2007;28:1463–1466.
107. Chen C, Tian YX, Zou XY, Cai PX and Mo JY. Using pseudo-amino acid composition and support vector machine to predict protein structural class. J Theor Biol 2006;243:444–448.
108. Zhang TL and Ding YS. Using pseudo amino acid composition and binary-tree support vector machines to predict protein structural classes. Amino Acids 2007;33:623–629.
109. Diao Y, Ma D, Wen Z, Yin J, Xiang J and Li M. Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity. Amino Acids 2007;34:111–117.
110. Rayment JH and Forsdyke DR. Amino acids as placeholders: base-composition pressures on protein length in malaria parasites and prokaryotes. Appl Bioinformatics 2005;4:117–130.
111. Dyson HJ and Wright PE. Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 2005;6:197–208.
112. Csizmók V, Dosztányi Z, Simon I and Tompa P. Towards proteomic approaches for the identification of structural disorder. Curr Protein Pept Sci 2007;8:173–179.

113. Sugase K, Dyson HJ and Wright PE. Mechanism of coupled folding and binding of an intrinsically disordered protein. Nature 2007;447:1021–1025.
114. Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, Oldfield CJ, Campen AM, Ratliff CM, Hipps KW, Ausio J, Nissen MS, Reeves R, Kang C, Kissinger CR, Bailey RW, Griswold MD, Chiu W, Garner EC and Obradovic Z. Intrinsically disordered protein. J Mol Graph Model 2001;19:26–59.
115. Wright PE and Dyson HJ. Intrinsically unstructured proteins: reassessing the protein structure–function paradigm. J Mol Biol 1999;293:321–331.
116. Dunker AK and Obradovic Z. The protein trinity–linking function and disorder. Nat Biotechnol 2001;19:805–806.


Network models in drug discovery and regenerative medicine

David A. Winkler
CSIRO Molecular and Health Technologies, Clayton 3168, Australia

Abstract. Network motifs and modelling paradigms are attracting increasing attention as modelling tools in drug design and development, and in regenerative medicine. There is a gradual but inexorable convergence between these hitherto disparate disciplines. This review summarizes some very recent work in these areas, leading to an understanding of the complementary roles networks play and factors driving this convergence:
• network paradigms can be excellent ways of modelling and understanding drug molecules and their action,
• an understanding of the robustness and vulnerabilities of biological targets may improve the efficacy of drug design and discovery,
• drug design has an increasingly large role to play in directing stem cell properties,
• stem cell regulatory networks can be modelled in useful ways using network models at a suitable level of scale, and
• the network tools of drug design are also very useful for the design of biomaterials used in regenerative medicine.

Keywords: networks, feature selection, Bayesian methods, stem cells, cancer, QSAR.

Tel.: +61-3-9545-2477. Fax: +61-3-9545-2561. E-mail: [email protected] (D.A. Winkler).
BIOTECHNOLOGY ANNUAL REVIEW, VOLUME 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00005-7
© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Theory, mathematics, and computation have played an important role in the pharmaceutical industry, and have led to the development of novel, rational approaches to the design of bioactive agents. Such methods, in concert with structural biology, informatics, and developments in high-throughput technologies, have provided useful methods for interpreting data, designing experiments, and discovering novel therapies. Theoretical and computational research has contributed to maintaining the efficiency of drug development in a research and regulatory climate where the bar is constantly being raised. Some argue that the efficiency of the pharmaceutical industry (as judged by new chemical entities, NCEs, per year) is in decline, and that the current discovery methods (computational or otherwise) have not lived up to their promise. Two key reasons have been proposed for the less than optimal drug discovery efficiency: inability to account for drug-likeness or absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties early in the hit or lead discovery phase of drug development, and failure to understand the properties of the biological networks being perturbed by the drug [1]. Neither of these is a consequence of the failure of theoretical and computational methods to aid in drug discovery and development. Rather, they are shortcomings in the way people think about the discovery and development process. This thinking is changing, and the new approaches require theoretical and computational methods to implement. This provides an opportunity for a paradigm shift in the way new drugs are developed, and for concepts in theory and computation from complex systems science and systems biology to be adapted and applied to the rapid development of in silico ADMET models, and of network theory aimed at understanding system robustness and vulnerability.

More recently, regenerative biology and medicine have started to attract a level of attention similar to that of therapeutic or prophylactic therapies. The two main elements of regenerative medicine and biology that we will consider in this review are stem cell biology and biomaterials development. Unlike pharmaceutical research, the very large amount of experimental research in regenerative medicine has not been matched by an equivalent effort in theoretical, mathematical, or computational modelling research. This is surprising given that these methods provide a sound theoretical basis for understanding, predicting, and planning research in regenerative medicine.

Although pharmaceutical and regenerative medicine approaches are currently quite disparate ways of discovering new therapies, overlap is starting to appear between the two fields, and there is a small but increasing body of evidence of convergence in the approaches. The principal concept through which both fields can be compared and reviewed is the network, either as a paradigm for understanding the behavior of systems of interacting elements, or as a robust modelling tool for design of agents and therapies. There is a rich literature on network theory as a mathematical concept, and on the behavior of biological network systems, so the intention is not to extensively review these areas.
This review will focus instead on how the concept of networks can be applied in regenerative medicine, and in therapeutic and prophylactic drug development, research areas that have been much less reviewed. We aim to provide a brief overview of how networks are applied in these important areas, and how this may lead to some convergence in thinking and in computational approaches. Regenerative medicine is a nascent research area that is likely to be very significant in the future. In general, networks are an increasingly important concept in biology. This review surveys some paradigm shifts in thinking about how networks can be used in what have until recently been the relatively disparate fields of drug design and development, and regenerative medicine. There has been a convergence of thinking about the two areas, and the common feature is the role networks play.

Networks in therapeutic and regenerative medicine

The concept of the network plays different but complementary roles in pharmaceutical and regenerative medicine research. Modelling methods that use a network paradigm have been very useful in both fields.

In drug development, neural networks have significant advantages over more traditional methods of drug design and they are being used increasingly in drug discovery [2–14]. They can build robust models relating molecular structure to drug target, ADMET, or physicochemical properties, and are useful for pattern recognition and clustering of compounds according to function. More recently, the role of biological networks in modulating the efficacy of drug intervention has been discussed [15–18]. The off-target effects of drugs have also been under more scrutiny, particularly in understanding how these influence the biological networks in which the drug target is found [19]. The question of how multiple target effects influence the pharmaceutical outcome, and whether this polypharmacy of drugs can be managed or even engineered or exploited, has been raised [19]. These questions are discussed in more depth in this review.

In regenerative medicine, networks also have important roles to play. In stem cells it is important to understand how gene regulatory networks control stem cell fate, so that these networks can be manipulated to yield expansion of stem cell compartments and directed differentiation. In biomaterials, the signalling networks involved in the communication between surfaces, or matrices, and cells need to be understood. This will provide a way of rationally designing biomaterials and bioactive surfaces with desirable biocompatibility or biomodulatory properties. Additionally, the first forays into design of biomaterials properties have involved the use of neural networks to model biomaterials properties [20].

Properties of networks

Networks consist of entities or agents (e.g., molecules, proteins, or genes), which make up the nodes or vertices of the network, and interactions (e.g., forces, binding events, or transcription factors) that provide the connections between the nodes and are also called links or edges.
Clearly almost any kind of interacting system can be described by the network paradigm. Complex systems science recognizes and exploits similarities in underlying network mechanisms, behaviors, and representations [21]. Striking similarities have been revealed between the properties of social networks, networks in ecosystems, and genetic regulatory networks, for example [22]. These similarities sometimes suggest similarities in the laws governing how the interactions produce the observed system-scale behavior (called an emergent property in complexity theory).

Networks can adopt various architectures, which are defined by the nature of the connections in the networks. The most frequent and important architecture in biological systems is the scale-free or small world network [23–26]. In these, the number of nodes with a given number of connections obeys a power law: a plot of the logarithm of the number of nodes with a given number of connections against the logarithm of that number of connections is a straight line (Fig. 1). A small number of nodes (hubs) have many connections while a much larger number have only a small number of links. Many kinds of diverse interacting systems show power law behavior of this type. The slope of the log–log plot defines the exponent for the power law relationship. There is a tendency for the set of exponents found for diverse systems to cluster into distinct classes, called "universality classes" [27]. It has been suggested that systems falling into the same universality class have similar microscopic mechanisms that are responsible for the observed scale invariant behavior.

Fig. 1. Power law behavior. In this example, the number of ligands binding one or more protein targets displays a power law relationship over several orders of magnitude.

Genetic regulatory networks are examples of biological scale-free networks. They are very robust against random damage to the network (a very desirable property for a biological regulatory network), although attacks targeting the hub genes can cause catastrophic collapse of the network.

The number of connections, or connectedness, in a network is an important network property. As the number of links in an initially unconnected network is gradually increased, the path lengths (or equivalently, the size of the giant component) in the network undergo a kind of phase transition when a critical threshold number of connections is reached [22,28]. This transition defines a kind of self-organized critical point at which disturbances in the network switch from being isolated in one part of the network to being system-spanning. This has implications for signal propagation (e.g., triggering cell fate decisions) in some biological systems.
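The power law relationship described in this section can be illustrated with a minimal sketch. It grows a network by preferential attachment, which is an assumed generative model chosen for illustration and is not taken from the studies cited here, and then estimates the slope of the log–log plot by ordinary least squares. All function names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter


def preferential_attachment(n_nodes, m=2, seed=0):
    """Grow a network in which each new node links to m existing nodes,
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # Fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in 'stubs' once per link end, so a uniform draw
    # from 'stubs' is a degree-weighted draw over nodes.
    stubs = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


def degree_histogram(edges):
    """Map each degree k to the number of nodes that have k links."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


def loglog_slope(histogram):
    """Least-squares slope of log10(node count) against log10(degree)."""
    xs = [math.log10(k) for k in histogram]
    ys = [math.log10(count) for count in histogram.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


edges = preferential_attachment(5000)
histogram = degree_histogram(edges)
slope = loglog_slope(histogram)
print(f"nodes with the minimum degree: {histogram[min(histogram)]}")
print(f"largest hub degree: {max(histogram)}")
print(f"log-log slope (estimate of minus the exponent): {slope:.2f}")
```

A straight-line fit to the raw histogram is the simplest way to read off an exponent, but it is sensitive to the sparsely populated high-degree tail; binned or maximum-likelihood estimates are usually preferred when the exponent itself matters.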

Molecules and genomes as networks

Molecules are essentially three-dimensional distributions of electron density located on a nuclear framework. In principle all physical and chemical properties can be derived from an accurate description of electron density using appropriate operators. Molecules are often described or represented as molecular graphs, and the principles of graph theory are used to derive properties [29]. They can also be considered as networks, where the vertices are atoms and the edges are bonds [30]. In this case the edges have more complex properties (related to electron density between atoms) than simple linkages imply. Unlike other networks such as gene regulatory networks, social networks, or food webs, the interactions within molecules place very strong spatial restrictions on the topology of the network or graph. Thus, molecules have relatively strict geometries (allowing, however, for conformational degrees of freedom) compared with genomes and social groups. The analysis of molecules as networks is nascent, and it is not clear whether they exhibit any power law behavior. There is extensive literature on molecules as graphs and networks, which has been reviewed recently by Garcia-Domenech et al. [31].

Salerno et al. [32] have shown that DNA sequences also exhibit power law behavior. There is substantial evidence that gene networks are also scale-free, exhibiting a power law distribution of connections. The master regulatory genes represent the highly connected hubs of the genetic network. Albert et al. [33] have examined the robustness of various types of network topology to random or deliberate attack or damage. They found scale-free networks to be surprisingly robust to unrealistically high random failure rates in the network compared with exponential networks, where most nodes have a similar number of connections.
This robustness suggests why the scale-free network has been selected for biological networks such as gene regulatory or metabolic networks. It also suggests why diseases that attack biological network hubs can often be severe. A recent example has been work on the role of biological networks in understanding ataxia. Lim and co-workers [34] showed that the hub proteins were those contributing substantially to the degeneration of the Purkinje cell and causing the clinical manifestation of ataxia. Conversely, drugs that target network hubs are likely to be more effective (for example, in killing a pathogen or pest). There are other beneficial consequences of scale-free networks in biological systems. They facilitate chemical diversity at minimal energy cost, and they recapitulate natural selection [35]. Mutations or deletions of hub genes are usually lethal, while mutations of weakly linked genes account for biological variability and natural selection. Scale-free networks can evolve rapidly toward an optimal functional state, and are robust to the consequences of many biochemical or genetic errors, unless these involve the hubs [35].
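The contrast between random failure and a targeted attack on hubs can be sketched in a few lines. The example below is illustrative only and is not the protocol of Albert et al.: it builds a scale-free network by preferential attachment (an assumed model), removes 10% of the nodes either at random or in order of decreasing degree, and compares the size of the largest connected component that survives. Function names, network size, and the removal fraction are hypothetical choices.

```python
import random
from collections import deque


def scale_free_network(n_nodes, m=2, seed=0):
    """Adjacency sets for a network grown by preferential attachment."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}
    stubs = []  # one entry per link end: degree-weighted sampling pool

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)
        stubs.extend((a, b))

    for i in range(m + 1):          # fully connected core
        for j in range(i):
            link(i, j)
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in targets:
            link(new, target)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once 'removed' nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:                # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


adj = scale_free_network(2000)
n_remove = len(adj) // 10           # delete 10% of the nodes

rng = random.Random(1)
random_failure = rng.sample(sorted(adj), n_remove)
hub_attack = sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:n_remove]

random_size = largest_component(adj, random_failure)
attack_size = largest_component(adj, hub_attack)
print(f"largest component after random failure: {random_size}")
print(f"largest component after hub attack:     {attack_size}")
```

Removing the same number of nodes in each case isolates the effect of which nodes are lost; in this toy network the hub attack leaves a smaller connected core than random failure does.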

Networks and drug design

Historically, networks have been used in drug discovery and development as tools in modelling. The field of Quantitative Structure–Activity Relationship (QSAR) modelling has been steadily evolving toward using neural networks for classification and clustering, feature selection, and the building of robust structure–activity models. Figure 2 shows the exponential growth in papers using neural networks for QSAR and drug design. We illustrate the application of neural networks to drug design in the section on network modelling tools. Recent reviews in this area also summarize the myriad ways neural networks contribute to the design and development of drugs and other bioactives such as pesticides [2–4,36,37]. Some specific examples of how neural networks are used in drug modelling are given in later sections of this review.

Fig. 2. Graph of numbers of papers using neural networks in drug design vs. time (1991–2007). (The color version of this figure is hosted on Science Direct.)

In terms of biological activity, interactions between molecules (e.g., drugs and proteins, proteins and DNA, etc.) are key, and these can often be conveniently described by networks.

Very recently, there has been a paradigm shift in thinking in pharmaceutical research toward network ideas [16,18]. It has long been recognized that almost all drugs hit multiple targets. Figure 3 illustrates the varying degrees of polypharmacology observed in clinical drugs. This has been seen as a problem, as it leads to off-target effects and toxicity. The recent change in thinking has been that this so-called polypharmacology of drugs, apart from being inevitable, is actually useful. Clearly, drugs are often designed to hit a specific target that is presumed to induce the desired phenotypic change, but it has only recently been appreciated that the robustness of biological networks is such that they will resist such perturbations and will adapt themselves to minimize the impact. There are now a number of examples of registered drugs that hit specific targets very well but show lower efficacy than expected in vivo [15]. Conversely, there are also registered drugs that hit as many as 14 targets but show very good clinical efficacy [15,18]. The properties of the biological networks are likely to be responsible for these anomalies.

Fig. 3. Cluster analysis of 2,000 compounds and drugs from the BioPrints database across 70 pharmacological assays (pIC50). Biological assays are on the x-axis and compounds are on the y-axis. Hierarchical clustering has been performed on both axes: compounds according to their fingerprint of biological activities and targets in a chemogenomic way according to the fingerprint of the activities of the compounds for each target. The activities are shown in the form of a heat map, with red most active and blue inactive. (The color version of this figure is hosted on Science Direct.)

If we can understand the robustness of biological networks implicated in a specific drug therapy, we could better understand how to tailor or engineer the polypharmacological spectrum to produce the best outcome. In the case where death of the organism (in the case of pathogens or agrochemical pests) or of a specific cell population (in tumors) is the desired outcome, it would be better to target key network hubs. Damage to these essential nodes in the network causes it to collapse. In other cases the desired outcome is manipulation of the network to restore it to near-normal operation. Understanding the robustness of the network could allow us to understand how to perturb it to produce the most efficacious outcome. The task is then to design molecules so that they hit the appropriate nodes or vertices in the network [1]. Understanding the network properties of the interactome provides a means of improving the drug design and development process, potentially increasing the efficacy of drugs and minimizing undesirable effects. It may, however, require a revision in the way drugs are assessed for registration by the FDA and other regulators.

Networks and stem cells

Systems biology studies how the components of biological systems interact with each other to produce the phenotypes observed. Not surprisingly, network models feature very strongly in systems biology. Good progress is being made in understanding gene regulatory networks in cells and organisms. Increasingly, the importance of protein–protein interactions (the interactome) is also being recognized. At another level, interactions between cell surface signals such as carbohydrates are also under intense study. The immense complexity of these networks and interactions prompts the question as to whether higher level, coarse-grained, or mesoscale descriptions of these networks are capable of representing the most important interactions that lead to the observed phenotypes. Simplistically, it might be assumed that the fractal, or scale-free, nature of many biological networks implies that higher levels of scale may be effective at capturing the essential elements of system behavior.
There is increasing evidence that such models, which are also better matched to the availability of data, can produce useful insight and predictions of system level behavior [38–41]. Clearly there is much research activity in systems biology that could be captured under the network paradigm, and it is not the intention to review this. This work will concentrate solely on applications in regenerative medicine.

Surprisingly, network and systems biology approaches to understanding and predicting the behavior of stem cells have received much less attention. One of the primary properties of stem cells that we need to understand is how they make fate decisions: whether to self-renew and increase the size of the stem cell pool, die, or differentiate to become an adult cell. If the decision is taken to differentiate, which differentiation pathway should be followed? An understanding of what drives fate decisions, leading to a rational means of controlling stem cell fate, would allow us to expand the pool of transplantable stem cells, for bone marrow transplants for example, dramatically increasing the success of this therapy. Controlling differentiation would lead to methods for replacing or regenerating tissues damaged by disease or accident [42].

There are three main modelling approaches that have been applied to stem cells. Most frequently, dynamical systems models, which employ differential equations, are used to describe the interactions of stem cell intrinsic factors. Agent-based methods have been used by a very small number of groups. Network approaches most naturally fit the intrinsic and extrinsic interactions within stem cells but, surprisingly, have not been very widely used so far.

Networks and biomaterials

Modelling of biomaterials is nascent, but clearly important. Limited work has been reported that uses neural network-based QSAR modelling methods and molecular dynamics calculations to describe, explain, and predict biomaterials properties. There is great scope for this important area of modelling to be expanded [43]. Thus far, network paradigms have not been applied to biomaterials modelling, with the exception of neural network quantitative structure–property relationship (QSPR) modelling [125]. As understanding of how biomaterials interact with cells and tissues grows, it is clear that networks will play a role in modelling these interactions.

Network modelling approaches and tools

As discussed previously, the interactions within even the simplest organisms are extremely complex. The networks that these interactions generate have not been properly characterized for even the simplest model organisms. Most progress has been made with yeast, where the entire interactome has been elucidated [44,45].
Lack of data and information for biological networks, and the great complexity of the resulting networks even if all components and interactions could eventually be identified and quantified, suggest that mesoscale models are required. The most important questions to answer are whether mesoscale models can provide useful descriptions and predictions of biological systems, and how the appropriate level of scale can be identified. Increasingly, stem cell biologists report that small numbers of transcription factors seem to play a major role in the fate decisions in both embryonic and adult stem cells [46–52]. For example, in mouse embryonic stem cells (ESCs), the intrinsic factors Oct4, Sox2, and Nanog play that role [49,51–55]. Extrinsic regulatory factors also play key roles in maintaining pluripotency. In mouse ESCs the key factors that have been identified are LIF, BMP, and Wnt [51].

In human hematopoietic stem cells, the important factors include Notch 1, BMP-1, and the cell cycle inhibitors p21 and p27 [56]. These master regulatory factors usually control many genes in the regulatory networks of stem cells, and often co-regulate each other [57]. Preliminary network modelling experiments have shown that mesoscale models of genetic regulatory networks in cells can indeed make useful predictions. Hart et al. [58] showed that relatively complex neural network models using 204 transcription factors from chromatin immunoprecipitation (ChIP) experiments can accurately recapitulate the temporal variation of gene expression in the yeast cell cycle. Interestingly, when they progressively eliminated the transcription factors, starting from those least relevant to the model, they found that as few as five transcription factors could essentially reproduce the temporal variation in the genes (see Fig. 4). Winkler et al. [59,60] similarly found that five or fewer master regulatory genes were sufficient to accurately predict gene expression profiles in embryogenesis in the model organism C. elegans, and in the mouse. Although these studies did not directly involve stem cells, they described mesoscale modelling of regulatory networks in embryogenesis pathways closely related to stem cell differentiation pathways, and it is reasonable to conclude that stem cell differentiation pathways will also yield to mesoscale modelling.

Fig. 4. Performance of neural network models of the temporal variation of gene expression in yeast as a function of the number of transcription factors used in the model. Note the collapse of the model with fewer than five transcription factors out of the original 204.

These examples will be described in more detail in the section on recursive neural network models.

Mesoscale network models of biological systems will be useful in describing, predicting, and ultimately controlling the properties of biological systems. Of most relevance to this review is their ability to model stem cell fate decisions for regenerative medicine, and to build mesoscale models of key regulatory networks for drug design exploiting polypharmacology. Whether in drug design for therapy, or in control of stem cell fate and design of biomaterials for regenerative medicine, methods for finding the appropriate level of scale are very important. The synthesis of a mesoscale model requires that the key controllers of network function be identified. In cellular networks these are assumed to represent the highly connected hubs in the network. Feature selection methods are crucial in identifying these controllers, and additionally often allow identification of genes controlled by them.

An analogous situation applies in drug modelling using QSAR. It is not widely appreciated, but QSAR models can be considered mesoscale models of very complex structure–function relationships in biological systems. The models consist of a relatively small number of key molecular descriptors that control or modulate the observed, emergent, biological response. The biological system that links the interaction of the putative drug molecule with the biological response is almost always extremely complex, nonlinear, and unknown.

Many ways of identifying important features or interactions in systems have been identified, and extended discussion of these is beyond the scope of this review. The reader is directed to recent reviews of feature selection in bioinformatics and drug design for further information [61–63]. We will, however, focus on some new feature selection methods that perform this function very efficiently, objectively, and in a supervised manner.
To summarize, in drug design by QSAR or biomaterials design by QSPR, an important step in the modelling process is to identify a small subset of molecular descriptors that will give the best model. It is important to do this in a way that will not generate chance correlations (which produce apparently good models with little or no predictive power) [64]. Analysis of gene expression data from microarrays also involves a selection or filtering process to identify those genes with highest expression, or largest relative expression. Often this selection is based on filters that select genes that are expressed above a certain level, or that are differentially over- or underexpressed according to some differential expression threshold. The improved feature selection methods discussed below are likely to yield more relevant genes (those that might be used in mesoscale models and which represent important interactions in the regulatory network) for the experiment under study, selected in a more objective and supervised (context dependent) manner.

Feature selection

The aim of feature selection is to select a relatively small set of features (e.g., molecular descriptors, or genes) that is most relevant to the biological property being modelled. In drug design it is possible to generate literally thousands of molecular descriptors using methods such as Comparative Molecular Field Analysis (CoMFA) [65] or DRAGON [66], or by quantum mechanical calculations. In regenerative medicine, array technologies also generate very large data sets with tens of thousands of independent variables such as gene expression levels. Modelling problems such as these, where the number of independent variables (descriptors) greatly exceeds the number of observations (biological responses), are known as grossly underdetermined systems [67]. It is essential to prune these large feature spaces to a small number of relevant features.

There are sound reasons for carrying out feature selection. If too many features are present, the model will be overfitted, and many equally feasible models can be obtained from the data. It is also well known in statistical theory that sparser models (models built from fewer features) are better at generalizing (predicting) than less sparse models. One intuitively obvious reason why this occurs is that extra features in the model that do not contain much information, or that contain similar information to that in other features (i.e., are correlated with them), simply add noise to the model that obscures the signal and degrades the model's predictive power. It is also important how the small, relevant feature sets are selected from the large pool of possibilities. As alluded to earlier, if this is not done carefully it is possible to introduce statistical artifacts like chance correlations. Finally, by reducing the size of feature space, and ensuring only relevant features are present, interpretation of the resulting model is often possible.
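The risk of chance correlations in a grossly underdetermined problem can be made concrete with a small sketch. It is an illustration under stated assumptions rather than a method from the cited work: the response and every descriptor are pure Gaussian noise, so any correlation found is spurious by construction. The function names, sample size, and descriptor counts are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_chance_correlation(n_samples, n_descriptors, seed=0):
    """Largest |r| between a random response and any of n_descriptors
    random descriptors; none of them carries real information."""
    rng = random.Random(seed)
    response = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_descriptors):
        descriptor = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(descriptor, response)))
    return best


# Ten "compounds", and a growing pool of meaningless descriptors.
for n_descriptors in (5, 50, 500, 5000):
    r = best_chance_correlation(10, n_descriptors)
    print(f"{n_descriptors:>5} descriptors: best |r| = {r:.2f}")
```

Because the same seed is used for every pool size, each larger pool contains the smaller ones, so the best spurious correlation can only grow as more descriptors are screened. A descriptor selected this way would look predictive on the training data and fail on new compounds.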
Recent developments in the mathematics of grossly underdetermined systems have yielded novel sparse Bayesian feature selection methods that hold great promise for the types of feature selection problems encountered in modelling for therapeutic and regenerative medicine. Figueiredo reported that an expectation maximization (EM) algorithm using a sparse (noninformative) prior could provide a parameter-free adaptive sparseness methodology [68]. In Figueiredo's method, irrelevant parameters are automatically set exactly to zero. Unlike other ways of obtaining sparse classifiers (e.g., support vector machines, SVMs), which involve (hyper)parameters that control the degree of sparseness of the resulting classifiers, Figueiredo's approach does not involve any (hyper)parameters to be adjusted or estimated. This is achieved by the adoption of a Jeffreys' noninformative hyperprior. He showed that this approach yields state-of-the-art performance that outperforms SVMs, and performs competitively with the best alternative techniques. The method has been applied to supervised variable selection in QSAR with very encouraging results.

QSAR

Variable selection in QSAR commonly employs unsupervised (context independent) clustering, nonlinear mapping, or dimensional reduction methods. Supervised methods, in which the context of the experiment is used in the clustering or selection process, are employed less frequently but are potentially more valuable. Methods such as genetic algorithms, principal components analysis, stepwise regression, and unsupervised and supervised clustering are commonly employed in QSAR and drug design for this purpose [61–63].

Clustering and classification of molecules has also been achieved by Kohonen networks or self-organizing maps (SOMs) [69,70]. Gasteiger's group has led the application of these novel clustering methods in drug design, and has published extensively in this area [11–13,71–74]. SOMs provide a very good, nonlinear, and unsupervised way of clustering compounds according to properties, such as mode of action or clinical use (reference NCI database). Figure 5 shows how a SOM, with suitable descriptors, is able to cluster a series of molecules according to their modes of action [5].

Similar clusters have also been visualized and analyzed as networks by Yildirim et al. [18]. They developed networks based on the connectivity of drugs and drug targets that produced clusters that were functionally similar to those produced by clustering methods like SOM. They built a drug–target (DT) network based on clinical drugs that shared a common biological target. An example of such a network is illustrated in Fig. 6. They also generated target–protein networks in which two proteins are connected if they are acted on by the same drug.

Microarrays/hub genes/diagnostics and classification

Microarrays are one of the most useful sources of data for network models.
cDNA arrays allow the genome of biological systems to be probed, and changes in response to specific conditions to be mapped. The types of arrays available are growing, and now include proteins (proteomics), lipids (lipidomics), sugars (glycomics), other metabolites (metabolomics), and kinase chips [75]. These techniques generate very large data sets, often containing tens of thousands of data points corresponding to the levels of gene expression, proteins, metabolites, etc. Methods for analysis of these data sets are often relatively simple, amounting to the application of filters for the absolute level of gene expression, and the level of differential gene expression, for example. Features that are selected by such filters are ranked and inferences made based on this ranking.
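The simple filter-and-rank analysis described above can be sketched as follows. This is a minimal illustration, not a published pipeline: the gene names, expression values, and both thresholds are hypothetical, and expression levels are assumed to be strictly positive so that the ratio is defined.

```python
import math


def filter_and_rank(expression, min_level=100.0, min_abs_log2_fc=1.0):
    """Keep genes that pass an absolute-expression filter and a
    differential-expression filter, ranked by |log2 fold change|.

    expression maps gene name -> (control level, treated level).
    """
    selected = []
    for gene, (control, treated) in expression.items():
        if max(control, treated) < min_level:
            continue  # too weakly expressed in both conditions
        log2_fc = math.log2(treated / control)
        if abs(log2_fc) >= min_abs_log2_fc:
            selected.append((gene, log2_fc))
    return sorted(selected, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression levels (arbitrary units).
expression = {
    "gene_a": (120.0, 960.0),   # strongly up-regulated
    "gene_b": (500.0, 550.0),   # expressed, but barely changed
    "gene_c": (30.0, 90.0),     # changed, but below the expression filter
    "gene_d": (800.0, 200.0),   # down-regulated
    "gene_e": (150.0, 300.0),   # up-regulated twofold
}

for gene, log2_fc in filter_and_rank(expression):
    print(f"{gene}: log2 fold change = {log2_fc:+.1f}")
```

Filters of this kind treat each gene independently and use fixed cut-offs, which is why they are context independent: a gene such as gene_c is discarded regardless of whether it matters for the property being modelled.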


Fig. 5. Classification of a dataset into four different classes with a Kohonen neural network having a rectangular topology. Neurons are colored as follows: red, 5-HT1a-receptor agonists; orange, H2-receptor antagonists; yellow, MAOA inhibitors; green, thrombin inhibitors; black, conflict neurons; and white, unoccupied neurons. (The color version of this figure is hosted on Science Direct.)
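To make the Kohonen map of Fig. 5 more concrete, the following is a minimal sketch of a rectangular self-organizing map trained on synthetic data. It is not the implementation used in the cited work: the function names `train_som` and `best_matching_unit`, the decay schedules, the grid size, and the two artificial "classes" of three-descriptor compounds are all assumptions chosen for illustration.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    d = ((weights - x) ** 2).sum(axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular Kohonen map.

    Returns the weight array of shape (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, used by the neighbourhood function.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5   # neighbourhood radius shrinks
        x = data[rng.integers(len(data))]     # pick one training sample
        bmu = best_matching_unit(weights, x)
        # Gaussian neighbourhood centred on the best-matching unit.
        dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        # Pull each neuron towards the sample, weighted by the neighbourhood.
        weights += lr * h[..., None] * (x - weights)
    return weights

# Two synthetic, well-separated classes described by three descriptors each.
rng = np.random.default_rng(1)
class_a = rng.normal(0.2, 0.03, size=(30, 3))
class_b = rng.normal(0.8, 0.03, size=(30, 3))
weights = train_som(np.vstack([class_a, class_b]))

# Neurons occupied by each class; a neuron claimed by both would be a "conflict neuron".
bmus_a = {best_matching_unit(weights, x) for x in class_a}
bmus_b = {best_matching_unit(weights, x) for x in class_b}
print("conflict neurons:", bmus_a & bmus_b)
```

Training is unsupervised: the class labels are used only afterwards, to colour the occupied neurons in the way the caption of Fig. 5 describes.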

Regardless of the types of interactions probed by the arrays, the problem is similar to that in QSAR, where the number of independent variables (gene expression levels) is much larger than the number of observations (arrays or conditions). Sparse Bayesian feature selection methods show considerable promise in identifying key, relevant genes in networks. For example, sparse feature selection methods have been used to identify classifier genes in a set of microarrays from four small, round, blue-cell tumors [76]. The method was able to find a small set of genes that could classify tissue samples in a test set accurately, and could discriminate between these tumors, other tumors not of that type, and non-tumor tissue [77]. Several of the classifier genes selected by the feature selection method were known to

Fig. 6. Drug–target network (DT network) [18]. The DT network is generated by using the known associations between FDA-approved drugs and their target proteins. Circles and rectangles correspond to drugs and target proteins, respectively. A link is placed between a drug node and a target node if the protein is a known target of that drug. The area of the drug (protein) node is proportional to the number of targets that the drug has (the number of drugs targeting the protein). Color codes are given in the legend. (The color version of this figure is hosted on Science Direct.)

be implicated in the tumors in the model. The method has been applied to other microarray analysis problems that are more relevant to regenerative medicine. Laslett et al. [78] reported a study of the very early stages of differentiation in embryonic stem cells. We employed the sparse feature selection methods on microarray data for four populations of embryonic stem cells with varying degrees of pluripotency. A small group of classifier genes was found, and these were sufficient to predict, with a high degree of confidence, the state of pluripotency of the stem cells. Several of the genes selected were known to be important for early differentiation, while others had not previously been associated with embryonic stem cell pluripotency. The key genes identified using these selection methods can be employed in mesoscale network models of regulatory networks as described below. They also represent an objective way of finding a suitable level of scale for network models.

Network inference and analysis

Much of systems biology is aimed at understanding how various biological interaction networks are organized in organisms. Most of this work involves carefully designed experiments and models of specific parts of networks, and far fewer studies attempt to tackle larger scale modelling of these interaction networks. Microarray technologies clearly have provided methods for probing these complex interactions, providing a snapshot of a system at a particular instant of time, under specific conditions. A number of computational methods for using array data to infer networks have been devised, and a complete analysis of their advantages and disadvantages is beyond the scope of this chapter. One of the most powerful methods is the Bayesian belief net [79–82].
A Bayesian network is a graphical structure in which the nodes represent genes and their expression levels, the edges indicate interactions between genes (with conditional dependence relations), and which contains a suite of conditional probability distributions that jointly specify a distribution over a set of genes. A Bayesian network offers a simple and unique way of expanding the joint probability in terms of simpler conditional probabilities. An advantage of Bayesian networks is that a complex system of interacting genes can be visualized as being composed of simpler subsystems of genes, making interpretation and visualization clearer [81].

Boolean networks

In its simplest form, a regulatory network in a stem cell (of interest to regenerative medicine) can be represented by a Boolean network [83–86]. Boolean networks consist of nodes or vertices (that represent genes or families of genes), and edges (that represent interactions between nodes, e.g.

transcription factors). The interactions between nodes are binary, being 1 when present and 0 when absent. The nodes contain rules that fire when specific Boolean expressions are satisfied. In the case of a gene with three inputs, the rule may state, for example, that the gene will be activated if inputs 1 and 2 are on but input 3 is off. The behavior of a Boolean network is deterministic, as the state of each gene changes according to its rule (truth table) each time the state of the network is updated. The entire state space of the network (the total of all possible states the system can adopt) can be calculated, and trajectories from any given starting point in the network result in the network reaching stationary or repeating patterns of activity called point or cyclic attractors. The region of state space from which a given attractor is inevitably reached is called the basin of attraction of that attractor. Kauffman has drawn useful analogies between the behavior of Boolean networks and the behavior of genes. He equated the attractors with unique cell states or fates in a biological system [85,86].

Recursive networks

There are two extremes in designing network models of regulatory processes. One can start with dense expression data (e.g., cDNA microarray data) and use inference methods (such as Bayesian belief nets [87–89]) to find networks consistent with the expression data. Alternatively, a large hypothetical network could be defined and then pruned or annotated using experimental data. A third alternative, which exploits mesoscale modelling, involves the use of a recursive neural network. This was first described in the context of modelling regulatory processes by Geard and Wiles [90]. They developed a small network that modelled hypothetical gene regulation and expression in the model organism, the worm C. elegans.
This proof-of-concept study showed that such models were capable of recapitulating expression levels in hypothetical genes. The method was extended and applied to real gene expression data by Winkler et al. [60], who showed that such models could explain the observed discretized gene expression levels in the same subset of cells involved in C. elegans embryogenesis with considerable fidelity. The method was able to predict whether specific genes were activated or not, with an error rate of less than 1%. The model was also capable of making predictions of gene expression in cells not used in developing the model, although the error rate in these predictions was considerably higher. Recursive neural networks were subsequently applied to quantitative and continuous experimental gene expression levels (as opposed to the binary or discretized levels used in the C. elegans model) of a much larger number of genes in the mouse embryogenesis pathway (Fig. 7), using the expression data from Hamatani et al. [91,92]. The recursive neural network models were mesoscale, requiring only 2 or 3 master


Fig. 7. Mouse embryogenesis pathway between fertilized egg (zygote) and blastocyst. (The color version of this figure is hosted on Science Direct.)

regulatory genes to accurately recapitulate the quantitative expression of over 130 genes whose expression was determined by microarray analysis, as Fig. 8 shows. Moreover, the model was able to predict, with a surprisingly high level of fidelity, expression levels in cells in the embryogenesis pathway that were not used in developing the model [59]. The accuracy of the model for the expression of 135 mid pre-implantation genes from the experiments of Hamatani et al. is illustrated in Fig. 8. The predictions of quantitative expression of these genes in the final three cells in the embryogenesis pathway, based on a model trained on the expression levels in the first four cells in the pathway, are shown in Fig. 9. Clearly the modelling method is capable of making accurate predictions about the level of expression of genes in the pathway, these genes effectively identifying the phenotype of the cell. It is being used to develop regulatory network models for stem cell differentiation. There is ample evidence that the properties of stem cell and other systems can often be modelled using a relatively small number of regulatory factors, thus providing a justification for using coarse-grained or mesoscale models. Hart and co-workers conducted an elegant study in which they used a neural network to model the temporal variation of gene expression in yeast through two cell cycles. Their model used the levels of 204 transcription factors as input. They found that the model performed well as the number of transcription factors was progressively reduced from 204 to 5, at which point the model collapsed [93].
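The Boolean network description earlier in this section (a three-input rule, deterministic updates, point and cyclic attractors, and their basins) can be made concrete with a small sketch. The three-gene network below is purely hypothetical: the genes A, B, and C, their rules, and the function names are assumptions chosen so that the whole state space of eight states can be enumerated by hand. Gene C carries the kind of rule mentioned in the text, being switched on only if inputs A and B are on and its third input (here C itself) is off.

```python
from itertools import product

def step(state):
    """Synchronously update a hypothetical three-gene Boolean network (A, B, C)."""
    a, b, c = state
    return (
        b,                        # A takes the current value of B
        a,                        # B takes the current value of A
        int(a and b and not c),   # C is on only if A and B are on and C is off
    )

def find_attractor(state):
    """Follow the trajectory from `state` until a state repeats.

    Returns the attractor as a sorted tuple of its states, so that the same
    attractor has the same representation whatever the starting point.
    """
    seen = []
    while state not in seen:
        seen.append(state)
        state = step(state)
    cycle = seen[seen.index(state):]
    return tuple(sorted(cycle))

def basins():
    """Map each attractor to the set of states that inevitably reach it."""
    result = {}
    for state in product((0, 1), repeat=3):   # enumerate the entire state space
        result.setdefault(find_attractor(state), set()).add(state)
    return result

for attractor, basin in basins().items():
    kind = "point" if len(attractor) == 1 else "cyclic"
    print(kind, "attractor", attractor, "basin size", len(basin))
```

Exhaustive enumeration is feasible here only because the state space has 2^3 states; for n genes it has 2^n states, which is one motivation for the coarse-grained models discussed above.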

Application of neural networks

QSAR has exploited the abilities of neural networks to carry out model-free, robust, nonlinear regression and classification for a considerable period of time. The use of neural networks in QSAR has been reviewed recently [2–4,7,8,36,94]. Neural networks have substantial advantages as pattern recognition and modelling tools for regenerative medicine and biomaterials, although this is only just beginning to be exploited. The most commonly used neural network, the backpropagation neural network, is capable, in principle,


Fig. 8. Ability of a mesoscale recursive neural network model to recapitulate the quantitative expression levels of 135 mid pre-implantation genes. Three regulatory genes were used in the model. (The color version of this figure is hosted on Science Direct.)

of modelling any complex nonlinear relationship between molecular properties and biological properties, given sufficient data. Backpropagation neural networks provide an objective means of finding the level of nonlinearity in the relationship between structure and activity, and they can be readily trained to produce very good models. They do, however, have some disadvantages, the most prominent being that they can be overtrained (and become less able to generalize to new data), and finding the optimum network architecture and validating predictions can be very laborious. Overtraining results in an overly complex model that fits the noise as well as the underlying relationship in the data. Cross validation methods,


Fig. 9. Ability of a mesoscale recursive neural network to predict the quantitative expression levels of 135 genes in the 8-cell stage, morula, and blastocyst using a model derived from the first four cell divisions only. (The color version of this figure is hosted on Science Direct.)

in which each compound is left out of the model in turn and has its value predicted by the model, require a computational effort that becomes increasingly onerous as larger data sets are modelled. For example, Ford et al. [95] used a consensus of 1,000 backpropagation neural network models to predict the activity of kinase ligands. In cancer diagnosis using microarray data, Khan et al. [76] used a consensus of predictions of 3,750 backpropagation neural networks to produce a model that could distinguish between four types of small round blue-cell tumors based on microarray expression data. The shortcomings of standard backpropagation neural networks can be almost completely overcome by using a Bayesian regularized neural

network [96,97]. Employing a Bayesian regularizer allows the network to find the best balance between bias (the model is too simple to explain the data) and variance (the model is too complex and fits the noise, compromising its ability to predict new data). Such networks are trained until a maximum in the statistical evidence for the model is reached. These models are also substantially independent of the architecture of the network and do not need validation sets to stop the training, or to ensure maximum predictivity.

Drug design

Backpropagation neural networks have been applied widely in drug design and development, and it is beyond the scope of this review to discuss these. The reader is directed to recent reviews of the application of neural networks to drug design [4,7,10,98–105]. Bayesian regularized neural networks have been applied to a diverse range of modelling problems, such as drug efficacy modelling, blood–brain barrier partitioning, intestinal absorption, and MHC class II binding [4,37,97,106–118]. Their use in drug discovery and development has been reviewed recently [2,119]. Neural networks are being adopted in drug discovery at an exponentially increasing rate, as the number of publications using this technique illustrates (Fig. 2).

Regenerative medicine

There has been essentially no work published that describes the application of neural networks to stem cell research. There is clear scope for these methods to be used to build models describing the relationships between experimental variables and measured outcomes. Our group is starting to work in this space, applying Bayesian regularized neural networks to modelling stem cell behavior. The application of neural network models to biomaterials is nascent, with only a handful of publications describing simple applications, and the majority of the work being carried out at Rutgers University [20,120–126].
Most of the methods used in biomaterials modelling have been taken directly from neural network QSAR studies in drug design.

Convergence

It is clear that the network paradigm and network modelling methods are likely to be very important in regenerative medicine and biology. The ways in which networks have been employed in regenerative medicine have historically been distinct from those in therapeutic medicine. However, some overlap is starting to appear. With the interest in understanding the robustness and vulnerabilities of biological networks to better design and target drugs, there is a suggestion of convergence between these fields. Similarly, the emergence of work on applying neural networks and

graph theory to problems in regenerative materials and biomaterials design suggests a driving force for convergence from the opposite direction. If the fields of regenerative and therapeutic medicine/biology can learn from each other, substantial progress can be made in the very interesting, and sparsely populated, area of research overlap that network paradigms occupy.

References

1. Korcsmaros T, Szalay MS, Bode C, Kovacs IA and Csermely P. How to design multitarget drugs: target search options in cellular networks. Expert Opin Drug Discov 2007;2:1–10.
2. Winkler DA and Burden FR. Bayesian neural networks for modelling in drug discovery (invited review). Biosilico 2004;2:104–111.
3. Winkler DA. Neural networks in ADME and toxicity prediction. Drugs Future 2004;29(10):1043–1057.
4. Winkler DA. Neural networks as robust tools in drug lead discovery and development. Mol Biotechnol 2004;27(2):139–167.
5. Terfloth L and Gasteiger J. Neural networks and genetic algorithms in drug design. Drug Discov Today 2001;6(15):S102–S108.
6. Sardari S and Sardari D. Applications of artificial neural network in AIDS research and therapy. Curr Pharm Des 2002;8(8):659–670.
7. Manallack DT and Livingstone DJ. Neural networks in drug discovery: have they lived up to their promise? Eur J Med Chem 1999;34(3):195–208.
8. Livingstone DJ, Manallack DT and Tetko IV. Data modelling with neural networks: advantages and limitations. J Comput Aided Mol Des 1997;11(2):135–142.
9. Aoyama T and Ichikawa H. Neural networks applied to pharmaceutical problems. 5. Obtaining the correlation indexes between drug activity and structural parameters using a neural network. Chem Pharm Bull (Tokyo) 1991;39(2):372–378.
10. Gasteiger J, Teckentrup A, Terfloth L and Spycher S. Neural networks as data mining tools in drug design. J Phys Org Chem 2003;16(4):232–245.
11. Gasteiger J and Li X. The Mapping of Molecular Electrostatic Potentials by Kohonen Networks Investigations of Neurotransmitters and their Agonists. In: Software Development in Chemistry, GDCH, Frankfurt/Main, 1993, Vol. 7, pp. 1–14.
12. Schneider G. Neural networks are useful tools for drug design. Neur Net 2000;13:13–15.
13. Anzali S, Gasteiger J, Holzgrabe U, Polanski J, Sadowski J, Teckentrup A and Wagener M. The use of self-organizing neural networks in drug design. Perspect Drug Discov Des 1998;9–11:273–299.
14. van Osdol WW, Myers TG and Weinstein JN. Neural network techniques for informatics of cancer drug discovery. Numer Comp Methods, Part C 2000;321:369–395.
15. Hopkins AL. Network pharmacology. Nat Biotechnol 2007;25(10):1110–1111.
16. Kitano H. A robustness-based approach to systems-oriented drug design. Nat Rev Drug Discov 2007;6:202–210.
17. Sharom JR, Bellows DS and Tyers M. From large networks to small molecules. Curr Opin Chem Biol 2004;8:81–90.
18. Yildirim MA, Goh KI, Cusick ME, Barabasi AL and Vidal M. Drug–target network. Nat Biotechnol 2007;25(10):1119–1126.

19. Hopkins AL, Mason JS and Overington JP. Can we rationally design promiscuous drugs? Curr Opin Struct Biol 2006;16(1):127–136.
20. Smith JR, Kholodovych V, Knight D, Welsh WJ and Kohn J. QSAR models for the analysis of bioresponse data from combinatorial libraries of biomaterials. QSAR Comb Sci 2005;24(1):99–113.
21. Finnigan J. The science of complex systems. Aust Sci 2005;June:32–34.
22. Strogatz SH. Exploring complex networks. Nature 2001;410:268–276.
23. Wagner A and Fell DA. The small world inside large metabolic networks. Proc R Soc Lond B 2001;268:1803–1810.
24. Arita M. Scale-freeness and biological networks. J Biochem 2005;138:1–4.
25. Avnir D, Bihan O and Malcai O. Is the geometry of nature fractal? Nature 1998;279(5347):39–40.
26. de Arcangelis L and Herrmann HJ. Self-organized criticality on small world networks. Physica A 2002;308(1–4):545–549.
27. Stanley HE, Amaral LAN, Gopikrishnan P, Ivanov PC, Keitt TH and Plerou V. Scale invariance and universality: organizing principles in complex systems. Physica A 2000;281:60–68.
28. Newman MEJ. The structure and function of complex networks. SIAM Rev 2003;45(2):167–256.
29. Bemis GW and Murcko MA. The properties of known drugs. 1. Molecular frameworks. J Med Chem 1996;39(15):2887–2893.
30. Zhang Z and Grigorov MG. Similarity networks of protein binding sites. Proteins 2006;62:470–478.
31. Garcia-Domenech R, Galvez J, de Julian-Ortiz JV and Pogliani L. Some new trends in chemical graph theory. Chem Rev 2008;108(3):1127–1169.
32. Salerno W, Havlak P and Miller J. Scale-invariant structure of strongly conserved sequence in genomic intersections and alignments. Proc Natl Acad Sci USA 2006;103(35):13121–13125.
33. Albert R, Jeong H and Barabási A-L. Error and attack tolerance of complex networks. Nature 2000;406:378–382.
34. Lim J, Hao T, Shaw C, Patel AJ, Szabó G, Rual J-F, Fisk J, Li N, Smolyar A, Hill DE, Barabási A-L, Vidal M and Zoghbi HY. A protein–protein interaction network for human inherited ataxias and disorders of purkinje cell degeneration. Cell 2006;125:801–814.
35. Loscalzo J, Kohane I and Barabasi A. Human disease classification in the postgenomic era: a complex systems approach to human pathobiology. Mol Syst Biol 2007;3:124.
36. Winkler DA and Burden FR. Application of neural networks to large dataset QSAR, virtual screening and library design (invited review). In: Combinatorial Chemistry Methods and Protocols, Bellavance-English L (ed), Totowa, New Jersey, Humana Press, 2002. ISBN 0-89603-980-3.
37. Winkler DA and Burden FR. Robust QSAR models from novel descriptors and Bayesian Regularized Neural Networks. Mol Simul 2000;24(4–6):243–258.
38. Barrett CL, Herring CD, Reed JL and Palsson BO. The global transcriptional regulatory network for metabolism in Escherichia coli exhibits few dominant functional states. Proc Natl Acad Sci USA 2005;102(52):19103–19108.
39. Csete M and Doyle J. Bow ties, metabolism and disease. Trends Biotechnol 2004;22(9):446–450.
40. Csete ME and Doyle JC. Reverse engineering of biological complexity. Science 2002;295(5560):1664–1669.

41. Israeli N and Goldenfeld N. Coarse-graining of cellular automata, emergence, and the predictability of complex systems. Phys Rev E 2006;73:026203-1–026203-17.
42. Lovell-Badge R. Overview: The future for stem cell research. Nature 2001;414(1 November):88–91.
43. Kohn J, Welsh WJ and Knight D. A new approach to the rationale discovery of polymeric biomaterials. Biomaterials 2007;28(29):4171–4177.
44. Ge H, Liu Z, Church GM and Vidal M. Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 2001;29:482–486.
45. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M and Sakaki Y. A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001;98:4569–4574.
46. Chew JL, Loh Y-H, Zhang W, Chen X, Tam W-L, Yeap L-S, Li P, Ang Y-S, Lim B, Robson P and Ng H-H. Reciprocal transcriptional regulation of Pou5f1 and Sox2 via the Oct4/Sox2 complex in embryonic stem cells. Mol Cell Biol 2005;25(14):6031–6046.
47. Zhu J and Emerson SG. Hematopoietic cytokines, transcription factors and lineage commitment. Oncogene 2002;21:3295–3313.
48. Gangenahalli GU, Gupta P, Saluja D, Verma YK, Kishore V, Chandra R, Sharma RK and Ravindranath T. Stem cell fate specification: role of master regulatory switch transcription factor PU.1 in differential hematopoiesis. Stem Cells Dev 2005;14:140–152.
49. Orkin SH. Chipping away at the embryonic stem cell network. Cell 2005;122(6):828–830.
50. Rothenberg EV, Telfer JC and Anderson MK. Transcriptional regulation of lymphocyte lineage commitment. BioEssays 1999;21(9):726–742.
51. Pan GJ and Thomson JA. Nanog and transcriptional networks in embryonic stem cell pluripotency. Cell Res 2007;17(1):42–49.
52. Wang ZX, The CH-L, Kueh JLL, Lufkin T, Robson P and Stanton LW. Oct4 and sox2 directly regulate expression of another pluripotency transcription factor, Zfp206, in embryonic stem cells. J Biol Chem 2007;282(17):12822–12830.
53. Rodda DJ, Chew J-L, Lim L-H, Loh Y-H, Wang B, Ng H-H and Robson P. Transcriptional regulation of Nanog by Oct4 and Sox2. J Biol Chem 2005;280(26):24731–24737.
54. Player A, Wang Y, Bhattacharya B, Rao M, Puri RK and Kawasaki ES. Comparisons between transcriptional regulation and RNA expression in human embryonic stem cell lines. Stem Cells Dev 2006;15(3):315–323.
55. Sun Y, Li H, Yang H, Rao MS and Zhan M. Mechanisms controlling embryonic stem cell self-renewal and differentiation. Crit Rev Eukaryot Gene Expr 2006;16(3):211–231.
56. Stein MI, Zhu J and Emerson SG. Molecular pathways regulating the self-renewal of hematopoietic stem cells. Exp Hematol 2004;32(12):1129–1136.
57. Halley JD, Winkler DA and Burden FR. Towards a Rosetta stone for the stem cell genome: stochastic gene expression, network architecture and external influences. Stem Cell Res 2008, in press.
58. Hart CE, Mjolsness E and Wold BJ. Transcription network: inferences from neural networks. PLoS Comput Biol 2006;2:1592–1607.
59. Winkler DA, Burden FR and Halley JD. Modelling and predicting gene expression and fate decisions in mouse embryogenesis. Stem Cell Res, 2008, submitted.
60. Winkler DA, Burden FR and Halley JD. Using recursive networks to describe and predict gene expression and fate decisions during C. elegans embryogenesis. Artif Life, 2007, submitted.

61. Dutta D, Guha R, Wild D and Chen T. Ensemble feature selection: consistent descriptor subsets for multiple QSAR models. J Chem Inf Model 2007;47(3):989–997.
62. Liu Y. A comparative study on feature selection methods for drug discovery. J Chem Inf Model 2004;44:1823–1828.
63. Saeys Y, Inza I and Larranaga P. A review of feature selection techniques in bioinformatics. Bioinformatics 2007;23(19):2507–2517.
64. Topliss JG and Costello RJ. Chance correlations in structure–activity studies using multiple regression analysis. J Med Chem 1972;15(10):1066–1068.
65. Cramer RD, III, Patterson DE and Bunce JD. Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. J Am Chem Soc 1988;110(18):5959–5967.
66. Tetko IV, Gasteiger J, Todeschini R, Mauri A, Livingstone D, Ertl P, Palyulin VA, Radchenko EV, Zefirov NS, Makarenko AS, Tanchuk VY and Prokopenko VV. Virtual computational chemistry laboratory – design and description. J Comput Aided Mol Des 2005;19:453–463.
67. Platt DE, Parida L, Gao Y, Floratos A and Rigoutsos I. QSAR in grossly underdetermined systems: opportunities and issues. IBM J Res Dev 2001;45(3/4):533–544.
68. Figueiredo MAT. Adaptive sparseness for supervised learning. IEEE Trans Pattern Anal Mach Intell 2003;25:1150–1159.
69. Bienfait B. Applications of high-resolution self-organizing maps to retrosynthetic and QSAR analysis. J Chem Inf Comput Sci 1994;34(4):890–898.
70. Bienfait B and Gasteiger J. Checking the projection display of multivariate data with colored graphs. J Mol Graph Model 1997;15(4):203–215.
71. Gasteiger J. Chemoinformatics in drug design. Drugs Future 2007;32(Suppl.):A–2.
72. Gasteiger J, Li XZ and Uschold A. The beauty of molecular-surfaces as revealed by self-organizing neural networks. J Mol Graph 1994;12(2):90–97.
73. Gasteiger J and Zupan J. Neural networks in chemistry. Angew Chem Int Ed Engl 1993;32(4):503–527.
74. Kaiser D, Terfloth L, Kopp S, Schulz J, de Laet R, Chiba P, Ecker GF and Gasteiger J. Self-organizing maps for identification of new inhibitors of P-glycoprotein. J Med Chem 2007;50(7):1698–1702.
75. Stark J, Callard R and Hubank M. From the top down: towards a predictive biology of signalling networks. Trends Biotechnol 2003;21(7):290–293.
76. Khan J, Wei JS, Ringnér M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C and Meltzer PS. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001;7:673–679.
77. Burden FR, Winkler DA and Halley JD. Diagnosis of small round blue cell tumours using microarrays and sparse methods. Int J Cancer, 2008, submitted.
78. Laslett AL, Grimmond S, Gardiner B, Stamp L, Lin A, Hawes SM, Wormald S, Nikolic-Paterson D, Haylock D and Pera MF. Transcriptional analysis of early lineage commitment in human embryonic stem cells. BMC Dev Biol 2007;7:12–30.
79. Xiong MM, Li J and Fang XZ. Identification of genetic networks. Genetics 2004;166(2):1037–1052.
80. Husmeier D. Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks. Bioinformatics 2003;19(17):2271–2282.
81. Husmeier D. Reverse engineering of genetic networks with Bayesian networks. Biochem Soc Trans 2003;31:1516–1518.

82. Huang Z, Li J, Su H, Watts GS and Chen H. Large-scale regulatory network analysis from microarray data: modified Bayesian network learning and association rule mining. Decision Support Syst 2007;43(4):1207–1225.
83. Gupta RR and Achenie LEK. A network model for gene regulation. Comput Chem Eng 2007;31(8):950–961.
84. Huang S. Genomics, complexity and drug discovery: insights from Boolean network models of cellular regulation. Pharmacogenomics 2001;2(3):203–222.
85. Kauffman S. The ensemble approach to understand genetic regulatory networks. Physica A Stat Mech Appl 2004;340(4):733–740.
86. Kauffman S, Peterson C, Samuelsson B and Troein C. Random Boolean network models and the yeast transcriptional network. Proc Natl Acad Sci USA 2003;100(25):14796–14799.
87. Cheng J and Greiner R. Learning Bayesian belief network classifiers algorithms and system. Advances in Artificial Intelligence Proceeding of the 14th Biennial Conference of the Canadian Society for Computational Studies of Intelligence (Stroulia E and Matwin S, eds), Ottawa, 7–9 June 2001, Springer-Verlag, Berlin, pp. 141–151.
88. Ray SS, Bandyopadhyay S, Mitra P and Pal SK. Bioinformatics in neurocomputing framework. IEE Proc.-Circuits Devices Syst 2005;152(5):556–564.
89. Friedman N, Linial M, Nachman I and Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol 2000;7(3–4):601–620.
90. Geard N and Wiles J. A gene network model for developing cell lineages. Artif Life 2005;11(3):249–267.
91. Zernicka-Goetz M. Cleavage pattern and emerging asymmetry of the mouse embryo. Nat Rev Mol Cell Biol 2005;6(December):919–928.
92. Hamatani T, Carter M, Sharov A and Ko M. Dynamics of global gene expression changes during mouse preimplantation development. Dev Cell 2004;6(1):117–131.
93. Hart CE, Mjolsness E and Wold BJ. Connectivity in the yeast cell cycle transcription network: inferences from neural networks. PLoS Comput Biol 2006;2(12):1592–1607.
94. Winkler D. The broader applications of neural and genetic modelling methods. Drug Discov Today 2001;6(23):1198–1199.
95. Ford MG, Pitt WR and Whitley DC. Selecting compounds for focused screening using linear discriminant analysis and artificial neural networks. J Mol Graph Model 2004;22(6):467–472.
96. Burden FR and Winkler DA. New QSAR methods applied to structure–activity mapping and combinatorial chemistry. J Chem Inf Comput Sci 1999;39(2):236–242.
97. Burden FR and Winkler DA. Robust QSAR models using Bayesian regularized neural networks. J Med Chem 1999;42(16):3183–3187.
98. Agrafiotis DK, Cedeno W and Lobanov VS. On the use of neural network ensembles in QSAR and QSPR. J Chem Inf Comput Sci 2002;42(4):903–911.
99. Ajay A. Unified framework for using neural networks to build QSARs. J Med Chem 1993;36(23):3565–3571.
100. Basak SC, Grunwald GD, Gute BD, Balasubramanian K and Opitz D. Use of statistical and neural net approaches in predicting toxicity of chemicals. J Chem Inf Comput Sci 2000;40(4):885–890.
101. Di Fenza A, Alagona G, Ghio C, Leonardi R, Giolitti A and Madami A. Caco-2 cell permeability modelling: a neural network coupled genetic algorithm approach. J Comput Aided Mol Des 2007;21(4):207–221.
102. Salt DW, Yildiz N, Livingstone DJ and Tinsley CJ. The use of artificial neural networks in QSAR. Pestic Sci 1992;36(2):161–170.

103. Penzotti JE, Landrum GA and Putta S. Building predictive ADMET models for early decisions in drug discovery. Curr Opin Drug Discov Devel 2004;7(1):49–61.
104. Weaver DC. Applying data mining techniques to library design, lead generation and lead optimization. Curr Opin Chem Biol 2004;8(3):264–270.
105. Perkins R, Fang H, Tong W and Welsh WJ. Quantitative structure–activity relationship methods: perspectives on drug discovery and toxicology. Environ Toxicol Chem 2003;22(8):1666–1679.
106. Winkler DA and Burden FR. Modelling blood–brain barrier partitioning using Bayesian neural nets. J Mol Graph Model 2004;22(6):499–505.
107. Polley MJ, Winkler DA and Burden FR. Broad-based quantitative structure–activity relationship modeling of potency and selectivity of farnesyltransferase inhibitors using a Bayesian regularized neural network. J Med Chem 2004;47(25):6230–6238.
108. Polley MJ, Burden FR and Winkler DA. Predictive human intestinal absorption QSAR models using Bayesian regularized neural networks. Aust J Chem 2005;58(12):859–863.
109. Burden FR and Winkler DA. A quantitative structure–activity relationships model for the acute toxicity of substituted benzenes to Tetrahymena pyriformis using Bayesian-regularized neural networks. Chem Res Toxicol 2000;13(6):436–440.
110. Burden FR and Winkler DA. Predictive Bayesian neural network models of MHC class II peptide binding. J Mol Graph Model 2005;23(6):481–489.
111. Caballero J and Fernandez M. Linear and nonlinear modeling of antifungal activity of some heterocyclic ring derivatives using multiple linear regression and Bayesian-regularized neural networks. J Mol Model 2006;12(2):168–181.
112. Caballero J, Garriga M and Fernandez M. 2D autocorrelation modeling of the negative inotropic activity of calcium entry blockers using Bayesian-regularized genetic neural networks. Bioorg Med Chem 2006;14(10):3330–3340.
113. Fernandez M and Caballero J. QSAR modeling of matrix metalloproteinase inhibition by N-hydroxy-alpha-phenylsulfonylacetamide derivatives. Bioorg Med Chem 2007;15(18):6298–6310.
114. Fernandez M and Caballero J. QSAR models for predicting the activity of non-peptide luteinizing hormone-releasing hormone (LHRH) antagonists derived from erythromycin A using quantum chemical properties. J Mol Model 2007;13(4):465–476.
115. Fernandez M, Carreiras MC, Marco JL and Caballero J. Modeling of acetylcholinesterase inhibition by tacrine analogues using Bayesian-regularized Genetic Neural Networks and ensemble averaging. J Enzyme Inhib Med Chem 2006;21(6):647–661.
116. Gonzalez MP, Caballero J, Tundidor-Camba A, Helguera AM and Fernández M. Modeling of farnesyltransferase inhibition by some thiol and non-thiol peptidomimetic inhibitors using genetic neural networks and RDF approaches. Bioorg Med Chem 2006;14(1):200–213.
117. Xu M, Zeng G, Xu X, Huang G, Jiang R and Sun W. Application of Bayesian regularized BP neural network model for trend analysis, acidity and chemical composition of precipitation in North Carolina. Water Air Soil Pollut 2006;172(1–4):167–184.
118. Xu M, Zeng G-M, Xu X-Y, Huang G-H, Sun W and Jiang X-Y. Application of Bayesian regularized BP neural network model for analysis of aquatic ecological data – A case study of chlorophyll-a prediction in Nanzui water area of Dongting Lake. J Environ Sci (China) 2005;17(6):946–952.
119. Niculescu SP. Artificial neural networks and genetic algorithms in QSAR. J Mol Struct Theochem 2003;622(1–2):71–83.

170 120. Martynenko A, Yang SX and Pan L. Intelligent computation of moisture content in shrinkable biomaterials. Drying Technol 2007;25(10):1667–1676. 121. Lucchinetti E and Stussi E. Prediction of elasticity constants in small biomaterial samples such as bone. A comparison between classical optimization techniques and identification with artificial neural networks. Proc Inst Mech Eng [H] 2004;218(H6):389–405. 122. Kaminski W, Strumi""o P, Tomczak E and Zbicinsk I. Modelling of thermal degradation process dynamics of bioproducts using artificial neural networks. J Syst Eng 1996;6(3): 159–165. 123. Gubskaya AV, Kholodovych V, Knight D, Kohn J and Welsh WJ. Prediction of fibrinogen adsorption for biodegradable polymers: integration of molecular dynamics and surrogate modeling. Polymer 2007;48(19):5788–5801. 124. Schut J, Bolikal D, Khan IJ, Pesnell A, Rege A, Rojas R, Sheihet L, Murthy NS and Kohn J. Glass transition temperature prediction of polymers through the massper-flexible-bond principle. Polymer 2007;48(20):6115–6124. 125. Smith JR, Kholodovych V, Knight D, Kohn J and Welsh WJ. Predicting fibrinogen adsorption to polymeric surfaces in silico: a combined method approach. Polymer 2005;46(12):4296–4306. 126. Smith JR, Knight D, Kohn J, Rasheed K, Weber N, Kholodovych V and Welsh WJ. Using surrogate modeling in the prediction of fibrinogen adsorption onto polymer surfaces. J Chem Inf Comput Sci 2004;44(3):1088–1097.


Use of the cauliflower Or gene for improving crop nutritional quality

Xiangjun Zhou1, Joyce Van Eck2 and Li Li1

1 U.S. Department of Agriculture-Agricultural Research Service, Plant, Soil and Nutrition Laboratory, Department of Plant Breeding and Genetics, Cornell University, Ithaca, NY 14853, USA
2 Boyce Thompson Institute for Plant Research, Cornell University, Ithaca, NY 14853, USA

Abstract. Carotenoids are a group of pigments that are essential to human diets. An increasing interest in carotenoids as a nutritional source of vitamin A and health-promoting compounds has prompted recent progress in metabolic engineering of carotenogenesis in food crops. Current strategies have mainly focused on manipulating genes encoding carotenogenic enzymes. In many cases, it is difficult to reach the desired levels of carotenoid enhancement. In this chapter, we briefly summarize recent progress in our understanding of carotenoid biosynthesis. We describe the isolation of a novel gene, the Or gene, from a high-β-carotene orange cauliflower mutant. The Or gene encodes a plastid-targeted protein containing a cysteine-rich zinc finger domain and appears to be plant-specific. The insertion of a copia-like LTR retrotransposon in the Or gene confers high levels of carotenoid accumulation in normally low-pigmented tissues. Rather than directly regulating carotenoid biosynthesis, the Or gene controls carotenoid accumulation by inducing the formation of chromoplasts, which provide a metabolic sink to sequester and deposit carotenoids. Examination of Or transgenic potato tubers confirms that Or-induced carotenoid accumulation is associated with the formation of a metabolic sink. Thus, the Or gene offers a new molecular tool to complement current approaches for nutritional enhancement in agriculturally important crops.

Keywords: carotenoids, cauliflower, Or gene, β-carotene, chromoplast, metabolic sink, crop nutrition enhancement.

Corresponding author: Tel.: +1-607-2555708. Fax: +1-607-2551132. E-mail: [email protected] (L. Li).
BIOTECHNOLOGY ANNUAL REVIEW, VOLUME 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00006-9
© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Carotenoids are a large group of natural pigments produced by all chlorophyll-containing photosynthetic organisms, some bacteria, and many species of fungi. Carotenoids are indispensable to animals and humans as precursors for vitamin A synthesis. Vitamin A fulfills many physiological functions in humans, such as vision, reproduction, and cell proliferation [1]. Vitamin A deficiency, which affects an estimated 140–250 million children under five years of age (World Health Organization), remains one of the most prominent nutritional problems in many parts of the world. In addition, carotenoids act as antioxidants and provide further health benefits. Dietary intake of carotenoid-rich foods has been implicated in reducing the incidence of cancer and cardiovascular diseases [2,3]. Some carotenoids also offer protection against age-related eye diseases, such as macular degeneration, the leading cause of age-related blindness. Developing carotenoid-enriched food plants appears to be the most effective way to realize this nutritional and health potential. Consequently, elucidating the mechanisms underlying carotenoid biosynthesis and manipulating carotenogenesis in food crops has become one of the most active research areas in current plant science and biotechnology.

Most of our knowledge of carotenoid biosynthesis in plants has been derived from studies of Arabidopsis thaliana, cyanobacteria, tomato, and a few other plant species. Over the past decades, a complete set of genes and enzymes of carotenoid biosynthesis has been isolated and characterized by a combination of molecular, genetic, biochemical, and genomic approaches [4–6]. The availability of a large number of carotenogenic genes from bacteria and plants, along with information on carotenoid metabolism, has opened the door to genetic engineering of carotenoid content and composition in food plants to benefit humans [5,7–9]. So far, the most common strategy has been to overexpress plant or bacterial genes that control the committed steps of carotenoid biosynthesis in food crops. Golden Rice is one of the best-known examples of using this strategy to improve crop nutritional value. In the later-generation Golden Rice 2, overexpression of genes in a mini-carotenoid biosynthetic pathway leads to the accumulation of up to 37 µg/g total carotenoids in rice endosperm, a level adequate to provide most of the recommended dietary allowance of vitamin A for children from an average daily consumption of rice [10].
However, in many other cases, altering the expression of carotenoid biosynthetic genes is insufficient to drastically enhance carotenoid levels in transgenic plants [5], or even leads to unexpected phenotypic changes [11]. This makes it essential to search for novel genes controlling plant carotenogenesis and to gain a better understanding of the mechanisms underlying carotenoid biosynthesis and accumulation in plants.

The cauliflower Or gene represents a novel gene mutation. It causes many low-pigmented tissues of the plant, most noticeably the edible curd and the shoot meristem, to accumulate high levels of β-carotene, turning these tissues orange [12–14]. The Or gene has been isolated by a map-based cloning strategy [15]. This gene appears to be a regulatory gene controlling carotenoid accumulation. It functions by increasing sink capacity rather than by altering the expression of genes involved in carotenoid biosynthesis [14–16].

In this chapter, we briefly summarize the progress on carotenoid biosynthesis and metabolic engineering of carotenoids in food crops for improving their nutritional value. We describe the isolation of the Or gene from the orange cauliflower mutant and the elucidation of its functional role in controlling carotenoid accumulation. The successful use of the Or gene to enhance carotenoid levels in transgenic potato tubers demonstrates that manipulation of sink formation offers an alternative strategy to increase carotenoid content in food crops.

Carotenoid biosynthesis in plants

Carotenoids comprise carotenes and their oxidized forms, the xanthophylls. The carotenoid biosynthetic pathway (Fig. 1) has long been established by labeling and inhibition studies and by mutant analysis, but isolation of the carotenogenic genes has only been achieved during the past two decades [4]. Carotenoids are mainly C40 isoprenoids synthesized de novo in essentially all types of plastids except proplastids [17]. As with other plastidic isoprenoids, carotenoid biosynthesis starts with the synthesis of isopentenyl pyrophosphate (IPP) and its allylic isomer dimethylallyl pyrophosphate (DMAPP) in plastids. Three IPP molecules are added to DMAPP to produce geranylgeranyl pyrophosphate (GGPP), the immediate precursor for the biosynthesis of carotenoids as well as tocopherols, gibberellins, phytol, taxol, etc. The first committed step in carotenoid biosynthesis is the condensation of two molecules of GGPP to produce phytoene, catalyzed by phytoene synthase. Subsequently, two related enzymes, phytoene desaturase (PDS) and ζ-carotene desaturase (ZDS), convert phytoene into lycopene. In plants, two cis–trans isomerases, Z-ISO and CRTISO, are needed to convert poly-cis compounds into all-trans forms. In bacteria, a single enzyme, carotene desaturase (crtI), carries out the same reactions. Lycopene is the branching point of the pathway and is cyclized to yield α- and β-carotene by lycopene ε-cyclase and/or lycopene β-cyclase. These carotenes can undergo hydroxylation and epoxidation reactions to form xanthophylls, such as lutein, zeaxanthin, violaxanthin, and neoxanthin. Xanthoxin, the product of violaxanthin/neoxanthin cleavage by 9-cis-epoxycarotenoid dioxygenase, provides the direct substrate for synthesis of the phytohormone ABA [18,19].
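The carbon bookkeeping behind these condensation steps can be checked with a few lines of Python. This is a minimal sketch, not part of the original chapter: it assumes the standard five-carbon size of IPP and DMAPP, and the helper name `condense` is hypothetical. It shows that adding three IPP units to DMAPP gives a C20 GGPP, and that joining two GGPP molecules gives the C40 phytoene backbone; the later desaturation, isomerization, and cyclization steps do not change the carbon number.

```python
# Carbon atoms per building block. IPP and DMAPP are both C5 isoprenoid units
# (assumed standard values; the chapter itself only states the C40 total).
CARBONS = {"IPP": 5, "DMAPP": 5}


def condense(*carbon_counts):
    """Hypothetical helper: a condensation keeps every carbon, so counts add up."""
    return sum(carbon_counts)


# Three IPP molecules are added to one DMAPP to give GGPP.
ggpp = condense(CARBONS["DMAPP"], CARBONS["IPP"], CARBONS["IPP"], CARBONS["IPP"])

# Two GGPP molecules are condensed by phytoene synthase to give phytoene.
phytoene = condense(ggpp, ggpp)

print(f"GGPP: C{ggpp}")          # GGPP: C20
print(f"Phytoene: C{phytoene}")  # Phytoene: C40
```

The same additive rule explains why GGPP can also feed other isoprenoid branches: each product is built from whole C5 units.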

Regulation of carotenoid biosynthesis in plants

The composition and relative abundance of the various carotenoids are remarkably conserved in green tissues, but vary widely in non-green tissues and organs, indicating that plants have developed complex regulatory mechanisms controlling carotenogenesis. Several processes affect carotenoid accumulation, including those that determine the supply of isoprenoid precursors, the catalytic activity of the pathway, and metabolic turnover or conversion. In addition, carotenoid sequestration is known to play an important role.

MEP pathway
  Pyruvate + glyceraldehyde-3-phosphate → 1-deoxy-D-xylulose 5-phosphate (DXP) [deoxyxylulose-5-phosphate synthase]
  DXP → 2-C-methyl-D-erythritol 4-phosphate (MEP) [deoxyxylulose-5-phosphate reductoisomerase]
  MEP → isopentenyl pyrophosphate (IPP) and dimethylallyl diphosphate (DMAPP) [1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate reductase]
Condensation
  IPP + DMAPP → geranylgeranyl diphosphate (GGPP) [GGPP synthase]
  GGPP → phytoene [phytoene synthase]
Desaturation and isomerization
  Phytoene → ζ-carotene [phytoene desaturase; 15-cis-ζ-carotene isomerase]
  ζ-Carotene → lycopene [ζ-carotene desaturase; carotene isomerase]
Cyclization
  Lycopene → α-carotene [lycopene β-cyclase and lycopene ε-cyclase]
  Lycopene → β-carotene [lycopene β-cyclase]
Xanthophyll formation
  α-Carotene → lutein [β-carotene hydroxylase; ε-carotene hydroxylase]
  β-Carotene → zeaxanthin [β-carotene hydroxylases]
  Zeaxanthin ⇌ antheraxanthin [zeaxanthin epoxidase; reverse reaction: violaxanthin deepoxidase]
  Antheraxanthin ⇌ violaxanthin [zeaxanthin epoxidase; reverse reaction: violaxanthin deepoxidase]
  Violaxanthin → neoxanthin [neoxanthin synthase]

Fig. 1. Carotenoid biosynthetic pathway in plants. Each step is given as substrate → product, with the enzyme(s) in square brackets.
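The branching structure of Fig. 1 can be written down as a small directed graph. The sketch below is illustrative only: the dictionary `PATHWAY` and the functions `routes` and `branch_points` are hypothetical names, the MEP steps upstream of GGPP are left out, and the reverse reactions catalyzed by violaxanthin deepoxidase are omitted so that the graph stays acyclic. Under those assumptions it recovers lycopene as the only branch point, with one route ending in lutein and one in neoxanthin.

```python
# Simplified, one-directional view of Fig. 1, starting at GGPP.
# Keys are substrates; values are the products formed from them.
PATHWAY = {
    "GGPP": ["phytoene"],
    "phytoene": ["zeta-carotene"],
    "zeta-carotene": ["lycopene"],
    "lycopene": ["alpha-carotene", "beta-carotene"],
    "alpha-carotene": ["lutein"],
    "beta-carotene": ["zeaxanthin"],
    "zeaxanthin": ["antheraxanthin"],
    "antheraxanthin": ["violaxanthin"],
    "violaxanthin": ["neoxanthin"],
}


def routes(graph, start):
    """Depth-first list of every path from start to a compound with no product."""
    products = graph.get(start, [])
    if not products:
        return [[start]]
    return [[start] + tail for nxt in products for tail in routes(graph, nxt)]


def branch_points(graph):
    """Compounds that feed more than one downstream product."""
    return [compound for compound, products in graph.items() if len(products) > 1]


print("Branch points:", branch_points(PATHWAY))
for path in routes(PATHWAY, "GGPP"):
    print(" -> ".join(path))
```

A dictionary of lists is enough here because the pathway is small; the same representation makes it easy to see which end products are affected when a single step is blocked.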

Supply of isoprenoid precursors

GGPP serves as a common substrate for the biosynthesis of carotenoids, gibberellins, phytol, and other isoprenoid compounds. Altering the size of the precursor pool entering the carotenogenic pathway is likely to affect carotenoid levels [20]. For example, the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway synthesizes IPP and DMAPP for the production of plastidic isoprenoids. A null mutation of CLA1, which encodes the first enzyme in the pathway, reduces the supply of precursors, leading to a lack of chlorophylls and carotenoid pigments, retarded chloroplast development, and an albino phenotype [21,22]. Altering the expression of this gene results in different levels of carotenoids as well as of other isoprenoids such as chlorophylls, tocopherols, abscisic acid, and gibberellins [23]. Hydroxymethylbutenyl diphosphate reductase (HDR) catalyzes the simultaneous synthesis of IPP and DMAPP in the last step of the MEP pathway. During tomato fruit ripening and Arabidopsis seedling de-etiolation, increased carotenoid production is associated with enhanced expression of the HDR gene [24]. These studies show that the availability of isoprenoid precursors affects carotenoid accumulation.

Regulation of carotenoid biosynthetic genes

Among the carotenoid biosynthetic enzymes, phytoene synthase (PSY), which catalyzes the first committed step of carotenoid biosynthesis, is believed to control isoprenoid flux into the pathway [25]. PSY is therefore a highly regulated step in carotenoid biosynthesis. Light has a strong effect on carotenogenesis in photosynthetic tissue, while developmental cues affect carotenoid content in flowers and fruits. Studies show that both light and developmental cues regulate PSY expression. Light affects carotenoid biosynthesis by increasing expression of PSY in green pepper leaves, mustard, and Arabidopsis seedlings [26,27]. Comparison of Arabidopsis wild-type and phytochrome mutants reveals that up-regulation of PSY expression is mediated by phytochrome [27]. Indeed, the full-length promoter of Arabidopsis PSY mediates positive responses towards different light qualities [28].
On the other hand, PSY is strongly induced during flower and fruit development in tomato plants, implying that, unlike in leaves, PSY is mainly regulated by developmental signals in those tissues [29]. PDS, the second enzyme in the carotenoid biosynthetic pathway, is also strongly induced during flower and fruit development. In tomato, high levels of PDS expression are found in organs and tissues where chromoplasts are formed [30]. More importantly, carotenogenesis requires coordinated expression of multiple carotenoid biosynthetic genes, as demonstrated during citrus fruit maturation [31].

Metabolic turnover or conversion

Metabolic turnover of carotenoids not only helps to maintain a steady level of carotenoids in plants, but also produces important signaling and accessory apocarotenoid molecules, such as the phytohormone ABA and the flavor volatiles β-ionone, pseudoionone, and geranylacetone, which serve a variety of biological processes [32]. A family of carotenoid cleavage dioxygenases (CCDs) catalyzes the oxidative cleavage of carotenoids. The first CCD gene was cloned through analysis of a maize ABA-deficient mutant, viviparous 14 [19]. Subsequently, more CCD genes were cloned and identified from other plant species, including Arabidopsis, tomato, petunia, cowpea, avocado, and citrus [33]. Analysis of enzymatic activities reveals that CCDs display complicated substrate and cleavage-site specificity. In Arabidopsis, the CCD family has nine members, of which AtCCD1 cleaves a variety of carotenoids symmetrically at the 9,10 or 9′,10′ double bond to form a dialdehyde and one or two C13 products, whereas the AtCCD7 protein catalyzes a specific 9,10 cleavage of β-carotene to produce apo-β-carotenal and β-ionone [34]. Similarly, the CitCCD1 protein cleaved β-cryptoxanthin, zeaxanthin, and all-trans-violaxanthin at the 9,10 and 9′,10′ positions and 9-cis-violaxanthin at the 9′,10′ positions [35]. Another class of CCDs is the 9-cis-epoxycarotenoid dioxygenases (NCEDs), including VP14, which specifically cleave the 11,12 double bond [18,35,36]. A third class of CCDs cleaves at the 7,8 double bonds of zeaxanthin [37]. In addition, lycopene cleavage dioxygenase, which is involved in the sequential conversion of lycopene into bixin, cleaves lycopene at the 5,6 and 5′,6′ positions [38].

A body of evidence shows that oxidative cleavage of carotenoids is induced under environmental stresses [18,36,39]. Circadian rhythm has also been shown to affect carotenoid catabolism [40]. Moreover, developmental cues play an important role. The transcript of the LeCCD1 gene, which may contribute to the formation of important flavor volatiles in tomato fruit, accumulates during the ripening process [32].

Carotenoid sequestration

Chromoplasts are one of the major carotenoid storage organelles within plant cells.
High levels of carotenoid accumulation in chromoplasts contribute to the red, orange, and yellow colors found in many flowers, fruits, and vegetables. While chromoplasts frequently derive from fully developed chloroplasts, as seen during fruit ripening in tomato and pepper, they also arise from other non-photosynthetic plastids, as in carrot and squash [41]. Chromoplasts develop lipoprotein substructures that sequester and retain large quantities of carotenoids [42]. Based on their ultrastructure, the carotenoid-sequestering structures within chromoplasts fall into globular, crystalline, membranous, fibrillar, and tubular types [43]. The various compositions of carotenoids, polar lipids, and proteins may determine the type of these structures.

Most of what we have learned about carotenoid sequestration comes from studies of globular, fibrillar, or tubular-type chromoplasts. Several genes encoding carotenoid-associated proteins in chromoplasts have been cloned and identified [42]. The first fibrillin (fib) gene was isolated from pepper fruits [44]. Shortly afterwards, a cDNA encoding the chromoplast-specific carotenoid-associated protein (CHRC) was cloned from Cucumis sativus corollas [45]. Both proteins carry a hydrophobic hairpin domain, which might be involved in interactions with carotenoids. Additional homologs have since been characterized from different types of plastids and species [43,46], suggesting that they are involved in the sequestration not only of carotenoids but also of other hydrophobic compounds. The expression of genes encoding carotenoid-associated proteins is closely associated with chromoplast development and carotenoid accumulation. The fib gene was shown to be strongly induced during fruit ripening [44]. The ChrC message was detected only in cucumber corollas, where its level increased in parallel with flower development [45]. Overexpression of the pepper fib gene in tomato fruits leads to a twofold increase in carotenoid content [47], while down-regulation of LeCHRC causes transgenic tomato flowers to accumulate 30% less carotenoids [48].

Metabolic engineering of carotenoids in crops

Significant progress has been achieved in the quantitative and qualitative manipulation of carotenoids in food crops to enhance their nutritional quality and health benefits [5,7–9]. The strategy commonly used is to alter the expression of key enzymes, or of several enzymes of a mini-biosynthetic pathway, in specific tissues such as seeds, fruits, or other storage organs. Phytoene synthase is one of the key metabolic enzymes involved in carotenogenesis, and thus a prominent target for genetic manipulation of carotenoid content in food plants. Overexpression of the phytoene synthase gene from either bacteria or plants results in a significant increase in total carotenoid levels in tomato fruit [49], potato tuber [50], and canola seed [51].
Expression of multiple biosynthetic genes in the carotenoid biosynthetic pathway leads to a profound increase of β-carotene in Golden Rice [10,52] and "golden" potato [53], as well as the accumulation of zeaxanthin in tomato fruit [54]. Qualitative modification of carotenoid composition in food crops by tissue-specific regulation of endogenous genes in the pathway has been achieved for β-carotene or zeaxanthin enhancement in potato [55–57] and in tomato [58]. Expression of a β-carotene ketolase gene involved in ketocarotenoid biosynthesis confers the accumulation of astaxanthin, a novel carotenoid of high economic value, in potato tubers [59] and in model plants [60–62].

In a few cases, altering the expression of genes involved in light signaling pathways also leads to enhanced levels of carotenoid accumulation. Cryptochromes are blue-light photoreceptors found in plants, bacteria, and animals. Overexpression of the tomato CRY2 gene results in overproduction of anthocyanins and chlorophyll in leaves and of flavonoids and lycopene in fruits, along with other phenotypic changes [63]. In tomato, fruit-specific suppression of DET1, which encodes a nuclear regulatory protein in photomorphogenesis, significantly increases carotenoid and flavonoid contents [64]. These results show that carotenoid biosynthesis is affected by other signaling modules, pointing to an alternative approach to carotenoid enhancement.

The basic concept of metabolic engineering of carotenogenesis is to increase the metabolic flux towards carotenoid biosynthesis. However, in some cases, the increased flux into carotenogenesis can alter or reduce flux in other competing pathways, leading to unexpected phenotypic changes. For example, overexpression of PSY in canola and Arabidopsis seeds not only causes a high level of carotenoid accumulation, but also leads to a delayed germination phenotype due to an increased content of the carotenoid-derived ABA [51,65]. Constitutive expression of tomato PSY-1 results in stunted growth due to insufficient synthesis of gibberellins and produces pleiotropic effects such as premature pigmentation of seed coats and cotyledons [11]. Although it is possible to produce transgenic crops with significantly increased levels of carotenoids by altering the expression of genes involved in carotenogenesis, the extent is limited by an incomplete understanding of the mechanisms underlying carotenogenesis and its interaction with other metabolic processes [7]. Novel carotenoid gene mutations will provide not only new insights into the control mechanisms, but also possible genetic tools to manipulate carotenogenesis in food crops. Recent studies on the Or gene from an orange cauliflower mutant offer such an opportunity for carotenoid enhancement.

Functional analysis of the Or gene from cauliflower

Characteristics of the orange cauliflower (Brassica oleracea var. botrytis) mutant

The eye-catching orange cauliflower (Fig. 2A) was first discovered in Bradford Marsh in Canada in 1970.
Through many years of work by Professor Emeritus Michael Dickson at Cornell University and by other breeders, the orange cauliflower is now commercially available. The orange cauliflower results from a spontaneous mutation of a single gene, designated Or for Orange [12]. The Or mutation not only causes the normally white curd to turn orange, but also imparts visible orange coloration to other parts of the plant, such as the shoot meristems, the pith of the stem, and the vasculature at the base of the petioles [14]. Or homozygous plants have an intense orange coloration in these tissues and exhibit stunted growth, with a small curd and delayed curd formation. Or heterozygous plants are less pigmented and grow normally, like wild-type plants. HPLC analysis reveals that while the wild-type curd tissue contains negligible amounts of carotenoids, the Or plants accumulate high levels of carotenoids, mainly β-carotene. None of the precursors of β-carotene was detected in either genotype. The outer curd tissues of Or homozygous plants accumulate approximately 8 µg/g fresh weight of β-carotene, a level several hundred-fold higher than that detected in comparable wild-type tissues [14]. No differences in leaf carotenoid content or composition were observed between wild-type and Or homozygous plants.

Microscopic analysis shows that β-carotene accumulation occurs within chromoplasts and takes several forms, predominantly sheet structures [14]. These sheets occur as helical ribbons reminiscent of the carotenoid-sequestering sheets in carrot root [66] and appear to be highly ordered structures. Based on these substructures, the Or chromoplasts are classified as membranous chromoplasts [67]. While each shoot apical meristem or curd inflorescence meristem cell of wild-type plants contains numerous colorless proplastids or leucoplasts, only one or two large chromoplasts are found in each affected cell in these Or tissues, suggesting limited plastid division in the Or mutant [14]. The one or two chromoplasts constitute the entire plastidome in these cells [67].

Fig. 2. Effect of the Or gene on carotenoid accumulation in cauliflower (A) and transgenic potato tubers (B). Labels in the figure: WT, VC, Or. (The color version of this figure is hosted on Science Direct.)

The expression of carotenoid biosynthetic genes, including PSY, PDS, ZDS, LCYB, LCYE, and CHYB, is not dramatically altered by the mutation; neither is the expression of isoprenoid biosynthetic genes, including DXS, DXR, IPI, and GGPS [14]. Calli derived from Or mutant seedlings accumulate significantly higher levels of carotenoids than those from wild-type seedlings, but when treated with norflurazon, a specific inhibitor of PDS, both types of calli synthesized comparable levels of phytoene. No major differences in carotenogenic gene expression were observed between the wild-type and Or calli. These results imply that the Or gene may exert its effect on carotenoid accumulation through a novel mechanism rather than by increasing the capacity for biosynthesis [68].

Map-based cloning of the Or gene

To gain more insight into the mechanism underlying Or-induced β-carotene accumulation, a map-based cloning strategy was employed to isolate the Or gene. Initially, 10 amplified fragment length polymorphism (AFLP) markers closely linked to the Or gene were identified in a mapping population consisting of 195 F2 individuals. Three AFLP markers that flank the Or gene were successfully converted into PCR-based sequence-characterized amplified region (SCAR) markers [69]. Two of these were later used to analyze a large segregating population of 1,632 F2 individuals, and a high-resolution genetic map was developed. By screening a cauliflower bacterial artificial chromosome (BAC) library and through chromosome walking, the Or locus was delimited to a single BAC clone [70]. This BAC was sequenced and nine putative genes were predicted. Fine genetic mapping identified a single candidate gene that cosegregates with the Or locus. Sequence analysis revealed that a 4.7 kb copia-like LTR retrotransposon is inserted in the mutant allele.
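Why the second, much larger population was needed can be seen from a rough resolution estimate. The sketch below is an illustration under stated assumptions, not the authors' calculation: it supposes that both gametes of every F2 plant can be scored (as with codominant markers) and that, over short distances, 1% recombination corresponds to about 1 cM. The function name `min_resolution_cm` is hypothetical. Under these assumptions, a single recombinant among N F2 plants corresponds to a map distance of 100/(2N) cM.

```python
def min_resolution_cm(n_f2_plants):
    """Map distance (cM) represented by one recombinant gamete in an F2 population.

    Assumes two scorable gametes per F2 plant and 1% recombination ~ 1 cM.
    """
    gametes = 2 * n_f2_plants
    return 100.0 / gametes


# Population sizes taken from the text: initial and fine-mapping populations.
for n in (195, 1632):
    print(f"{n} F2 plants: about {min_resolution_cm(n):.3f} cM per recombinant")
```

With these inputs the estimate drops from roughly 0.26 cM for 195 plants to roughly 0.03 cM for 1,632 plants, which is the kind of gain that makes it feasible to narrow a locus down to a single BAC clone.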
Introduction of the genomic fragment containing the candidate gene with the retrotransposon insertion into wild-type cauliflower plants turns the white curd tissue orange, with accumulation of β-carotene, which confirms that the candidate gene is Or [15].

Analysis of the Or gene and its encoded protein

The Or gene is a single-copy sequence in the cauliflower genome. Comparison of the genomic sequence and full-length cDNA of Or reveals that the wild-type Or gene contains eight exons and seven introns. The insertion of a copia-like LTR retrotransposon in exon 3 results in the production of three alternatively spliced Or mutant transcripts following excision of the retrotransposon. All of the spliced transcripts remain in frame and share the same start and stop codons as the wild-type gene. RT-PCR analysis revealed that these three transcripts are expressed at different levels in the mutant cauliflower, while only the wild-type transcript is expressed in wild-type plants. In both mutant and wild-type plants, Or transcripts are expressed at high levels in young leaves, curds, and flower buds, and at low levels in mature leaves and roots, indicating that the retrotransposon insertion does not change the pattern of tissue-specific expression [15].

The wild-type gene encodes a protein of 305 amino acids with an estimated molecular mass of 33.5 kDa. The predicted protein has two putative transmembrane domains and is predicted to be plastid targeted (Fig. 3). Indeed, the Orwt:GFP fusion protein was found to be associated with non-colored plastids, such as leucoplasts in the epidermal cells of young leaves and amyloplasts in the young developing seeds of transgenic Arabidopsis. A characteristic feature of the OR protein is the cysteine-rich zinc finger domain at its C-terminus. This domain exists in DnaJ-like molecular chaperones, which participate in protein folding, assembly and disassembly, and translocation into organelles [71,72]. As OR lacks the typical N-terminal J domain of these molecular chaperones, it is more likely a novel protein with a unique cellular function. OR is hypothesized to exert its function through protein–protein interactions mediated by its cysteine-rich domain [15]. The OR protein appears to be plant-specific, and its homologs are present in all plant species examined. OR and its homologs share exceptional sequence conservation (Fig. 4), which suggests an important role for this protein in plant growth and development.

Fig. 3. Topology of the OR protein. Transit peptide: M1-S22; Unknown domain: V23-Y145; Transmembrane domain 1 (TM1): Y146-E168; Transmembrane domain 2 (TM2): D195-V217; Cysteine-rich domain: H225-L305. The arrow indicates the retrotransposon insertion site.

Possible functional role of Or in inducing carotenoid accumulation

The Or gene is not one of the carotenoid biosynthetic genes and appears to exert no direct effect on carotenoid biosynthesis. By contrast, the spatial and temporal expression patterns of the Or gene and subcellular localization of

[Figure 4: multiple sequence alignment panels for Brassica, Arabidopsis, Gossypium, Lycopersicon, Medicago, Vitis, Oryza, Sorghum, and Zea.]

Fig. 4. Alignment of the predicted amino acid sequences of OR homologs from higher plants. The amino acid sequences were obtained from GenBank and the Institute for Genomic Research (TIGR) Plant Transcript Assemblies Database. The accession numbers are: Arabidopsis thaliana NP_200975, Brassica oleracea var. botrytis ABH07405, Vitis vinifera CAO62305, Oryza sativa NP_001047593, Lycopersicon esculentum TA6503_4081, Gossypium hirsutum TA2322_3635, Medicago truncatula TA4160_3880, Glycine max TA6293_3847, Hordeum vulgare TA5326_4513, Triticum aestivum TA19897_4565, Sorghum bicolor TA4814_4558, and Zea mays TA16877_4577.
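The degree of conservation shown in Fig. 4 can be expressed as pairwise percent identity. The sketch below is only an illustration: the function name `percent_identity` and the choice to ignore columns that are gaps in both sequences are assumptions, not the authors' method, and the two 60-residue fragments are the C-terminal ends of the Brassica and Arabidopsis sequences as printed in the alignment.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Columns where both sequences carry a gap are skipped; a column
    with a gap in only one sequence counts as a mismatch.
    (Illustrative convention, not taken from the source.)
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# C-terminal fragments copied from the last block of the Fig. 4 alignment.
brassica = "GSLIISEPVSAIAGGNHSVSTSKTERCSNCSGAGKVMCPTCLCTGMAMASEHDPRIDPFL"
arabidopsis = "GALVLTEPVSAIAGGNHSLSPPKTERCSNCSGAGKVMCPTCLCTGMAMASEHDPRIDPFD"

print(f"{percent_identity(brassica, arabidopsis):.1f}% identity")
```

A full comparison would use the complete aligned sequences rather than one block, and alignment programs may define identity slightly differently (for example, in how gaps are counted).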

the protein are closely associated with the differentiation of plastids and carotenoid accumulation. The cauliflower homolog of plastid fusion/translocation factor (Pftf), known to be involved in chromoplast differentiation [73], is expressed highly in the Or mutant curd, indicating enhanced differentiation of chromoplasts in the mutant [15]. Moreover, introduction of the Or gene into white cauliflower results in the formation of large membranous chromoplasts in the curd cells with increased levels of β-carotene. Taken together, genetic and cellular analyses suggest that the functional role of Or is associated with the differentiation of proplastids and/or other non-colored plastids into chromoplasts, which provide a metabolic sink for carotenoid accumulation [15].

Use of the Or gene to enrich carotenoids in transgenic potato tubers

Enriching carotenoid levels in major staple crops is expected to have a broad and significant impact on human nutrition and health. Initially, to see whether Or functions in another plant species, the Or gene was transformed into the Arabidopsis ap1-1/cal-1 "cauliflower" mutant. This mutant was chosen because it forms inflorescence meristems that share the structure of cauliflower curd tissue. Expression of the Or transgene in the Arabidopsis mutant resulted in an "orange-yellow" color instead of the normal pale-green hue in the inflorescence meristems (Li et al., unpublished data). HPLC analysis confirmed that the color shift is indeed associated with enhanced carotenoid accumulation. These results provide evidence that Or works across species to enhance carotenoid accumulation. The Or gene under the control of a tuber-specific promoter was transformed into potato plants. Visual examination of the flesh color of some transgenic tubers reveals a deep orange-yellow flesh hue (Fig. 2B).
Subsequent HPLC analysis showed that these transgenic tubers not only contain increased levels of violaxanthin and lutein, which are normally present in the non-transformed controls, but also accumulate significant levels of β-carotene and three metabolic intermediates, phytoene, phytofluene, and ζ-carotene, which are not detected in the controls. Total carotenoid levels increased more than sixfold. Interestingly, while carotenoids in potato tubers have been reported to remain stable or decrease during long-term cold storage [74,75], cold storage greatly increases the total levels of carotenoids, as well as of β-carotene, in the Or transgenic tubers. Further, the high-carotenoid trait in the Or transgenic tubers is stable in a subsequent generation [16]. These results demonstrate that the Or gene functions in the storage tissue of a food crop to enhance carotenoid accumulation. Potato tubers normally accumulate carotenoids in amyloplasts [76]. Further examination of the cellular contents of these transgenic tubers by

light microscopy showed that, in addition to the amyloplasts of various sizes found in the controls, the Or transgenic tubers also contain intact chromoplasts with orange structures in the form of helical sheets and fragments [16]. These orange structures share common features, including color, overall range of forms, and dichroism, with the carotenoid-sequestering structures released from chromoplasts of orange carrot roots. The similarity suggests that, as in carrot, the Or transgene induces the formation of carotenoid-sequestering structures within chromoplasts. The fact that the Or transgene induces the formation of chromoplasts in the transgenic potato tubers further confirms that the Or gene acts as a bona fide switch to trigger chromoplast differentiation for carotenoid accumulation. Carotenoids accumulate to high levels in chromoplasts in plants. Chromoplasts have developed a unique mechanism to accumulate large amounts of carotenoids by generating carotenoid-lipoprotein structures that function as a deposition sink to sequester and deposit carotenoids [42,77]. Indeed, the availability of such a sink for carotenoid deposition has been shown to be directly associated with carotenoid accumulation [42,78,79]. The Or case, along with previous studies on red pepper, cucumber, daffodil, and maize, demonstrates that the formation of a suitable sink plays an important role in conferring carotenoid accumulation [15,44,45,80,81].

Creating a metabolic sink is a new strategy to enrich carotenoids in staple food crops

Accumulation of carotenoids, like that of many other metabolites, is the net result of several processes, such as biosynthesis and degradation. Enhanced levels of accumulation can be achieved by increasing the rate of biosynthesis or reducing the rate of turnover.
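The balance between biosynthesis and turnover described above can be illustrated with a minimal one-pool model, dC/dt = synthesis − turnover × C, whose steady state is synthesis / turnover. This is a sketch under stated assumptions: the model, the function names, and all parameter values are hypothetical and are not taken from the source. In particular, treating a sink as a lower effective turnover rate is only one possible simplification.

```python
import math


def steady_state(synthesis: float, turnover: float) -> float:
    """Steady-state pool size of dC/dt = synthesis - turnover * C."""
    if turnover <= 0:
        raise ValueError("turnover rate must be positive")
    return synthesis / turnover


def carotenoid_level(t: float, synthesis: float, turnover: float,
                     c0: float = 0.0) -> float:
    """Closed-form solution C(t) of the same model, starting from c0."""
    c_inf = steady_state(synthesis, turnover)
    return c_inf + (c0 - c_inf) * math.exp(-turnover * t)


# Arbitrary illustrative rates (hypothetical units).
baseline = steady_state(synthesis=1.0, turnover=0.5)
more_synthesis = steady_state(synthesis=2.0, turnover=0.5)  # double biosynthesis
less_turnover = steady_state(synthesis=1.0, turnover=0.25)  # halve turnover

print(baseline, more_synthesis, less_turnover)
```

In this toy model, doubling the synthesis rate and halving the turnover rate raise the steady-state level by the same factor, which is why both routes appear as options in the text.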
Characterization of the Or gene, together with the successful use of Or to increase carotenoid content in transgenic tubers of potato, the most important non-cereal staple crop, reveals that manipulating chromoplast formation to create a metabolic sink exerts a profound effect on carotenoid accumulation. Thus, the Or gene offers a new, alternative approach that complements efforts relying on expression of carotenogenic genes to enhance carotenoid levels in food crops, particularly in the storage tissues of staple crops. Many seeds or roots of important crops such as wheat, rice, barley, maize, canola, and cassava synthesize and accumulate low levels of carotenoids in amyloplasts or elaioplasts [17]. The low levels of accumulation could have several causes, including low metabolic flux into the pathway, limited catalytic activity of particular enzymes in the pathway, and a high turnover rate. In some cases, lack of a suitable metabolic sink for sequestering carotenoid end products could be the major cause, as in the case of the white cauliflower curds. Although the genes involved in carotenoid biosynthesis are expressed in the white curd tissue, the absence of a suitable

sink structure restrains the accumulation of carotenoids [14]. Thus, creating a suitable metabolic sink to effectively sequester end products provides another strategy to enrich carotenoid levels in food crops. The Or gene has been demonstrated to act as a bona fide switch to trigger the differentiation of non-colored plastids into chromoplasts in normally white or low-pigmented tissues [15,16]. It provides a new molecular tool to enrich carotenoid content in storage tissues of staple crops by inducing the formation of chromoplasts, which provide a potentially potent metabolic sink for carotenoid sequestration and deposition. Although the Or gene has been shown to confer high levels of carotenoid accumulation in cauliflower and transgenic potato tubers, it should be noted that the extent of carotenoid enrichment achievable by enhancing sink capacity in food crops may be limited by a number of factors, such as the maximal potential catalytic activity of the carotenoid biosynthetic pathway in particular tissues or organs of crops [82]. Thus, concomitant manipulation of sink capacity, to effectively sequester and deposit carotenoids, and of the catalytic activity of this pathway, to increase the metabolic flux, would be a more promising strategy to quantitatively and qualitatively modify carotenoids in food crops to meet the requirements for optimal human nutrition and health.

Conclusions

Significant progress has been made in the genetic manipulation of carotenoid biosynthesis. Efforts to enhance carotenoid accumulation in food crops have so far focused mainly on manipulating the genes that encode the metabolic enzymes in the pathway. Such an approach is effective, but in many cases it has proved difficult to reach the desired levels of carotenoid enhancement in plants. To develop better breeding and engineering strategies for carotenoid enrichment, it is essential to gain more insight into the overall regulation of carotenoid biosynthesis and into the interactions between the carotenoid biosynthetic pathway and other signaling pathways in plant cells. The successful cloning of the cauliflower Or gene reveals that manipulating chromoplast formation to provide an effective metabolic sink for carotenoid sequestration and deposition exerts a profound effect on carotenoid accumulation. The use of the Or gene to increase carotenoid content in transgenic potato illustrates a new, alternative approach that complements efforts relying on expression of carotenogenic genes to enhance carotenoid levels in food crops. Creating an effective metabolic sink for carotenoid accumulation, along with manipulation of the catalytic activity of this pathway, can be especially useful for metabolic engineering of carotenoids in low-pigmented tissues of staple crops.

Acknowledgement

We thank Dr. Shan Lu for his help with some of the figures. This work was supported by USDA National Research Initiative Grants, ARS Headquarter Postdoctoral Research Associated Fund, the Triad Foundation, and Helen Graham Charitable Foundation.

References

1. Combs GF. The Vitamins: Fundamental Aspects in Nutrition and Health, 2nd edn, New York, USA, Academic Press, 1998. pp. 107–153.
2. Giovannucci E. Nutritional factors in human cancers. Adv Exp Med Biol 1999;472:29–42.
3. Hadley CW, Miller EC, Schwartz SJ and Clinton SK. Tomatoes, lycopene, and prostate cancer: progress and promise. Exp Biol Med 2002;227:869–880.
4. Cunningham FX and Gantt E. Genes and enzymes of carotenoid biosynthesis in plants. Annu Rev Plant Physiol Plant Mol Biol 1998;49:557–583.
5. Fraser PD and Bramley PM. The biosynthesis and nutritional uses of carotenoids. Prog Lipid Res 2004;43:228–265.
6. DellaPenna D and Pogson BJ. Vitamin synthesis in plants: tocopherols and carotenoids. Annu Rev Plant Biol 2006;57:711–738.
7. Botella-Pavia P and Rodriguez-Concepcion M. Carotenoid biotechnology in plants for nutritionally improved foods. Physiol Plant 2006;126:369–381.
8. Sandmann G, Römer S and Fraser PD. Understanding carotenoid metabolism as a necessity for genetic engineering of crop plants. Metab Eng 2006;8:291–302.
9. Taylor M and Ramsay G. Carotenoid biosynthesis in plant storage organs: recent advances and prospects for improving plant food quality. Physiol Plant 2005;124:143–151.
10. Paine JA, Shipton CA, Chaggar S, Howells RM, Kennedy MJ, Vernon G, Wright SY, Hinchliffe E, Adams JL, Silverstone AL and Drake R. Improving the nutritional value of Golden Rice through increased pro-vitamin A content. Nat Biotechnol 2005;23:482–487.
11. Fray RG, Wallace A, Fraser PD, Valero D, Hedden P, Bramley PM and Grierson D. Constitutive expression of a fruit phytoene synthase gene in transgenic tomatoes causes dwarfism by redirecting metabolites from the gibberellin pathway. Plant J 1995;8:693–701.
12. Crisp P, Walkey DGA, Bellman E and Roberts E. A mutation affecting curd colour in cauliflower (Brassica oleracea L. var. botrytis DC). Euphytica 1975;24:173–176.
13. Dickson MH, Lee CY and Bramble AE. Orange-curd high carotene cauliflower inbreds, NY156, NY163, and NY165. HortScience 1988;23:778–779.
14. Li L, Paolillo DJ, Parthasarathy MV, DiMuzio EM and Garvin DF. A novel gene mutation that confers abnormal patterns of beta-carotene accumulation in cauliflower (Brassica oleracea var. botrytis). Plant J 2001;26:59–67.
15. Lu S, Van Eck J, Zhou X, Lopez AB, O’Halloran DM, Cosman KM, Conlin BJ, Paolillo DJ, Garvin DF, Vrebalov J, Kochian LV, Kupper H, Earle ED, Cao J and Li L. The cauliflower Or gene encodes a DnaJ cysteine-rich domain-containing protein that mediates high levels of β-carotene accumulation. Plant Cell 2006;18:3594–3605.
16. Lopez AB, Van Eck J, Conlin B, Paolillo DJ, O’Neill J and Li L. Effect of the cauliflower Or transgene on carotenoid accumulation and chromoplast formation in transgenic potato tubers. J Exp Bot 2008;59:213–223.
17. Howitt CA and Pogson BJ. Carotenoid accumulation and function in seeds and non-green tissues. Plant Cell Environ 2006;29:435–445.
18. Qin X and Zeevaart JAD. The 9-cis-epoxycarotenoid cleavage reaction is the key regulatory step of abscisic acid biosynthesis in water-stressed bean. Proc Natl Acad Sci USA 1999;96:15354–15361.
19. Schwartz SH, Tan BC, Gage DA, Zeevaart JA and McCarty DR. Specific oxidative cleavage of carotenoids by VP14 of maize. Science 1997;276:1872–1874.
20. Gallagher CE, Cervantes-Cervantes M and Wurtzel ET. Surrogate biochemistry: use of Escherichia coli to identify plant cDNAs that impact metabolic engineering of carotenoid accumulation. Appl Microbiol Biotechnol 2003;60:713–719.
21. Mandel MA, Feldmann KA, Herrera-Estrella L, Rocha-Sosa M and Leon P. CLA1, a novel gene required for chloroplast development, is highly conserved in evolution. Plant J 1996;9:649–658.
22. Estevez JM, Cantero A, Romero C, Kawaide H, Jimenez LF, Kuzuyama T, Seto H, Kamiya Y and Leon P. Analysis of the expression of CLA1, a gene that encodes the 1-deoxyxylulose 5-phosphate synthase of the 2-C-methyl-D-erythritol-4-phosphate pathway in Arabidopsis. Plant Physiol 2000;124:95–104.
23. Crowell DN, Packard CE, Pierson CA, Giner JL, Downes BP and Chary SN. Identification of an allele of CLA1 associated with variegation in Arabidopsis thaliana. Physiol Plant 2003;118:29–37.
24. Botella-Pavia P, Besumbes O, Phillips MA, Carretero-Paulet L, Boronat A and Rodriguez-Concepcion M. Regulation of carotenoid biosynthesis in plants: evidence for a key role of hydroxymethylbutenyl diphosphate reductase in controlling the supply of plastidial isoprenoid precursors. Plant J 2004;40:188–199.
25. Cunningham FX. Regulation of carotenoid synthesis and accumulation in plants. Pure Appl Chem 2002;74:1409–1417.
26. Simkin AJ, Zhu C, Kuntz M and Sandmann G. Light-dark regulation of carotenoid biosynthesis in pepper (Capsicum annuum) leaves. J Plant Physiol 2003;160:439–443.
27. von Lintig J, Welsch R, Bonk M, Giuliano G, Batschauer A and Kleinig H. Light-dependent regulation of carotenoid biosynthesis occurs at the level of phytoene synthase expression and is mediated by phytochrome in Sinapis alba and Arabidopsis thaliana seedlings. Plant J 1997;12:625–634.
28. Welsch R, Medina J, Giuliano G, Beyer P and von Lintig J. Structural and functional characterization of the phytoene synthase promoter from Arabidopsis thaliana. Planta 2003;216:523–534.
29. Giuliano G, Bartley GE and Scolnik PA. Regulation of carotenoid biosynthesis during tomato development. Plant Cell 1993;5:379–387.
30. Corona V, Aracri B, Kosturkova G, Bartley GE, Pitto L, Giorgetti L, Scolnik PA and Giuliano G. Regulation of a carotenoid biosynthesis gene promoter during plant development. Plant J 1996;9:505–512.
31. Kato M, Ikoma Y, Matsumoto H, Sugiura M, Hyodo H and Yano M. Accumulation of carotenoids and expression of carotenoid biosynthetic genes during maturation in citrus fruit. Plant Physiol 2004;134:824–837.
32. Simkin AJ, Schwartz SH, Auldridge M, Taylor MG and Klee HJ. The tomato carotenoid cleavage dioxygenase 1 genes contribute to the formation of the flavor volatiles beta-ionone, pseudoionone, and geranylacetone. Plant J 2004;40:882–892.
33. Schwartz SH, Qin X and Zeevaart JA. Elucidation of the indirect pathway of abscisic acid biosynthesis by mutants, genes, and enzymes. Plant Physiol 2003;131:1591–1601.
34. Schwartz SH, Qin X and Loewen MC. The biochemical characterization of two carotenoid cleavage enzymes from Arabidopsis indicates that a carotenoid-derived compound inhibits lateral branching. J Biol Chem 2004;279:46940–46945.
35. Kato M, Matsumoto H, Ikoma Y, Okuda H and Yano M. The role of carotenoid cleavage dioxygenases in the regulation of carotenoid profiles during maturation in citrus fruit. J Exp Bot 2006;57:2153–2164.
36. Iuchi S, Kobayashi M, Yamaguchi-Shinozaki K and Shinozaki K. A stress-inducible gene for 9-cis-epoxycarotenoid dioxygenase involved in abscisic acid biosynthesis under water stress in drought-tolerant cowpea. Plant Physiol 2000;123:553–562.
37. Bouvier F, Suire C, Mutterer J and Camara B. Oxidative remodeling of chromoplast carotenoids: identification of the carotenoid dioxygenase CsCCD and CsZCD genes involved in crocus secondary metabolite biogenesis. Plant Cell 2003;15:47–62.
38. Bouvier F, Dogbo O and Camara B. Biosynthesis of the food and cosmetic plant pigment bixin (annatto). Science 2003;300:2089–2091.
39. Tan BC, Schwartz SH, Zeevaart JA and McCarty DR. Genetic control of abscisic acid biosynthesis in maize. Proc Natl Acad Sci USA 1997;94:12235–12240.
40. Simkin AJ, Underwood BA, Auldridge M, Loucas HM, Shibuya K, Schmelz E, Clark DG and Klee HJ. Circadian regulation of the PhCCD1 carotenoid cleavage dioxygenase controls emission of β-ionone, a fragrance volatile of petunia flowers. Plant Physiol 2004;136:3504–3514.
41. Marano MR, Serra EC, Orellano EG and Carrillo N. The path of chromoplast development in fruits and flowers. Plant Sci 1993;94:1–17.
42. Vishnevetsky M, Ovadis M and Vainstein A. Carotenoid sequestration in plants: the role of carotenoid-associated proteins. Trends Plant Sci 1999;4:232–235.
43. Kessler F, Schnell D and Blobel G. Identification of proteins associated with plastoglobules isolated from pea (Pisum sativum L) chloroplasts. Planta 1999;208:107–113.
44. Deruere J, Romer S, d’Harlingue A, Backhaus RA, Kuntz M and Camara B. Fibril assembly and carotenoid overaccumulation in chromoplasts: a model for supramolecular lipoprotein structures. Plant Cell 1994;6:119–133.
45. Vishnevetsky M, Ovadis M, Itzhaki H, Levy M, Libal-Weksler Y, Adam Z and Vainstein A. Molecular cloning of a carotenoid-associated protein from Cucumis sativus corollas: homologous genes involved in carotenoid sequestration in chromoplasts. Plant J 1996;10:1111–1118.
46. Kim HU, Wu SSH, Ratnayake C and Huang AHC. Brassica rapa has three genes that encode proteins associated with different neutral lipids in plastids of specific tissues. Plant Physiol 2001;126:330–341.
47. Simkin AJ, Gaffé J, Alcaraz JP, Carde JP, Bramley PM, Fraser PD and Kuntz M. Fibrillin influence on plastid ultrastructure and pigment content in tomato fruit. Phytochem 2007;68:1545–1556.
48. Leitner-Dagan Y, Ovadis M, Shklarman E, Elad Y, Rav David D and Vainstein A. Expression and functional analyses of the plastid lipid-associated protein CHRC suggest its role in chromoplastogenesis and stress. Plant Physiol 2006;142:233–244.
49. Fraser PD, Romer S, Shipton CA, Mills PB, Kiano JW, Misawa N, Drake RG, Schuch W and Bramley PM. Evaluation of transgenic tomato plants expressing an additional phytoene synthase in a fruit-specific manner. Proc Natl Acad Sci USA 2002;99:1092–1097.
50. Ducreux LJ, Morris WL, Hedley PE, Shepherd T, Davies HV, Millam S and Taylor MA. Metabolic engineering of high carotenoid potato tubers containing enhanced levels of beta-carotene and lutein. J Exp Bot 2005;56:81–89.
51. Shewmaker CK, Sheehy JA, Daley M, Colburn S and Ke DY. Seed-specific overexpression of phytoene synthase: increase in carotenoids and other metabolic effects. Plant J 1999;20:401–412.
52. Ye X, Al-Babili S, Klöti A, Zhang J, Lucca P, Beyer P and Potrykus I. Engineering the provitamin A (β-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm. Science 2000;287:303–305.
53. Diretto G, Al-Babili S, Tavazza R, Papacchioli V, Beyer P and Giuliano G. Metabolic engineering of potato carotenoid content through tuber-specific overexpression of a bacterial mini-pathway. PLoS ONE 2007;2:e350.
54. Dharmapuri S, Rosati C, Pallara P, Aquilani R, Bouvier F, Camara B and Giuliano G. Metabolic engineering of xanthophyll content in tomato fruits. FEBS Lett 2002;519:30–34.
55. Römer S, Lübeck J, Kauder F, Steiger S, Adomat C and Sandmann G. Genetic engineering of a zeaxanthin-rich potato by antisense inactivation and co-suppression of carotenoid epoxidation. Metab Eng 2002;4:263–272.
56. Diretto G, Welsch R, Tavazza R, Mourgues F, Pizzichini D, Beyer P and Giuliano G. Silencing of beta-carotene hydroxylase increases total carotenoid and beta-carotene levels in potato tubers. BMC Plant Biol 2007;7:11.
57. Van Eck J, Conlin B, Garvin DF, Mason H, Navarre DA and Brown CR. Enhancing beta-carotene content in potato by RNAi-mediated silencing of the β-carotene hydroxylase gene. Am J Potato Res 2007;84:331–342.
58. D’Ambrosio C, Giorio G, Marino I, Merendino A, Petrozza A, Salfi L, Stigliani AL and Cellini F. Virtually complete conversion of lycopene into β-carotene in fruits of tomato plants transformed with the tomato lycopene β-cyclase (tlcy-b) cDNA. Plant Sci 2004;166:207–214.
59. Gerjets T and Sandmann G. Ketocarotenoid formation in transgenic potato. J Exp Bot 2006;57:3639–3645.
60. Mann V, Harker M, Pecker I and Hirschberg J. Metabolic engineering of astaxanthin production in tobacco flowers. Nat Biotechnol 2000;18:888–892.
61. Ralley L, Enfissi EMA, Misawa N, Schuch W, Bramley PM and Fraser PD. Metabolic engineering of ketocarotenoid formation in higher plants. Plant J 2004;39:477–486.
62. Stålberg K, Lindgren O, Ek B and Höglund AS. Synthesis of ketocarotenoids in the seed of Arabidopsis thaliana. Plant J 2003;36:771–779.
63. Giliberto L, Perrotta G, Pallara P, Weller JL, Fraser PD, Bramley PM, Fiore A, Tavazza M and Giuliano G. Manipulation of the blue light photoreceptor cryptochrome 2 in tomato affects vegetative development, flowering time, and fruit antioxidant content. Plant Physiol 2005;137:199–208.
64. Davuluri GR, van Tuinen A, Fraser PD, Manfredonia A, Newman R, Burgess D, Brummell DA, King SR, Palys J, Uhlig J, Bramley PM, Pennings HMJ and Bowler C. Fruit-specific RNAi-mediated suppression of DET1 enhances carotenoid and flavonoid content in tomatoes. Nat Biotechnol 2005;23:890–895.
65. Lindgren LO, Stalberg KG and Hoglund AS. Seed-specific overexpression of an endogenous Arabidopsis phytoene synthase gene results in delayed germination and increased levels of carotenoids, chlorophyll, and abscisic acid. Plant Physiol 2003;132:779–785.
66. Frey-Wyssling A and Schwegler F. Ultrastructure of chromoplasts in the carrot root. J Ultrastruct Res 1965;13:543–559.
67. Paolillo DJ, Garvin DF and Parthasarathy MV. The chromoplasts of Or mutants of cauliflower (Brassica oleracea L. var. botrytis). Protoplasma 2004;224:245–253.
68. Li L, Lu S, Cosman KM, Earle ED, Garvin DF and O’Neill J. β-Carotene accumulation induced by the cauliflower Or gene is not due to an increased capacity of biosynthesis. Phytochem 2006;67:1177–1184.
69. Li L and Garvin DF. Molecular mapping of Or, a gene inducing β-carotene accumulation in cauliflower (Brassica oleracea var. botrytis). Genome 2003;47:588–594.
70. Li L, Lu S, O’Halloran DM, Garvin DF and Vrebalov J. High-resolution genetic and physical mapping of the cauliflower high β-carotene gene Or (Orange). Mol Genet Genomics 2003;270:132–138.
71. Cheetham ME and Caplan AJ. Structure, function and evolution of DnaJ: conservation and adaptation of chaperone function. Cell Stress Chaperones 1998;3:28–36.
72. Hartl FU and Hayer-Hartl M. Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 2002;295:1852–1858.
73. Hugueney P, Bouvier F, Badillo A, d’Harlingue A, Kuntz M and Camara B. Identification of a plastid protein involved in vesicle fusion and/or membrane protein translocation. Proc Natl Acad Sci USA 1995;92:5630–5634.
74. Morris WL, Ducreux L, Griffiths DW, Stewart D, Davies HV and Taylor MA. Carotenogenesis during tuber development and storage in potato. J Exp Bot 2004;55:975–982.
75. Griffiths DW, Dale MF, Morris WL and Ramsay G. Effects of season and postharvest storage on the carotenoid content of Solanum phureja potato tubers. J Agric Food Chem 2007;55:379–385.
76. Fishwick MJ and Wright AJ. Isolation and characterization of amyloplast envelope membranes from Solanum tuberosum. Phytochem 1980;19:55–59.
77. Camara B, Hugueney P, Bouvier F, Kuntz M and Moneger R. Biochemistry and molecular biology of chromoplast development. Int Rev Cytol 1995;163:175–247.
78. Bartley GE and Scolnik PA. Plant carotenoids: pigments for photoprotection, visual attraction, and human health. Plant Cell 1995;7:1027–1038.
79. Rabbani S, Beyer P, Lintig J, Hugueney P and Kleinig H. Induced beta-carotene synthesis driven by triacylglycerol deposition in the unicellular alga Dunaliella bardawil. Plant Physiol 1998;116:1239–1248.
80. Cookson PJ, Kiano JW, Shipton CA, Fraser PD, Romer S, Schuch W, Bramley PM and Pyke KA. Increases in cell elongation, plastid compartment size and phytoene synthase activity underlie the phenotype of the high pigment-1 mutant of tomato. Planta 2003;217:896–903.
81. Liu Y, Roof S, Ye Z, Barry C, van Tuinen A, Vrebalov J, Bowler C and Giovannoni J. Manipulation of light signal transduction as a means of modifying fruit nutritional quality in tomato. Proc Natl Acad Sci USA 2004;101:9897–9902.
82. Li L and Van Eck J. Metabolic engineering of carotenoid accumulation by creating a metabolic sink. Transgenic Res 2007;16:581–585.


How to predict and prevent the immunogenicity of therapeutic proteins

Huub Schellekens
Utrecht University, Faculty of Science, Department of Pharmaceutical Sciences, P.O. Box 80.082, 3508 TB Utrecht, The Netherlands

Abstract. Therapeutic proteins in general induce an immune response, especially when administered as multiple doses over prolonged periods. Non-human therapeutic proteins such as asparaginase and streptokinase induce antibodies by the classical immune reaction, and their primary immunogenic factor is the degree of non-self. Human therapeutic proteins such as the interferons and GM-CSF break immune tolerance, and protein aggregation is the main factor inducing antibodies against them. Many other factors influence the level of immunogenicity of proteins, such as storage conditions, contaminants or impurities in the preparation, downstream processing, dose and length of treatment, route of administration, formulation, and the disease status and concomitant treatment of patients. Clinical manifestations of antibodies directed against the protein include loss of efficacy, cross-neutralization of endogenous proteins, and general immune system effects such as anaphylaxis or serum sickness.

Keywords: immunogenicity, therapeutic proteins, prevention, prediction.

Tel.: +3130-2536973. Fax: +3130-2517839. E-mail: [email protected] (H. Schellekens).
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00007-0
© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

The use of therapeutic proteins such as the interferons, growth factors, hormones and monoclonal antibodies has increased dramatically over the past two decades [1]. However, the first use of therapeutic proteins dates back more than a century, to when animal antiserum was introduced for the treatment and prevention of infections. This antiserum treatment was associated with a high degree of immunogenicity. As foreign proteins, the antisera induced high levels of antibodies in the majority of patients after one or a few injections, sometimes with fatal anaphylactic reactions. The next generation of therapeutic proteins was of human origin, such as the plasma-derived clotting factors and growth hormone isolated from the pituitary glands of cadavers. These proteins also induced antibodies in many patients [2,3]. Most of the patients were treated with these products because of an innate deficiency and therefore lacked immune tolerance for these proteins. Most of the therapeutic proteins introduced in recent decades are homologues of human proteins. However, in contrast to expectations, these proteins also appear to induce antibodies, in some cases in the majority of patients [4,5]. Proteins that have not been reported to induce antibodies are extremely rare. Given that these antibodies are directed against self-antigens and that the proteins are used in patients with normal immune tolerance, the immunological reaction involves breaking B-cell tolerance. In general, patients need to be exposed to therapeutic proteins for prolonged periods to activate autoreactive B-cells, and the antibody response is relatively limited and usually disappears shortly after treatment is stopped. So when discussing the immunogenicity of proteins, it is important to realize that foreign proteins, and human homologues in patients lacking immune tolerance, induce antibodies by the classical immunological pathways. Human homologues in patients with immune tolerance induce antibodies by breaking B-cell tolerance. There are groups of patients in which both types of antibody induction may occur, e.g., patients with a protein deficiency. Some gene defects lead to complete silencing and lack of immune tolerance. In other cases a non-functional but still immune-reactive protein is produced, and the patients are immune tolerant while suffering from a functional deficiency.

Antibody assays

Another important issue is the choice of assays used to evaluate the immune response. It is important to realize that the assays are not standardized and that there are no international reference preparations or standards. This makes it difficult, and in most cases impossible, to compare antibody incidences from different trials and different products unless the antibody assays were performed in the same laboratory. The lack of standardization explains why the reported incidence for the same product administered to comparable groups of patients differs widely. For example, the reported incidence of an antibody response to interferon alpha 2 in patients with viral infections varied between 0 and >80% [6]. It is also essential to discriminate between the assay performance characteristics and the clinical relevance of the assay results when evaluating the immunogenicity of therapeutic proteins.
So a positive signal in an antibody assay does not imply automatically that there are any biological or clinical consequences. There are two principles for the testing of antibodies: assays that monitor binding of antibody to the drug (EIA, RIA, SPR) and (neutralizing) bioassays. These assays are used in combination: a sensitive binding assay to identify all antibody-containing samples is often more practical as bioassays are usually more difficult and time-consuming. If a native molecule is modified (e.g., pegylated or truncated) to obtain a new product with different pharmacological characteristics, antibodies to both the ‘‘parent’’ and the new molecule should be determined. A definition for the negative cut-off must be included in the validation process of this assay and is often based on the 5% false positive rate. Such an analytical cut-off per se does not predict a biological or clinical effect but is

193 dependent on the technical limitations of the assay. And because the cut-off is set to include a relative high number of false positives, all initial positive sera should be confirmed by either another binding assays or displacement assay. The confirmed positive samples should be tested in a bioassay for neutralizing antibodies, which correlate with potential clinical effects in patients because these antibodies interfere with receptor binding. A confirmatory step for neutralizing antibodies is not necessary because these antibodies are a subset of binding antibodies. In some cases it may be important to show the neutralization to be caused by antibodies, if the presence of other inhibitory factors such as soluble receptor has to be excluded. Further characterization of the neutralizing antibody response may follow, such as determining antibody isotype, affinity and specificity. Often SPR (Biacore) is used for this further characterization. Evaluating the clinical and biological consequences on the basis of a positive/negative answer from the assays may be misleading; development of antibodies is a dynamic process and therefore the kinetics of antibody development must be plotted quantitatively over time. Usually increasing and high titers of neutralizing antibodies are associated with some biological effect. In many cases a therapeutic protein may not be immunogenic enough to achieve such levels. Also, an analysis of the biological effect based on incidence alone is misleading. Comparing clinical efficacy in patient groups defined on the basis of mean population antibody titers, taking into account both incidence and titers of neutralizing antibodies, is in general more useful than comparing antibody negative vs. antibody positive patients. For further information I refer to some recently published excellent reviews regarding the development and validation of the different assay formats for antibodies to therapeutic proteins [7–9]. 
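The cut-off described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the cut-off is taken as the empirical 95th percentile of binding-assay signals from drug-naive (negative control) sera, so that about 5% of true negatives screen positive; the chapter does not prescribe a particular calculation, and the function names, sample IDs and signal values below are hypothetical.

```python
from typing import List, Mapping, Sequence


def screening_cut_point(negative_signals: Sequence[float],
                        false_positive_rate: float = 0.05) -> float:
    """Empirical (1 - false_positive_rate) quantile of negative-control signals.

    Assumption: a nonparametric percentile is used; other approaches exist.
    """
    if not negative_signals:
        raise ValueError("need at least one negative-control signal")
    ordered = sorted(negative_signals)
    # Linear interpolation between neighbouring order statistics.
    position = (1.0 - false_positive_rate) * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def screen(samples: Mapping[str, float], cut_point: float) -> List[str]:
    """Return the IDs of samples whose signal exceeds the cut-off.

    These are only initial positives; they still need a confirmatory assay.
    """
    return sorted(sid for sid, signal in samples.items() if signal > cut_point)


# Hypothetical optical-density readings from 21 drug-naive sera (0.10 to 0.30).
negatives = [0.10 + 0.01 * i for i in range(21)]
cut = screening_cut_point(negatives)

# Hypothetical patient samples screened against that cut-off.
initial_positives = screen({"P1": 0.12, "P2": 0.45, "P3": 0.31}, cut)
print(f"cut-off = {cut:.2f}, initial positives = {initial_positives}")
```

Because such a cut-off deliberately admits false positives, the samples flagged here correspond only to the initial positive sera that the text says must be confirmed and then tested for neutralizing activity.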
Mechanisms of antibody induction
As discussed, the mechanism of antibody induction will depend on the origin of the product. A non-human protein will induce antibodies by the classical immune response. Such a response will also occur if the patients do not have immune tolerance to the protein. In the case of human proteins, antibodies will be induced by breaking B-cell tolerance.

The classical immune response
The classical immune response involves antigen-presenting cells, usually macrophages and dendritic cells, which activate both B- and helper T-cells; these are necessary for a mature antibody reaction and the formation of memory cells. For details of this response the reader is referred to general immunology textbooks.

Breaking B-cell tolerance
Therapeutic proteins which are homologues of human proteins induce antibodies by breaking B-cell tolerance. The details of this mechanism are not completely understood. There are always autoreactive B-cells circulating, which are kept under control by peripheral mechanisms such as elimination by apoptosis when they meet their antigen. Receptor editing has also been described as a mechanism to make these cells harmless. The most likely silencing peripheral mechanism is the induction of functional anergy in these B-cells. Apparently these cells are not stimulated to produce antibodies by the circulating levels of native proteins such as insulin, interferon and EPO. Such B-cells may start to produce antibodies when they receive a second signal or “danger” signal. Bacterial endotoxins, which react with Toll-like receptors, may provide such a danger signal. This explains the production of antibodies to self-antigen associated with LPS. Microbial CpG motifs present in DNA may be another factor which can trigger an immune reaction to self-antigens. This T-cell dependent activation is weak. When self-antigens are coupled with foreign T-cell epitopes only a weak IgM response is induced, unless multiple high doses of antigen are given together with immune adjuvants. The most potent way to induce high levels of IgG independent of T-cell help is to present the self-antigen arrayed on viruses and virus-like particles [10–12]. The spacing of epitopes at a distance of 50–100 Å is unique to microbial antigens, and the immune system has apparently learned to react vigorously to this type of antigen presentation. Aggregates of proteins also present epitopes in an array form, which explains why, as described later, they are the main cause of antibody induction. Aggregates have also been described to be internalized by B-cells, which then become helper cells, explaining the T-cell independence of the isotype switching. The induction of antibodies to self-antigens may also occur by immunological mimicry. This requires the presence in the product of a modified form of the therapeutic protein which is non-self enough to induce a classical immune response but still resembles the human homologue enough to allow cross-reaction of the antibodies with the original protein. The mirror image of this situation would be patients with an allele of the gene for the protein which diverges from the gene used for the production of the therapeutic protein. This possibility has been experimentally suggested in mice with different allotypes for IL-2. When immunized, the mice produced antibodies which reacted with all forms of IL-2. Whether this situation may also occur with therapeutic proteins in patients needs to be confirmed.

Factors influencing the immunogenicity of therapeutic proteins
As discussed, the primary factors influencing the immunogenicity of therapeutic proteins are aggregates in the case of human homologues, and the degree of non-self and the presence of T-cell epitopes in the case of non-human proteins. In addition, there are a number of secondary factors which influence the level and intensity of the antibody response. One of these factors is the length of treatment. Foreign proteins like streptokinase and asparaginase often induce antibodies after a single injection. Breaking B-cell tolerance by human protein takes in general more than six months of chronic treatment. The route of administration influences the likelihood of an antibody response independent of the mechanism of induction. The probability of an immune response is highest with subcutaneous administration, lower with intramuscular administration and lowest with intravenous administration. There are no studies comparing parenteral and non-parenteral routes of administration. However, it should be realized that mucosal tissues are also immune competent, making intranasal and pulmonary administration of therapeutic proteins potentially immunogenic. The biological activities of the product also influence the immune response. An immune-stimulating therapeutic protein is more likely to induce antibodies than an immune-suppressive protein. Monoclonal antibodies targeted to cell-bound epitopes are more likely to induce an immune response than monoclonal antibodies with a target in solution. The Fc-bound activities of monoclonal antibodies also have an influence. Impurities may influence immunogenicity. I have discussed the presence of modified protein and contaminants such as host-cell derived endotoxin as potential inducers of an immune response.
The immunogenicity of products such as human growth hormone, insulin and interferon alpha-2 has declined over the years due to improved downstream processing and formulation, reducing the level of impurities. The probability of an immune response therefore increases with the level of impurities. Impurities may also induce an immune response to themselves and may be the cause of skin reactions, allergies and other side effects. Product modifications which are intended to enhance half-life potentially also increase the exposure of the protein to the immune system and may increase immunogenicity. In addition, the modification may reduce biological activity, necessitating more protein for the same biological effect. Pegylation is claimed to reduce the immunogenicity of therapeutic proteins by shielding [13]. There is evidence that pegylation reduces the immunogenicity of non-human proteins like bovine adenosine deaminase and asparaginase. Whether pegylation also reduces the capacity of human proteins to break B-cell tolerance is less clear. There are reports of high immunogenicity of pegylated human proteins such as megakaryocyte growth and differentiation factor (MGDF), but the immunogenicity of unpegylated MGDF products is unknown [14]. Gender, age and ethnic background have all been reported to influence the incidence of antibody response to specific therapeutic proteins. However, the only patient characteristic which has consistently been identified for a number of different products is the disease the patients suffer from. Cancer patients are less likely to produce antibodies to therapeutic proteins than other patients. The most widely accepted explanation for this difference is the immunocompromised state of cancer patients, caused both by the disease and by anticancer treatment. Also the median survival of patients on treatment with therapeutic proteins may be too short to develop an antibody response. In any case, cancer reduces the probability of an antibody response to a protein considerably. As the experience in cancer patients shows, immune-suppressive therapy reduces the probability of developing an immune response to proteins. In addition, immune-suppressive drugs such as methotrexate are used in conjunction with monoclonal antibodies and other protein drugs in order to reduce the immune reactions.

Prediction on the basis of physicochemical characterization

Non-human proteins
The degree of non-self is an important factor in the potency of a protein to induce antibodies, and especially where exactly in the protein the divergence is located. Prediction based on the number of amino acid differences from the native sequence alone is misleading. There are examples of proteins in which a single amino acid difference induced strong immunogenicity, while in other proteins more than 10% of changes had no effect [15]. If the protein differs so substantially that epitopes are formed to which the patient has no innate immunity, then a rapid and intense immune reaction is induced. On the basis of sequence analysis, the immunogenicity of these proteins can be reduced; however, the current algorithms that are designed to predict epitopes are at an early stage of development. Also, the extreme heterogeneity of HLA responses in humans poses a major hurdle for precise prediction. Therefore epitope prediction is not certain, as patients can make antibodies to unpredicted epitopes or fail to mount an immune response to predicted epitopes. On the basis of sequence it is at present impossible to fully predict the immunogenicity of biopharmaceuticals. Considering that factors other than sequence are important for immunogenicity, and that immune responses are genetically determined and therefore highly individual, sequence will probably never be enough to predict and avoid the induction of antibodies.

Human homologues
For biopharmaceuticals with a high degree of homology to native proteins, the main factors in immunogenicity have been reported to be impurities, heterogeneity, aggregate formation and protein degradation such as oxidation and deamidation, which can be monitored by physicochemical characterization during production and storage [16–18]. However, there are factors influencing immunogenicity which cannot yet be identified by the current methods to characterize biopharmaceuticals. The considerable reduction in immunogenicity seen over the years in products like insulin and growth hormone has been explained by the increased purity. However, which impurities should be avoided is not known. The relatively high immunogenicity of the first recombinant DNA-derived growth hormone has been blamed on the presence of a high level of host-cell derived endotoxin. However, this explanation has been disputed. There is less controversy about the role of aggregates in the induction of antibodies by human homologues. They present epitopes in a repeated fashion known to be able to activate B-cells. In transgenic models aggregates have been shown to be the most decisive product modification able to break tolerance. Reduction of aggregates has also been shown in clinical trials to reduce the induction of antibodies. A number of chromatographic and electrophoretic methods can be used to identify some types of aggregates, although the experimental conditions may also dissolve aggregates which have immunogenic potential. Analytical ultracentrifugation is considered the least destructive method of analysis and capable of identifying all aggregate forms involved in induction of antibodies.

Prediction with animal studies
In principle all proteins intended for therapeutic use in humans are foreign for animals and cause the induction of antibodies. For biopharmaceuticals of microbial, plant or other non-human origins this immunogenic response is similar to that in humans, as they are comparably foreign for all mammalian species. Therefore, animal studies in which the reduction of immunogenicity is evaluated have a high degree of predictability for immunogenicity in humans. Especially useful are transgenic animals expressing the human MHC complex, which therefore process peptides as humans do. For human homologues standard animals are less predictive because the induction of antibodies in humans is based on breaking B-cell tolerance, which differs fundamentally from the vaccine-like reaction seen in conventional animals. Still, studies with this type of product in conventional animals may be useful because they may help to identify the relative immunogenicity of different preparations.

If a biological effect develops, preclinical testing may help to identify the possible sequelae of immunogenicity, as has been shown with EPO in dogs. In the canine model human EPO is immunogenic and it induces antibodies which neutralize the native canine EPO, leading to pure red-cell aplasia. This severe complication of antibodies to EPO was later confirmed in humans [19]. In addition, antibody positive animals provide sera for the development and validation of antibody assays. Non-human primates and transgenic animals may be the best models to study immunogenicity of biopharmaceuticals for human use. In non-human primates there will be a high sequence homology between the product and the homologous molecule to which the animal is immune tolerant. However, one should be aware of strain differences in different monkey species and differences in immunogenicity between monkeys and humans. There are reports about proteins immunogenic in non-human primates which proved non-immunogenic in human patients and the other way around. The best models to evaluate the immunogenicity of human proteins are transgenic animals, immune tolerant for the human protein [20–22]. In these animals breaking of immune tolerance can be studied, as has been shown with insulin, htPA and growth hormone. Transgenic mice were used some years ago to show that aggregates in Hu IFN alpha 2a produced during storage at room temperature of a freeze-dried preparation were responsible for enhanced immunogenicity. Recently this model was extensively used to study the effect of protein modifications on immunogenicity. These studies confirmed not only the potential of this model but also the central role of aggregates in breaking B-cell tolerance. Results obtained in mice expressing human IFN beta confirmed these observations.
Although transgenic immune-tolerant mice are the best choice to study breaking B-cell tolerance, the in vivo analysis of the immunogenicity may be complicated if the therapeutic protein has immunomodulatory effects which may interfere with antibody production or if the transgenesis modulates the immune response.

Consequences of antibodies to therapeutic proteins
In many cases the presence of antibodies is not associated with biological or clinical consequences. The effects which antibodies may induce depend on their level and affinity and can be the result of antigen–antibody reaction in general or of the specific interaction (Table 1). Severe general immune reactions such as anaphylaxis associated with the use of animal antiserum have become rare because the purity of the products increased substantially. Delayed-type infusion-like reactions resembling serum sickness are more common, especially with monoclonal antibodies and other proteins administered in relatively large amounts, and involve the formation of immune complexes. Patients with a slow but steadily increasing antibody titer are reported to show more infusion-like reactions than patients with a short temporary response. The consequences of the specific interaction with protein drugs are dependent on the affinity of the antibody, translating into binding and/or neutralizing capacity. Binding antibodies may influence the pharmacokinetic behavior of the product, and both increase and reduction of half-life have been reported, resulting in enhancement or reduction in activity. Persisting levels of neutralizing antibodies in general result in loss of activity of the protein drug. Because by definition neutralizing antibodies interfere with ligand–receptor interaction, they will inhibit the efficacy of all products in the same class, with serious consequences for patients if there is no alternative treatment. The most dramatic side effects occur if the neutralizing antibodies cross react with an endogenous factor with an essential biological function. This has been described for antibodies induced by epoetin alpha and MGDF, which led to life-threatening anemia and thrombocytopenia, respectively, sometimes lasting for more than a year.

Table 1. Consequences of antibodies to therapeutic proteins.
General immune effects: anaphylactoid reactions, serum sickness, etc.
Effects on pharmacokinetics
Reduction of efficacy
Cross neutralization of endogenous protein

Reduction of immunogenicity
There are many reports on the reduction of immunogenicity of human growth hormone, factor VIII, insulin and interferon alpha 2a through improvements in manufacturing, specifically purification and formulation (Table 2). So optimizing downstream processing, and the avoidance of aggregates in particular, is probably the best possible approach to minimize immunogenicity, especially of human homologues. Chemical modification may be helpful if the therapeutic protein is intrinsically immunogenic, as is the case with non-human proteins. Covalent binding with polymers, such as PEG and dextran, and protection of positive charges with succinic anhydride may reduce the immunogenicity of proteins.

Table 2. Strategies to reduce immunogenicity.
Removal of non-human sequences
Pegylation
Optimizing production and purification
Optimizing formulation

Whether pegylation of products that are more homologous to human proteins generally reduces immunogenicity is much less clear. There are examples of serious immunogenicity of pegylated human homologues such as MGDF and TNF binding protein. Sequence modification has been reported to reduce the immunogenicity of products like staphylokinase and streptokinase, which are bacterial products carrying neo-antigens. The benefits from sequence modification must be carefully weighed against the risk that these products will cause the breakdown of immune tolerance by introducing novel T- or B-cell epitopes. The best example of reducing immunogenicity by sequence modification is provided by the different generations of monoclonal antibodies [23,24]. The first generation of monoclonal antibodies were of murine origin and therefore highly immunogenic. Several technical advances have been made over the years in both the development and the production of monoclonal antibodies. Recombinant DNA technology has made it possible to exchange the murine constant parts of the immunoglobulin chains with the human counterparts (chimeric mAbs) and later to graft murine complementarity-determining regions (CDRs), which determine specificity, into a human immunoglobulin backbone, creating humanized mAbs. Both the chimeric and the humanized monoclonal antibodies have a reduced immunogenicity. At present, transgenic animals, phage display technologies and other developments allow the production of completely human mAbs. These human antibodies still have some immunogenicity based on the other factors discussed earlier. Route of administration is one of the factors influencing the induction of antibodies by proteins. It is possible that a local inflammatory reaction, which is occasionally seen after the local injection of biopharmaceuticals, may contribute to the immunogenicity. Alternative forms of delivery may well help to reduce immune reactions.
However, the use of pumps delivering insulin intraperitoneally was associated with a higher immunogenicity in about 50% of patients. Sustained release of human growth hormone by encapsulation in microspheres did not lead to different immunogenicity in rhesus monkeys or in transgenic mice which were immune tolerant for rhGH. The use of liposomes to target protein pharmaceuticals has been associated with higher immunogenicity. While it is reasonable to expect that advances in delivery technology will allow one to reduce immunogenicity, this has not yet been fully established.

References
1. Walsh G. Pharmaceutical biotechnology products approved within the European Union. Eur J Pharm Biopharm 2003;55:3–10.
2. Moore WV and Leppert P. Role of aggregated human growth hormone (hGH) in development of antibodies to hGH. J Clin Endocrinol Metab 1980;51:691–697.
3. White GC 2nd, Kempton CL, Grimsley A, Nielsen B and Roberts HR. Cellular immune responses in hemophilia: why do inhibitors develop in some, but not all hemophiliacs? J Thromb Haemost 2005;3(8):1676–1681. Review.
4. Schellekens H. Immunogenicity of therapeutic proteins: clinical implications and future prospects. Clin Ther 2002;24(11):1720–1740.
5. Schellekens H. Bioequivalence and the immunogenicity of biopharmaceuticals. Nat Rev Drug Discov 2002;1(6):457–462.
6. Schellekens H, Ryff JC and van der Meide PH. Assays for antibodies to human interferon-alpha: the need for standardization. J Interferon Cytokine Res 1997;17(Suppl 1):S5–S8.
7. Neyer L, Hiller J, Gish K, Keller S and Caras I. Confirming human antibody responses to a therapeutic monoclonal antibody using a statistical approach. J Immunol Methods 2006;315(1–2):80–87. Epub Aug 7.
8. Geng D, Shankar G, Schantz A, Rajadhyaksha M, Davis H and Wagner C. Validation of immunoassays used to assess immunogenicity to therapeutic monoclonal antibodies. J Pharm Biomed Anal 2005;39(3–4):364–375.
9. Mire-Sluis AR, Barrett YC, Devanarayan V, Koren E, Liu H, Maia M, Parish T, Scott G, Shankar G, Shores E, Swanson SJ, Taniguchi G, Wierda D and Zuckerman LA. Recommendations for the design and optimization of immunoassays used in the detection of host antibodies against biotechnology products. J Immunol Methods 2004;289:1–16.
10. Bachmann MF, Rohrer UH, Kundig TM, Burki K, Hengartner H and Zinkernagel RM. The influence of antigen organization on B cell responsiveness. Science 1993;262:1448–1451.
11. Chakerian B, Lenz P, Lowy DR and Schiller JT. Determinants of autoantibody induction by conjugated papillomavirus virus-like particles. J Immunol 2002;169:6120–6126.
12. Fehr T, Bachmann MF, Bucher E, Kalinke U, Di Padova FE, Lang AB, Hengartner H and Zinkernagel RM. Role of repetitive antigen patterns for induction of antibodies against antibodies. J Exp Med 1997;185:1785–1792.
13. Veronese FM and Pasut G. PEGylation, successful approach to drug delivery. Drug Discov Today 2005;10(21):1451–1458.
14. Vadhan-Raj S. Clinical experience with recombinant human thrombopoietin in chemotherapy-induced thrombocytopenia. Semin Hematol 2000;37(2 Suppl 4):28–34.
15. Ottesen JL, Nilsson P, Jami J, Weilguny D, Duhrkop M, Bucchini D, Havelund S and Fogh JM. The potential immunogenicity of human insulin and insulin analogues evaluated in a transgenic mouse model. Diabetologia 1994;37:1178–1185.
16. Hermeling S, Schellekens H, Maas C, Gebbink MF, Crommelin DJ and Jiskoot W. Antibody response to aggregated human interferon alpha2b in wild-type and transgenic immune tolerant mice depends on type and level of aggregation. J Pharm Sci 2006;95(5):1084–1096.
17. Hochuli E. Interferon immunogenicity: technical evaluation of interferon-alpha 2a. J Interferon Cytokine Res 1997;17(Suppl 1):S15–S21.
18. Rosenberg AS. Effects of protein aggregates: an immunologic perspective. AAPS J 2006;8(3):E501–E507.
19. Casadevall N, Nataf J, Viron B, Kolta A, Kiladjian JJ, Martin-Dupont P, Michaud PT, Ugo V, Teyssandier I, Varet B and Mayeux P. Pure red cell aplasia and antierythropoietin antibodies in patients treated with recombinant erythropoietin. New Engl J Med 2002;346:469–475.
20. Braun KL, Labow MA and Alsenz J. Protein aggregates seem to play a key role among the parameters influencing the antigenicity of interferon alpha (IFN-alpha) in normal and transgenic mice. Pharm Res 1997;14:1472–1478.
21. Hermeling S, Jiskoot W, Crommelin D, Bornaes C and Schellekens H. Development of a transgenic mouse model immune tolerant for human interferon beta. Pharm Res 2005;22(6):847–851. Epub Jun 8.
22. Stewart TA, Hollingshead PG, Pitts SL, Chang R, Martin LE and Oakley H. Transgenic mice as a model to test the immunogenicity of proteins altered by site-specific mutagenesis. Mol Biol Med 1989;6:275–281.
23. Pendley C, Schantz A and Wagner C. Immunogenicity of therapeutic monoclonal antibodies. Curr Opin Mol Ther 2003;5:172–179.
24. Presta LG. Engineering of therapeutic antibodies to minimize immunogenicity and optimize function. Adv Drug Deliv Rev 2006;58(5–6):640–656. Epub May 23.


Protein arginine methylation in health and disease
John M. Aletta1,2 and John C. Hu3
1 CH3 BioSystems, LLC, University Commons (Suite 8), 1416 Sweet Home Road, Amherst, New York 14228, USA
2 New York State Center of Excellence in Bioinformatics and Life Sciences, Buffalo, New York, USA
3 Biomedical Sciences Program, University at Buffalo, State University of New York, Buffalo, New York, USA

Abstract. Protein arginine methylation is a rapidly growing field of biomedical research that holds great promise for extending our understanding of developmental and pathological processes. Less than ten years ago, fewer than two dozen proteins were verified to contain methylarginine. Currently, however, hundreds of methylarginine proteins have been detected and many have been confirmed by mass spectrometry and other proteomic and molecular techniques. Several of these proteins are products of disease genes or are implicated in disease processes by recent experimental or clinical observations. The purpose of this chapter is twofold: (1) to re-examine the role of protein arginine methylation placed within the context of cell growth and differentiation, as well as within the rich variety of cellular metabolic methylation pathways and (2) to review the implications of recent advances in protein methylarginine detection and the analysis of protein methylarginine function for our understanding of human disease.
Keywords: growth, differentiation, methyltransferase, proteomics, neurodevelopment, autoimmune, viral, cancer, cardiovascular.

Corresponding author: Tel.: (716) 688-5222. Fax: (866) 610-2261. E-mail: [email protected] (J.M. Aletta).
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00008-2

Introduction
Protein methylation research can be traced back to 1939 [1], but the biomedical promise of the field has gained momentum only over the last 10–12 years. Post-translational methylations of the nucleophilic side chains of no fewer than eight different amino acids are known. Most common is methylation of either nitrogen or oxygen atoms, but evidence for the post-translational methylation of carbon atoms in amino acid side chains has also recently been described [2]. The N-methylation of arginine side chains is the focus of this review for two reasons. First, the diversity of the enzymatic pathways involved in the generation and modification of methylarginine proteins is broad and has grown much larger in recent years. Second, the discovery of a number of disease proteins that are methylated on arginine residues and the contribution of arginine methyltransferases to pathological processes forecast a broad emerging clinical relevance for protein arginine methylation. There are currently five areas of clinical medicine in which protein methylation has been associated in some way with human maladies that include neurodevelopmental diseases, autoimmune disorders, and viral, neoplastic and cardiovascular disease. Advances in biochemistry, molecular genetics and proteomics have collectively accelerated the discovery and characterization of at least nine arginine methyltransferases and hundreds of arginine-methylated proteins. Type I and type II protein arginine methyltransferases (PRMTs) are responsible for catalyzing the arginine methylation of human proteins. Both types produce mono-methylarginine; the former also leads to the formation of asymmetrical dimethylarginine, while the latter produces symmetrical dimethylarginine. There are a number of recent excellent reviews that focus on PRMTs from a variety of perspectives that include the regulatory control of transcription [3], cellular functions [4], cellular signaling [5] and phylogenetic evolution and therapeutic potential [6]. When sequence analysis of in vivo asymmetric methylation sites was first performed comprehensively, a total of only 20 proteins were confirmed as methylarginine proteins based on mass spectrometric analysis [7]. The generation of methylarginine-specific antibodies [8–10] has facilitated the identification and further analysis of many other methylarginine proteins. Based on a proteomic paradigm, immunopurification of symmetrical and asymmetrical dimethylarginine protein complexes revealed more than 200 putative arginine-methylated proteins [9]. Future work must not only confirm the methylation status by alternative means, but also demonstrate that the methylation is physiologically relevant. Despite these caveats, the large number of proteins detected by this approach indicates a fundamental and widespread role of methylarginine proteins in many different cellular processes that include, but are not limited to, RNA processing, transcription, cellular differentiation and DNA repair.
Currently available methylarginine-specific antibodies are based on a canonical sequence motif that is derived from the glycine- and arginine-rich (GAR) domains of well-known methylproteins. Methylarginine sites in proteins without a discernible consensus motif are also well documented [11,12] and thus, expand the range of potential methylarginine proteins even further. Among the growing number of verified methylarginine proteins are many examples of disease-related proteins that can serve as methylation substrates. Methylarginine disease proteins and the potential involvement of PRMTs in human diseases will be discussed in the concluding sections of this review. Metabolic pathways The biochemical pathways that contribute to the metabolites that take part in the genesis and potential regulation of methylarginine proteins are complex (Fig. 1). The cycle of protein methylation metabolism involves chemical intermediates that play roles in many other important cellular functions,

Fig. 1. The methylation metabolic cycle. The principal biochemical reaction pathways that comprise S-adenosylmethionine utilization and regeneration are illustrated. Additional reaction pathways that are not depicted include the syntheses of cyclopropane fatty acids, biotin and queuosine, a hypermodified 7-deazaguanosine tRNA nucleoside. See text for descriptions of the pathways and the enzymes involved. ATP, adenosine triphosphate; AMP, adenosine monophosphate; PPi, inorganic pyrophosphate; Pi, inorganic phosphate; PRMT, protein arginine methyltransferase; SAM, S-adenosylmethionine; MTA, 5′-deoxy-5′-methylthioadenosine; SAH, S-adenosylhomocysteine. (The color version of this figure is hosted on Science Direct.)

including anti-oxidant pathways, cell growth and differentiation and the methylation of DNA, RNA and lipids. For example, Fontecave et al. [13] have reviewed the unusual utilization of every chemical group of the sulfonium compound, S-adenosylmethionine (SAM). The chemical moieties of SAM participate not only in the synthesis of methylated protein, DNA, RNA and lipid (methyl group), but also in the synthesis of cyclopropyl fatty acids (methylene group), biotin (amino group), a modified nucleoside of transfer RNA (ribosyl group) and polyamines (aminopropyl group). The precursor of SAM is the essential amino acid, methionine, which is acquired principally from protein in the diet. SAM synthase (methionine adenosyltransferase) catalyzes the transfer of the adenosyl group from ATP to methionine, followed by the hydrolysis of the polyphosphate to inorganic pyrophosphate and phosphate. Figure 1 illustrates the two most recognized routes for SAM metabolism: polyamine synthesis and methyl-group transfers. SAM is the sole donor of an aminopropyl group for the synthesis of the polyamines, spermidine and spermine. Following the generation of putrescine via the usual rate-limiting step of ornithine decarboxylation, the polyamines are formed from the condensation of putrescine and the aminopropyl group contributed by SAM through the action of an aminopropyl transferase. The widespread role of polyamines in cell growth, differentiation and apoptosis has been acknowledged for decades, but precise functions and specific mechanisms are only recently becoming somewhat clearer [14]. SAM is also the requisite source of methyl groups for essential methylation reactions involving proteins, nucleic acids and lipids. For every methyltransferase reaction that occurs, S-adenosylhomocysteine (SAH) is generated. SAH is hydrolyzed to produce adenosine and homocysteine by the copper enzyme, SAH hydrolase. Knockout of this enzyme in the mouse is an embryonic lethal mutation. 
Adenosine is further metabolized within the cell to adenosine monophosphate by adenosine kinase or converted to inosine by adenosine deaminase. Ueland [15] has provided a discussion of the potential physiological implications of the adenosine route of metabolism. Homocysteine can be re-methylated to form methionine through the action of either betaine homocysteine methyltransferase (BHMT) or methylene tetrahydrofolate (THF) reductase in conjunction with methionine synthase. Alternatively, upon condensation with serine, homocysteine forms cystathionine, which is especially abundant in human brain. Cystathionine γ-lyase, a vitamin B6-dependent enzyme, catalyzes the cleavage of cystathionine to yield free cysteine, which can react further with glycine and glutamate to produce glutathione, a potent anti-oxidant. The relative utilization of the metabolites within each limb of the methylation metabolic cycle and the physiological significance that can be attributed to the intermediates may vary depending on the cell or tissue type, kinetic equilibria, dietary status and the developmental and disease states of

the system in question. For example, re-methylation of homocysteine is dependent on dietary methionine, cell type and gender [16–18]. Re-methylation of homocysteine by betaine via BHMT is absent from the brain, which has led some authors to suggest that neuropathies induced by B12 deficiency reflect decreased protein methylation in the nervous system that results from hypo-activity of the B12-dependent methionine synthase (step 8; Fig. 1). Several of the metabolites generated during the methylation metabolic cycle have the potential to inhibit protein methylation under certain circumstances. The adenosine and homocysteine produced during methyl transfer reactions can reverse the direction of SAH hydrolysis. In fact, in the absence of adenosine deaminase and without further homocysteine metabolism, the thermodynamically favored direction of the reaction pathway (step 4; Fig. 1) is toward SAH synthesis rather than hydrolysis [19]. This is quite significant, because SAH is a very potent inhibitor of many methyltransferases. In this regard, the role of homocysteine as a cardiovascular risk factor poses a potentially important relationship between methylation pathways and disease status [20]. Finally, 5′-deoxy-5′-methylthioadenosine (MTA), formed during polyamine biosynthesis (step 2; Fig. 1) and cyclopropane fatty acid synthesis (not illustrated), is also capable of inhibiting SAH hydrolase under certain circumstances [21]. Because MTA inhibits SAH hydrolase in vitro, this metabolite often has been purported to act as an in vivo protein methylation inhibitor based on the mechanism of SAH-mediated inhibition of protein methyltransferases. However, MTA is less potent and more toxic than other SAH hydrolase inhibitors. For these reasons, the cellular action of MTA is controversial and remains to be firmly established. 
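The reversibility argument for SAH hydrolase can be made concrete with a mass-action comparison. The following is a minimal sketch, not a kinetic model: it compares the mass-action ratio [adenosine][homocysteine]/[SAH] with an equilibrium constant for the hydrolysis direction. The function name, the value of `ASSUMED_K_EQ` and all concentrations are illustrative assumptions and are not taken from this review.

```python
def sah_hydrolase_direction(adenosine, homocysteine, sah, k_eq):
    """Classify the net direction of SAH + H2O <=> adenosine + homocysteine.

    All concentrations and k_eq must share the same unit (e.g. mol/L).
    k_eq is the equilibrium constant written for the hydrolysis direction.
    """
    if adenosine < 0 or homocysteine < 0 or sah <= 0 or k_eq <= 0:
        raise ValueError("concentrations must be non-negative; sah and k_eq positive")
    # Mass-action ratio Q for the hydrolysis direction.
    q = adenosine * homocysteine / sah
    if q > k_eq:
        return "synthesis"    # products accumulate: net flux back toward SAH
    if q < k_eq:
        return "hydrolysis"   # products are cleared: net flux toward adenosine + homocysteine
    return "equilibrium"


# Hypothetical numbers for illustration only (assumed, not measured).
ASSUMED_K_EQ = 1e-6  # mol/L, an assumed small constant favoring SAH synthesis

# Products allowed to accumulate (no deaminase, no homocysteine removal).
accumulating = sah_hydrolase_direction(1e-5, 1e-5, 1e-5, ASSUMED_K_EQ)
# Adenosine kept low by further metabolism.
cleared = sah_hydrolase_direction(1e-8, 1e-5, 1e-5, ASSUMED_K_EQ)
```

Under these assumed values the first case is classified as net SAH synthesis and the second as net hydrolysis, which mirrors the point that removal of adenosine and homocysteine is what keeps the reaction running in the hydrolytic direction.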
The dynamic or static nature of protein arginine and lysine methylation continues to be actively considered (see [22] for an overview of the debate), but work by Wang et al. [23] and Cuthbert et al. [24] has fundamentally changed the view of arginine methylation. Both groups identified peptidyl arginine deiminase 4 as a histone H3 modifying enzyme. The reaction, illustrated generally in Fig. 2, converts protein mono-methylarginine (or arginine) to citrulline with the release of methylamine. Deimination prevents re-methylation at the converted citrulline residue, and dimethylation (not mono-methylation) of arginine prevents the action of the deiminase. Thus, although evidence for the reversibility of arginine methylation is lacking, the conversion of mono-methylarginine to citrulline clearly increases the range and nuance of the dynamism of protein methylation marks. The first example of a true protein demethylase was a histone-modifying enzyme, lysine-specific demethylase 1 (LSD1; [25]). The inability of LSD1, a FAD-dependent amine oxidase, to demethylate trimethyl-lysine histone H3, however, suggested that additional enzymes exist for this purpose. Recently an entire class of Fe2+- and α-ketoglutarate-dependent oxygenases has been described with demethylase activity directed toward trimethyl-lysine


Fig. 2. Protein methylarginine metabolism. Protein arginine methylation is catalyzed by a family of enzymes known as protein arginine methyltransferases (PRMT), described further in the text. NG-mono-methylated protein can be further metabolized by at least three different routes: (i) addition of a second methyl group in asymmetric configuration by a type I PRMT, (ii) addition of a second methyl group in symmetric configuration by a type II PRMT or (iii) deimination of NG-mono-methylarginine to citrulline. A protein arginine demethylase that can revert protein methylarginine to the non-methylated state has not been documented. (The color version of this figure is hosted on Science Direct.)

residues [26]. Jumonji-domain-containing histone demethylase 1 is the founding member of the family [27]. Thus, the research over the last few years regarding the reversibility of lysine methylation raises the expectation for the discovery of additional enzymes, perhaps related to jumonji-domain-containing species, that serve similar purposes in relation to the reversal of the methylation of protein methylarginines.

Role of protein arginine methyltransferases in growth and differentiation

One of the principal roles of PRMTs 1, 4, 5 and 6 is participation in chromatin remodeling, based on the well-documented histone methyltransferase activities of these enzymes. PRMT1, the predominant type I arginine methyltransferase in mammalian cells [28], targets many other biologically interesting proteins as well, including SAM68 [29], RNA helicase A [30], fragile X mental retardation protein (FMRP) [31], DNA polymerase β [32] and Smad6 [33]. There are also many other proteins that are likely PRMT1 substrates including heterogeneous ribonucleoproteins [34], other nuclear proteins [35,36] and Golgi fraction-associated proteins [37]. PRMTs are required for normal mouse development based on gene-targeted disruption experiments. PRMT1 disruption terminates fetal development by embryonic day 7 [38] and the disruption of PRMT4 (also known as CARM1) results in stunted embryo growth and peri-natal death [39]. The actions of PRMT1 have, furthermore, been implicated in cellular signal transduction and cell differentiation. Overexpression of PRMT1 in myeloid cells enhances retinoid-regulated gene expression during differentiation [40]. Retinoid treatment does not significantly alter the endogenous PRMT1 expression levels in the cells, although histone H4 R3 methylation associated with the HR1 enhancer is increased. The effect of retinoic acid on increased histone H4 methylation is time dependent and accompanied by subsequent demethylation and acetylation. The differentiation of these cells appears to involve a complex regulatory cascade that also involves the p53-regulated Btg2 gene [41]. In another example, there is a modest increase in anti-CD3-induced PRMT1 expression that coincides with increased arginine methylation of NIP45 and augmented cytokine production in murine T helper lymphocytes [42]. 
Nerve growth factor (NGF) potentiates the PRMT1 activity isolated from PC12 cell extracts [43] by a mechanism that is not yet fully determined but does not involve increased PRMT1 expression. During erythroid differentiation, the PI-3 kinase-controlled Forkhead box class O transcription factor, FoxO3a, acts by targeting enhanced BTG1 expression [44], which binds to PRMT1 and presumably amplifies cellular methyltransferase activity [45]. Recent findings extend the range of the PRMTs that play similar roles in other developmental systems. Histone arginine methylation has recently been found to regulate pluripotency in the early mouse embryo [46]. PRMT5 associates with the transcriptional regulator, Blimp1, in mouse germ cells [47] and with Tudor, which is required for the formation and fate of Drosophila germ cells [48]. The PRMT5 methylation of histones H2A and H4 is involved in extensive reprogramming of germ cells. Further studies will be needed to examine other possible PRMT5 cytoplasmic substrates that may also contribute to primordial germ cell specification. PRMT6-mediated methylation of histone H3 on R2 acts to antagonize the trimethylation of K4 of the histone H3


Fig. 3. Developmental expression of methylarginine proteins. (A) Equal numbers of T. brucei cells of the procyclic (PF) and the bloodstream (BF) forms of the parasite were collected in SDS-sample buffer and subjected to western blotting with the methylarginine-specific antibody anti-mRG (1:5,000). A non-methylated arginine peptide (GRG) and an asymmetric dimethylarginine peptide (GmRG) were included as negative and positive controls, respectively. The electrophoretic migration of molecular mass markers is indicated in kilodaltons. (B) PC12 cells were cultured without NGF (−) or with NGF for sixteen days (16d), harvested in SDS-sample buffer and subjected to western blotting with anti-mRG (1:5,000). The blot was stripped and re-probed with anti-ERK1,2 to demonstrate equal protein loading of the gel, as illustrated in the lower panel. The electrophoretic migration of molecular mass markers is indicated in kilodaltons.

tail [49]. This antagonistic interplay that controls the global levels of histone H3 in human cells is conserved in yeast [50] as well, indicating an evolutionarily important, general eukaryotic mechanism. These mutually exclusive lysine and arginine methylation marks on chromatin predict critical functions for PRMT6 in cell fate determination during early embryonic development. Fundamental changes in the pattern of cellular methylarginine proteins occur during development as a result of the regulation of PRMT signaling. Figure 3 illustrates two examples of the altered protein methylarginine expression that can occur in diverse model systems including eukaryotic parasites and mammalian cells. The methylarginine profiles of Trypanosoma brucei whole cell protein extracts from the procyclic (PF) insect vector versus

the mammalian bloodstream (BF) forms of the parasite are illustrated by western blot using a methylarginine-specific antibody in Fig. 3A. The expression of several methylarginine proteins is greatly reduced in the BF parasite [10]. In the case of PC12 cells, NGF stimulation leads to neuronal differentiation that is accompanied by increases in methylarginine proteins (Fig. 3B; see also [51]).

Neurodevelopmental disease

Methylarginine proteins are implicated in the support of normal neuronal functions based on the phenotypes that arise from gene mutations in several neurodevelopmental diseases. For example, mutations of the survival motor neuron gene 1 (SMN1) result in spinal muscular atrophy of varying degrees of severity [52,53]. The SMN protein is both the assembler and specificity generator of the RNA-protein complexes that are imported into the nucleus [54]. Smith (Sm) proteins are synthesized in the cytoplasm and recruited by a 6S pICln complex (Fig. 4; step 1) that interacts with a PRMT5-containing protein complex to produce a 20S complex. Once formed, the 20S complex, known as the "methylosome", methylates the GAR domains of SmD1, D3, B/B′ and other Sm-like proteins (step 2). A very large (~1 MDa) protein complex containing SMN multimers next associates with the methylosome for the transfer of the methylated Sm proteins and other Sm proteins to the SMN protein portion of the 1 MDa complex (step 3; see also [56]). The interaction between SMN and Sm proteins is mediated directly by a mammalian Tudor domain. Following the formation of the SMN/methylosome multimers, assembly of Sm proteins on the Sm core domain of snRNAs forms snRNP core particles (step 4) that are processed further by hypermethylation of the m7G RNA cap necessary for import into the nucleus (not illustrated). Since SMN preferentially binds symmetrical dimethylarginine-modified SmD1 and D3 [57], an intriguing hypothetical question may be posed. 
Might there be differential information value encoded in Sm core particles that exhibit complete versus hypo-methylated Sm proteins? An answer will require more refined analysis of the fates of different snRNPs following entry into the nucleus. It should also be noted that there are many other type I and type II methylproteins involved in RNA processing that are likewise capable of interacting with SMN [58], suggesting additional regulatory roles for protein arginine methylation. Fragile X syndrome, like spinal muscular atrophy, is a genetic disease that gives rise to a wide spectrum of clinical severity. The trinucleotide repeat expansion of the fragile X mental retardation gene 1 (FMR1) can cause mental retardation or, in non-retarded individuals, behavioral problems. The protein product of the FMR1 gene is a methylarginine RNA binding protein, the FMRP. Stetler et al. [31] have

Fig. 4. Biogenesis of Sm-class ribonucleoproteins requires protein arginine methyltransferase activity. Cytoplasmic Sm proteins form a 6S complex (1) that interacts with a second protein complex that exhibits type II protein arginine methyltransferase activity. The aggregation of the two complexes results in a 20S methylosome that produces symmetric dimethylation (2) of the Sm protein arginines (small spheres on RG domains). PRMT5 is illustrated as the type II PRMT, but new data from studies in human cells also indicate that PRMT7 plays a non-redundant role as well [55]. The methyl modifications greatly enhance the subsequent interaction (3) with the survival motor neuron protein (SMN). This very large protein aggregate contains higher order oligomers of the protein components, shown here as monomers for simplicity. The mature SMN complex serves as the assembly scaffold and selector of snRNAs and RNA binding Sm core proteins for final snRNP assembly (4) around the Sm domain of the snRNA. (The color version of this figure is hosted on Science Direct.)

recently demonstrated the presence of asymmetric dimethylarginine (ADMA) in Flag-tagged FMRP that was immunoprecipitated from mammalian cells. Four methylarginine sites within GAR domains were identified. Additional results suggested that the methylation reduced the ability of FMRP to bind specific RNAs. The dimerization and recruitment of FMRP into large stress granules is, furthermore, a methylation-dependent process [59]. Thus, arginine methylation of FMRP may serve to modify FMRP function by limiting or modulating interactions with RNA and/or other proteins. Finally, the protein product of the disease gene responsible for Rett syndrome appears to harbor three GRG methylation sites. Site-directed mutagenesis of any of the three arginines greatly reduces in vitro methylation by PRMT1 (Xu and Aletta, unpublished work). The protein product of the Rett gene is MeCP2 (methyl-CpG binding protein 2), which is particularly abundant in brain, where it acts as a potent transcriptional repressor [60,61]. Bird and colleagues [62] have demonstrated that this transcriptional repression involves a histone deacetylase complex containing the Sin3A co-repressor, but a complete understanding of the normal mechanism of action of MeCP2 is still lacking. The putative methylation sites are located at the C-terminus of the methyl-CpG binding domain and in the linker region between the DNA binding and transcriptional repressor domains of the MeCP2 protein. Motor, behavioral and other neurological deficits characterize the disease phenotype. A more complete understanding of the molecular pathogenesis of Rett syndrome, as well as of spinal muscular atrophy and fragile X syndrome, may come from further analysis of the normal protein methylation biology that is affected by the genes responsible for the neurodevelopmental deficits that characterize these diseases.
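Candidate GRG- and GAR-type sites of the kind discussed above can be located in a primary sequence with a simple scan for arginine followed by glycine. The sketch below is only a first-pass filter under stated assumptions: the function name and the test sequence are hypothetical, the "R followed by G" rule is a simplification of GAR motifs, and, as noted earlier in this review, methylarginine sites without a discernible consensus motif are also documented, so a scan of this kind cannot be exhaustive.

```python
def find_rg_sites(sequence):
    """Return (1-based arginine position, motif) for every R followed by G.

    The reported motif is widened to include a flanking G on either side,
    so the possible results are "RG", "GRG", "RGG" or "GRGG".
    """
    seq = sequence.upper()
    sites = []
    for i, residue in enumerate(seq):
        if residue != "R" or i + 1 >= len(seq) or seq[i + 1] != "G":
            continue
        # Extend left if the preceding residue is glycine.
        start = i - 1 if i > 0 and seq[i - 1] == "G" else i
        # Extend right if a second glycine follows the first.
        end = i + 3 if i + 2 < len(seq) and seq[i + 2] == "G" else i + 2
        sites.append((i + 1, seq[start:end]))
    return sites


# Hypothetical peptide used purely for illustration (not a real protein).
example_sites = find_rg_sites("MAGRGGFRGAK")
```

For the made-up peptide above the scan reports an arginine at position 4 within GRGG and one at position 8 within RG. Any hit would still need confirmation by mass spectrometry or mutagenesis, as described in the text.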

Autoimmune disorders

Rheumatic diseases and multiple sclerosis are distinctive for the heterogeneity of both the patient pathology and the profile of autoantibodies observed in the syndromes. Protein arginine methylation pathways may contribute to the pathogenesis of several forms of these disorders. For example, the nucleolar protein, fibrillarin, has long been known as a scleroderma antigen [63] that is rich in ADMA [64]. Anti-fibrillarin autoantibodies are present in a small sub-population of patients with systemic sclerosis [65]. Individuals with anti-fibrillarin antibodies frequently exhibit internal organ involvement. The anti-Sm autoimmune response involves multiple epitopes of the snRNP common core proteins with Sm B serving as the major antigen and Sm D1 and D2 as the next most prevalent antigens [66]. In cases of systemic lupus erythematosus (SLE), greater than 30% of patients exhibit anti-Sm

antibodies. Symmetric dimethylation of Sm D1 and D3 forms another epitope for the production of anti-Sm autoantibodies [67,68]. The presence of the symmetric arginine dimethylation greatly increases the affinity of anti-Sm autoantibodies, which raises the possibility that variations in patient autoantibody expression may reflect fundamental differences in the extent of a particular patient's Sm protein methylation status. Perhaps hypo- or hypermethylation of antigens plays a role in autoantibody generation and disease symptoms, status and/or progression. In addition to fibrillarin and Sm proteins, there is growing evidence that heterogeneous nuclear ribonucleoproteins (hnRNPs) can serve as important targets of the autoimmune response. Anti-hnRNP reactivity in rheumatoid arthritis, SLE and mixed connective tissue disease is mostly directed against the methylarginine proteins, hnRNP A1, A2 and K [69]. Nucleolin, which may be the most highly methylated methylarginine protein, also serves as an autoantigen in the serum of some patients with rheumatic disease [70]. Beyond potential roles in diagnostic tests and clinical practice, these observations may facilitate the development of new molecular tools to understand better the molecular basis of the autoimmune response in humans. Myelin basic protein (MBP), a prominent candidate autoantigen relevant to multiple sclerosis, was among the very first identified methylarginine proteins [71]. A single site (R107) within human MBP is subject to monomethylation and type II symmetric dimethylation. The protein methylation of MBP is thought to be essential for the formation of compact myelin in the nervous system. Kim et al. [72] found that mono-methylarginine and dimethylarginine are increased in MBP isolated from multiple sclerosis white matter relative to that from normal human control specimens. 
This may be related to a physiological attempt at re-myelination or some other reaction to the de-myelinating environment. Additional analysis revealed that MBP from multiple sclerosis brains is less phosphorylated and more deiminated (citrulline-containing; see Fig. 2) than MBP from normal tissue. As described above for other human diseases, the heterogeneity of the clinical course of multiple sclerosis may, in part, reflect the alterations in the type and extent of MBP methylation.

Viral disease

The pioneering work from the Cantoni and Borchardt laboratories [73,74] established the concept of rational drug design of anti-viral agents based on methylation inhibitors. More recently, noteworthy advances in understanding the possible effects of protein arginine methylation in viral disease pathogenesis have been accruing rapidly. Mears and Rice [75] reported the earliest suggestion of an association between a specific protein methylation and virus biology. Studies on the protein product of the immediate early gene

ICP27 demonstrated that the GAR domain of ICP27 protein is required for methylation of the protein in virus-infected cells. Since methylation was dependent on the presence of the 13 amino acid GAR domain and the only methylatable amino acid of the GAR domain is arginine, ICP27 must be a target of a PRMT. Following the first report of a viral methylarginine protein, many other methylarginine proteins derived from viruses have been described, including Epstein Barr nuclear antigen 2 [76], the hepatitis delta virus (HDV) antigen [77], the NS3 RNA helicase/protease of hepatitis C virus [78] and the HIV proteins, Tat and Rev [79,80]. In addition to the identification of new viral methylarginine proteins, greater insight into the function of the methylated forms of the proteins is now available as well. With regard to the HIV-1 methylarginine proteins, PRMT6-mediated methylation occurs in arginine-rich, but glycine-poor domains. Increased methylation of Tat decreases transactivation of an HIV-1 reporter plasmid and knockdown of PRMT6 increases HIV-1 production and infectiousness. Overexpression of wild type PRMT6, but not a methylation-inactive mutant form, decreases Rev-mediated viral nuclear export and Rev binding to the Rev response element. Thus, in the case of the PRMT6 targets, methylation acts to limit HIV replication. Alternatively, non-specific, adenosine periodate (AdOX) inhibition of all SAM-dependent methylation in HIV-1 infected cells was recently found to promote maximal virus infectivity [81]. In a series of studies on virus production, Gag-Pol processing, viral structure and infectivity, AdOX was found to affect early events in viral pathogenesis, between receptor binding and un-coating, but not reverse transcription. In the case of PRMT1 methylation of HDV hepatitis delta antigen, we find an instance of protein methylation that promotes viral pathogenesis. 
Both site-directed mutagenesis of the viral antigen methylation site (R13) and treatment of infected cells with SAH, the protein methylation inhibitor, interfere with HDV RNA replication. The integration and final readout of the methylation modifications discovered in the experimental systems above may turn out to be more complex when examined in organ systems. The important role of protein arginine methylation in viral biology, however, is now clearly evident and deserving of additional study with the ultimate goal of exposing novel drug targets for anti-viral therapeutics [82]. For example, the interactions of viral methylarginine proteins with host proteins may have profound effects on pathogenesis. EBNA2 is methylated in vivo within an 18 amino acid GAR domain, which is required for association with the SMN binding partner. Additional reports implicate interactions between murine parvovirus proteins and SMN protein with regard to viral infectivity [83,84]. Based on a screen for proteins associated with the papilloma virus nuclear transcription factor, E2, it has been suggested that spinal muscular atrophy may also, in part, result from abnormal gene expression mediated by SMN interaction with the E2 transactivator [85].

Cancer

Protein arginine methylation has been implicated in an expanding array of fundamental biological processes that are relevant to human cancers. Recent experimental results regarding genomic instability, DNA damage and repair, metastatic potential and clinical correlations all point to potential oncogenic pathways involving protein arginine methylation. CARM1 and PRMT1 are co-activator enzymes for nuclear receptors and CARM1 for β-catenin as well. These gene-activating targets are frequently involved in the genesis of hematological and solid malignancies. In the absence of the major type I PRMT of yeast, Hmt1, cells display increased transcription from previously silent chromatin and increased rates of mitotic recombination. Genetic assays and ChIP analysis show that the null Hmt1 yeast lack recruitment of Sir2, the silent information regulator, at typically silent chromatin domains and histone H4 R3 methylation is lost. Based on these findings, Yu et al. [86] have proposed a model in which protein arginine methylation has a role in the maintenance of genome stability. DNA damage and repair processes, which are crucial for maintaining the genome and restricting carcinogenesis, are also affected by protein methylation events. MRE11, p53 binding protein 1 (53BP1) and DNA polymerase β are critical proteins involved in DNA repair mechanisms. All three proteins have also been found to be arginine methylated in vivo. MRE11 exhibits a GAR domain that is methylated by PRMT1 [87]. Alanine substitution of the arginines within the GAR domain abrogates all of the exonuclease activity of MRE11 and even lysine substitution severely reduces the activity, indicating that the arginines or methylarginines are necessary for optimum exonuclease activity. Both PRMT1 shRNA-treated and MTA-treated cells progress through S phase after DNA damage, displaying similar checkpoint defects. 
However, when exogenous repair complexes containing methylated MRE11 were introduced into cells, significantly reduced DNA synthesis in the PRMT1 shRNA-treated cells was observed, confirming a methylation-dependent event in the activation of intra-S-phase checkpoint control. Although the molecular mechanism underlying the influence of the GAR methylarginines on checkpoint control and DNA repair remains unclear, the experimental manipulations that impair exonuclease activity do so without affecting MRE11 binding to the DNA repair complex. Overall, in addition to the regulatory role of methylated MRE11, these results are also compatible with contributions from other methylarginine proteins. The GAR motif of 53BP1 is also arginine methylated by PRMT1 [88]. In this case, the methylation may influence the DNA binding activity of 53BP1, but not the oligomerization of 53BP1 [89]. In addition to the GAR motif, 53BP1 contains two tandem Tudor domains that can also serve as DNA interaction sites. Both the GAR motif and the tandem Tudor domains are necessary for the recruitment of 53BP1 to DNA damage sites. Mutation of

the GAR motif arginines dramatically reduces 53BP1 DNA binding, but this experiment does not directly address the contribution of methylarginine versus the lack of positively charged arginines. Repair of DNA single base damage is performed by base excision repair. DNA polymerase β, a multifunctional 335 amino acid constitutively expressed protein, is a key component of the repair mechanism. At least three arginine methylation sites within nondescript primary sequence of the polypeptide have been identified by in vitro and in vivo analyses. The PRMT6-mediated methylation of R83 and R152 stimulates polymerase activity by enhancing DNA binding and processivity without affecting other known functions of DNA polymerase β [90]. PRMT1 specifically methylates R137 with no change in any of the enzymatic activities of DNA polymerase β, but the methylation event is accompanied by abrogation of the interaction between DNA polymerase β and proliferating cell nuclear antigen [32]. The implications of the protein arginine methylation events involved in these DNA repair mechanisms are discussed at greater length in a timely new review [91]. High mobility group A (HMGA) proteins of approximately 100 amino acid residues have three AT hook regions that are responsible for interactions with the minor groove of AT-rich DNA for the formation of protein–nucleic acid complexes called enhanceosomes. In addition to overexpression in cancer cells, HMGA1a protein is subject to a wide variety of posttranslational modifications including arginine methylation. PRMT6 specifically methylates HMGA1a in vitro and in vivo [92]. The methylation sites are R57 and R59 at the third AT hook region. Although there is evidence for several other arginine methylation sites at the other two AT hook regions, R59 methylation is most interestingly associated with increased metastatic potential. 
Two different cell lines exhibit mono- or dimethylarginine modification, whereas the non-metastatic parental precursor cells are not detectably modified at R59. Additional information from other cell lines and naturally occurring tumors with regard to the role of methylarginine sites in HMGA1a will be useful for further critical assessment of the role of arginine methylation in metastatic potential and cancer biology. Finally, the expression of CARM1 in prostatic adenocarcinoma and prostatic intraepithelial neoplasia was significantly elevated above that in benign prostate tissue specimens [93]. Despite this interesting correlation, currently there is no direct evidence for the role of CARM1 in androgen-dependent neoplasia. However, an essential function for PRMT1 has recently been revealed in a leukemogenic retroviral transduction and transplantation assay [94]. The mixed lineage leukemia (MLL) gene, which encodes a lysine methyltransferase, can be rearranged by chromosomal translocation to form a major family of MLL SH3-domain-containing fusion proteins [95]. When murine primary hematopoietic cells are transduced with MLL fusion protein, an acute myeloid leukemia is produced which closely recapitulates the human

disease associated with the fusion oncoprotein. After demonstrating the in vivo interaction with endogenous PRMT1, the authors prepared a direct fusion of MLL with PRMT1 or SAM68, an adaptor protein that is itself a PRMT1 substrate. The fusions were capable of transforming myeloid progenitors and enhancing the self-renewal of primary hematopoietic cells. The same construction made with catalytically inactive PRMT1 was incapable of generating transforming activity. Also, specific knockdown of PRMT1 or SAM68 suppressed MLL-mediated transformation. Thus, a specific, critical role of PRMT1 in a potential oncogenic pathway has now been clearly demonstrated. Future work will be required to determine whether the molecular mechanism of the oncogenesis involves activation, repression or both processes.

Cardiovascular disease

The proteolysis of methylarginine proteins gives rise to free guanidino-methylated arginine residues in blood and tissues. Endogenous monomethylated arginine, ADMA and symmetric dimethylarginine (SDMA) are present in the human body [96]. Interestingly, mono-methylarginine and ADMA, but not SDMA, competitively inhibit all three isoforms of nitric oxide synthase (NOS). In the case of endothelial NOS, vascular injury has been found to increase ADMA by ~4-fold with subsequent inhibition of eNOS and significant impairment of vascular relaxation [97]. Boger et al. [98] have proposed that low density lipoprotein-mediated release of free methylated arginines (ADMA, but not SDMA) from endothelial cells involves regulation by PRMTs. Furthermore, a specific enzyme, dimethylarginine dimethylaminohydrolase, hydrolyzes ADMA to yield citrulline and dimethylamine, but has no activity toward SDMA. Thus, the protein arginine methylation metabolic pathways of Figs. 1 and 2 that specifically involve type I arginine methylation are also pertinent for cardiovascular function in certain disease states. 
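The competitive inhibition of NOS by ADMA can be illustrated with the standard rate law for a competitive inhibitor, in which the inhibitor raises the apparent Km by the factor (1 + [I]/Ki) and leaves Vmax unchanged. This is a minimal sketch under stated assumptions: the function name and every numerical value below (substrate, inhibitor, Km, Ki, Vmax) are hypothetical placeholders chosen for illustration and are not measurements from the studies cited here.

```python
def competitive_rate(substrate, inhibitor, v_max, k_m, k_i):
    """Michaelis-Menten rate in the presence of a competitive inhibitor.

    v = v_max * [S] / (k_m * (1 + [I] / k_i) + [S])
    All concentrations must share one unit; v is returned in the units of v_max.
    """
    apparent_km = k_m * (1.0 + inhibitor / k_i)
    return v_max * substrate / (apparent_km + substrate)


# Hypothetical parameters (arbitrary units), assumed for illustration only.
ARGININE, V_MAX, K_M, K_I = 10.0, 1.0, 10.0, 5.0
BASAL_ADMA = 1.0

rate_basal = competitive_rate(ARGININE, BASAL_ADMA, V_MAX, K_M, K_I)
# A 4-fold rise in the inhibitor, echoing the fold change described in the text.
rate_elevated = competitive_rate(ARGININE, 4 * BASAL_ADMA, V_MAX, K_M, K_I)
```

With these placeholder values the predicted rate falls as the inhibitor rises, and because the inhibition is competitive the loss could in principle be offset by a higher substrate concentration. How large the effect is in vivo depends on the true arginine and ADMA concentrations and on the Ki for each NOS isoform.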
Atherosclerosis and hypertensive disorders, for example, are associated with elevated ADMA [99]. Patients with chronic renal failure have especially high ADMA levels because the kidney is a major route of ADMA elimination. Additional regulatory effects of the proteolysis of ADMA proteins on neuronal NOS and on inducible or inflammatory NOS have been proposed, but not studied extensively [96]. Further investigation of the role of protein arginine methylation in cardiovascular and other diseases that involve disrupted NOS systems should yield a deeper understanding of the expanding landscape of syndromes connected with the methylation metabolic cycle.

Conclusion

The diversity of the types of proteins, biological processes and diseases that are associated with protein arginine methylation has increased enormously over the last decade. Protein arginine methylation now rivals all other post-translational modifications, including serine, threonine and tyrosine phosphorylation, as a fertile frontier for the exploration of new disease models of pathogenesis and therapeutic interventions.

Note added in proof

During the processing of the manuscript for editing and typesetting, the breakthrough identification of histone arginine demethylase activity associated with Jumonji domain-containing protein 6 (JMJD6) was described by Chang et al. (Science 2007;318:444–447). JMJD6 is a non-heme dioxygenase that belongs to the same enzyme family as the lysine demethylase, JHDM1, described in this review.

Conflict of interest disclosure

John M. Aletta is the Founder and Principal Scientist of CH3 BioSystems, LLC, a supplier of molecular tools for research discovery in protein methylation biology.

References

1. Paik WK, Paik DC and Kim S. Historical review: the field of protein methylation. Trends Biochem Sci 2007;32:146–152.
2. Walsh CT. Posttranslational Modifications of Proteins: Expanding Nature’s Inventory. Greenwood Village, CO, Roberts and Company Publishers, 2006.
3. Lee DY, Teyssier C, Strahl BD and Stallcup MR. Role of protein methylation in regulation of transcription. Endocr Rev 2005;26:147–170.
4. Pahlich S, Zakaryan RP and Gehring H. Protein arginine methylation: cellular functions and methods of analysis. Biochim Biophys Acta 2006;1764:1890–1903.
5. Blanchet F, Schurter BT and Acuto O. Protein arginine methylation in lymphocyte signaling. Curr Opin Immunol 2006;18:321–328.
6. Krause CD, Yang Z-H, Kim Y-S, Lee J-H, Cook JR and Pestka S. Protein arginine methyltransferases: evolution and assessment of their pharmacological and therapeutic potential. Pharmacol Ther 2006;113:50–87.
7. Gary JD and Clarke S. RNA and protein interactions modulated by protein arginine methylation. Prog Nucleic Acid Res Mol Biol 1998;61:65–131.
8. Boisvert FM, Cote J, Boulanger MC, Cleroux P, Bach F, Autexier C and Richard S. Symmetrical dimethylarginine methylation is required for the localization of SMN in Cajal bodies and pre-mRNA splicing. J Cell Biol 2002;159:957–969.
9. Boisvert FM, Cote J, Boulanger MC and Richard S. A proteomic analysis of arginine-methylated protein complexes. Mol Cell Proteomics 2003;2:1319–1330.
10. Duan P, Xu Y, Birkaya B, Myers J, Pelletier M, Read LK, Guarnaccia C, Pongor S, Denman RB and Aletta JM. Generation of polyclonal antiserum for the detection of methylarginine proteins. J Immunol Meth 2007;320:132–142.
11. Smith JJ, Rucknagel KP, Schierhorn A, Tang J, Nemeth A, Linder M, Herschman HR and Wahle E. Unusual sites of arginine methylation in Poly(A)-binding protein II and in vitro methylation by protein arginine methyltransferases PRMT1 and PRMT3. J Biol Chem 1999;274:13229–13234.
12. Xu W, Chen H, Du K, Asahara H, Tini M, Emerson BM, Montminy M and Evans RM. A transcriptional switch mediated by cofactor methylation. Science 2001;294:2507–2511.
13. Fontecave M, Atta M and Mulliez E. S-adenosylmethionine: nothing goes to waste. Trends Biochem Sci 2004;29:243–249.
14. Janne J, Alhonen L, Pietila M and Keinanen TA. Genetic approaches to the cellular functions of polyamines in mammals. Eur J Biochem 2004;271:877–894.
15. Ueland PM. Pharmacological and biochemical aspects of S-adenosylhomocysteine and S-adenosylhomocysteine hydrolase. Pharmacol Rev 1982;34:223–253.
16. Krebs HA, Hems R and Tyler B. The regulation of folate and methionine metabolism. Biochem J 1976;158:341–353.
17. Chiang PK, Gordon RK, Tal J, Zeng GC, Doctor BP, Pardhasaradhi K and McCann PP. S-Adenosylmethionine and methylation. FASEB J 1996;10:471–480.
18. Selhub J. Homocysteine metabolism. Annu Rev Nutr 1999;19:217–246.
19. De La Haba G and Cantoni GL. The enzymatic synthesis of S-adenosyl-L-homocysteine from adenosine and homocysteine. J Biol Chem 1959;234:603–608.
20. Herrmann W, Herrmann M and Obeid R. Hyperhomocysteinaemia: a critical review of old and new aspects. Curr Drug Metab 2007;8:17–31.
21. Ferro AJ, Vandenbark AA and MacDonald MR. Inactivation of S-adenosylhomocysteine hydrolase by 5′-deoxy-5′-methylthioadenosine. Biochem Biophys Res Commun 1981;100:523–531.
22. Bannister AJ, Schneider R and Kouzarides T. Histone methylation: dynamic or static? Cell 2002;10:801–806.
23. Wang Y, Wysocka J, Sayegh J, Lee Y-H, Perlin JR, Leonelli L, Sonbuchner S, McDonald CH, Cook RG, Dou Y, Roeder RG, Clarke S, Stallcup MR, Allis CD and Coonrod SA. Human PAD4 regulates histone arginine methylation levels via demethylimination. Science 2004;306:279–283.
24. Cuthbert GL, Daujat S, Snowden AW, Erdjument-Bromage H, Hagiwara T, Yamada M, Schneider R, Gregory PD, Tempst P, Bannister AJ and Kouzarides T. Histone deimination antagonizes arginine methylation. Cell 2004;118:545–553.
25. Shi Y, Lan F, Matson C, Mulligan P, Whetstine JR, Cole PA, Casero RA and Shi Y. Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 2004;119:941–953.
26. Chen Z, Zang J, Whetstine J, Hong X, Davrazou F, Kutateladze TG, Simpson M, Mao Q, Dai S, Hagman J, Hansen K, Shi Y and Zhang G. Structural insights into histone demethylation by JMJD2 family members. Cell 2006;125:691–702.
27. Tsukada Y, Fang J, Erdjument-Bromage H, Warren ME, Borchers CH, Tempst P and Zhang Y. Histone demethylation by a family of JmjC domain-containing proteins. Nature 2006;439:811–816.
28. Tang J, Frankel A, Cook RJ, Kim S, Paik WK, Williams KR, Clarke S and Herschman HR. PRMT1 is the predominant type I protein arginine methyltransferase in mammalian cells. J Biol Chem 2000;275:7723–7730.
29. Cote J, Boisvert F-M, Boulanger MC, Bedford MT and Richard S. Sam68 RNA binding protein is an in vivo substrate for protein arginine N-methyltransferase 1. Mol Biol Cell 2003;14:274–287.
30. Smith WA, Schurter BT, Wong-Staal F and David M. Arginine methylation of RNA helicase A determines its subcellular localization. J Biol Chem 2004;279:22795–22798.
31. Stetler A, Winograd C, Sayegh J, Cheever A, Patton E, Zhang X, Clarke S and Ceman S. Identification and characterization of the methyl arginines in the fragile X mental retardation protein Fmrp. Hum Mol Genet 2006;15:87–96.
32. El-Andaloussi N, Valovka T, Toueille M, Hassa PO, Gehrig P, Covic M, Hubscher U and Hottiger MO. Methylation of DNA polymerase beta by protein arginine methyltransferase 1 regulates its binding to proliferating cell nuclear antigen. FASEB J 2007;21:26–34.
33. Inamitsu M, Itoh S, Hellman U, Ten Dijke P and Kato M. Methylation of Smad6 by protein arginine methyltransferase 1. FEBS Lett 2006;580:6603–6611.
34. Liu Q and Dreyfuss G. In vivo and in vitro arginine methylation of RNA-binding proteins. Mol Cell Biol 1995;15:2800–2808.
35. Lischwe MA, Roberts KD, Yeoman LC and Busch H. Nucleolar specific acidic phosphoprotein C23 is highly methylated. J Biol Chem 1982;257:14600–14602.
36. Lapeyre B, Amalric F, Ghaffari SH, Rao SV, Dumbar TS and Olson MOJ. Protein and cDNA sequence of a glycine-rich, dimethylarginine-containing region located near the carboxyl-terminal end of nucleolin (C23 and 100 kDa). J Biol Chem 1986;261:9167–9173.
37. Wu CC, MacCoss MJ, Mardones G, Finnigan C, Mogelsvang S, Yates JR and Howell KE. Organellar proteomics reveals Golgi arginine dimethylation. Mol Biol Cell 2004;15:2907–2919.
38. Pawlak MR, Scherer CA, Chen J, Roshon MJ and Ruley HE. Arginine N-methyltransferase 1 is required for early postimplantation mouse development, but cells deficient in the enzyme are viable. Mol Cell Biol 2000;20:4859–4869.
39. Yadav N, Lee J, Kim J, Shen J, Hu MC, Aldaz CM and Bedford MT. Specific protein methylation defects and gene expression perturbations in coactivator-associated arginine methyltransferase 1-deficient mice. Proc Natl Acad Sci USA 2003;100:6464–6468.
40. Balint BL, Szanto A, Madi A, Bauer UM, Gabor P, Puskas LG, Davies PJ and Nagy L. Arginine methylation provides epigenetic transcription memory for retinoid-induced differentiation in myeloid cells. Mol Cell Biol 2005;25:5648–5663.
41. Passeri D, Marcucci A, Rizzo G, Bill M, Panigada M, Leonardi L, Tirone F and Grignani F. Btg2 enhances retinoic acid-induced differentiation by modulating histone H4 methylation and acetylation. Mol Cell Biol 2006;26:5023–5032.
42. Mowen K, Schurter BT, Fathman JW, David M and Glimcher L. Arginine methylation of NIP45 modulates cytokine gene expression in effector T lymphocytes. Mol Cell 2004;15:559–571.
43. Cimato TR, Tang J, Xu Y, Guarnaccia C, Herschman HR, Pongor S and Aletta JM. Nerve growth factor-mediated increases in protein methylation occur predominantly at type I arginine methylation sites and involve protein arginine methyltransferase 1. J Neurosci Res 2002;67:435–442.
44. Bakker WJ, Blazquez-Domingo M, Kolbus A, Besooyen J, Steinlein P, Beug H, Coffer PJ, Lowenberg B, von Lindern M and van Dijk TB. FoxO3a regulates erythroid differentiation and induces BTG1, an activator of protein arginine methyl transferase 1. J Cell Biol 2004;164:175–184.
45. Lin WJ, Gary JD, Yang MC, Clarke S and Herschman HR. The mammalian immediate-early TIS21 protein and the leukemia-associated BTG1 protein interact with a protein-arginine N-methyltransferase. J Biol Chem 1996;271:15034–15044.
46. Torres-Padilla ME, Parfitt DE, Kouzarides T and Zernicka-Goetz M. Histone arginine methylation regulates pluripotency in the early mouse embryo. Nature 2007;445:214–218.
47. Ancelin K, Lange UC, Hajkova P, Schneider R, Bannister AJ, Kouzarides T and Surani MA. Blimp1 associates with Prmt5 and directs histone arginine methylation in mouse germ cells. Nat Cell Biol 2006;8:623–630.
48. Anne J and Mechler BM. Valois, a component of the nuage and pole plasm, is involved in assembly of these structures, and binds to Tudor and the methyltransferase Capsuléen. Development 2005;132:2167–2177.
49. Guccione E, Bassi C, Casadio F, Martinato F, Cesaroni M, Schuchlautz H, Luscher B and Amati B. Methylation of histone H3R2 by PRMT6 and H3K4 by an MLL complex are mutually exclusive. Nature 2007;449:933–937.
50. Kirmizis A, Santos-Rosa H, Penkett CJ, Singer MA, Vermeulen M, Mann M, Bähler J, Green RD and Kouzarides T. Arginine methylation at histone H3R2 controls deposition of H3K4 trimethylation. Nature 2007;449:928–932.
51. Cimato TR, Ettinger MJ, Zhou X and Aletta JM. Nerve growth factor-specific regulation of protein methylation during neuronal differentiation of PC12 cells. J Cell Biol 1997;138:1089–1103.
52. Lefebvre S, Burglen L, Reboullet S, Clemont O, Burlet P, Viollet L, Benichou B, Cruaud C, Millasseau P, Zeviani M, et al. Identification and characterization of a spinal muscular atrophy-determining gene. Cell 1995;80:155–165.
53. Hausmanowa-Petrusewicz I and Vrbova G. Spinal muscular atrophy: a delayed development hypothesis. Neuroreport 2005;16:657–661.
54. Battle DJ, Kasim M, Yong J, Lotti F, Lau CK, Mouaikel J, Zhang Z, Han K, Wan L and Dreyfuss G. The SMN complex: an assembly machine for RNPs. Cold Spring Harb Symp Quant Biol 2006;71:313–320.
55. Gonsalvez GB, Tian L, Ospina JK, Boisvert FM, Lamond AI and Matera AG. Two distinct arginine methyltransferases are required for biogenesis of Sm-class ribonucleoproteins. J Cell Biol 2007;178:733–740.
56. Friesen WJ, Paushkin S, Wyce A, Massenet S, Pesirides GS, Van Duyne G, Rappsilber J, Mann M and Dreyfuss G. The methylosome, a 20S complex containing JBP1 and pICln, produces dimethylarginine-modified Sm proteins. Mol Cell Biol 2001;21:8289–8300.
57. Friesen WJ, Massenet S, Paushkin S, Wyce A and Dreyfuss G. SMN, the product of the spinal muscular atrophy gene, binds preferentially to dimethylarginine-containing protein targets. Mol Cell 2001;7:1111–1117.
58. Yong J, Wan L and Dreyfuss G. Why do cells need an assembly machine for RNA-protein complexes? Trends Cell Biol 2004;14:226–232.
59. Dolzhanskaya N, Merz G, Aletta JM and Denman RB. Methylation regulates the intracellular protein–protein and protein–RNA interactions of FMRP. J Cell Sci 2006;119:1933–1946.
60. Amir RE, Van den Veyver IB, Wan M, Tran CQ, Francke U and Zoghbi HY. Rett syndrome is caused by mutations in X-linked MECP2, encoding methyl-CpG-binding protein 2. Nat Genet 1999;23:185–188.
61. Hendrich B. Methylation moves into medicine. Curr Biol 2000;10:R60–R63.
62. Nan X, Ng HH, Johnson CA, Laherty CD, Turner BM, Eisenman RN and Bird A. Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature 1998;393:386–389.
63. Aris JP and Blobel G. cDNA cloning and sequencing of human fibrillarin, a conserved nucleolar protein recognized by autoimmune antisera. Proc Natl Acad Sci USA 1991;88:931–935.
64. Lischwe MA, Ochs RL, Reddy R, Cook RG, Yeoman LC, Tan EM, Reichlin M and Busch H. Purification and partial characterization of a nucleolar scleroderma antigen (Mr = 34,000; pI, 8.5) rich in NG,NG-dimethylarginine. J Biol Chem 1985;260:14304–14310.
65. Tormey VJ, Bunn CC, Denton CP and Black CM. Anti-fibrillarin antibodies in systemic sclerosis. Rheumatology 2001;40:1157–1162.
66. Zieve GW and Khusial PR. The anti-Sm immune response in autoimmunity and cell biology. Autoimmun Rev 2003;2:235–240.
67. Brahms H, Raymackers J, Union A, de Keyser F, Meheus L and Lührmann R. The C-terminal RG dipeptide repeats of the spliceosomal Sm proteins D1 and D3 contain symmetrical dimethylarginines, which form a major B-cell epitope for anti-Sm autoantibodies. J Biol Chem 2000;275:17122–17129.
68. Mahler M, Fritzler MJ and Bluthner M. Identification of a SmD3 epitope with a single symmetrical dimethylation of an arginine residue as a specific target of a subpopulation of anti-Sm antibodies. Arthritis Res Ther 2005;7:R19–R29.
69. Caporali R, Bugatti S, Bruschi E, Cavagna L and Montecucco C. Autoantibodies to heterogeneous nuclear ribonucleoproteins. Autoimmunity 2005;38:25–32.
70. Minota S and Winfield JB. Autoantibodies to nucleolin in systemic lupus erythematosus and other diseases. J Immunol 1991;146:2249–2252.
71. Baldwin GS and Carnegie PR. Specific enzymic methylation of an arginine in the experimental allergic encephalomyelitis protein from human myelin. Science 1971;171:579–581.
72. Kim JK, Mastronardi FG, Wood DD, Lubman DM, Zand R and Moscarello MA. Multiple sclerosis: an important role for post-translational modifications of myelin basic protein in pathogenesis. Mol Cell Proteomics 2003;2:453–462.
73. Bader JP, Brown NR, Chiang PK and Cantoni GL. 3-Deazaadenosine, an inhibitor of adenosylhomocysteine hydrolase, inhibits reproduction of Rous sarcoma virus and transformation of chick embryo cells. Virology 1978;89:494–505.
74. Borchardt RT, Keller BT and Patel-Thombre U. Neplanocin A: a potent inhibitor of S-adenosylhomocysteine hydrolase and of vaccinia virus multiplication in mouse L929 cells. J Biol Chem 1984;259:4353–4358.
75. Mears WE and Rice SA. The RGG box motif of the herpes simplex virus ICP27 protein mediates an RNA-binding activity and determines in vivo methylation. J Virol 1996;70:7445–7453.
76. Barth S, Liss M, Voss MD, Dobner T, Fischer U, Meister G and Grässer FA. Epstein-Barr virus nuclear antigen 2 binds via its methylated arginine-glycine repeat to the survival motor neuron protein. J Virol 2003;8:5008–5013.
77. Li Y-J, Stallcup MR and Lai MC. Hepatitis delta virus antigen is methylated at arginine residues and methylation regulates subcellular localization and RNA replication. J Virol 2004;78:13325–13334.
78. Rho J, Choi S, Seong YR, Choi J and Im DS. The arginine-1493 residue in QRRGRTGR1493G motif IV of the hepatitis C virus NS3 helicase domain is essential for NS3 protein methylation by the protein arginine methyltransferase 1. J Virol 2001;75:8031–8044.
79. Boulanger MC, Liang C, Russell RS, Lin R, Bedford MT, Wainberg MA and Richard S. Methylation of Tat by PRMT6 regulates human immunodeficiency virus type 1 gene expression. J Virol 2005;79:124–131.
80. Invernizzi CF, Xie B, Richard S and Wainberg MA. PRMT6 diminishes HIV-1 Rev binding to and export of viral RNA. Retrovirology 2006;3:93.
81. Willemsen NM, Hitchen EM, Bodetti TJ, Apolloni A, Warrilow D, Piller SC and Harrich D. Protein methylation is required to maintain optimal HIV-1 infectivity. Retrovirology 2006;3:92.
82. Yedavalli VR and Jeang KT. Methylation: a regulator of HIV-1 replication? Retrovirology 2007;4:9.
83. Young PJ, Jensen KT, Burger LR, Pintel DJ and Lorson CL. Minute virus of mice NS1 interacts with the SMN protein, and they colocalize in novel nuclear bodies induced by parvovirus infection. J Virol 2002;76:3892–3904.
84. Young PJ, Jensen KT, Burger LR, Pintel DJ and Lorson CL. Minute virus of mice small nonstructural protein NS2 interacts and colocalizes with the Smn protein. J Virol 2002;76:6364–6369.
85. Strasswimmer J, Lorson CL, Breiding DE, Chen JJ, Le T, Burghes AH and Androphy EJ. Identification of survival motor neuron as a transcriptional activator-binding protein. Hum Mol Genet 1999;8:1219–1226.
86. Yu MC, Lamming DW, Eskin JA, Sinclair DA and Silver PA. The role of protein arginine methylation in the formation of silent chromatin. Genes Dev 2006;20:3249–3254.
87. Boisvert FM, Déry U, Masson JY and Richard S. Arginine methylation of MRE11 by PRMT1 is required for DNA damage checkpoint control. Genes Dev 2005;19:671–676.
88. Boisvert FM, Rhie A, Richard S and Doherty AJ. The GAR motif of 53BP1 is arginine methylated by PRMT1 and is necessary for 53BP1 DNA binding activity. Cell Cycle 2005;4:1834–1841.
89. Adams MM, Wang B, Xia Z, Morales JC, Lu X, Donehower LA, Bochar DA, Elledge SJ and Carpenter PB. 53BP1 oligomerization is independent of its methylation by PRMT1. Cell Cycle 2005;4:1854–1861.
90. El-Andaloussi N, Valovka T, Toueille M, Steinacher R, Focke F, Gehrig P, Covic M, Hassa PO, Schär P, Hübscher U and Hottiger MO. Arginine methylation regulates DNA polymerase beta. Mol Cell 2006;22:51–62.
91. Lake AN and Bedford MT. Protein methylation and DNA repair. Mutat Res 2007;618:91–101.
92. Sgarra R, Lee J, Tessari MA, Altamura S, Spolaore B, Giancotti V, Bedford MT and Manfioletti G. The AT-hook of the chromatin architectural transcription factor high mobility group A1a is arginine-methylated by protein arginine methyltransferase 6. J Biol Chem 2006;281:3764–3772.
93. Hong H, Chinghai K, Jeng M-H, Eble JN, Koch MO, Gardner TA, Zhang S, Li L, Pan C-X, Hu Z, MacLennan GT and Cheng L. Aberrant expression of CARM1, a transcriptional coactivator of androgen receptor, in the development of prostate carcinoma and androgen-independent status. Cancer 2004;101:83–89.
94. Cheung N, Chan LC, Thompson A, Cleary ML and So CW. Protein arginine-methyltransferase-dependent oncogenesis. Nat Cell Biol 2007;9:1208–1215.
95. So CW, Caldas C, Liu MM, Chen SJ, Huang QH, Gu LJ, Sham MH, Wiedemann LM and Chan LC. EEN encodes for a member of a new family of proteins containing an Src homology 3 domain and is the third gene located on chromosome 19p13 that fuses to MLL in human leukemia. Proc Natl Acad Sci USA 1997;94:2563–2568.
96. Vallance P and Leiper J. Blocking NO synthesis: how, where and why? Nat Rev Drug Discov 2002;1:939–950.
97. Cardounel AJ, Cui H, Samouilov A, Johnson W, Kearns P, Tsai AL, Berka V and Zweier JL. Evidence for the pathophysiological role of endogenous methylarginines in regulation of endothelial NO production and vascular function. J Biol Chem 2007;282:879–887.
98. Boger RH, Sydow K, Borlak J, Thum T, Lenzen H, Schubert B, Tsikas D and Bode-Boger SM. LDL cholesterol upregulates synthesis of asymmetrical dimethylarginine in human endothelial cells: involvement of S-adenosylmethionine-dependent methyltransferases. Circ Res 2000;87:99–105.
99. Boger RH. Asymmetric dimethylarginine (ADMA): a novel risk marker in cardiovascular medicine and beyond. Ann Med 2006;38:126–136.


Identification and characterization of a novel cytotoxic protein, parasporin-4, produced by Bacillus thuringiensis A1470 strain

Shiro Okumura1, Hiroyuki Saitoh1, Tomoyuki Ishikawa1, Eiichi Mizuki1 and Kuniyo Inouye2

1 Fukuoka Industrial Technology Centre, Kurume, Fukuoka 839-0861, Japan
2 Division of Food Science and Biotechnology, Graduate School of Agriculture, Kyoto University, Kitashirakawa-Oiwakecho, Sakyo-ku, Kyoto 606-8502, Japan

Abstract. In 1901, a unique bacterium was isolated as the pathogen of sotto disease of silkmoth larvae, and in 1915 the organism was described as Bacillus thuringiensis. Since its discovery, this bacterium has attracted wide attention not only from insect pathologists but also from many other scientists interested in the strong and specific insecticidal activity associated with the inclusion bodies of B. thuringiensis. This has led to the recent worldwide development of B. thuringiensis-based microbial insecticides and insect-resistant transgenic plants, as well as to the epoch-making discovery of parasporin, a cancer cell-specific cytotoxin. In this review, we first introduce a study that used a surface plasmon resonance-based biosensor to detect the interaction between inclusion proteins of B. thuringiensis and the brush border membrane of insects, and then describe the identification and cloning of parasporin-4, the latest cancer cell-killing protein, produced by B. thuringiensis strain A1470. Inclusion bodies of parasporin-4 produced by recombinant Escherichia coli were solubilized and activated with a new method and purified by anion-exchange chromatography. Finally, the characterization of recombinant parasporin-4 is presented.

Keywords: Bacillus thuringiensis, brush border membrane vesicles, Cry1Ac, Cry protein, cytotoxic activity, δ-endotoxin, inclusion body, parasporin, Plutella xylostella, surface plasmon resonance.

Abbreviations

BBM       brush border membrane
BSA       bovine serum albumin
DTT       dithiothreitol
EC50      50% effective concentration
EDTA      ethylenediaminetetraacetic acid
GalNAc    N-acetyl-D-galactosamine
HPA       hydrophobic association
MTT       3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2H tetrazolium bromide
PC14      1,3-ditetradecylglycero-2-phosphocholine
PCR       polymerase chain reaction
SDS-PAGE  sodium dodecyl sulfate-polyacrylamide gel electrophoresis
SPR       surface plasmon resonance

Corresponding author: Tel.: +81-75-753-6266. Fax: +81-75-753-6265.

E-mail: [email protected] (K. Inouye). BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00009-4

© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

General introduction

In 1901, a Japanese silkworm pathologist, Shigetane Ishiwata, isolated a unique bacterium as the pathogen of sotto disease of silkmoth larvae [1]. Later, in 1915, the organism was described as Bacillus thuringiensis by E. Berliner [2]. B. thuringiensis is a Gram-positive, endospore-forming bacterium that produces large crystalline parasporal inclusions in sporulating cells. The parasporal inclusions often contain one or more proteins that are toxic to insects. The toxic proteins in the parasporal inclusions are called δ-endotoxins; those with hemolytic activity are called Cyt proteins and those without are called Cry proteins. The parasporal inclusions are ingested by insect larvae, and the proteins are solubilized and converted into toxins by midgut proteases of susceptible insects [3]. They are highly and specifically toxic to insect pests of the Lepidoptera, Diptera, and Coleoptera [4], but are not pathogenic to mammals, birds, amphibians, or reptiles [5]. This makes B. thuringiensis a promising microbial agent for the control of insect pests in agriculture, forestry, veterinary medicine, and public health management [5]. The B. thuringiensis toxins are now classified and designated by the Bacillus thuringiensis δ-endotoxin nomenclature committee (http://www.biols.susx.ac.uk/home/Neil_Crickmore/Bt/) based solely on amino acid identity.

Meanwhile, non-insecticidal B. thuringiensis strains are more widely distributed than insecticidal ones [6]. Mizuki et al. previously reported that human cancer cell-killing activity is associated with some non-insecticidal B. thuringiensis isolates [7,8], and created a new category of protein, parasporin, defined as bacterial parasporal proteins that are capable of preferentially killing cancer cells [8]. The parasporal inclusion of B. thuringiensis strain A1470 (Fig. 1) also exhibits strong cytotoxicity against human leukemic T-cells (Fig. 2) when activated by protease treatment, although it exhibits neither insecticidal nor hemolytic activity [9], and we anticipated that the strain produces multiple cytotoxic proteins with similar molecular masses [10]. In 2005, the cytotoxic protein of the strain was identified [11] and designated parasporin-4 by the committee of parasporin classification and nomenclature (http://parasporin.fitc.pref.fukuoka.jp/). It was also designated Cry45Aa by the Bacillus thuringiensis δ-endotoxin nomenclature committee. To date, including parasporin-4, 13 parasporins have been identified from non-insecticidal B. thuringiensis strains (Table 1). Each parasporin has a specific and distinct target spectrum against mammalian cells. This raises the possibility of screening for novel proteins that are cytotoxic to specific cancer cells. Moreover, the strict cytotoxic specificity toward target cells suggests the presence of a receptor-like protein or lipid on the surface of sensitive cells. Thus, identification of the cell receptor might provide some insight into the mechanism of target specificity and cytotoxicity of parasporins.


Fig. 1. B. thuringiensis A1470 strain.

Fig. 2. Cytopathic effect of inclusion bodies produced by B. thuringiensis A1470 strain against MOLT-4 cells. The cells were incubated with the proteinase K-treated protein at 37°C for 0 h (A), 3 h (B), and 24 h (C) and visualized by phase-contrast microscopy. Bar indicates 50 μm.

Previously, the crystal proteins produced by B. thuringiensis strains attracted attention only for their insecticidal activity. In recent years, however, new activities, namely anti-cancer cell activity, anti-trichomonal activity [18], and lectin activity [19], have been discovered in the crystal proteins of B. thuringiensis (Fig. 3). The anti-cancer cell activity in particular should be highly potent, and more investigations are needed to apply parasporins to cancer therapy or diagnosis.

Table 1. The members of the parasporin family.

Name             Cry number (a)  Accession number  Authors (b)              Year (c)  Source strain
Parasporin-1Aa1  Cry31Aa1        AB031065          Mizuki et al. [8]        2000      A1190
Parasporin-1Aa2  Cry31Aa2        AY081052          Jung and Côté [12]       2002      M15
Parasporin-1Aa3  Cry31Aa3        AB250922          Uemori et al. [13]       2006      B195
Parasporin-1Aa4  Cry31Aa4        AB274826          Yasutake et al. [14]     2006      Bt 79–25
Parasporin-1Aa5  Cry31Aa5        AB274827          Yasutake et al. [14]     2006      Bt 92–10
Parasporin-1Ab1  Cry31Ab1        AB250923          Uemori et al. [13]       2006      B195
Parasporin-1Ab2  Cry31Ab2        AB274825          Yasutake et al. [14]     2006      Bt 31–5
Parasporin-1Ac1  Cry31Ac1        AB276125          Yasutake et al. [14]     2006      Bt 87–29
Parasporin-2Aa1  Cry46Aa1        AB099515          Ito and Kitada [15]      2004      A1547
Parasporin-2Ab1  Cry46Ab1        AB186914          Yamagiwa et al. [16]     2004      TK-E6
Parasporin-3Aa1  Cry41Aa1        AB116649          Yamashita et al. [17]    2005      A1462
Parasporin-3Ab1  Cry41Ab1        AB116651          Yamashita et al. [17]    2005      A1462
Parasporin-4Aa1  Cry45Aa1        AB180980          Okumura and Saitoh [11]  2004      A1470

Source: The table was reprinted from the web site of the committee of parasporin classification and nomenclature (http://parasporin.fitc.pref.fukuoka.jp/).
(a) For reference, see the web site of the Bacillus thuringiensis δ-endotoxin nomenclature committee (http://www.biols.susx.ac.uk/home/Neil_Crickmore/Bt/).
(b) Authors registered in GenBank.
(c) Year of registration in GenBank.

Fig. 3. A conceptual diagram of the present and previous B. thuringiensis crystal protein world. (The color version of this figure is hosted on Science Direct.)

In this overview, we first describe a novel screening method using a surface plasmon resonance (SPR)-based optical biosensor [20]. With this system, interactions between the inclusion proteins of B. thuringiensis and brush border membrane (BBM) vesicles of insects can be measured in real time as total mass changes. We then describe the identification and cloning of parasporin-4 [11] and an effective method for solubilizing and purifying the cytotoxic protein [21]. Finally, we describe the characterization of parasporin-4 [11,21]. Parasporin-4 showed selective cytotoxic activity against various mammalian cell lines, but little or no activity against four normal tissue cells [11].

Interaction between a δ-endotoxin of B. thuringiensis and the artificial phospholipid monolayer incorporated with BBM vesicles of Plutella xylostella by SPR-based optical biosensor

B. thuringiensis is well known as an insecticidal bacterium; however, non-insecticidal B. thuringiensis strains are more widely distributed than insecticidal ones [6]. Hence it is necessary to screen isolates from various natural environments for B. thuringiensis strains that are insecticidal against particular insect pests. A bioassay with target insects would be the primary method for screening B. thuringiensis toxins. However, it is often difficult to apply the bioassay to the development of a new B. thuringiensis insecticide, because breeding techniques for many pests have not yet been established. A novel screening method using an SPR-based optical biosensor was therefore proposed [20].

SPR provides an optical detection system that allows direct analysis of the interaction between a ligand immobilized on a sensor chip and a specific analyte in a continuous flow [22]. In this system, one of the reactants is immobilized on a monolayer of alkane thiol on the surface of a gold sensor chip.
The angle of polarized light reflected from the surface of the chip changes according to the total mass attached to the sensor chip. These changes are detected by a diode array. By linking this system to a personal computer, complex association and dissociation can be measured in real time.

Interactions between the toxins and high-affinity binding sites on the midgut epithelium of the susceptible insect species are a major determinant of the specificity of the insecticidal proteins. The importance of midgut target sites was shown in in vitro binding studies using BBM of the insect midgut [23]. Recently, the SPR technique has been applied to examine interactions of B. thuringiensis toxins with both insect midgut BBM vesicles [22,24] and a purified toxin receptor [25]. In the former case, the membrane vesicles were immobilized on the sensor chip using immobilized antibodies against either avidin or biotin. The binding of the toxin to a receptor protein embedded in a model membrane environment containing egg yolk L-α-phosphatidylcholine was also reported [26]. These previous studies have demonstrated that multiple toxin-binding proteins exist in the midgut BBM of susceptible insect larvae, and that the interaction between the toxin and the receptor is a complex process.

In this section, we introduce the SPR analysis of interactions between the B. thuringiensis Cry1Ac δ-endotoxin and BBM from a susceptible insect, the diamondback moth, Plutella xylostella [20]. A monolayer of the BBM and an artificial phospholipid was spontaneously reconstructed on the hydrophobic surface of the HPA sensor chip. The amount of toxin bound to the BBM showed a saturation curve of the Michaelis-Menten type against toxin concentration. The monolayer introduced in this study thus provides an insect-free, rapid, and large-scale screening system for B. thuringiensis insecticidal proteins with the aid of SPR-based biosensor technology. The reconstituted monolayer system will also be useful for screening parasporins.

Formation of the reconstructed monolayer

BBM vesicles of the diamondback moth (P. xylostella) were isolated from midguts of late-instar larvae by selective magnesium precipitation [27]. The isolated BBM vesicles were diluted with a neutral artificial lipid, 1,3-ditetradecylglycero-2-phosphocholine (PC14, phase transition temperature = 25°C) [28], at a PC14/BBM protein ratio of 100:5 (w/w). The mixture was sonicated three times for 1 min each and passed through a polycarbonate membrane filter (pore size: 100 nm). The monolayer was then reconstructed on the hydrophobic association (HPA) chip using an SPR detector system, BIAcore 1000. The amount of the mixture bound to the chip gave a resonance response of (2,620 ± 75) RU. Following the binding of the mixture, a bovine serum albumin (BSA) solution was injected to assess the extent of coverage of the surface.
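The mixing step described above reduces to simple mass arithmetic. The following Python sketch, assuming that the 100:5 (w/w) ratio refers to PC14 mass per BBM protein mass as stated in the text, computes how much PC14 corresponds to a given amount of BBM protein. The function names and the example mass are hypothetical and are not taken from the original protocol.

```python
from fractions import Fraction

# PC14 : BBM protein mass ratio stated in the text (w/w).
PC14_PARTS = 100
BBM_PROTEIN_PARTS = 5


def pc14_mass_for(bbm_protein_ug: float) -> float:
    """Return the PC14 mass (in ug) for a given BBM protein mass (in ug).

    Hypothetical helper: assumes the 100:5 (w/w) ratio is
    PC14 mass to BBM protein mass.
    """
    if bbm_protein_ug < 0:
        raise ValueError("mass must be non-negative")
    return bbm_protein_ug * PC14_PARTS / BBM_PROTEIN_PARTS


def bbm_protein_fraction() -> Fraction:
    """Mass fraction of BBM protein in the combined PC14 + BBM protein mass."""
    return Fraction(BBM_PROTEIN_PARTS, PC14_PARTS + BBM_PROTEIN_PARTS)


if __name__ == "__main__":
    # The 50 ug input is an illustrative value only.
    print(pc14_mass_for(50.0))     # 1000.0
    print(bbm_protein_fraction())  # 1/21
```

In other words, PC14 is present at a 20-fold mass excess over BBM protein, so BBM protein makes up only about 5% of the combined PC14 and protein mass loaded onto the chip.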
The resonance response on injection of the BSA solution increased by only (112 ± 6) RU, an amount small enough to indicate adequate coverage. When the BBM alone was loaded on the chip, it was almost completely washed out by the 4 mM NaOH injection of the regeneration step, and the final amount bound was only 200 RU. This suggested that the BBM was considerably unstable on the HPA chip. To examine the interaction of the activated Cry1Ac toxin with the BBM, the PC14/BBM monolayer reconstructed on the HPA chip was therefore used as the sample chip. When stained with uranyl acetate and observed with an electron microscope, the amphiphile PC14 formed well-developed vesicles with a diameter of 140–200 nm (Fig. 4C). In contrast, the BBM from P. xylostella formed large, irregularly shaped aggregates of various sizes (Fig. 4A). When the BBM was mixed with PC14, vesicles with a diameter of 80–130 nm were formed (Fig. 4B). The morphological observations revealed that the BBM were


Fig. 4. Electron micrographs of the BBM vesicles. (A) BBM of the diamondback moth, (B) the mixture of PC14 and BBM, and (C) PC14. The samples were stained with uranyl acetate and subjected to electron microscopic examination. (Reprinted from reference [20].)

incorporated into the phospholipid PC14 vesicles. The ratio of [phospholipid]/[protein] in the BBM was approximately 1:1 (w/w). This low ratio of phospholipid to BBM protein may account for the failure of the BBM to form typical liposome vesicles, which in turn may lead to instability of the BBM monolayer on the HPA chip.

Binding of the Cry1Ac toxin to the reconstructed monolayer

Cry1Ac is an insecticidal protein produced by B. thuringiensis strain HD-73, the type strain of serovar kurstaki, and is active against diamondback moth larvae. Inclusion bodies of the strain were solubilized in 50 mM sodium carbonate buffer (pH 10.5) and activated with proteinase K. The activated Cry1Ac was delivered to the reconstructed PC14/BBM monolayer to determine the levels required for ligand saturation. The relative resonance response increased with increasing initial Cry1Ac toxin concentration ([toxin]), indicating that the amount of toxin bound ([relative response]) to the reconstructed monolayer increased with [toxin]. The relative resonance response, which represents the amount of toxin bound, showed a Michaelis-Menten-type saturation curve against [toxin] (Fig. 5A). The double reciprocal plot ([toxin]⁻¹ vs. [relative response]⁻¹) gave a straight line; the affinity constant of the toxin for the monolayer was evaluated to be 3.1 μM, and the maximum resonance response was 170 RU (Fig. 5B). Subsequently, N-acetyl-D-galactosamine (GalNAc) was examined for its ability to reduce the binding of the Cry1Ac toxin to the reconstructed monolayer. GalNAc inhibited the toxin binding to the reconstructed


Fig. 5. Relationship between the Cry1Ac toxin concentration bound to the reconstituted monolayer and the initial concentration of the Cry1Ac toxin. (A) The Michaelis-Menten type relationship between the resonance responses at the saturated level obtained at 600 s and the initial toxin concentration, [toxin]. The relative resonance response is proportional to the concentration of the toxin bound onto the reconstituted monolayer, [relative response]. (B) The double reciprocal plot between [relative response] and [toxin]. The affinity constant between the toxin and the monolayer was evaluated to be 3.1 μM and the maximum value of the resonance response was 170 RU. (Reprinted from reference [20].)

monolayer. The inhibition was dose-dependent, and 89% of the toxin binding was inhibited at 80 mM GalNAc. The inhibitor constant of GalNAc against the monolayer was determined to be 8 mM. The results obtained with the present detection system are in good agreement with previous observations that Cry1Ac toxin molecules bind to receptors on the lepidopteran epithelial cell membrane through GalNAc [25,26,29].
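The double reciprocal analysis described above can be sketched in a few lines. This is only an illustration, not the authors' fitting procedure: it assumes the simple saturation model R = Rmax·[toxin]/(Kd + [toxin]), so that 1/R is linear in 1/[toxin] with slope Kd/Rmax and intercept 1/Rmax. The toxin concentrations are hypothetical, and the responses are synthetic, noise-free values generated from the constants quoted in the text (3.1 and 170 RU).

```python
def double_reciprocal_fit(toxin_conc, response):
    """Least-squares line through (1/[toxin], 1/response).

    Returns (Kd, Rmax) under the assumed model
    R = Rmax * [toxin] / (Kd + [toxin]).
    """
    xs = [1.0 / c for c in toxin_conc]
    ys = [1.0 / r for r in response]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # equals Kd / Rmax
    intercept = mean_y - slope * mean_x   # equals 1 / Rmax
    rmax = 1.0 / intercept
    kd = slope * rmax
    return kd, rmax


# Constants quoted in the text; Kd is in the same concentration unit as [toxin].
TRUE_KD, TRUE_RMAX = 3.1, 170.0

# Hypothetical toxin concentrations (not taken from the original study).
concs = [0.5, 1.0, 2.0, 4.0, 8.0]
# Synthetic, noise-free responses generated from the assumed model.
responses = [TRUE_RMAX * c / (TRUE_KD + c) for c in concs]

kd_fit, rmax_fit = double_reciprocal_fit(concs, responses)
print(f"Kd = {kd_fit:.2f}, Rmax = {rmax_fit:.0f} RU")
```

With real sensorgram data the reciprocal transform amplifies noise at low concentrations, so a direct nonlinear fit of the saturation curve is often preferred; the linear form is shown here only because it is the one the text describes.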

Analysis of the SPR data

Electrophysiological and biochemical studies have shown that Cry toxins generate pores in the cell membrane [30]. The action of the toxin includes two steps: binding of the toxin to receptors on the cell membrane, and subsequent penetration of the toxin into the cells. The sensorgrams were therefore analyzed with the two-state reaction model. Data for the binding of Cry1Ac, corrected for bulk refractive index changes, fitted well to the model; the simultaneous fitting of five binding curves gave an overall χ² of 1.8. The binding of the toxin occurred with an affinity constant (K1) of 0.51 μM, followed by a second, slower event with a constant (K2) of 0.47. The latter is a unimolecular, concentration-independent event, so its equilibrium constant is dimensionless. The overall affinity constant (Kd), defined as the product of K1 and K2, was calculated to be 0.24 μM. The affinity constants obtained in the BBM/PC14 assay system were 8.8–120-fold greater than those reported by other workers [31–34]. This difference is apparently due to the difference in the methods used; most of the other studies were done with a competitive binding assay using radiolabeled Cry1Ac toxin. In SPR experiments, an affinity constant (Kd) of 7 nM was reported previously for the Cry1Ac toxin and diamondback moth BBM vesicles [24], and this value is 34-fold smaller than that obtained in the study introduced here. At present, the cause of this difference is uncertain. It should be noted, however, that the Kd value (0.24 μM) observed here is on the level of those (94.5–346 nM) reported by previous workers in SPR studies with the Cry1Ac toxin and BBM vesicles of other lepidopterans, Manduca sexta and Heliothis virescens [25,26,29].
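The arithmetic behind the overall affinity constant can be checked with a short sketch. It assumes that K1 and the overall Kd are in micromolar, which is the reading consistent with the stated 34-fold ratio to the 7 nM literature value; the function and variable names are illustrative only.

```python
K1_MICROMOLAR = 0.51   # first (binding) step, taken here as micromolar
K2 = 0.47              # second (unimolecular) step, dimensionless
LITERATURE_KD_NM = 7.0  # SPR value reported for diamondback moth BBM vesicles [24]


def overall_kd(k1, k2):
    """Overall affinity constant as the product of the two step constants."""
    return k1 * k2


kd_micromolar = overall_kd(K1_MICROMOLAR, K2)
# Convert micromolar to nanomolar before comparing with the 7 nM value.
fold_vs_literature = kd_micromolar * 1000.0 / LITERATURE_KD_NM

print(f"overall Kd = {kd_micromolar:.2f} uM")
print(f"ratio to 7 nM = {fold_vs_literature:.0f}-fold")
```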
It was reported that, when analyzed with a two-state reaction algorithm, the interaction between the Cry1Ac toxin and a lepidopteran receptor involved two steps, the second step having an affinity constant (K2) of 2.79 × 10⁻² [26]. This value is significantly lower than that obtained in the study introduced here. The second step, the penetration of the toxin into the membrane, might be affected by the properties of the lipid. It is thus likely that the lower affinity observed in the study introduced here was caused by the use of the artificial lipid PC14. It should be noted that the receptor proteins of the B. thuringiensis insecticidal proteins are considered to be aminopeptidase N and/or a cadherin-like protein bound to the midgut cell membrane. Aminopeptidase N is a metallopeptidase containing a zinc atom essential for its enzymatic activity. Aminopeptidase N of mammalian cells is known to participate in the processing of peptide hormones, and it has recently attracted attention as an immunological cell marker, CD13. Structural homology between the aminopeptidase N of insects and that of mammals has been reported, but the physiological roles of the insect enzyme are not well understood. The SPR-based

biosensor method proposed in the present study for investigating the interaction between the insecticidal proteins and the BBM monolayer could also be applied to examine the proteolytic properties of aminopeptidase N. This method might also provide a suitable tool for studying the structure-function relationship of this still poorly understood enzyme, and allow it to be discussed in comparison with the well-studied metalloproteinases thermolysin [35,36] and the matrix metalloproteinases [37,38].

Identification and cloning of two cytotoxic proteins produced by the B. thuringiensis A1470 strain

B. thuringiensis A1470 strain (Fig. 1) belongs to serovar shandongiensis. Lee et al. reported that the parasporal inclusion of the strain exhibits strong cytotoxicity against human leukemic T cells when activated by protease treatment, but exhibits no insecticidal or hemolytic activity [9]. The native inclusion before proteinase treatment consists of five major proteins with molecular masses of 160, 60, 34, 32, and 16 kDa [9], whereas the proteinase K-treated inclusion consists of three major proteins with molecular masses of 43, 30, and 28 kDa [39]. These were subjected to anion-exchange chromatography, and the cytotoxic activity of each fraction was determined. On this basis, a 28-kDa protein processed by the protease was considered to be the cytotoxic component [39]. Thereafter, we demonstrated that the strain produces at least two kinds of proteins, with similar molecular masses of about 28 kDa, that are cytotoxic against human cancer cells [10]. In this section, we introduce the identification of the two cytotoxic proteins produced by the A1470 strain [11]. Homology searches against databases identified one of them as a novel cytotoxic protein, which was designated Cry45Aa by the Bacillus thuringiensis δ-endotoxin nomenclature committee and parasporin-4 by the committee of parasporin classification and nomenclature.

Purification of two cytotoxic proteins of B. thuringiensis A1470 strain

B. thuringiensis A1470 strain was isolated from a soil sample collected in the city of Hino, Tokyo, Japan [9]. It had been concluded, on the basis of anion-exchange chromatography and sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), that the cytocidal activity of inclusions from B. thuringiensis A1470 resided in a 28-kDa protein [39]. On SDS-PAGE stained with Coomassie brilliant blue R250, the proteinase K-treated inclusion protein of the A1470 strain consisted of three major proteins with molecular masses of 40, 30, and 26 kDa. However, a faintly stained band was also observed slightly above the 26-kDa band. Silver staining then clearly showed three distinct bands of 26, 27, and 28 kDa in that region (Fig. 6B,


Fig. 6. (A) Anion-exchange chromatography of the proteinase K-treated parasporal inclusion proteins from the B. thuringiensis A1470 strain. (B) SDS-PAGE of proteins in the fractions of anion-exchange chromatography. Each lane was loaded with 100 ng of protein. Lane M indicates the molecular size marker. Proteins were stained with silver.

fraction Nos. 24–26). Fraction 24 contained mainly the 26-kDa band, fraction 25 contained the 26-kDa and 28-kDa bands, and fraction 26 contained the 27-kDa and 28-kDa bands. The 27- and 28-kDa proteins in fraction 26 were separated from each other by gel-filtration chromatography on Superdex 75 HR 10/30, and only the 27-kDa protein showed a cytopathic effect against MOLT-4 cells. Next, fraction 24, which contained mainly the 26-kDa protein, was applied to gel-filtration chromatography. The 26-kDa protein was isolated, and it exhibited cytotoxicity against both MOLT-4 and Jurkat cells. In this manner, two cytotoxic proteins were confirmed.

Identification of a novel cytotoxic protein, parasporin-4

Identification of the 27-kDa cytotoxic protein was attempted first. Using N-terminal and internal amino acid sequences, the gene encoding the

protein was obtained from the strain. The cloned gene was 828 bp long, encoding a polypeptide of 275 amino acid residues with a predicted molecular weight of 30,078, and the nucleotide sequence obtained in the study has been deposited in the GenBank database (accession number AB180980). Neither a typical promoter nor a ribosome binding site was identified upstream. An inverted repeat sequence, presumably capable of forming a stem-loop structure (ΔG = −67.8 kJ/mol), was identified 136 bp downstream from the stop codon. This structure may act as a transcriptional terminator. In the amino acid sequence deduced from the gene, positions 3 to 11 from the N terminus were identical to the N-terminal sequence (IINLANELA) of the protein. The internal sequence (ADIAGH) of the protein was found in the deduced peptide sequence at positions 173 through 179. The cloned gene was expressed in recombinant E. coli, in which the protein accumulated as an inclusion body. The recombinant protein showed strong cytotoxicity to MOLT-4 and CACO-2 cells at a final concentration of 2 μg/ml. The 27-kDa protein and the recombinant protein were judged to be the same substance by SDS-PAGE, Western blotting, and MALDI-TOF mass spectrometry. The amino acid sequence of the protein was compared with those of other known proteins in the Swiss-Prot and PIR-Prot databanks. No significant homologies (<30%) were observed between the sequence of the protein and those of existing proteins, including B. thuringiensis Cry and Cyt proteins. The protein was therefore designated Cry45Aa by the Bacillus thuringiensis δ-endotoxin nomenclature committee, and parasporin-4 by the committee of parasporin classification and nomenclature.

Proteolytic processing of the parasporin-4

The gene for full-length pro-parasporin-4, the precursor of the activated protein, encodes a polypeptide of 275 amino acid residues with a predicted molecular mass of 30,078.
The pro-parasporin-4 produced by the recombinant E. coli had no apparent cytocidal activity without proteinase K treatment, and the native parasporal inclusions of B. thuringiensis A1470 likewise exhibited cytotoxic activity only when activated by protease treatment [9]. Proteolytic processing was thus essential for activation of the cytotoxic protein. MALDI-TOF MS analysis of the proteinase K-treated recombinant parasporin-4 estimated the molecular mass as 26,808. The N-terminal amino acid sequence of the protein was AIINLANELA, completely identical to Ala2–Ala11 of the pro-parasporin-4. It is likely that the proteinase K-treated parasporin-4 corresponds to amino acid residues Ala2 to Arg246 of the precursor, because the predicted molecular mass of such a protein (26,828) is close to that measured for the proteinase K-treated parasporin-4 (26,808).
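The sequence bookkeeping in this subsection is easy to verify. The sketch below is an illustrative check using only numbers given in the text; it assumes the reported gene length includes the stop codon, and the helper names are hypothetical.

```python
def orf_length_bp(n_residues, include_stop=True):
    """Length in base pairs of an open reading frame encoding n_residues."""
    return 3 * n_residues + (3 if include_stop else 0)


def residues_in_span(first, last):
    """Number of residues from position `first` to `last`, inclusive."""
    return last - first + 1


gene_bp = orf_length_bp(275)                    # precursor of 275 residues
processed_residues = residues_in_span(2, 246)   # Ala2 to Arg246
mass_removed = 30078 - 26808                    # predicted precursor minus measured product

print(gene_bp, processed_residues, mass_removed)
```

The 275 codons plus one stop codon account for the 828 bp gene, and the Ala2 to Arg246 assignment implies that 30 residues are removed by proteinase K processing (one at the N terminus and 29 at the C terminus).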

Cry46Aa-like toxin of A1470 strain

Identification of the cytotoxic 26-kDa protein was attempted next. Its N-terminal amino acid sequence was determined to be QSTTDVIREY. This sequence corresponded perfectly to the sequence from Gln48 to Tyr57 of the cytotoxic protein from the B. thuringiensis A1547 strain [15], which has been designated parasporin-2. To compare the 26-kDa protein of the A1470 strain with parasporin-2, a polymerase chain reaction (PCR) was performed with total plasmids of A1470 as the template and with the primers used to clone the precursor of parasporin-2. A 1,000-bp fragment was amplified. Sequence analysis verified that this amplified DNA fragment differed from the parasporin-2 gene by 8 bp, and that the amino acid sequence deduced from the fragment differed by 4 amino acid residues. The 26-kDa protein of the A1470 strain and parasporin-2 were thus almost identical. It was also confirmed by Western blotting that the 26-kDa protein of A1470 reacted with antiserum against parasporin-2.

Efficient solubilization, activation, and purification of recombinant parasporin-4 of the B. thuringiensis A1470 strain expressed as inclusion bodies in E. coli

Parasporin-4 is one of the cytotoxic proteins produced by the B. thuringiensis A1470 strain. A gene coding for the pro-parasporin-4 was expressed in recombinant E. coli, in which the protein accumulated as an inclusion body. The inclusion body was solubilized in 50 mM sodium carbonate buffer, pH 10.5, containing 1 mM ethylenediaminetetraacetic acid (EDTA) and 10 mM dithiothreitol (DTT), and the 30-kDa precursor protein was degraded to approximately 27 kDa by proteinase K treatment. The processed protein exhibited high cytotoxic activity against MOLT-4 and CACO-2 cells. However, the concentration of processed parasporin-4 prepared by this method was at most 20 μg/ml [21], and a higher concentration was required to advance the study of the protein.
Apart from a report by Koller et al. [40], there have been no reports demonstrating that the native inclusion bodies of B. thuringiensis can be solubilized in an acidic or a neutral solution. The digestive juice of most insects is alkaline [3], and the native parasporal inclusions are therefore generally solubilized in vitro in an alkaline buffer or in insect gut extracts [5]. Inclusion bodies of recombinant B. thuringiensis insecticidal proteins produced in E. coli have also been solubilized under alkaline conditions. In general, high-level expression of recombinant proteins in E. coli often results in the formation of insoluble and inactive aggregates known as inclusion bodies [37]. They arise from the deposition of misfolded or partially folded polypeptides through the exposure of hydrophobic patches and the consequent intermolecular interactions [41]. Therefore, the purified inclusion bodies

must be solubilized by strong denaturants such as 6 M guanidine hydrochloride or 8 M urea, which promote the disruption of intermolecular interactions and the complete unfolding of the protein [42]. The excess denaturant is subsequently removed from the solution by dilution or a buffer exchange step [43] to renature the denatured protein. Insecticidal proteins of B. thuringiensis have also been expressed in E. coli, often resulting in the formation of inclusion bodies. These were usually fed directly to susceptible insect larvae for bioassay, in which case solubilization was not necessary. When needed, however, they were solubilized under alkaline conditions, and they often showed activity without denaturation and renaturation steps [44,45]. Thus, in most studies of B. thuringiensis, inclusion bodies, whether native or produced in E. coli, have been solubilized in alkaline solutions. In this section, we introduce an efficient method for solubilization and activation of the recombinant pro-parasporin-4 [21]. The inclusion body of the protein was solubilized in a hydrochloric acid solution and activated by pepsin. The pepsin-treated parasporin-4 was purified in a single step by cation-exchange chromatography. The cytotoxic activities of the pepsin-treated and the proteinase K-treated parasporin-4 were substantially equal. This method enables the purified parasporin-4 to be prepared in higher yield than by solubilization at alkaline pH and proteinase K treatment.

Solubility of the recombinant pro-parasporin-4 in various solutions

Inclusion bodies of the pro-parasporin-4 produced by recombinant E. coli BL21 (DE3) cells were mixed with each buffer solution and incubated at room temperature for 30 min. Figure 7 shows the protein concentrations of saturated solutions of the pro-parasporin-4 in various buffer solutions. The solubility of the protein tended to be higher in both the alkaline and acidic regions, and lower in the neutral pH region. In 50 mM sodium carbonate buffer (pH 10.5) containing 10 mM DTT and 1 mM EDTA (designated buffer A), which has been used in many previous studies to dissolve the inclusion bodies of B. thuringiensis, the protein was solubilized at the highest concentration among the buffers examined, except for 10 mM and 100 mM HCl. The solubility was approximately halved when DTT was removed from buffer A. The solubility of the protein in HCl varied over a broad range depending on the HCl concentration. In 1 M HCl, the solubility was as low as that in the buffers at neutral pH. In both 100 mM and 10 mM HCl, on the other hand, the solubility was greatly improved. In 10 mM HCl in particular, the solubility was approximately 10.8 mg/ml, 25 times higher than that in buffer A.


Fig. 7. Solubility of the recombinant pro-parasporin-4 in various buffer solutions. Purified inclusion bodies of the pro-parasporin-4 expressed in E. coli were dissolved to saturation in each buffer. The protein concentration of the solution was determined by the BCA protein assay method, except for solutions containing dithiothreitol, for which it was determined by the method of Bradford. Shaded bars represent the solubility in the buffers generally used in studies to dissolve inclusion bodies of B. thuringiensis. (Reprinted from reference [21].)

Koller et al. previously reported the effect of pH on the solubilization of the native parasporal inclusion protein of B. thuringiensis var. san diego [40]. The native crystals from that strain dissolved well in universal buffers above pH 10 and below pH 4, but remained almost entirely in crystal form between pH 5 and pH 9.5. The pH dependence of the solubility of the native inclusion protein of that strain was similar to that of the recombinant pro-parasporin-4 produced in E. coli.

Cytotoxic activity of the recombinant parasporin-4 processed by pepsin or proteinase K

Activated parasporin-4 was prepared under an alkaline or an acidic condition, and the activities of the two preparations were compared. Under the alkaline condition, the inclusion bodies were solubilized in 50 mM sodium carbonate buffer, pH 10.5, containing 1 mM EDTA and 10 mM DTT, and the solubilized pro-parasporin-4 was treated with proteinase K at a final concentration of 50 μg/ml for 90 min. Under the acidic condition, they were solubilized in 10 mM HCl and treated with pepsin 1:10,000 (Wako Pure Chemicals) at a final concentration of 200 μg/ml. Figure 8A shows the dose-response curves against CACO-2 cells of the parasporin-4 activated by proteinase K (closed circles) and by pepsin (open circles). The cytotoxic activities of the samples were measured by the MTT


Fig. 8. Cytotoxic activities and SDS-PAGE profiles of the recombinant parasporin-4 processed by pepsin or proteinase K. Inclusion bodies of parasporin-4 were solubilized in sodium carbonate buffer (pH 10.5) and activated by proteinase K (A: closed circles, B: lane 1), or they were solubilized in 10 mM HCl and activated by pepsin (A: open circles, B: lane 2). Cell survival rates against CACO-2 cells of both the processed samples were determined by MTT assay (A). The samples were subjected to SDS-PAGE and stained with silver (B). 150 ng was applied to lanes 1 and 2. Lane 3 indicates blank sample of lane 1 containing proteinase K at an identical dilution ratio with lane 1. Blank of lane 2 is not shown because pepsin was not analyzed by the SDS-PAGE. Lane M indicates molecular size marker. Protein concentration of the solution was determined by the method of Bradford. (Reprinted from reference [21].)
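Dose-response curves of the kind shown in panel A are summarized in this chapter by EC50 values obtained by probit analysis. The following is a minimal sketch of that idea, not the procedure used in the cited studies: it fits an unweighted straight line to probit-transformed kill fractions against log10(dose), whereas classical probit analysis uses an iteratively weighted (maximum-likelihood) fit. The doses, slope, and EC50 used to generate the synthetic data are hypothetical.

```python
import math
from statistics import NormalDist  # standard library, Python 3.8+

_STD_NORMAL = NormalDist()


def probit_ec50(doses, survival_fractions):
    """Estimate EC50 from survival fractions strictly between 0 and 1.

    Fits probit(1 - survival) = a + b * log10(dose) by ordinary least
    squares and returns the dose at which the fitted probit is zero,
    i.e. where 50% of the cells are affected.
    """
    xs = [math.log10(d) for d in doses]
    ys = [_STD_NORMAL.inv_cdf(1.0 - s) for s in survival_fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


# Synthetic, noise-free survival data generated from a hypothetical
# probit model (EC50 = 0.35, slope = 2 probits per decade of dose).
TRUE_EC50, TRUE_SLOPE = 0.35, 2.0
doses = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
survival = [1.0 - _STD_NORMAL.cdf(TRUE_SLOPE * math.log10(d / TRUE_EC50))
            for d in doses]

ec50_estimate = probit_ec50(doses, survival)
print(f"EC50 = {ec50_estimate:.3f}")
```

Survival fractions of exactly 0 or 1 cannot be probit-transformed, so real data at the extremes of the curve would have to be excluded or handled by a weighted fit.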

[3-(4,5-dimethyl-2-thiazolyl)-2,5-diphenyl-2H tetrazolium bromide] method [46,47], and the 50% effective concentration (EC50) for cytotoxicity was determined by probit analysis [48]. The EC50 values obtained 20 h after administration of the parasporin-4 activated by proteinase K and by pepsin were 0.369 and 0.331 μg/ml, respectively. The two cytotoxic activities were substantially equal. The cytotoxic activity of the protein solubilized in 10 mM HCl and then activated by proteinase K was also determined. In this case, the solution was adjusted to pH 9.4 with 50 mM sodium carbonate before activation, because proteinase K is not active at acidic pH. The EC50 value of this protein was 0.351 μg/ml, almost identical to those of the two proteins mentioned above. Figure 8B shows the SDS-PAGE profiles of the proteinase K-treated and the pepsin-treated parasporin-4. The pepsin-treated sample showed only one major band because pepsin itself was not detected by SDS-PAGE. The bands of the two kinds of processed parasporin-4 were both located at approximately 27 kDa, but the band of the pepsin-treated protein ran slightly higher than that of the proteinase K-treated one. The molecular masses of the pepsin-treated and the proteinase K-treated parasporin-4 were determined by MALDI-TOF mass spectrometry as 27,466 (Fig. 9B) and


Fig. 9. Protein profiles of samples in each step of purification of the pepsin-treated parasporin-4. (A) Samples were subjected to SDS-PAGE and stained with silver. 150 ng was applied to all lanes. Lane 1, the inclusion bodies of pro-parasporin-4 in E. coli solubilized in 10 mM HCl; Lane 2, the proteins solubilized in 10 mM HCl and processed with pepsin (activated solution); Lane 3, the pepsin-treated parasporin-4 purified by cation-exchange chromatography (purified solution); Lane M, molecular size marker proteins. (B) Results of MALDI-TOF MS analysis of the activated and the purified solutions. Intensities are shown relative to the peak of parasporin-4 (27,466). (Reprinted from reference [21].)

26,808 [11], respectively. However, the difference of 658 between the molecular masses of these proteins had no effect on their cytotoxic activities against CACO-2 cells. The N-terminal amino acid sequence of the pepsin-treated protein was AIINLANELA, completely identical to that of the proteinase K-treated protein [11] and to Ala2–Ala11 of the pro-parasporin-4 [11]. It is likely that the pepsin-treated parasporin-4 corresponds to amino acid residues Ala2 to Glu252 of the precursor, because the predicted molecular mass of such a protein (27,469) is close to the measured value (27,466); it would thus be six amino acid residues longer than the proteinase K-treated protein.

Purification of the pepsin-treated parasporin-4 from inclusion bodies in E. coli

The recombinant parasporin-4 activated under the acidic condition described above was then purified on a Resource S cation-exchange column (6.4 mm inner diameter × 30 mm length; GE Healthcare). The activated parasporin-4 was applied to the column, and non-cytotoxic proteins or peptide fragments generated by

pepsin treatment were eluted with an increasing gradient of NaCl. The running buffer was then changed from 20 mM glycine buffer (pH 3.0) to 10 mM 2-aminoethanol buffer (pH 10.8); only the fraction of the major peak, which appeared at an elution volume of 20 ml after the buffer change, showed cytotoxic activity. The purification procedure is summarized in Table 2. The relative activity of the purified protein was approximately 32-fold and 3.9-fold higher than those of the solubilized solution and the activated solution, respectively. The concentration of the purified protein was 539 μg/ml (Table 2), 27-fold higher than that of the activated parasporin-4 prepared by the previous method. Figure 9A shows the SDS-PAGE results for the samples at each purification step. Although the activated solution (lane 2) appeared to contain a single band on SDS-PAGE, MALDI-TOF MS analysis revealed many peaks (Fig. 9B, upper panel), the highest of which reached a relative intensity of 1,500% of the 27,466 peak of the pepsin-treated parasporin-4. The many intense peaks in the region up to 5,000 mass/charge were considered to be short peptide fragments generated by activation. Although most of them were removed by the cation-exchange chromatography step, they could not be eliminated entirely from the sample. A peak at 34,588 in the MALDI-TOF MS analysis in Fig. 9B appeared to be pepsin. Pepsin cannot be adequately analyzed by SDS-PAGE, so its band was not observed in the activated sample (Fig. 9A, lane 2). According to the MALDI-TOF MS results, most of the pepsin was considered to have been removed by the cation-exchange chromatography step, although this analysis is not quantitative by nature. The same analysis gave a molecular mass of 27,466 for the pepsin-treated parasporin-4 (Fig. 9B). The peak at approximately 13,740 also appeared to be parasporin-4, in its doubly charged form.
The SDS-PAGE analysis of the purified solution showed two distinct bands, at 27 kDa and above 250 kDa (Fig. 9A, lane 3). The band above 250 kDa is likely an aggregate of the pepsin-treated parasporin-4, because it reacted with rabbit antiserum against parasporin-4 in Western blotting. The aggregate was not observed in gel-filtration chromatography, and it was therefore considered to have formed during the SDS-PAGE procedure. The solubilized native parasporal inclusions from B. thuringiensis A1470 were non-cytotoxic without proteolytic processing [9]; the pro-parasporin-4, however, showed weak activity (EC50 = 17.7 μg/ml) against CACO-2 cells in this study (Table 2, solubilized solution). This appeared to result from the higher concentration of pro-parasporin-4 achieved by effective solubilization, together with processing by endogenous proteinase. The EC50 value of the pepsin-treated parasporin-4 against CACO-2 cells was 0.555 μg/ml (Table 2) when protein was measured by the BCA protein assay method [49]. On the other hand, that of the proteinase K-treated protein was reported to be 0.124 μg/ml [11] when protein was measured by the method of Bradford [50]. Although the EC50 values of

Table 2. Purification of the pepsin-treated recombinant parasporin-4 of B. thuringiensis expressed in E. coli.

| Purification step | Volume (ml) | Protein concentration (μg/ml) | Protein (mg) | EC50 (μg/ml) | Relative activity (fold) | Recovery (%) |
|---|---|---|---|---|---|---|
| Solubilized solution | 10.0 | 1180 | 11.8 | 17.7 | 1.00 | 100 |
| Activated solution | 10.2 | 920 | 9.02 | 2.15 | 8.22 | 76 |
| Cation-exchange chromatography | 3.5 | 539 | 1.89 | 0.555 | 31.9 | 16 |

Source: Reprinted from reference [21]. Note: The protein was obtained from 40 mg (wet weight) of purified inclusion body in E. coli, and solubilized in 10 mM HCl at 37 °C for 1 h (solubilized solution) and processed with pepsin 1:10,000 (Wako Pure Chemicals) at a final concentration of 200 μg/ml for 90 min at 37 °C (activated solution). The solution was subjected to cation-exchange chromatography on a Resource S column (GE Healthcare), and then the eluted parasporin-4 was collected (cation-exchange chromatography). Protein concentration was measured by the BCA protein assay method with BSA as the standard. The cytotoxic activity of each sample against CACO-2 cells was determined by the MTT method, and EC50 values were decided by probit analysis.
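The derived columns of Table 2 can be recomputed from the tabulated protein amounts and EC50 values. The sketch below is an illustrative check only; it assumes that relative activity is the EC50 of the solubilized solution divided by the EC50 of each step, and that recovery is the protein amount relative to the solubilized solution. Small differences from the printed values reflect rounding of the inputs.

```python
# (step name, protein in mg, EC50 in ug/ml), taken from Table 2
STEPS = [
    ("Solubilized solution", 11.8, 17.7),
    ("Activated solution", 9.02, 2.15),
    ("Cation-exchange chromatography", 1.89, 0.555),
]


def purification_summary(steps):
    """Return {step: (relative activity, recovery %)} relative to the first step."""
    _, base_protein, base_ec50 = steps[0]
    summary = {}
    for name, protein_mg, ec50 in steps:
        relative_activity = base_ec50 / ec50         # lower EC50 = more active
        recovery = 100.0 * protein_mg / base_protein
        summary[name] = (relative_activity, recovery)
    return summary


summary = purification_summary(STEPS)
for name, (activity, recovery) in summary.items():
    print(f"{name}: {activity:.1f}-fold, {recovery:.0f}% recovery")

# Fold increase of the purified over the activated solution quoted in the text
purified_vs_activated = STEPS[1][2] / STEPS[2][2]
```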

the parasporin-4 differed depending on the protease-activation method, the cytotoxic activities of the pepsin-treated and the proteinase K-treated parasporin-4 were considered substantially equal under the same experimental conditions (Fig. 8). In general, inclusion bodies in E. coli are considered to result from misfolding or partial folding during expression [41]; they are therefore inactive [51] and require a renaturation step [43]. On the other hand, although B. thuringiensis insecticidal proteins are naturally expressed as native parasporal inclusion bodies, they retain their activities [5]. In many cases, inclusion bodies of insecticidal proteins expressed in E. coli were also active without a renaturation step [44,45]. The inclusion body of recombinant parasporin-4 in E. coli likewise retains its activity as it is.

Characterization of a novel cytotoxic protein, parasporin-4, produced by the B. thuringiensis A1470 strain

At present, 13 parasporins have been reported, but there are only 4 primary ranks (Table 1), and parasporin-4 is the most recent primary rank among all parasporins. The nomenclature scheme for the parasporins adopts the same system of numbers and letters used for the Cry proteins of B. thuringiensis [52], so that a novel parasporin is assigned to a class incorporating four ranks, as in parasporin1Aa1. Currently, sequence identities of approximately 95%, 78%, and 45% are the borders between the four ranks. Each parasporin has a specific and distinct target spectrum against mammalian cells, and the selective cytotoxic activity of parasporin-4 has also been determined [11]. In this section, we introduce the characterization of parasporin-4, excerpted from two studies, references [11] and [21].
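The rank borders quoted above can be read as a simple decision rule. The sketch below is a hypothetical illustration, assuming (as in the Cry protein scheme the text refers to) that identity below 45% opens a new primary rank, 45–78% a new secondary rank, 78–95% a new tertiary rank, and higher identity a new quaternary rank. How identities exactly on a border are treated is an arbitrary choice made here, and actual assignments are made by the nomenclature committees.

```python
# Approximate sequence-identity borders (percent) between the four ranks
PRIMARY_BORDER = 45.0
SECONDARY_BORDER = 78.0
TERTIARY_BORDER = 95.0


def first_new_rank(identity_percent):
    """Return the highest rank at which a new protein would differ from
    its closest named relative, given their percent sequence identity."""
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity must be between 0 and 100")
    if identity_percent < PRIMARY_BORDER:
        return "primary"      # new number, e.g. parasporin-4
    if identity_percent < SECONDARY_BORDER:
        return "secondary"    # new capital letter
    if identity_percent < TERTIARY_BORDER:
        return "tertiary"     # new lowercase letter
    return "quaternary"       # new trailing number


for identity in (29.0, 60.0, 90.0, 99.0):
    print(identity, first_new_rank(identity))
```

For example, a protein sharing less than 30% identity with every known parasporin, as reported for parasporin-4, falls below the lowest border and would open a new primary rank under this rule.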
Morphological changes in cultured cells by the recombinant parasporin-4 protein

Morphological changes induced in two cultured cell lines, MOLT-4 and CACO-2, by the addition of the proteinase K-treated recombinant parasporin-4 were observed with a phase-contrast microscope. Parasporin-4 showed strong cytotoxicity to both cell lines in the early phase. Within 10 min after administration of the cytotoxic protein, nuclear condensation started to occur, and it became distinct within 1 h. Ballooned cells appeared 2 h after exposure to the cytotoxic protein and burst within 24 h, leading to cell death. No morphological change was observed in resistant cells [11].

Cytotoxic activity of parasporin-4 against various mammalian cells

The dose-response of various cultured mammalian cells to the proteinase K-treated recombinant parasporin-4 protein was monitored by MTT assay,

Table 3. Cytotoxic activity of the recombinant parasporin-4 against various cultured mammalian cells. EC50 values (at 20 h) were calculated by probit analysis.

Name          Origin                              Medium                     EC50 (µg/ml)

Cell lines derived from normal human tissues
T cell        Normal T cell                       RPMI1640+10% FBS           >2
UtSMC         Normal uterus                       Sm-BM+5% FBS               >2
HC            Normal hepatocyte                   CS-C+10% FBS               >2
MRC-5         Normal embryonic lung fibroblast    MEM+10% FBS                >2

Cell lines derived from human neoplasm
MOLT-4        Leukemic T cell                     RPMI1640+10% FBS           0.472
U-937 DE-4    Lymphoma cell                       RPMI1640+10% FBS           0.980
Jurkat        Leukemic T cell                     RPMI1640+10% FBS           >2
HL60          Promyelocytic leukemia cell         RPMI1640+10% FBS           0.725
K562          Myelogenous leukemia cell           RPMI1640+10% FBS           >2
HeLa          Uterus cervix cancer                MEM+10% FBS                >2
TCS           Uterus cervix cancer                MEM+10% FBS                0.719
Sawano        Uterus cancer                       MEM+15% FBS                0.245
HepG2         Hepatocyte cancer                   DMEM+10% FBS               1.90
A549          Lung cancer                         DMEM+10% FBS               >2
CACO-2        Colon cancer                        MEM+20% FBS+1% NEAA        0.124

Cell lines derived from mammals other than human
Vero          Kidney cell, monkey                 MEM+10% FBS                >2
COS-7         Kidney cell, monkey                 DMEM+10% FBS               >2
PC12          Pheochromocytoma, rat               DMEM+10% FBS+10% HS        1.78
NIH 3T3       Embryo cell, NIH Swiss mouse        DMEM+10% FBS               >2
CHO           Ovary cell, Chinese hamster         DMEM+10% FBS               >2

and the EC50 values obtained 20 h after administration are shown in Table 3. Cytotoxicity varied greatly among the different cells. The recombinant protein was highly cytotoxic to CACO-2, Sawano, MOLT-4, TCS, and HL60 cells, with EC50 values in the range 100–800 ng/ml. It was moderately cytotoxic to U-937 DE-4, PC12, and HepG2 cells. On the other hand, Jurkat, K562, normal T, HeLa, UtSMC, HC, A549, MRC-5, Vero, COS-7, NIH 3T3, and CHO cells were resistant. The EC50 values for all cells from normal tissues were >2 µg/ml. Although we found no general rule for specificity of the cells and no common characteristics of sensitive or resistant cells, some cells derived from tumor tissues seemed more sensitive to the protein than those derived from normal tissues, as shown by the difference in cytotoxicity between Sawano and UtSMC. Strict cytotoxic specificity toward target cells suggests the presence of a receptor-like protein or lipid at the surface of the sensitive

cells. Thus, identification of the cell receptor might provide some insight into the mechanism of target specificity and cytotoxicity of the parasporin-4. The EC50 values in all of the normal cells (HC, UtSMC, MRC-5, and normal T-cells) were >2 µg/ml (Table 3). However, this result does not imply that the parasporin-4 is not cytotoxic to normal tissues. For example, 2 µg/ml parasporin-4 was weakly cytotoxic to UtSMC and normal T-cells [11]. The parasporin-4 exhibited the highest cytotoxic activity against CACO-2, a cell line derived from a human colonic adenocarcinoma. CACO-2 is generally used as a substitute for human colon cells in various studies. The cytotoxicity of the parasporin-4 to CACO-2 cells may therefore imply that it is toxic to normal colon cells, and it remains to be confirmed that the parasporin-4 toxin does not act on normal colon cells or normal human intestinal epithelial cells. Other toxins from microorganisms that act on CACO-2 cells have been reported previously: the CytK toxin of Bacillus cereus, which causes severe food poisoning [53], a thermostable hemolysin of Vibrio parahaemolyticus [54], and a vulnificolysin-like cytolysin of Vibrio tubiashii [55]. Most of these toxins are derived from virulent microorganisms; hence the purposes of those studies were mainly to elucidate the mechanism of pathogenesis. B. thuringiensis is generally considered not to be pathogenic [5]. However, different enterotoxins, including hemolytic [56] and non-hemolytic (GenBank accession no. AB099298) enterotoxins, have been found in B. thuringiensis or B. cereus. Additionally, B. thuringiensis is indistinguishable from B. cereus, a common food pathogen, except for the production of parasporal inclusion proteins [57]. Therefore, whether the B. thuringiensis A1470 strain is pathogenic would be an interesting subject for study.
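The EC50 values in Table 3 were calculated by probit analysis. As a rough illustration of the idea, the sketch below transforms the fraction of affected cells to probits and fits an unweighted straight line against log10(dose); the EC50 is the dose at which the fitted probit is zero (50% response). This is a simplification under stated assumptions: the function name `probit_ec50` is hypothetical, the source does not describe the exact fitting procedure, and classical probit analysis uses weighted or maximum-likelihood fitting rather than the plain least squares used here.

```python
import math
from statistics import NormalDist


def probit_ec50(doses, fractions_affected):
    """Estimate EC50 by regressing probit(response) on log10(dose).

    Simplified, unweighted illustration; not the authors' procedure.
    """
    xs, ys = [], []
    for dose, p in zip(doses, fractions_affected):
        # The probit transform is undefined for responses of exactly 0 or 1.
        if dose <= 0 or not 0.0 < p < 1.0:
            continue
        xs.append(math.log10(dose))
        ys.append(NormalDist().inv_cdf(p))
    if len(xs) < 2:
        raise ValueError("need at least two usable dose-response points")

    # Ordinary least-squares fit of y = intercept + slope * x.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Probit 0 corresponds to a 50% response.
    return 10 ** (-intercept / slope)
```

With MTT data, the fraction affected would typically be taken as one minus the survival rate relative to untreated control wells; that mapping is also an assumption of this sketch.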
Stabilities and physical properties of the pepsin-treated parasporin-4

A 200 µg/ml solution of the purified parasporin-4 in distilled water was prepared from the lyophilized protein powder. The solution was diluted into the same buffers used in the solubility study (Fig. 7). Figure 10A shows relative EC50 values of the pepsin-treated parasporin-4 in the buffers at various pH after 3 days at 4 °C. The cytotoxic activity of the protein decreased slightly on storage in the buffers of pH 2, 7.4, and 8. However, significant inactivation was not observed in any of the buffers examined. The activity of the lyophilized protein solubilized in 20 mM citrate–NaOH buffer (pH 4.0) and in 20 mM glycine–NaOH buffer (pH 9.0), each containing 100 mM NaCl, was determined over the long term (Fig. 10B). The half-lives of the activity, calculated from the respective regression lines, were 365 and 152 days. The thermal stability of the protein from 30 °C to 60 °C was examined (Fig. 10C). The protein kept 97% of its activity at 30 °C for 360 min. On the other hand, it was fully inactivated at 50 °C for 150 min or at 60 °C for 10 min.
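The half-lives above were obtained from regression lines fitted to activity measured over 70 days. The text does not say which functional form was fitted, so the following is only a sketch under one common assumption: first-order decay, where ln(activity) falls linearly with time and the half-life is ln 2 divided by the decay constant. The function name `half_life_from_decay` and any input data are illustrative, not the authors' values.

```python
import math


def half_life_from_decay(days, relative_activity):
    """Fit ln(activity) = a - k * t by least squares and return ln(2) / k.

    Assumes first-order decay; the source does not state the model used.
    """
    if len(days) != len(relative_activity) or len(days) < 2:
        raise ValueError("need at least two paired measurements")
    ys = [math.log(a) for a in relative_activity]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, ys))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("activity does not decrease; half-life undefined")
    return math.log(2) / (-slope)
```

Because the reported half-lives (365 and 152 days) are longer than the 70-day observation window, they are extrapolations from the fitted line, whatever model was used.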


Fig. 10. pH and thermal stabilities of the pepsin-treated parasporin-4. (A) The lyophilized pepsin-treated parasporin-4 was dissolved in the prepared buffers and stored for 3 days at 4 °C. The cytotoxic activities of the samples against CACO-2 cells were then determined. EC50 values are shown relative to that of the lyophilized protein dissolved in distilled water. (B) Cytotoxic activity of the protein solutions stored at 4 °C was determined once a week for 70 days. The lyophilized powder was solubilized in 20 mM citrate–NaOH buffer (pH 4.0) containing 100 mM NaCl (open circles and broken line) or in 20 mM glycine–NaOH buffer (pH 9.0) containing 100 mM NaCl (closed circles and unbroken line). EC50 values are shown relative to that of each sample on the first day. Unbroken and broken lines represent the respective regression lines. (C) Thermal stability of the protein at 30 (closed squares), 40 (open squares), 50 (closed circles), and 60 °C (open circles) was examined. The purified protein was diluted to 20 µg/ml with distilled water. The protein solutions were incubated for 5 to 360 min in a water bath and immediately cooled in iced water. A 10 µl portion of the sample solution was added to CACO-2 cells prepared in a 96-well microtiter plate with 90 µl of medium. Cell survival rate was determined by the MTT method. (Reprinted from reference [21].)

The physical properties of the protein were determined. The isoelectric point (pI) of the pepsin-treated parasporin-4 was determined as 7.7 by electrofocusing, and the value was close to the theoretical pI of 7.1 calculated by the method of Bjellqvist et al. [58]. The absorbance at 280 nm of the 1 mg/ml pepsin-treated protein prepared from lyophilized powder was 1.80, and the molar extinction coefficient at 280 nm (ε280) of the protein was

calculated to be 4.94 × 10⁴ M⁻¹ cm⁻¹. It was close to the predicted value of 4.41 × 10⁴ M⁻¹ cm⁻¹, which could be estimated from the number of tyrosine and tryptophan residues [59] present in the protein.

Conclusion

The binding of Cry1Ac, an insecticidal protein of B. thuringiensis, to a BBM isolated from midguts of the diamondback moth P. xylostella was examined with an SPR-based biosensor. The BBM was mixed with PC14, a neutrally charged artificial lipid, and was reconstructed as a monolayer on a hydrophobic chip for the biosensor. The amount of toxin bound to the monolayer followed a Michaelis-Menten-type saturation curve with respect to toxin concentration. The binding of Cry1Ac to the reconstructed monolayer was analyzed by a two-state binding model, which showed that Cry1Ac bound to the monolayer in a first step with an affinity constant (K1) of 508 nM, followed by a second, unimolecular step with an equilibrium constant (K2) of 0.472. The overall affinity constant Kd was determined to be 240 nM. The binding was markedly inhibited by GalNAc (Ki = 8 mM). The monolayer was shown to retain a high affinity for Cry1Ac, providing an insect-free system for rapid and large-scale screening of B. thuringiensis insecticidal proteins by SPR-based biosensor technology.

Parasporal inclusion proteins produced by the B. thuringiensis A1470 strain exhibit strong cytotoxicity against human leukemic T-cells when activated by protease treatment. Two cytotoxic proteins were separated by anion-exchange chromatography and gel-filtration chromatography and were identified. One of them was confirmed as a novel cytotoxic protein and designated parasporin-4. The other was determined to be a parasporin-2-like protein, parasporin-2 being the cytotoxic protein produced by the B. thuringiensis A1547 strain. The pro-parasporin-4 gene was cloned and expressed in recombinant E. coli BL21(DE3), in which the protein accumulated in an inclusion body.
It was solubilized in sodium carbonate buffer and processed with proteinase K, and the activated recombinant parasporin-4 showed cytotoxic activity against human leukemic T-cells. An efficient method for solubilizing and activating the recombinant parasporin-4 from its inclusion body expressed in E. coli was also described. The pro-parasporin-4 was solubilized in 10 mM HCl and activated by pepsin, and it showed cytotoxic activity against CACO-2 cells equal to that of the protein prepared by the conventional method. There are no other reports of inclusion bodies of a B. thuringiensis protein being solubilized in an acidic solution for purification. With the new method, the concentration of the purified parasporin-4 was 27-fold higher than that obtained by the previous method. The improved preparation method might be useful for other studies on B. thuringiensis or recombinant proteins.
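Two of the constants reported in this chapter can be cross-checked by simple arithmetic. The block below is a sketch, not the authors' stated derivation: the overall Kd of 240 nM is numerically consistent with the product of K1 and K2, and the measured extinction coefficient follows from the absorbance of 1.80 at 1 mg/ml if one assumes a 1 cm path length and uses the molecular mass of 2.75 × 10⁴ reported for the pepsin-treated protein. Both the product form for Kd and the path length are assumptions made here; the text does not state them.

```latex
\begin{align*}
K_d &\approx K_1 \times K_2 = 508\ \mathrm{nM} \times 0.472 \approx 240\ \mathrm{nM} \\
\varepsilon_{280} &= \frac{A_{280}\, M}{c\, l}
  = \frac{1.80 \times 2.75 \times 10^{4}\ \mathrm{g\,mol^{-1}}}{1\ \mathrm{g\,L^{-1}} \times 1\ \mathrm{cm}}
  \approx 4.95 \times 10^{4}\ \mathrm{M^{-1}\,cm^{-1}}
\end{align*}
```

The small difference from the reported value of 4.94 × 10⁴ M⁻¹ cm⁻¹ is within the rounding of the inputs.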

Cytotoxic activities of the parasporin-4 produced by recombinant E. coli against various mammalian cell lines were evaluated using the MTT assay. The protein exhibited high cytotoxic activity against CACO-2, Sawano, MOLT-4, TCS, and HL60 cells, and moderate activity against U-937 DE-4, PC12, and HepG2 cells. On the other hand, the EC50 values against Jurkat, K562, HeLa, A549, Vero, COS-7, NIH 3T3, CHO, and four normal tissue cells (human primary hepatocyte cells, UtSMC, MRC-5, and normal T-cells) were >2 µg/ml. The cytotoxic activity of the purified protein was stable over a broad pH range (pH 2.0–11.0) for 3 days, and 97% of the cytotoxic activity remained after incubation at 30 °C for 360 min. The pI of the protein was determined as 7.7, and the molar extinction coefficient at 280 nm of the protein was 4.94 × 10⁴ M⁻¹ cm⁻¹. The molecular mass of the pepsin-treated parasporin-4 was 2.75 × 10⁴.

References

1. Ishiwata S. On a kind of severe flacherie (sotto disease). I. Dainihon Sanshi Kaiho (Report of the Sericultural Association of Japan), 1901, No. 114, 1–5 (in Japanese).
2. Berliner E. Über die Schlaffsucht der Mehlmottenraupe (Ephestia kuhniella Zell.) und ihren Erreger Bacillus thuringiensis n. sp. Z Ang Entomol 1915;2:29–56.
3. Höfte H and Whiteley HR. Insecticidal crystal proteins of Bacillus thuringiensis. Microbiol Rev 1989;53:242–255.
4. Beegle CC and Yamamoto T. History of Bacillus thuringiensis Berliner research and development. Can Entomol 1992;124:587–616.
5. Schnepf E, Crickmore N, Van Rie J, Lereclus D, Baum J, Feitelson J, Zeigler DR and Dean DH. Bacillus thuringiensis and its pesticidal crystal proteins. Microbiol Mol Biol Rev 1998;62:775–806.
6. Ohba M. Bacillus thuringiensis populations naturally occurring on mulberry leaves: a possible source of the populations associated with silkworm-rearing insectaries. J Appl Microbiol 1996;80:56–64.
7. Mizuki E, Ohba M, Akao T, Yamashita S, Saitoh H and Park YS. Unique activity associated with non-insecticidal Bacillus thuringiensis parasporal inclusions: in vitro cell-killing action on human cancer cells. J Appl Microbiol 1999;86:477–486.
8. Mizuki E, Park YS, Saitoh H, Yamashita S, Akao T, Higuchi K and Ohba M. Parasporin, human leukemic cell-recognizing parasporal protein of Bacillus thuringiensis. Clin Diagn Lab Immunol 2000;7:625–634.
9. Lee D-W, Akao T, Yamashita S, Katayama H, Maeda M, Saitoh H, Mizuki E and Ohba M. Noninsecticidal parasporal proteins of a Bacillus thuringiensis serovar shandongiensis isolate exhibit a preferential cytotoxicity against human leukemic T cells. Biochem Biophys Res Commun 2000;272:218–223.
10. Okumura S, Akao T, Higuchi K, Saitoh H, Mizuki E, Ohba M and Inouye K. Bacillus thuringiensis serovar shandongiensis strain 89-T-34-22 produces multiple cytotoxic proteins with similar molecular masses against human cancer cells. Lett Appl Microbiol 2004;39:89–92.
11. Okumura S, Saitoh H, Ishikawa T, Wasano N, Yamashita S, Kusumoto K, Akao T, Mizuki E, Ohba M and Inouye K. Identification of a novel cytotoxic protein, Cry45Aa, from Bacillus thuringiensis A1470 strain and its selective cytotoxic activity against various mammalian cell lines. J Agric Food Chem 2005;53:6313–6318.

12. Jung YC, Mizuki E, Akao T and Côté JC. Isolation and characterization of a novel Bacillus thuringiensis strain expressing a novel crystal protein with cytocidal activity against human cancer cells. J Appl Microbiol 2007;103:65–79.
13. Uemori A, Maeda M, Yasutake K, Ohgushi A, Kagoshima K, Mizuki E and Ohba M. Ubiquity of parasporin-1 producers in Bacillus thuringiensis natural populations of Japan. Naturwissenschaften 2007;94:34–38.
14. Yasutake K, Binh ND, Kagoshima K, Uemori A, Ohgushi A, Maeda M, Mizuki E, Yu YM and Ohba M. Occurrence of parasporin-producing Bacillus thuringiensis in Vietnam. Can J Microbiol 2006;52:365–372.
15. Ito A, Sasaguri Y, Kitada S, Kusaka Y, Kuwano K, Masutomi K, Mizuki E, Akao T and Ohba M. A Bacillus thuringiensis crystal protein with selective cytocidal action to human cells. J Biol Chem 2004;279:21282–21286.
16. Hayakawa T, Kanagawa R, Kotani Y, Kimura M, Yamagiwa M, Yamane Y, Takebe S and Sakai H. Parasporin-2Ab, a newly isolated cytotoxic crystal protein from Bacillus thuringiensis. Curr Microbiol 2007;55:278–283.
17. Yamashita S, Katayama H, Saitoh H, Akao T, Park YS, Mizuki E, Ohba M and Ito A. Typical three-domain Cry proteins of Bacillus thuringiensis strain A1462 exhibit cytocidal activity on limited human cancer cells. J Biochem (Tokyo) 2005;138:663–672.
18. Kondo S, Mizuki E, Akao T and Ohba M. Antitrichomonal strains of Bacillus thuringiensis. Parasitol Res 2002;88:1090–1092.
19. Akao T, Mizuki E, Yamashita S, Saitoh H and Ohba M. Lectin activity of Bacillus thuringiensis parasporal inclusion proteins. FEMS Microbiol Lett 1999;179:415–421.
20. Okumura S, Akao T, Mizuki E, Ohba M and Inouye K. Screening of the Bacillus thuringiensis Cry1Ac δ-endotoxin on the artificial phospholipid monolayer incorporated with brush border membrane vesicles of Plutella xylostella by optical biosensor technology. J Biochem Biophys Methods 2001;47:177–188.
21. Okumura S, Saitoh H, Wasano N, Katayama H, Higuchi K, Mizuki E and Inouye K. Efficient solubilization, activation, and purification of recombinant Cry45Aa of Bacillus thuringiensis expressed as inclusion bodies in Escherichia coli. Protein Expr Purif 2006;47:144–151.
22. Masson L, Mazza A and Brousseau R. Stable immobilization of lipid vesicles for kinetic studies using surface plasmon resonance. Anal Biochem 1994;218:405–412.
23. Garczynski SF, Crim JW and Adang MJ. Identification of putative insect brush border membrane-binding molecules specific to Bacillus thuringiensis δ-endotoxin by protein blot analysis. Appl Environ Microbiol 1991;57:2816–2820.
24. Masson L, Mazza A, Brousseau R and Tabashnik B. Kinetics of Bacillus thuringiensis toxin binding with brush border membrane vesicles from susceptible and resistant larvae of Plutella xylostella. J Biol Chem 1995;270:11887–11896.
25. Luo K, Sangadala S, Masson L, Mazza A, Brousseau R and Adang MJ. The Heliothis virescens 170 kDa aminopeptidase functions as "Receptor A" by mediating specific Bacillus thuringiensis Cry1Ac δ-endotoxin binding and pore formation. Insect Biochem Mol Biol 1997;27:735–743.
26. Cooper MA, Carroll J, Travis ER, Williams DH and Ellar DJ. Bacillus thuringiensis Cry1Ac toxin interaction with Manduca sexta aminopeptidase N in a model membrane environment. Biochem J 1998;333:677–683.
27. Wolfersberger M, Luethy P, Maurer A, Parenti P, Sacchi FV, Giordana B and Hanozet GM. Preparation and partial characterization of amino acid transporting brush border membrane vesicles from the larval midgut of the cabbage butterfly. Comp Biochem Physiol 1987;86A:301–308.

28. Kunitake T, Okahata Y and Tawaki S. Bilayer characteristics of 1,3-dialkyl- and 1,3-diacyl-rac-glycero-2-phosphocholines. J Colloid Interface Sci 1985;103:190–201.
29. Masson L, Lu Y, Mazza A, Brousseau R and Adang MJ. The Cry1Ac receptor purified from Manduca sexta displays multiple specificities. J Biol Chem 1995;270:20309–20315.
30. Knowles BH and Ellar DJ. Colloid-osmotic lysis is a general feature of the mechanism of action of Bacillus thuringiensis δ-endotoxins with different insect specificities. Biochim Biophys Acta 1987;924:509–518.
31. Ballester V, Escriche B, Ménsura JL, Riethmacher GW and Ferré J. Lack of cross-resistance to other Bacillus thuringiensis crystal proteins in a population of Plutella xylostella highly resistant to Cry1Ab. Biocontrol Sci Technol 1994;4:437–443.
32. Tabashnik BE, Finson N, Groeters FR, Moar WJ, Johnson MW, Luo K and Adang MJ. Reversal of resistance to Bacillus thuringiensis in Plutella xylostella. Proc Natl Acad Sci USA 1994;91:4120–4124.
33. Wright DJ, Iqubal M, Granero F and Ferré J. A change in a single midgut receptor in the diamondback moth (Plutella xylostella) is only in part responsible for field resistance to Bacillus thuringiensis subsp kurstaki and B thuringiensis subsp aizawai. Appl Environ Microbiol 1997;63:1814–1819.
34. Ballester V, Granero F, Tabashnik BE, Malvar T and Ferré J. Integrative model for binding of Bacillus thuringiensis toxins in susceptible and resistant larvae of the diamondback moth (Plutella xylostella). Appl Environ Microbiol 1999;65:1413–1419.
35. Inouye K, Lee SB and Tonomura B. Effect of amino acid residues at the cleavable site of substrates on the remarkable activation of thermolysin by salts. Biochem J 1996;315:133–138.
36. Inouye K, Lee SB, Nambu K and Tonomura B. Effects of pH, temperature, and alcohols on the remarkable activation of thermolysin by salts. J Biochem 1997;122:358–364.
37. Oneda H and Inouye K. Refolding and recovery of recombinant human matrix metalloproteinase 7 (matrilysin) from inclusion bodies expressed by Escherichia coli. J Biochem 1999;126:905–911.
38. Inouye K, Tanaka H and Oneda H. States of tryptophyl residues and stability of recombinant human matrix metalloproteinase 7 (matrilysin) as examined by fluorescence. J Biochem 2000;128:363–369.
39. Lee D-W, Katayama H, Akao T, Maeda M, Tanaka R, Yamashita S, Saitoh H, Mizuki E and Ohba M. A 28 kDa protein of the Bacillus thuringiensis serovar shandongiensis isolate 89-T-34-22 induces a human leukemic cell-specific cytotoxicity. Biochim Biophys Acta 2001;1547:57–63.
40. Koller CN, Bauer LS and Hollingworth RM. Characterization of the pH-mediated solubility of Bacillus thuringiensis var san diego native δ-endotoxin crystals. Biochem Biophys Res Commun 1992;184:692–699.
41. Kopito RR. Aggresomes, inclusion bodies and protein aggregation. Trends Cell Biol 2000;10:524–530.
42. Middelberg AP. Preparative protein refolding. Trends Biotechnol 2002;20:437–443.
43. Yon JM. The specificity of protein aggregation. Nat Biotechnol 1996;14:1231.
44. Gustafson ME, Clayton RA, Lavrik PB, Johnson GV, Leimgruber RM, Sims SR and Bartnicki DE. Large-scale production and characterization of Bacillus thuringiensis subsp tenebrionis insecticidal protein from Escherichia coli. Appl Microbiol Biotechnol 1997;47:255–261.
45. Boonserm P, Pornwiroon W, Katzenmeier G, Panyim S and Angsuthanasombat C. Optimised expression in Escherichia coli and purification of the functional form of the Bacillus thuringiensis Cry4Aa δ-endotoxin. Protein Expr Purif 2004;35:397–403.
46. Behl C, Davis J, Cole GM and Schubert D. Vitamin E protects nerve cells from amyloid beta protein toxicity. Biochem Biophys Res Commun 1992;186:944–950.

47. Heiss P, Bernatz S, Bruchelt G and Senekowitsch-Schmidtke R. Cytotoxic effect of immunoconjugate composed of glucose-oxidase coupled to an anti-ganglioside (GD2) antibody on spheroids. Anticancer Res 1997;17:3177–3178.
48. Koch W and Kaplan D. A nomographic probit solution for the median effective dose (ED50). J Immunol 1950;65:7–16.
49. Smith PK, Krohn RI, Hermanson GT, Mallia AK, Gartner FH, Provenzano MD, Fujimoto EK, Goeke NM, Olson BJ and Klenk DC. Measurement of protein using bicinchoninic acid. Anal Biochem 1985;150:76–85.
50. Bradford MM. A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal Biochem 1976;72:248–254.
51. Schein CH. Solubility as a function of protein structure and solvent components. Biotechnol 1990;8:308–317.
52. Crickmore N, Zeigler DR, Feitelson J, Schnepf E, Van Rie J, Lereclus D, Baum J and Dean DH. Revision of the nomenclature for the Bacillus thuringiensis pesticidal crystal proteins. Microbiol Mol Biol Rev 1998;62:807–813.
53. Hardy SP, Lund T and Granum PE. CytK toxin of Bacillus cereus forms pores in planar lipid bilayers and is cytotoxic to intestinal epithelia. FEMS Microbiol Lett 2001;197:47–51.
54. Raimondi F, Kao JP, Fiorentini C, Fabbri A, Donelli G, Gasparini N, Rubino A and Fasano A. Enterotoxicity and cytotoxicity of Vibrio parahaemolyticus thermostable direct hemolysin in in vitro systems. Infect Immun 2000;68:3180–3185.
55. Kothary MH, Delston RB, Curtis SK, McCardell BA and Tall BD. Purification and characterization of a vulnificolysin-like cytolysin produced by Vibrio tubiashii. Appl Environ Microbiol 2001;67:3707–3711.
56. Prüß BM, Dietrich R, Nibler B, Märtlbauer E and Scherer S. The hemolytic enterotoxin HBL is broadly distributed among species of the Bacillus cereus group. Appl Environ Microbiol 1999;65:5436–5442.
57. Yang CY, Pang JC, Kao SS and Tsen HY. Enterotoxigenicity and cytotoxicity of Bacillus thuringiensis strains and development of a process for Cry1Ac production. J Agric Food Chem 2003;51:100–105.
58. Bjellqvist B, Hughes GJ, Pasquali C, Paquet N, Ravier F, Sanchez JC, Frutiger S and Hochstrasser D. The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 1993;14:1023–1031.
59. Yin HL, Iida K and Janmey PA. Identification of a polyphosphoinositide-modulated domain in gelsolin which binds to the sides of actin filaments. J Cell Biol 1988;106:805–812.


G protein-independent cell-based assays for drug discovery on seven-transmembrane receptors

Folkert Verkaar1,2, Jos W.G. van Rosmalen1,2, Marion Blomenröhr1, Chris J. van Koppen1, W. Matthijs Blankesteijn2, Jos F.M. Smits2 and Guido J.R. Zaman1,*

1 Molecular Pharmacology Unit, Organon BioSciences, Oss, The Netherlands
2 Department of Pharmacology and Toxicology, Cardiovascular Research Institute Maastricht (CARIM), University of Maastricht, Maastricht, The Netherlands

Abstract. Conventional cell-based assays for seven-transmembrane receptors, also known as G protein-coupled receptors, rely on the coupling of the ligand-bound receptor to heterotrimeric G proteins. New assay methods have become available that are not based on G protein activation, but that apply the molecular mechanism underlying the attenuation of G protein signaling mediated by β-arrestin. β-arrestin is a cytoplasmic protein that targets receptors to clathrin-coated endocytotic vesicles for degradation or recycling. This process has been visualized and quantified in high-content imaging assays using receptor or β-arrestin chimeras with green fluorescent protein. Other assay methods use bioluminescence resonance energy transfer, enzyme fragment complementation, or a protease-activated transcriptional reporter gene to measure receptor-β-arrestin proximity. β-arrestin recruitment assays have been applied successfully for receptors coupling to Gαq, Gαs and Gαi proteins, thus providing a generic assay platform for drug discovery on G protein-coupled receptors. The best understood signal transduction pathway elicited by the seven-transmembrane Frizzled receptors does not involve G proteins. The activation of Frizzleds by their cognate ligands of the Wnt family recruits the phosphoprotein dishevelled. Dishevelled regulates a protein complex involved in the destruction of β-catenin. Activation of Frizzled blocks degradation of β-catenin, which translocates to the nucleus to activate transcription of Wnt-responsive genes. The cytoplasm-to-nuclear translocation of β-catenin forms the basis of several high-content assays to measure Wnt/Frizzled signal transduction. Interestingly, Frizzled receptors have recently been shown to internalize and to recruit β-arrestin. This suggests that β-arrestin recruitment assays may be applied for drug discovery on seven-transmembrane receptors beyond G protein-coupled receptors.

Keywords: GPCR, seven-transmembrane receptor, G protein, β-arrestin, β-galactosidase, chemiluminescence, bioluminescence, Wnt, Frizzled, Hedgehog.

*Corresponding author: Tel.: +31-412-661043. Fax: +31-412-662519. E-mail: [email protected] (G.J.R. Zaman).

Biotechnology Annual Review, Volume 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00010-0. © 2008 Elsevier B.V. All rights reserved.

Introduction

Seven-transmembrane receptors (7TMs), which include G protein-coupled receptors (GPCRs), form the largest superfamily of cell surface receptors involved in signal transmission [1]. 7TMs play an important role in numerous physiological processes, such as neurotransmission, metabolism, inflammation, and reproduction. They respond to a wide repertoire of endogenous and exogenous ligands, including photons, cations, biogenic amines, pheromones, fragrances, lipids, peptides and large glycoprotein hormones [1]. Multiple diseases have been linked to mutations or polymorphisms in 7TMs [2], and they are the targets of many therapeutic agents. It has been estimated that 25% of prescription drugs sold worldwide exert their therapeutic effect through 7TMs, making these receptors the most successful single drug target class at present [3]. Not surprisingly, many pharmaceutical companies continue to actively pursue high-throughput screening (HTS) of large chemical compound libraries to identify new low molecular weight modulators of 7TMs.

Advances in detection technologies have enabled an increased use of cell-based functional assays for 7TM screening. These technologies take advantage of the functional responses elicited through heterotrimeric guanine nucleotide-binding proteins (G proteins) following receptor activation. For instance, the activation of Gαs and Gαi proteins results in an increase or decrease in intracellular cAMP concentration, respectively, that can be measured with displacement assays using labeled cAMP tracers and anti-cAMP antibodies [4]. Activation of Gαq proteins causes the influx and/or mobilization of calcium ions that can be monitored with cell-permeant calcium-sensitive dyes [5]. The reader is referred to excellent review articles by Williams [4] and Eglen [6] to learn more about these techniques.

Recently, a number of technologies have become available that do not rely on G protein activation, but on the attenuation of G protein signaling by β-arrestin [7]. The role of β-arrestin in receptor pharmacology and its application in drug discovery research will be discussed in the first part of this review. Frizzled receptors are 7TMs that are activated by ligands of the Wnt family and that signal through a G protein-independent mechanism involving stabilization of the transcription factor β-catenin.
There are as yet no drugs, or compounds in clinical development, that act on Frizzled receptors or their downstream effector molecules (The Investigational Drugs database, www.iddb.com, September 2007). However, there is increased interest in the Wnt/Frizzled pathway as a target for therapeutic intervention, because of its role in cell proliferation and apoptosis [8,9]. Our interest in Frizzled receptors stems from their role in repair of the heart after myocardial infarction [10,11]. The signaling mechanisms employed by Frizzled and the methods used in drug discovery on the Wnt/Frizzled pathway are the topic of the second part of this review. In this second part we will also address signal transduction and drug discovery methods employed for the related 7TM Smoothened.

Receptor desensitization and β-arrestin recruitment

Agonist activation of GPCRs leads to the exchange of guanosine diphosphate (GDP) for guanosine triphosphate (GTP) on the Gα subunit of the

heterotrimeric G protein. The GTP-bound Gα subunit then dissociates from both its neighboring Gβγ subunits and from the receptor. Both the Gα and Gβγ protein subunits interact with intracellular effector proteins, whereas the receptor becomes phosphorylated on serine and threonine residues in the intracellular loops and the C-terminal tail by GPCR kinases (GRKs) [12]. The phosphorylated sites are recognized by the cytosolic protein β-arrestin [13] (Fig. 1), which binds to the receptor and sterically prevents further G protein coupling, resulting in receptor desensitization [14]. The bound β-arrestin recruits other proteins, such as clathrin and the clathrin adaptor protein 2 (AP-2) complex, which target the desensitized receptors to clathrin-coated pits for endocytosis [15]. Once internalized, the receptors are dephosphorylated and the agonist dissociates from its receptor due to the acidic environment of the endosomes (Fig. 1). The internalized receptors are then either recycled to the plasma membrane as resensitized GPCRs capable of responding again to agonist, or they are directed to lysosomes for

Fig. 1. Schematic representation of class A and class B GPCR trafficking. Upon receptor activation, the GPCR is bound by β-arrestin, which targets the receptor to clathrin-coated pits and subsequent internalization. Class A receptors display transient interactions with β-arrestin and are characterized by rapid resensitization and recycling to the plasma membrane. In contrast, binding of β-arrestin to class B receptors is persistent. As a result, class B receptors recycle to the plasma membrane more slowly than class A receptors. Abbreviations: CCV: clathrin-coated vesicle, PM: plasma membrane. (The color version of the figure is hosted on Science Direct.)

degradation [16]. The latter route results in an overall decrease in receptor number. In addition to their role in attenuating G protein-dependent signaling, β-arrestins have been shown to function as signal transducers by coupling to protein kinases, such as members of the Src and mitogen-activated protein kinase (MAPK) families [7]. β-arrestin is a member of the arrestin family, which comprises four proteins. The visual arrestin (arrestin-1) and cone arrestin (arrestin-4) are expressed solely in the eye, whereas the non-visual arrestins, β-arrestin-1 (arrestin-2) and β-arrestin-2 (arrestin-3), are ubiquitously expressed [17]. 7TMs have been divided into two distinct classes based on the dynamics of the interaction between the receptors and the non-visual arrestins (Table 1). Class A receptors preferentially associate with β-arrestin-2 over β-arrestin-1. Binding to both β-arrestin-1 and β-arrestin-2 is transient and rapidly

Table 1. Overview of class A and class B receptors grouped by G protein coupling.

Receptor                                          Class   Reference

Gαq-coupled receptors
Endothelin type A receptor (ETAR)                 A       [18]
Neurotensin 1 receptor (NTR-1)                    B       [18]
Thyrotropin releasing hormone receptor (TRHR)     B       [18]
Substance P receptor (SPR)                        B       [18]
Angiotensin type 1A receptor (AT1AR)              B       [19]
Oxytocin receptor (OR)                            B       [20]
Vasopressin 1a receptor (V1aR)                    A       [21]
Platelet-activating factor receptor (PAFR)        B       [22]

Gαs-coupled receptors
β2-adrenergic receptor (β2AR)                     A       [18]
Dopamine D1A receptor (D1AR)                      A       [18]
α1b-adrenergic receptor (α1bAR)                   A       [18]
Vasopressin V2 receptor (V2R)                     B       [18]
Thyroid-stimulating receptor (TSH)                A       [23]
Follicle-stimulating hormone (FSH)                A       [24]
Luteinizing hormone (LH)                          A       [25]
CXC chemokine receptor 4 (CXCR4)                  A       [22,26]

Gαi-coupled receptors
μ-opioid receptor (MOR)                           A       [18]
Somatostatin 2A receptor (SST2A)                  B       [27]
Somatostatin 3 receptor (SST3)                    A       [27]
Somatostatin 5 receptor (SST5)                    A       [27]
CC chemokine receptor 5 (CCR5)                    B       [22,28]

Other
Frizzled 4                                        A       [29]
Smoothened                                        A       [30]

followed by dephosphorylation and reconstitution of receptor function [17] (Fig. 1). Class B receptors bind β-arrestin-1 and β-arrestin-2 with similar high affinities [18]. The tight binding of class B receptors to β-arrestins results in the formation of clathrin-coated pits and translocation of 7TM–β-arrestin complexes into endocytic vesicles (Fig. 1). The robustness of the interaction between class B receptors and β-arrestins is due to a cluster of serine and threonine residues in the C-terminal tail of class B receptors [20]. In addition to differences within the 7TM itself, a major determinant of the binding kinetics of receptors and β-arrestin is the duration of the ubiquitination state of β-arrestin. Class A receptors are marked by transient ubiquitination of the recruited β-arrestin, whereas class B receptors are characterized by persistent ubiquitination of β-arrestin [31]. The affinity of the 7TM for β-arrestin determines the rate of recycling of receptors. Class A receptors usually recycle to the plasma membrane and resensitize more readily than class B receptors [32].

Therapeutic relevance of β-arrestin-mediated receptor internalization

As discussed earlier, receptor desensitization protects cells from acute and chronic overstimulation and is responsible for the decrease in drug effect with repeated dosing. The efficacies of G protein activation and receptor desensitization do not always correlate for different agonists of the same 7TM [33]. For instance, the endogenous ligands of the μ-opioid receptor, the enkephalin peptides, and the opiate etorphine all induce receptor internalization. In contrast, the opiate agonist morphine does not cause internalization, while potently inducing G protein coupling [34]. This difference is explained by the observation that morphine only weakly stimulates μ-opioid receptor phosphorylation and β-arrestin recruitment [35].
Another μ-opioid receptor agonist, herkinorin, is even less active on β-arrestin and internalization than morphine [36]. Studies with β-arrestin-2 knockout mice indicate that ligands that do not induce coupling of the μ-opioid receptor to β-arrestin may lack certain side effects of opiates, such as tolerance, respiratory suppression and constipation [37,38]. Another striking example of a ligand that dissociates between G protein activation and β-arrestin recruitment has been described for the angiotensin II type 1 receptor. A mutationally altered angiotensin peptide, denoted SII angiotensin, is unable to induce Gαq-mediated mobilization of second messengers, whereas it still activates β-arrestin-dependent signaling to extracellular signal-regulated kinase (ERK) [39,40]. A potential therapeutic application of molecules that dissociate between G protein coupling and β-arrestin-mediated receptor internalization has been defined for the chemokine receptors CCR5 and CXCR4. These chemokine receptors are not only involved in regulation of cell migration, but also function as co-receptors for human immunodeficiency virus 1 (HIV-1).

Molecules that promote the internalization of CCR5 or CXCR4 without causing recycling of the receptor to the cell surface have potential use for the development of new therapies against AIDS. A synthetic derivative of the CCR5 ligand RANTES having these properties in vitro has been described [41]. To prevent the potential negative effects of fully blocking the inflammatory functions of the chemokine receptors, a compound that induces receptor internalization would need to be co-administered with an allosteric agonist. Such a compound has been identified for CXCR4 [42]. The potential benefit of compounds that induce receptor internalization is further exemplified by the immunomodulatory compound FTY720. The phosphorylated form of this compound is an agonist of the sphingosine-1-phosphate (S1P) receptor-1, also known as endothelial differentiation sphingolipid GPCR 1 (Edg 1) [43,44]. The immunomodulatory effect of FTY720 has been attributed to its strong agonistic effect on S1P receptor-1 internalization, and subsequent excessive degradation of internalized receptors [45]. The resultant decrease in receptor number renders recipient lymphocytes unresponsive to the chemoattractant properties of endogenous S1P [46].

Receptor internalization and β-arrestin recruitment assays

High-content assays

The ubiquity of the interaction between 7TMs and β-arrestins has been exploited to develop new assay platforms for drug discovery (Table 2). These assays use read-outs that are typically used to study protein–protein interaction in living cells, such as bioluminescence resonance energy transfer (BRET) [52] and enzyme fragment complementation (EFC) [50]. Receptor internalization and β-arrestin recruitment can also be visualized by creating chimeras with green fluorescent protein (GFP) and using high-content cellular imaging (Fig. 2). Redistribution® measures the effect of compounds on the cellular translocation of fluorescently labeled proteins, such as GFP-tagged 7TMs [47].
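The kind of image quantification that underlies such translocation read-outs can be sketched in a few lines. This is only an illustration under stated assumptions, not the algorithm of any commercial Redistribution or TransFluor analysis package: it assumes that a fluorescence image of one cell is available as a 2D grid of intensities, applies an arbitrary intensity threshold, and counts connected bright regions as putative vesicles or pits. The function name `count_bright_spots`, the threshold and both toy images are hypothetical.

```python
from collections import deque


def count_bright_spots(image, threshold):
    """Count 4-connected groups of pixels whose intensity exceeds threshold."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] <= threshold or seen[r][c]:
                continue
            # New bright pixel: flood-fill its whole connected region once.
            spots += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return spots


# Hypothetical toy images: diffuse signal before stimulation,
# punctate signal after ligand-induced translocation.
unstimulated = [
    [3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3],
]
stimulated = [
    [9, 9, 1, 1, 1],
    [9, 1, 1, 8, 1],
    [1, 1, 1, 1, 1],
    [1, 7, 1, 1, 9],
]

print(count_bright_spots(unstimulated, threshold=5))  # 0
print(count_bright_spots(stimulated, threshold=5))    # 4
```

In practice the spot count per cell would be compared between compound-treated and control wells; real images additionally need background correction and cell segmentation, which this sketch leaves out.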
TransFluor® is a derived technology and specifically refers to measurement of the effect of compounds on the translocation of GFP-β-arrestin fusion proteins [48,53]. Most TransFluor assays use β-arrestin-2,

Table 2. Labels used in β-arrestin recruitment and receptor internalization assays.

Technology        7TM                   β-arrestin              Reference
Redistribution    GFP                   No label                [47]
TransFluor        No label              GFP                     [48]
BRET2             Luciferase            GFP2                    [49]
PathHunter (EFC)  ProLink               β-galactosidase mutant  [50]
TANGO             Transcription factor  Protease                [51]

Fig. 2. Assay principles of the Redistribution and TransFluor assays. (A) In the Redistribution assay the 7TM is labeled with GFP, (B) whereas in the TransFluor assay the β-arrestin is labeled. The formation of endocytotic vesicles or pits containing 7TM-β-arrestin complexes is visualized by fluorescence microscopy and quantified by automated image acquisition and analysis. (The color version of the figure is hosted on Science Direct.)

because this variant displays more promiscuity for receptor binding than β-arrestin-1 [18] and has a higher affinity for components of the endocytotic machinery [15]. GFP-β-arrestin-2 provides a means to follow internalization of both class A and class B receptors [48] and has been used in HTS [54–56]. TransFluor is even suited for agonistic screening of orphan receptors. In this approach, overexpression of a constitutively active GRK mutant may serve as a positive reference for receptor activation [56]. An advantage of high-content assays is the easy detection of potential cytotoxicity of compounds by microscopic inspection.

Bioluminescence resonance energy transfer

Another assay system that uses excitation of GFP as a read-out for β-arrestin-2-mediated receptor internalization is BRET (Fig. 3). BRET is a naturally occurring process. For instance, the jellyfish Aequorea victoria possesses a bioluminescent system that consists of the calcium-sensitive protein aequorin and GFP [57]. Aequorin catalyzes the oxidation of

Fig. 3. Principle of the BRET2 β-arrestin recruitment assay. The 7TM is fused to Renilla luciferase (Rluc). A mutant of β-arrestin-2 is labeled with GFP. (A) In non-stimulated cells, GFP-β-arrestin-2 does not associate with the 7TM-Rluc. (B) Ligand activation of the 7TM in the presence of the chemiluminescent substrate DeepBlueC results in energy transfer and fluorescence emission of the GFP molecule, which is brought in close proximity to the Rluc due to the recruitment of β-arrestin. (The color version of the figure is hosted on Science Direct.)
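BRET read-outs of this kind are commonly expressed as the ratio of acceptor (GFP) emission to donor (Rluc) emission. The sketch below uses made-up counts and hypothetical function names (`bret_ratio`, `scale_well`); it only illustrates why such a ratio is insensitive to any factor that scales both channels equally, for example a well that received fewer cells or less substrate.

```python
def bret_ratio(acceptor_counts, donor_counts):
    """Ratio of acceptor (GFP) emission to donor (Rluc) emission."""
    if donor_counts <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_counts / donor_counts


def scale_well(acceptor_counts, donor_counts, factor):
    """Simulate a well in which both signals are scaled by the same factor."""
    return acceptor_counts * factor, donor_counts * factor


# Hypothetical raw counts (acceptor, donor) for a reference well.
reference = (1200.0, 4000.0)
# Same biology, but only half the cells were dispensed into the well.
half_cells = scale_well(*reference, factor=0.5)

print(bret_ratio(*reference))   # 0.3
print(bret_ratio(*half_cells))  # 0.3
```

The raw acceptor signal alone would have dropped by half in the second well, whereas the ratio is unchanged.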

coelenterazine, resulting in the emission of blue light with a wavelength of 480 nm that excites GFP to produce green light of 508 nm. The luciferase of the sea pansy Renilla reniformis (Rluc) can also oxidize coelenterazine and has been used in combination with yellow fluorescent protein (YFP) to study GPCR dimerization [52,58,59]. The substantial overlap in the emission spectra of Rluc and YFP poses a significant problem that has been overcome in a second-generation BRET assay (BRET2). BRET2 employs an alternative coelenterazine substrate, designated DeepBlueC, and a mutationally modified GFP (GFP2). BRET2 β-arrestin assays have been used for screening of various 7TMs with different G protein coupling [60]. Because the assay window for class A 7TMs was rather small, the assay has been optimized further using co-transfection of GRKs, or of a mutant of β-arrestin-2 that does not interact with the endocytotic machinery [49,61–63]. Additionally, the C-terminal part of a class A GPCR may be exchanged for the corresponding portion of a class B receptor to improve receptor-β-arrestin-binding kinetics [32]. An advantage of BRET is that it uses a ratiometric read-out. The assay signal is calculated from the ratio of the luminescent and

fluorescent signal. Errors in liquid dispensing or variations in cell density are canceled out in the ratio, thus increasing the robustness of the assay.

Enzyme fragment complementation

EFC involves the non-covalent interaction of two polypeptide fragments resulting in a holoenzyme with reconstituted enzyme activity. EFC of β-galactosidase from Escherichia coli has been applied in diagnostic analyte detection and biochemical HTS [50,64,65]. β-galactosidase is a tetrameric protein consisting of identical 116 kDa units. The deletion of either an N-terminal fragment (α) or a C-terminal fragment (ω) results in inactive enzymes. Enzyme function is restored upon addition of the α- or ω-fragment to the corresponding deletion mutant (Δα or Δω, respectively). β-galactosidase EFC as applied in the PathHunter™ assay technology developed by DiscoveRx (www.discoverx.com) uses a mutant β-galactosidase lacking amino acids 11–42 that is catalytically inactive and is denoted as ‘enzyme acceptor’ (EA) [50] (Fig. 4). Activity is restored by complementation with a peptide containing amino acids 3–92 of β-galactosidase, the so-called enzyme donor (ED). Variants of ED have been made to decrease its affinity for the EA [66]. These low-affinity ED fragments have been critical for the

Fig. 4. Schematic outline of PathHunter EFC assay. (A) The 7TM is labeled with a mutationally altered peptide fragment of β-galactosidase (ProLink or enzyme donor). β-arrestin is labeled with a corresponding deletion mutant of β-galactosidase (enzyme acceptor). (B) Recruitment of β-arrestin by the ligand-activated 7TM results in β-galactosidase complementation. The reconstituted holoenzyme catalyzes the hydrolysis of a substrate, which yields a chemiluminescent signal. (The color version of the figure is hosted on Science Direct.)

development of β-arrestin recruitment assays, because if high-affinity ED fragments were to be used, the EFC reaction would be driven constitutively by the highly efficient formation of β-galactosidase holoenzyme rather than by ligand-stimulated, reversible β-arrestin recruitment. In the PathHunter assays, β-arrestin-2 is fused to the EA, while the 7TM receptor is C-terminally tagged with a low-affinity variant of the ED, named ProLink™ [50]. After ligand stimulation, cells are lysed and β-galactosidase EFC is read by measuring chemiluminescence. A great advantage of β-galactosidase EFC is its rapid kinetics, allowing signals to be measured within seconds after enzyme activation [50,67]. InteraX is an early application of β-galactosidase EFC [67]. It uses two weakly complementing β-galactosidase mutants, Δα and Δω, that can form catalytically active enzymes when brought in close proximity. However, due to the large size of the complementing factors, steric hindrance frequently occurs, prohibiting assay development. The InteraX technology has found limited use, although assays and screening results have been described for β-arrestin recruitment assays for two different GPCRs [68].

Protease-mediated transcriptional reporter gene assay

In the TANGO™ assay developed by Sentigen Biosciences (now part of Invitrogen, www.invitrogen.com), the recruitment of β-arrestin is linked to the activation of a transcriptional reporter gene (Fig. 5). The 7TM is C-terminally extended with a protease cleavage site, followed by the tetracycline-controlled transactivator (tTA) from E. coli [69]. β-arrestin has been fused to the tobacco etch virus NIa proteinase [51]. Ligand stimulation of the 7TM and recruitment of β-arrestin brings the protease close enough to the receptor for it to cleave off the transcription factor. The released transcription factor moves into the nucleus where it activates transcription of a tTA-dependent luciferase reporter gene.
The transcription factor and the protease are foreign to the host cell, which ensures low intrinsic reporter activity. Other strong aspects of the TANGO technology are its sensitivity and its large assay window, which are both due to the signal amplification following receptor activation.

Discussion: What is the preferred assay?

β-arrestin recruitment assays are relatively new. There have been no systematic comparisons of the behavior of the different assays in screening and compound profiling. Therefore, it is too early to draw strong conclusions about what is the best assay format. However, some characteristics of the different technologies will likely be determinants of their popularity in different phases of the drug discovery process. For instance, Redistribution and TransFluor require dedicated image analysis systems, which are still

Fig. 5. Schematic outline of TANGO assay. The 7TM is C-terminally extended with a protease cleavage site, followed by a transcription factor. (A) β-arrestin is fused to a protease. (B) Ligand-activation of the 7TM and recruitment of β-arrestin results in protease-mediated release of the transcription factor, (C) which migrates into the nucleus to induce reporter gene expression. (The color version of the figure is hosted on Science Direct.)

expensive, but provide more information content than single parameter assays such as BRET, EFC and TANGO. For this reason, Redistribution and TransFluor may be preferred in lead optimization, whereas BRET, EFC and TANGO are better suited for HTS. A potential complication when using BRET is that it requires a fluorescence reader with injectors, whereas EFC and TANGO assays can be readily measured on conventional multimode readers. In the BRET and EFC assays the interaction of the 7TM and

β-arrestin is directly measured at the activated receptor. In contrast, in the TANGO assay this interaction is measured after signal transduction to a reporter gene, with the potential risk of compound interference at levels downstream in the signaling cascade. Finally, Redistribution, BRET, EFC and TANGO use receptors that are tagged. This enhances the specificity of the signal, in particular when a cell line is used expressing endogenous receptors that respond to the same ligands. On the other hand, receptor tagging increases the chances of perturbation of proper receptor function.

Frizzled receptors and Smoothened

The most appealing aspect of β-arrestin recruitment is its generic nature, which is best exemplified by considering its applicability for an atypical family of 7TMs. The Frizzled and Smoothened receptors form a subfamily of 7TMs with essential roles in development and pathology. Although they are evolutionarily quite distinct from other 7TMs, they represent a therapeutically interesting target family that displays ligand-induced β-arrestin recruitment.

Wnt/Frizzled signaling

Frizzled receptors are activated by large palmitoylated glycoproteins of the Wnt family [70]. Activation of Frizzled leads to a multitude of cellular responses, of which the most intensively studied is a G protein-independent pathway, denoted as the canonical pathway (Fig. 6) [71]. This pathway involves the transcription factor β-catenin. In quiescent cells, β-catenin is continuously degraded. This process is regulated by a multiprotein complex containing glycogen synthase kinase-3β (GSK-3β), the scaffolding protein axin/conductin and the protein product of the adenomatous polyposis coli (APC) gene. GSK-3β phosphorylates β-catenin at several N-terminal serine residues, thereby marking it for ubiquitination and subsequent degradation by the proteasome.
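One simple way to reason about this continuous turnover is a one-compartment model. The sketch below assumes constant synthesis and first-order degradation, which is a textbook simplification and not a model taken from the studies cited here; the rate values and the function name `steady_state_level` are hypothetical. Under these assumptions the steady-state protein level equals the synthesis rate divided by the degradation rate constant, so slowing degradation raises the level in proportion.

```python
def steady_state_level(synthesis_rate, degradation_rate_constant):
    """Steady state of dB/dt = synthesis_rate - degradation_rate_constant * B."""
    if degradation_rate_constant <= 0:
        raise ValueError("degradation rate constant must be positive")
    return synthesis_rate / degradation_rate_constant


# Hypothetical, unitless numbers chosen only to show the trend.
active_degradation = steady_state_level(synthesis_rate=2.0,
                                        degradation_rate_constant=1.0)
slowed_degradation = steady_state_level(synthesis_rate=2.0,
                                        degradation_rate_constant=0.25)

print(active_degradation)   # 2.0
print(slowed_degradation)   # 8.0
```

In this toy model a fourfold reduction of the degradation rate constant produces a fourfold higher steady-state level without any change in synthesis.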
The binding of Wnt to Frizzled and its co-receptor low-density lipoprotein receptor-related protein (LRP5 or -6) [72] results in the recruitment of the phosphoprotein dishevelled (Dvl). Dvl becomes hyperphosphorylated and sequesters axin to the plasma membrane [73,74]. This disrupts the formation of the degradation complex and results in the accumulation of unphosphorylated β-catenin, which moves to the nucleus (Fig. 6). Here, β-catenin interacts with transcription factors of the lymphoid enhancer factor (LEF) protein family, also known as T-cell factor (TCF), to induce transcription. There is evidence that certain Frizzleds, i.e. rat Frizzled 1 and 2 [75,76] and Drosophila Frizzleds [77], can also couple to G proteins. G protein activation ultimately links to β-catenin-LEF/TCF signaling [76]. Recent evidence suggests that β-arrestin may play a role in canonical signaling. Binding of Dvl to ligand-activated Frizzled 4 was shown to recruit


Fig. 6. Wnt/Frizzled signal transduction pathway. (A) In quiescent cells, de novo synthesized β-catenin is phosphorylated by a destruction complex, which marks it for proteasomal degradation. (B) The Wnt-activated Frizzled receptor forms a signaling complex and recruits Dvl and axin from the destruction complex. Frizzled hereby disrupts destruction complex function, resulting in the accumulation of unphosphorylated β-catenin in the cytoplasm. β-catenin subsequently translocates to the nucleus, where it combines with TCF to induce transcription of Wnt-responsive genes. (The color version of the figure is hosted on Science Direct.)

β-arrestin-2 [29], while inhibition of endocytosis blocked canonical signaling [78]. Studies with mouse embryonic fibroblasts derived from β-arrestin-1/-2 double knockouts showed that β-arrestin is a necessary component of the β-catenin-LEF/TCF signaling pathway [79]. Aberrant Wnt/Frizzled signaling has been implicated in a number of diseases, including cancer, rheumatoid arthritis, cardiovascular disease and osteoporosis [8,9,11,80–82]. Notably, mutations that augment Wnt/Frizzled signaling have been identified in nearly 90% of sporadic colon carcinomas. In the majority of cases, APC is the defective component [83,84]. Less frequently, mutations in β-catenin or axin seem causative for the disease [85]. Therapeutic strategies to inhibit the Wnt/Frizzled pathway for cancer therapeutics have been reviewed recently by Barker and Clevers [8].

Wnt/Frizzled canonical signaling can be measured with LEF/TCF-responsive transcriptional reporter genes or with translocation assays. Translocation assays measure the relative amount of β-catenin in the cytoplasm and the nucleus after ligand stimulation of Frizzled receptors. The translocation of β-catenin has been quantified using anti-β-catenin antibodies in an enzyme-linked immunosorbent assay (ELISA) and with fluorescence microscopic imaging [86]. A high-content assay using GFP-labeled β-catenin was made available by BioImage A/S (now part of ThermoFisher, www.thermofisher.com). As a control in these reporter and translocation assays, cells are treated with an inhibitor of GSK-3β, such as lithium chloride. Inhibition of GSK-3β results in the accumulation of unphosphorylated β-catenin and its subsequent translocation into the nucleus. The repertoire of translocation assays to measure Frizzled receptor activation may be expanded with an EFC-based approach, in which β-catenin is genetically fused to an α-fragment of β-galactosidase (ED). Nuclear import of this chimera is then quantified by complementation with a nucleus-resident β-galactosidase (Δα) mutant (EA) [50]. This approach has already been employed for the detection of the cytoplasmic-to-nuclear translocation of the glucocorticoid receptor [87].

Hedgehog and Smoothened

Smoothened is a 7TM receptor involved in the Hedgehog signaling pathway. Hedgehog signaling plays an important role in several developmental processes [88]. The natural compound cyclopamine potently inhibits Smoothened [89] and causes cyclopia and limb deformation in livestock [90]. Deregulated Hedgehog signaling has also been implicated in several cancers of endodermal origin [91,92]. A synthetic low molecular weight antagonist of Hedgehog signaling from Curis/Genentech has progressed into phase I clinical trials for the treatment of advanced solid tumors (The Investigational Drugs database, www.iddb.com, September 2007).
Hedgehog signaling shows remarkable resemblance to Wnt/Frizzled signal transduction, as its activation also involves the accumulation of unphosphorylated transcription factors (of the Gli family) that migrate into the nucleus. Degradation of Gli in unstimulated cells is regulated by GSK-3β and protein kinase A (PKA). Hedgehog is a large secreted protein that does not bind to Smoothened directly, but to the 12-transmembrane repressor protein Patched. There is evidence for the involvement of G proteins, in particular Gαi, in the regulation of Hedgehog signaling [93,94]. In addition, Smoothened is C-terminally phosphorylated by GRK2, resulting in β-arrestin-2-mediated internalization [30]. Hedgehog signaling can be measured with Gli-responsive transcriptional reporters. There are no reports of translocation assays for this pathway, probably due to lack of knowledge. It may prove difficult, however, to devise

a translocation assay that measures the nuclear entry of Gli proteins, similar to the β-catenin translocation assays for Wnt/Frizzled. This is because in quiescent cells, a portion of the Gli proteins is processed into an N-terminal fragment that enters the nucleus to repress Hedgehog-responsive genes [95,96]. Thus, considerable amounts of Gli transcription factors are already present in the nuclei of non-stimulated cells. In this respect, β-arrestin recruitment assays would provide a valuable additional tool to measure Hedgehog pathway activation.
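The argument about the nuclear background can be made concrete with a small calculation. The numbers below are invented for illustration and are not measurements, and `nuclear_fold_change` is a hypothetical helper that compares the nuclear signal after stimulation with the nuclear signal at rest. The point is only that the same amount of translocated protein yields a much smaller fold change when a large nuclear pool is already present in quiescent cells.

```python
def nuclear_fold_change(basal_nuclear, translocated):
    """Fold increase in nuclear signal after stimulation, relative to rest."""
    if basal_nuclear <= 0:
        raise ValueError("basal nuclear signal must be positive")
    return (basal_nuclear + translocated) / basal_nuclear


# Hypothetical signals in arbitrary units.
# Low nuclear background at rest (the situation favourable for a
# translocation assay).
low_background = nuclear_fold_change(basal_nuclear=10.0, translocated=90.0)
# High nuclear background at rest (as described above for Gli fragments).
high_background = nuclear_fold_change(basal_nuclear=60.0, translocated=90.0)

print(low_background)   # 10.0
print(high_background)  # 2.5
```

With identical translocation, the assay window shrinks from tenfold to 2.5-fold in this toy case, which is why a read-out that does not depend on nuclear entry is attractive for this pathway.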

Conclusions

Cell-based drug discovery for 7TMs is gaining further momentum due to recent advances in detection technologies that improve the quality of HTS and decrease the time needed to select true hits and to discard technology artifacts [97–99]. In addition, screening is becoming more flexible due to the use of cryopreserved cells instead of cells that are continuously maintained in cell culture [99]. β-arrestin recruitment assays have been applied successfully for receptors coupling to Gαq, Gαs and Gαi proteins (Table 1), thus providing a generic assay platform for GPCRs, and may also be applicable to Frizzled receptors and Smoothened. Assay methods include high-content cellular imaging, BRET, EFC and protease-activated reporter gene (TANGO) assays (Table 2). β-arrestin recruitment assays can also be used to identify compounds with novel, non-G protein-mediated effects, such as compounds that specifically induce receptor internalization. In addition, β-arrestin recruitment assays can have certain advantages over conventional G protein-dependent assays (Table 3). For instance, BRET and EFC assays detect receptor activation proximal to the 7TM, while cAMP and calcium measurements, and in particular reporter gene assays, detect receptor activation further downstream in the signaling cascade, thus yielding a higher chance of compound interferences and false positives in screening. Furthermore, β-arrestin recruitment results in a positive assay signal for Gαi-coupled receptors. To measure the assay signal, cells do not need to be co-stimulated with forskolin,

Table 3. Advantages of β-arrestin recruitment assays over conventional GPCR assays.

- Possibility to identify compounds with non-G protein-mediated efficacies
- Detection is proximal to receptor (a)
- Positive assay signal for Gαi-coupled receptors; no forskolin
- The application of tagged receptors (b) enhances specificity of the assay signal

(a) Except for TANGO.
(b) Except for TransFluor.

which is a common cause of assay variation in cAMP and reporter gene assays for Gαi-coupled receptors. Conventional GPCR assays measuring intracellular calcium and cAMP concentrations can be performed in native cell lines expressing endogenous receptors. For HTS assay development, receptors are usually overexpressed to allow sufficient assay window above background signal. Most β-arrestin assays use cell lines in which both the 7TM and β-arrestin are overexpressed. In TransFluor and BRET the β-arrestin is tagged with GFP, while in EFC and TANGO the β-arrestin is fused to an enzyme fragment or a protease, respectively (Table 2). In the Redistribution, BRET, EFC and TANGO assays, the 7TM is labeled with another molecule as well (Table 2). This increases specificity of the measured signal, but may influence the efficiency of G protein coupling. This needs to be verified, e.g. in TransFluor assays or by measuring receptor internalization using radiolabeled ligands. Receptor tagging potentially limits the use of these particular assays to identify compounds that only weakly differ in β-arrestin as opposed to G protein coupling. TransFluor has been used in HTS of both known and orphan receptors using high-speed confocal imaging [54–56]. Redistribution and BRET have been used to screen smaller compound sets (i.e. sets of ≤20,000 compounds) [22,47,63], but there are no reports of their use in HTS. EFC and TANGO have the advantage that they can be read with conventional multimode readers and are therefore expected to find broad application in screening and compound profiling. Results from these efforts are eagerly awaited.

Acknowledgements

We thank Els van Doornmalen, Miranda van der Lee, Bart van Lith and Toon van de Doelen for their contributions in setting up β-arrestin recruitment assays and high-content imaging at Organon BioSciences. Financial support was obtained from the Dutch Government BSIK program (to F.V., J.W.G.R., W.M.B. and J.F.M.S.).

References

1. Pierce KL, Premont RT and Lefkowitz RJ. Seven-transmembrane receptors. Nat Rev Mol Cell Biol 2002;3:639–650.
2. Rana BK, Shiina T and Insel PA. Genetic variation and polymorphisms of G-protein-coupled receptors: functional and therapeutic implications. Annu Rev Pharmacol Toxicol 2001;41:593–624.
3. Overington JP, Al-Lazikani B and Hopkins AL. How many drug targets are there? Nat Rev Drug Discov 2006;5:993–996.
4. Williams C. cAMP detection methods in HTS: selecting the best from the rest. Nat Rev Drug Discov 2004;3:125–135.

5. Monteith GR and Bird GS. Techniques: high-throughput measurement of intracellular Ca(2+)-back to basics. Trends Pharmacol Sci 2005;26:218–223.
6. Eglen RM. Functional G protein-coupled receptor assays for primary and secondary screening. Comb Chem High Throughput Screen 2005;8:311–318.
7. Luttrell LM and Lefkowitz RJ. The role of beta-arrestins in the termination and transduction of G-protein-coupled receptor signals. J Cell Sci 2002;115:455–465.
8. Barker N and Clevers H. Mining the Wnt pathway for cancer therapeutics. Nat Rev Drug Discov 2006;5:997–1014.
9. Dihlmann S and von Knebel Doeberitz M. Wnt/β-catenin-pathway as a molecular target for future anti-cancer therapeutics. Int J Cancer 2005;113:515–524.
10. Blankesteijn WM, Essers-Janssen YP, Verluyten MJ, Daemen MJ and Smits JF. A homologue of Drosophila tissue polarity gene frizzled is expressed in migrating myofibroblasts in the infarcted rat heart. Nat Med 1997;3:541–544.
11. van Gijn ME, Daemen MJAP, Smits JFM and Blankesteijn WM. The wnt-frizzled cascade in cardiovascular disease. Cardiovasc Res 2002;55:16–24.
12. Pitcher JA, Freedman NJ and Lefkowitz RJ. G-protein-coupled-receptor kinases. Annu Rev Biochem 1998;67:653–692.
13. Lohse MJ, Benovic JL, Codina J, Caron MG and Lefkowitz RJ. Beta-Arrestin: a protein that regulates beta-adrenergic receptor function. Science 1990;248:1547–1550.
14. Hausdorff WP, Caron MG and Lefkowitz RJ. Turning off the signal: desensitization of β-adrenergic function. FASEB J 1990;4:2881–2889.
15. Laporte SA, Oakley RH, Zhang J, Holt JA, Ferguson SS, Caron MG and Barak LS. The beta2-adrenergic receptor/betaarrestin complex recruits the clathrin adaptor AP-2 during endocytosis. Proc Natl Acad Sci USA 1999;96:3712–3717.
16. Tsao P, Cao T and von Zastrow M. Role of endocytosis in mediating downregulation of G-protein-coupled receptors. Trends Pharmacol Sci 2001;22:91–96.
17. Shenoy SK and Lefkowitz RJ. Multifaceted roles of beta-arrestins in the regulation of seven-membrane-spanning receptor trafficking and signalling. Biochem J 2003;375:3–15.
18. Oakley RH, Laporte SA, Holt JA, Caron MG and Barak LS. Differential affinities of visual arrestin, beta arrestin1, and beta arrestin2 for G protein-coupled receptors delineate two major classes of receptors. J Biol Chem 2000;275:17201–17210.
19. Zhang J, Barak LS, Anborgh PH, Laporte SA, Caron MG and Ferguson SSG. Cellular trafficking of G protein-coupled receptor/β-arrestin endocytic complexes. J Biol Chem 1999;274:10999–11006.
20. Oakley RH, Laporte SA, Holt JA, Barak LS and Caron MG. Molecular determinants underlying the formation of stable intracellular G protein-coupled receptor-beta-arrestin complexes after receptor endocytosis. J Biol Chem 2001;276:19452–19460.
21. Terrillon S, Barberis C and Bouvier M. Heterodimerization of V1a and V2 vasopressin receptors determines the interaction with beta-arrestin and their trafficking patterns. Proc Natl Acad Sci USA 2004;101:1548–1553.
22. Hamdan FF, Audet M, Garneau P, Pelletier J and Bouvier M. High-throughput screening of G protein-coupled receptor antagonists using a bioluminescence resonance energy transfer 1-based beta-arrestin2 recruitment assay. J Biomol Screen 2005;10:463–475.
23. Frenzel R, Voigt C and Paschke R. The human thyrotropin receptor is predominantly internalized by beta-arrestin 2. Endocrinology 2006;147:3114–3122.
24. Lazari MF, Liu X, Nakamura K, Benovic JL and Ascoli M. Role of G protein-coupled receptor kinases on the agonist-induced phosphorylation and internalization of the follitropin receptor. Mol Endocrinol 1999;13:866–878.

25. Min L, Galet C and Ascoli M. The association of arrestin-3 with the human lutropin/choriogonadotropin receptor depends mostly on receptor activation rather than on receptor phosphorylation. J Biol Chem 2002;277:702–710.
26. Venkatesan S, Rose JJ, Lodge R, Murphy PM and Foley JF. Distinct mechanisms of agonist-induced endocytosis for human chemokine receptors CCR5 and CXCR4. Mol Biol Cell 2003;14:3305–3324.
27. Tulipano G, Stumm R, Pfeiffer M, Kreienkamp HJ, Höllt V and Schulz S. Differential β-arrestin trafficking and endosomal sorting of somatostatin receptor subtypes. J Biol Chem 2004;279:21374–21382.
28. Kraft K, Olbrich H, Majoul I, Mack M, Proudfoot A and Oppermann M. Characterization of sequence determinants within the carboxyl-terminal domain of chemokine receptor CCR5 that regulate signaling and receptor internalization. J Biol Chem 2001;276:34408–34418.
29. Chen W, ten Berge D, Brown J, Ahn S, Hu LA, Miller WE, Caron MG, Barak LS, Nusse R and Lefkowitz RJ. Dishevelled 2 recruits beta-arrestin 2 to mediate Wnt5A-stimulated endocytosis of Frizzled 4. Science 2003;301:1391–1394.
30. Chen W, Ren XR, Nelson CD, Barak LS, Chen JK, Beachy PA, de Sauvage F and Lefkowitz RJ. Activity-dependent internalization of smoothened mediated by beta-arrestin 2 and GRK2. Science 2004;306:2257–2260.
31. Shenoy SK and Lefkowitz RJ. Trafficking patterns of beta-arrestin and G protein-coupled receptors determined by the kinetics of beta-arrestin deubiquitination. J Biol Chem 2003;278:14498–14506.
32. Oakley RH, Laporte SA, Holt JA, Barak LS and Caron MG. Association of beta-arrestin with G protein-coupled receptors during clathrin-mediated endocytosis dictates the profile of receptor resensitization. J Biol Chem 1999;274:32248–32257.
33. Violin JD and Lefkowitz RJ. β-arrestin-biased ligands at seven-transmembrane receptors. Trends Pharmacol Sci 2007;28:416–422.
34. Keith DE, Murray SR, Zaki PA, Chu PC, Lissin DV, Kang L, Evans CJ and von Zastrow M. Morphine activates opioid receptors without causing their rapid internalization. J Biol Chem 1996;271:19021–19024.
35. Zhang J, Ferguson SS, Barak LS, Bodduluri SR, Laporte SA, Law PY and Caron MG. Role for G protein-coupled receptor kinase in agonist-specific regulation of mu-opioid receptor responsiveness. Proc Natl Acad Sci USA 1998;95:7157–7162.
36. Groer CE, Tidgewell K, Moyer RA, Harding WW, Rothman RB, Prisinzano TE and Bohn LM. An opioid agonist that does not induce Mu opioid receptor-arrestin interactions or receptor internalization. Mol Pharmacol 2007;71:549–557.
37. Bohn LM, Gainetdinov RR, Lin F-T, Lefkowitz RJ and Caron MG. Mu-opioid receptor desensitization by beta-arrestin-2 determines morphine tolerance but not dependence. Nature 2000;408:720–723.
38. Raehal KM, Walker JKL and Bohn LM. Morphine side effects in beta-arrestin 2 knockout mice. J Pharmacol Exp Ther 2005;314:1195–1201.
39. Wei H, Ahn S, Shenoy SK, Karnik SS, Hunyady L, Luttrell LM and Lefkowitz RJ. Independent beta-arrestin 2 and G protein-mediated pathways for angiotensin II activation of extracellular signal-regulated kinases 1 and 2. Proc Natl Acad Sci USA 2003;100:10782–10787.
40. Ahn S, Shenoy SK, Wei H and Lefkowitz RJ. Differential kinetic and spatial patterns of beta-arrestin and G protein-mediated ERK activation by the angiotensin II receptor. J Biol Chem 2004;279:35518–35525.

271 41. Mack M, Luckow B, Nelson PJ, Cihak J, Simmons G, Clapham PR, Signoret N, Marsh M, Stangassinger M, Borlat F, Wells TN, Schlo¨ndorff D and Proudfoot AE. Aminooxypentane-RANTES induces CCR5 internalization but inhibits recycling: a novel inhibitory mechanism of HIV infectivity. J Exp Med 1998;187:1215–1224. 42. Sachpatzidis A, Benton BK, Manfred JP, Wang H, Hamilton A, Dohlman HG and Lolis E. Identification of allosteric peptide agonists of CXCR4. J Biol Chem 2003;278:896–907. 43. Mandala S, Hajdu R, Bergstrom J, Quackenbush E, Xie J, Milligan J, Thornton R, Shei GJ, Card D, Keohane C, et al. Alteration of lymphocyte trafficking by sphingosine-1-phosphate receptor agonists. Science 2002;296:346–349. 44. Brinkmann V, Davis MD, Heise CE, Albert R, Cottens S, Hof R, Bruns C, Prieschl E, Baumruker T, Hiestand P, Foster CA, Zollinger M and Lynch KR. The immune modulator FTY720 targets sphingosine 1-phosphate receptors. J Biol Chem 2002;277:21453–21457. 45. Oo ML, Thangada S, Wu MT, Liu CH, Macdonald TL, Lynch KR, Lin CY and Hla T. Immunosuppressive and anti-angiogenic sphingosine 1-phosphate receptor-1 agonists induce ubiquitinylation and proteasomal degradation of the receptor. J Biol Chem 2007;282:9082–9089. 46. Chiba K. FTY720, a new class of immunomodulator, inhibits lymphocyte egress from secondary lymphoid tissues and thymus by agonistic activity at sphingosine 1-phosphate receptors. Pharmacol Ther 2005;108:308–319. 47. Gra˚na¨s C, Lundholt BK, Heydorn A, Linde V, Pedersen HC, Krog-Jensen C, Rosenkilde MM and Pagliaro L. High content screening for G protein-coupled receptors using cell-based protein translocation assays. Comb Chem High Throughput Screen 2005;8:301–309. 48. Oakley RH, Hudson CC, Cruickshank RD, Meyers DM, Payne RE, Jr., Rhem SM and Loomis CR. The cellular distribution of fluorescently labeled arrestins provides a robust, sensitive, and universal assay for screening G protein-coupled receptors. Assay Drug Dev Technol 2002;1:21–30. 49. 
Elster L, Elling C and Heding A. Bioluminescence resonance energy transfer as a screening assay: focus on partial and inverse agonism. J Biomol Screen 2007;12:41–49. 50. Olson KR and Eglen RM. Beta galactosidase complementation: a cell-based luminescent assay platform for drug discovery. Assay Drug Dev Technol 2007;5:137–144. 51. Lee KJ and Berman Y. Method for identifying CART receptor and uses thereof. Patent application WO 2007;2007/002641 A2. 52. Milligan G. Applications of bioluminescence- and fluorescence resonance energy transfer to drug discovery at G protein-coupled receptors. Eur J Pharm Sci 2004;21:397–405. 53. Ghosh RN, DeBiasio R, Hudson CC, Ramer ER, Cowan CL and Oakley RH. Quantitative cell-based high-content screening for vasopressin receptor agonists using Transfluors technology. J Biomol Screen 2005;10:476–484. 54. Garippa RJ, Hoffman AF, Gradl G and Kirsch A. High-throughput confocal microscopy for beta-arrestin-green fluorescent protein translocation G protein-coupled receptor assays using the Evotec Opera. Methods Enzymol 2006;414:99–120. 55. Hudson CC, Oakley RH, Sjaastad MD and Loomis CR. High-content screening of known G protein-coupled receptors by arrestin translocation. Methods Enzymol 2006;414:63–78. 56. Oakley RH, Hudson CC, Sjaastad MD and Loomis CR. The ligand-independent translocation assay: an enabling technology for screening orphan G protein-coupled receptors by arrestin recruitment. Methods Enzymol 2006;414:50–63. 57. Wilson T and Hastings JW. Bioluminescence. Annu Rev Cell Dev Biol 1998;14:197–230.

272 58. Angers S, Salahpour A, Joly E, Hilairet S, Chelsky D, Dennis M and Bouvier M. Detection of beta 2-adrenergic receptor dimerization in living cells using bioluminescence resonance energy transfer (BRET). Proc Natl Acad Sci USA 2000;97:3684–3689. 59. Ramsay D, Kellett E, McVey M, Rees S and Milligan G. Homo- and hetero-oligomeric interactions between G-protein-coupled receptors in living cells monitored by two variants of bioluminescence resonance energy transfer (BRET): hetero-oligomers between receptor subtypes form more efficiently than between less closely related sequences. Biochem J 2002;365:2–40. 60. Bertrand L, Parent S, Caron M, Legault M, Joly E, Angers S, Bouvier M, Brown M, Houle B and Menard L. The BRET2/arrestin assay in stable recombinant cells: a platform to screen for compounds that interact with G protein-coupled receptors (GPCRS). J Rec Signal Trans Res 2002;22:533–541. 61. Kim YM and Benovic JL. Differential roles of arrestin-2 interaction with clathrin and adaptor protein 2 in G protein-coupled receptor trafficking. J Biol Chem 2002;277:30760–30768. 62. Vrecl M, Jorgensen R, Pogacnik A and Heding A. Development of a BRET2 screening assay using beta-arrestin 2 mutants. J Biomol Screen 2004;9:322–333. 63. Heding A. Use of the BRET 7TM receptor/beta-arrestin assay in drug discovery and screening. Expert Rev Mol Diagn 2004;4:403–411. 64. Eglen RM and Singh R. Beta galactosidase enzyme fragment complementation as a novel technology for high throughput screening. Comb Chem High Throughput Screen 2003;6:381–387. 65. Zaman GJR, van der Lee MMC, Kok JJ, Nelissen RLH and Loomans EEMG. Enzyme fragment complementation binding assay for p38a mitogen-activated protein kinase to study the binding kinetics of enzyme inhibitors. Assay Drug Dev Technol 2006;4:411–419. 66. Wehrman TS, Casipit CL, Gewertz NM and Blau HM. Enzymatic detection of protein translocation. Nat Methods 2005;2:521–527. 67. 
Blakely BT, Rossi FM, Tillotson B, Palmer M, Estelles A and Blau HM. Epidermal growth factor receptor dimerization monitored in live cells. Nat Biotech 2000;18:218–222. 68. Yan YX, Boldt-Houle DM, Tillotson BP, Gee MA, D’Eon BJ, Chang XJ, Olesen CE and Palmer MA. Cell-based high-throughput screening assay system for monitoring G protein-coupled receptor activation using beta-galactosidase enzyme complementation technology. J Biomol Screen 2002;7:451–459. 69. Gossen M and Bujard H. Tight control of gene expression in mammalian cells by tetracycline-responsive promoters. Proc Natl Acad Sci USA 1992;89:5547–5551. 70. Bhanot P, Brink M, Samos CH, Hsieh JC, Wang Y, Macke JP, Andrew D, Nathans J and Nusse R. A new member of the frizzled family from Drosophila functions as a Wingless receptor. Nature 1996;382:225–230. 71. Dale TC. Signal transduction by the Wnt family of ligands. Biochem J 1998;329:209–223. 72. Tamai K, Semenov M, Kato Y, Spokony R, Liu C, Katsuyama Y, Hess F, Saint-Jeannet JP and He X. LDL-receptor-related proteins in Wnt signal transduction. Nature 2000;407:530–535. 73. Cong F, Schweizer L and Varmus H. Wnt signals across the plasma membrane to activate the beta-catenin pathway by forming oligomers containing its receptors, Frizzled LRP. Development 2004;131:5103–5115. 74. Tamai K, Zeng X, Liu C, Zhang X, Harada Y, Chang Z and He X. A mechanism for Wnt coreceptor activation. Mol Cell 2004;13:149–156.

273 75. Liu T, Liu X, Wang H, Moon RT and Malbon CC. Activation of rat frizzled-1 promotes Wnt signaling and differentiation of mouse F9 teratocarcinoma cells via pathways that require Galpha(q) and Galpha(o) function. J Biol Chem 1999;274:33539–33544. 76. Liu T, DeCostanzo AJ, Liu X, Wang H, Hallagan S, Moon RT and Malbon CC. G protein signaling from activated rat frizzled-1 to the beta-catenin-Lef-Tcf pathway. Science 2001;292:1718–1722. 77. Katanaev VL, Ponzielli R, Semeriva M and Tomlinson A. Trimeric G protein-dependent frizzled signaling in Drosophila. Cell 2005;120:111–122. 78. Blitzer JT and Nusse R. A critical role for endocytosis in Wnt signaling. BMC Cell Biol 2006;7:28. 79. Bryja V, Gradl D, Schambony A, Arenas E and Schulte G. b-arrestin is a necessary component of Wnt/b-catenin signaling in vitro and in vivo. Proc Natl Acad Sci USA 2007;104:6690–6695. 80. Sen M. Wnt signalling in rheumatoid arthritis. Rheumatology 2005;44:708–713. 81. Semenov MV and He X. LRP5 mutations linked to high bone mass diseases cause reduced LRP5 binding and inhibition by SOST. J Biol Chem 2006;281:38276–38284. 82. Downey LM, Bottomley HM, Sheridan E, Ahmed M, Gilmour DF, Inglehearn CF, Reddy A, Agrawal A, Bradbury J and Toomes C. Reduced bone mineral density and hyaloid vasculature remnants in a consanguineous recessive FEVR family with a mutation in LRP5. Br J Ophthalmol 2006;90:1163–1167. 83. Powell SM, Zilz N, Beazer-Barclay Y, Bryan TM, Hamilton SR, Thibodeau SN, Vogelstein B and Kinzler KW. APC mutations occur early during colorectal tumorigenesis. Nature 1992;359:235–237. 84. Korinek V, Barker N, Morin PJ, van Wichen D, de WR, Kinzler KW, Vogelstein B and Clevers H. Constitutive transcriptional activation by a beta-catenin-Tcf complex in APC-/- colon carcinoma. Science 1997;275:1784–1787. 85. Liu W, Dong X, Mai M, Seelan RS, Taniguchi K, Krishnadath KK, Halling KC, Cunningham JM, Boardman LA, Qian C, et al. 
Mutations in AXIN2 cause colorectal cancer with defective mismatch repair by activating beta-catenin/TCF signalling. Nat Genet 2000;26:146–147. 86. Borchert KM, Galvin RJS, Frolik CA, Hale LV, Halladay DL, Gonyier RJ, Trask OJ, Nickischer DR and Houck KA. High-content screening assay for activators of the Wnt/ Frd pathway in primary cells. Assay Drug Dev Technol 2005;3:133–141. 87. Fung P, Peng K, Kobel P, Dotimas H, Kauffman L, Olson K and Eglen RM. A homogeneous cell-based assay to measure nuclear translocation using beta-galactosidase enzyme fragment complementation. Assay Drug Dev Technol 2006;4:263–272. 88. Ingham PW. Signalling by hedgehog family proteins in Drosophila and vertebrate development. Curr Opin Genet Dev 1995;5:492–498. 89. Taipale J, Chen JK, Cooper MK, Wang B, Mann RK, Milenkovic L, Scott MP and Beachy PA. Effects of oncogenic mutations in Smoothened and Patched can be reversed by cyclopamine. Nature 2000;406:1005–1009. 90. Bale AE. Sheep, lilies and human genetics. Nature 2000;406:944–945. 91. Xie J, Murone M, Luoh SM, Ryan A, Gu Q, Zhang C, Bonifas JM, Lam CW, Hynes M, Goddard A, Rosenthal A, Epstein EH, Jr. and de Sauvage FJ. Activating Smoothened mutations in sporadic basal-cell carcinoma. Nature 1998;391:90–92. 92. Pasca di Magliano M and Hebrok M. Hedgehog signalling in cancer formation and maintenance. Nat Rev Cancer 2003;3:903–911. 93. Riobo NA, Saucy B, Dilizio C and Manning DR. Activation of heterotrimeric G proteins by Smoothened. Proc Natl Acad Sci USA 2006;103:12607–12612.

274 94. Ruiz-Go´mez A, Molnar C, Holguı´ n H, Mayor F, Jr. and de Celis JF. The cell biology of Smo signalling and its relationship with GPCRs. Biochim Biophys Acta 2007;1768:901–912. 95. Wang B, Fallon JF and Beachy PA. Hedgehog-regulated processing of Gli3 produces an anterior/posterior repressor gradient in the developing vertebrate limb. Cell 2000;100:423–434. 96. Pan Y, Bai CB, Joyner AL and Wang B. Sonic hedgehog signaling regulates Gli2 transcriptional activity by suppressing its processing and degradation. Mol Cell Biol 2006;26:3365–3377. 97. Oosterom J, van Doornmalen EJP, Lobregt S, Blomenro¨hr M and Zaman GJR. Highthroughput screening using b-lactamase reporter-gene technology for identification of low-molecular-weight antagonists of the human gonadotropin releasing hormone receptor. Assay Drug Dev Technol 2005;3:143–154. 98. Williams C and Sewing A. G-protein coupled receptor assays: to measure affinity or efficacy that is the question. Comb Chem High Throughput Screen 2005;8:285–292. 99. Zaman GJR, de Roos JA, Blomenro¨hr M, van Koppen CJ and Oosterom J. Cryopreserved cells facilitate cell-based drug discovery. Drug Discov Today 2007;12:521–526.

275

The application of low shear modeled microgravity to 3-D cell biology and tissue engineering

Stephen Navran
Synthecon, Inc., 8042 El Rio, Houston, Texas 77054, USA

Abstract. The practice of cell culture has been virtually unchanged for 100 years. Until recently, life scientists have had to content themselves with two-dimensional cell culture technology. Clearly, living creatures are not constructed in two dimensions and thus it has become widely recognized that in vitro culture systems must become three dimensional to correctly model in vivo biology. Attempts to modify conventional 2-D culture technology to accommodate 3-D cell growth, such as embedding cells in extracellular matrix, have demonstrated the superiority of the concept. Nevertheless, there are serious drawbacks to this approach, including limited mass transport and lack of scalability. Recently, a new cell culture technology developed at NASA to study the effects of microgravity on cells has emerged to solve many of the problems of 3-D cell culture. The technology, the Rotating Wall Vessel (RWV), is a single-axis clinostat consisting of a fluid-filled, cylindrical, horizontally rotating culture vessel. Cells placed in this environment are suspended by the resolution of the gravitational, centrifugal and Coriolis forces with extremely low mechanical shear. These conditions, which have been called "low shear modeled microgravity", enable cells to assemble into tissue-like aggregates with high mass transport of nutrients, oxygen and wastes. Examples of the use of the RWV for basic cell biology research and tissue engineering applications are discussed.

Keywords: low shear, microgravity, cell culture, tissue engineering, rotating wall vessel, three dimensional, stem cells.

E-mail: [email protected] (S. Navran).
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00011-2
© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Tissue culture has been utilized as a tool in the biological sciences for 100 years [1]. It strives to mimic, as closely as possible, the conditions inside the body. Although classical tissue culture has contributed enormously to our understanding of cell biology, it has fallen short of the ideal.

The term "tissue culture" has often been used synonymously with organ culture and cell culture. There are, however, some important distinctions between these terms. Organ culture is generally taken to mean an intact, undisaggregated piece of tissue that is placed on the bottom of a plastic dish or flask. Cell culture usually implies dispersed cells taken from original tissue or cell lines by enzymatic, mechanical or chemical disaggregation.

For most of its history, tissue culture has been performed in two dimensions. Dispersed cells typically form monolayers. Whole tissue organ cultures may start out in their original three-dimensional configuration, but they usually lose their structure as cells migrate out of the tissue and attach to the plastic substrate. In addition, many of the phenotypic characteristics of the tissue from which the cells are derived are lost in culture. This may be due to changes in gene expression in the differentiated cells or to overgrowth of a particular cell type in the population. Whatever the reason, the loss of tissue-specific characteristics can reduce the usefulness of classical two-dimensional cell culture for modeling in vivo cell behavior. Indeed, it is now becoming well accepted that 3-D culture is, in most respects, a much better analog of the in vivo environment than 2-D culture [2].

Technology for 3-D tissue culture

For the in vitro engineering of tissue in 3-D there are several basic requirements:

A. Cells must be able to co-localize to facilitate cell-to-cell contacts. These contacts are critical for intercellular signaling between surface receptors and for exchange of soluble molecules.

B. Mass transfer of nutrients, oxygen and wastes must be sufficient to maintain cell viability throughout the tissue. Since 3-D cell aggregates do not have a blood supply, cells deep within the construct will be hypoxic unless sufficient mass transport can be maintained.

C. Cells must have the spatial freedom to remodel random aggregates into organotypic 3-D structures. Considering that most tissues are made up of multiple cell types, it is critical to the function of the cultured tissue that cells can move freely to recapitulate the normal in vivo cytoarchitecture.

A variety of technologies for cell and tissue culture have been developed since the first use of glass or plastic substrates. Some of these culture techniques have been adapted with varying degrees of success to 3-D culture. The simplest of these is a cell pellet in which isolated cells are packed in the bottom of a test tube by centrifugation and covered with culture media [3].
While this technique can produce 3-D cultures, diffusion of nutrients, oxygen and waste is limited by the static nature of the culture environment. Consequently, this technique has not been widely used. Another simple technique is to culture anchorage-dependent cells in a plastic flask or dish which has not been treated to promote cell attachment. As the cells float in the dish, they may randomly aggregate and form large irregular clumps. Many individual cells, however, fail to make contact, become apoptotic and die. Furthermore, cells within the large aggregates will eventually become necrotic due to lack of nutrients and oxygen in the static culture environment. An example of this technique is the formation of embryoid bodies produced from dispersed embryonic stem cells [4].

More recently, a number of investigators have used 3-D extracellular matrices with embedded cells in a dish or multiwell plate [5]. This arrangement has been shown to produce reasonable 3-D tissue models by allowing cells to organize within the matrix. The limitations of this procedure are that mass transfer is restricted by the static nature of the culture conditions and that the matrix itself presents an additional barrier to diffusion. It is also not readily adaptable to tissue engineering applications since scale-up is difficult.

Occasionally, investigators have attempted to use dynamic culture systems such as spinner flasks or hollow fiber systems for 3-D culture. While these technologies provide excellent mass transport, they do not fulfil the requirements for a useful 3-D culture system. In the case of spinner flasks or, on a larger scale, stirred tanks, the method of suspending cells utilizes mechanical force which not only damages cells [6] but also tends to prevent the assembly of 3-D aggregates. With hollow fiber systems, cells are grown on one side of a semi-permeable membrane while media is perfused on the other side. Although shear forces are eliminated by this technology, the ability to culture cells in 3-D is limited to the surface of the fibers.

Approximately 20 years ago, a new type of cell culture system was developed at NASA's Johnson Space Center. It was based on the principle of clinorotation, defined as the nullification of the force of gravity by slow rotation around one or two axes. The clinostat developed at NASA is a single-axis device known as the Rotating Wall Vessel (RWV) (Fig. 1). A two-axis clinostat, the Random Positioning Machine (RPM), was developed for the European Space Agency. This review will deal only with the RWV since it is a simpler design and therefore more adaptable to scale-up and tissue engineering applications.
In addition, two studies comparing the RWV with the RPM concluded that they produced similar effects on cells [7,8].

The original purpose of the RWV was to carry cells into space on the Shuttle. The rationale for the design was that it should simulate microgravity in the 48 h waiting period on the Shuttle before launch and protect cultured cells during the stresses of liftoff. It was quickly noted in ground experiments that suspended cells tended to form 3-D aggregates. This finding has become the basis of most of the applications which will be detailed below. As this technology has been adopted for 3-D cell culture, the term "low shear modeled microgravity" [9] has generally come to be used to describe the environment created in the RWV.

Operating principles of the RWV

The RWV is a horizontally rotated cylinder with a coaxial oxygenator positioned in the center (Fig. 1). The vessel is completely filled with culture media and, when it is rotated, the fluid flow is coupled to the vessel wall such that it rotates essentially as a solid body. The oxygenator core is fixed to the vessel and rotates at the same angular velocity as the outer wall, thereby creating a laminar flow with minimal shear force. Cells placed in this environment are maintained in suspension by the resolution of the gravitational, centrifugal and Coriolis forces as the vessel rotates (Fig. 2). A detailed explanation of the principles of optimized suspension culture in the RWV has recently been published [10] and will only be summarized here.

Fig. 1. The Rotating Wall Vessel. The cylindrical vessel has a central core covered with a silicone membrane which allows oxygen and CO2 transfer between the culture medium and the incubator environment. The vessel is rotated by a motorized base unit which also contains a pump to drive incubator gas through the central core. (The color version of this figure is hosted on Science Direct.)

There are three basic elements that govern the cell culture environment in the RWV:

A. As the vessel rotates, the cells or cell aggregates accelerate until they reach terminal (sedimentation) velocity at which the gravitational

Fig. 2. Vector velocity diagram. The forces acting on a particle rotating in a fluid are shown. Gravity-induced sedimentation (Vs) can be resolved into radial (Vsr) and tangential (Vst) components. There is an outwardly directed vector due to centrifugal force. Tangential Coriolis-induced motion (Vct) leads to spiraling of particles. (Diagram labels: ω, streamline, particle path, G, P, Ro.)

force is counterbalanced by hydrodynamic forces of shear, centrifugal force and Coriolis force. The major determinant of sedimentation velocity is the size of the cell aggregate: according to the Stokes equation, the velocity increases as the square of the aggregate radius [11]. Therefore, as cell aggregates grow in size, they will sediment more rapidly and it is necessary to increase the rotational speed of the RWV to maintain aggregates in suspension and avoid damage from wall collisions.

B. As the RWV rotates, the gravitational force can be resolved into radial and tangential components (Fig. 2). The radial component produces a secondary spiraling of the cell aggregates. The relationship between the amplitude of these spirals and the rotational speed of the vessel is shown in Fig. 3 (see [11] for the derivation of this diagram). At low rotational speed, the spirals are large and result in many wall collisions by the cell aggregates. As the speed of rotation is increased, the spirals decrease in amplitude, but the effect of centrifugal force becomes more dominant and wall collisions rise. Therefore an optimal rotational speed must be determined that minimizes the total deviations due to gravitational and centrifugal forces, i.e. at the intersection of the two

Fig. 3. Relationship between centrifugal and gravity-induced distortions and rotations per minute. The radius of Coriolis-induced spiraling at low speed can be reduced by increasing the rotational speed but as the speed is increased the particles will eventually collide with the vessel wall. (Curves plotted against rotations per minute, 0 to 100: amplitude of deviations caused by gravity (cm), Vs/ω; centrifugal deviation in one revolution (cm), 2π VsgωRo/G.)

lines in Fig. 3. At first glance, determining the optimal speed might seem difficult, but in practice it is quite simple. As the cell aggregates increase in size to a point where adjusting the rotational speed is necessary to keep them in suspension, they become visible to the naked eye and the speed can be optimized by observing the particle motion through the clear wall of the RWV.

C. Nutrient and oxygen delivery to, and metabolic waste removal from, 3-D cell aggregates are critically important to their function and survival in culture. The availability of nutrients and oxygen to 3-D cell aggregates in suspension depends on the relative contribution of diffusion versus convective bulk flow. As cell aggregates fall through the media, nutrients and oxygen are continuously supplied by convective bulk flow. When sedimentation velocity is very low or approaching zero, as in the case of single cell suspensions or very small aggregates, the supply of nutrients and oxygen is limited to diffusion. This situation can be addressed by allowing cells to attach to scaffolding which sediments at a higher velocity or by introducing a gradient of fresh media through perfusion of the RWV (Fig. 4).
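The relationships in elements A and B can be put into a small numerical sketch. The Python below is illustrative only and rests on several assumptions that are not in the text: the aggregate is treated as a rigid sphere obeying Stokes' law, the two deviation expressions are read from the axis labels of Fig. 3 (Vs/ω for gravity and 2π·Vs·ω·Ro/g for centrifugal force), the function names are hypothetical, and the parameter values (aggregate radius, densities, viscosity, vessel radius) are invented for the example.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def stokes_velocity(radius_m, rho_particle, rho_fluid, viscosity_pa_s, g=G):
    """Terminal sedimentation velocity of a small sphere (Stokes' law), m/s."""
    return 2.0 * radius_m ** 2 * (rho_particle - rho_fluid) * g / (9.0 * viscosity_pa_s)


def gravity_deviation(v_s, omega):
    """Amplitude of the gravity-induced spiral, Vs/omega (m)."""
    return v_s / omega


def centrifugal_deviation(v_s, omega, vessel_radius_m, g=G):
    """Outward drift during one revolution, 2*pi*Vs*omega*Ro/g (m)."""
    return 2.0 * math.pi * v_s * omega * vessel_radius_m / g


def crossover_rpm(vessel_radius_m, g=G):
    """Speed at which the two deviations are equal (assumed reading of Fig. 3)."""
    omega = math.sqrt(g / (2.0 * math.pi * vessel_radius_m))
    return omega * 60.0 / (2.0 * math.pi)


if __name__ == "__main__":
    # Assumed example values: aggregates slightly denser than a water-like medium.
    for radius in (50e-6, 100e-6, 200e-6):
        v = stokes_velocity(radius, rho_particle=1050.0, rho_fluid=1000.0,
                            viscosity_pa_s=1.0e-3)
        print(f"radius {radius * 1e6:.0f} um -> Vs = {v * 1e3:.3f} mm/s")

    rpm = crossover_rpm(vessel_radius_m=0.05)  # assumed 5 cm vessel radius
    print(f"crossover speed for Ro = 5 cm: {rpm:.1f} rpm")
```

Under these assumptions, doubling the aggregate radius quadruples the sedimentation velocity, and both deviations scale in proportion to Vs, so larger aggregates make larger excursions at any given speed. The two curves cross where ω² = g/(2πRo), which is also where their sum is smallest. Real aggregates are neither rigid nor perfectly spherical, so such numbers are at best a starting point for the visual adjustment described above.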

Fig. 4. The perfused RWV has rotating couplings at each end which allow the vessel to rotate and circulate media through the vessel from a feeding reservoir. The media is pumped through an external oxygenator (upper left) before entering the vessel. (The color version of this figure is hosted on Science Direct.)
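The balance between diffusion and convective bulk flow described in element C is often summarized by the dimensionless Péclet number, Pe = v·L/D, where v is the velocity of the particle relative to the fluid, L its size and D the diffusivity of the solute. The sketch below is an illustration rather than part of the original analysis: the function name, the oxygen diffusivity (taken as roughly 3 × 10⁻⁹ m²/s in aqueous medium) and the example sizes and velocities are all assumptions.

```python
def peclet_number(velocity_m_s, length_m, diffusivity_m2_s):
    """Ratio of convective to diffusive transport over a length scale."""
    return velocity_m_s * length_m / diffusivity_m2_s


O2_DIFFUSIVITY = 3.0e-9  # m^2/s, assumed approximate value for aqueous medium

# Assumed example cases: (label, sedimentation velocity in m/s, diameter in m).
cases = [
    ("single cell, 10 um", 2.7e-6, 10e-6),
    ("aggregate, 200 um", 1.1e-3, 200e-6),
]

for label, velocity, diameter in cases:
    pe = peclet_number(velocity, diameter, O2_DIFFUSIVITY)
    regime = "diffusion-limited" if pe < 1.0 else "convection-assisted"
    print(f"{label}: Pe = {pe:.3g} ({regime})")
```

With these assumed inputs, the single cell falls well below Pe = 1, where diffusion dominates, and the larger aggregate falls well above it. This matches the qualitative argument that attaching cells to faster-sedimenting scaffolding, or perfusing the vessel, restores convective supply.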

Applications of the RWV

The RWV has proven to be a unique tool in the domain of cell and tissue culture technology. Applications range from basic cell biology to space biology to tissue engineering and, in the future, development of therapies for disease and injury. These subdivisions clearly overlap, but for the sake of clarity, the topics will be organized in this manner, albeit somewhat arbitrarily.

Basic cell biology

2D vs. 3D

Not surprisingly, many of the basic biology applications are focused on the differences between conventional 2-D culture and 3-D RWV culture in terms of cell-to-cell communications, cell–matrix interactions, gene expression and functional characteristics. Below are several examples of these types of studies.

A recent report on the effect of extracellular matrix on embryonic stem cell (ESC) differentiation showed that a mixture of complex basement membrane components, such as Matrigel, promoted ESC differentiation into epithelium with a characteristic glandular structure [12]. The authors compared static 2-D culture, in which ESCs were grown on Matrigel, with RWV culture, where soluble Matrigel was added to the suspended ESCs. The RWV culture produced more phenotypically normal-looking glandular structures than the 2-D culture, demonstrating the importance of the RWV in creating a 3-D environment for effective cell differentiation.

In another study of neuronal stem cell differentiation [13], the investigators compared the growth and differentiation of neuronal stem cells embedded in collagen in static culture and in the RWV. The embedded static culture had previously been established as a superior 3-D model for neuronal differentiation compared to conventional 2-D culture [14]. However, it was noted that the constructs contained hypoxic, necrotic cores and the cells were short-lived. When the immobilized neuronal stem cells were cultured in the RWV, the constructs were viable for up to nine weeks, expressed 10-fold greater numbers of nestin+ and GFAP+ (neuronal markers of differentiation) cells than comparable static cultures and were able to display extensive neurite outgrowth. These results demonstrate that the improvement in mass transport provided by the RWV can markedly enhance cellular differentiation even in culture systems designed to facilitate 3-D culture.

Infectious disease

In the study of infectious diseases, it is sometimes difficult to find eukaryotic host cell lines which can be infected with the organism of interest, for example, in the case of human noroviruses. Noroviruses are the leading cause of non-bacterial gastrointestinal illness in the world [15]. Understanding of human norovirus pathogenesis has been hampered by the inability to propagate these viruses in vitro [16].
Until recently, investigators have been forced to use a mouse norovirus cell culture model to make inferences about the human virus. However, this model did not answer fundamental questions about the behavior of the human virus in the human gastrointestinal tract. To address this problem, a 3-D culture of a human small intestinal epithelial cell line was established in the RWV and, unlike previous attempts in 2-D culture, was successfully infected with human norovirus [17].

The RWV has also been widely used in the study of bacterial pathogenesis [18–20]. The overall conclusion of these and similar studies was that, although bacterial cells were able to infect 2-D cultures, the 3-D tissue models produced in the RWV were far superior in simulating the bacterial-host cell interactions that occurred in vivo.

Stem cells

The potential use of stem cells in regenerative medicine, a subject of intense interest recently, has also benefited from the RWV cell culture technology. Stem cells, both embryonic and adult, are rare and have proven difficult to control in conventional culture systems in terms of their tendency to spontaneously differentiate into various lineages. Recent studies have shown that ESC differentiation can be guided to some extent by the combination of extracellular matrix and 3-D culture in the RWV [12,21]. Similar results have been seen with adult stem cells, for example in the differentiation of neural stem cells into neurons [13] and of umbilical cord blood-derived stem cells into hepatocytes [22].

Bone marrow mesenchymal stem cells have also drawn considerable interest lately due to their ability to form a variety of differentiated tissues including skeletal muscle, bone, endothelium and cartilage [23]. In order for these cells to be useful for regenerative medicine applications, their numbers must be considerably expanded. Here again, the RWV has been shown to be an improvement over conventional 2-D flask culture in increasing the numbers of primitive progenitor cells without uncontrolled differentiation [24].

Intercellular signaling and gene expression

An example of the use of low shear modeled microgravity in signaling is demonstrated in the study of bone loss, which can be a serious health issue in a growing elderly population. The molecular mechanisms which underlie bone loss in mechanical unloading are not well understood. The 3-D RWV culture of osteoblasts, which stimulate bone formation, and osteoclasts, which stimulate bone resorption, has provided considerable insight into this question. Findings from several laboratories have shown that both an increase in osteoclast activity [25] and a decrease in osteoblast activity [26] are responsible for the overall phenomenon of bone loss in mechanical unloading. At very early times of microgravity exposure, osteoblasts increased phosphorylation of ERK-1/2, leading to secretion of proteins which stimulate osteoclast activity [25].
With longer exposure to microgravity, osteoblast differentiation was directly inhibited by changes in the phosphorylation of several mitogen-activated protein kinases (MAPKs) [26]. The signaling mechanisms identified in these studies suggest potential targets for future therapies.

In the study of the relationships between genotype and phenotype, yeast (Saccharomyces cerevisiae) has been a very useful model organism, having much in common with the cells of higher organisms [27]. There are a number of examples of relationships between mechanical forces and changes in gene expression in cells and tissues [28–30]. Yeast culture in the RWV has been used to study these relationships because of the unique conditions of low shear modeled microgravity and because the yeast genome has been completely sequenced. One such study showed that shifting from the normal gyrorotatory culture to RWV culture caused the yeast to upregulate genes involved in metabolism, i.e. glucose utilization, and the so-called stress response genes, which might be better characterized as sensors for changes in the environment [31]. Here, the low shear modeled microgravity environment was key to revealing the role of these pathways in environmental adaptation.

Space biology

Prolonged microgravity experienced by astronauts is well documented to produce a variety of physiological consequences, including bone loss, muscle atrophy and immunosuppression. The future of manned space flight depends on the development of countermeasures to these serious health issues. The development of the RWV and the low shear modeled microgravity environment has provided a methodology to study the effects of microgravity at the cellular and molecular level and to provide tissue model systems for testing potential countermeasures. The major question is whether low shear modeled microgravity is a true analog of the microgravity environment of space. One recent study questioned the validity of this assumption. In this study, the investigators compared global gene expression of renal cortical cells cultured on the Space Shuttle with that of the same cells cultured in the RWV on the ground [32]. The results showed that there were many changes in gene expression in both the space and ground cultures, but there was little overlap. However, it must be noted that this study used only a single cell type, and thus the results may not be generalizable to all cells. In particular, the behavior of two tissue types, bone and immune, which are especially problematic for astronauts, correlates well between space and the RWV. As indicated above, the physiological effect of bone loss in space is very well modeled at the cellular level in the RWV. A recent study utilizing low shear modeled microgravity has already suggested a potential countermeasure [33] based on manipulating the signaling pathways revealed in previous studies [25,26].
The immunosuppressive effects of space microgravity have also been modeled in the RWV [34–36], demonstrating that T-lymphocyte activation is particularly sensitive to microgravity, which correlates with the immune deficiency experienced by astronauts. As with bone, the data derived from such studies have been used to develop an in vitro solution to T-lymphocyte dysfunction, the addition of nucleotides and nucleosides to the culture medium [37], which is being developed into a nutritional supplement for astronauts [38].

Drug development

As the traditional large pharmaceutical companies are finding it increasingly difficult to generate new small molecule drugs, the era of the human genome is opening vast opportunities for protein-based therapeutics. While there are already a number of recombinant proteins and antibodies on the market, significant hurdles remain before protein-based therapeutics can reach their full potential. Currently, most of these proteins are manufactured by animal cells in large stirred tanks, sometimes as large as 10,000 l. These stirred tanks are so large because the efficiency of protein production is low. Furthermore, the purification of these proteins in downstream processing is made more difficult by the dilution of the protein in the large volume of culture media. Because of these inefficiencies, protein-based therapeutics are generally much more expensive to produce and therefore more expensive for patients. There is considerable ongoing research to improve the efficiency of protein production, most of which is concerned with genetically engineering cell lines and improving culture media formulations. Recently, the RWV bioreactor has been investigated as an alternative technology for further improvements in protein production efficiency [39]. The hypothesis for this study was that cell damage caused by mechanical agitation in stirred tanks was reducing the capability of cells to produce recombinant proteins. To test this hypothesis, a cell line transfected with a reporter gene, LacZ, was cultured in a spinner flask and in an RWV, and the production of β-galactosidase was compared. The results showed a sevenfold increase in protein production in the RWV over the spinner flask (Fig. 5). Another potential advantage of the RWV is that the production of complex, post-translationally modified proteins could be greatly facilitated by using unconventional cell lines that could correctly make these modifications but might not survive in a high shear bioreactor environment. At this point, the RWV has been scaled up to 3 l, which is not sufficient for commercial protein production, but development of much larger rotating bioreactors is in progress.

[Figure 5: bar chart of β-galactosidase activity (OD 420 per 10^6 cells) for spinner flask and RWV cultures.]

Fig. 5. Comparison of β-galactosidase production in an RWV and a spinner flask. K562 cells transfected with LacZ were initially seeded in the RWV and spinner flask at 5 × 10^5 cells/ml. Media was changed at 48 h intervals. After seven days, the cells were harvested and assayed for β-galactosidase activity (n = 3).
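The normalization behind Fig. 5 can be illustrated with a short calculation. The Python sketch below assumes that each replicate's OD 420 reading is divided by the number of harvested cells in millions, and that the fold difference is the ratio of the replicate means. The function names and every number are hypothetical placeholders chosen only to give a sevenfold ratio; they are not data from reference [39].

```python
from statistics import mean


def activity_per_million_cells(od420: float, cell_count: float) -> float:
    """Return OD 420 normalized to 10^6 cells (the y-axis unit of Fig. 5)."""
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    return od420 / (cell_count / 1e6)


def fold_change(test_values, reference_values) -> float:
    """Ratio of the mean test value to the mean reference value."""
    return mean(test_values) / mean(reference_values)


# Hypothetical replicates (n = 3 per vessel): (OD 420 reading, cells harvested).
spinner_raw = [(0.08, 2.0e6), (0.10, 2.0e6), (0.06, 2.0e6)]
rwv_raw = [(0.54, 2.0e6), (0.56, 2.0e6), (0.58, 2.0e6)]

spinner = [activity_per_million_cells(od, n) for od, n in spinner_raw]
rwv = [activity_per_million_cells(od, n) for od, n in rwv_raw]

print(f"RWV / spinner flask fold change: {fold_change(rwv, spinner):.1f}")
```

Normalizing to cell number before comparing vessels matters because the two culture systems may yield different cell densities after seven days; a raw OD 420 comparison would mix per-cell productivity with cell yield.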

Cell and tissue therapy

Tissue engineering is an emerging multidisciplinary field that uses principles of biology and engineering to develop replacement tissues for the treatment of disease and injury [40]. Cell therapy aims to treat illness by introducing suspended cells into tissues or into the blood stream, either to replace injured cells or to provide a supportive environment for the recovery of injured cells. In both tissue engineering and cell therapy, the low shear modeled microgravity environment has shown potential for assisting these new forms of treatment. The following examples are but a small sample of the applications of low shear modeled microgravity to potential cell and tissue therapies.

Connective tissue

Tissue-engineered cartilage has been used to repair damaged joints [41,42]. However, there had been no success in the repair of hyaline cartilage, which covers the ends of bones. Recently, investigators have been able to heal these injuries in animal models using cartilage constructs created in the RWV [43]. One distinguishing feature of this study was that the cartilage was cultured from bone marrow mesenchymal stem cells, making it possible to use autologous bone marrow to eliminate any risk of tissue rejection. Furthermore, these cartilage constructs were formed without scaffolding, which probably accounts for their ability to fill the osteochondral defects. In the past, many cartilage tissue engineering approaches have used nonwoven, fibrous, 3-D meshes as scaffolding [44], but it has been difficult to achieve uniform cell density with this technique. An alternative approach is to use hydrogels, which are formed around the chondrocytes [45]. However, cell viability is compromised in static culture by the thickness of the 3-D constructs, which interferes with the diffusion of oxygen, nutrients and wastes [46].
This limitation was overcome by culturing the hydrogel constructs in the RWV [47], resulting in enhanced cell proliferation, increased production of the characteristic extracellular matrix components (proteoglycans and type II collagen) and uniform cell distribution throughout the hydrogel.

Cardiac muscle

Cardiovascular diseases such as myocardial infarction and heart failure are a major cause of mortality in Western societies. While pharmacological treatments have improved, most patients at the end stage of disease will die. Heart transplantation, the last option for these patients, is limited by the shortage of donors. Therefore, cardiac muscle tissue engineering may offer a potential treatment option. Since there is no easily accessible supply of cardiomyocytes, they will have to be generated from an alternative source such as stem cells. Recently, investigators have used the RWV to produce large numbers of embryoid bodies from embryonic stem cells for this purpose [48]. Embryoid bodies are 3-D spheroids aggregated from embryonic stem cells that contain early committed progenitors from all three germ layers. Previously, it was shown that the RWV was very efficient in embryoid body formation [4]. The investigators [48] were able to isolate progenitors from embryoid bodies that could then be differentiated into myocardial cells and further processed into cardiac muscle tissue. Another recent study focused on engineering cardiac tissue to create a myocardial patch to replace tissue damaged by an infarct [49]. This research was primarily concerned with producing a cardiac muscle construct that could correctly propagate electrical impulses. A combination of scaffold and culture in the RWV produced a tissue that was highly differentiated and exhibited normal anisotropic electrophysiological properties.

Cornea and retina

Diseases of the eye, including those of the cornea and retina, present serious risks of blindness. These conditions are beginning to be addressed by tissue engineering approaches [50,51]. An important requirement for the construction of a tissue-engineered cornea is to recreate the three layers of corneal tissue (epithelial, stromal and endothelial) in an extracellular matrix biomaterial. One of the major difficulties encountered in corneal tissue engineering is the inability of the stromal cells to survive within the matrix under static culture conditions. It has been shown that the low shear modeled microgravity environment in the RWV is well suited to promoting a robust stroma, which makes up 90% of the corneal thickness [52]. One approach to the treatment of retinal diseases such as retinitis pigmentosa is the replacement of diseased cells with healthy ones.
This concept has shown some success [53], but the use of retinal cells cultured in a conventional 2-D static environment has proved problematic because the cells remain undifferentiated when grown in this way. By culturing a combination of retinal progenitors and retinal pigment epithelial cells in the RWV bioreactor, it was shown that highly differentiated retinal cells could be produced for transplantation [54].

Adipose tissue

The possibility of using engineered adipose tissue to repair soft tissue defects caused by trauma, tumor resection and congenital abnormalities has been considered for some time [55,56]. To accomplish this objective, a 3-D culture of adipocytes is necessary, because 2-D culture does not recapitulate the unique cytoarchitecture observed in vivo. Using preadipocytes cultured without scaffolding in the RWV, it was demonstrated that large multi-millimeter aggregates could be formed that contained mature, differentiated adipocytes with strong evidence of vascularization [57]. The origin of the vascular structures in the cultured adipose tissue was unclear, but the authors speculated that there may have been small numbers of endothelial stem cells in the isolated preadipocyte population. This observation is important because it suggests that it may be possible to vascularize other engineered tissues in the RWV by coculture with endothelial stem cells.

Pancreatic islet transplantation

The transplantation of islets of Langerhans isolated from cadaver pancreases is a new, experimental procedure that holds great promise as a cure for type 1 diabetes [58]. Nevertheless, there are still significant problems associated with this procedure, as indicated by the fact that the mean duration of insulin independence is only 15 months and that, after 5 years, only 7.5% of recipients remain insulin independent [59]. There is considerable evidence that early graft failure is associated with apoptosis due to mechanical and enzymatic damage during the isolation process [60,61]. The loss of the surrounding extracellular matrix also contributes to reduced islet viability [62]. The use of conventional static culture before islet transplantation has produced some small improvements in the procedure [63], but recent studies in a mouse diabetic model have suggested that culturing isolated islets in low shear modeled microgravity could dramatically improve islet transplantation in humans [64,65]. When freshly isolated, static dish-cultured or RWV-cultured mouse islets were transplanted into diabetic mice, only the RWV-cultured islets could maintain long-term normal blood glucose levels (Fig. 6). Furthermore, only one-third to one-half as many RWV-cultured islets as static dish-cultured islets were required to achieve normal blood glucose levels. Further investigation [66] showed that RWV culture of islets decreased apoptosis (Fig. 7) and increased ATP levels (Fig. 8), which are considered a good indicator of graft success [67]. One complication noted in these studies was the tendency of RWV-cultured islets to clump, especially at high densities. This was not surprising, since the RWV is widely used in tissue engineering applications to promote aggregation [68]. However, in this case it adversely affected the survival of the islets. It was subsequently found that preincubating the islets with a peptide derived from laminin, a protein component of the extracellular matrix, could prevent the clumping and allow the islets to be successfully cultured (Fig. 9). As a bonus, the laminin peptide also appeared to independently reduce apoptosis (Fig. 7) and increase ATP in islets (Fig. 8), probably by mimicking the extracellular matrix at adhesion sites on the surface of the islets.

Fig. 6. Streptozotocin-induced diabetic C3H recipients were transplanted with 30, 60, or 120 islet equivalents: (A) fresh C57BL/10 islets (n = 7); (B) C57BL/10 dish-cultured islets (n = 19); or (C) C57BL/10 RWV bioreactor-cultured islets (n = 15). Shaded areas indicate normal blood sugar levels ranging from 5.5 to 11 mmol/L (100–200 mg/dL). The islet equivalent for each group was normalized to a standard islet size of 150 µm after islets were measured in 50 µm increments from 50 to 350 µm and over 350 µm. Glucose levels were measured three times per week in blood from a tail vein. Normoglycemia was defined as glucose levels between 5.5 and 11 mmol/L (100–200 mg/dL); four measurements greater than 11 mmol/L (200 mg/dL) or two consecutive measurements greater than 16 mmol/L (300 mg/dL) confirmed diabetes. (Reproduced with permission from reference [54].)
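The glycemic criteria in the Fig. 6 legend amount to a simple decision rule, sketched below in Python. The function names are hypothetical, and the sketch assumes that the "four measurements" above 11 mmol/L are counted over the whole series and need not be consecutive, a point the legend leaves open. The conversion factor of 18.016 mg/dL per mmol/L follows from the molar mass of glucose (180.16 g/mol); the mg/dL figures in the legend are rounded.

```python
from typing import Sequence

# Glucose molar mass is 180.16 g/mol, so 1 mmol/L corresponds to 18.016 mg/dL.
MGDL_PER_MMOLL = 18.016


def mmol_to_mgdl(glucose_mmol: float) -> float:
    """Convert a blood glucose value from mmol/L to mg/dL."""
    return glucose_mmol * MGDL_PER_MMOLL


def is_normoglycemic(glucose_mmol: float) -> bool:
    """Normoglycemia as defined in the legend: 5.5 to 11 mmol/L."""
    return 5.5 <= glucose_mmol <= 11.0


def diabetes_confirmed(readings_mmol: Sequence[float]) -> bool:
    """Apply the two criteria from the Fig. 6 legend to a time-ordered series.

    Assumption: the four readings above 11 mmol/L may occur anywhere in the
    series; only the readings above 16 mmol/L must be consecutive.
    """
    readings_above_11 = sum(1 for g in readings_mmol if g > 11.0)
    if readings_above_11 >= 4:
        return True
    # Two consecutive readings above 16 mmol/L.
    return any(a > 16.0 and b > 16.0
               for a, b in zip(readings_mmol, readings_mmol[1:]))


# Hypothetical series for one recipient, three readings per week (mmol/L).
series = [7.2, 8.1, 6.5, 17.3, 18.0, 9.4]
print(diabetes_confirmed(series))
```

If the original protocol required the four readings to be consecutive as well, the first criterion would need a sliding-window check like the second one.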

[Figure 7: bar chart of caspase 3 activity (% of dish control) for Dish, Dish + Laminin and RWV + Laminin cultures.]

Fig. 7. Caspase 3 (apoptosis) activity in human islets cultured in dishes or the RWV with or without laminin A chain peptide. The islets were preincubated in the presence or absence of 50 µg/ml laminin A chain peptide for 3 h and then transferred to a dish or an RWV and cultured for seven days. Caspase 3 was measured by a fluorescence assay. All data were normalized to islet DNA.
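Figs. 7 and 8 report values normalized to islet DNA and expressed as a percentage of the dish control. A minimal Python sketch of that two-step calculation follows; the function names, units and numbers are hypothetical and are not taken from the underlying study [66].

```python
def normalize_to_dna(signal: float, dna_amount: float) -> float:
    """Divide an assay signal by the DNA content of the same sample."""
    if dna_amount <= 0:
        raise ValueError("dna_amount must be positive")
    return signal / dna_amount


def percent_of_control(sample: float, control: float) -> float:
    """Express a normalized sample value as a percentage of the control."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return 100.0 * sample / control


# Hypothetical readings: (assay signal in arbitrary units, DNA in ng).
dish_control = normalize_to_dna(1200.0, 400.0)
rwv_sample = normalize_to_dna(450.0, 300.0)

print(f"{percent_of_control(rwv_sample, dish_control):.0f}% of dish control")
```

Dividing by DNA first corrects for differences in the amount of islet tissue recovered from each culture, so that the percentage reflects activity per unit of tissue and not sample size.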

[Figure 8: bar chart of ATP luminescence (% of dish control) for Dish, Dish + Laminin and RWV + Laminin cultures.]

Fig. 8. ATP levels in human islets cultured in dishes or the RWV with or without laminin A chain peptide. The islets were preincubated in the presence or absence of 50 µg/ml laminin A chain peptide for 3 h and then transferred to a dish or an RWV and cultured for seven days. ATP was measured by luminescence. All data were normalized to islet DNA.

Conclusions

The low shear modeled microgravity environment created in the NASA-designed RWV is produced by rotation of a fluid-filled cylindrical culture vessel about its horizontal axis. Unlike other suspension culture technologies, the RWV maintains cells in suspension with very little mechanical force. At the time this technology was first developed, the implications of culturing cells in a low shear environment were not fully appreciated. However, early users quickly noted that suspended cells could be easily assembled into 3-D, tissue-like aggregates. It took still more time for the greater life science community to appreciate the importance of 3-D tissue culture [2]. Now, low shear modeled microgravity is beginning to be recognized as a valuable tool in creating organotypic tissue models for basic biology research. In the emerging multidisciplinary field of tissue engineering, constructing tissues in 3-D is an absolute requirement. As indicated above, there are a number of therapeutic applications where tissues engineered in the RWV are approaching clinical use. That is not to say that all tissue engineering problems can be solved with low shear modeled microgravity. This technology, by itself, will not be able to produce large, complex organs like kidneys, lungs or hearts. It is more likely that other technologies will have to be used together with the RWV to complete the construction of such organs.

Until recently, the cellular building blocks for 3-D culture in the RWV were limited to immortalized cell lines, neonatal cells or terminally differentiated adult cells. Each of these cell types is less than ideal for tissue modeling or engineering. Immortalized cell lines, while they have nearly unlimited growth potential, often do not function like the tissues from which they are derived. Furthermore, they cannot be used for therapeutic applications because of the risk of tumor formation. Neonatal and adult primary cells may produce more differentiated 3-D constructs, but they are limited in their ability to proliferate. Fortunately, a solution to this problem has emerged from recent research in stem cell biology. Stem cells offer the possibility of producing almost unlimited numbers of undifferentiated cells that can be converted into functional tissue with the proper stimuli. Clearly, there is much work to be done before stem cells are routinely used for tissue engineering in the RWV, but this work has already started (see the Stem cells section above) and will undoubtedly expand in the future.

[Figure 9: panels A, B and C.]

Fig. 9. The effect of laminin A chain on human islet aggregation in the RCCS. Human islets were preincubated in media with or without 50 µg/ml laminin A chain peptide for 3 h in a humidified CO2 incubator at 37 °C, followed by a seven-day culture in an RWV bioreactor or a petri dish. (A) Dish control without laminin A chain. (B) RWV without laminin A chain. (C) RWV with laminin A chain. The density of islets in the culture was approximately 300 islet equivalents per ml. (The color version of this figure is hosted on Science Direct.)

References

1. Harrison RG. Observations on the living developing nerve fiber. Proc Soc Exp Biol Med 1907;4:140–143.
2. Abbott A. Biology’s new dimension. Nature 2003;421:870–872.
3. Johnston B, Hering TM, Caplan AI, Goldberg VM and Yoo JU. In vitro chondrogenesis of bone marrow-derived mesenchymal progenitor cells. Exp Cell Res 1998;238:265–272.
4. Gerecht-Nir S, Cohen S and Itskovitz-Eldor J. Bioreactor cultivation enhances the efficiency of human embryoid body (hEB) formation and differentiation. Biotechnol Bioeng 2004;86:493–502.
5. Jacks T and Weinberg RA. Taking the study of cancer cell survival to a new dimension. Cell 2002;111:923–925.
6. Cherry RS and Papoutsakis ET. Physical mechanisms of cell damage in microcarrier cell culture bioreactors. Biotechnol Bioeng 1988;32:1001–1014.

7. Villa A, Versari S, Maier JAM and Bradammente T. Cell behavior in simulated microgravity: a comparison of results obtained with RWV and RPM. Gravit Space Biol 2005;18:89–90.
8. Patel MJ, Liu W, Sykes MC, Ward NE, Risin SA, Risin D and Jo H. Identification of mechanosensitive genes in osteoblasts by comparative microarray studies using the rotating wall vessel and random positioning machine. J Cell Biochem 2007;101:587–599.
9. Wilson JW, Ramamurthy R, Porwollik S, McClelland M, Hammond T, Allen P, Ott CM, Pierson DL and Nickerson CA. Microarray analysis identifies Salmonella genes belonging to the low-shear modeled microgravity regulon. Proc Natl Acad Sci USA 2002;99:13807–13812.
10. Hammond TG and Hammond JM. Optimized suspension culture: the rotating-wall vessel. Am J Physiol Renal Physiol 2001;281:F12–F25.
11. Wolf DA, Schwarz RP. Analysis of gravity-induced particle motion and fluid perfusion flow in the NASA designed rotating zero-head-space tissue culture vessel. NASA Technical Paper 3143, 1991.
12. Philp D, Chen SS, Fitzgerald W, Orenstein J, Margolis L and Kleinman H. Complex extracellular matrices promote tissue-specific stem cell differentiation. Stem Cells 2005;23:288–296.
13. Lin HJ, O’Shaughnessy TJ, Kelly J and Ma W. Neural stem cell differentiation in a cell-collagen-bioreactor culture system. Dev Brain Res 2004;153:163–173.
14. O’Connor SM, Stenger DA, Shaffer KM, Maric D, Barker JL and Ma W. Primary neural precursor cell expansion, differentiation and cytosolic Ca(2+) response in three-dimensional collagen gel. J Neurosci Methods 2000;102:187–195.
15. Atmar RL and Estes MK. Diagnosis of noncultivatable gastroenteritis viruses, the human caliciviruses. Clin Microbiol Rev 2001;14:15–37.
16. Duizer E, Schwab KG, Neill FH, Atmar RL, Koopmans MPG and Estes MK. Laboratory efforts to cultivate noroviruses. J Gen Virol 2004;85:79–87.
17. Straub TM, Höner zu Bentrup K, Orosz-Coghlan P, Dohnalkova A, Mayer BK, Bartholomew RA, Valdez CO, Bruckner-Lea CJ, Gerba CP, Abbaszadegan M and Nickerson CA. In vitro cell culture infectivity assay for human noroviruses. Emerg Infect Dis 2007;13:396–403.
18. Smith YC, Grande KK, Rasmussen SB and O’Brien AD. Novel three-dimensional organoid model for evaluation of the interaction of uropathogenic Escherichia coli with terminally differentiated human urothelial cells. Infect Immun 2006;74:750–757.
19. Carterson AJ, Höner zu Bentrup K, Ott CM, Clarke MS, Pierson DL, Vanderburg CR, Buchanan KL, Nickerson CA and Schurr MJ. A549 lung epithelial cells grown as three-dimensional aggregates: alternative tissue culture model for Pseudomonas aeruginosa pathogenesis. Infect Immun 2005;73:1129–1140.
20. Carvalho HM, Teel LD, Goping G and O’Brien AD. A three-dimensional tissue culture model for the study of attach and efface lesion formation by enteropathogenic and enterohaemorrhagic Escherichia coli. Cell Microbiol 2005;7:1771–1781.
21. Chen SS, Fitzgerald W, Zimmerberg J, Kleinman HK and Margolis L. Cell–cell and cell–extracellular matrix interactions regulate embryonic stem cell differentiation. Stem Cells 2007;25:553–561.
22. McGuckin CP, Forraz N, Baradez MO, Navran S, Zhao J, Urban R, Tilton R and Denner L. Production of stem cells with embryonic characteristics from human umbilical cord blood. Cell Prolif 2005;38:245–255.
23. Reyes M, Lund T, Lenvik T, Aguiar D, Koodie L and Verfaillie CM. Purification and ex vivo expansion of postnatal human marrow mesodermal progenitor cells. Blood 2001;98:2615–2625.

24. Chen X, Xu H, Wan C, McCaigue M and Li G. Bioreactor expansion of human bone marrow mesenchymal stem cells. Stem Cells 2006;24:2052–2059.
25. Rucci N, Rufo A, Alamanou M and Teti A. Modeled microgravity stimulates osteoclastogenesis and bone resorption by increasing osteoblast RANK/OPG ratio. J Cell Biochem 2007;100:464–473.
26. Zayzafoon M, Gathings WE and McDonald JM. Modeled microgravity inhibits osteogenic differentiation of human mesenchymal stem cells and increases adipogenesis. Endocrinology 2004;145:2421–2432.
27. Botstein D and Fink GR. Yeast: an experimental organism for modern biology. Science 1988;240:1439–1443.
28. Nerem RM, Alexander RW, Chappell DC, Medford RM, Varner SE and Taylor WR. The study of the influence of flow on vascular endothelial biology. Am J Med Sci 1998;316:169–175.
29. Liu M, Tanswell AK and Post M. Mechanical force-induced signal transduction in lung cells. Am J Physiol Lung Cell Mol Physiol 1999;277:L667–L683.
30. Tabony J, Pochon N and Papaseit C. Microtubule self organization depends upon gravity. Adv Space Res 2001;28:529–535.
31. Johanson KJ, Allen PL, Lewis F, Cubano LA, Hyman LE and Hammond TG. Saccharomyces cerevisiae gene expression changes during rotating wall vessel suspension culture. J Appl Physiol 2002;93:2171–2180.
32. Hammond TG, Benes E, O’Reilly KC, Wolf DA, Linnehan RM, Taher A, Kaysen JH, Allen PL and Goodwin TJ. Mechanical culture conditions effect gene expression: gravity-induced changes on the space shuttle. Physiol Genomics 2000;3:163–173.
33. Zheng Q, Huang G, Xu Y, Guo C, Xi Y, Pan Z and Wang J. Could the effect of modeled microgravity on osteogenic differentiation of human mesenchymal stem cells be reversed by regulation of signaling pathways? Biol Chem 2007;388:755–763.
34. Ward NE, Pellis NR, Risin SA and Risin D. Gene expression alterations in activated human T-cells induced by modeled microgravity. J Cell Biochem 2006;99:1187–1202.
35. Simons DM, Gardner EM and Lelkes PI. Dynamic culture in a rotating-wall vessel bioreactor differentially inhibits murine T-lymphocyte activation by mitogenic stimuli upon return to static conditions in a time-dependent manner. J Appl Physiol 2006;100:1287–1292.
36. Licato LL and Grimm EA. Multiple interleukin-2 signaling pathways differentially regulated by microgravity. Immunopharmacology 1999;44:273–279.
37. Hales NW, Yamauchi K, Martinez AA, Sundaresan A, Pellis NR and Kulkarni AD. A countermeasure to ameliorate immune dysfunction in in vitro simulated microgravity environment: role of cellular nucleotide nutrition. In Vitro Cell Dev Biol Anim 2002;38:213–217.
38. Kulkarni AD, Yamauchi K, Pellis NR. Nutrition Countermeasure and Immune Function in Microgravity. Proceedings of the 2nd Pan Pacific Basin Workshop on Microgravity Sciences, Pasadena, CA, 2001, April 2001BT-1099, pp. 1–10.
39. Navran S. Rotary bioreactor for recombinant protein production. In: Cell Technology for Cell Products, Smith R (ed), Dordrecht, The Netherlands, Springer, 2005, pp. 567–569.
40. Langer R and Vacanti JP. Tissue engineering. Science 1993;260:920–926.
41. Li WJ, Tuli R, Okafor C, Derfoul A, Danielson KS, Hall DJ and Tuan RS. A three-dimensional nanofibrous scaffold for cartilage tissue engineering using human mesenchymal stem cells. Biomaterials 2005;26:599–609.
42. Hunziker EB. Articular cartilage repair: basic science and clinical progress. A review of the current status and prospects. Osteoarthr Cartil 2002;10:432–463.

43. Yoshioka T, Mishima H, Ohyabu Y, Sakai S, Akaogi H, Ishii T, Kojima H, Tanaka J, Ochiai N and Uemura T. Repair of large osteochondral defects with allogenic cartilaginous aggregates formed from bone marrow derived cells using RWV bioreactor. J Orthop Res 2007;25:1291–1298.
44. Freed LE, Vunjak-Novakovic G and Langer R. Cultivation of cell-polymer cartilage implants in bioreactors. J Cell Biochem 1993;51:257–264.
45. Lee DA, Reisler T and Bader DL. Expansion of chondrocytes for tissue engineering in alginate beads enhance chondrocytic phenotype compared to conventional monolayer techniques. Acta Orthop Scand 2003;74:6–15.
46. Häuselmann HJ, Fernandes RJ, Mok SU, Schmid TM, Block JA, Aydelotte MB, Kuettner KE and Thonar EJ. Phenotypic stability of bovine articular chondrocytes after long-term culture in alginate beads. J Cell Sci 1994;107:17–27.
47. Akmal M, Anand A, Anand B, Wiseman M, Goodship AE and Bentley G. The culture of articular chondrocytes in hydrogel constructs within a bioreactor enhances cell proliferation and matrix synthesis. J Bone Joint Surg Br 2006;88:544–553.
48. Guo XM, Zhao YS, Chang HX, Wang CY, Ling-Ling E, Zhang XA, Duan CM, Dong LZ, Jiang H, Li J, Song Y and Yang XJ. Creation of engineered cardiac tissue in vitro from mouse embryonic stem cells. Circulation 2006;113:2229–2237.
49. Bursac N, Loo Y, Leong K and Tung L. Novel anisotropic engineered cardiac tissues: studies of electrical propagation. Biochem Biophys Res Commun 2007;361:847–853.
50. Minami Y, Sugihara H and Oono S. Reconstruction of cornea in three-dimensional collagen gel matrix culture. Invest Ophthalmol Vis Sci 1993;34:2316–2324.
51. Aramant RB and Seiler MJ. Retinal cell transplantation. In: Yearbook of Cell and Tissue Transplantation, Lanza RP and Chick WL (eds), Dordrecht, Kluwer Academic, 1996, p. 193.
52. Chen J, Chen R and Gao S. Morphological characteristics and proliferation of keratocytes cultured under simulated microgravity. Artif Organs 2007;31:722–731.
53. Das T, Cerro MD, Jalali S, Rao VS, Gullapalli VK, Little C, Loreto DAD, Sharma S, Sreedharan A, del Cerro CD and Rao GN. The transplantation of human fetal neuroretinal cells in advanced retinitis pigmentosa patients: results of a long-term safety study. Exp Neurol 1999;157:58–68.
54. Kumar R and Dutt K. Enhanced neurotrophin synthesis and molecular differentiation in non-transformed human retinal progenitor cells cultured in a rotating bioreactor. Tissue Eng 2006;12:141–158.
55. Patrick CW, Jr. Adipose tissue engineering: the future of breast and soft tissue reconstruction following tumor resection. Semin Surg Oncol 2000;19:302–311.
56. Choi YS, Park SN and Suh H. Adipose tissue engineering using mesenchymal stem cells attached to injectable PLGA spheres. Biomaterials 2005;26:5855–5863.
57. Frye CA and Patrick CW, Jr. Three-dimensional adipose tissue model using low shear bioreactors. In Vitro Cell Dev Biol Anim 2006;42:109–114.
58. Shapiro AM, Lakey JR, Ryan EA, Korbutt GS, Toth E, Warnock GL, Kneteman NM and Rajotte RV. Islet transplantation in seven patients with type 1 diabetes mellitus using a glucocorticoid-free immunosuppressive regimen. N Engl J Med 2000;343:230–238.
59. Ryan EA, Paty BW, Senior PA, Bigam D, Alfadhli E, Kneteman NM, Lakey JRT and Shapiro AMJ. Five-year follow-up after clinical islet transplantation. Diabetes 2005;54:2060–2069.
60. Paraskevas S, Duguid WP, Maysinger D, Feldman L, Agapitos D and Rosenberg L. Apoptosis occurs in freshly isolated human islets under standard culture conditions. Transplant Proc 1997;29:750–752.

61. Cattan P, Berney T, Schena S, Molano RD, Pileggi A, Vizzardelli C, Ricordi C and Inverardi L. Early assessment of apoptosis in isolated islets of Langerhans. Transplantation 2001;71:857–862.
62. Rosenberg L, Wang R, Paraskevas S and Maysinger D. Structural and functional changes resulting from islet isolation lead to cell death. Surgery 1999;126:393–398.
63. Gaber OA, Fraga DW, Callicutt CS, Gerling IC, Sabek OM and Kotb M. Improved in vivo pancreatic islet function after prolonged in vitro islet culture. Transplantation 2001;72:1730–1736.
64. Rutzky LP, Bilinski S, Kloc M, Phan T, Zhang H, Katz SM and Stepkowski SM. Microgravity culture condition reduces immunogenicity and improves function of pancreatic islets. Transplantation 2002;74:13–21.
65. Stepkowski SM, Phan T, Zhang H, Bilinski S, Kloc M, Qi Y, Katz M and Rutzky LP. Immature syngeneic dendritic cells potentiate tolerance to pancreatic islet allografts depleted of donor dendritic cells in microgravity culture condition. Transplantation 2006;82:1756–1763.
66. Navran S, Rutzky L, Rajan A, Zhang H and Phan T. New culture technology significantly improves islet function prior to transplantation. Sixth Annual Rachmiel Levine Diabetes and Obesity Symposium, Advances in Diabetes Research: From Cell Biology to Cell Therapy, Universal City, CA, 2005.
67. Kuroda Y, Fujino Y, Morita A, Tanioka Y, Ku Y and Saitoh Y. The mechanism of action of the two-layer (Euro-Collins solution/perfluorochemical) cold storage method in canine pancreas preservation – the effect of 2,4 dinitrophenol on graft viability and adenosine triphosphate tissue concentration. Transplantation 1992;53:992–994.
68. Navran SS. Rotary cell culture systems. In: Encyclopedia of Biomaterials and Biomedical Engineering, Wnek GE and Bowlin GL (eds), New York, Marcel Dekker, 2004, pp. 1324–1330.


Ethnomedicines and ethnomedicinal phytophores against herpesviruses

Debprasad Chattopadhyay1 and Mahmud Tareq Hassan Khan2
1 ICMR Virus Unit, I.D. & B.G. Hospital Campus, General Block: 4, First floor, 57, Dr. Suresh C. Banerjee Road, Beliaghata, Kolkata 700 010, India
2 Department of Pharmacology, Institute of Medical Biology, University of Tromsø, 9037 Tromsø, Norway

Abstract. Herpesviruses are important human pathogens that can cause mild to severe lifelong infections with high morbidity in susceptible adults. Moreover, Herpes simplex virus (HSV) type 2, for example, has been reported to be responsible for increased transmission and disease progression of human immunodeficiency virus (HIV). Therefore, the discovery of novel anti-HSV drugs deserves great effort. Herbal medicinal products have been used as a source of putative candidate drugs in many diseases. However, for viral diseases the development of antivirals from natural sources is less explored, probably because the virus offers few specific targets with which small molecules can interact to inhibit or kill it. The currently available antiherpes drugs are nucleoside analogs that do not cure lifelong or recurrent infections, and their use often leads to the development of viral resistance, coupled with the problems of side effects, recurrence and viral latency. However, a wide array of herbal products, used by diverse medicinal systems throughout the world, show a high level of antiherpesvirus activity, and many of them have complementary and overlapping mechanisms of action, inhibiting either viral replication or viral genome synthesis. This chapter summarizes some of the promising herbal extracts and purified compounds isolated from herbal sources by several laboratories. Cases with proven in vitro and documented in vivo activities, along with their structure-activity relationships against herpesviruses, are discussed.

Keywords: herbal medicine, herpesvirus, latency-associated transcripts, structure-activity relationship.

Corresponding author: Tel. (Off.): +47-776-46755. (Mobile): +47-977-94171. Fax: +47-776-45310. E-mail: [email protected]; [email protected] (M.T.H. Khan).
BIOTECHNOLOGY ANNUAL REVIEW, VOLUME 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00012-4. © 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Over the centuries, herbal medicinal products have formed the basis of medicaments in Africa, China [1], India [2] and many other civilizations [3]. Traditional healers have long used herbal products to prevent or cure infectious conditions, but today clinical virologists are interested in herbal products because (i) the effective life span of any antiviral drug is limited, (ii) many viral diseases are intractable to most orthodox antivirals and (iii) viral drug resistance, latency and recurrence remain problems in immunocompromised hosts. Moreover, the rapid spread of emerging and reemerging viral diseases has spurred intensive investigation into herbal products. Additionally, the rapid rate of species extinction [4] leads to irretrievable loss of structurally diverse and potentially useful phytochemicals. Most herbal medicinal products are secondary metabolites of plants; they are species/strain-specific, with diverse structures and bioactivities, and are synthesized mainly for defense against predators, as the natural version of chemical warfare. This chapter summarizes some of the promising extracts and compounds isolated from herbal medicinal sources that have antiherpesvirus activity, reported from several laboratories with proven in vitro and some documented in vivo activities, along with their structure-activity relationships (SAR).

The Herpesvirus

The herpesviruses belong to Herpesviridae, a family of DNA viruses that cause diseases in humans and animals [5,6]. In Greek, the word herpein means “to creep”, referring to the latent, recurring and lytic infections typical of these viruses. Eight distinct viruses in this family (Table 1) are known to cause disease in humans [7,8]. All human herpesviruses (HHV) contain a large double-stranded, linear DNA with 100–200 genes encased within an icosahedral protein capsid wrapped in a lipid bilayer envelope; the complete particle is called the virion. Following binding of viral envelope glycoproteins to host cell membrane receptors, the virion is internalized and dismantled, allowing viral DNA to migrate to the host cell nucleus, where viral DNA replication and transcription occur. One successful replication cycle of Herpes simplex virus (HSV) depends upon a number of steps: virion entry, expression of viral immediate-early (α) genes such as infected cell protein (ICP) 0 and 4, early (β1, β2) genes including DNA polymerase and thymidine kinase, and late (γ1, γ2) genes including glycoprotein B (gB), ICP5 and gC, and unpaired DNA replication [9]. During symptomatic infection, infected cells transcribe lytic viral genes, but sometimes a small number of viral genes (latency-associated transcripts, LAT) accumulate, and the virus can persist in the host cell indefinitely. The primary infection is a self-limited period of illness, whereas long-term latency is symptom-free. Following reactivation, transcription of viral genes switches from LAT to multiple lytic genes, leading to enhanced replication and virion production.

Table 1. Members of human Herpesviridae.
Type | Synonym | Subfamily | Pathophysiology
HHV-1 | Herpes simplex virus-1 (HSV-1) | Alphaherpesvirinae, α (alpha) | Oral and/or genital herpes (orofacial)
HHV-2 | Herpes simplex virus-2 (HSV-2) | α (alpha) | Oral and/or genital herpes (genital)
HHV-3 | Varicella zoster virus (VZV) | α (alpha) | Chickenpox and shingles
HHV-4 | Epstein–Barr virus (EBV), a lymphocryptovirus | Gammaherpesvirinae, γ (gamma) | Infectious mononucleosis, Burkitt’s lymphoma, CNS lymphoma (in AIDS patients), post-transplant lymphoproliferative syndrome (PTLD), nasopharyngeal carcinoma
HHV-5 | Cytomegalovirus (CMV) | Betaherpesvirinae, β (beta) | Infectious mononucleosis-like syndrome, retinitis, etc.
HHV-6, 7 | Roseolovirus | β (beta) | Roseola infantum or exanthem subitum
HHV-8 | Kaposi’s sarcoma-associated herpesvirus (KSHV), a rhadinovirus | γ (gamma) | Kaposi’s sarcoma, primary effusion lymphoma, some multicentric Castleman’s disease

Herpesviruses are common pathogens that cause localized infections of the skin and of the mucosal epithelia of the oral cavity, the pharynx, the esophagus, the eye or the genitals, depending upon the type involved [10]. Moreover, herpesvirus infections may cause severe problems in infected individuals because (i) the virus establishes latent infections that can be periodically reactivated; (ii) under certain circumstances, the virus can produce serious infections of the central nervous system, including acute necrotizing encephalitis and meningitis, and may also produce fatal infections in patients with immune deficiencies [11]; and (iii) the immediate-early genes of HSV-1 can stimulate the activation of genes belonging to different viruses such as human immunodeficiency virus (HIV) [12], Varicella zoster virus (VZV) [13] or human papillomavirus type 18 [14].
Additionally, HSV infections were reported to be a significant risk factor for transmission of HIV/AIDS [15,16]. HSV-2 is also known as an oncogenic virus that can convert cells into tumor cells [17]. Infection with herpesvirus can also lead to scarring, a major cause of blindness in developing nations [18]. Acute and recurrent HSV infections remain an important health problem. The search for selective antiviral agents has been vigorous in recent years, but the need for new antiviral therapies still exists, since many of the problems in treating HSV infections remain unresolved, such as the generation of viral resistance and conflicting efficacy in recurrent infection and in immunocompromised patients [19].

Control of Herpesvirus infection

The herpesvirus causes a lifelong infection with high morbidity. Although no cure is available, nucleoside analogs have been extensively investigated in the search for effective antiherpesvirus agents [20]. Of these, acyclovir [9-(2-hydroxyethoxymethyl)guanine] is a highly selective antiherpetic agent widely used for the systemic treatment of herpesvirus infections, because it is specifically phosphorylated by viral thymidine kinase in infected cells [21,22]. However, acyclovir-resistant herpesvirus has been isolated from immunocompromised patients [23,24]. Moreover, there are no effective agents that can control neonatal infections or infections caused by other viruses of Herpesviridae. It is therefore desirable to develop new antiherpesvirus agents that substitute for or complement the acyclovir group of drugs. The development of antiherpetic agents from herbal medicinal products is nevertheless less explored, probably because there are very few specific viral targets for small molecules to interact with [25,26]. Several herbal medicinal products are potential sources of functional foods owing to their various biological activities, such as immunomodulatory and antitumor functions [27,28]. Even today nearly 80% of the world population uses herbal medicinal products for primary health care [29,30]. Ethnopharmacology, however, provides an alternative approach to the discovery of antiherpesvirus agents by drawing on the generations-old wisdom of ethnic communities. Numerous studies have shown that phloroglucinol [31], anthraquinones [32], polysaccharides [33], triterpenes [34], polyphenols and flavonoids [35,36] isolated from herbal sources inhibit the replication of herpesviruses. Owing to their amazing structural diversity and broad range of bioactivities, herbal medicinal products can be explored as a source of complementary antiherpetic agents, as many of them are reported to inhibit several steps of the replication cycle and certain cellular factors of herpesviruses. It has been reported that a thiazolylsulfonamide, BAY 57-1293, inhibits herpesvirus helicase-primase with potent in vitro and in vivo antiherpetic activity [37,38]. Another promising natural antiherpetic agent, n-docosanol, completed clinical evaluation and was approved by the U.S. Food and Drug Administration as a topical treatment for herpes labialis [39–41]. These findings indicate that natural products are still potential sources in the search for new antiherpetic agents [42,43]. A list of some potential herbal medicinal extracts having antiherpes activities is presented in Table 2.

In vitro and in vivo antiherpetic activity with crude herbal products

During the last 50 years, numerous broad-based screening programmes conducted throughout the world to evaluate the in vitro and in vivo antiviral activity of hundreds of herbal products have shown that many such products have strong antiherpesvirus activity and that some can be used in the treatment of diseases caused by members of Herpesviridae [44–49]. Canadian researchers reported the in vitro antiherpesvirus activities of grape, apple and strawberry juices, while the leaf extract of Melia azadirachta (Azadirachta indica) inhibits DNA viruses like smallpox, chicken pox, poxvirus and HSV [50]. The water-soluble extract from narcissus bulb [51], the infusion of

Table 2. In vitro and in vivo antiherpesvirus activities of some important medicinal plants of diverse culture. Virus

Name of plants

Compound (class)

References

Herpesvirus

Mallotus japonicus

Phloroglucinol (terpene) – Eugeniin

[31] [58] [164]

– – –

[62] [19] [63]

Spirostanol (steroid) glycoside Methyl gallate – – Propolis – – –

[166] [114] [52] [54] [117] [64] [75] [76]

Sulfated galactan –

[181] [79]

Flavonoids, terpenoids, essential oil Scopadulcic acid B Polyphenol complex Anthroquinone, flavones

[61] [60]

Rheum officinale Geum japonicum, Syzygium aromaticum Annona sp., Beta vulgaris Geranium sanguineum L. Polygonium punctatum, Sebastiana braselliensis, Lithraea molleiodes Solanum sp. HSV

HSV-1

Sapium seiferum Flos verbasci Pongamia pinnata Many plants Eupatorium articulum Brysonima verbascifolia Holoptelia integrifolia, Nerium indicum Bistrychia montagnei Boussiangaultia gracilis, Serissa japonica Geum japonicum, Syzygium aromaticum Terminalia chebula, Rhus javanica Scoparia dulcis Geranium sanguineum L. Rheum officinale, Aloe barbadensis, Cassia angustifolia Aloe emodin Houttuynia cordata Helichrysum aureonitens Melaleuca leucadendron, Nephelium lappaccum Stephania cepharantha Melia azedarach Rhizoma polygonia Cuspidati, Radix Astragali Morus alba

[141] [91] [32]

Rosmarinic acids (phenolics) – –

[32]

N-methylcrotsparine (alkaloid) 28-Deacetylsendanin Meliacine (peptide)

[65,173]

Mulberocide (flavonoid)

[181]

[57] [59] [65]

[146] [179] [67]


Maesa lanceolata

Maesasaponin (saponin) Pulegone (essential oil) Iridoids (saponins)

[120]

Torvanol, torvoside (flavonoid) Essential oil Essential oil

[124]

Polyphenol

[82]

Polyphenol

[95]

Minthostachys verticillata Scrophularia scorodonia, Bupleurum nigidum Solanum torvum

HSV-2

Santalum album, Lemon grass Artemisia douglasiana, Eupatorium patents Tessaria absinthioides Moringa oleifera, Aglaia odorata, Ventilago enticulata Agrimonia pilosa, Punica granatum Cretastigma willmattianum Barteria lupulina, Clinacanthus nutans Glyptopetalum scerocarpum Cedrela tubiflora

HSV-1 and HSV-2

Terminalia arjuna Melissa officinalis Rhus javanica Maesa lanceolata Prunella vulgaris Trixis praestans, Cunilla spicata Centell asiatica, Mangifera indica Phyllanthus orbicularis Eucalyptus, Australian Tree Tea Eupatorium patens Homalium cochinchinensis Geum japonicum Alstonia macrophylla

VZV

Aloe emodin, Aloe barbadensis

[156] [168]

[46] [159]

[83] [68] Sclerocarpic acid (terpene) Acidic polysaccharide Casuarinin (tannin) Essential oil Morin (triterpene) Maesasaponin (saponin) Anionic polysaccharide – Asiaticoside, mangiferin Polyphenol Essential oil Essential oil Salicin, cochinolide, tremulacin Eugeniin (tannin) Ursolic acid (triterpene) Rosmarinic, chlorogenic, caffeic acids (phenolics)

[143] [182] [167] [160] [87] [147] [184] [66] [119] [72] [157] [159] [96] [87] [174] [32]


CMV

Listeria ovata, Urtica dioca Allium sativum Nigella sativa Geum japonicum, Syzygium aromaticum Terminalia chebula, Rhus javanica Phyllanthus myrtifolius, Phyllanthus urinaria Syzygium aromaticum

Lectins

[177] [56] [71] [61] [60]

EBV

Essential oil Flavonoids, terpenoids, esstential oil Ellagotannin

[165]

Ellagitannin (tannin)

[46]

Notes: HSV, Herpes simplex virus; VZV, Varicella zoster virus; CMV, cytomegalovirus and EBV, Epstein–Barr virus.

Flos verbasci [52] as well as the British Columbian plants Cardamine angulata, Conocephalum conicum and Polypodium glycyrrhiza [53] showed antiherpesvirus activity. The seed extract of the Indian medicament Pongamia pinnata Linn. showed in vitro anti-HSV activity [54], while hot water extracts of 32 traditional medicaments used in China, Indonesia and Japan showed antiherpesvirus activity; 12 of these extracts significantly reduced the development of skin lesions caused by HSV-1 and prolonged the mean survival time of HSV-1-infected mice [55]. Garlic extracts showed inhibitory activity against human cytomegalovirus (CMV) in vitro [56], while the steam distillate of Houttuynia cordata showed inhibitory activity against HSV-1 [57]. Wang et al. [58] reported the antiherpesvirus activity of an ethanol extract of Rheum officinale Baill. root and rhizome, while aqueous extracts of Helichrysum aureonitens (Asteraceae) shoots inhibit HSV-1 [59]. The therapeutic activity of Terminalia chebula was demonstrated against HSV in vivo [60], while Japanese researchers found that Terminalia chebula significantly suppressed MCMV (murine CMV) yields in the lungs of treated mice and also inhibited the replication of human CMV in vitro and in immunosuppressed mice. Hence, Terminalia may be beneficial for the prevention of CMV diseases in immunocompromised patients [61]. Geranium sanguineum L. also showed antiherpesvirus activity [19]. Prescreening of Colombian plants used for the treatment of a variety of diseases revealed antiherpetic activity in the aqueous extract of Beta vulgaris, the ethanol extract of Callisia grasilis and the methanol extract of Annona sp. (CC50 49.6 × 10³ μg/ml), with acceptable therapeutic indexes (TIs) [62].
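In in vitro antiviral screening, a therapeutic (or selectivity) index is conventionally the ratio of the 50% cytotoxic concentration (CC50) to the 50% effective or inhibitory concentration, so a larger value means a wider margin between antiviral activity and host-cell toxicity. The arithmetic can be sketched as follows; the function name and the sample concentrations are hypothetical illustrations, not data from the studies cited above.

```python
def therapeutic_index(cc50: float, ec50: float) -> float:
    """Ratio of cytotoxic to effective concentration (same units for both)."""
    if cc50 <= 0 or ec50 <= 0:
        raise ValueError("CC50 and EC50 must be positive")
    return cc50 / ec50


# Hypothetical extract (illustrative numbers, not from the cited studies):
# half of the host cells are killed at 500 ug/ml, while half of the
# viral plaques are suppressed at 25 ug/ml.
ti = therapeutic_index(cc50=500.0, ec50=25.0)
print(f"TI = {ti:.1f}")  # prints "TI = 20.0"
```

Both concentrations must be expressed in the same units for the ratio to be meaningful.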
Similarly, aqueous extracts of the Argentinean medicaments Polygonum punctatum, Lithraea molleoides, Sebastiania brasiliensis and Sebastiania klotzschiana, used in infectious diseases, showed in vitro antiherpetic activity (ED50 39–169 μg/ml) with no cytotoxicity [63]. Interestingly, the aqueous extract of the most potent antiherpetic plant of South America, Eupatorium articulatum, inhibits HSV-1 at 125–250 μg/ml [64]. The extracts of seven traditional remedies of Indonesia demonstrated good anti-HSV-1 activity at 199 μg/ml; of these, the methanol extracts of Melaleuca leucadendron (Myrtaceae) fruits and Nephelium lappaceum (Sapindaceae) pericarp significantly delayed the development of skin lesions and reduced mouse mortality [65], while hydromethanolic extracts of 23 Brazilian folk remedies, including Trixis praestans (Vell.) Cabr. and Cunila spicata Benth., showed strong inhibitory activity against HSV-1 and HSV-2 [66]. A 1:1 combination of two Chinese herbs, Rhizoma Polygoni Cuspidati and Radix Astragali, is reported to inhibit the HSV-1 F strain synergistically, with 20–80% plaque reduction. The combination can inhibit multiplication with a combination index of <1.0 and is virucidal for the HSV-1 F strain [67]. The Thai folk medicinal plants Barleria lupulina Lindl. and Clinacanthus nutans (Burm.f.) Lindau (Acanthaceae) inhibit five clinical isolates and a control strain (G strain) of HSV-2 [68], while aqueous extracts of Nepeta nepetella, Nepeta coerulea, Nepeta tuberosa (150–500 μg/ml), Dittrichia viscosa and Sanguisorba minor magnolii (50–125 μg/ml) from the Iberian Peninsula showed strong anti-HSV-1 activity [69]. Oral administration of the Japanese oriental medicine TJ-41 (200 mg/kg/day) is beneficial for the treatment of HSV-1 infection in immunocompromised cancer patients receiving chemotherapy [70], while i.p. administration of black seed oil of Nigella sativa to BALB/c mice against a susceptible strain of MCMV strikingly reduces the virus titers in spleen and liver. This antiviral effect against MCMV infection is probably mediated by an increase in macrophage (Mφ) number and gamma interferon (IFN-γ) production [71].
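A combination index below 1.0, as quoted above for the two-herb mixture, is the usual numerical criterion for synergy. How reference [67] computed its index is not stated here; the sketch below assumes the widely used Chou–Talalay form, in which each agent's dose in the mixture is divided by the dose of that agent alone that gives the same effect. All names and doses are hypothetical.

```python
import math


def combination_index(d1: float, d2: float, dx1: float, dx2: float) -> float:
    """Chou-Talalay style combination index at a single effect level.

    d1, d2   -- doses of agents 1 and 2 used together to reach the effect
    dx1, dx2 -- doses of each agent alone that reach the same effect
    """
    if min(d1, d2) < 0 or min(dx1, dx2) <= 0:
        raise ValueError("doses must be non-negative; single-agent doses positive")
    return d1 / dx1 + d2 / dx2


def interpret(ci: float, tol: float = 1e-9) -> str:
    """CI < 1 indicates synergy, CI = 1 additivity, CI > 1 antagonism."""
    if math.isclose(ci, 1.0, abs_tol=tol):
        return "additive"
    return "synergy" if ci < 1.0 else "antagonism"


# Hypothetical doses for 50% plaque reduction (not data from reference [67]):
# alone, herb A needs 100 ug/ml and herb B needs 80 ug/ml;
# together, 25 ug/ml of A plus 20 ug/ml of B suffice.
ci = combination_index(d1=25.0, d2=20.0, dx1=100.0, dx2=80.0)
print(f"CI = {ci:.2f} ({interpret(ci)})")  # prints "CI = 0.50 (synergy)"
```

The index depends on the effect level chosen, so a mixture can be synergistic at 50% plaque reduction and merely additive at another level.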
The aqueous extract of leaves and stems of Phyllanthus orbicularis (Euphorbiaceae) exhibited selective antiviral indexes of 12.3 and 26 against bovine HSV-1 and HSV-2, respectively, and impaired the replication of both viruses in a concentration-dependent manner, partially through a direct interaction with virus particles or with their entry into the cell [72]. Strong anti-HSV activity was also reported with Byrsonima verbascifolia extract, a remedy for skin infections in Colombia [73]. A study of the anti-HSV-1 effect of essential oil-containing mouthwashes (LA and TLA) on Vero cell monolayers showed that LA and TLA, undiluted and at dilutions up to 1:2, completely inhibited the HSV-1 McIntyre strain, suggesting their preprocedural use to reduce viral contamination of bioaerosols during dental care [74]. Strong anti-HSV activities were noticed in methanolic extracts of 13 plants used in the treatment of skin infections in four regions of Colombia, of which the most potent was Byrsonima verbascifolia L. HBK (2.5 μg/ml), indicating that these plants represent an untapped source of potentially useful antivirals [75]. Rajbhandari et al. [76] noticed that the Nepalese medicines Holoptelea integrifolia and Nerium indicum exhibited considerable anti-HSV activity without cytotoxicity. Because the increasing use of acyclovir, ganciclovir and foscarnet against HSV, VZV and CMV leads to the emergence of drug-resistant strains, 31 herbs used in Chinese medicine as antipyretic and anti-inflammatory remedies were screened: the ethanol extract of Rheum officinale and the methanolic extract of Paeonia suffruticosa prevented the attachment and penetration of HSV, while the aqueous extract of Paeonia suffruticosa and the ethanolic extract of Melia toosendan inhibited HSV attachment, indicating that these herbs are potential sources of new anti-HSV compounds [77]. Senecio ambavilla of La Reunion Island also had anti-HSV-1 activity [78]. The hot water extracts of two medicinal plants of Taiwan exhibited anti-HSV activity with different degrees of potency, suggesting that these extracts merit further investigation [79]. Similarly, extracts of Boussingaultia gracilis var. pseudobaselloides (Basellaceae) and Serissa japonica (Rubiaceae) of Taiwan inhibit HSV-1 [80]. When BCC-1/KMC cells were infected with HSV and then cultured with a hot water extract of Bidens pilosa L. var. minor (Blume) Sherff or Houttuynia cordata Thunb., Bidens pilosa significantly inhibited the replication of HSV at 100 μg/ml (11.9% for HSV-1; 19.2% for HSV-2), whereas Houttuynia cordata had the same effect at 250 μg/ml (10.2% for HSV-1; 32.9% for HSV-2). The ED50 values against HSV-1 and HSV-2 were 655.4 and 960 μg/ml for Bidens pilosa and 822.4 and 362.5 μg/ml for Houttuynia cordata, with selective indexes above 1.04; the lower ED50 of Houttuynia cordata against HSV-2 indicates its usefulness against HSV-2 infection [81]. In a plaque reduction assay, 11 Thai medicinal plants exhibited anti-HSV activity at 100 μg/ml. Among them, Aglaia odorata, Moringa oleifera and Ventilago denticulata were effective against thymidine kinase-deficient HSV-1 and phosphonoacetate-resistant HSV-1 strains.
Significantly, the extract of Moringa oleifera at 750 mg/kg per day delayed the development of skin lesions, prolonged the mean survival time and reduced the mortality of HSV-1-infected mice, while Aglaia odorata and Ventilago denticulata significantly reduced the development of skin lesions (P < 0.05), similar to acyclovir. There was no significant difference between acyclovir and Moringa oleifera in prolonging the mean survival time. As these extracts were nontoxic at effective doses, they may be possible candidates for new anti-HSV agents [82]. Chen et al. [83] reported that the extract of Ceratostigma willmottianum had anti-HSV-1 activity at 50% toxic concentration with an IC50 of 29.46 mg/l and TI of 36.56, while the same extract at 50% inhibiting concentration had an IC50 of 9.12 mg/l and TI of 36.18, similar to acyclovir. Further study revealed that the extract interferes with HSV-1 absorption and inhibits gD DNA replication and gD mRNA expression. Recently, Tshikalange et al. [84] found that extracts of Senna petersiana, a folk remedy for sexually transmitted diseases, have strong anti-HSV activity. Cheng et al. [85] reviewed the anti-HSV activity of Terminalia arjuna, Myrica rubra, Thea (Camellia) sinensis and Pterocarya stenoptera extracts, as well as of their active phytochemicals (alkaloids, flavonoids, saponins and quinones), together with their mechanisms of action. A recent study by Ramzi et al. [86] of methanol and hot aqueous extracts of 25 different plant species used in Yemeni traditional medicine and growing on the island of Soqotra showed that 17 plants, including Boswellia ameero, Boswellia elongata, Buxus hildebrandtii, Cissus hamaderohensis, Cleome socotrana, Dracaena cinnabari, Exacum affine, Jatropha unicostata and Kalanchoe farinacea, have anti-HSV and anti-influenza A activity (IC50 0.7–12.5 μg/ml). A large number of isolated compounds from diverse chemical groups, such as phenolics, flavonoids, terpenoids, anthraquinones, phloroglucinol, lectins and sugars, have promising antiherpetic activities [87]. The antiherpesvirus activities of several compounds isolated from herbal sources, with their probable modes of action, as reported from different laboratories, are presented in Table 3.

Phenolics and polyphenols

Phenolics are one of the largest groups of nonessential dietary components, with diverse structures and activities, including 8,000 different compounds widely distributed in the plant kingdom. Phenolics consist of a hydroxyl group (–OH) attached to an aromatic hydrocarbon group; the simplest of the class is phenol (C6H5OH). Although similar to alcohols, phenols are unique in that the –OH group is not bonded to a saturated carbon atom. They have relatively higher acidities because the aromatic ring couples tightly with the oxygen, leaving a relatively loose bond between the oxygen and hydrogen. Loss of a hydrogen ion (H+) from the hydroxyl group of a phenol forms a negative phenolate ion. The three most important groups of dietary phenolics are flavonoids, phenolic acids and polyphenols. Flavonoids are the largest and most studied group. Phenolic acids form a diverse group that includes the hydroxybenzoic and hydroxycinnamic acids. Phenolic polymers (known as tannins) are compounds of high molecular weight that fall into two classes: hydrolyzable and condensed tannins. Phenols have traditionally been considered antinutritive because of the adverse effect of tannins on protein digestibility.
However, recent research showed that these compounds are biologically active and may possess some disease-preventive properties [88], as some polyphenolic complexes can inhibit the reproduction of HSV [89]. Many dietary phenolics have health-promoting activities, such as inhibition of atherosclerosis, cancer and certain infections, and can act as antioxidants (chelating metals, inhibiting lipoxygenase and scavenging free radicals). A review of the organoleptic effects, metabolism and bioavailability of phenolics in humans by Martinez-Valverde et al. [90] will be of great interest. The simplest bioactive phytochemicals, with a single substituted phenolic ring, belong to the wide group of phenylpropanes, which are in the highest oxidation state and have a wide range of antiviral activities. It has been reported that the polyphenolic complex of Geranium sanguineum L. (Geraniaceae) inhibits HSV-1 multiplication [91], while the water extract of Geranium sanguineum L. aerial roots containing flavonoids, catechins and condensed tannins inhibits

Table 3. Antiherpetic antivirals from diverse chemical groups. Natural product Alkaloids Cepharanthine (Fig. 65) FK-3000 (Fig. 63) Harmine (Fig. 64) Bis-benzylisoquinoline (Fig. 61), Protoberberine, morphine (Fig. 62), N-methylcrotsparine

Source

Antiviral activity (μg ml−1)

Reference

Stephania cepharantha Stephania cepharantha Ophiorrhiza nicobarica

HSVa HSV-1 (7.8)b,c HSV-2 (300)c,d

[65,173] [65,173] [174]

Stephania cepharantha HAYATA

HSV-1 (18)b

Phenolics Caffeic acid (Fig. 1)

Plantago major

Caffeic acid (Fig. 1) Chlorogenic acid (Fig. 1)

Plantago major Aloe barbadensis

Procyanidin A1 (Fig. 7) Procyanidin C1 (Fig. 7) Prodelphinidine-Ogallate Rosmarinic acid (Fig. 2) Xanthohumol (Fig. 8) Polyphenolic complex Polyphenolic compounds

Vaccinium vitis-idaea Crataegus sinaica Myrica rubra Plantago major Humulus lupulus Geranium sanguineum L. Agrimonia pilosa

ACVR HSV-1, HSV-2 (7.8–9.9)b (90, 71, 81)e

[65,173]

HSV-1(15.3)f, VZVa HSV-2 (87.3)f,d HSV-1 (47.6)b HSV-2 (86.5)b HSV-2a,g HSV-1a,d HSV-2 (5.3)b,d,g

[93]

VZVa HSVa,c HSV-1

[93] [101] [91]

HSV-1 HSV-1, HSV-2 (3.6–6.2)f

[95] [19]

[93] [32] [85,98] [99] [94]

Polyphenols

Pithecellobium clypearia Punica granatum Geranium sanguineum L.

HSVa HSV-1 (9.12b, 36.5)e,d,h HSV-2a HSV-1, HSV-2

[140] [83]

Cochinolide B, tremulacin Asiaticoside (Fig. 20) Mangiferin (Fig. 21) Propolis

Artocarpus lakoocha Millettia erythrocalyx Cretastigma willmaltianum Barleria lupulina Homalium cochinchinensis Centella asiatica Mangifera indica Many plants

HSVa HSV-1a, HSV-2a HSV

[119] [119] [117]

Rhus succedanea

HSVi

[186]

Orange, grape Helichrysum aureonitens

HSV-1a HSVa

[110] [118]

Flavonoids Amentoflavone (Fig. 23) Catechin (Fig. 4) Galangin (Fig. 15)

[68] [96]


Glycyrrhizin (Fig. 26) Hesperidin (Fig. 14) Mulberroside C Resveratrol (Fig. 35)

Glycyrrhiza glabra Orange, grape Morus alba

Oxyresveratrol (Fig. 36) Robustaflavone (Fig. 24) Torvanol A (sulfated isoflavone) Torvoside H (steroidal glycoside) Acetal Torvanol Quercetin (Fig. 13)

Millettia erythrocalyx Garcinia multiflora Solanum torvum

HSV-1 HSVa HSV-1a HSVa HSVa,h HSVa VZVa HSV-1 (9.6)b

[46] [110] [125] [136] [139] [140] [123] [124]

Solanum torvum

HSV-1 (23.2)b

[124]

Solanum torvum Caesalpinia pulcherrima

[124] [79,80]

Leachianone G Mulberroside C Phloroglucinol (Fig. 19) Methyl gallate (Fig. 18), methyltrihydroxybenzoate Flavone glycoside

Morus alba L. Morus alba L. Mallotus japonicus Sapium sebiferum

HSV-1 (17.4)b HSV-1 (24.3)e, (20)j,d HSV-1 (1.6)b HSV-1 (75.4)b HSVa HSVa

Butea monosperma

HSVa

[126]

Ocimum basilicum Ocimum basilicum Syzygium claviflorum Cassia javanica Melaleuca alternifolia Euphorbia segetalis Myrceugenia euosma Rhus javanica Minthostachys verticillata Euphorbia jolkini

HSV-1a HSV (2.6)f HSVa HSV-2a HSV-1, 2 (0.06)b,d HSV-1, HSV-2a HSV (3.9)f HSV-2a HSV-1 (10)k

[127] [152] [151] [153] [154,158] [149] [145] [145] [156]

HSV-2 (6.3 mM)b,d,g

[150]

HSV-1, HSV-2

[143]

HSV-1 (16.7)e,d

[141]

Terpenes/sterols Apigenin (Fig. 29) Betulinic acid (Fig. 41) Epiafzelechin (Fig. 47) Isoborneol (Fig. 48) Lupenone (Fig. 43) Moronic acid (Fig. 42) Pulegone (Fig. 49) Putranjivain A (Fig. 44) Ursolic acid (Fig. 46) Sclerocarpic acid (sesquiterpene) Scopadulcic acid B (diterpenoid) Triterpenes Quassinoids (Fig. 39) Limonoid, 28deacetylsendanin Asiaticoside (Fig. 20) Mangiferin (Fig. 21)

Geum japonicum Glyptopetalum sclerocarpum Scoparia dulcis L.

[125] [125] [31] [114]

Cochlospermum tinctorium Crataegus pinatifida – Melia azedarach

EBVa,d

[144]

HSVa EBVa,d HSV-1 (1.46)b,d,l

[144] [142] [146]

Centella asiatica L. Mangifera indica L.

HSV-1a, HSV-2a HSV-1a, HSV-2a

[119] [119]


Steroidal glycoside Schizarin Salicin

Solanum sp. Kadsura matsudai Homalium cochinchinensis Melissa officinalis L. Minthostachys verticillata Melaleuca alternifolia Eucalyptus Santolina insularis Nigella sativa Nelumbo nucifera –

HSVa HSVa HSV-1a, HSV-2a

[166] [170] [96]

HSV-2a HSVa

[160] [156]

HSVa HSVa HSVa MCMVa HSV-1a HSVa,m

[157] [157] [155] [71] [176] [37,38]

Bupleurum nigidum

HSV-1 (500)b

[168]

Scrophularia scorodonia Maesa lanceolate Solanum sp.

VSVa HSV-1a, HSV-2a HSV-1a

[168] [147,120] [166]

Terminalia arjuna Terminalia arjuna L. Geum japonicum

[167] [167] [164]

Euglobal-G1–G3 n-Docosanol

Syzygium aromaticum Limonium sinensi Clausena heptaphylla Phyllanthus myrtifolius, Phyllanthus urinaria Eucalyptus grandis –

HSV-2 (1.5 mM)b,g HSV-2a HSV-1a, HSV-2a, EBVa Herpes virus HSV-1a HSVa EBVa,h EBVa HSV-1a

[165] [162] [41,39]

Lignans Yatein (Fig. 60)

Chamaecyparis obtusa

HSV-1a,d,h

[172]

Carbohydrate Polysaccharide Acidic polysaccharide Anionic polysaccharide

Selerotium glucanicum Cedrela tubiflora Prunella vulgaris

HSV-1a HSV-2a,d HSV-1 (100)c, HSV-2 (10)c HSVa,d HSVa,d HSV-1, HSV-2a

[33] [182] [184]

HSV-1a,d HSV-1a

[179] [179]

Volatile oil Essential oil

Black seed oil Thiazolylsulfonamide Saponins 8-Acetylharpagide, scorodioside Saikosaponin (Fig. 58) Maesasaponin A Saponin glycosides (spirostane, tomatidane) Tannin Casuarinin (Fig. 57) Casuarinin (Fig. 57) Eugeniin (Fig. 55) Samaragenin B Isomeranzin Ellagitannins

Sulfated galactans Galactofucan

Bostrychia montagnei Many plants Undaria pinnatida

Proteins and peptides Meliacine

Melia azedarach

[164] [36] [133]

[181] [180] [183]

Table 3. (Continued)

Natural product

Source

Antiviral activity (mg ml−1)

Reference

Mannose-specific lectins (GlcNAc)n-specific lectin

Listera ovata Urtica dioica

CMV (0.08)f CMV (0.3–9)f

[178] [177]

Notes: HSV, Herpes simplex virus; VZV, Varicella zoster virus; CMV, cytomegalovirus; MCMV, murine cytomegalovirus and EBV, Epstein–Barr virus. a IC50/EC50/ED50 not available. b IC50. c Virus-induced cytopathic effects. d Virus replication/multiplication. e TI. f EC50. g Attachment and penetration. h Infected cell polypeptide/DNA polymerase inhibitor. i DNA replication/gene expression. j SI, Inhibit: k ED50. l Thymidine kinase. m Helicase-primase.

HSV-1 and HSV-2 (EC50 3.6–6.2 mg/ml) in a dose-dependent, strain-specific manner with minimal toxicity. At the MIC90 (120 mg/ml), however, the extract showed strong extracellular inactivation of the virus in albino guinea pigs infected with the HSV-1 Kupka strain and delayed the development of herpetic vesicles [19]. The authors therefore pointed out that polyphenolics such as caffeic acid, chlorogenic acid (Fig. 1) and rosmarinic acid (Fig. 2) derivatives can inactivate HSV-1 and VZV. Sydiskis et al. [32] found that hot glycerin extracts containing anthraquinones isolated from Rheum officinale, Aloe barbadensis, Rhamnus frangula, Rhamnus purshianus and Cassia angustifolia can inactivate HSV-1, while purified aloe emodin is virucidal to HSV-1, HSV-2 and VZV through partial disruption of the viral envelope. Interestingly, caffeic acid phenethyl ester (CAPE; Fig. 3), an active component of propolis from honeybee hives, showed anticancer, anti-inflammatory, immunomodulatory and anti-HSV activities [92]. The aqueous extract of Plantago major L., a popular ethnomedicine used in Ayurveda, traditional Chinese medicine (TCM) and the Chakma Talika Chikitsa of the “Chakma” tribes of Chittagong Hill, Bangladesh, for treating ailments ranging from colds to viral hepatitis, was reported to possess antiherpesvirus activity. Among its isolated phenolics, caffeic acid (Fig. 1) exhibited the strongest activity against HSV-1 (EC50 15.3 mg/ml, SI 671) and HSV-2 (EC50 87.3 mg/ml, SI 118) by inhibiting viral multiplication, suggesting its potential use for the treatment of HSV infection [93]. Structure–activity relationship (SAR) studies indicated that chlorogenic acid and caffeic acid could be developed into improved antiherpes agents [87]. The SAR studies revealed that the

Name               R1     R2     R3
Caffeic acid       -OH    -OH    -OH
Chlorogenic acid   -OH    -OH    X*
p-Coumaric acid    -OH    -H     -OH

* 1,3,4,5-tetrahydroxycyclohexane carboxylic acid

Fig. 1. Structure of caffeic acid and its derivatives.

Fig. 2. Rosmarinic acid.

Fig. 3. Caffeic acid phenethyl ester (CAPE).

site(s) and number of hydroxyl groups on phenols are responsible for their antiviral activity, as found with catechin (Fig. 4), catechol (Fig. 5) and pyrogallol (Fig. 6). The prodelphinidin-di-O-gallate isolated from Myrica rubra bark demonstrated in vitro anti-HSV-2 activity by inhibiting viral attachment and penetration, reducing viral infectivity and affecting the late stage of infection [94], while the polyphenolics of Agrimonia pilosa, Pithecellobium clypearia and Punica granatum showed anti-HSV-1 activity [95]. Cochinolide B, tremulacin and tremuloidin isolated from Homalium cochinchinensis root bark showed activity against HSV-1 and HSV-2 [96]. Polyphenols and proanthocyanidins isolated from Hamamelis virginiana bark had remarkable anti-HSV-1 activity [97], while proanthocyanidin A1 (Fig. 7) isolated from Vaccinium vitis-idaea blocks HSV-2 attachment and

[Structure panels: keto form and enol form, labeled "Caffeic acid phenylester"]

Fig. 4. Catechin (C15H14O6).

Fig. 5. Catechol (C6H6O2).

Fig. 6. Pyrogallol.
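The captions of Figs. 4 and 5 give molecular formulae, and this review reports potencies sometimes as mass concentrations and sometimes as molar concentrations (Fig. 12 uses µM). The following Python sketch shows how a mass concentration could be converted to a molar one for comparison. It is only an illustration: the function names `molar_mass` and `ug_per_ml_to_um` are hypothetical, the atomic masses are rounded standard values, and the parser assumes simple formulae without brackets.

```python
import re

# Rounded standard atomic masses in g/mol (assumed values for this sketch).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007, "S": 32.06}


def molar_mass(formula: str) -> float:
    """Return the molar mass (g/mol) of a simple formula such as 'C15H14O6'."""
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject anything the simple pattern cannot fully account for (e.g. brackets).
    if "".join(symbol + count for symbol, count in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in tokens:
        # A missing count means one atom; unknown elements raise KeyError.
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def ug_per_ml_to_um(conc_ug_per_ml: float, formula: str) -> float:
    """Convert a concentration in µg/ml to µM (µmol/l)."""
    # µg/ml equals mg/l; mg/l divided by g/mol gives mmol/l; x1000 gives µmol/l.
    return conc_ug_per_ml / molar_mass(formula) * 1000.0


if __name__ == "__main__":
    for name, formula in [("catechin", "C15H14O6"), ("catechol", "C6H6O2")]:
        print(f"{name}: {molar_mass(formula):.2f} g/mol")
```

A value quoted in another mass unit would first have to be expressed in µg/ml before calling `ug_per_ml_to_um`.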

penetration [98]. Oligomeric procyanidins of Crataegus sinaica significantly inhibit HSV-1 [99]; proanthocyanidins bind proteins nonspecifically but selectively inhibit nuclear factor kappa B (NF-κB)-dependent gene expression, as reported for proanthocyanidin C1 (Fig. 7), which modulates apoptosis and inhibits NF-κB activity [100]. A xanthohumol (Fig. 8)-enriched Humulus lupulus (hop) extract showed moderate activity against HSV-1 (TI > 1.9), HSV-2 (TI > 5.3) and CMV with low IC50, while its isomer iso-alpha acids showed better activity against CMV, suggesting that it might serve as a lead for synthesizing more active antiherpetic agents [101]. An interesting SAR is noted with dimeric procyanidins (Fig. 7) and related polyphenols, where epicatechin-containing dimers (Figs. 9 and 10) showed pronounced anti-HSV activities, as the o-trihydroxyl groups in the B-ring and the double interflavan linkages lead to

Fig. 7. Proanthocyanidins (structures shown: proanthocyanidins A1, A2 and C1; procyanidins B1–B4; interflavan bond indicated).

Fig. 8. Xanthohumol.

Fig. 9. Epicatechin and its dimers (structures shown: (−)-epicatechin, (+)-catechin, epigallocatechin, (+)-taxifolin, aromadendrin).

Fig. 10. (−)-Epigallocatechin-3-O-gallate.

a significant increase of the anti-HSV effects. When phenolics such as flavonoids and dimeric stilbenes from Artocarpus gomezianus Wall. ex Tréc., phloroglucinol derivatives from Mallotus pallidus Airy Shaw and coumarins from Triphasia trifolia (Burm.f.) P. Wilson were tested against HSV-1 and HSV-2, the bis-hydroxyphenyl structure emerged as a potential target for anti-HSV drug development [102]. Polyphenols often show virucidal activity, which may be due to their association with proteins and/or host cell surfaces, resulting in reduction or prevention of viral adsorption [103].

Flavones, flavonoids and flavonols

Flavonoids are a group of natural polyphenolics occurring in fruits, vegetables and beverages, and can exert potent antioxidant and several other effects [104,105]. Bioflavonoids, the basis of many herbal remedies, are considered health-promoting, disease-preventing dietary compounds, and some have therapeutic applications or are used as prototypes for drug development. Increased intake of flavonoids has been associated with reduced risk of cardiovascular and inflammatory diseases, cancer and infections, possibly because of their inherent capacity to counteract oxidative stress by scavenging reactive oxygen and nitrogen free radical species [105,106]. Flavones are hydroxylated phenolics containing one carbonyl group (Fig. 11) instead of the two in quinones, while the addition of a 3-hydroxyl group yields a flavonol (Fig. 11). The flavonoids (Figs. 11 and 12) occur as a C6–C3 unit linked to an aromatic ring and are synthesized in response to microbial infections; hence, they have a broad spectrum of antimicrobial activity. Several reviews [44,107,108] have emphasized the great variety of viruses tested and the diversity of methods used to demonstrate the antiviral activities of flavonoids, by direct inactivation or antireplicative effects. Several flavonoids such as quercetin (Fig. 13), procyanidin and pelargonidin are virucidal to HSV [109], and the direct inactivation of HSV by catechin (Fig. 4) and hesperidin (Fig. 14) has also been verified [110]. To date, only a few studies have been reported on the anti-HSV activity of flavonoids (Figs. 11 and 12). The flavonoids quercetin (Fig. 13), galangin (Fig. 15), naringenin (Fig. 11), kaempferol (Fig. 16) and 3-methyl kaempferol

[Structures: flavonoid, flavone and flavonol ring systems with substituent positions R3, R5–R8 and R2′–R6′]

Compound                    R3       R5   R6   R7   R8   R2′  R3′  R4′  R5′
Flavanol
  (+/-) Catechin hydrate    OH       OH   H    OH   H    H    OH   OH   OH
  Epicatechin               OH       OH   H    OH   H    H    OH   OH   H
  Epicatechin gallate       OH       OH   H    OH   H    OH   OH   H    H
  Epigallocatechin          OH       OH   H    OH   H    H    OH   OH   H
  Epigallocatechin gallate  OH       OH   H    OH   H    H    OH   OH   OH
Flavanone
  Naringenin                H        OH   H    OH   H    H    H    OH   H
  Naringin                  H        OH   H    Rg   H    H    H    OH   H
Flavone
  Apigenin                  H        OH   H    OH   H    H    H    OH   H
  Chrysin                   H        OH   H    OH   H    H    H    H    H
  Luteolin                  H        OH   H    OH   H    H    OH   OH   H
  Rutin                     M/g-py   OH   H    OH   H    H    H    OH   OH
Flavonol
  Baicalin                  H        OH   OH   OH   H    H    H    H    H
  Fisetin                   OH       H    H    H    H    H    OH   OH   H
  Galangin                  OH       OH   H    OH   H    H    H    H    H
  Kaempferol                OH       OH   H    OH   H    H    H    OH   H
  Myricetin                 OH       OH   H    OH   H    H    OH   OH   OH
  Quercetin                 OH       OH   H    OH   H    H    OH   OH   H
Isoflavone
  Genistein                 H        OH   H    OH   H    H    H    OH   H

Rg, rhamnoglucoside; M/g-py, rutin manno/gluco-pyranosyl.

Fig. 11. Structure of different flavonoids.

(Fig. 17) showed potent antiherpetic activity in Vero cells [111]. The antiviral activity of flavonoids (Fig. 12) is attributed to inhibition of viral attachment and penetration, inhibition of the reverse transcriptase (RT) enzyme [112] and augmentation of the degree of sulfation [113]. Kane et al. [114] reported that methyl gallate (methyl-3,4,5-trihydroxybenzoate; Fig. 18) from Sapium sebiferum is a potent and highly specific

[Structure: flavonol skeleton with positions [1]–[4] marked]

Remove OH at [1]: flavone
Replace OH at [1] with 3rd ring: isoflavone
Replace O at [2] with H: anthocyanin
Replace OH at [3] with glucose; remove OH at [4]; remove OH at [1]: glucoside

                 Toxicity     Antiviral activity     Selectivity index
                 CC50 (µM)    EC50 (µM)              (SI)
Flavonoid                     HSV-1     HSV-2        HSV-1      HSV-2
Acyclovir        500          50.0      50.0         10.0       10.0
(+/-) Catechin   >1,000       4.0       6.2          250.0      60.0
EC               100          2.5       35.0         40.0       2.9
ECG              500          4.0       63.0         125.0      7.9
EGC              250          2.5       NA           100.0      NA
EGCG             100          2.5       NA           40.0       NA
Naringenin       750          4.0       22.5         187.5      33.3
Naringin         1,000        2.5       NA           400.0      NA
Apigenin         250          5.0       NA           50.0       NA
Chrysin          10           2.5       NA           4.0        NA
Luteolin         100          5.0       NA           20.0       NA
Rutin            10,000       5.0       NA           2,000.0    NA
Baicalin         1,000        5.0       NA           200.0      NA
Fisetin          100          2.5       NA           40.0       NA
Galangin         1,000        2.5       NA           400.0      NA
Kaempferol       50           15.0      NA           3.3        NA
Myricetin        100          5.0       NA           20.0       NA
Quercetin        100          5.0       35.0         20.0       2.9
Genistein        250          5.0       50.0         50.0       5.0

CC50 is the 50% cytotoxic effect concentration; EC50 is the 50% effective concentration; selectivity index (SI) = CC50/EC50. NA, not available. HSV-1: herpes simplex virus type 1 (KOS strain); HSV-2: herpes simplex virus type 2 (G strain). EC, epicatechin; ECG, epicatechin gallate; EGC, epigallocatechin; EGCG, epigallocatechin gallate.

Fig. 12. SAR of some flavonoids against HSV-1 and HSV-2.
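The note to Fig. 12 defines the selectivity index as SI = CC50/EC50. As a minimal sketch of that arithmetic, the Python snippet below recomputes SI for a few rows using the CC50 and EC50 values listed in Fig. 12. The helper name `selectivity_index` and the data layout are assumptions made for illustration, and "NA" entries are represented as `None`.

```python
from typing import Dict, Optional, Tuple


def selectivity_index(cc50_um: float, ec50_um: Optional[float]) -> Optional[float]:
    """SI = CC50 / EC50; returns None when EC50 is not available."""
    if ec50_um is None:
        return None
    if ec50_um <= 0:
        raise ValueError("EC50 must be positive")
    return cc50_um / ec50_um


# (CC50, EC50 for HSV-1, EC50 for HSV-2), all in µM, taken from Fig. 12.
ROWS: Dict[str, Tuple[float, Optional[float], Optional[float]]] = {
    "Acyclovir": (500.0, 50.0, 50.0),
    "Epicatechin": (100.0, 2.5, 35.0),
    "Naringenin": (750.0, 4.0, 22.5),
    "Galangin": (1000.0, 2.5, None),
    "Quercetin": (100.0, 5.0, 35.0),
}

# Map each compound to its (SI for HSV-1, SI for HSV-2) pair.
si_table = {
    name: (selectivity_index(cc50, ec50_1), selectivity_index(cc50, ec50_2))
    for name, (cc50, ec50_1, ec50_2) in ROWS.items()
}

if __name__ == "__main__":
    for name, (si_1, si_2) in si_table.items():
        hsv2 = "NA" if si_2 is None else f"{si_2:.1f}"
        print(f"{name}: SI(HSV-1) = {si_1:.1f}, SI(HSV-2) = {hsv2}")
```

A larger SI means the compound is effective at concentrations well below those that are toxic to the host cells, which is why the table reports it alongside EC50.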

Fig. 13. Quercetin (3,3′,4′,5,7-pentahydroxy flavone).

Fig. 14. Hesperidin.

Fig. 15. Galangin.

Fig. 16. Kaempferol (3,4′,5,7-tetrahydroxy flavone).

Fig. 17. 3-Methyl kaempferol.

Fig. 18. (a) Methyl gallate and (b) methyl benzoate.

Fig. 19. Phloroglucinol.

inhibitor of HSV, and the phloroglucinol (Fig. 19) derivatives of Mallotus japonicus (Euphorbiaceae) had antiherpetic activity [31], while the combination of flavonoids and acyclovir inhibits herpesviruses in cell culture [115]. Traditionally, herbal medicinal products rely on both “pure” single-plant preparations and mixed formulations containing many plants. Propolis, a crude extract of the balsam of various trees, inhibits acyclovir-resistant HSV-1 and VZV through the synergistic action of a mixture of terpenoids, flavonoids, benzoic acid esters and phenolic acid esters, while flavone and flavonol were active in isolation against HSV-1 [116]. In another study, propolis and 3-methyl-but-2-enyl caffeate were shown to inhibit HSV [117], while a 0.5% aqueous extract of propolis gave 50% inhibition of HSV-1 infection and prevented infection in animals by blocking virus adsorption and/or inhibiting early steps of viral replication [92]. Hesperidin (Fig. 14) from orange and grape inhibits replication of HSV, catechin (Fig. 4) inhibits infectivity of HSV-1, and quercetin (Fig. 13) inhibits both, as small structural differences among these compounds are critical to their activity [110]; the antibacterial galangin, or 3,5,7-trihydroxyflavone (Fig. 15), isolated from Helichrysum aureonitens had significant activity against HSV-1 [118]. Centella asiatica L., Maclura cochinchinensis Corner and Mangifera indica L., used as herpesvirus remedies in Thailand, were found to inhibit HSV-1 and HSV-2 in plaque inhibition assays and to inhibit the production of infectious HSV-2 virions in infected Vero cells. Combinations of each of these extracts with acyclovir resulted in synergistic, subadditive or additive interactions in a dose-dependent manner, attributed to the active constituents asiaticoside (Fig. 20) of Centella asiatica and mangiferin (Fig. 21) of Mangifera indica, which have good therapeutic potential [119]. The virucidal activity of the iridoid maesasaponin of Maesa lanceolata against HSV-1 was found to be due to diacylation [120]. Morin (Fig. 22), a flavonoid isolated from the ethyl acetate extract of Maclura cochinchinensis, has powerful anti-HSV-2 activity (EC50 38.5–53.5 mg/ml), attributed to its free hydroxyl groups [121]. Amentoflavone (Fig. 23) and robustaflavone (Fig. 24) isolated from Rhus succedanea and Garcinia multiflora inhibit HSV in vitro, while VZV was inhibited by rhusflavanone and succedaneaflavanone [122,123]. The C-4 sulfated isoflavone torvanol A and the steroidal glycoside torvoside H, isolated from Solanum torvum fruits, had strong anti-HSV-1 activity [124]. Quercetin (Fig. 13) from Caesalpinia pulcherrima Swartz was found to possess broad-spectrum antiviral activity against HSV-1 and ADV-8

Fig. 20. Asiaticoside.

Fig. 21. Mangiferin (C19H18O11).

Fig. 22. Morin (2′,3,4′,5,7-pentahydroxy flavone).

(EC50 24.3–50 mg/l, SI 20.4–60), inhibiting the early stage of viral multiplication; the SI values greater than 20 suggest the potential use of this compound for the treatment of HSV infection [80]. The prenylated flavonoid leachianone G (Fig. 25), a flavescenone derivative isolated from the root bark of Morus alba L., showed potent antiviral activity (IC50 1.6 mg/ml), whereas mulberroside C showed weak activity (IC50 75.4 mg/ml) against HSV-1 [125]. Similarly, the isoquercitrin of Waldsteinia fragarioides has anti-HSV activity, while glycyrrhizin (Fig. 26) and chrysin (Fig. 27) of many plants showed

Fig. 23. Amentoflavone.

Fig. 24. Robustaflavone.

Fig. 25. (a) Flavescenones and (b) leachianone G.

Fig. 26. Glycyrrhizin.

Fig. 27. Chrysin.

Fig. 28. Genistein.

Fig. 29. (a) Apigenin (4′,5,7-trihydroxy flavone). (b) Apigenin-7 monoglucoside.

antiherpetic activities [46]. A flavone glycoside, dihydroxy-trimethoxyflavone β-D-xylopyranosyl-β-D-glucopyranoside, from Butea monosperma seed showed a broad antiviral spectrum [126]. When 18 flavonoids of five classes were tested at various concentrations on Vero cells infected with HSV-1 and HSV-2, most of them inhibited the virus-induced cytopathic effect (CPE). Among them, the flavanols epicatechin and epicatechin gallate (Fig. 11), the isoflavone genistein (Fig. 28), the flavanone naringenin and the flavonol quercetin (Fig. 13) showed a high level of CPE inhibitory activity. Epicatechin, epicatechin gallate, galangin and kaempferol showed strong antiviral activity, while catechin, epigallocatechin, epigallocatechin gallate, naringenin, apigenin (Fig. 29), chrysin (Fig. 27), baicalin (Fig. 30), fisetin, myricetin (Fig. 31), quercetin and genistein (Fig. 28) showed moderate inhibitory effects against HSV-1 in the plaque reduction assay. Hence, flavanols and flavonols appeared to be more active than flavones, which is attributed to their structural differences (Figs. 11 and 12). Furthermore, treatment of Vero cells with epicatechin gallate (Fig. 10) and galangin (Fig. 15) before virus adsorption led to a slight enhancement of inhibition in the yield reduction assay, indicating that

Fig. 30. Baicalin.

Fig. 31. Myricetin (3,3′,4′,5,5′,7-hexahydroxy flavone).

an intracellular effect may also be involved [127]. Hence, flavonoids such as quercetin, chrysin, epicatechin and (−)-epigallocatechin gallate showed antiherpesvirus activity, and most of the potent antiviral flavonoids were observed to block viral DNA/RNA polymerase, with the degree of inhibition depending on the structure and side chain (Figs. 11 and 12). Evidence of oxidative stress in virus-infected individuals indicates that antioxidants such as flavonoids and proanthocyanidins with low oral bioavailability may have some role in controlling viral disease progression [128,129]. Evaluating the in vivo effect of antioxidants on viral diseases requires monitoring of oxidative stress, as excessive antioxidant protection could tip the balance from oxidative stress to “oxidative deficit”. Reactive oxygen species, antioxidants, transcription factors and cytokines are essential for life and form part of a human defense network that behaves like a black box. Hence, controlled clinical trials with antioxidants, along with oxidative stress measurement, will help to determine the clinical significance of oxidative stress in viral diseases, and dietary intervention with antioxidants could be an inexpensive alternative to existing antiviral treatment strategies.

Coumarins

The double-ring phenolic compounds made up of fused benzene and α-pyrone rings are called coumarins (Fig. 32). The coumarins (Figs. 32 and 33) impart the distinctive sweet smell to newly mown hay. To date, at least 1,300 coumarins have been identified. Although coumarins are highly toxic in rodents, they have a “pronounced species-dependent metabolism” and their derivatives are safely excreted in human urine [130]. Coumarins comprise a

Fig. 32. Coumarins.

Fig. 33. 3-Phenylcoumarin.

Fig. 34. Warfarin.

vast array of biologically active compounds ubiquitous in herbal medicinal products used in traditional medicine for thousands of years. Coumarins have antithrombotic, anti-inflammatory, antioxidant, vasodilatory, antiallergic, hepatoprotective, anticarcinogenic and enzyme-inhibitory activities; they also serve as precursors of toxic substances and as plant growth hormones, and take part in respiration, photosynthesis and defense against infection. Owing to their phenolic nature, with fused benzene and α-pyrone rings, coumarins can stimulate macrophages and thereby exert an indirect effect on viral infections, such as inhibiting recurrences of cold sores caused by HSV-1 [131], while several structurally novel coumarin derivatives like warfarin (Fig. 34) showed substantial anti-HIV activity and may be used to develop important lead compounds for antiretroviral drugs [132]. The hydroxycoumarins can act as potent metal chelators, free radical scavengers and powerful chain-breaking antioxidants. Leaves of the tropical tree Clausena heptaphylla (Roxb.) W. & Arn. (Rutaceae), widely used in China, India and Vietnam to treat fever, showed activity against HSV-1 and HSV-2 in Vero cells at 0.5 mg/ml (40% protection) with little to no cytotoxicity, while the ethyl acetate extract provided 70% protection against both HSV-1 and HSV-2 with minor cytotoxicity; hence, coumarins of Clausena heptaphylla can be developed as anti-HSV agents [133]. The 1,4-dihydropyridine-5-carboxylic acid and pyridine-5-carboxylic acid derivatives comprising a coumarin group are reported to have a broad spectrum of antiviral activities, especially against CMV (AD-169 and Davis strain), HSV-1 (KOS, F, McIntyre, Thymidine Kinase (TK)-B2006,

Fig. 35. Resveratrol.

TK-VMW1837, TK-Cheng C158/77, TK-Field C137/101), HSV-2 (G, 196, Lyons) and VZV (TK+ OKA, TK+ YS, TK− 07/1, TK− YS/R) strains (International patent WO 01/14370, Rephartox B.V.). A recent review on the in vitro and in vivo antiviral, antitumor and anti-inflammatory activities of prenyloxy- and furano-coumarins by Curini et al. [134] can be consulted for further details. Phytoalexins are hydroxylated derivatives of coumarins, produced by many plants in response to microbial infection, and have anti-infective activity. The polyphenolic phytoalexin resveratrol (http://en.wikipedia.org/wiki/Image:Resveratrol.png; 3,5,4′-trihydroxystilbene; Fig. 35), produced by several plants as a defense, is reported to have antimicrobial activities [45] and is used as a nutritional supplement with antiviral, anticancer, neuroprotective, antiaging and anti-inflammatory effects. Resveratrol, a stilbenoid derived from stilbene, is produced by the enzyme stilbene synthase in two isomeric forms, cis- (Z) and trans- (E). The trans- form can isomerize to the cis- form when heated or exposed to ultraviolet irradiation [135]. Resveratrol is found in varying amounts in grapes, berries, plums, peanuts, Vaccinium species, some pines and knotweed. Resveratrol was first isolated from the Peruvian legume Cassia quinquangulata in 1974, and its anti-inflammatory activity was recognized in 1997 [25]. It was reported that resveratrol inhibits HSV replication by suppressing NF-κB activation [136], as HSV activates NF-κB during productive infection as an essential step of its replication cycle [137,138]. Electrophoretic mobility shift assays demonstrated that resveratrol suppressed NF-κB activation in HSV-1, HSV-2 and acyclovir-resistant HSV-1 infection in a dose-dependent and reversible manner; it also reduced mRNA for ICP0, ICP4, ICP8 and DNA polymerase, and mRNA for glycoprotein C (a late gene), and thus significantly blocked DNA synthesis. These data collectively indicate that resveratrol suppresses HSV-induced activation of NF-κB within the nucleus and impairs the expression of essential immediate-early, early and late HSV genes and the synthesis of HSV DNA [139]. Similarly, another phytoalexin, oxyresveratrol (Fig. 36), from Millettia erythrocalyx and Artocarpus lakoocha inhibits both HSV and HIV-1 [140].

Fig. 36. Isostere of oxyresveratrol.

Fig. 37. Terpenoid (menthol).

Fig. 38. Scopadulcic acid analogs.

Terpenoids and essential oils

Essential oils (Quinta essentia), the fragrance of plants, are the phenolic compounds with a C3 side chain at a lower level of oxidation and without oxygen. Oils that are highly enriched in the isoprene structure are called terpenes; when they contain additional elements such as oxygen, they are termed terpenoids (Fig. 37), which are active against many viruses [46,87,128]. Scopadulcic acid B (Fig. 38), a diterpenoid isolated from Scoparia dulcis L., was found to inhibit HSV-1 replication in vitro (TI 16.7) by interfering with early events of viral growth, and at 100–200 mg/kg/day it effectively delayed the appearance of herpetic lesions and prolonged the survival time in hamsters [141], while the triterpene quassinoids (Fig. 39) inhibit Epstein–Barr virus (EBV) [142]. Sclerocarpic acid (Fig. 40), a sesquiterpene isolated from the stem bark of Glyptopetalum sclerocarpum (Celastraceae), showed antiviral activity against HSV-1 and HSV-2 [143]. Triterpenes from Cochlospermum tinctorium can inhibit EBV [144]. The triterpenes betulinic acid (Fig. 41) and moronic acid (Fig. 42) of Rhus javanica showed inhibitory activity against

Fig. 39. Quassinoids (quassin).

Fig. 40. Sclerocarpic acid.

Fig. 41. Betulinic acid.

Fig. 42. Moronic acid.

acyclovir-resistant, thymidine kinase-deficient and wild-type HSV-1 strains with EC50 values of 2.6–3.9 mg/ml [145]. Oral administration of moronic acid (Fig. 42) to mice cutaneously infected with HSV-1 significantly retarded the development of skin lesions and/or prolonged the mean survival times of the infected mice without toxicity, by suppressing virus yields in the brain [145]; it can therefore be a good anti-HSV agent with a mechanism of action different from that of acyclovir. The limonoid terpene 28-deacetylsendanin (DAS) from Melia azedarach fruit showed anti-HSV-1 activity (IC50 1.46 mg/ml) without cytotoxicity (400 mg/ml). Electron microscopy revealed low electron-dense cores of newly synthesized nucleocapsids in nuclei without any extracellular virus particles, and the plaque assay confirmed that 77% of progeny viruses were killed in DAS-treated virus-infected cells. Virus replication was inhibited along with reduced synthesis of TK at the early stage, leading to the formation of defective nucleocapsids [146]. Aqueous EtOH (80%) extracts of two plants used by Rwandan traditional healers to treat infections showed anti-HSV activity, of which the MeOH extract of Maesa lanceolata containing maesasaponin A exhibited virucidal activity against HSV-1 and HSV-2 [147]. The seeds of Pachyrrhizus erosus (Leguminosae) have been found to contain the rotenoids 12a-hydroxydolineone and 12a-hydroxypachyrrhizone, with moderate anti-HSV activity [148]. The tetracyclic triterpene lupenone (Fig. 43) of Euphorbia segetalis exhibited a strong viral plaque inhibitory effect against HSV-1 and HSV-2 [149], while the diterpene putranjivain A (Fig. 44), isolated from Euphorbia jolkini, significantly reduced infectivity (IC50 6.3 mM) and inhibited viral attachment and cell penetration as well as the late stage of HSV-2 replication [150]. Oleanolic acid (Fig. 45), a triterpenoid saponin of the oleanane group, inhibits DNA synthesis, while the member of the ursane group inhibits capsid protein synthesis of HSV-1 [151]. The extract of Ocimum basilicum, the sweet basil of Indian and Chinese medicine, showed antiviral activity against diverse virus families. The aqueous and ethanolic extracts, along with purified apigenin (Fig. 29), linalool and ursolic acid (Fig. 46) from basil, showed strong activity against HSV-1. Of these, ursolic acid (Fig. 46) showed the strongest activity against HSV-1 (EC50 6.6 mg/l), while apigenin (Fig. 29) showed the highest activity against HSV-2. The antiviral activity of ursolic acid (Fig. 46) is evident during the infection process and the replication phase, indicating its potential against some viruses [152]. A recent study reported that ent-epiafzelechin-(4α→8)-epiafzelechin (EEE; Fig. 47), isolated from fresh leaves of Cassia javanica L. agnes de Wit, inhibits HSV-2 replication in a dose-dependent manner

Fig. 43. Lupenone.

Fig. 44. Putranjivain A.

Fig. 45. Oleanolic acid.

Fig. 46. Ursolic acid.

(IC50 83.8 ± 10.9 and 166.8 ± 12.9 mM) at noncytotoxic concentrations by inhibiting penetration and replication at the late stage of the viral life cycle [153]. Isoborneol (Fig. 48), a monoterpene of the essential oil isolated from Melaleuca alternifolia, exhibited anti-HSV-1 activity by inactivating HSV within 30 min of exposure and by inhibiting glycosylation of viral polypeptides without changes in the glycosylation pattern of cellular polypeptides and

Fig. 47. (a) Epiafzelechin and (b) EEE, ent-epiafzelechin-(4α→8)-epiafzelechin (fresh leaves of Cassia javanica).

Fig. 48. Isoborneol.

Fig. 49. Pulegone.

affecting the glycosylation of gB at noncytotoxic dose [154], indicating that isoborneol (Fig. 48) is an interesting anti-HSV agent. The sandalwood oil from Santalum album had dose-dependent anti-HSV-1 activity, whereas the essential oil of the Italian food plant Santolina insularis inhibits cell-to-cell transmission of herpesviruses [155], and pulegone (Fig. 49) of Minthostachys verticillata inhibits HSV-1 replication [156]. Terpinen-4-ol (Fig. 50) of Australian tea tree (Melaleuca alternifolia) oil, used as an antimicrobial preservative in cosmetics, exhibited strong virucidal activity against HSV-1 and HSV-2: at noncytotoxic concentrations plaque formation was reduced by 98.2% (HSV-1) and 93.0% (HSV-2), while with eucalyptus oil (EUO) the titers were reduced by 57.9% (HSV-1) and 75.4% (HSV-2); both oils affected the viruses before or during adsorption [157]. Although the active antiherpes components of tea tree and eucalyptus oil are not very clear,

Fig. 50. Terpinen-4-ol.

Fig. 51. 1,8-Cineole.

Fig. 52. Tannic acid.

their application in recurrent herpes infection is promising. The volatile oils 1,8-cineole (Fig. 51) and terpinen-4-ol (Fig. 50) of the Egyptian plant Melaleuca armillaris were reported to be more effective virucidals than the oils of other Melaleuca species [158]. The essential oils of Artemisia douglasiana and Eupatorium patens inhibit HSV-1 (65–125 ppm) [159], while Melissa officinalis oil inhibits HSV-2 replication [160].

Tannins and saponins

Tannins (Figs. 52 and 53) are a group of polymeric plant phenolics (MW 500–3,000) that combine with the collagen protein of animal skins to form leather, or precipitate gelatin from solution (astringency), and are grouped as hydrolyzable and condensed tannins. Hydrolyzable tannins are based on gallic acid (Fig. 54), while the condensed tannins, or proanthocyanidins, are derived from flavonoid monomers. The consumption of tannin-containing beverages, like green teas and red wines, can cure or prevent a variety of illnesses, as tannins can stimulate phagocytic cells and inhibit tumors and a wide range

Fig. 53. Ellagic acid.

Fig. 54. Gallic acid.

Fig. 55. Eugeniin.

of microbes by forming complexes with microbial proteins through hydrophobic interactions, hydrogen bonding and covalent bonding [161]. Thus, the mode of antiviral action of tannins is the inactivation of virus adsorption, transport proteins, polysaccharides and viral enzymes [46,110]. Euglobal-G1–G3 from Eucalyptus grandis is reported to inhibit EBV [162]; similarly, some quassinoids including ailantinol B, ailantinol C and ailanthone can inhibit EBV early antigen activation [163], while eugeniin (Fig. 55) and eugenol (Fig. 56) isolated from Geum japonicum and Syzygium aromaticum block viral DNA polymerase and thereby inhibit the multiplication of acyclovir-resistant thymidine kinase-deficient HSV-1, wild-type HSV-2 and EBV [164]. In a detailed study, seven ellagitannins isolated from Phyllanthus myrtifolius and Phyllanthus urinaria (Euphorbiaceae) were found to block EBV DNA polymerase, and the SAR analyses revealed that the corilagin moiety of these tannins is responsible for

Fig. 56. Eugenol.

Fig. 57. Casuarinin.

its antiviral activity [165]. The spirostanol saponin glycosides (spirostane, tomatidane, solasodane, nuatigenin, ergostane and furostane dimers) of some Solanum species were reported to be inhibitory against HSV-1, and the SAR analysis suggests the importance of the oligosaccharide moiety in antiherpes activity [166]. The hydrolyzable tannin casuarinin (Fig. 57) from Terminalia arjuna Linn bark was found to be virucidal and to inhibit HSV-2 attachment and penetration [167]. In a bioassay-guided fractionation study, samarangenin B isolated from Limonium sinense was found to suppress HSV-1 multiplication significantly in Vero cells without apparent cytotoxicity. The results indicated that expression of glycoproteins B, C, D and G (gB, gC, gD, gG) and ICP5, and of gB mRNA, in Vero cells was impeded by this compound; PCR data showed that samarangenin B arrested DNA replication and decreased DNA polymerase, ICP0 and ICP4 gene expression of HSV-1; and the electrophoretic mobility shift assay demonstrated interrupted formation of the alpha-trans-induction factor/C1/Oct-1/GARAT multiprotein complex. Hence, the anti-HSV activity of samarangenin B is mediated partly by inhibiting expression of the alpha genes ICP0 and ICP4, by blocking beta transcripts

Fig. 58. Saikosaponins.

Fig. 59. Lignan.

(DNA polymerase mRNA) and by arresting DNA synthesis and structural protein expression of HSV-1. Samarangenin B is therefore a potential candidate for a new antiherpetic agent that blocks HSV replication [36]. The saikosaponins (Fig. 58), iridoids and phenylpropanoid glycosides isolated from Bupleurum rigidum and Scrophularia scorodonia showed in vitro anti-HSV-1 activity; the cellular viability at the nontoxic limit concentrations of the active compounds (500 µg/ml) was 53.6% for verbascoside, 32.1% for 8-acetylharpagide, 43.3% for harpagoside, 47.8% for scorodioside and 56.9% for buddlejasaponin IV (25 µg/ml), while for the iridoid scorodioside the cellular viability was 30.6% with moderate anti-HSV-1 activity [168].

Lignans

Lignans are cinnamic acid derivatives with two C6–C3 units linked by a β,β′ bond (Fig. 59); they are widely distributed in plants and reported to have antiviral activities [169]. Lignin is a valuable phenolic polymer that gives wood its characteristic brown color, density and mass, and accounts for about 40% of the weight of the world’s forests. The nordehydroguanoferate isolated from extracts of Larrea tridentata, Rhinacanthus nasutus and Kadsura matsudai had antiherpes activity [170], while lignans of Rhus javanica exhibit anti-HSV-2 activity similar to that of acyclovir [171]. Lignin–carbohydrate complexes isolated from the cones of various pine trees (Pinus parviflora Sieb. et Zucc., Pinus densiflora Sieb. et Zucc., Pinus thunbergii Parl., Pinus elliottii var. elliottii, Pinus taeda L., Pinus caribaea var. hondurensis, Pinus sylvestris L.) or from the seed shells of Pinus parviflora and Pinus armandii Franch. inhibited the proliferation of HSV. The anti-HSV activity of lignin–carbohydrate complexes was maximal when the lignin was added at the time of virus adsorption to the cells, and the tannin-related

Fig. 60. Yatein.

Fig. 61. Isoquinoline derivative (structure labeled proflavine).

compounds also showed comparable anti-HSV activity [103]. In a bioassay-guided study, yatein (Fig. 60; C22H23O7; MW 399), a lignan isolated from methanol extracts of the Chinese herb Chamaecyparis obtusa, significantly inhibited HSV-1 multiplication in HeLa cells without apparent cytotoxicity. When a set of key regulatory events leading to HSV-1 multiplication was examined, it was found that levels of glycoprotein gB and gC mRNA expression in HeLa cells were impeded by yatein. Further study revealed that yatein arrests the replication of HSV-1 DNA, decreases ICP0 and ICP4 gene expression and blocks formation of the alpha-trans-induction factor/C1/Oct-1/GARAT multiprotein complex. Hence, the anti-HSV action of yatein seems to be mediated by inhibiting alpha gene expression, including expression of the ICP0 and ICP4 genes, and by arresting viral DNA synthesis and structural protein expression, thereby inhibiting HSV-1 replication [172].

Alkaloids

Alkaloids, the heterocyclic nitrogen compounds, have been found to possess antiviral activities against many viruses. The MeOH extract of Stephania cepharantha Hayata root tubers, a Chinese medicinal plant, has potent anti-HSV-1 activity (IC50 18 µg/ml) and contains bis-benzylisoquinoline (Fig. 61), protoberberine (Fig. 62), morphine and proaporphine alkaloids. Although N-methylcrotsparine was active against HSV-1, TK-deficient (acyclovir-resistant) HSV-1 and HSV-2 (IC50 7.8, 9.9 and 8.7 µg/ml) with in vitro therapeutic indices of 90, 71 and 81, respectively, the alkaloid FK-3000 (Fig. 63) was found to be a promising anti-HSV drug candidate [173]. Harman (Fig. 64), isolated from Ophiorrhiza nicobarica, a folk medicine of Little Andaman, inhibits plaque formation and delays the eclipse phase of HSV

Fig. 62. (a) Berberine, (b) 8-hydroxyquinoline and (c) acridine.

Fig. 63. FK-3000.

Fig. 64. Harmine.

Fig. 65. Cepharanthine.

replication at 300 µg/ml [174], while the biscoclaurine alkaloid cepharanthine (Fig. 65), isolated from the root tuber of the Chinese and Mongolian folk medicine Stephania cepharantha, inhibits HSV-1 and also has in vivo antitumor, anti-inflammatory, antiallergic and immunomodulating activity [175]. As cepharanthine has strong antiviral activity against both RNA and DNA viruses, it may be a source of potential leads for new antivirals. Nelumbo nucifera Gaertn. (lotus; Nelumbonaceae) has been used throughout

China, Egypt, the Middle East and India for over 1,000 years in gastrointestinal and bleeding disorders. It was reported that ethanol extracts (100 µg/ml) of both dry and fresh seeds significantly suppressed HSV-1 replication (IC50 50.0 µg/ml and IC50 62.0 ± 8.9 µg/ml, respectively), while the subfraction NN-B-5 from the bioactive (butanol) part showed the highest activity. At 50 µg/ml, NN-B-5 inhibits TK-deficient HSV-1 replication in HeLa cells by up to 85.9%, suggesting that NN-B-5 attenuates the propagation of acyclovir-resistant HSV-1 [176]. Further study revealed that mRNA production and transcription of ICP0 and ICP4 were decreased in treated cells, while the electrophoretic mobility shift assay showed that NN-B-5 interrupted the formation of alpha-trans-induction factor/C1/Oct-1/GARAT multiprotein/DNA complexes. Hence, the anti-HSV action of NN-B-5 seems to be mediated partly through inhibition of immediate-early transcripts, such as ICP0 and ICP4 mRNA, which then blocks the accumulation of all downstream viral products and progeny HSV-1 production [176].

Lectins, polypeptides and sugar-containing compounds

The antimicrobial peptides are often positively charged and contain disulfide bonds. The mannose-specific lectins of the orchid species Cymbidium hybrid, Epipactis helleborine and Listera ovata showed marked anti-human CMV activity, while the (GlcNAc)n-specific lectin from Urtica dioica was inhibitory to CMV-induced cytopathicity at an EC50 of 0.3–9 µg/ml [177,178]. Meliacine (7α-acetoxymeliaca-14,20,22-trien-3-one), isolated from leaves of Melia azedarach L., inhibits HSV-1 replication in vitro. Because herpetic stromal keratitis, a leading cause of human blindness, is caused by ocular HSV-1 infection, meliacine was also examined in Balb/c mice inoculated in the cornea with HSV-1 (KOS strain) and treated topically 3 times a day for 4 days. Meliacine significantly reduced the development of keratitis and the histological damage in corneas. The viral titers in the eyes of infected and treated mice were 2-fold lower than those of the corresponding infected controls, and mock-infected and treated mice did not reveal any corneal alteration due to the compound. Hence, meliacine exerts a strong antiviral action on HSV-1-induced ocular disease in mice with no toxic effects [179]. Meliacine also has potent in vitro and in vivo anti-HSV-1 activity, as it can inhibit infected-cell polypeptides, DNA synthesis and nucleocapsid assembly and affect late events in the virus life cycle. Ultrastructural analysis of infected cells revealed that meliacine treatment results in accumulation of unenveloped nucleocapsids, instead of mature virions, in cytoplasmic vesicles, suggesting that meliacine blocks the synthesis of viral DNA and its maturation [46,179]. It has been reported that sulfated galactans can inhibit herpesvirus multiplication in cell culture [180], and a sulfated galactan from the marine alga Bostrychia montagnei inhibits HSV replication in vitro [181]. An acidic polysaccharide fraction obtained from Cedrela tubiflora leaves inhibits the replication of HSV-2 without cytotoxicity [182], indicating

that the antiviral activity of polysaccharides correlates with molecular weight and sulfate content. Recently, Thompson and Dragar [183] reported that galactofucan, a sulfated polysaccharide from an aqueous extract of the seaweed Undaria pinnatifida, exhibits anti-HSV activity at noncytotoxic doses by inhibiting viral binding and entry into the host cell; it was active against clinical strains of HSV-1 (IC50 32 µg/ml) and significantly more active against HSV-2 (IC50 0.5 µg/ml; P < 0.001). A water-soluble anionic polysaccharide isolated from Prunella vulgaris L. (Labiatae), a perennial folk medicinal herb of China and Europe, inhibits HSV-1 at 100 µg/ml and HSV-2 at 10 µg/ml [184], while the anionic polysaccharide of the same herb collected from Japan showed specific anti-HSV activity (IC50 10 µg/ml) by competing for cell surface receptors, unlike other anionic carbohydrates [184,185]. In cells infected with HSV-1 or HSV-2, HSV antigen increased time-dependently, and the polysaccharide fraction prepared from Prunella vulgaris (25–100 µg/ml) reduced this antigen expression (EC50 20.6 and 20.1 µg/ml), as well as the antigen expression of an acyclovir-resistant strain of HSV-1 (by 24.8–92.6%), showing that the polysaccharide fraction has a mode of anti-HSV action different from that of acyclovir [185].

Conclusions

The diseases caused by HSV, VZV, CMV, EBV and Kaposi’s sarcoma-associated viruses are a global concern because of (i) their contagious nature, (ii) their ability to persist lifelong within the host, (iii) the development of resistant mutants and (iv) their silent epidemic potential to cause high morbidity. These diseases are treatable but not yet curable and can sometimes be fatal, especially in neonates and immunocompromised patients, as a complete cure or an effective vaccine is not yet available. Moreover, the antivirals used against herpesviruses are expensive and toxic. Therefore, development of safe, effective and inexpensive antivirals is among the top global priorities of drug development. Furthermore, long-term combination therapies for herpesviruses may yield drug-resistant mutants. Scientists from diverse fields are therefore investigating herbal medicinal products with an eye to their antiherpesvirus usefulness. In a decade of extensive research, great progress has been achieved in the discovery of antiherpesvirus agents from natural sources. A number of purified molecules have been used as lead compounds because of their specific activity, low toxicity and significant SAR. Some natural phytophores have the potential to interfere with particular viral enzymes, cellular fusion and target cell binding, resulting in mechanisms of action complementary to those of the existing antiviral drugs. Although no plant-derived drug is currently in clinical use to treat herpesvirus diseases, promising activities have been shown by some herbal product/natural product-derived candidates of diverse classes, particularly the

phenolics, coumarins, flavonoids and alkaloids, in preclinical and clinical trials. Interestingly, a number of plant extracts can block virus entry into host cells and/or specific cellular enzymes, which is very important in the context of viral drug resistance and the limited life span of many antiviral drugs. Compounds with alternative mechanisms of action, unlike synthetic antivirals, are potential candidates to tackle the threats posed by drug-resistant herpesviruses, as it has so far been quite difficult to eliminate herpesvirus diseases with the available antivirals.
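
The potency figures quoted in this review pair IC50 values with in vitro therapeutic indices, for example those reported for N-methylcrotsparine in the Alkaloids section [173]. As a minimal sketch, assuming the common convention that the therapeutic index is the ratio CC50/IC50 (the review does not state which definition the cited study used), the cytotoxic concentration implied by each pair can be back-calculated. The function names below are illustrative and are not taken from any cited work.

```python
# Illustrative only: assumes TI = CC50 / IC50, a common convention that the
# review itself does not spell out.


def therapeutic_index(cc50: float, ic50: float) -> float:
    """Return CC50 / IC50; both values must share one concentration unit."""
    if cc50 <= 0 or ic50 <= 0:
        raise ValueError("concentrations must be positive")
    return cc50 / ic50


def implied_cc50(ic50: float, ti: float) -> float:
    """Back-calculate the cytotoxic concentration implied by an IC50 and a TI."""
    if ic50 <= 0 or ti <= 0:
        raise ValueError("IC50 and TI must be positive")
    return ic50 * ti


# (IC50, therapeutic index) pairs quoted in the text for N-methylcrotsparine.
reported = {
    "HSV-1": (7.8, 90),
    "HSV-1 TK-deficient": (9.9, 71),
    "HSV-2": (8.7, 81),
}

# Implied CC50 for each virus, in the same unit as the quoted IC50 values.
implied = {virus: implied_cc50(ic50, ti) for virus, (ic50, ti) in reported.items()}

for virus, cc50 in implied.items():
    print(f"{virus}: implied CC50 about {cc50:.0f} (same unit as the IC50)")
```

Under this assumption the three pairs imply nearly the same CC50 (roughly 700 in the units of the IC50), which would be consistent with a single host-cell cytotoxicity value underlying all three indices.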

Acknowledgment

The authors wish to acknowledge Dr. Sujit K. Bhattacharya, the Additional Director General, Indian Council of Medical Research, New Delhi and the Officer-in-Charge, ICMR Virus Unit, Kolkata, for their kind help and encouragement during the preparation of this manuscript.

References

1. Chang HM and But PPH. Pharmacology and Applications of Chinese Materia Medica, Vols. 1–2. Singapore, World Scientific Inc., 1986.
2. Dev S. Ancient-modern concordance in ayurvedic plants: some examples. Environ Health Perspect 1999;107:783–789.
3. Schultes RE and Raffauf RF. The Healing Forest, Portland, Dioscorides Press, 1990.
4. Lewis WH and Elvin-Lewis MP. Medicinal plants as sources of new therapeutics. Ann Mo Bot Gard 1995;82:16–24.
5. Ryan KJ and Ray CG (eds). Sherris Medical Microbiology: An Introduction to Infectious Diseases, 4th edn, Chapter 38, New York, USA, McGraw-Hill, 2004, pp. 555–576. ISBN 0-8385-8529-9.
6. Sandri-Goldin RM (ed). Alpha Herpesviruses: Molecular and Cellular Biology, Norfolk, UK, Caister Academic Press, 2006, pp. 65–83. ISBN 978-1-904455-09-7.
7. Whitley RJ. Herpesviruses. In: Baron’s Medical Microbiology, Baron S (ed), 4th edn, Galveston, University of Texas Medical Branch, 1996. ISBN 0-9631172-1-14.
8. Saha GC, Chattopadhyay D and Chakravarty R. Viruses: role in sexually transmitted infections: chapter V human herpes viruses II. Indian Med J 2002;99(6):20–25.
9. Roizman B and Sears AE. Herpes simplex viruses and their replication. In: Fundamental Virology, Fields BN, Knipe DM and Howley PM (eds), 3rd edn, Philadelphia, Lippincott-Raven Publishers, 1996, pp. 2231–2295.
10. Habif TP (ed). Warts, herpes simplex, and other viral infections. In: Clinical Dermatology: A Color Guide to Diagnosis and Therapy, 4th edn, Chapter 12, New York, Mosby, 2004, pp. 381–388.
11. Whitley RJ. Herpes simplex viruses. In: Fields Virology, Fields BN and Knipe DM (eds), 4th edn, New York, Raven Press, 2001, pp. 2461–2509.
12. Ostrove JM, Leonard J, Weck KE, Rabson AB and Gendelman HE. Activation of the human immunodeficiency virus by herpes simplex virus type 1. J Virol 1987;61:3726–3732.

13. Felser J, Kinchington PR, Inchauspe G, Straus SE and Ostrove JM. Cell line containing varicella-zoster virus open reading frame 62 and expressing the ‘IE’ 175 protein complement ICP4 mutants of herpes simplex virus type 1. J Virol 1988;62:2076–2082.
14. Gius D and Laimins LA. Activation of human papillomavirus type 18 gene expression by herpes simplex virus type 1 viral transactivators and phorbol ester. J Virol 1989;63:555–563.
15. Hook EWI, Cannon RO and Nahmias AJ. Herpes simplex virus infection as a risk factor for human immunodeficiency virus infection in heterosexuals. J Infect Dis 1992;165:251–255.
16. Corey L, Wald A, Celum CL and Quinn TC. The effects of herpes simplex virus-2 on HIV-1 acquisition and transmission: a review of two overlapping epidemics. J Acquir Immune Defic Syndr 2004;35:435–445.
17. Lapucci A, Macchia M and Parkin A. Antiherpes virus agents: a review. Farmaco 1993;48:871–895.
18. Corey L and Spear PG. Infections with herpes simplex viruses (1). N Engl J Med 1986;314:686–691.
19. Serkedjieva J and Ivancheva S. Antiherpes virus activity of extracts from the medicinal plant Geranium sanguineum L. J Ethnopharmacol 1999;64(1):59–68. PMID: 10075123.
20. Darby G. A history of antiherpes research. Antivir Chem Chemother 1994;5(1):3–9.
21. Elion GB, Furman PA, Fyfe JA, de Miranda P, Beauchamp L and Schaeffer HJ. Selectivity of action of an antiherpetic agent, 9-(2-hydroxyethoxymethyl) guanine. Proc Natl Acad Sci USA 1977;74:5716–5720.
22. Furman PA, St. Clair MH, Fyfe JA, Rideout JL, Keller PM and Elion GB. Inhibition of herpes simplex virus-induced DNA polymerase activity and viral DNA replication by 9-(2-hydroxyethoxymethyl) guanine and its triphosphate. J Virol 1979;32:72–77.
23. Kimberlin DW and Whitley RJ. Antiviral resistance: mechanisms, clinical significance, and future implications. J Antimicrob Chemother 1996;37:403–421.
24. Chibo D, Druce J, Sasadeusz J and Birch C. Molecular analysis of clinical isolates of acyclovir resistant herpes simplex virus. Antiviral Res 2004;61:83–91.
25. Wagner EK and Hewlett MJ. Basic Virology, Malden, MA, Blackwell Science, 1999.
26. Chattopadhyay D, Chakraborty MS and Saha GC. Viruses, the acellular parasites of cellular hosts: biology and pathology with special reference to HIV. Indian J STD & AIDS 1999;20(2):54–60.
27. Kuo YC, Sun CM, Ou JC and Tsai WJ. A tumor cell growth inhibitor from Polygonum hypoleucum Ohwi. Life Sci 1997;61:2335–2344.
28. Kuo YC, Yang NS, Chou CJ, Lin LC and Tsai WJ. Regulation of cell proliferation, gene expression, production of cytokines, and cell cycle progression in primary human T lymphocytes by piperlactam S isolated from Piper kadsura. Mol Pharmacol 2000;58:1057–1066.
29. Farnsworth NR. Ethnopharmacology and future drug development: the North American experience. J Ethnopharmacol 1993;38(2–3):145–152.
30. Houghton PJ. The role of plants in traditional medicine and current therapy. J Altern Complement Med 1995;1:131–143.
31. Arisawa M, Fujita A, Hayashi T, Hayashi K, Ochiai H and Morita N. Cytotoxic and antiherpetic activity of phloroglucinol derivatives from Mallotus japonicus (Euphorbiaceae). Chem Pharm Bull (Tokyo) 1990;38(6):1624–1626. PMID: 2170038.
32. Sydiskis RJ, Owen DG, Lohr JL, Rosler KH and Blomster RN. Inactivation of enveloped viruses by anthraquinones extracted from plants. Antimicrob Agents Chemother 1991;35:2463–2466.

33. Marchetti M, Pisani S, Pietropaola V, Seganti L, Nicoletti R, Degener A and Orsi N. Antiviral effect of a polysaccharide from Sclerotium glucanicum towards herpes simplex virus type 1 infection. Planta Med 1996;62:303–307.
34. Simões CMO, Amoros M and Girre L. Mechanism of antiviral activity of triterpenoid saponins. Phytother Res 1999;21:317–325.
35. Ferrea G, Canessa A, Sampietro F, Cruciani M, Romussi G and Bassetti D. In vitro activity of a Combretum micranthum extract against herpes simplex virus types 1 and 2. Antiviral Res 1993;21:317–325.
36. Kuo YC, Lin LC, Tsai WJ, Chou CJ, Kung SH and Ho YH. Samaragenin B identified from Limonium sinense suppressed herpes simplex virus type 1 replication in Vero cells by regulation of viral macromolecular synthesis. Antimicrob Agents Chemother 2002;46:2854–2864.
37. Betz UAK, Fischer R, Kleymann G, Hendrix M and Rübsamen-Waigmann H. Potent in vivo antiviral activity of the herpes simplex virus primase-helicase inhibitor BAY 57-1293. Antimicrob Agents Chemother 2002;46:1766–1772.
38. Kleymann G, Fischer R, Betz UAK, Hendrix M, Bender W, Schneider U, Handke G, Eckenberg P, Hewlett G, Pevzner V, Baumeister J, Weber O, Henninger K, Keldenich J, Jensen A, Kolb J, Bach I, Popp A, Mäben J, Frappa I, Haebich D, Lockhoff O and Rübsamen-Waigmann H. New helicase-primase inhibitors as drug candidates for the treatment of herpes simplex disease. Nat Med 2002;8:392–398.
39. Alrabiah FA and Sacks SL. New antiherpes virus agents: their targets and therapeutic potential. Drugs 1996;52:17–32.
40. Pope LE, Marceletti JF, Katz LR and Katz DH. Anti-herpes simplex virus activity of n-docosanol correlates with intracellular metabolic conversion of the drug. J Lipid Res 1996;37:2167–2178.
41. Sacks SL, Thisted RA, Jones TM, Barbarash RA, Mikolich DJ, Ruoff GE, Jorizzo JL, Gunnill LB, Katz DH, Khalil MH, Morrow PR, Yakatan GJ, Pope LE and Berg JE. Clinical efficacy of topical docosanol 10% cream for herpes simplex labialis: a multicenter, randomized, placebo-controlled trial. J Am Acad Dermatol 2001;45:222–230.
42. Vanden Berghe DA, Vlietinck AJ and Van Hoof L. Plant products as potential antiviral agents. Bull Inst Pasteur 1986;84:101–147.
43. Boulware SL, Bronstein JC, Nordby EC and Weber PC. Identification and characterization of a benzothiophene inhibitor of herpes simplex virus type 1 replication which acts at the immediate early stage of infection. Antiviral Res 2001;51:111–125.
44. Hudson JB. Antiviral Compounds from Plants, Boca Raton, CRC Press, 1990, pp. 119–131.
45. Newman DJ, Cragg GM and Snader KM. The influence of natural product upon drug discovery. Nat Prod Rep 2000;17:215–234.
46. Jassim SA and Naji MA. Novel antiviral agents: a medicinal plant perspective. J Appl Microbiol 2003;95:412–427.
47. Yarnell E and Abascal K. Herbs for treating herpes zoster infections. Alternat Complement Ther 2005;11(3):131–134.
48. Chattopadhyay D. Role and scope of ethnomedicinal plants in the development of antivirals. Pharmacologyonline Newslett 2006;3:64–71.
49. Chattopadhyay D and Naik TN. Antivirals of ethnomedicinal origin: structure-activity relationship and scope. Mini Rev Med Chem 2007;7(3):275–301. (Review).
50. Rao AR, Kumar SSV, Paramasivam TB, Kamalakshi S, Parashuraman AR and Shanta B. Study of antiviral activity of tender leaves of margosa tree (Melia azadirachta) on vaccinia and variola virus – a preliminary report. Indian J Med Res 1969;57(3):495–502.

51. Vacik JP, Davis WB, Kelling CS, Schermeister LJ and Schipper IA. Current status of studies on the antiviral activity of a water-soluble extract from narcissus bulb against herpes viruses. Adv Ophthalmol 1979;38:281–287. PMID: 230721.
52. Zgorniak-Nowosielska I, Grzybek J, Manolova N, Serkedjieva J and Zawilinska B. Antiviral activity of Flos verbasci infusion against influenza and herpes simplex viruses. Arch Immunol Ther Exp (Warsz) 1991;39(1–2):103–108. PMID: 1666504.
53. McCutcheon AR, Roberts TE, Gibbons E, Ellis SM, Babiuk LA, Hancock RE and Towers GH. Antiviral screening of British Columbian medicinal plants. J Ethnopharmacol 1995;49:101.
54. Elanchezhiyan M, Rajarajan S, Rajendran P, Subramanian S and Thyagarajan SP. Antiviral properties of the seed extract of an Indian medicinal plant, Pongamia pinnata Linn. against herpes simplex viruses: in vitro studies on Vero cells. J Med Microbiol 1993;38(4):262–264.
55. Kurokawa M, Ochiai H, Nagasaka K, Neki M, Xu H, Kadota S, Sutardjo S, Matsumoto T, Namba T and Shiraki K. Antiviral traditional medicines against herpes simplex virus, polio virus and measles virus in vitro and their therapeutic efficacies for HSV-1 infection in mice. Antiviral Res 1993;22(2–3):175–188.
56. Guo NL, Lu DP, Woods GL, Reed E, Zhou GZ, Zhang LB and Waldman RH. Demonstration of the antiviral activity of garlic extract against human cytomegalovirus in vitro. Chin Med J 1993;106(2):93–96. PMID: 8389276.
57. Hayashi K, Kamiya M and Hayashi T. Virucidal effects of the steam distillate from Houttuynia cordata and its components on HSV-1, influenza virus, and HIV. Planta Med 1995;61(3):237–241. PMID: 7617766.
58. Wang Z, Wang G, Xu H and Wang P. Anti-herpesvirus action of ethanol extract from the root and rhizome of Rheum officinale Baill. Zhongguo Zhong Yao Za Zhi 1996;21(6):364–366, 384. PMID: 9388926.
59. Meyer JJ, Afolayan AJ, Taylor MB and Engelbrecht L. Inhibition of herpes simplex virus type 1 by aqueous extracts from shoots of Helichrysum aureonitens (Asteraceae). J Ethnopharmacol 1996;52(1):41–43. PMID: 8733118.
60. Yukawa TA, Kurokawa M, Sato H, Yoshida Y, Kageyama S, Hasegawa T, Namba T, Imakita M, Hozumi T and Shiraki K. Prophylactic treatment of cytomegalovirus infection with traditional herbs. Antiviral Res 1996;32(2):63–70.
61. Shiraki K, Yukawa T, Kurokawa M and Kageyama S. Cytomegalovirus infection and its possible treatment with herbal medicines. Nippon Rinsho 1998;56(1):156–160.
62. Betancur-Galvis L, Saez J, Granados H, Salazar A and Ossa J. Antitumor and antiviral activity of Colombian medicinal plants extracts. Mem Inst Oswaldo Cruz 1999;94(4):531–535.
63. Kott V, Barbini L, Cruanes M, Munoz JD, Vivot E, Cruanes J, Martini V, Ferraro G, Cavallaro L and Campos R. Antiviral activity in Argentine medicinal plants. J Ethnopharmacol 1999;64(1):79–84.
64. Abad MJ, Bermejo P, Sanchez Palomino S, Chiriboga X and Carrasco L. Antiviral activity of some South American medicinal plants. Phytother Res 1999;13(20):142–146.
65. Nawawi A, Nakamura N, Hattori M, Kurokawa M and Shiraki K. Inhibitory effects of Indonesian medicinal plants on the infection of herpes simplex virus type 1. Phytother Res 1999;13(1):37–41. PMID: 10189948.
66. Simoes CM, Falkenberg M, Mentz LA, Schenkel EP, Amoros M and Girre L. Antiviral activity of south Brazilian medicinal plants extracts. Phytomedicine 1999;6(3):205–214.

67. Wang Z, Cheng Z and Fang X. Antiviral action of combined use of rhizoma Polygoni cuspidati and radix Astragali on HSV-1 strain. Zhongguo Zhong Yao Za Zhi 1999;24(3):176–180, 192.
68. Yoosook C, Panpisutchai Y, Chaichana S, Santisuk T and Reutrakul V. Evaluation of anti-HSV-2 activities of Barleria lupulina and Clinacanthus nutans. J Ethnopharmacol 1999;67(2):179–187. PMID: 10619382.
69. Abad MJ, Guerra JA, Bermejo P, Irurzun A and Carrasco L. Search for antiviral activity in higher plant extracts. Phytother Res 2000;14(8):604–607. PMID: 11113996.
70. Kido T, Mori K, Daikuhara H, Tsuchiya H, Ishige A and Sasaki H. The protective effect of hochu-ekki-to (TJ-41), a Japanese herbal medicine against HSV-1 infection in mitomycin C-treated mice. Anticancer Res 2000;20(6A):4109–4113.
71. Salem ML and Hossain MS. Protective effect of black seed oil from Nigella sativa against murine cytomegalovirus infection. Int J Immunopharmacol 2000;22(9):729–740. PMID: 10884593.
72. del Barrio G and Parra F. Evaluation of the antiviral activity of an aqueous extract from Phyllanthus orbicularis. J Ethnopharmacol 2000;72(1–2):317–322.
73. Glatthaar-Saalmuller B, Sacher F and Esperester A. Antiviral activity of an extract derived from roots of Eleutherococcus senticosus. Antiviral Res 2001;50:223–228.
74. Baqui AA, Kelley JI, Jabra-Rizk MA, Depaola LG, Falkler WA and Meiller TF. In vitro effect of oral antiseptics on human immunodeficiency virus-1 and herpes simplex virus type 1. J Clin Periodontol 2001;28(7):610–616.
75. Lopez A, Hudson JB and Towers GH. Antiviral and antimicrobial activities of Colombian medicinal plants. J Ethnopharmacol 2001;77(2–3):189–196.
76. Rajbhandari M, Wegner U, Julich M, Schopke T and Mentel R. Screening of Nepalese medicinal plants for antiviral activity. J Ethnopharmacol 2001;74(3):251–255.
77. Hsiang CY, Hsieh CL, Wu SL, Lai IL and Ho TY. Inhibitory effect of anti-pyretic and anti-inflammatory herbs on herpes simplex virus replication. Am J Chin Med 2001;29(3–4):459–467.
78. Fortin H, Vigor C, Lohezic-Le Devehat F, Robin V, Le Bosse B, Boustie J and Amoros M. In vitro antiviral activity of thirty-six plants from La Reunion Island. Fitoterapia 2002;73(4):346.
79. Chiang LC, Cheng HY, Liu MC, Chiang W and Lin CC. Antiviral activity of eight commonly used medicinal plants in Taiwan. Am J Chin Med 2003;31(6):897–905.
80. Chiang LC, Cheng HY, Liu MC, Chiang W and Lin CC. In vitro anti-herpes simplex viruses and anti-adenoviruses activity of twelve traditionally used medicinal plants in Taiwan. Biol Pharm Bull 2003;26(11):1600–1604. PMID: 12729671.
81. Chiang LC, Chang JS, Chen CC, Ng LT and Lin CC. Anti-herpes simplex virus activity of Bidens pilosa L. var. minor (Blume) Sherff and Houttuynia cordata Thunb. Am J Chin Med 2003;31:355–362. PMID: 12943167.
82. Lipipun V, Kurokawa M, Suttisri R, Taweechotipatr P, Pramyothin P, Hattori M and Shiraki K. Efficacy of Thai medicinal plant extracts against herpes simplex virus type 1 infection in vitro and in vivo. Antiviral Res 2003;60(3):175–180.
83. Chen T, Jai WX, Yang FL, Xie Y, Yang WQ, Zeng W, Zhang ZR, Li H, Jiang SP, Yang Z and Chen JR. Experimental study on the antiviral mechanism of Ceratostigma willmattianum against herpes simplex virus type 1 in vitro. Zhongguo Zhong Yao Za Zhi 2004;29(9):882–886.
84. Tshikalange TE, Meyer JJ and Hussein AA. Antimicrobial activity, toxicity and the isolation of a bioactive compound from plants used to treat sexually transmitted diseases. J Ethnopharmacol 2005;96(3):515–519.

85. Cheng HY and LinChun C. The antiherpes simplex viruses activity of extracts and compounds of natural products. J Tradit Chin Med 2005;22(Suppl. 1):129–132.
86. Ramzi A, Mothana A, Mentel R, Reiss C and Lindequist U. Phytochemical screening and antiviral activity of some medicinal plants from the island Soqotra. Phytother Res 2006;20(4):298–302.
87. Khan MT, Ather A, Thompson KD and Gambari R. Extracts and molecules from medicinal plants against herpes simplex viruses. Antiviral Res 2005;67:107–119.
88. King A and Young G. Characteristics and occurrence of phenolic phytochemicals. J Am Diet Assoc 1999;99(2):213–218. (Review).
89. Serkedjieva J and Manolova N. Plant polyphenolic complex inhibits the reproduction of influenza and herpes simplex viruses. Basic Life Sci 1992;59:705–715. PMID: 1329716.
90. Martinez-Velverde I, Periago MJ and Ros G. Nutritional importance of phenolic compounds in the diet. Arch Latinoam Nutr 2000;50(1):5–18.
91. Zgorniak-Nowosielska I, Zawilinska B, Manolova N and Serkedjieva J. A study on the antiviral action of a polyphenolic complex isolated from the medicinal plant Geranium sanguineum L. VIII. Inhibitory effect on the reproduction of herpes simplex virus type 1. Acta Microbiol Bulg 1989;24:3–8. PMID: 2560321.
92. Huleihel M and Isanu V. Anti-herpes simplex virus effect of an aqueous extract of propolis. IMAJ 2002;4:923–927.
93. Chiang LC, Chiang W, Chang MY, Ng LT and Lin CC. Antiviral activity of Plantago major extracts and related compounds in vitro. Antiviral Res 2002;55:53–62. PMID: 12076751.
94. Cheng HY, Lin TC, Ishimaru K, Yang CM, Wang KC and Lin CC. In vitro antiviral activity of prodelphinidin B-2 3,3′-di-O-gallate from Myrica rubra. Planta Med 2003;69(10):953–956.
95. Li Y, Ooi LS, Wang H, But PP and Ooi VE. Antiviral activities of medicinal herbs traditionally used in southern mainland China. Phytother Res 2004;18(9):718–722.
96. Ishikawa T, Nishigaya K, Takami K, Uchikoshi H, Chen IS and Tsai IL. Isolation of salicin derivatives from Homalium cochinchinensis and their antiviral activities. J Nat Prod 2004;67(4):659–663.
97. Erdelmeier CA, Cinatl J, Jr., Rabenau H, Doerr HW, Biber A and Koch E. Antiviral and antiphlogistic activities of Hamamelis virginiana bark. Planta Med 1996;62:241–245.
98. Cheng HY, Lin TC, Yang CM, Shieh DE and Lin CC. In vitro anti-HSV-2 activity and mechanism of action of proanthocyanidin A-1 from Vaccinium vitis-idaea. J Sci Food Agric 2005;85:10–15.
99. Shahat AA, Cos P, De Bruyne T, Apers S, Hammouda FM, Ismail SI, Azzam S, Claeys M, Goovaerts E, Pieters L, Vanden Berghe D and Vlietinck AJ. Antiviral and antioxidant activity of flavonoids and proanthocyanidins from Crataegus sinaica. Planta Med 2002;68:539–541.
100. Cos P, de Bruyne T, Hermans N, Apers S, Vanden Berghe D and Vlietinck AJ. Proanthocyanidins in health care: current and new trends. Curr Med Chem 2004;11:1345–1359.
101. Buckwold VE, Wilson RJ, Nalca A, Beer BB, Voss TG, Turpin JA, Buckheit RW, Wei J, Wenzel-Mathers M, Walton EM, Smith RJ, Pallansch M, Ward P, Wells J, Chuvala L, Sloane S, Paulman R, Russell J, Hartman T and Ptak R. Antiviral activity of Hop constituents against a series of DNA and RNA viruses. Antiviral Res 2004;61(1):57–62. PMID: 14670594.
102. Likhitwitayawuid K, Supudompol B, Sritularak B, Lipipun V, Rapp K and Schinazi RF. Phenolics with anti-HSV and anti-HIV activities from Artocarpus gomezianus, Mallotus pallidus, and Triphasia trifolia. Pharm Biol 2005;43(8):651–657, http://www.informaworld.com/smpp/title~content=t713721640~db=all~tab=issueslist~branches=43-v43.
103. Sakagami H, Hashimoto K, Suzuki F, Ogiwara T, Satoh K, Ito H, Hatano T, Takashi Y and Fujisawa S. Molecular requirements of lignin–carbohydrate complexes for expression of unique biological activities. Phytochemistry 2005;66(17):2108–2120. (Review).
104. Rice-Evans CA and Packer L (eds). Flavonoids in Health and Disease, New York, Marcel Dekker, 1997.
105. Virgili F, Scaccini C, Hoppe PP, Krämer K and Packer L. Plant phenols and cardiovascular disease: antioxidants and cell modulators. In: Nutraceuticals in Health and Disease Prevention, Krämer K, Hoppe PP and Packer L (eds), New York, Marcel Dekker, 2001, pp. 187–215.
106. Knekt P, Kumpulainen J, Järvinen R, Rissanen H, Heliovaara M, Reunanen A, Hakulinen T and Aromaa A. Flavonoid intake and risk of chronic diseases. Am J Clin Nutr 2002;76:560–568.
107. Selway JWT. Plant flavonoids in biology and medicine. Biochemical, pharmacological, and structure-activity relationships. In: Progress in Clinical and Biological Research, Cody V, Middleton E and Harborne JB (eds), New York, A. R. Liss, 1986, pp. 521–536.
108. Vlietinck AJ, Vanden Berghe DA and Haemers A. Plant flavonoids in biology and medicine. Biochemical, pharmacological, and structure-activity relationships. In: Progress in Clinical and Biological Research, Cody V, Middleton E and Harborne JB (eds), New York, A. R. Liss, 1986, pp. 283–299.
109. Mucsi I, Beladi I, Pusztai R, Bakay M and Gabor M. Antiviral effects of flavonoids. In: Proceedings 5th Hungarian Bioflavonoids Symposium, Farkas L, Gabor M and Kallay F (eds), Amsterdam, Elsevier, 1977, pp. 401–409.
110. Kaul TN, Middleton E Jr and Ogra PL. Antiviral effect of flavonoids on human viruses. J Med Virol 1985;15:71–79.
111. Dargan DJ and Subak-Sharpe JH. The antiviral activity against Herpes simplex virus of the triterpenoid compounds carbenoxolone sodium and cicloxolone sodium. J Antimicrob Chemother 1986;18:185–200.
112. Charles EI, Weimin X, Raju KP and Richard K. Retinoic acid reduces the yield of herpes simplex virus in Vero cells and alters the N-glycosylation of viral envelope proteins. Antiviral Res 2000;47:29–40.
113. Sarisky RT, Crosson P, Cano R, Quail MR, Nguyen TT, Wittrock RJ, Bacon TH, Sacks SL, Caspers-Velu L, Hodinka RL and Leary JJ. Comparison of methods for identifying resistant herpes simplex virus and measuring antiviral susceptibility. J Clin Virol 2002;23:191–200.
114. Kane CJ, Menna JH and Yeh YC. Methyl gallate, methyl-3,4,5-trihydroxybenzoate, is a potent and highly specific inhibitor of herpes simplex virus in vitro. I. Purification and characterization of methyl gallate from Sapium sebiferum. Biosci Rep 1988;8(1):85–94. PMID: 2840132.
115. Mucsi I, Gyulai Z and Beladi I. Combined effects of flavonoids and acyclovir against herpesviruses in cell cultures. Acta Microbiol Hung 1992;39(2):137–147. PMID: 1339152.
116. Amoros M, Simoes CMO and Girre L. Synergistic effect of flavones and flavonols against herpes simplex virus type 1 in cell culture. Comparison with the antiviral activity of propolis. J Nat Prod 1992;55:1732–1740.
117. Amoros M, Lurton E, Boustie J, Girre L, Sauvager F and Cormier M. Comparison of the anti-herpes simplex virus activities of propolis and 3-methyl-but-2-enyl caffeate. J Nat Prod 1994;57(5):644–647. PMID: 8064297.

345 118. Meyer JJ, Afolayan AJ, Taylor MB and Erasmus D. Antiviral activity of galangin from the aerial parts of Helichrysum aureonitens. J Ethnopharmacol 1997;56:165–169. 119. Yoosook C, Bunyapraphatsara N, Boonyakiat Y and Kantasuk C. Anti-herpes simplex virus activities of crude water extracts of Thai medicinal plants. Phytomedicine 2000; 6(6):411–419. PMID: 10715843. 120. Apers S, Baronikova S, Sindambiwe JB, Witvrouwm M, De Clercqm E, Vanden Berghe D and Van Marck E. Antiviral, haemolytic and molluscicidal activities of triterpenoid saponins from Maesa lanceolata: establishment of structure-activity relationships. Planta Med 2001;67:528–532. 121. Bunyapraphatsara N, Dechsree S, Yoosook C, Herunsalee A and Panpisutchai Y. Anti-herpes simplex virus activity of Maclura cochinchinensis. Phytomedicine 2000;6: 421–424. 122. Lin YM, Flavin MTR, Chen FC, Sidwell R, Barnard DL, Huffman JH and Kern ER. Antiviral activities of biflavonoids. Planta Med 1999;65:120–125. 123. Ma SC, But PP, Ooi VE, He YH, Lee SH, Lee SF and Lin RC. Antiviral amentoflavone from Selaginella sinensis. Biol Pharm Bull 2001;24:311–312. 124. Arthan D, Svasti J, Kittakoop P, Pittayakhachonwut D, Tanticharoen M and Thebtaranonth Y. Antiviral isoflavonoid sulfate and steroidal glycosides from Solanum torvum. Phytochem 2002;59:459–463. 125. Du J, He ZD, Jiang RW, Ye WC, Xu HX and But PP. Antiviral flavonoids from the root bark of Morus alba L. Phytochemistry 2003;62(8):1235–1238. 126. Yadava RN and Tiwari L. A potential antiviral flavone glycoside from the seeds of Butea monosperma O. Kuntze. J Asian Nat Prod Res 2005;7(2):185–188. 127. Lyu S-Y, Rhim J-Y and Park W-B. Antiherpetic activities of flavonoids against herpes simplex virus type 1 (HSV-1) and type 2 (HSV-2) in vitro. Arch Pharm Res 2005;28(11): 1293–1301. 128. Cos P, Maes L, Vanden Berghe D, Hermans N, Pieters L and Vlietinck A. Plant substances as anti-HIV agents selected according to their putative mechanism of action. 
J Nat Prod 2004;67:284–293. 129. Chattopadhyay D. Ethnomedicinal antivirals: scope and opportunity. Chapter 15. In: Modern Phytomedicine: Turning Medicinal Plants into Drugs, Ahmad I, Aquil F and Owais M (eds), Weinheim, Wiley-VCH, 2006, pp. 313–338. ISBN: 978-3-527-31530-7. 130. Weinmann I. History of the development and applications of coumarin and coumarinrelated compounds. In: Coumarins: Biology, Applications and Mode of Action, O’Kennedy R and Thornes RD (eds), New York, NY, Wiley, 1997. 131. Casley-Smith JR and Casley-Smith JR. Coumarin in the treatment of lymphoedema and other high-protein oedemas. In: Coumarins: Biology, Applications and Mode of Action, O’Kennedy R and Thornes RD (eds), New York, Wiley, 1997, p. 348. 132. Kostova I, Raleva S, Genova P and Argirova R. Structure-activity relationships of synthetic coumarins as HIV-1 inhibitors. Bioinorg Chem Appl 2006;2006:68274. PMID: 17497014. 133. Huy LD, Caple R, Kamperdick C, Diep NT and Karim R. Isomeranzin against Herpes simplex virus in vitro from Clausena heptaphylla (Roxb.) W. & Arn: isolation, structure and biological assay. J Chem 2004;42(1):115–120. 134. Curini M, Cravotto G, Epifano F and Giannone G. Chemistry and biological activity of natural and synthetic prenyloxycoumarins. Curr Med Chem 2006;13(2):199–222. 135. Farnsworth NR. Bioactive compounds from plants. In: Ciba Foundation Symposium, Vol. 174, Chadwick DJ and Marsh J (eds), Ciba Foundation SymposiumChichester, Wiley, 1990, pp. 2–21.

346 136. Docherty JJ, Fu MM, Stiffler BS, Limperos RJ, Pokabla CM and DeLucia AL. Resveratrol inhibition of herpes simplex virus replication. Antiviral Res 1999;43:145–155. 137. Patel A, Hanson J, McLean TI, Olgiate J, Hilton M, Miller WE and Bachenheimer SL. Herpes simplex type 1 induction of persistent NF-kappa B nuclear translocation increases the efficiency of virus replication. Virology 1998;247:212–222. 138. Gregory D, Hargett D, Holmes D, Money E and Bachenheimer SL. Efficient replication by HSV-1 involves activation of the IkappaB kinase-IkappaB-RelA/p65 pathway. J Virol 2004;78:13582–13590. 139. Faith SA, Sweet TJ, Bailey E, Booth T and Docherty JJ. Resveratrol suppresses nuclear factor-kappaB in herpes simplex virus infected cells. Antiviral Res 2006;72(3):242–251. Epub 2006 Jul 14. 140. Likhitwitayawuid K, Sritularak B, Benchanak K, Lipipun V, Mathew J and Schinazi RF. Phenolics with antiviral activity of Millettia erythrocalyx and Artocarpus lakoocha. Nat Prod Res 2005;19:177–182. 141. Hayashi K, Niwayama S, Hayashi T, Nago R, Ochiai H and Morita N. In vitro and in vivo antiviral activity of scopadulcic acid B from Scoparia dulcis, Scrophulariaceae, against herpes simplex virus type 1. Antiviral Res 1988;9(6):345–354. 142. Okano M, Fukamiya N, Tagahara K, Tokuda H, Iwashima A, Nishino H and Lee KH. Inhibitory effects of quassinoids on Epstein–Barr virus activation. Cancer Lett 1995; 94(2):139–146. 143. Sotanaphun U, Lipipun V, Suttisri R and Bavovada R. A new antiviral and antimicrobial sesquiterpene from Glyptopetalum scerocarpum. Planta Med 1999;65(3):257–258. 144. Diallo B, Vanhaelen M, Vanhaelen-Fastre R, Konoshima T, Kozuka M and Tokuda H. Studies on inhibitors of skin-tumor promotion. Inhibitory effects of triterpenes from Cochlospermum tinctorium on Epstein–Barr virus activation. J Nat Prod 1989;52(4): 879–881. PMID: 2553872. 145. Kurokawa M, Basnet P, Ohsugi M, Hozumi T, Kadota S, Namba T, Kawana T and Shiraki K. 
Anti-herpes simplex virus activity of moronic acid purified from Rhus javanica in vitro and in vivo. J Pharmacol Exp Ther 1999;289:72–78. 146. Kim M, Kim SK, Park BN, Lee KH, Min GH, Seoh JY, Park GG, Hwang ES, Cha CY and Kook YH. Antiviral effects of 28-decaetylsendanin on herpes simplex virus-1 replication. Antiviral Res 1999;43(2):103–112. 147. Sindambiwe JB, Calomme M, Cos P, Totte J, Pieters L, Vlietinck A and Vanden Berghe D. Screening of seven selected Rwandan medicinal plants for antimicrobial and antiviral activities. J Pharmacol 1999;65(1):71–77. 148. Phrutivorapongkul A, Lipipun V, Ruangrungsi N, Watanabe T and Ishikawa T. Studies on the constituents of seeds of Pachyrrhizus erosus and their anti HSV activities. Chem Pharm Bull 2002;50:534–537. 149. Madureira AM, Ascenso JR, Valdeira L, Duarte A, Frade JP, Freitas G and Ferreira MJ. Evaluation of the antiviral and antimicrobial activities of triterpenes isolated from Euphorbia segetalis. Nat Prod Res 2003;17(5):375–380. PMID: 14526920. 150. Cheng HY, Lin TC, Yang CM, Wang KC, Lin LT and Lin CC. Putranjivain A from Euphorbia jolkini inhibits both virus entry and late stage replication of herpes simplex virus type 2 in vitro. J Antimicrob Chemother 2004;53:577–583. 151. Yogeeswari P and Sriram D. Betulinic acid and its derivatives: a review on their biological properties. Curr Med Chem 2005;12(6):657–666. 152. Chiang LC, Ng LT, Cheng PW, Chiang W and Lin CC. Antiviral activities of extracts and selected pure constituents of Ocimum basilicum. Clin Exp Pharmacol Physiol 2005; 32:811–816.

347 153. Cheng HY, Yang CM, Lin TC, Shieh DE and Lin CC. ent-Epiafzelechin-(4alpha-W8)epiafzelechin extracted from Cassia javanica inhibits HSV-2 replication. J Med Microbiol 2006;55:201–206. 154. Armaka M, Papanikolaou E, Sivropoulou A and Aesenakis M. Antiviral properties of isoborneol, a potent inhibitor of herpes simplex virus type 1. Antiviral Res 1999;43(2):79–92. 155. De Logu A, Loy G, Pellerano ML, Bonsignore L and Schivo ML. Inactivation of HSV-1 and HSV-2 and prevention of cell-to-cell virus spread by Santolina insularis essential oil. Antiviral Res 2000;48:177–185. 156. Primo V, Rovera M, Zanon S, Oliva M, Demo M, Daghero J and Sabini L. Determination of the antibacterial and antiviral activity of the essential oil from Minthostachys verticillata (Griseb.) Epling. Rev Argent Microbiol 2001;33(2):113–117. 157. Schnitzler P, Schon K and Reichling J. Antiviral activity of Australian tree oil and eucalyptus oil against herpes simplex virus in cell culture. Pharmazie 2001;56(4):343–347. 158. Farag RS, Shalaby AS, El-Baroty GA, Ibrahim NA, Ali MA and Hassan EM. Chemical and biological evaluation of the essential oils of different Melaleuca species. Phytother Res 2004;18:30–35. 159. Garcia CC, Talarico L, Almeida N, Colombres S, Duschatzky C and Damonte EB. Virucidal activity of essential oils from aromatic plants of San Luis, Argentina. Phytother Res 2003;17:1073–1075. 160. Allahverdiyev A, Duran N, Ozguven M and Koltas S. Antiviral activity of the volatile oils of Melissa officinalis L. against Herpes simplex virus type-2. Phytomedicine 2004;11(7-8):657–661. 161. Haslam E. Natural polyphenols (vegetable tannins) as drugs: possible modes of action. J Nat Prod 1996;59:205–215. 162. Takasaki M, Konoshima T, Shingu T, Tokuda H, Nishino H, Iwashima A and Kozuka M. Structures of euglobal-G1, -G2, and -G3 from Eucalyptus grandis, three new inhibitors of Epstein–Barr virus activation. Chem Pharm Bull (Tokyo) 1990;38(5): 1444–1446. PMID: 2168298. 163. Kubota K. 
Two new quassinoids, Ailanthinols A and B, and related compounds from Ailanthus altissima. J Nat Prod 1996;59:683–686. 164. Kurokawa M, Hozumi T, Basnet P, Nakano M, Kadota S, Namba T, Kawana T and Shiraki K. Purification and characterization of eugeniin as an anti-herpes virus compound from Geum japonicum and Syzygium aromaticum. J Pharmacol Exp Ther 1998;284(2):728–735. PMID: 9454821. 165. Liu KC, Lin MT, Lee SS, Chiou JF, Ren S and Lien EJ. Antiviral tannins from two Phyllanthus species. ROC Med 1999;65(1):43–46. 166. Ikeda T, Ando J, Miyazono A, Zhu XH, Tsumagari H, Nohara T, Yokomizo K and Uyeda M. Anti-herpes virus activity of Solanum steroidal glycosides. Biol Pharm Bull 2000;23(3):363–364. 167. Cheng HY, Lin CC and Lin TC. Antiherpes simplex virus type 2 activity of Casuarinin from the bark of Terminalia arjuna Linn. Antiviral Res 2002;55(3):447–455. PMID: 12206882. 168. Bermejo P, Abad MJ, Diaz AM, Fernandez L, Santos JD, Sanchez S, Villaescusa L, Carrasco L and Irurzun A. Antiviral activity of seven iridoids, three saikosaponins and one phenylpropanoid glycoside extracted from Bupleurum rigidum and Scrophularia scorodonia. Planta Med 2002;68(2):106–110. 169. Charlton JL. Antiviral activity of lignans. J Nat Prod 1998;61(11):1447–1451. 170. Kuo YH, Li SY, Huang RL, Wu MD, Huang HC and Lee KH. Schizarin B, C, D and E, four new lignans from Kadsura matsudai and their anti-hepatitis activities. J Nat Prod 2001;64:487–490.

348 171. Nakano M, Kurokawa M, Hozumi T, Saito A, Ida M, Morohashi M, Namba T, Kawana T and Shiraki K. Suppression of recurrent genital herpes simplex virus type 2 infection by Rhus javanica in guinea pigs. Antiviral Res 1998;39:25–33. 172. Kuo YC, Kuo YH, Lin YL and Tsai WJ. Yatein from Chamaecyparis obtusa suppresses herpes simplex virus type 1 replication in HeLa cells by interruption the immediate-early gene expression. Antiviral Res 2006;70(3):112–120. 173. Nawawi A, Ma C, Nakamura N, Hattori M, Kurokawa M, Shirak K, Kashiwada N and Ono M. Anti-herpes simplex virus activity of alkaloids isolated from Stephania cepharantha. Biol Pharm Bull 1999;22(3):268–274. 174. Chattopadhyay D, Arunachalam G, Mandal AB and Bhattacharya SK. Dose dependent therapeutic antiinfectives from ethnomedicines of Bay Islands. Chemotherapy 2006;52: 151–157. 175. Szlavik L, Gyuris A, Minarovits J, Forgo P, Molnar J and Hohmann J. Alkaloids from Leucojum vernum and antiretroviral activity of Amaryllidaceae alkaloids. Planta Med 2004;70:871–873. 176. Kuo YC, Lin YL, Liu CP and Tsai WJ. Herpes simplex virus type 1 propagation in HeLa cells interrupted by Nelumbo nucifera. J Biomed Sci 2005;12(6):1021–1034. 177. Balzarini J, Neyts J, Schols D, Hosoya M, Van Damme E, Peumans W and De Clercq E. The mannose-specific plant lectins from Cymbidium hybrid and Epipactis helleborine and the (N-acetylglucosamine)n-specific plant lectins from Urtica dioca are potent and selective inhibitors of human immunodeficiency virus and cytomegalovirus replication in vitro. Antiviral Res 1992;18(2):191–207. PMID: 1329650. 178. Balzarini J and McGuigan C. Chemotherapy of varicella-zoster virus by a novel class of highly specific anti-VZV bicyclic pyrimidine nucleosides. Biochim Biophys Acta 2002; 1587(2–3):287–295. 179. Alche LE, Berra A, Veloso MJ and Coto CE. Treatment with meliacine, a plant derived antiviral, prevents the development of herpetic stromal keratitis in mice. J Med Virol 2000;61(4):474–480. 180. 
Carlucci MJ, Scolaro LA, Errea MI, Matulewicz MC and Damonte EB. Antiviral activity of natural sulphated galactans on herpes virus multiplication in cell culture. Planta Medica 1997;63(5):429–432. PMID: 9342947. 181. Duarte ME, Noseda DG, Noseda MD, Tulio S, Pujil CA and Damonte EB. Inhibitory effect of sulfated galactans from the marine alga Bostrychia montagnei on herpes simplex virus replication in vitro. Phytomedicine 2001;8(1):53–58. PMID: 11292240. 182. Craig MI, Benencia F and Coulombie FC. Antiviral activity of an acidic polysaccharides fraction extracted from Cedrela tubiflora leaves. Fitoterapia 2001;72(2):113–119. 183. Thompson KD and Dragar C. Antiviral activity of Undaria pinnatifida against herpes simplex virus. Phytother Res 2004;18:551–555. 184. Xu HX, Lee SH, Lee SF, White RL and Blay J. Isolation and characterization of an antiHSV polysaccharide from Prunella vulgaris. Antiviral Res 1999;44(1):43–54. PMID: 10588332. 185. Chiu Lawrence C-M, Zhu W and Ooi Vincent E-C. A polysaccharide fraction from medicinal herb Prunella vulgaris downregulates the expression of herpes simplex virus antigen in Vero cells. J Ethnopharmacol 2004;93(1):63–68. 186. Lin YM, Anderson H, Flavin MT, Pai YH, Mata-Greenwood E, Pengsuparp T, Pezzuto JM, Schinazi RF, Hughes SH and Chen FC. In vitro anti-HIV activity of biflavonoids isolated from Rhus succedanea and Garcinia multiflora. J Nat Prod 1997;60(9):884–888.

349

Free radical processes in green tea polyphenols (GTP) investigated by electron paramagnetic resonance (EPR) spectroscopy

K.F. Pirker1,*, J. Ferreira Severino1, T.G. Reichenauer1 and B.A. Goodman1,2

1 Department of Environmental Research, Austrian Research Centers GmbH – ARC, 2444 Seibersdorf, Austria
2 Department of Chemistry & Chemical Engineering, University of Guangxi, Nanning 530005, Guangxi, People’s Republic of China

Abstract. This chapter reviews the current status of research on investigations of the free radical chemistry of green tea and its constituent polyphenols (GTP). It is based on the use of electron paramagnetic resonance (EPR) spectroscopy, and also includes a section on practical aspects of the technique, which should be of value to readers who are unfamiliar with the detailed operation of EPR. The free radical chemistry of GTP is important, because many of their antioxidant functions involve reactions with O2-derived free radicals, and the products of such reactions are themselves generally free radicals. The stability of these products and their abilities to participate in subsequent reactions may have considerable bearing on their biological function. These are also discussed briefly along with the authors’ views of future investigations which would appear to be valuable for this topic.

Keywords: green tea polyphenols, EPR, ESR, free radical, autoxidation, cyclic voltammetry, superoxide, hydroxyl radical, hydrogen peroxide, ROS, spin trap, antioxidant.

Abbreviations

CoA      coenzyme A
CV       cyclic voltammetry
DEPMPO   5-(diethoxyphosphoryl)-5-methyl-1-pyrroline N-oxide
DFT      density functional theory
DMPO     dimethyl-1-pyrroline N-oxide
DMSO     dimethyl sulfoxide
DNA      deoxyribonucleic acid
DPPH     1,1-diphenyl-2-picryl-hydrazyl
EC       epicatechin
ECG      epicatechin gallate
EDTA     ethylenediaminetetraacetic acid
EGC      epigallocatechin
EGCG     epigallocatechin gallate
EPR      electron paramagnetic resonance
ESR      electron spin resonance
GA       gallic acid
GTP      green tea polyphenols

*Corresponding author: Tel.: +43-50550-3577. Fax: +43-50550-3520. E-mail: [email protected] (K.F. Pirker).

BIOTECHNOLOGY ANNUAL REVIEW, VOLUME 14. ISSN 1387-2656. DOI: 10.1016/S1387-2656(08)00013-6

© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

HRP      horseradish peroxidase
Ka       acid dissociation constant
LDL      low density lipoproteins
MA       modulation amplitude
MP       microwave power
NADH     nicotinamide adenine dinucleotide
NMR      nuclear magnetic resonance
PAR      photosynthetic active radiation
PBN      phenyl-N-t-butylnitrone
4-POBN   α-(4-pyridyl-1-oxide)-N-t-butylnitrone
ROS      reactive oxygen species
SCE      standard calomel electrode
SHE      standard hydrogen electrode
TEMPO    2,2,6,6-tetramethylpiperidine-1-oxyl
X        xanthine
XO       xanthine oxidase

Introduction to the chemistry of GTP oxidation

The health benefits of green tea have received considerable attention in recent years, and there is general consensus that these benefits are associated with its polyphenolic components, the green tea polyphenols (GTP). They belong to a subgroup of flavonoids, the flavan-3-ols, characterized by a resorcinol group on the A-ring, a pyrogallol or catechol group on the B-ring, and a heterocyclic C-ring to which either a hydroxyl group or a galloyl group is attached. The main GTP (Fig. 1) are (+)-catechin, (−)-epicatechin (EC), (−)-epigallocatechin (EGC), (−)-epicatechin gallate (ECG), and (−)-epigallocatechin gallate (EGCG).

Fig. 1. Molecular structures of GTP: (A) (+)-catechin, (B) (−)-epicatechin (EC), (C) (−)-epigallocatechin (EGC), (D) (−)-epicatechin gallate (ECG), and (E) (−)-epigallocatechin gallate (EGCG).

The antioxidant properties of polyphenols are generally thought to make a major contribution to their beneficial health properties. Antioxidant activity is defined as the inhibition of oxidation processes, and can correspond to the inhibition of the formation or the selective scavenging of reactive oxygen species (ROS), which include oxygen-derived free radicals. Polyphenols are generally thought to act as antioxidants primarily by scavenging free radicals (e.g., [1–6]), which oxidize the phenol in preference to the substrate which it protects.

There are several sources of ROS (especially the hydroxyl (•OH) and superoxide anion (O2•−) radicals) in vivo, some of which are summarized in the following paragraphs. ROS are produced enzymatically for defence purposes in mammals [7] and plants [8], but they also have functions in normal physiological processes, such as root elongation [9] or fruit softening [10] in plants. O2•− is generated in every organism living under aerobic conditions, because life is dependent on redox reactions and oxygen is available in every cell of an aerobic organism. The most prominent examples of electron transfer to O2 are the electron transport chains of chloroplasts [11] and mitochondria [12]. Due to the ubiquity of O2•−, plants have developed a potent network of antioxidant molecules and enzymes that can scavenge this radical and related ROS [11] to limit damage from unwanted reactions.

Another source of O2•− is the xanthine (X) and xanthine oxidase (XO) system. XO is found in mammals within the liver and intestine, and catalyses the transformation of X and hypoxanthine to uric acid, generating O2•− and H2O2 (Eqs. (1) and (2); [13]).

\[
\text{xanthine} + 2\,\mathrm{O_2} + \mathrm{H_2O} \xrightarrow{\text{xanthine oxidase}} \text{uric acid} + 2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} \tag{1}
\]

\[
\text{xanthine} + \mathrm{O_2} + \mathrm{H_2O} \xrightarrow{\text{xanthine oxidase}} \text{uric acid} + \mathrm{H_2O_2} \tag{2}
\]
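As a quick consistency check on Eqs. (1) and (2), the atom and charge balance of each reaction can be verified mechanically. The sketch below is illustrative only and is not part of the original chapter: the species labels and helper names (`SPECIES`, `totals`, `is_balanced`) are hypothetical, and the molecular formulas assumed are xanthine C5H4N4O2 and uric acid C5H4N4O3.

```python
from collections import Counter

# Each species maps to (elemental composition, net charge).
# Labels are illustrative; formulas are the standard molecular formulas.
SPECIES = {
    "xanthine": ({"C": 5, "H": 4, "N": 4, "O": 2}, 0),
    "uric acid": ({"C": 5, "H": 4, "N": 4, "O": 3}, 0),
    "O2": ({"O": 2}, 0),
    "H2O": ({"H": 2, "O": 1}, 0),
    "H2O2": ({"H": 2, "O": 2}, 0),
    "O2.-": ({"O": 2}, -1),   # superoxide radical anion
    "H+": ({"H": 1}, +1),
}


def totals(side):
    """Sum atoms and charge over one side of a reaction.

    `side` is a list of (stoichiometric coefficient, species label) pairs.
    """
    atoms, charge = Counter(), 0
    for coeff, name in side:
        composition, q = SPECIES[name]
        for element, n in composition.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge


def is_balanced(lhs, rhs):
    """True if both sides carry the same atoms and the same net charge."""
    return totals(lhs) == totals(rhs)


# Eq. (1): xanthine + 2 O2 + H2O -> uric acid + 2 O2.- + 2 H+
EQ1 = ([(1, "xanthine"), (2, "O2"), (1, "H2O")],
       [(1, "uric acid"), (2, "O2.-"), (2, "H+")])

# Eq. (2): xanthine + O2 + H2O -> uric acid + H2O2
EQ2 = ([(1, "xanthine"), (1, "O2"), (1, "H2O")],
       [(1, "uric acid"), (1, "H2O2")])

if __name__ == "__main__":
    print("Eq. (1) balanced:", is_balanced(*EQ1))
    print("Eq. (2) balanced:", is_balanced(*EQ2))
```

The same helper can be pointed at any other reaction in the text by adding the relevant species to the table.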

A main route for •OH radical generation is the Fenton reaction (Eq. (3)), in which H2O2 oxidizes a low oxidation state transition metal ion, e.g., Fe(II) to Fe(III).

\[
\mathrm{Fe(II)} + \mathrm{H_2O_2} \rightarrow \mathrm{Fe(III)} + {}^{\bullet}\mathrm{OH} + \mathrm{OH^-} \tag{3}
\]

Since H2O2 and a small amount of Fe(II) are normally present in vivo, the Fenton reaction is a common occurrence in biological systems (e.g., [14]). The •OH radical reacts in three ways: (i) abstraction of a hydrogen atom from an organic molecule to form H2O and a carbon (C)-centered radical, (ii) addition to another structure, e.g., an aromatic ring, and (iii) acceptance of an electron, e.g., from the chloride ion [15]. The formation of •OH radicals is also related to O2•− production in cells. O2•− dismutates to H2O2 and O2 in the presence of the enzyme superoxide dismutase (SOD) (Eq. (4)). The so-called Haber-Weiss reaction (Eq. (5)) of O2•− and H2O2 then leads to •OH radical generation [16].

\[
2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} \xrightarrow{\text{SOD}} \mathrm{H_2O_2} + \mathrm{O_2} \tag{4}
\]

\[
\mathrm{O_2^{\bullet-}} + \mathrm{H_2O_2} \rightarrow {}^{\bullet}\mathrm{OH} + \mathrm{OH^-} + \mathrm{O_2} \tag{5}
\]
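The connection between the Fenton reaction (Eq. (3)) and the Haber-Weiss reaction (Eq. (5)) can be made explicit with a short worked sum. The block below uses the usual textbook iron-catalysed formulation as an illustration; the first step, reduction of Fe(III) by superoxide, is an assumption that is not stated in the text above.

```latex
\begin{align*}
\mathrm{Fe(III)} + \mathrm{O_2^{\bullet-}} &\rightarrow \mathrm{Fe(II)} + \mathrm{O_2}
  && \text{(assumed metal reduction step)} \\
\mathrm{Fe(II)} + \mathrm{H_2O_2} &\rightarrow \mathrm{Fe(III)} + {}^{\bullet}\mathrm{OH} + \mathrm{OH^-}
  && \text{(Fenton reaction, Eq. (3))} \\[4pt]
\mathrm{O_2^{\bullet-}} + \mathrm{H_2O_2} &\rightarrow {}^{\bullet}\mathrm{OH} + \mathrm{OH^-} + \mathrm{O_2}
  && \text{(sum, Eq. (5))}
\end{align*}
```

In this picture the iron acts catalytically: it is reduced in the first step and reoxidized in the second, so only O2•− and H2O2 are consumed overall.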

While O2•− is not able to cross cell membranes and has a short biological half-life, H2O2 is more stable and serves as a second messenger that can cross membranes and plays a role in long distance signalling inside plants [17]. Enzymatic systems, such as horseradish peroxidase (HRP) together with H2O2, can also oxidize phenolic compounds; Fig. 2 shows a generalized reaction scheme for heme peroxidases and catalases [18]. HRP, a ferric protoporphyrin IX, which is bound to cell-wall polymers, reacts with H2O2 to form three compounds. Compound I (a ferryl porphyrin π-cation radical or a ferryl protein radical) is generated by oxidation of the native enzyme with one H2O2 molecule. It can return to the ferric enzyme either directly, by reaction with another H2O2 molecule, or indirectly via compound II (a ferryl species or protein radical), by two one-electron reductions in the presence of a one-electron donor (e.g., phenolic compounds). Addition of O2•− then generates the so-called compound III (a ferrous-dioxy/ferric-superoxide complex). A physiologically relevant route for its formation is reaction of the enzyme with O2•− produced by the oxidative cycle with a suitable substrate such as nicotinamide adenine dinucleotide (NADH) [19]. Compound III is also formed from compound II with an excess of H2O2 or from the ferrous heme protein by dioxygen binding.

Fig. 2. Reaction scheme for heme peroxidases and catalases adapted from Jakopitsch et al. [18] (AH2 – one-electron donor).

There is evidence that GTP also act at an earlier stage in the oxidative reaction process, and not only scavenge ROS, but also inhibit their generation, e.g., by inhibition of the enzyme XO (e.g., [20–22]). Huang et al. [23] have published experimental results that show inactivation of the enzyme HRP by phenoxyl radicals.

Often the oxidation of a polyphenol takes place in a stepwise manner based on o-catechol (two phenolic OH groups) or pyrogallol (three OH groups) moieties in the structure, first forming the semiquinone radical, and in a second oxidation the quinone (Eq. (6)):

\[
\text{catechol (R}_1\text{)}
\;\underset{\text{reduction}}{\overset{\text{oxidation}}{\rightleftharpoons}}\;
\text{semiquinone radical anion}
\;\underset{\text{reduction}}{\overset{\text{oxidation}}{\rightleftharpoons}}\;
\text{quinone} \tag{6}
\]

Once the semiquinone radical has been formed, it can also dismutate to the hydroquinone and the quinone, because it has both reducing and oxidizing properties. Reaction back to the hydroquinone enables additional molecules to react with free radicals and therefore enhances the antioxidant activity. However, ROS may also be formed during reactions of polyphenols. For example, Kondo et al. [24] detected the generation of O2•− during the scavenging reaction of peroxyl radicals with EGCG and EGC, but not with ECG and EC. This observation is also consistent with the results from Nakayama et al. [25], who confirmed that the gallyl group on the B-ring contributed to the formation of H2O2, whereas the galloyl group on the D-ring did not. These authors suggested that in tea a chain reaction occurs with O2•− as the chain carrier in the formation of H2O2, which is also dependent on the standing time, pH and temperature of the tea infusion.

In the same year, Mochizuki et al. [26] published a study on the kinetic and mechanistic aspects of the autoxidation of catechins (EC, ECG, EGC, EGCG). They monitored the generation of H2O2 with a peroxidase-based amperometric sensor and the consumption of O2 with a Clark-type oxygen electrode. Their results led them to suggest the following reaction mechanism:

\[
\text{catechol (R)} + \mathrm{O_2} \rightarrow \text{semiquinone radical anion (R)} + \mathrm{O_2^{\bullet-}} + 2\,\mathrm{H^+} \tag{7}
\]

\[
\text{catechol (R)} + \mathrm{O_2^{\bullet-}} \rightarrow \text{semiquinone radical anion (R)} + \mathrm{H_2O_2} \tag{8}
\]

The reaction rate of Eq. (7) is slow due to the low redox potential (E(O2/O2•−) = −0.16 V vs. standard hydrogen electrode (SHE) at pH 7) and spin restriction between the reactants, but that of Eq. (8) is much faster and enhances the oxidation of catechins. The semiquinone radical of catechins formed in Eqs. (7) and (8) is thought to be a better electron donor to O2 than the fully reduced form, and the reaction rate of the semiquinone radical with O2 is fast (Eq. (9)).

\[
\text{semiquinone radical anion (R)} + \mathrm{O_2} \rightarrow \text{quinone (R)} + \mathrm{O_2^{\bullet-}} \tag{9}
\]
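The autocatalytic character of Eqs. (7) to (9) can be illustrated with a small numerical sketch. The code below is a minimal model under stated assumptions and is not the kinetic treatment of Mochizuki et al. [26]: the rate constants `k7`, `k8`, `k9`, the concentrations (arbitrary units), the constant O2 level, and the function name `simulate` are all hypothetical, and termination and polymerization steps are omitted. It integrates the three mass-action rate laws with a simple explicit Euler scheme.

```python
def simulate(red0=1.0, o2=0.25, k7=1e-3, k8=1.0, k9=10.0,
             dt=0.01, steps=20000):
    """Integrate Eqs. (7)-(9) with explicit Euler steps.

    red  : fully reduced catechin (catechol form)
    sem  : semiquinone radical
    ox   : quinone
    sup  : superoxide, O2.-
    h2o2 : hydrogen peroxide
    All rate constants and concentrations are hypothetical (arbitrary units);
    O2 is held constant for simplicity.
    """
    red, sem, ox, sup, h2o2 = red0, 0.0, 0.0, 0.0, 0.0
    initial_rate = k7 * red * o2   # catechin consumption rate at t = 0
    peak_rate = initial_rate

    for _ in range(steps):
        r7 = k7 * red * o2         # Eq. (7): slow initiation by O2
        r8 = k8 * red * sup        # Eq. (8): fast oxidation by superoxide
        r9 = k9 * sem * o2         # Eq. (9): semiquinone + O2 -> quinone
        peak_rate = max(peak_rate, r7 + r8)

        red += dt * (-r7 - r8)
        sem += dt * (r7 + r8 - r9)
        ox += dt * r9
        sup += dt * (r7 - r8 + r9)
        h2o2 += dt * r8

    return {"red": red, "sem": sem, "ox": ox, "sup": sup, "h2o2": h2o2,
            "initial_rate": initial_rate, "peak_rate": peak_rate}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>12s} = {value:.4g}")
```

In this minimal model each H2O2 molecule comes from one catechin molecule via Eq. (8), so the H2O2 formed can never exceed the initial catechin concentration. Reproducing a larger H2O2 yield would therefore require additional oxidation steps beyond Eqs. (7) to (9).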

The assumption of Mochizuki et al. [26] that O2•− and the semiquinone work as catalysts is similar to the one proposed by Nakayama et al. [25]. The mechanism for the autoxidation of GTP proposed by Mochizuki et al. [26] is shown in Fig. 3. Since in their measurements the concentration of the generated H2O2 exceeded the initial concentrations of GTP, it was also suggested that a subsequent oxidation reaction followed the two-electron oxidation, and that the quinone underwent oxidative polymerization.

Fig. 3. Oxidation scheme for GTP proposed by Mochizuki et al. [26].

Redox-cycling also requires stability of the radical with respect to side reactions. Guo et al. [1] found that the stabilities of the GTP semiquinone radicals were in the order EGCG > ECG >> EC, and that the radical from EGC was too unstable to investigate. Thus a galloyl group on position 3 seems to have a stabilizing effect. There are, however, many examples where the reverse reaction (reduction to the hydroquinone) is prevented because of additional reactions. Such side reactions often take place at the free radical component in the polyphenol oxidation process and include fragmentation or di-(poly-)merization reactions. It is clear, therefore, that the redox chemistry of polyphenols is far more complex than is often considered.

The main aim of this review is to consider the current status of research on free radical processes in green tea and its constituent polyphenols (GTP), with a focus on the characterization of the free radicals formed on their oxidation. Our method of choice for characterizing free radicals is electron paramagnetic resonance (EPR) spectroscopy, which is also known as electron spin resonance (ESR) spectroscopy. EPR spectroscopy is the electron equivalent of the better-known nuclear magnetic resonance (NMR) spectroscopy and is specific for paramagnetic molecules or materials that contain unpaired electrons. The majority of this review will be based on applications of this technique. However, where we consider it appropriate and supportive for understanding the radical chemistry, other aspects of tea and polyphenol research will be included. A short overview of the biological functions of GTP is given in section ‘Biological functions of flavonoids’, followed by some background to the EPR technique along with experimental methods that can be used to characterize polyphenol-derived radicals (section ‘EPR spectroscopy’). The section ‘Investigations of free radical reactions in green tea and GTP’ then reviews the main results that have been obtained from studies of green tea and individual GTP. Finally, the section ‘Conclusions and forward look’ includes the authors’ opinions on the current status of the subject, and contains suggestions for future research.

Biological functions of flavonoids

Plants cannot escape from a stressful environment and therefore have evolved sophisticated biochemical mechanisms to combat stress factors caused by changes in abiotic environmental factors (e.g., temperature, light, nutrients), by pathogens (fungi, bacteria, viruses), or by herbivores. Thus plants produce the so-called secondary metabolites that are involved in stress abatement. The term “secondary” indicates that these substances are produced from products of primary metabolism. Flavonoids, with more than 4,000 described derivatives, are one of the three major groups of secondary metabolites generated in higher plants (besides isoprenoids and alkaloids) [27].

Biosynthesis of flavonoids

The starting point for the biosynthesis of flavonoids from primary metabolic products is the amino acid phenylalanine. The enzyme phenylalanine ammonia lyase catalyses the first step by deamination of phenylalanine to cinnamic acid. Cinnamic acid is then hydroxylated by cinnamate 4-hydroxylase, and the resulting hydroxycinnamic acid is activated by the enzyme 4-coumarate coenzyme A (CoA) ligase by formation of a CoA ester. This activated CoA ester of hydroxycinnamic acid is the starting point for various branches of the so-called phenylpropanoid pathway [28,29] that includes the formation of catechins, examples of which are the GTP. In plants, flavonoids frequently occur in a glycosylated form (mostly glucose, but also galactose, rhamnose, xylose, arabinose, etc.), but the main GTP are generally described as being in their free forms (e.g., [30]).
Functions of flavonoids in plants

Flavonoids have several important roles in plants, including protection from UV-light, defence against pathogens (e.g., phytoalexins) and herbivores, signalling, reproduction, allelopathy (which is the inhibition of plant growth by chemicals produced by another plant), and symbiosis. It is well established that flavonoids in the upper leaf-epidermis protect leaf cells from UV-B radiation by specifically absorbing in the wavelength region from 280 to 340 nm, but not in the photosynthetic active radiation (PAR) waveband [31]. In addition, it is often speculated that the antioxidant properties of these polyphenols might help to diminish the load of ROS that are generated in UV-B damaged cells. For example, polyphenols located in the chloroplast have recently been reported to have a role in the scavenging of singlet oxygen (1O2) in mesophyll cells of Phillyrea latifolia [32].

Flavonoids are involved in the modulation of plant development, which is regulated by plant hormones called auxins. Auxins are synthesized in meristematic tissues and in leaves, from which they are transported to the place of action. It is assumed that part of this regulation works via flavonoid binding to auxin transporters in the cell membrane [33].

In plants the physiological effects of flavonoids are at least partly affected by their oxidation chemistry. Apart from non-enzymatic oxidation (e.g., autoxidation), oxidation reactions involving enzymes such as polyphenol oxidases (catechol oxidase and laccase) and peroxidases are important [34]. Catechol oxidase is able to oxidize o-diphenols, while laccase oxidizes o- and p-diphenols in the presence of O2. The oxidation of phenols by peroxidases occurs in the presence of H2O2. The browning effects that occur during senescence (e.g., of wood) or the development of seed coats are visible examples of flavonoid oxidation in planta [34].

Flavonoids also play important roles as defence substances. In this respect they are divided into two groups: “preformed” and “induced” compounds. The latter might be present in unstressed plant tissues in low concentrations, but their synthesis is strongly induced by the appearance of stress factors, e.g., the so-called phytoalexins are only induced by an infection, or by exposure to some type of abiotic stress. An example of the role of preformed flavonoids is their well-established involvement in plant–insect interactions, which, however, cannot be reduced simply to the protection of plants from herbivores [35]. Preformed flavonoids are also important in the defence of plants against fungi [36].
For example, quercetin forms the antifungal agent 3,4-dihydroxybenzoic acid when oxidized by peroxidase, and condensed tannins have been shown to be involved in nematode resistance in banana (Musa sp.) [37]. Since only young, healthy leaves are harvested for tea production, preformed flavonoids should predominate at the time of harvest. Chemical alteration of flavonoids after harvest is important in the fermentation process used to produce oolong and black teas, but has not been investigated for green teas, although it has been established that wounding can induce flavonoid synthesis [38].

Roles of GTP in tea plants

The specific roles of GTP in tea leaves have not yet been elucidated. GTP are the monomers of proanthocyanidins, also known as condensed tannins, the smallest of which is EGCG. In general, such molecules are important in protection of plants against microbial pathogens, insect pests, and larger herbivores [39]. Flavonoids are also involved in the reaction of tea plants to abiotic stress factors. It has been reported that formation of epicatechin quinone and epigallocatechin gallate quinone was significantly increased in leaves of tea plants exposed to drought stress, whereas EC and EGCG concentrations were maintained and lipid peroxidation, estimated from malondialdehyde formation, was decreased [40]. Since the formation of epicatechin quinone and epigallocatechin gallate quinone preceded the appearance of elevated levels of proanthocyanidins, it was speculated that these quinones might be intermediates in the formation of condensed tannins.

Phenolic compounds in the human diet

Foods and beverages rich in polyphenolic substances (such as green tea or red wine) have been shown to have positive health effects in various epidemiological studies. In the past, this was thought to be related to the antioxidative properties of these chemicals, and extensive comparisons of the antioxidant activity of various fruits, vegetables, and beverages were performed [41]. However, in recent years the simple hypothesis that a higher content of antioxidants in foods makes them healthier has been increasingly questioned. It should also be borne in mind that antioxidants inhibit microbial spoilage and oxidative degradation of foods, and hence limit the formation of potentially toxic lipid peroxidation products [42,43]. Thus the epidemiological evidence may be the consequence of a combination of many factors. To understand the functioning of secondary plant compounds in humans it might be helpful to draw analogies to their functional mechanism in plants [44]; in both systems, however, a detailed knowledge of the chemistry of these compounds is a prerequisite for understanding their biological behavior.

Medicinal properties of GTP

In plants, flavonoids are expected to have specific protein targets, which are, however, mostly unknown [45]. Identification of protein targets in plants might help to understand their functioning in humans.
Effects of flavonoids on several mammalian enzymes are known; probably the most prominent is their phytoestrogenic activity [46]. These compounds can also activate sirtuin deacetylases (NAD+-dependent deacetylases), and thus can mimic caloric restriction and increase longevity in cells of yeast, worms, flies, and humans [47]. Also, inactivation of XO by GTP lowers the concentration of uric acid, which can induce gout if present in excess. While there is some evidence for the role of flavonoids, especially of GTP, in protection from cardiovascular diseases [48], some doubt has been expressed about the involvement of GTP in the prevention of cancer [49]. Nevertheless, EGCG has been shown to influence gene expression by interacting with transcription factors (NF-κB and AP-1) that are involved in tumor promotion [50], although at least part of this interaction appears not to be dependent on its antioxidant activity [51].

Transition metals have been reported to play a major role in the chemistry of flavonoids and GTP. Water extracts of green tea have been reported by Malik et al. [52] to produce oxidative damage to deoxyribonucleic acid (DNA) in the presence of transition metal ions such as Cu(II). A mechanism was proposed to explain the cytotoxic behavior of plant-derived polyphenols against cancer cells that involves mobilization of endogenous Cu followed by the generation of •OH radicals via a Cu-catalyzed Fenton reaction. However, water extracts of green tea contain Mn(II) (see section ‘Investigations of free radical reactions in green tea and GTP’), and probably also some Fe(II), which would also be expected to initiate the Fenton reaction irrespective of added Cu, so the cause of this observation may be more complex than that presented here.

In chemical studies, complexing with Zn(II) is frequently used to stabilize semiquinone radicals by inhibiting further oxidation (see section ‘Investigations of free radical reactions in green tea and GTP’). It was also shown that the Zn-stabilized semiquinone radical of EGCG could form covalent bonds with proteins, which may represent an important aspect of its biological properties [53]. Another example is shown by Kagaya et al. [54], who observed that Zn enhanced the protective activity of EGCG against hepatotoxin-induced cell injury in rat hepatocytes. Complexation between EGCG and Zn was confirmed by UV-visible spectroscopy, and stoichiometric studies suggested a 1:2 Zn:EGCG complex, analogous to the Cu(EGC)2 structure proposed by Yoshioka et al. [55]. Chen et al. [56] have also reported an enhancement of the protective effect of EGCG by Zn(II) in inhibiting the growth of PC-3 cells.
However, in this work the effects were strongly dependent on the concentrations and the order in which the components were added together.

In contrast to the behavior of Zn(II), Furukawa et al. [57] found that DNA was damaged by EGCG in vitro in the presence of Fe(III) and Cu(II) (the former as the ethylenediaminetetraacetic acid (EDTA) complex, the latter as the free ion). In both cases, however, evidence was produced that implicated •OH radical production as the medium for DNA damage. Since the metals were initially in their oxidized forms, reduction by the polyphenol would have to have occurred prior to •OH radical production by the Fenton reaction. Electrochemical and UV/visible absorption spectroscopic studies of the interaction between EGCG and DNA showed that DNA modifications only occurred when Cu(II) was present and not with EGCG alone [58]. Redox reaction between the polyphenol and Cu is considered to be the explanation for this observation. Yu et al. [59], however, found little evidence for induced free radical generation by EGCG in the presence of Cu(II) during a study of the growth inhibition of prostate cancer cells. These authors drew attention to the complexity of the system, and suggested that an “array” of metal–ligand complexes might be formed, with some undergoing varying degrees of oxidation and/or polymerization. Rates of oxidative DNA degradation and of •OH and O2•− radical formation were all reported to be greater with EGCG than with EC [60]. Additionally, Cu-mediated oxidation of the polyphenols led to the formation of polymeric compounds that were more efficient prooxidants than the unoxidized forms. A model was proposed in which a quinone and quinone methide bind Cu(II) to EC.

As an illustration of the complexity of its biological chemistry, EGCG has been shown to inhibit Cr(VI)-induced DNA damage [61], in contrast to its ability to damage DNA through •OH radical generation as described above. Unfortunately, these authors explain their results in terms of •OH radical scavenging, whereas it seems more likely that the behavior involves reactions between the metal and EGCG, which would lead to an inhibition of •OH formation.

Chelation behavior has been proposed as the mechanism through which GTP are able to prevent or ameliorate iron-induced oxidative stress, and such molecules have been proposed as brain-permeable neuroprotective drugs for the treatment of neurodegenerative illnesses, such as Parkinson’s and Alzheimer’s diseases. This work has been reviewed recently by Weinreb et al. [62] and Mandel et al. [63,64], who concluded that the neuroprotective behavior of green tea involves multiple biochemical pathways, with metal chelation being an important component.

These are just a few of many examples in which GTP are described as having medicinal properties. They are presented to illustrate the importance of such chemicals, but a more detailed discussion of this subject is beyond the objectives of this review. It should also be recognized that epidemiological studies often produce contradictory results and that more specific knowledge of the chemical behavior of GTP should make valuable contributions to understanding their biological effects.

EPR Spectroscopy

Background to EPR

As stated above, EPR is the electron equivalent of NMR. In an analogous way to NMR being based on nuclei having non-zero spins, EPR is dependent on the fact that an electron has a magnetic moment, and in the presence of a magnetic field, the energies of the different electron spin configurations are unequal. However, because the mechanism for acquiring an EPR spectrum (with the most commonly used spectrometers) is fundamentally different from NMR, some background to the theory and practice of EPR spectroscopy will be presented here.

The principle of EPR is based on the separation of the two spin states of an unpaired electron in the presence of a magnetic field (+1/2 and −1/2; Fig. 4). The energy separation of these states, ΔE, is proportional to the magnetic field according to the relationship (Eq. (10)):

\[ \Delta E = g \beta_e B \tag{10} \]

where B is the magnitude of the magnetic field (T), βe the Bohr magneton (9.27402 × 10⁻²⁴ J T⁻¹), and g a constant that is characteristic of the sample (for an isolated electron g = 2.0023, and it is close to this value for many free radicals). The ratio of the populations between the upper and lower energy states (n+, n−) is given by the Boltzmann equation (Eq. (11)):

\[ n_{+} / n_{-} = \exp(-g \beta_e B / kT) \tag{11} \]

where T is the temperature (in Kelvin) and k is the Boltzmann constant (1.38066 × 10⁻²³ J K⁻¹). It should also be noted that the sum of n− and n+ equals 1. An EPR spectrum is usually acquired by varying the magnetic field while irradiating the sample with electromagnetic radiation at a fixed energy (frequency), and consists of absorption plotted against magnetic field. Microwave radiation is applied perpendicular to the magnetic field and absorption takes place when the energy hν is equal to ΔE.
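As a rough numerical illustration of Eqs. (10) and (11), the following minimal sketch computes the resonance field and the Boltzmann population ratio. The microwave frequency (9.5 GHz, a typical X-band value), the temperature (298 K), the value of the Planck constant, and all function names are assumptions introduced here for illustration; they are not taken from the text.

```python
import math

PLANCK = 6.62607015e-34      # Planck constant h, J s (not quoted in the text)
BOHR_MAGNETON = 9.27402e-24  # beta_e, J/T (value quoted in the text)
BOLTZMANN = 1.38066e-23      # k, J/K (value quoted in the text)
G_FREE_ELECTRON = 2.0023     # g for an isolated electron


def resonance_field(frequency_hz, g=G_FREE_ELECTRON):
    """Field B0 (T) at which g*beta_e*B0 equals the microwave quantum h*nu (Eq. (10))."""
    return PLANCK * frequency_hz / (g * BOHR_MAGNETON)


def population_ratio(field_t, temperature_k, g=G_FREE_ELECTRON):
    """Upper/lower population ratio n+/n- from Eq. (11)."""
    return math.exp(-g * BOHR_MAGNETON * field_t / (BOLTZMANN * temperature_k))


if __name__ == "__main__":
    frequency = 9.5e9    # assumed X-band frequency, Hz (illustrative only)
    temperature = 298.0  # assumed room temperature, K
    b0 = resonance_field(frequency)
    ratio = population_ratio(b0, temperature)
    n_minus = 1.0 / (1.0 + ratio)  # uses the normalization n+ + n- = 1
    n_plus = 1.0 - n_minus
    print(f"B0 = {b0 * 1e3:.1f} mT")
    print(f"n+/n- = {ratio:.5f}; n- - n+ = {n_minus - n_plus:.2e}")
```

Under these assumed conditions the resonance field is close to 0.34 T and the two populations differ by well under 1%, which is consistent with the later remarks on how easily the signal can be saturated.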

Fig. 4. Splitting of the energy levels of the spin magnetic moment (ms = +1/2 and ms = −1/2) of a free electron (S = 1/2) with the magnetic field (B); energy (E) is plotted against magnetic field. Resonance occurs at the field B0, where the energy gap (ΔE = gβeB0) equals the energy hν of the applied microwave radiation.

Since transition probabilities for the +1/2 → −1/2 and for the −1/2 → +1/2 transitions are the same, there is a tendency for the populations of the two states to become equal during an EPR experiment. If that happens there is no longer any net absorption and the EPR signal is lost, a process known as saturation. However, there are relaxation processes that tend to re-establish the Boltzmann population distribution, and usually in recording an EPR spectrum one attempts to control the microwave power (MP) so as to avoid signal saturation; this is in fact essential if any attempt is to be made to derive quantitative information from a spectrum.

In polyphenol-derived free radicals there are only very small differences in the magnitudes of the g-values, and in many papers these are not even reported. Consequently, the g-values and the factors that influence their magnitudes will not be considered in this review, although they are important in the spectra of transition metal ions and complexes.

The main spectral parameter that is observed in EPR investigations of organic free radicals is the hyperfine coupling (hfc), which arises as a result of interactions between the unpaired electron and nuclei with non-zero spin. This phenomenon is described in Fig. 5, which shows an example of the energy levels obtained when S = 1/2 and the nuclear spin, I, also equals 1/2. According to the selection rules, ΔmS = ±1 and ΔmI = 0, only two peaks are observed. The magnitude of the separation between them is known as the hfc constant, a.

Fig. 5. Energy splitting of the spin magnetic moment ms due to interaction with a nucleus of spin I = 1/2 (“solid line”), visible as a doublet in the EPR spectrum. The “broken line” indicates the splitting of the energy levels of a free electron (single absorption peak). The separation of the peaks is known as hyperfine coupling (hfc) and the magnitude of the splitting is the hfc constant, a.
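The doublet described above is the simplest hyperfine pattern. As a minimal sketch of how “stick” diagrams of the kind used in the figures of this review can be generated, the function below applies a first-order treatment in which each nucleus of spin I splits every line into 2I + 1 equally intense lines spaced by its hfc constant. The function name and the coupling constants in the examples are hypothetical and chosen only for illustration; second-order effects and line shapes are ignored.

```python
from collections import defaultdict


def stick_pattern(nuclei):
    """Return sorted (field offset in mT, relative intensity) pairs.

    nuclei: list of (spin I, hfc constant a in mT), one entry per nucleus.
    First-order treatment only: each nucleus splits every existing line
    into 2I + 1 lines of equal intensity, spaced by a.
    """
    lines = {0.0: 1}
    for spin, a in nuclei:
        two_i = int(round(2 * spin))
        m_values = [k / 2 for k in range(-two_i, two_i + 1, 2)]
        split = defaultdict(int)
        for position, intensity in lines.items():
            for m in m_values:
                # Round so that coincident lines from equivalent nuclei merge.
                split[round(position + m * a, 6)] += intensity
        lines = dict(split)
    return sorted(lines.items())


if __name__ == "__main__":
    # Hypothetical coupling constants, for illustration only.
    print(stick_pattern([(0.5, 0.30)]))             # one I = 1/2 nucleus: doublet
    print(stick_pattern([(0.5, 0.10)] * 3))         # three equivalent I = 1/2 nuclei
    print(stick_pattern([(1, 1.50), (0.5, 0.30)]))  # one I = 1 and one I = 1/2 nucleus
```

Passing several nuclei with identical coupling constants makes coincident lines merge, so that the relative intensities add up; with inequivalent nuclei the lines remain separate and the pattern becomes correspondingly more complex.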

Magnitudes of a are proportional to the unpaired electron spin density on the nucleus concerned. In free radicals with several 1H atoms, each of these atoms contributes to the observed hfc pattern. For equivalent 1H atoms, the intensities of the resulting peaks are in the form of a binomial distribution. In general, as will be seen later, the 1H atoms are not all equivalent and more complex patterns are observed in experimental spectra. Sometimes the assignment of peaks in EPR spectra is not straightforward and it is recommended that the validity of a proposed assignment is tested by computer simulation. Various programs are available for simulation and even least squares fitting of free radical spectra, but we generally use the Bruker Simfonia software.

At low MPs, the EPR signal intensity is proportional to the square root of the MP, as illustrated in Fig. 6. However, above a certain MP, the signal increase is less than that predicted, and at high powers the signal intensity may in fact decrease. It is, therefore, good spectroscopic practice to perform an MP saturation study in each new experiment in order to optimize spectral quality and to identify the point at which saturation starts (Fig. 6A). Surprisingly, it would appear from methodology descriptions in the polyphenol EPR literature that this is seldom done, and there are various reports of EPR spectra of polyphenol-derived free radicals that were acquired using MPs well in excess of that where saturation commences. The influence of the MP on spectral resolution is also shown in Fig. 6B–D. For technical reasons (enhancement of the signal intensity), acquisition of an EPR spectrum involves modulation of the magnetic field, which results in the detection of the 1st derivative of the absorption.
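The MP saturation check recommended above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the choice of the three lowest powers as the unsaturated reference, the 5% tolerance, and the synthetic data (including the functional form used to generate them) are all hypothetical and are not taken from the text.

```python
import math


def saturation_onset(powers_mw, intensities, n_ref=3, tolerance=0.05):
    """Return the lowest power (mW) at which the measured intensity falls more
    than `tolerance` below the sqrt(MP) line, or None if no point does.

    The reference line is a least-squares fit through the origin of intensity
    against sqrt(MP), using the n_ref lowest powers (assumed unsaturated).
    """
    pairs = sorted(zip(powers_mw, intensities))
    roots = [math.sqrt(p) for p, _ in pairs]
    numerator = sum(r * i for r, (_, i) in zip(roots[:n_ref], pairs[:n_ref]))
    denominator = sum(r * r for r in roots[:n_ref])
    slope = numerator / denominator
    for r, (power, intensity) in zip(roots, pairs):
        if intensity < (1.0 - tolerance) * slope * r:
            return power
    return None


if __name__ == "__main__":
    # Synthetic data: sqrt(P) growth damped at higher power. The damping
    # factor is an arbitrary choice made only to produce test data.
    powers = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
    signal = [math.sqrt(p) / math.sqrt(1.0 + p) for p in powers]
    print("Saturation onset (mW):", saturation_onset(powers, signal))
```

In practice the tolerance and the number of reference points would have to be chosen with regard to the noise level of the measured intensities.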
The spectral intensity is directly proportional to the modulation amplitude (MA), but this should be smaller than the width of the narrowest line in the spectrum, otherwise line broadening (and a consequent loss of resolution) will occur (as shown in Fig. 7). Sometimes improved resolution is obtained by recording 2nd derivative spectra, which, as will be seen later, deliver more information than the 1st derivative (Fig. 8), albeit often with reduced intensity. In most EPR measurements a field modulation frequency of 100 kHz is used, but sometimes with spectra having very narrow lines this is too great and produces broad lines in the 1st derivative and modulation sidebands in the 2nd derivative [65]; the latter could conceivably be misinterpreted in terms of small hfc if they are not recognized. An example to illustrate this effect is given in Fig. 9.

Experimental techniques

The stabilities of polyphenol-derived free radicals are often not high, and the radicals may consequently be difficult to observe in conventional spectroscopic

Fig. 6. (A) Relationship between spectral intensity (arbitrary units) and the square root of the microwave power (MP, in mW) for the radical from oxidized (−)-epicatechin gallate (ECG) and the respective EPR spectra at (B) 0.02 mW, (C) 0.2 mW, and (D) 20 mW MP. Spectra were acquired with 20 kHz modulation frequency and 0.001 mT modulation amplitude (MA). Spectral interpretation is shown by the “stick” diagrams, labelled D-ring oxidation ECG and gallate radical (our unpublished results).

measurements where a solution containing the radical is placed in the spectrometer, which then has to be ‘‘tuned’’ before spectral acquisition commences. This delay may sometimes be sufficiently long for the radical to disappear before its spectrum can be acquired.

Fig. 7. Effect of increasing modulation amplitude (MA) on the spectral resolution of autoxidized (−)-epicatechin gallate (ECG). Spectra were acquired with (A) 0.001 mT, (B) 0.005 mT, and (C) 0.009 mT MA, 20 kHz modulation frequency, and 0.2 mW microwave power (MP). Spectral interpretation is shown by the “stick” diagrams, labelled D-ring oxidation ECG and gallate radical (our unpublished results).

Various techniques are available for the characterization of unstable free radicals, e.g., the use of a flow system, in which the radical is formed outside of the spectrometer cavity, and then flows through the microwave cavity at a short but calculable time period later. This time can be varied by changing the flow rate. Also, by subsequently turning off the pump and hence stopping the flow, the decay kinetics of the radical(s) can be investigated. A second approach is to actually form radicals inside the microwave cavity. Two techniques that are fairly common involve either in situ irradiation (usually in the UV range) or electrochemical oxidation (or reduction, depending on the position and polarization of the electrodes). Although they have not been used extensively in the study of GTP radicals, these techniques

Fig. 8. EPR spectrum of alkaline autoxidized (−)-epigallocatechin gallate (EGCG) as (A) 1st and (B) 2nd derivatives. Spectra were recorded with 0.2 mW microwave power (MP) and 20 kHz modulation frequency. The modulation amplitude (MA) was 0.001 mT in (A) and 0.005 mT in (B). Note the improvement of resolution of the peaks in the central region of the spectrum where three signals (shown by stick diagrams, labelled B-ring oxidation EGCG, gallate radical, and D-ring oxidation EGCG) from the autoxidized EGCG are overlapping (our unpublished results).

will be presented here because we believe that they have considerable potential for use in future experimentation in this area. In the case of photolysis with UV-light, three types of reaction have been described [66]: direct photodissociation (Eq. (12)); photoreduction (Eqs. (13a) and (13b)), where the excited parent molecule (a diketone) reacts with a solvent molecule; and indirect radical formation by photodissociation of tert-butyl peroxide (Eqs. (14a) and (14b)).

\[ \mathrm{ArOH} \xrightarrow{h\nu} \mathrm{ArO}^{\bullet} + \mathrm{H} \tag{12} \]

\[ \mathrm{RCOCOR} \xrightarrow{h\nu} [\mathrm{RCOCOR}]^{*} \tag{13a} \]

\[ [\mathrm{RCOCOR}]^{*} + \mathrm{R'H} \xrightarrow{h\nu} \mathrm{RCOC}^{\bullet}\mathrm{OHR} + \mathrm{R'}^{\bullet} \tag{13b} \]

Fig. 9. Effect of high modulation frequency on spectral line shape of 1st (A, B) and 2nd derivatives (C, D) of alkaline autoxidized (−)-epicatechin gallate (ECG) using 20 kHz (A, C) and 100 kHz (B, D) modulation frequency. Spectrum (D) shows the generation of additional doublets (modulation sidebands) in the 2nd derivative of the triplet, whereas the intensity of the sextet is drastically decreased at 100 kHz modulation frequency. 0.2 mW microwave power (MP) and 0.001 mT modulation amplitude (MA) were used for all spectra. Spectral interpretation is shown by the “stick” diagrams, labelled D-ring oxidation ECG and gallate radical (our unpublished results).

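\[ (t\text{-But})_2\mathrm{O}_2 \xrightarrow{h\nu} t\text{-ButO}^{\bullet} \tag{14a} \]

\[ t\text{-ButO}^{\bullet} + \mathrm{ArOH} \rightarrow t\text{-ButOH} + \mathrm{ArO}^{\bullet} \tag{14b} \]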

For in situ electrochemical EPR studies, the common approach is to use a flat cell with the working electrode (usually a Pt grid) positioned inside the flat area; the auxiliary and reference electrodes are then placed outside the flat area of the cell (e.g., above it). Various improvements and adaptations to this simple setup have been described by Compton and Waller [67].

Another approach appropriate for characterization of short-lived radicals is through use of a procedure known as “spin trapping”. This involves addition of a diamagnetic molecule (the spin trap) to a solution in which unstable free radicals are generated. The spin trap then reacts with the radicals to form adducts, which are in fact new free radicals that have sufficient stability for spectroscopic characterization. Often the spectral parameters of such adducts permit identification of the type of radical that has been trapped, although the resolution of the hyperfine structure is usually inadequate for a complete characterization of organic C-centered radicals. Some commonly used spin traps are shown in Fig. 10. Many spin traps are nitrones and their free radical products from spin trapping experiments are nitroxides. Typical EPR spectra of a C-centered radical adduct with the spin trap α-(4-pyridyl-1-oxide)-N-t-butylnitrone (4-POBN) and of an •OH adduct of 5-(diethoxyphosphoryl)-5-methyl-1-pyrroline N-oxide (DEPMPO) are given in Fig. 11. The 4-POBN adduct is characterized by a sextet that is composed of a triplet from the 14N (I = 1) nucleus, further split into doublets from interaction of the unpaired electron with the 1H (I = 1/2)

Fig. 10. Molecular structures of some commonly used, commercially available spin traps: phenyl-N-t-butylnitrone (PBN), α-(4-pyridyl-1-oxide)-N-t-butylnitrone (4-POBN), 5,5-dimethyl-1-pyrroline N-oxide (DMPO), and 5-(diethoxyphosphoryl)-5-methyl-1-pyrroline N-oxide (DEPMPO).

Fig. 11. Different EPR spectra from oxidation of Agaricus mushroom, chosen to illustrate the radical selectivity of spin traps; (A) the C-centered radical adduct of 4-POBN (a(1H) = 0.32 mT; a(14N) = 1.56 mT) generated by autoxidation, and (B) the OH radical adduct of DEPMPO (a(1H) = 1.3 mT; a(14N) = 1.4 mT; a(31P) = 4.72 mT) produced by oxidation with K3Fe(CN)6 [113]. Spectra were acquired using 20 mW microwave power (MP), 100 kHz modulation frequency and 0.1 mT modulation amplitude (MA). Spectral interpretations of these adducts are given by the “stick” diagrams.

on the C atom adjacent to the N-O group [68]. The •OH adduct of DEPMPO also contains an additional large splitting from the 31P (I = 1/2) nucleus [69]. These techniques permit the characterization of quite unstable radicals, which may be precursors of more stable radicals that are observed in conventional spectroscopic measurements, and are therefore useful in the interpretation of reaction mechanisms.

EPR spectroscopy has also been used to measure the “antioxidant activity” of GTP. Various methods have been published, but they all involve measuring changes in intensity of a known radical signal induced by the presence of the “antioxidant” molecule. Stable radicals which are commonly used for reference signals are 1,1-diphenyl-2-picryl-hydrazyl (DPPH), galvinoxyl, or Fremy’s salt (e.g., [2,4,70]). The reference radicals can also be generated during the experiment, e.g., the 2,2,6,6-tetramethylpiperidine-1-oxyl (TEMPO) signal after reaction with 1O2 [2], or an adduct signal from a spin trap when competing with the “antioxidant” for radicals such as •OH or O2•−. Some examples are given by Guo et al. [1,2], Unno et al. [5,6], and Polovka et al. [71]. Determination of the •OH scavenging activity of polyphenols, or of antioxidants in general, is a contentious issue because of the virtually indiscriminate reaction of •OH with organic molecules [72]. Extreme caution should, therefore, be exercised in interpreting the significance of such experimental results, especially with respect to biological systems.

Investigations of free radical reactions in green tea and GTP

Free radical chemistry of green tea

Fresh tea leaves have high contents of flavonoids, e.g., flavanols (25% of their dry weight), flavonols and flavonol glycosides (3%), polyphenolic acids and depsides (5%), and caffeine (3%). As stated previously, the main flavanols are EC, ECG, EGC, and EGCG. The major types of tea, namely green (unfermented), oolong (half-fermented) and black (fully fermented), are distinguished by the levels of post-harvest fermentation to which they are subjected [30]. Fermentation is the controlled oxidation of the polyphenols in tea leaves, and is accompanied by a darkening of the leaves and a decrease of astringency. It is initiated by oxygen after cell disruption (rolling of leaves) and catalyzed by the enzyme polyphenol oxidase. Since this is the most critical step in the production of tea, it has a great influence on the final tea quality. Fermentation of tea reduces the amount of GTP substantially as a result of oxidation and polymerization reactions [73]. The main condensation products are theaflavins and thearubigins, which are generated either by pyrogallol–catechol or pyrogallol–pyrogallol condensation [30]. Thus the investigations of oxidation reactions of GTP are also relevant to chemical processes that occur during the formation of oolong and black teas.

Antioxidant properties of tea solutions have been investigated by EPR spectroscopy as described in section ‘Experimental techniques’ [70,71]. Gardner et al.
[70] investigated the ability of GTP and extracts of green and black tea to scavenge Fremy’s salt (in H2O) and the galvinoxyl radical (in ethanol (EtOH)). EGCG was reported to be the most effective scavenger of both radicals, whereas EGC was least effective in scavenging galvinoxyl radicals and catechin least effective with Fremy’s salt. These authors concluded that the antioxidant potential in tea was a simple summation of the individual antioxidant activities of the GTP, and therefore eliminated the possibility of synergistic or antagonistic side reactions. Polovka et al. [71] used spin trapping techniques to determine radical scavenging abilities of green, black, and fruit teas. H2O2/NaOH/dimethyl sulfoxide (DMSO) was used to generate •OH, O2•−, and •CH3 radicals, which were trapped by DMPO and 4-POBN. They found that green teas had the highest O2•− scavenging ability, followed by black and fruit teas. They also reported that green teas showed prooxidant activity as a result of Fenton reaction chemistry initiated by higher Mn(II) and ascorbic acid concentrations. Yen and Chen [74] reported that antioxidant activity (as measured by, among other things, its scavenging effect on O2•−, •OH, H2O2, and the DPPH radical) was much lower in black tea than in green tea, but highest in oolong tea. In our view such results illustrate the dangers of attempting to relate data from simple chemical assays to health properties of food products, since consumption of black tea is considered to convey distinct health benefits on the consumer [75].

The remainder of this review is devoted to investigations of oxidative processes in green tea and its individual polyphenols using EPR spectroscopy. Results from the scientific literature are summarized and supplemented with information from our own recent investigations, much of which is currently still unpublished.

The EPR spectra of dried tea leaves (e.g., Fig. 12A) contain signals from the transition metal ions, Fe(III) and Mn(II), as well as from an organic free radical component. The Mn(II) signal is characterized by a sextet, and a single free radical peak is located between the third and fourth manganese peaks. The remaining peaks can be assigned to Fe(III) components in the leaves. The major broad feature that underlies the Mn(II) signal is associated with electron exchange interactions, the most common source of which is iron storage proteins, which contain a polymeric iron oxyhydroxide core. A smaller feature at low field (g ≈ 4.3) corresponds to uncharacterized mononuclear Fe(III) complexes. Similar EPR spectra are seen in many types of plant tissue and are a common feature of leaves in general (e.g., [76,77]). The relatively large amounts of these transition metal ions in tea leaves might also be expected to have an appreciable influence on the chemistry of the beverage [71].
Reports of EPR spectra from water extracts of tea leaves [78,79] have shown spectra that were dominated by the Mn(II) signals, similar to that in Fig. 12B, thus demonstrating that the Mn(II) component in the leaves is soluble. To the best of our knowledge, information is not currently available on the extractability of the Fe components. Yoshioka et al. [79] also reported a free radical signal that could have originated from polyphenols, but its identification must be regarded as inconclusive since there was no resolution of hyperfine structure that is characteristic of GTP. The failure of Yoshioka et al. [79] to observe hyperfine structure in the free radical component of their EPR spectra, which was obtained from an alkaline tea leaf suspension, appears to be the consequence of using inappropriate spectral acquisition parameters, most notably the MA. Repeating their experiments using more reasonable parameters gave the spectrum shown in Fig. 13A. Expansion of the free radical region in the spectrum (Fig. 13B) showed that there is definite structure associated with the free radical resonance. This result was used as the starting point for further investigations of the oxidation of green tea under more carefully controlled conditions.

Fig. 12. EPR spectra from green tea; (A) dried and ground leaves and (B) an aqueous extract of dried ground leaves, acquired using 100 kHz modulation frequency, 20 mW microwave power (MP), and modulation amplitudes (MA) of 0.6 mT (A) and 1 mT (B). Labelled features: Mn(II), Fe(III) at g = 2.0, Fe(III) at g = 4.3, and the free radical. Spectral interpretation is shown by the “stick” diagrams (our unpublished results).

The EPR spectrum from a hot water tea extract oxidized at pH 13 with NaOH using a flow system is shown in Fig. 13C. This spectrum consists of four components, and subsequent measurements showed that they are identical to components observed in the spectra obtained during oxidation of the individual GTP, EGCG, EGC, and ECG, and gallic acid (GA), which is a breakdown product of EGCG and ECG. The 12-peak spectrum is assigned to the B-ring oxidation of EGCG. EGC is also oxidized at the B-ring and

Fig. 13. EPR spectra from a suspension of dried and ground green tea leaves in alkaline solution; (A) 70 mT scan range; (B) 5 mT scan range using 5 mW microwave power (MP), 100 kHz modulation frequency and 0.6 mT modulation amplitude (MA); (C) alkaline autoxidized hot water extract of green tea leaves using 2 mW MP, 20 kHz modulation frequency and 0.001 mT MA. Spectral interpretation is given by the “stick” diagrams (our unpublished results). [Spectra not reproduced; scale bars: (A) 10 mT, (B) 1 mT, (C) 0.1 mT; components labeled in (C): B-ring oxidation of EGCG, D-ring oxidation of ECG or EGCG, gallate radical, and B-ring oxidation of EGC.]

results in the 18-peak spectrum. A radical centered on the D-ring of EGCG and/or ECG is visible as the 6-peak spectrum. Finally, the triplet spectrum is identical to that of the radical produced by GA oxidation. The spectra of these radical species are discussed in more detail later. When the pump of the flow system was switched off, it was possible to monitor the decay kinetics of these radicals (Fig. 14). Each of the spectral components decreased at essentially the same rate, suggesting that the concentrations of the radical components are dependent on O2, which drives the oxidation reaction. This conclusion was confirmed by measurements using N2-saturated solutions, where an approximately 1,000-fold decrease in the signal intensity was observed.

Fig. 14. Plots of intensity versus time of the EPR signals in an alkaline autoxidized green tea extract after switching off the pump in a flow experiment (our unpublished results). [Plot not reproduced; y-axis: intensity (arbitrary units), 0.00–3.50; x-axis: time (s), 0–840; series: oxidized B-ring of EGCG, oxidized D-ring of ECG/EGCG, oxidized GA, and oxidized B-ring of EGC.]
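One way to check whether several spectral components decay at the same rate is to fit a rate constant to each intensity trace and compare the results. The sketch below assumes first-order decay, which the text does not state, and uses synthetic traces generated with a single hypothetical rate constant; none of the numbers are measured values. The time grid simply mirrors the 0–840 s axis of Fig. 14.

```python
import numpy as np

def first_order_rate(times_s, intensities):
    """Rate constant (1/s) from a least-squares line through ln(intensity) versus time."""
    slope, _intercept = np.polyfit(times_s, np.log(intensities), 1)
    return -slope

times = np.arange(0, 841, 120, dtype=float)   # s, same span as the Fig. 14 time axis
k_true = 2.0e-3                               # hypothetical rate constant, 1/s

# Hypothetical starting intensities (arbitrary units) for the four components.
start = {
    "B-ring of EGCG": 3.0,
    "D-ring of ECG/EGCG": 1.2,
    "GA": 0.8,
    "B-ring of EGC": 2.0,
}

rates = {
    name: first_order_rate(times, i0 * np.exp(-k_true * times))
    for name, i0 in start.items()
}
spread = max(rates.values()) - min(rates.values())

for name, k in rates.items():
    print(f"{name}: k = {k:.2e} 1/s")
print(f"spread between components: {spread:.1e} 1/s")
```

With real data, a spread that is small compared with the fitting uncertainty would support the statement that all components decay at essentially the same rate; a curved plot of ln(intensity) against time would indicate that the first-order assumption does not hold.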

Autoxidation of individual GTP

Autoxidation under alkaline conditions is by far the most popular oxidation condition that has been used to investigate the free radical and antioxidant chemistry of polyphenols. In combination with EPR spectroscopy, considerable information has been obtained on the structure of the various radicals. This section summarizes the relevant literature on autoxidation of GTP, and also discusses the importance of the experimental setup in optimizing the information that can be obtained from an EPR spectrum.

The first report of an EPR spectrum from autoxidized catechin was published by Kuhnle et al. [80]. Radical formation occurs on the B-ring, which is oxidized to the semiquinone radical. A typical semiquinone radical spectrum (from oxidized luteolin) is shown in Fig. 15A. In this spectrum, the largest hyperfine coupling (hfc) arises from the proton (1H) at C6′ in the para-position to the oxidized OH group, followed by the hfc a2 from the 1H on C2, and then two smaller 1H splittings, a2′ and a5′, at the positions adjacent to the catechol group (Table 1). EPR spectra from alkaline autoxidation of catechin and its diastereomer EC in aqueous DMSO and aqueous EtOH show a remarkable difference in the magnitudes of the hfc constants [53,81], even though the two GTP differ only in the configuration of the attached groups on the heterocyclic C-ring (trans- and cis-configuration). An oxidation product was identified as

Fig. 15. EPR spectra obtained from alkaline autoxidized luteolin [85]; (A) semiquinone radical from B-ring oxidation, and (B) the radical with OH substitution on the B-ring. Spectral parameters are 20 mW microwave power (MP), 100 kHz modulation frequency, 0.01 mT modulation amplitude (MA). Interpretations of the EPR spectra are given by “stick” diagrams. [Spectra and radical structures not reproduced; scale bars 0.1 mT.]

catechinic acid (6-(3,4-dihydroxyphenyl)-7-hydroxy-2,4,9-bicyclo[3.3.1]nonatrione), which is generated by rearrangement reactions after oxidation [81–83]. Kennedy et al. [84] suggested the formation of catechinic acid via a radical mechanism rather than an ionic one. At high pH, oxidation reactions are frequently accompanied by hydroxylation, especially with catechol-containing compounds. Jensen and Pedersen [81] reported solvent and pH-dependence of hydroxylation reactions of catechin and EC, and Pirker et al. [85] showed hydroxylation at the C2′ position of the B-ring during alkaline oxidation of luteolin (Fig. 15B).

Several other investigations of the GTP EC, ECG, EGC, and EGCG at alkaline pH have also been reported [1,26,86,87], and the results from these papers are summarized in Table 1. In the oxidation of EC, Yoshioka et al. [86] reported the generation of a spectrum that was too weak to be interpreted. Guo et al. [1], on the other hand, assigned the hfc constants of the EPR spectrum of oxidized EC to two radical centers on the A- and B-rings in the EC molecule. In our opinion,

Table 1. Hfc constants (mT) for various published radical structures of polyphenols.

Radical compound [ref.]                           a2      a3      a4      a6      a8      a2′/a6′        a5′     a2″/a6″   aOH
B-ring oxidation of luteolin [85]                 0.150   –       –       –       –       0.125/0.275    0.117   –         –
  hydroxylated structure [85]                     0.025   –       –       –       –       –/0.465        0.083   –         –
B-ring oxidation of EC [1]                        0.032   –       –       –       –       0.019/0.339    0.064   –         –
A-ring oxidation of EC [1]                        0.048   –       –       0.102   0.016   –              –       –         –
B-ring oxidation of EGCG [1]                      0.371   0.032   –       –       –       0.096          –       –         –
  [26]ᵃ                                           0.371   0.036   –       –       –       0.097/0.093    –       –         –
  [86]ᵃ                                           0.366   –       –       –       –       0.095          –       –         –
  [87]ᶜ                                           0.360   0.032   –       –       –       0.092          –       –         –
D-ring oxidation of ECG [1]                       –       0.019   –       –       –       –              –       0.109     –
  [87]                                            –       0.015ᵃ  –       –       –       –              –       0.110     –
  5-(isopropoxy-carbonyl-pyrogallol) [88]ᵇ        –       0.024   –       –       –       –              –       0.113     –
Oxidation of gallic acidᵇ [112]                   –       –       –       –       –       –              –       0.107     –
  [86]ᵃ,ᶜ                                         –       –       –       –       –       –              –       0.108ᶜ    –
B-ring oxidation of EGC [86]ᵃ                     0.468   –       –       –       –       0.092          –       –         –
  [1]                                             0.480   –       –       –       –       0.088          –       –         –
  [87]                                            0.47ᵃ   0.015ᵃ  –       –       –       0.086          –       –         0.028ᵃ
Oxidation of chrysin [88]                         –       0.100   –       0.820   1.125   –              –       –         –
  [1] (A-ring oxidation of EGC)                   –       –       0.051   0.131   0.013   –              –       –         –

ᵃ No assignment of hfc constants to individual atoms given.
ᵇ Atom numbers are the same as in catechinic structures.
ᶜ Authors assigned the radical to the D-ring.

this assignment appears to be unreasonable: the EC molecule is too small to contain two non-interacting radical centers, and the spectrum of a triplet state radical would be completely different from that observed. An explanation of the results in terms of two separate radical species seems more likely. We consider that a possible interpretation of their spectrum involves one radical from hydroxylated EC with deprotonation of the OH group on C2′ ([81], Table 1). The hfc constants for the 2nd radical species are too small for A-ring oxidation, since, if chrysin (Fig. 16) is taken as a model compound for the A-ring of GTP [88], the hfc constants from C6 and C8 are ~0.8 and 1.1 mT, respectively (Table 1). We therefore think that this 2nd radical species should be assigned to an oxidized dimerization or polymerization product, where a decrease in the magnitude of hfc constants would be expected. Similar conclusions have been drawn from measurements with kaempferol under comparable conditions [85].

With alkaline autoxidized EGC, the radical center was assigned to the pyrogallol group on the B-ring [1,86]. We have observed spectra that are superficially similar to these, but with additional hfc constants from the oxidized B-ring. One of these two couplings (0.032 mT) could arise from the OH proton on C3, which has a pKa of 15.5 in catechin and EC [84] and is therefore likely to be still protonated in EGC (the other OH groups all being deprotonated at pH 13). It should be noted that the radical structures suggested by Guo et al. [1] all show protonated molecules, even though extensive deprotonation should have occurred at the pH they used.

The EPR spectrum of oxidized ECG reported by Yoshioka et al. [86] is dominated by a triplet, which was assigned to the oxidized galloyl group on the D-ring. In contrast, Guo et al. [1] obtained a six-peak spectrum with two equivalent hfc constants of around 0.1 mT and one smaller (around 0.02 mT) that they also assigned to a radical center on the D-ring (galloyl group). We have produced EPR spectra from autoxidized ECG that show both radical species (Fig. 9). A reasonable interpretation is that the sextet signal corresponds to a radical center on the D-ring, but the triplet corresponds to a breakdown product that is identical to the radical obtained by oxidation of GA. Assignment of the sextet signal to oxidation of the galloyl group is

Fig. 16. Molecular structure of chrysin. [Structure drawing not reproduced.]

further supported by the similarity of the hfc constants to those of the oxidized model compound 5-isopropoxycarbonylpyrogallol [88].

EGCG has two pyrogallol groups in its structure (on the B- and D-rings), but the published radicals have been interpreted as arising from the B-ring only [1,86], although Mochizuki et al. [26] detected additional minor, unidentified peaks. We have now been able to identify three radical species in the EPR spectra from autoxidized EGCG, corresponding to the oxidized B-ring (gallyl group), which gave the main signal, the oxidized D-ring (galloyl group), and oxidized GA, the breakdown product that probably followed D-ring oxidation, as with ECG.

A mechanism for autoxidation of ECG and EGCG has been proposed by Kondo et al. [24] (Fig. 17). Owing to the low bond dissociation energies of the C2 proton and the C4′ and C4″ phenolic hydrogen atoms, they suggested the formation of an anthocyanin-like compound and the EGCG radical as intermediate products of the reaction, and also cleavage of the galloyl group in ECG and EGCG.

The rate of autoxidation increases with increasing pH for all GTP [26]. Under weakly basic conditions (pH 7–9.5), a gallyl group on the B-ring seems to favor oxidation, and the galloyl group with its electron-withdrawing carbonyl group is more difficult to oxidize. Also, pyrogallol groups are more easily oxidized than catechol groups. In strongly basic solutions (pH > 10), however, the autoxidation rate of ECG increased dramatically [26]. This behavior could not be explained on the basis of the pKa values alone, and Mochizuki et al. [26] proposed that it was the consequence of (i) the minor importance of acid dissociation constants (Ka) of OH groups on the A- and/or C-rings due to the low extent of π-conjugation of the B-ring with these rings, and (ii) the oxidation reaction being more complicated than a simple one-step oxidation. Stabilization of the semiquinone and O2•− at high pH values was also suggested as being responsible for the pH-dependence of autoxidation.

Oniki and Takahama [87] have investigated free radical formation from various GTP in alkaline, O2-free solutions using K3[Fe(CN)6] as the oxidizing agent. Their EPR spectra for EGC, ECG, and EGCG closely resemble the results reported above for alkaline autoxidation. However, they assigned the main signal from EGCG to the radical formed by D-ring oxidation. These authors also found that the oxidized GA-like radical was a major product from oxidation of GCG, but a minor one with EGCG. An additional smaller signal in the EGCG and GCG spectra with hfc constants of 2 × 0.1, 2 × 0.43, and 0.25 mT was interpreted as corresponding to a radical with the unpaired electron delocalized over the GCG structure. We have also observed a similar signal (Fig. 18, Table 1) during EGCG oxidation, but assume that it originates from a dimerized or polymerized product. GCG and EGCG are isomers, which can be converted into each other under oxidative conditions [89], and polymerization of EGCG has been reported to occur on oxidation (e.g., [89,90]). Oniki and Takahama [87] also observed a spectrum from oxidation of GA in the pH range 10.5–12, which was interpreted as

Fig. 17. Degradation mechanism for (−)-epigallocatechin gallate (EGCG) and (−)-epicatechin gallate (ECG) at pH 7.4 after reaction with peroxyl radicals (AAPH), adapted from Kondo et al. [24]. [Reaction scheme not reproduced; it shows two routes, labeled path A and path B, for each of ECG and EGCG.]

Fig. 18. EPR spectrum obtained from alkaline autoxidized (−)-epigallocatechin gallate (EGCG). The weak component was simulated with hyperfine coupling (hfc) constants of 2 × 0.40, 0.31, 0.17, and 2 × 0.10 mT. The spectrum was acquired using 0.2 mW microwave power (MP), 20 kHz modulation frequency, and 0.01 mT modulation amplitude (MA). Spectral interpretation is shown by the “stick” diagrams (our unpublished results). [Spectrum not reproduced; scale bar 0.1 mT; labeled components: B-ring oxidation of EGCG, gallate radical, and an unidentified component.]

corresponding to a radical based on a dimer of two GA molecules. Their experimental hfc constants match quite well with the results from our density functional theory (DFT) calculations of electron densities in this radical (Table 2).

Oxidation under alkaline conditions has been frequently used to investigate the oxidative free radical chemistry of polyphenols, and results from such studies have been used in developing an understanding of products such as stored beverages, where shelf life is critical. However, it must be borne in mind that an increase in reaction rate with O2 at high pH might be accompanied by side reactions that do not occur in physiologically relevant processes. The following section attempts to address this criticism by considering results from experiments using conditions closer to those that are expected to occur in vivo.

Reaction of GTP with the superoxide anion radical (O2•−) and inhibition of O2•− generation

The X/XO system is commonly used to generate O2•− in vitro. However, as mentioned in section ‘Biological functions of flavonoids’, various polyphenols,

Table 2. Hfc constants (mT) from DFT calculations and the experimental results from Oniki and Takahama [87]. [Structure of the radical with proton positions 1, 2(OH), and 3(OH) not reproduced.]

                          a1      a2(OH)   a3(OH)
DFT results               0.074   0.051    0.014
Experimental hfcs [87]    0.077   0.047    0.014

including GTP, have the ability to inhibit XO activity and hence to lower O2•− generation, in addition to their O2•− scavenging properties [20–22]. Both effects were investigated by Cos et al. [91], who were able to distinguish between the inhibition of XO and O2•− scavenging in a single assay. Inhibition of XO was first detected by photometric determination of uric acid, and the concentration of O2•− in the same solution was measured subsequently by the nitrite method. Two IC50 values (the 50% inhibitory concentration for XO inhibition and the flavonoid concentration giving a 50% reduction in the O2•− level) were obtained to distinguish between the two reactions. Catechin, EC, and EGC were found to be solely O2•− scavengers in the investigated concentration range up to 100 μM, whereas ECG and EGCG inhibit XO [20].

X/XO, as an enzymatic generator of O2•−, and phenazine methosulfate-NADH, as a non-enzymatic source, were used by Furuno et al. [92] for comparison with the redox potentials and Cu(II)-reducing ability of the polyphenols. The best correlation was obtained between the non-enzymatic system and Cu(II)-reducing ability, which was concluded to be a better measure of O2•− scavenging ability than the redox potential.

Using molecular modelling techniques, Lin et al. [93] have summarized the key elements for a polyphenol structure to inhibit XO. These include a C2–C3 double bond to maintain a planar structure, OH groups on C7 and C5 as binding sites with the enzyme, and a carbonyl group at C4 which favors the formation of hydrogen bonds. Hence, the C7 and C5 OH groups in the catechins are probably the active groups in the inhibition of XO. Ponce et al. [94] have produced a structure–activity relationship model for XO inhibition of various flavonoids based on UV spectroscopic measurements and molecular topology. Flavonoids were then assigned to four groups, namely inactive, low, significant, and high activity with respect to XO inhibition.
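The idea of reporting two IC50 values from a single assay can be sketched as follows. This is an illustration only and not the procedure of Cos et al. [91]: the dose-response numbers are invented, the helper name ic50 is hypothetical, and the 50% point is located by simple interpolation on a logarithmic concentration scale rather than by curve fitting.

```python
import math

def ic50(concentrations, percent_remaining):
    """Concentration at which the signal falls to 50 % of the control.

    Interpolates linearly in log10(concentration) between the two bracketing
    data points; returns None if 50 % is not reached in the tested range.
    """
    points = list(zip(concentrations, percent_remaining))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo >= 50.0 >= y_hi and y_lo != y_hi:
            frac = (y_lo - 50.0) / (y_lo - y_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Invented data for one hypothetical compound (percent of the control value).
concentrations_uM = [1.0, 10.0, 100.0]
uric_acid_percent = [95.0, 80.0, 60.0]     # readout of XO activity
superoxide_percent = [90.0, 50.0, 10.0]    # readout of the superoxide level

ic50_xo = ic50(concentrations_uM, uric_acid_percent)
ic50_superoxide = ic50(concentrations_uM, superoxide_percent)
print("IC50 for XO inhibition:", ic50_xo)
print("IC50 for superoxide reduction:", ic50_superoxide)
```

In this invented case the superoxide level is halved within the tested range while uric acid formation is not, which is the pattern that would be expected for a compound acting as a scavenger without appreciable XO inhibition.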

It is clear from the above discussion that polyphenols influence O2•− levels in enzymatic systems by both inhibition and scavenging reactions. To investigate the chemistry of the products of the scavenging reaction directly with EPR spectroscopy, we have used potassium superoxide (KO2) as the O2•− source. This was dissolved in dry DMSO using a crown-ether to enhance the solubility [95]. The advantages of this reaction setup are the high solubility of GTP in DMSO and the exclusion of water, in which O2•− undergoes disproportionation (Eq. (15)).

$$2\,\mathrm{O_2^{\bullet-}} + 2\,\mathrm{H_2O} \rightarrow \mathrm{O_2} + \mathrm{H_2O_2} + 2\,\mathrm{OH^-} \qquad (15)$$
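As a quick consistency check on Eq. (15), the atom and charge balance of the superoxide disproportionation can be verified by simple bookkeeping. The sketch below assumes the usual formulation with superoxide carrying a single negative charge and hydroxide as the anionic product; the data layout and the function name totals are illustrative choices.

```python
from collections import Counter

def totals(side):
    """Sum atoms and charge over (coefficient, composition, charge) entries."""
    atoms = Counter()
    charge = 0
    for coefficient, composition, species_charge in side:
        for element, count in composition.items():
            atoms[element] += coefficient * count
        charge += coefficient * species_charge
    return dict(atoms), charge

# 2 O2(radical anion) + 2 H2O
reactants = [
    (2, {"O": 2}, -1),
    (2, {"H": 2, "O": 1}, 0),
]
# O2 + H2O2 + 2 OH(-)
products = [
    (1, {"O": 2}, 0),
    (1, {"H": 2, "O": 2}, 0),
    (2, {"O": 1, "H": 1}, -1),
]

print(totals(reactants))
print(totals(products))
```

Both sides contain six O atoms and four H atoms and carry a net charge of −2, so the equation is balanced in this form.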

Figure 19 shows examples of EPR spectra from the reaction of EGCG with O2•− (our unpublished results). The initial spectrum (Fig. 19A) consists of a single component which can be identified as the radical corresponding to D-ring oxidation of EGCG. The later spectrum (Fig. 19B) consists of an eight-peak component, which was assigned to the radical from oxidized GA, and at least two minor unidentified components. In contrast to alkaline autoxidation of EGCG, where the B-ring is the favored oxidation site, reaction of EGCG with O2•− in an organic solvent favors D-ring oxidation, which is then followed by fragmentation to produce a radical that is identical to that from oxidized GA. It seems, therefore, that different pathways operate in the oxidative free radical chemistry of GTP with O2 and O2•−. Since the formation of O2•− occurs in the O2 autoxidation mechanism, it is tempting to assign the D-ring oxidation pathway to its action in the high pH experiments, although at the present time this conclusion must be regarded as speculative.

Other methods for the generation of GTP-derived free radicals

Oxidation of GTP with HRP/H2O2 has been investigated by Bors et al. [53] using EPR spectroscopy combined with Zn(II) chelation to attempt to enhance the stabilities of the generated radicals. It was found that only catechol semiquinones from catechin, EC, and ECG were stabilized by Zn(II), and not those that contained pyrogallol structures. On the other hand, Hagerman et al. [96] demonstrated Zn stabilization of the radicals from EGCG and EGC at pH values between 3 and 6, where only two of the three OH groups in the pyrogallol moiety are deprotonated. The EPR spectra are similar to those from Bors et al. [53], indicating that the B-ring is the main oxidation site, and there is only a small contribution (<10%) from the galloyl group of EGCG. On the basis of the similarity between a later EPR radical signal in the oxidation of EGCG and that from the tannic acid oligomer, Bors et al. [53] suggested that condensed tannins oligomerize rather than redox-cycle from the quinone state.

Fig. 19. EPR spectra obtained after oxidation of (−)-epigallocatechin gallate (EGCG) with O2•− in DMSO solution; (A) initial spectrum showing oxidation at the D-ring (a(1H) = 0.14, 0.10, 0.03, and 0.01 mT), (B) the later spectrum, which is dominated by the radical corresponding to oxidized gallic acid (GA) (a(1H) = 0.19, 0.05, and 0.03 mT). Spectra were acquired using 0.2 mW microwave power (MP), 20 kHz modulation frequency, 0.001 mT modulation amplitude (MA). Spectral interpretations are shown by “stick” diagrams (our unpublished results). [Spectra not reproduced; scale bars 0.1 mT; labeled components: (A) D-ring oxidation of EGCG; (B) gallate radical and unidentified components.]
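The “stick” diagrams used to interpret these spectra follow from first-order hyperfine splitting: each set of n equivalent nuclei of spin I splits every line into 2nI + 1 lines. The sketch below builds such a diagram from a list of coupling constants. It is a minimal illustration that ignores second-order effects and linewidths, and the function name stick_spectrum and its argument layout are assumptions made for this example. The first parameter set is taken from the Fig. 19B caption; the second uses the approximate values quoted in the text for the ECG sextet.

```python
def stick_spectrum(couplings, decimals=4):
    """First-order stick spectrum.

    couplings: list of (hfc_mT, nuclear_spin, n_equivalent_nuclei).
    Returns a sorted list of (field_offset_mT, relative_intensity).
    """
    lines = {0.0: 1}
    for hfc, spin, n_equivalent in couplings:
        m_values = [-spin + k for k in range(int(round(2 * spin)) + 1)]
        for _ in range(n_equivalent):
            split = {}
            for offset, weight in lines.items():
                for m in m_values:
                    key = round(offset + hfc * m, decimals)
                    split[key] = split.get(key, 0) + weight
            lines = split
    return sorted(lines.items())

# Oxidized GA radical, three inequivalent protons (values from the Fig. 19B caption).
ga_sticks = stick_spectrum([(0.19, 0.5, 1), (0.05, 0.5, 1), (0.03, 0.5, 1)])

# ECG sextet, two equivalent protons of about 0.1 mT and one of about 0.02 mT.
ecg_sticks = stick_spectrum([(0.1, 0.5, 2), (0.02, 0.5, 1)])

print(len(ga_sticks), "lines:", ga_sticks)
print(len(ecg_sticks), "lines:", ecg_sticks)
```

Three inequivalent protons give 2 × 2 × 2 = 8 lines of equal intensity, in line with the eight-peak component in Fig. 19B, and two equivalent protons plus one further proton give 3 × 2 = 6 lines with a 1:1:2:2:1:1 intensity pattern.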

Cyclic voltammetry (CV) can be a useful tool for determining the reducing ability and reversibility of the redox reactions of GTP, and in this respect is potentially complementary to EPR spectroscopy. Since we are convinced that a combination of these two techniques offers considerable potential for advancing our knowledge of the oxidative chemistry of GTP, we have included below a small section on the current status of results from CV studies of GTP. Kilmartin et al. [97] investigated electrochemical properties of various phenolic antioxidants that are found in wine, including catechin and GA. o-Diphenols such as catechin have low oxidation potentials of around

400 mV, and a reversible reaction was observed with catechin if the scan was taken only to 500 mV (i.e., just after the 1st oxidation peak). This oxidation peak in the cyclic voltammogram probably corresponds to the two-electron oxidation of catechin to the o-quinone. The reaction was clearly irreversible when the 2nd oxidation peak was included in the cycle. The reversibility of polyphenol oxidation improved in dilute solution [97], probably either because the process is diffusion-controlled or because dimerization reactions occur after oxidation [98]. Kilmartin et al. [97] also demonstrated that CV can be used to quantify polyphenols with certain reducing potentials in white and red wine samples after appropriate dilution.

By using a flow-through column, the resolution of oxidation peaks in voltammograms of GTP was increased as a result of a decrease in interference reactions on the electrode surface [99]. Up to three oxidation waves were detected depending on the GTP. The 1st wave at ~0 V was assigned to the pyrogallol group oxidation on the B-ring, the 2nd wave to the gallyl group on the B-ring and/or the galloyl group on the D-ring, whereas the 3rd wave was suggested to arise from the oxidation of the resorcinol group on ring A. EGCG showed all three waves, EGC the 1st and the 3rd, and catechin and ECG the 2nd and 3rd.

Kilmartin and Hsu [100] have reported the use of CV to characterize the dependence of the electrochemical properties of tea, coffee, and GTP standards on water temperature, time, repeated infusions, and addition of milk. They divided the GTP into two groups on the basis of their cyclic voltammograms. One group consists of the gallocatechins and has lower formal potentials than the GTP from the second group, which consists of catechins with catechol or galloyl groups. Thus the pyrogallol group on the B-ring has the higher reducing power. Catechin and EC were reversibly oxidized, whereas the gallocatechins and gallates were only partly, and GA was not, reversibly oxidized. Green tea and oolong tea extracts showed similar cyclic voltammograms to that from EGCG, but black tea was characterized by a voltammogram typical for theaflavin.

However, up until the present time, the simultaneous combination of electrochemistry with EPR spectroscopy has scarcely been used for studies of the redox chemistry of GTP. One reason could be the low stabilities and concentrations of the generated semiquinone radicals. However, Le Nest et al. [98] have investigated the electrochemical oxidation of catechin and EC amongst other flavonoids in Tris-buffered H2O–DMSO and H2O–EtOH solvent mixtures using Zn(II) acetate to stabilize the formed radicals. CV measurements showed an irreversible oxidation of the catechins with oxidation potentials in the range 0.20–0.29 V vs. saturated calomel electrode (SCE), dependent on the solvent. Measurements were performed using low concentrations (10⁻⁴ M) of the polyphenols. Addition of Zn(II) acetate lowered the potential gap (ΔE) and stabilized the semiquinone radical by

complexation, whereas no complexation occurred in the reduced forms of catechin and EC. EPR spectra of in situ electrochemically oxidized catechin and EC were only detected in the presence of Zn(II) acetate and were similar to the polyphenolic radicals produced by autoxidation.

Reactions of green tea and polyphenols with transition metal ions

As shown in Fig. 12A, and discussed briefly in section ‘Free radical chemistry of green tea’, tea leaves contain substantial quantities of the transition metal ions, Fe(III) and Mn(II), the former apparently mainly as storage protein, along with some mononuclear complexes. The Mn is easily extracted into aqueous solution where its spectrum is identical to that of the Mn(H2O)6²⁺ ion (Fig. 12B), but we do not see it in the alkaline solutions that are used for autoxidation studies. The presence of substantial natural concentrations of these metals and their consequent reaction potential have been largely ignored in investigations of the influence of transition metals on the chemistry of tea. Both Fe and Mn, for example, would be expected to support Fenton reaction chemistry.

Metal chelation can also be used as an effective method for controlling polyphenol reactions, an example being the addition of divalent metals, such as Zn(II), Cd(II), or Hg(II) to stabilize the initial radicals, to inhibit further oxidation and to facilitate investigations of the radicals (e.g., [96,101]). In as yet unpublished work, we have shown that the addition of Cu(II) to green tea has an appreciable effect on the time-dependent intensity of the spectra from the various semiquinone radicals produced during alkaline autoxidation (see Fig. 13). However, in contrast to the reports in the biological literature that indicate enhanced oxidation of the polyphenols (see section ‘Biological functions of flavonoids’), the present results indicate a destabilization of the semiquinone radical signal (Fig. 20) that only becomes apparent after an appreciable lag period (around 15 min in the experiment shown in Fig. 20). Although one expects that a redox reaction

$$\mathrm{Cu(II)} + \text{polyphenol} \rightarrow \mathrm{Cu(I)} + \text{semiquinone} \qquad (16)$$

would stimulate free radical production, it would appear that further oxidation of the semiquinone by Cu(II) has a greater effect on the EPR spectral intensity. EPR spectra also show that excess Cu(II) is present in solution in complexed form(s) (Fig. 21). Figure 21 contains both isotropic and anisotropic Cu components, which probably reflect the range of sizes of the molecules in the solution. With high molecular mass compounds, rotation is slow compared to the timescale of the EPR transition and anisotropic spectra are observed that are comparable to those in a rigid matrix. Such spectra show features corresponding to different orientations of the principal axis relative

Fig. 20. Time-dependence of the EPR spectral intensity of an alkaline autoxidized hot water extract of green tea in the presence and absence of Cu(II) (our unpublished results). [Plot not reproduced; y-axis: 1/total integral (arbitrary units), 0.00–20.00; x-axis: time after reaction start (min), 0–40.]

to the magnetic field. Low molecular mass molecules tumble rapidly and averaged (isotropic) spectra are observed. In this spectrum, however, only the two highest field components of the quartet from the 63,65Cu ions are observed with confidence in the isotropic spectrum, because of line broadening to low field as a result of incomplete motional averaging of the spectra. Thus, the Cu(II) complex(es) here have a range of sizes, and none are small enough for their rotation in solution to completely average the EPR spectral parameters. In a simple aqueous extract of tea, no semiquinone radical signal was observed either in the absence or presence of Cu(II) under the conditions used to record Fig. 13, although as shown in Fig. 12B a very weak free radical signal is observed under extreme spectral acquisition conditions. Also, the Cu(II) spectrum (Fig. 22) from this sample was completely different from that seen at higher pH (Fig. 21). Although the spectrum in Fig. 22 contains considerable overlap between Cu(II) and Mn(II) components, it can be clearly seen that the Cu(II) spectrum is isotropic, although the shape (lines narrowing progressively with increasing field) indicates that motional averaging of the spectral parameters is still incomplete. Nevertheless, such a spectrum is characteristic of relatively low molecular mass complexes compared with the spectra at high pH. It should also be borne in mind that

Fig. 21. EPR spectrum from Cu(II) in an alkaline solution of green tea. Spectral acquisition parameters were 20 mW microwave power (MP), 100 kHz modulation frequency, 1 mT modulation amplitude (MA). This spectrum shows two components: (A) an isotropic feature with strongly anisotropic line broadening, and (B) an anisotropic feature seen by the presence of the two lowest field peaks from the Cu hfc on the g// feature. Linewidth variation with progressively increasing broadening to low field is a common feature of Cu(II) EPR spectra in fluid solution, because of the highly anisotropic nature of the basic signals from this 3d⁹ ion. The extent of this broadening, however, is a measure of the mobility of the ion, and hence a combination of solvent viscosity and the size of the molecule in which it is bound (our unpublished results). [Spectrum not reproduced; scale bar 10 mT; labeled features: giso and Aiso(Cu) in (A), free radical, g// and A//(Cu) in (B, shown × 10).]

there is likely a mixture of complexes present in the solution, and that each of the spectra is made up of contributions from the 63Cu and 65Cu isotopes in an approximate 7:3 ratio, the hfc constants for 65Cu being approximately 7% greater than those of 63Cu.

The historic work of Eaton [102] shows how effective the EPR technique can be in detecting the formation of complexes between metals and semiquinone radicals in simple systems. In a straightforward experiment in which alkaline autoxidation of catechol was performed in the presence of various divalent metal ions, spectra were obtained in which there was a distinct dependence of the hfc values on the metal, but otherwise they were qualitatively similar to that of the uncomplexed radical (Fig. 23A). The ratio of the coupling constants correlated well with the heats of hydration of the metal ions, and it is interesting that of the ions investigated, the value for Zn showed the greatest deviation from that of the uncomplexed radical ion (i.e., the strongest chelate). Metal hyperfine structure was observed when nuclei with non-zero spins were used. This is illustrated in Fig. 23B for the 89Y(III) complex, where 18 peaks are observed because of an additional splitting from the 89Y nucleus (I = 1/2). Al, Be, and B were all observed to stabilize catechol against alkaline autoxidation. Spectra obtained in the

Fig. 22. (A) 1st and (B) 2nd derivative recordings of the Cu(II) and Mn(II) signals in an aqueous extract of green tea with added Cu(II). Spectral acquisition parameters were 20 mW microwave power (MP), 100 kHz modulation frequency, 1 mT modulation amplitude (MA) (our unpublished results). [Spectra not reproduced; scale bar 10 mT; labeled features: Mn(II) and Cu(II).]

presence of methanol were complicated by reaction of the o-semiquinone radical with the solvent, which illustrates that solvent interactions may need to be considered when conducting this type of work.

In recent years there have been many reports in the biological literature concerning the roles of various metal ions in influencing the activity of polyphenols in general and GTP in particular, and these were discussed briefly in section ‘Biological functions of flavonoids’. In view of the fact that both redox and complexation (chelation) reactions can occur between transition metal ions and both polyphenols and their oxidation products, it is not surprising that apparently different types of effect have been observed. Both pro- and antioxidant behavior, for example, have been reported in roughly equal amounts. Some examples are given in the following sections. However, many recent EPR studies have concentrated on the roles of metals in the generation of •OH and O2•−, and there has been little by way of direct studies of the mechanism of bonding between metal ions and the polyphenols, quinones, and/or semiquinone radicals, although the technique is well suited to such investigations. Also, such reactions are likely to be of importance in studies of the effects of polyphenols on metalloenzymes.

The prooxidant activity of phenols is dependent on a combination of their metal-reducing properties, chelation behavior, and O2-reduction ability, the last of these being directly related to the stability of the phenoxyl radicals

Fig. 23. EPR spectra of the o-semiquinone radical in the absence (A) and presence (B) of Y(III), redrawn from Eaton [102]. Spectral interpretations, with the a(1H) and a(89Y) splittings marked, are shown by "stick" diagrams.

produced by one-electron oxidation of the phenols. Such behavior was explained by Sakihama et al. [101] in terms of metal-induced Fenton chemistry (see section ‘Introduction to the chemistry of GTP oxidation’). They considered that initial oxidation of catechols by, e.g., Cu(II) generates semiquinone radicals that can react with O2 to form O2•−. O2•− can then produce H2O2, either by disproportionation or by attack on the parent polyphenol. In the presence of Cu(I), H2O2 undergoes Fenton chemistry to generate the •OH radical, which is the agent that causes DNA damage. Generation of •OH was confirmed using the spin trapping method by adding DMSO to the system and then using 4-POBN to trap the methyl radical produced from it. Hu and Kitts [103] found that EGCG behaved as a prooxidant in the presence of Cu(II), and attributed this behavior to the redox reaction which produced Cu(I) and the EGCG cation radical. Cu(I) and H2O2 initiate the Fenton reaction, which increases oxidation of GTP. These reports should be compared with the results obtained with tea (Fig. 20), where Cu appeared to have little effect on the generation of

semiquinone radicals, but after a period of time accelerated their decomposition. It can be speculated that the lag period could correspond to the time needed to generate sufficient quantities of Cu(I) and H2O2 for substantial •OH radical production, which could be responsible for oxidation of the semiquinone radicals to diamagnetic species, such as quinones.

In a study of the effects of catechins on •OH generation in the Cu(II)/H2O2 and Fe(II)/H2O2 reaction systems, Kashima [104] has concluded that inhibition of •OH generation in the Cu system was the result of reaction between the catechins and Cu, whereas in the Fe system the catechins reacted with the •OH and there was little chelation of the Fe. However, in these systems a redox reaction between the Cu and the polyphenol would have been an essential initial step in •OH formation, whereas this would have been unnecessary in the Fe case.

In a different approach to the study of metal–phenol interactions, Sugihara et al. [105] investigated the roles of various GTP in inhibiting Fe- or Cu-induced lipid peroxidation in cultured hepatocytes. Differences in the antioxidative efficiency were in the order EGCG > ECG > EGC > EC. However, in contrast to BHT (a lipid free-radical scavenger) the antioxidative potential of the GTP varied with different metals, suggesting that their metal-chelating abilities might be a major factor in determining their chemical behavior. Brown et al. [106] considered the structural factors that affect the interaction between Cu(II) and flavonoids (quercetin, rutin, kaempferol, and luteolin) with respect to both chelation and modification through oxidation. 
Each of the flavonoids provided dose-dependent protection of low density lipoproteins (LDL) against Cu(II)-induced oxidation, but their effectiveness was related firstly to the presence of 3′,4′-dihydroxy substitution in the B-ring to facilitate Cu chelation, and secondly to the presence of a 3-OH group in the C-ring to enhance oxidation.

With the Cu(II)–EGCG system, EPR spectra show strongly pH-dependent behavior (e.g., Fig. 24). The spectra at low and high pH correspond to the solvated Cu(II) and [Cu(OH)4]2− ions, respectively, but at intermediate pH a number of different Cu complexes can exist. This is illustrated better in frozen solution spectra (e.g., Fig. 25), where the g∥ features from at least two different Cu components are observed. Thus the Cu chemistry is considerably more complicated than is often assumed. These results appear to be similar to those in a report by Zhang et al. [107], but we have not been able to access an English language version of this Chinese language publication.

There are relatively few investigations of metal-induced oxidation of polyphenols in acidic solutions, but Jungbluth et al. [108] report reactions of some flavonols, flavones, and 3-methoxyflavones with Cu(II), Fe(II), and Fe(III) in acidic aqueous media. With Cu(II) and Fe(III), flavonols were oxidized to 2-(hydroxybenzoyl)-2-hydroxybenzofuran-3(2H)-ones. The reaction also occurred with Fe(II), but an initial oxidation of the metal

Fig. 24. pH-dependence of EPR spectra from epigallocatechin gallate (EGCG) and Cu(II) in systems buffered to (A) pH 3, showing the spectrum of the [Cu(aq)]2+ ion, (B) pH 7, showing unidentified Cu(II) complexes, and (C) pH 14, showing the spectrum of the [Cu(OH)4]2− ion (scale bar: 10 mT). Spectra were acquired using 20 mW microwave power (MP), 100 kHz modulation frequency, 1 mT modulation amplitude (MA) (our unpublished results).

(by atmospheric O2) was required. More recently, Ryan and Hynes [109] have investigated the kinetics and mechanisms of complex formation between Fe(III) and the GTP EGCG and ECG in the pH range 1–3 with the metal present in excess, and reported the formation of complexes with 2:1 metal:ligand ratios for both EGCG and ECG. These results suggest that complexation occurs on both the B- and D-rings when the metal is present in

Fig. 25. EPR spectrum of a frozen solution (210 K) of EGCG and Cu(II) in the ratio 5:1, in an unbuffered solution, where at least two different Cu components can be distinguished (scale bar: 10 mT). Tentative assignments for the peaks associated with the g∥ features of the two Cu(II) complexes are indicated by the stick diagrams. Spectral acquisition parameters were 2 mW microwave power (MP), 100 kHz modulation frequency, 1 mT modulation amplitude (MA) (our unpublished results).

excess. At the low pH values used in this work, Fe(III) and [Fe(OH)]2+ were the major species present in the solutions. In contrast to the results from Ryan and Hynes [109], Guo et al. [1] reported the ratio of EGC to Fe(III) to be 3:2, that of EGCG or ECG to Fe(III) to be 2:1 and that of EC to Fe(III) to be 3:1.

Andrade et al. [110] have reported that tannic acid inhibits Fe-dependent free radical formation in vitro. They concluded that tannic acid operates mainly by inhibiting Fe(III)-induced ascorbate oxidation and Fe(II) autoxidation, thus limiting the formation of components of the Fenton reaction system. However, they also state that tannic acid exhibits •OH trapping activity.

Other studies of interactions between Fe(III) and catecholates have been reported and a variety of different mechanisms observed. However, this type of reaction is complex, as illustrated by an investigation of the simple reaction of Fe(III) with catechol [111]. The initial reaction involved reduction of Fe(III) to Fe(II) and a lowering of the solution pH, as a result of the displacement of the weakly acidic phenolic H atoms by the metal ion. During subsequent oxidation of the Fe(II) chelate using H2O2, Fe(II) oxalate was observed as an intermediate product, indicating that ligand breakdown occurred in advance of Fe oxidation. With chelation and redox chemistry of the metal and redox and free radical chemistry of the ligands all being factors in the reaction progress, it is perhaps unsurprising that so many different types of reaction have been reported.
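For orientation, the metal-mediated sequence invoked in this section (Cu(II) oxidation of a catechol, superoxide formation, H2O2 production by either of the two routes mentioned, and Fenton-type generation of the hydroxyl radical) can be written schematically as below. This is a generic textbook formulation rather than a scheme taken from the cited papers; the symbols QH2 (catechol-type polyphenol), SQ•− (semiquinone radical anion) and Q (quinone) are notation assumed here for illustration, and protonation states will vary with pH.

```latex
\begin{align}
\mathrm{Cu^{2+} + QH_2} &\rightarrow \mathrm{Cu^{+} + SQ^{\bullet -} + 2\,H^{+}}
  && \text{(metal reduction, semiquinone formation)} \\
\mathrm{SQ^{\bullet -} + O_2} &\rightarrow \mathrm{Q + O_2^{\bullet -}}
  && \text{(superoxide formation)} \\
\mathrm{2\,O_2^{\bullet -} + 2\,H^{+}} &\rightarrow \mathrm{H_2O_2 + O_2}
  && \text{(disproportionation)} \\
\mathrm{O_2^{\bullet -} + QH_2} &\rightarrow \mathrm{H_2O_2 + SQ^{\bullet -}}
  && \text{(attack on the parent polyphenol)} \\
\mathrm{Cu^{+} + H_2O_2} &\rightarrow \mathrm{Cu^{2+} + {}^{\bullet}OH + OH^{-}}
  && \text{(Fenton-type step)}
\end{align}
```

Written this way, the need for an initial Cu(II)/polyphenol redox step before any •OH can form is explicit, whereas Fe(II) can enter the final step directly.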

Conclusions and forward look

There is currently enormous scientific interest in polyphenols in general and GTP in particular as a result of a wealth of epidemiological studies indicating that foods rich in these chemicals have beneficial health effects. The chemical basis for such observations is, however, still poorly understood. There is increasing evidence that the concept of explaining the biological behavior of polyphenols in terms of their antioxidant properties, which was popular during the 1990s, is too simplistic, and there is a need to develop our knowledge of the chemical behavior of different molecules in order to even begin to understand their biological properties.

The use of EPR spectroscopy to investigate the oxidative free radical chemistry of GTP dates back almost 40 years. In early work it was considered that mechanisms of oxidation were independent of pH, and measurements generally focused on alkaline autoxidation, since the oxidation rate is faster at high pH values. In the past decade there has been increased interest in polyphenol chemistry, as awareness of the biological importance of such molecules has grown. Unfortunately, some investigations of their chemical properties seem to have been conducted at a superficial level (e.g., conclusions drawn from measurements of antioxidant properties), and recent progress in our understanding of the chemical behavior of natural polyphenols has been relatively slow. It is in addressing such problems, however, that EPR spectroscopy has particular strength.

In this review article we have aimed to provide the reader with an appreciation of the potential applications of the EPR technique in advancing our knowledge of the free radical chemistry of GTP, while at the same time critically discussing the importance of carefully designing the EPR experiments in order to fully realize the potential of this technique. 
We have introduced various physical procedures that can be used to facilitate the study of the chemistry of unstable free radicals, but which have not so far been well utilized in investigations of GTP. For example, in the future we expect to see more publications utilizing a combination of EPR and electrochemical methods in the investigation of GTP redox reactions.

The primary mechanisms of alkaline autoxidation of polyphenols are now quite well understood, although many secondary products that result from degradation and/or di- or polymerization reactions have still not been characterized. It should be appreciated, however, that even the most basic spectral properties, such as the magnitudes of hfc constants, radical stability, and spectral resolution, are all dependent on the composition of the solvent system used in the measurements. In view of these observations, it is perhaps not surprising that there is now an increasing awareness that the mechanisms of oxidation of polyphenols are also dependent on a variety of conditions that include pH and the oxidizing agent. An important issue for future research will be to link the knowledge of chemical reactions

that is determined from measurements using simple systems to reactions taking place under the conditions in which the biologically derived product is used. With respect to understanding the properties of tea, it is important that chemical measurements include observations that are relevant to reactions that could occur in a cup of tea, for example, or even in the digestive system, if its health properties are to be understood. In the future, we expect to see more reports of results from such experiments.

One continually recurring problem in EPR investigations of polyphenols is the reliability of spectral interpretation, because of the large numbers of peaks that may be observed. Where there is sufficient sensitivity, related double resonance techniques may be of value in simplifying the spectra, and numerical methods are readily available for refining the values of the measured hfc constants. Frequently, however, interpretations rely on a combination of the experience and chemical intuition of the spectroscopist, and as shown by the number of conflicting interpretations in the literature, there is seldom absolute proof for a particular assignment with complex radicals. Now, with the increasing availability of cheap and powerful computers, it is possible to use numerical methods to (at least partially) address this problem. We expect, therefore, that in the future spectral interpretations will be justified by ab initio calculations of electronic densities, especially on the relatively small radicals that are the primary products of oxidation of defined polyphenols. Increased confidence in our understanding of the basic chemical processes will then lay the foundations for development of the science on more complex (biological) systems. 
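The numerical treatment of hfc constants mentioned above rests on a simple forward model: to first order, each nucleus of spin I shifts the resonance by a·m_I, and the spectrum is the set of all such combinations. The sketch below is a minimal illustration of that model, not the procedure used by any of the cited authors. The function name `stick_diagram` and the coupling values are hypothetical, chosen only to reproduce the pattern of two pairs of equivalent protons (9 lines) that doubles to 18 lines when a single I = 1/2 nucleus such as 89Y is added, as described for Fig. 23B.

```python
from collections import Counter
from itertools import product


def stick_diagram(couplings, centre=0.0, ndigits=6):
    """First-order hyperfine stick diagram.

    couplings: list of (a_mT, spin_I, n_equivalent) tuples.
    Returns {field offset in mT: relative intensity}, sorted by field.
    """
    # Expand each group of equivalent nuclei into individual nuclei,
    # each with its allowed m_I values: -I, -I+1, ..., +I.
    nuclei = []
    for a, spin, n in couplings:
        two_i = round(2 * spin)
        m_values = [k / 2 for k in range(-two_i, two_i + 1, 2)]
        nuclei.extend([(a, m_values)] * n)

    # Every combination of m_I values gives one transition; coincident
    # transitions add up to give the relative line intensities.
    sticks = Counter()
    for combo in product(*(m for _, m in nuclei)):
        shift = sum(a * m for (a, _), m in zip(nuclei, combo))
        sticks[round(centre + shift, ndigits)] += 1
    return dict(sorted(sticks.items()))


# Illustrative (assumed) coupling constants in mT, not measured values.
free_radical = [(0.37, 0.5, 2), (0.08, 0.5, 2)]   # two pairs of equivalent 1H
with_metal = free_radical + [(0.05, 0.5, 1)]      # plus one I = 1/2 metal nucleus

print(len(stick_diagram(free_radical)))  # 9 lines
print(len(stick_diagram(with_metal)))    # 18 lines
```

A fitting routine would vary the coupling constants until the simulated positions and intensities match the measured spectrum; line widths, second-order shifts and isotope mixtures (e.g., 63Cu/65Cu) are deliberately left out of this sketch.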
The empirical evidence of the biological activity of polyphenolic compounds in general, and GTP in particular, has created an urgency for understanding their modes of action, and thus for being able to use them with confidence for human benefit. This could take a variety of routes ranging from the development of new products for use as medicines or control of microorganisms through to the production of food products that contribute to the maintenance of good health. Increased knowledge about the chemistry of polyphenols will also contribute substantially to a better understanding of the biochemical mechanisms of polyphenols in plants – the organisms that produce them. Since many polyphenols are glycosylated, investigation of the effect of glycosylation on the biological activity of polyphenols should be an important topic of future research. Understanding the biochemical mechanisms and target molecules of polyphenols in plants may help to identify analogous reaction partners in the human organism.

Food quality is an important issue for consumers and consequently also for producers, especially now that the world enjoys greater diversity and availability of products than at any time in human history. Unfortunately, "quality" is such a vague term that it conveys different meanings to different people. Nevertheless, the food industry is highly competitive and there is a natural desire by food producers to be able to market their products in terms

of identified quality criteria. Simple concepts, such as "antioxidant potential", are therefore attractive, because they can be easily measured, and were rapidly adopted before even a basic understanding of the underlying science was developed. The plethora of dietary supplements that are now marketed in health food shops with little or no knowledge of either their chemical or biological behavior is testament to this practice.

At the food level, we now have the ability to influence the composition of plant-derived foods through agricultural management (e.g., by classical breeding methods), or by genetic manipulation. However, in order for such manipulations to be useful, it is essential to fully understand the functions (both in planta and in humans) of molecules whose concentrations are changed. It is our fear that major manipulations of molecular composition will be adopted on the basis of oversimplified or incorrect understanding of the relevant science. Thus with tea, it is imperative that a thorough understanding of the chemistry of the GTP is developed in advance of any major program to modify the levels at which they are produced in planta.

Acknowledgments

This work was funded by the Austrian Ministry of Traffic, Innovation and Technology (BMVIT) and the Austrian Science Fund (FWF). We thank Daniel Tunega from the University of Vienna for the DFT calculations used in Table 2.

References

1. Guo Q, Zhao B, Li M, Shen S and Xin W. Studies on protective mechanisms of four components of green tea polyphenols against lipid peroxidation in synaptosomes. Biochim Biophys Acta 1996;1304:210–222.
2. Guo Q, Zhao B, Shen S, Hou J, Hu J and Xin W. ESR study on the structure–antioxidant activity relationship of tea catechins and their epimers. Biochim Biophys Acta 1999;1427:13–23.
3. Chen C-W and Ho C-T. Antioxidant properties of polyphenols extracted from green and black teas. J Food Lipids 1995;2:35–46.
4. Chen C, Tang H-R, Sutcliffe LH and Belton PS. Green tea polyphenols react with 1,1-diphenyl-2-picrylhydrazyl free radicals in the bilayer of liposomes: direct evidence from electron spin resonance studies. J Agric Food Chem 2000;48:5710–5714.
5. Unno T, Sugimoto A and Kakuda T. Scavenging effect of tea catechins and their epimers on superoxide anion radicals generated by a hypoxanthine and xanthine oxidase system. J Sci Food Agric 2000;80:601–606.
6. Unno T, Yayabe F, Hayakawa T and Tsuge H. Electron spin resonance spectroscopic evaluation of scavenging activity of tea catechins on superoxide radicals generated by a phenazine methosulfate and NADH system. Food Chem 2002;76:259–265.
7. Babior MB. NADPH oxidase: an update. Blood 1999;93:1464–1476.
8. Hückelhoven R and Kogel K-H. Reactive oxygen intermediates in plant–microbe interactions: who is who in powdery mildew resistance. Planta 2003;216:891–902.

9. Schopfer P, Liszkay A, Bechtold M, Frahry G and Wagner A. Evidence that hydroxyl radicals mediate auxin-induced extension growth. Planta 2002;214:821–828.
10. Fry SC, Dumville JC and Miller JG. Fingerprinting of polysaccharides attacked by hydroxyl radicals in vitro and in the cell walls of ripening pear fruit. Biochem J 2001;357:729–737.
11. Mittler R. Oxidative stress, antioxidants and stress tolerance. Trends Plant Sci 2002;7:405–410.
12. Møller IM. Plant mitochondria and oxidative stress: electron transport, NADPH turnover, and metabolism of reactive oxygen species. Annu Rev Plant Physiol Plant Mol Biol 2001;52:561–591.
13. Terada LS, Leff JA and Repine JE. Measurement of xanthine oxidase in biological tissue. Methods Enzymol 1990;186:651–656.
14. Symons MCR and Gutteridge JMC. Free Radicals and Iron: Chemistry, Biology and Medicine, Oxford, Oxford University Press, 1998.
15. Halliwell B and Gutteridge JMC. Oxygen toxicity, oxygen radicals, transition metals and disease. Biochem J 1984;219:1–14.
16. Smirnoff N. The role of active oxygen in the response of plants to water deficit and desiccation. New Phytologist 1993;125:27–58.
17. Neill S, Desikan R and Hancock J. Hydrogen peroxide signalling. Curr Opin Plant Biol 2002;5:388–395.
18. Jakopitsch C, Wanasinghe A, Jantschko W, Furtmüller PG and Obinger C. Kinetics of interconversion of ferrous enzymes, compound II and compound III, of wild-type Synechocystis catalase-peroxidase and Y249F. J Biol Chem 2005;280:9037–9042.
19. Chen S-X and Schopfer P. Hydroxyl-radical production in physiological reactions. Eur J Biochem 1999;260:726–735.
20. Dew TP, Day AJ and Morgan MRA. Xanthine oxidase activity in vitro: effects of food extracts and components. J Agric Food Chem 2005;53:6510–6515.
21. Aucamp J, Gaspar A, Hara Y and Apostolides Z. Inhibition of xanthine oxidase by catechins from tea (Camellia sinensis). Anticancer Res 1997;17:4381–4386.
22. Lin J-K, Chen P-C, Ho C-T and Lin-Shiau S-Y. Inhibition of xanthine oxidase and suppression of intracellular reactive oxygen species in HL-60 cells by theaflavin-3,3′-digallate, (-)-epigallocatechin-3-gallate, and propyl gallate. J Agric Food Chem 2000;48:2736–2743.
23. Huang Q, Huang Q, Pinto RA, Kai G, Schweitzer-Stenner R and Weber WJ, Jr. Inactivation of horseradish peroxidase by phenoxyl radical attack. J Am Chem Soc 2005;127:1431–1437.
24. Kondo K, Kurihara M, Miyata N, Suzuki T and Toyoda M. Scavenging mechanisms of (-)-epigallocatechin gallate and (-)-epicatechin gallate on peroxyl radicals and formation of superoxide during the inhibitory action. Free Radic Biol Med 1999;27:855–863.
25. Nakayama T, Ichiba M, Kuwabara M, Kajiya K and Kumazawa S. Mechanisms and structural specificity of hydrogen peroxide formation during oxidation of catechins. Food Sci Technol Res 2002;8:261–267.
26. Mochizuki M, Yamazaki S, Kano K and Ikeda T. Kinetic analysis and mechanistic aspects of autoxidation of catechins. Biochim Biophys Acta 2002;1569:35–44.
27. Heim KE, Tagliaferro AR and Bobilya DJ. Flavonoid antioxidants: chemistry, metabolism and structure–activity relationships. J Nutr Biochem 2002;13:572–584.
28. Douglas CJ. Phenylpropanoid metabolism and lignin biosynthesis: from weeds to trees. Trends Plant Sci 1996;1:171–178.

29. Shirley BW. Flavonoid biosynthesis: ‘new’ functions for an ‘old’ pathway. Trends Plant Sci 1996;1:377–382.
30. Ho C-T and Zhu N. The chemistry of tea. In: Caffeinated beverages, Parliament TH, Ho C-T and Schieberle P (eds), ACS Symposium Series, Washington DC, American Chemical Society, 2000, pp. 316–326.
31. Jansen MAK, Gaba V and Greenberg BM. Higher plants and UV-B radiation: balancing damage, repair and acclimation. Trends Plant Sci 1998;3:131–135.
32. Agati G, Matteini P, Goti A and Tattini M. Chloroplast-located flavonoids can scavenge singlet oxygen. New Phytologist 2007;147:77–89.
33. Taylor LP and Grotewold E. Flavonoids as developmental regulators. Curr Opin Plant Biol 2005;8:317–323.
34. Pourcel L, Routaboul J-M, Cheynier V, Lepiniec L and Debeaujon I. Flavonoid oxidation in plants: from biochemical properties to physiological functions. Trends Plant Sci 2006;12:29–36.
35. Simmonds MSJ. Flavonoid-insect interactions: recent advances in our knowledge. Phytochemistry 2003;64:21–30.
36. Grayer RJ and Harborne JB. A survey of antifungal compounds from higher plants. Phytochemistry 1994;37:19–42.
37. Collingborn FMB, Gowen SR and Mueller-Harvey I. Investigations into the biochemical basis of nematode resistance in roots of three Musa cultivars in response to Radopholus similis infection. J Agric Food Chem 2000;48:5297–5301.
38. Treutter D. Significance of flavonoids in plant resistance: a review. Environ Chem Lett 2006;4:147–157.
39. Dixon RA, Xie D-Y and Sharma SB. Proanthocyanidins – a final frontier in flavonoid research? New Phytologist 2005;165:9–28.
40. Hernández I, Alegre L and Munné-Bosch S. Enhanced oxidation of flavan-3-ols and proanthocyanidin accumulation in water-stressed tea plants. Phytochemistry 2006;67:1120–1126.
41. Rice-Evans CA, Miller NJ and Paganga G. Antioxidant properties of phenolic compounds. Trends Plant Sci 1997;2:152–159.
42. Winter CK, Segall HJ and Haddon WF. Formation of cyclic adducts of deoxyguanosine with the aldehydes trans-4-hydroxy-2-hexenal and trans-4-hydroxy-2-nonenal in vitro. Cancer Res 1986;46:5682–5686.
43. Zollner H, Schaur RJ and Esterbauer H. Biological activities of 4-hydroxyalkenals. In: Oxidative stress: oxidants and antioxidants, Sies H (ed), London, Academic Press, 1991, pp. 337–369.
44. Grassmann J, Hippeli S and Elstner EF. Plant’s defence and its benefits for animals and medicine: role of phenolics and terpenoids in avoiding oxygen stress. Plant Physiol Biochem 2002;40:471–478.
45. Winkel-Shirley B. Biosynthesis of flavonoids and effects of stress. Curr Opin Plant Biol 2002;5:218–223.
46. Dixon RA. Phytoestrogens. Annu Rev Plant Biol 2004;55:225–261.
47. Wood JG, Rogina B, Lavu S, Howitz K, Helf SL, Tatar M and Sinclair D. Sirtuin activators mimic caloric restriction and delay ageing in metazoans. Nature 2004;430:686–689.
48. McKay DL and Blumberg JB. Roles for epigallocatechin gallate in cardiovascular disease and obesity: an introduction. J Am Coll Nutr 2007;26:362S–365S.
49. Yang CS, Maliakal P and Meng X. Inhibition of carcinogenesis by tea. Annu Rev Pharmacol Toxicol 2002;42:25–54.
50. Surh Y-J. NF-κB and AP-1 as molecular targets for chemoprevention with EGCG, a review. Environ Chem Lett 2006;4:137–141.

51. Yang F, Oz HS, Barve S, De Villiers WJS, McClain CJ and Varilek GW. Green tea polyphenol (-)-epigallocatechin-3-gallate blocks nuclear factor-κB activation by inhibiting IκB kinase activity in the intestinal epithelial cell line IEC-6. Mol Pharmacol 2001;60:528–533.
52. Malik A, Azam S, Hadi N and Hadi SM. DNA degradation by water extract of green tea in the presence of copper ions: implications for anticancer properties. Phytother Res 2003;17:358–363.
53. Bors W, Michel C and Stettmaier K. Electron paramagnetic resonance studies of radical species of proanthocyanidins and gallate esters. Arch Biochem Biophys 2000;374:347–355.
54. Kagaya N, Kawase M, Maeda H, Tagawa Y, Nagashima H, Ohmori H and Yagi K. Enhancing effect of zinc on hepatoprotectivity of epigallocatechin gallate in isolated rat hepatocytes. Biol Pharm Bull 2002;25:1156–1160.
55. Yoshioka H, Senba Y, Saito K, Kimura T and Hayakawa F. Spin trapping of the hydroxyl radical formed from a tea catechin-Cu2+ system. Biosci Biotechnol Biochem 2001;65:1697–1706.
56. Chen X, Yu H, Shen S and Yin J. Role of Zn2+ in epigallocatechin gallate affecting the growth of PC-3 cells. J Trace Elem Med Biol 2007;21:125–131.
57. Furukawa A, Oikawa S, Murata M, Hiraku Y and Kawanishi S. (-)-Epigallocatechin gallate causes oxidative damage to isolated and cellular DNA. Biochem Pharmacol 2003;66:1769–1778.
58. Zheng X, Chen A, Hoshi T, Anzai J and Li G. Electrochemical studies of (-)-epigallocatechin gallate and its interaction with DNA. Anal Bioanal Chem 2006;386:1913–1919.
59. Yu H, Yin J-H and Shen S-R. Growth inhibition of prostate cancer cells by epigallocatechin gallate in the presence of Cu+. J Agric Food Chem 2004;52:462–466.
60. Azam S, Hadi N, Khan NU and Hadi SM. Prooxidant property of green tea polyphenols epicatechin and epigallocatechin-3-gallate: implications for anticancer properties. Toxicol In Vitro 2004;18:555–561.
61. Shi X, Ye J, Leonard SS, Ding M, Vallyathan V, Castranova V, Rojanasakul Y and Dong Z. Antioxidant properties of (-)-epicatechin-3-gallate and its inhibition of Cr(VI)-induced DNA damage and Cr(IV)- or TPA-stimulated NF-κB activation. Mol Cell Biochem 2000;206:125–132.
62. Weinreb O, Mandel S, Amit T and Youdim MBH. Neurological mechanisms of green tea polyphenols in Alzheimer’s and Parkinson’s diseases. J Nutr Biochem 2004;15:506–516.
63. Mandel S, Weinreb O, Reznichenko L, Kalfon L and Amit T. Green tea catechins as brain-permeable, non-toxic iron chelators to "iron out iron" from the brain. J Neural Transm Suppl 2006;71:249–257.
64. Mandel S, Amit T, Zheng H, Weinreb O and Youdim MBH. The essentiality of iron chelation in neuroprotection: a potential role of green tea catechins. Oxidat Stress Disease 2006;22:277–299.
65. Kälin M, Gromov I and Schweiger A. The continuous wave electron paramagnetic resonance experiment revisited. J Magn Reson 2003;160:166–182.
66. Loth K and Graf F. Structure and dynamics of intramolecular hydrogen bonds in radicals: substituent, steric and solvent effects. Helv Chim Acta 1981;64:1910–1929.
67. Compton RG and Waller AM. ESR spectroscopy of electrode processes. In: Cyclic voltammetry, Gosser DA (ed), Weinheim, VCH – Verl.-Ges., 1993, pp. 349–398.
68. Buettner GR. Spin trapping: ESR parameters of spin adducts. Free Radic Biol Med 1987;3:259–303.

69. Fréjaville C, Karoui H, Tuccio B, Le Moigne F, Culcasi M, Pietri S, Lauricella R and Tordo P. 5-(Diethoxyphosphoryl)-5-methyl-1-pyrroline-N-oxide: a new efficient phosphorylated nitrone for the in vitro and in vivo trapping of oxygen-centred radicals. J Med Chem 1995;38:258–265.
70. Gardner PT, McPhail DB and Duthie G. Electron spin resonance spectroscopy assessment of the antioxidant potential of teas in aqueous and organic media. J Sci Food Agric 1998;76:257–262.
71. Polovka M, Brezová V and Stasko A. Antioxidant properties of tea investigated by EPR spectroscopy. Biophys Chem 2003;106:39–56.
72. Halliwell B, Aeschbach R, Löliger J and Aruoma OI. The characterization of antioxidants. Food Chem Toxicol 1995;33:601–617.
73. Hollman PCH and Arts ICW. Flavonols, flavones and flavanols – nature, occurrence and dietary burden. J Sci Food Agric 2000;80:1081–1093.
74. Yen G-C and Chen H-Y. Antioxidant activity of various tea extracts in relation to their antimutagenicity. J Agric Food Chem 1995;43:27–32.
75. Davies MJ, Judd JT, Baer DJ, Clevidence BA, Paul DR, Edwards AJ, Wiseman SA, Muesing RA and Chen SC. Black tea consumption reduces total and LDL cholesterol in mildly hypercholesterolemic adults. J Nutr 2003;133:3298S–3302S.
76. Muckenschnabel I, Williamson B, Goodman BA, Lyon GD, Steward D and Deighton N. Markers for oxidative stress associated with soft rots in French beans (Phaseolus vulgaris) infected by Botrytis cinerea. Planta 2001;212:376–381.
77. Goodman BA and Reichenauer TG. Formation of paramagnetic products in leaves of wheat, Triticum aestivum, as a result of ozone-induced stress. J Sci Food Agric 2003;83:1248–1255.
78. Morsy MA and Khaled MM. Novel EPR characterization of the antioxidant activity of tea leaves. Spectrochimica Acta (Part A) 2002;58:1271–1277.
79. Yoshioka H, Rin K and Tsuyumu S. Light emission and the formation of radicals after grinding tea leaves. Agric Biol Chem 1990;54:3105–3110.
80. Kuhnle JA, Windle JJ and Wais AC. Electron paramagnetic resonance spectra of flavonoid anion-radicals. J Chem Soc B 1969;613–616.
81. Jensen ON and Pedersen JA. The oxidative transformations of (+)-catechin and (-)-epicatechin as studied by ESR. Formation of hydroxycatechinic acid. Tetrahedron 1983;39:1609–1615.
82. Sears KD, Casebier RL and Herbert HL. The structure of catechinic acid. A base rearrangement product of catechin. J Org Chem 1974;39:3244–3247.
83. Kiatgrajai P, Wellons JD, Gollob L and White JD. Kinetics of epimerisation of (+)-catechin and its rearrangement to catechinic acid. J Org Chem 1982;47:2910–2912.
84. Kennedy JA, Munro MHG, Powell HKJ, Porter LJ and Foo LY. The protonation reactions of catechin, epicatechin and related compounds. Aust J Chem 1984;37:885–892.
85. Pirker KF, Stolze K, Reichenauer TG, Nohl H and Goodman BA. Are the biological properties of kaempferol determined by its oxidation products? Free Radic Res 2006;40:513–521.
86. Yoshioka H, Sugiura K, Kawahara R, Fujita T, Makino M, Kamiya M and Tsuyumu S. Formation of radicals and chemiluminescence during the autoxidation of tea catechins. Agric Biol Chem 1991;55:2717–2723.
87. Oniki T and Takahama U. Free radicals produced by the oxidation of gallic acid and catechin derivatives. J Wood Sci 2004;50:545–547.
88. Pedersen JA. Handbook of EPR Spectra from Quinones and Quinols, Boca Raton, FL, CRC Press, Inc., 1985.

89. Hou Z, Sang S, You H, Lee M-J, Hong J, Chin K-V and Yang CS. Mechanism of action of (-)-epigallocatechin-3-gallate: auto-oxidation-dependent inactivation of epidermal growth factor receptor and direct effects on growth inhibition in human esophageal cancer KYSE cells. Cancer Res 2005;65:8049–8056.
90. Valcic S, Muders A, Jacobsen NE, Liebler DC and Timmermann BN. Antioxidant chemistry of green tea catechins. Identification of products of the reaction of (-)-epigallocatechin gallate with peroxyl radicals. Chem Res Toxicol 1999;12:382–386.
91. Cos P, Ying L, Calomme M, Hu JP, Cimanga K, Van Poel B, Pieters L, Vlietinck AJ and Berghe DV. Structure–activity relationship and classification of flavonoids as inhibitors of xanthine oxidase and superoxide scavengers. J Nat Prod 1998;61:71–76.
92. Furuno K, Akasako T and Sugihara N. The contribution of the pyrogallol moiety to the superoxide radical scavenging activity of flavonoids. Biol Pharm Bull 2002;25:19–23.
93. Lin C-M, Chen C-S, Chen C-T, Liang Y-C and Lin J-K. Molecular modeling of flavonoids that inhibits xanthine oxidase. Biochem Biophys Res Commun 2002;294:167–172.
94. Ponce AM, Blanco SE, Molina AS, García-Domenech R and Gálvez J. Study of the action of flavonoids on xanthine-oxidase by molecular topology. J Chem Inf Comput Sci 2000;40:1039–1045.
95. Valentine JS, Miksztal AR and Sawyer DT. Methods for the study of superoxide chemistry in nonaqueous solutions. Methods Enzymol 1984;105:71–81.
96. Hagerman AE, Dean RT and Davies MJ. Radical chemistry of epigallocatechin gallate and its relevance to protein damage. Arch Biochem Biophys 2003;414:115–120.
97. Kilmartin PA, Zou H and Waterhouse AL. A cyclic voltammetry method suitable for characterizing antioxidant properties of wine and wine phenolics. J Agric Food Chem 2001;49:1957–1965.
98. Le Nest G, Caille O, Woudstra M, Roche S, Burlat B, Belle V, Guigliarelli B and Lexa D. Zn-polyphenol chelation: complexes with quercetin, (+)-catechin, and derivatives: II Electrochemical and EPR studies. Inorganica Chim Acta 2004;357:2027–2037.
99. Yang B, Kotani A, Arai K and Kusu F. Relationship of electrochemical oxidation of catechins on their antioxidant activity in microsomal lipid peroxidation. Chem Pharm Bull 2001;49:747–751.
100. Kilmartin PA and Hsu CF. Characterisation of polyphenols in green, oolong, and black teas, and in coffee, using cyclic voltammetry. Food Chem 2003;82:501–512.
101. Sakihama Y, Cohen MF, Grace SC and Yamasaki H. Plant phenolic antioxidant and prooxidant activities: phenolics-induced oxidative damage mediated by metals in plants. Toxicology 2002;177:67–80.
102. Eaton DR. Complexing of metal ions with semiquinones. An electron spin resonance study. Inorg Chem 1964;3:1268–1271.
103. Hu C and Kitts DD. Evaluation of antioxidant activity of epigallocatechin gallate in biphasic model systems in vitro. Mol Cell Biochem 2001;218:147–155.
104. Kashima M. Effects of catechins on superoxide and hydroxyl radical. Chem Pharm Bull 1999;47:279–283.
105. Sugihara N, Ohnishi M, Imamura M and Furuno K. Differences in antioxidative efficiency of catechins in various metal-induced lipid peroxidations in cultured hepatocytes. J Health Sci 2001;47:99–106.
106. Brown JE, Khodr H, Hider RC and Rice-Evans CA. Structural dependence of flavonoid interactions with Cu2+ ions: implications for their antioxidant properties. Biochem J 1998;330:1173–1178.

107. Zhang Y, Shen S, Tang D, Chen X and Xu C. Study on EGCG complex with Cu2+. Zhejiang Daxue Xuebao, Nongye Yu Shengming Kexueban 2004;30:290–295.
108. Jungbluth G, Rühling I and Ternes W. Oxidation of flavonols with Cu(II), Fe(II) and Fe(III) in aqueous media. J Chem Soc Perkin Trans 2000;2:1946–1952.
109. Ryan P and Hynes MJ. The kinetics and mechanisms of the complex formation and antioxidant behaviour of the polyphenols EGCG and ECG with iron(III). J Inorg Biochem 2007;101:585–593.
110. Andrade RG, Ginani JS, Lopes GKB, Dutra F, Alonso A and Hermes-Lima M. Tannic acid inhibits in vitro iron-dependent free radical formation. Biochimie 2006;88:1287–1296.
111. McHardy WJ, Thomson AP and Goodman BA. Formation of iron oxides by decomposition of iron–phenolic chelates. J Soil Sci 1974;25:471–482.
112. Adams M, Boils MS, Jr. and Sands RH. Paramagnetic resonance spectra of some semiquinone free radicals. J Chem Phys 1958;28:774–776.
113. Pirker KF, Reichenauer TG, Goodman BA and Stolze K. Identification of oxidative processes during simulated mastication of uncooked foods using electron paramagnetic resonance spectroscopy. Anal Chim Acta 2004;520:69–77.


Critical review and appraisal of published clinical literature: Useful skill in biotechnology product development

MaryAnn Foote
MA Foote Associates, Westlake Village, CA 91362, USA

Abstract. Critical review of published literature may be necessary during several stages of biotechnology product development. The reviewer should develop a standardized method for reviewing and comparing published papers on a given topic and should be aware of common errors found in published papers.

Keywords: biopharma product development, literature search, medical writing, regulatory documentation.

Introduction

Scientists in the field of biotechnology often are required to critically review and appraise published clinical literature in the course of biopharma product development (Table 1). Critical appraisal is the careful assessment of the quality and credibility of the rationale, design, execution, interpretation, presentation, and application of published research, and the assessment of the accuracy, completeness, and clarity of the written report of the research. While a good working knowledge of statistics is helpful, critical evaluation is more than statistics, and a systematic approach is generally most useful in performing critical analysis. Critical analysis is a skill that can be learned. This chapter discusses the need for and the steps necessary to do a good critical analysis of published literature. Common errors seen in scientific publications are identified.

A literature search plan

Sometimes, only a single paper requires critical analysis for a given project, but usually the current literature base on a given topic requires analysis, particularly when a sponsor is providing information to a regulatory authority based on published literature. To begin, a sponsor should have a documented process to identify papers to be used in a critical analysis to support a regulatory issue. This documented prespecified process is very important because the results of any literature search depend on the databases searched, the terms used, the day the search was conducted, and the skill of the searcher. Searches using different terms or different databases

E-mail: [email protected] (MA. Foote). BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00014-8

© 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Table 1. Reasons why a critical analysis of literature may be needed in biopharma product development.
- To substantiate statements in regulatory documents, such as investigator’s brochures and protocols
- To evaluate the feasibility of using a marketed product in a potential new setting
- To identify and track unlabeled adverse events associated with a marketed product
- To understand a competitor’s product
- To understand a class of drugs, a disease, or both

Table 2. Hierarchy of literature for evidence-based reviews.
- Anecdotal case reports of single patients
- Case series without controls
- Case series with historical controls
- Analyses of clinical databases or registries
- Case-control (retrospective) studies
- Cohort (prospective) studies
- Single randomized controlled trials
- Confirmed randomized controlled trials
- Meta-analyses
- Cumulative meta-analyses

can produce very different results. A prespecified analysis can assist in showing that the papers reviewed and cited in a regulatory submission were chosen without bias. Some papers that used a standard process in identifying published papers may be helpful to sponsors who are establishing such a process [1,2]. It is important to understand that a hierarchy of literature sources exists in terms of levels of evidence (Table 2). Caution should be used in citing single case reports or uncontrolled case studies when attempting to make a point with a regulatory agency.
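The documented search process and the hierarchy in Table 2 can be captured in a small record-keeping structure. The sketch below is illustrative only: the names `SearchRecord`, `evidence_rank`, and `sort_by_evidence` are hypothetical, and it assumes that Table 2 lists study types from the weakest to the strongest level of evidence.

```python
from dataclasses import dataclass
from datetime import date

# Study types from Table 2. Assumption: the table runs from the weakest
# evidence (first entry) to the strongest evidence (last entry).
EVIDENCE_HIERARCHY = [
    "anecdotal case report",
    "case series without controls",
    "case series with historical controls",
    "clinical database or registry analysis",
    "case-control study",
    "cohort study",
    "single randomized controlled trial",
    "confirmed randomized controlled trial",
    "meta-analysis",
    "cumulative meta-analysis",
]


@dataclass(frozen=True)
class SearchRecord:
    """Hypothetical log entry documenting one prespecified literature search."""
    database: str   # database searched
    terms: tuple    # search terms used
    run_on: date    # day the search was conducted
    searcher: str   # person who ran the search


def evidence_rank(study_type: str) -> int:
    """Return the 1-based position of a study type in the hierarchy."""
    return EVIDENCE_HIERARCHY.index(study_type) + 1


def sort_by_evidence(papers):
    """Order (citation, study_type) pairs from strongest to weakest evidence."""
    return sorted(papers, key=lambda paper: evidence_rank(paper[1]), reverse=True)
```

Logging each search as a `SearchRecord` keeps the four factors named above (database, terms, date, searcher) with the results they produced, and ranking papers makes it easy to see when an argument rests mainly on case reports.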

Using a systematic approach to critical appraisal

Clinical papers are universally written and published using the IMRaD format – Introduction, Materials and Methods, Results, and Discussion – with the addition of other information (title page, abstract, and references). It is useful to keep notes as one proceeds through a paper [3]. Table 3 summarizes some points for a critical reviewer. The reviewer may find it useful to develop a standard form or table layout to capture the same information from all papers identified in the literature

Table 3. Questions to ask in a critical review of published papers.
(Each question to ask is followed by the concerns/what to look for.)

Identifying information

Who did the study and where was it published?
- The authors are familiar and reputable
- Financial disclosures, sponsorship, and affiliations are provided
- The journal is reputable and peer-reviewed
- It is clear when the paper was submitted, revised, and accepted
- The title adequately and completely describes the main topic or message

Introduction

What was studied, and why?
- The need or rationale for the study is clear

Is the purpose or goals of the study missing or unclear?
- Warning sign: The authors assume that readers will know the specific purpose or goals of the study

Materials and methods

What general type of study was conducted?
- The study design was appropriate to answer the question

What, exactly, was studied and how?
- The variables studied are clear and appropriate
- Description of the intervention(s) is adequate
- Identification or control of potential confounding variables is adequate
- Warning sign: Unvalidated surrogate endpoints (or “markers”) are used
- The unit of observation is clear and appropriate

How was sample selected?
- Warning sign: Sample selection may have been biased

How were participants assigned to groups?
- Bias may have been introduced when assigning participants to groups

How was the sample size determined?
- Warning sign: The sample size was inadequate to provide statistically significant results

How and under what conditions were the data collected?
- Warning sign: Errors in measurement contribute to variability

What statistical analyses were used?
- The analyses are reported completely
- Warning sign: It is not clear if the analysis was per protocol or intent to treat
- The stopping rules are clear
- Any interim analyses are revealed

Results

Not all patients or data are reported.
- Warning sign: Carelessness in conduct of study or intentional disregard to improve results is evident

Unplanned or “post hoc” analyses have been emphasized at the expense of the primary analyses.
- Warning sign: Subgroup or post hoc analyses are often based on minimal data and are suggested by results seen in the data

Results are reported inappropriately or inadequately.
- Both p values and 95% confidence intervals are reported

Discussion

What is known about the problem and its solution?
- Warning sign: References cited are dated, nonstandard (i.e., meeting abstracts), or questionable (i.e., in obscure journals or in foreign language)

What do the results mean?
- The results are interpreted correctly
- The study was adequately powered to show differences
- Warning sign: The baseline risk for an outcome was low
- Warning sign: The results have limited applicability to the general population

The implications of the study are not provided.
How will the study affect science or the practice of medicine?
- The results must confirm or refute the work of other scientists
- Warning sign: Study limitations are not discussed
- Warning sign: The conclusions are inconsistent with the problem studied, the aims of the study, or both
- Warning sign: The authors do not relate their work to clinical practice
- Warning sign: The results do not support the conclusions

Table 4. Sample standardized form used in a literature review to understand chemotherapy drugs used to treat patients with metastatic breast cancer.

Citation: Full citation should be given
Objective: Note what the authors state is the objective of the trial; if not given, it generally can be surmised
Patients: Supply the total number of patients
Intervention: Provide names of drugs and dosages
Study design: Can be briefly stated, e.g., patients randomly assigned to treatment with drug x for three 21-day cycles
Results: Provide the most significant results that are of interest to you. All papers should be evaluated for the same endpoints, even if some papers do not report them
  Disease-free survival –
  Overall survival –
  Time to progression –
  Safety –
Conclusion: Provide conclusion given by authors, not your conclusion of the results of the study; may make notation where you significantly disagree
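A standardized form such as Table 4 can also be kept as a structured record so that every paper is captured against the same fields and endpoints. The following is a minimal sketch, not a prescribed format: the class name `ReviewForm`, the method `endpoint_row`, and the "not reported" label are hypothetical choices, and the endpoint list simply mirrors the breast cancer example in Table 4.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Endpoints taken from the Table 4 example; a different review question
# would need a different prespecified list.
ENDPOINTS = (
    "disease-free survival",
    "overall survival",
    "time to progression",
    "safety",
)


@dataclass
class ReviewForm:
    """One row of a standardized literature review form (hypothetical layout)."""
    citation: str
    objective: str
    patients: int
    intervention: str
    study_design: str
    results: Dict[str, Optional[str]] = field(default_factory=dict)
    conclusion: str = ""  # the authors' conclusion, not the reviewer's

    def endpoint_row(self) -> Dict[str, str]:
        """Return every endpoint in a fixed order, flagging unreported ones."""
        return {name: self.results.get(name) or "not reported" for name in ENDPOINTS}
```

Because `endpoint_row` always returns the same endpoints in the same order, gaps in reporting show up explicitly when papers are compared side by side.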

search. It is important that the question to be answered in the search is well understood so that the standard form is appropriate (Table 4).

Identifying information

Critical appraisal begins with a quick review of the paper, the author, and the journal. The title of the paper should accurately and completely describe the population and setting or provide information about the main topic or message of the paper. Review of the authors and their affiliations may indicate known leaders in a given field, industry or academic affiliations, and funding for the study. Most journals publish conflict-of-interest statements to provide information about potential sources of bias. It is also interesting to note, if this information is provided, when the paper was submitted, revised, accepted, and published. A long lag between submission and acceptance may indicate that substantial revisions were required.

Abstract and keywords

The abstract should provide an abbreviated summary of the study and should allow the reader to determine whether the setting and patient

population were appropriate [4]. Structured abstracts (i.e., abstracts with many predetermined sections) often are precise and accurate. The keywords, if done properly, can be reflective of methods used and results obtained. Sometimes, keywords are MeSH terms (Medical Subject Headings; www.nlm.nih.gov/mesh/meshhome) and in any case provide more information than merely the words in the title.

Introduction

The introduction should be a relatively concise 3–4 paragraphs that move from general knowledge to the specific research question [5]. A good introduction will provide information about the patient population and the disease studied, what is known about the research question, type of study used to answer the research question (e.g., prospective, retrospective), and what is new and important about the study in particular. The paper being reviewed must state a hypothesis to be tested or a question to be answered. If this information is not given, it is impossible to critically review the materials, methods, statistical analyses, and results.

Materials and Methods

This section may be titled “Materials and Methods” or “Patients and Methods”. In any case, it is a section that greatly benefits from the use of multiple subheadings [3]. The methods section is usually one of the longest sections of a published paper because the paper must provide enough information to allow assessment of the validity of the study and the results, to allow another researcher to repeat and verify the results, and to allow a critical reviewer to understand if the appropriate tests were used [6]. This section should describe how the patients were recruited and selected and how they were randomly assigned to interventions. If patients were not randomly assigned to treatment groups, the allocation method should be noted in the review. The start and end dates of the study should be provided, particularly because in some fields the practice of medicine changes very quickly.
This section also should provide information on inclusion and exclusion criteria; use of placebos and controls; and how investigators, staff, and patients were blinded to treatment, how medication was masked, and how laboratory/radiographic data were blinded to clinical staff. Lack of this information suggests a lower level of evidence for the paper being reviewed. The primary and secondary endpoints must be prospectively stated. It is necessary to know how these endpoints were to be evaluated, when they were evaluated, and how the evaluation was done. A detailed statistical plan should be provided for assessment of the appropriateness of the tests and study design.

Results

The results section should mirror the methods section, that is, all methods cited must have a result and all results must have a corresponding method [3]. The results section should also provide the number of participants in the study and should indicate whether the treatment groups were sufficiently similar to be adequately compared. The paper should provide basic demographic information, the number of patients who started and completed the study, the number of patients who discontinued the study or died on study, and all adverse events and serious adverse events. A CONSORT figure (see www.consort-statement.org for free download of a figure template) follows the flow of patients through a trial. If these data are missing, the trial should be regarded as suspect. All patients must be accounted for, not only the patients who achieved the desired outcome. If the study had a drug or treatment intervention, it is important that the results provide data on the number of patients who were able to receive the required or expected number of doses. Any dose-limiting toxicities or identification of a maximum tolerated dose should be provided. Many papers omit critical safety information. It should be remembered that all drugs that are effective have some side effects in some patients. Only prespecified endpoints should be reported. Use of post hoc analyses suggests that the endpoints were not reached and that the authors struggled to find something to report. “Trends” in data are valid only if a statistical trend analysis was prespecified. Otherwise, the study has failed to reach statistical significance and is a “negative” or inconclusive study. Negative studies are useful, however, because they illustrate what should not have been done and allow other researchers to refine the original hypothesis.
The results section should provide specific information in tables or figures that are cited in text and that are appropriate (i.e., tables should not be used where figures would better illustrate the data). Appropriate summary statistics and measures of variability should be reported (e.g., mean, median, range, 95% confidence interval).
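When a reviewer has access to the underlying values, or wants to check that reported figures are internally consistent, the summary statistics named above can be recomputed directly. The sketch below uses only the Python standard library; the function name `summarize` is hypothetical, and the confidence interval uses a normal approximation, which is an assumption that understates the interval width for small samples (a t-based interval would be wider).

```python
from math import sqrt
from statistics import NormalDist, mean, median, stdev


def summarize(values, confidence=0.95):
    """Return mean, median, range, and an approximate confidence interval.

    The interval is mean +/- z * s / sqrt(n), a normal approximation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(values) / sqrt(n)
    return {
        "n": n,
        "mean": center,
        "median": median(values),
        "range": (min(values), max(values)),
        "ci": (center - half_width, center + half_width),
    }
```

Reporting the interval alongside the mean and median lets a reader judge both the size of an effect and the precision with which it was estimated.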

Discussion

The discussion section should provide a balanced interpretation of the results and should not merely repeat the results. This section should compare and contrast findings to other studies and should provide the impact of the study on medical practice. The limitations of the study should be clearly defined and all unexpected findings discussed. No new data that did not appear in the results section should be added to the discussion. The data should not be extended beyond the study population. The discussion should support the hypothesis or provide a reason why the hypothesis was not supported.

Concluding remarks

Critically reviewing and appraising clinical literature is a time-consuming and difficult process. It is a necessary process for many aspects of drug development and accompanying regulatory documentation. Approached methodically, section by section, critical appraisal is a skill that can be learned.

References

1. Foote MA and Welch W. Biopharmaceutical drug development: filgrastim (r-metHuGCSF) use in patients with HIV infection. J Hematother Stem Cell Res 1999;8:S3–S8.
2. Smalling R, Foote MA, Molineux G, Swanson SJ and Elliott S. Drug-induced and antibody-mediated pure red cell aplasia: a review of literature and current knowledge. Biotech Ann Rev 2004;10:237–250.
3. Foote MA. A simple way to write, edit, or review clinical manuscripts to ensure logical and uniform presentation of data. AMWA J 2005;20:123–124.
4. Foote MA. Some concrete ideas about manuscript abstracts. Chest 2006;129:1375–1377.
5. Foote MA. How to make a proper introduction. Chest 2006;130:1935–1937.
6. Foote MA. Materials and methods: a recipe for success. Chest 2008;133:291–293.


The current status and future potential of personalized diagnostics: Streamlining a customized process

Terri D. Richmond
School of Pharmacy, Regulatory Science Program, University of Southern California, 1540 Alcazar St. CHP G32, Los Angeles, CA 90033, USA

Abstract. Recent genetic discoveries and related developments in genomic techniques have led to the commercialization of novel diagnostic platforms for studying disease or gauging therapeutic outcomes in individual patients. This newly emerging field is called “personalized medicine,” and uses the patient’s genetic composition to tailor strategies for patient-specific disease detection, treatment, or prevention. Personalized diagnostic tests are used to detect patient-to-patient variations in gene or protein expression levels, which act as indicators for drug treatments or disease prognosis. In turn, medical professionals can better answer questions such as: “Who should be treated with which drug?” and “How should the treatment be administered?” The regulations governing personalized medicine can be complicated because they encompass in vitro diagnostic systems and laboratory tests as well as methods of disease treatment and patient care. Industry, academia, medicine, and the Food and Drug Administration (FDA) are all involved in the cultivation of the field: substantial collaborations between drug developers and regulatory authorities are required to consider and shape emerging regulations as personalized drug strategies mature. Some of the regulatory issues identified by industry and the FDA about personalized medicine and personalized diagnostics will be addressed. In addition, relevant collaborations, advances, and current and draft regulatory guidances will be discussed with respect to the future of personalized medicine.

Keywords: pharmacogenomics, regulations, biomarker, co-development, genomics/genetics.

Introduction

The field of personalized medicine has gained momentum recently, as evidenced by several new personalized diagnostic approvals by the United States Food and Drug Administration (US FDA) since 2005. Personalized medicine uses a patient’s genetic composition to tailor strategies for patient-specific disease detection, treatment, or prevention. In vitro diagnostics can help create a foundation for patient-targeted medicine by exploiting known, validated biomarkers to identify patient genotypes and thus predict responses to certain disease treatments and prognosis. Proper progression and development of this field, however, requires substantial collaborative efforts from industry, academia, medicine, and the FDA. This chapter discusses the collaborations, regulations, and developments currently progressing in personalized medicine and personalized in vitro diagnostics, and raises some issues that illustrate concerns that may develop.

Tel.: (416) 885-3186. Fax: (323) 442-2333.

E-mail: [email protected] (T.D. Richmond). BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00015-X


What is personalized medicine?

Personalized medicine has been described as “using information about a person’s genetic makeup to tailor strategies for the detection, treatment, or prevention of disease” [1]. No consensus on the definition of personalized medicine has been reached because the term currently exists in large part as a concept instead of a reality; however, most researchers agree that the concepts embodied by this field attempt to answer two central questions: “Who should be treated with which drug?” and “How should the treatment be administered?” In July 2006, the FDA approved the first molecular-based test for breast cancer metastasis (GeneSearch BLN Assay by Veridex, Johnson & Johnson). In September 2007, the FDA approved the first genetic test for warfarin sensitivity (Nanosphere Verigene Warfarin Metabolism Nucleic Acid Test), which differentiates patients according to the variation of two genes involved in warfarin metabolism, CYP2C9 and VKORC1. While these advances are celebrated by pharmacogeneticists and physicians alike, the process of translating basic scientific developments to the clinic has generally been slower than desired. The regulations governing personalized medicine can be complicated because they encompass in vitro diagnostic (IVD) systems and laboratory tests as well as disease treatment and patient care. The plethora of novel IVD innovations requires careful consideration before acceptance in the clinic, and has created pressure to which regulators must respond and adapt.

What are personalized diagnostics?

Personalized diagnostics are tests that use a person’s genetic or proteomic makeup to detect or diagnose disease. The FDA Center for Devices and Radiological Health (CDRH) Office of In Vitro Diagnostic Device Evaluation and Safety (OIVD) defines IVDs as “… those reagents, instruments, and systems intended for use in diagnosis of disease or other conditions, including a determination of the state of health, in order to cure, mitigate, treat, or prevent disease or its sequelae. Such products are intended for use in the collection, preparation, and examination of specimens taken from the human body” [2]. Although IVDs are classified as medical devices, they may also be considered biologics, and be subject to the corresponding regulations. The FDA classifies personalized IVDs as class I, II, or III based upon the level of control necessary for ensuring product safety and effectiveness [3–5]. Diagnostics are subject to both premarket and postmarket controls, much like other medical devices; however, diagnostic products also are subject to the Clinical Laboratory Improvement Amendments (CLIA) regulations.

What are some examples of personalized diagnostics?

GeneSearch BLN assay by Veridex, Johnson & Johnson (class III, CLIA complexity high)

Approval of the GeneSearch BLN Assay in July 2006 marked the first FDA-approved molecular diagnostics assay for breast lymph node testing. Results of this rapid test are available during surgery and provide some women with the option of avoiding a second operation [6]. This test benefits the field of personalized diagnostics because it sets the precedent for approval of future in vitro real-time reverse transcriptase polymerase chain reaction (RT-PCR) based tests beyond the text found in the FDA Guidance Class II Special Controls Guidance Document: RNA Preanalytical Systems, and was approved at a time when the FDA was seeking input toward regulations governing In Vitro Diagnostic Multivariate Index Assays (for example, real-time RT-PCR or microarray).

Nanosphere Verigene warfarin metabolism nucleic acid test (class II, CLIA complexity high)

The Nanosphere Verigene Warfarin Test is an allele-based test (Cytochrome P450 2C9 and VKORC1) based upon the Invader UGT1A1 Molecular Assay predicate device by Third Wave Technologies, Inc. The regulatory strategy of this assay was based upon the approval of numerous predicate devices under the categories of 21 CFR §862.3360. The initial submission for a Drug Metabolizing Enzyme Genotyping System was brought by Roche (CYP450 test), and the development of the Nanosphere Verigene Warfarin Metabolism Nucleic Acid Test illustrates how one type of test system can be modified for many purposes in only 3 years. Nanosphere’s FDA-approval letter in September 2007 followed quickly behind the FDA announcement to update the warfarin label to include information about genetic predisposition of patients to warfarin sensitivity, and may mark the beginning of a new age where genetic data is requisite to the approval and labeling of some new drugs and biologics.
AmpliChip CYP450 test by Roche (class II, CLIA complexity high)

The Roche CYP450 test identifies a patient’s CYP2D6 genotype from whole blood genomic DNA. The test is based on the premise that certain genes regulate drug metabolism, and differences in those genes cause differences in the metabolism of certain drugs. In December 2004, Roche advanced the field of personalized medicine by creating a chip that measures polymorphic differences in known drug metabolism-related genes, to aid clinicians in their choice of therapeutic strategies for drugs that are metabolized by the CYP2D6 gene product. The classification of the Roche CYP450 Test to class II allowed for development of additional novel Drug Metabolizing Enzyme Genotyping Systems based on the 510(k) pathway.

MammaPrint by Agendia (class II, CLIA complexity high)

MammaPrint was approved by the FDA as a class II in vitro diagnostic device in February 2007. At the same time, the FDA created an additional guidance entitled Class II Special Controls Guidance Document: Gene Expression Profiling Test System for Breast Cancer Prognosis. MammaPrint represents the first US-based expression array, using microarray technology to assay RNA from breast cancer tissue for a subset of 70 genes whose expression pattern predicts tumor recurrence. This test is one of many rapid advances in microarrays and other complex assays that have prompted the FDA to progressively analyze the development, performance, and benefit-to-risk ratio of such IVDs.

What regulations cover personalized diagnostics?

Most personalized diagnostics are used in clinical laboratories. Laboratory testing has been regulated by the CLIA since 1988 when Congress implemented quality standards to ensure the accuracy, reliability, and timeliness of patient test results developed by clinical laboratories. The in vitro diagnostics used by laboratories are governed by the FDA Center for Devices and Radiological Health Office of In Vitro Diagnostic Device Evaluation and Safety. Only recently have specific guidances been issued for personalized IVDs. Many guidances for industry remain in draft format.

Clinical laboratory improvement amendments (CLIA)

Under CLIA, the FDA is responsible for assigning CLIA regulatory categories based on IVD potential risk to public health. The categories include waived tests, moderate complexity tests, and high complexity tests. Tests may be waived from regulatory classification if they are approved by the FDA for home use or they are listed under 42 CFR §493. Additionally, a test may be waived if it is simple and accurate with the likelihood of erroneous results almost nonexistent or if the test poses no reasonable risk of harm to the patient if it is performed incorrectly [7].
In the absence of a waiver, a test system is assigned a score from 1 (low complexity) to 3 (high complexity) for each of 7 categories (Table 1). A total score of 12 or less categorizes the test as moderate complexity, while a score of 13 or more categorizes the test as high complexity. Laboratories performing moderate and/or high complexity tests must comply with specific laboratory standards governing certification, personnel, proficiency testing, patient test management, quality assurance, quality control, and inspections under 42 CFR §493. Most personalized IVDs receive a high complexity CLIA classification. Thus, laboratories performing these tests must obtain a certificate (as in Subpart C or Subpart D of 42 CFR §493) from Health and Human

Table 1. CLIA categorization criteria. These seven criteria are examined with respect to the complexity level of an In Vitro Diagnostic (IVD). A rating of (1) denotes low complexity, and a rating of (3) denotes high complexity. When these scores are totaled, a diagnostic with a final score of 12 or less is considered moderate complexity, and that with a score above 12 is considered high complexity.
- Knowledge
- Training and experience
- Reagents and materials preparation
- Characteristics of operational steps
- Calibration, quality control, and proficiency testing materials
- Test system troubleshooting and equipment maintenance
- Interpretation and judgment
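The scoring rule described for the seven CLIA criteria is simple arithmetic and can be sketched as follows. This is an illustration of the rule as stated in the text and Table 1 only; the function name `categorize` and the dictionary layout are hypothetical, and waived tests are outside its scope.

```python
# The seven categorization criteria listed in Table 1.
CRITERIA = (
    "knowledge",
    "training and experience",
    "reagents and materials preparation",
    "characteristics of operational steps",
    "calibration, quality control, and proficiency testing materials",
    "test system troubleshooting and equipment maintenance",
    "interpretation and judgment",
)


def categorize(scores):
    """Total the per-criterion scores and return (total, category).

    `scores` maps each criterion name to 1 (low), 2, or 3 (high).
    A total of 12 or less is moderate complexity; above 12 is high.
    """
    if set(scores) != set(CRITERIA):
        raise ValueError("scores must cover exactly the seven criteria")
    if any(scores[name] not in (1, 2, 3) for name in CRITERIA):
        raise ValueError("each score must be 1, 2, or 3")
    total = sum(scores[name] for name in CRITERIA)
    category = "moderate complexity" if total <= 12 else "high complexity"
    return total, category
```

Totals can range from 7 to 21, so a test that scores 2 on every criterion (total 14) already falls into the high complexity category.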

Services (HHS) for these tests, and must meet the applicable requirements of subparts F, H, J, K, M, and Q of 42 CFR §493.

FDA Center for devices and radiological health (CDRH) office of in vitro diagnostic device evaluation and safety (OIVD)

The regulatory oversight of an in vitro diagnostic system mirrors the regulatory oversight of any medical device: The FDA classifies a diagnostic into Class I, II, or III according to the level of regulatory control that is necessary to assure safety and effectiveness. In turn, this classification determines the appropriate premarket submissions process. Approved products are published on the OIVD webpage in 2 databases: the 510(k) database for some class I and most class II products, and the premarket approval (PMA) database for most class III diagnostics and devices.

Special controls

The FDA is proactive in developing guidances that act as road maps for the development of personalized diagnostics and other genetic tests [8]. Flexible down-classification of low-risk IVDs by de novo classification is an example of such an initiative, and is evidenced by the recent increase in special controls documents for class II personalized diagnostics (Table 2). The de novo process of device and diagnostic classification is fundamentally a reactive process, which relies on the manufacturer to develop special controls to govern the regulation of a diagnostic. It remains to be seen if the FDA can work with research and industry to develop guidances that impart a clearer road map for the manufacturer and if the cultivation of clear expectations, deliverables, and outcomes by industry and the FDA can speed safe and efficacious personalized diagnostic development.

Table 2. Timeline of FDA-related documents for personalized diagnostics.

01/1972  Bureau of Drugs established a diagnostic products staff
01/1994  Guideline for the manufacture of in vitro diagnostic products
11/2002  The office of In Vitro Diagnostic Device Evaluation and Safety formed to consolidate all regulatory activities for IVDs
03/2004  Class II special controls guidance document: Factor V Leiden DNA mutation detection systems
03/2005  Class II special controls guidance document: Drug metabolizing enzyme genotyping system
03/2005  Class II special controls guidance document: Instrumentation for clinical multiplex test systems
03/2005  Guidance for industry: Pharmacogenomic data submissions
09/2005  Class II special controls guidance document: RNA preanalytical systems (RNA collection, stabilization and purification systems for RT-PCR used in molecular diagnostic testing)
10/2005  Class II special controls guidance document: CFTR gene mutations detection systems
05/2007  Class II special controls guidance document: Gene expression profiling test system for breast cancer prognosis
06/2007  Guidance for industry and FDA staff: Pharmacogenetic tests and genetic tests for heritable markers
07/2007  Draft guidance for industry, clinical laboratories and FDA staff: In vitro diagnostic multivariate index assays
08/2007  Guidance for industry: Pharmacogenomic data submissions – companion guidance

Note: IVD, in vitro diagnostic; RT-PCR, reverse transcriptase polymerase chain reaction.

Current FDA initiatives in personalized regulations

The FDA recognizes that personalized IVDs are crucial factors in modern drug development [9]. Hard work and careful scrutiny are necessary to develop safe and effective IVDs. The FDA is actively participating in numerous studies to learn more about pharmacogenomics and the field of personalized medicine.

Collaborations

The FDA is developing guidances with numerous government programs to assist diagnostic companies in establishing molecular diagnostic systems. The FDA is collaborating with NIH-based programs such as the Program on Assessment of Clinical Cancer Tests (PACCT), the Early Detection Research Network (EDRN), the Specialized Program of Research Excellence (SPORE), and the NIH Proteomic program to enable the advancement of biomarkers in personalized medicine. The FDA is working with the National Cancer Institute (NCI) and the American Association for Cancer Research (AACR) to draft guidances for the purposes of promoting and protecting human health through personalized medicine [8].

Voluntary genomic data submissions process (VGDS)

In 2005, the FDA published a guidance for industry entitled Pharmacogenomics Data Submissions. This guidance described a voluntary genomic data submissions process organized by the FDA Interdisciplinary Pharmacogenomic Review Group (IPRG). The purpose of this process is to promote greater understanding of the issues surrounding the use of pharmacogenomic data and to enable timely review of future submissions in which genomics is an integral part of specific studies in a drug development program. The developments from VGDS will encourage acceptance and expansion of the personalized diagnostic field. This program is currently being extended to web-based courses for educational purposes [8].

Microarray quality control consortium (MAQC)

The MAQC was designed to find sources of variability in microarray data. In the first phase, the consortium explored variability in differential gene expression data. In the second phase, the consortium will investigate the variability of predictive genomic signatures.

Predictive safety testing consortium (PSTC)

The PSTC is an FDA Critical Path initiative that was announced on March 16, 2006 [10]. The consortium was created to identify new and increasingly accurate methods to predict drug safety. It consists of 16 pharmaceutical companies developing drug safety biomarkers in the fields of carcinogenicity and of kidney, liver, and vascular injury. These companies share their data within the consortium and, in turn, verify and validate the predicted safety biomarkers. The FDA acts as an advisor in lieu of its normal regulatory role, and the C-Path serves to collect and summarize the data. From this work, the FDA hopes to gain evidence to support novel FDA guidances for personalized, biomarker-driven studies.
This type of collaboration is key to developing and spearheading new directions for drug safety platforms of the future.

FDA drug-diagnostic co-development draft concept paper

Although the FDA has traditionally regulated drugs and diagnostics individually, there is an emerging belief that drugs and diagnostics should be developed and reviewed in concert. Drug-diagnostic co-development stems in part from the concept that diagnostics can be safe and efficacious, but have minimal clinical utility. Lack of clinical utility hurts companies that create useless, non-profitable tools, and wastes the resources of regulatory bodies that spend time reviewing such diagnostic applications. History has shown the value of diagnostics that accurately predict the response of patients to drugs; estrogen receptor assays have been able to predict candidates for hormone therapy in women with breast cancer since 1972 [11]. More recently, the new Nanosphere Verigene Warfarin Test accurately predicts the sensitivity of patients to a potentially fatal drug.

The FDA has developed a possible approach to combine IVD and drug development and regulation. This approach would apply when the clinical validation of the biomarker has a direct influence on the clinical utility of the co-developed product. Within the framework of IVD/drug co-development, each product is evaluated for safety and efficacy in collaboration with the respective FDA office (CDER/CDRH/CBER, or combination products). If the test is not important to the clinical selection of patients to receive the particular drug but gives meaning to drug function, it may be developed individually, or developed as a subsection of the overall drug review [12].

Current draft guidances driven by industry

The novelty of personalized medicine and personalized diagnostics is recognized by the FDA and the scientific community; however, some diseases and their cognate diagnostic tests can be complex, and the FDA has to carefully consider the implications of these tests for the protection of study subjects. The current guidance for many in vitro tests is minimal. As such, the FDA is making every effort to formulate guidances appropriate to pharmacogenomics and personalized diagnostics. Public commentary is valuable and is considered with proper gravity. Accordingly, these guidances are still in development and will set the benchmark for future IVD initiatives.
Draft guidance: In vitro diagnostic multivariate index assays (IVDMIA)

In general, In Vitro Diagnostic Multivariate Index Assays (IVDMIAs) are IVDs that use multiple variables (e.g., gene or protein expression profiles) to diagnose disease or other patient-related outcomes. The regulations for IVDMIAs are under development. The first draft guidance for IVDMIAs was released by the FDA in September 2006 in an effort to regulate IVDMIA safety and efficacy. The draft guidance proposed that IVDMIAs should be classified as devices rather than as laboratory tests governed solely by CLIA. Extensive public commentary followed and culminated in a public meeting held in February 2007. In the same week as the public meeting, the FDA cleared the first IVDMIA by de novo classification (the MammaPrint test).
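To make the idea of a multivariate index concrete, the following Python sketch combines several measured variables into a single patient-level score and maps that score to a reported category. It is purely illustrative: the marker names, weights, and cutoff are hypothetical and are not taken from any cleared assay or from the FDA draft guidance.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical weights for three made-up expression markers; a real assay
# would derive its algorithm from clinical validation data.
WEIGHTS = {"marker_a": 0.8, "marker_b": -0.5, "marker_c": 1.2}
CUTOFF = 1.0  # hypothetical decision threshold


@dataclass
class IndexResult:
    score: float
    category: str


def multivariate_index(values: Mapping[str, float]) -> IndexResult:
    """Combine several measured variables into one patient-related result."""
    missing = set(WEIGHTS) - set(values)
    if missing:
        raise ValueError(f"missing measurements: {sorted(missing)}")
    # Weighted sum of the measurements gives a single index value.
    score = sum(WEIGHTS[name] * values[name] for name in WEIGHTS)
    category = "high risk" if score >= CUTOFF else "low risk"
    return IndexResult(score=score, category=category)


result = multivariate_index({"marker_a": 1.5, "marker_b": 0.4, "marker_c": 0.2})
print(result.category)
```

The point of the sketch is only that the reported result is a single value derived from many inputs, so the safety and efficacy of such a test depend on the algorithm as much as on the individual measurements.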

The second draft guidance for IVDMIAs was issued in July 2007. Because much of the public commentary on the first draft concerned uncertainties in the definition of an IVDMIA, the definition was expanded in the new draft guidance. In the 2007 draft, IVDMIAs were described as diagnostics that are devices under section 201(h) of the Food, Drug and Cosmetic Act; that interpret the values of multiple variables to yield a single, patient-related result; and that provide a result through a complex method that the end user is incapable of performing independently [13]. At this time, the guidance sets out to define subtle differences between devices that fall under this categorization, and gives guidance on premarket and postmarket requirements for IVDMIAs. Importantly, the FDA believes that most IVDMIAs will fall under class II or III and recommends that sponsors interact with the Agency as early as possible during device development. In response to comments regarding time, the FDA created a grace period for currently marketed IVDMIAs that extends 12 months from the date of final guidance document publication and allows 6 more months if a 510(k) or premarket approval application is submitted within the initial 12-month grace period. The FDA intends to enforce regulatory requirements on all marketed IVDMIAs within 18 months of final guidance document publication.

Guidance for industry: Pharmacogenomics data submissions

The FDA recognizes the importance of personalized medicine and diagnostics, yet many pharmaceutical companies are reluctant to embark upon studies involving genetic markers and testing because they fear the FDA response during IND, NDA, or BLA submissions. The FDA released the Guidance for Industry: Pharmacogenomics Data Submissions in March 2005 to provide instruction for voluntary pharmacogenomic data submissions and to assuage the concerns of the pharmaceutical community.
A 2007 companion guide to the guidance reflects FDA experience gained since that guidance was issued. The FDA believes that this companion guidance, together with the March 2005 guidance, will benefit companies considering the submission of voluntary or essential genomic data in their respective applications.

Issues in personalized diagnostics

The introduction of personalized diagnostics to drug development and medicine has created a movement toward novel regulatory guidelines and new product development pathways. The utilization of personalized IVDs will become increasingly important and complex as the FDA becomes more experienced with these novel technologies and advances its regulatory guidelines correspondingly. Currently, drugs are developed by collating data from a large population; a safe and effective drug is deemed acceptable by the FDA for treatment if efficacy is demonstrated on a normal distribution of patients. A few patients receiving such FDA-approved drugs experience adverse responses, despite convincing safety and efficacy data overall. Personalized diagnostics can have a positive effect on overall safety and efficacy profiles by helping to identify subpopulations of individuals who respond differently to a particular drug than the typical patient. A reliable diagnostic can create more stringent patient inclusion/exclusion criteria. As a result, the range of responses exhibited by the patient population is better characterized and the variability of patient response is minimized. The Verigene Warfarin Metabolism Test is an example of the important potential that such individualized diagnostics have to offer.

Unfortunately, the reality is that companies may not embrace the use of personalized diagnostics because personalized IVDs will inevitably influence critical drug development parameters such as development time, development cost, patient sales life, and yearly sales. Like most innovations, the integration of diagnostics and drug development will take time and effort. New medical processes will change drug development strategies; however, there are both positive and negative outcomes associated with this integration.

Patient base

A current common rule for large pharmaceutical companies is that a drug must show potential for $500 million in peak annual sales to justify discovery, development, and marketing costs [14]. In almost all cases, the target market will be smaller for a personalized drug than for one targeted to the population at large.
The introduction of a drug-associated personalized IVD will change clinical trial requirements, site selections, validation times, inclusion/exclusion criteria, and many other clinical trial-associated factors. Development costs will change differently for each drug, but may be higher rather than lower as a result of more stringent clinical trial requirements. At the same time, the number of responders to a given personalized drug may not be known a priori. This leads each pharmaceutical company to two questions: "Is a personalized drug still worth developing?" and "Should we invest in the development of a personalized IVD?" Incentives must exist to drive pharmaceutical companies to co-develop drugs and diagnostics. Incentives for drug-diagnostic co-development are numerous and may include increased phase 3 clinical trial success rate [15]; orphan drug exclusivity; stronger intellectual property position [16]; premium pricing; stronger success rate; and individual revenues from diagnostic systems.
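One way to see how more stringent inclusion criteria can change trial economics is a back-of-the-envelope sample-size calculation. The Python sketch below uses the standard normal-approximation formula for comparing two means, n per arm = 2(z_{1-alpha/2} + z_{power})^2 sigma^2 / delta^2, and assumes (hypothetically) that only a fraction of unselected patients respond, so the average treatment effect in an unselected trial is diluted to that fraction times the effect in responders. The effect size, responder fraction, and the assumption of a perfectly accurate diagnostic are illustrative choices, not data from this chapter.

```python
import math
from statistics import NormalDist


def per_arm_n(delta: float, sigma: float = 1.0,
              alpha: float = 0.05, power: float = 0.9) -> float:
    """Approximate patients per arm for a two-arm comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2


# Hypothetical inputs: effect in true responders and the responder fraction.
effect_in_responders = 0.5
responder_fraction = 0.25

# Unselected trial: non-responders dilute the average effect.
n_unselected = per_arm_n(effect_in_responders * responder_fraction)
# Diagnostic-enriched trial: only predicted responders are enrolled
# (assumes a perfectly accurate test, which real diagnostics are not).
n_enriched = per_arm_n(effect_in_responders)

print(math.ceil(n_unselected), math.ceil(n_enriched))
```

Under these assumptions the required enrollment scales with the inverse square of the responder fraction. The number of patients who must be screened to find those responders, and the cost of developing the diagnostic itself, are separate costs that this sketch does not model.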

Clinical trial design

In the current environment, phase 3 drug trials generally rely on randomized controlled trials (RCTs) for drug approval. RCTs have broad inclusion/exclusion criteria and accept a genetically divergent group of patients. Therefore, as many as 30,000 patients may be required for a given trial in order to demonstrate drug efficacy. Some of these patients will be nonresponders to the drug, and some will react with adverse events. The heterogeneous patient population may not be sufficient to identify important clinical events or adverse reactions that would be observed in the population at large. If the true responders could be identified with IVDs that relate to drug effect, then the number of patients included in a phase 3 trial could be much smaller. In other words, in vitro diagnostics could change the design of clinical trials to focus on responsive patients.

Social infrastructure

For personalized diagnostics to be accepted as standards of patient care, the medical community must accurately appreciate their purpose and necessity. At a minimum, this approach entails communication within the field [17]; listing on formularies and reimbursement plans; and integration and acceptance as part of the standard of care [18].

Conclusions

The value of targeted medicine cannot be overstated; a drug that can be targeted optimally to a specific population of responders increases the safety and efficacy of that drug. Drug safety and efficacy are correlated with increased patient compliance and decreased adverse events, and are fundamental to the mission of the FDA. Careful development of diagnostics for personalized medicine will promote higher drug safety and efficacy in correctly targeted patient populations. Given these therapeutic benefits, the motivation behind the FDA's involvement in the development of personalized medicine is clear. Nonetheless, the future still holds numerous challenges for all the participants in the field.

References

1. Collins FS. Personalized medicine. The Boston Globe, July 17, 2005.
2. Definitions, 21 CFR §809.3 (2006).
3. Clinical Chemistry and Clinical Toxicology Devices, 21 CFR §862 (2001).
4. Hematology and Pathology Devices, 21 CFR §864 (2001).
5. Immunology and Microbiology Devices, 21 CFR §866 (2000).

6. Food and Drug Administration Press Release. FDA Approves First Molecular-Based Lab Test to Detect Metastatic Breast Cancer, July 16, 2007.
7. Laboratory Requirements, 42 CFR §493 (2004).
8. Personalized Health Care: Opportunities, pathways, resource. Available at http://www.hhs.gov/myhealthcare. Accessed on September 21, 2007.
9. FDA Guidance for Industry: Pharmacogenomics data submissions. US Food and Drug Administration Web site, March 2005. Available at http://www.fda.gov/cber/gdlns/pharmdtasub.htm. Accessed on November 21, 2007.
10. Predictive Safety Testing Consortium. Available at http://www.c-path.org/PredictiveSafetyTestingConsortium/tabid/219/Default.aspx. Accessed on November 21, 2007.
11. McGuire L. Current status of estrogen receptors in breast cancer. Cancer 1975;36(2):638–644.
12. Phillips KA, Van Bebber S and Issa AM. Diagnostics and biomarker development: priming the pipeline. Nat Rev Drug Disc 2006;5:462–469.
13. FDA Draft Guidance: In Vitro Diagnostic Multivariate Index Assays. US Food and Drug Administration Web site, July 26, 2007. Available at http://www.fda.gov/cdrh/oivd/guidance/1610.pdf. Accessed on November 21, 2007.
14. Trusheim MR, Berndt ER and Douglas FL. Stratified medicine: strategic and economic implications of combining drugs and clinical biomarkers. Nat Rev Drug Disc 2006;6:287–293.
15. Behrman RE. Critical Path Initiative, An FDA update, December 6, 2006. Available at http://rd100.cceb.med.upenn.edu/crcu_html/roadmap/mtgs/dec_steering_2006/SP_Behrman_200612.pdf. Accessed on November 21, 2007.
16. General Information Concerning Patents. Available at http://www.uspto.gov/web/offices/pac/doc/general/index.html#novelty. Accessed on November 22, 2007.
17. Lesko LJ. Personalized medicine: elusive dream or imminent reality? Clin Pharmacol Ther 2007;81:807–816.
18. Meadows M. Genomics and personalized medicine. Food and Drug Administration Consumer Magazine, November–December 2005. Available at http://www.fda.gov/fdac/features/2005/605_genomics.html. Accessed on November 27, 2007.


Recent advances in the development of transgenic papaya technology

Evelyn Mae Tecson Mendoza1,*, Antonio C. Laurena1 and José Ramón Botella2

1 Institute of Plant Breeding, Crop Science Cluster, College of Agriculture, University of the Philippines Los Baños, College, Laguna, Philippines
2 School of Integrative Biology, Faculty of Biological and Chemical Sciences, University of Queensland, Brisbane, Queensland 4072, Australia

Abstract. Papaya with resistance to papaya ringspot virus (PRSV) is the first genetically modified tree and fruit crop and also the first transgenic crop developed by a public institution that has been commercialized. This chapter reviews the different transformation systems used for papaya and recent advances in the use of transgenic technology to introduce important quality and horticultural traits in papaya. These include the development of the following traits in papaya: resistance to PRSV, mites and Phytophthora, delayed ripening trait or long shelf life by inhibiting ethylene production or reducing loss of firmness, and tolerance or resistance to herbicide and aluminum toxicity. The use of papaya to produce vaccines against tuberculosis and cysticercosis, an infectious animal disease, has also been explored. Because of the economic importance of papaya, there are several collaborative and independent efforts to develop PRSV transgenic papaya technology in 14 countries. This chapter further reviews the strategies and constraints in the adoption of the technology, as well as environmental biosafety and food safety. Constraints to adoption include public perception, strict and expensive regulatory procedures and intellectual property issues.

Keywords: papaya, Carica papaya, transgenic, genetically modified, biosafety, PRSV-resistant papaya, long shelf life papaya.

*Corresponding author: Tel.: +63-49-576-0025. Fax: +63-49-536-3438. E-mail: [email protected]; [email protected] (E.M. Tecson Mendoza).
BIOTECHNOLOGY ANNUAL REVIEW VOLUME 14 ISSN 1387-2656 DOI: 10.1016/S1387-2656(08)00019-7 © 2008 ELSEVIER B.V. ALL RIGHTS RESERVED

Introduction

Papaya is an important fruit commodity with total world production of 6.708 million metric tons in 2004, up from an average of 6.415 million metric tons from 1996 to 1997 [1]. The top ten papaya producers are Brazil, Mexico, Nigeria, Indonesia, India, Ethiopia, Congo, Peru, China and the Philippines. Mexico, Brazil and Belize are the main exporters of papaya to the US market, while the USA (Hawaii) and the Philippines supply the Japanese market. The major suppliers of papaya to the European Union market are Brazil (>50%) and the Netherlands (17%) [2].

Ripe papaya fruits are popularly eaten fresh and can be processed into jam, jelly, marmalade, candy and puree, and used as a component of tropical fruit cocktails. The green or unripe fruits can be added to viands as a vegetable and are made into a pickled product called achara. In addition, the latex from green papaya fruits is the source of papain, which is used as a meat tenderizer, in clarifying beer, in the production of fish concentrates for animal feed and in various food processing steps. Papaya is also utilized in the pharmaceutical and cosmetics industries.

The papaya industry is beset by two major problems: disease infestation, especially by the papaya ringspot virus (PRSV), which can cause up to 100% losses, and postharvest losses of up to 30%–40%. PRSV has caused enormous devastation of papaya farms in various countries worldwide, resulting in a decline in fruit production. On the other hand, papaya fruits in general have a short shelf life, especially in tropical countries. The quality of the fruits suffers during handling, storage, transport, distribution and retail, resulting in poor appearance, texture, flavor and overall acceptability.

Breeding for resistance to PRSV has resulted only in tolerant varieties because of the absence of PRSV resistance in the Carica family. Attempts to hybridize wild relatives with Carica papaya have also been unsuccessful due to incompatibility and production of infertile hybrids [3]. It was only recently that the hybridization of C. papaya with Vasconcellea quercifolia resulted in fertile resistant hybrids [4].

Efforts begun in the mid-1980s to develop PRSV-resistant papaya by genetic engineering, by the groups of Dr. Dennis Gonsalves of Cornell University and Dr. Richard Manshardt of the University of Hawaii [5], resulted in the commercialization of two transgenic cultivars, SunUp and Rainbow, in 1998 [6]. This collaboration led to the development and commercialization of the first transgenic fruit crop, the first transgenic tree and the first transgenic crop from a public institution.
The success in developing PRSV-resistant papaya using transgenic technology and the advances made in various aspects of the technology have encouraged other institutions to use the technology to address other problems of papaya that are difficult or impossible to address using conventional techniques. These problems include short shelf life and postharvest losses, infestation by mites and aphids, susceptibility to root and fruit rot, and aluminum toxicity in acid soils. Further, the possibility of using papaya for the production of pharmaceuticals has been explored. The importance of papaya to the economies of many countries, especially the developing ones, is evidenced by the wide utilization of transgenic technology for the improvement of papaya in these countries. In many cases, technology transfer has been facilitated through collaboration and/or networking between and among research institutes.

In 2004, the University of Hawaii Center for Genomics, Proteomics and Bioinformatics Research Initiative (CGPBRI) formed a consortium to sequence the papaya genome for the Hawaii Papaya Genome Project [7]. The knowledge derived from the sequencing of the papaya genome can be utilized in further improving the quality and productivity of papaya by molecular or conventional techniques. The consortium includes the Maui High Performance Computing Center, Hawaii Agricultural Research Center (HARC, US Department of Agriculture), the Pacific Telehealth & Technology Hui and Nankai University, China.

This chapter focuses on recent advances in papaya transgenic technology and includes: (a) transformation systems for papaya, (b) development of economically important traits in papaya by genetic engineering, (c) strategies and constraints in the adoption of the technology and (d) safety aspects. Several review articles have dealt with the transgenic PRSV-resistant papaya [6,8–10] and papaya biotechnology [11].

Transformation systems for papaya

Promoters

The only promoter used in the transformation of papaya for various traits is the cauliflower mosaic virus (CaMV) 35S promoter (Table 1). This constitutive promoter can drive high levels of transgene expression in both dicots and monocots [12–13]. From the first PRSV-resistant GM papaya developed in the early 1990s [5,14] up to the present time, the CaMV 35S promoter continues to be the most common nonplant-derived promoter in use for transgenic technology for most commercial crops including papaya.

Selection markers

The availability of selectable markers is an integral part of any plant transformation strategy. The most common selectable marker gene used in the production of transgenic papaya is the neomycin phosphotransferase (nptII) gene, which confers kanamycin resistance. This marker has been used by several research groups in the development of PRSV-resistant papaya [15–24], resistance to mites [25] and Phytophthora [26], aluminum and herbicide tolerance [27–28], delayed ripening trait [29] and production of vaccines against cysticercosis [30] and tuberculosis [31] (Table 1). Other selectable markers used in papaya are herbicide resistance genes, antimetabolite selectable markers and fluorescing protein-encoding genes.
The bar gene encodes the enzyme phosphinothricin acetyl transferase (PAT), which can confer resistance to herbicides such as glufosinate or BASTA. Cabrera-Ponce et al. [28] utilized the bar and nptII selectable markers and obtained papaya plants which could withstand up to 3% w/v of phosphinothricin, 3 to 5 times higher than the dose recommended for field application. Souza et al. [32] recovered transgenic papayas using herbicide concentrations in excess of 125 mM glufosinate applied to papaya somatic embryos.

The antimetabolite selectable marker gene pmi (phosphomannose isomerase) [33] encodes an enzyme that catalyzes the reversible interconversion of mannose-6-phosphate and fructose-6-phosphate. Plant cells lacking this enzyme are incapable of surviving on synthetic medium containing mannose, and therefore, the gene has been used to select for transformants on media containing the sugar mannose. Papaya embryogenic calli have little or no PMI activity and cannot utilize mannose as a carbon source. Earlier work showed that mannose at concentrations from 0.1 to 120 g/L alone or together with sucrose does not affect papaya secondary somatic embryo development [32]. Consequently, Zhu et al. [34] developed a transformation protocol for papaya using pmi as a selectable marker. This marker proved to be more efficient than either an antibiotic or a visual marker for selection of transformants. Zhu et al. [35] also developed a transformation protocol using the green fluorescent protein (GFP) gene insert as a selectable marker.

Table 1. Transformation systems used in genetic engineering of papaya.

Component / Trait / References

Promoter
  CaMV35S (Cauliflower mosaic virus 35S)
    All traits mentioned below / Cited below

Selection markers
  nptII (neomycin phosphotransferase II)
    PRSV resistance / [15–24]
    Resistance to mites / [25]
    Resistance to Phytophthora / [26]
    Herbicide resistance / [28]
    Delayed ripening / [29]
    Vaccine against cysticercosis and tuberculosis / [30–31]
  bar/PAT (bialaphos resistance/phosphinothricin acetyltransferase), hph, gus (beta-glucuronidase)
    Herbicide resistance / [28]
    Vaccine against cysticercosis / [30]
    Vaccine against tuberculosis / [31]
    Vaccine against cysticercosis and tuberculosis / [30–31]
    Herbicide resistance / [28]
  Green fluorescent protein (GFP)
    Optimization study / [35]
  pmi (phosphomannose isomerase)
    Optimization study / [34]

Delivery
  Agrobacterium
    PRSV resistance / [17,38–42]
    Vaccine against tuberculosis / [31]
  Particle bombardment
    PRSV resistance / [14,39,44–47]
    Resistance to mites / [35]
    Resistance to Phytophthora / [26]
    Aluminum and herbicide resistance / [27–28]
    Long shelf life or delayed ripening trait / [29,48–51]
    Vaccine against cysticercosis / [30]

Delivery systems

Agrobacterium-mediated transformation

Agrobacterium tumefaciens is a Gram-negative phytopathogen that causes crown gall disease, manifested as tumors of stem tissues in more than one hundred plant species, mostly dicots [36]. The earliest experiments done in papaya involved the Agrobacterium infection of leaf discs and recovery of transgenic callus [37], but there was no regeneration of whole plants. The first transformation of papaya using Agrobacterium was reported by Fitch et al. in 1993 [17]. Since the late 1990s, various research laboratories in different countries have successfully used Agrobacterium-mediated transformation to generate transgenic papaya plants containing either the cp or the replicase viral genes of PRSV [38–42].

Particle bombardment technology

Fitch et al. [14] described the transient and stable transformation of papaya (C. papaya L.) using various tissues such as immature zygotic embryos, hypocotyl sections and somatic embryos, with the nptII and gusA genes as the selectable marker and reporter gene, respectively. The same group in Hawaii [5] demonstrated an efficient gene gun transfer system of a construct containing the nptII, gusA and PRSV-cp genes to 2,4-D-treated immature zygotic embryos. A system for the production of transgenic papayas using zygotic embryos and embryogenic callus as target cells was also reported by Cabrera-Ponce et al. [28], using the bar and nptII genes as selectable markers and gusA as the reporter gene. Gonsalves et al. [43] used the gene gun to transfer an untranslatable cp gene derived from PRSV HA 5-1, and some of the resulting transgenic lines were resistant to PRSV. These pioneering studies in the early and late 1990s paved the way for other research groups to use particle bombardment in developing transgenic papaya with novel traits such as virus resistance [39,44–47], delayed ripening [29,48–51] and other useful traits [25,26,30] (Table 1).
Tissue culture system

The pioneering work of Fitch and Manshardt [52] on the tissue culture system led to the successful transformation and regeneration of transgenic papaya plants. Their system is based on the protocol developed for the generation of embryogenic cultures in walnuts [53] and can be used with either Agrobacterium or biolistic transformation. The original tissue culture protocols have been modified and improved, specifically with regard to induction of somatic embryos, transformation efficiency, increased dosage of antibiotic to minimize chimeras, and regenerative capacity after transformation [41,49,54–57].

Under the Australia–Philippines–Malaysia international research collaboration funded by the Australian Centre for International Agricultural Research (ACIAR) and the corresponding country funding agencies, the tissue culture system developed by Fitch and Manshardt [52] was further modified and improved for the three papaya cultivars used in each of the participating countries [50].

Development of economically important traits in papaya by genetic engineering

Resistance to pests and diseases

Papaya has several economically important pests and diseases [3]. Among the pests, leafhoppers (Empoasca sp.) and mites, including the carmine spider mite (Tetranychus cinnabarinus Boisd.), cause serious damage to papaya plants. Aphids (Aphis sp.), which abound in various host weeds, attack papaya after their host weeds dry up and can transmit PRSV. PRSV, a potyvirus, is the number one viral disease in papaya-growing countries and, thus, is a major limiting factor in papaya cultivation. Papaya seedlings are subject to damping-off diseases caused by Phytophthora, Pythium and Rhizoctonia species. Anthracnose caused by Colletotrichum gloeosporioides infests leaf petioles and fruits. Postharvest diseases include Phytophthora stem-end rot (Phytophthora nicotianae var. parasitica), Phomopsis rot (Phomopsis caricae-papayae), anthracnose (C. gloeosporioides), black stem-end rot (Phoma caricae-papayae and Lasiodiplodia theobromae) and Alternaria rot (Alternaria alternata) [58]. A comprehensive listing of the different pests and diseases of papaya is included in the OECD compendium on the biology of papaya [3].

Development of PRSV resistance in papaya

PRSV is a major viral disease of papaya in Hawaii, where it was discovered in 1948 by Jensen [59]. PRSV infestation of papaya farms was subsequently reported in Thailand [60], in Taiwan in 1975 [61] and in the Philippines in 1982 [62].
While PRSV is not a major problem in Australia, it is considered an important threat to its papaya industry [4]. PRSV-infected papaya plants exhibit chlorotic leaves, ringed spots on the fruit and the upper part of the trunk, leaf distortion resembling mite damage, and depressed fruit production; eventually the infected plants die. Control of PRSV includes roguing infected plants and spraying with aphicides. However, roguing cannot stop the spread of the disease once it is established. Similarly, spraying with aphicides is often ineffective since the virus is transmitted to the plants before the aphids are killed [63]. Interplanting rows of non-host crops between papaya rows allows vectors to feed on the non-hosts before they feed on papaya, and thus reduces disease transmission and incidence [6]. Inoculation with a mild strain of PRSV or a mutated virus can result in cross protection; however, the protection was found to be temporary and ineffective, perhaps due to the mutability of the virus [64].

Resistance to PRSV has not been identified in the C. papaya germplasm and thus no resistant variety of papaya has been developed, although PRSV-tolerant varieties have been produced [9,65]. An example of a PRSV-tolerant papaya variety is Tainung No. 5, which has had poor acceptance due to inferior consumer qualities [66]. The hybrid Sinta papaya developed by Dr. Violeta Villegas in the Philippines also exhibits tolerance to PRSV and can provide a good harvest for the farmer even if infected, but its production will eventually decrease because of infection. Another strategy to control PRSV is the breeding of resistant varieties by interspecific hybridization of C. papaya with Vasconcellea sp. Interspecific hybridization studies, summarized in the consensus document on papaya [3], had limited success due to incompatibility problems and production of infertile hybrids. More recently, Drew et al. [4] reported that their intergeneric crosses between Vasconcellea quercifolia and C. papaya have produced fertile hybrids. Moreover, backcrossing has produced several fertile intergeneric hybrids with PRSV resistance.

In the mid-1980s, with the success in the development of genetically modified crops such as maize, cotton and soybean, the complete molecular characterization of PRSV [67] and the difficulty of obtaining resistant varieties of papaya through conventional methods of breeding, scientists from Cornell University and the University of Hawaii initiated the development of PRSV-resistant papaya by genetic engineering. The most common strategy used to protect against PRSV has been to develop transgenic papaya plants expressing the PRSV coat protein (cp) gene. These plants exhibit “pathogen-derived resistance” through a process that might be similar to the natural phenomenon of viral cross protection [68–70].
The PRSV cp gene is the most widely used viral gene and has been the preferred choice of scientists in 11 countries to develop PRSV-resistant papaya through genetic engineering [11]. The second preferred viral gene encodes the nuclear inclusion protein b (nib), which contains conserved motifs characteristic of the RNA-dependent RNA polymerase of positive-strand RNA viruses and is adjacent to the cp gene; this has been used by several researchers [39,40,42,71]. The US was the first country to develop genetically engineered PRSV-resistant papayas, SunUp and Rainbow. Countries such as Jamaica, Taiwan and Thailand have already completed successful field testing but are still awaiting commercialization, while countries such as Australia, Malaysia, the Philippines and Vietnam are still in the field testing stage.

The Papaya Biotechnology Network of Southeast Asia, facilitated by the International Service for the Acquisition of Agri-biotech Applications (ISAAA), was established in 1998 among national laboratories in Indonesia, Malaysia, the Philippines, Thailand and Vietnam. One of the major objectives of this international network is to develop PRSV-resistant papaya using modern biotechnological approaches and utilizing the local and preferred varieties of the participating countries [72]. Under this network, control of PRSV using genetic engineering through CP-mediated protection and antisense technology was adopted by participating institutions in different countries. Although CP-mediated resistance to PRSV in papaya has been successfully developed in Hawaii, an alternative strategy is antisense technology, which confers resistance at the RNA level. In this case, the full-length viral gene of cp or Nib is not required. Vietnam is developing GM papayas for PRSV resistance using both CP-mediated protection and antisense technology [42]. The following discusses the different initiatives in various countries in developing PRSV-resistant papaya through genetic engineering.

In Hawaii

After the initial success in transforming and regenerating transgenic papaya using a gene construct with the gusA reporter gene and the selectable marker nptII delivered by particle bombardment [14], the next step was transformation with a construct containing the cp gene of a mild cross-protecting strain, HA5-1, isolated in Hawaii. Resistant lines were generated [5,20,73] and one of these, R0 line 55-1, a red-fleshed Sunset female with a single insert of the cp gene, was crossed with a non-transgenic Sunset. R1 progenies positive for the transgene were selfed and advanced until homozygous R4 plants still positive for the transgene were obtained. These advanced materials, called SunUp, were further crossed with non-transgenic Kapoho and the resulting hybrid was called Rainbow.

Field tests and commercialization. Transgenic PRSV-resistant papayas [5] were field tested on a small scale in 1992 [20], which was also the year when a severe PRSV epidemic devastated the Hawaiian papaya industry.
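The breeding scheme described above for line 55-1 (a single cp insert, crossed to a non-transgenic parent, with transgene-positive progeny then selfed until homozygous) follows simple Mendelian expectations. The sketch below is an illustration only and is not taken from the cited work; it assumes one unlinked insertion locus, and the allele symbols `T` (insert present) and `t` (empty locus) are hypothetical notation introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent_a: str, parent_b: str) -> dict:
    """Expected genotype fractions among the progeny of two diploid parents.

    Each parent is written as two alleles at the insertion locus:
    'T' = transgene present, 't' = empty locus (hypothetical notation).
    """
    # Every gamete of one parent pairs with every gamete of the other.
    counts = Counter("".join(sorted(pair)) for pair in product(parent_a, parent_b))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Hemizygous single-insert R0 plant crossed with a non-transgenic plant.
r1 = cross("Tt", "tt")
# Selfing a transgene-positive (hemizygous) progeny plant.
r2 = cross("Tt", "Tt")

for label, result in (("Tt x tt", r1), ("Tt x Tt", r2)):
    print(label, {g: str(f) for g, f in sorted(result.items())})
```

Under these assumptions, half of the progeny of the outcross carry the insert, and one quarter of the progeny of a selfed hemizygous plant are homozygous for it, which is why repeated selfing with selection is needed before a true-breeding line is obtained.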
In late 1995, a five-acre demonstration plot of Rainbow and SunUp was planted during a virus epidemic in Puna on the island of Hawaii, and these lines showed dramatic resistance over the susceptible commercial cultivars (Fig. 1). In 1997 the US regulatory agencies approved the commercialization of transgenic papaya after completing their review of the product. Hawaii’s Papaya Administrative Committee (PAC) had to obtain license agreements with owners of the patented genetic engineering technology. The negotiated licenses include limitations-of-use and compliance provisions [74]. Growers must follow five provisions to comply with their contracts:

(a) The transgenic papaya can be planted in Hawaii only.
(b) Only PAC can sell seeds of the transgenic varieties “Rainbow” and “SunUp.”
(c) Selling of fruits is limited to countries that have accepted genetically engineered papayas as safe for commercialization.
(d) Attendance in an educational session by producers is required. This session covers the requirements of the licenses and PRSV resistance management.
(e) Producers are required to sign an agreement to buy seeds only from PAC.

Fig. 1. (a) Non-transgenic papaya plants (left) show effects of PRSV infection vs. transgenic plants (right). (b) Aerial view of the field trial in Puna started in October 1993. Photo shows the healthy transgenic papaya plants surrounded by infected non-transgenic papaya trees (May 1997). (Photos used with permission from Dr. Dennis Gonsalves.)

In Brazil

PRSV is an important problem in Brazil, which is the largest producer of papaya in the world at 1.6 million tons per year, the third largest exporter to the US and the major exporter to Europe. A technology transfer program to develop PRSV-resistant transgenic papaya for Brazil was formed between the Gonsalves laboratory and the Brazilian Agricultural Research Corporation (EMBRAPA) in the early 1990s. The transgenic papaya was developed in the US by visiting Brazilian scientists and then transferred to EMBRAPA. Translatable and non-translatable cp genes were used as inserts in the transformation of the Sunrise and Sunset Solo varieties using particle gun bombardment on secondary somatic embryos derived from immature zygotic embryos. Fifty-four transgenic lines were regenerated into whole plants; 26 of them contained the translatable version while 28 had the non-translatable cp gene. Inoculation of the cloned R0 plants with three different virus isolates from Brazil, Hawaii and Thailand revealed some lines with mono-, double- and even triple-resistance under greenhouse conditions [47].

In China

At least four groups are developing PRSV-resistant transgenic papaya in the People’s Republic of China, namely, the Huazhong Agricultural University in Wuhan, Hubei [38], Zhongshan University [75], Sun Yat-sen University and South China Agricultural University, the last three being in Guangzhou. At Huazhong Agricultural University, Jiang et al. [38] transformed papaya (cv Sunset) using A. tumefaciens strain LBA4404 carrying the binary plasmid pGA482G containing the cp and nptII genes.
The study focused on the development of an effective transformation method by adding a sonication treatment on embryogenic calli; nevertheless, regenerated plants carrying the cp transgene, as confirmed by PCR and Southern hybridization, were obtained. Ye et al. [75] reported the field test of two transgenic papaya T1 lines with a replicase mutant gene derived from a strain of PRSV. This study focused on the virus resistance in the field and molecular characterization of the transgenes present in the two lines. The work at South China Agricultural University involved the use of the binary vector pBI121, without the gus gene, containing the replicase gene [76]. The cp gene was also used in the initial work but proved ineffective against the virus.

Commercialization in China. In August 2006, a press release in Beijing announced that the Chinese agriculture authorities had granted approval for the commercialization of PRSV-resistant GM papaya to researchers of the South China Agricultural University in Guangzhou [76]. Dr. Hauping Li, director of the Plant Virology Laboratory of the same university, led the project initiated by his former mentor, Prof. Faan Huaizhong, who studied the four strains of PRSV that infected papaya fields in four major southern Chinese provinces. The PRSV disease was first reported in 1959 in Guangdong Province and from there spread to the other papaya-producing provinces. Although three varieties of GM papaya were developed, only one was approved for commercialization: Huanong No. 1, a small “Solo” type papaya similar to the Hawaiian varieties. This variety will be made available to Chinese farmers through a Chinese seed company, which will distribute the papaya as micropropagated seedlings to ensure quality and maintain the hermaphrodite character. According to the researchers, no breakdown of resistance occurred in the replicase-silenced GM papaya plants in the past 5 to 6 years. The Chinese researchers estimated the cost of the papaya project to be about US$250,000 from its beginning in 1990 up to the approval for commercialization in 2006.

In Jamaica

Transgenic papaya lines containing either of two inserts, translatable and non-translatable cp genes, were evaluated for field resistance against PRSV [77]. Lines (R0) with the translatable cp gene showed 80% field resistance while lines with the non-translatable cp gene showed only 44% resistance. The R1 progenies showed levels of resistance similar to those of the parental lines. The transgenic lines were found to be gynodioecious with red flesh, fruit weights between 260 g and 536 g and sweetness levels of 11.5–13.5 °Brix. These lines possessed desirable horticultural characteristics and effective viral resistance for the eventual development of a final product with acceptable commercial value.
In Indonesia

The cp gene was introduced using particle bombardment into two Indonesian varieties of papaya, namely Bangkok and Burung [44]. The construct p2K7/BICP consisted of the cp gene isolated from a local PRSV strain (Bogor, West Java) under the control of the CaMV 35S promoter. Co-transformation with the p2K7 vector containing the cp gene and pRQ 6 containing the gusA and hygromycin phosphotransferase (hph) genes was done using biolistic delivery.

In Malaysia

Binary vectors for Agrobacterium-mediated transformation were constructed based on the plant expression vector pMON54904B with two PRSV viral genes, cp and Nib [39]. This Monsanto binary vector (pMON54904B) contained the CaMV 35S promoter with a duplicated enhancer region, the hsp17.9 leader sequence derived from soybean and the 35S 3′ UTR. The marker gene for selection in this binary vector is nptII for kanamycin resistance. An additional binary vector was made with the cp gene and a ~250 bp inverted repeat of the cp gene inserted downstream of the stop codon. Immature zygotic embryos derived from the variety Eksotika were used in co-cultivation experiments with Agrobacterium using a modified method of Ying et al. [57]. A total of 87 transgenic lines were generated from all constructs and are in various stages of field trials.

In the Philippines

The first initiative to develop transgenic papaya with PRSV resistance was undertaken at the Institute of Plant Breeding (IPB), the Philippines’ national breeding center for all crops (except rice), under the College of Agriculture, University of the Philippines Los Baños. The construct contained the cp gene derived from a virulent PRSV strain isolated in Cavite [45]. A total of 188 targets were derived from 7,845 primary somatic embryos and were bombarded with the pCP-LBP plasmid containing the cp gene using the particle gun. Three months after the bombardment, a total of 48 individual transformation events had regenerated into small plantlets. Further, 359 putative transgenic plantlets were produced which were morphologically similar to non-transgenic control plants [46]. All the R0 transgenic lines had moderate to high susceptibility to PRSV. However, PRSV-resistant R1 lines were derived from R0 lines, indicating that the R0 lines were hemizygous for the introduced cp gene and that selfing was therefore necessary to obtain a full complement of the gene.

Another study was initiated under the ISAAA’s Papaya Biotechnology Network of Southeast Asia, which utilized Agrobacterium-mediated transformation (ABI strain) of papaya somatic embryos [40,78]. Different gene constructs derived from the pMON vector cassettes (pMON65306, pMON65307 and pMON65310) were used, containing different inserts such as the cp (941 bp), Nib (1,574 bp) and a cp inverted repeat fragment (~250 bp).
These constructs contain a leader sequence derived from the soybean hsp17.9 gene to increase translational efficiency of the transgene. The nptII gene was also incorporated in the pMON vector as the selectable marker. A total of 1,348 somatic embryos were transformed using Agrobacterium and 200 independent transgenic lines were regenerated.

Confined field trial. The Institute of Plant Breeding (IPB) of the University of the Philippines Los Baños is presently conducting a confined field trial of candidate lines of GM papaya for PRSV resistance. A confined field trial is an intermediary stage between greenhouse testing and an open field trial. The first two are supervised and regulated by the National Committee on Biosafety of the Philippines (NCBP), while the third is supervised and regulated by the Bureau of Plant Industry (BPI) of the Department of Agriculture (DA). The main objective of this NCBP-regulated confined trial is the disease evaluation screening for resistance to PRSV of three candidate T3 lines. A total of 135 inoculated seedlings plus 45 uninoculated and 45 inoculated “Davao Solo” seedlings (control) were planted [79].

In Taiwan

Embryogenic tissues derived from immature zygotic embryos of Tainung No. 2 were used for Agrobacterium-mediated transformation using the binary vector pBGCP [41] containing the cp gene of the PRSV YK strain, a severe virus strain from Taiwan [80], and the nptII selectable marker gene. A total of 38 transgenic lines were tested for PRSV resistance, with two lines showing immunity (no symptoms in 4 months), nine highly resistant lines (4–7-week delay in symptom development with attenuation) and eight moderately resistant lines (3–4-week delay in development of severe symptoms) [23].

Field trial. Three transgenic lines were selected for evaluation under field conditions [23]. All three lines were female, producing typical fruits not different from those of the non-transgenic Tainung No. 2 female plants. Fruit yield ranged from 30 to 50 kg per tree in contrast to 0–20 kg for the diseased controls. In a first field trial, 0%–0.2% of the three transgenic lines (100 plants each) were infected with PRSV 12 months after planting, while the control plants were 100% infected 8 months after planting. In a second trial where the transgenic plants were planted adjacent to a diseased orchard, infection rates of 10%–20% were observed 5 months after transplanting. In this trial, the control plants were all infected 3 months after planting. Because infection occurred at an early stage, the control plants did not produce any significant amount of fruit.

In Thailand

The Thai papaya varieties Khak Dum and Khak Nual were transformed using microprojectile bombardment by Thai scientists at Cornell University (USA) in 1995 [6]. After 2 years, the research team returned to Thailand with two transformed varieties and conducted further breeding and analysis under greenhouse conditions at the research station in Tha Pra, Khon Kaen Province.
In selection set 1, three R3 lines (from Khaknuan variety) showed excellent field resistance to PRSV (97%–100%) and had a fruit yield 70 times higher than non-transgenic Khaknuan papaya [81]. In selection set 2, one R3 line (Khakdam variety) showed 100% field resistance. A small-scale field trial of the R2 transgenic line (KN116/5) for its PRSV resistance and agronomic qualities was conducted from June 2003 to July 2004 in the field testing facility of the Plant Genetic Engineering Unit of BIOTEC, Kasetsart University, located in Nakhon Pathom [82]. KN116/5, an advanced transgenic line derived from the Thai papaya variety Khak Nual, was found to be highly resistant (97%) to PRSV infection during the one-year field test, while the non-transgenic plants were all infected 2 months after planting. Fruit yield from the transgenic line was approximately 40 times higher compared to infected non-transgenic controls. The average fruit weight of transgenic papayas varied from 1.7 to 2.4 kg/fruit with an average total soluble solids content of 12 °Brix. On the other hand, fruits obtained from non-transgenic papayas were small and blemished with ringspots on the fruit surface.

An independent study was initiated by the National Center for Genetic Engineering and Biotechnology and the Plant Genetic Engineering Unit (PGEU) of Kasetsart University to develop PRSV-resistant transgenic papaya similar to the project initiated in the Gonsalves laboratory. A field trial was also ongoing at Kasetsart University in 2004 when the moratorium on field testing of GE crops forced all such activities to a halt in Thailand.

In Venezuela

PRSV has always threatened the commercial production of papaya in Venezuela, and in 1993 the University of Los Andes linked up with Cornell University for the transfer of the transgenic technology to develop PRSV-resistant papaya in Venezuela. In collaboration with Dennis Gonsalves, the first transgenic papaya in this country was developed by the said university using the cp gene from two geographically distinct PRSV strains isolated from local varieties of papaya grown in the Andean foothills of Merida. After Agrobacterium-mediated transformation and regeneration of whole plants, four PRSV-resistant R0 plants were intercrossed or self-pollinated and the resulting progenies were found to be resistant against the two different PRSV strains under greenhouse conditions [83].

In Vietnam

Five papaya varieties (KD Thai, Tim Taiwan, Solo, Mexico and Local Lansom) were chosen as the target for Agrobacterium-mediated transformation using three different plasmid constructs based on pMON65304, pMON65305 and pMON65309, and the A. tumefaciens ABI strain [42]. These constructs contained the cp (in sense and antisense orientation) and nib genes from a Vietnamese PRSV strain.
Twenty-nine transgenic lines were generated using the three different constructs and evaluated under greenhouse conditions.

Developing resistance to mites in papaya

Mite infestation causes major damage to papaya plantations in Hawaii [84,85]. The transgenic PRSV-resistant cultivar Rainbow is, however, susceptible to both the leafhopper and mites since its female parent, SunUp, and male parent, Kapoho, are very susceptible to the leafhopper and mites, respectively. To enhance papaya resistance to the carmine spider mite, McCafferty et al. [25] transformed a commercial variety of papaya with the gene for chitinase from Manduca sexta (msch). A chitinase gene had previously been introduced into tobacco, resulting in reduced feeding damage and stunted growth of the larvae of the tobacco budworm [86]. Embryogenic calli of papaya were bombarded with the plasmid pBI121 containing the msch gene under the control of the CaMV 35S promoter and the nptII gene under the control of the nopaline synthase promoter as selectable marker. Nineteen independent lines were identified after selection with geneticin (G418) and confirmed to be transgenic by PCR. The presence and expression of the msch gene were likewise confirmed by RT-PCR. Chitinase activity was higher by up to 52% in the transgenic leaf extracts compared to control. Bioassays performed in the laboratory showed that the plants expressing the msch gene significantly inhibited the multiplication of the mites. Under field conditions, the number of mites on most transformed lines was significantly lower than on the control Kapoho. Two lines, T-23 and T-14, had significantly higher mite counts than control. However, by the end of 10 weeks, the control plants had died while lines T-23 and T-14 had grown new leaves. These results indicate a greater tolerance of the transgenic lines to the mites.

Developing resistance to Phytophthora in papaya

Papaya is highly susceptible to Phytophthora palmivora at the seedling and mature stages; the pathogen causes fruit and root rot, particularly during the rainy season and in poorly drained soil [87]. To improve the resistance of papaya to Phytophthora, Zhu et al. [26] introduced the defensin gene from Dahlia merckii by particle bombardment of embryogenic calli of papaya. The dahlia defensin has been shown to inhibit the in vitro growth of a broad range of fungi [88,89]. Defensin genes have also been introduced in different crops, resulting in enhanced resistance to their respective fungal pathogens, e.g., radish defensin in tobacco vs. the leaf pathogen Alternaria longipes [90] and in tomato vs. A. solani [91]. The gene construct used by Zhu et al. [26] contained the defensin gene from dahlia driven by the CaMV 35S promoter and the nptII gene under the control of the nopaline synthase (NOS) promoter as selectable marker. Twenty-one geneticin-resistant calli were selected from 20 bombarded plate cultures. These putative transformation events were confirmed by PCR using specific primers for the dahlia gene and by an ELISA assay for the NPT II protein. The defensin was estimated to range from 0.07% to 0.14% of total soluble protein (TSP) in callus and from 0.05% to 0.08% TSP in the leaves of young plants. The mycelial growth of P. palmivora was inhibited by 35%–50% by leaf extracts of the transgenic lines. Further, inoculation experiments in the greenhouse showed that defensin-expressing transgenic papaya plants had increased resistance against P. palmivora. The roots of the infected transgenic papaya were 40%–50% heavier than those of infected control plants. Increased resistance was associated with shorter growth of the hyphae of P. palmivora at the infection sites. These results indicate that defensin expression in papaya could be a good strategy to enhance resistance of papaya to P. palmivora.
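The percentages quoted above (35%–50% inhibition of mycelial growth, roots 40%–50% heavier) are relative measures against a control. The source does not state how they were computed; the sketch below assumes the common definitions of percent inhibition and percent increase relative to the control. The function names and the input values are hypothetical and are chosen only to illustrate the arithmetic, not to reproduce data from Zhu et al. [26].

```python
def percent_inhibition(control: float, treated: float) -> float:
    """Reduction of a growth measure relative to the control, in percent."""
    if control <= 0:
        raise ValueError("control measurement must be positive")
    return 100.0 * (control - treated) / control


def percent_increase(control: float, treated: float) -> float:
    """Gain of a measure relative to the control, in percent."""
    if control <= 0:
        raise ValueError("control measurement must be positive")
    return 100.0 * (treated - control) / control


# Hypothetical colony diameters (mm): control extract vs. transgenic extract.
growth_inhibition = percent_inhibition(control=40.0, treated=24.0)

# Hypothetical root fresh weights (g): infected control vs. infected transgenic.
root_gain = percent_increase(control=10.0, treated=14.5)

print(f"mycelial growth inhibition: {growth_inhibition:.1f}%")
print(f"root weight increase: {root_gain:.1f}%")
```

With these illustrative inputs the two measures come out at 40% and 45%, inside the ranges reported in the text.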

Development of aluminum and herbicide tolerance in papaya

Acid soils affect about 40% of the arable land worldwide. Aluminum (in the form of Al³⁺ in acid soils) is toxic to most plants. Organic acid excretion by crops is always associated with tolerance to aluminum. Thus, overproduction of an organic acid in a crop, either by conventional breeding or through genetic engineering, could address this major problem of aluminum toxicity. De la Fuente et al. [27] reported the production of transgenic papayas by particle bombardment with constructs driving the overexpression of the citrate synthase (cs) gene from Pseudomonas aeruginosa. Lines expressing the cs gene accumulated and released two to three times more citrate than control plants. Unlike the control plants, the transgenic lines were able to form roots and grow in solutions containing up to 300 µM of aluminum. Overexpression of cs in tobacco gave even more dramatic results, with transgenic plants producing four- to ten-fold more citrate than control plants. The results of the study demonstrated that excretion of organic acid is a mechanism of aluminum tolerance in plants.

In their efforts to increase the efficiency of particle bombardment transformation methods in papaya, Cabrera-Ponce et al. [28] utilized a construct containing the phosphinothricin resistance gene (bar), the nptII gene and the gus gene (uidA) under the control of the CaMV 35S promoter. The incorporation of the transgenes in the transgenic plants was confirmed using a histological fluorimetric assay for GUS, an NPT assay and Southern analysis. To assess the reaction of the transgenic plants to herbicide, phosphinothricin was applied on leaves of transgenic and control plants. Transgenic plants withstood applications of the herbicide while the control plants were very sensitive and showed total necrosis two weeks after application.
The transgenic plants were tolerant to herbicide at concentrations 3 to 5 times higher than recommended for field applications and thus show potential for use in commercial plantations.

Development of long shelf life papaya

Aside from pest infestation, a major problem of papaya, like other fruits, is postharvest losses, which could reach 30%–40% of production. After harvest, undesirable environmental and physical conditions during handling, storage, transport, distribution and retail can dramatically reduce the quality of papaya fruits, resulting in poor appearance and suboptimal texture and flavor. In general, papayas have a short shelf life of 4 to 5 days at room temperature (25 °C–28 °C) and up to 3 weeks at lower temperatures of 10 °C–12 °C [92]. When stored at 15 °C and 20 °C, the quality and thus marketability of papaya fruit are affected primarily by flesh softening and shriveling, indicating overripeness [93]. Like other tropical fruits, papaya fruits are sensitive to temperatures below 10 °C and may develop symptoms of chilling injury such as pitting of the skin, hard lumps around the vascular bundles, scald, water soaking of the flesh and abnormal ripening with uneven coloration and greater susceptibility to diseases [93–95].

Several strategies have been adopted to prolong the shelf life of papaya or delay ripening by genetic engineering: (1) suppressing the production of ethylene by blocking the synthesis of key enzymes such as ACC synthase (ACS) or ACC oxidase (ACO) and (2) suppressing the synthesis and activity of cell wall degrading enzymes like polygalacturonase (PG). Climacteric fruits, including papaya, are characterized by an increased respiration rate at an early stage in the ripening process accompanied by autocatalytic ethylene production. In climacteric fruits, ACC synthase and ACC oxidase mRNAs accumulate sequentially with the rise in ethylene evolution [96]. Thus, inhibiting ethylene production by blocking the synthesis of ACC synthase or ACC oxidase has been shown to delay ripening and increase fruit shelf life in other climacteric fruits such as tomato and cantaloupe melons [97–99].

Suppressing ethylene production strategy

The authors were involved in a tri-country (Australia, Philippines and Malaysia) collaborative project under a grant from the Australian Centre for International Agricultural Research (ACIAR) from 1997 to 2005 [100]. Two different ACS cDNA fragments were isolated from ripe fruits of papaya (variety Solo Kapoho or Philippine Solo) using RT-PCR and degenerate primers designed based on the conserved regions of the ACS protein [29,101]. acs1 (GenBank accession number AF178076) is 1194 bp long and codes for 397 amino acids, while acs2 (AF178077) is 1192 bp long and codes for 396 amino acids. Northern blot analysis showed that only the acs2 transcript was detectable during ripening of the Solo papaya, with a maximum at 60% ripe and leveling off at the 100% ripe stage.
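The fragment lengths and amino acid counts quoted above for acs1 and acs2 can be checked with simple codon arithmetic: a nucleotide sequence of a given length holds at most one third as many complete codons, which bounds the number of residues it can encode. The sketch below is only a consistency check under that assumption; the source does not say where the reading frame starts or whether a stop codon is included, and the function name is hypothetical.

```python
def complete_codons(length_bp: int) -> tuple:
    """Return (complete codons, leftover bases) for a sequence of length_bp."""
    if length_bp < 0:
        raise ValueError("length must be non-negative")
    return divmod(length_bp, 3)


# Lengths (bp) and reported residue counts as quoted in the text.
fragments = {"acs1": (1194, 397), "acs2": (1192, 396)}

for name, (length_bp, reported_residues) in fragments.items():
    codons, leftover = complete_codons(length_bp)
    # The reported residue count should not exceed the codon upper bound.
    consistent = reported_residues <= codons
    print(f"{name}: {codons} complete codons, {leftover} leftover base(s), "
          f"{reported_residues} residues reported, consistent={consistent}")
```

In both cases the reported residue count is one less than the number of complete codons, which would fit a terminal stop codon or an incomplete codon at a fragment end, although the text does not specify which.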
However, acs1 was detectable at the green mature stage. Similar results had been reported earlier by Mason and Botella [101] on the same two ACS genes in papaya var. Australia 2001. However, the ripening-related gene in the Philippine papaya hybrid Sinta was found to be only of the acs1 type, although at least four isoforms of the transcripts, possibly generated by different RNA splicing, were detected [102].

Gene constructs were prepared using pGTVa as the primary vector containing acs2 in antisense orientation and pGTVb as the secondary vector containing the selectable marker nptII expression cassette [29]. Optimum bombardment conditions for transforming somatic embryos of papaya were determined using transient expression of GUS [49]. After bombardment, tissues were first allowed to recover for 1 month, after which they were subjected to selection on a medium containing kanamycin [50]. Untransformed tissues stopped growing, became yellow, chalky white or bleached and eventually died, while putatively transformed tissues were yellowish or golden yellow and grew well on kanamycin-containing medium. Putative transgenic tissues were regenerated into plantlets, which were hardened in a humidity chamber and transferred to a biosafety level 2 (BL2) screenhouse for growing out and evaluation. Based on molecular and phenotypic analysis [103], twelve papaya trees were selected. The bases for selection were: (1) molecular (presence of antisense acs2 and the nptII selection marker), (2) outstanding fruit qualities including the delayed ripening trait, (3) high yield and (4) overall tree stand. Hermaphrodite trees were also preferred over female trees to enable faster attainment of homozygosity and stability of the transgenic trait. In general, the selected trees had a good stand with normal sigmoidal growth and a prolific growth habit, producing 15–48 fruits upon reaching the first sign of ripening (color break).

Large farms harvest fruits at the green mature stage. However, the “green mature” stage is difficult to assess, resulting in a number of prematurely picked fruits that eventually translate into a lack of uniformity in terms of aroma, texture, taste and sweetness. We adopted the practice of small farmers and breeders of harvesting fruits at color break (about 10% yellow), at which point the fruits are ready to ripen. The fruits of the selected transgenic papaya lines took a similar number of days from color break to full color (6–7 days) as control non-transgenic fruits (5–6 days). However, the difference in the number of days from full yellow to the fully ripe stage was more pronounced and significant: 4–14 days for selected transgenic lines compared with 2 days for control non-transgenic papayas. Evaluation of the fruit was done at ambient room temperature of 28 °C–30 °C. This is the first proof of concept that the strategy of controlling ethylene production at the gene level can delay the ripening of a fruit from a tree [103].
Figure 2 shows fruits of selected transgenic and control at fully ripe stage. The transgenic fruits exhibited 11–141Brix total soluble solids similar to control. Among the quality traits determined, softening was most significantly different between the transgenic and non-transgenic fruit. The transgenic papaya fruit stayed firm from 4 to 14 days after reaching full yellow stage at room temperature (28 1C–30 1C) while the non-transgenic control fruit lost firmness 1 to 2 days after the full yellow stage. From color break to the 95% yellow stage, the transgenic and control fruits had similar firmness of c130 N and at full yellow stage, they also exhibited similar firmness of 111 N and 113 N, respectively. However, the non-transgenic control fruit continued to lose firmness while the transgenic fruit exhibited a slower rate (negative hyperbolic curve) of loss of firmness [103]. At 12 days after full yellow, transgenic papaya fruit had firmness of 73 N compared with control of 12 N. Further biochemical analysis showed substantial equivalence of the transgenic papaya fruits with delayed ripening trait with control fruits

Fig. 2. At a similar stage of 12 days after full yellow, selected transgenic papaya with delayed ripening trait (upper photo) was still firm at 94 N compared with control papaya (lower photo), which was much softer at 12 N. (From Cabanos et al. [103].)

in terms of proximate chemical composition, beta-carotene, vitamin C and benzyl isothiocyanate contents. On the Australian project side, about 100 transgenic trees were produced and field tested [48,100]. The fruits from transgenic trees exhibited increased shelf life of up to two weeks from color break, similar to the results obtained by the Filipino team, with no change in other characters. This technology was patented by Dr. Jimmy Botella and the University of Queensland (US Patent No. 6,124,525), covering ACC synthase genes from papaya, pineapple and mango with utility to produce transgenic plants “in which the expression of the ACC synthase is substantially controlled to affect the regulation of plant development, particularly, fruit ripening” [104]. The Malaysian group in MARDI also initiated research on developing papaya var. Eksotika by genetic engineering using antisense ACC oxidase [105] and ACC synthase [106], which they reported to be in contained field evaluation [107–108]. Attempts to develop papaya with delayed ripening trait at the Research Institute for Food Crops Biotechnology in Bogor, Indonesia, were initiated in 2001 using antisense ACC oxidase by particle bombardment of embryogenic calli [44]. Neupane et al. [109] cloned ACC synthase and ACC oxidase cDNAs from partially ripe papaya fruits (30% yellow). The cloned cDNAs were used as probes to isolate full-length ACC synthase and ACC oxidase cDNAs from a library made from 30% yellow papaya fruits. They reported a single ACC synthase gene and a multigene ACC oxidase family, and the transformation of papaya with sense and antisense ACC synthase. However, the fruit of the transgenic plants did not exhibit delayed ripening or decreased fruit softening [11].

Reducing softening strategy

This strategy focuses on modifying the expression of genes involved in tissue softening in order to delay it. In 2006, Pais et al. [110] were granted a US patent (7,084,321 B2) for their invention “Isolated DNA molecules related to papaya fruit ripening.” The patent included the DNA sequences of three genes, pectin methylesterase, β-galactosidase and polygalacturonase, isolated from papaya, and methods for promoting or delaying papaya fruit ripening through their effects on tissue softening. Paull and Jung Chen [111] obtained a patent for the cloning and isolation of xylanase genes, which can be utilized to create transgenic plants whose growth, abscission, dehiscence and/or fruit and vegetable ripening characteristics can be controlled. In Malaysia, attempts to delay the softening of papaya have been undertaken by introducing β-galactosidase in antisense orientation to reduce the activity of this cell wall-degrading enzyme [106,108].

Fitch [11] mentioned in her review that papaya plants have been transformed with antisense polygalacturonase and endoxylanase and that the transgenic plants were in greenhouse tests. However, no further publication or announcement has come out regarding these activities. While this strategy, if successful, can inhibit and delay the softening of tissues and thus prolong the firmness of the fruit, all other attributes of ripening will still proceed at normal rates. In contrast, the strategy aimed at inhibiting ripening-related ethylene production can delay or lower ethylene production and delay the development of the various attributes of ripening. Thus, so far, only the introduction of antisense acs2 in papaya plants from the work in the authors’ laboratories [48,100,103] has produced delayed ripening in papaya and has demonstrated proof of concept. Their work showed that the delay in ripening was accompanied by prolonged firmness of the tissues and a slightly longer time to attain full coloration.

Field testing

Transgenic papaya trees developed to bear fruits with the delayed ripening trait have been field tested in Queensland [48,100] and in the Philippines [112]. In March 2007, after the permit for field testing the transgenic papaya was obtained, 194 seedlings of transgenic papaya representing four events and control papaya were planted in a field in Laguna, Philippines, under the supervision of representatives from the Philippine biosafety regulatory bodies. This constituted the first field testing of a homegrown biotech crop in the Philippines. All plants of the three events of advanced generations were positive for antisense acs2 and negative for the kanamycin resistance gene marker. One line, T0218, was positive for both. The control plants were negative for both genes. By May 2007, several plants had started to flower. Infection of plants by papaya ringspot virus (PRSV) was observed.
Differential reaction of the trees to PRSV, indicating varying degrees of tolerance, was noted. Fruits were still obtained from the infected transgenic plants and some of the control plants. To address the PRSV problem, selected lines of the transgenic papaya with delayed ripening trait will be crossed with a PRSV-resistant backcross of C. papaya × V. quercifolia from the collaborative project of Dr. Rod Drew of Griffith University and Dr. Simeona V. Siar of the University of the Philippines Los Baños.

Production of pharmaceuticals

Vaccine against cysticercosis

The use of transgenic papaya as a new antigen delivery system for the production of a vaccine against cysticercosis has recently been reported by Hernandez et al. [30]. Cysticercosis is an infectious disease that affects

humans through pigs, which serve as hosts for the parasite Taenia solium. The vaccination of pigs could reduce or eliminate the transmission of this disease to humans. Three peptides, KETc1, KETc12 and KETc7, consisting of 12, 8 and 18 amino acids, respectively, were originally identified in T. crassiceps [113–114] and have been shown to have high protective capacity in piglets under endemic field conditions [115–116]. The development of an oral edible vaccine in plants could provide a better delivery system for both pigs and humans, since both acquire T. solium eggs through ingestion. Embryogenic papaya cells were co-transformed with the pUI 235-5.1 vector containing one of three inserts for the above-mentioned peptides and the pWRG1515 plasmid containing GUS-A, the hph gene (providing hygromycin resistance) or the nptII gene (providing kanamycin resistance) using particle bombardment [30]. KETc1 and KETc12 were modified to contain six additional histidine residues to increase their size and aid in their identification. Embryogenic transgenic papaya clones were selected using hygromycin and kanamycin. Forty-one transgenic clones were obtained, and the presence of the transgenes in the genome was confirmed using RT-PCR and real-time PCR. Soluble extracts of the transgenic and control embryogenic calli were used to immunize female mice (BALB/cAnN). Antibodies induced by the transgenic extracts were histochemically detected in T. crassiceps tissues. Subcutaneous immunization with the soluble extracts of transgenic clones provided complete protection in about 90% of immunized mice. The authors’ strategy of propagating and using the calli, instead of regenerating the tissues into trees, is quite innovative. They argued that cell culture of papaya is low cost and eliminates the issues surrounding the release of transgenic plants into the environment.
Vaccine against tuberculosis

An initial study to develop a vaccine against tuberculosis in papaya was done by Zhang et al. [31] by introducing the esat-6 gene from Mycobacterium tuberculosis under the control of the CaMV 35S promoter and using the hph gene as selection marker for Agrobacterium-mediated transformation. Selected transgenic papaya plantlets were shown by PCR and Southern analysis to have incorporated the gene, and expression was demonstrated by RNA blot analysis. However, it remains to be shown whether the protein produced by the transgenic plant is immunogenic when injected into test animals.

Safety to the environment and food safety

Transgenic papayas, like other genetically modified organisms, are subject to biosafety regulations regarding their potential impacts on the environment and on human and animal health. In general, risks to the environment include: (1) possible effects on non-target organisms such as beneficial

insects, mammals, wildlife, endangered or threatened species and the microbial community, (2) possibility of gene flow, (3) possibility of crossing with wild relatives and thus developing weedy relatives and (4) possible persistence in the environment. Food safety concerns include the possible presence of toxins and/or allergens. More specific concerns will depend on the particular strategy used. For example, a potential safety issue with virus-resistant transgenic plants is heteroencapsidation, in addition to those already mentioned. Recently, Fuchs and Gonsalves [10] published a critical review of studies on the safety of virus-resistant transgenic crops which have been commercialized.

Biosafety to the environment

Heteroencapsidation or transencapsidation

Heteroencapsidation or transencapsidation may result from interaction between the coat protein (CP) expressed by the transgene and another virus infecting the same plant; such interactions can lead to synergism, recombination and heteroencapsidation. It is theoretically possible that the CP produced by transgenic papayas carrying the cp gene may interact with PRSV-W virus strains. PRSV-P and PRSV-W are closely related potyviruses. The former infects both papayas and cucurbits, while the latter infects cucurbits but not papayas. In one study, the Thai transgenic papaya NK 116/5 R4 and R5 lines containing the PRSV-P cp gene were tested for the possibility of infection by a PRSV-W superinfecting strain under screenhouse conditions [117]. No disease symptoms were observed, and ELISA tests showed a complete absence of PRSV-W. In pumpkins inoculated with PRSV-W as positive controls for the study, infection occurred after 2 weeks. The presence of PRSV-W was confirmed in infected tissues using RT-PCR with NIb-specific primers. These results showed that transencapsidation in the transgenic papaya NK 116/5 R4 and R5 lines did not occur during artificial PRSV-W inoculation.
Moreover, after more than 10 years of commercialization of the PRSV-resistant transgenic papaya, the emergence of virus species with undesirable characteristics has not been reported [10].

Effect on microbial community

The possible effects of transgenic papaya plants on the microbial community of the rhizosphere have been studied under greenhouse conditions [118]. Transgenic and non-transgenic papayas were grown in large pots (1 m in diameter). Soil samples were taken from the rhizosphere level (15 cm depth) at 30-day intervals until the plant fruiting stage. Based on principal component analyses of the types and numbers of soil bacteria and on population profile characterization, there were no distinct differences in

the microbial community in the soil samples where transgenic and non-transgenic papaya were grown. Wei et al. [119] conducted environmental studies which compared the soil properties, microbial communities and enzyme activities in the soil where transgenic papaya containing the PRSV replicase (rp) gene and non-transgenic papaya were planted under field conditions. The soils under RP-transgenic papaya and non-transgenic papaya differed in arylsulfatase, polyphenol oxidase, invertase, cellulase and phosphodiesterase enzyme activities. According to their study, three soil enzymes (arylsulfatase, polyphenol oxidase and invertase) appeared to be more sensitive to the transgenic papaya than the others. The authors suggested that transgenic papaya could alter soil chemical properties, enzyme activities and microbial communities. In another study, Hsieh and Pan [120] examined the possible effects of PRSV-resistant transgenic papaya on soil microorganisms in different layers of soil (down to 15–30 cm) collected around the planting area of the papaya. The soil microorganisms in the upper and lower layers were >80% similar in soils planted with transgenic and non-transgenic plants, according to analytical methods such as amplified ribosomal DNA restriction analysis, terminal restriction fragment length polymorphism and denaturing gradient gel electrophoresis patterns. The authors concluded that planting PRSV-resistant transgenic plants had only limited effects on the soil microbial community. Lo et al. [121] used real-time PCR to detect the presence of transgene fragments in soil samples from an isolated field where transgenic papayas were planted. Three DNA fragments of different molecular sizes were selected, namely 35S-P/PRSV-CP (a 796 bp fragment between the CaMV 35S promoter and the CP), pBI121/NOS-T (a 398 bp fragment between the binary vector pBI121 and the NOS terminator) and NOS-P/nptII (a 200 bp fragment between the NOS promoter and the nptII gene).
Two DNA fragments, the 796 bp and the 200 bp fragments, were detected at very low levels, less than 30 pg per gram of soil (the detection limit of real-time PCR), while the 398 bp DNA fragment was present at 60 ng per gram of soil. The authors hypothesized that the higher GC content of the 398 bp DNA fragment might be one of the reasons for its greater persistence in the soil. This study also showed that soil DNA extracts did not transform two Acinetobacter spp., owing to the very low concentration of transgenic nptII in the extract, indicating that the possibility of bacterial transformation is very small or nil. Only one of the four studies reviewed in this chapter showed some changes in the soil properties and microbial community in soil planted to PRSV-resistant transgenic papaya, while the three others showed no or limited effects on the microbial community. In a comprehensive assessment of the effects of transgenic crops with 27 different traits on soil microbial

communities, Widmer [122] noted that many studies showed differences in soil microbiological characteristics between soils planted with transgenic and non-transgenic plants, while many other studies showed no effects. Further, Widmer’s review [122] revealed that (a) environmental factors had a greater influence on soil microbiological characteristics than the transgenic crops, (b) effects were often restricted to the rhizosphere of the transgenic plant and (c) many of the effects were spatially and temporally limited. Widmer [122] recommended further studies to define which alterations in soil microbial characteristics should be considered as unacceptable damage to a soil system.

Transgene flow

A major concern, especially among organic growers and exporters, is transgene flow through pollination from transgenic papaya plants to non-transgenic ones. For example, Japan has not approved the sale of transgenic papaya and requires that papaya shipments to Japan do not contain transgenic fruits. To help minimize delay in shipment of papaya to this market, the Hawaii Department of Agriculture adopted an Identity Preservation Protocol (IPP) that growers and shippers need to comply with to receive an IPP certification [8,123]. This IPP process has facilitated papaya shipments to Japan. According to Fuchs and Gonsalves [10], this also suggests that gene flow is quite low among papaya, considering also that most of the papaya plants in Hawaiian commercial plantations are hermaphrodites, which are self-pollinated. According to Manshardt [124], preliminary studies in Puna, Hawaii, showed that transgenic seeds were found in 7% of non-transgenic hermaphrodites and 43% of the female plants among the non-transgenic trees that immediately surrounded a large solid block of transgenic papaya plants. However, no transgenic seeds were obtained from PRSV-infected non-transgenic papaya plants 400 m away from the transgenic crop.
In the Philippines, this is also a concern, which is being addressed by researchers developing GM papaya. Thus, for the long shelf life transgenic papaya, a study of gene flow from transgenic papaya to the control papaya surrounding the transgenic plants is being conducted [112].

Food safety

Before any transgenic product is commercially released, it has to gain approval from the appropriate regulatory body for its safety as food for humans and feed for animals. The FAO–WHO Codex Alimentarius Commission [125] has developed guidelines for the safety assessment of foods derived by modern biotechnology.

Allergenicity potential

Viral proteins. For virus-resistant papaya, possible allergenicity could be due to the viral proteins that are expressed in the plants. Under the minimum match length of six contiguous amino acids recommended by an FAO/WHO Expert Consultation [126] in 2001, a stretch of the PRSV-CP sequence (EKQKEK), which is present in transgenic SunUp and Rainbow, is identical to a putative allergen determinant (ABA-1) of roundworms [127]. However, Hileman et al. [128] concluded that a threshold of six amino acids will not distinguish allergenic from non-allergenic proteins and will result in a large number of false positives. They instead recommended a minimum threshold of eight amino acids, which is consistent with the 1996 recommendations of the International Life Sciences Institute/International Food Biotechnology Council (ILSI/IFBC), whose decision tree approach has been adopted internationally by GM food evaluators [129]. Moreover, a report showed that the ABA-1 protein is not an allergen by itself [130], indicating that the criterion of an identical six amino acid stretch may not be sufficient to judge potential allergenicity and that additional criteria are needed. It was noted by Fuchs and Gonsalves [10] that virus-infected crops such as papaya, citrus and others have been consumed without any ill effects for many years.

Papain. Papain is a protease which is found mainly in the latex of the unripe papaya fruit but is also present in the leaves and trunk and in the maturing fruit. Evidence shows that papain has caused allergic reactions in workers exposed to it, including asthmatic reactions, rhinitis and contact conjunctival irritation [131]. The papain family of thiol proteases is known to cross-react immunologically with other thiol proteases such as bromelain from pineapple and ficin from fig.
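The six- versus eight-residue criterion discussed under viral proteins can be illustrated with a short sliding-window comparison. The Python sketch below is a minimal illustration only, not the screening procedure used in the cited studies. The function name `shared_windows` and both toy sequences are hypothetical: only the EKQKEK motif comes from the text, and the flanking residues are invented so that the six-residue window matches while the eight-residue window does not.

```python
def shared_windows(seq_a, seq_b, k):
    """Return every length-k peptide that occurs in both sequences."""
    windows_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    windows_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    return windows_a & windows_b


# Toy sequences, invented for illustration. Only the EKQKEK motif is taken
# from the text; the flanking residues are NOT real PRSV-CP or ABA-1 sequence.
toy_transgene_protein = "MAVDAGLNEKQKEKLTRP"
toy_allergen = "GSHFTEKQKEKWDYAC"

for k in (6, 8):
    hits = shared_windows(toy_transgene_protein, toy_allergen, k)
    print(f"window of {k}: {sorted(hits)}")
```

With these toy inputs the six-residue window flags the shared motif and the eight-residue window flags nothing, which mirrors why the longer threshold produces fewer chance matches.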
Based on this review of transgenic papaya technology, the possible problem of papain allergenicity has not been raised in the development of transgenic papaya, probably because papain is naturally present in papaya, especially in the unripe fruit. However, since changes in papain content are a possible unintended effect, researchers would be well advised to monitor the levels of papain in transgenic papaya at various stages of development and ripening.

Toxicity, unintended effects and substantial equivalence

Benzyl isothiocyanate (BITC) is considered an antinutrient and has been found in extracts of Cruciferae, Moringaceae, Capparidaceae, Tropaeolaceae, Caricaceae, Gyrostemonaceae and Salvadoraceae [132]. BITC has been linked to incidents of spontaneous abortion in pregnant women and to a higher incidence of prostate cancer in Japanese men over the age of 70 [133], as well as to anticancer effects [134]. Results of the BITC assay in PRSV-resistant transgenic papaya [135] using the method of Tang [136] showed that at green mature stage, the total

potential BITC content in papaya ranged from 7.3 to 32.3 ppm in non-transgenic lines and from 11.1 to 13.2 ppm in transgenic lines. At full yellow stage, the total potential BITC level ranged from 1.3 to 3.5 ppm in non-transgenic lines and from 1.7 to 1.8 ppm in transgenic lines. BITC levels in papaya drop 10–100 times from the immature to the ripe stage, and thus BITC does not pose a problem in consuming the ripe fruit. The results also showed that the BITC values of the transgenic and non-transgenic lines were similar. In their study of the transgenic papaya with delayed ripening trait, Cabanos et al. [103] observed that the total BITC content decreased from 11–14.5 ppm at green mature stage to 5.5–7.6 ppm at full yellow stage. It was also observed that at green mature stage, the total potential BITC and free BITC had similar values (10–14 ppm), but at full yellow stage, the free BITC values were 0.7–1.5 ppm compared to 5.5–7.6 ppm total potential BITC. The authors noted that the BITC values in transgenic papayas were not significantly different from the control. The values for papaya were also 10–100 times lower than those reported for broccoli, brussels sprouts and cabbage [137]. The results indicate that the BITC in papaya does not pose a threat to human health. As an index of unintended effects, proximate chemical composition and contents of various nutrients are analyzed in foods derived by modern biotechnology. For the transgenic papaya with delayed ripening trait, contents of moisture, protein, crude fiber, fat, ash, carbohydrate, beta-carotene and ascorbic acid were analyzed at three stages of fruit maturity (green mature, 10% yellow and full yellow) and found to be similar to the values obtained for control papaya and to values in the literature [103]. The results also indicate the substantial equivalence of the transgenic papaya with the control papaya.
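The comparison of BITC ranges in the PRSV-resistant study can be checked with simple range arithmetic. The Python sketch below is a hedged illustration: the helper names `within` and `fold_drop_bounds` are hypothetical, the ppm values are those quoted above, and the fold bounds cover only the green mature to full yellow interval, which is narrower than the immature-to-ripe span referred to in the text.

```python
def within(inner, outer):
    """True if the (low, high) range `inner` lies inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def fold_drop_bounds(early, late):
    """Smallest and largest fold reduction consistent with two (low, high) ranges."""
    return early[0] / late[1], early[1] / late[0]


# Total potential BITC (ppm) for the PRSV-resistant study, as quoted in the text.
green_mature = {"non_transgenic": (7.3, 32.3), "transgenic": (11.1, 13.2)}
full_yellow = {"non_transgenic": (1.3, 3.5), "transgenic": (1.7, 1.8)}

for stage_name, stage in (("green mature", green_mature), ("full yellow", full_yellow)):
    inside = within(stage["transgenic"], stage["non_transgenic"])
    print(f"{stage_name}: transgenic range inside non-transgenic range? {inside}")

for line in ("non_transgenic", "transgenic"):
    low, high = fold_drop_bounds(green_mature[line], full_yellow[line])
    print(f"{line}: {low:.1f}- to {high:.1f}-fold drop, green mature to full yellow")
```

At both stages the transgenic range falls inside the non-transgenic range, which is one simple way to read the statement that the values were similar.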
Strategies and constraints in the adoption of transgenic papaya technology

Collaboration and networking characterize the efforts to transfer transgenic papaya technology from the industrial to developing countries. This review revealed that 14 different countries are involved in the development of papaya transgenic technology and that they can be grouped into four categories, namely, the Gonsalves-associated group, the ACIAR group with Professor Jimmy Botella of the University of Queensland as Principal Investigator, the ISAAA-led group (Southeast Asia Papaya Biotechnology Network) and the independent groups from research institutions in various countries (Table 2). In some countries, the same research group formed two collaborations, like MARDI of Malaysia, which became involved with both the ACIAR- and the ISAAA-led groups. Thailand has two different research groups that are involved in the Gonsalves-associated and the ISAAA-led groups. The Philippines has two different research groups from the same research

Table 2. Countries involved in developing transgenic papaya technology.

Gonsalves-associated: US (Cornell/Hawaii), Brazil, Jamaica, Thailand, Venezuela
ACIAR group: Australia, Philippines, Malaysia
ISAAA-led group: Indonesia, Malaysia, Philippines, Thailand, Vietnam
Independent: China, Japan, Mexico, US (Florida), US (Virgin Islands), US (Hawaii)

institution that are involved in the ACIAR- and the ISAAA-led groups. The US has the largest number of research groups (at least four). Various independent research groups were identified in China, Indonesia, Japan, Jamaica, Mexico, Venezuela and Vietnam (Table 3). The transgenic technology in papaya was successfully transferred from the Gonsalves research group at Cornell to other countries such as Brazil, Jamaica, Thailand and Venezuela. In Venezuela, the development of transgenic PRSV-resistant papaya and its evaluation under greenhouse testing progressed rapidly, but field testing within the country was stopped early on by pressure from members of the public who opposed transgenic agricultural products [83]. Similarly, in Thailand in 2004, the successful field tests of two transgenic papaya lines which showed 97%–100% resistance to the virus were stopped by sustained efforts of Greenpeace and BioThai, a Bangkok-based NGO. In July 2004, Greenpeace trespassed into and held a demonstration in the experimental GM papaya field, which was well covered by the local and international press. Furthermore, Greenpeace claimed that the GM papaya had already escaped into farmers’ fields in 37 provinces in Thailand. This incident led to a government investigation and ultimately to the Thai government’s moratorium on all field trials of GM crops. Thus, the future of GM papaya in Thailand was deemed uncertain and would depend heavily on a policy friendly to the application of transgenic technology in papaya [76]. While the transfer of transgenic papaya technology could be successfully undertaken through collaboration, networking and sufficient manpower, facilities and funding resources, problems in perception and social acceptability of the technology exist, as shown by the experiences in Venezuela and Thailand.
However, these problems are not unique to the transgenic papaya as other biotech or genetically modified crops still face consumer resistance, e.g., in the European Union and in Japan. Japan tops the list of countries that have approved the importation of biotech crops for food and feed use and for release into the environment (field tests), although Japan has not approved the importation of papaya [10]. A consumer attitude study in Japan in 2006 showed that 61% would be reluctant to eat GM food but this is down from the 80% figure obtained in a similar

Table 3. Academic research institutions involved in papaya transgenic technology.

Grouping/country: Academic/research institution

Gonsalves group
United States: Cornell University and University of Hawaii
Brazil: EMBRAPA
Jamaica: University of West Indies, Biotech Centre
Thailand: Department of Agriculture
Venezuela: University of Los Andes

ACIAR group
Australia: University of Queensland
Philippines: Institute of Plant Breeding, Crop Science Cluster, College of Agriculture (CA), University of the Philippines Los Baños (UPLB)
Malaysia: Malaysia Agricultural Research and Development Institute (MARDI)

ISAAA group
Indonesia: Indonesian Research Institute for Agricultural Biotechnology and Genetic Resources (IABIOGRI), Agency for Agricultural Research and Development (AARD)
Malaysia: MARDI
Philippines: Institute of Plant Breeding, CA, UPLB
Thailand: National Center for Genetic Engineering and Biotechnology (BIOTEC); Kasetsart University
Vietnam: Institute of Biotechnology (IBT), National Centre for Natural Science and Technology

Independent groups
China: Huazhong Agricultural University (Hubei); Zhongshan University; South China Agricultural University (Guangzhou)
Japan: Japan International Research Center for Agricultural Sciences (Okinawa); National Agricultural Research Center for Hokkaido Region (Sapporo); Department of Agro-bioscience, Faculty of Agriculture, Iwate University
Mexico: CINVESTAD, Irapuato
USA: University of Florida; University of Hawaii; University of Virgin Islands

poll conducted in 2003 [138]. Another study concluded that a transformation in the consumer perceptions and attitude in Japan is needed before GM food can be successfully accepted by the Japanese consumer [139]. However, the adoption of biotech crops by about 55 million farmers worldwide has reached 114.3 million ha in 2007 in 23 countries, consisting of

12 developing and 11 industrial countries, and involving 12 crops including papaya [140]. A second source of limitation and/or constraint is the protection of intellectual property rights (IPR) of most biotechnologies. However, the IPRs of most of these biotechnologies are not protected in developing countries, since their owners sought protection only in selected countries, usually industrial countries. Thus, developing countries can legally access such biotechnologies. However, they may not have the technical capability to do so. In such cases, collaboration and networking with more advanced countries have helped. While patents are territorial, licenses and other contracts such as material transfer agreements (MTAs) are not territorial and will bind institutions even in countries where the technology that is the subject of the MTA or contract is not patented. A third source of constraint is the strict regulation of products of modern biotechnology. The regulatory system for modern biotech products should now be revised in light of the accumulated experience, lessons and scientific evidence from more than 20 years of field releases and regulation of GM crops and 18 years of commercialization. Overly strict regulation not only strains financial resources but also prolongs the development time of any biotech crop. These considerations should inform the revision of the biosafety regulatory system, without loss of its responsible and rigorous nature.

Concluding remarks

Very promising advances have been reported in the improvement of papaya using genetic engineering techniques. Important problems such as disease resistance and fruit quality have been targeted, and the results have been positive. Promising initiatives in the production of pharmaceuticals in papaya have also been reported. Nevertheless, a number of factors are delaying the widespread adoption of the technology. The most important is the very stringent regulation that affects all transgenic crops, requiring exhaustive environmental and health tests that make it impossible or very difficult for small companies or government institutes to develop and commercialize their own varieties. Public perception is another issue that needs to be carefully considered, although the success of the major GM crops such as soybean, maize, cotton and canola is paving the way for smaller commodities such as papaya. Finally, intellectual property issues (such as patents and plant variety rights) could further hinder the commercialization of new transgenic papaya varieties, although the fact that papaya is predominantly grown in developing countries could be an advantage, since big corporations do not normally pursue patenting in those countries.

Acknowledgements

The authors gratefully acknowledge the support from their respective institutions (University of the Philippines Los Baños and University of Queensland), funding agencies (Australian Centre for International Agricultural Research, the Philippine Department of Science and Technology, Philippine Council for Advanced Science and Technology Research and Development, Philippine Council for Agriculture, Forestry and Natural Resources Research and Development, US Agency for International Development-EMERGE and the Department of Agriculture Biotech PIU) and the co-workers, staff and students who have participated in the various aspects of our research. Special thanks to Mr. Cerrone S. Cabanos for his help in sourcing many of the articles used in this chapter. We also sincerely thank Dr. Dennis Gonsalves for his kind permission to use two photographs in this chapter.

Index of authors

Aletta, J.M. 203 Blankesteijn, W.M. 253 Blomenröhr, M. 253 Botella, J.R. 423 Charbonnier, S. 1 Chattopadhyay, D. 297 Foote, M.A. 403 Gallego, O. 1 Gavin, A.-C. 1 Goodman, B.A. 349 Gotoh, T. 109 Hu, J.C. 203 Inouye, K. 225 Ishikawa, T. 225 Karolchik, D. 63 Khan, M.T.H. 297 Lathe, S.M. 63 Lathe, W.C. III. 63 Laurena, A.C. 423 Li, L. 171 Mangan, M.E. 63 Mendoza, E.M.T. 423

Mizuki, E. 225 Navran, S. 275 Okumura, S. 225 Otaki, J.M. 109 Pirker, K.F. 349 Reichenauer, T.G. 349 Richmond, T.D. 411 Roberts, P.C. 29 Saitoh, H. 225 Schellekens, H. 191 Severino, J.F. 349 Smits, J.F.M. 253 Van Eck, J. 171 van Koppen, C.J. 253 van Rosmalen, J.W.G. 253 Verkaar, F. 253 Williams, J.M. 63 Winkler, D.A. 143 Yamamoto, H. 109 Zaman, G.J.R. 253 Zhou, X. 171

Keyword index

Abstracts, in critical review, 407–408 Adenosine, 206 Adenosine periodate (AdOX) inhibition, 215 Adipose tissue, RWV in, 287–288 Affinity purification-mass spectrometry, 6 Affymetrix 3′ expression GeneChips, 31, 33 Agilent microarrays, 38 Agilent platform, 31 Agrobacterium-mediated transformation, of papaya, 427 Alkaloids, in herpesviruses infection, 334–336 All-trans-violaxanthin, 176 Aluminum tolerance, in papaya, 438 Alzheimer’s disease, 3 AmpliChip CYP450 test, 413 Amplified fragment length polymorphism (AFLP) markers, 180 Amyotrophic lateral sclerosis, 3 Analysis of variance (ANOVA), 40, 42 Analyte proteins (preys), 9 Antibodies to therapeutic proteins, clinical consequences of, 198–199 Antibody assays, 192–193 Antibody induction, mechanism of, 193–194 Anti-fibrillarin autoantibodies, 213 Anti-Sm autoantibodies, 213–214 Apert syndrome, 3 Arabidopsis ap1-1/cal-1 “cauliflower” mutant, 183 Arabidopsis sequence data, 103 Arabidopsis thaliana, 172 Arrestin, 256. See also β-arrestin

Assembly, defined, 66 A1470 strain, of Bacillus thuringiensis, 234–237 AtCCD7 protein, 176 Autoimmune disorders, 213–214 Autoxidation, GTP, 374–375, 377–380 Bacillus thuringiensis, 226–229 A1470 strain, 234–237 characterization of parasporin-4, 244–248 as inclusion bodies in E. coli, 237–244 interaction with δ-endotoxin, 229–230 Backpropagation neural networks, 160–161 Bacterial artificial chromosome (BAC) library, 180 Bacterial endotoxins, 194 Bannayan-Riley-Ruvalcaba syndrome, 3 β-arrestin, affinity for 7TM recruitment, 255–257 BRET, 259–261 EFC, 261–262 high-content assays, 258–259 protease-mediated transcriptional reporter gene assay, 262 therapeutic application of, 257–258 Basic Local Alignment Search Tool (BLASTs or BLAST2), 80 Bayesian Belief Net, 158 B-cell tolerance, 194 B12-dependent enzyme, 206 Benzyl isothiocyanate (BITC), 448–449 Betaine homocysteine methyltransferase (BHMT), 206

BHMT. See Betaine homocysteine methyltransferase (BHMT) Bioconductor package, 38 project, 30 software, 45–46 Biological interpretation, 44–45 Biological replicates, 35 Bioluminescence resonance energy transfer (BRET), 259–261 Biomaterials, 151 Biomolecular interactions altered, in human diseases, 4 in human diseases, 2–5 mapping of, 5–6 approaches, 6–14 measurement of dynamics, 14–18 Biopharma product development, 403 critical analysis of literature in. See Critical review BioPrints database, 149 BITC. See Benzyl isothiocyanate (BITC) Bixin, 176 BLASTP, 99 BLAST tool, 114, 117, 123 BLAT tool, 80–81 Bonferroni correction, 40–41 Boolean networks, 158–159 Brazil, papaya in, 432 BRET. See Bioluminescence resonance energy transfer (BRET) Browser Extensible Data (BED) record, 95 Brush border membrane (BBM) vesicles, of diamondback moth, 230 Bruton’s protein tyrosine kinase (Btk), 4 Caenorhabditis elegans, 12, 152 embryogenesis, 159 Calmodulin-binding peptide (CBP), 8, 11 Cancers, 216–218 Carbohydrates, 150 Cardiac muscle, RWV in, 286–287

Cardiovascular disease, 218 β-carotene, 177, 180 Carotene desaturase (crtI), in bacteria, 173 ζ-carotene desaturase (ZDS), 173 Carotenoids, 171 biosynthesis, in plants, 173–177 metabolic engineering in, 177–178 sequestration, 173, 176–177 CBP–calmodulin interaction, 8 Cell culture, 275 Cell therapy, RWV in, 286 China, papaya in, 432–433 Chromatin immunoprecipitation (ChIP) experiments, 152 Chromoplast-specific carotenoid-associated protein (CHRC), 177 9-cis-epoxycarotenoid dioxygenases, 173, 176 C40 isoprenoids, 173 CitCCD1 protein, 176 CLA1, 175 Classical immune response, 193–194 Clinical trial design, in personalized diagnostics, 421 2-C-methyl-D-erythritol-4-phosphate (MEP), 175 Comparative chromatography retention assay, 17–18 Comparative Molecular Field Analysis (CoMFA), 154 Connective tissue, RWV in, 286 Coomassie brilliant blue R250, 234 Cornea, RWV in, 286 Coumarins, in herpesviruses infection, 322–325 CpG motifs, 194 Critical review abstracts, 407–408 discussion, 409 information identification, 407 introduction in, 408 keywords in, 408 literature search, 403–404 materials, 408

methods, 408 questions to ask in, 405–406 results, 409 Cross validation methods, 161–162 Cry46Aa-like toxin, of A1470 strain, 237 CRY2 gene, 177 Cry protein, 226 Cryptochromes, 177 β-cryptoxanthin, 176 Cucumis sativus, 177 CXCL5 mRNA sequence, 83 Cyanine 5 (Cy5), 38 Cystathionine γ-lyase, 206 CytK toxin of Bacillus cereus, 246 Cyt protein, 226 Data preprocessing, 37–38 DAVID system, 45 DbSNP database, 86 Deimination, 207 Differential expression analysis cluster analysis, 42–44 comparative statistics, 39–40 filtering of data, 38–39 supervised classification, 44 use of corrections for multiple testing, 40–41 Dimethylallyl pyrophosphate (DMAPP), in plastids, 173 Dipeptide information, 132 1,3-ditetradecylglycero-2-phosphocholine (PC14), 230 DNA microarrays, 14 DNMT3B, 3 Draft guidance, personalized diagnostics IVDMIA, 418–419 pharmacogenomics data submissions, 419 DRAGON, 154 Drosophila melanogaster, 12 Drug design and network motifs and paradigms, 148–150, 163

Drug development, RWV in, 284–285 Dystrophin, 87 Eberwine method, 36 EBNA2, 215 EFC. See Enzyme fragment complementation (EFC) Electron paramagnetic resonance (EPR) spectroscopy, 355 experimental techniques, 363–370 green tea, free radical reactions in, 370–374 GTP, free radical reactions in autoxidation, 374–375, 377–380 radicals generation, 382–385 superoxide anion radical, 380–382 and transition metals, 385–392 overview, 360–363 ENCODE datasets, 103 δ-endotoxin, 226 Ensembl, 44 Entrez Gene, 44 Environment safety, papaya and, 444–445 heteroencapsidation, 445 microbial community, 445–447 transencapsidation, 445 transgene flow, 447 Enzyme fragment complementation (EFC), 261–262 Equilibrium dialysis, 17 dissociation constants, 18 ErbB-receptor family, 11 Escherichia coli, 8, 125, 127–128 Essential oils, in herpesviruses infection, 325–330 Ethylenediaminetetraacetic acid (EDTA), 237 Ethylene production strategy, in papaya, 439–440, 442 Expectation maximization (EM) algorithm, 154 Expressed sequence tags (EST), 33, 71–72

External RNA Control Consortium (ERCC), 48 Extrinsic regulatory factors, 151

Frizzled receptors, 254 signaling, 264–266 translocation assays for, 266

Facial anomalies (ICF) syndrome, 3 False discovery rate (FDR) correction, 40–41 Family wise error rate (FWER) control, 40 FASTA format, 82 FDA. See Food and Drug Administration (FDA) Feature selection, 154–155 Fibrillin (fib) gene, 177 Fibroblast growth factor receptor 2 (FGFR2), 3 Flavones, in herpesviruses infection, 314–322 Flavonoids biosynthesis of, 356 in herpesviruses infection, 314–322 in human diet, 358 medicinal properties of, 358–360 in plants, 356–357 Fluorescence-based assays such as fluorescence resonance energy transfer (FRET), 15 Food and Drug Administration (FDA), 411 personalized diagnostics, initiatives in, 416–418 MAQC, 417 PSTC, 417 VGDS, 417 Food safety, and papaya, 447 BITC, 448–449 papain, 448 viral proteins, 448 Fragile X mental retardation gene 1 (FMR1), 211 Fragile X mental retardation protein (FMRP), 209, 213 Fragile X syndrome, 211–212

Galaxy, 91–92 Garnier-Osguthorpe-Robson (GOR) method, 114 Gateway search interface, 67 G-coupled receptors, 3 GenBank repository, 69 GeneChip Consortia Program, 31 Gene expression, RWV in, 283–284 Gene expression microarray, 30–33 data analysis, 29 commercial software, 46 comparison of commercial gene expression microarray, 47 open-source software, 45–46 experiment process, 33–37 future, 50–51 Gene Expression Omnibus (GEO), 49 Gene Ontology (GO) database, 45 GeneSearch BLN assay, 413 GeneSifter, 46 Gene Sorter, 97–99 Genetic regulatory networks, 146 Genome Graphs, 70 Genomes, as network motifs and modelling paradigms, 147 Geranylgeranyl pyrophosphate (GGPP), 173 Glutathione (GSH), 8 Glutathione sepharose (GST pulldown), 9 Glutathione-S-transferase (GST), 8 Glycine- and arginine-rich (GAR) domains, 204 Golden Rice, 177 Golden Rice 2, 172 GPR91, 3 GPR99, 3 G-protein-based screening assay, 12 G protein-coupled receptors (GPCR), 115

Green tea, free radical reactions in, 370–374. See also Green tea polyphenols (GTP) and transition metal ions, 385–392 Green tea polyphenols (GTP). See also Flavonoids free radical reactions in autoxidation, 374–375, 377–380 radicals generation, 382–385 superoxide anion radical, 380–382 and transition metal ions, 385–392 medicinal properties of, 358–360 oxidation, 350–356 in tea plants, 357–358 GST-HisX6 tag-fusion yeast proteins, 14 GTP. See Green tea polyphenols (GTP) Hawaii, papaya in, 430, 432 Hedgehog signaling, 266–267 Helicobacter pylori, 12 Herbal products, for herpesviruses infection, 300–306 Herbicide tolerance, in papaya, 438 Herpes simplex virus (HSV), 299 Herpesviruses HHV, 298–299 HSV, 299 infection control alkaloids, 334–336 coumarins for, 322–325 essential oils, 325–330 flavones, 314–322 flavonoids, 314–322 flavonols for, 314–322 herbal products for, 300–306 lectins, 336 lignans, 333–334 phenolics for, 306, 310–314 polypeptides, 336 polyphenols for, 306, 310–314 sugar-containing compounds, 336–337 tannins, 330–333 Heteroencapsidation, 445

HHV. See Human herpesviruses (HHV) Hierarchical clustering, 42 High mobility group A (HMGA) proteins, 217 HIV-1 methylarginine proteins, 215 Hold up assay, 17–18 Homocysteine, 206 Homologues, human, 192, 197 Homo sapiens, 12 HSV. See Herpes simplex virus (HSV) Human herpesviruses (HHV), 298–299 Human papilloma viruses (HPV) transformed cells, 13 Hydroxy-methylbutenyl diphosphate reductase (HDR), 175 5-hydroxytryptamine 2c (5-HT2C), 7 ICP27 protein, 215 Illumina BeadChips, 31 Illumina platform, 30 Immunogenicity, of therapeutic proteins, 195–196, 199–200 Independent component analysis (ICA), 44 Indonesia, papaya in, 433 Information identification, for critical review, 407 In-Silico PCR, 99–100 Intercellular signaling, RWV in, 283–284 In vitro Diagnostic Multivariate Index Assays (IVDMIA), 418–419 In vitro disease diagnostic (IVD), 412. See also Personalized diagnostics IVDMIA, 418–419 Islet transplantation, RWV in. See Pancreatic islet transplantation, RWV in Isopentenyl pyrophosphate (IPP), synthesis, 173 Isoprenoid precursors, 174–175 Isothermal titration calorimetry (ITC), 17

IsWellAboveBG flag, 39 IVD. See In vitro disease diagnostic (IVD) Jamaica, papaya in, 433 Jeffreys’ noninformative hyperprior, 154 Jumonji-domain-containing histone demethylase 1, 208 27-kDa cytotoxic protein, 234–236 Keywords, in critical review, 408 K-means algorithms, 42 Kohonen networks, 155 K-treated parasporin-4, 236, 238 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database, 45 LeCCD1 gene, 176 Lectins, for herpesviruses infection, 336 Lignans, for herpesviruses infection, 334–336 Literature search, for critical review, 403–404 Long shelf life papaya, 438–439 Low shear modeled microgravity, 277. See also Rotating Wall Vessel (RWV) in bone loss, 283 in cell therapy, 286 in gene expression, 283–284 in islet transplantation, 288 in space biology, 284 in tissue therapy, 286 L-α-phosphatidylcholine, 230 Lutein, 173 Lycopene cleavage dioxygenase, 176 Lysine-specific demethylase 1 (LSD1), 207 Lytic bacteriophages, 13 Malaysia, papaya in, 433–434 Mammalian protein–protein interaction trap (MAPPIT), 12–13

MammaPrint, 414 MAQC. See Microarray quality control consortium (MAQC) Materials, for critical review, 408 M13-derived bacteriophages, 13 MeCP2 (methyl-CpG binding protein 2), 213 Megakaryocyte growth and differentiation factor (MGDF), 196 Mesoscale network models, of biological systems, 153 Metabolic engineering of carotenoids, in crops, 177–178 Metabolic pathways, 204–208 Metabolic turnover of carotenoids, 175–176 Metabolites, roles, 3 Methods, for critical review, 408 Methylarginine sites, in proteins, 204 Methylation metabolic cycle, 205 Methylene tetrahydrofolate (THF) reductase, 206 6M guanidine hydrochloride, 238 MHC class II genes, 3 Microarray Gene Expression Database Society (MGED), 46 Microarray quality control consortium (MAQC), 48–49, 417 Microarrays/hub genes/diagnostics, 155–158 Microarray Suite 5 (MAS5), 37 Microbial community, and papaya, 445–447 Microsoft Excel plug-ins, 45 Minimum Information About a Microarray Experiment (MIAME) standard, 46 Mismatched (MM) probes, 37 Missense mutations, 3 Mites resistance, in papaya, 436–437 Mixed lineage leukemia (MLL) gene, 217 Molecular basis mutations, 3 Molecular interaction experiment (MIMIx), 6

Molecules, as network motifs and modelling paradigms, 147 Monogenic syndromes, 3 Monolayer reconstruction binding of the Cry1Ac toxin to the, 231–232 SPR data, analysis, 233–234 MRE11, 216 Multiple sclerosis, 213 8M urea, 238 Myelin basic protein (MBP), 214 MySQL database, 84–85, 88 n-aa set sequence, defined, 121 N-acetyl-D-galactosamine (GalNAc), 231–232 Nanosphere Verigene warfarin metabolism nucleic acid test, 413 NaOH buffer, 246 National Center for Biotechnology Information (NCBI) database, 80 National Institute of Standards and Technology (NIST), 48 Neoxanthin, 173 Nerve growth factor (NGF) potentiates, 209 Network motifs and modelling paradigms approaches and tools, 151–164 properties, 145–146 in therapeutic and regenerative medicine, 144–151 Neural networks, 160–163 Neurodevelopmental disease, 211–213 N-methylation, of arginine, 203 N-methyl-D-aspartate (NMDA), 7 Non-human proteins, 196 N proteins, 120 Oligonucleotide probes, 30 OpenHelix, 106 Orange cauliflower, 178–180 Organ culture, 275

Or gene, 172 analysis of, 180–181 characteristics of, 178–180 and creation of metabolic sink, 184–185 map-cloning of, 180 possible functional role of, 181–183 use of, 183–184 Paired perfect match (PM), 37 Pancreatic islet transplantation, RWV in, 283–284 PANTHER, 45 Papain, 448 Papaya economically important traits aluminum and herbicide tolerance, 438 in Brazil, 432 in China, 432–433 ethylene production strategy, 439–440, 442 in Hawaii, 430, 432 in Indonesia, 433 in Jamaica, 433 long shelf life papaya, 438–439 in Malaysia, 433–434 pests and diseases, 428 pharmaceutical production, 443–444 in Philippines, 434–435 PRSV resistance in, 428–430 resistance to mites, 436–437 resistance to phytophthora, 437 softening strategy, 442–443 in Taiwan, 435 in Thailand, 435–436 in Venezuela, 436 in Vietnam, 436 environment safety, 444–445 heteroencapsidation, 445 microbial community, 445–447 transgene flow, 447 food safety, 447–449 overview, 423–425

strategies and constraints, 449–452 transformation systems for delivery systems, 427–428 promoters, 425 selection markers, 425–426 Parasporin-4, 235–236, 239–241, 246–248 Parasporin family, 228 Parkinson’s disease, 3 Particle bombardment technology, 427 Partitioning around medoids (PAM), 42 Patient base, in personalized diagnostics, 420 P53 binding protein 1 (53BP1), 216 Pegylation, 195 Personalized diagnostics defined, 412 draft guidance by industry, 418–419 examples of, 413–414 FDA initiatives, 416–418 issues in, 419–421 clinical trial design, 421 patient base, 420 social infrastructure, 421 regulations, 414–415 Personalized medicine, 412 Pests and diseases, papaya’s resistance to, 428 Phage display, 13–14 Pharmaceutical production, papaya in, 443–444 Phenolics, for herpesviruses infection, 306, 310–314 Philippines, papaya in, 434–435 Phosphatidylinositol phosphates (PIPs), 3 Physicochemical characterization, for prediction in animal studies, 197–198 human homologues, 197 non-human proteins, 196 Phytoene desaturase (PDS), 173 Phytoene synthase, 177 Phytophthora resistance, in papaya, 437

Plastid fusion/translocation factor (Pftf), 183 Pleckstrin-homology domains (PH), 3 Plutella xylostella, 230 Polypeptide binder, 13 Polypeptides, for herpesviruses infection, 336 Polypeptides (preys), 13 Polyphenols, for herpesviruses infection, 336 Potato tubers, 183 Predictive safety testing consortium (PSTC), 417 Principal component analysis (PCA), 44 Prion encephalopathies, 3 PRMT6-mediated methylation, 215 Protease-mediated transcriptional reporter gene assay, for β-arrestin recruitment, 262 Protein amino acid sequence availability analysis and future, 132–134 historical and philosophical perspectives, 110–117 information extraction from availability as an indicator for secondary structures, 128–130 availability-based structure prediction, 130–131 availability bias and its biological and biotechnological implications, 121–125 availability difference between species and its biomedical applications, 125–128 counting of sequences, 120–121 rationale for use, 117–120 revealing biological word processing units, 131–132 Protein arginine methyltransferases, role of in growth and differentiation, 209–211 Protein characterization, 116–117 Protein–metabolite interactions, deregulation of, 3

Protein or small molecule pulldown, 9–12 Protein–protein interaction databases available on the Internet, 7 role of enzymes, 3 studies, 5 Protein three-dimensional structure, 113, 120 Proteome Browser, 101 PRSV resistance, in papaya, 428–430 PSD95-DLG-ZO1 (PDZ) domains, 2 Pseudo-amino acid composition, 133 PSTC. See Predictive safety testing consortium (PSTC) PSY enzyme, 175 Public gene expression data repositories, 49–50 Published literature, critical analysis of. See Critical review Putative protein-coding genes, 29 Pyrido[2,3-d]pyrimidine PD173955, 13 Quantile normalization, 37 Quantitative Structure–Activity Relationship (QSAR), 148, 151, 155 Quick Reference Cards, 106 Ramachandran plots, 130–132 Random Positioning Machine (RPM), 277 Ras recruitment systems, 12 Recursive neural networks, 159–160 RefSeq repository, 69 Regenerative medicine and networks, 145, 163 Regulations, for personalized diagnostics CLIA, 414–415 FDA initiatives in, 416–418 special controls, 415 Regulatory factor X (RFX) complex, 3 Resonance units (RU), 16 Retina, RWV in, 287 RFXANK gene mutant, 3 Rheumatic diseases, 213

RNA integrity number (RIN), 36 RNA sample processing, 35 Rotating Wall Vessel (RWV) applications in adipose tissue, 287–288 cardiac muscle, 286–287 cell and tissue therapy, 286 connective tissue, 286 cornea and retina, 287 drug development, 284–285 gene expression, 283–284 intercellular signaling, 283–284 pancreatic islet transplantation, 288 space biology, 284 stem cells, 282–283 operating principles of, 277–281 Saccharomyces cerevisiae, 1–2, 8 S-adenosylhomocysteine (SAH), 206–207 S-adenosylmethionine (SAM) metabolism, 206 Scale-free networks, 147 Schizosaccharomyces pombe, 103 Self-organizing maps (SOM), 42, 155 Sensitive thermocouple circuits, 17 Sequence-characterized amplified region (SCAR) markers, 180 Seven-transmembrane receptors (7TM) β-arrestin, affinity. See β-arrestin, affinity for 7TM Frizzled receptors, 254, 264–266 GPCR Hedgehog signaling, 266–267 overview, 253–254 receptor desensitization, 254–257 Smoothened receptor, 264, 266–267 Seven-transmembrane (7TM) receptors, 115 Significance analysis of microarrays (SAM), 40 Single-stranded DNA (ssDNA), 13 Smoothened receptors, 266–267 Sm proteins, 211, 214

Social infrastructure, in personalized diagnostics, 421 Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), 234 Sodium dodecyl sulfate (SDS), 9 Space biology, RWV in, 284 Split-ubiquitin system, 12 SPR (Biacore), 193 Src homology 2 (SH2), 2 Stem cells and networking, 150–151 RWV in, 282–283 Stopped flow experiments, 16 Sugar-containing compounds, herpesviruses infection, 336–337 Superoxide anion radical, GTP, 380–382 Super-tracks, 74 Surface plasma wave (SPW), 16 Surface plasmon resonance (SPR), 16 Survival motor neuron gene 1 (SMN1), 211 Symmetric dimethylarginine (SDMA), 218 Tag-affinity resin pairs, 8 Taiwan, papaya in, 435 Tandem affinity purification (TAP) protocol, 8, 11 Tannins, in herpesviruses infection, 330–333 TAP-fusion protein, 8 TAP/MS protocol, 8 Technical replicates, 35–36 Thailand, papaya in, 435–436 Therapeutic proteins, 191 Tissue culture 3-D, 276–277 overview, 275–276 Tissue culture system, 427 Tissue engineering, 286 7TM. See Seven-transmembrane receptors (7TM) Tobacco Etch Virus (TEV) protease, 8

TP53 gene, 68, 72, 76, 80 Transcription Factor Binding Sites (TFBS), 88–89 Transencapsidation, 445 Transformation systems, for papaya delivery systems, 427–428 promoters, 425 selection markers, 425–426 Transgene flow, and papaya, 445–447 Transgenic papaya. See Papaya Transition metal ions, and GTP, 385–392 Trypanosoma brucei, 210 Two-color systems, 31, 36, 38 Type 1 errors, 38–39 Type 2 errors, 38 Type I and type II protein arginine methyltransferases (PRMTs), 204 UCSC Genome Browser, 44 “add custom tracks” button, 75 additional species browser, 101–103 advanced features, 84 associated tools, 97 basics, 65 basic searches, 65–68 coming attractions, 103 “configure” button, 75–76 contact information, 106 community tracks page, 96 “custom track” output format, 92–96 “default tracks” button, 75–76 displays basic, 68 choices, 76 page-wide, 74–76 visual cues, 68–72 gene sorter, 97–99 “hide all” button, 75 In-Silico PCR, 99–100 navigation of page, 76–78 and OpenHelix, 106–107 other details, 91–92, 103–106 Proteome Browser tool, 101 “refresh” button, 76

sample search, 84–88 for filtered SNPs, 88 for SNPs in transcription factor binding sites (TFBS), 88–91 sequence information, 78–83 session management, 96 setting of layouts, 72–74 super-tracks, 74 VisiGene browser, 101, 104 UniGene clusters, 33 UniProt repository, 69 Universality class, 146 Venezuela, papaya in, 436 VGDS. See Voluntary genomic data submissions process (VGDS) Vibrio parahaemolyticus, 246 Vietnam, papaya in, 436 Violaxanthin, 173 Viral disease, 214–215

Viral proteins, and papaya, 448 VisiGene browser, 101, 104 Vitamin A, 171 Voluntary genomic data submissions process (VGDS), 417 VP14, 176 Welch’s t test, 40 Wnt family. See Frizzled receptors Xanthoxin, 173 X-linked a-gamma-globulinaemia (XLA), 3 Yeast two-hybrid system, 12–13 Zeaxanthin, 173, 176, 177 Zero-count constituent sequences, 124