English Pages 672 [646] Year 2021
SYSTEMS IMMUNOLOGY AND INFECTION MICROBIOLOGY
BOR-SEN CHEN
Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan, ROC
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2021 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

ISBN: 978-0-12-816983-4

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Stacy Masucci
Acquisitions Editor: Rafael E. Teixeira
Editorial Project Manager: Barbara Makinster
Production Project Manager: Maria Bernard
Cover Designer: Mark Rogers
Typeset by MPS Limited, Chennai, India
Contents

Preface

1. Introduction to systems immunology and infection microbiology
  1.1 Introduction
  1.2 Content and general outline of the book

I Systems Immunology

2. Biological network modeling and system identification in systems immunology and infection microbiology
  2.1 System identification for gene regulatory network
  2.2 System identification of protein-protein interaction network
  2.3 System identification of integrated genetic and epigenetic network via high-throughput next-generation sequencing data
  2.4 Conclusion

3. Identifying the gene regulatory network of systems inflammation in humans by system dynamic model via microarray data and database mining
  3.1 Introduction
  3.2 Construction of candidate inflammatory gene regulatory network in response to inflammatory stimulus
  3.3 Pruning the candidate gene regulatory network via a dynamic gene regulatory model
  3.4 On the construction of inflammatory gene regulatory network in immune system
  3.5 Biological insight and discussion
  3.6 Conclusion
  3.7 Material and methods
  3.8 Appendix

4. Dynamic cross-talk analysis among signaling transduction pathways in the vascular endothelial inflammatory response system of humans
  4.1 Introduction
  4.2 Methods of constructing cross talks among signaling pathways in inflammation
  4.3 Signaling transduction, signaling pathways, and their cross talks in inflammatory response
  4.4 Discussion
  4.5 Conclusion
  4.6 Appendix: Supplementary methods

II Systems Infection Microbiology

5. Prediction of infection-associated genes via a cellular molecular network approach: A Candida albicans infection case study
  5.1 Introduction
  5.2 Methods of constructing cellular molecular networks in Candida albicans infection
  5.3 Infection-associated genes via cellular molecular network approach
  5.4 Discussion and conclusion
  5.5 Appendix

6. Global screening of potential Candida albicans biofilm-related transcription factors by network comparison via big database mining and genome-wide microarray data identification
  6.1 Introduction
  6.2 Systems methods of screening biofilm-related transcription factors
  6.3 Potential Candida albicans biofilm-related transcription factors
  6.4 Discussion
  6.5 Conclusion
  6.6 Appendix

7. Identification of infection- and defense-related genes through dynamic host-pathogen interaction network
  7.1 Introduction
  7.2 Material and methods of constructing host-pathogen interaction network in Candida albicans-zebrafish infection
  7.3 Pathogenic/offensive mechanism between Candida albicans and zebrafish in infection process
  7.4 Discussion
  7.5 Conclusion
  7.6 Appendix

8. Host-pathogen protein-protein interaction network for Candida albicans pathogenesis and zebrafish redox process through dynamic interspecies interaction model and two-sided genome-wide microarray data
  8.1 Introduction
  8.2 Construction of host/pathogen protein-protein interaction network
  8.3 Host/pathogen protein-protein interaction network during the infection process of Candida albicans
  8.4 Discussion
  8.5 Conclusion

9. Essential functional modules for pathogenic and defensive mechanisms via host/pathogen crosstalk network by database mining and two-sided microarray data identification
  9.1 Introduction
  9.2 Material and methods
  9.3 Essential functional modules for pathogenic and defensive mechanisms
  9.4 Discussion
  9.5 Conclusion
  9.6 Appendix

III Systematic Inflammation and Immune Response in Restoration and Regeneration Process

10. The role of inflammation and immune response in cerebellar wound-healing mechanism after traumatic injury in zebrafish
  10.1 Introduction
  10.2 Materials and methods for constructing protein-protein interaction network of cerebellar wound-healing process in zebrafish
  10.3 The role of inflammation and immune response in the cerebellar wound-healing process
  10.4 Discussion and conclusion
  10.5 Appendix

11. Key immune molecular biomarkers in the pathomechanisms of early cardioembolic stroke: Multidatabase mining and systems biology approach
  11.1 Introduction
  11.2 Immune events in pathomechanisms of early cardioembolic stroke
  11.3 Material and methods of PPI network construction and principal network projection
  11.4 Conclusion
  11.5 Appendix

IV Systems Innate and Adaptive Immunity in the Infection Process

12. Cross-talk network biomarkers of pathogen-host interaction network from innate to adaptive immunity
  12.1 Introduction
  12.2 Material and methods
  12.3 Investigating PH-PPINs for cross-talk network markers from innate to adaptive immunity
  12.4 Discussion and conclusion

13. The coordination of defensive and offensive molecular mechanisms in the innate and adaptive host-pathogen interaction networks
  13.1 Introduction
  13.2 Materials and methods to coordinate defensive molecular mechanisms in innate and adaptive host-pathogen networks
  13.3 Defensive and offensive molecular mechanisms based on the innate and adaptive HP-PPINs
  13.4 Discussion
  13.5 Appendix

14. The significant signaling pathways and their cellular functions in innate and adaptive immune responses during infection process
  14.1 Introduction of innate and adaptive immune systems
  14.2 Materials and methods
  14.3 Investigating the defense/offensive strategies of innate and adaptive immunity
  14.4 The roles of significant signaling pathways in the innate and adaptive immune responses
  14.5 Conclusion
  14.6 Appendix

V Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells

15. Genetic-and-epigenetic host/pathogen networks for cross-talk mechanisms in human macrophages and dendritic cells during Mtb infection
  15.1 Introduction to tuberculosis infected by Mycobacterium tuberculosis
  15.2 Materials and methods for constructing crosstalk GWGEINs and their core networks
  15.3 Investigating pathogenic/host defense mechanism to identify drug targets
  15.4 Conclusion

16. Investigating the host/pathogen cross-talk mechanism during Clostridium difficile infection for drug targets by constructing genetic-and-epigenetic interspecies networks using systems biology method
  16.1 Introduction
  16.2 Materials and methods
  16.3 Investigating the cross-talk mechanism by constructing the genetic-and-epigenetic interspecies network
  16.4 Discussion and conclusion

17. Investigating the common pathogenic mechanism for drug design between different strains of Candida albicans infection in OKF6/TERT-2 cells by comparing their genetic and epigenetic host/pathogen networks: Big data mining and computational systems biology approaches
  17.1 Introduction
  17.2 Materials and methods
  17.3 Investigating pathogenic mechanism of C. albicans infection by comparing genetic and epigenetic interspecies networks
  17.4 Discussion
  17.5 Conclusion
  17.6 Appendix

VI Systematic Genetic and Epigenetic Pathogen/Defensive Mechanism and Systems Drug Design in Viral Infection on Human Cells

18. Constructing host/pathogen genetic-and-epigenetic networks for investigating molecular mechanisms to identify drug targets in the infection of Epstein-Barr virus via big data mining and genome-wide NGS data identification
  18.1 Introduction
  18.2 Materials and methods
  18.3 Investigating interspecies molecular mechanisms for human B lymphocytes infected with Epstein-Barr virus
  18.4 HVCNs at the first and second infection stage in the lytic phase of B cells infected with EBV
  18.5 HVCPs at the first and second infection stage during the lytic replication cycle
  18.6 The transportation process of viral particles through host-virus cross-talk interactions at the second infection stage
  18.7 Overview of the lytic infection molecular mechanism from the first to second infection stage in human B cells infected with EBV
  18.8 Drug target proteins and multimolecule drug design
  18.9 Discussion
  18.10 Conclusion

19. Human immunodeficiency virus-human interaction networks investigating pathogenic mechanism for drug discovery: A systems biology approach
  19.1 Introduction
  19.2 Investigate pathogenic mechanisms at different stages of human immunodeficiency virus infection
  19.3 HIV/human interaction networks for multiple drug designs at three infection stages
  19.4 Methods
  19.5 Conclusion
  19.6 Abbreviations
  19.7 Appendix

20. Systems multiple-molecule drug design in infectious diseases: Drug-design specifications approach
  20.1 Introduction
  20.2 Systems drug-design method in infectious diseases
  20.3 Discussion
  20.4 Conclusion

References
Index
Preface

The immunology field has entered an exciting new era in which our expanding knowledge and data of immune regulation are leading to innovative new therapies for the treatment of cancer, infectious diseases, autoimmune and inflammatory disorders, and other diseases. The major task of an immune system is to defend the host against infections. The immune system evolved as a defensive mechanism against foreign particles and cancer cells in an organism. It interacts with self and foreign components and mounts adequate responses against pathogenic foreign and mutated self components. At the same time, it should tolerate self and most other environmental particles so that an organism can maintain a healthy state. The immune responses are extraordinarily complex, involving the dynamic interaction of a wide array of tissues, cells, and molecules, many of which cannot be measured directly in the wet lab. Immunology has traditionally been a qualitative science describing the physiological cellular and molecular components of the immune system and their functions. The traditional approaches are by and large reductionist, avoiding complexity but providing detailed knowledge of a single event, cell, or molecular entity. In recent years, high-throughput technology and big database mining schemes, in concert with systems biology and bioinformatics, have drastically changed the way the immune system is studied. Immunologists are now aiming to provide a comprehensive description of complex immunological processes. The generation of huge amounts of data on these complex immunological processes has made it clear that this goal cannot be achieved without dynamic interaction models and powerful computational methods. As a result, a new big-data-driven and systems-driven systems immunology is being created. However, immunologic mechanisms are highly complex. The application of systems biology, bioinformatics, and big data mining methods to defining these immunologic mechanisms holds high promise for moving the field forward.

Most microorganisms associated with a host body are bacteria and viruses. They normally infect specific sites. Sometimes, they compete with pathogens; other times they are capable of producing opportunistic infections. The host’s ability to resist infection depends on defense mechanisms against microbial invasion. Host resistance arises from both specific (adaptive) and nonspecific (innate) body defense mechanisms. Inflammation, the alternative complement pathway, phagocytosis, cytokines, and natural killer cells are examples of nonspecific defenses that help protect the host against microorganisms. Recently, thanks to advances in host/pathogen two-sided genome-wide high-throughput data, we can construct host/pathogen interaction networks and obtain a core network marker for investigating the offensive and defensive mechanisms between host and pathogen by systems biology methods, much as physicists could predict the planet Pluto from a dynamic model of the solar system and measured data before
discovering it. From a core network biomarker, we can also identify multiple drug targets for designing multiple-molecule drugs via drug data mining from the systems medicine perspective.

In this book, since inflammation arises as the first response to infection and tissue damage by microorganisms, or as a result of certain types of immune reactions, a systems biology approach is first introduced to construct the gene regulatory network of systemic inflammation and infection microbiology via microarray data and database mining. Then the dynamic cross-talk analysis among tumor necrosis factor receptor (TNF-R), TLR-4, and IL-1R signaling in the TNF-induced inflammatory response is discussed in the context of microbial infection. Wherever cellular life occurs, pathogens are also found. Immune systems have evolved to defend the organism against these intruders. Because pathogens evade or interfere with specific cellular pathways to escape the immune response, studying host-pathogen or host-virus interactions at the level of single-gene effects fails to provide a global, systems-level understanding. Understanding host-pathogen interaction mechanisms therefore calls for close collaboration between microbiology and immunology at the system level. We will first globally screen potential biofilm-related transcription factors via network comparison to understand the pathogen invasion mechanism. The recent explosion of information on innate and adaptive immune pathways for recognition, effector responses, and genetic regulation has given impetus to investigations into both normal physiology and immunopathology. This book will present a systematic method for the investigation of innate and adaptive immune mechanisms in the infection of microorganisms via genome-wide two-sided microarray data identification. First, a dynamic interspecies interaction network between host and pathogen during infection will be constructed, and then a cross-talk network biomarker will be extracted from the pathogen-host interaction network. The prediction of phenotype-associated genes of the host in the infection process is then given via a cellular network method. Next, the infection-related and defense-related genes are identified via the dynamic host-pathogen interaction network. An interspecies protein-protein interaction network is also constructed for the characterization of host-pathogen interactions. The essential functional modules for pathogenic and defensive mechanisms are also revealed, for example, the roles of transforming growth factor signaling and apoptosis pathways in innate and adaptive immunity, and the shift in the host-pathogen interaction mechanism from innate to adaptive immunity. Finally, some significant drug targets are identified based on pathogenic and defensive mechanisms, from which potential therapeutic multiple-molecule drugs with fewer side effects are discovered by drug data mining.

Epigenomics has emerged as a promising field and has addressed gaps in our understanding of immunology and infectious diseases. Epigenetic modification can affect DNA structure through methylation and acetylation, and chromatin structure through changes to scaffolding proteins and histones and through small noncoding RNAs. Recently, systems genetic and epigenetic biology has become a research field that applies informatics and systems techniques to investigate the complexities of infectious diseases through genomic and epigenomic data. This book will investigate systematic genetic and epigenetic pathogenic/defensive mechanisms and identify significant drug targets during different bacterial infections of human cells
by systems biology methods via big data mining and two-sided next-generation sequencing (NGS) data identification. HIV and Epstein-Barr virus cause severe syndromes that can be fatal to humans. In this book, with the help of simultaneous two-sided time-course HIV-human high-throughput sequencing (NGS) data, reverse transcription-polymerase chain reaction data, miRNA data, and other omics data, the interspecies protein-protein-miRNA interaction (PPMI) network is constructed based on a dynamic interaction model of the host-pathogen interaction process. Principal interspecies PPMI networks are extracted as the host-pathogen cross-talk networks at different stages of the HIV infection process. By comparing the host-pathogen cross-talk networks in HIV-infected cells with those in mock-infected cells at each stage, host-pathogen cross-talk network markers at different infection stages are extracted. The common core network marker across the whole infection process and the specific markers at individual infection stages are identified to investigate the shift in pathogenic and defense mechanisms during HIV infection. Similarly, the host-pathogen cross-talk networks at different infection stages of Epstein-Barr virus are found by big database mining and systems biology methods. The core cross-talk networks and pathways are also found from interspecies
genetic and epigenetic networks by two-sided high-throughput data to gain insight into the pathogenic scheme of Epstein-Barr virus and the defense mechanisms of the human host. This might offer a new way of investigating multiple drug targets for potential multimolecule drug design for efficient therapeutics of AIDS and Epstein-Barr virus disease.

This book brings two scientific disciplines, immunology and microbiology, together in one volume. It promises to serve as a roadmap for enhancing one’s understanding of immunology and microbiology concepts and of systems biology approaches to molecular biology research on infectious diseases. It could provide researchers with valuable information on systems methods, data analysis, extraction of pathways and cellular functions, identification of biomarkers and drug targets, discovery of multiple-molecule drugs, and the general application of systems biology, in this case systems immunology and microbiology.
Acknowledgments

I would like to thank Prof. Y.C. Wang, Prof. C. Lin, Prof. C.Y. Lan, Prof. Y.J. Chuang, Prof. W.P. Hsieh, Dr. C.C. Wu, and Dr. C.W. Li for their wonderful cooperation on some research topics of systems immunology and infection microbiology, on which this book is based. Finally, I would also like to thank Ms. Chih-Yin Wang for her careful typing of this book.
CHAPTER 1
Introduction to systems immunology and infection microbiology

1.1 Introduction

The immunology field has entered an exciting new era in which our expanding knowledge of immune regulation is leading to innovative new therapies for the treatment of cancer, infectious diseases, autoimmune and inflammatory disorders, and other diseases. In the last decade, systems biology and immunology and their applications in medicine have witnessed a revolution in the methods and tools available for bioscience research. The development of new laboratory techniques has been supported by the maturity of bioinformatics tools, databases, and their subsequent applications. The new biotechniques and methods have enabled the systematic collection and analysis of large amounts of data from a global perspective in what have been called “-omics” approaches, and thus the use of the suffix “-omics” has expanded across the study of different molecular levels (genomics, transcriptomics, and metabolomics) and disciplines (immunomics). A common aspect across all these approaches is the ability to generate unprecedented amounts of data that require strong support from bioinformatics for their processing, that is, data storage and retrieval in databases, and standards. The number of tools and databases available in this field has increased during the last decade, and several of these resources and methods have been oriented to a specific aspect of immunology, such as the major histocompatibility complex or T-cell receptors, and provide single-level analysis or information. In some cases the resources combine a few of these aspects, but they either remain at the same molecular level or combine nucleotide and protein sequences [1]. The aim of systems biology is to provide a “system” or “multilevel” understanding of biological processes through the integration and modeling of different data sources.
Scientists who apply systems approaches in their research are looking for ways to generate big datasets, to analyze them, to extract valuable information from them, and to connect data obtained from different analyses (e.g., genomics and proteomics). Therefore “immunology and infection microbiology,” with all its complex interactions between different species, cell types, regulatory and signaling pathways, epigenetic modifications, and molecules and genes, many of which cannot be measured directly by wet-lab experiments, provides a
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00011-0
perfect environment for the development and use of approaches based on systems biology. By analogy, physicists could determine the mass of the Sun from the Earth's orbital distance and velocity using a gravitational model. Likewise, the existence of the planet Pluto was predicted from a dynamic model of the solar system before it was discovered. Similarly, with an adequate system model and enough two-sided data, some complex host/pathogen genetic and epigenetic interactions and mechanisms could be efficiently measured and identified indirectly. Systems biology combined with data gathered from “-omics” methods is key to defining diseases not only by their traditional signs and symptoms but also by their underlying molecular causes and other factors such as environmental risk factors [2]. In this context, systems immunology plays a central role in the new era of big-data-driven and system-driven medical research.

The various genes and proteins responsible for immunity constitute the immune system, and their orchestrated response to defend against foreign/nonself substances (antigens) is known as the immune response. When an antigen attacks the host system, two distinct, yet interrelated, branches of the immune system are active: the nonspecific/innate and the specific/adaptive immune responses. Both of these immune systems have physiological mechanisms that enable the host to recognize materials foreign to itself and to neutralize, eliminate, or metabolize them. Innate immunity represents the earliest development of protection against antigens. Adaptive immunity in turn has two branches: humoral and cell mediated. It should be noted that innate and adaptive immunity do not work independently. Moreover, most immune responses involve the activity and interplay of both the humoral and the cell-mediated branches of the immune system [2]. These branches will be described in detail, together with the invasive and defensive mechanisms of host and pathogen, from the perspective of systems biology.
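The indirect-measurement analogy drawn earlier in this section (inferring the Sun's mass from the Earth's orbit) can be made concrete with a short worked calculation. This is a sketch that assumes a circular orbit; the symbols and the rounded textbook constants below are introduced here for illustration and are not taken from the book. Setting the gravitational force equal to the centripetal force on the Earth gives:

```latex
\frac{G M_{\odot} m_{\oplus}}{r^{2}} = \frac{m_{\oplus} v^{2}}{r}
\quad\Longrightarrow\quad
M_{\odot} = \frac{v^{2} r}{G}
\approx \frac{\left(2.98\times 10^{4}\,\mathrm{m\,s^{-1}}\right)^{2}\left(1.496\times 10^{11}\,\mathrm{m}\right)}
             {6.674\times 10^{-11}\,\mathrm{m^{3}\,kg^{-1}\,s^{-2}}}
\approx 2.0\times 10^{30}\,\mathrm{kg}
```

Here $v$ is the Earth's orbital speed, $r$ its orbital radius, and $G$ the gravitational constant; the Earth's own mass $m_{\oplus}$ cancels. A quantity that cannot be weighed directly is thus recovered from observable data plus a model, which is the same logic behind identifying host/pathogen interactions from two-sided data and a dynamic model.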
Since inflammation is a painful swelling and redness caused as a first response to infection and tissue damage by microorganisms, or as a result of certain types of immune reactions, one of the most extensively investigated biological systems is the human inflammatory system. It orchestrates a complex biological process that engages a variety of cell types to eliminate invading microorganisms and protect the host. Infected host immune systems recognize the ligands on the surface of disease-causing pathogens and mobilize specific inflammatory defense mechanisms. On the other hand, pathogens can proactively perturb host defense signaling pathways to enhance their survival. These complex pathogenic and defensive mechanisms in the infection process need to be described by a systematic method, that is, the systems immunology and infection microbiology method [2]. Scientists in immunology and microbiology (specifically those investigating host-pathogen interactions) are looking for information about the biomarkers, pathways, cells, etc. that are involved in the host defense against pathogens. The link between the innate and adaptive immune responses is of particular interest in current research. These research questions are difficult to investigate with in vitro models, and the complexity of in vivo models requires a systems biology approach to extract more valuable information. Researchers involved in drug discovery require additional knowledge of biomarkers in the immune system to optimize the efficacy and safety of their drugs. Recently, two-sided genome-wide high-throughput data of host and pathogen have been generated for different infectious diseases. Further, epigenomics has emerged as a promising field to address the gaps in our understanding of immunology and infectious
diseases. Therefore the technologies of systems biology and genetic and epigenetic biology can be applied to investigate the complexities of infectious diseases through two-sided genomic and epigenomic data. This book will investigate systematic genetic and epigenetic pathogenic/defensive mechanisms and identify core biomarkers for crucial drug targets during different bacterial and viral infection processes in humans by systems biology methods via big data mining and two-sided next-generation sequencing (NGS) data identification. It has been suggested that an infectious disease is rarely a consequence of an abnormality in a single gene or protein, given the functional interdependence between molecular components in the cell, and that both network connectivity and dynamics are important targets for therapeutic intervention. Consequently, we believe that systems medicine targeting network dynamics can be developed for infectious diseases with the help of the interspecies host/pathogen genetic and epigenetic networks proposed in this book.
1.2 Content and general outline of the book

This book consists of six parts: Part I introduces systems immunology; Part II covers systems infection microbiology; Part III introduces systematic inflammation and immune responses in the restoration and regeneration process; Part IV introduces systems innate and adaptive immunity in the infection process; Part V introduces systematic genetic and epigenetic pathogenic/defense mechanisms during bacterial infection of human cells; and Part VI introduces systematic genetic and epigenetic pathogenic/defense mechanisms during viral infection of human cells. In this book, after we introduce dynamic modeling and system identification for biological networks in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology, a systems biology approach is given to construct the gene regulatory network (GRN) of systemic inflammation by dynamic modeling via microarray data and big database mining in Chapter 3, Identifying the Gene Regulatory Network of Systems Inflammation in Humans by System Dynamic Model via Microarray Data and Database Mining. Then a systematic approach based on a stochastic dynamic model is proposed to gain insight into the underlying defense mechanisms of inflammation via the construction of the corresponding signaling networks in response to pathogens [3,4]. The highly ranked cross talks that are functionally relevant to the related pathways are identified for inflammatory responses. A bow-tie structure is also extracted from these cross-talk pathways, suggesting the robustness of the network structure, the coordination of signal transduction, and the feedback control for efficient inflammatory responses to different pathogens in Chapter 4, Dynamic Cross-Talk Analysis Among Signaling Transduction Pathways in the Vascular Endothelial Inflammatory Response System of Humans.
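To give a feel for what "dynamic modeling and system identification" of a gene regulatory network can look like in practice, here is a minimal sketch that fits a discrete-time linear regulatory model to time-course expression data by ordinary least squares. It is an illustration under stated assumptions rather than the specific model developed in the later chapters: the model form x[t+1] = A x[t] + b + noise, the function names `simulate_grn` and `identify_grn`, and the synthetic three-gene network are all hypothetical.

```python
import numpy as np


def simulate_grn(A, b, x0, steps, noise_std=0.0, seed=0):
    """Simulate x[t+1] = A @ x[t] + b + noise (an assumed, illustrative model)."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    x = np.empty((steps + 1, n))
    x[0] = x0
    for t in range(steps):
        x[t + 1] = A @ x[t] + b + noise_std * rng.standard_normal(n)
    return x


def identify_grn(x):
    """Estimate A and b from a time course x of shape (time points, genes)."""
    # Regressors: expression at time t, plus a constant column for the basal term b.
    phi = np.hstack([x[:-1], np.ones((x.shape[0] - 1, 1))])
    # Solve phi @ theta ~= x[1:] in the least-squares sense.
    theta, *_ = np.linalg.lstsq(phi, x[1:], rcond=None)
    A_hat = theta[:-1].T  # A_hat[i, j]: estimated effect of gene j on gene i
    b_hat = theta[-1]     # estimated basal level of each gene
    return A_hat, b_hat


# Hypothetical 3-gene network: gene 3 represses gene 1, gene 1 activates gene 2,
# and gene 2 activates gene 3 (values chosen only for illustration).
A_true = np.array([[0.8, 0.0, -0.3],
                   [0.4, 0.7, 0.0],
                   [0.0, 0.5, 0.6]])
b_true = np.array([0.5, 0.1, 0.1])

x = simulate_grn(A_true, b_true, x0=np.array([1.0, 0.5, 0.2]),
                 steps=50, noise_std=0.01)
A_hat, b_hat = identify_grn(x)
print(np.round(A_hat, 2))
```

With real microarray data the time course is usually short and noisy, so the candidate interactions would typically be restricted in advance (for example by database mining) and then pruned; that step is deliberately left out of this sketch.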
In addition, this systematic approach of systems immunology can be applied to other signaling networks under different conditions in different species. Because Candida albicans is one of the leading causes of hospital-acquired bloodstream infections, a number of its virulence factors have been investigated, including the ability to undergo morphogenesis and phenotypic switching, as well as the secretion of adhesins and hydrolytic enzymes [5,6]. We will propose a cellular network approach to predict the phenotype-associated genes in the C. albicans infection process via high-throughput data and big database mining. C. albicans is the
most prevalent opportunistic fungal pathogen in humans, causing both superficial and serious systemic infections. The infection process can be divided into three stages: (1) adhesion, (2) invasion, and (3) host cell damage. To enhance the understanding of these C. albicans infection stages, Chapter 5, Prediction of Infection-Associated Genes via a Cellular Molecular Network Approach: A Candida albicans Infection Case Study, aims at infection-associated genes, which are involved in these three infection stages, and focuses on their roles in C. albicans-host interactions [7]. In light of the fact that proteins that are closer to one another in a protein interaction network are more likely to have similar functions and that genes that are regulated by the same transcription factors (TFs) tend to have similar functions, a cellular network approach is proposed to predict the genes associated with the three stages; a total of 4, 12, and 3 genes were predicted as adhesion-, invasion-, and damage-associated genes during C. albicans infection, respectively [8]. These predicted genes highlight the fact that cell surface components are critical for cell adhesion and that morphogenesis is crucial for cell invasion. In addition, they provide drug targets for further investigation into the infectious mechanisms of the three C. albicans infection stages. These results give insights into the immune responses elicited by C. albicans during its interaction with the host, and are possibly instrumental in identifying novel therapies to treat C. albicans infection. Since the formation of biofilm is a major virulence factor in C. albicans pathogenesis and is related to the drug resistance of this organism, in Chapter 6, Global Screening of Potential C.
albicans Biofilm-Related Transcription Factors by Network Comparison via Big Database Mining and Genome-Wide Microarray Data Identification, we will develop, based on GRN analysis, an efficient framework that integrates different kinds of databases and genome-scale microarray data for global screening of potential TFs controlling C. albicans biofilm formation. Based on TF-gene regulatory association information and microarray data, a stochastic dynamic model is employed to reconstruct the GRNs of C. albicans biofilm and planktonic cells. The two GRNs are then compared, and a relevance value score is proposed to estimate the correlation of each potential TF with biofilm formation. In this chapter a total of 23 TFs are found to be related to biofilm formation [4].
Although clinical research and development have progressed in recent decades, infectious diseases remain a top global public health problem, being responsible for millions of morbidities and mortalities each year. Hence, many studies have investigated host-pathogen interactions in the infection process from various viewpoints in an attempt to understand pathogenic and defensive mechanisms, which could help control pathogenic infections. However, most of these efforts have focused predominantly on the host or the pathogen individually rather than on both interaction partners simultaneously. In Chapter 7, Identification of Infection-Related and Defense-Related Genes Through Dynamic Host-Pathogen Interaction Network, with the help of simultaneously quantified C. albicans-zebrafish time profile interaction transcriptomics and other omics data, a systematic method is developed to construct the interspecies protein-protein interaction (PPI) network for C.
albicans-zebrafish interactions based on the inference of orthology-based PPIs and the dynamic modeling of regulatory responses [8]. The identified C. albicans-zebrafish interspecies PPI network highlights the interaction between C. albicans pathogenesis and the zebrafish redox process, indicating that redox status is critical in the defense and offense between the host and pathogen. With the continued accumulation of interspecies transcriptomics data, the method proposed in this chapter could
be used to explore progressive network rewiring in the future, which could advance the development of network medicine for infectious diseases.
Since C. albicans infections and candidiasis are difficult to treat and create very serious therapeutic challenges, in Chapter 8, Host-Pathogen PPI Network for C. albicans Pathogenesis and Zebrafish Redox Process Through Dynamic Interspecies Interaction Model and Two-Sided Genome-Wide Microarray Data, based on two-sided interactive time profile microarray data of C. albicans and zebrafish during infection, the infection-related PPI networks of the two species and the intercellular PPI network between host and pathogen are simultaneously identified by a dynamic infection model as an integrated PPI network consisting of intercellular invasion and cellular defense processes during infection. The signal transduction pathways regulating morphogenesis and hyphal growth of C. albicans are further investigated based on significant interactions found in the intercellular PPI network. Two infection-related PPI networks, corresponding to different infection stages (adhesion and invasion), are also developed and compared with each other to identify significant proteins as biomarkers, from which we can gain more insight into the pathogenic role of the hyphal development mechanism in the C. albicans infection process. Important defense-related proteins in zebrafish are also predicted as biomarkers using the same approach. The hyphal growth PPI network, zebrafish PPI network, and host-pathogen intercellular PPI network are combined to form an integrated infection-related PPI network that helps us understand the systematic mechanisms underlying the pathogenicity of C. albicans and the immune response of the host, which may help identify biomarkers as drug targets to improve medical therapies and facilitate the development of new antifungal drugs.
Although the clinical and biological significance of the study of the fungal pathogen C.
albicans has markedly increased, the explicit pathogenic and invasive mechanisms of such host-pathogen PPI networks have not yet been fully elucidated. The essential functional modules involved in C. albicans-zebrafish interactions are investigated for pathogenic and defensive mechanisms in Chapter 9, Essential Functional Modules for Pathogenic and Defensive Mechanisms via Host/Pathogen Crosstalk Network by Database Mining and Two-Sided Microarray Data Identification. Adopting a systems biology approach, the early- and late-stage PPI networks for both C. albicans and zebrafish are simultaneously constructed. By comparing PPI networks at the early and late stages of the infection process, several critical functional modules are identified for both pathogenic and defensive mechanisms via the host/pathogen cross-talk network. Functional modules in C. albicans, such as those involved in hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization, are found to play important roles in pathogen invasion and in the damage caused to host cells. Further, functional modules in zebrafish, such as those involved in immune response, apoptosis mechanisms, ion transport, protein secretion, and hemostasis-related processes, are found to be significant as defensive mechanisms during C. albicans infection. The essential functional modules thus identified from two-sided microarray data could provide insights into the molecular mechanisms of host-pathogen cross-talk interactions during the infection process and thereby suggest potential therapeutic strategies to treat C. albicans infections.
After injury of the cerebellum, cellular signal transduction and defense mechanisms (inflammation and immune response) are provoked to coordinate the subsequent process. That is, after G-protein pathways, the endocannabinoid pathway, and neurotransmitter pathways activate cell-cycle pathways for neurogenesis, the defense mechanisms (inflammation and immune response) could interact with cell-cycle pathways and neurotransmitter-related pathways for restoration from traumatic brain injury. Inflammation
could be interpreted as a necessary and sufficient condition for neurogenesis. Hence, there should exist cross talk between inflammation and cell-cycle pathways for coordination in the cerebellar wound healing process. This cross talk will be discussed in Chapter 10, The Role of Inflammation and Immune Response in Cerebella Wound Healing Mechanism After Traumatic Injury in Zebrafish.
Although inflammation has generally been regarded as a negative factor in stroke recovery, this point of view has recently been challenged by the demonstration that inflammation seems to be a necessary and sufficient factor for regeneration in the zebrafish brain injury model. This close relationship of regeneration with inflammation suggests that it is necessary to reexamine the immune system's role in strokes. In Chapter 11, Key Immune Molecular Biomarkers in the Pathomechanisms of Early Cardioembolic Stroke: Multidatabase Mining and Systems Biology Approach, we investigate the role of immune-related functions via their PPI network with other molecular functions by a systems biology method in early cardioembolic stroke. Based on protein interaction models identified from microarray data from the blood of stroke subjects and healthy controls, PPI networks are constructed to delineate molecular interactions at four early stages of cardioembolic stroke [9]. A comparative analysis of functional networks could identify systematic interactions of immune-related functions with other molecular functions, including growth factors, neuro/hormone, and housekeeping functions. These provide a potential systematic pathomechanism for early stroke pathophysiology. In addition, several biomarkers as potential drug targets of microRNA (miRNA) and methylation regulations are derived based on basal level changes observed in the core networks and on the literature.
The results of this chapter provide a more systematic understanding of stroke progression mechanisms from an immune perspective and shed light on drug targets for acute stroke treatments.
Cross-talk mechanisms between host and pathogen are crucial in the infection process. To obtain a systematic insight into the defense mechanisms of the host and the pathogenic mechanisms of the pathogen, pathogen-host PPI network interactions in the infection process have become a novel and promising research topic in the field of infectious diseases. In Chapter 12, Cross-Talk Network Biomarkers of Pathogen-Host Interaction Network From Innate to Adaptive Immunity, two pathogen-host dynamic cross-talk PPI networks are constructed to investigate the transition of pathogenic and defensive mechanisms from the innate to the adaptive immune system over the entire infection process, based on two-sided time profile microarray data of the C. albicans-zebrafish infection model and database mining. Potential cross-talk PPI network biomarkers are identified for the transition from innate to adaptive immunity based on proteins with larger interaction variations inside the host and pathogen cells and at the interface between the host and pathogen cells. The cross-talk PPI network biomarkers consist of proteins with large interaction variation scores in the pathogen-host PPI difference network. From the cross-talk network biomarkers, innate and adaptive immunity are investigated from a systems biology perspective. In view of these results, the proposed cross-talk PPI network biomarkers may serve as potential therapeutic targets for infectious diseases [1].
Infected zebrafish could coordinate defensive and offensive molecular mechanisms in response to C. albicans infections, and invasive C. albicans could also coordinate offensive and defensive molecular mechanisms to respond to the host.
However, systematic knowledge of the ensuing infection-activated signaling networks in both host and pathogen, and of their interspecific cross-talk PPI network during the innate and adaptive phases of the infection process,
remains incomplete [1]. In Chapter 13, The Coordination of Defensive and Offensive Molecular Mechanisms in the Innate and Adaptive Host-Pathogen Interaction Networks, dynamic network modeling, PPI databases, and dual transcriptome data from zebrafish and C. albicans during infection are used to infer infection-activated host-pathogen dynamic PPI networks. Treating host-pathogen dynamic PPI systems as innate and adaptive loops, and then comparing the inferred innate and adaptive PPI networks, could reveal previously unrecognized cross talk between known pathways and suggest roles for immunological memory in coordinating host defensive and offensive molecular mechanisms to achieve a specific and powerful defense strategy against pathogens [10]. Moreover, pathogens could enhance intraspecific cross talk and abrogate host apoptosis to accommodate host defense mechanisms during the adaptive phase. Accordingly, links between physiological phenomena and changes in the defensive and offensive molecular mechanisms could highlight the importance of host-pathogen molecular interaction networks, and the consequent inferences about host-pathogen defensive and offensive molecular mechanisms could be translated into drug targets for biomedical applications [1].
The immune system is an important biological system present in humans and many other vertebrates. Exposure to pathogens can induce various defensive immune mechanisms to protect the host from potential infections and harmful substances derived from pathogens such as parasites, bacteria, and viruses. The complex immune systems of humans and many other vertebrates can be separated into two major categories: the innate and adaptive immune systems. To date, a systematic analysis of the complex interactions between the two subsystems, that is, how they regulate host defense and inflammatory responses, remains challenging.
Based on two-sided microarray data following primary and secondary infections of zebrafish by C. albicans, in Chapter 14, The Significant Signaling Pathways and Their Cellular Functions in Innate and Adaptive Immune Responses During Infection Process, two intercellular PPI networks are constructed for the primary and secondary responses of the host. Further, the TGF-β signaling and apoptosis pathways are found to be two of the main functional modules involved in primary and secondary infections [1]. The initial in silico analyses in this chapter will pave the way for further investigation into the roles played by the TGF-β signaling pathway and the apoptosis pathway in the innate and adaptive immune systems of zebrafish. Such systematic insights could lead to therapeutic advances and might improve drug design in the ongoing battle against infectious diseases [5].
Tuberculosis is a disease caused by Mycobacterium tuberculosis (Mtb) infection. Mtb is one of the oldest human pathogens and has evolved its infectious mechanisms over the course of human evolution. Since the lungs are the first organ exposed to aerosol-transmitted Mtb during gaseous exchange, macrophages (Mφs) and dendritic cells (DCs) are the guards of the immune system in the lungs and provide the most important defense against Mtb infection. Several studies have discussed the functions of Mφs and DCs during Mtb infection. However, the genome-wide pathways and genetic regulatory networks are still incomplete. Furthermore, the immune responses induced by Mφs and DCs may differ. Therefore, in Chapter 15, Genetic and Epigenetic Host/Pathogen Networks for Cross-Talk Mechanisms in Human Macrophages and Dendritic Cells During MTB Infection, we will analyze the cross-talk genome-wide genetic-and-epigenetic interspecies networks (GWGEINs) between Mφs and Mtb and between DCs and Mtb to investigate the defensive and offensive mechanisms of both the host and the pathogen as they relate to Mφs and DCs during early Mtb infection [11].
First, we perform big database mining to construct the candidate host/pathogen cross-talk GWGEIN between human
cells and Mtb. Then we identify dynamic system models to characterize the molecular mechanisms, including intraspecies GRNs/miRNA regulation networks, intraspecies PPI networks (PPINs), and the interspecies PPIN of the cross-talk GWGEIN. Further, a system identification method and a system order detection scheme are applied to the host/pathogen interactive dynamic models to identify the real host/pathogen cross-talk GWGEINs using the time profile microarray data of Mφs, DCs, and Mtb. After the real cross-talk GWGEINs are identified, the principal network projection (PNP) method is employed to extract host-pathogen core networks (HPCNs) between Mφs and Mtb and between DCs and Mtb during the Mtb infection process. Thus we could investigate the underlying cross-talk mechanisms between the host and the pathogen to reveal how the pathogen counteracts host defense mechanisms in Mφs and DCs during early Mtb H37Rv infection. Based on these findings, Rv1675c is proposed as a significant biomarker for a potential drug target because of its important defensive role in Mφs [7]. Furthermore, the essential membrane proteins Rv1098c and Rv1696 in Mtb could also be significant biomarkers for potential drug targets because of their important roles in Mtb survival in both cell types. Accordingly, the drugs Lopinavir, TMC207, ATSM, and GTSM are proposed as potential therapeutic treatments for Mtb infection since they act on the above potential drug targets (biomarkers) [7].
Clostridium difficile is considered the leading cause of nosocomial antibiotic-associated diarrhea and the major etiologic agent of pseudomembranous colitis [12]. C. difficile infection (CDI) can cause toxic megacolon, intestinal perforation, and death in several cases. The intestinal epithelium is considered the first tissue encountered in the adhesion and colonization of C. difficile and serves as a physical defense barrier against C. difficile infection.
In spite of the well-characterized cytotoxicity, only a few studies have so far investigated the genome-wide interplay between host cells and C. difficile. The aim of Chapter 16, Investigating the Host/Pathogen Cross-Talk Mechanism During Clostridium difficile Infection for Drug Targets by Constructing Genetic and Epigenetic Interspecies Networks Using Systems Biology Method, is to systematically investigate the genetic and epigenetic molecular cross-talk mechanisms between human colorectal epithelial Caco-2 cells and C. difficile during the early and late stages of infection [12]. To investigate the genetic and epigenetic cross-talk mechanisms during the progression of infection, we introduce a systems biology approach using big data mining, dynamic network modeling, a genome-wide data identification method based on two-sided microarray data, a system order detection scheme, and the PNP method. In this chapter, we focus on constructing GWGEINs and subsequently extracting HPCNs to investigate the pathogenic progression of the underlying host/pathogen genetic and epigenetic mechanisms from the early to late stages of CDI. Based on these results, it is suggested that the cell wall proteins CD2787 and CD0237, which play an important role in both cell adhesion and pathogen defense mechanisms, could be considered significant biomarkers for potential drug targets. Further, the proteins of C. difficile that are crucial for sporulation, including CD1214, CD2629, and CD2643, can also be considered significant biomarkers for potential drug targets since spore-mediated reinfection is a critical issue. Finally, we propose a potential multimolecule drug containing E64, IgY, REP3123, camptothecin, and apigenin for the treatment of CDI owing to their abilities to inhibit the abovementioned targets and to maintain the homeostasis of dysfunctional host proteins.
In Chapter 17, Investigating the Common Pathogenic Mechanism for Drug Design Between Different Strains of Candida albicans Infection in OKF6/TERT-2 Cells by Comparing Their Genetic and Epigenetic Host/Pathogen Networks: Big Data Mining and Computational Systems Biology Approaches, we will employ a systems biology method to investigate the common and specific infection mechanisms in human oral epithelial cells during infection by different strains of C. albicans [8]. We construct candidate host-pathogen GEINs through big data mining, identify host-pathogen cross-talk GEINs from two-sided NGS data by pruning false positives in the candidate host-pathogen GEIN with a system order detection scheme, extract core host-pathogen cross-talk GEINs by PNP, and compare the core host-pathogen cross-talk GEINs of different strains to detect host-pathogen core cross-talk networks (HPCNs) as network biomarkers, in order to investigate the common and specific pathogenic mechanisms in the infection progression of different strains of C. albicans. On the basis of our network marker of common pathogenic mechanisms, we indicate that orf19.5034 (YBP1) has anti-ROS (reactive oxygen species) ability, and that orf19.939 (NAM7), orf19.2087 (SAS2), orf19.1093 (FLO8), and orf19.1854 (HHF22) play an important role in hyphal growth and pathogen protein interaction [8]. Moreover, orf19.5585 (SAP5), orf19.5542 (SAP6), and orf19.4519 (SUV3) cause biofilm formation. In addition, orf19.7247 coordinates other pathogen proteins for the degradation of the host cell protein CDH1. Consistent with previous studies [13], orf19.1816 (ALS3), orf19.610 (EFG1), orf19.1321 (HWP1), orf19.4433 (CPH1), and orf19.723 (BCR1) are also verified to play important roles related to endocytosis and morphological transformation. Finally, these C.
albicans pathogenic proteins can be considered as drug targets, and some potential common multiple-molecule drugs, including terbinafine, cerulenin, tunicamycin, tetrandrine, and tetracycline, are proposed for the therapeutic treatment of different strains of C. albicans owing to their ability to suppress the abovementioned drug targets and the common pathogen molecules of the network marker.
Epstein-Barr virus (EBV), also called human herpesvirus 4, is prevalent in all human populations [6]. EBV mainly infects human B lymphocytes and epithelial cells and is therefore associated with various malignancies of these cells. In Chapter 18, Constructing Host/Pathogen Genetic- and Epigenetic-Networks for Investigating Molecular Mechanisms to Identify Drug Targets in the Infection of Epstein-Barr Virus via Big Data Mining and Genome-Wide NGS Data Identification, we construct interspecies networks to investigate the cross-talk molecular mechanisms between human B cells and EBV at the first and second infection stages of EBV infection [14]. We first construct a candidate genome-wide interspecies genetic and epigenetic network (candidate GIGEN) through big database mining. Then we prune the false positives in the candidate GIGEN to obtain the real GIGENs at the first and second infection stages in the lytic phase from their corresponding NGS data, using dynamic genetic and epigenetic regulation models, a system identification approach, and a system order detection method. The GIGEN consists of PPINs, gene/miRNA/lncRNA regulation networks (GRNs), and host-virus cross-talk networks. Since the GIGENs are still very complex, in order to gain insight into the cross-talk molecular mechanisms of EBV infection, the core GIGENs, including host-virus core networks and host-virus core pathways, are extracted from the GIGENs by the PNP method [6].
On the basis of these results, we find that EBV can first exploit viral proteins and miRNAs to inhibit the activities of epigenetics-associated human proteins or genes and then may hijack the functions of epigenetic regulation to make human
immune responses dysregulated. Moreover, the viral proteins EBNA2 and Zta play the primary role in the initiation of the EBV lytic phase and can be considered potential drug targets. EBNA2 is efficient in upregulating genes involved in infected cell proliferation and survival to evade human immune attacks. In addition, the immediate-early lytic gene product Zta is a TF able to induce the entire program of EBV lytic gene expression. Furthermore, the EBV membrane protein LMP2B works in cooperation with LMP1 via viral BLLF2 to enhance the activity of human B cells and facilitate the production and transportation of viral particles, and the EBV nuclear antigen EBNA1 is crucial for EBV reproduction via antiapoptosis; thus these viral proteins can be suggested as potential drug targets [6]. Finally, we propose a multimolecule drug composed of thymoquinone (TQ), valpromide (VPM), and zebularine (Zeb) to act on these drug targets for therapeutic intervention, as inhibitors of EBV-associated malignancies.
In Chapter 19, HIV-Human Interaction Networks Investigating Pathogenic Mechanism for Drug Discovery: A Systems Biology Approach, omics data, including time profile data from high-throughput sequencing, real-time polymerase chain reaction (RT-PCR), human miRNA, and PPI databases, are used to construct the interspecies protein-protein and miRNA interaction network during HIV-1 infection by applying system modeling and identification. Network biomarkers in the host-pathogen cross-talk network at different stages of infection are extracted by comparing HIV-infected and mock-infected cells. Further, by comparing stages of infection, we find a common core network biomarker in the host-pathogen cross-talk network that defines host-pathogen interaction throughout the infection cycle, as well as the specific network biomarkers at different stages [15].
By investigating the common core network biomarker and the shifts in specific network biomarkers in the host-pathogen cross-talk network during the progression of HIV-1 infection, we can obtain further insights into the pathogenesis of HIV and the defense response in host CD4+ T cells. Our results might offer new opportunities to design efficient therapies against AIDS.
Since the pathogenic mechanisms of infectious diseases are very complex, a systems drug design strategy for multiple-molecule drugs is introduced to meet drug design specifications for infectious diseases in Chapter 20, Systems Multiple Molecule Drug Design in Infectious Diseases: Drug Design Specifications Approach. In this chapter a genetic and epigenetic network biomarker is first identified, based on pathogenic/host defensive mechanisms, as a source of drug targets for infectious diseases by a systems biology method using genome-wide high-throughput data. A combination of the computational network-based biomarker as multiple drug targets with computational drug design via data mining is then proposed for systems drug discovery with more precise medicine and fewer side effects. Finally, several multiple-molecule drug design examples for infectious diseases based on the systems drug design strategy are proposed to meet several design specifications for the therapeutic treatment of infectious diseases [16].
CHAPTER 2
Biological network modeling and system identification in systems immunology and infection microbiology
2.1 System identification for gene regulatory network
Suppose we want to construct a gene regulatory network (GRN) for some biological condition from big database mining and the corresponding microarray data. The first step is to construct a candidate GRN via big database mining. However, the candidate GRN contains many false-positive regulations, because database entries are partly predicted and are collected under diverse biological conditions. We therefore construct a dynamic regulatory equation for each gene and employ a parameter estimation method to estimate the regulation coefficients of the transcription factors (TFs) acting on each gene. With a system order determination method, we can then detect the system order (the number of real regulatory TFs) of each gene from microarray data and prune the false-positive regulatory TFs in the candidate GRN, that is, those beyond the system order of the gene regulatory dynamic model. In this section a system identification method for GRN is introduced. A linear dynamic gene regulatory model in Fig. 2.1 is given by the following regulatory equation:
$$x_i(t+1) = x_i(t) + \sum_{j=1}^{M_i} a_{ij}\, y_j(t) + k_i - \lambda_i x_i(t) + w_i(t) \tag{2.1}$$
where $y_j(t)$ denotes the protein expression level of the $j$th regulatory TF of gene $i$ at time $t$; $x_i(t)$ denotes the gene expression level (microarray time profile) of the $i$th gene at time $t$; $a_{ij}$ denotes the regulatory ability of the $j$th TF on the $i$th gene; $w_i(t)$ denotes the random noise due to modeling error, measurement noise, or environmental disturbance; $k_i$ is the basal level of the $i$th gene, which accounts for unknown regulations, for example, DNA methylation; and $\lambda_i$ denotes the degradation rate of mRNA.
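The recursion in (2.1) can be illustrated with a minimal Python sketch. The function name `simulate_gene`, the parameter values, and the constant TF profiles are hypothetical choices made only for illustration; they are not taken from the text.

```python
import numpy as np

def simulate_gene(x0, y, a, k, lam, noise_std=0.0, seed=0):
    """Iterate x(t+1) = x(t) + sum_j a[j]*y[t, j] + k - lam*x(t) + w(t), as in (2.1)."""
    rng = np.random.default_rng(seed)
    T = y.shape[0]
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(T):
        w = noise_std * rng.standard_normal()  # random noise term w_i(t)
        x[t + 1] = x[t] + a @ y[t] + k - lam * x[t] + w
    return x

# Hypothetical example: two TFs held at constant expression, no noise.
y = np.ones((200, 2))
x = simulate_gene(x0=0.0, y=y, a=np.array([0.5, -0.2]), k=0.1, lam=0.4)

# Fixed point of the noise-free recursion: (sum_j a_j y_j + k) / lam.
steady_state = (0.5 - 0.2 + 0.1) / 0.4
```

With constant TF inputs and no noise, the expression level settles at the value where regulation plus basal level balances degradation, which is a quick sanity check on the signs of the terms in (2.1).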
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00021-3
© 2021 Elsevier Inc. All rights reserved.
FIGURE 2.1 A simple gene regulatory network.
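Let us denote the state vector and system matrix of a GRN consisting of $n$ genes as
$$x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix},\quad y(t) = \begin{bmatrix} y_1(t) \\ \vdots \\ y_{M_n}(t) \end{bmatrix},\quad k = \begin{bmatrix} k_1 \\ \vdots \\ k_n \end{bmatrix},\quad w(t) = \begin{bmatrix} w_1(t) \\ \vdots \\ w_n(t) \end{bmatrix},\quad A = \begin{bmatrix} a_{11} & \cdots & a_{1M_1} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nM_n} \end{bmatrix},\quad \lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n).$$
Then the dynamic model of GRN in Fig. 2.1 can be represented by
$$x(t+1) = (I - \lambda)\,x(t) + A\,y(t) + k + w(t) \tag{2.2}$$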
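As a quick check that the matrix form (2.2) reproduces the gene-by-gene equation (2.1), the following sketch evaluates one update both ways. It assumes, for illustration only, that all genes share one TF vector $y(t)$ and that absent regulations are stored as zeros in $A$; the network size and values are hypothetical.

```python
import numpy as np

def grn_step(x, y, A, k, lam, w=None):
    """One step of x(t+1) = (I - diag(lam)) x(t) + A y(t) + k + w(t), as in (2.2)."""
    n = x.shape[0]
    w = np.zeros(n) if w is None else w
    return (np.eye(n) - np.diag(lam)) @ x + A @ y + k + w

# Hypothetical 2-gene, 3-TF network; zeros in A mark absent regulations.
A = np.array([[0.5, 0.0, -0.2],
              [0.0, 0.3, 0.0]])
k = np.array([0.1, 0.2])
lam = np.array([0.4, 0.5])
x = np.array([1.0, 2.0])
y = np.array([1.0, 0.5, 2.0])

x_next = grn_step(x, y, A, k, lam)

# Same update written gene by gene, following the scalar equation (2.1).
x_next_scalar = np.array(
    [x[i] + A[i] @ y + k[i] - lam[i] * x[i] for i in range(2)]
)
```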
2.1.1 Least square parameter estimation method
If we identify $\lambda$, $A$, and $k$ for the whole GRN from (2.2) directly, the computation becomes more complex: the matrices $\lambda$ and $A$ may be sparse, and the larger computations accumulate more round-off errors. We therefore identify $a_{ij}$, $k_i$, and $\lambda_i$ one gene at a time from (2.1). In general, the gene regulatory equation in (2.1) can be represented by the following regression equation
$$x_i(t+1) = \begin{bmatrix} y_1(t) & \cdots & y_{M_i}(t) & x_i(t) & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ a_{i2} \\ \vdots \\ a_{iM_i} \\ 1-\lambda_i \\ k_i \end{bmatrix} + w_i(t) = \varphi_i(t)\,\theta_i + w_i(t), \quad i = 1, 2, \ldots, n \tag{2.3}$$
If there are $N$ time profile microarray data points for regulatory parameter estimation, then we get the following regression form for each time point of the microarray data
$$\begin{aligned} x_i(1) &= \varphi_i(0)\,\theta_i + w_i(0) \\ x_i(2) &= \varphi_i(1)\,\theta_i + w_i(1) \\ &\ \,\vdots \\ x_i(N) &= \varphi_i(N-1)\,\theta_i + w_i(N-1) \end{aligned} \tag{2.4}$$
which can be represented by
$$X_i(N) = \Phi_i(N)\,\theta_i + W_i(N), \qquad X_i(N) = \begin{bmatrix} x_i(1) \\ x_i(2) \\ \vdots \\ x_i(N) \end{bmatrix},\quad \Phi_i(N) = \begin{bmatrix} \varphi_i(0) \\ \varphi_i(1) \\ \vdots \\ \varphi_i(N-1) \end{bmatrix},\quad W_i(N) = \begin{bmatrix} w_i(0) \\ w_i(1) \\ \vdots \\ w_i(N-1) \end{bmatrix} \tag{2.5}$$
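The stacking in (2.4) and (2.5) can be sketched in a few lines of Python. The helper name `build_regression` and the simulated noise-free data are hypothetical; the sketch only checks that data generated from (2.1) satisfy $X_i(N) = \Phi_i(N)\theta_i$ exactly when $w_i(t) = 0$.

```python
import numpy as np

def build_regression(x, y):
    """Return X = [x(1), ..., x(N)] and Phi whose row t is [y(t), x(t), 1], as in (2.5)."""
    N = len(x) - 1
    Phi = np.column_stack([y[:N], x[:N], np.ones(N)])
    X = x[1:]
    return X, Phi

# Hypothetical noise-free data for one gene with two regulatory TFs.
rng = np.random.default_rng(0)
N, a, k, lam = 8, np.array([0.5, -0.2]), 0.1, 0.4
y = rng.random((N, 2))
x = np.empty(N + 1)
x[0] = 1.0
for t in range(N):
    x[t + 1] = (1 - lam) * x[t] + a @ y[t] + k

X, Phi = build_regression(x, y)
theta = np.concatenate([a, [1 - lam, k]])  # parameter vector ordered as in (2.3)
```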
By the least squares estimation method [17], we select the parameter estimate $\hat{\theta}_i$ to minimize the following square error [17]
$$\min_{\theta_i} V(\theta_i) = \min_{\theta_i} \frac{1}{2}\,\bigl(X_i(N) - \Phi_i(N)\theta_i\bigr)^T \bigl(X_i(N) - \Phi_i(N)\theta_i\bigr) \tag{2.6}$$
The optimal estimate $\hat{\theta}_i$ that minimizes the square error in (2.6) is obtained as [17]
$$\hat{\theta}_i = \bigl(\Phi_i^T(N)\Phi_i(N)\bigr)^{-1}\Phi_i^T(N)X_i(N) = \Phi_i^{+}(N)X_i(N), \quad i = 1, 2, \ldots, n \tag{2.7}$$
where $\Phi_i^{+}(N)$ denotes the pseudoinverse of $\Phi_i(N)$. The covariance of the estimation error $\tilde{\theta}_i = \theta_i - \hat{\theta}_i$ is given by
$$\operatorname{Cov}(\tilde{\theta}_i) = \sigma_{w_i}^2 \bigl(\Phi_i^T(N)\Phi_i(N)\bigr)^{-1} \tag{2.8}$$
where $\sigma_{w_i}^2 I_N$ is the covariance of the noise vector $W_i(N)$, that is, $\operatorname{Cov}(W_i(N)) = \sigma_{w_i}^2 I_N$. Once $\hat{\theta}_i$ is estimated by (2.7), the $i$th row of $A$ in (2.2), that is, $a_{i1}, \ldots, a_{iM_i}$, the $i$th element $k_i$ of $k$, and the $i$th diagonal element $\lambda_i$ of $\lambda$ for the regulation of the $i$th gene in (2.1) or (2.3) are all obtained from the $N$ time profile microarray data.
In general, the GRN in Fig. 2.1 is constructed from prior information in databases, from hypotheses, or from predictions. This is the so-called candidate GRN, and it contains many false-positive regulations. The system order, that is, the number of regulations $M_i$ in (2.1), can be estimated by a system order detection method such as the Akaike information criterion (AIC). AIC includes both the estimated variance and the model complexity in one statistic as follows [17]:
$$\mathrm{AIC}(M_i) = \log\left(\frac{1}{N}\bigl(X_i(N) - \Phi_i(N)\hat{\theta}_i\bigr)^T \bigl(X_i(N) - \Phi_i(N)\hat{\theta}_i\bigr)\right) + \frac{2M_i}{N} \tag{2.9}$$
where the first term is the logarithm of the estimated residual (error) variance and the second term accounts for the model complexity.
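A minimal sketch of the estimate (2.7), the covariance (2.8), and the order selection by (2.9) is given below. Several choices are assumptions made for illustration and are not specified in the text: the candidate TFs are taken in a fixed order so that model order $M$ uses the first $M$ of them, the noise variance in (2.8) is replaced by a residual-based estimate, and all data are simulated with hypothetical parameters.

```python
import numpy as np

def fit(X, Phi):
    """Least squares estimate theta_hat = pinv(Phi) X, as in (2.7), and residuals."""
    theta = np.linalg.pinv(Phi) @ X
    resid = X - Phi @ theta
    return theta, resid

def aic(X, Phi, M):
    """AIC(M) = log(resid' resid / N) + 2M/N, as in (2.9)."""
    _, resid = fit(X, Phi)
    N = len(X)
    return np.log(resid @ resid / N) + 2 * M / N

# Hypothetical gene: 4 candidate TFs, only the first two truly regulate it.
rng = np.random.default_rng(0)
N = 200
a_true = np.array([0.8, -0.6, 0.0, 0.0])
k_true, lam_true, sigma = 0.1, 0.4, 0.01
y = rng.random((N, 4))
x = np.zeros(N + 1)
for t in range(N):
    noise = sigma * rng.standard_normal()
    x[t + 1] = (1 - lam_true) * x[t] + a_true @ y[t] + k_true + noise

def design(M):
    """Regression matrix using the first M candidate TFs (assumed ordering)."""
    return np.column_stack([y[:, :M], x[:N], np.ones(N)])

X = x[1:]
aic_values = [aic(X, design(M), M) for M in range(5)]
best_M = int(np.argmin(aic_values))  # candidate TFs beyond best_M would be pruned

# Full-model estimate and its covariance, with sigma^2 estimated from residuals.
Phi_full = design(4)
theta_hat, resid = fit(X, Phi_full)
sigma2_hat = resid @ resid / (N - Phi_full.shape[1])
cov_theta = sigma2_hat * np.linalg.inv(Phi_full.T @ Phi_full)
```

Omitting a true regulator inflates the residual term far more than the complexity term can compensate, so the criterion never prefers an order below the true one in this setting; whether it stops exactly at the true order depends on the noise realization.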
The AIC in (2.9) decreases as the residual variance decreases and increases as the number $M_i$ of regulations increases. It attains its minimum at the correct model order, that is, the true number $M_i$ of regulatory TFs minimizes $\mathrm{AIC}(M_i)$ in (2.9). Therefore the insignificant regulatory abilities $a_{ij}$ beyond the true regulatory number $M_i$ of the $i$th gene are considered false-positive regulations and are deleted from the regulatory equation in (2.1). The regulatory equation pruned by the AIC in (2.9) is
$$x_i(t+1) = (1-\lambda_i)x_i(t) + \sum_{j=1}^{M_i} a_{ij}y_j(t) + k_i + w_i(t) \tag{2.10}$$
The above parameter estimation process is performed gene by gene. The GRN in Fig. 2.1 can then be represented by the identified model
$$x(t+1) = (I-\lambda)x(t) + Ay(t) + k + w(t) \tag{2.11}$$
where $\lambda$, $A$, and $k$ are identified by the least square parameter estimation algorithm in (2.7) and pruned by the AIC in (2.9), gene by gene, from the corresponding microarray data.
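The estimate-then-prune procedure for a single gene can be sketched in Python as follows. This is a minimal illustration, not the author's implementation. It assumes the regressor row $[y_1(t), \ldots, y_{M_i}(t), x_i(t), 1]$ with parameters $[a_{i1}, \ldots, a_{iM_i}, 1-\lambda_i, k_i]$, consistent with (2.10). The text does not specify how candidate regulators are ordered before the AIC is evaluated, so ranking them by the magnitude of the full-model estimate $|a_{ij}|$ is an assumption made here, and the function names are hypothetical.

```python
import numpy as np

def fit_gene(Y, x_now, x_next, keep):
    """Least-squares fit of (2.10) using only the regulator columns in `keep`.

    Y      : (T, M) candidate regulator levels y_j(t)
    x_now  : (T,)   mRNA level x_i(t)
    x_next : (T,)   mRNA level x_i(t+1)
    Returns the parameter estimate and the AIC value of (2.9).
    """
    keep = np.asarray(keep, dtype=int)
    Phi = np.column_stack([Y[:, keep], x_now, np.ones_like(x_now)])
    theta = np.linalg.pinv(Phi) @ x_next            # theta_hat = Phi^+ X, as in (2.7)
    resid = x_next - Phi @ theta
    n = len(x_next)
    aic = np.log(resid @ resid / n) + 2.0 * len(keep) / n
    return theta, aic

def prune_regulators(Y, x_now, x_next):
    """Pick the number of regulators that minimizes the AIC.

    Assumption (not from the text): candidates are ranked by |a_ij| from the
    full model, and only nested subsets of that ranking are compared.
    """
    n_candidates = Y.shape[1]
    theta_full, _ = fit_gene(Y, x_now, x_next, np.arange(n_candidates))
    order = np.argsort(-np.abs(theta_full[:n_candidates]))
    aics = [fit_gene(Y, x_now, x_next, order[:m])[1]
            for m in range(n_candidates + 1)]
    best_m = int(np.argmin(aics))
    keep = np.sort(order[:best_m])
    theta, _ = fit_gene(Y, x_now, x_next, keep)
    # theta = [a_ij for j in keep, 1 - lambda_i, k_i]
    return keep, theta
```

Ranking by $|a_{ij}|$ depends on the scale of each $y_j(t)$, so in practice the expression profiles would need comparable normalization for this heuristic to be meaningful.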
2.1.2 Maximum likelihood parameter estimation method
Another well-known parameter estimation method for $\theta_i$ in (2.5) is the maximum likelihood parameter estimation algorithm, which estimates the parameters from the system output measurement $X_i(N)$. The value of $\theta_i$ should be the one with the largest probability given the output measurement data, that is, the estimate of $\theta_i$ should maximize the conditional probability [17]
$$\max_{\theta_i} P(\theta_i \mid X_i(N)) \tag{2.12}$$
By Bayes' rule for conditional probability [17],
$$P(\theta_i \mid X_i(N)) = \frac{P(\theta_i)\,P(X_i(N) \mid \theta_i)}{P(X_i(N))} \tag{2.13}$$
In general, it is difficult to calculate $P(\theta_i \mid X_i(N))$ because we need three probability density functions: $P(\theta_i)$, $P(X_i(N))$, and $P(X_i(N) \mid \theta_i)$. Since $P(X_i(N))$ does not depend on the parameter $\theta_i$, it does not influence the optimal solution in (2.12). $P(\theta_i)$ denotes the probability density of $\theta_i$, that is, the prior information about $\theta_i$. In general, we have no information about $P(\theta_i)$ and assume $P(\theta_i) = \text{constant}$, that is, all values of $\theta_i$ occur with the same probability. In this situation the optimal parameter estimation method (maximum a posteriori) in (2.12) is equivalent to the following maximum likelihood estimation (MLE) method [17]
$$\max_{\theta_i} P(X_i(N) \mid \theta_i) \tag{2.14}$$
that is, $\theta_i$ is selected so that the observed data $X_i(N)$ occur with the largest probability.
Since $P(X_i(N) \mid \theta_i)$ denotes the likelihood function, the estimate of $\theta_i$ is the value under which the observed output data are most likely to occur. From (2.5), it is seen that the probability density function of $X_i(N)$ is mainly due to the white noise $W_i(N)$. If we assume $W_i(N)$ is zero-mean Gaussian noise with covariance matrix $\sigma_{w_i}^2 I$, then
$$\begin{aligned} P(W_i(N)) &= \frac{1}{\left(2\pi\sigma_{w_i}^2\right)^{(N-1)/2}} \exp\left(-\frac{1}{2\sigma_{w_i}^2} W_i^T(N)W_i(N)\right) \\ &= \frac{1}{\left(2\pi\sigma_{w_i}^2\right)^{(N-1)/2}} \exp\left(-\frac{1}{2\sigma_{w_i}^2} \left(X_i(N)-\Phi_i(N)\theta_i\right)^T\left(X_i(N)-\Phi_i(N)\theta_i\right)\right) \end{aligned} \tag{2.15}$$
In general, the parameters $\sigma_{w_i}^2$ and $\theta_i$ of the likelihood function in (2.15) are unknown. Therefore the maximum likelihood method in (2.14) amounts to selecting $\theta_i$ and $\sigma_{w_i}^2$ to maximize $\log P(W_i(N))$. The MLE method for $\theta_i$ in (2.14) is equivalent to the following likelihood optimization
$$\max_{\theta_i,\,\sigma_{w_i}^2} \log L(\theta_i, \sigma_{w_i}^2) \tag{2.16}$$
where the log-likelihood function is denoted as
$$\log L(\theta_i, \sigma_{w_i}^2) = \log P(W_i(N)) \tag{2.17}$$
that is, the log-likelihood method in (2.16) simplifies the parameter estimation procedure for (2.14). We expect the log-likelihood function to attain its maximum at $\theta_i = \hat{\theta}_i$ and $\sigma_{w_i}^2 = \hat{\sigma}_{w_i}^2$. The necessary conditions for the maximum likelihood estimates $\hat{\theta}_i$ and $\hat{\sigma}_{w_i}^2$ are [17]
$$\frac{\partial \log L(\theta_i, \sigma_{w_i}^2)}{\partial \theta_i} = 0 \tag{2.18}$$
$$\frac{\partial \log L(\theta_i, \sigma_{w_i}^2)}{\partial \sigma_{w_i}^2} = 0 \tag{2.19}$$
After some rearrangement of (2.18) and (2.19), the estimated parameters $\hat{\theta}_i$ and $\hat{\sigma}_{w_i}^2$ are [17]
$$\hat{\theta}_i = \left(\Phi_i^T(N)\Phi_i(N)\right)^{-1}\Phi_i^T(N)X_i(N) = \Phi_i^{+}(N)X_i(N) \tag{2.20}$$
$$\hat{\sigma}_{w_i}^2 = \frac{1}{N-1}\left(X_i(N)-\Phi_i(N)\hat{\theta}_i\right)^T\left(X_i(N)-\Phi_i(N)\hat{\theta}_i\right), \quad i = 1, 2, \ldots, n \tag{2.21}$$
Under the Gaussian distribution of the noise $W_i(N)$ in (2.5), the maximum likelihood estimate $\hat{\theta}_i$ in (2.20) is identical to the least square estimate $\hat{\theta}_i$ in (2.7). After the maximum likelihood parameter estimation in (2.20), the system order can also be detected by the AIC in (2.9).
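The step from the necessary conditions (2.18) and (2.19) to the estimates (2.20) and (2.21) can be spelled out as follows. This is a sketch under the Gaussian assumption of (2.15); the argument $(N)$ is dropped for brevity, and $\Phi_i^T\Phi_i$ is assumed to be invertible.

```latex
\begin{aligned}
\log L(\theta_i,\sigma_{w_i}^2)
  &= -\frac{N-1}{2}\log\left(2\pi\sigma_{w_i}^2\right)
     - \frac{1}{2\sigma_{w_i}^2}\,(X_i-\Phi_i\theta_i)^T(X_i-\Phi_i\theta_i), \\
\frac{\partial \log L}{\partial \theta_i}
  &= \frac{1}{\sigma_{w_i}^2}\,\Phi_i^T\,(X_i-\Phi_i\theta_i) = 0
  \;\Longrightarrow\;
  \hat{\theta}_i = \left(\Phi_i^T\Phi_i\right)^{-1}\Phi_i^T X_i, \\
\frac{\partial \log L}{\partial \sigma_{w_i}^2}
  &= -\frac{N-1}{2\sigma_{w_i}^2}
     + \frac{1}{2\sigma_{w_i}^4}\,(X_i-\Phi_i\hat{\theta}_i)^T(X_i-\Phi_i\hat{\theta}_i) = 0
  \;\Longrightarrow\;
  \hat{\sigma}_{w_i}^2 = \frac{1}{N-1}\,(X_i-\Phi_i\hat{\theta}_i)^T(X_i-\Phi_i\hat{\theta}_i).
\end{aligned}
```

The first condition does not involve $\sigma_{w_i}^2$, which is why the maximum likelihood estimate of $\theta_i$ coincides with the least square estimate regardless of the noise level.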
2.2 System identification of protein-protein interaction network
Consider the protein-protein interaction (PPI) network in Fig. 2.2. A simple regression model for the expression level of the $i$th protein at time $t+1$ can be described as follows:
$$y_i(t+1) = y_i(t) + \sum_{j=1}^{L_i} b_{ij}\,y_i(t)\,y_j(t) + t_i x_i(t) - l_i y_i(t) + h_i + v_i(t), \quad i = 1, 2, \ldots, m \tag{2.22}$$
where $y_i(t)$ indicates the expression level of the $i$th protein, $b_{ij}$ denotes the interaction ability between protein $i$ and protein $j$, $t_i$ denotes the translation rate from the mRNA of the $i$th gene to protein, $l_i$ denotes the decay rate of the protein $y_i(t)$, $h_i$ is the basal level, which accounts for unknown interactions, in particular epigenetic modifications such as phosphorylation, ubiquitination, and acetylation, and $v_i(t)$ denotes the residual noise. The physical meaning of the dynamic model in (2.22) is that the expression level of the $i$th protein at time $t+1$ is due to the protein expression level at time $t$, the interaction effect with other proteins, the translation from the corresponding mRNA, the self-decay, and the basal level. The interaction of protein $i$ with other proteins can also be described by the following regression form
$$y_i(t+1) = \begin{bmatrix} y_i(t)y_1(t) & y_i(t)y_2(t) & \cdots & y_i(t)y_{L_i}(t) & y_i(t) & x_i(t) & 1 \end{bmatrix} \begin{bmatrix} b_{i1} \\ \vdots \\ b_{iL_i} \\ 1-l_i \\ t_i \\ h_i \end{bmatrix} + v_i(t) \triangleq \varphi_i(t)\theta_i + v_i(t) \tag{2.23}$$
FIGURE 2.2 A simple PPI network among the proteins $y_1(t)$ to $y_{10}(t)$; edges denote protein-protein interactions. PPI, Protein-protein interaction.
If there are $M$ time-profile protein expression data for the interaction parameter estimation in (2.23), we get the following regression form for each time point of the protein expression level
$$\begin{aligned} y_i(1) &= \varphi_i(0)\theta_i + v_i(0) \\ y_i(2) &= \varphi_i(1)\theta_i + v_i(1) \\ &\;\;\vdots \\ y_i(M) &= \varphi_i(M-1)\theta_i + v_i(M-1) \end{aligned} \tag{2.24}$$
which can be represented by
$$Y_i(M) = \Phi_i(M)\theta_i + V_i(M) \tag{2.25}$$
Therefore by the least square parameter estimation method in (2.6), we get
$$\hat{\theta}_i = \Phi_i^{+}(M)Y_i(M) \tag{2.26}$$
Similarly, the AIC in (2.9) is used to detect the interaction order (number) of each target protein with other proteins. The false-positive protein interactions are then deleted, one target protein at a time, to prune the candidate PPI network into the real PPI network in Fig. 2.2.
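As an illustration of (2.23) to (2.26), the following Python sketch builds the regressor rows $\varphi_i(t)$ for one target protein and recovers its parameters by least squares. It is a minimal sketch under stated assumptions: the function names and the dictionary keys are hypothetical, the candidate partner profiles and the mRNA profile are supplied as plain arrays, and `numpy.linalg.lstsq` stands in for the pseudoinverse in (2.26).

```python
import numpy as np

def ppi_regressors(y_partners, y_i, x_i):
    """Rows phi_i(t) of (2.23): [y_i*y_1, ..., y_i*y_L, y_i, x_i, 1].

    y_partners : (T, L) levels of the candidate interacting proteins y_j(t)
    y_i        : (T,)   level of the target protein
    x_i        : (T,)   mRNA level of the corresponding gene
    """
    return np.column_stack([y_partners * y_i[:, None], y_i, x_i,
                            np.ones_like(y_i)])

def estimate_ppi(y_partners, y_i, x_i):
    """Least-squares estimate of theta_i in (2.26) from M + 1 time points."""
    Phi = ppi_regressors(y_partners[:-1], y_i[:-1], x_i[:-1])  # phi_i(0..M-1)
    Y = y_i[1:]                                                # y_i(1..M)
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    n_partners = y_partners.shape[1]
    return {
        "b": theta[:n_partners],       # interaction abilities b_ij
        "l": 1.0 - theta[n_partners],  # decay rate, since theta holds 1 - l_i
        "t": theta[n_partners + 1],    # translation rate t_i
        "h": theta[n_partners + 2],    # basal level h_i
    }
```

Note that the regression returns $1 - l_i$ rather than $l_i$, so the decay rate has to be recovered by subtraction, exactly as the parameter vector in (2.23) is laid out.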
2.3 System identification of integrated genetic and epigenetic network via high throughput next generation sequencing data
If next generation sequencing (NGS) data under some biological condition (for example, HIV infection) are available, we can construct an integrated cellular network based on three coupled dynamic models that cover the GRN, the PPI network, and the epigenetic network of microRNA (miRNA) and methylation regulations. The integration of transcription regulation, PPI, epigenetic modification, and methylation regulation gives more insight into the actual cellular mechanism and is more predictive than models without integration. The integrated genetic and epigenetic network (IGEN) is shown to be more powerful and flexible for describing cellular systems under different conditions and in different species. The coupled dynamic systems of the whole IGEN are useful for theoretical analysis of cellular functions and for guiding further experiments and therapeutics in systems immunology and infection microbiology. Consider a simple IGEN in Fig. 2.3. The receptors in the cell membrane accept external molecular signals such as pathogens. The signals are then transmitted through coupled signal transduction pathways to the TFs. The TFs bind to the corresponding promoter binding sites to regulate the downstream target genes. The epigenetic miRNA regulation and methylation inhibition will degrade the mRNA of the corresponding genes. Finally, the remaining mRNAs are moved outside the nucleus and translated into the corresponding proteins for metabolic pathways to respond to the external signals. These cellular mechanisms can be described by the IGEN in Fig. 2.3.
FIGURE 2.3 A simple integrated genetic and epigenetic network. Legend entries: protein, gene, miRNA, gene with methylation, acetylation, ubiquitination, protein interaction, gene expression, epigenetic regulation.
The gene regulation of the $i$th gene in the GRN subnetwork can be described by
$$x_i(t+1) = x_i(t) + \sum_{j=1}^{M_i} a_{ij}\,y_j(t) - \sum_{l=1}^{N_i} c_{il}\,s_l(t)\,x_i(t) - \lambda_i x_i(t) + k_i + w_i(t), \quad i = 1, 2, \ldots, n \tag{2.27}$$
where $c_{il}$ denotes the silencing rate of the mRNA of the $i$th gene by the $l$th miRNA, and the other parameters are similar to those in (2.1). The epigenetic methylation of gene $i$ influences the regulatory ability $a_{ij}$ and the basal level $k_i$ in (2.27). The PPIs of the $i$th protein in the PPI subnetwork can be described by
$$y_i(t+1) = y_i(t) + \sum_{j=1}^{L_i} b_{ij}\,y_i(t)\,y_j(t) + t_i x_i(t) - l_i y_i(t) + h_i + v_i(t), \quad i = 1, 2, \ldots, m \tag{2.28}$$
which is similar to (2.22). The miRNA dynamic model is described by
$$s_i(t+1) = s_i(t) + p_i - r_i s_i(t) - \sum_{j=1}^{Q_i} d_{ij}\,s_i(t)\,x_j(t) + n_i(t) \tag{2.29}$$
where $r_i$ denotes the decay rate of $s_i(t)$, $p_i$ denotes the basal level of $s_i(t)$, and $d_{ij}$ denotes the degradation rate of $s_i(t)$ in silencing the mRNA of the $j$th gene. The genetic and epigenetic regulation of the $i$th gene in (2.27) can be represented by the following regression form:
$$x_i(t+1) = \begin{bmatrix} y_1(t) & \cdots & y_{M_i}(t) & -s_1(t)x_i(t) & \cdots & -s_{N_i}(t)x_i(t) & x_i(t) & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iM_i} \\ c_{i1} \\ \vdots \\ c_{iN_i} \\ 1-\lambda_i \\ k_i \end{bmatrix} + w_i(t) = \varphi_i(t)\theta_i + w_i(t), \quad i = 1, 2, \ldots, n \tag{2.30}$$
Here the miRNA regressors carry a minus sign so that the parameters $c_{il}$ keep the sign convention of (2.27). Therefore the least square parameter estimation method in (2.7) or the maximum likelihood parameter estimation method in (2.20) can also be employed to estimate the genetic and epigenetic parameters $\theta_i$ of gene $i$ in (2.30) from NGS data. The AIC system order scheme in (2.9) can also be employed to prune the false-positive genetic and epigenetic regulations by deleting the insignificant regulatory parameters beyond the system order determined by the AIC. By a similar procedure, we can identify the interaction parameters of the PPI subnetwork in (2.28), one protein at a time, and the regulatory parameters of the miRNA subnetwork in (2.29), one miRNA at a time, in the candidate IGEN from NGS data. We then also use the AIC to prune the false-positive PPIs in the PPI subnetwork and the false-positive regulations in the miRNA subnetwork to obtain the real PPI subnetwork and the real miRNA subnetwork, respectively. These three real subnetworks are then combined to obtain the real integrated genetic and epigenetic cellular network, in which TFs link the PPI subnetwork to the GRN subnetwork, and genes connect the PPI subnetwork and the miRNA subnetwork. A more detailed procedure for constructing host/pathogen genetic and epigenetic networks of different infectious diseases by the system identification method and two-sided NGS data is described in Parts V and VI of this book.
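How the three subnetwork models (2.27) to (2.29) couple to one another can be seen from a single noise-free update step, sketched below in Python. The matrix layout is an assumption made for illustration: `A`, `C`, `B`, and `D` collect the coefficients $a_{ij}$, $c_{il}$, $b_{ij}$, and $d_{ij}$, with zeros where no candidate link exists, and protein $i$ is taken to be translated from gene $i$, so there are as many proteins as genes. The function and key names are hypothetical. The second function builds the regressor row of (2.30) for one gene, with the minus sign that the silencing term carries in (2.27).

```python
import numpy as np

def igen_step(x, y, s, p):
    """One noise-free step of the coupled models (2.27)-(2.29).

    x : (n,) mRNA levels, y : (n,) protein levels, s : (q,) miRNA levels.
    p : dict of parameters whose keys mirror the symbols in the text.
    """
    # (2.27): TF regulation, miRNA silencing, decay, basal level
    x_next = x + p["A"] @ y - (p["C"] @ s) * x - p["lam"] * x + p["k"]
    # (2.28): protein interactions, translation, decay, basal level
    y_next = y + (p["B"] @ y) * y + p["t"] * x - p["l"] * y + p["h"]
    # (2.29): basal production, decay, consumption while silencing mRNAs
    s_next = s + p["p"] - p["r"] * s - (p["D"] @ x) * s
    return x_next, y_next, s_next

def gene_regressor(y, s, x_i):
    """Row phi_i(t) of (2.30) for one gene: [y_j, -s_l * x_i, x_i, 1]."""
    return np.concatenate([y, -s * x_i, [x_i, 1.0]])
```

For a single gene, the product of `gene_regressor(y, s, x_i)` with the parameter vector $[a_{i1}, \ldots, c_{i1}, \ldots, 1-\lambda_i, k_i]$ reproduces the first output of `igen_step`, which is a quick consistency check between (2.27) and (2.30).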
2.4 Conclusion
The least square parameter estimation and maximum likelihood parameter estimation methods are useful reverse engineering schemes for estimating the interaction parameters in a candidate PPI network and the regulatory parameters in a candidate GRN from microarray data or NGS data. The AIC in (2.9) can be employed to estimate the system order (the true protein number in the PPI network or the true gene number in the GRN). Therefore, based on the AIC, we can identify the system order and prune the false positives beyond the system order to obtain the real PPI network and GRN from the corresponding candidate PPI network and GRN for further analysis.
CHAPTER 3
Identifying the gene regulatory network of systems inflammation in humans by system dynamic model via microarray data and database mining
3.1 Introduction
Recently, microarray technology has been employed to rapidly produce vast catalogs of gene expression activities. The immense data highlight the need for a systematic tool to identify and analyze the underlying gene regulatory networks (GRNs) [18,19]. Several computational methods have been developed for the inference of transcriptional regulatory networks from experimental microarray data in Saccharomyces cerevisiae [20,21]. Studies of the genome-wide transcriptional responses of inflammation usually focus on the known functional interactions of the master switch proteins, such as Rel or NF-κB proteins [22-24]. The identification of NF-κB as a key player in the pathogenesis of inflammation suggests that NF-κB-targeted therapeutics might be effective in treating diseases such as rheumatoid arthritis, a well-known disease in which the inflammatory response causes the primary damage [25]. In general, inflammation is considered a life-preserving response, as reflected by the increased risk of grave infections in people with genetic deficiencies in key components of the inflammatory signaling pathways [26]. Although inflammation is a hallmark of many human diseases [27,28], few studies have evaluated the genome-wide responses that are induced by systemic inflammation in humans. DNA microarrays have allowed the semiquantitative measurement of gene expression programming in great depth and on a broad scale. In general, it is still a challenge to recognize and evaluate relevant biological processes from vast quantities of experimental data. Recently, systems biology has gained much attention due to emerging experimental and computational methods [18,19].
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00008-0
Systems biology is the coordinated study of biological systems by (1) investigating the components of networks and their interactions, (2) applying experimental high-throughput and whole-genome techniques, and (3) integrating computational methods with experimental efforts [29]. Therefore it is appealing to apply a systems biology approach to investigate the systemic mechanism of inflammation via high-throughput transcriptomic studies of human disease. Such a systematic approach can provide insights into the genetic regulation of immune cell activities, the tolerance of the innate immune system, and the susceptibility to infection in humans. Based on a structured network-based approach and a statistical likelihood method, a network-based analysis of systemic inflammation in humans has been given to evaluate genome-wide transcriptional responses in the context of known functional relationships among proteins, small molecules, and phenotypes [27,28]. The genome-wide interaction network is probed to identify functional modules that are perturbed in response to endotoxin exposure. A dynamic Bayesian network (DBN) approach has also been developed to predict GRNs from time-course expression data [30]. Gene expression is transcriptionally controlled by several inducible transcription factors (TFs). The TF NF-κB in particular is pivotal in the regulation of inflammation. For example, an unstimulated macrophage is kept in an inactivated condition, and its NF-κB is retained in the cytoplasm through interaction with inhibitory proteins known as IκB. Cell stimulation by bacterial endotoxin triggers a signaling pathway that results in the degradation of IκB, leading to the nuclear translocation of NF-κB and the activation of the transcription of various proinflammatory cytokines (IL1A, IL1B, TNFA, IL6, IL8, etc.) [31]. Many cross talks among the signaling pathways have been recognized. It is now known that the biological functions of IL1A and TNFA overlap and complement each other [21,32]. Thus blocking only one mediator may not effectively reduce the overall inflammatory responses.
Both IL1B and TNFA produce effects at the early stage of inflammation, and the use of their inhibitory reagents at a later stage may not be able to reverse the most damaging events initiated by them. As a result, IL1B and TNFA may not represent the best targets for intervention in the systemic inflammatory response. In another study [33], TNFA and IL1 were shown to have positive feedback loops to TNFR and IL1R, respectively. On the other hand, NF-κB also initiates the transcription of an inhibitory protein (A20) that can inactivate NF-κB by suppressive phosphorylation in IKK [34]. The other important receptors in the immune system, the TLR family members (TLR2 and TLR4), recognize pathogens by means of several conserved structural features of the microbes, such as lipopolysaccharide (LPS) for Gram-negative bacteria. They are involved in activating the MyD88/IRAK signaling cascade, which bifurcates and leads to NF-κB and c-Jun/ATF2/TCF activation [35]. Because microarray profiles contain vast cataloged patterns of dynamic expression of the activated genes, we need systematic tools to identify the interaction architectures and the dynamics of the underlying gene networks. Indeed, the system identification problem of the underlying dynamic gene networks falls naturally into the category of reverse engineering [30]: a complex genetic network underlies a massive set of gene expression data, and the task is to infer the connectivity of the GRN from microarray data through a dynamic gene regulatory model [29]. Therefore understanding complex GRNs requires the integration of microarray data and dynamic gene regulatory modeling by a systematic approach. The systematic approach has to include computational dynamic modeling coupled with microarray data, data mining, dynamic responses, and network structures arising from high-throughput data analysis of the interacting species [36].
To achieve this, a DBN method has been developed to predict GRNs from time-series gene expression data [30]. However, this method has not been combined with other network algorithms and knowledge-based databases. It carries two fundamental problems that greatly reduce the effectiveness of the DBN approach. The first problem is the inherently low accuracy of gene network prediction, and the second is the excessive computation time needed to predict a GRN. Since the identification of a perturbed biological network under the effect of bacterial endotoxin is an important topic in basic and clinical research, it is imperative to conduct a systematic analysis based on the gene expression profiles of microarray data. An approach that combines genome-wide expression analysis with a clustering method has been introduced to identify functional networks using the GRAM (Genetic Regulatory Modules) algorithm to provide biological insights into GRNs [37]. Because clustering algorithms identify sets of coexpressed and potentially coregulated genes from gene expression data, they are more suitable for finding a gene module, that is, a set of coexpressed genes whose promoter regions are bound by the same set of TFs. Therefore clustering is not suitable for constructing transcriptional regulatory networks as a dynamic system model. It is hence essential to provide a new way to identify the perturbed biological networks. To achieve this, systems biology and computational biology methods need to be employed to describe the biological functions from a dynamic system perspective [38,39]. In this chapter a systems biology approach is proposed to achieve a gradual refinement of the inflammatory regulatory network. In the proposed method, we first construct a candidate GRN of inflammation by data mining from the Ensembl database http://www.ensembl.org/index.html and JASPAR http://jaspar.genereg.net/algorithms. We then build a dynamic gene regulatory model according to the candidate GRN, taking into account the time delay between regulatory gene and target gene, to describe the GRN.
Based on the dynamic gene regulatory model and the microarray data in Refs. [27,28], a maximum likelihood method is used to identify the regulatory parameters of the upstream regulatory genes for each target gene. Finally, we prune away the insignificant regulatory genes by the Akaike information criterion (AIC) model order detection method in system identification [40] to refine the candidate GRN into the real GRN of the inflammatory response to bacterial endotoxin. By comparing with normal GRNs, we obtain the perturbed gene network to analyze the effect of the inflammatory stimulus on the immune system. The hubs and "weak ties" are also discussed for the robust inflammatory gene network. In summary, this chapter uses database mining to construct a candidate inflammatory regulatory network and then uses real microarray data to prune the false-positive regulations in the candidate network through a dynamic gene regulatory model to obtain the real inflammatory gene network in response to an inflammatory stimulus.
3.2 Construction of candidate inflammatory gene regulatory network in response to inflammatory stimulus
The flowchart for constructing a GRN of the inflammatory system can be divided into seven steps, as shown in Fig. 3.1. The candidate GRN of inflammation is set up by similarity analysis and a cross-correlation scheme from step 1 to step 5, and the refinement is then performed by the system identification method of this chapter from step 5 to step 7. The step numbers are marked alongside the blocks in the flowchart as follows:
FIGURE 3.1 The flowchart for constructing the GRN of inflammation. The left-hand-side path selects target genes and their potential regulatory genes, and the right-hand-side path generates a threshold of cross correlation between each target gene and its upstream regulator to select possible regulatory genes from the left-hand-side path to construct a candidate GRN of inflammatory response. Then the candidate GRN is pruned by the dynamic model and the parsimonious AIC to achieve a refined GRN of inflammation [3]. AIC, Akaike information criterion; GRN, gene regulatory network.
Step 1: A total of 49 genes (see Table 3.1) that are associated with the inflammatory responses are selected based on data mining in the published literature [27,28]. Next, we mine the findings reported in other literature [22-26] to select the candidate genes of interest, with biofunctions such as cell-cell signaling (IL17C, etc.), leukocyte migration (SCYE1, etc.), or detection of abiotic stimulus (TACR1, etc.). These 49 genes, with annotations of different biological processes from the Gene Ontology database, are shown in Table 3.2. In order to extract the significance of the complicated global inflammatory gene network, we choose not to classify its function modules as Calvano et al. have done in their study [27]. Instead, only 49 significant genes are selected as a core of the inflammatory network, because it is much simpler to identify the perturbations between normal and inflammatory conditions. This makes it easier to give cellular function interpretations and to perform literature validations, especially on the NF-κB subnetwork.
TABLE 3.1 Total 49 genes selected from published literature (P-value ≤ 0.05) [3].
Gene name | Description
ABCF1 | ATP-binding cassette, subfamily F (GCN20), member 1
ADORA2A | Adenosine A2a receptor
ADORA3 | Adenosine A3 receptor
ALOX5 | Arachidonate 5-lipoxygenase
AMBP | Alpha-1-microglobulin/bikunin precursor
ANXA1 | Annexin A1
AOAH | Acyloxyacyl hydrolase (neutrophil)
BLNK | B-cell linker
CCL18 | Chemokine (C-C motif) ligand 18 (pulmonary and activation regulated)
CCR7 | Chemokine (C-C motif) receptor 7
CEBPD | CCAAT/enhancer-binding protein (C/EBP), delta
CXCL14 | Chemokine (C-X-C motif) ligand 14
CXCL2 | Chemokine (C-X-C motif) ligand 2
CYBB | Cytochrome b-245, beta polypeptide (chronic granulomatous disease)
FOS | V-fos FBJ murine osteosarcoma viral oncogene homolog
GPR132 | G protein-coupled receptor 132
HDAC4 | Histone deacetylase 4
HDAC5 | Histone deacetylase 5
HDAC7A | Histone deacetylase 7A
HDAC9 | Histone deacetylase 9
HPSE | Heparanase
IL17 | Interleukin 17C
IL1A | Interleukin 1a
IL1B | Interleukin 1b
IL1R | Interleukin 1 receptor, type I precursor
IL22 | Interleukin 22
IL6 | Interleukin 6
IL8 | Interleukin 8
IRAK | Interleukin 1 receptor-associated kinase 1
ITGB2 | Integrin, beta 2
KNG | Kininogen
MAPK10 | Mitogen-activated protein kinase 10
NFATC3 | Nuclear factor of activated T cells, cytoplasmic, calcineurin-dependent 3
NFKB1 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105)
NFKBIA | Nuclear factor of kappa light polypeptide gene enhancer in B-cell inhibitors, alpha
NFRKB | Nuclear factor related to κB-binding protein
NR3C1 | Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)
PLA2G4B | Phospholipase A2, group IVB (cytosolic)
PLAA | Phospholipase A2-activating protein
REG3A | Pancreatitis-associated protein
SCCE | Kallikrein 7 (chymotryptic, stratum corneum)
SCYE1 | Small inducible cytokine subfamily E, member 1 (endothelial monocyte activating)
TACR1 | Tachykinin receptor 1
TICAM2 | Toll-like receptor adaptor molecule 2
TLR4 | Toll-like receptor 4
TLR7 | Toll-like receptor 7
TNFA | Tumor necrosis factor
TNFR | Tumor necrosis factor receptor superfamily member 1A precursor
TOLLIP | Toll-interacting protein
TABLE 3.2 Annotations of 49 target genes (P-value ≤ 0.05) [3].
Gene name | Description | Gene ontology biological process
IL17C | Interleukin 17C |
IL1A | Interleukin 1a |
 | | GO:0007166: cell-surface receptor linked signal transduction; GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0006954: inflammatory response
TNFA | Tumor necrosis factor |
IL6 | Interleukin 6 |
IL1B | Interleukin 1b |
TLR4 | Toll-like receptor 4 |
NFATC3 | Nuclear factor of activated T cells, cytoplasmic, calcineurin-dependent 3 |
 | | GO:0006959: humoral immune response; GO:0006954: inflammatory response; GO:0043123: positive regulation of IκB kinase/NF-κB cascade; GO:0051092: positive regulation of NF-κB transcription factor activity; GO:0051023: regulation of immunoglobulin secretion; GO:0007267: cell-cell signaling; GO:0006959: humoral immune response; GO:0006954: inflammatory response; GO:0045727: positive regulation of translation; GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0007165: signal transduction; GO:0007249: IκB kinase/NF-κB cascade; GO:0042116: macrophage activation; GO:0007165: signal transduction; GO:0042088: T-helper 1 type immune response; GO:0006954: inflammatory response
SCYE1 | Small inducible cytokine subfamily E, member 1 (endothelial monocyte-activating) |
TICAM2 | Toll-like receptor adaptor molecule 2 |
 | | GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0050900: leukocyte migration; GO:0007165: signal transduction; GO:0043123: positive regulation of IκB kinase/NF-κB cascade
HDAC4 | Histone deacetylase 4 |
HDAC5 | Histone deacetylase 5 |
HDAC7A | Histone deacetylase 7A |
 | | GO:0030183: B-cell differentiation; GO:0006954: inflammatory response; GO:0030183: B-cell differentiation; GO:0006954: inflammatory response; GO:0006355: regulation of transcription, DNA dependent; GO:0030183: B-cell differentiation; GO:0006954: inflammatory response
HDAC9 | Histone deacetylase 9 |
ITGB2 | Integrin, beta 2 |
CXCL2 | Chemokine (C-X-C motif) ligand 2 |
 | | GO:0030183: B-cell differentiation; GO:0006954: inflammatory response; GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0007159: leukocyte adhesion; GO:0008360: regulation of cell shape; GO:0030593: neutrophil chemotaxis; GO:0006954: inflammatory response
ALOX5 | Arachidonate 5-lipoxygenase |
NFKBIA | Nuclear factor of kappa light polypeptide gene enhancer in B-cell inhibitors, alpha |
NR3C1 | Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor) |
 | | GO:0007165: signal transduction
CEBPD | CCAAT/enhancer-binding protein (C/EBP), delta |
 | | GO:0006366: transcription from RNA polymerase II promoter
ANXA1 | Annexin A1 |
CYBB | Cytochrome b-245, beta polypeptide (chronic granulomatous disease) |
 | | GO:0007166: cell-surface receptor linked signal transduction; GO:0006954: inflammatory response; GO:0006954: inflammatory response; GO:0045087: innate immune response
AOAH | Acyloxyacyl hydrolase (neutrophil) |
REG3A | Pancreatitis-associated protein |
 | | GO:0008283: cell proliferation
FOS | V-fos FBJ murine osteosarcoma viral oncogene homolog |
IRAK | Interleukin 1 receptor-associated kinase 1 |
PLAA | Phospholipase A2-activating protein |
 | | GO:0006954: inflammatory response; GO:0006357: regulation of transcription from RNA polymerase II promoter; GO:0007250: activation of NF-κB-inducing kinase activity; GO:0045941: positive regulation of transcription; GO:0007165: signal transduction; GO:0007165: signal transduction
CCR7 | Chemokine (C-C motif) receptor 7 |
 | | GO:0006954: inflammatory response
 | | GO:0006954: inflammatory response; GO:0019370: leukotriene biosynthetic process; GO:0006691: leukotriene metabolic process; GO:0007253: cytoplasmic sequestering of NF-κB; GO:0042345: regulation of NF-κB import into nucleus
CXCL14 | Chemokine (C-X-C motif) ligand 14 |
PLA2G4B | Phospholipase A2, group IVB (cytosolic) |
 | | GO:0007267: cell-cell signaling; GO:0007165: signal transduction; GO:0006954: inflammatory response
NFRKB | Nuclear factor related to κB-binding protein |
MAPK10 | Mitogen-activated protein kinase 10 |
ADORA2A | Adenosine A2a receptor |
SCCE | Kallikrein 7 (chymotryptic, stratum corneum) |
ADORA3 | Adenosine A3 receptor |
NFKB1 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105) |
CCL18 | Chemokine (C-C motif) ligand 18 (pulmonary and activation regulated) |
AMBP | Alpha-1-microglobulin/bikunin precursor |
TACR1 | Tachykinin receptor 1 |
KNG | Kininogen |
BLNK | B-cell linker |
ABCF1 | ATP-binding cassette, subfamily F (GCN20), member 1 |
 | | GO:0006954: inflammatory response; GO:0006366: transcription from RNA polymerase II promoter; GO:0007165: signal transduction; GO:0008015: blood circulation; GO:0007596: blood coagulation; GO:0007267: cell-cell signaling; GO:0006968: cellular defense response; GO:0006954: inflammatory response; GO:0006909: phagocytosis; GO:0007600: sensory perception; GO:0008544: epidermis development; GO:0006954: inflammatory response; GO:0007165: signal transduction; GO:0006954: inflammatory response; GO:0045941: positive regulation of transcription; GO:0006366: transcription from RNA polymerase II promoter; GO:0007267: cell-cell signaling; GO:0006955: immune response; GO:0009607: response to biotic stimulus; GO:0007165: signal transduction; GO:0007155: cell adhesion; GO:0050777: negative regulation of immune response; GO:0009582: detection of abiotic stimulus; GO:0006954: inflammatory response; GO:0007638: mechanosensory behavior; GO:0006954: inflammatory response; GO:0030195: negative regulation of blood coagulation; GO:0007162: negative regulation of cell adhesion; GO:0006939: smooth muscle contraction; GO:0030183: B-cell differentiation; GO:0006959: humoral immune response; GO:0006954: inflammatory response; GO:0007242: intracellular signaling cascade; GO:0006954: inflammatory response
HPSE | Heparanase |
TLR7 | Toll-like receptor 7 |
IL22 | Interleukin 22 |
GPR132 | G protein-coupled receptor 132 |
IL1R | Interleukin 1 receptor antagonist protein precursor |
TOLLIP | Toll-interacting protein |
IL8 | Interleukin 8 |
TNFR | CD27 antigen precursor |
 | | GO:0006412: translation; GO:0051607: defense response to virus; GO:0007249: IκB kinase/NF-κB cascade; GO:0045416: positive regulation of interleukin 8 biosynthetic process; GO:0006953: acute-phase response; GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0007186: G protein-coupled receptor protein signaling pathway; GO:0006955: immune response; GO:0006954: inflammatory response; GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0007242: intracellular signaling cascade; GO:0045321: leukocyte activation; GO:0007267: cell-cell signaling; GO:0007186: G protein-coupled receptor protein signaling pathway; GO:0006954: inflammatory response; GO:0007242: intracellular signaling cascade; GO:0030155: regulation of cell adhesion; GO:0045091: regulation of retroviral genome replication; GO:0016064: immunoglobulin-mediated immune response; GO:0045579: positive regulation of B-cell differentiation; GO:0008588: release of cytoplasmic sequestered NF-κB
3. Identifying the gene regulatory network of systems inflammation in humans
The main goal is to select the candidate regulators (i.e., TFs) of the 49 target genes in inflammatory response and to construct a candidate GRN of inflammation by linking these target genes to their regulators.

Step 2: The Ensembl database (http://www.ensembl.org/index.html) is explored to retrieve the promoter sequences of the 49 target genes, and sequence similarity analysis is then conducted to identify candidate regulators of these target genes in JASPAR (http://asp.ii.uib.no:8090/cgi-bin/jaspar2005/jaspar_db.pl), which is a high-quality TF database. In this stage, it is hypothesized that if some TFs are selected by the predictions of JASPAR using our criteria, the genes encoding the corresponding TFs could be considered as candidate regulators of the target genes. At the end of this step, a set of candidate regulators is obtained from the JASPAR analysis (see Table 3.3). However, these hits still contain many false positives, because all possible regulators are listed, including those that act under conditions other than inflammatory response. Pruning based on microarray data of inflammatory response is therefore necessary. The pruning procedure for these false positives is described after step 5.

Step 3: Potential gene regulators are screened and selected from the JASPAR hits by a cross-correlation threshold on gene expression data [41], under the assumption that some correlations exist between target genes and their upstream regulators, with or without time delays. The cross correlation between each target gene and each of its candidate regulatory genes is computed separately. The cross-correlation coefficients are then used to identify the candidate regulators, under the assumption that the target gene's expression profile is positively (or negatively) correlated with the regulatory gene's profile, with or without time lags.
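The lagged correlation screen in step 3 can be illustrated with a minimal sketch. It assumes equally spaced time points and scans lags 0 to 3; the function names, the lag range, and the toy profiles are illustrative assumptions, not taken from the chapter.

```python
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / (su * sv)

def best_lagged_correlation(regulator, target, max_lag=3):
    """Return (lag, r) with the largest |r| between regulator(t - lag) and target(t)."""
    best = (0, 0.0)
    for lag in range(max_lag + 1):
        x = regulator[:len(regulator) - lag]  # regulator values at t - lag
        y = target[lag:]                      # target values at t
        if len(x) < 3:
            break  # too few overlapping points to correlate
        r = pearson(x, y)
        if abs(r) > abs(best[1]):
            best = (lag, r)
    return best

# Toy profiles (made-up numbers): the target repeats the regulator one step later.
regulator = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6]
target = [0.5] + regulator[:-1]
lag, r = best_lagged_correlation(regulator, target)
```

Taking the absolute value lets the same scan pick up both positively and negatively correlated regulators, as the step describes.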
Step 4: A careful choice of the correlation threshold is important for discriminating “by chance” associations. To decide on a threshold of significant correlation between transcription regulators and target genes for the selection of candidate transcription regulatory genes, 2000 genes are randomly chosen from 22,577 genes and their correlations are computed by the Pearson correlation in Eq. (3.3), as shown in Fig. 3.2. According to the ranking of cross correlation in Fig. 3.2, we select the 30% level (i.e., 0.46451 in cross-correlation value) as the threshold. A lower threshold may recruit some “by chance” regulator genes, while a higher threshold may increase the number of false-negative genes. However, we have learned from our own experience that highly correlated genes may be associated because they are coregulated by the same regulatory genes/TFs. In other words, genes that are coregulated by a common set of regulatory genes but do not regulate each other always show high cross correlation. On the other hand, time delays in signaling pathways may mask genes with a coregulatory association by giving them a low cross correlation. Since the cross correlation is only the first discrimination parameter used in the gradual refinement of the inflammatory regulatory network, we do not want to miss any possible candidate regulatory genes at such an early stage. Because the aim in the first stage of our algorithm is to delete those “impossible” regulatory associations that are truly false, we cannot adopt a high threshold of cross correlation for discrimination. Furthermore, with the permutation steps for data randomization, we could obtain the probability density function of parameter estimation, from which the parameter estimation of regulatory genes by our threshold of cross
I. Systems Immunology
3.2 Construction of candidate inflammatory gene regulatory network in response to inflammatory stimulus
TABLE 3.3 The inflammatory genes and their regulators (P-value ≤ 0.05) [3].
Gene name
(A) Possible regulators from JASPAR
(B) Candidate regulators from cross-correlation threshold
(C) Real regulators from AIC
IL17
RUNX1,SOX9,RORA,MEF2A,HLF, TFAP2A,NFIL3,ELK1,FOXD1, FOXL1,GATA2,FOXI1,IRF1,YY1, REL,RELA,NFKB1,SPIB
RUNX1,SOX9,RORA,HLF, TFAP2A,NFIL3,ELK1,FOXD1, FOXL1,GATA2,FOXI1,YY1,REL, RELA,NFKB1,SPIB
RUNX1,SOX9,RORA,HLF, TFAP2A,NFIL3,ELK1,FOXD1, GATA2,FOXI1,YY1,REL,RELA, NFKB1,SPIB
IL1A
RUNX1,Pbx,SOX9,MEF2A,HLF, TFAP2A,E2F1,NFIL3,ELK1,FOXD1, FOXL1,GATA2,FOXI1,YY1,REL, RELA,NFKB1,SPIB
RUNX1,Pbx,SOX9,HLF,TFAP2A, E2F1,NFIL3,ELK1,FOXD1,FOXL1, GATA2,FOXI1,YY1,REL,RELA, NFKB1
RUNX1,Pbx,HLF,TFAP2A, NFIL3,ELK1,FOXD1,FOXL1, GATA2,FOXI1,YY1,REL,RELA, NFKB1
TNFA
TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,MAX,YY1,NFKB1,SPIB
TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,MAX,YY1,NFKB1,SPIB
TFAP2A,GATA2,FOXI1,MAX, YY1,NFKB1,SPIB
IL6
RUNX1,MEF2A,HLF,TFAP2A,ELK1,FOXL1,GATA2,FOXI1,YY1,REL,RELA,NFKB1,SPIB
RUNX1,HLF,TFAP2A,ELK1,FOXL1,GATA2,FOXI1,YY1,REL,NFKB1
RUNX1,HLF,TFAP2A,FOXL1, GATA2,FOXI1,YY1,REL,NFKB1
IL1B
SOX9,MEF2A,TFAP2A,ELK1, FOXD1,FOXL1,GATA2,FOXI1,YY1, REL,RELA,NFKB1,SPIB
SOX9,MEF2A,ELK1,GATA2, FOXI1,REL,RELA,NFKB1,SPIB
SOX9,MEF2A,ELK1,GATA2, FOXI1,RELA,NFKB1
TLR4
RUNX1,Pbx,SOX9,RORA,MEF2A, HLF,TFAP2A,NFIL3,ELK1,FOXF2, FOXD1,FOXL1,GATA2,FOXI1,IRF1, MAX,YY1,REL,RELA,SPIB
RUNX1,MEF2A,NFIL3,ELK1, FOXI1,IRF1,MAX,REL,RELA,SPIB
RUNX1,MEF2A,FOXI1,IRF1, MAX,REL,RELA,SPIB
NFATC3
SOX9,HLF,TFAP2A,NFIL3,ELK1, FOXD1,FOXL1,GATA2,FOXI1,MAX, YY1,REL
SOX9,HLF,NFIL3,ELK1,FOXD1, GATA2,FOXI1,MAX,YY1
SOX9,HLF,NFIL3,ELK1,FOXD1, GATA2,FOXI1,MAX
SCYE1
RUNX1,SOX9,TFAP2A,FOXD1,FOXL1,GATA2,FOXI1,YY1,REL,SPIB
RUNX1,TFAP2A,FOXL1,REL,SPIB
RUNX1,TFAP2A,FOXL1,REL,SPIB
TICAM2
SOX9,RORA,TFAP2A,ELK1,FOXL1, GATA2,YY1,REL,RELA,SPIB
SOX9,TFAP2A,ELK1,FOXL1,REL, RELA,SPIB
SOX9,TFAP2A,ELK1,FOXL1, RELA,SPIB
HDAC4
SOX9,HLF,TFAP2A,E2F1,NFIL3, ELK1,FOXL1,GATA2,YY1,REL
TFAP2A,FOXL1
TFAP2A,FOXL1
HDAC5
RUNX1,SOX9,HLF,TFAP2A,NFIL3,ELK1,FOXD1,FOXL1,GATA2,FOXI1,MAX,YY1,REL,SPIB
RUNX1,FOXD1,FOXL1,FOXI1,MAX,YY1,REL,SPIB
RUNX1,FOXD1,FOXL1,MAX, YY1,REL,SPIB
HDAC7A
RUNX1,SOX9,TFAP2A,E2F1,ELK1,FOXD1,FOXL1,GATA2,FOXI1,YY1,NFKB1,SPIB
SOX9,TFAP2A,E2F1,FOXD1,FOXL1,GATA2,FOXI1,YY1,NFKB1,SPIB
TFAP2A,E2F1,FOXD1,FOXL1,GATA2,FOXI1,YY1,NFKB1,SPIB
HDAC9
SOX9,RORA,MEF2A,TFAP2A,NFIL3,ELK1,FOXD1,FOXL1,GATA2,FOXI1,IRF1,YY1,REL,SPIB
MEF2A,NFIL3,IRF1,REL,SPIB
NFIL3,IRF1,REL
ITGB2
RUNX1,SOX9,TFAP2A,ELK1,FOXD1,FOXL1,GATA2,FOXI1,IRF1,MAX,YY1,SPIB
RUNX1,SOX9,TFAP2A,ELK1,FOXD1,FOXL1,GATA2,FOXI1,IRF1,MAX,YY1,SPIB
RUNX1,TFAP2A,ELK1,FOXD1,FOXL1,GATA2,FOXI1,IRF1,MAX,YY1,SPIB
CXCL2
RUNX1,Pbx,SOX9,RORA,E2F1, NFIL3,FOXF2,FOXD1,FOXL1, GATA2,FOXI1,IRF1,YY1,SPIB
RUNX1,Pbx,SOX9,RORA,E2F1, NFIL3,FOXF2,FOXD1,FOXL1, GATA2,FOXI1,IRF1,YY1
SOX9,RORA,E2F1,NFIL3, FOXD1,FOXL1,GATA2
ALOX5
RUNX1,Pbx,SOX9,RORA,MEF2A, TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,MAX,YY1,SPIB
RUNX1,MEF2A,TFAP2A,FOXL1, SPIB
RUNX1,MEF2A,TFAP2A,SPIB
NFKBIA
RUNX1,SOX9,MEF2A,TFAP2A, FOXD1,FOXL1,GATA2,FOXI1,IRF1, YY1,SPIB
RUNX1,SOX9,MEF2A,GATA2, FOXI1,IRF1,SPIB
RUNX1,MEF2A,FOXI1,IRF1, SPIB
NR3C1
RUNX1,Pbx,SOX9,RORA,MEF2A, TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,MAX,YY1,SPIB
RUNX1,Pbx,SOX9,RORA,MEF2A, TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,YY1,SPIB
RUNX1,Pbx,SOX9,MEF2A, TFAP2A,FOXD1,FOXL1,GATA2, FOXI1,YY1,SPIB
CEBPD
TFAP2A,NFIL3,FOXF2,FOXD1,FOXL1,GATA2,FOXI1,YY1,REL,SPIB
NFIL3,FOXF2,FOXI1,REL,SPIB
NFIL3,FOXF2,FOXI1,SPIB
ANXA1
GATA2,YY1,SPIB
SPIB
SPIB
CYBB
RUNX1,SOX9,TFAP2A,FOXD1, FOXL1,GATA2,FOXI1,IRF1,YY1, REL,SPIB
FOXL1,IRF1,SPIB
FOXL1,SPIB
AOAH
RUNX1,SOX9,RORA,HLF,TFAP2A, ELK1,FOXF2,FOXD1,FOXL1, GATA2,FOXI1,YY1,REL,RELA,SPIB
RUNX1,TFAP2A,ELK1,FOXL1, FOXI1,RELA,SPIB
RUNX1,FOXL1,FOXI1,RELA
REG3A
SOX9,RORA,ELK1,FOXL1,GATA2, FOXI1,YY1,SPIB
SOX9,RORA,ELK1,FOXL1, GATA2,FOXI1,YY1,SPIB
SOX9,RORA,ELK1,FOXL1, GATA2,SPIB
FOS
RUNX1,SOX9,TFAP2A,E2F1,ELK1, FOXD1,FOXL1,GATA2,FOXI1,YY1, SPIB
RUNX1,TFAP2A,ELK1,FOXL1
RUNX1,TFAP2A,ELK1,FOXL1
IRAK
RUNX1,SOX9,RORA,TFAP2A,E2F1, ELK1,FOXD1,FOXL1,GATA2,MAX, YY1,SPIB
E2F1,MAX,SPIB
E2F1,SPIB
PLAA
RUNX1,Pbx,SOX9,RORA,MEF2A, HLF,TFAP2A,NFIL3,ELK1,FOXF2, FOXD1,FOXL1,GATA2,FOXI1,IRF1, MAX,YY1,SPIB
RUNX1,Pbx,SOX9,RORA,HLF, TFAP2A,ELK1,FOXF2,FOXD1, FOXL1,GATA2,FOXI1,MAX,YY1, SPIB
RUNX1,SOX9,TFAP2A,FOXF2, FOXD1,FOXL1,GATA2,MAX, SPIB
CCR7
RUNX1,SOX9,TFAP2A,ELK1, FOXD1,GATA2,IRF1,YY1,REL,SPIB
IRF1,REL,SPIB
IRF1,REL
CXCL14
RUNX1,SOX9,RORA,HLF,TFAP2A, E2F1,ELK1,FOXD1,FOXL1,GATA2, FOXI1,IRF1,YY1,REL,RELA,SPIB
RUNX1,SOX9,RORA,HLF, TFAP2A,E2F1,ELK1,FOXD1, FOXL1,GATA2,FOXI1,YY1,REL, RELA,SPIB
RUNX1,SOX9,HLF,TFAP2A, E2F1,FOXD1,FOXL1,GATA2, YY1,REL,RELA,SPIB
PLA2G4B
MEF2A,TFAP2A,E2F1,NFIL3,ELK1,FOXL1,GATA2,FOXI1,YY1,REL,SPIB
MEF2A,E2F1,NFIL3,GATA2,FOXI1,YY1,SPIB
MEF2A,NFIL3,FOXI1,YY1,SPIB
NFRKB
RUNX1,SOX9,RORA,MEF2A, TFAP2A,E2F1,FOXD1,FOXL1, GATA2,FOXI1,MAX,YY1,REL,SPIB
RUNX1,SOX9,RORA,E2F1, FOXD1,GATA2,FOXI1,MAX,YY1, REL,SPIB
RUNX1,SOX9,RORA,E2F1, FOXD1,REL,SPIB,GATA2, FOXI1,MAX,YY1
MAPK10
RUNX1,SOX9,MEF2A,HLF,TFAP2A,E2F1,ELK1,FOXL1,GATA2,FOXI1,IRF1,YY1,REL,SPIB
RUNX1,SOX9,HLF,TFAP2A,E2F1,ELK1,FOXL1,GATA2,FOXI1,YY1,REL,SPIB
RUNX1,SOX9,TFAP2A,E2F1,ELK1,FOXL1,GATA2,FOXI1,REL,SPIB
ADORA2A
RUNX1,SOX9,RORA,TFAP2A,E2F1,FOXD1,FOXL1,GATA2,IRF1,YY1,REL,RELA,SPIB
RUNX1,SOX9,RORA,TFAP2A,E2F1,FOXD1,FOXL1,GATA2,IRF1,YY1,REL,RELA,SPIB
RUNX1,RORA,TFAP2A,E2F1,REL,RELA,SPIB,FOXL1,GATA2,IRF1,YY1
SCCE
RUNX1,Pbx,SOX9,RORA,HLF, TFAP2A,ELK1,FOXD1,FOXL1, GATA2,FOXI1,IRF1,MAX,YY1,SPIB
RUNX1,Pbx,SOX9,RORA,HLF, TFAP2A,ELK1,FOXD1,FOXL1, GATA2,FOXI1,MAX,YY1,SPIB
RUNX1,SOX9,RORA,HLF,ELK1, FOXD1,FOXL1,GATA2,YY1,SPIB
ADORA3
RUNX1,Pbx,ELK1,FOXL1,GATA2, YY1
Pbx,GATA2
Pbx,GATA2
NFKB1
RUNX1,SOX9,TFAP2A,ELK1, FOXD1,FOXL1,GATA2,FOXI1,YY1, REL,SPIB
RUNX1,SOX9,ELK1,GATA2, FOXI1,YY1,REL,SPIB
RUNX1,SOX9,ELK1,GATA2, FOXI1,YY1,SPIB
CCL18
Pbx,SOX9,TFAP2A,NFIL3,FOXD1, FOXL1,GATA2,YY1,SPIB
Pbx,SOX9,TFAP2A,NFIL3,FOXD1,FOXL1,GATA2,YY1
Pbx,SOX9,TFAP2A,NFIL3,FOXL1,GATA2,YY1
AMBP
RUNX1,TFAP2A,ELK1,FOXF2, FOXD1,FOXL1,GATA2,FOXI1,MAX, YY1,REL,SPIB
RUNX1,TFAP2A,ELK1,FOXF2, FOXD1,FOXL1,GATA2,FOXI1, MAX,YY1,REL,SPIB
TFAP2A,FOXF2,FOXD1,FOXL1, GATA2,FOXI1,MAX,YY1,REL, SPIB
TACR1
RUNX1,SOX9,RORA,TFAP2A,E2F1, FOXD1,FOXL1,GATA2,MAX,YY1, SPIB
RUNX1,SOX9,RORA,TFAP2A, E2F1,FOXD1,FOXL1,GATA2, MAX,YY1,SPIB
RUNX1,SOX9,RORA,TFAP2A, E2F1,FOXD1,FOXL1,MAX,YY1, SPIB
KNG
RUNX1,SOX9,RORA,TFAP2A, FOXL1,GATA2,FOXI1,YY1,REL, SPIB
RUNX1,SOX9,RORA,TFAP2A, FOXL1,GATA2,FOXI1,YY1,REL, SPIB
RUNX1,SOX9,TFAP2A,FOXL1, GATA2,FOXI1,YY1,REL
BLNK
SOX9,MEF2A,NFIL3,ELK1,FOXF2, FOXD1,FOXL1,GATA2,FOXI1,YY1, SPIB
MEF2A,NFIL3,FOXI1,SPIB
MEF2A,FOXI1
ABCF1
RUNX1,SOX9,RORA,TFAP2A,E2F1,ELK1,FOXD1,FOXL1,GATA2,FOXI1,MAX,YY1,REL,RELA,NFKB1,SPIB
RUNX1,SOX9,E2F1,ELK1,GATA2,FOXI1,MAX,YY1,REL,RELA,NFKB1,SPIB
RUNX1,E2F1,ELK1,GATA2, FOXI1,MAX,YY1,REL,RELA, NFKB1
HPSE
RUNX1,SOX9,TFAP2A,E2F1,ELK1, FOXD1,FOXL1,GATA2,FOXI1,YY1, REL,SPIB
REL,SPIB
REL
TLR7
RUNX1,SOX9,MEF2A,HLF,TFAP2A, FOXD1,FOXL1,GATA2,FOXI1,IRF1, YY1,REL,RELA,NFKB1,SPIB
RUNX1,MEF2A,IRF1,REL,RELA, NFKB1,SPIB
RUNX1,MEF2A,IRF1,REL, NFKB1,SPIB
IL22
RUNX1,SOX9,RORA,MEF2A, TFAP2A,E2F1,ELK1,FOXD1,FOXL1, GATA2,FOXI1,MAX,YY1,REL, NFKB1,SPIB
RUNX1,SOX9,RORA,TFAP2A, E2F1,ELK1,FOXD1,FOXL1, GATA2,FOXI1,MAX,YY1,REL, NFKB1,SPIB
RUNX1,SOX9,RORA,E2F1,MAX, YY1,REL,NFKB1,SPIB,ELK1, FOXD1,FOXL1,GATA2,FOXI1
GPR132
SOX9,RORA,MEF2A,TFAP2A,E2F1, FOXL1,GATA2,FOXI1,IRF1,YY1, NFKB1,SPIB
SOX9,RORA,MEF2A,TFAP2A, E2F1,FOXL1,GATA2,FOXI1,IRF1, YY1,NFKB1,SPIB
SOX9,RORA,MEF2A,TFAP2A, E2F1,FOXL1,GATA2,FOXI1, IRF1,YY1,SPIB
IL1R
RUNX1,SOX9,HLF,TFAP2A,NFIL3,ELK1,FOXD1,FOXL1,GATA2,FOXI1,YY1,REL,RELA,SPIB
TFAP2A,NFIL3,FOXL1,RELA,SPIB
TFAP2A,NFIL3,FOXL1,RELA
TOLLIP
RUNX1,SOX9,RORA,MEF2A,HLF, TFAP2A,NFIL3,ELK1,FOXD1, FOXL1,GATA2,FOXI1,MAX,YY1, REL,RELA,NFKB1,SPIB
RUNX1,SOX9,RORA,HLF, TFAP2A,NFIL3,ELK1,YY1,REL, RELA,NFKB1,SPIB,FOXD1, FOXL1,GATA2,FOXI1,MAX
RUNX1,SOX9,RORA,HLF, TFAP2A,NFIL3,ELK1,YY1,REL, RELA,NFKB1,FOXD1,FOXL1, GATA2,FOXI1,MAX
IL8
RUNX1,Pbx,SOX9,RORA,MEF2A, HLF,TFAP2A,E2F1,NFIL3,ELK1, FOXF2,FOXD1,FOXL1,GATA2, FOXI1,MAX,YY1,REL,RELA,SPIB
RUNX1,Pbx,SOX9,RORA,MEF2A, HLF,E2F1,NFIL3,ELK1,FOXF2, FOXD1,GATA2,FOXI1,MAX,YY1, REL,RELA
RUNX1,SOX9,RORA,MEF2A, HLF,E2F1,NFIL3,ELK1,FOXF2, FOXD1,GATA2,FOXI1,MAX, YY1,REL,RELA
TNFR
Pbx,SOX9,TFAP2A,FOXL1,GATA2, FOXI1
FOXI1
FOXI1
AIC, Akaike information criterion.
correlation has shown a P-value less than 0.001. Therefore the genes selected by the proposed threshold (0.46451) are the significant candidates for regulatory genes in the GRN of inflammation.

Step 5: Here the first selection is made from the candidate regulators in step 3. That is, if the cross correlation between a candidate regulator and the target gene is more than 0.46451, the regulator is considered a significant candidate regulator for the target gene. After potential regulators are selected by the cross-correlation threshold, the target genes and their candidate regulators are integrated to construct a candidate GRN of the inflammatory response system. Results of the first selection are listed in Table 3.3, column (B).
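The threshold choice in steps 4 and 5 can be sketched as follows. This is only an illustration under stated assumptions: it reads "the 30%" as the cutoff for the top 30% of ranked absolute correlations among all pairs of the sampled genes, and it applies the cutoff to absolute values. The function names, the random toy profiles, and the candidate scores `TF_A` and `TF_B` are hypothetical.

```python
import random
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sqrt(sum((a - mu) ** 2 for a in u))
    sv = sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def background_threshold(profiles, n_genes=2000, top_fraction=0.30, seed=0):
    """Cutoff such that the top `top_fraction` of |r| among random gene pairs lies above it."""
    rng = random.Random(seed)
    sample = rng.sample(profiles, min(n_genes, len(profiles)))
    corrs = sorted(
        (abs(pearson(sample[i], sample[j]))
         for i in range(len(sample))
         for j in range(i + 1, len(sample))),
        reverse=True,
    )
    cut = max(1, int(top_fraction * len(corrs)))
    return corrs[cut - 1]

# Toy background: 40 random 8-point profiles stand in for the microarray genes.
rng = random.Random(42)
profiles = [[rng.random() for _ in range(8)] for _ in range(40)]
threshold = background_threshold(profiles)

# Hypothetical cross-correlation scores of two JASPAR hits for one target gene.
candidates = {"TF_A": 0.91, "TF_B": -0.12}
selected = [tf for tf, r in candidates.items() if abs(r) > threshold]
```

In the chapter the resulting cutoff is 0.46451; the toy data here will give a different number, since the cutoff depends on the background distribution.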
3.3 Pruning the candidate gene regulatory network via a dynamic gene regulatory model
At this step, we have constructed a candidate network via the first five selection steps using cross correlation by statistical inferences. However, we have yet to consider the dynamic system property of this GRN. To include the dynamic regulatory parameters, we apply the AIC in Eq. (3.14) to help us make a more systematic selection. The AIC
FIGURE 3.2 Distribution of a threshold for selecting candidate regulators by cross-correlation method. A number of 2000 genes are randomly chosen from 22,577 genes to compute their correlation, and then these correlations are ranked. A threshold 0.3 is specified to select possible candidate regulators from those based on DNA sequence similarity in JASPAR database [3].
algorithm is denoted as system order detection in step 7 as shown in Fig. 3.1. A dynamic gene regulatory model is employed to systematically describe the gene regulatory network of inflammation. It should be noted that the time delays from the regulators to their target genes, which have been detected by the cross-correlation prediction algorithm via correlation coefficients, are considered in the dynamic gene regulatory model to mimic the delays caused by signal transduction relays in the metabolic and signaling pathways of the real transcriptional regulatory process. Details of the pruning process are presented in the following paragraph and Section 3.7. In this chapter, based on the possible interactions in a candidate GRN of inflammation [27,28,42] in the previous sections, a dynamic gene regulatory model for the transcription of a target gene of interest in systematic inflammation is employed. This model describes how the upstream regulatory genes control their target genes to produce the downstream expression of mRNA through the transcriptional regulatory network. From the candidate GRN obtained from database-predicted information, a dynamic gene regulatory model is constructed for each target gene of systematic inflammation in humans. Then, according to the time-profile microarray data of gene expression, we could identify the number of regulatory connections in the dynamic gene regulatory model of the candidate gene network in the inflammatory system. Based on the order of interaction in the GRN, we prune the candidate GRN of inflammation one target gene at a time via AICs to obtain the real GRN
of inflammation. The pruning procedures to obtain a refined GRN (see Fig. 3.1) are given in the following steps.

Step 6: According to the candidate GRN, the transcriptional regulation of a target gene in inflammatory system is a dynamically regulatory model with the following multi-input/single-output stochastic process.

$$y(t+1) = a\,y(t) + \sum_{i=1}^{L} b_i\,x_i(t-\tau_i) + k + \varepsilon(t) \tag{3.1}$$

where $y(t)$ represents mRNA expression level of a target gene at time $t$, and the parameter $a$ indicates the effect of the present state $y(t)$ on the next state $y(t+1)$; $x_i(t-\tau_i)$, $i = 1, \ldots, L$, denote the regulation functions of $L$ upstream TFs in the candidate gene network; and $b_i$, $i = 1, \ldots, L$, denote their corresponding kinetic coefficients (or regulation abilities). In addition, $\tau_i$ denotes the expression delay from regulatory gene $i$ to the target gene, which was detected via identifying the model by the fact that at the delay $\tau_i$ the regulatory gene $i$ has the highest correlation with the target gene. The value of $\tau_i$ will be iteratively detected from 0 to 2 h (four time points) by a minimum loss function based on AIC in (3.14) at the final pruning step (AIC). It can be ensured that the detected $\tau_i$ has the best model fitting, although it has a large amount of computations. $k$ in Eq. (3.1) represents the basal molecular level to denote the regulation of unknown factors. $\varepsilon(t)$ denotes a random noise due to model uncertainty and measurement noise of the mRNA microarray in the target gene. The transcriptional regulatory functions $x_i(t)$ of TFs are binding on their motif binding sites described by the following sigmoid functions of mRNA expression profiles of their corresponding regulatory genes, respectively [43]:

$$x_i(t) = f_i\big(y_i(t)\big) = \frac{1}{1+\exp\left[-r\left(y_i(t)-m_i\right)\right]} \tag{3.2}$$

where the sigmoid functions in Eq. (3.2) denote the thresholds of bindings of TFs on motif binding sites of target gene for the transcriptional regulation in Eq. (3.1).

Step 7: By combining the maximum likelihood parameter estimation method with the most parsimonious model order detection method using the AIC (see Section 3.7), we could prune the candidate gene network to obtain a more real gene network through the most parsimonious gene transcription regulatory model in Eq. (3.1), that is, the insignificant interactions (or small $b_i$) out of the detected model order could be deleted by AIC. With the upstream regulatory genes as target genes, we can then trace back their upstream regulatory genes by a similar GRN construction procedure. Iteratively, we could construct the whole GRN of systematic inflammation in the innate immune system. The results of selection by AIC pruning method are listed [see Table 3.3, column (C)].
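A minimal sketch of steps 6 and 7 is given below under several assumptions that are not from the chapter: the regulator series are taken to be already shifted by their delays, the parameters are estimated by ordinary least squares (which coincides with maximum likelihood under Gaussian noise), and the AIC is taken in the common form log(residual variance) + 2p/N, since Eq. (3.14) is not reproduced in this section. All function names and the synthetic data are hypothetical.

```python
import random
from itertools import combinations
from math import exp, log

def sigmoid(y, r=1.0, m=0.0):
    """Regulation function in the form of Eq. (3.2) for a TF expression level y."""
    return 1.0 / (1.0 + exp(-r * (y - m)))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def fit(y, xs):
    """Least-squares fit of y(t+1) = a*y(t) + sum_i b_i*x_i(t) + k.

    Returns ([a, b_1..b_L, k], residual variance)."""
    rows = [[y[t]] + [x[t] for x in xs] + [1.0] for t in range(len(y) - 1)]
    out = y[1:]
    p = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Atb = [sum(r[i] * o for r, o in zip(rows, out)) for i in range(p)]
    theta = solve(AtA, Atb)
    resid = [o - sum(c * v for c, v in zip(theta, r)) for r, o in zip(rows, out)]
    return theta, sum(e * e for e in resid) / len(out)

def aic(variance, n_params, n_obs):
    """Assumed AIC form: log residual variance plus a parameter-count penalty."""
    return log(variance) + 2.0 * n_params / n_obs

def prune(y, regulators):
    """Return (lowest AIC, kept regulator names) over all regulator subsets."""
    names = list(regulators)
    best = None
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            _, var = fit(y, [regulators[n] for n in subset])
            score = aic(var, len(subset) + 2, len(y) - 1)  # a, k, and one b per TF
            if best is None or score < best[0]:
                best = (score, subset)
    return best

# Synthetic demo: only TF1 drives the target; TF2 is a false-positive candidate.
rng = random.Random(1)
T = 30
tf1 = [sigmoid(rng.uniform(-3, 3)) for _ in range(T)]
tf2 = [sigmoid(rng.uniform(-3, 3)) for _ in range(T)]
y = [0.2]
for t in range(T - 1):
    y.append(0.5 * y[t] + 2.0 * tf1[t] + 0.1 + rng.gauss(0, 0.01))
score, kept = prune(y, {"TF1": tf1, "TF2": tf2})
```

Exhaustive subset search is used here only because the toy example has two candidates; with a dozen or more candidates per target gene, as in Table 3.3, a stepwise search would be the practical choice.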
3.4 On the construction of inflammatory gene regulatory network in immune system
Based on the 49 target genes (see Table 3.1) and their candidate regulators [see Table 3.3, Column (C)], we construct a candidate GRN of the human inflammatory system.
Then, according to the candidate GRN, we employ the dynamic model to prune it once more and set up a real GRN by a system identification scheme and the parsimonious AIC method via microarray data. At this step, we can construct two refined GRNs, one for the inflammatory/activated condition and one for the normal/resting condition, by the same construction flowchart as shown in Fig. 3.1. The two GRNs are drawn by the Osprey tool [44] (see Figs. 3.3 and 3.4). In Fig. 3.3, there are 94 nodes with 336 edges for the inflammatory/activated gene network, and in Fig. 3.4, there are 66 nodes with 264 edges for the normal/resting gene network. By comparing the inflammatory GRN with the normal GRN, we could obtain the differential/perturbed GRN (see Figs. 3.5 and 3.6). While some interactions can be found in both the normal and the inflammatory (LPS-treated) GRNs, we extracted the unique connections that are present in only one of the two networks. We showed the similarities and the differences in the GRN of an inflammatory system between the normal and inflammatory cells (see Table 3.4). This finding helps us to better understand the systematic effect of inflammatory stimulus on the innate immune system. As mentioned, the perturbed inflammatory GRN in the immune system between normal and LPS-stressed cells is the focus of systems inflammation in this chapter.
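The extraction of condition-specific connections described above amounts to a set difference over edges. The sketch below uses three rows of Table 3.4 (HDAC4, CCR7, and ANXA1) as input; the variable and function names are illustrative assumptions.

```python
# Regulators per target gene, copied from the HDAC4, CCR7, and ANXA1 rows of Table 3.4.
normal = {
    "HDAC4": {"YY1"},
    "CCR7": {"IRF1", "SPIB"},
    "ANXA1": {"YY1", "SPIB"},
}
inflamed = {
    "HDAC4": {"TFAP2A", "FOXL1"},
    "CCR7": {"IRF1", "REL"},
    "ANXA1": {"SPIB"},
}

def edges(grn):
    """Flatten a {target: regulators} mapping into a set of (regulator, target) edges."""
    return {(tf, gene) for gene, tfs in grn.items() for tf in tfs}

only_normal = edges(normal) - edges(inflamed)    # connections removed under LPS
only_inflamed = edges(inflamed) - edges(normal)  # connections added under LPS
shared = edges(normal) & edges(inflamed)         # connections present in both
```

For these three genes, the edge from YY1 to HDAC4 is lost and the edges from TFAP2A and FOXL1 to HDAC4 are gained, which mirrors the kind of rewiring shown in Figs. 3.5 and 3.6.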
FIGURE 3.3 The inflammatory transcriptional gene network in immune system with LPS. The inflammatory gene network with LPS containing [3]. LPS, Lipopolysaccharide.
FIGURE 3.4 The inflammatory transcriptional gene network in immune system without LPS. The inflammatory gene network in normal condition [3]. LPS, Lipopolysaccharide.
We further lay out the perturbed inflammatory GRN to locate the significant differential connections of the key components. We can observe many differences between normal and inflammatory conditions from the perturbed gene network. In Fig. 3.5, 64 nodes with 131 edges of the GRN are found only in normal condition but not in inflammatory condition, and there are four hubs (FOXD1, SPIB, YY1, and TLR4) that appear to be highly connected. In Fig. 3.6, 70 nodes with 159 edges of the GRN are found only in inflammatory condition but not in normal condition, and there are clearly three hubs (FOXL1, TFAP2A, and SOX9) within this perturbed GRN. It is noteworthy that these highly connected hubs have been found in several previous studies. For example, TFAP2A is inactivated [45], and SOX9 is inhibitive [46] in response to inflammation as shown in Fig. 3.6. FOXL1 is dramatically induced during hepatic stellate cell activation [47], and preliminary experimental data indicate that FOXL1 is also involved in the regulation of the adhesion molecule ICAM-1, which is an important mediator of neutrophil recruitment in liver injury. The current investigation is focused on delineating the systematic mechanism by which FOXL1 regulates inflammatory signaling in the liver.
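Hubs such as those named above can be located by counting the connections of each node in a perturbed edge list. The sketch below is an illustration only: the edges are the inflammatory-only connections of HDAC4, SCYE1, and FOS derived from Table 3.4, the degree cutoff of 3 is chosen for this small sample, and the function name is hypothetical.

```python
from collections import Counter

# (regulator, target) edges found only in the inflammatory condition,
# derived from the HDAC4, SCYE1, and FOS rows of Table 3.4.
only_inflamed = [
    ("TFAP2A", "HDAC4"), ("FOXL1", "HDAC4"),
    ("TFAP2A", "SCYE1"), ("FOXL1", "SCYE1"), ("REL", "SCYE1"),
    ("RUNX1", "FOS"), ("TFAP2A", "FOS"), ("ELK1", "FOS"), ("FOXL1", "FOS"),
]

def hubs(edge_list, min_degree):
    """Return the sorted names of nodes with at least `min_degree` connections."""
    degree = Counter()
    for tf, gene in edge_list:
        degree[tf] += 1    # outgoing connection of the regulator
        degree[gene] += 1  # incoming connection of the target
    return sorted(node for node, d in degree.items() if d >= min_degree)

highly_connected = hubs(only_inflamed, min_degree=3)
```

Even in this three-gene sample, FOXL1 and TFAP2A stand out among the regulators, consistent with their role as hubs of the inflammatory-only network.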
FIGURE 3.5 The perturbed transcriptional gene regulatory network. Gene regulatory network only in normal condition but not in inflammatory condition [3].
The connection degree (i.e., the number of connections) of each node of Fig. 3.6 is summarized in Table 3.5, and a list of regulators with connection degree ≥ 8 (see Table 3.6) is compiled to identify perturbed hub proteins that induce differences in the gene regulatory network between inflammatory and normal conditions. These proteins are possible target regulators for drug discovery investigation (such as anti-inflammatory drugs [48–50]). Finally, we summarize the gene connectivity of six regulators (FOXL1, TFAP2A, SOX9, GATA2, AML1, and NR3C1) with a high degree of connectivity in Table 3.6, which is in agreement with previous research findings [45–52]. It has been shown that a robust GRN can form a scale-free network, that is, genes preferentially form links with other genes that already have the highest number of links [53,54]. Scale-free gene networks are more tolerant of the random removal of nodes but are vulnerable to the loss of highly interactive hubs [53,54]. Targeting these highly connected hubs may therefore have a lethal outcome for the system's behavior. In the inflammatory gene network shown in Fig. 3.3, genes such as NF-κB, TNF-α, and RELA can be considered as highly connected hubs of the signaling transduction. If they are inactivated by mutation or disease, the result will be the eventual collapse of the whole inflammation system. In order to overcome this lethal outcome, “weak linkage” architectures have been
FIGURE 3.6 The perturbed transcriptional gene regulatory network. Gene regulatory network only in inflammatory condition but not in normal condition [3].
evolved by natural selection to improve the robustness of GRNs of inflammation. We argue that such versatile mechanisms underlie the essential gene regulatory process of a robust GRN. As a result, some connections can easily be removed and some connections can easily be added in the GRN. This concept is also known as “weak ties” in network theory [54]. “Weak ties” structures in biological networks enable the removal of old processes and the addition of new processes to the existing core processing, which improves information exchange and signal transduction by using common versatile mechanisms of robust GRNs that operate on diverse inputs in response to various stimuli [53]. Consequently, “weak ties” can improve the GRN’s robustness against external stimuli. The connections of the perturbed gene network in Fig. 3.5 are found only in the normal condition. The perturbed GRN in Fig. 3.6 can therefore be considered as additional connections in the inflammatory GRN. To respond to bacterial endotoxin, the connections in Fig. 3.5 have been removed, and the connections in Fig. 3.6 have been added. This result agrees with the concept of the so-called strength of weak ties in network theory, that is, the most important interactions and information exchanges sometimes occur via nodes from otherwise unrelated networks. This implies that nonhubs may play a pivotal role in the GRN [53,54]. Similarly, the “weak ties” architecture in NF-κB GRN
TABLE 3.4 The gene regulatory network in immune system of unactivated and inflammatory cells [3].
Gene name
In unactivated condition
In inflammatory condition
IL17
RUNX1,SOX9,RORA,MEF2A,HLF,TFAP2A,NFIL3,ELK1, FOXD1,FOXL1,FOXI1,IRF1,YY1,REL,NFKB1,SPIB
RUNX1,SOX9,RORA,HLF,TFAP2A,NFIL3,ELK1, FOXD1,GATA2,FOXI1,YY1,REL,RELA,NFKB1,SPIB
IL1A
RUNX1,Pbx,SOX9,MEF2A,HLF,TFAP2A,E2F1,NFIL3,ELK1, FOXD1,FOXL1,FOXI1,YY1,REL
RUNX1,Pbx,HLF,TFAP2A,NFIL3,ELK1,FOXD1, FOXL1,GATA2,FOXI1,YY1,REL,RELA,NFKB1
TNFa
TFAP2A,FOXD1,GATA2,FOXI1,MAX,NFKB1
TFAP2A,GATA2,FOXI1,MAX,YY1,NFKB1,SPIB
IL6
RUNX1,MEF2A,HLF,FOXL1,GATA2,FOXI1,YY1,REL,RELA, SPIB
RUNX1,HLF,TFAP2A,FOXL1,GATA2,FOXI1,YY1, REL,NFKB1
IL1B
SOX9,MEF2A,TFAP2A,ELK1,FOXD1,FOXL1,GATA2,FOXI1, YY1,REL,NFKB1,SPIB
SOX9,MEF2A,ELK1,GATA2,FOXI1,RELA,NFKB1
TLR4
RUNX1,Pbx,SOX9,RORA,MEF2A,HLF,TFAP2A,NFIL3,ELK1, FOXF2,FOXD1,FOXL1,GATA2,FOXI1,IRF1,MAX,YY1,REL, RELA,SPIB
RUNX1,MEF2A,FOXI1,IRF1,MAX,REL,RELA,SPIB
NFATC3
NFIL3,ELK1,FOXD1,FOXI1,MAX,YY1,REL
SOX9,HLF,NFIL3,ELK1,FOXD1,GATA2,FOXI1,MAX
SCYE1
RUNX1,FOXI1,YY1,SPIB
RUNX1,TFAP2A,FOXL1,REL,SPIB
TICAM2
RORA,GATA2,YY1,REL,RELA
SOX9,TFAP2A,ELK1,FOXL1,RELA,SPIB
HDAC4
YY1
TFAP2A,FOXL1
HDAC5
NFIL3,FOXI1,YY1,REL,SPIB
RUNX1,FOXD1,FOXL1,MAX,YY1,REL,SPIB
HDAC7A
RUNX1,ELK1,FOXD1,GATA2,YY1,NFKB1,SPIB
TFAP2A,E2F1,FOXD1,FOXL1,GATA2,FOXI1,YY1,NFKB1,SPIB
HDAC9
NFIL3,SPIB
NFIL3,IRF1,REL
ITGB2
RUNX1,ELK1,FOXD1,GATA2,FOXI1,IRF1,MAX
RUNX1,TFAP2A,ELK1,FOXD1,FOXL1,GATA2, FOXI1,IRF1,MAX,YY1,SPIB
CXCL2
RUNX1,RORA,E2F1,NFIL3,FOXF2,GATA2,FOXI1,IRF
SOX9,RORA,E2F1,NFIL3,FOXD1,FOXL1,GATA2
ALOX5
RORA,FOXD1,YY1,SPIB
RUNX1,MEF2A,TFAP2A,SPIB
NFKBIA
MEF2A,FOXI1
RUNX1,MEF2A,FOXI1,IRF1,SPIB
NR3C1
RORA,FOXD1,FOXI1,YY1
RUNX1,Pbx,SOX9,MEF2A,TFAP2A,FOXD1,FOXL1, GATA2,FOXI1,YY1,SPIB
CEBPD
TFAP2A,FOXF2,FOXD1,FOXL1,GATA2,FOXI1,YY1,REL
NFIL3,FOXF2,FOXI1,SPIB
ANXA1
YY1,SPIB
SPIB
CYBB
FOXI1,IRF1,YY1,REL,SPIB
FOXL1,SPIB
AOAH
RORA,TFAP2A,FOXF2,FOXD1,FOXL1,YY1,REL,RELA,SPIB
RUNX1,FOXL1,FOXI1,RELA
REG3A
RORA,FOXI1,YY1,SPIB
SOX9,RORA,ELK1,FOXL1,GATA2,SPIB
FOS
SOX9,E2F1,GATA2,FOXI1,YY1,SPIB
RUNX1,TFAP2A,ELK1,FOXL1
IRAK
RUNX1,SOX9,RORA,E2F1,FOXD1,GATA2,MAX,YY1,SPIB
E2F1,SPIB
PLAA
RUNX1,MEF2A,FOXF2,FOXD1,GATA2,FOXI1,IRF1,MAX, YY1,SPIB
RUNX1,SOX9,TFAP2A,FOXF2,FOXD1,FOXL1, GATA2,MAX,SPIB
CCR7
IRF1,SPIB
IRF1,REL
CXCL14
GATA2,FOXI1,IRF1,YY1,REL,RELA,SPIB
RUNX1,SOX9,HLF,TFAP2A,E2F1,FOXD1,FOXL1, GATA2,YY1,REL,RELA,SPIB
PLA2G4B
E2F1,FOXI1,YY1,REL,SPIB
MEF2A,NFIL3,FOXI1,YY1,SPIB
NFRKB
RORA,FOXD1,FOXI1,SPIB
RUNX1,SOX9,RORA,E2F1,FOXD1,REL,SPIB,GATA2, FOXI1,MAX,YY1
MAPK10
E2F1,IRF1,REL,SPIB
RUNX1,SOX9,TFAP2A,E2F1,ELK1,FOXL1,GATA2, FOXI1,REL,SPIB
ADORA2A
RUNX1,RORA,FOXD1,GATA2,IRF1,YY1,REL,RELA,SPIB
RUNX1,RORA,TFAP2A,E2F1, REL,RELA,SPIB, FOXL1,GATA2,IRF1,YY1
SCCE
RUNX1,FOXD1,SPIB
RUNX1,SOX9,RORA,HLF,ELK1,FOXD1,FOXL1, GATA2,YY1,SPIB
ADORA3
RUNX1,GATA2,YY1
Pbx,GATA2
NFKB1
RUNX1,ELK1,REL,SPIB
RUNX1,SOX9,ELK1,GATA2,FOXI1,YY1,SPIB
CCL18
FOXD1,GATA2,YY1,SPIB
Pbx,SOX9,TFAP2A,NFIL3,FOXL1,GATA2,YY1
AMBP
FOXF2,FOXD1,FOXI1,MAX,YY1,REL,SPIB
TFAP2A,FOXF2,FOXD1,FOXL1,GATA2,FOXI1,MAX, YY1,REL,SPIB
TACR1
RUNX1,E2F1,FOXD1,YY1
RUNX1,SOX9,RORA,TFAP2A,E2F1,FOXD1,FOXL1, MAX,YY1,SPIB
KNG
RUNX1,FOXI1,YY1,REL,SPIB
RUNX1,SOX9,TFAP2A,FOXL1,GATA2,FOXI1,YY1, REL
BLNK
NFIL3
MEF2A,FOXI1
ABCF1
RUNX1,E2F1,FOXD1,YY1,RELA,NFKB1,SPIB
RUNX1,E2F1,ELK1,GATA2,FOXI1,MAX,YY1,REL, RELA,NFKB1
HPSE
E2F1,ELK1,FOXD1,FOXI1,YY1,REL,SPIB
REL
TLR7
RUNX1,MEF2A,FOXD1,YY1,REL,SPIB
RUNX1,MEF2A,IRF1,REL,NFKB1,SPIB
IL22
RUNX1,SOX9,E2F1,FOXD1,GATA2,FOXI1,REL,NFKB1,SPIB
RUNX1,SOX9,RORA,E2F1,MAX,YY1,REL,NFKB1, SPIB,ELK1,FOXD1,FOXL1,GATA2,FOXI1
GPR132
E2F1,GATA2,IRF1,NFKB1,SPIB
SOX9,RORA,MEF2A,TFAP2A,E2F1,FOXL1,GATA2, FOXI1,IRF1,YY1,SPIB
IL1R
RUNX1,SOX9,HLF,TFAP2A,NFIL3,ELK1,FOXD1,GATA2, YY1,REL,RELA,SPIB
TFAP2A,NFIL3,FOXL1,RELA
TOLLIP
ELK1,FOXD1,GATA2,FOXI1,MAX,YY1,RELA,NFKB1,SPIB
RUNX1,SOX9,RORA,HLF,TFAP2A,NFIL3,ELK1, YY1,REL,RELA,NFKB1,FOXD1,FOXL1,GATA2, FOXI1,MAX
IL8
Pbx,SOX9,RORA,MEF2A,HLF,TFAP2A,E2F1,NFIL3,ELK1, FOXF2,FOXD1,FOXL1,GATA2,FOXI1,MAX,YY1,REL,SPIB
RUNX1,SOX9,RORA,MEF2A,HLF,E2F1,NFIL3,ELK1, FOXF2,FOXD1,GATA2,FOXI1,MAX,YY1,REL,RELA
TNFR
Pbx,SOX9,TFAP2A,FOXL1,GATA2,FOXI1
FOXI1
in inflammatory condition is shown in the removal and addition of connections of GRN in Figs. 3.7 and 3.8. In summary, the regulators of target genes are first selected by JASPAR, then truncated by the threshold of cross correlation, and finally pruned by AIC via microarray data and a dynamic gene regulatory model. We combine several algorithms and tools to improve the performance of the GRN construction of the target inflammatory system. All the data sources are independently produced by various research groups, and the results are verified with more independent studies published previously. It is clear that the top-down procedures can predict the target genes and their regulatory TFs well. More biological insight into the perturbed inflammatory network is given in Section 3.9 and details of the proposed GRN construction algorithm are shown in Section 3.7.
3.5 Biological insight and discussion
The NF-κB pathway, which is an important modular inflammatory system, is illustrated as a trimmed-down GRN in Figs. 3.7 and 3.8. This concise network includes important proinflammatory cytokine genes (IL1A, IL1B, IL1R, IL6, TNFA, IL17, and IL8) and the receptor genes (IL1R, TLR4, and TNFR), all of which have well-known roles in the NF-κB signaling pathway. This concise network can help us to monitor the performance of inflammatory responses under diverse conditions. By the method proposed in this chapter, we can predict the dynamic regulatory profiles of those cytokines. As expected, the results are comparable to the findings published in previous studies [25,42] discussed in the following paragraphs. The in silico findings confirm the wet-bench observation that many characterized genes in the common inflammation response are regulated by the TF NF-κB [23]. On the other hand, the perturbed GRN of these proinflammatory genes in the NF-κB signaling pathway is shown in Figs. 3.9 and 3.10. The perturbation in Fig. 3.9 is more complicated than the perturbation shown in Fig. 3.10 because, in normal condition, these genes have to fulfill biochemical tasks other than inflammation. Our analysis also reveals that the important genes (IL1A, IL1B, IL1R, IL6, TNFA, IL17, IL8, IL1R, TLR4, and TNFR) detected by our algorithms are vital for the inflammatory response because they are more connected during inflammation than in normal conditions. In inflamed conditions, they appear to work in accordance with each other to enhance their effects on the inflammatory responses. For example, there is strong evidence that NF-κB1 and RELA have to regulate the proinflammatory genes collectively during inflammatory responses [24].
In recent studies [25,42], cytokine and chemokine networks have been shown to play a pivotal role in inflammation because they are involved either directly or indirectly in the innate and adaptive immune responses. It has been shown that interleukin 1 alpha (IL1A) and interleukin 1 beta (IL1B) act via their receptor (IL1R) to induce gene expression, which in turn mediates feedback protein synthesis involved in the later wave of inflammatory responses [33,42]. This is in agreement with the dynamic expression profiles of the proinflammatory genes and their receptors (IL1A, IL1B, IL1R, IL6, TNFA, IL17, TLR4, TNFR, and IL8), which are simulated by our dynamic regulatory model (see Fig. 3.11). The accuracy of the curve fitting demonstrates the predictive power of the proposed method. Without a doubt the performance
I. Systems Immunology
TABLE 3.5 Gene connectivities only in inflammatory condition but not in normal condition (P-value ≤ 0.05) [3].
Screen name
Orf name
Description
goFunction
goProcess
Connectivities
FOXL1
HGNC:3817
Forkhead box L1
Sequence-specific DNA binding; DNA bending activity; transcription factor activity
Regulation of transcription, DNA dependent; multicellular organismal development; transcription
23
TFAP2A
HGNC:11742
Transcription factor AP-2 alpha (activating enhancer-binding protein 2 alpha)
Protein dimerization activity; protein binding; transcription coactivator activity; RNA polymerase II transcription factor activity, enhancer binding; transcription factor activity
Regulation of transcription from RNA polymerase II promoter; signal transduction; regulation of transcription, DNA dependent; ectoderm development; transcription
19
SOX9
HGNC:11204
SRY (sex-determining region Y)-box 9 (campomelic dysplasia, autosomal sex-reversal)
DNA binding; specific RNA polymerase II transcription factor activity; protein binding; transcriptional activator activity
Regulation of apoptosis; positive regulation of transcription from RNA polymerase II promoter; regulation of transcription from RNA polymerase II promoter; male germ line sex determination; regulation of transcription, DNA dependent; transcription; negative regulation of transcription, DNA dependent; hair follicle development; epithelial to mesenchymal transition; regulation of cell proliferation; male gonad development; heart development; neural crest cell development; cell fate specification; cartilage condensation; skeletal development
16
GATA2
HGNC:4171
GATA-binding protein 2
Zinc ion binding; metal ion binding; sequence-specific DNA binding; transcription factor activity
Neuron differentiation; phagocytosis; regulation of transcription, DNA dependent; transcription; pituitary gland development; positive regulation of phagocytosis; transcription from RNA polymerase II promoter; cell maturation
12
AML1
HGNC:10471
Runt-related transcription factor 1 (acute myeloid leukemia 1; aml1 oncogene)
Molecular_function; ATP binding; molecular function unknown; transcription factor activity; protein binding; chloride ion binding; transcriptional activator activity
Positive regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; transcription; behavioral response to pain; hemopoiesis; multicellular organismal development; biological_process; neuron development; positive regulation of granulocyte differentiation; positive regulation of angiogenesis; biological process unknown; skeletal development
11
NR3C1
HGNC:7978
Nuclear receptor subfamily 3, group C, member 1 (glucocorticoid receptor)
DNA binding; steroid hormone receptor activity; ligand-dependent nuclear receptor activity; sequence-specific DNA binding; transcription factor activity; receptor activity; zinc ion binding; glucocorticoid receptor activity; protein binding; metal ion binding; lipid binding; steroid binding
Regulation of transcription, DNA dependent; transcription; transcription from RNA polymerase II promoter; inflammatory response; signal transduction; sex determination
8
YY1
HGNC:12856
YY1 transcription factor
Zinc ion binding; metal ion binding; protein binding; transcription corepressor activity; transcription coactivator activity; transcription factor activity
Regulation of transcription from RNA polymerase II promoter; eye morphogenesis (sensu Mammalia); transcription; antimicrobial humoral response (sensu Vertebrata); anterior/posterior pattern formation
7
SCCE
HGNC:6368
Kallikrein-related peptidase 7
Trypsin activity; serine-type endopeptidase activity; chymotrypsin activity
Epidermis development; proteolysis
7
GPR132
HGNC:17482
G protein-coupled receptor 132
Rhodopsin-like receptor activity; receptor activity
Signal transduction; G protein-coupled receptor protein signaling pathway; G1/S transition of mitotic cell cycle
7
CXCL14
HGNC:10640
Chemokine (CXC motif) ligand 14
Chemokine activity
Signal transduction; chemotaxis; cell-cell signaling; immune response; inflammatory response
7
TOLLIP
HGNC:16476
Toll-interacting protein
Toll binding; signal transducer activity; protein binding
Leukocyte activation; intracellular signaling cascade; cell-cell signaling; immune response; phosphorylation; inflammatory response
7
NFKB1
HGNC:7794
Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (p105)
Specific transcriptional repressor activity; protein binding; transcription factor activity
Signal transduction; regulation of transcription, DNA dependent; negative regulation of interleukin 12 biosynthetic process; regulation of transcription; response to pathogenic bacteria; inflammatory response; negative regulation of transcription, DNA dependent; positive regulation of transcription; antibacterial humoral response (sensu Vertebrata); transcription from RNA polymerase II promoter; antiapoptosis; apoptosis
7
SPIB
HGNC:11242
Spi-B transcription factor (Spi-1/PU.1 related)
Sequence-specific DNA binding; RNA polymerase II transcription factor activity; transcription factor activity
Regulation of transcription from RNA polymerase II promoter; macrophage differentiation; transcription
7
NFRKB
HGNC:7802
Nuclear factor related to κB-binding protein
Specific RNA polymerase II transcription factor activity; DNA binding
Inflammatory response; transcription from RNA polymerase II promoter
7
MAPK10
HGNC:6872
Mitogen-activated protein kinase 10
ATP binding; protein-tyrosine kinase activity; MAP kinase kinase activity; MAP kinase activity; protein binding; JUN kinase activity; transferase activity; nucleotide binding; protein serine/threonine kinase activity; protein kinase activity
Signal transduction; JNK cascade; protein amino acid phosphorylation
7
FOXI1
HGNC:3815
Forkhead box I1
Transcriptional activator activity; sequence-specific DNA binding; DNA bending activity; transcription factor activity
Multicellular organismal development; regulation of transcription, DNA dependent; sensory perception of sound; inner ear morphogenesis; transcription; positive regulation of transcription, DNA dependent
7
ELK-1
ELK-1
user_defined_node
None
None
7
MAX
HGNC:6913
MYC-associated factor X
Protein binding; transcription coactivator activity; transcription regulator activity; transcription factor activity; DNA binding
Regulation of transcription, DNA dependent; regulation of transcription; transcription; transcription from RNA polymerase II promoter
6
TACR1
HGNC:11526
Tachykinin receptor 1
Receptor activity; tachykinin receptor activity; rhodopsin-like receptor activity
Tachykinin signaling pathway; mechanosensory behavior; signal transduction; G-protein-coupled receptor protein signaling pathway; inflammatory response; response to pain; G-protein signaling, coupled to IP3 second messenger (phospholipase C activating); detection of abiotic stimulus
6
C-REL
HGNC:9954
V-rel reticuloendotheliosis viral oncogene homolog (avian)
Signal transducer activity; protein binding; transcription factor activity
Positive regulation of IκB kinase/NF-κB cascade; regulation of transcription, DNA dependent; positive regulation of interleukin 12 biosynthetic process; positive regulation of transcription, DNA dependent; transcription from RNA polymerase II promoter; cytokine production
6
TICAM2
HGNC:21354
Toll-like receptor adaptor molecule 2
Protein binding; signal transducer activity; transmembrane receptor activity; protein carrier activity
Positive regulation of IκB kinase/NF-κB cascade; intracellular protein transport; inflammatory response; transport
5
CCL18
HGNC:10616
Chemokine (CC motif) ligand 18 (pulmonary and activation regulated)
Cytokine activity; chemokine activity
Signal transduction; chemotaxis; cell-cell signaling; immune response; sensory perception; inflammatory response; antimicrobial humoral response (sensu Vertebrata); response to biotic stimulus
5
ABCF1
HGNC:70
ATP-binding cassette, subfamily F (GCN20), member 1
ATP binding; ATPase activity; nucleoside-triphosphatase activity; ATPase activity, coupled to transmembrane movement of substances; nucleotide binding; iron ion binding; electron transporter activity; iron-sulfur cluster binding; unfolded protein binding; translation factor activity, nucleic acid binding; metal ion binding; electron carrier activity; heat shock protein binding
Translation; inflammatory response; electron transport; transport; protein folding
5
IL22
HGNC:14900
Interleukin 22
Interleukin 22 receptor binding; cytokine activity
Immune response; inflammatory response; acute-phase response; cell-cell signaling
5
RORA
HGNC:10258
RAR-related orphan receptor A
Zinc ion binding; metal ion binding; protein binding; steroid hormone receptor activity; sequence-specific DNA binding; transcription factor activity
Signal transduction; regulation of transcription, DNA dependent; transcription; cGMP metabolic process; regulation of macrophage activation; nitric oxide biosynthetic process
5
MEF2A
HGNC:6993
MADS box transcription enhancer factor 2, polypeptide A (myocyte enhancer factor 2A)
Protein binding; transcription coactivator activity; sequence-specific DNA binding; transcription factor activity; DNA binding
Regulation of transcription, DNA dependent; transcription; muscle development; positive regulation of transcription; transcription from RNA polymerase II promoter
5
KNG
HGNC:6383
Kininogen 1
Zinc ion binding; protein binding; cysteine protease inhibitor activity; heparin binding; receptor binding
Smooth muscle contraction; natriuresis; diuresis; vasodilation; negative regulation of cell adhesion; positive regulation of apoptosis; negative regulation of blood coagulation; inflammatory response; blood pressure regulation; blood coagulation
4
FOS
HGNC:3796
V-fos FBJ murine osteosarcoma viral oncogene homolog
Protein dimerization activity; sequence-specific DNA binding; specific RNA polymerase II transcription factor activity; transcription factor activity; DNA binding
Regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; nervous system development; inflammatory response; DNA methylation
4
REG3A
HGNC:8601
Regenerating islet-derived 3 alpha
Sugar binding
Multicellular organismal development; heterophilic cell adhesion; cell proliferation; inflammatory response; acute-phase response
4
HDAC7A
HGNC:14067
Histone deacetylase 7A
Specific transcriptional repressor activity; transcription factor binding; transcription corepressor activity; histone deacetylase activity; hydrolase activity
Negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; nervous system development; transcription; negative regulation of striated muscle development; inflammatory response; regulation of progression through cell cycle; chromatin modification; B-cell differentiation
4
RELA
HGNC:9955
V-rel reticuloendotheliosis viral oncogene homolog A, nuclear factor of kappa light polypeptide gene enhancer in B-cells 3, p65 (avian)
Phosphate binding; RNA polymerase II transcription factor activity, enhancer binding; protein kinase activity; transcription factor activity; NF-κB binding; signal transducer activity; protein kinase binding; protein binding; identical protein binding; protein N-terminus binding
Defense response to virus; antiapoptosis; liver development; regulation of transcription, DNA dependent; negative regulation of protein catabolic process; response to organic substance; regulation of transcription; positive regulation of transcription, DNA dependent; cytokine- and chemokine-mediated signaling pathway; hair follicle development; positive regulation of interleukin 12 biosynthetic process; transcription from RNA polymerase II promoter; inflammatory response; response to toxin; positive regulation of IκB kinase/NF-κB cascade; cellular defense response; activation of NF-κB transcription factor; response to UV-B
4
E2F1
HGNC:3113
E2F transcription factor 1
Protein binding; transcription corepressor activity; transcription factor activity
Negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; G1 phase of mitotic cell cycle; transcription; cell cycle; cell proliferation; regulation of progression through cell cycle; apoptosis
4
NFIL3
HGNC:7787
Nuclear factor, interleukin 3 regulated
Protein dimerization activity; transcription corepressor activity; sequence-specific DNA binding; transcription factor activity; DNA binding
Regulation of transcription, DNA dependent; immune response; transcription from RNA polymerase II promoter
4
HDAC5
HGNC:14068
Histone deacetylase 5
Catalytic activity; transcription corepressor activity; hydrolase activity; histone deacetylase activity; transcription factor binding; specific transcriptional repressor activity
Negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; transcription; chromatin remodeling; inflammatory response; negative regulation of striated muscle development; regulation of progression through cell cycle; heart development; B-cell differentiation; chromatin modification; chromatin silencing
4
ITGB2
HGNC:6155
Integrin, beta 2 (complement component 3 receptor 3 and 4 subunit)
Receptor activity; protein kinase binding; protein binding
Cell-cell signaling; apoptosis; leukocyte adhesion; cell adhesion; neutrophil chemotaxis; antimicrobial humoral response (sensu Vertebrata); multicellular organismal development; inflammatory response; regulation of peptidyl-tyrosine phosphorylation; integrin-mediated signaling pathway; cell-matrix adhesion; regulation of cell shape
4
HLF1
HLF1
user_defined_node
NONE
NONE
4
CXCL2
HGNC:4603
Chemokine (CXC motif) ligand 2
Cytokine activity; chemokine activity
Chemotaxis; G protein coupled receptor protein signaling pathway; immune response; sensory perception; inflammatory response; response to stimulus
3
IL1A
HGNC:5991
Interleukin 1, alpha
Interleukin 1 receptor binding; signal transducer activity; protein binding
Fever; chemotaxis; cell-cell signaling; negative regulation of cell proliferation; immune response; cell proliferation; inflammatory response; regulation of progression through cell cycle; antiapoptosis; apoptosis
3
IRF1
HGNC:6116
Interferon regulatory factor 1
Transcription factor activity
Regulation of transcription, DNA dependent; positive regulation of interleukin 12 biosynthetic process; transcription; cell cycle; immune response; positive regulation of transcription, DNA dependent; negative regulation of progression through cell cycle; transcription from RNA polymerase II promoter
3
PBX1
HGNC:8632
PreB-cell leukemia transcription factor 1
DNA binding; sequence-specific DNA binding; transcription factor activity; protein heterodimerization activity; protein binding
Sex differentiation; ureteric bud branching; urogenital system development; positive regulation of transcription from RNA polymerase II promoter; regulation of transcriptional preinitiation complex formation; regulation of transcription, DNA dependent; embryonic development; organ morphogenesis; hindbrain development; transcription from RNA polymerase II promoter; cell differentiation; adrenal gland development; C21-steroid hormone biosynthetic process; spleen development; positive regulation of cell proliferation
3
NFATC3
HGNC:7777
Nuclear factor of activated T cells, cytoplasmic, calcineurin-dependent 3
Protein binding; transcription coactivator activity; transcription factor activity
Heart development; regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; inflammatory response; cellular respiration
3
NFKBIA
HGNC:7797
Nuclear factor of kappa light polypeptide gene enhancer in B-cell inhibitors, alpha
Transcription factor binding; NF-κB binding; nuclear localization sequence binding; ubiquitin protein ligase binding
Protein import into nucleus, translocation; regulation of NF-κB import into nucleus; negative regulation of Notch signaling pathway; negative regulation of myeloid cell differentiation; negative regulation of DNA binding; regulation of cell proliferation; response to pathogenic bacteria; cytoplasmic sequestering of NF-κB; apoptosis
3
SCYE1
HGNC:10648
Small inducible cytokine subfamily E, member 1 (endothelial monocyte-activating)
Cytokine activity; tRNA binding; nucleic acid binding
Signal transduction; chemotaxis; cell-cell signaling; inflammatory response; tRNA aminoacylation for protein translation; translation
3
PLAA
HGNC:9043
Phospholipase A2-activating protein
Protein binding; phospholipase A2 activator activity; binding
Signal transduction; phospholipid metabolic process; inflammatory response
3
ADORA2A
HGNC:263
Adenosine A2a receptor
Gastric inhibitory peptide receptor activity; adenosine receptor activity, G-protein coupled; receptor activity; signal transducer activity; rhodopsin-like receptor activity; G protein coupled receptor activity; A3 adenosine receptor activity, G protein-coupled; A2A adenosine receptor activity, G-protein coupled
Cell-cell signaling; synaptic transmission, dopaminergic; apoptosis; G protein signaling, coupled to cAMP nucleotide second messenger; G protein coupled receptor protein signaling pathway; neurotransmitter transport; sensory perception; circulation; adenosine receptor signaling pathway; central nervous system development; inflammatory response; signal transduction; adenylate cyclase activation; phagocytosis; blood coagulation; cellular defense response; cAMP biosynthetic process; eating behavior; locomotory behavior
3
FOXD1
HGNC:3802
Forkhead box D1
Sequence-specific DNA binding; DNA bending activity; transcription factor activity
Regulation of transcription, DNA dependent; transcription
3
ALOX5
RP1167C2.3
Arachidonate 5-lipoxygenase
Lipoxygenase activity; arachidonate 5-lipoxygenase activity; protein binding; calcium ion binding; iron ion binding; oxidoreductase activity
Electron transport; inflammatory response; leukotriene metabolic process; leukotriene biosynthetic process
3
AMBP
HGNC:453
Alpha-1-microglobulin/bikunin precursor
IgA binding; trypsin inhibitor activity; transporter activity; plasmin inhibitor activity; calcium channel inhibitor activity; heme binding; protein homodimerization activity; calcium oxalate binding; serine-type endopeptidase inhibitor activity
Negative regulation of JNK cascade; cell adhesion; protein-chromophore linkage; antiinflammatory response; transport; pregnancy; negative regulation of immune response; heme catabolic process
3
TNFA
DASS280D8.2
Tumor necrosis factor (TNF superfamily, member 2)
Protein binding; cytokine activity; tumor necrosis factor receptor binding
Leukocyte adhesion; antiapoptosis; apoptosis; organ morphogenesis; negative regulation of glucose import; positive regulation of IκB kinase/NF-κB cascade; regulation of immunoglobulin secretion; multicellular organismal development; regulation of protein amino acid phosphorylation; glucose metabolic process; regulation of transcription, DNA dependent; positive regulation of transcription from RNA polymerase II promoter; defense response to bacterium; positive regulation of transcription; negative regulation of transcription from RNA polymerase II promoter; cell-cell signaling; response to virus; cell death; response to wounding; humoral immune response; immune response; signal transduction; inflammatory response; regulation of cell proliferation; positive regulation of translational initiation by iron; regulation of osteoclast differentiation; induction of apoptosis via death domain receptors; protein import into nucleus, translocation; positive regulation of JNK cascade; activation of NF-κB transcription factor; cellular ex
3
HDAC4
HGNC:14063
Histone deacetylase 4
DNA binding; hydrolase activity; histone deacetylase activity; transcription factor binding; transcriptional repressor activity
Negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; cell cycle; transcription; multicellular organismal development; inflammatory response; negative regulation of striated muscle development; nervous system development; B-cell differentiation; chromatin modification; skeletal development; negative regulation of cell proliferation
2
BLNK
HGNC:14211
B-cell linker
Protein binding; transmembrane receptor protein tyrosine kinase adaptor protein activity; SH3/SH2 adaptor activity
Intracellular signaling cascade; humoral immune response; inflammatory response; hemocyte development (sensu Arthropoda); B-cell activation; B-cell differentiation
2
TLR7
HGNC:15631
Toll-like receptor 7
siRNA binding; protein binding; transmembrane receptor activity; single-stranded RNA binding; double-stranded RNA binding
Defense response to virus; positive regulation of interferon-gamma biosynthetic process; immune response; positive regulation of interleukin 8 biosynthetic process; inflammatory response; positive regulation of interferon-beta biosynthetic process; positive regulation of interferon-alpha biosynthetic process
2
AOAH
HGNC:548
Acyloxyacyl hydrolase (neutrophil)
Catalytic activity; acyloxyacyl hydrolase activity; lipoprotein lipase activity; hydrolase activity, acting on ester bonds; hydrolase activity
Lipopolysaccharide metabolic process; inflammatory response; lipid metabolic process; negative regulation of inflammatory response
2
IL6
HGNC:6018
Interleukin 6 (interferon, beta 2)
Cytokine activity; protein binding; interleukin 6 receptor binding
Neuron differentiation; cell-surface receptor linked signal transduction; cell-cell signaling; humoral immune response; negative regulation of apoptosis; negative regulation of chemokine biosynthetic process; negative regulation of cell proliferation; positive regulation of cell proliferation; neutrophil apoptosis; immune response; acute-phase response; B-cell differentiation
2
IL8
HGNC:6025
Interleukin 8
Chemokine activity; protein binding; cytokine activity; interleukin 8 receptor binding
Cell-cell signaling; response to stimulus; neutrophil activation; calcium-mediated signaling; G protein-coupled receptor protein signaling pathway; neutrophil chemotaxis; sensory perception; cell motility; immune response; intracellular signaling cascade; inflammatory response; angiogenesis; regulation of cell adhesion; induction of positive chemotaxis; cell cycle arrest; chemotaxis; regulation of retroviral genome replication; negative regulation of cell proliferation
2
CEBPD
HGNC:1835
CCAAT/enhancer-binding protein (C/EBP), delta
Sequence-specific DNA binding; transcription factor activity; protein dimerization activity; DNA binding
Regulation of transcription, DNA dependent; transcription from RNA polymerase II promoter; transcription
2
PLA2G4B
HGNC:9036
Phospholipase A2, group IVB (cytosolic)
Calcium-dependent phospholipase A2 activity; calcium-dependent phospholipid binding; calcium ion binding; lysophospholipase activity; phospholipase activity; hydrolase activity
Phospholipid catabolic process; glycerophospholipid catabolic process; calcium-mediated signaling; inflammatory response; arachidonic acid metabolic process; lipid catabolic process; parturition
2
IL17
HGNC:5981
Interleukin 17A
Cytokine activity
Cell-cell signaling; protein amino acid glycosylation; immune response; inflammatory response; cell death; apoptosis
2
HDAC9
HGNC:14065
Histone deacetylase 9
Hydrolase activity; transcription corepressor activity; histone deacetylase activity; transcription factor binding; specific transcriptional repressor activity
Negative regulation of transcription from RNA polymerase II promoter; regulation of transcription, DNA dependent; histone deacetylation; transcription; inflammatory response; negative regulation of striated muscle development; regulation of progression through cell cycle; heart development; B-cell differentiation; chromatin modification
2
IL1B
HGNC:5992
Interleukin 1, beta
Signal transducer activity; Interleukin 1 receptor binding; protein binding; growth factor activity; Interleukin 1 receptor antagonist activity
Cell-cell signaling; apoptosis; neutrophil chemotaxis; antimicrobial humoral response (sensu Vertebrata); positive regulation of interleukin 6 biosynthetic process; immune response; inflammatory response; fever; signal transduction; positive regulation of chemokine biosynthetic process; regulation of progression through cell cycle; leukocyte migration; negative regulation of cell proliferation; cell proliferation
1
IL1R
HGNC:5993
Interleukin 1 receptor, type I
Receptor activity; protein binding; interleukin 1, Type I, activating receptor activity; interleukin 1 receptor activity; transmembrane receptor activity
Cell-surface receptor linked signal transduction; cytokine- and chemokine-mediated signaling pathway; immune response; inflammatory response
1
CYBB
HGNC:2578
Cytochrome b-245, beta polypeptide (chronic granulomatous disease)
Metal ion binding; electron transporter activity; iron ion binding; voltage-gated ion channel activity; FAD binding; oxidoreductase activity
Electron transport; inflammatory response; antimicrobial humoral response (sensu Vertebrata); ion transport
1
CCR7
HGNC:1608
Chemokine (CC motif) receptor 7
Receptor activity; CC chemokine receptor activity; rhodopsin-like receptor activity
Signal transduction; chemotaxis; G protein-coupled receptor protein signaling pathway; elevation of cytosolic calcium ion concentration; inflammatory response; antimicrobial humoral response (sensu Vertebrata)
1
ADORA3
HGNC:268
Adenosine A3 receptor
Receptor activity; A3 adenosine receptor activity, G-protein coupled; rhodopsin-like receptor activity
Signal transduction; adenylate cyclase activation; regulation of heart contraction; G protein-coupled receptor protein signaling pathway; inflammatory response
1
TNFR1
HGNC:11916
Tumor necrosis factor receptor superfamily, member 1A
Receptor activity; protein binding; enzyme binding; tumor necrosis factor receptor activity
Cytokine- and chemokine-mediated signaling pathway; signal transduction; positive regulation of IκB kinase/NF-κB cascade; inflammatory response; prostaglandin metabolic process; positive regulation of transcription from RNA polymerase II promoter; positive regulation of inflammatory response; apoptosis
0
ANXA1
HGNC:533
Annexin A1
Protein binding, bridging; structural molecule activity; receptor binding; calcium ion binding; phospholipase inhibitor activity; calcium-dependent phospholipid binding; phospholipase A2 inhibitor activity
Antiapoptosis; peptide cross-linking; cell cycle; cell motility; regulation of cell proliferation; inflammatory response; cell-surface receptor linked signal transduction; arachidonic acid secretion; lipid metabolic process; keratinocyte differentiation
0
IRAK
HGNC:6112
Interleukin 1 receptor-associated kinase 1
ATP binding; protein serine/threonine kinase activity; protein kinase activity; nucleotide binding; receptor activity; transferase activity; NF-κB-inducing kinase activity; interleukin 1 receptor binding; magnesium ion binding; protein kinase binding; protein homodimerization activity; protein binding; kinase activity; transcriptional activator activity
Protein amino acid autophosphorylation; positive regulation of transcription; cytokine- and chemokine-mediated signaling pathway; protein oligomerization; defense response; signal transduction; transmembrane receptor protein serine/threonine kinase signaling pathway; activation of NF-κB-inducing kinase; protein amino acid phosphorylation;
0
HPSE
HGNC:5164
Heparanase
Hydrolase activity; beta-glucuronidase activity; calcium ion binding; magnesium ion binding
Inflammatory response; proteoglycan metabolic process
0
FOXF2
HGNC:3810
Forkhead box F2
Transcription coactivator activity; sequence-specific DNA binding; RNA polymerase II transcription factor activity; transcription factor activity
Establishment of epithelial cell polarity; embryonic gut development; regulation of transcription, DNA dependent; extracellular matrix organization and biogenesis; organ morphogenesis; lung development; transcription; vasculogenesis; positive regulation of transcription, DNA dependent; transcription from RNA polymerase II promoter
0
TLR4
HGNC:11850
Toll-like receptor 4
Receptor activity; transmembrane receptor activity; protein binding; lipopolysaccharide binding
Detection of pathogenic bacteria; macrophage activation; detection of fungus; positive regulation of interleukin 13 biosynthetic process; positive regulation of interleukin 6 biosynthetic process; innate immune response; positive regulation of interleukin 12 biosynthetic process; positive regulation of interleukin 1 biosynthetic process; inflammatory response; negative regulation of osteoclast differentiation; signal transduction; positive regulation of JNK cascade; positive regulation of IκB kinase/NF-κB cascade; response to bacterium; activation of NF-κB-inducing kinase; mast cell activation; T-helper 1 type immune response
0
TABLE 3.6 Gene connectivity only in inflammatory condition but not in normal condition [3].
Regulator    Connectivity    References
FOXL1        23              [35]
TFAP2A       19              [45]
SOX9         16              [46]
GATA2        12              [48,50]
AML1         11              [51]
NR3C1        8               [49,52]
of the proposed method is very satisfactory. In Fig. 3.11, interleukin 1 alpha (IL1A) and interleukin 1 beta (IL1B) reach peak expression at 2–3 h poststimulation and then gradually decay because of the removal of bacterial endotoxin. Interestingly, their receptor (IL1R) reaches peak expression at 6–7 h poststimulation, while IL1A expression reaches another peak at about 8–9 h. These concerted changes in cytokine and receptor may be explained by a mechanism in which IL1A forms a positive feedback loop in the NF-κB signaling pathway through IL1R when the affected signaling network suffers inflammatory stress [33]. The same situation occurs for TNF-α and its receptor TNFR. TLR4, a member of the growing TLR family that is structurally characterized by a cytoplasmic toll/interleukin 1R (TIR) domain and by extracellular leucine-rich repeats [35], shows the same dynamic fluctuation as IL1R and TNFR. The other genes, such as IL8, IL6, and IL17, and their receptors all exhibit similar behavior in our analyses (data not shown). For step 7 of Fig. 3.1, in Section 3.8, the identified time delays and the estimated parameters are shown in Tables 3.A1 and 3.A2, respectively. Although we have considered the effect of the time lag τ_i in our model, it is plausible that not all regulators act on transcription with a delay. Regulation in inflammation may act so swiftly that τ_i cannot be detected (i.e., it is less than one time unit of the microarray data, or 1.5 h). However, there are several time-lagged regulations between IL8 and its regulators, such as SOX9, MEF2A, NFIL3, ELK1, FOXF1, FOXD1, GATA2, FOXI1, REL, and RELA. This is because IL8 has a more complicated regulatory mechanism that acts through other pathways with a considerable delay.
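To illustrate why a delay shorter than one sampling step cannot be detected, the following sketch searches over integer delays (in units of the sampling interval) for a single regulator. It assumes a first-order discrete-time form, y[t+1] = a·y[t] + b·x[t − τ] + k, fitted by least squares; this form, the function name, and the default search range are assumptions made for illustration and are not taken from Eq. (3.1) or from the chapter's identification procedure.

```python
import numpy as np

def estimate_delay(x, y, max_delay=3):
    """Return (tau, rss): the integer delay, in sampling steps, that minimizes the
    least-squares residual of the assumed model y[t+1] = a*y[t] + b*x[t - tau] + k."""
    best_tau, best_rss = None, np.inf
    t = np.arange(max_delay, len(y) - 1)  # same time window for every candidate delay
    for tau in range(max_delay + 1):
        X = np.column_stack([y[t], x[t - tau], np.ones(len(t))])
        coef, *_ = np.linalg.lstsq(X, y[t + 1], rcond=None)
        rss = float(np.sum((y[t + 1] - X @ coef) ** 2))
        if rss < best_rss:
            best_tau, best_rss = tau, rss
    return best_tau, best_rss
```

Because the search runs over whole sampling steps only, any true lag below one step is reported as zero, which is consistent with the remark above that fast regulations appear to have no delay at the resolution of the data.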
The dynamic regulatory model assumes that the expression profile of a target gene results from the kinetic activity of one or more specific regulators, which bind to the promoter site of the downstream target gene and initiate its transcription, through which it exerts its effect on the inflammation GRN. In other words, the target gene expression profile can be generated from the gene expression profiles of the upstream TFs using the dynamic regulatory model and its kinetic parameters in Eq. (3.1). The continuous gene expression profiles in Fig. 3.11 (also see Table 3.A1) are generated by the identified dynamic model for all target genes and their corresponding regulators, and they fit the microarray data reasonably well. Dynamic modeling of biological systems, including genetic regulatory networks and cell regulatory networks, has been applied in functional analyses for a long time. However, only a few other modeling approaches have included the time-delay parameter that is comprehensively factored into our
FIGURE 3.7 The important proinflammatory gene regulatory network induced or activated by NF-κB in immune system with LPS. The important proinflammatory gene regulatory network in inflammatory condition [3]. LPS, Lipopolysaccharide.
stochastic dynamic model. The findings of this study demonstrate that we can efficiently refine the GRN of systemic inflammation in humans via microarray data and mimic the signal transduction delay in the transcriptional regulatory process of the human inflammatory gene network. Combining the cross-correlation selection algorithm and the AIC, we created a novel dynamic modeling algorithm to trim down the tangled regulatory genetic network of the human inflammatory system without loss of biological meaning. The algorithm presented
FIGURE 3.8 The important proinflammatory gene regulatory network induced or activated by NF-κB in immune system without LPS. The important proinflammatory gene regulatory network in normal condition [3]. LPS, Lipopolysaccharide.
here can model all combinations of the target genes/regulators and give the best predictions on gene expression by the dynamic regulatory model. Instead of attempting to model the whole complicated regulatory processes with the high risk of incorrect prediction, our dynamic model focuses only on a concise set of target genes with a more reliable outcome. Iteratively, we could eventually construct the whole GRN of systemic inflammation in response to bacterial endotoxin by our dynamic model through microarray data. The essential problem with application of the multivariate procedures to the microarray gene expression data as expressed in recent publications is associated with reproducibility of the complex gene network constructions resulting from such analyses. In order to confirm the
FIGURE 3.9 The important proinflammatory perturbed NF-κB gene regulatory network. Gene regulatory network only in normal condition but not inflammatory condition [3].
reproducibility of the proposed systems biology method, we use our algorithm to rebuild the GRN via the microarray data published in Ref. [55], which reports 19 genes with significant inflammatory responses. We therefore reconstruct the inflammatory gene network based on these 19 genes. Comparing the reconstructed inflammatory GRN with the one in the text reveals some similarities and differences. The highly connected hubs GATA2, AML1 (RUNX1), and YY1 appear in both. These hubs have more than five connections in both perturbed inflammatory networks. However, for the lack of some specific gene
FIGURE 3.10 The important proinflammatory perturbed NF-κB gene regulatory network. Gene regulatory network only in inflammatory condition but not in normal condition [3].
expression data in Ref. [55], we were unable to verify some of the highly interactive genes in the text (i.e., FOXL1, TFAP2A, and SOX9). Interestingly, some hubs are present only in the reconstructed gene network but not in the text, such as GATA3 and FPR, which would be involved in host defense against bacterial infection and in the clearance of damaged cells [56]. New hubs appear with these 19 candidate genes because some of them are not included in the previous 49 genes. The data pools from different literature may differ in experimental conditions, research topics, and technology platforms. Therefore the candidate target genes chosen here differ from those in the text, and the computational results are not identical (see Table 3.A3).
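The comparison of hubs between two networks can be illustrated with a minimal Python sketch. It assumes that each network is stored as a set of (regulator, target) edges. The function names and the toy edges below are hypothetical and are not taken from Figs. 3.7–3.10 or from Ref. [55].

```python
from collections import Counter


def perturbed_edges(inflamed, normal):
    """Edges (regulator, target) present in the inflammatory network only."""
    return set(inflamed) - set(normal)


def connectivity(edges):
    """Count how many edges each regulator takes part in."""
    return Counter(regulator for regulator, _target in edges)


# Hypothetical toy networks, for illustration only.
inflamed = {("GATA2", "IL1A"), ("GATA2", "IL8"), ("GATA2", "TNFA"),
            ("YY1", "IL6"), ("RELA", "IL8")}
normal = {("YY1", "IL6"), ("RELA", "IL8"), ("GATA2", "TNFA")}

only_inflamed = perturbed_edges(inflamed, normal)
hubs = connectivity(only_inflamed).most_common()  # regulators sorted by connectivity
```

Under this representation, comparing two reconstructions reduces to comparing the regulators with the highest counts in each perturbed edge set.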
FIGURE 3.11 The curve fittings of dynamic regulatory model of proinflammatory gene and its regulators. The “o” is the data from microarray in 24 h, and the solid line is the curve fitting by the proposed dynamic model in Eq. (3.1). The error bars for standard deviations have been marked. We denoted the curve fittings of nine target genes and their upstream regulators, respectively [3], and the regulatory parameters for each dynamic model are presented (see Tables 3.A1 and 3.A2).
In this chapter, we use a multiinput/single-output gene regulatory model (i.e., multiple regulators and one target gene) to describe the dynamics of gene regulation in response to inflammation. The model identifies the regulatory relationship and the time lag between upstream regulators and downstream target genes using time-series microarray data. Zou et al. [30] used the concept of time delay only in a static analysis of the GRN, without applying it to dynamic regulatory modeling to mimic real gene regulatory behavior. Furthermore, an apparent shortcoming of the static analysis is its limitation to a single-input, single-output system (i.e., one regulator and one target gene). Such a single-input, single-output system rarely exists in actual gene regulation. Although our method significantly improves network construction, this study still has two drawbacks. First, although we present a multiinput/single-output system, it still cannot represent the actual biological conditions, which are multiinput/multioutput systems in most situations. This means that when using AIC to trim the initial tangled GRN, we should prune all data simultaneously rather than separately. However, such an approach increases the computational complexity combinatorially and thus becomes computationally infeasible. The second drawback, shared by all published algorithms for inference of transcriptional regulatory networks in inflammation, is that candidate regulators are selected from a pool of potential regulators typically defined by computational prediction, either by sequence similarity analysis or by other genome annotation methods. If a true regulator is not included in the pool, it will inevitably escape network identification by the system modeling approach. This type of error will likely become a very significant problem in a poorly characterized genome of a model organism.
3.6 Conclusion
The proposed dynamic modeling represents a new systematic approach to the study of GRNs in the inflammatory response. It is based on mining large databases to construct an inflammatory regulatory network from microarray data. It is also a systems biology approach because we process the complex GRN of numerous genes and regulators from various data sources at the same time. The trimming algorithm presented here can also be extended in the future to global GRN analysis beyond the inflammatory system. The curve fitting generated by the proposed method shows that the performance is very satisfactory. By comparison with normal GRNs, we could obtain the perturbed gene network to analyze the effect of inflammatory stimulus on the immune system in the human infectious process. The hubs and “weak ties” of the robust inflammatory GRN are also discussed. The proposed GRN is also confirmed by published evidence in the literature. In future research, we need to investigate the dynamic GRNs of host–pathogen interaction in an animal model organism. We will also need to consider extending the proposed algorithm to the identification and analysis of cross-talking transcriptional regulatory networks.
3.7 Material and methods
3.7.1 Dataset selection
We used previous microarray data [27,28] as our mRNA expression profiles. Gene expression in whole blood leukocytes was determined at 0, 2, 4, 6, 9, and 24 h after the intravenous administration of bacterial endotoxin to four healthy human subjects. In those experiments, four additional subjects were studied under identical conditions but without endotoxin administration. The infusion of endotoxin activates innate immune responses and produces physiological responses of brief duration. It should be noted that there is an initial proinflammatory phase and a subsequent counterregulatory phase, with resolution of virtually all clinical perturbation within 24 h.
3.7.2 Construction of candidate gene networks of systematic inflammation
Cross correlation is used to identify target genes that are regulated by a common set of TFs. It uses continuous gene expression profiles under the assumption that a regulatory gene and a target gene have a positive (negative) temporal relationship if the target gene’s expression profile is positively (negatively) correlated with the regulatory gene’s profile, possibly with time lags. The next step is to specify the threshold for the correlation between target genes and their regulators. In this study, there are 22,577 gene expression time profiles [27,28]. We choose 2000 gene expression profiles randomly and compute their correlations with different time lags or leads to evaluate a threshold for significant cross correlations for possible regulators of target genes, which is useful for selecting candidate regulators from those suggested by JASPAR. Let $\vec{u} = (u_1, \ldots, u_N)$ be the expression profile of gene $u$ and $\vec{v} = (v_1, \ldots, v_N)$ be the expression profile of another gene $v$, where $N$ is the number of time points of the expression profile. Compute the cross correlation between $\vec{u}$ and $\vec{v}$ with the lag or lead of $h$ time points as follows:

$$
r(h) = \frac{\sum_{i=1}^{N-h} (u_{i+h} - \bar{u})(v_i - \bar{v})}{\sqrt{\sum_{i=1}^{N-h} (u_{i+h} - \bar{u})^2}\,\sqrt{\sum_{i=1}^{N-h} (v_i - \bar{v})^2}}, \qquad
\bar{u} = \frac{\sum_{i=1}^{N-h} u_{i+h}}{N-h}, \quad
\bar{v} = \frac{\sum_{i=1}^{N-h} v_i}{N-h}, \qquad
h = -M, \ldots, 0, \ldots, M
\tag{3.3}
$$
where M is the maximal time lead or lag between any two genes; both leads and lags are considered because we initially do not know which genes are the targets and which are the regulators. Since each time step in h is 0.5 h, we allow a 2 h lead and lag and compute the cross correlation between a gene and a TF with all possible time lags or leads that are less than 2 h for the regulatory response. Finally, we select the maximum correlation between two genes over the different time delays or time leads as their cross correlation and rank them in Fig. 3.2 for all regulatory genes. We can
obtain the distribution of correlations based on their ranks. Then, we can decide a threshold for a possible regulatory relationship between regulators and their target genes (see Fig. 3.2). In this study a cross correlation larger than 30% (or 0.46451) is selected as the threshold for possible regulators, which is used to remove implausible regulators from the pool of regulators suggested by JASPAR via DNA sequence similarity analysis. Then, we link the remaining regulators selected by the cross-correlation threshold with their target genes to construct a rough GRN. After the candidate GRN of the inflammation system is constructed in this way, it is modeled by the dynamic equation in (3.1) for further pruning. The kinetic parameters of the gene regulatory dynamic model are identified by maximum likelihood parameter estimation via microarray data. After parameter identification the insignificant interaction coefficients of the dynamic model are pruned by the most parsimonious AIC to refine the GRN in the inflammatory condition. The possible regulators selected by the JASPAR algorithm are thus pruned twice in our method, once by the correlation threshold via cross correlation and again by AIC via the dynamic regulatory model and microarray data. The details are described next.
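The lagged cross correlation in Eq. (3.3) and the selection of the best lead or lag can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this study. The function names, the handling of negative h by swapping the two profiles, the use of the absolute value to rank lags (so that negative regulation is also detected), and the toy profiles are all assumptions.

```python
import math


def cross_correlation(u, v, h):
    """r(h) of Eq. (3.3): pairs u[i+h] with v[i], so h > 0 means u follows v by h steps."""
    if h < 0:
        # Assumption: a negative shift is evaluated by swapping the roles of u and v.
        return cross_correlation(v, u, -h)
    n = len(u) - h
    us = u[h:h + n]  # u_{i+h}, i = 1..N-h
    vs = v[:n]       # v_i,     i = 1..N-h
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(us, vs))
    den = (math.sqrt(sum((a - u_bar) ** 2 for a in us))
           * math.sqrt(sum((b - v_bar) ** 2 for b in vs)))
    return num / den if den > 0 else 0.0


def best_lag(u, v, max_shift):
    """Return (h, r(h)) with the largest |r(h)| over h = -M..M."""
    scores = [(h, cross_correlation(u, v, h))
              for h in range(-max_shift, max_shift + 1)]
    return max(scores, key=lambda s: abs(s[1]))


# Toy profiles (hypothetical): the target repeats the regulator two steps later.
regulator = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
target = [0.0, 0.0] + regulator[:-2]
lag, r = best_lag(target, regulator, max_shift=4)
```

With a 0.5 h step, `max_shift=4` corresponds to a 2 h lead or lag. A candidate regulator would then be kept only if its best correlation exceeds the chosen threshold.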
3.7.3 Constructing a dynamic regulatory model for gene regulatory network via microarray data
After constructing the stochastic dynamic equation in Eq. (3.1) to model the regulation of a target gene, we use the method of maximum likelihood to estimate the kinetic parameters of the dynamic model. Eq. (3.1) can be written in the following form:

$$
y[t+1] = \begin{bmatrix} y[t] & x_1[t-\tau_1] & \cdots & x_L[t-\tau_L] & 1 \end{bmatrix}
\begin{bmatrix} a \\ b_1 \\ \vdots \\ b_L \\ k \end{bmatrix} + \varepsilon[t]
= \phi[t]\,\theta + \varepsilon[t]
\tag{3.4}
$$

where $\phi[t]$ denotes the regression vector, which can be obtained from microarray data, and $\theta \in \mathbb{R}^p$ denotes the parameter vector of dimension $p$ in regression Eq. (3.4). After applying the cubic spline method to interpolate the microarray data, we can obtain as many data points as we want. Then it is easy to obtain the values of $y[t+l]$ and $x_i[t+l]$ for $l \in \{1, 2, \ldots, m\}$ and $i \in \{1, 2, \ldots, L\}$, where $m$ is the number of expression time points of a target gene, and $L$ is the number of TFs binding to the target gene in the candidate gene network. By further computation of Eq. (3.4) at different time points, we can obtain the following vector form equation by data point interpolation:

$$
\begin{bmatrix} y[t+2] \\ y[t+3] \\ \vdots \\ y[t+m-1] \\ y[t+m] \end{bmatrix}
=
\begin{bmatrix} \phi[t+1] \\ \phi[t+2] \\ \vdots \\ \phi[t+m-2] \\ \phi[t+m-1] \end{bmatrix} \theta
+
\begin{bmatrix} \varepsilon[t+1] \\ \varepsilon[t+2] \\ \vdots \\ \varepsilon[t+m-2] \\ \varepsilon[t+m-1] \end{bmatrix}
\tag{3.5}
$$
For simplicity, it can be represented as follows:

$$
Y = \Phi\,\theta + e
\tag{3.6}
$$
In Eq. (3.6) the random noise $\varepsilon[t_k]$ is regarded as white Gaussian noise with zero mean and unknown variance $\sigma^2$, that is, $E\{e\} = 0$ and $\Sigma_e = E\{e e^T\} = \sigma^2 I$, where $I$ is an identity matrix. In this chapter a maximum likelihood parameter estimation method is used to estimate $\theta$ and $\sigma^2$ from the regression data obtained from the microarray data of the regulatory genes and the target gene [51]. Under the assumption of the Gaussian noise vector $e$ with $m-1$ elements, its probability density function is given as follows:
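$$
p(e) = \frac{1}{\left[(2\pi)^{m-1} \det \Sigma_e\right]^{1/2}} \exp\left(-\frac{1}{2}\, e^T \Sigma_e^{-1} e\right)
\tag{3.7}
$$

From Eq. (3.7), we can obtain the likelihood function:

$$
L(\theta, \sigma^2) = p(\theta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{(m-1)/2}} \exp\left\{-\frac{(Y - \Phi\theta)^T (Y - \Phi\theta)}{2\sigma^2}\right\}
\tag{3.8}
$$

Eq. (3.8) can be considered as a function of the parameters $\theta$ and $\sigma^2$. In order to simplify the computation, it is practical to take the logarithm of Eq. (3.8), which yields the following log-likelihood function:

$$
\log L(\theta, \sigma^2) = -\frac{m-1}{2} \log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{k=1}^{m-1} \left(y[t+k+1] - \phi[t+k]\,\theta\right)^2
\tag{3.9}
$$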
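where $y[t+k+1]$ and $\phi[t+k]$ are the kth elements of $Y$ and $\Phi$ in (3.6), respectively. By the maximum likelihood parameter estimation method, we expect the log-likelihood function to have the maximum at $\theta = \hat{\theta}$ and $\sigma^2 = \hat{\sigma}^2$. The necessary conditions for the maximum likelihood estimates $\hat{\theta}$ and $\hat{\sigma}^2$ are as follows [40]:

$$
\frac{\partial \log L(\theta, \sigma^2)}{\partial \theta} = 0, \qquad
\frac{\partial \log L(\theta, \sigma^2)}{\partial \sigma^2} = 0
\tag{3.10}
$$

The estimated parameters $\hat{\theta}$ and $\hat{\sigma}^2$ are shown in the following:

$$
\hat{\theta} = \left(\Phi^T \Phi\right)^{-1} \Phi^T Y
\tag{3.11}
$$

$$
\hat{\sigma}^2 = \frac{1}{m-1} \sum_{k=1}^{m-1} \left(y[t+k+1] - \phi[t+k]\,\hat{\theta}\right)^2
= \frac{1}{m-1} \left(Y - \Phi\hat{\theta}\right)^T \left(Y - \Phi\hat{\theta}\right)
\tag{3.12}
$$

where $Y$ and $\Phi$ can be obtained from the microarray data of regulatory genes and the target gene. After obtaining the estimated parameter $\hat{\theta}$, the dynamic equation of the target gene in the estimated transcriptional regulatory network can be expressed as follows:

$$
y[t+1] = \hat{a}\, y[t] + \sum_{i=1}^{L} \hat{b}_i\, x_i[t - \tau_i] + \hat{k} + \varepsilon[t]
\tag{3.13}
$$

where $\hat{a}$, $\hat{b}_i$, and $\hat{k}$ are obtained from (3.11), and the variance of $\varepsilon$ is obtained from (3.12).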
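The estimates in Eqs. (3.11) and (3.12) can be sketched with the Python standard library only. The sketch below builds the regression rows $\phi[t] = [y[t], x_1[t-\tau_1], \ldots, x_L[t-\tau_L], 1]$, solves the normal equations $(\Phi^T\Phi)\theta = \Phi^T Y$ by Gaussian elimination, and divides the residual sum of squares by the number of regression rows. The function names and the synthetic, noise-free data are assumptions for illustration; the time delays are taken as already identified.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_target_gene(y, regulators, delays):
    """Return theta = [a, b_1..b_L, k] from Eq. (3.11) and sigma^2 from Eq. (3.12)."""
    start = max(delays) if delays else 0
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        rows.append([y[t]] + [x[t - d] for x, d in zip(regulators, delays)] + [1.0])
        targets.append(y[t + 1])
    p = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * yt for r, yt in zip(rows, targets)) for i in range(p)]
    theta = solve_linear(gram, rhs)
    residuals = [yt - sum(c * v for c, v in zip(theta, r))
                 for r, yt in zip(rows, targets)]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return theta, sigma2


# Synthetic example (hypothetical values): a = 0.5, b_1 = 2.0, k = 0.1, tau_1 = 1.
x1 = [((7 * t) % 11) / 10.0 for t in range(30)]
y = [1.0]
for t in range(29):
    y.append(0.5 * y[t] + 2.0 * x1[max(t - 1, 0)] + 0.1)
theta_hat, sigma2_hat = fit_target_gene(y, [x1], [1])
```

Because the synthetic profile contains no noise, the recovered parameters should match the generating values up to rounding error. With real interpolated microarray profiles the residual variance is positive and feeds into the model selection step.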
Iteratively, one target gene at a time, we can construct the overall dynamic equations of the transcriptional regulatory network of inflammation, which are interconnected through the regulations $\sum_{i=1}^{L} \hat{b}_i\, x_i[t - \tau_i]$ of TFs.
Since some interaction coefficients b^i of the GRN in (3.13) are insignificant, they should be pruned off by the parsimonious AIC criterion. This is discussed in the next section.
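Once the parameters of a target gene are identified, its expression profile can be regenerated from the regulator profiles by iterating the noise-free part of Eq. (3.13). The following Python sketch is illustrative only. The function name and the numerical values are hypothetical, and holding each regulator at its first sample before $t = \tau_i$ is an assumption made to start the recursion.

```python
def simulate_target(y0, regulators, a, b, k, delays, steps):
    """Iterate y[t+1] = a*y[t] + sum_i b_i*x_i[t - tau_i] + k (noise term omitted)."""
    y = [y0]
    for t in range(steps):
        # Assumption: before t reaches tau_i, the regulator is held at its first sample.
        drive = sum(bi * x[max(t - d, 0)]
                    for bi, x, d in zip(b, regulators, delays))
        y.append(a * y[t] + drive + k)
    return y


# Hypothetical one-regulator model with a delay of one time step.
profile = simulate_target(y0=0.0, regulators=[[1.0, 2.0, 3.0, 4.0]],
                          a=0.5, b=[1.0], k=0.0, delays=[1], steps=3)
```

Repeating this for every target gene, with the outputs of one model serving as regulator inputs of another where applicable, gives the interconnected set of dynamic equations described above.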
3.7.4 Pruning the candidate gene regulatory network
First, in this chapter, we use the JASPAR database to identify plausible binding motifs of TFs and select candidate regulators from the pool given by DNA sequence similarity analysis. A candidate GRN of inflammation is constructed by linking target genes and their regulators with a cross-correlation threshold larger than 30% (see Fig. 3.2). Then we use the maximum likelihood estimation method to estimate the parameters of the dynamic model for a preliminary GRN of the inflammatory system. Although the maximum likelihood estimation method can help us quantify the regulatory abilities of all the candidate regulators on target genes, we still do not know how large a regulatory ability must be for a candidate to be regarded as a true regulator. In order to determine whether a regulator is significant or not, a statistical approach based on model validation is proposed for evaluating the significance of our regulatory model parameters to prune the preliminary gene network. In this chapter a statistical approach called the AIC is employed to validate the model order (or the number of model parameters) to determine the significance of our dynamic model parameters [40]. The AIC, which attempts to include both the estimated residual variance and the model complexity in one statistic, decreases as the residual variance $\hat{\sigma}^2$ decreases and increases as the number $p$ of parameters increases. As the expected residual variance decreases with increasing $p$ for nonadequate model complexities, there should be a minimum around the correct number $p$ of network parameters. For a transcriptional regulatory model with $p$ regulatory parameters fitted to data from $N$ samples, the AIC can be written as follows [40]:

$$
\mathrm{AIC}(p) = \log\left(\frac{1}{N}\left(Y - \hat{Y}\right)^T \left(Y - \hat{Y}\right)\right) + \frac{2p}{N}
\tag{3.14}
$$

where $\hat{Y}$ denotes the estimated expression profile of the target gene, that is, $\hat{Y} = \Phi\,\hat{\theta}$. The AIC in (3.14) is a trade-off between residual variance and model order. The minimization of Eq. (3.14) will achieve the true model order (i.e., the number of regulators of the target genes) of the gene regulatory system [40]. After the statistical selection of $p$ parameters by minimizing the AIC, we can easily determine whether a candidate regulatory TF is significant or just a false positive and then construct a refined (real) GRN for inflammation by pruning these false positives. Finally, evidence from previous studies is an important validation to support our refined GRN.
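The trade-off in Eq. (3.14) can be made concrete with a short Python sketch that scores candidate regulator sets and keeps the one with the smallest AIC. The residual sums of squares, the sample size, and the regulator sets below are made-up numbers for illustration, and the function names are hypothetical. In practice each residual sum of squares would come from refitting the dynamic model with that regulator set.

```python
import math


def aic(rss, n_samples, n_params):
    """AIC of Eq. (3.14): log of the mean squared residual plus 2p/N."""
    return math.log(rss / n_samples) + 2.0 * n_params / n_samples


def select_model(candidates, n_samples):
    """candidates maps a tuple of regulator names to (rss, n_params)."""
    return min(candidates,
               key=lambda regs: aic(candidates[regs][0], n_samples,
                                    candidates[regs][1]))


# Hypothetical fits of one target gene with nested regulator sets.
candidates = {
    ("RELA",): (4.0, 3),
    ("RELA", "GATA2"): (1.0, 4),
    ("RELA", "GATA2", "YY1"): (0.99, 5),
}
best = select_model(candidates, n_samples=48)
```

In this toy case the third regulator reduces the residual only marginally, so the penalty term 2p/N outweighs the gain and the two-regulator model is retained. This is the sense in which insignificant interaction coefficients are pruned.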
3.8 Appendix
3.8.1 Dataset selection
The microarray data of Boldrick et al. [55] were used as our mRNA expression profiles. Gene expression in whole blood leukocytes was obtained at 0, 0.5, 1, 2, 4, 6, 12, and 24 h after the intravenous administration of bacterial endotoxin to human peripheral blood mononuclear cells. In their experiments, eight additional stresses were studied under identical conditions, and we have considered the MONK-treated sample as our normal condition and the LPS-treated sample as the inflammatory condition. The infusion of endotoxin could activate innate immune responses and produce physiological responses of brief duration. It should be mentioned that there is an initial proinflammatory phase and a subsequent inflammatory regulatory phase, with resolution of virtually all clinical perturbation within 24 h.
3.8.2 Gene network construction
In the first step, we pick 9 host response genes that are commonly induced and 10 host response genes that are commonly repressed (see Tables 3.A4 and 3.A5). These genes have been discussed in the innate immune response from the microarray of Boldrick et al. [55], and the candidate regulators of these 19 target genes are selected for diverse pathogen infections in the innate immune response. Then, the GRN is constructed in both inflammatory and normal cases. The microarray profiles of the regulators are analyzed for changes in blood leukocyte gene expression patterns by the Temporal Relationship Identification Algorithm (TRIA) [40]. The TRIA prediction algorithm identifies the regulators by the cross correlations among continuous time profiles of the microarray with the assumption that the regulatory genes and target genes have a positive (negative) temporal relationship if the target gene’s microarray profile is positively (negatively) correlated with the regulatory genes’ microarray profile, possibly with time lags. Following the steps of the flowchart in Fig. 3.1, we then rebuild the GRN in both inflammation and normal conditions (see Figs. 3.A1 and 3.A2). Fig. 3.A3 shows the GRN only in inflammatory condition but not in normal condition. The selected inflammatory genes and their regulators in the inflammation condition are shown in Table 3.A6.
3.8.3 Discussion
After the reconstructed inflammatory GRN is compared with the one in the text, some similarities and differences are found. The same highly connected hubs GATA2, AML1 (RUNX1), and YY1 are found. We also find that there are more than five connections for these hubs in both perturbed inflammatory networks. However, for the lack of some specific gene expression data in [55], it is difficult to verify some of the highly interactive genes in the text (i.e., FOXL1, TFAP2A, and SOX9). Interestingly, some hubs are present only in the reconstructed network but not in the text, such as GATA3 and FPR, which would be involved in host defense against bacterial infection and in the clearance of damaged cells [56]. The reason why these 19 candidate genes could be still found with
new hubs is that some of the 19 candidate genes are not included in the previous 49 genes. Since experimental conditions, research topics, and technology platforms differ, the data pools from different literature may be different. Therefore the candidate target genes might differ from those in the text, so the computational results would not be identical.
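TABLE 3.A1 Identification of parameters and time delay by (3.11) in step 7 of Fig. 3.1 [3].

(a)
$$
\begin{aligned}
X_{\mathrm{IL1A}}[n+1] ={}& a_1 X_{\mathrm{IL1A}}[n] + a_2 X_{\mathrm{RUNX1}}[n] + a_3 X_{\mathrm{PBX}}[n] + a_4 X_{\mathrm{HLF}}[n] + a_5 X_{\mathrm{TFAP2A}}[n] \\
&+ a_6 X_{\mathrm{C\text{-}REL}}[n] + a_7 X_{\mathrm{NFIL3}}[n] + a_8 X_{\mathrm{ELK1}}[n] + a_9 X_{\mathrm{FOXD1}}[n] + a_{10} X_{\mathrm{FOXL1}}[n] \\
&+ a_{11} X_{\mathrm{GATA2}}[n] + a_{12} X_{\mathrm{FOXI1}}[n] + a_{13} X_{\mathrm{YY1}}[n] + a_{14} X_{\mathrm{RELA}}[n] + a_{15} X_{\mathrm{NFKB1}}[n]
\end{aligned}
$$

(b)
$$
\begin{aligned}
X_{\mathrm{IL1B}}[n+1] ={}& b_1 X_{\mathrm{IL1B}}[n] + b_2 X_{\mathrm{SOX9}}[n] + b_3 X_{\mathrm{MEF2A}}[n-3] + b_4 X_{\mathrm{ELK1}}[n] \\
&+ b_5 X_{\mathrm{FOXI1}}[n] + b_6 X_{\mathrm{GATA2}}[n] + b_7 X_{\mathrm{NFKB1}}[n] + b_8 X_{\mathrm{RELA}}[n]
\end{aligned}
$$

(c)
$$
X_{\mathrm{IL1R}}[n+1] = c_1 X_{\mathrm{IL1R}}[n] + c_2 X_{\mathrm{TFAP2A}}[n] + c_3 X_{\mathrm{NFIL3}}[n] + c_4 X_{\mathrm{FOXL1}}[n] + c_5 X_{\mathrm{RELA}}[n-4]
$$

(d)
$$
\begin{aligned}
X_{\mathrm{IL6}}[n+1] ={}& e_1 X_{\mathrm{IL6}}[n] + e_2 X_{\mathrm{RUNX1}}[n] + e_3 X_{\mathrm{HLF}}[n] + e_4 X_{\mathrm{TFAP2A}}[n] + e_5 X_{\mathrm{FOXL2}}[n] \\
&+ e_6 X_{\mathrm{GATA2}}[n] + e_7 X_{\mathrm{FOXI1}}[n] + e_8 X_{\mathrm{YY1}}[n] + e_9 X_{\mathrm{REL}}[n] + e_{10} X_{\mathrm{NFKB1}}[n]
\end{aligned}
$$

(e)
$$
\begin{aligned}
X_{\mathrm{IL8}}[n+1] ={}& i_1 X_{\mathrm{IL8}}[n] + i_2 X_{\mathrm{PBX}}[n] + i_3 X_{\mathrm{SOX9}}[n-1] + i_4 X_{\mathrm{RORA}}[n] + i_5 X_{\mathrm{MEF2A}}[n-2] \\
&+ i_6 X_{\mathrm{HLF}}[n] + i_7 X_{\mathrm{E2F1}}[n] + i_8 X_{\mathrm{NFIL3}}[n-1] + i_9 X_{\mathrm{ELK1}}[n-1] + i_{10} X_{\mathrm{FOXF1}}[n-1] \\
&+ i_{11} X_{\mathrm{FOXD1}}[n-1] + i_{12} X_{\mathrm{GATA2}}[n-1] + i_{13} X_{\mathrm{FOXI1}}[n-1] + i_{14} X_{\mathrm{MAX}}[n] \\
&+ i_{15} X_{\mathrm{YY1}}[n] + i_{16} X_{\mathrm{REL}}[n-1] + i_{17} X_{\mathrm{RELA}}[n-1]
\end{aligned}
$$

(f)
$$
\begin{aligned}
X_{\mathrm{IL17}}[n+1] ={}& f_1 X_{\mathrm{IL17}}[n] + f_2 X_{\mathrm{RUNX1}}[n] + f_3 X_{\mathrm{RORA}}[n] + f_4 X_{\mathrm{HLF}}[n] + f_5 X_{\mathrm{TFAP2A}}[n] \\
&+ f_6 X_{\mathrm{NFIL3}}[n] + f_7 X_{\mathrm{ELK1}}[n] + f_8 X_{\mathrm{FOXD1}}[n] + f_9 X_{\mathrm{FOXL1}}[n] + f_{10} X_{\mathrm{GATA2}}[n] \\
&+ f_{11} X_{\mathrm{FOXI1}}[n] + f_{12} X_{\mathrm{YY1}}[n-1] + f_{13} X_{\mathrm{C\text{-}REL}}[n] + f_{14} X_{\mathrm{RELA}}[n] \\
&+ f_{15} X_{\mathrm{NFKB1}}[n] + f_{16} X_{\mathrm{SPIB}}[n-2]
\end{aligned}
$$

(g)
$$
\begin{aligned}
X_{\mathrm{TLR4}}[n+1] ={}& g_1 X_{\mathrm{TLR4}}[n] + g_2 X_{\mathrm{RUNX1}}[n-2] + g_3 X_{\mathrm{MEF2A}}[n] + g_4 X_{\mathrm{FOXI1}}[n] + g_5 X_{\mathrm{IRF1}}[n] \\
&+ g_6 X_{\mathrm{MAX}}[n-4] + g_7 X_{\mathrm{C\text{-}REL}}[n-4] + g_8 X_{\mathrm{RELA}}[n-1] + g_9 X_{\mathrm{SPIB}}[n]
\end{aligned}
$$

(h)
$$
\begin{aligned}
X_{\mathrm{TNFA}}[n+1] ={}& d_1 X_{\mathrm{TNFA}}[n] + d_2 X_{\mathrm{TFAP2A}}[n] + d_3 X_{\mathrm{FOXD1}}[n] + d_4 X_{\mathrm{GATA2}}[n] \\
&+ d_5 X_{\mathrm{FOXI1}}[n] + d_6 X_{\mathrm{MAX}}[n] + d_7 X_{\mathrm{NFKB1}}[n] + d_8 X_{\mathrm{SPIB}}[n-1]
\end{aligned}
$$

(i)
$$
X_{\mathrm{TNFR}}[n+1] = h_1 X_{\mathrm{TNFR}}[n] + h_2 X_{\mathrm{FOXI1}}[n]
$$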
TABLE 3.A2 The parameter estimation of the inflammatory gene regulator model.
The parameters of the inflammatory gene regulator models for (a)–(i) in Table 3.A1
a1   286.3      b1   1.142      d1   1.476      e1   23.185     f1   237.22     g1   1.026      i1   20.06601
a2   2256.5     b2   27.899     d2   27.888     e2   25.678     f2   292.1      g2   3.744      i2   124.4
a3   2678       b3   26.776     d3   18.35      e3   186.2      f3   157.6      g3   22.814     i3   2601.6
a4   23107      b4   30.71      d4   215.36     e4   259.94     f4   2490       g4   10.97      i4   1113
a5   2756.8     b5   7.043      d5   21.185     e5   27.987     f5   310.2      g5   5.266      i5   2431
a6   233.95     b6   210.74     d6   4.209      e6   94.24      f6   308.8      g6   5.693      i6   29.29
a7   21538      b7   232.87     d7   22.06      e7   2164.2     f7   212.31     g7   23.598     i7   2928.5
a8   571.5      b8   13.85      d8   23.281     e8   38.94      f8   164.1      g8   25.625     i8   2378.9
a9   2255.4                                     e9   232.11     f9   2117.6     g9   213.6      i9   237.33
a10  21458      c1   0.7432                     e10  20.4727    f10  95.22                      i10  636.3
a11  312.5      c2   17.12                                      f11  215.3      h1   0.9369     i11  708.3
a12  456        c3   213.89                                     f12  26.024     h2   1.019      i12  2211.7
a13  298.25     c4   26.779                                     f13  2106.6                     i13  226.4
a14  23.591     c5   8.942                                      f14  24.48                      i14  237.1
a15  4.879                                                      f15  4.975                      i15  2260.6
                                                                f16  1.567                      i16  2268.5
                                                                                                i17  34.03
TABLE 3.A3 Reconstruction errors via independent data.
Gene     Standard deviation
IL1A     0.017
IL1B     0.012
IL1R     0.051
IL6      0.0035
IL8      0.0219
IL17     0.011
TNFA     0.046
TLR4     0.0059
TNFR     0.0842
TABLE 3.A4 Features of the host response: common induction (P-value ≤ 0.05) [3].
Gene symbol
Description
GO biofunction
CD40
TNF receptor superfamily member 5
CSF
Combining with macrophage colony stimulating factor to initiate a change in cell activity.
IL-1α
Interleukin 1 alpha precursor
GO:0042100: B-cell proliferation; GO:0006954: inflammatory response; GO:0030168: platelet activation; GO:0043123: positive regulation of IκB kinase/NF-κB cascade; GO:0006461: protein complex assembly; GO:0008283: cell proliferation; GO:0007275: multicellular organismal development; GO:0007165: signal transduction; GO:0006954: inflammatory response
IL-1β
Interleukin 1 beta precursor
IL2RA
Interleukin 2 receptor alpha chain precursor
IL-3
Interleukin 3 precursor
IL-8
Monocyte-derived neutrophil chemotactic factor
TNF-α
Tumor necrosis factor precursor
TNFSF1
Lymphotoxin-alpha precursor
GO:0007267: cell-cell signaling; GO:0006954: inflammatory response; GO:0008285: negative regulation of cell proliferation; GO:0007165: signal transduction; GO:0008283: cell proliferation; GO:0007166: cell-surface receptor linked signal transduction; GO:0006955: immune response; GO:0007267: cell-cell signaling; GO:0008284: positive regulation of cell proliferation; GO:0007267: cell-cell signaling; GO:0007186: G protein-coupled receptor protein signaling pathway; GO:0006954: inflammatory response; GO:0007242: intracellular signaling cascade; GO:0030155: regulation of cell adhesion; GO:0045091: regulation of retroviral genome replication; GO:0006959: humoral immune response; GO:0006954: inflammatory response; GO:0043123: positive regulation of IκB kinase/NF-κB cascade; GO:0051092: positive regulation of NF-κB transcription factor activity; GO:0051023: regulation of immunoglobulin secretion; GO:0007267: cell-cell signaling
TABLE 3.A5 Features of the host response: common repression (P-value ≤ 0.05) [3].
Gene symbol
Description
GO biofunction
ADAM8
A disintegrin and metalloproteinase domain 8
GO:0005887: integral to plasma membrane
CCR1
CC chemokine receptor type 1
CD14
Monocyte differentiation antigen CD14 precursor
CD31
Platelet endothelial cell adhesion molecule precursor
GO:0007155: cell adhesion; GO:0007267: cell-cell signaling; GO:0007187: G protein signaling, coupled to cyclic nucleotide second messenger; GO:0006955: immune response; GO:0006954: inflammatory response; GO:0007166: cell-surface receptor linked signal transduction; GO:0006909: phagocytosis; GO:0030334: regulation of cell migration; GO:0042060: wound healing
CD64
High affinity immunoglobulin gamma Fc receptor I precursor
CYBB
Cytochrome b-245, beta polypeptide
FPR
fMet-Leu-Phe receptor
ITGAX
Leukocyte adhesion receptor p150,95
NCF1
Neutrophil cytosol factor 1
WASP
Wiskott–Aldrich syndrome protein
GO:0001788: antibody-dependent cellular cytotoxicity; GO:0019884: antigen processing and presentation of exogenous antigen; GO:0007166: cell-surface receptor linked signal transduction; GO:0042742: defense response to bacterium; GO:0006911: phagocytosis, engulfment; GO:0006954: inflammatory response; GO:0045087: innate immune response; GO:0006928: cell motility; GO:0007186: G protein-coupled receptor protein signaling pathway; GO:0007188: G protein signaling, coupled to cAMP nucleotide second messenger; GO:0007165: signal transduction; GO:0007155: cell adhesion; GO:0009887: organ morphogenesis; GO:0006968: cellular defense response; GO:0005625: soluble fraction; GO:0007596: blood coagulation; GO:0006952: defense response; GO:0008544: epidermis development; GO:0006955: immune response
I. Systems Immunology
82
3. Identifying the gene regulatory network of systems inflammation in humans
TABLE 3.A6 The inflammatory genes and their regulators [3].
Gene name | (A) Possible regulators from JASPAR | (B) Candidate regulators from cross-correlation threshold | (C) Refined regulators from AIC
ADAM8 | RUNX1,TFAP2A,CREB1,ELK1,GATA2,GATA3,MAX,SP1,SPI1,SPIB,YY1,REL,NFKB1 | CREB1,ELK1,GATA2,GATA3,MAX,SP1,SPI1,SPIB,YY1,NFKB1 | CREB1,ELK1,GATA2,GATA3,MAX,SP1,SPI1,SPIB,YY1,NFKB1
CCR1 | HLF,IRF1,MEF2A,REL,RELA,RORA,RUNX1,SP1,SPI1,SPIB,YY1 | IRF1,MEF2A,RORA,SP1,SPI1,YY1 | IRF1,MEF2A,RORA,SP1,SPI1,YY1
CD14 | E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,NFKB1,REL,RUNX1,SP1,SPI1,SPIB,YY1 | E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,NFKB1,RUNX1,SP1,SPI1,YY1 | E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,NFKB1,RUNX1,SP1,SPI1,YY1
CD31 | E2F1,ELK1,GATA2,GATA3,IRF1,MAX,NFIL3,REL,RUNX1,SP1,SPI1,SPIB,YY1 | E2F1,ELK1,GATA2,GATA3,IRF1,MAX,NFIL3,RUNX1,SP1,SPI1,YY1 | E2F1,ELK1,GATA3,IRF1,MAX,NFIL3,RUNX1,SP1,SPI1,YY1
CD64 | E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,REL,RELA,RUNX1,SPI1,SPIB,YY1 | E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,RELA,RUNX1,SPI1,SPIB,YY1 | ELK1,GATA2,GATA3,HLF,IRF1,MAX,NFIL3,RELA,RUNX1,SPIB,YY1
CYBB | GATA2,GATA3,IRF1,REL,RUNX1,SP1,SPI1,SPIB,SRY,YY1 | GATA2,IRF1,RUNX1,SP1,SPI1,SPIB,YY1 | GATA2,IRF1,SP1,SPI1,SPIB,YY1
FPR | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,MAX,MEF2A,REL,RORA,RUNX1,SP1,SPI1,SPIB,SRY,YY1 | CREB1,E2F1,ELK1,GATA2,HLF,MAX,MEF2A,RORA,RUNX1,SP1,SPI1,SPIB,YY1 | CREB1,E2F1,ELK1,GATA2,HLF,MAX,MEF2A,RORA,RUNX1,SP1,SPI1,SPIB,YY1
ITGAX | CREB1,E2F1,ELK1,GATA2,GATA3,IRF1,NFKB1,RELA,RORA,RUNX1,SP1,SPI1,SPIB,SRY,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,IRF1,NFKB1,RELA,RORA,RUNX1,SP1,SPI1,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,IRF1,NFKB1,RELA,RUNX1,SP1,SPI1,YY1
NCF1 | ELK1,GATA2,GATA3,HLF,MEF2A,SP1,SPI1,SPIB,SRY,YY1 | ELK1,GATA2,GATA3,HLF,MEF2A,SP1,SPI1,SPIB,SRY,YY1 | ELK1,GATA2,GATA3,HLF,MEF2A,SPI1,SPIB,SRY,YY1
WASP | CREB1,ELK1,GATA2,GATA3,MAX,NFKB1,REL,RELA,RUNX1,SP1,SPI1,SPIB,SRY,YY1 | CREB1,ELK1,GATA2,GATA3,MAX,NFKB1,RELA,RUNX1,SP1,SPI1,SPIB,YY1 | CREB1,ELK1,GATA3,MAX,RELA,RUNX1,SPI1,SPIB,YY1
CD40 | E2F1,ELK1,GATA2,GATA3,IRF1,IRF2,MAX,MEF2A,NFIL3,NFKB1,Pbx,REL,RELA,RORA,RUNX1,SP1,SPI1,SPIB,SRY,YY1 | E2F1,ELK1,GATA2,GATA3,IRF1,IRF2,MAX,MEF2A,NFIL3,NFKB1,Pbx,REL,RELA,RORA,RUNX1,SP1,SPI1,SPIB,YY1 | E2F1,ELK1,GATA2,GATA3,IRF1,IRF2,MAX,MEF2A,NFIL3,NFKB1,Pbx,REL,RELA,RORA,RUNX1,SP1,SPI1,SPIB,YY1
CSF | E2F1,GATA2,GATA3,IRF1,IRF2,MAX,NFKB1,REL,RELA,RUNX1,SP1,SPI1,SPIB | E2F1,GATA2,GATA3,MAX,NFKB1,RELA,RUNX1,SP1,SPIB | E2F1,GATA2,GATA3,MAX,NFKB1,RELA,RUNX1,SP1,SPIB
IL1A | E2F1,ELK1,GATA2,GATA3,HLF,MEF2A,NFIL3,Pbx,RUNX1,SPI1,SPIB,SRY,YY1 | E2F1,ELK1,GATA2,GATA3,HLF,MEF2A,NFIL3,Pbx,RUNX1,SPI1,SPIB,YY1 | ELK1,GATA2,GATA3,HLF,MEF2A,NFIL3,Pbx,RUNX1,SPI1,SPIB,YY1
IL1B | ELK1,GATA2,GATA3,MEF2A,REL,RELA,SP1,SPI1,SPIB,SRY,YY1 | ELK1,GATA2,GATA3,MEF2A,RELA,SP1,SPI1,YY1 | ELK1,GATA2,GATA3,MEF2A,RELA,SP1,SPI1,YY1
IL2RA | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MEF2A,NFIL3,REL,RELA,RORA,RUNX1,SPI1,SPIB,SRY,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MEF2A,NFIL3,RELA,RORA,RUNX1,SPI1,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MEF2A,NFIL3,RELA,RORA,RUNX1,SPI1,YY1
IL3 | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MEF2A,NFIL3,NFKB1,REL,RELA,RORA,RUNX1,SP1,SPI1,SPIB,SRF,SRY,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MEF2A,NFIL3,NFKB1,RELA,RORA,RUNX1,SP1,SPI1,SRF,YY1 | CREB1,E2F1,ELK1,GATA2,GATA3,IRF1,MEF2A,NFIL3,NFKB1,RELA,RORA,RUNX1,SP1,SPI1,SRF,YY1
IL8 | E2F1,ELK1,GATA2,GATA3,HLF,MAX,MEF2A,NFIL3,Pbx,REL,RELA,RORA,RUNX1,SPI1,SPIB,SRY,YY1 | E2F1,ELK1,GATA2,GATA3,HLF,MAX,MEF2A,NFIL3,Pbx,RELA,RORA,RUNX1,SPI1,SPIB,YY1 | E2F1,ELK1,GATA3,HLF,MAX,MEF2A,NFIL3,Pbx,RORA,RUNX1,SPI1,SPIB,YY1
TNFA | GATA2,GATA3,MAX,NFKB1,SPI1,SPIB,SRY,YY1 | GATA2,GATA3,MAX,NFKB1,SPI1,SPIB,YY1 | GATA2,GATA3,MAX,NFKB1,SPI1,YY1
TNFSF1 | CREB1,E2F1,ELK1,GATA2,GATA3,HLF,IRF1,MAX,MEF2A,REL,RELA,RUNX1,SP1,SPI1,SPIB,YY1 | CREB1,ELK1,GATA2,GATA3,IRF1,MAX,MEF2A,RELA,SP1,SPI1,YY1 | CREB1,ELK1,GATA2,GATA3,IRF1,MAX,MEF2A,RELA,SP1,YY1
AIC, Akaike information criterion.
FIGURE A3.1 The gene regulatory network in inflammatory condition [3].
FIGURE A3.2 The gene regulatory network in normal condition [3].
FIGURE A3.3 Gene network only in inflammatory condition but not in normal condition [3].
CHAPTER 4
Dynamic cross-talk analysis among signaling transduction pathways in the vascular endothelial inflammatory response system of humans
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00015-8
4.1 Introduction
One main interest of molecular biologists is to understand the underlying molecular mechanisms in a cell, including the synthesis of DNA, RNA, and protein and how these molecules are regulated. In the last decade, researchers have uncovered a multitude of biological facts, such as protein 3D structures and genome sequences and organizations. However, this information is not sufficient to interpret the entire biological process or to understand its robustness, which is one of the fundamental properties of living systems under intrinsic and extrinsic fluctuations at different cellular levels [57]. Thus understanding how genes, proteins, and small molecules interact to form robust functional modules has become one of the major challenges in systems biology in recent years. With advances in experimental techniques, many researchers have used high-throughput data, such as DNA microarray, yeast two-hybrid assay, coimmunoprecipitation, ChIP-chip, and next-generation sequencing data, to study biomolecular networks. In particular, these kinds of data are usually integrated to construct various types of molecular networks, including protein-protein interaction (PPI) networks (PPINs), gene regulatory networks, metabolic networks, and gene coexpression networks. These molecular networks have shown great potential for discovering basic cellular functions and revealing essential molecular mechanisms of various biological phenomena, because they describe biological systems not at the level of individual components but at a system-wide level [58].
One of the most extensively investigated biological systems is the human inflammatory system. It orchestrates a complex biological process that engages a variety of cell types to eliminate invading microorganisms and protect the host [59]. Infected hosts can recognize the ligands on the surface of disease-causing pathogens and then mobilize specific inflammatory defense mechanisms. On the other hand, pathogens can proactively perturb host defense signaling pathways to enhance their survival [60]. In this case the hosts and the pathogens in the inflammatory responses can be regarded as two players with conflicting interests in the sense of game theory [61]. Therefore the inflammatory responses are highly context dependent, suggesting an unplumbed complexity and a wealth of intricate intra- and intercellular interactions [62].
In the human self-protection mechanism, the vascular endothelium plays a central role in the regulation of several inflammatory functions. Products from bacteria and viruses that stimulate leukocyte and endothelial release of cytokines, chemokines, and lipid mediators may also play a role in inflammation. These stimuli can alter gene regulation of endothelial leukocyte adhesion molecules, cell signal transduction pathways, and endothelial permeability. Tumor necrosis factor (TNF) is one of the important cytokines and has long been implicated in the pathology of dozens of human diseases, including septic shock [63], cancer [64], rheumatoid arthritis [65], malaria [66], and other afflictions. Two protein families have been implicated in the signaling pathway mediated by the receptors for TNF (TNFR): the death domain (DD)-containing proteins (TNFR1, TRADD, RIP, and FADD) and the TRAF domain-containing proteins (TNFR2, CD40, and TRAF1-6). Based on current models, upon binding of TNF to TNFR1, a protein called the silencer of DD is released and TRADD is recruited. TRADD then recruits TNFR-associated factor 2 (TRAF2), RIP, and FADD, leading to the activation of signaling cascades that mediate c-jun N-terminal kinase (JNK) activation, nuclear factor kappa-B (NF-κB) activation, and apoptosis. Binding of TNF to TNFR2 expressed by endothelial cells may lead to cell activation or apoptosis. Because both responses are initiated by ligand binding to a single receptor, it is clear that TNF activates multiple signal transduction pathways [67].
Apart from TNFRs, there are two other central signaling pathways, mediated by Toll-like receptor 4 (TLR4) and interleukin-1 receptor (IL-1R), both of which play important roles in inflammation by regulating the activity of transcription factors such as NF-κB. The TLR4, IL-1R, and TNFR signals to NF-κB converge on a common IκB kinase complex, namely, the inhibitor of NF-κB kinase (IKK), which phosphorylates the NF-κB inhibitory protein IκBα [68,69]. Although how TNFR signals on the cell surface activate the IKK complex is not completely understood, several studies have identified most key signaling components and uncovered the posttranslational modification and cellular translocation of these components [70]. Previous studies have shown that the upstream signaling components are mostly receptor specific, but the principles of signaling are similar, involving the recruitment of specific adaptor proteins and the activation of kinase cascades in which PPIs are controlled by poly-ubiquitination [68]. Because the host needs to cope with a broad range of pathogens using limited resources, some efficient signaling structures are reduplicated. Therefore studies of these pathways have focused on the identification of the signaling transduction network, the role of cytokine inducers, and the subcellular translocation of the components by integrating various types of genomic and proteomic data. Although many works have extracted some characteristics of TNF signaling transduction, there is still no comprehensive approach to discover the systematic and dynamic properties of multipathway signaling networks under specific stimulant conditions.
The most common way to construct PPINs in systems biology studies is to use publicly available PPI databases, published literature, and experimental data to connect the edges between proteins. However, most of these PPIN reconstructions merely display the static properties of interactions rather than the dynamic interactions and the evolution of network topology. Therefore the basic concept in this chapter lies in introducing time-course gene expression data to endow the static PPIs with dynamic profiles, which correspond more closely to the real living organism (see Fig. 4.1). Various types of data were integrated to infer a candidate PPIN, including gene expression profiles downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/; accession number: GSE9055), pathway information from KEGG (http://www.genome.ad.jp/kegg/) [71] and NetPath (http://www.netpath.org/) [72], PPI information from the BioGRID (Biological General Repository for Interaction Datasets) database (http://www.thebiogrid.org/) [73], and PPI clues from STRING (the Search Tool for the Retrieval of Interacting Genes/Proteins; http://string-db.org/) [74]. According to the candidate PPIN, a dynamic interaction model was constructed for each protein to describe its interactions with other proteins. Then, based on the microarray expression profiles of different time stages, the interaction coefficients among these proteins were identified by the constrained least squares parameter estimation method for each time stage. According to the identified interaction coefficients, the insignificant interactions were pruned and the significant PPIs were retained for the specific time stage. The retained interactions therefore represent the effective PPIs for a specific time stage under a specific stimulus. This procedure was iterated protein by protein, and finally the whole PPINs were constructed for the vascular inflammatory response system at different time stages; these PPINs can be used to investigate the development of the PPIN during the inflammatory response to the TNFα stimulus. A new cross-talk ranking method was also developed to evaluate the potential core elements in the related signaling pathways. Furthermore, a bow-tie structure was observed and considered as a core system mediating the input pathological factors and the output host responses for efficient processing of a broad range of pathogens with limited proteins and signaling pathways, suggesting the robustness of the molecular network.
FIGURE 4.1 Schematic diagram for reconstructing the PPI network. This diagram shows the basic concept of the reconstruction of the PPI network. At the protein level, the interactions between proteins were extracted from well-known databases and experimental data. However, these interactions only reflect all possible static connections, without stimulus-specific responses or temporal changes. The interaction model incorporates gene expression patterns from the time course to infer the dynamic PPIs and networks, suggesting a more significant and realistic method for PPI network reconstruction of the living organism [5]. PPI, Protein-protein interaction.
4.2 Methods of constructing cross talks among signaling pathways in inflammation
4.2.1 Constructing the candidate protein-protein interaction networks
The dynamic gene expression and the assembly of all functional components in the genome of an organism are strongly influenced by the environment. In this chapter, we focus on the reconstruction of the PPINs of inflammation under TNFα stimulus at different time stages. Further, the analysis of dynamic cross talks is used to investigate the network characteristics of an endothelial inflammatory system. The proposed method of PPIN construction is divided into four steps, as shown in Fig. 4.2. The candidate PPIN with respect to TNFα stress is established in the first two steps, and the refinement of the candidate PPIN into the final PPIN is then performed in the last two steps.
FIGURE 4.2 Flowchart of the proposed method to construct the PPIN. This flowchart depicts the process to construct the PPIN and the subsequent investigations in this study. The four key steps of PPIN construction are described in detail in the text. The candidate PPIN is set up in steps (1) and (2), and the refinement is then performed in steps (3) and (4) to obtain the refined PPIN. PPIN, Protein-protein interaction network.
Step 1: At first, the proteins of interest are selected to construct the protein pool. A total of 21 proteins (see Table 4.1, column 1) are selected in association with the TNFα stress based on data mining of the published literature and known pathway diagrams [68,70-72,75-78]. Then, 24 proteins are selected that are involved in the IL-1 and TLR4 (both MyD88-dependent and MyD88-independent) pathways and might be triggered under the TNFα stimulus (see Table 4.1, columns 2-4). Finally, 15 negative regulator proteins [70,79,80] are selected, which are considered to have anti-inflammatory therapeutic potential and/or to provide possible feedback control signals in the human endothelial inflammatory system (see Table 4.1, column 5). These 60 proteins in total constitute the protein pool in this study, and the PPINs are constructed among them. Further, the Gene Ontology annotations for these 60 proteins are given in Table 4.A1 in Section 4.6.
TABLE 4.1 Human protein candidates and their signaling pathway catalogs in this chapter [5].
TNF: TNF, TNFR1, TNFR2, TRADD, FADD, GRB2, SOS1, CAV1, CASP8, TRAF2, TRAF5, TTRAP, TRAF6, RIP, MEKK3, TAK1, TAB1, TAB2, IKKα, IKKβ, IKKγ
IL-1: IL1α, IL1β, IL1R1, IL1R2, TOLLIP, ST2L, PELI1, ECSIT
MyD88-dependent TLR4: MyD88, TIRAP, CD14, PELI2, IRAK4, IRAK1, NIK, BCL10
MyD88-independent TLR4: TLR4, TRAM, TRIF, TRAF3, TANK, TBK1, IKKε, IRF3
Negative regulators: A20, CYLD, FLN29, IRAK3, NOD2, RIP3, PTPN11, RNF216, SARM, SIGIRR, SOCS1, TMED1, TNIP3, TRAF4, UBE2N
Proteins in the cross-talk analysis are classified into three major signaling pathways, including the TNFα, IL-1, and TLR4 (both MyD88-dependent and MyD88-independent) pathways. The full names of these proteins are listed in the following. BCL10: B-cell CLL/lymphoma 10; NOD2: nucleotide-binding oligomerization domain-containing 2; CASP8: caspase 8, apoptosis-related cysteine peptidase; CAV1: caveolin 1, caveolae protein, 22 kDa; CD14: CD14 molecule; IKKα: conserved helix-loop-helix ubiquitous kinase; CYLD: cylindromatosis (turban tumor syndrome); FADD: Fas (TNFRSF6)-associated via death domain; GRB2: growth factor receptor-bound protein 2; IKKβ: inhibitor of kappa light polypeptide gene enhancer in B cells, kinase beta; IKKε: inhibitor of kappa light polypeptide gene enhancer in B cells, kinase epsilon; IKKγ: inhibitor of kappa light polypeptide gene enhancer in B cells, kinase gamma; IL1α: interleukin 1, alpha; IL1β: interleukin 1, beta; IL1R1: interleukin 1 receptor, type I; IL1R2: interleukin 1 receptor, type II; ST2L: interleukin 1 receptor-like 1; IRAK1: interleukin-1 receptor-associated kinase 1; IRAK3: interleukin-1 receptor-associated kinase 3; IRAK4: interleukin-1 receptor-associated kinase 4; IRF3: interferon regulatory factor 3; NIK: mitogen-activated protein kinase kinase kinase 14; MEKK3: mitogen-activated protein kinase kinase kinase 3; TAK1: mitogen-activated protein kinase kinase kinase 7; TAB1: mitogen-activated protein kinase kinase kinase 7 interacting protein 1; TAB2: mitogen-activated protein kinase kinase kinase 7 interacting protein 2; MyD88: myeloid differentiation primary response gene (88); PELI1: pellino homolog 1 (Drosophila); PELI2: pellino homolog 2 (Drosophila); PTPN11: protein tyrosine phosphatase, nonreceptor type 11 (Noonan syndrome 1); RIP: receptor (TNFRSF)-interacting serine-threonine kinase 1; RIP3: receptor-interacting serine-threonine kinase 3; SARM: sterile alpha and TIR motif-containing 1; SIGIRR: single immunoglobulin and Toll-interleukin 1 receptor (TIR) domain; ECSIT: ECSIT homolog (Drosophila); SOCS1: suppressor of cytokine signaling 1; SOS1: Son of Sevenless homolog 1 (Drosophila); TANK: TRAF family member-associated NFKB activator; TBK1: TANK-binding kinase 1; TIRAP: Toll-interleukin 1 receptor (TIR) domain-containing adaptor protein; TLR4: Toll-like receptor 4; TMED1: transmembrane emp24 protein transport domain-containing 1; TNF: tumor necrosis factor (TNF superfamily, member 2); A20: tumor necrosis factor, alpha-induced protein 3; TNFR1: tumor necrosis factor receptor superfamily, member 1A; TNFR2: tumor necrosis factor receptor superfamily, member 1B; TNIP3: TNFAIP3 interacting protein 3; TOLLIP: Toll interacting protein; TRADD: TNFRSF1A-associated via death domain; TRAF2: TNF receptor-associated factor 2; TRAF3: TNF receptor-associated factor 3; TRAF4: TNF receptor-associated factor 4; TRAF5: TNF receptor-associated factor 5; TRAF6: TNF receptor-associated factor 6; FLN29: TRAF-type zinc finger domain-containing 1; TRAM: Toll-like receptor adaptor molecule 2; RNF216: TRIAD3 protein; TRIF: Toll-like receptor adaptor molecule 1; TTRAP: TRAF- and TNF receptor-associated protein; UBE2N: ubiquitin-conjugating enzyme E2N (UBC13 homolog, yeast) [5].
Step 2: In order to construct the static PPI list for these 60 proteins, PPI information from the BioGRID (http://www.thebiogrid.org/) [73] and STRING (http://string-db.org/) [74] databases is employed. BioGRID contains over 198,000 protein and genetic interactions from six major model organism species [73]. STRING is a database of known and predicted protein interactions, including direct (physical) and indirect (functional) interactions; these interactions are obtained from four kinds of genomic and high-throughput sources. STRING currently covers 2,483,276 proteins from 630 organisms [74]. These two databases are integrated to indicate all possible interactions between any two proteins, and the candidate PPIN is constructed from the 60 proteins in the protein pool and all possible interactions among them.
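The integration in Step 2 can be pictured as taking the union of the interaction pairs reported by the two databases and keeping only the pairs whose two members are in the protein pool. The following is a minimal sketch under stated assumptions: the function name `candidate_ppin`, the toy pair lists, and the decision to treat interactions as undirected and to drop self-interactions are illustrative choices, not details given in the text, and the example pairs are not real database records.

```python
from itertools import chain


def candidate_ppin(pool, *sources):
    """Union of undirected interaction pairs restricted to the protein pool.

    pool:    iterable of protein names (the protein pool of Step 1)
    sources: one iterable of (protein_a, protein_b) pairs per database
    """
    pool = set(pool)
    edges = set()
    for a, b in chain.from_iterable(sources):
        # Assumption: self-interactions are ignored and direction is dropped.
        if a != b and a in pool and b in pool:
            edges.add(tuple(sorted((a, b))))
    return edges


# Toy records standing in for database exports (hypothetical content).
pool = {"TNF", "TNFR1", "TRADD", "TRAF2"}
source_one = [("TNF", "TNFR1"), ("TNFR1", "TRADD"), ("TRADD", "EGFR")]
source_two = [("TNFR1", "TNF"), ("TRADD", "TRAF2")]

edges = candidate_ppin(pool, source_one, source_two)
print(sorted(edges))
```

Storing each pair as a sorted tuple means that the same interaction reported in opposite orders by the two sources is counted once, which matches the idea of a candidate network that only records whether an interaction is possible.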
4.2.2 Pruning the candidate protein-protein interaction network via a dynamic interaction model
According to the database information and the literature evidence, the candidate PPIN, which consists of all possible static interactions among the proteins in the databases, was constructed. Since the candidate PPIN only reveals all possible protein interactions under all kinds of experimental conditions, it should be further pruned to indicate the effective protein interactions in the PPIN under TNFα stress. As the time profiles of microarrays may reflect the coexistence of two particular proteins, time-course microarray data were employed to assess the true PPIs at a given time. Here, a dynamic interaction model and the model order selection method, the Akaike information criterion (AIC) [81], are used to prune the false-positive interactions in the candidate PPIN. The details of the pruning process are given in the following paragraphs and in Section 4.6.
Step 3: In this step the protein interaction equation is used to describe the PPIs between the target proteins of interest and their possible interacting proteins in the candidate PPIN of the human inflammatory system. The system identification method in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology, is employed to estimate the interaction parameters in the candidate PPIN. For a target protein p in the candidate PPIN, the stochastic dynamic interaction model of the protein is as follows [82]:
$$y_p[t+1] = y_p[t] + \sum_{q=1}^{Q} b_{pq}\, y_p[t]\, y_q[t] + \alpha_p x_p[t] - \beta_p y_p[t] + \omega_p[t] \qquad (4.1)$$
where $y_p[t]$ represents the protein expression level of the target protein p at time t, $b_{pq}$ denotes the interaction parameter of the qth interactive protein to the pth target protein, $y_q[t]$ represents the protein expression level of the qth protein interacting with the target protein p, $\alpha_p$ denotes the effect of translation from the corresponding mRNA to the target protein, $x_p[t]$ represents the mRNA expression level of the corresponding target protein p, $\beta_p$ indicates the degradation effect of the target protein p, and $\omega_p[t]$ is the stochastic measurement noise. The biological meaning of Eq. (4.1) is that the expression of the target protein p at the next time t + 1 is contributed by the concentration of the target protein p at the current time t, the effect of Q regulatory protein interactions, the translation effect from the corresponding mRNA, the degradation effect of the protein at the present time, and some stochastic measurement noise [82].
Step 4: The interaction parameters in Eq. (4.1) are identified in this step, and the model order selection method AIC in Eq. (3.14) is further used to prune the candidate PPIN according to the significance of the protein interaction parameters. By solving the constrained least squares parameter estimation problem and applying the most parsimonious model order detection using the AIC algorithm (see Section 4.6), we sieve out the interactive proteins that significantly interact with the target protein on the genomic level; that is, among the Q proteins in the candidate PPIN, only Q′ proteins significantly interact with protein p. In other words, the insignificant or noninvolved protein interactions, that is, the false-positive interactions, can be deleted by the AIC. The pruning process is repeated protein by protein. Consequently, the candidate PPIN is pruned to become the PPIN.
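Steps 3 and 4 can be illustrated by rewriting Eq. (4.1) as a linear regression in the unknowns $b_{pq}$, $\alpha_p$, and $\beta_p$ and then removing partners as long as the AIC decreases. The sketch below is not the chapter's implementation: the constraints ($\alpha_p \ge 0$, $\beta_p \ge 0$), the AIC form (log residual variance plus 2 × parameters / samples, assumed here in place of Eq. (3.14), which is not shown in this section), the greedy backward elimination, and the synthetic 40-point series are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import lsq_linear


def build_regression(y_p, x_p, y_partners):
    """Eq. (4.1) as target = phi @ [b_p1..b_pQ, alpha_p, beta_p] + noise."""
    yt = y_p[:-1]
    interaction = yt[:, None] * y_partners[:-1]        # y_p[t] * y_q[t]
    phi = np.column_stack([interaction, x_p[:-1], -yt])
    target = y_p[1:] - yt                              # y_p[t+1] - y_p[t]
    return phi, target


def fit_constrained(phi, target):
    """Least squares with alpha_p >= 0 and beta_p >= 0 (assumed constraints)."""
    n = phi.shape[1]
    lower = np.full(n, -np.inf)
    lower[-2:] = 0.0
    theta = lsq_linear(phi, target, bounds=(lower, np.full(n, np.inf))).x
    return theta, target - phi @ theta


def aic(residual, n_params):
    """Assumed AIC form: log residual variance plus a complexity penalty."""
    n = residual.size
    return float(np.log(residual @ residual / n + 1e-12) + 2.0 * n_params / n)


def score(y_p, x_p, y_all, cols):
    partners = y_all[:, np.asarray(cols, dtype=int)]
    theta, residual = fit_constrained(*build_regression(y_p, x_p, partners))
    return aic(residual, len(cols) + 2), theta


def prune_by_aic(y_p, x_p, y_all):
    """Greedy backward elimination (hypothetical search strategy)."""
    kept = list(range(y_all.shape[1]))
    best, theta = score(y_p, x_p, y_all, kept)
    full = best
    while kept:
        trials = [(*score(y_p, x_p, y_all, [c for c in kept if c != q]), q)
                  for q in kept]
        value, th, q = min(trials, key=lambda item: item[0])
        if value >= best:          # no removal improves the AIC: stop
            break
        best, theta = value, th
        kept.remove(q)
    return kept, theta, full, best


# Synthetic data only: partners 0 and 2 truly interact with the target.
rng = np.random.default_rng(0)
T, Q = 40, 4
y_all = rng.uniform(0.5, 1.5, size=(T, Q))
x_p = rng.uniform(0.5, 1.5, size=T)
b_true = np.array([0.10, 0.0, -0.10, 0.0])
y_p = np.empty(T)
y_p[0] = 1.0
for t in range(T - 1):
    y_p[t + 1] = (y_p[t] + y_p[t] * (y_all[t] @ b_true) + 0.3 * x_p[t]
                  - 0.3 * y_p[t] + 0.01 * rng.standard_normal())

kept, theta, aic_full, aic_pruned = prune_by_aic(y_p, x_p, y_all)
print("retained partners:", kept, "parameters:", np.round(theta, 3))
```

The returned `theta` lists the retained interaction parameters first, followed by the translation and degradation terms. Repeating the call for every target protein, with that protein's candidate partners as columns of `y_all`, corresponds to the protein-by-protein pruning described above.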
4.2.3 Cross-talk analysis by counting the cross-talk ranking values
To extract the significance from ever-changing interactions and to determine whether a protein is a possible cross-talk candidate, an original approach based on pathway catalogs and dynamic interactions in sequential time stages is proposed to compute the cross-talk ranking value (CTRV) for cross-talk analysis. We first clustered every protein (node) in the PPIN into different pathways and then calculated, for each node, the number of interactions that link outward to different pathways. The numbers of such interactions in the different time stages are summed up, resulting in the CTRVs. In general, the more interactions a protein has that connect it with several pathways, the more likely it is to be a cross-talk candidate. It should be noted that the interactions that link out to other proteins and those that link in from other proteins are both important to the cross-talk analysis. A highly interactive protein always plays a critical role in receiving, mediating, or amplifying a signaling cascade, and a protein that intensively interacts with others is usually characterized as an activator, inducer, or transcription factor. Therefore the proposed CTRV concentrates on counting the interactions with different signaling pathways for each protein, rather than taking every linkage into the ranking value, and the CTRV of a protein is considered a measurement of its potential to connect with multiple signaling pathways. For example, as shown in Fig. 4.3A, protein A belongs to pathway A, and protein B belongs to pathway B. If all the proteins connected with protein A are within pathway A, the CTRV of protein A is unchanged. As shown in Fig. 4.3B, if protein A connects with protein B but they belong to different pathways, then the ranking values of both protein A and protein B are increased by one.
Since the CTRVs are counted according to the pathway catalogs, the pathway classification is an important step in determining the values of the CTRVs. Some proteins have been proven to be involved in more than one pathway, but it is still not appropriate to assign one protein to multiple pathway catalogs. Such multiassignment would decrease the values of the CTRVs and cause true cross talks to be neglected. In this cross-talk analysis, to overcome this problem, every protein is assigned to only one pathway. It is assumed that the activation sequence under the TNFα stress is the TNFα, IL-1, MyD88-dependent, and finally MyD88-independent signaling pathways. If a protein is involved in more than one pathway, it is assigned to the earliest activated signaling pathway among them. Since the real activation sequence of the signaling pathways in vivo is unavailable, this hypothesis is based on feedback signals from current knowledge. In other words, TNFα stress can induce the gene expression and secretion of the IL-1 cytokine [83], and upon binding of IL-1, the IL-1R can associate with the IL-1R accessory protein (IL-1RAcP) [84,85], forming a functional signaling receptor complex that involves MyD88 to activate the related pathway [86]. Nevertheless, there is no evidence that the TNFα stimulus induces the MyD88-independent signaling pathway, and therefore we assume that this signaling pathway is activated last.
FIGURE 4.3 Counting for the CTRVs. CTRV is a ranking value that reflects the potential of a protein to connect with multiple signaling pathways. (A) All proteins connected with protein A belong to the same pathway A. In this case the CTRV of protein A is not changed. (B) If protein A connects with protein B, which belongs to a different signaling pathway, then the CTRVs of both protein A and protein B are increased by one [5]. CTRVs, Cross-talk ranking values.
The illustration in Fig. 4.4 shows a time-series layout of the refined PPINs from 0 to 8 h. Each inflammation PPIN is identified from a set of gene expression profiles with five data points. In order to distinguish proteins involved in different signaling cascades, proteins belonging to the same pathway are labeled with the same color. The progression of ever-changing interactions reveals that new connections continuously emerge, reflecting the fact that new signaling modules and functional communities are involved in the endothelial inflammatory response to the TNFα stimulus. In addition, the top five hubs with the highest connection degree are marked with a larger font size. The numbers of nodes and edges and the highly connected hubs at different time stages are outlined in Table 4.2 [5].
TABLE 4.2 Statistics of the tumor necrosis factor (TNF)α-induced protein-protein interaction networks of human umbilical vein endothelial cells.
Duration (h) | Nodes | Edges | Highly connected proteins
0-1 | 56 | 144 | TRAF2, IRAK1, TRAF6, IKKα, IKKβ
1-2 | 56 | 156 | TRAF2, TRAF6, IRAK1, RIP, TNFR1
2-3 | 56 | 150 | IRAK1, TRAF6, TRAF2, IKKα, RIP
3-4 | 55 | 152 | TRAF2, IKKα, TNFR1, IRAK1, TRAF6
4-6 | 56 | 151 | IRAK1, TRAF2, TRAF6, RIP, TNFR1
6-8 | 56 | 156 | TRAF2, TRAF6, RIP, IKKα, IRAK1
Summary of the reconstruction of protein-protein interaction networks at different time stages. Proteins with maximal protein-protein interactions are sorted in this table. These proteins are usually considered as the hubs that are involved in several biological functions and play important roles in the signaling network.
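The CTRV counting rule of Section 4.2.3, together with the rule that a protein belonging to several pathways is assigned to the earliest activated one, can be sketched as follows. This is an illustrative sketch only: the function names, the placeholder proteins P1 to P3, and their pathway memberships are hypothetical, and the handling of the negative-regulator catalog of Table 4.1 is left out because the text does not specify it.

```python
from collections import Counter

# Activation order assumed in the text for TNF-alpha stress.
ACTIVATION_ORDER = ["TNF", "IL-1", "MyD88-dependent TLR4",
                    "MyD88-independent TLR4"]


def assign_pathway(memberships, order=ACTIVATION_ORDER):
    """Assign each protein to the earliest-activated pathway it belongs to."""
    rank = {name: i for i, name in enumerate(order)}
    return {protein: min(paths, key=rank.__getitem__)
            for protein, paths in memberships.items()}


def cross_talk_ranking(stage_edges, pathway_of):
    """Sum over time stages the interactions that cross pathway boundaries.

    stage_edges: one list of (protein_a, protein_b) pairs per time stage
    pathway_of:  dict mapping each protein to its single pathway
    """
    ctrv = Counter({protein: 0 for protein in pathway_of})
    for edges in stage_edges:
        for a, b in edges:
            if pathway_of[a] != pathway_of[b]:
                ctrv[a] += 1   # both endpoints gain one, as in Fig. 4.3B
                ctrv[b] += 1
    return dict(ctrv)


# Hypothetical placeholder proteins and memberships.
memberships = {
    "P1": {"TNF"},
    "P2": {"IL-1", "MyD88-dependent TLR4"},
    "P3": {"TNF", "IL-1"},
}
pathway_of = assign_pathway(memberships)

stage_edges = [
    [("P1", "P2"), ("P1", "P3")],   # first time stage
    [("P1", "P2"), ("P2", "P3")],   # second time stage
]
ctrv = cross_talk_ranking(stage_edges, pathway_of)
print(pathway_of, ctrv)
```

In this toy case the P1-P3 interaction adds nothing because both proteins end up in the TNF catalog, which is the situation of Fig. 4.3A, while every interaction with P2 crosses a pathway boundary and raises the CTRV of both endpoints.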
4.3 Signaling transduction, signaling pathways, and their cross talks in inflammatory response
4.3.1 Construction of protein-protein interaction networks at different time stages of inflammatory system
The proposed systems biology method is used to investigate the PPINs at different time stages of the inflammatory system under TNFα stress. The genome-wide microarray data in this study are downloaded from the GEO database at the NCBI website (http://www.ncbi.nlm.nih.gov/geo/; accession number: GSE9055). Human umbilical vein endothelial cells (HUVECs) are treated with 10 ng/mL TNFα, and the samples are collected every 15 or 30 min (0-8 h, 25 time points) [87]. The dataset of 25 time points is divided into six time stages (0-1, 1-2, 2-3, 3-4, 4-6, and 6-8 h), giving six data subsets of the inflammation system.
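The division of the 25 time points into six stages can be made concrete with a small sketch. The exact sampling schedule is not given in this section, so the schedule below (every 15 min up to 4 h, then every 30 min up to 8 h, with boundary time points shared by adjacent stages) is an assumption; it was chosen because it reproduces both the 25 time points and the five data points per stage mentioned in the text. The names `sampling_times` and `split_into_stages` are hypothetical.

```python
# Stage boundaries in hours, as listed in the text.
STAGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 6), (6, 8)]


def sampling_times():
    """Assumed schedule: 15-min steps for 0-4 h, 30-min steps for 4-8 h."""
    early = [i * 0.25 for i in range(17)]        # 0.00, 0.25, ..., 4.00
    late = [4 + i * 0.5 for i in range(1, 9)]    # 4.50, 5.00, ..., 8.00
    return early + late


def split_into_stages(times, stages=STAGES):
    """Group time points by stage; boundary points belong to both neighbors."""
    return {stage: [t for t in times if stage[0] <= t <= stage[1]]
            for stage in stages}


times = sampling_times()
subsets = split_into_stages(times)
for stage, points in subsets.items():
    print(stage, points)
```

Sharing the boundary time points keeps each stage at five samples, so that every stage-specific model is identified from the same amount of data.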
FIGURE 4.4 PPINs for HUVEC under TNFα stress at different time stages. HUVEC, Human umbilical vein endothelial cells; PPIN, protein-protein interaction network; TNF, tumor necrosis factor.
At each stage the corresponding dataset is used to identify the protein interaction parameters in Eq. (4.1), and the candidate PPIN is pruned according to the identified parameters to obtain the PPIN of that time stage of the inflammation system. Consequently, six PPINs are constructed for the six time stages. These PPINs are rearranged and visualized with the Cytoscape tool [88] (see Fig. 4.4). The numbers of nodes and edges and the highly connected hubs of the PPINs at different time stages of the inflammation system are shown in Table 4.2.
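The per-stage statistics reported in Table 4.2 (number of nodes, number of edges, and the most highly connected proteins) can be derived from a stage-specific edge list as in the following sketch. It assumes that a node is counted only if it retains at least one interaction after pruning and that ties in degree are broken alphabetically; neither convention is stated in the text, and the function name and toy edges are hypothetical.

```python
from collections import Counter


def network_summary(edges, top_k=5):
    """Count nodes and edges and rank proteins by degree for one PPIN."""
    unique = {tuple(sorted(edge)) for edge in edges}   # undirected, no repeats
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; alphabetical order breaks ties (assumed rule).
    hubs = sorted(degree, key=lambda p: (-degree[p], p))[:top_k]
    return {"nodes": len(degree), "edges": len(unique), "hubs": hubs}


# Toy edge list with placeholder protein names.
toy_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "D"), ("B", "C")]
summary = network_summary(toy_edges, top_k=2)
print(summary)
```

Applying the same function to each of the six pruned edge lists would give one row of the table per time stage.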
4.3.2 Investigation of the tumor necrosis factor α signaling pathway
TNFα is a highly pleiotropic cytokine of inflammation that can activate leukocytes and enhance the adherence of neutrophils and monocytes to the endothelium. TNFα is synthesized by macrophages and other cells in response to pathogen-associated molecular patterns, inflammatory products, and other invasive stimuli, and it may mediate cellular responses through two distinct receptors, the p60 TNF receptor (TNFR1, p55) and the p80 TNF receptor (TNFR2, p75) [75]. These two receptors lie on the plasma membranes of virtually all cells except the erythrocyte, share structural homology in the extracellular TNFα-binding domains, and exhibit similar binding affinity for TNFα. However, they induce separate cytoplasmic signaling pathways upon receptor-ligand binding. In order to investigate the TNF signaling pathway clearly, TNFα-related proteins are extracted from the whole PPIN, and the TNFα-related signaling pathway is shown in Fig. 4.5A. The inferred functional modules in the PPIN that are supported by literature evidence are further shown in Table 4.A2 (Section 4.6).
FIGURE 4.5 Investigations of the (A) TNFα-related pathway and (B) IL-1/TLR4-related pathways. Protein components that are ultimately responsible for the trigger of an inflammatory function are labeled with the same color in the pathway, and the thicker red edges represent the interactions listed in Tables 4.A1 and 4.A2 (Section 4.6) [5]. TLR4, Toll-like receptor 4; TNF, tumor necrosis factor.
NF-κB activation (TNFR1–TRADD–TRAF2–RIP): One major function of the TNFα signaling pathway is to mediate the activation of the inflammatory response transcription factor NF-κB. Upon activation of TNFR1 at the plasma membrane, the TNFR1 DD serves as a docking site for the DD-containing adaptor protein TRADD through homotypic DD interactions [89]. TRADD then sequentially recruits TRAF2 and the serine/threonine kinase RIP, which rapidly signals to NF-κB activation [89,90]. The functional module comprising TNFR1, TRADD, TRAF2, and RIP was successfully identified in the PPIN.
NF-κB activation (TRAF5–RIP, MEKK3–IKKβ–TAK1): Two other functional modules associated with NF-κB activation are also recognized. TRAF5 has been implicated in TNF-induced NF-κB activation. In contrast to single TRAF2 or TRAF5 knockout cells, TRAF2/TRAF5 double-knockout cells show impaired NF-κB activation upon TNF stimulation [91]. Unlike TRAF2, TRAF5 interacts only with RIP, and not with TRADD, in coimmunoprecipitation assays [91]. This phenomenon can also be seen in the PPIN. Further, studies using MEKK3-deficient fibroblast cells have shown that MEKK3 plays an important role in TNF-induced NF-κB activation [92]. MEKK3 has also been demonstrated to directly phosphorylate IKK, and its kinase activity is regulated by TAK1 [92,93]. These functional interactions leading to NF-κB activation are observed in the PPIN.
IKK activation (RIP–IKKs) and protein recruitment (RIP–TAK1): For IKK activation, RIP responds to TNF and becomes K63-poly-ubiquitinated at lysine 377 in its intermediate domain. This modification is indispensable for IKK activation upon TNF stimulation, as mutation of lysine 377 abolishes the ability of RIP to rescue IKK activation in RIP-deficient cells [94]. RIP also plays a role in the recruitment of TAK1, as TAK1 fails to translocate to the TNFR1 complex upon TNF stimulation of RIP-deficient Jurkat cells [93].
4. Dynamic cross-talk analysis among signaling transduction pathways
Apoptosis (TRADD–RIP–FADD–TRAF2–CASP8) and protein kinase (TAK1–TAB1–TAB2): Apoptosis is another important function of the TNF pathway. TNFα binding promotes the formation of the TNFR1 complex at the cell membrane. TRADD, RIP, and TRAF2 then dissociate from TNFR1, and endosomal TNFR1 recruits the DD-containing adaptor protein FADD, which can bind caspase-8; the resulting cytoplasmic complex is implicated in signaling to apoptosis [95]. Moreover, the TAK1–TAB1–TAB2 complex is a protein kinase module involved in TNFα signal transduction and IKK activation. Activation of TAK1 leads to autophosphorylation of TAB1; in contrast, TAB2 becomes phosphorylated at the membrane, probably by an upstream protein kinase [96].
4.3.3 Investigation of the interleukin-1 receptor and Toll-like receptor 4 signaling pathways
The IL-1R/TLR superfamily comprises multiple receptors, each of which plays an important role in both innate and adaptive inflammatory systems [68]. Members of the IL-1R subfamily are characterized by an Ig-like domain that binds specific IL-1-related cytokines, which are primary regulators of inflammatory responses. By binding the type I receptor (IL-1R1), IL-1 can activate specific protein kinases, including the NF-κB-inducing kinase (NIK) and three distinct mitogen-activated protein (MAP) kinase cascades. These kinases then modulate a number of transcription factors, including NF-κB, AP1, and CREB, each of which regulates the expression of a plethora of immediate early genes central to the inflammatory response [97]. In contrast, the TLR subfamily includes 13 members that contain leucine-rich repeat motifs in their extracellular domains, which recognize distinct pathogen-associated patterns such as LPS, microbial lipopeptides, viral double-stranded RNA, and CpG DNA [98]. These receptors play a critical role in the activation of inflammation and induce the release of proinflammatory cytokines that are necessary to activate potent inflammatory responses [99]. The cytoplasmic portions of both IL-1R and TLR family members share a common structural motif, the so-called TLR and IL-1R (TIR) homology domain [100]. Like TNFRs, TIR-containing receptors have no catalytic activity and employ intracellular adaptors and signal-transducing molecules to activate effector signaling pathways [101]. Homotypic TIR–TIR interactions with a limited set of TIR-containing adaptors could explain why more than 15 different receptors trigger only a small number of signaling pathways [101].
Because they largely use the same signaling modules, the IL-1R- and TLR4-related signaling pathways are integrated into the same diagram for investigation (see Fig. 4.5B). The inferred functional modules supported by literature evidence in the PPIN are further shown in Table 4.A2.
Pathway adaptor (MyD88–TLR4–IL1, IL1–IL1R1–MyD88–TOLLIP–IRAK1): MyD88 is the universal adaptor for TLRs and is also reported to be a member of the IL-1R subfamily [86]. Upon binding of IL-1, IL-1R1 associates with IL-1RAcP, forming a functional signaling receptor complex [84]. The TIR domain-containing adaptor protein MyD88 is then recruited to the receptor complex [85]. This leads to the translocation of IRAK1, together with the adaptor protein TOLLIP [102], to IL-1R1, after which IRAK1 interacts with TRAF6 [103]. The functional modules responsible for pathway adaptation are also observed in the PPIN.
Protein kinase (TRAF6–IRAK1–IRAK4, TRAF6–IRAK1–TAK1–TAB1–TAB2) and IKK activation (TRAF6–IKKs): After the pathway adaptor modules are formed, IRAK4 is recruited to TRAF6 and activated by intramolecular autophosphorylation [104]. Activation of IRAK4 leads to the phosphorylation of IRAK1, which thereby acquires full kinase activity [104,105]. The IRAK1–TRAF6 complex can then interact with a preexisting membrane-bound TAK1–TAB1–TAB2 complex [106], forming the protein kinase module. Afterward, the protein kinase module translocates to the cytosol. IRAK1, on the other hand, stays at the membrane and becomes poly-ubiquitinated. In the cytoplasm, TRAF6 interacts with the E2 ubiquitin-conjugating enzyme complex Ubc13/Uev1A [107]. In addition, lysine 124 in TRAF6 has been identified as the main ubiquitin acceptor site for autoubiquitination, and mutation of this lysine impairs TAK1, IKK, and JNK activation [108,109]. Oligomerization of TRAF6 might lead to autopolyubiquitination of TRAF6, which is necessary for IL-1- and LPS-induced NF-κB activation, whereas TRAF6-induced poly-ubiquitination of NEMO (NF-κB essential modulator) has been found to play a role in IL-1-induced JNK activation [68].
4.3.4 Cross-talk analysis of the protein–protein interaction networks
Our analyses have demonstrated that many protein interactions in the TNFα, IL-1, and TLR4 signaling pathways of the PPINs show the characteristics of a real inflammatory system. To further understand the dynamic properties of hubs and cross talks among different signaling pathways in the PPINs, we have computed the CTRV of each protein for a cross-talk analysis of the refined PPINs. Proteins in this cross-talk analysis are classified into four major signaling pathways: the TNFα, IL-1, MyD88-dependent, and MyD88-independent signaling pathways. In addition, negative regulators are essential factors to take into consideration in this cross-talk analysis because cell responses must be stringently regulated. Exaggerated expression of signaling components and proinflammatory cytokines may have devastating effects on the host. Negative regulators can act at multiple levels within inflammatory signaling cascades and can elicit negative feedback mechanisms that synchronize the positive activation and negative regulation of signal transduction to avoid potentially harmful consequences [79].
The results of the cross-talk analysis of the PPINs are shown in Tables 4.3 and 4.4. In Table 4.3, the CTRVs and the link values, which indicate the total number of interactions of each node over the six time stages of inflammation, are listed for all 60 proteins. The contrast between link values and CTRVs (e.g., for CASP8) shows that not every protein with a high connectivity degree in the PPINs of inflammation also has a high CTRV. In Table 4.4, the proteins with high CTRVs at the different time stages of inflammation are presented.
From Tables 4.2 and 4.4, we find that some proteins with high CTRVs at different time stages of inflammation, such as NIK and A20, are not ranked at the top according to node degree, revealing that the proposed method can generate new insights into the evaluation of cross-talk candidates. The biological significance of some of the highly ranked proteins we have identified (Table 4.3) is also investigated. IL-1R-activated kinase 1 (IRAK1, ranked no. 1) is one of the key mediators in the signaling pathways of TLRs/IL-1Rs. IRAKs initiate a cascade of signaling events that eventually leads to the induction of inflammatory target gene expression [110].
TABLE 4.3 Cross-talk ranking values (CTRVs) and link values of significant proteins in the protein–protein interaction network (PPIN) [5].

| No. | Protein | CTRV | Link | No. | Protein | CTRV | Link |
|-----|---------|------|------|-----|---------|------|------|
| 1 | IRAK1 | 108 | 140 | 31 | IL1R1 | 21 | 21 |
| 2 | TRAF6 | 83 | 124 | 32 | FADD | 19 | 55 |
| 3 | NIK | 77 | 89 | 33 | GRB2 | 18 | 46 |
| 4 | A20 | 61 | 61 | 34 | TAK1 | 17 | 66 |
| 5 | MYD88 | 59 | 107 | 35 | SOCS1 | 17 | 17 |
| 6 | TRAF2 | 58 | 198 | 36 | TRIF | 16 | 54 |
| 7 | IKKα | 55 | 83 | 37 | TMED1 | 12 | 12 |
| 8 | SIGIRR | 53 | 73 | 38 | CASP8 | 10 | 87 |
| 9 | TLR4 | 53 | 53 | 39 | TAB2 | 10 | 53 |
| 10 | IKKγ | 52 | 84 | 40 | CAV1 | 10 | 44 |
| 11 | IKKβ | 47 | 97 | 41 | TAB1 | 9 | 44 |
| 12 | BCL10 | 43 | 62 | 42 | TTRAP | 8 | 8 |
| 13 | UBE2N | 41 | 41 | 43 | ECSIT | 8 | 50 |
| 14 | IRAK4 | 38 | 112 | 44 | CYLD | 6 | 6 |
| 15 | RIP | 38 | 57 | 45 | TNF | 0 | 48 |
| 16 | TRAF3 | 37 | 57 | 46 | TNFR2 | 0 | 39 |
| 17 | IRAK3 | 36 | 36 | 47 | IKKε | 0 | 29 |
| 18 | ST2L | 32 | 32 | 48 | PELI2 | 0 | 19 |
| 19 | TNIP3 | 32 | 32 | 49 | IL1B | 0 | 18 |
| 20 | PTPN11 | 31 | 31 | 50 | MEKK3 | 0 | 18 |
| 21 | TIRAP | 29 | 40 | 51 | TRAM | 0 | 18 |
| 22 | TRAF4 | 27 | 27 | 52 | IL1A | 0 | 17 |
| 23 | TOLLIP | 26 | 47 | 53 | IL1R2 | 0 | 17 |
| 24 | RNF216 | 25 | 25 | 54 | TRAF5 | 0 | 17 |
| 25 | TNFR1 | 24 | 111 | 55 | SOS1 | 0 | 10 |
| 26 | RIP3 | 23 | 93 | 56 | IRF3 | 0 | 9 |
| 27 | TRADD | 23 | 32 | 57 | NOD2 | 0 | 0 |
| 28 | TBK1 | 22 | 39 | 58 | CD14 | 0 | 0 |
| 29 | PELI1 | 21 | 49 | 59 | SARM1 | 0 | 0 |
| 30 | TANK | 21 | 40 | 60 | FLN29 | 0 | 0 |
To integrate the information from the complex PPINs, the CTRVs at the different time stages are summed for each protein. In contrast to the CTRVs, the link values represent the total number of protein–protein interactions of each protein over the six time stages of inflammation. The table shows that not all nodes with a high connectivity degree also have high CTRVs; CASP8 is one example.
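The difference between the two columns of Table 4.3 can be illustrated with a small sketch. It assumes, purely for illustration, that a protein's cross-talk value counts its interactions with partners assigned to a different signaling pathway, whereas its link value counts all of its interactions; the chapter's own CTRV definition should be used in practice. The pathway assignments, stage labels, and edge lists below are hypothetical toy data and are not taken from the PPINs or from Table 4.3.

```python
# Hypothetical pathway membership (an assumption for this toy example only).
pathway = {"IRAK1": "IL-1", "TRAF6": "IL-1",
           "TRAF2": "TNF", "CASP8": "TNF", "FADD": "TNF"}

# Hypothetical edges retained in the refined PPIN at two time stages.
stage_edges = {
    "0-1 h": [("IRAK1", "TRAF6"), ("IRAK1", "TRAF2"),
              ("TRAF2", "CASP8"), ("CASP8", "FADD")],
    "1-2 h": [("IRAK1", "TRAF2"), ("TRAF6", "TRAF2"), ("CASP8", "FADD")],
}


def link_and_crosstalk(stage_edges, pathway):
    """Sum, over all time stages, every edge (link) and every cross-pathway edge."""
    link = {protein: 0 for protein in pathway}
    cross = {protein: 0 for protein in pathway}
    for edges in stage_edges.values():
        for a, b in edges:
            link[a] += 1
            link[b] += 1
            # Assumed cross-talk rule: the two partners sit in different pathways.
            if pathway[a] != pathway[b]:
                cross[a] += 1
                cross[b] += 1
    return link, cross


link, cross = link_and_crosstalk(stage_edges, pathway)
for protein in sorted(pathway, key=lambda p: -cross[p]):
    print(protein, "cross-talk:", cross[protein], "link:", link[protein])
```

In this toy network the hypothetical CASP8 node has several links but no cross-pathway partner, which mirrors the kind of contrast the table note describes for highly connected nodes with low CTRVs.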
TABLE 4.4 Proteins with high cross-talk ranking values at different time stages of inflammation [5].

| Duration (h) | Top-ranked proteins |
|--------------|---------------------|
| 0–1 | IRAK1, NIK, IKKα, TRAF6, IKKβ |
| 1–2 | TRAF6, IRAK1, NIK, A20, TRAF2 |
| 2–3 | IRAK1, TRAF6, NIK, A20, TRAF2 |
| 3–4 | IRAK1, NIK, TRAF6, A20, IKKγ |
| 4–6 | IRAK1, TRAF6, NIK, TRAF2, A20 |
| 6–8 | IRAK1, NIK, TRAF6, IKKα, A20 |
In contrast with Table 4.2, the proteins with maximal CTRVs are listed here. Among these highly ranked proteins, NIK and A20 are identified as two core elements of the signaling network by the CTRV analysis, but their importance is not apparent in Table 4.2. This shows that the proposed method provides new insight into the evaluation of cross-talk candidates.
IRAK1 activation constitutes an essential signaling module in both IL-1R and TLR signal transduction. Binding to myeloid differentiation primary response protein (MyD88, ranked no. 5) brings IRAK1 and IRAK4 together at the receptor complex and facilitates the phosphorylation of IRAK1. Downstream of MyD88, the critical roles of IRAK and TRAF6 in the related pathways have also been confirmed in knockout studies. Cells from IRAK-deficient mice have been shown to be defective in their response to IL-1 and IL-18 [111]. Tumor necrosis factor receptor-associated factor 6 (TRAF6, ranked no. 2) is a pivotal signaling molecule that regulates a diverse array of physiological processes, including adaptive immunity, innate immunity, and the development of several tissues. It is also essential for signaling downstream of the IL-1R/TLR superfamily [112]. The important biological role of TRAF6 in the IL-1R/TLR signaling pathway has been demonstrated by the targeted deletion of TRAF6 [113,114]. MAP kinase kinase kinase 14 (NIK, ranked no. 3) is an essential member of the MAPKKK family that may either directly or indirectly phosphorylate or activate IKKα/β, leading to the phosphorylation and degradation of IκBα followed by NF-κB activation [115]. NIK is also a common mediator of NF-κB activation by the TNF receptor family and has been shown to act downstream of TRAF-associating receptor signaling pathways, including TNFR, CD40, CD30, and LTβR [116,117].
Tumor necrosis factor, alpha-induced protein 3 (TNFAIP3/A20, ranked no. 4) is a protein that is induced in many cell types by a wide range of stimuli [118]. Although A20 was originally identified as an inhibitor of TNF-induced apoptosis [119], it has been most intensively studied as an inhibitor of NF-κB activation. Studies based on A20-deficient mice and RNA interference technologies have revealed the essential role of A20 in a variety of pathogen- and cytokine-induced signaling pathways. The fact that mice lacking A20 are born at normal Mendelian ratios but die shortly after birth owing to massive multiorgan inflammation indicates a key role for A20 in the homeostasis of the host [120]. There are still many highly ranked proteins with important roles in signal transduction, such as MyD88 and SIGIRR; however, it is not our intention to examine them exhaustively one by one. Instead, the global properties and robustness of the network architecture will be investigated in the sequel.
4.4 Discussion
In this chapter, the candidate PPINs are first constructed from the selected proteins of interest and the database information on PPIs. Eq. (4.1) is used to describe mathematically the relationship between the target protein and its possibly interacting proteins in the candidate PPIN. For each protein in the candidate PPIN, the possible interactions are established via Eq. (4.1). Next, with the help of microarray data, the interaction parameters in Eq. (4.1) are identified using the constrained least squares parameter estimation method; that is, every interaction should be confirmed by the real microarray data. Finally, the Akaike information criterion (AIC) in Eq. (3.14), used as a system order detection scheme, identifies the insignificant interaction parameters $b_{pq}$ in Eq. (4.1), and the initial candidate PPIN is pruned into the refined PPIN by deleting the insignificant interactions that fall outside the detected system order. The microarray profile of 25 time points is divided into six time stages of inflammation, and six PPINs are constructed from these six subsets of microarray data. Some dynamic characteristics and structures of the reconstructed PPINs of the inflammatory system are discussed in the following sections.
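The pruning step summarized above can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the commonly used form AIC(k) = log(RSS_k / N) + 2k / N, which may differ in detail from Eq. (3.14), and it ranks candidate interactions by the magnitude of their least squares estimates before searching over the model order k. The function name `aic_prune` and the synthetic regression data are hypothetical.

```python
import numpy as np


def aic_prune(Psi, Y):
    """Return the column indices kept by a simple AIC order search (illustrative)."""
    n_samples, n_candidates = Psi.shape
    # Rank candidate interactions by the magnitude of their full least squares estimate.
    full_fit, *_ = np.linalg.lstsq(Psi, Y, rcond=None)
    ranked = np.argsort(-np.abs(full_fit))
    best_aic, best_k = np.inf, 0
    for k in range(1, n_candidates + 1):
        cols = ranked[:k]
        coef, *_ = np.linalg.lstsq(Psi[:, cols], Y, rcond=None)
        rss = float(np.sum((Y - Psi[:, cols] @ coef) ** 2))
        # Assumed AIC form: log residual variance plus a penalty on the model order.
        aic = np.log(rss / n_samples + 1e-12) + 2.0 * k / n_samples
        if aic < best_aic:
            best_aic, best_k = aic, k
    return sorted(int(j) for j in ranked[:best_k])


# Synthetic example: six candidate interactions, only columns 0 and 3 are real.
rng = np.random.default_rng(0)
Psi_demo = rng.normal(size=(40, 6))
true_b = np.array([2.0, 0.0, 0.0, -1.5, 0.0, 0.0])
Y_demo = Psi_demo @ true_b + 0.05 * rng.normal(size=40)

kept = aic_prune(Psi_demo, Y_demo)
print("interactions kept:", kept)
```

Interactions whose indices are not returned would be deleted from the candidate network. Ranking by coefficient magnitude is a simplification chosen here for brevity; it presumes regressors of comparable scale.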
4.4.1 Dynamic progression of the protein–protein interaction networks
To determine the dynamic progression of the PPINs at different time stages of inflammation, the time-series PPINs from 0 to 3 h are presented in Fig. 4.6, in which the positions of the nodes are rearranged according to their approximate upstream/downstream relations. The complete progression of the PPINs from 0 to 8 h is presented in Fig. 4.A1 (Section 4.6).
In the first hour, a very obvious signal cascade passes through TNFR1, TRADD, TRAF2, RIP, NIK, and the TAK1/TAB1/TAB2 complex and finally reaches the IKK complex (see Fig. 4.6, 0–1 h, red edges). This seems to be the most rapid route to activate NF-κB transcription upon TNFα induction. This well-characterized pathway contains NF-κB, JNK, p42/p44 MAPK, and p38 MAPK [75]. Interestingly, NIK seems to be an important mediator between RIP and the IKK family in our network diagram, although it has not previously been considered part of TNF-induced NF-κB activation. However, Yin et al. found NIK to be required for NF-κB activation by LTβR [121]. In addition, MAP/ERK kinase kinase 3 (MEKK3) is also involved in this pathway through RIP. Gene deletion studies have indicated that MEKK3 is required for IKK activation and functions downstream of RIP in TNF-induced NF-κB activation [122].
In the second hour, the IL-1R/TLR4 signaling pathways are turned on after the TNFR-related pathway (see Fig. 4.6, 1–2 h, red edges). This observation suggests autocrine signaling, in which a cell secretes a hormone or chemical messenger that binds to autocrine receptors on the same cell type, leading to changes in the cells. At this time point, the connections in the TNFR signaling pathway become fewer than those in the first hour and even display an inhibitory effect on this signaling pathway. The reason for this rearrangement of protein interactions might be that the cell's inflammatory mechanism tends to focus on specific mechanisms that largely share common components to fight against the pathogens, rather than distributing resources across overextended signaling pathways.
In the third hour, the TNFR-related pathway is triggered again. Several negative regulators, such as SOCS1 [inhibitor of TIRAP (MAL) and p56], IRAK3 (inhibitor of IRAK1 and IRAK4), and A20 (inhibitor of TRAF6), are highly expressed at this stage, reflecting the inhibitory effect of anti-inflammation (see Fig. 4.6, 2–3 h, purple edges). Apart from these three PPINs, the remaining three networks of the late stages of immunity (3–4, 4–6, and 6–8 h) appear to be close to a steady state without large perturbations [see Fig. 4.A1 (Section 4.6)].
FIGURE 4.6 Dynamic progression of PPINs for HUVEC under TNFα stress [5]. HUVEC, Human umbilical vein endothelial cells; PPINs, protein–protein interaction networks; TNF, tumor necrosis factor.
4.4.2 Specific architecture in the signaling transduction network
A cell's behavior is a consequence of the complex network of interactions between its numerous constituents, such as DNA, RNA, proteins, and small molecules. Cells use signaling pathways and regulatory mechanisms to coordinate multiple cellular processes, allowing them to respond and adapt to an ever-changing environment. When environmental pathogens invade, the human inflammatory system must rapidly mount an appropriate response to eliminate or moderate the lethal factors without unnecessary waste. Fig. 4.7 displays the bow-tie structure extracted from the PPINs in the inflammatory system. The core elements in the bow-tie structure of the human inflammatory system are the highly ranked proteins according to the CTRV ranking algorithm. Because the different receptors on the cell membrane recognize only their specific pathogen-associated molecular patterns (PAMPs), these receptors need to form various cellular functional modules through complex and dimer/trimer assembly to represent the different pathogen signals. These various types of pathogen signal then converge on a common cross talk, that is, the core elements of the bow-tie structure. The cross talks may therefore be considered a robust and efficient signal processor at the pivotal position of signal transduction, which reorganizes the incoming signals and determines which protective mechanisms or recruitments should be activated. Through the coordination of this signal processor, the inflammatory system can maintain order and balance in the cell and the human body.
Oda and Kitano manually integrated 411 published articles and presented a comprehensive map of the TLR and IL-1R signaling transduction networks under different stimulant conditions [123]. This map illustrates the possible existence of a main signaling network subsystem with a bow-tie structure in which MyD88 is a nonredundant core element, two collateral subsystems with small GTPase and phosphatidylinositol signaling, and a MyD88-independent signaling pathway [123]. A comparison of the proposed ranking results with their signaling network reveals that the top-ranked proteins in this study, such as IRAK1 (ranked no. 1), TRAF6 (ranked no. 2), and MyD88 (ranked no. 5), have also been considered to play pivotal roles in the bow-tie core process of the inflammatory system.
FIGURE 4.7 Bow-tie structure under TNFα stress for multiple signaling pathways in the inflammatory system. A specific architecture of the TNFα-induced endothelial inflammatory system is extracted from the PPIN, in which the core elements of the bow-tie structure are identified via the CTRV ranking algorithm. Upon ligand binding, various types of receptors on the cell membrane trigger different signaling pathways and activate the corresponding downstream transcription factors such as NF-κB. NF-κB then regulates the expression of genes involved in inflammatory responses. This gene expression induces particular biological mechanisms that help the host defend against the invading microorganisms. In addition, the translation of cytokines and of some negative regulatory proteins provides feedback control to coordinate the balance in immunity [5]. CTRVs, Cross-talk ranking values; NF-κB, nuclear factor kappa-B; PPIN, protein–protein interaction network; TNF, tumor necrosis factor.
Specifically, the core process mediates various types of stimulant signals from the environment and triggers the downstream activation of NF-κB and the MAPK signal cascade, leading to the induction of many target genes such as cytokines. However, the map of Oda and Kitano reflects only the possible static connections, without stimulus-specific responses or temporal changes. In contrast, we have integrated the gene expression patterns from time-course microarray data to infer the dynamic PPIs in the PPINs of the inflammatory system. Consequently, our results may suggest a more significant and realistic bow-tie core network under a specific stimulus.
In Fig. 4.6, the time-series refined PPINs under TNFα stress from 0 to 3 h are presented to monitor the dynamic properties of interaction progression. The positions of the nodes are rearranged according to the approximate upstream/downstream relationships of the proteins. General signaling proteins are shown as elliptic nodes, square nodes represent receptor proteins, and diamond nodes represent possible negative regulator proteins. Dashed gray lines represent the interactions related to negative regulators. The level of gene expression is indicated by the node color: red means that the gene expression at that time is higher than without TNFα treatment, and green means that it is lower than without TNFα treatment. The complete time-series diagrams from 0 to 8 h are presented in Fig. 4.A1 (Section 4.6).
4.4.3 Possible existence of Toll-like receptor 4 endogenous ligand
Although the main focus of this study is the TNFRs, which differ from the TLRs studied by Oda and Kitano [123], numerous similar functional modules are identified in this chapter. This observation could reveal the characteristics of module community in the inflammatory system and the presence of active feedback signals from cytokines. It has been shown that TNFα-induced cells activate the transcriptional expression of several genes encoding cytokines such as IL-1α, IL-1β, and IL-6 [83]. These autocrine signals can act as positive feedback signals that enhance the inflammatory responses by turning on other correlated inflammatory signaling pathways, such as the IL-1 pathway and the MyD88-dependent signaling pathway of TLR4. Interestingly, in addition to these two signaling pathways, the TNFα-induced HUVEC model also exhibits virtually complete activation of the MyD88-independent signaling pathway of TLR4, which theoretically may not be expected to be involved under treatment with TNFα alone (see Fig. 4.5B). It has been shown that increased expression of and signaling by TLR4 may contribute to the activation of innate immunity of the inflammatory system in the injured myocardium [124]. Because no infection is evident in this model, our observation raises the intriguing possibility that TLR4 may also function during inflammation, possibly in response to an endogenous ligand.
One candidate for this ligand is S100, a multigenic family of nonubiquitous Ca2+-modulated proteins of the EF-hand type expressed exclusively in vertebrates [125]. It has also been demonstrated that primary tumors secrete soluble factors, including VEGF-A, TGFβ, and TNFα, which can induce the expression of S100 in the myeloid and endothelial cells within the lung prior to tumor metastasis [126]. Recently, increased S100A8 and S100A9 levels have also been detected in various human cancers, with abundant expression in neoplastic tumor cells as well as in infiltrating immune cells [127]. The expression and potential cytokine-like function of S100A8/A9 in inflammation and in cancer suggest that it may play a key role in inflammation-associated cancer.
Another candidate is high-mobility group box 1 (HMGB1), which is one kind of damage-associated molecular pattern. HMGB1 is a nuclear protein expressed in nearly all cell types. Under normal conditions, HMGB1 binds to DNA and facilitates gene transcription. Under stress conditions such as injury and infection, HMGB1 is released to promote inflammation [128]. TLR4 has been identified as a receptor of HMGB1, as have TLR2 and RAGE (receptor for advanced glycation end products) [129]. Although the mechanisms that promote the release of HMGB1 and the signaling pathways it activates remain to be fully elucidated, HMGB1 also seems to have several regenerative effects that lead to tissue repair [128,130]. HMGB1 therefore has considerable potential in clinical medicine.
4.4.4 Negative feedback controls of the cross talks among multiple signaling pathways
Inflammation is normally a protective response to destroy, dilute, or isolate an eliciting agent and to promote the repair of injured tissue. However, when inflammation is excessive or persistent, it may lead to tissue injury or organ dysfunction and may contribute to the pathogenesis of disease. For this reason, in the anti-inflammation stage, cells need to make use of negative regulator proteins and cytokines to inhibit the cellular functions of some dominant cross talks and so recover from the dramatic activations of inflammation. These negative feedback controls have also been shown to be related to some TNFα-mediated inflammatory diseases. For example, A20 has been identified as a negative regulator of the core cross-talk element TRAF6 [131], and several studies have shown that deficiencies in A20 are related to some autoimmune diseases, including rheumatoid arthritis [132], systemic lupus erythematosus [133], and Crohn's disease [134]. These clues also reveal the fragility of the bow-tie structure. Because the core elements in the bow-tie structure of the inflammatory system are nonredundant, some targeted perturbations from the environment and lethal pathogens might have destructive consequences.
On the other hand, negative feedback controls may also play an essential role in buffering a wide range of environmental stimuli. Previous studies have indicated that proteasome inhibition can suppress TNFα-induced activation of adhesion molecules in endothelial cells in vitro [135,136]. These studies have shown that short-term treatment of endothelial cells with high doses of proteasome inhibitors might result in strong inhibition of cytokine-induced expression by preventing the nuclear translocation of NF-κB. Cheong et al. have provided an iterative computational and experimental investigation of the dynamic properties of TNFα-mediated activation of the transcription factor NF-κB [137]. They found that the temporal profile of NF-κB activity is invariant to the TNFα dose from 0.1 to 10 ng/mL. These findings reflect the robustness and protective mechanisms of the inflammatory system and the effects of feedback controls on the cross talks in the signaling network. Because environmental stresses trigger appropriate responses that protect the organism, these mechanisms help its survival.
However, if the stresses suddenly rise to an acute level and the inflammatory system still responds in a correspondingly sharp manner, excessively activating the downstream reactions, its protective mechanisms will instead injure the organism. For this reason, there must be some cross talks that buffer the perturbation from upstream signals and respond to the downstream negative feedback regulators to limit the scale of the response in the endothelial inflammatory system.
4.5 Conclusion
In this chapter, we attempt to integrate PPIs from databases and gene expression profiles of TNFα-induced HUVEC to construct the PPINs at different inflammation stages and to illustrate the development of an endothelial inflammatory system. A new cross-talk ranking method is also proposed to evaluate the potential core elements in the signaling pathways of TLR4 as well as of the receptors for TNF (TNFR) and IL-1 (IL-1R). Further, some highly ranked cross-talk pathways that are functionally relevant to the TNFα stress are identified. A bow-tie structure is then extracted from these cross-talk pathways to account for the robustness of the network structure of the inflammatory system, the coordination of signal transduction pathways, and the feedback control regulations that enable efficient inflammatory responses to different stimuli. Several further characteristics, such as the possible existence of a TLR4 endogenous ligand and the effects of negative feedback control on the cross talks, are also discussed.
A systematic approach based on a stochastic dynamic model is proposed to help biologists gain insight into the underlying defense mechanisms of endothelial inflammatory systems by constructing the corresponding signaling networks under a specific stimulus. The dynamic model describes the protective regulatory mechanisms of the inflammatory networks and yields not only a qualitative but also a quantitative dynamic characterization. This systematic model can also be integrated with downstream signaling networks, such as the PPIN of NF-κB and gene regulatory networks [3], to investigate a more comprehensive protective mechanism of inflammation. Further, based on the dynamic model, we can also discuss the robust stability and noise-filtering ability of the biological network against intrinsic fluctuations and environmental disturbances [138]. In addition, this systematic approach can be applied to other PPINs under different conditions in different species. As better experimental techniques for protein expression detection and microarray data with more sampling points become available, the performance of the proposed method should improve, and the dynamic PPINs under different conditions can be compared extensively to investigate their molecular mechanisms with the proposed systems biology method.
4.6 Appendix: Supplementary methods
4.6.1 Identification of the interactive parameters of protein–protein interaction networks
After constructing the stochastic dynamic model of the candidate PPINs from Eq. (4.1), we have to identify the parameters in the dynamic model from the gene expression signatures. Since the interactive parameters in Eq. (4.1) are subject to certain constraints ($\alpha_p, \beta_p \ge 0$), we identify them by solving a constrained least squares problem. Eq. (4.1) can be rewritten in the following regression form:

\[
y_p[t+1] =
\begin{bmatrix} y_p[t] & y_p[t]\,y_1[t] & \cdots & y_p[t]\,y_Q[t] & x_p[t] & -y_p[t] \end{bmatrix}
\begin{bmatrix} 1 \\ b_{p1} \\ \vdots \\ b_{pQ} \\ \alpha_p \\ \beta_p \end{bmatrix}
+ \omega_p[t]
= \psi_p[t]\,\eta_p + \omega_p[t] \tag{4.2}
\]

where $\psi_p[t]$ indicates the regression vector and $\eta_p$ is the parameter vector to be estimated. Using the cubic spline interpolation method to avoid overfitting in the parameter estimation at different time points, Eq. (4.2) can be presented as the following equation [82]:

\[
Y_n = \Psi_n \eta_n + \Omega_n \tag{4.3}
\]
The parameter identification problem is then formulated as follows:

\[
\min_{\eta_n} \; \frac{1}{2} \lVert \Psi_n \eta_n - Y_n \rVert_2^2 \quad \text{such that} \quad C\eta_n \le d \tag{4.4}
\]

where $C = \mathrm{diag}\,[\,0 \;\cdots\; 0 \;\; {-1} \;\; {-1}\,]$ and $d = [\,0 \;\cdots\; 0 \;\; 0\,]^T$ give the constraints that force the translation effect $\alpha_p$ and the degradation effect $\beta_p$ in Eq. (4.1) to be always nonnegative, that is, $\alpha_p, \beta_p \ge 0$. The constrained least squares optimization problem can be solved using the active set method for quadratic programming. Here the gene expression profiles were used to infer the dynamic changes of protein interactions at different time stages of inflammation, because we do not have trustworthy protein microarray data that can measure thousands of protein expression levels simultaneously, as DNA microarrays do for transcripts. If experimental techniques are developed that offer high-throughput protein expression profiles, the identification of the regulatory parameters can be made more reliable. In addition, owing to the lack of protein expression data and the undirected nature of the protein interactions in the candidate PPIN, the values of the interaction parameters $b_{pq}$ are not discussed in depth in this study. Instead, we take a system view to investigate the global properties and the network development at serial time stages of the inflammatory system.
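The regression form in Eq. (4.2) and the constrained problem in Eq. (4.4) can be illustrated with a small sketch. It is not the authors' implementation: it uses projected gradient descent in place of the active set method mentioned above, it moves the fixed unit coefficient of $y_p[t]$ to the left-hand side so that only $b_{pq}$, $\alpha_p$, and $\beta_p$ are estimated, and the function names and toy time series are hypothetical.

```python
# Sketch only: projected gradient descent stands in for the active-set QP solver.
# Assumption: the unit coefficient of y_p[t] is moved to the left-hand side, so
# the unknowns are eta = [b_p1, ..., b_pQ, alpha_p, beta_p].

def build_regression(y_p, partners, x_p):
    """Build the rows of Psi and the entries of Y from equally long time series."""
    Psi, Y = [], []
    for t in range(len(y_p) - 1):
        row = [y_p[t] * y_q[t] for y_q in partners]  # interaction terms (b_pq)
        row.append(x_p[t])                           # translation term (alpha_p)
        row.append(-y_p[t])                          # degradation term (beta_p)
        Psi.append(row)
        Y.append(y_p[t + 1] - y_p[t])
    return Psi, Y


def solve_constrained_ls(Psi, Y, n_iter=5000):
    """Minimize 0.5*||Psi eta - Y||^2 with the last two entries of eta >= 0."""
    n = len(Psi[0])
    # Step size 1/L; L is bounded above by the squared Frobenius norm of Psi.
    L = sum(v * v for row in Psi for v in row) or 1.0
    eta = [0.0] * n
    for _ in range(n_iter):
        resid = [sum(r[j] * eta[j] for j in range(n)) - y for r, y in zip(Psi, Y)]
        grad = [sum(Psi[i][j] * resid[i] for i in range(len(Psi))) for j in range(n)]
        eta = [e - g / L for e, g in zip(eta, grad)]
        # Projection: enforce alpha_p >= 0 and beta_p >= 0 (i.e., C eta <= d).
        eta[-2] = max(eta[-2], 0.0)
        eta[-1] = max(eta[-1], 0.0)
    return eta


# Toy usage with made-up expression values for protein p, one partner, and mRNA x_p.
y_p = [1.0, 1.2, 1.1, 1.4, 1.3, 1.5]
y_1 = [0.5, 0.7, 0.6, 0.9, 0.8, 1.0]
x_p = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0]
Psi, Y = build_regression(y_p, [y_1], x_p)
eta = solve_constrained_ls(Psi, Y)
```

Only the last two parameters are projected onto the nonnegative half-line, mirroring the structure of $C$ and $d$ in Eq. (4.4); the interaction parameters remain unconstrained in sign.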
TABLE 4.A1 Investigation of the tumor necrosis factor α (TNFα)-induced protein–protein interaction network of inflammation [5].
Function | Related proteins | Modules extracted from PPINs | Evidence
NF-κB activation | TNFR1, TRADD, TRAF2, RIP | (diagram) | [89,90]
 | TRAF5 | (diagram) | [91]
 | RIP, MEKK3, IKKβ, TAK1 | (diagram) | [92,93]
IKK activation | RIP, IKKs | (diagram) | [94]
Apoptosis | TRADD, CASP8, RIP, FADD, TRAF2 | (diagram) | [95]
Protein recruitment | RIP, TAK1 | (diagram) | [93]
Protein kinase | TAK1, TAB1, TAB2 | (diagram) | [96]
Modules with significant protein–protein interactions, which form some specific functional complex, are extracted from Fig. 4.4. The references that support these interactions are listed in the evidence column, and the effects of these functional interactions are described in the text. NF-κB, Nuclear factor kappa-B; PPIN, protein–protein interaction network.

TABLE 4.A2 Investigation of the IL-1 and TLR4 protein–protein interaction network of inflammation [5].
Function | Related proteins | Modules extracted from PPINs | Evidence
Pathway adaptor | MyD88, TLR4, IL1R1 | (diagram) | [86]
 | IL1, IL1RI, MyD88, TOLLIP, IRAK1 | (diagram) | [84,85,102,103]
Protein kinase | TRAF6, IRAK1, IRAK4 | (diagram) | [104,105]
 | TRAF6, IRAK1, TAK1, TAB1, TAB2 | (diagram) | [106–109]
IKK activation | TRAF6, IKKs | (diagram) | [68]
Modules with significant protein–protein interactions, which form some specific functional complex, are extracted from Fig. 4.4. The references that support these interactions are listed in the evidence column, and the effects of these functional interactions are described in the text. PPIN, Protein–protein interaction network.

FIGURE 4.A1 The complete and clear time-series diagrams of PPINs of inflammation for HUVEC under TNFα stress from 0 to 8 h [5]. HUVEC, Human umbilical vein endothelial cells; TNF, tumor necrosis factor.

4.6.2 Determination of significant interaction pairs
After the interaction parameters of the dynamic model in Eq. (4.1) have been identified, some of the identified interaction coefficients are still insignificant. To determine whether a protein interaction is significant in the candidate PPIN, a statistical approach based on model order selection is used to evaluate the significance of the model parameters and to prune the candidate PPIN. We employ the AIC in Eq. (3.14) [81,139] to determine the significant interactions in the candidate PPIN. AIC is a model order selection method that attempts to include both the estimated residual variance and model complexity in one statistic. It decreases as the residual variance decreases and increases as the number of parameters increases. As the expected residual variance decreases with increasing parameter numbers for nonadequate model complexities, there should be a minimum near the correct parameter number [81,139]. Therefore we can use AIC to select the model structure based on the interaction parameters ($b_{pq}$) identified previously.
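The trade-off that AIC captures in Section 4.6.2 can be sketched in a few lines. Eq. (3.14) is not reproduced in this appendix, so the sketch assumes the widely used form AIC = log(residual variance) + 2k/N, which may differ in detail from the book's definition; the function names and the candidate residuals are hypothetical.

```python
import math

def aic(residuals, n_params):
    """Assumed form: log of the residual variance plus 2 * n_params / N.

    Residuals must not all be zero, otherwise the logarithm is undefined.
    """
    n = len(residuals)
    variance = sum(r * r for r in residuals) / n
    return math.log(variance) + 2.0 * n_params / n


def select_model(candidates):
    """candidates: list of (label, residuals, n_params); return the min-AIC label."""
    return min(candidates, key=lambda c: aic(c[1], c[2]))[0]


# Toy comparison: adding interactions lowers the residuals, but the gain from
# 3 to 8 parameters is too small to offset the complexity penalty.
candidates = [
    ("1 interaction", [0.5] * 10, 1),
    ("3 interactions", [0.1] * 10, 3),
    ("8 interactions", [0.099] * 10, 8),
]
best = select_model(candidates)
```

In this toy setting the intermediate model attains the minimum, which is the behavior the text describes: the statistic stops improving once extra parameters no longer reduce the residual variance appreciably.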
CHAPTER 5
Prediction of infection-associated genes via a cellular molecular network approach: A Candida albicans infection case study
5.1 Introduction
Candida albicans is the most prevalent opportunistic fungal pathogen in humans, causing superficial infections of the oral and vaginal mucosa as well as life-threatening systemic infections [139]. Since C. albicans is one of the leading causes of hospital-acquired bloodstream infections, many of its virulence factors have been widely studied, including the ability to undergo morphogenesis and phenotypic switching, as well as the secretion of adhesins and hydrolytic enzymes [140–142]. Further, in vitro infection models have been used to demonstrate that the infection process of C. albicans can be separated into three stages: adhesion, invasion, and damage [143,144]. The initial stage of the infection process is characterized by the physical attachment of C. albicans to host tissues. In the second stage, the invasion stage, C. albicans enters host cells by active penetration and induced endocytosis [145,146]. The last stage of infection leads to substantial cell damage and destruction of host tissues [143]. Wächtler et al. have used a systematic approach to examine the contribution of 26 selected genes and their mechanistic roles during the three infection stages of C. albicans [144]. By using specific gene deletion mutants, Wächtler et al. assessed the ability of each mutant to adhere to, invade, and damage host cells during C. albicans infection. Although they successfully determined the extent to which each gene contributes to the adhesion, invasion, and damage phenotypes, the construction of mutant strains is a labor-intensive and time-consuming process.
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00012-2
Therefore we aim at predicting phenotype-associated genes whose cellular functions may be responsible for the phenotypes of the three infection stages, in order to enhance our understanding of C. albicans infection from the cellular molecular network perspective. The understanding of cells or organisms at the systems level has been a recurrent research topic in the biological sciences [147]. Although the study of individual genes and proteins remains important, understanding the structure and dynamics of a biological system has become increasingly necessary. An organism is not simply an assembly of genes and proteins; it also depends on the cellular network properties and the interconnectivity between genes and proteins, which together enable the full functionality of a cell [148,149]. Hence, key research approaches use experimental, statistical, and systematic modeling to investigate these cellular molecular networks and to discover the basic cellular functions of genes and the essential molecular mechanisms involved in various biological phenomena [150]. Many studies have deduced the cellular functions of various proteins using protein–protein interaction (PPI) networks [151–153]. Nevertheless, some protein interaction data in the published literature and databases were obtained under different experimental conditions and seem inappropriate for the specific characterization of the C. albicans–host interaction during the infection process. In previous chapters, a systems biology method has been developed to construct gene regulatory and protein interaction networks based on the integration of omics data [82]. The proposed systems biology method has been shown to be powerful and widely applicable, usable under different conditions and for different species. In this chapter, with the help of high-throughput omics data, network construction schemes will be employed to construct the cellular molecular networks, that is, the gene regulatory and protein interaction networks, involved in C. albicans infection and to predict the infection-associated genes in different infection stages of C. albicans. The underlying principle of phenotype-associated gene prediction using cellular molecular networks is that proteins lying closer to one another in a PPI network are more likely to have similar cellular functions [152].
In addition, genes regulated by the same transcription factors (TFs) tend to have similar cellular functions [154]. Therefore these two concepts are combined and used to predict phenotype-associated genes from the gene regulatory and protein interaction networks, as shown in Fig. 5.1. Based on this principle, experimentally validated genes whose mutations result in defective adhesion, invasion, and damage phenotypes are first collected. Then, genes are predicted as phenotype-associated if they are regulated by TFs similar to those that regulate the experimentally validated genes at the gene level and encode proteins that interact with many experimentally validated proteins at the protein level. In this chapter the phenotype-associated genes responsible for the adhesion, invasion, and damage stages of C. albicans infection are determined, and their roles in C. albicans–host interactions are investigated to clarify the pathogenic mechanism in different infection stages. These results provide insights into the responses of infection-associated genes in C. albicans upon its interaction with the host and suggest potential biomarkers and drug targets that may facilitate the development of new strategies to prevent and control C. albicans infection.
FIGURE 5.1 Schematic diagram for phenotype-associated gene prediction using the cellular molecular network approach. In light of the constructed gene regulatory network and protein interaction network, the predicted phenotype-associated genes are regulated by similar TFs that regulate experimentally validated genes at the gene level, and encode proteins that interact with many experimentally validated proteins at the protein level. The phenotype-associated gene and protein predicted by the cellular molecular network approach are filled with crossed lines in the diagram [7]. TFs, Transcription factors.

5.2 Methods of constructing cellular molecular networks in Candida albicans infection
5.2.1 Method overview and data selection
The method employed to predict C. albicans infection stage-associated genes via cellular molecular networks can be divided into two steps: (1) The first step is to construct C. albicans gene regulatory and protein interaction networks during infection. (2) The second step is to identify, based on the constructed cellular molecular networks, the genes that are regulated by TFs similar to those regulating experimentally validated genes at the gene level and that encode proteins interacting with many experimentally validated proteins at the protein level (Fig. 5.1).

For the first step, cellular molecular network construction, C. albicans TF-gene regulatory associations, C. albicans PPIs, and gene expression profiles during infection are necessary. However, high-throughput screening data such as PPI and ChIP-chip data for C. albicans are currently limited. Because C. albicans and Saccharomyces cerevisiae, the best-studied eukaryotic model organism [155,156], are closely related (both fall within the hemiascomycete group), and because the C. albicans genome sequence is now available, allowing orthologs between the two species to be identified, potential TF-gene regulatory associations and PPIs in C. albicans can be inferred from the corresponding information in S. cerevisiae using ortholog information [4]. Through database mining, regulatory associations between TFs and genes in S. cerevisiae are extracted from the YEASTRACT database (http://www.yeastract.com/) [157]; PPI data in S. cerevisiae are obtained from the Biological General Repository for Interaction Datasets (BioGRID) database (http://thebiogrid.org/) [73]; and ortholog information between C. albicans and S. cerevisiae genes is extracted from the Candida Genome Database (CGD) (http://www.candidagenome.org/) [158]. If there is a regulatory association between TF A and Gene B in S. cerevisiae, and TF A and Gene B have orthologs in C. albicans (TF A′ and Gene B′, respectively), we can infer that TF A′ potentially regulates Gene B′ in C. albicans, that is, a potential TF-gene regulatory association also exists between TF A′ and Gene B′ in C. albicans [4]. Potential PPIs in C. albicans can be inferred in a similar way. The inferred TF-gene associations and the currently available ChIP-chip information for C. albicans obtained by mining the published literature [159–173] contain a large number of false positives and need further pruning with real data. To this end, genome-wide microarray data from Zakikhany et al. [143], who profiled time-course gene expression during an experimental C. albicans infection of reconstructed human oral epithelium (RHE) over 24 h (1, 3, 6, 12, and 24 h postinfection with two to five biological replicates), will be used to prune these false positives. RHE is a three-dimensional organotypic epithelial model of human oral and vaginal mucosa developed by SkinEthic Laboratories (France). As this model expresses all natural major markers of the epithelial basement membrane and of epithelial differentiation, and even possesses tissue repair mechanisms, it is used to mimic in vivo C. albicans infection [174].

For the second step, infection gene prediction at different infection stages, mutant phenotype data from the CGD [158] and the published literature will be used to identify the experimentally validated genes involved in the adhesion, invasion, and damage stages of C. albicans infection. Subsequently, three pools of infection stage-associated genes are created and used as the starting point for phenotype-associated gene prediction in the three stages of C. albicans infection.
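The ortholog-based inference described in Section 5.2.1 amounts to a simple mapping step, which can be sketched as follows. This is a minimal illustration under the assumption of a one-to-one ortholog table; the function names and the identifiers (`TF_A`, `Gene_B`, and their primed counterparts) are hypothetical and do not correspond to actual YEASTRACT, BioGRID, or CGD records.

```python
def infer_regulations(sc_regulations, orthologs):
    """Map S. cerevisiae (TF, gene) pairs to C. albicans via an ortholog table.

    A pair is transferred only if both the TF and the target gene have an ortholog.
    """
    inferred = set()
    for tf, gene in sc_regulations:
        if tf in orthologs and gene in orthologs:
            inferred.add((orthologs[tf], orthologs[gene]))
    return inferred


def infer_ppis(sc_ppis, orthologs):
    """Same idea for PPIs; interactions are undirected, so each is a frozenset."""
    return {
        frozenset((orthologs[a], orthologs[b]))
        for a, b in sc_ppis
        if a in orthologs and b in orthologs
    }


# Toy usage: Gene_C has no ortholog, so its association is not transferred.
orthologs = {"TF_A": "TF_A'", "Gene_B": "Gene_B'"}
sc_regulations = [("TF_A", "Gene_B"), ("TF_A", "Gene_C")]
candidate_grn = infer_regulations(sc_regulations, orthologs)
```

The resulting sets are only candidate associations; as the text notes, they still contain false positives and have to be pruned with the infection time-course expression data.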
5.2.2 Constructing cellular molecular network
The strategy for constructing the cellular molecular network is to build candidate molecular networks based on TF-gene regulatory associations/PPIs under all possible experimental conditions by mining the literature and databases, and then to refine the candidate networks for a specific condition with the help of the corresponding microarray data [82]. In light of all possible TF-gene regulatory associations/PPIs in S. cerevisiae and the ortholog information between C. albicans and S. cerevisiae genes, we can infer potential TF-gene regulatory associations/PPIs in C. albicans [4]. Accordingly, the candidate gene regulatory network of C. albicans can be constructed by linking TFs and genes with potential TF-gene regulatory associations. In the same way, the candidate C. albicans protein interaction network can be constructed by linking proteins that potentially interact with each other. Since the candidate gene regulatory and protein interaction networks are constructed from the literature and from various databases in which experiments were performed under different biological conditions, they may not appropriately represent the specific cellular process of interest during C. albicans infection and may contain a large number of false positives. It is therefore necessary to refine these candidate networks by pruning the false positives using microarray data of RHE infection with C. albicans. In this chapter, dynamic models will be employed to describe the dynamic transcriptional regulations between TFs and their target genes in the candidate gene regulatory network as well as the dynamic interactions between proteins in the candidate protein interaction network [82] (see Appendix for details). With the help of time-course microarray data, the system parameter estimation method and the model order selection scheme, the Akaike Information Criterion (AIC), will then be used to detect real regulations and interactions in the cellular molecular networks for C. albicans infection [40,81,82] (see Appendix for details). As a result, the candidate cellular molecular networks will be refined by pruning false positives, and the gene regulatory and protein interaction networks for C. albicans infection can be constructed.
5.2.3 Predicting infection-associated genes at different infection stages
Based on mutant phenotype information from the CGD [158] and literature evidence, three pools of experimentally validated genes involved in the adhesion, invasion, and damage stages of infection are specified and used as the starting point for C. albicans infection-associated gene prediction in the three stages of infection. Based on the constructed gene regulatory and protein interaction networks in C. albicans infection and the experimentally validated genes from the literature, we aim at finding genes with similar TFs and interacting translated proteins, as shown in Fig. 5.1. For the gene pool of each infection stage, we first identify the significant TFs that regulate the experimentally validated genes [4] and then determine the potential infection stage-associated genes that are regulated by these significant TFs according to the constructed gene regulatory network. Accordingly, the potential infection stage-associated genes are regulated by TFs similar to those that regulate the experimentally validated genes. For each TF in the constructed gene regulatory network, the number of its regulations on the experimentally validated genes is counted, and an empirical P-value is computed to assess whether this TF significantly regulates those genes. The more experimentally validated genes a TF regulates in the constructed gene regulatory network, the more likely it is to be a genuine regulator for the specific infection stage. To determine the empirical P-value for the regulations of a TF, a null distribution is generated by repeatedly permuting the network structure of the candidate gene regulatory network and computing the number of regulations on the experimentally validated genes for each random network structure.
The network structure permutations are performed while keeping the network size constant, that is, the target genes that a particular TF regulates are permuted without changing the total number of TF-gene regulatory associations in the gene regulatory network. The process is repeated 100,000 times, and the empirical P-value for the observed number of regulations is estimated as the fraction of random network structures in which the number of regulations of the specific TF on the experimentally validated genes is at least as large as that in the real network structure [4]. Numbers of regulations with P-value ≤ .05 are considered significant, and the corresponding TFs are identified as the significant TFs for a particular infection stage. Given the significant TFs for each infection stage, the potential infection stage-associated genes are identified as those regulated by most of the significant TFs (P-value ≤ .05). From the constructed protein interaction network for C. albicans infection, we then determine whether the translated proteins of the potential infection-associated genes lie close to the proteins that have been experimentally validated at each infection stage. With an approach similar to the permutation of the gene regulatory network structure, empirical P-values of the numbers of interactions with the experimentally validated proteins are computed for each potential infection-associated protein. The proteins with P-value ≤ .05 are determined to interact with many experimentally validated proteins in the PPI network, and the corresponding genes are predicted as infection-associated genes of each infection stage. Consequently, starting from experimentally validated genes associated with the C. albicans infection stages, we can predict additional genes that may be involved in the pathogenic mechanisms responsible for the adhesion, invasion, and damage phenotypes at the three infection stages.
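The empirical P-value used in Section 5.2.3 can be sketched for a single TF as follows. This is one simplified reading of the permutation scheme, not the authors' code: the TF's number of targets is held fixed and its targets are redrawn uniformly from the gene universe, which preserves the total number of associations. The function name, the gene identifiers, and the default of 10,000 permutations (the text uses 100,000) are assumptions for illustration.

```python
import random

def empirical_p_value(targets, validated, all_genes, n_perm=10000, seed=0):
    """Fraction of random target sets hitting at least as many validated genes.

    targets:   genes regulated by the TF in the real network
    validated: experimentally validated genes of one infection stage
    all_genes: gene universe from which random targets are drawn
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    validated = set(validated)
    targets = set(targets)
    observed = len(targets & validated)
    genes = sorted(all_genes)
    hits = 0
    for _ in range(n_perm):
        # Redraw the TF's targets while keeping its number of targets constant.
        sample = rng.sample(genes, len(targets))
        if sum(g in validated for g in sample) >= observed:
            hits += 1
    return hits / n_perm


# Toy usage: a TF whose 8 targets are all validated genes versus one with none.
genes = ["g%d" % i for i in range(100)]
validated = genes[:10]
p_enriched = empirical_p_value(genes[:8], validated, genes, n_perm=2000)
p_unrelated = empirical_p_value(genes[50:58], validated, genes, n_perm=2000)
```

The same function can be reused at the protein level by passing the interaction partners of a candidate protein as `targets` and the experimentally validated proteins as `validated`; a TF or protein would then be called significant when the returned value is at most .05.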
5.3 Infection-associated genes via cellular molecular network approach
5.3.1 Prediction of Candida albicans infection-associated genes of each infection stage
In this chapter the phenotype-associated gene prediction method is applied to predict C. albicans infection stage-associated genes and to investigate C. albicans–host interactions in the infection process. Based on the microarray data from Zakikhany et al. [143] and database information, the C. albicans gene regulatory and protein interaction networks during infection are constructed [82]. In addition, based on the mutant phenotype information from the CGD and the previous literature, three pools of experimentally validated genes, comprising 55, 43, and 38 genes for the adhesion, invasion, and damage stages of infection, respectively, are collected (Fig. 5.2 and Table 5.A1). According to the constructed cellular molecular networks and the experimentally validated gene pools, 4, 12, and 3 genes are predicted as adhesion-, invasion-, and damage-associated genes, respectively, during C. albicans infection (Fig. 5.3). In the following, these genes are further investigated to reveal the underlying mechanisms of C. albicans–host interactions in the infection process.
FIGURE 5.2 Venn diagram indicating numbers of experimentally validated genes for the three Candida albicans infection stages. The diagram shows the numbers of overlapping and nonoverlapping experimentally validated genes in each infection stage. There are 55, 43, and 38 genes in the adhesion, invasion, and damage stages of C. albicans infection, respectively. The complete lists of experimentally validated genes are shown in Table 5.A1 [7].

FIGURE 5.3 The predicted Candida albicans infection-associated genes of three infection stages. The figure demonstrates this study's predicted C. albicans infection-associated genes of each infection stage using a cellular network approach. The green, red, and blue circles indicate the adhesion, invasion, and damage stages of C. albicans infection, respectively. There are 4, 12, and 3 genes in each respective stage. The gene names are as listed in the CGD database. doi:10.1371/journal.pone.0035339.g003 [7]. CGD, Candida Genome Database.

5.3.2 Investigation of Candida albicans adhesion stage-associated genes
The initial contact of C. albicans with host tissues characterizes the first step of C. albicans infection. It is a critical step for the establishment of mucosal infection, since physical contact with the host cells is sufficient to trigger C. albicans hyphal growth and biofilm development, facilitating invasion and damage of host cells [142,234]. At the beginning, adhesion is mediated by the interaction between the fungal cell wall and the surface of host cells. As the composition of the C. albicans cell surface changes continually, especially during the yeast-to-hyphal transition, adhesion is expected to be a multifactorial process [235]. Several proteins have been identified as being involved in the adhesion process (Table 5.A1). These proteins include the agglutinin-like sequence (Als) family (e.g., Als3), hypha-associated proteins (Hwp1), cell wall-associated proteins (Eap1, Ecm33, Mp65, and Phr1), secreted proteins (Sap10), and some internal proteins (Rsr1, Big1) [142,146,235,236]. Although the deletion of these genes can decrease the adhesion of C. albicans to host cells, the molecular mechanisms by which these genes/proteins mediate adhesion are only partially understood. Moreover, it is still difficult to tell whether these genes/proteins mediate adhesion directly or indirectly, because many of them have complex cellular functions. In this chapter, four genes (CHS2, orf19.5627, SCS7, and UBI4) are identified as adhesion stage-associated genes during C. albicans infection (Fig. 5.3). CHS2 encodes one of the four chitin synthases in C. albicans, which catalyze the synthesis of chitin. Chitin is an important structural polysaccharide in the fungal cell wall that is required for cell shape and morphogenesis [237]. Since adhesion is mediated by the interaction between the C. albicans cell wall and host cells, it is reasonable to speculate that the synthesis of cell wall components plays a role in the adhesion process. In fact, it has been shown that the inhibition of chitin synthase activity can reduce the adhesion of C. albicans to epithelial cells [238].
Therefore the mutation of CHS2 may impair cell wall construction and thus influence C. albicans adhesion to host cells. In addition to chitin, other cell wall components may also affect C. albicans adhesion. Tsai et al. have recently found that the human antimicrobial peptide LL-37 can reduce C. albicans infectivity by inhibiting its adhesion [239]. Moreover, the inhibitory effects of LL-37 on cell adhesion have been found to be mediated through its interaction with cell wall carbohydrates. As a result, cell wall components may be selected as potential therapeutic targets for the prevention of C. albicans colonization and infection. SCS7 encodes a putative ceramide hydroxylase, which is involved in sphingolipid biosynthesis [240]. Sphingolipids are a class of important membrane lipid components that have been found to play critical roles in the regulation of several pathobiological processes [241]. However, the roles of sphingolipids in fungal infections have not been well characterized, since the biological functions of fungal sphingolipids have been studied almost exclusively in nonpathogenic fungi such as S. cerevisiae [241]. Although further studies are needed, it has been shown that the disruption of sphingolipid synthesis can reduce C. albicans adhesion [200]. Therefore SCS7 may also play a role in cell adhesion. In addition, several studies have indicated that SCS7 is downregulated by iron deprivation [240,242]. As a result, iron deficiency may affect the remodeling of membrane lipids and sphingolipid homeostasis [240], thereby influencing C. albicans infections. The UBI4 gene encodes polyubiquitin, a ubiquitin precursor protein. Ubiquitination, the addition of ubiquitin to a protein substrate, is a fundamental regulatory posttranslational modification. Combining molecular, cellular, and proteomic approaches, Leach et al. have suggested that ubiquitination contributes to the regulation of several key cellular processes in C. albicans, including cell cycle progression, morphogenesis, stress adaptation, and metabolic reprogramming [243]. Further, it has also been shown that UBI4 inactivation attenuates the virulence of C. albicans [243], highlighting the significance of UBI4 during C. albicans infection. Specifically, UBI4 mutants display a higher sensitivity to cell wall stress, such as antifungal drugs targeting chitin and glucan biosynthesis, indicating that ubiquitination can influence cell wall remodeling [243]. Consequently, the cell adhesion of C. albicans may be indirectly regulated by ubiquitination. orf19.5627 is an uncharacterized gene of unknown function. Further research is necessary to examine its relation to cell adhesion.
5.3.3 Investigation of Candida albicans invasion stage-associated genes
During infection, C. albicans can use two distinct pathogenic mechanisms to invade host cells: induced endocytosis and active penetration [142,146,235]. Since yeast cells do not appear to induce their own uptake into host cells and apparently do not penetrate host cells [145], morphogenesis, or the yeast-to-hyphal transition, is a critical contributor to C. albicans invasion. Induced endocytosis is the process by which a fungal invasion protein interacts with a host surface protein, triggering pseudopod formation and fungal engulfment into the host cell [235]. Two invasion proteins, Als3 and Hsp70, have been identified in C. albicans [235]. The endocytosis process is host actin-dependent and host-driven, since even killed fungal hyphae can be endocytosed [142,145]. Unlike induced endocytosis, active penetration, the other pathogenic mechanism responsible for C. albicans invasion, is fungal-driven and results in hyphal penetration either directly into host cells or at intercellular junctions [142,235]. However, the active penetration process is not well studied, and it is still unclear which particular fungal proteins contribute to it. Further, C. albicans invasion also depends on the host cell type: while invasion into oral cells occurs via both pathogenic mechanisms, invasion into intestinal cells occurs only via active penetration [145]. By the cellular molecular network approach, a total of 12 genes are predicted as C. albicans invasion stage-associated genes, as shown in Fig. 5.3. CHS2, SCS7, and UBI4 are predicted as both adhesion and invasion stage-associated genes, and SCS7 and UBI4 are also identified as damage stage-associated genes. The mutation of CHS2 has been found to have a significant effect on the chitin content of hyphal cells but not of yeast cells [244], indicating that CHS2 may be involved in hyphal growth. In addition, the blockage of sphingolipid biosynthesis can lead to abnormal hyphal morphogenesis [245], and ubiquitination is suggested to regulate morphogenesis in C. albicans [243]. Since hyphal growth is crucial for C. albicans invasion, CHS2, SCS7, and UBI4 may contribute to the invasion process. CDC20 encodes a protein that is required for the metaphase-to-anaphase transition and mitotic exit during the cell cycle of C. albicans. Depletion of Cdc20 in a mutant strain results in highly polarized growth of yeast buds under yeast growth conditions but has no influence on serum-induced hyphal growth [246]. Further studies are needed to investigate the association between Cdc20-mediated cell cycle progression and cell invasion. HSL1 encodes a protein kinase that has been shown to play a role in the suppression of cell elongation. HSL1 knockout has been shown to produce an elongated cell phenotype in both yeast- and hyphae-inducing media [247]. We may thus speculate that the mutation of HSL1 promotes cell invasion during infection because of the enhanced hyphal growth. SMI1 encodes a regulator of glucan synthesis. Mutation of SMI1 can affect biofilm matrix and cell wall β-1,3-glucan production [248], which can in turn influence biofilm-associated drug resistance mechanisms. Kitamura et al. have revealed that inhibitors of β-1,6-glucan synthesis can reduce hyphal elongation during the C. albicans invasion process [249], suggesting a role of SMI1 in C. albicans invasion. SSN6 encodes a putative transcriptional regulator.
The deletion of SSN6 could result in the defective hyphal development, while the overexpression of SSN6 could lead to an enhanced filamentous growth [250], suggesting that SSN6 may regulate filamentous growth and thus may regulate cell invasion. GγP1, IRA2, MTO1, PHO23, and orf19.6883 are all uncharacterized genes being predicted as C. albicans invasion stage-associated genes in this study. Although their cellular functions need to be further characterized, it seems that they could contribute to host cell invasion either directly or indirectly.
5.3.4 Investigation of Candida albicans damage stage-associated genes

The last stage of C. albicans infection is characterized by substantial cell damage and destruction of host tissues. However, the molecular mechanism by which C. albicans induces host cell damage is poorly understood. It was originally thought that host cell invasion itself induces cell damage. Nevertheless, experiments have shown that some C. albicans mutant strains with normal adhesion and endocytosis are still unable to induce cell damage [144]. Therefore active penetration by fungal hyphae appears to be more essential for the induction of cell damage, although this hypothesis remains to be validated by experiment. Numerous factors, such as hyphal formation and secreted lytic enzymes [146,235], have been indicated as contributors to host cell damage, but additional factors are likely required for tissue destruction, and these have not yet been identified.

SSN6, SCS7, and UBI4 are predicted as C. albicans damage stage-associated genes in this study, as shown in Fig. 5.3. Since SSN6 encodes a putative transcriptional regulator, it may affect cell damage indirectly by regulating damage-associated genes. SCS7 and UBI4 are identified as genes associated with adhesion, invasion, and damage. Although there is no clear evidence of how sphingolipid biosynthesis and ubiquitination induce host cell damage during C. albicans infection, these two cellular processes are worth further investigation.
5.4 Discussion and conclusion

Candida albicans infection has emerged as a significant cause of mortality in humans. Although several attributes have been associated with C. albicans pathogenesis, infection is a complex process and the understanding of the underlying pathogenic mechanism remains relatively limited. Because of the complexity of the host-pathogen interaction during infection, systems biology approaches are well suited to investigating the pathogenic mechanism of C. albicans. Unlike traditional biological research, which has focused intensely on the individual components (genes/proteins) involved in biological processes, the systems biology approach addresses biological phenomena from the systems perspective [251,252]. By integrating high-throughput omics data, systematic models can be constructed to describe the interactions between the biological components of a complex system. Through model development and system analysis, critical components of a system, such as the complex host-pathogen system, can be discovered. We first developed a network comparison framework and identified 23 potential TFs controlling C. albicans biofilm formation [4]. Further experiments have shown that mutations in some of the identified genes alter biofilm formation, validating the proposed systems biology method.

In this chapter, a cellular molecular network approach is proposed to predict phenotype-associated genes responsible for the three stages of C. albicans infection. The approach rests on two principles: proteins that are closer to one another in the PPI network are more likely to have similar functions, and genes regulated by the same TFs tend to have similar functions. A total of 4, 12, and 3 genes are predicted as adhesion-, invasion-, and damage-associated genes, respectively.

Based on these results, the predicted adhesion stage-associated genes all contribute to the regulation or synthesis of cell surface components. Therefore we conclude that cell surface components and their related proteins might play pivotal roles in cell adhesion. In fact, the targets of existing antifungal agents are mainly located on the cell surface [253]. Consequently, the newly predicted genes might become promising therapeutic candidates or provide guidance for the discovery of novel therapies.

In light of the predicted invasion stage-associated genes, morphogenesis has emerged as a critical feature of cell invasion. All predicted genes are associated with morphogenesis directly or indirectly. Even though morphogenesis is not necessarily a pathogenicity determinant, it is clearly a key characteristic of host cell invasion. Therefore signaling pathways that transduce hyphae-inducing signals and regulate downstream TFs may be potential candidates for characterizing cell invasion mechanisms in C. albicans infection. Further, targeted inhibition of the yeast-to-hyphal transition may be an attractive option for controlling C. albicans infection [254].

Unlike the adhesion and invasion processes, whose molecular mechanisms are partially understood, little is known about the damage mechanism during C. albicans infection. If more genes can be experimentally characterized, more damage stage-associated genes could be predicted, and the damage mechanism could then be better inferred.

Even though the pathogenic mechanisms in the three stages of C. albicans infection have distinctive characteristics, these processes are not necessarily mutually exclusive and are likely to involve significant overlap in cellular functions [235]. Based on the experimentally validated and predicted genes examined in this study (Figs. 5.2 and 5.3, and Table 5.A1), several genes contribute to more than one stage of C. albicans infection. A previous study has indicated that the core elements of the cAMP-PKA pathway are required for all stages of infection [144]. Our results further suggest that SCS7 and UBI4 may also be involved in the cAMP-PKA pathway, since they are predicted to be important in all three stages of C. albicans infection. On the other hand, some genes are predicted or verified as stage-specific genes, that is, they contribute to only one stage of C. albicans infection. These stage-specific genes can be used to identify infectious factors that are specifically involved in a certain infection stage.

The cellular molecular network approach to predicting phenotype-associated genes is not only useful for the investigation of C. albicans infection; it can also be employed to predict infection-associated genes under different experimental conditions and in different organisms, provided the required data are available. While the approach has been shown to be useful, some improvements can be made. An apparent limitation is that some data may have to be taken from organisms other than the organism of interest. In this chapter, because of a lack of sufficient information on C. albicans PPIs, such information is inferred entirely from the model organism S. cerevisiae with the help of ortholog mapping data. Comparative genomics with S. cerevisiae provides an alternative way to infer potential PPIs in C. albicans, but using information derived from S. cerevisiae may lead to misinterpretation. Recently, many studies have focused on the development of PPI assay tools in C. albicans. For instance, Boysen et al. have developed a vesicle targeting method to detect PPIs in C. albicans [255], and Stynen et al. have constructed a functional C. albicans two-hybrid system [256]. With these tools available, we can expect PPIs to be screened on a large scale in the near future. Once reliable information on C. albicans PPIs becomes available, more accurate cellular molecular networks can be constructed, more significant phenotype-associated genes can be predicted by the proposed systems biology method, and the defensive and invasive molecular mechanisms of the three C. albicans infection stages can be investigated further.

This chapter has focused more heavily on the pathogen side of the host-pathogen interactions during C. albicans infection. Nevertheless, to better comprehend the interactions between the pathogen and the host, it is crucial to understand the host defense processes that combat the pathogen, especially immune responses. If the defensive and invasive mechanisms of such host-pathogen interactions could be elucidated simultaneously, the resulting comprehensive knowledge would be useful in identifying biomarkers for drug targets of novel therapies to treat C. albicans infection. These will be the topics of the following chapter.
5.5 Appendix

5.5.1 Methods of cellular molecular network construction

For a target gene i in the candidate gene regulatory network, the dynamic transcriptional regulatory model of the gene expression is described by the following equation:

$$x_i[t+1] = x_i[t] + \sum_{j=1}^{N_i} a_{ij} z_j[t] - \lambda_i x_i[t] + k_i + \varepsilon_i[t] \tag{5.A1}$$

where $x_i[t]$ represents the gene expression level at time t for target gene i, $a_{ij}$ denotes the regulatory ability of the jth transcription factor (TF) toward the ith target gene, $z_j[t]$ represents the regulation function of the jth TF, $\lambda_i$ indicates the mRNA degradation effect, $k_i$ represents the basal level, and $\varepsilon_i[t]$ denotes the stochastic noise due to the model uncertainty and the measurement noise of microarray data. The regulation function $z_j[t]$ can be modeled as the sigmoid function of $y_j[t]$ (the protein expression of TF j) [257,258]:

$$z_j[t] = f_j\left(y_j[t]\right) = \frac{1}{1 + \exp\left\{-\left(y_j[t] - \mu_j\right)/\sigma_j\right\}} \tag{5.A2}$$

where $f_j$ denotes the sigmoid function, and $\mu_j$ and $\sigma_j$ represent the mean and standard deviation of the protein expression level of TF j, respectively. Similarly, for a target protein n in the candidate protein interaction network, the dynamic model of the protein expression is as follows [82]:

$$y_n[t+1] = y_n[t] + \sum_{m=1}^{M_n} b_{nm}\, y_n[t]\, y_m[t] + \alpha_n x_n[t] - \beta_n y_n[t] + h_n + \omega_n[t] \tag{5.A3}$$

where $y_n[t]$ represents the protein expression level at time t of target protein n, $b_{nm}$ denotes the interaction ability of the mth interactive protein to the nth target protein, $y_m[t]$ represents the protein expression level of the mth protein interacting with target protein n, $\alpha_n$ denotes the translation effect from mRNA to protein, $x_n[t]$ represents the mRNA expression level of the corresponding target protein n, $\beta_n$ indicates the protein degradation effect, $h_n$ represents the basal expression level, and $\omega_n[t]$ is the stochastic noise.

After the dynamic models of the candidate gene regulatory network and the candidate protein interaction network are built, the regulatory/interaction parameters in the models have to be identified with the help of time-course microarray data. The strategy is to identify the network parameters gene by gene (and protein by protein) by solving a constrained least square parameter estimation problem. Eq. (5.A1) can be rewritten in the following regression form:

$$x_i[t+1] = \begin{bmatrix} z_1[t] & \cdots & z_{N_i}[t] & x_i[t] & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iN_i} \\ (1-\lambda_i) \\ k_i \end{bmatrix} + \varepsilon_i[t] \triangleq \varphi_i[t]\,\theta_i + \varepsilon_i[t] \tag{5.A4}$$
where $\varphi_i[t]$ denotes the regression vector that can be obtained from the microarray data, and $\theta_i$ is the parameter vector of the target gene i which is to be estimated. In order to avoid overfitting when identifying the regulatory parameters, the cubic spline method is also used to interpolate extra time points for gene expression data. By the cubic spline method, we can easily get the values of $z_j[t_l]$ and $x_i[t_l]$ for $l \in \{1, 2, \ldots, L\}$ and $j \in \{1, 2, \ldots, N_i\}$, where L is the number of expression time points of a target gene i and $N_i$ is the number of TFs binding to the target gene i. Eq. (5.A4) at different time points can be arranged as follows:

$$\begin{bmatrix} x_i[t_2] \\ x_i[t_3] \\ \vdots \\ x_i[t_L] \end{bmatrix} = \begin{bmatrix} \varphi_i[t_1] \\ \varphi_i[t_2] \\ \vdots \\ \varphi_i[t_{L-1}] \end{bmatrix} \theta_i + \begin{bmatrix} \varepsilon_i[t_1] \\ \varepsilon_i[t_2] \\ \vdots \\ \varepsilon_i[t_{L-1}] \end{bmatrix} \tag{5.A5}$$

For simplicity the notations $X_i$, $\Phi_i$, and $E_i$ are defined to represent Eq. (5.A5) as follows:

$$X_i = \Phi_i \theta_i + E_i \tag{5.A6}$$
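As a rough illustration of Eqs. (5.A4)-(5.A6), the sketch below assembles the regression matrix $\Phi_i$ and the target vector $X_i$ for a single gene from an expression time series. It assumes that TF mRNA profiles stand in for protein levels and that the population standard deviation is used for $\sigma_j$, and it omits the cubic-spline interpolation step. The function names (`sigmoid_regulation`, `build_regression`) and the toy numbers are hypothetical and not part of the original method description.

```python
import math
from statistics import mean, pstdev


def sigmoid_regulation(y):
    """Eq. (5.A2): map a TF profile y[t] to z[t] using its own mean and std."""
    mu = mean(y)
    sigma = pstdev(y) or 1.0  # assumption: fall back to 1.0 for a flat profile
    return [1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in y]


def build_regression(x_target, tf_profiles):
    """Return (Phi, X) of Eq. (5.A6) for one target gene.

    Row t of Phi is phi_i[t] = [z_1[t], ..., z_N[t], x_i[t], 1] and the
    matching entry of X is x_i[t+1], as in Eqs. (5.A4) and (5.A5).
    """
    z = [sigmoid_regulation(y) for y in tf_profiles]
    n_rows = len(x_target) - 1  # L time points give L-1 equations
    phi = [[zj[t] for zj in z] + [x_target[t], 1.0] for t in range(n_rows)]
    return phi, x_target[1:]


# Toy data: one target gene, two TFs, five time points (made-up values).
x_i = [1.0, 1.4, 1.9, 2.1, 2.0]
tfs = [[0.2, 0.5, 0.9, 1.0, 0.8], [1.0, 0.8, 0.6, 0.5, 0.4]]
Phi, X = build_regression(x_i, tfs)
print(len(Phi), len(Phi[0]), X)
```

Each row of `Phi` has $N_i + 2$ entries, matching the parameter vector $\theta_i = [a_{i1}, \ldots, a_{iN_i}, (1-\lambda_i), k_i]^T$.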
The constrained least square parameter estimation problem is formulated as follows:

$$\min_{\theta_i} \frac{1}{2} \left\| \Phi_i \theta_i - X_i \right\|_2^2 \quad \text{such that } A\theta_i \le b \tag{5.A7}$$

where $A = \begin{bmatrix} 0 & \cdots & 0 & 0 & -1 \end{bmatrix}$ and $b = 0$ give the constraint that forces the basal level $k_i$ in Eq. (5.A1) to be always nonnegative, that is, $k_i \ge 0$. The constrained least square problem can be solved using the active set method for quadratic programming [259]. Similarly, Eq. (5.A3) can be rewritten in the following regression form:

$$y_n[t+1] = \begin{bmatrix} y_n[t]y_1[t] & \cdots & y_n[t]y_{M_n}[t] & x_n[t] & y_n[t] & 1 \end{bmatrix} \begin{bmatrix} b_{n1} \\ \vdots \\ b_{nM_n} \\ \alpha_n \\ (1-\beta_n) \\ h_n \end{bmatrix} + \omega_n[t] \triangleq \psi_n[t]\,\eta_n + \omega_n[t] \tag{5.A8}$$

where $\psi_n[t]$ represents the regression vector and $\eta_n$ is the parameter vector to be estimated. By the cubic spline method, at different time points, Eq. (5.A8) can be presented as the following equation:

$$Y_n = \Psi_n \eta_n + \Omega_n \tag{5.A9}$$

The identification problem is then formulated as follows:

$$\min_{\eta_n} \frac{1}{2} \left\| \Psi_n \eta_n - Y_n \right\|_2^2 \quad \text{such that } C\eta_n \le d \tag{5.A10}$$

where $C = \mathrm{diag}\begin{bmatrix} 0 & \cdots & 0 & -1 & 0 & -1 \end{bmatrix}$ and $d = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T$, indicating that the translation effect $\alpha_n$ and the basal expression level $h_n$ are nonnegative. Since there are no good data available for genome-wide protein expression levels in Candida albicans, mRNA expression profiles are used as a substitute for the protein expression levels when identifying the interaction parameters. If the ratio between mRNA expression and protein expression is a constant, this constant can be absorbed into the parameters of Eq. (5.A3). Therefore mRNA expression can represent protein expression in Eq. (5.A3) in the above parameter identification procedure.

Once the regulatory abilities $a_{ij}$ and interaction abilities $b_{nm}$ are estimated, the Akaike information criterion (AIC) [40,81] is applied to detect the significant regulations and interactions in the candidate gene regulatory network and protein interaction network. The AIC, which includes both the estimated residual error and the model complexity in one statistic, quantifies the relative goodness of fit of a model. For a transcriptional regulatory model with $N_i$ regulatory parameters (or TFs) fitted to data from L samples, the AIC can be written as follows [40,81]:

$$\mathrm{AIC}(N_i) = \log\left( \frac{1}{L} \left( X_i - \hat{X}_i \right)^T \left( X_i - \hat{X}_i \right) \right) + \frac{2N_i}{L} \tag{5.A11}$$

where $\hat{X}_i$ denotes the estimated expression profile of the ith target gene, that is, $\hat{X}_i = \Phi_i \hat{\theta}_i$, and $\hat{\sigma}_i^2 = (1/L)(X_i - \hat{X}_i)^T (X_i - \hat{X}_i)$ is the estimated residual error. As the residual error $\hat{\sigma}_i^2$ decreases, the AIC decreases. In contrast, as the number of regulatory TFs (or parameters) $N_i$ increases, the AIC increases. Therefore there is a trade-off between residual error and model order. As the expected residual error decreases with increasing regulatory TF numbers in models of inadequate complexity, there should be a minimum around the real regulatory TF number. The minimum of Eq. (5.A11) indicates the real model order (i.e., the real number of TFs that regulate the target gene) of the transcriptional regulatory system. With the statistical selection of $N_i$ regulatory TFs by minimization of the AIC, it can be determined whether a regulatory TF is a real one or just a false positive for the ith target gene. In this way the candidate gene regulatory network is refined by pruning the false-positive regulations, and the real gene regulatory network for C. albicans infection is constructed. Similarly, the significant protein interaction network for C. albicans infection can also be constructed with the AIC.

TABLE 5.A1 Experimentally validated genes for the three Candida albicans infection stages.
Gene
orf
Adhesion
Invasion
Damage
References
AHR1
orf19.7381
V
ALS1
orf19.5741
V
ALS2
orf19.1097
V
ALS3
orf19.1816
V
ALS4
orf19.4555
V
[177]
ALS9
orf19.5742
V
[178]
ASC1
orf19.6906
V
BCR1
orf19.723
V
BIG1
orf19.2334
V
[172] V
V
[175,176] V
[177]
V
[144]
V
[179] V
[144] [180]
BMH1
orf19.3014
BUD2
orf19.940
CDC10
orf19.548
V
[182]
CDC11
orf19.5691
V
[182]
CDC14
orf19.4192
V
[183]
CKA2
orf19.3530
V
CSH1
orf19.4477
V
CZF1
orf19.3127
V
DCK1
orf19.815
DEF1
orf19.7561
DFG16
orf19.881
DFI1
orf19.7084
DRG1
orf19.5083
EAP1
orf19.1401
V
ECM33
orf19.3010.1
V
V
V
[144]
EFG1
orf19.610
V
V
V
[144]
FET3
orf19.4211
V
FTR1
orf19.7219
GPA2
orf19.1621
GPD2
orf19.691
GPR1
orf19.1944
GUP1
orf19.4985
HDA1
orf19.2606
HGC1
orf19.6028
V
HIS4
orf19.5639
V
HSP70
orf19.4980
HWP1
orf19.1321
V
HWP2
orf19.3380
V
ICL1
orf19.6844
INT1
orf19.4257
V
[199]
IPT1
orf19.4769
V
[200]
V V
[181] V
V
V
V
[144]
[144] [185]
V
[144]
V
[186]
V
[187]
V
[188] [189]
[190] V V
[191] [192]
V
V
[144]
[184]
V
V
[144]
V
[193]
V
[194]
V
[195]
V
V
[144] [196]
V
V
[197]
V
[144] [198]
V
V
[144]
IRS4
orf19.6953
V
KEX2
orf19.4755
KRE5
orf19.290
LMO1
orf19.5147
MKC1
orf19.7523
MNT1
orf19.1665
V
[205]
MNT2
orf19.1663
V
[205]
MP65
orf19.1779
V
[206]
PDE2
orf19.2972
V
PEP7
orf19.5662
V
[208]
PGA1
orf19.7625
V
[209]
PGA34
orf19.2833
PHR1
orf19.3829
PLD1
orf19.1161
PMT1
orf19.5171
PMT2
orf19.6812
PMT4
orf19.4109
PMT6
orf19.3802
V
PRA1
orf19.3111
V
RAC1
orf19.6237
RAS1
orf19.1760
RHD3
[201] V
[202]
V
[203] V
[204] V
V
[144]
V
[210,211]
V
[212]
V V
V
[213,214]
V
[144]
V
[214]
V
[214,215] [216]
V V
[144]
[207]
V V
V
[144]
orf19.5305
V
[218]
RHR2
orf19.5437
V
[144]
RIM101
orf19.7247
V
[144]
RIM13
orf19.3995
V
[211]
RIM20
orf19.4800
V
[211]
RIM8
orf19.6091
V
[211]
RSR1
orf19.2614
V
V
[144]
SAP1
orf19.5714
V
V
[219,220]
SAP10
orf19.3839
V
V
[221]
SAP2
orf19.3708
V
V
[219,220]
V
V
[217]
V
SAP3
orf19.6001
V
SAP9
orf19.6928
SET1
orf19.6009
SFL2
orf19.3969
V
V
[223,224]
SHE3
orf19.5595
V
V
[225]
SIT1
orf19.2179
V
STE2
orf19.696
V
[227]
SUN41
orf19.3642
V
[228]
SUR7
orf19.3414
TEC1
orf19.5908
V
V
V
[144]
TPK1
orf19.4892
V
V
V
[144]
TPK2
orf19.2277
V
V
V
[144]
TPS1
orf19.6640
V
TUP1
orf19.6109
V
V
UTR2
orf19.1671
V
V
VPS11
orf19.4403
V
V
VPS34
orf19.6243
V
YCK2
orf19.7001
[219] V
V
[221] [222]
[226]
V
[229]
[230] V
[231] V
[144] [232]
V
[144]
[233]
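The AIC-based pruning described by Eq. (5.A11) in the appendix above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the residual vectors are invented, and `aic` and `best_model_order` are hypothetical helper names. In practice each residual vector would come from a constrained least square fit of Eq. (5.A7) with the corresponding set of TFs.

```python
import math


def aic(residuals, n_params):
    """Eq. (5.A11): log of the mean squared residual plus 2*N/L."""
    L = len(residuals)
    sigma2 = sum(r * r for r in residuals) / L  # undefined for a perfect fit
    return math.log(sigma2) + 2.0 * n_params / L


def best_model_order(residuals_by_order):
    """Pick the number of TFs whose fitted model has the smallest AIC.

    residuals_by_order maps N (number of TFs kept) to the residual vector
    X_i - X_hat_i obtained after fitting a model with those N TFs.
    """
    return min(residuals_by_order, key=lambda n: aic(residuals_by_order[n], n))


# Made-up residuals: adding a 2nd TF helps a lot, a 3rd hardly at all.
candidates = {
    1: [0.50, -0.40, 0.45, -0.50, 0.40, -0.45],
    2: [0.10, -0.12, 0.09, -0.11, 0.10, -0.08],
    3: [0.10, -0.11, 0.09, -0.11, 0.10, -0.08],
}
for n, res in candidates.items():
    print(n, round(aic(res, n), 3))
print("selected order:", best_model_order(candidates))
```

The third TF lowers the residual only marginally, so the complexity penalty $2N_i/L$ outweighs the gain and the two-TF model is retained, which is the trade-off the appendix describes.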
CHAPTER 6

Global screening of potential Candida albicans biofilm-related transcription factors by network comparison via big database mining and genome-wide microarray data identification

Systems Immunology and Infection Microbiology. DOI: https://doi.org/10.1016/B978-0-12-816983-4.00016-X

6.1 Introduction

Candida albicans, the most commonly isolated opportunistic human fungal pathogen, can cause skin and mucosal infections as well as life-threatening systemic infections [155,260]. In healthy individuals, C. albicans occurs as a dimorphic commensal colonizer of mucosal membranes in the oral cavity, gastrointestinal tract, urogenital mucosa, and vagina. In immunocompromised patients, including those undergoing cancer chemotherapy or organ or bone marrow transplantation and those with acquired immune deficiency syndrome, this organism can become pathogenic, resulting in proliferative growth on mucosal surfaces locally and systemically [261-263]. Candida infections, or candidiasis, are difficult to treat and pose a very serious challenge in medicine. Mortality rates among patients with candidiasis have been increasing and can be as high as 40%-60%, especially for those who have bloodstream infections (candidemia) [264-266]. Therefore understanding the molecular mechanisms underlying the pathogenicity of C. albicans is imperative for the management of such infections.

Biofilm formation plays an important role in the pathogenicity of C. albicans. For example, biofilms can serve as reservoirs from which cells continually seed infection. Moreover, C. albicans biofilm cells are much more resistant than free-living planktonic cells to many antifungal agents. As a result, the biofilm-specific properties of C. albicans cells have prompted recent interest in the study of biofilm structure, physiology, and regulation, that is, research into the pathogenicity of Candida focusing on the prevention and management of biofilm development and antifungal resistance [264,267]. Biofilms are defined as surface-associated communities of cells surrounded by an extracellular matrix, displaying phenotypic features that differ from their planktonic counterparts [268,269]. The development of a C. albicans biofilm can be divided into four sequential steps. First, the yeast cells adhere to a foreign substrate (host tissue or medical device). Second, the yeast cells proliferate across the substrate surface, and pseudohyphae and hyphae begin to develop. Third, the extracellular matrix is produced and the network of pseudohyphal and hyphal cells is embedded within this matrix; the biofilm then matures into a complex three-dimensional structure. Finally, the progeny biofilm cells disperse to populate remote surfaces [264,267,268]. Although previous studies have provided some insights, the details of the molecular mechanisms responsible for biofilm formation remain to be elucidated.

Recently, the C. albicans genome for strain SC5314 was sequenced [155], revealing that almost two-thirds of its ~6000 open reading frames (ORFs) are orthologous to genes of Saccharomyces cerevisiae, a well-studied model organism and the first eukaryotic organism to have its entire genome sequenced [260,262,270]. In addition, the ease of genetic/molecular manipulation and the development of various tools for genome-wide functional analysis have led to the accumulation of a large amount of data from the study of S. cerevisiae. Since C. albicans and S. cerevisiae are closely related, that is, both fall within the hemiascomycete group, information from S. cerevisiae can be adapted to improve our understanding of C. albicans biology and pathogenesis [260,271]. We are investigating the underlying molecular mechanisms responsible for biofilm formation in C. albicans. Specifically, we aim to unravel what distinguishes biofilm cells from planktonic cells from the gene regulatory network point of view.
Gene regulatory networking is achieved by the action of multiple TFs binding to cis-regulatory DNA elements of the target genes in response to different environmental signals. Since TFs are central to gene regulatory networks, in this chapter we develop a computational framework for global screening of potential C. albicans biofilm-related TFs via network comparison (Fig. 6.1). We integrated different kinds of data from genome-scale analyses, including gene expression profiles of biofilm formation from C. albicans [261], regulatory associations between TFs and genes adopted from S. cerevisiae [157,272], ortholog data between C. albicans and S. cerevisiae genes [159], and Gene Ontology (GO) [273]. Using this information, the gene regulatory networks for biofilm and planktonic cells were constructed separately. These gene regulatory networks were then compared based on network structure to reveal their differences and to identify the relevance of each TF to biofilm formation via the so-called gain-of-function and loss-of-function subnetworks. The significance of the potential TFs was determined by statistical analysis. A total of 23 TFs are identified as related to biofilm formation; 10 of them have previously been reported in the literature. These results indicate that our approach can be useful for revealing TFs significant in biofilm formation and, importantly, provides new targets for further studies to understand the regulatory mechanisms of biofilm formation and the fundamental difference between biofilm and planktonic cells.
6.2 Systems methods of screening biofilm-related transcription factors

6.2.1 Overview of the proposed systems screening method

The systems method of global screening for biofilm-related TFs is divided into three essential steps: (1) a selection scheme for TFs and genes, (2) a construction scheme for the gene regulatory network, and (3) a comparison scheme between the two networks of biofilm and planktonic cells. The output of the proposed systems method is a score named RV for each TF. RV is computed to correlate a TF with the regulation of biofilm formation. A higher score suggests that the particular TF is more likely involved in the gene regulatory network for C. albicans biofilm formation. Based on the RVs, the biofilm-related TFs are identified. The whole process of the proposed screening method is shown in Fig. 6.1. The data used and the details of each step are described in the following sections.

FIGURE 6.1 The flowchart of the proposed systems biology method for screening potential biofilm-related TFs in Candida albicans [4]. TFs, Transcription factors.
6.2.2 Data used in the proposed systems screening method of biofilm-related transcription factors

In this chapter, four kinds of data are employed for the systems screening method: microarray gene expression profiles, regulatory associations between TFs and genes, ortholog data between C. albicans and S. cerevisiae genes, and GO annotation information.

The microarray data are obtained from Murillo et al. [261], in which genome-wide transcription analysis of biofilm formation is profiled using an Affymetrix oligonucleotide GeneChip representative of the entire genome of C. albicans. The DNA microarray includes 7116 ORFs, and each microarray experiment is performed in duplicate [261]. The resulting time-course microarray data contain two sets of information, for biofilm and planktonic cells, generated from the early stages of biofilm formation (0-390 min, six time points).

The regulatory associations between TFs and genes are obtained from S. cerevisiae using the YEASTRACT database (http://www.yeastract.com/) and the genome-wide location analysis of yeast TFs from Harbison et al. [272]. YEASTRACT (Yeast Search for Transcriptional Regulators And Consensus Tracking) provides more than 34,469 regulatory associations between TFs and target genes in S. cerevisiae, based on more than 1000 bibliographic references [157]. The genome-wide location analysis allows protein-DNA interactions to be monitored across the entire yeast genome by combining a modified chromatin immunoprecipitation (ChIP) procedure with DNA microarray analysis. Based on Harbison et al. [272], the genomic occupancy of 203 DNA-binding TFs in S. cerevisiae can be determined. The P-value threshold for significant binding is selected as P ≤ .001 in this study, since their analysis has indicated that this threshold maximizes the inclusion of legitimate regulator-DNA interactions while minimizing false positives [272].

The ortholog data between C. albicans and S. cerevisiae genes are retrieved from the Candida Genome Database, or CGD (http://www.candidagenome.org/) [159]. Gene orthology and best hit mappings, obtained using the InParanoid program [274], are used to correlate S. cerevisiae genes with C. albicans genes. The annotations for C. albicans genes are acquired from the GO [273]. The GO annotations are used to query the molecular function or biological process of a gene of interest in this study. How these data are used for screening biofilm-related TFs is described in detail in the following sections.
6.2.3 Selection scheme for transcription factors and target genes of biofilm formation and development

To select TFs and genes for the construction of the gene regulatory network of biofilm formation and development, we recruit as many TFs as possible in this step. Taking advantage of the fact that C. albicans and S. cerevisiae are closely related and that S. cerevisiae has been much better characterized than C. albicans, the information derived from S. cerevisiae is adopted in this chapter. If an S. cerevisiae TF has an ortholog in C. albicans, the ortholog is assigned as a TF in C. albicans. An example is shown in Fig. 6.2. Ste12 is a well-known transcription factor in S. cerevisiae and has a good sequence homolog (named Cph1) in C. albicans; the Cph1 protein is thus designated as a TF in C. albicans. In this way, TFs are pooled together for the screening of biofilm-related TFs by the proposed systems biology method. Notably, C. albicans TFs that are either not included in the microarray data or lack association information with target genes are excluded from the TF pool.

For the selection of target genes, GO annotations are employed [273]. The proposed screening method assumes that if a TF regulates gene expression in biofilm cells rather than in planktonic cells, then this TF is more likely involved in the regulatory mechanism governing biofilm formation. Therefore the genes annotated with GO terms such as biofilm formation, or those possibly related to different steps of biofilm formation and development, such as cell adhesion and filamentous growth, are all selected for further analysis. However, selected target genes of C. albicans that are not included in the gene expression profiles or have no ortholog mapping data with S. cerevisiae genes are excluded from the subsequent steps.

The regulatory associations between TFs and genes in S. cerevisiae from the YEASTRACT database [157] and Harbison et al. [272] are used to infer the possible TF-gene regulatory associations in C. albicans. An example of this step is illustrated in Fig. 6.2. Borneman et al. [275] have identified the Ste12-MUC1 association by a ChIP-chip experiment with a P-value = 2e-15, and the result can be found in the YEASTRACT database. According to CGD, the TF Ste12 and its target gene MUC1 in S. cerevisiae have orthologs Cph1 and HWP1 in C. albicans, respectively. Therefore based on the experimental results from S. cerevisiae, a possible association between Cph1 and HWP1 in C. albicans is inferred.

FIGURE 6.2 An example for the illustration of Candida albicans TF-gene regulatory association with Saccharomyces cerevisiae TF-gene [4]. TF, Transcription factor.
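The ortholog-based inference described above can be sketched in a few lines. This is an illustrative simplification, not the book's pipeline: only the Ste12-MUC1 association and the Ste12/Cph1 and MUC1/HWP1 ortholog pairs come from the text, the second association (`FAKE1`) is invented to show the filter, and applying the P ≤ .001 threshold of Section 6.2.2 uniformly to every record is an assumption made here for brevity.

```python
# Threshold for significant binding, as quoted in Section 6.2.2.
P_THRESHOLD = 0.001

# S. cerevisiae TF -> target records: (TF, target, P-value).
# Only the Ste12-MUC1 entry comes from the text; the second is invented.
yeast_associations = [
    ("STE12", "MUC1", 2e-15),
    ("STE12", "FAKE1", 0.2),  # made-up, non-significant example
]

# Ortholog map (S. cerevisiae -> C. albicans). Only these two pairs are
# mentioned in the text; a real map would be retrieved from CGD.
orthologs = {"STE12": "CPH1", "MUC1": "HWP1"}


def infer_candida_associations(associations, ortholog_map, p_max=P_THRESHOLD):
    """Transfer significant TF-gene pairs to C. albicans via orthologs."""
    inferred = set()
    for tf, target, p_value in associations:
        if p_value > p_max:
            continue  # binding not significant
        if tf in ortholog_map and target in ortholog_map:
            inferred.add((ortholog_map[tf], ortholog_map[target]))
    return inferred


candidate_edges = infer_candida_associations(yeast_associations, orthologs)
print(candidate_edges)
```

Pairs for which either the TF or the target lacks an ortholog are silently dropped, mirroring the exclusion rule stated for genes without ortholog mapping data.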
6.2.4 Gene regulatory network reconstruction method
From the first step described earlier, we have selected TFs, their potential target genes, and their possible regulatory associations. This information is used to constitute the candidate gene regulatory network. A stochastic system dynamic model is then applied to prune the candidate gene regulatory network to obtain the gene regulatory networks for biofilm and planktonic cells, respectively, according to their respective data sets. For a target gene i in the candidate gene regulatory network, the gene is described by the following stochastic discrete dynamic equation [257]:

$$x_i[t+1] = x_i[t] + \sum_{j=1}^{N_i} a_{ij} z_j[t] - \lambda_i x_i[t] + \kappa_i + \varepsilon_i[t] \tag{6.1}$$

where $x_i[t]$ denotes the gene expression level at time t for the particular gene i, $a_{ij}$ represents the regulatory ability of the jth TF toward the ith target gene with a positive value
indicating gene activation and a negative value indicating gene repression, $z_j[t]$ indicates the regulation function of the jth TF (the $N_i$ TFs binding to the target gene i are retrieved from the candidate gene regulatory network), $\lambda_i$ represents the degradation effect of the present time t on the next time t + 1, $\kappa_i$ denotes the basal level of expression, and $\varepsilon_i[t]$ represents the stochastic noise due to the model uncertainty and the measurement noise of the DNA microarray data. It has been shown that TF binding usually affects gene expression in a nonlinear form, that is, below some level of protein concentration a TF has no effect, while above a certain expression level the effect of the TF may become saturated [276,277]. Thus the regulation function $z_j[t]$ can be modeled as a sigmoid function, which is one kind of Hill function, of $y_j[t]$ (the protein concentration profile of TF j) as in the following equation [257,277-279]:

$$z_j[t] = f_j(y_j[t]) = \frac{1}{1 + \exp\{-(y_j[t] - \mu_j)/\sigma_j\}} \tag{6.2}$$
where $f_j(\cdot)$ denotes the sigmoid function, and $\mu_j$ and $\sigma_j$ indicate the mean and standard deviation of the protein concentration level of TF j. The biological meaning of Eq. (6.1) is that the expression of the target gene i at the next time t + 1 is determined by the present gene expression, the present regulation functions of the $N_i$ TFs binding to this target gene, the degradation effect of the present time, the basal level of gene expression, and some stochastic noise. For each target gene selected by the previous selection scheme, a stochastic system dynamic model for gene regulation is constructed; the stochastic system dynamic equations for all the target genes together form the mathematical system model of the candidate gene regulatory network. After constructing this model, the microarray gene expression profiles are employed to identify the regulatory parameters in Eq. (6.1). Since the DNA microarray data for gene expression profiles of biofilm and planktonic cells are collected independently, the gene regulatory networks of biofilm and planktonic cells can be reconstructed separately. The identification of the gene regulatory network is performed gene by gene, so that the identification process is not limited by the number of target genes. Because the basal level of expression is nonnegative [$\kappa_i \ge 0$ in Eq. (6.1)], the constrained least-squares regression method is employed to identify the regulatory parameters of the gene regulatory network [259,280] (see Appendix 1 for details). Moreover, since no good data are available for genome-wide protein concentration levels in C. albicans, gene expression profiles are used in place of protein concentrations in the regulation function in Eq. (6.2) when identifying the parameters $a_{ij}$. Once the regulatory parameters are identified, the significant TF-gene interactions are determined based on the identified $a_{ij}$'s.
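The identification step above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the toy profiles are invented, and the active-set quadratic programming solver cited in the text is replaced by a shortcut that is valid only because there is a single bound ($\kappa_i \ge 0$): fit without the constraint, and if the basal level comes out negative, pin it at zero and refit the remaining parameters. The profiles are assumed to be non-constant, so that the standard deviation is nonzero.

```python
import math


def sigmoid_regulation(y, mu, sigma):
    """Eq. (6.2): saturating regulation function of one TF."""
    return 1.0 / (1.0 + math.exp(-(y - mu) / sigma))


def solve_linear(matrix, rhs):
    """Solve matrix * w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(phi, x_next):
    """Unconstrained least squares through the normal equations."""
    k = len(phi[0])
    gram = [[sum(row[a] * row[b] for row in phi) for b in range(k)] for a in range(k)]
    moment = [sum(row[a] * x for row, x in zip(phi, x_next)) for a in range(k)]
    return solve_linear(gram, moment)


def fit_target_gene(x, tf_profiles):
    """Estimate [a_1, ..., a_N, 1 - lambda, kappa] for one target gene, kappa >= 0."""
    z_profiles = []
    for y in tf_profiles:
        # mu and sigma are taken from the TF's own expression profile,
        # which stands in for the unavailable protein concentration.
        mu = sum(y) / len(y)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / len(y))
        z_profiles.append([sigmoid_regulation(v, mu, sigma) for v in y])
    # One regression row per transition t -> t + 1.
    phi = [[z[t] for z in z_profiles] + [x[t], 1.0] for t in range(len(x) - 1)]
    x_next = x[1:]
    theta = least_squares(phi, x_next)
    if theta[-1] < 0.0:
        # The single bound is active: fix kappa = 0 and refit the other parameters.
        theta = least_squares([row[:-1] for row in phi], x_next) + [0.0]
    return theta


# Hypothetical expression profiles, for illustration only.
tf_profile = [0.2, 1.5, 0.7, 2.4, 1.1, 3.0, 0.4, 1.9]
target_profile = [1.0, 1.3, 2.1, 2.2, 2.9, 2.8, 3.3, 2.9]
print(fit_target_gene(target_profile, [tf_profile]))
```

Fitting each target gene separately keeps every regression small, which is why the procedure does not depend on the total number of target genes.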
By means of the Akaike information criterion (AIC) [40,81] and Student's t-test [281,282], we can determine the statistical significance of the interactions between TFs and genes. After pruning the false positives from the candidate gene regulatory network, we can reconstruct the gene regulatory networks for biofilm and planktonic cells (see Fig. 6.A1 in Appendix 2 and Fig. 6.A2 in Appendix 3). The resulting biofilm and planktonic gene regulatory networks and their significant TF-gene interactions are then used in the comparison scheme between the two networks to investigate the mechanism of biofilm formation and development.
6.2.5 Comparison scheme between two gene regulatory networks of biofilm and planktonic cells
After the selection of TFs and target genes and the reconstruction of the gene regulatory networks (see the Appendix), the gene regulatory networks of biofilm and planktonic cells and their significant TF-gene interactions are obtained in Figs. 6.A1 and 6.A2, respectively. This information allows us to compare the two gene regulatory networks and to compute the relevance value (RV), which identifies TFs that are correlated with the regulation of biofilm formation and development. Regardless of whether each TF acts as an activator or a repressor toward its target genes, the two gene regulatory networks are compared by their network structure. The interactions between TFs and genes in the two networks are simplified to a binary relation, in which “1” represents a significant interaction between the TF and the target gene (whether activation or repression) and “0” denotes no significant interaction (see Fig. 6.3A and Table 6.1 for illustration). As a result, the comparison between the biofilm gene regulatory network and the planktonic gene regulatory network generates two different subnetworks: the “gain-of-function” subnetwork shown in Fig. 6.A3 and the “loss-of-function” subnetwork shown in Fig. 6.A4. If a significant interaction is detected in the biofilm but is absent in the planktonic gene regulatory network, the interaction is classified into the gain-of-function subnetwork (see Fig. 6.A3), which is a subnetwork of the biofilm gene regulatory network. Conversely, if a significant interaction is detected in the planktonic but not in the biofilm gene regulatory network, the interaction is considered part of the loss-of-function subnetwork (see Fig. 6.A4), which is a subnetwork of the planktonic gene regulatory network.
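The binary comparison just described amounts to two set differences. The short Python sketch below reproduces the illustrative TF A example of Table 6.1; the dictionary layout and the function name are assumptions made for the illustration.

```python
# Binary TF A -> gene interactions from Table 6.1 (1 = significant, 0 = not significant).
biofilm = {"Gene 1": 1, "Gene 2": 1, "Gene 3": 1, "Gene 4": 1,
           "Gene 5": 1, "Gene 6": 1, "Gene 7": 0, "Gene 8": 0}
planktonic = {"Gene 1": 0, "Gene 2": 1, "Gene 3": 1, "Gene 4": 0,
              "Gene 5": 1, "Gene 6": 0, "Gene 7": 1, "Gene 8": 1}


def compare_networks(biofilm, planktonic):
    """Split interactions into gain-of-function and loss-of-function sets."""
    in_biofilm = {g for g, flag in biofilm.items() if flag == 1}
    in_planktonic = {g for g, flag in planktonic.items() if flag == 1}
    gain = in_biofilm - in_planktonic   # present only in the biofilm network
    loss = in_planktonic - in_biofilm   # present only in the planktonic network
    return gain, loss


gain, loss = compare_networks(biofilm, planktonic)
print(sorted(gain))  # ['Gene 1', 'Gene 4', 'Gene 6']
print(sorted(loss))  # ['Gene 7', 'Gene 8']
```

Interactions common to both networks (Genes 2, 3, and 5 here) fall into neither subnetwork.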
Schematic diagrams and the corresponding binary description of TF-gene interactions for illustration of gene regulatory network comparison are shown in Fig. 6.3 and Table 6.1.

TABLE 6.1 Construction of gain-of-function and loss-of-function subnetworks [4].

| TF A | Biofilm | Planktonic | Gain-of-function | Loss-of-function |
|---|---|---|---|---|
| Gene 1 | 1 | 0 | 1 | 0 |
| Gene 2 | 1 | 1 | 0 | 0 |
| Gene 3 | 1 | 1 | 0 | 0 |
| Gene 4 | 1 | 0 | 1 | 0 |
| Gene 5 | 1 | 1 | 0 | 0 |
| Gene 6 | 1 | 0 | 1 | 0 |
| Gene 7 | 0 | 1 | 0 | 1 |
| Gene 8 | 0 | 1 | 0 | 1 |

TF, Transcription factor.

By the use of the gain-of-function and loss-of-function subnetworks in Figs. 6.A3 and 6.A4 to distinguish the biofilm from planktonic gene regulatory network, we could determine a score named RV to quantify the correlation of each TF in these subnetworks with
FIGURE 6.3 Schematic diagrams for illustration of gene regulatory network comparison. (A) The biofilm gene regulatory network and the planktonic gene regulatory network. The regulatory abilities for TF-gene interactions are omitted for simplicity. (B) The gain-of-function and loss-of-function subnetworks after network structure comparison. $a_{ij,\mathrm{gain}}$ and $a_{ij,\mathrm{loss}}$ indicate the regulatory abilities of the jth TF to the ith target gene for gain-of-function and loss-of-function, respectively, in which a positive sign indicates activation and a negative sign indicates repression [11].
the regulation of biofilm formation and to identify potential C. albicans biofilm-related TFs. To compute the RV for each TF, two issues need to be taken into consideration. First, the magnitude of the regulatory ability $a_{ij}$ in (6.1), identified from the corresponding microarray data by the gene regulatory network reconstruction scheme, denotes the significance of the TF in the transcriptional regulation of a specific target gene. Second, an assumption is made: if a TF regulates more biofilm-related genes in the gain-of-function and loss-of-function subnetworks, then the TF is more likely to be involved in the regulation of biofilm formation. Consequently, the RV is determined using the following equation, based on the regulatory abilities of the TF in the gain-of-function and loss-of-function subnetworks:

$$RV_q = \sum_{p=1}^{N_q} \log_{10}\left(10 + \left|a_{pq,\mathrm{gain}}\right|\right) + \sum_{p=1}^{M_q} \log_{10}\left(10 + \left|a_{pq,\mathrm{loss}}\right|\right) \tag{6.3}$$
where $RV_q$ denotes the relevance value for TF q; $a_{pq,\mathrm{gain}}$ and $a_{pq,\mathrm{loss}}$, which are numerically obtained by the constrained least-squares parameter estimation in Eq. (6.8) from the gene regulatory network reconstruction scheme, indicate the regulatory ability of TF q to control the target gene p in the gain-of-function and loss-of-function subnetworks, respectively; and $N_q$ and $M_q$ represent the numbers of target genes for the TF q identified from the gain-of-function and loss-of-function subnetworks, respectively. The implication of Eq. (6.3) is that RV can quantify the extent of the TF involved in the interactions with
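target genes that can differentiate biofilm and planktonic gene regulatory networks. The measurement of RV is conceptually similar to the well-known “graph edit distance” previously used to compare pathways structurally [283]. In the illustrated schematic diagram in Fig. 6.3, the RV for TF A is calculated as follows:

$$\begin{aligned} RV_A &= \sum_{p=1}^{3} \log_{10}\left(10 + \left|a_{pA,\mathrm{gain}}\right|\right) + \sum_{p=1}^{2} \log_{10}\left(10 + \left|a_{pA,\mathrm{loss}}\right|\right) \\ &= \log_{10}\left(10 + \left|a_{1A,\mathrm{gain}}\right|\right) + \log_{10}\left(10 + \left|a_{4A,\mathrm{gain}}\right|\right) + \log_{10}\left(10 + \left|a_{6A,\mathrm{gain}}\right|\right) \\ &\quad + \log_{10}\left(10 + \left|a_{7A,\mathrm{loss}}\right|\right) + \log_{10}\left(10 + \left|a_{8A,\mathrm{loss}}\right|\right) \end{aligned} \tag{6.4}$$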
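As a minimal sketch of the RV calculation for TF A, the Python snippet below sums one logarithmic term per interaction in each subnetwork. Two assumptions should be noted: the regulatory abilities are hypothetical numbers, not values from the chapter, and the magnitude of each ability is used inside the logarithm, which is how the text's reference to the "magnitude of regulatory abilities" is read here.

```python
import math


def relevance_value(gain_abilities, loss_abilities):
    """Sum log10(10 + |a|) over the gain- and loss-of-function interactions of one TF."""
    gain_term = sum(math.log10(10 + abs(a)) for a in gain_abilities)
    loss_term = sum(math.log10(10 + abs(a)) for a in loss_abilities)
    return gain_term + loss_term


# Hypothetical regulatory abilities of TF A (positive = activation, negative = repression).
gain_tf_a = {"Gene 1": 0.8, "Gene 4": -1.2, "Gene 6": 0.5}
loss_tf_a = {"Gene 7": 0.3, "Gene 8": -0.6}

rv_a = relevance_value(gain_tf_a.values(), loss_tf_a.values())
print(rv_a)
```

Because log10(10) = 1, every interaction contributes at least 1 to the score, so under this reading the RV grows both with the number of differential interactions and with their strength.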
For each TF a corresponding RV is assigned and an empirical P-value is computed to determine the significance of the RV. To determine the P-value for an observed RV, a null distribution of RVs (Fig. 6.4) is generated by repeatedly permuting the network structure of the candidate gene regulatory network and computing the RV for each random network structure. The permutation preserves the network size; that is, the target genes with which a particular TF is associated are permuted without changing the total number of TF-gene regulatory associations in the network. For example, suppose there are A selected TFs, B target genes, and C TF-gene regulatory associations in the candidate gene regulatory network; the probability that any given TF-gene association is present in the permuted random candidate gene regulatory network is then uniformly C/(AB). We repeated this process 100,000 times and estimated the P-value for the corresponding RV as the fraction of random network structures whose RV is at least as large as the RV of the real network structure. The P-values are then adjusted by Bonferroni correction to account for multiple testing [281,282].
FIGURE 6.4 Distribution of RVs of randomly permuted networks [4]. RVs, Relevance values.
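The permutation test can be outlined with three small helpers. This is a sketch under assumptions, not the published procedure: the function names are hypothetical, the rewiring draws C of the A x B possible TF-gene pairs uniformly at random (so each pair is present with probability C/(AB)), and the step that recomputes an RV for each random network is left out because it requires the full reconstruction pipeline.

```python
import random


def rewire(n_tfs, n_genes, n_associations, rng):
    """Random candidate network of the same size: n_associations distinct TF-gene pairs."""
    pairs = [(tf, gene) for tf in range(n_tfs) for gene in range(n_genes)]
    return set(rng.sample(pairs, n_associations))


def empirical_p_value(observed_rv, null_rvs):
    """Fraction of null RVs that are at least as large as the observed RV."""
    hits = sum(1 for rv in null_rvs if rv >= observed_rv)
    return hits / len(null_rvs)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P-value, capped at 1."""
    return min(1.0, p_value * n_tests)


rng = random.Random(0)
random_network = rewire(n_tfs=3, n_genes=4, n_associations=5, rng=rng)
print(len(random_network))                        # 5 associations, as in the input network

print(empirical_p_value(3.0, [1.0, 2.0, 3.0, 4.0]))  # 0.5
print(bonferroni(0.5, 4))                             # 1.0
```

With 100,000 permutations the smallest nonzero empirical P-value is 1e-05, and a Bonferroni factor of 220 (one test per screened TF, an assumption about how the correction was applied) would turn it into 0.0022.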
The RVs with adjusted P-values ≤ .05 are considered significant, and the corresponding TFs are identified as the potential C. albicans biofilm-related TFs. Table 6.1 demonstrates the construction of the gain-of-function and loss-of-function subnetworks shown in Fig. 6.3. These subnetworks were constructed by comparing the network structure of the biofilm gene regulatory network with that of the planktonic gene regulatory network via the comparison scheme.
6.3 Potential Candida albicans biofilm-related transcription factors
6.3.1 Screening of potential Candida albicans biofilm-related transcription factors
We have applied the proposed comparison method to data derived from C. albicans biofilm and planktonic cells to screen for potential C. albicans biofilm-related TFs during the infection process of C. albicans. Among all C. albicans genes, 361 were selected as target genes because they are annotated with at least one of the GO terms biofilm formation, cell adhesion, and filamentous growth. Using S. cerevisiae TF information and the orthologs between C. albicans and S. cerevisiae, we identified 220 C. albicans TFs that have expression profiles in the experiments comparing biofilm with planktonic cells. From the identified TFs and target genes, we reconstructed the gene regulatory networks for biofilm and planktonic cells, which include 2149 and 2211 TF-gene interactions, respectively (Figs. 6.A1 and 6.A2 in Appendices 2 and 3). Excluding the 1442 interactions common to the two gene regulatory networks, there are 707 interactions in the gain-of-function subnetwork and 769 interactions in the loss-of-function subnetwork (Figs. 6.A3 and 6.A4 in Appendices 4 and 5, respectively). The regulatory abilities of TFs in the gain-of-function and loss-of-function subnetworks are then used to compute the RV for each TF and to determine its significance. As a result, 23 potential TFs related to C. albicans biofilm formation are identified, as shown in Table 6.2 [4].
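As a quick consistency check on the counts quoted above (a worked subtraction, not an additional result), removing the 1442 shared interactions from each network gives the two subnetwork sizes:

$$2149 - 1442 = 707 \ \text{(gain-of-function)}, \qquad 2211 - 1442 = 769 \ \text{(loss-of-function)}.$$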
6.3.2 The potential biofilm-related transcription factors in the formation and development of biofilm
A total of 23 TFs are determined as potential C. albicans biofilm-related TFs (Table 6.2). To assess the effectiveness of the proposed screening method, we seek evidence from the literature to validate the inferred cellular functions in the regulation of biofilm formation and development.
(1) Efg1, Cph1, and Efh1: Both cell adhesion and morphogenesis to form hyphae may play important roles in biofilm formation and maturation [268]. Efg1 is a downstream transcription factor of the Ras-protein kinase A signaling pathway and regulates multiple morphogenetic processes, including phenotypic switching and filamentous growth [284,285,295]. Deletion of the C. albicans EFG1 gene can decrease the ability of the cell to adhere to oral epithelial cells in vitro [284].
TABLE 6.2 Identification of potential Candida albicans biofilm-related transcription factors (TFs).

| Systematic name | TF^a | RV | Adjusted P-value^b | Literature evidence |
|---|---|---|---|---|
| orf19.5953 | orf19.5953 | 117.6815 | < 1e-05 | |
| orf19.610 | Efg1 | 76.1153 | < 1e-05 | [284,285] |
| orf19.4433 | Cph1 | 75.4189 | < 1e-05 | [286] |
| orf19.5498 | Efh1 | 70.8097 | < 1e-05 | [287] |
| orf19.861 | orf19.861 | 68.2492 | < 1e-05 | |
| orf19.1773 | Rap1 | 59.4340 | < 1e-05 | [288] |
| orf19.837.1 | Ino4 | 53.3481 | < 1e-05 | |
| orf19.1069 | Rpn4 | 52.8505 | < 1e-05 | |
| orf19.2236 | orf19.2236 | 51.4651 | < 1e-05 | |
| orf19.5908 | Tec1 | 51.2743 | < 1e-05 | [289,290] |
| orf19.4545 | Swi4 | 49.9914 | < 1e-05 | |
| orf19.5041 | orf19.5041 | 45.9961 | < 1e-05 | |
| orf19.2054 | Fgr15 | 45.6984 | < 1e-05 | [291] |
| orf19.5312 | orf19.5312 | 44.3625 | < 1e-05 | |
| orf19.1358 | Gcn4 | 38.4428 | < 1e-05 | [292] |
| orf19.7046 | Met28 | 38.4261 | < 1e-05 | |
| orf19.4573 | Zcf26 | 35.9090 | .0022 | |
| orf19.971 | Skn7 | 35.7618 | | [293] |
| orf19.6121 | Mnl1 | 35.4126 | .0022 | |
| orf19.7025 | Mcm1 | 34.7227 | .0066 | [294] |
| orf19.952 | orf19.952 | 32.3603 | .0242 | |
| orf19.5975 | orf19.5975 | 31.8614 | .0286 | |
| orf19.2752 | Adr1 | 31.8191 | .0286 | [291] |

^a The TF names are retrieved from the CGD database, http://www.candidagenome.org/.
^b The adjusted P-values are obtained by Bonferroni correction.
RV, Relevance value.
C. albicans Cph1 is an ortholog of S. cerevisiae Ste12. In S. cerevisiae, cells mate in response to pheromones through a mitogen-activated protein kinase cascade and its downstream TF, Ste12. C. albicans Cph1 is not only required for mating [296] but is also important for hyphal formation [286]. Finally, the efg1/efg1 cph1/cph1 double mutant has been shown to be unable to form hyphae and to be defective in biofilm formation [297,298].
APSES proteins can regulate fungal filamentation and differentiation. There are two APSES proteins in C. albicans, Efg1 and Efh1 [287]. Deletion of the C. albicans EFH1 gene can cause hyperfilamentation in an efg1 background under certain conditions, indicating that Efh1 can modulate and support the regulatory functions of Efg1 [287].
(2) Rap1 and Tec1: Rap1 is a transcription factor and telomere-binding protein that is crucial for cell viability in S. cerevisiae. Studies of a C. albicans RAP1-deletion mutant have shown that Rap1 is required for the efficient repression of pseudohyphal growth under yeast-favoring conditions but is not essential for the viability of C. albicans [288]. Tec1, a member of the TEA/ATTS family of TFs, has been shown to regulate hyphal development and virulence in C. albicans. Insertion mutations of TEC1 can lead to severe defects in biofilm formation [289,290].
(3) Fgr15, Gcn4, Skn7, Mcm1, and Adr1: Fgr15 is a putative transcription factor with a zinc finger DNA-binding motif. Transposon mutation of FGR15 can affect filamentous growth [291]. Gcn4, like its ortholog in S. cerevisiae, can activate the transcription of amino acid biosynthetic genes. In addition, C. albicans Gcn4 can interact with the Ras-cAMP pathway to promote filamentous growth in response to amino acid starvation [299]. A C. albicans GCN4-deletion mutant shows reduced biofilm biomass, indicating that Gcn4 is required for normal biofilm growth [292]. Skn7, one of the response regulator proteins in C. albicans, is required for morphogenesis under some conditions, and its mutant produces smooth colonies [293]. It is also required for adaptation to some types of oxidative stress in vitro [293]. Mcm1 is an important gene in C. albicans, whose protein levels are crucial for the determination of cell morphology. It might act as a mediator that recruits regulatory factors required for hyphal development in C. albicans [294].
Adr1, like Fgr15, is also a putative transcription factor with a zinc finger DNA-binding motif, and its mutant has been found to show less filamentous growth [291].
(4) Other TFs identified: Of the 23 TFs identified, 10 have been shown to relate to various processes of biofilm formation (e.g., filamentation and cell adhesion) or to biofilm formation per se. Consequently, the remaining 13 TFs are good candidates for further experiments to determine their regulatory roles in biofilm formation and development.
6.3.3 Statistical measures of the screening test
Among a total of 220 TFs selected for screening, 23 potential biofilm-related TFs with significant RVs are identified in Table 6.2. For the other 197 TFs, we also check the literature to see whether they have been validated by experiments as biofilm-related TFs. Twenty-six of the 197 TFs without a significant RV are annotated with GO terms such as biofilm formation, cell adhesion, or filamentous growth. The sensitivity, specificity, positive predictive value, and negative predictive value of the proposed screening method are also evaluated (see Appendix 1 for details). The proposed screening method identifies potential C. albicans biofilm-related TFs with a low sensitivity of 27.78% and a high specificity of 92.93%. Moreover, the proposed method is effective in determining the TFs that are not biofilm related, as the negative
predictive value is 86.80%. The positive predictive value is 43.48%, a 2.7-fold enrichment over the 16.36% prevalence of biofilm-related TFs among the 220 TFs screened. It is noteworthy that these statistics are evaluated based on published literature evidence and GO annotations, which suggests that the statistics can improve as more C. albicans biofilm-related TFs are validated by experiments.
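The reported percentages can be reproduced from the counts given in the text. The sketch below assumes the following reading of those counts: of the 23 significant TFs, the 10 with literature evidence are true positives and the other 13 are false positives; of the 197 nonsignificant TFs, the 26 with relevant GO annotations are false negatives and the remaining 171 are true negatives. The function name is hypothetical.

```python
def screening_statistics(tp, fp, fn, tn):
    """Standard screening-test measures from a 2 x 2 confusion table."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / total,
    }


# Counts inferred from the text: 23 significant TFs (10 supported), 197 others (26 annotated).
stats = screening_statistics(tp=10, fp=13, fn=26, tn=171)
for name, value in stats.items():
    print(f"{name}: {100 * value:.2f}%")

# Enrichment of the positive predictive value over the prevalence (about 2.7-fold).
print(round(stats["ppv"] / stats["prevalence"], 1))
```

Under this reading the script returns 27.78%, 92.93%, 43.48%, 86.80%, and 16.36%, matching the values in the text.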
6.4 Discussion
Even though the architecture of C. albicans biofilms and the correlation between biofilm and infection have been analyzed, the understanding of the gene regulation responsible for biofilm formation is still limited. Because TFs play an important role in gene regulatory networks, in this chapter we have developed a computational framework via network comparison to screen for C. albicans TFs that may be crucial for biofilm formation and development during the infection process of C. albicans. The original idea is derived from comparative biology, which commonly uses comparative approaches in the analysis of genomic sequences to reveal the functional similarities and differences among different species [300]. We have extended the concept and compared gene regulatory networks to explore what makes the pathogenic difference between biofilm and planktonic cells in C. albicans. The advantage of the proposed screening method for biofilm TFs lies in its convenience and systematic nature. Compared with time- and labor-consuming experiments, it provides an efficient and rapid way of screening TFs by comparing the two gene regulatory networks of C. albicans biofilm and planktonic cells from a systematic point of view. Richard et al. [267] used a collection of insertion mutations in 197 C. albicans ORFs to screen for mutants that are defective in biofilm formation; however, only 4 such biofilm-related genes were identified. In this chapter, the computational method has a positive predictive value of 43.48%, which is much higher than that obtained by Richard et al. (approximately 2.03%). Therefore the proposed screening method can be useful for providing potential target genes for biologists to perform further experiments. It can be considered a preexperiment screening step for other studies.
In addition, the proposed approach is not limited to biofilm and planktonic cells; it can also be used to compare any other two physiological conditions as long as adequate data are available. For example, this method can be extended to screen TFs possibly involved in cancer development by comparing normal cells and cancer cells, and the TFs screened could serve as biomarkers of drug targets for therapeutic intervention [301]. Although the proposed approach is shown to be useful, some drawbacks and possible improvements still need to be taken into consideration. One assumption of the stochastic system dynamic model in Eq. (6.1) is that the time delay of transcriptional regulation from the TF to the target gene is only one time unit (about 7 min in this case), which is not always the case. Several studies have shown from gene expression profiles that different time delays are required for different TFs to exert regulatory effects on their target genes [276,302,303]. However, because the time delays cannot be measured experimentally for all TFs and their potential target genes, and the computationally predicted time delays are not completely reliable, the time delays are all set to one time unit when reconstructing the gene regulatory networks. Besides the time-delay assumption, one important consideration is data accuracy
from public domains. For example, based on the orthology information between C. albicans and S. cerevisiae, we have transferred the regulatory associations between TFs and genes from S. cerevisiae to the study of C. albicans. The orthology mappings are performed at CGD using the InParanoid software, which essentially uses computed sequence similarity to determine orthologs [274]. If the orthology mapping data are not perfectly accurate, regulatory associations between TFs and genes in C. albicans can be misinterpreted. To overcome this accuracy problem, it is better to obtain the TF-gene regulatory associations directly from experiments (e.g., genome-wide ChIP-chip) using C. albicans. Recently, genome-wide location analysis by ChIP-chip has been developed for the study of C. albicans [163,167]. However, at present, similar studies for biofilm-related TFs are still not available. Another shortcoming of the information from public domains is that S. cerevisiae TF-gene associations may be missing from YEASTRACT and from the ChIP-chip data of Harbison et al. [272] even when orthologs of the TF and target genes do exist in C. albicans. In that case the corresponding gene regulatory network cannot be reconstructed, and the particular TF is excluded from the TF pool. This problem can also be solved by performing C. albicans ChIP-chip experiments. Once reliable C. albicans TF-gene regulatory associations are available, the performance of the proposed screening method can be improved and more reliable gene regulatory networks can be reconstructed. Further, numerous factors have also been found to affect C. albicans biofilm formation, including the supporting substrate, growth medium, and C. albicans strain [264,267].
Given the complex conditions that affect the kinetics of the biofilm formation process and the huge amounts of data generated by postgenomic approaches under different experimental conditions, we can now investigate the most significant TFs responsible for biofilm formation and development in the C. albicans infection process. The screening of biofilm-related TFs is the initial step in investigating the whole gene regulatory network that governs biofilm formation and development in the C. albicans infection process. Lu and Collins [304] have successfully demonstrated that synthetic biology techniques can be used to engineer bacteriophage to express DspB, an enzyme that hydrolyzes the crucial biofilm adhesin (β-1,6-N-acetyl-D-glucosamine) encoded by the genes pgaABCD in Escherichia coli [305,306], thereby reducing bacterial biofilms. Consequently, by combining systems biology approaches, which give more insight into the molecular mechanisms of biofilm formation, with synthetic biology techniques to engineer the enzyme needed, new therapeutic strategies may be developed to combat the recalcitrant infections caused by C. albicans and other microbial pathogens.
6.5 Conclusion
Biofilm formation is a major virulence factor in C. albicans pathogenesis and is related to drug resistance in the therapeutic treatment of this organism. However, little is known about the molecular mechanisms of biofilm formation and development. In this chapter, we have developed an efficient computational framework for the global screening of potential TFs that control C. albicans biofilm formation, cell adhesion, and filamentous growth. S. cerevisiae information is first used to infer the possible TF-gene
regulatory associations in C. albicans by the ortholog mapping method. Gene regulatory networks of C. albicans biofilm and planktonic cells are then compared to identify TFs involved in biofilm formation, development, and maintenance. A total of 23 TFs are finally identified, 10 of which have previously been reported to be involved in biofilm formation. Literature evidence indicates that our approach can be useful for revealing TFs significant in biofilm formation and can provide new targets for further studies of the regulatory mechanisms in biofilm formation and of the fundamental differential mechanism between biofilm and planktonic cells; these targets can serve as biomarkers of drug targets for therapeutic intervention of C. albicans infections.
6.6 Appendix
6.6.1 Appendix 1: Supplementary methods
6.6.1.1 Identification of the gene regulatory parameters
After constructing the stochastic dynamic model of the candidate gene regulatory network, the gene regulatory parameters in the model need to be identified from the available microarray data. The microarray gene expression profiles are overlaid on the model to identify the gene regulatory parameters, and the identification is performed gene by gene. Before the identification method is determined, the system dynamic model must be examined carefully. In Eq. (6.1) the basal expression level $\kappa_i$ should always be nonnegative, since the microarray expression levels of the genes are always nonnegative. Because of this constraint, the gene regulatory parameters are identified by solving constrained least-squares problems. Eq. (6.1) can be rewritten in the following regression form:

$$x_i[t+1] = \begin{bmatrix} z_1[t] & \cdots & z_{N_i}[t] & x_i[t] & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iN_i} \\ (1-\lambda_i) \\ \kappa_i \end{bmatrix} + \varepsilon_i[t] \triangleq \varphi_i[t]\,\theta_i + \varepsilon_i[t] \tag{6.5}$$
where $\varphi_i[t]$ denotes the regression vector, which can be obtained from the processing above, and $\theta_i$ is the parameter vector of the target gene i to be estimated. To avoid overfitting in the identification process, the cubic spline method [307-309] is used to interpolate extra time points for the gene expression data. By the cubic spline method the values of $z_j[t_l]$ and $x_i[t_l]$ for $l \in \{1, 2, \ldots, L\}$ and $j \in \{1, 2, \ldots, N_i\}$ are easily obtained, where L is the number of expression time points of a target gene i, and $N_i$ is the number of TFs binding to the target gene i. Eq. (6.5) at different time points can be arranged as follows:

$$\begin{bmatrix} x_i[t_2] \\ x_i[t_3] \\ \vdots \\ x_i[t_L] \end{bmatrix} = \begin{bmatrix} \varphi_i[t_1] \\ \varphi_i[t_2] \\ \vdots \\ \varphi_i[t_{L-1}] \end{bmatrix} \theta_i + \begin{bmatrix} \varepsilon_i[t_1] \\ \varepsilon_i[t_2] \\ \vdots \\ \varepsilon_i[t_{L-1}] \end{bmatrix} \tag{6.6}$$
For simplicity, the notations $X_i$, $\Phi_i$, and $E_i$ are defined to represent Eq. (6.6) as follows:

$$X_i = \Phi_i \theta_i + E_i \tag{6.7}$$
The constrained least-squares parameter estimation problem is formulated as follows:

$$\min_{\theta_i} \frac{1}{2} \left\lVert \Phi_i \theta_i - X_i \right\rVert_2^2 \quad \text{such that} \quad A\theta_i \le b \tag{6.8}$$
where $A = [0 \; \cdots \; 0 \;\; 0 \;\; {-1}]$ and $b = 0$ give the constraint that forces the basal level $\kappa_i$ in Eq. (6.5) to be always nonnegative, that is, $\kappa_i \ge 0$. The constrained least-squares problem in (6.8) can be solved using the active set method for quadratic programming [259,280]. Since there are no good data for genome-wide protein concentration levels in C. albicans, gene expression profiles are used instead for identifying the regulatory parameters.
6.6.1.2 Determination of significant gene regulations
When the regulatory parameters have been identified, AIC [40,81] and Student's t-test [281,282], which is used to calculate the P-values of the regulatory abilities, are employed for model selection and for the determination of significant gene regulations in the gene regulatory network. The AIC includes both the estimated residual variance and the model complexity in one statistic, and it has a minimum around the correct parameter number [40,81]. Therefore AIC can be used to select the real regulation model structure based on the regulatory abilities ($a_{ij}$'s) identified earlier. Once the estimated regulatory parameters have been examined using the AIC model selection criterion, Student's t-test is employed to calculate the P-values for the regulatory abilities ($a_{ij}$'s) under the null hypothesis $H_0: a_{ij} = 0$ [281,282] to determine the significant regulatory interactions. The computed P-values are then adjusted by Bonferroni correction to avoid a large number of spurious positives [281,282]. The regulations with adjusted P-value ≤ .05 are determined to be significant and are preserved in the gene regulatory network.
6.6.1.3 Statistical measures of the screening test
Sensitivity and specificity are statistical measures of the performance of a screening/diagnostic test. The sensitivity measures the proportion of actual positives that are correctly identified by the test, and the specificity measures the proportion of actual negatives that are correctly identified by the test [282,310].
More specifically, the sensitivity and the specificity are defined by the following equations:

$$\text{Sensitivity} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false negatives}} \tag{6.9}$$

$$\text{Specificity} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false positives}} \tag{6.10}$$
In a diagnostic test, for example, a sensitivity of 100% means that the test recognizes all sick people as such. Thus in a high sensitivity test, a negative test result is used to rule out the disease. A specificity of 100% means that the test recognizes all healthy people as healthy. Thus a positive result in a high specificity test is used to confirm the disease.
Further, the positive predictive value measures the proportion of objects with positive test results that are correctly diagnosed/screened. The negative predictive value measures the proportion of objects with negative test results that are correctly diagnosed/screened [311]. More specifically, the positive and negative predictive values are defined by the following equations:

$$\text{Positive predictive value} = \frac{\text{Number of true positives}}{\text{Number of true positives} + \text{Number of false positives}} \tag{6.11}$$

$$\text{Negative predictive value} = \frac{\text{Number of true negatives}}{\text{Number of true negatives} + \text{Number of false negatives}} \tag{6.12}$$
In a diagnostic test, a high positive predictive value means that a positive test result has a high chance of identifying an individual with the disease. A high negative predictive value indicates that a negative test result has a high chance of identifying an individual who does not have the disease.
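The four measures in Eqs. (6.9) to (6.12) can be computed directly from the counts of a confusion table. The following is a minimal Python sketch; the function name `screening_metrics` and the example counts are hypothetical and are used only for illustration.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return the four screening measures from confusion-table counts.

    tp, fp, tn, fn: numbers of true positives, false positives,
    true negatives, and false negatives.
    """
    return {
        "sensitivity": tp / (tp + fn),  # Eq. (6.9)
        "specificity": tn / (tn + fp),  # Eq. (6.10)
        "ppv": tp / (tp + fp),          # Eq. (6.11)
        "npv": tn / (tn + fn),          # Eq. (6.12)
    }


# Hypothetical counts, not taken from the chapter's data
metrics = screening_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity condition on the true state of each object, whereas the predictive values condition on the test result, so the same test can score differently on the two pairs.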
6.6.2 Appendix 2: Supplementary
Supplementary Fig. 6.A1 shows the schematic view of the biofilm regulatory network.
FIGURE 6.A1
The biofilm gene regulatory network. The figure was plotted using Cytoscape [4].
6.6.3 Appendix 3: Supplementary
Supplementary Fig. 6.A2 displays the schematic view of the planktonic regulatory network.
FIGURE 6.A2
The planktonic gene regulatory network. The figure was plotted using Cytoscape [4].
6.6.4 Appendix 4: Supplementary
Supplementary Fig. 6.A3 illustrates the schematic view of the gain-of-function subnetwork.
FIGURE 6.A3
The gain-of-function subnetwork. The figure was plotted using Cytoscape [4].
6.6.5 Appendix 5: Supplementary
Supplementary Fig. 6.A4 demonstrates the schematic view of the loss-of-function subnetwork.
FIGURE 6.A4
The loss-of-function subnetwork. The figure was plotted using Cytoscape [4].
CHAPTER 7
Identification of infection- and defense-related genes through dynamic host-pathogen interaction network
7.1 Introduction
Candida albicans is an opportunistic fungal pathogen responsible for various mucosal infections, such as candidiasis (e.g., oral thrush and vaginitis) and other potentially life-threatening diseases [312]. C. albicans is also the species most frequently responsible for hospital-acquired fungal infections. This pathogen can colonize various biomaterials, such as ventricular assist devices and urinary and vascular catheters, forming dense biofilms that are resistant to most antifungal drugs [313]. C. albicans infections and candidiasis are difficult to treat and create very serious therapeutic challenges. Mortality rates among patients with candidiasis can be as high as 40%-60%, especially for those who have bloodstream infections [264]. Therefore an understanding of the molecular mechanisms underlying the pathogenicity of C. albicans and host defense systems could improve medical therapy and facilitate the development of new antifungal drugs. Under normal circumstances, C. albicans lives in 80% of the human population with no harmful effects, although its overgrowth results in candidiasis, often observed in immunocompromised (e.g., HIV-positive) individuals [139,314]. C. albicans can grow in a variety of morphological forms, ranging from yeast form to pseudohyphal form to true tubular hyphal form, depending on the growth conditions in the host environment [285]. A number of molecular factors have been implicated in the virulence of C. albicans, such as host recognition biomolecules, secreted aspartyl proteases, and phospholipases, as well as life cycle factors such as adhesion and morphogenesis [140]. Among those factors the transition from yeast to hyphal form is considered to be critical for C. albicans pathogenesis. The ability of C. albicans to form hyphae has been proposed as a virulence factor, as these structures are often observed in invaded tissue and C. albicans strains unable to form hyphae (whether naturally or through introduced mutations) show defective infectivity [285,315]. Although previous studies have provided some insights, the details of the molecular mechanisms responsible for
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00005-5
morphological forms still await elucidation. In this chapter, we utilized time-series microarray data over nine time points to construct two dynamic networks, which represent protein-protein interactions (PPIs) in the adhesion stage (i.e., when hyphae are not growing) and the invasion stage (i.e., when they are). By comparing these two dynamic networks, we can investigate the details of molecular mechanisms responsible for changes in C. albicans infectivity across morphological forms. At present, we still lack sufficient high-throughput screening data for C. albicans such as PPI and ChIP-chip data, even though the genome for C. albicans has been identified, sequenced, and released to aid research on this significant pathogen [155]. The C. albicans genome for strain SC5314 has already been sequenced, revealing that almost two-thirds of its approximately 6000 open reading frames are orthologous to genes of Saccharomyces cerevisiae, the most intensively studied eukaryotic model organism to have its entire genome sequenced [260,270]. The identification also revealed gene orthologs between C. albicans and S. cerevisiae. Unlike C. albicans, S. cerevisiae does not form true hyphae and is generally not considered a human pathogen. Because S. cerevisiae has abundant high-throughput screening data and is closely related to C. albicans (i.e., both fall within the class Hemiascomycetes), the genome information from S. cerevisiae could be usefully adapted for our understanding of C. albicans biology and pathogenesis [260]. The zebrafish (Danio rerio) has emerged as a powerful new vertebrate model for human diseases. Numerous studies have already utilized the zebrafish system to study the pathogenesis of various human infectious diseases, including those caused by bacteria or viruses [316,317]. The zebrafish immune system displays remarkable similarities to mammalian counterparts.
As a demonstration of the zebrafish's utility as a model organism for human disease, in 49 cases of a zebrafish mutant gene being cloned based on a forward genetic screen, the genes were found to have homologs in human disease [318]. Overall, the zebrafish genetic map demonstrates highly conserved synteny with the human genome [319]. Chao et al. [320] have also demonstrated that C. albicans can colonize and invade the fish host at multiple anatomical sites and prove fatal in a dose-dependent manner. Therefore a zebrafish infection model could be used to systematically investigate the details of the C. albicans invasive process and infectious mechanisms. In this chapter, we construct an infectious C. albicans and zebrafish intracellular/interspecies PPI network by mining and integrating microarray data, PPI information, and host/pathogen interspecies interactions in order to investigate how morphology regulates the infectious behavior of C. albicans on host tissue. We found that all major hypha-related pathways are visible in the hyphal PPI network, which supports the reliability and accuracy of the proposed methods and results. From a systems perspective, we were able to predict the proteins with the largest changes in the number of interactions and the hub proteins for morphological switching processes. We identified several important hyphal growth-related proteins (for example, Ubi4, Act1, Kex2, Hsl1, and Tsa1) and some proteins worth further exploration for pathogenicity research, such as Hht21, Kre1, and Orf19.5438. These proteins could be considered as significant biomarkers for drug targets in the development of new antifungal drugs. Moreover, three noteworthy functions at work in C. albicans infection (cellular iron ion homeostasis, glucose transport, and cell wall molecular biosynthesis) were identified as pathogen invasion mechanisms from the analysis of the integrated intracellular/interspecies protein interaction networks.
Furthermore, several cellular functions, such as apoptosis and immune response, were also found to be involved in host defense mechanisms.
7.2 Material and methods of constructing host-pathogen interaction network in Candida albicans-zebrafish infection
7.2.1 Simultaneous time-course microarray experiment of zebrafish and Candida albicans during Candida albicans infection
SC5314 strain C. albicans and adult AB strain zebrafish are used for the time-course microarray experiments. All the maintenance and preparation are performed according to procedures described previously [320]. Zebrafish are first anesthetized by immersion in water containing 0.17 g/mL of Tricaine (Sigma) and then intraperitoneally injected with $1 \times 10^8$ CFU C. albicans cells suspended in 10 μL of sterile phosphate-buffered saline. The infected fish are then sacrificed by immersion in ice water at 0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h postinjection and frozen in liquid nitrogen. The C. albicans-infected zebrafish are treated with Trizol Reagent (Invitrogen, Carlsbad, California, United States), pulverized in liquid nitrogen using a small mortar and pestle, and then disrupted using a MagNA Lyser System (Roche) with glass beads (cat. No. G8772-100G, Sigma) by shaking at 5000 rpm for 15 s. After phase separation by adding chloroform, the total RNA is purified using an RNeasy Mini Kit (Qiagen, Hilden, Germany). Purified RNA is then quantified at OD260nm using an ND-1000 spectrophotometer (NanoDrop Technology, Wilmington, Delaware, United States) and analyzed using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, California, United States) with an RNA 6000 Nano LabChip kit (Agilent Technologies). One microgram of the total RNA is then amplified using a Quick-Amp Labeling kit (Agilent Technologies) and labeled with Cy3 (CyDye, PerkinElmer, Waltham, Massachusetts, United States) during the in vitro transcription process. Then 0.625 μg of Cy3 cRNA for the C. albicans array and 1.65 μg of Cy3 cRNA for the zebrafish array are fragmented to an average size of 50-100 nucleotides by incubation with fragmentation buffer at 60 °C for 30 min.
The fragmented, labeled cRNA is then hybridized to both C. albicans and zebrafish oligo microarrays (Agilent Technologies) at 60 °C for 17 h. After washing and drying using a nitrogen gun, microarrays are scanned using an Agilent microarray scanner (Agilent Technologies) at 535 nm for Cy3. For each time point, three biological replicates are done for both organisms. The raw data are finally processed with loess normalization, and the results have been deposited in the Gene Expression Omnibus.
7.2.2 Overview of the screening process of infection-related proteins
The global screening method for infection-related proteins is divided into three key steps: (1) data selection and processing for proteins, (2) constructing dynamic hyphal growth networks for C. albicans and dynamic networks for zebrafish, and (3) constructing an intercellular PPI network between pathogen and host. The flowchart of the construction of the intercellular PPI network is shown in Fig. 7.1. The overall intercellular PPI network consists of the hyphal growth intracellular PPI network for C. albicans, the intracellular PPI network for zebrafish, and the interspecies PPI network between pathogen and host. After constructing it, we search this network for the potential infection-related proteins and immune response pattern recognition molecules in both C. albicans and zebrafish.
FIGURE 7.1 Flowchart of the construction of the integrated infection intracellular/interspecies PPI network via database mining and integration [8]. The construction of the proposed integrated intracellular/interspecies PPI network is performed by database mining and network identification. The network construction combines DNA microarray data with different types of information from various databases, as shown in the white boxes. The blue boxes show the steps of candidate subnetwork construction. The bottom part (orange boxes) of the flowchart illustrates the steps of network identification and the subsequent construction of the integrated intercellular PPI network. PPI, Protein-protein interaction.
7.2.3 Data selection and database mining
In this chapter, several types of data are mined and integrated to construct the intercellular PPI network. For C. albicans, the required data consist of its microarray gene expression profiles, PPIs from S. cerevisiae, data on gene orthologs between C. albicans and S. cerevisiae, and gene annotations for C. albicans. There are nine time points in the C. albicans microarray data spanning from 0.5 to 18 h postinfection (i.e., 0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h). The gene ortholog data are obtained from the Candida Genome Database, and the C. albicans gene annotations are retrieved from the Gene Ontology (GO) project. The PPI data for S. cerevisiae are extracted from the Biological General Repository for Interaction Datasets (BioGRID). For zebrafish, the data consist of microarray gene expression profiles, PPIs from Homo sapiens, data on human and zebrafish gene orthologs, and functional gene annotations for zebrafish. There are also nine time points in the zebrafish microarray data spanning from 0.5 to 18 h (i.e., 0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h). The gene ortholog data are obtained using the ZFIN, InParanoid, and BLAST databases. The zebrafish gene annotations are also retrieved from the GO project. The PPI data for H. sapiens are extracted from BioGRID and the Human Protein Reference Database.
7.2.4 Selection of protein pool for candidate protein-protein interaction networks
Because PPI databases for C. albicans and zebrafish are currently lacking, gene ortholog data between C. albicans and S. cerevisiae as well as between zebrafish and H. sapiens are utilized to set up protein data pools for our candidate C. albicans and zebrafish PPI networks, respectively. C. albicans PPIs can be inferred by applying ortholog data between C. albicans and S. cerevisiae to the latter's PPI data; in the same way, zebrafish PPIs can be inferred by mapping H. sapiens PPI data to ortholog data between humans and zebrafish. Then, we can set up the protein pool consisting of differentially expressed proteins. Since large-scale protein activity measurements are unavailable, mRNA expression profiles are used as a substitute. Although the mRNA expression levels cannot be completely representative of the corresponding protein expression levels, they are at least partially and positively correlated [321,322]. The mRNA expression level for each protein is used to filter differentially expressed proteins using one-way analysis of variance (ANOVA), where the null hypothesis is that the average mRNA expression levels at all time points are equal. In C. albicans and zebrafish alike, the proteins with ANOVA P-values less than .01 are added to the protein pool. In this manner, we select 4820 and 9665 proteins for inclusion in the protein pools of C. albicans and zebrafish, respectively. In this step, a set of 4820 proteins is found to be too large for constructing the PPI network for C. albicans. Because the resulting number of PPIs is larger than the size of the microarray data, the protein pool of C. albicans needs to be narrowed to avoid overfitting in the parameter identification process for the construction of the PPI network.
So, utilizing the GO database to further select a hyphal growth protein pool within the set of 4820 proteins, we could construct a hyphal PPI network for C. albicans consisting of a subset of 403 proteins that are identified as being related to hyphal growth. In addition, we were able to locate the beginning of hyphal growth in the body of the zebrafish at
2-4 h postinfection from microscopy images of the experiment (Fig. 7.2). Consequently, we could select 598 additional proteins, whose mRNA levels change by more than twofold in 16 h, for another hyphal growth protein pool. Most of these 598 proteins have not yet been confirmed as being associated with hyphal growth. Combining the 403 hypha-related proteins with the 598 proteins having a more than twofold change in expression levels yields 1001 proteins for the total hyphal growth protein pool. Consequently, a candidate PPI network could be constructed based on this protein pool and PPI information. Since candidate PPI networks contain many false-positive PPIs, the candidate PPI network needs to be pruned using real time-series microarray data through a dynamic interaction model, as described in the following subsection.
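The ortholog-based inference of candidate PPIs described in this subsection can be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the function name `infer_ppis` and the identifiers (`Sc_A`, `Ca_A`, and so on) are hypothetical placeholders rather than real database entries, and a one-to-one ortholog mapping is assumed for simplicity.

```python
def infer_ppis(reference_ppis, ortholog_map, protein_pool):
    """Transfer PPIs from a reference species to a target species.

    reference_ppis: iterable of (a, b) protein pairs in the reference species
    ortholog_map:   dict mapping reference proteins to target orthologs
                    (assumed one-to-one here, which is a simplification)
    protein_pool:   set of target proteins that passed the expression filter
    """
    inferred = set()
    for a, b in reference_ppis:
        # Keep an interaction only if both partners have a known ortholog
        if a in ortholog_map and b in ortholog_map:
            ta, tb = ortholog_map[a], ortholog_map[b]
            # and only if both orthologs belong to the selected protein pool
            if ta in protein_pool and tb in protein_pool:
                inferred.add(frozenset((ta, tb)))  # PPIs are undirected
    return inferred


# Toy data with placeholder identifiers (hypothetical)
yeast_ppis = [("Sc_A", "Sc_B"), ("Sc_B", "Sc_C"), ("Sc_C", "Sc_D")]
orthologs = {"Sc_A": "Ca_A", "Sc_B": "Ca_B", "Sc_C": "Ca_C"}
pool = {"Ca_A", "Ca_B"}

candidate_network = infer_ppis(yeast_ppis, orthologs, pool)
print(candidate_network)
```

Storing each interaction as a `frozenset` reflects the undirected nature of PPIs, so the pairs (a, b) and (b, a) are counted once. The resulting candidate set still contains false positives and would be pruned by the dynamic model described next.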
FIGURE 7.2 Experimental microscopy images of the infection process of Candida albicans on zebrafish tissue: infection of zebrafish with C. albicans. Zebrafish sections are stained with haematoxylin and eosin (HE) and imaged by microscopy as described in methods. The respective time points are 0.5 h (A), 1 h (B), 2 h (C), 4 h (D), 6 h (E), 8 h (F), and 12 h (G). “L” indicates liver and “I” indicates intestines. It is apparent that hyphae began to grow between the 2- and 4-h time points [8].
7.2.5 Dynamic system model for the construction of organism protein interaction network during infection
The candidate PPI network can be described as a dynamic system in which interactive proteins and mRNA are considered as inputs of the dynamic system and protein activities as outputs of the dynamic system. All proteins in the candidate PPI network can be considered as target proteins. For a target protein p in the candidate PPI network with N interacting proteins, a dynamic system model of the protein's activity can be represented as follows:

$$y_p[t+1] = y_p[t] + \sum_{q=1}^{Q_p} b_{pq}\, y_p[t]\, y_q[t] + \alpha_p x_p[t] - \beta_p y_p[t] + \omega_p[t], \quad \text{for } p = 1, 2, \ldots, N \tag{7.1}$$
where $y_p[t]$ denotes the protein level of protein $p$ at time $t$, $b_{pq}$ represents the interaction ability of the $q$th interactive protein to protein $p$, $y_q[t]$ denotes the protein level of the $q$th protein interacting with protein $p$, $\alpha_p$ is the translation rate from mRNA to protein, $x_p[t]$ denotes the mRNA expression level of gene $p$, $\beta_p$ is the decay rate of protein $p$, and $\omega_p[t]$ denotes the stochastic measurement noise. The PPI rate is proportional to the product of two proteins' concentrations [277] (i.e., proportional to the probability of molecular collisions between two proteins), and thus the protein interaction is modeled as a nonlinear multiplication scheme. The biological interpretation of Eq. (7.1) is that the protein level of target protein $p$ at time $t+1$ is a function of the present protein level plus interaction with $Q_p$ proteins, plus additive translation effects from mRNA, minus the present protein degradation effect, and plus some stochastic noise. Because of the undirected nature of protein interactions, we cannot assign the interaction direction for a two-protein interaction in the PPI subnetwork in Eq. (7.1). After the dynamic interaction model for the $p$th protein is constructed as in Eq. (7.1), the interaction parameters $b_{pq}$, translation parameter $\alpha_p$, and decay rate $\beta_p$ can be estimated from microarray data by the system identification method in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology. Since the number of PPIs in a candidate PPI network varies in the literature, depending on the biological situation or condition targeted by a specific study, there exist many false positives, and several interactions may not be relevant to our purposes. Therefore the estimated interaction parameters $\hat{b}_{pq}$ need to be pruned using the Akaike information criterion (AIC) model order selection method, which is detailed in the following section.
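To make the terms of Eq. (7.1) concrete, the following minimal Python sketch evaluates one update step for a single target protein. The function name `next_protein_level` and all numerical values are hypothetical and chosen only for illustration; in the chapter the parameters are estimated from microarray data rather than specified by hand.

```python
def next_protein_level(y_p, partners, b, alpha, x_p, beta, noise=0.0):
    """One step of Eq. (7.1) for a single target protein p.

    y_p:      current protein level y_p[t]
    partners: levels y_q[t] of the Q_p interacting proteins
    b:        interaction abilities b_pq, in the same order as partners
    alpha:    translation rate; x_p: mRNA level x_p[t]
    beta:     decay rate; noise: stochastic term omega_p[t]
    """
    # Nonlinear interaction term: sum over q of b_pq * y_p[t] * y_q[t]
    interaction = sum(b_pq * y_p * y_q for b_pq, y_q in zip(b, partners))
    return y_p + interaction + alpha * x_p - beta * y_p + noise


# Hypothetical values for a protein with two interaction partners
y_next = next_protein_level(
    y_p=1.0, partners=[2.0, 0.5], b=[0.1, -0.2],
    alpha=0.3, x_p=1.5, beta=0.4,
)
print(y_next)
```

With these values the interaction term contributes 0.1, translation contributes 0.45, and decay removes 0.4, so the level rises from 1.0 to about 1.15.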
7.2.6 Determination of significant protein interaction pairings in infection PPI network
When the protein interaction parameters $\hat{b}_{pq}$ have been identified, AIC [81] is then employed for both model order selection and determination of significant interactions in the infection PPI networks: that is, to determine the number of interactions $Q_p$ in Eq. (7.1). The AIC, which attempts to include both the estimated residual error and model complexity in one statistical measure, decreases as the residual variance decreases and increases as the number of interactions (i.e., complexity) increases [40].
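$$\mathrm{AIC}(Q_p) = \log(\varepsilon_p) + \frac{2 Q_p}{L}, \quad \text{where } \varepsilon_p = \frac{1}{L}\left(Y_p - \Phi_p \hat{\theta}_p\right)^T \left(Y_p - \Phi_p \hat{\theta}_p\right) \tag{7.2}$$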
As the expected residual error decreases with an increasing number $Q_p$ of interactions for inadequate model complexities, $\mathrm{AIC}(Q_p)$ should attain its minimum near the correct number of interactions [40,81]. Therefore AIC can be used to select the model order (i.e., the number of interactions) based on the protein interaction abilities $\hat{b}_{pq}$ identified previously; that is, the insignificant protein interactions outside the selected model order $\hat{Q}_p$ are considered as false positives and should be pruned from the candidate PPI network to obtain the realistic PPI network. In other words, we use the AIC model order selection method with time-profile microarray data to remove likely false-positive PPIs from the candidate PPI network and achieve a more realistic PPI network. After constructing the PPI networks for host and pathogen, we can construct an intercellular PPI network for the protein interactions between pathogen and host to gain more insight into the offensive and defensive schemes of pathogen and host during the infection process.
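The trade-off in Eq. (7.2) can be illustrated with a short Python sketch. It assumes that the residual vector $Y_p - \Phi_p \hat{\theta}_p$ has already been obtained for each candidate model order; the function names `aic` and `select_order` and the residual values are hypothetical and are not taken from the chapter's data.

```python
import math


def aic(residuals, q):
    """AIC(Q_p) = log(eps_p) + 2*Q_p/L, as in Eq. (7.2).

    residuals: entries of Y_p - Phi_p * theta_hat_p (length L)
    q:         number of interactions Q_p in the fitted model
    """
    L = len(residuals)
    eps = sum(r * r for r in residuals) / L  # mean squared residual
    return math.log(eps) + 2.0 * q / L


def select_order(residuals_by_order):
    """Return the model order whose residuals give the smallest AIC."""
    return min(residuals_by_order, key=lambda q: aic(residuals_by_order[q], q))


# Hypothetical residuals for models with 1, 2, and 3 interactions (L = 8)
residuals_by_order = {
    1: [1.0] * 8,   # underfitted model, large residuals
    2: [0.5] * 8,   # large improvement in fit
    3: [0.45] * 8,  # marginal improvement, extra parameter penalized
}
best_q = select_order(residuals_by_order)
print(best_q)
```

In this toy case the third interaction lowers the residual error only slightly, so the complexity penalty outweighs the gain and the order with two interactions is selected.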
7.2.7 Construction of an interspecies protein-protein interaction network between pathogen and host
To identify the intercellular PPIs between pathogen and host during infection of zebrafish with C. albicans, we utilize the Temporal Relationship Identification Algorithm (TRIA), which has employed gene expression data to identify a given transcription factor's regulatory targets from its binding targets mined from ChIP-chip data [276]. The first step is to build a pool of C. albicans cell-surface proteins. We use the GO database to select 195 cell-surface proteins from the 4031-protein pool to build the resultant protein pool for C. albicans. Since host resistance against C. albicans infections is mediated predominantly by phagocytes, that is, neutrophils and macrophages [323,324], we have assumed that cell-surface proteins of C. albicans may interact with any protein of zebrafish in the infectious process. Therefore, we let $\vec{x} = (x_1, \ldots, x_N)$ denote the protein expression time profile of C. albicans cell-surface protein $x$ and $\vec{y} = (y_1, \ldots, y_N)$ denote the protein expression time profile of zebrafish protein $y$. We construct the interspecies protein interactions between C. albicans and zebrafish via cross-correlation calculations of their time-series microarray data. We compute the cross correlation between protein expression profiles $\vec{x}$ and $\vec{y}$ of C. albicans and zebrafish with a lag of $k$ time points as follows:

$$c(k) = \frac{\sum_{i=1}^{N-k} (y_{i+k} - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_{i=1}^{N-k} (y_{i+k} - \bar{y})^2}\,\sqrt{\sum_{i=1}^{N-k} (x_i - \bar{x})^2}}, \quad k = 0, 1, \ldots, T \tag{7.3}$$

where

$$\bar{y} = \frac{\sum_{i=1}^{N-k} y_{i+k}}{N-k}, \qquad \bar{x} = \frac{\sum_{i=1}^{N-k} x_i}{N-k} \tag{7.4}$$
and $T$ is the maximal time lag after the C. albicans infection. We interpolate the nine-time-point profiles available for both C. albicans and zebrafish into 36-time-point profiles, with an interval of 0.5 h between time points. In this study, we set $T = 8$, meaning that we compute the cross correlation between a C. albicans cell-surface protein and a zebrafish protein for all time lags up to 4 h. Even though the beginning of hyphal growth in the body of the zebrafish occurs at 2-4 h postinfection, we have assumed that the hypha-related proteins of C. albicans might influence zebrafish proteins ahead of 4 h postinfection. Then, we test the null hypothesis $H_0: c(k) = 0$ (i.e., the cell-surface proteins of C. albicans and zebrafish proteins are uncorrelated) against the alternative hypothesis $H_\alpha: c(k) \neq 0$ by the bootstrap method [325] to obtain a P-value $P(k)$. After all cross correlations are calculated, we require the cross-correlation level to be higher than .95. The PPIs satisfying this constraint are considered as potential interspecies PPIs between C. albicans and zebrafish.
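The lagged cross correlation of Eqs. (7.3) and (7.4) can be sketched in Python as follows. The function names `lagged_cross_correlation` and `best_lag` and the toy profiles are hypothetical; the sketch covers only the correlation itself and does not include the interpolation or the bootstrap test.

```python
import math


def lagged_cross_correlation(x, y, k):
    """c(k) of Eq. (7.3): correlation of x_i with y_{i+k}, i = 1..N-k.

    x: pathogen profile, y: host profile (equal length N), k: lag.
    A constant profile makes the denominator zero; this sketch does
    not handle that case.
    """
    n = len(x) - k
    xs, ys = x[:n], y[k:k + n]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n  # Eq. (7.4)
    num = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(xs, ys))
    den = (math.sqrt(sum((yi - y_bar) ** 2 for yi in ys))
           * math.sqrt(sum((xi - x_bar) ** 2 for xi in xs)))
    return num / den


def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest absolute cross correlation."""
    return max(range(max_lag + 1),
               key=lambda k: abs(lagged_cross_correlation(x, y, k)))


# Toy profiles: y repeats the pattern of x two time points later
x = [0, 1, 0, 2, 0, 3, 1, 0]
y = [5, 5, 0, 1, 0, 2, 0, 3]

c2 = lagged_cross_correlation(x, y, 2)
lag = best_lag(x, y, max_lag=3)
print(c2, lag)
```

Here the host profile follows the pathogen profile with a delay of two time points, so the correlation at lag 2 is essentially 1 and that lag is selected. With the 0.5-h interpolated grid described above, a lag of two points would correspond to a delay of 1 h.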
7.3 Pathogenic/offensive mechanism between Candida albicans and zebrafish in infection process
7.3.1 Construction of the interspecies protein-protein interaction network during infection
This section aims to construct the intercellular PPI network between the hyphal proteins of C. albicans and zebrafish proteins during the infection process. The flowchart of the construction of the interspecies PPI network during infection is shown in Fig. 7.1. There are three main routes. Two of them separately construct the hyphal intracellular PPI network of C. albicans and the intracellular PPI network of zebrafish. The third route constructs the host-pathogen interspecies PPI network. Based on the microarray data, we select 4820 and 9665 proteins for inclusion in the source protein pools of C. albicans and zebrafish, respectively. Further, we select 1002 proteins from the C. albicans protein pool to include in the hyphal growth protein pool in order to investigate what factors are behind the transition from yeast form to hyphal form in the C. albicans infection process. In the candidate C. albicans hyphal PPI network, there are 3604 PPIs; in the candidate zebrafish PPI network, there are 1129 PPIs. Because there are many false positives in these candidate PPI networks, the corresponding microarray data are employed to prune these false positives and obtain the realistic PPI networks, as follows. In this study the nine-time-point C. albicans time-series microarray data are utilized to construct two dynamic PPI networks by pruning the false positives in the candidate PPI network at different infection stages. Because hyphae appear to begin to grow in the zebrafish body from 2 to 4 h postinfection in the experimental microscopy images (Fig. 7.2), two groups of data are collected at different stages of infection to construct two separate PPI networks. With the C. albicans microarray data spanning 0.5-4 h, we have constructed a PPI network called the “adhesive stage network,” which represents C. albicans cells in the adhesion stage. Since cubic spline interpolation requires at least four data points to solve a cubic polynomial [326], we have included the 4-h data point to construct this network. With the C. albicans microarray data spanning 2-12 h, we have constructed another PPI network called the “hyphal stage
network,” which is used to represent C. albicans cells transitioning to the hyphal form. Similarly, we have also collected two groups of data at different stages of infection to construct two separate PPI networks for zebrafish: one for microarray data from 0.5 to 4 h and another for data from 2 to 12 h, which are named the zebrafish stage 1 PPI network and zebrafish stage 2 PPI network, respectively. By estimating the system parameters using the time-series microarray data and selecting the model order using the AIC measurement in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology [40,81], the false-positive interactions in the candidate PPI network for the infection process are deleted. This network refinement process leads to 550 proteins with 2725 PPIs in the adhesive stage PPI network and 555 proteins with 3171 PPIs in the hyphal stage PPI network; these two intracellular PPI networks could then be combined into the C. albicans dynamic hyphal PPI network for the infection process (Fig. 7.A1 in Appendix). Similar network refinements in the zebrafish data lead to 1248 proteins with 2344 PPIs in the zebrafish stage 1 PPI network and 1265 proteins with 2379 PPIs in the zebrafish stage 2 PPI network, and these two intracellular PPI networks could then be combined into the zebrafish dynamic PPI network for the defensive process (Fig. 7.A2 in Appendix). The C. albicans dynamic hyphal PPI network, the zebrafish dynamic PPI network, and the host-pathogen intercellular PPI network could be merged into an integrated infection intracellular/interspecies PPI network. The global system view of the C. albicans- and zebrafish-integrated infection intracellular/interspecies PPI network is given in Fig. 7.3.
The entire integrated infection intracellular/interspecies network can be divided into eight levels according to the location of protein action (i.e., nucleus, intracellular, cell surface, or extracellular) and species (i.e., C. albicans or zebrafish) and is composed of three subnetworks, as shown in Fig. 7.3. In Fig. 7.3 the upper subnetwork is the dynamic hyphal intracellular PPI network of C. albicans. The middle subnetwork shows the host-pathogen intercellular interaction network. For simplicity, only the top five correlated PPIs of the C. albicans cell-surface proteins are listed. The bottom subnetwork is the dynamic defensive intracellular PPI network of zebrafish.
7.3.2 Inspection of the dynamic hyphal growth protein-protein interaction network of Candida albicans
In order to verify the accuracy of the identified dynamic hyphal growth PPI network, we investigate whether this intracellular PPI network contains previously identified pathways related to hyphal growth, as illustrated in Fig. 7.A3 in Appendix [327]. This figure displays signal transduction pathways involved in regulating morphogenesis in C. albicans. An inspection of the integrated infectious intracellular/interspecies network shown in Fig. 7.3 confirms that the identified C. albicans dynamic hyphal PPI network includes the MAP kinase cascade, cyclic AMP/PKA pathway, and other hypha-associated pathways. If we isolate these signaling pathways from Fig. 7.3 and construct a new hypha-related subnetwork as in Fig. 7.4, then it is seen that this subnetwork is very similar to Fig. 7.A3 in Appendix. Our new hypha-related subnetwork contains almost all of the proteins and interactions of the already known hypha-related pathways. However, the GTP-binding protein Ras1 is not contained in our dynamic hyphal growth PPI network
II. Systems Infection Microbiology
7.3 Pathogenic/offensive mechanism between Candida albicans and zebrafish in infection process
FIGURE 7.3 Candida albicans and zebrafish integrated intracellular/interspecies dynamic PPI network during C. albicans infection of zebrafish: the infectious intercellular network is composed of three subnetworks. The upper subnetwork is the dynamic hyphal intracellular PPI network of C. albicans. The middle subnetwork shows the host-pathogen intercellular interaction network. For simplicity, only the top five correlated interactions of the C. albicans cell-surface proteins are listed. The bottom subnetwork is the dynamic defensive intracellular PPI network of zebrafish. This infectious intracellular/interspecies PPI network contains lines of three different colors. The red lines denote PPIs that do not appear in the stage 1 network but do appear in the stage 2 network. The green lines denote PPIs that appear in the stage 1 network but not in the stage 2 network. The blue lines denote PPIs that appear in both the stage 1 and stage 2 networks. The node size denotes connectivity degree. The drawing of the PPI network was created using Cytoscape [8]. PPI, Protein-protein interaction.
because its P-value is greater than .01 in the original protein pool selection step for C. albicans; however, Ras2, which is in the same family as Ras1, appears in the new subnetwork. Ras2 is similar to S. cerevisiae Ras2p, which can activate adenylate cyclase and is involved in S. cerevisiae pseudohyphal growth, and Ras2 mutants could have altered filamentous growth patterns [291]. Similar to Ras1 in Fig. 7.A3 in Appendix, Ras2 can also stimulate Cyr1 (Cdc35), which in turn acts as an intracellular second messenger during morphological switching. Ras2 also stimulates Cdc42 through Ras-related protein (Rsr1), which is
7. Identification of infection- and defense-related genes through dynamic host-pathogen interaction network
FIGURE 7.4 Signaling cascades involved in the dynamic hyphal growth protein interaction subnetwork of Candida albicans at different stages of infection: The MAP kinase signaling pathway, cyclic AMP signaling pathway, polarized cell growth pathway, and Rim101 signaling pathway were identified in our dynamic hyphal growth PPI network of C. albicans as occurring during C. albicans infection of zebrafish. These signal transduction pathways are involved in the dynamic regulation of C. albicans morphological transitions. The red lines indicate PPIs that do not appear in the adhesive stage network but do appear in the hyphal stage network. The green lines indicate PPIs that appear in the adhesive stage network but do not appear in the hyphal stage network. The blue lines indicate PPIs that appear in both the adhesive and hyphal stage networks. The red nodes indicate proteins that are also included in Fig. 7.A3; the yellow nodes indicate proteins that are not included in Fig. 7.A3 [8]. PPI, Protein-protein interaction.
involved in budding, cell morphogenesis, and hyphal development processes [328]. Further, an interaction between Cdc42 and Wal1 is represented with a dotted line in Fig. 7.A3, implying that this link is not completely known. However, we can see that Cdc42 links to Wal1 via Myo2 and Rho3 in the newly identified hypha-related subnetwork. Myo2 is required for polarized cell growth and dimorphic switching in C. albicans and is also involved in hyphal development [329]. Rho3 is required for polarized cell growth and cell separation and is also involved in hyphal development [330]. Thus the previously
uncertain pathway between Cdc42 and Wal1 could be more completely elucidated in our dynamic hyphal growth PPI network of C. albicans. Aside from these three well-known hypha-related signaling pathways (that is, the MAP kinase and cyclic AMP signaling pathways and the polarized cell growth pathway), the pH-dependent Rim101 pathway is also identified from the dynamic integrated infection intracellular/interspecies network. In this pathway, Nrg1 is not included in Fig. 7.4 because its P-value is greater than .01. However, Tup1, which has the same function as Nrg1, fits into the pathways shown in Fig. 7.4, and hence the pH-dependent pathway appears to be uninterrupted. In conclusion, 20 out of 22 proteins from already known signaling pathways (i.e., those proteins in Fig. 7.A3) are included in the identified C. albicans dynamic hyphal PPI network, in which the major hypha-related signaling pathways are all visible. These results support the accuracy of our infection interspecies PPI network. Moreover, the identified hyphal growth protein interaction subnetwork provides a dynamic and more complete hyphal network in comparison with Fig. 7.A3 in Appendix. Specifically, the yellow nodes in Fig. 7.4 represent the proteins that are not contained in Fig. 7.A3. These proteins are all related to hyphal growth or filamentous growth, and the figure reflects the signaling pathways of these proteins.
7.3.3 Utilization of dynamic intracellular protein-protein interaction networks to identify proteins with crucial roles in hyphal development
We utilize the dynamic intracellular PPI networks to investigate which proteins play important roles in hyphal development. Based on the dynamic hyphal PPI network of C. albicans during infection (Fig. 7.A1 in Appendix), we explore proteins whose interactions with other proteins vary greatly between the adhesive and hyphal stage networks. In other words, the number of increased interactions and the number of reduced interactions for each protein are summed to find the largest interaction differences between the two stages. Table 7.1 lists the top 15 proteins by the magnitude of their PPI changes between the two stages of infection. Further, these 15 proteins almost completely overlap with the hubs in the hyphal stage network (Table 7.A1 in Appendix). In general, the number of possible interactions for any given protein is paralleled by changes in protein interaction values. Therefore the quantity variations of the hub protein interactions should be larger than those of the conventional nodes in the infectious PPI network of C. albicans. From Fig. 7.A4 in Appendix, it can be seen that the hyphal growth intracellular PPI network is scale free. In a scale-free network the probability that a node is highly connected is significantly higher than in a random network, and the network’s properties are often characterized by a relatively small number of highly connected nodes known as hubs [148]. In general, scale-free networks are particularly resistant to random node removal but extremely sensitive to the targeted removal of hubs [331]. Therefore the hubs are believed to be essential to the robustness of information transmission in the morphological transition from yeast to hyphal form in C. albicans infection.
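The per-protein count described above (increased interactions plus reduced interactions) can be sketched as follows. This is a minimal illustration, assuming each stage network is a list of undirected protein pairs; the function name and the toy protein labels are hypothetical and do not come from the chapter's data.

```python
from collections import Counter

def interaction_changes(stage1_edges, stage2_edges):
    """Rank proteins by total changed interactions between two stage networks."""
    s1 = {frozenset(edge) for edge in stage1_edges}
    s2 = {frozenset(edge) for edge in stage2_edges}
    # Each gained or lost edge counts once for both of its endpoint proteins.
    increased = Counter(p for edge in s2 - s1 for p in edge)
    reduced = Counter(p for edge in s1 - s2 for p in edge)
    rows = [
        (p, increased[p], reduced[p], increased[p] + reduced[p])
        for p in set(increased) | set(reduced)
    ]
    # Largest total change first; ties broken by protein name for reproducibility.
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows

# Toy networks with placeholder protein labels.
adhesive = [("A", "B"), ("A", "C"), ("B", "C")]
hyphal = [("A", "B"), ("A", "D"), ("A", "E"), ("C", "D")]
ranking = interaction_changes(adhesive, hyphal)
for protein, inc, red, total in ranking:
    print(protein, inc, red, total)
```

Each row has the same layout as a row of Table 7.1: protein, increased interactions, reduced interactions, and total changed interactions.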
In the hyphal growth PPI network, Ubi4 is the protein with the largest change in the number of interactions in Table 7.1, and it is also the biggest hub in the hyphal stage network
TABLE 7.1 Proteins with the largest changes in protein-protein interaction (PPI) number between the adhesive and hyphal stage networks during infection of Candida albicans [8].
Candida albicans protein | Increased interactions | Reduced interactions | Total changed interactions | GO functional annotation
Ubi4 | 37 | 23 | 60 |
Act1 | 31 | 12 | 43 |
Hsp90 | 20 | 15 | 35 |
Sla1 | 22 | 8 | 30 | 1
Bni1 | 22 | 7 | 29 | 1
Sin3 | 22 | 6 | 28 |
Hht21 | 19 | 7 | 26 |
Mkc1 | 16 | 10 | 26 |
Phr2 | 17 | 8 | 25 |
Pmr1 | 17 | 7 | 24 | 1
Rvs161 | 14 | 9 | 23 | 1
Rad6 | 16 | 6 | 22 |
Hgc1 | 15 | 6 | 21 | 1
Vrp1 | 16 | 4 | 20 | 1
Clb2 | 12 | 7 | 19 |
Hyphal growth
Filamentous growth 1
1 1
1 1
1
1 1
1
1
The top 15 C. albicans proteins in the hyphal growth PPI networks are ranked by change in the number of protein interactions. The number of increased PPIs of these proteins between the adhesive and hyphal stages is given, along with the number of reduced ones. Total changed interactions display the sum of PPI changes. GO functional annotation provides the GO terms of proteins, which we have filtered to only list GO terms more concerned with hyphal growth and filamentous growth. The functional annotations are from the GO database. GO, Gene Ontology.
(Table 7.A1 in Appendix). Ubi4 is found to be involved in the negative control of morphological switching in C. albicans, as well as in maintaining yeast cell morphology [332]. From the time-profile data of Ubi4, the expression of Ubi4 is reduced from the 0.5-h to the 4-h time point (Fig. 7.A5 in Appendix), that is, Ubi4 expression drops when hyphae begin to grow. Act1 is the second-ranked protein in Table 7.1 by total changed interactions. Act1 is an actin that could influence both cAMP synthesis and hyphal morphogenesis [333]. Therefore Ubi4 and Act1 may play the most crucial roles in hyphal growth development, given their high PPI variation between the adhesive and hyphal stage networks. Similarly, other highly ranked proteins in Table 7.1 (Hsp90, Sla1, Bni1, etc.) have also been identified as potentially important factors in hyphal development [334–336]. In fact, Hht21 is the only protein in Table 7.1 that has not yet been verified as related to hyphal growth. We predict that Hht21 also plays an important role in hyphal development and is a worthwhile protein for further study.
We also want to investigate which proteins gain many interactions but lose few in the hyphal stage network relative to the adhesive stage network. It is supposed that these proteins also have a strong influence on hyphal growth development. In order to discard the many proteins with only minor interaction variations between the networks, the minimum number of changed interactions for a protein is set to 10. Column 6 of Table 7.2 indicates the ratio of increased interactions to total changed interactions. From Table 7.2, it is found that 13 of the top 15 proteins are annotated in the GO database as related to hyphal development, that is, they have GO terms of hyphal growth or filamentous growth. The first-ranked protein is Kex2, which influences C. albicans proteinase secretion and hyphae formation. The breakdown of Kex2 function in C. albicans has pleiotropic effects that may impinge on the ability of the organism to colonize and invade tissues [337]. The second-ranked protein, Hsl1, is a probable protein
TABLE 7.2 Proportion of increased protein-protein interactions (PPIs) to total changed PPIs between the adhesive and hyphal stage PPI networks during infection of Candida albicans [8].
Ranking | Candida albicans protein | Increased interactions | Reduced interactions | Total changed interactions | Ratio of increased interactions | GO functional annotation (hyphal growth / filamentous growth)
1 | Kex2 | 10 | 0 | 10 | 1 | 1
2 | Hsl1 | 15 | 1 | 16 | 0.9375 |
3 | Tsa1 | 10 | 0 | 10 | 0.9167 | 1
4 | Cek1 | 9 | 1 | 10 | 0.9 | 1
5 | Chs3 | 9 | 1 | 10 | 0.9 | 1
6 | Top1 | 9 | 1 | 10 | 0.9 | 1
7 | Orf19.3843 | 15 | 2 | 17 | 0.8826 |
8 | Kre1 | 10 | 2 | 12 | 0.8333 |
9 | Sec2 | 10 | 2 | 10 | 0.8333 | 1
10 | Cdc42 | 14 | 3 | 17 | 0.8235 | 1
11 | Cst20 | 14 | 3 | 17 | 0.8235 | 1
12 | Kem1 | 14 | 3 | 17 | 0.8235 | 1
13 | Erg5 | 9 | 2 | 11 | 0.8182 |
14 | Gpi7 | 9 | 2 | 11 | 0.8182 |
15 | Orf19.5438 | 9 | 2 | 11 | 0.8182 |
1
1
1 1
The top 15 C. albicans proteins in the hyphal growth PPI networks are ranked by the proportion of increased PPIs to total changed PPIs. The number of increased PPIs of these proteins between the adhesive and hyphal stages is given, along with the number of reduced PPIs. Total changed interactions display the sum of PPI changes. The ratio of increased interactions shows the proportion of increased PPIs to total changed PPIs. GO functional annotation provides the GO terms of proteins, which we have filtered to list only the GO terms more concerned with hyphal growth and filamentous growth. The functional annotations are from the GO database. GO, Gene Ontology.
kinase involved in morphological determination during the cell cycle of both yeast-form and hyphal cells via the regulation of Swe1 and Cdc28 [247]. The third-ranked protein is Tsa1, which is necessary for the yeast-hyphal transition when C. albicans is cultured under oxidative stress [338]. The fourth- to seventh-ranked proteins are Cek1, Chs3, Top1, and Orf19.3843, respectively, all of which are also related to hyphal development [339–341]. The eighth- and fifteenth-ranked proteins are Kre1 and Orf19.5438, respectively. From the current literature, it is still unknown whether Kre1 and Orf19.5438 are related to hyphal growth. However, we predict that they may have some relationship to hyphal development, because the remaining 13 of the top 15 proteins in Table 7.2 have been confirmed to relate to this process. From the dynamic intracellular PPI network of zebrafish during infection shown in Fig. 7.A2, we can also investigate which proteins have the largest interaction differences between the zebrafish stage 1 and stage 2 networks. These protein interaction changes are found to be related to molecular defensive mechanisms activated in response to C. albicans invasion. Table 7.3 lists the top 10 proteins with the largest changes in the number of protein interactions. Moreover, we have also listed the hub proteins of the zebrafish stage 2 network in Table 7.A2. Tp53 is found to be the protein with the largest variation in interactions between the zebrafish stage 1 and stage 2 networks. Tp53 is already well known as an apoptosis protein [342]. Esr1 is the second-ranked protein in Table 7.3 and is identified as an apoptosis-related protein in the GO database. Traf6 is the third-ranked protein; a recent study using a zebrafish embryo model has analyzed the in vivo function of Traf6 in the innate immune response without interference of adaptive immunity [343]. Traf6 can activate the NF-κB signal transduction pathway in zebrafish [344].
It can be seen that many of the top 10 proteins in Table 7.3 are related to innate immune response and apoptosis. It makes sense that the immunity-related proteins of a host may have a greater response to pathogen invasion. In addition, previous studies have found that many bacteria are able to trigger apoptosis in the host cell [345]. The induction of apoptosis in epithelial or endothelial cells might break the epithelial/endothelial cell barrier and permit the bacteria to reach the submucosa. On the other hand, apoptosis might be beneficial for the infected organ, since apoptotic cell death of the infected target cell can permit other cells to phagocytose the apoptotic bodies containing bacteria, possibly resulting in the rapid digestion of the pathogen. It is therefore also reasonable that apoptosis-associated protein interactions might have a greater defense response to pathogen invasion.
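The ratio-based ranking behind Table 7.2 (increased interactions divided by total changed interactions, keeping only proteins with at least 10 changed interactions) can be sketched as below. The four real entries use counts taken from Table 7.2; the entry named "Hypothetical1" and the function name are invented to show the effect of the threshold.

```python
def rank_by_increase_ratio(counts, min_total=10):
    """Rank proteins by increased / (increased + reduced), above a change threshold."""
    rows = []
    for protein, (increased, reduced) in counts.items():
        total = increased + reduced
        if total >= min_total:  # discard proteins with only minor variation
            rows.append((protein, increased, reduced, total, increased / total))
    # Highest ratio first; ties broken by protein name.
    rows.sort(key=lambda row: (-row[4], row[0]))
    return rows

counts = {
    "Kex2": (10, 0),
    "Hsl1": (15, 1),
    "Kre1": (10, 2),
    "Cdc42": (14, 3),
    "Hypothetical1": (3, 0),  # invented entry, removed by the threshold
}
ranked = rank_by_increase_ratio(counts)
for protein, inc, red, total, ratio in ranked:
    print(protein, inc, red, total, round(ratio, 4))
```

Without the threshold, a protein with three gained and no lost interactions would tie with Kex2 at a ratio of 1, which is why the minimum number of changed interactions matters for this ranking.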
7.3.4 Interspecies protein-protein interaction network between Candida albicans and zebrafish during infection
After investigating the dynamic intracellular PPI networks of C. albicans hyphal growth and zebrafish defense during C. albicans infection, respectively, we could investigate the interspecies PPI network between C. albicans and zebrafish as shown in Fig. 7.3. The cross correlation of each protein interaction between C. albicans and zebrafish is calculated using TRIA, and the interactions higher than 0.95 are chosen as potential PPIs. In order to find out which C. albicans proteins have the greatest impact on zebrafish, we have analyzed which cell-surface proteins of C. albicans (taken from the 4031-protein set) have the most
TABLE 7.3 Proteins with the most changes in protein-protein interaction (PPI) number between the zebrafish stage 1 and stage 2 networks during infection [8].
Danio rerio protein | Increased interactions | Reduced interactions | Total changed interactions | GO annotation (D. rerio / Homo sapiens)
Tp53 | 10 | 10 | 20 | Apoptosis / Apoptosis
Esr1 | 8 | 5 | 13 | Metal ion binding / Regulation of apoptosis
Traf6 | 6 | 5 | 11 | Innate immune response / Apoptosis / Regulation of apoptosis response to bacterium / Innate immune response
Jun | 3 | 6 | 9 | Canonical Wnt receptor signaling pathway / Innate immune response
Ar | 5 | 4 | 9 | Metal ion binding / Cell death positive regulation of NF-κB transcription factor activity
Usp14 | 5 | 2 | 7 | Ubiquitin-dependent protein catabolic process / Ubiquitin thiolesterase activity
Hsp90a.1 | 5 | 2 | 7 | Myofibril assembly
Rb1 | 2 | 4 | 6 | Myoblast differentiation ubiquitin protein ligase binding / Negative regulation of cell growth
Mdm2 | 4 | 1 | 5 | Negative regulation of apoptosis p53 binding / Fibroblast growth factor receptor signaling pathway
Casp8 | 1 | 3 | 4 | Apoptosis Proteolysis / Activation of innate immune response apoptosis
The names of the top 10 zebrafish proteins in the PPI networks are ranked by change in the number of protein interactions. The number of increased PPIs of these proteins between stages 1 and 2 is given, along with the number of reduced PPIs. The final two columns provide the GO terms of proteins. D. rerio proteins are mapped to H. sapiens proteins by using ortholog data. GO, Gene Ontology.
potential interactions with zebrafish. The top 20 C. albicans hub proteins ranked by number of interactions are listed in Table 7.4, with higher ranked proteins considered to be more important in the infection process. These hub proteins can be classified into seven major cellular function groups according to the GO database: hyphal growth, cell adhesion, biofilm formation, cellular response to neutral pH, cellular iron ion homeostasis, glucose transport, and cell wall molecular biosynthesis. The first four groups are well-known cellular functions that occur during C. albicans infection. When C. albicans infects a host, it first adheres to the host, then grows hyphae to invade the host, and further forms biofilms to parasitize the host. The ability to respond to ambient pH is also critical to the growth and virulence of C. albicans. It has been well established that a near-neutral pH (~6.5) favors hyphal development of C. albicans in vitro, while a low pH (<6.5) blocks hyphal formation and stimulates growth of the yeast form [346]. We also find it interesting that iron ion homeostasis, glucose transport, and cell wall molecular biosynthesis appear in this
TABLE 7.4 Numbers of protein-protein interactions (PPIs) of Candida albicans cell-surface proteins in the host-pathogen interspecies PPI network [8].
Interactions, n | Candida albicans protein | GO functional annotation (columns: cell adhesion, hyphal growth, biofilm formation, cellular response to neutral pH)
1447 | Csh1 | 1 1
1089 | Rbt5 | 1 1
1078 | Pga10 | 1 1
1027 | Pga7 |
964 | Csa1 | 1 1
747 | Mp65 |
728 | Hgt2 |
701 | Chs3 | 1 1
643 | Cht2 | 1 1
532 | Chs2 |
532 | Hgt9 |
532 | Mnt1 |
491 | Hgt12 |
451 | Phr1 |
435 | Pga49 |
419 | Cdc48 |
408 | Hgt7 |
367 | Als1 |
357 | Tos1 |
352 | Mid1 |
1
1
1
1
Cell wall Iron ion Glucose molecular homeostasis transport biosynthesis
1 1
1 1 1
1
1
1
1
1
1
1
1
1 1
1 1
1
1
1
1
The top 20 C. albicans hub proteins with the most potential interactions with zebrafish are listed; their cross-correlation values between protein interactions were larger than 0.95. The number of PPIs of C. albicans cell-surface proteins is sorted in descending order, along with the corresponding protein name. GO, Gene Ontology.
important table. Notably, in recent years, iron ion uptake has been thought to be an important factor in pathogenesis. The ability to acquire iron from host tissues is found to be a major contributing factor to the virulence of pathogenic microorganisms. C. albicans, like many pathogenic bacteria, is able to utilize hemin and hemoglobin as iron sources [347]. The availability of iron can have a significant impact on both pathogen virulence and host antimicrobial defenses. Recently, several studies have shown that the pretreatment of endothelial cells with an iron chelator can reduce the damage inflicted by C. albicans [191,348]. Glucose has previously been reported to induce germ-tube formation
in C. albicans [349,350], and Paranjape and Datta [351] have recognized it as a critical factor for its pH-regulated pathways. Chitin, β-glucan, and mannose are essential molecules constituting most fungal cell walls. It is interesting that the cell wall proteins of C. albicans are predicted by the proposed system dynamic modeling to have so many interactions with zebrafish proteins. Recent studies have shown that cell wall β-glucan is a key fungal signature molecule targeted by the innate immune system to clear fungal infection and that C. albicans can mask β-glucan from immune recognition by using a mannoprotein coat [352,353]. The immune system may also be able to counteract this fungal defense by unmasking the signature components of the fungus during the course of infection. Wheeler et al. [353] have shown that β-glucan is initially masked but subsequently exposed on the surface of C. albicans in the normal course of infection. Although it is still unknown how this unmasking occurs, it is possible that with time immune cells can accumulate in sufficient numbers to directly damage the cell wall and expose the β-glucan on the surface of C. albicans. The chitin of the cell wall may be similarly destroyed, such that C. albicans needs to synthesize more essential cell wall molecules to protect itself. It is seen from Table 7.4 that Csh1 (cell-surface hydrophobicity 1) is the largest hub in the intercellular C. albicans-zebrafish PPI network. Recently, knockout of the Csh1 gene has been undertaken to address the potential contribution of its antigen in mediating fungal cell adhesion to host tissue [184]. Another study has also demonstrated that Csh1 can contribute to the virulence of C. albicans in mice [354]. The fourth hub in Table 7.4 is Pga7, whose cellular function, like that of two other proteins (Pga49 and Tos1), is still unknown at present.
As the top 20 hub proteins in Table 7.4 have the most interactions with zebrafish, any one of these proteins could have a significant impact on zebrafish defense. It will be worthwhile to investigate the cellular functions of these three unknown proteins. Further, we have also listed the top 25 zebrafish hub proteins that have the most intercellular interactions with C. albicans in Table 7.A3. There are several proteins related to cell proliferation, blood coagulation, and proteolysis. It makes sense that more cells would need to proliferate to defend and maintain the normal operation of the host body while C. albicans invades, and zebrafish may also secrete proteolytic compounds to damage C. albicans. Proteolytic cascades also play a crucial role in innate immune response because they can be triggered more quickly than the adaptive immune response, which requires that gene expression be altered [355]. To summarize our findings, Table 7.5 provides the biological functions of C. albicans and zebrafish that are observed in the interspecies infection PPI network.
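The interspecies screening step of this section (keep pairs whose cross correlation exceeds 0.95, then rank pathogen cell-surface proteins by their number of host partners) can be sketched as follows. TRIA is not specified in this section, so the snippet substitutes a plain zero-lag Pearson correlation between expression time profiles as a stand-in; this substitution, the function names, and the toy profiles are all assumptions for illustration.

```python
import math

def pearson(x, y):
    """Zero-lag Pearson correlation of two equal-length profiles (stand-in for TRIA)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)

def rank_interspecies_hubs(pathogen, host, threshold=0.95):
    """Count host partners with correlation above the threshold for each pathogen protein."""
    counts = {
        p: sum(1 for h_profile in host.values() if pearson(p_profile, h_profile) > threshold)
        for p, p_profile in pathogen.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy expression profiles over four time points (placeholder labels and values).
pathogen = {"P1": [1, 2, 3, 4], "P2": [4, 1, 3, 2]}
host = {"H1": [2, 4, 6, 8], "H2": [1, 2, 3, 5], "H3": [4, 3, 2, 1]}
hubs = rank_interspecies_hubs(pathogen, host)
print(hubs)
```

The output has the same shape as Table 7.4: each pathogen protein is paired with its number of candidate interspecies interactions, sorted in descending order.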
7.4 Discussion
To discover C. albicans proteins or pattern recognition molecules that can play a critical role in the infection of zebrafish, we have constructed an intercellular infection PPI network consisting of C. albicans dynamic hyphal intracellular, zebrafish dynamic intracellular, and host-pathogen intercellular PPI networks. To verify the reliability and accuracy of the proposed systems biology methods for constructing the cellular molecular networks and validating the predicted results, we have compared the C. albicans dynamic hyphal
TABLE 7.5 Important biological processes of the Candida albicans invasion mechanism and zebrafish defense mechanism during infection [8].
Biological process
Candida albicans (invasion mechanism) | Zebrafish (defense mechanism)
Hyphal growth | Apoptosis
Cell adhesion | Innate immune response
Biofilm formation | Cell proliferation
Cellular response to neutral pH | Blood coagulation
Cellular iron ion homeostasis | Proteolysis
Glucose transport |
Cell wall molecular biosynthesis |
intracellular PPI network with already known signaling pathways implicated in hyphal growth. Twenty out of 22 proteins in already known pathways (Fig. 7.A3 in Appendix) are included in the C. albicans dynamic hyphal intracellular PPI network, and all major hypha-related pathways are also visible in the hyphal intracellular PPI network. This chapter also has provided a comprehensive and dynamic intracellular PPI network for hyphal development during the infection process, and we have additionally found that Ras2, Myo2, Rho3, Rsr1, etc. may play significant roles in hyphal growth processes. Based on the time-course information of C. albicans microarray data, we are also able to construct two intracellular PPI networks that could represent prehyphal growth (i.e., adhesion) mechanisms and hyphal growth (i.e., invasion) mechanisms for their respective stages. Using a similar approach for zebrafish, we are able to construct two stages of defense networks corresponding to these two invasion PPI networks of C. albicans. The proteins with the largest PPI variation between the two networks are elucidated for both C. albicans and zebrafish. In the C. albicans hyphal growth network, Ubi4 is found to be the protein with the most interaction changes between the adhesive stage and hyphal stage networks, and is also the biggest hub in the latter. Ubi4 in C. albicans is found to be involved in the negative control of morphological switching, as well as in maintaining yeast cell morphology. We have also listed other proteins with large variations—Act1, Hsp90, Sla1, Bni1, etc.—in Table 7.1, which also appear to be network hubs in Table 7.A1. We also predict that deletion of these proteins in C. albicans could strongly impact the robustness of information transmission during the morphological transition from yeast to hyphal form. Hht21 is a noteworthy exception in Tables 7.1 and 7.A1, which has not been verified as a related factor to hyphal growth. 
Further, we have investigated proteins with many increased and few reduced interactions in the hyphal stage PPI network in comparison with the adhesive stage PPI network. We have found that 13 of our top 15 proteins are identified by the GO database as related to hyphal growth development. The top-ranked proteins (such as Kex2, Hsl1, Tsa1, Cek1, Chs3, and Top1) are all found to be related to hyphal growth, and we predict that they are very important in hyphal development. Moreover, it is still unknown in the present literature whether the 8th- and 15th-ranked proteins, Kre1 and Orf19.5438, respectively, are related to hyphal growth; however, we might
predict that Kre1 and Orf19.5438 are indeed related to hyphal development, because the other 13 of the top 15 proteins in Table 7.2 have all been confirmed as such proteins. Interestingly, a recent study of deletion mutants in S. cerevisiae has revealed that deletion of Kre1 can significantly suppress the hyperpseudohyphal phenotype [356]. This result implies that Kre1 may also be related to filamentous growth in C. albicans. Because research on zebrafish PPIs is limited, PPI mapping between zebrafish and H. sapiens is still incompletely characterized, and the variations in protein interactions in the zebrafish dynamic intracellular PPI networks appear very small in comparison with those in the C. albicans hyphal intracellular PPI network. However, we are still able to list in Table 7.3 the top 10 proteins with the largest changes in the number of protein interactions, and we find them to be mainly related to apoptosis and innate immunity in C. albicans infection. The intercellular PPI network between C. albicans and zebrafish is also analyzed in this study. The top 20 C. albicans hub proteins having the most intercellular interactions with zebrafish proteins are listed in Table 7.4. These hub proteins can be classified into seven major cellular functions according to their annotations in the GO database, that is, hyphal growth, cell adhesion, biofilm formation, cellular response to neutral pH, cellular iron ion homeostasis, glucose transport, and cell wall molecular biosynthesis. The first four are well-known cellular functions that occur when C. albicans infects its host. Ionic iron is necessary for hyphal growth, and its availability has recently been shown to be connected with pathogenesis. The ability to acquire iron from host tissues is a major factor affecting the pathogenicity of microorganisms. However, a detailed molecular mechanism for the participation of iron in C. albicans infection is still unclear.
Recently, glucose has been reported to induce germ-tube formation in C. albicans. During the infection process, C. albicans needs energy and nutrients. To obtain them, it may acquire glucose from host tissues, and so the glucose transport functionality of C. albicans might have many intercellular interactions with zebrafish. Further, it will be worthwhile to investigate why cell wall proteins have so many interactions with host proteins. Recent studies have shown that the immune system of the host may be able to counteract fungal masking of cell wall components by unmasking the signature molecular components of the fungus during the course of infection. Although it is still not known how this unmasking occurs, it is possible that with time, immune cells can accumulate in sufficient numbers to directly damage the cell wall and expose the β-glucan of C. albicans. The chitin of the cell wall may also be destroyed during this accumulation of immune cells, so C. albicans needs to synthesize more cell wall molecules to protect itself.
7.5 Conclusion
In this chapter, our aim is to construct a host-pathogen intercellular integrated PPI network by using the microarray data of C. albicans and zebrafish and then utilize it to predict which proteins will play critical roles in the infection process of pathogens and the defense process of hosts. In conclusion, we have identified several important proteins related to C. albicans infection such as Ubi4, Act1, Kex2, Hsl1, and Tsa1, and some proteins the contribution of which to pathogenicity still needs further investigation, such as Hht21, Kre1, and Orf19.5438. These proteins may exert a tremendous influence on morphological
transition of C. albicans, and hence they may provide useful biomarkers or drug targets for broad-spectrum treatments of C. albicans infection. Moreover, three noteworthy cellular functions in C. albicans infection (cellular iron ion homeostasis, glucose transport, and cell wall molecular biosynthesis) are also discovered from the perspective of the intercellular PPI network. Several significant proteins related to innate immune and apoptotic function, such as Tp53, Esr1, and Traf6, are found to play crucial roles in the molecular defensive mechanisms of zebrafish in response to C. albicans invasion. Furthermore, biological cellular processes such as apoptosis, innate immune response, cell proliferation, blood coagulation, and proteolysis are also found in the systematic defensive mechanisms of zebrafish during C. albicans infection. Finally, we hope that the proposed intercellular protein interaction method, implemented through dynamic host-pathogen interaction identification, may ultimately help provide useful biomarkers or drug targets for medical therapies and the development of new antifungal drugs.
7.6 Appendix
FIGURE 7.A1 Candida albicans dynamic intracellular hyphal PPI network for the infection process [8]. PPI, Protein-protein interaction.
FIGURE 7.A2 Zebrafish dynamic intracellular PPI network for the defensive process [8]. PPI, Protein-protein interaction.
FIGURE 7.A3 Pathways related to hyphal growth of Candida albicans [8].
FIGURE 7.A4 Hyphal growth intracellular PPI network is scale free [8]. PPI, Protein-protein interaction.
FIGURE 7.A5 The time-profile data of Ubi4 [8].
TABLE 7.A1 Hubs in the hyphal stage intracellular protein-protein interaction (PPI) network during infection.

| Hub | GO term | Number of interactions |
|---|---|---|
| Ubi4 | GO:0000902: cell morphogenesis; GO:0030447: filamentous growth; GO:0009405: pathogenesis | 115 |
| Hsp90 | GO:0030447: filamentous growth; GO:0009405: pathogenesis | 54 |
| Sin3 | GO:0030447: filamentous growth | 54 |
| Act1 | GO:0030448: hyphal growth | 53 |
| Mkc1 | GO:0030447: filamentous growth; GO:0009405: pathogenesis; GO:0030448: hyphal growth | 52 |
| Sla1 | GO:0030448: hyphal growth | 50 |
| Cdc3 | GO:0001411: hyphal tip | 50 |
| Pmr1 | GO:0030448: hyphal growth | 47 |
| Phr2 | GO:0030447: filamentous growth; GO:0009405: pathogenesis | 47 |
| Cdc28 | GO:0010570: regulation of filamentous growth; GO:0051726: regulation of cell cycle | 43 |
| Bni1 | GO:0030448: hyphal growth; GO:0009405: pathogenesis; GO:0030447: filamentous growth | 41 |
| Orf19.3843 | GO:0030447: filamentous growth | 40 |
| Snf1 | GO:0042710: biofilm formation; GO:0007155: cell adhesion; GO:0007124: pseudohyphal growth | 39 |
| Hht21 | GO:0007094: mitotic cell cycle spindle assembly checkpoint; GO:0009303: rRNA transcription | 36 |
| Erg3 | GO:0030448: hyphal growth; GO:0005506: iron ion binding | 36 |

Column 1 gives the names of the top 17 hub proteins in the hyphal growth PPI network ranked by number of PPIs. Column 2 provides the GO terms of the proteins, which we have filtered to list only the GO terms most relevant to hyphal growth. Column 3 shows the number of PPIs of these hub proteins [8]. GO, Gene Ontology.
TABLE 7.A2 Hubs in the zebrafish stage 2 intracellular protein-protein interaction (PPI) network during infection.
Zebrafish protein
Zebrafish GO term
Homo sapiens GO term
Number of interactions
Tp53
GO:0006915: apoptosis
GO:0006915: apoptosis
58
GO:0042981: regulation of apoptosis
33
GO:0043065: positive regulation of apoptosis Esr1
GO:0046872: metal ion binding GO:0008270: zinc ion binding
Hsp90a.1
GO:0014866: skeletal myofibril assembly
32
GO:0045429: positive regulation of nitric oxide biosynthetic process GO:0048769: sarcomerogenesis GO:0030235: nitric-oxide synthase regulator activity Traf6
GO:0016567: protein ubiquitination
GO:0006915: apoptosis
GO:0042981: regulation of apoptosis
GO:0045087: innate immune response
GO:0009617: response to bacterium
GO:0042088: T-helper 1 type immune response
31
GO:0009615: response to virus GO:0046872: metal ion binding GO:0008270: zinc ion binding Ikbkg
GO:0007252: IκB phosphorylation
31
GO:0051092: positive regulation of NF-κB transcription factor activity GO:0006974: response to DNA damage stimulus Jun
GO:0060070: canonical Wnt receptor signaling GO:0045087: innate immune pathway response GO:0006355: regulation of transcription, DNA-dependent
27
GO:0043525: positive regulation of neuron apoptosis
GO:0046686: response to cadmium ion Ube2i
Ar
GO:0007049: cell cycle
GO:0051301: cell division
GO:0007088: regulation of mitosis
GO:0016567: protein ubiquitination
GO:0046872: metal ion binding
GO:0008219: cell death
GO:0008270: zinc ion binding
GO:0008283: cell proliferation
27
26
Myca
GO:0006355: regulation of transcription, DNA-dependent
25
GO:0006351: transcription, DNA-dependent Src
GO:0006468: protein phosphorylation
GO:0016337: cellcell adhesion
24
GO:0005524: ATP binding
Column 1 gives the names of the top 10 hub proteins ranked by number of protein interactions in the zebrafish stage 2 intracellular PPI network. Columns 2 and 3 provide the GO terms of the proteins from zebrafish and H. sapiens, respectively. Column 4 indicates the number of PPIs of these hub proteins [8]. GO, Gene Ontology.
TABLE 7.A3 The number of protein-protein interactions (PPIs) of zebrafish proteins in the host-pathogen interspecies PPI network during the infection process.
GO term
Number of Zebrafish Interaction protein
Biological process
Molecular function
37
GO:0009953: dorsal/ventral pattern formation
GO:0008083: growth factor activity
GO:0008284: positive regulation of cell proliferation
GO:0005179: hormone activity
Igf1
GO:0005159: insulin-like growth factor receptor binding 37
Chia.2
GO:0005975:chitin catabolic process
35
Nbl1
GO:0060872: semicircular canal development
34
F10
GO:0007596: blood coagulation
GO:0003824: catalytic activity
GO:0006508: proteolysis
GO:0004252: serine-type endopeptidase activity
GO:0007596: blood coagulation
GO:0005509: calcium ion binding
GO:0030195: negative regulation of blood coagulation
GO:0003824: catalytic activity
34
F7i
GO:0008061: chitin binding
GO:0004252: serine-type endopeptidase activity 33
Igfbp2a
GO:0001525: angiogenesis GO:0040007: growth GO:0007507: heart development GO:0007275: multicellular organismal development
GO:0008285: negative regulation of cell proliferation GO:0008156: negative regulation of DNA replication GO:0001558: regulation of cell growth GO:0043567: regulation of insulin-like growth factor receptor signaling pathway 33
Mstnb
GO:0040007: growth GO:0007517: muscle organ development GO:0045926: negative regulation of growth GO:0007179: transforming growth factor beta receptor signaling pathway
33
Slc34a2a
GO:0006817: phosphate transport
32
cx30.9
GO:0007154: cell communication
32
Itgb1b.2
GO:0007155: cell adhesion
GO:0015321: sodium-dependent phosphate transmembrane transporter activity
GO:0007160: cell-matrix adhesion GO:0007229: integrin-mediated signaling pathway GO:0007275: multicellular organismal development 32
Chia.1
GO:0005975:chitin catabolic process
31
Asah2
GO:0006672: ceramide metabolic process
GO:0008061: chitin binding
GO:0006629: lipid metabolic process GO:0007275: multicellular organismal development GO:0006665: sphingolipid metabolic process 30
F9
GO:0007596: blood coagulation GO:0006508: proteolysis
30
Glra4b
30
Hbl4
30
Hpx
GO:0006811: ion transport GO:0005529: sugar binding GO:0042221: response to chemical stimulus
GO:0046872: metal ion binding
29
GO:0043691: reverse cholesterol transport
GO:0017127: cholesterol transporter activity
Cetp
GO:0008289: lipid binding 29
F2
GO:0007596: blood coagulation
GO:0005509: calcium ion binding
GO:0006508: proteolysis
GO:0003824: catalytic activity GO:0016787: hydrolase activity GO:0008233: peptidase activity GO:0004252: serine-type endopeptidase activity GO:0008236: serine-type peptidase activity
29
si:ch211140f21.1
GO:0004866: endopeptidase inhibitor activity
29
Zgc:163025 GO:0007596: blood coagulation
GO:0005509: calcium ion binding
GO:0006508: proteolysis
GO:0003824: catalytic activity GO:0004252: serine-type endopeptidase activity
29
Kcna6
GO:0006811: ion transport GO:0006813: potassium ion transport GO:0055085: transmembrane transport
28
Colec11
GO:0005537: mannose binding GO:0005529: sugar binding
27
Chrnd
GO:0006811: ion transport GO:0030239: myofibril assembly
27
Glra3
GO:0006811: ion transport
27
Apoea
Unknown
The top 25 zebrafish hub proteins ranked by number of potential interactions with Candida albicans proteins are listed in this table; their PPI cross correlations are larger than 0.95. Column 1 indicates the number of PPIs of zebrafish proteins sorted in descending order. Column 2 provides the protein name in zebrafish; Columns 3 and 4 provide the corresponding GO terms [8]. GO, Gene Ontology.
CHAPTER 8
Host-pathogen protein-protein interaction network for Candida albicans pathogenesis and zebrafish redox process through dynamic interspecies interaction model and two-sided genome-wide microarray data
8.1 Introduction
Despite clinical research and development in the last decades, infectious diseases remain a top global problem in public health today, being responsible for millions of morbidities and mortalities each year [357-359]. Investigating the infection process in detail can aid the understanding of the molecular mechanisms that underlie infection and the control of infectious diseases. To obtain an in-depth understanding of the infectious process, the specific molecular interactions between the virulence factors of the invasive pathogen and the defensive mechanisms of the host need to be elucidated. Candida albicans is one of the most common fungal pathogens of medical importance [360]. In severe cases, C. albicans can penetrate through epithelial layers into deeper tissues and cause life-threatening systemic infections [144]. C. albicans can grow in a budded yeast form or in a highly polarized hyphal form; its yeast-to-hyphal transition ability in response to environmental changes is one of its most well-known virulence characteristics [361]. In addition to dimorphism, a number of fungal attributes, such as the expression of adhesion factors, directed growth/thigmotropism, stress adaptation, metabolic flexibility, and the secretion of hydrolytic enzymes, have also been implicated in the infection process [142]. However, at present, the exact molecular mechanisms by which C. albicans attaches to epithelial surfaces, invades various epithelial barriers, causes cell and tissue damage, and disseminates within the host are still not fully understood [145].
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00023-7
© 2021 Elsevier Inc. All rights reserved.
Recently, zebrafish have been increasingly used in biomedical research due to their high reproductive rate, comprehensive molecular tools, and low maintenance costs [362,363]. In general, zebrafish are more similar to mammals than other mini-hosts (such as Drosophila melanogaster, Galleria mellonella, and Caenorhabditis elegans) in terms of genetics, physiology, and anatomical structure, and, most importantly, they have both innate and adaptive immune functions [316,364]. As a result, the zebrafish model has been used to study human pathogens or closely related animal pathogens, either using adult fish with a fully developed adaptive immune system or using embryos or larvae that rely solely on innate immunity [365,366]. Chao et al. have used zebrafish as a mini-vertebrate host system for their study of C. albicans infection, demonstrating that C. albicans can colonize and invade zebrafish at multiple anatomical sites and kill the fish in a dose-dependent manner [320]. Hence, zebrafish are very suitable for our study of characterizing host-pathogen interactions with C. albicans.
Host-pathogen interactions are enormously complex processes. While traditional biological research, which isolates and studies small sets of components, may provide some insights, such approaches are not well suited to address host/pathogen interaction mechanisms on a larger and more general scale [251]. To this end, a systems biology approach is an emerging strategy to better comprehend the underlying molecular mechanisms that occur during host/pathogen interactions [367]. Indeed, several different systems biology approaches have demonstrated their effectiveness [368]. These approaches rely on an unbiased and global understanding of the transcriptomics of the host/pathogen during the infection process.
Further computational analyses of genome-wide gene expression profiles have partially revealed the molecular mechanisms of interaction between host and pathogen, leading to a deeper understanding of the infection process [369]. Nevertheless, the majority of these studies have addressed the pathogen or host transcriptomics individually rather than simultaneously analyzing both interaction partners. Consequently, in this chapter, we aimed to analyze the host and pathogen simultaneously and consider the interacting host and pathogen as an orchestrated system. During the dynamically changing environment of host-pathogen interactions, both host and pathogen have evolved numerous strategies for adaptation. These adaptations are mediated by complex interaction networks, which lead to changes in gene-expression patterns. Consequently, we intended to elucidate the adaptation mechanisms by understanding the underlying interaction networks. Although several network construction schemes have been successfully applied to many biological scenarios, these have focused mainly on a single species. Recently, some computational prediction methods to infer host-pathogen interactions have been developed based on interologs [370,371] or gene-expression profiles [372,373]. In Chapter 7, Identification of Infection-Related and Defense-Related Genes Through Dynamic Host-Pathogen Interaction Network, we identified the intercellular host/pathogen protein-protein interaction (PPI) network by applying the cross-correlation method to the microarray data of C. albicans cell surface proteins and zebrafish proteins, instead of using a host/pathogen dynamic interaction model. In this chapter, we developed a computational systematic framework that integrated ortholog-based PPI inference and dynamic modeling of regulatory responses during C. albicans infections to construct the interspecies PPI network for the characterization of host/pathogen interactions.
With PPI data for two well-studied organisms, Saccharomyces cerevisiae and Homo sapiens, and the cross-species ortholog information among these species, we first inferred the candidate interspecies PPI network consisting of putative
interspecies and intracellular PPIs. We then used multivariate dynamic models to describe the regulatory responses between pathogen and host in the infection process and to prune the candidate network based on the simultaneously quantified C. albicans/zebrafish interaction transcriptomics [374]. The identified C. albicans/zebrafish interspecies PPI network highlights the association between C. albicans pathogenesis and the zebrafish redox process, indicating that redox status is critical in the battle between the host and pathogen. With the accumulation of more interspecies transcriptomics data, the proposed interspecies host/pathogen network construction framework can be used to efficiently explore progressive host/pathogen network rewiring over time. Consequently, this proposed method could benefit the development of systems medicine for infectious diseases.
8.2 Construction of host/pathogen protein-protein interaction network
8.2.1 Overview of the host/pathogen protein-protein interaction network construction framework
The proposed interspecies host/pathogen PPI network construction framework is depicted in a schematic overview in Fig. 8.1. The overall strategy is that we first infer the putative interspecies and intracellular PPIs among the proteins of interest and collect them as a candidate interspecies host/pathogen PPI network. Since the candidate interspecies host/pathogen network cannot accurately represent the actual C. albicans-zebrafish interactions, it should be further validated and pruned. To this end, dynamic models are used to describe the regulatory responses of host/pathogen during the infection process. With the help of simultaneous time-course microarray data for both C. albicans and zebrafish during C. albicans/zebrafish interactions, the regulatory abilities of interacting proteins in the multivariate dynamic models are identified. On the basis of these regulatory abilities, significant PPIs are determined and the candidate interspecies network is pruned, leading to the refined interspecies host/pathogen PPI network for C. albicans/zebrafish interactions. The details of the construction process are described in the following sections.
8.2.2 Data mining and integration of two-sided microarray data
This section describes the sources of all the data used in this study. Both zebrafish and C. albicans genome-wide microarray data were downloaded from the GEO database (GSE32119). Microarray experiments were performed to simultaneously profile genome-wide gene expression in both C. albicans and zebrafish during the infection process. Adult AB strain zebrafish were intraperitoneally injected with 1 × 10⁸ C. albicans (SC5314 strain) cells. Then, a two-step homogenization/mRNA extraction procedure was performed using the whole zebrafish infected with C. albicans. This approach could provide separate pools of gene transcripts from both the host and the pathogen, enabling individual estimation of specific gene expression profiles in either the host or the pathogen using sequence-targeted probes derived from the individual genome [374]. Agilent in situ oligonucleotide microarrays, which cover 6202 and 26,206 genes for C. albicans and zebrafish, respectively, were used to profile time-course gene expression at nine time points (0.5, 1, 2, 4, 6, 8, 12, 16, 18 h postinfection) with three replicates
FIGURE 8.1 Schematic overview of the interspecies PPI network construction framework. PPI data from the BioGRID database, ortholog information from CGD, ZFIN, and InParanoid, and simultaneous time-course microarray data for both C. albicans and zebrafish during C. albicans-zebrafish interactions are used for interspecies PPI network construction. On the basis of the PPI data for Saccharomyces cerevisiae and Homo sapiens and the ortholog information among these related species, putative interspecies and intracellular PPIs are inferred, which constitute the candidate interspecies PPI network. Then, using multivariate dynamic modeling of PPIs and simultaneously quantified microarray data, the interactive abilities are identified and the significant interactions determined. In this manner the candidate interspecies network is pruned to construct the refined host/pathogen interspecies PPI network. In the candidate interspecies PPI network and the refined host/pathogen interspecies PPI network, yellow and pink nodes indicate C. albicans and zebrafish proteins, respectively, whereas blue, green, and gray edges denote C. albicans intracellular interactions, zebrafish intracellular interactions, and interspecies interactions, respectively [6]. BioGRID, Biological General Repository for Interaction Datasets; CGD, Candida Genome Database; PPI, protein-protein interaction.
for both organisms [374]. Manipulation of the animal model was approved by the Institutional Animal Care and Use Committee of National Tsing Hua University (IRB Approval No. 09808). In order to construct the interspecies PPI network for the characterization of C. albicans-zebrafish interactions, the PPIs from S. cerevisiae and H. sapiens and the ortholog information among these species were used to infer the putative interspecies and intracellular PPIs, owing to the lack of sufficient information on the C. albicans and zebrafish interactomes and their interspecies interactions. The PPI data for both S. cerevisiae and H. sapiens were obtained from the Biological General Repository for Interaction Datasets (BioGRID) database [73]. In BioGRID version 3.2.95, there are 89,445 nonredundant physical interactions among 15,690 proteins for H. sapiens and 75,065 nonredundant physical interactions among 6043 proteins for S. cerevisiae. The ortholog information for the four species, namely, zebrafish, H. sapiens, C. albicans, and S. cerevisiae, was acquired from the following databases: InParanoid [375], ZFIN [376], and the Candida Genome Database (CGD) [377]. The cellular information for both C. albicans and zebrafish proteins was retrieved from the Gene Ontology (GO) [273], CGD, and ZFIN databases.
8.2.3 Selection of protein pool
The first step of interspecies PPI network construction is to select the proteins of interest for both host and pathogen. Generally, selection of proteins can be divided into two categories: expression-based selection and function-based selection. For expression-based selection, statistical methods such as one-way analysis of variance (ANOVA) or simple fold-change selection are usually applied to gene expression profiles from microarray experiments or RNA sequencing (RNA-seq) for the global selection of genes/proteins of interest. In this case the constructed network will represent the global scenario for all the dynamically regulated genes/proteins under the experimental condition. On the other hand, the function-based selection method is applied only if we want to construct the PPI network for specific functions. GO annotations are useful tools for functional annotation of genes/proteins. In this chapter, one-way ANOVA was employed to detect significant gene expression variations across nine time points for each gene. In this manner the dynamically regulated genes in both C. albicans and zebrafish could be selected for the global characterization of C. albicans-zebrafish interactions. The null hypothesis of ANOVA assumed that the average expression level of a gene would be the same at every time point [282]. Genes with Bonferroni-adjusted P-values of less than .05 were identified as dynamically regulated genes and their corresponding gene products were selected in the protein pool as target proteins.
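The expression-based selection described above can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline used in the study: the function name `select_dynamic_genes`, the array layout (genes × time points × replicates), and the toy values are assumptions made for the example.

```python
import numpy as np
from scipy.stats import f_oneway

def select_dynamic_genes(expr, alpha=0.05):
    """expr has shape (genes, time_points, replicates) -- an assumed layout.

    Runs a one-way ANOVA per gene with the time points as groups and returns
    a boolean mask of genes whose Bonferroni-adjusted P-value is below alpha.
    """
    n_genes = expr.shape[0]
    # f_oneway takes one sample per group; iterating a gene yields its time points.
    pvals = np.array([f_oneway(*gene).pvalue for gene in expr])
    adjusted = np.minimum(pvals * n_genes, 1.0)  # Bonferroni correction
    return adjusted < alpha

# Synthetic example: 3 genes, 9 time points, 3 replicates.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.1, size=(3, 9, 3))
expr[0] += np.linspace(0.0, 5.0, 9)[:, None]  # gene 0 changes strongly over time
mask = select_dynamic_genes(expr)
```

Genes flagged in `mask` would then contribute their gene products to the protein pool.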
8.2.4 Inference of putative interspecies and intracellular protein-protein interactions
After the selection of proteins of interest for both C. albicans and zebrafish, we then sought to identify the candidate interspecies PPI network among these selected proteins. However, due to the extremely low coverage of the C. albicans and zebrafish interactomes and the lack of interspecies PPIs between C. albicans and zebrafish, the ortholog-based PPI prediction was used to infer the putative PPIs within and between C. albicans and zebrafish [4,370,371]. The PPI data of S. cerevisiae and H. sapiens from BioGRID and the ortholog information among these species
from the InParanoid, CGD, and ZFIN databases were used to infer the putative interspecies and intracellular PPIs of C. albicans and zebrafish. The concept of the ortholog-based PPI inference is shown in Fig. 8.1. For example, suppose that protein A′ and protein B′ of S. cerevisiae (or H. sapiens) are shown to interact based on the BioGRID database. From the InParanoid and CGD databases, we further identify that C. albicans protein A is orthologous to protein A′; from the InParanoid and ZFIN databases, we identify that zebrafish protein B is orthologous to protein B′. Based on the data mining via these databases, we infer that protein A in C. albicans and protein B in zebrafish are a putative interspecies PPI pair. Similarly, the putative intracellular PPIs can also be predicted for both C. albicans and zebrafish. Following this data mining methodology, the putative interspecies and intracellular PPIs were inferred, and the candidate interspecies PPI network can be constructed by simply linking the proteins inferred to interact with each other (Fig. 8.1). It should be noted that the PPIs underlying the ortholog-based inference were observed under many different experimental conditions, which cannot accurately reflect the actual condition of host/pathogen interactions during C. albicans infections; that is, false positives may be present among these putative PPIs. Therefore these putative PPIs should be further validated by time series microarray data of C. albicans-zebrafish interactions, as described in the following section.
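The ortholog-based inference can be illustrated with a small sketch. The protein identifiers, the dictionary-based ortholog maps, and the function name `infer_putative_ppis` are all hypothetical; real BioGRID, InParanoid, CGD, and ZFIN records would need their own parsing, which is not shown.

```python
def infer_putative_ppis(template_ppis, pathogen_orthologs, host_orthologs):
    """Transfer template-species PPIs to pathogen/host proteins via orthologs.

    template_ppis: iterable of (A', B') pairs from a template interactome.
    pathogen_orthologs / host_orthologs: dict mapping a template protein to
    the set of orthologous pathogen / host proteins (hypothetical inputs).
    Returns (interspecies, pathogen_intracellular, host_intracellular) sets.
    """
    inter, p_intra, h_intra = set(), set(), set()
    for a0, b0 in template_ppis:
        # Interspecies pairs: PPIs are undirected, so try both orientations.
        for u, v in ((a0, b0), (b0, a0)):
            for p in pathogen_orthologs.get(u, ()):
                for h in host_orthologs.get(v, ()):
                    inter.add((p, h))
        # Intracellular pairs within each species (self-pairs are skipped).
        for orthologs, found in ((pathogen_orthologs, p_intra),
                                 (host_orthologs, h_intra)):
            for m in orthologs.get(a0, ()):
                for n in orthologs.get(b0, ()):
                    if m != n:
                        found.add(tuple(sorted((m, n))))
    return inter, p_intra, h_intra

# Toy example with made-up identifiers.
template_ppis = [("A0", "B0"), ("B0", "C0")]
pathogen_orthologs = {"A0": {"caA"}, "B0": {"caB"}}
host_orthologs = {"B0": {"zfB"}, "C0": {"zfC"}}
inter, p_intra, h_intra = infer_putative_ppis(
    template_ppis, pathogen_orthologs, host_orthologs)
```

Linking the returned pairs gives the candidate interspecies network, which still contains the false positives discussed above.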
8.2.5 Multivariate dynamic modeling and identification of host/pathogen protein-protein interaction network during Candida albicans infections
In order to validate the putative PPIs and to prune the candidate interspecies network obtained above using the simultaneously quantified C. albicans-zebrafish interaction transcriptomics, dynamic models were employed to describe the regulatory responses of infection. For both C. albicans and zebrafish, the gene expression of target protein $i$ in the candidate interspecies network can be described by the following multivariate linear dynamic model:

$$x_i[t+1] = x_i[t] + \sum_{j=1}^{J_i} a_{ij} x_j[t] + \sum_{k=1}^{K_i} b_{ik} y_k[t] - \lambda_i x_i[t] + h_i + \varepsilon_i[t] \tag{8.1}$$

where $x_i[t]$ represents the expression level of the target protein $i$ ($i = 1, 2, \ldots, N$) at time $t$, $a_{ij}$ denotes the regulatory ability of the $j$th intracellular interactive protein to the $i$th target protein, $x_j[t]$ represents the expression level of the $j$th intracellular protein interacting with the target protein $i$, $b_{ik}$ denotes the interactive ability of the $k$th interspecies interactive protein to the $i$th target protein, $y_k[t]$ represents the expression level of the $k$th interspecies protein interacting with the target protein $i$, $\lambda_i$ indicates the degradation effect of the target protein $i$, $h_i$ represents the basal expression level, $\varepsilon_i[t]$ represents the stochastic noise, and $J_i$ and $K_i$ denote the numbers of intracellular and interspecies proteins interacting with target protein $i$ in the candidate interspecies PPI network. In other words, only the proteins interacting with target protein $i$ in the candidate interspecies PPI network were described in the multivariate linear dynamic model, thereby constraining the multivariate dynamic model based on the candidate interspecies network. In addition, it should be noted that only the mRNA expression levels of the corresponding target proteins were used in this equation, not the concentrations of the proteins. The biological interpretation of Eq. (8.1) is that the expression
level of the target protein $i$ at the next time $t+1$ is determined by the current expression level, the interactive effects of $J_i$ intracellular interactive proteins, the interactive effects of $K_i$ interspecies interactive proteins, the degradation of the present state, the basal expression level from other sources beyond the interactive proteins in the system, and some stochastic noise. For each target protein with putative PPIs in the candidate interspecies PPI network, a dynamic model was constructed. Consequently, a set of dynamic interactive equations for all the target proteins can be used to describe the entire candidate interspecies PPI network.
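As a worked illustration of Eq. (8.1), the following sketch evaluates one time step for a single target protein. The function name and all numerical values are made up for the example; they are not estimates from the study.

```python
import numpy as np

def next_expression(x_i, x_intra, y_inter, a, b, lam, h, noise=0.0):
    """One step of Eq. (8.1) for a single target protein.

    x_i: current expression level of the target; x_intra, y_inter: current
    levels of its intracellular and interspecies partners; a, b: the
    corresponding interactive abilities; lam: degradation; h: basal level.
    """
    return x_i + np.dot(a, x_intra) + np.dot(b, y_inter) - lam * x_i + h + noise

# Hypothetical values: two intracellular partners and one interspecies partner.
x_next = next_expression(x_i=1.0, x_intra=[0.5, 2.0], y_inter=[1.0],
                         a=[0.2, -0.1], b=[0.3], lam=0.4, h=0.05)
# 1.0 + (0.1 - 0.2) + 0.3 - 0.4 + 0.05 = 0.85
```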
8.2.6 Identification of interactive abilities and determination of significant interactions of host/pathogen protein-protein interaction network
From the network point of view, the interspecies PPI network depicted by the multivariate linear dynamic models in Eq. (8.1) represents how C. albicans and zebrafish interact during the infection process. Once C. albicans invades zebrafish tissues and initiates the infection process as the interaction between pathogen and host, some interacting proteins between C. albicans and zebrafish become active. These interspecies interactions are captured by the term $\sum_{k=1}^{K_i} b_{ik} y_k[t]$ in Eq. (8.1); the response of intracellular protein interactions is instead reflected through the term $\sum_{j=1}^{J_i} a_{ij} x_j[t]$. In other words, the interactive abilities, specifically the parameters $b_{ik}$ and $a_{ij}$, indicate the weights of the edges in the interspecies PPI network. Hence, it is essential to identify these interactive abilities and determine the significant interactions during C. albicans infections such that the candidate interspecies PPI network can be further pruned into the refined interspecies PPI network that could accurately capture C. albicans-zebrafish interactions during the infection process. With the help of simultaneously quantified time-course microarray data for both C. albicans and zebrafish during the infection process, identification of parameters in the candidate interspecies PPI network was performed protein by protein. Since the basal expression level $h_i$ in Eq. (8.1) should always be nonnegative, some constraints should be employed when identifying the system parameters. Therefore the system parameters were identified by solving the constrained least squares problem [40,82]. The multivariate linear dynamic model in Eq. (8.1) can be rewritten in the following regression form:

$$x_i[t+1] = \begin{bmatrix} x_1[t] & \cdots & x_{J_i}[t] & y_1[t] & \cdots & y_{K_i}[t] & x_i[t] & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iJ_i} \\ b_{i1} \\ \vdots \\ b_{iK_i} \\ (1-\lambda_i) \\ h_i \end{bmatrix} + \varepsilon_i[t] = \varphi_i^T[t]\,\theta_i + \varepsilon_i[t] \tag{8.2}$$

where $\varphi_i[t]$ denotes the regression vector that can be obtained from the data and $\theta_i$ is the parameter vector to be estimated for target protein $i$. In order to avoid overfitting the estimated parameters, the original data points (nine time points from the original microarray data) were interpolated to $L$ data points by the cubic spline method ($L$ roughly equals five times the number of parameters that need to be identified, namely, $5(J_i + K_i + 2)$ for target protein $i$, since the parameters to be identified are $a_{i1}, \ldots, a_{iJ_i}$, $b_{i1}, \ldots, b_{iK_i}$, $\lambda_i$, and $h_i$). In other words, there were data point pairs $\{x_i[l+1], \varphi_i[l]\}$ for $l \in \{1, \ldots, L-1\}$. Hence, Eq. (8.2) can be written in the following form for target protein $i$ of the candidate interspecies PPI network:
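$$X_i = \Phi_i \theta_i + E_i \tag{8.3}$$

where

$$X_i = \begin{bmatrix} x_i[2] \\ \vdots \\ x_i[L] \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} \varphi_i^T[1] \\ \vdots \\ \varphi_i^T[L-1] \end{bmatrix}, \quad E_i = \begin{bmatrix} \varepsilon_i[1] \\ \vdots \\ \varepsilon_i[L-1] \end{bmatrix}$$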
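The interpolation step and the assembly of the regression data in Eq. (8.3) can be sketched as below. This is an illustration under stated assumptions: the profiles are synthetic, the function name `build_regression` is hypothetical, and resampling onto a uniform time grid with SciPy's `CubicSpline` is one plausible reading of "interpolated by the cubic spline method", not a confirmed detail of the original implementation.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def build_regression(t_obs, x_target, x_intra, y_inter, n_points):
    """Return (X_i, Phi_i) of Eq. (8.3) from spline-interpolated profiles.

    x_target: (T,) target profile; x_intra: (J, T) intracellular partners;
    y_inter: (K, T) interspecies partners; n_points: L, the number of
    interpolated samples (uniform grid is an assumption).
    """
    t_new = np.linspace(t_obs[0], t_obs[-1], n_points)
    profiles = np.vstack([x_intra, y_inter, x_target[None, :]])  # (J+K+1, T)
    dense = CubicSpline(t_obs, profiles, axis=1)(t_new)          # (J+K+1, L)
    # Regression rows [x_1 .. x_J, y_1 .. y_K, x_i, 1] for l = 1 .. L-1.
    phi = np.vstack([dense[:, :-1], np.ones(n_points - 1)]).T    # (L-1, J+K+2)
    x_next = dense[-1, 1:]                                       # x_i[2] .. x_i[L]
    return x_next, phi

# Synthetic profiles at the nine sampling times (hours postinfection).
t_obs = np.array([0.5, 1, 2, 4, 6, 8, 12, 16, 18])
x_target = np.sin(t_obs / 6.0)
x_intra = np.vstack([np.cos(t_obs / 6.0), t_obs / 18.0])   # J_i = 2
y_inter = np.exp(-t_obs / 10.0)[None, :]                   # K_i = 1
n_points = 5 * (x_intra.shape[0] + y_inter.shape[0] + 2)   # L = 5(J_i + K_i + 2)
X_i, Phi_i = build_regression(t_obs, x_target, x_intra, y_inter, n_points)
```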
In this manner the parameter estimation problem for target protein i in the candidate interspecies PPI network (Eq. 8.3) can be represented by the following constrained least squares minimization equation [82]:
$$\min_{\theta_i} \frac{1}{2} \lVert \Phi_i \theta_i - X_i \rVert_2^2 \quad \text{such that } A\theta_i \le b \tag{8.4}$$

where $A = \mathrm{diag}\begin{bmatrix} 0 & \cdots & 0 & 0 & -1 \end{bmatrix}$ and $b = \begin{bmatrix} 0 & \cdots & 0 \end{bmatrix}^T$, constraining the parameter $h_i$ to be nonnegative. Once the system parameters for all proteins in the candidate interspecies PPI network were identified using Eq. (8.4), the significant protein interactions could be determined based on the estimated interactive abilities (the $b_{ik}$ and $a_{ij}$ terms). The Akaike information criterion (AIC) [40,81] and the Student's t-test [282] were applied for system order selection and for determining the significance of the protein interactions. AIC, which includes both the estimated residual error and model complexity in one statistic, quantifies the relative goodness of fit of a model. For a protein interaction model with $J_i + K_i$ interaction parameters (or proteins) to fit with data from $L$ samples, the AIC can be written as follows [40,81]:

$$\mathrm{AIC}(J_i + K_i) = \log\left( \frac{1}{L} (X_i - \hat{X}_i)^T (X_i - \hat{X}_i) \right) + \frac{2(J_i + K_i)}{L} \tag{8.5}$$

where $\hat{X}_i$ denotes the estimated expression profile of the $i$th target protein, that is, $\hat{X}_i = \Phi_i \hat{\theta}_i$, and $\hat{\sigma}_i^2 = \frac{1}{L} (X_i - \hat{X}_i)^T (X_i - \hat{X}_i)$ is the estimated residual error. As the residual error $\hat{\sigma}_i^2$ decreases, the AIC decreases. In contrast, as the number of interactive proteins (or parameters) $J_i + K_i$ increases, the AIC increases. Therefore there is a trade-off between residual error and model order. Because the expected residual error decreases as interactive proteins are added to a model of inadequate complexity, there should be a minimum around the optimal number of interactive proteins. The minimum of Eq. (8.5) indicates the true model order (namely, the true number of proteins that interact with the target protein) of the protein interaction system, so the true $J_i + K_i$ interactive proteins could be selected by minimizing the AIC. Hence, AIC can be adopted to select system order, filtering out insignificant (false positive) protein interactions in the candidate interspecies network based on the estimated interactive abilities ($b_{ik}$ and $a_{ij}$ terms).
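A minimal sketch of the constrained estimation in Eq. (8.4) and the AIC in Eq. (8.5) is given below. It assumes SciPy's `lsq_linear`, whose lower bound of 0 on the last parameter is equivalent to the constraint $A\theta_i \le b$ above; the data are synthetic, the function names are hypothetical, and the number of rows of $\Phi_i$ is used as the sample count in the AIC, which is an assumption of this example.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fit_constrained(phi, x_next):
    """Solve Eq. (8.4); the last entry of theta is h_i, bounded below by 0."""
    lower = np.full(phi.shape[1], -np.inf)
    lower[-1] = 0.0
    return lsq_linear(phi, x_next, bounds=(lower, np.inf)).x

def aic(phi, x_next, theta, n_interactions):
    """Eq. (8.5), with the number of rows of phi used as the sample count."""
    resid = x_next - phi @ theta
    n = len(x_next)
    return np.log(resid @ resid / n) + 2.0 * n_interactions / n

# Synthetic data: two intracellular partners and one interspecies partner.
rng = np.random.default_rng(1)
phi = np.column_stack([rng.normal(size=(40, 4)), np.ones(40)])
theta_true = np.array([0.3, -0.2, 0.5, 0.6, 0.1])  # a_i1, a_i2, b_i1, 1-lambda_i, h_i
x_next = phi @ theta_true + rng.normal(scale=0.01, size=40)

theta_full = fit_constrained(phi, x_next)
aic_full = aic(phi, x_next, theta_full, n_interactions=3)

# Candidate model that omits the first intracellular partner.
phi_small = phi[:, 1:]
theta_small = fit_constrained(phi_small, x_next)
aic_small = aic(phi_small, x_next, theta_small, n_interactions=2)
```

In this toy case the omitted partner has a real effect, so the larger model attains the lower AIC; comparing candidate models in this way is how insignificant interactions would be filtered out.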
Once the estimated regulatory abilities were examined using the AIC model
selection criteria, the Student’s t-test was further applied to determine the statistical significance of the parameters. The P-values for the regulatory abilities were calculated under the null hypothesis $H_0: b_{ik} = 0$ or $H_0: a_{ij} = 0$ [282]. The interactions with Bonferroni-adjusted P-values < .05 were identified as significant interactions and preserved in the refined interspecies PPI network. In this manner, insignificant interactions (false positives) in the candidate interspecies network were pruned to construct the refined host/pathogen interspecies PPI network.
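The pruning step can be sketched as follows, assuming the raw t-test P-values have already been computed for each candidate interaction. The function name `bonferroni_prune`, the protein labels, and the P-values are hypothetical; the threshold is assumed to be applied as an adjusted P-value below 0.05.

```python
def bonferroni_prune(p_values, alpha=0.05):
    """Keep interactions whose Bonferroni-adjusted P-value is below alpha."""
    m = len(p_values)  # number of hypotheses tested
    adjusted = {edge: min(1.0, p * m) for edge, p in p_values.items()}
    kept = sorted(edge for edge, p in adjusted.items() if p < alpha)
    return adjusted, kept

# Hypothetical raw P-values for three candidate interspecies interactions.
raw = {
    ("hostA", "pathX"): 0.001,
    ("hostA", "pathY"): 0.02,
    ("hostB", "pathX"): 0.4,
}
adjusted, kept = bonferroni_prune(raw)
```

In this toy case only the first interaction survives, because the second one, although nominally significant, exceeds the threshold after multiplication by the number of tests.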
8.3 Host/pathogen protein-protein interaction network during the infection process of Candida albicans

8.3.1 Construction of host/pathogen protein-protein interaction network

In this chapter, our main objective is to identify the key host/pathogen PPI network during the infection process for a better understanding of the adaptation mechanisms during the battle between host and pathogen. As shown in Fig. 8.1, various kinds of omics data and databases are mined and integrated as the input for the construction of the interspecies PPI network by the systems biology method, including microarray gene expression data, ortholog information, and PPI data. On the basis of the time-course gene expression profiles and one-way ANOVA, 1728 genes (27.86%) for C. albicans and 680 genes (2.59%) for zebrafish are identified as dynamically regulated genes, and their corresponding gene products are selected as target proteins in the protein pool. Then, the putative interspecies and intracellular interactions among these target proteins are inferred and a candidate interspecies PPI network is built. In total, there are 1606 putative host/pathogen interactions, 17,456 putative intracellular interactions for C. albicans, and 75 putative intracellular interactions for zebrafish among 1230 C. albicans proteins and 130 zebrafish proteins in the candidate interspecies PPI network. It should be noted that target proteins without inferred protein interactions are excluded from the candidate interspecies PPI network. Next, a multivariate linear dynamic model is employed as a systematic description of the PPIs in the candidate interspecies PPI network.
Since there are a large number of false-positive PPIs in the candidate interspecies PPI network, the system identification method and system order detection method in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology, are employed to refine the candidate interspecies PPI network by pruning false positives using two-sided microarray data obtained during C. albicans infection. On the basis of this multivariate linear dynamic model and the simultaneously quantified time-course transcriptomics, a refined C. albicans-zebrafish interspecies PPI network is constructed by pruning the false positives during the infection process. In the constructed host/pathogen PPI network, there are 371 interspecies interactions, 3504 intracellular interactions for C. albicans, and 35 intracellular interactions for zebrafish among 1127 C. albicans proteins and 87 zebrafish proteins (Fig. 8.2). Since the focus of this study lies in the interspecies interaction mechanisms between the host and the pathogen, the identified novel interspecies host-pathogen PPIs, rather than the intracellular PPIs, are further investigated in the following section.
FIGURE 8.2 The constructed Candida albicans-zebrafish host/pathogen PPI network. There are 371 interspecies interactions, 3504 C. albicans intracellular interactions, and 35 zebrafish intracellular interactions among 1127 C. albicans proteins and 87 zebrafish proteins in the constructed interspecies network. The representation of colored nodes and edges is the same as in Fig. 8.1. The figure was created with Cytoscape [88], and the protein names are omitted for simplicity [6]. PPI, Protein-protein interaction.
8.3.2 Novel host/pathogen protein-protein interaction network highlights the association between Candida albicans pathogenesis and the zebrafish redox process

On the basis of the proposed computational system framework, several novel interspecies interactions are identified in the constructed host/pathogen interspecies PPI network. Since the host/pathogen PPI network construction method is proposed to elucidate pathogenic and defensive mechanisms during the infection process, the interspecies subnetwork for C. albicans virulence proteins, namely, proteins annotated with the GO term pathogenesis, is further investigated. Among the identified host/pathogen PPIs, 24 zebrafish proteins are found to interact with C. albicans virulence proteins (Fig. 8.3). In addition, oxidation-reduction process is found to be the only significant GO term shared among these 24 proteins (P < .01, Fisher’s exact test), highlighting the association between C. albicans pathogenesis and the zebrafish redox process. Six zebrafish proteins in the pathogenesis subnetwork, that is, Cyb5r2, Cyp51, Kmo, Nsdhl, Sc5d, and zgc:77112, are annotated with oxidation-reduction process (Fig. 8.3), and the expression of all of their genes except zgc:77112 is repressed over time (see Fig. 8.4). Host defense against C. albicans infection relies mainly on phagocytes of the innate immune system, and another important host response
FIGURE 8.3 The host/pathogen subnetwork for Candida albicans pathogenesis proteins [6].
FIGURE 8.4 Gene expression profiles of zebrafish proteins annotated with oxidation-reduction process in the pathogenesis subnetwork [6].
generated by phagocytes is the production of reactive oxygen species (ROS). Free oxygen radicals produced by the oxidation-reduction process are highly toxic to pathogens and are utilized for pathogen clearance. Further, ROS have been demonstrated to act as secondary signaling molecules, contributing to signaling cascades related to inflammation, apoptosis, and immune responses [378]. For example, in certain cell lines, the activation of the proinflammatory transcription factor NF-κB is dependent on ROS [379]. Therefore immune cells depend on ROS not only to kill phagocytosed pathogens directly but also to mediate inflammatory and immune signaling pathways [378]. As a result, a wide variety of pathogens, including Helicobacter pylori, Legionella pneumophila, and Aspergillus fumigatus, have developed molecular strategies to prevent host ROS generation [380-382]. Through these ROS inhibition mechanisms, these pathogenic organisms can evade host immune responses. Similarly, C. albicans has the ability to suppress ROS production in host immune cells [383]. The precise mechanism utilized by C. albicans remains unclear; however, C. albicans catalase and surface superoxide dismutase have been implicated in counteracting the oxidative burst from phagocytes [384,385]. Since ROS also carry out the above important cellular functions, the suppression of ROS production by C. albicans may result in not only the evasion of phagocytic killing but also the significant modulation of anti-Candida inflammatory responses, which can further benefit the pathogen. Consequently, the suppression of ROS production may represent an important immune evasion mechanism of C. albicans. In contrast, infection by some pathogens, such as Entamoeba histolytica and Japanese encephalitis virus, can result in enhanced ROS formation [386,387].
These pathogens have been reported to use the enhanced ROS generation to cause host cell death, thus allowing themselves to escape the cell. This mechanism may contribute to the spread of the pathogens [378]. Recently, it has been demonstrated that Sclerotinia sclerotiorum, a fungal pathogen that can infect virtually all dicotyledonous plants, can both suppress and induce host ROS formation during infection via the secretion of oxalic acid [388]. During the initial stages of infection, S. sclerotiorum can dampen the oxidative burst of the plant and create a reducing environment in host cells. Once infection is established, the pathogen induces the generation of plant ROS (oxidizing conditions), leading to the programmed cell death of the host, which can directly benefit the pathogen [388]. In this situation, Sclerotinia uses a novel strategy involving the regulation of host redox status to establish infection. To date there is no evidence that C. albicans is capable of inducing ROS production in host cells (Wellington et al. have demonstrated that C. albicans can suppress the production of ROS in phagocytes within 180 min of infection [383]). Nevertheless, we postulate that C. albicans may also modulate ROS levels to subvert immune defenses in the same way as S. sclerotiorum, that is, by suppressing host ROS production in the initial stage of infection and inducing ROS in the later stage. Further experiments measuring ROS at different infection stages are needed to validate this hypothesis. Altogether, the novel interspecies interactions identified in this chapter highlight the association between C. albicans pathogenesis and the zebrafish redox process. The redox status in both the host and pathogen can be a key factor in determining the outcome of the battle between the host and pathogen.
From the perspective of the pathogen, it is essential to control the redox environment and the cell death pathways of the host in order to subvert the host immune defenses and support its own survival. Conversely, during an immune
response, the host seeks to control the redox environment and the cell death pathways to the detriment of the pathogen.

Fig. 8.3 shows the C. albicans pathogenesis subnetwork extracted from the interspecies PPI network in Fig. 8.2. The 24 zebrafish proteins interacting with C. albicans pathogenesis proteins (purple shadow) are found to be statistically enriched with proteins annotated with the oxidation-reduction process (yellow shadow) (P < .01), highlighting the association between C. albicans pathogenesis and the zebrafish redox process. The intracellular protein interactions for both C. albicans and zebrafish are omitted for simplicity.
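The enrichment of a GO term among the 24 interacting zebrafish proteins can be checked with a one-sided Fisher's exact test, which is equivalent to a hypergeometric tail probability. The sketch below is only an illustration: the function name `enrichment_p` is an assumption, and the background size (87) and the number of annotated background proteins (10) are hypothetical placeholders rather than values reported in the chapter. Only the counts 24 and 6 come from the text.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """One-sided Fisher's exact test: P(X >= k) for a hypergeometric X.

    N: background proteins, K: background proteins carrying the GO term,
    n: proteins in the subnetwork, k: subnetwork proteins carrying the term.
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total

# 24 interacting proteins, 6 of them annotated (from the text);
# background of 87 proteins with 10 annotated is a hypothetical assumption.
p_value = enrichment_p(N=87, K=10, n=24, k=6)
```

The resulting P-value depends strongly on the chosen background, so the placeholder numbers should be replaced with the actual annotation counts before any conclusion is drawn.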
8.4 Discussion

Infectious disease is one of the leading causes of death worldwide, and complex pathogenic and defensive mechanisms between host and pathogen underlie the process of infection. However, most studies of host-pathogen interactions have focused on the host or the pathogen individually rather than simultaneously examining both interaction partners. Although these single-species studies have provided insights into the pathogenic and defensive mechanisms of host-pathogen interactions, they could not give clues about interspecies functional associations between host and pathogen. Detailed knowledge of host-pathogen protein interactions may enable us to comprehend the mechanisms of pathogen infection and to identify better host strategies to prevent or cure infection [389]. Accordingly, in this chapter, we have developed a computational framework to efficiently construct the interspecies PPI network in order to characterize interspecies interactions between host and pathogen. Based on ortholog-based PPI inference and multivariate dynamic modeling of the host/pathogen PPI network during C. albicans infection, several omics data are integrated for interspecies PPI network construction. The proposed computational method has been shown to be useful, highlighting the association between C. albicans pathogenesis and the zebrafish redox process and the idea that redox status is critical during the battle between the host and pathogen. According to these findings and evidence from other species, we have also speculated that C. albicans may suppress host ROS production in the initial stages of infection and induce ROS formation in the later stages to destroy the host immune defense. However, further experiments are still required to validate this hypothesis. Previous studies have demonstrated that hyphal morphogenesis is an important virulence factor in C. albicans [361].
Therefore in addition to the pathogenesis subnetwork, we could also explore zebrafish proteins that interact with C. albicans hyphae-related proteins, specifically, proteins annotated with the GO term hyphal growth, in the constructed interspecies PPI network. However, only general GO terms, such as metabolic process and lipid biosynthetic process, are significantly enriched among those zebrafish proteins. Since these general biological processes add little to the understanding of the pathogenic-defensive mechanisms during the battle between host and pathogen, they are not discussed in the current study. Although the proposed interspecies PPI network construction method is shown to be useful, some improvements remain to be addressed. Due to extremely low coverage of the C. albicans and zebrafish interactomes and lack of host/pathogen PPIs between
C. albicans and zebrafish, ortholog-based PPI prediction is used to infer the putative PPIs among and between C. albicans and zebrafish. Although the use of interologs to infer host-pathogen interactions has been shown to be a useful approach [370], the putative PPIs may still contain inaccuracies, which could cause the constructed interspecies PPI network to deviate from the actual PPI network. Therefore higher coverage of the C. albicans and zebrafish interactomes, or even experimentally validated interspecies PPIs, will improve the interspecies PPI network construction scheme. With the proposed interspecies PPI network construction scheme, we are able to construct interspecies PPI networks efficiently for all kinds of interacting organisms, given the interspecies transcriptomics data. In addition, the construction of the host/pathogen PPI network is easily scalable, that is, the use of the computational scheme is not limited by the number of proteins of interest. Recently, Tierney et al. have used simultaneous RNA-seq to quantify C. albicans and Mus musculus gene expression dynamics during phagocytosis by dendritic cells and inferred a host/pathogen regulatory network that could also identify novel interspecies host-pathogen interactions [390]. On the basis of their inferred network, they have proposed a mechanism by which murine Ptx3 binding to C. albicans leads to cell wall remodeling via fungal Hap3 target genes, thereby altering recognition of the fungus by immune cells and attenuating host immune responses [390]. Their work has successfully demonstrated the usefulness of network inference approaches for deciphering microbial pathogenesis mechanisms. Nevertheless, their network inference method is restricted to a limited number of genes with prior knowledge, a limitation that can be overcome by our proposed systematic scheme.
Advancing from the investigation of single species, the interspecies PPI network construction approach can further help characterize and elucidate host-pathogen interactions. In the future, with the accumulation of interspecies transcriptomics data, the proposed systems and computation framework can be used to explore progressive network rewiring over time during the infection process. In this manner the dynamics of the interspecies system can be comprehensively studied. It has been suggested that a disease is rarely a consequence of an abnormality in a single gene/protein, given the functional interdependencies between molecular components in the cell [391], and that both network connectivity and dynamics are important biomarkers as drug targets for therapeutic intervention [392]. Consequently, we believe that systems medicine targeting network connectivity and dynamics can be developed for the therapeutic treatment of infectious diseases with the help of the proposed interspecies PPI network construction method.
8.5 Conclusion

In this chapter a systematic computational framework that integrates multiple omics data has been proposed to construct an interspecies PPI network for the characterization of host/pathogen mechanisms during the infection process. The proposed systematic method is shown to be useful, with results highlighting an offensive-defensive association between C. albicans pathogenesis and the zebrafish redox process during C. albicans infection in zebrafish. The results further indicate that redox status is critical during the battle between the host and pathogen and can determine the outcome of infection. While
the pathogen can control the redox environment to destroy immune defenses and support its own survival, the host controls the redox environment to the detriment of the pathogen during an immune response. With the continued accumulation of interspecies transcriptomics data, the proposed interspecies PPI network method could become more precise and more helpful in the development of systems medicine for infectious diseases from a host/pathogen PPI network perspective.
CHAPTER 9

Essential functional modules for pathogenic and defensive mechanisms via host/pathogen crosstalk network by database mining and two-sided microarray data identification

9.1 Introduction

In daily life, human beings are constantly exposed to environments containing a wide variety of microorganisms, and humans will inevitably face opportunistic threats posed by some of these microbes. Pathogens, microorganisms that cause disease in their hosts, have evolved numerous strategies to invade their hosts, while hosts have evolved corresponding defensive strategies against these invading microbes [393]. Such host-pathogen interactions can result in damage to or even death of the host. Consequently, the investigation of the systematic molecular mechanisms of host-pathogen interactions may help biologists and clinicians better understand the underlying biological scenario. Once the pathogenic mechanisms of pathogens and the corresponding defensive mechanisms employed by hosts are understood, novel therapeutic strategies that improve the host response to microbial infection may be developed for drug discovery. Candida albicans, a fungal pathogen, is a ubiquitous commensal yeast that can occupy the mouth, gastrointestinal tract, and vagina in humans. Under normal conditions, C. albicans is harmless to humans. However, it can induce serious mucosal and life-threatening systemic infections in individuals who are immunocompromised due to factors such as infection with human immunodeficiency virus (HIV), organ transplantation, or cancer chemotherapy. Further, C. albicans is a major cause of hospital-acquired infection [394,395]. C. albicans exists in many morphological forms, including a yeast form, a pseudohyphal form, and a hyphal form. The ability to switch from the yeast to the hyphal form has been found to be one of the major factors
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00007-9
accounting for the virulence of the organism, and other studies have also demonstrated that nonfilamentous C. albicans mutants are avirulent [140,285,396]. At present, the mouse, the fruit fly, and the wax moth are the main model organisms for studies of C. albicans infection. Nevertheless, there are certain disadvantages in using these organisms in such models. The fruit fly and the wax moth lack adaptive immunity [397,398], and the mouse is too expensive for large-scale experiments. Therefore Chao et al. [320] have developed the zebrafish (Danio rerio) model as a minivertebrate host system for C. albicans infection studies. They have shown that C. albicans can invade zebrafish and kill the host in a dose-dependent manner [320]. Brothers et al. have also developed the zebrafish larva as a transparent vertebrate model of disseminated candidiasis, showing that the infection model can reproduce many aspects of candidemia in mammalian hosts [399]. Furthermore, the zebrafish undergoes rapid embryonic development and requires relatively little space to breed, leading to low experimental costs and making it a suitable infection model organism. In addition, the zebrafish has both innate and adaptive immune systems [364] and, therefore, has become widely used in the study of human diseases [363]. Several studies have identified the virulence factors and the corresponding virulence-associated genes during the oral infection of C. albicans [142]. Other studies have investigated innate immune responses occurring during the infection process, especially focusing on pathogen recognition mechanisms [400]. However, these studies have mainly addressed specific genes and their particular roles in the infection process and have not investigated host-pathogen interaction from a systems point of view [251]. In the light of experimental observations in which about 50% of zebrafish have been seen to die of extensive bleeding 18 h after being infected with C.
albicans (1 × 10^8 CFU) [320], we aim to investigate both the functional modules of the activated pathogen that are essential in the invasion of zebrafish by C. albicans and the zebrafish functional modules likely to be responsible for defensive responses and the extensive bleeding. In other words the goal of this chapter is to investigate the pathogenesis of C. albicans in fatal infections of zebrafish and the significant defensive mechanisms employed by zebrafish against C. albicans infection from the cross-talk network perspective. To this end, we simultaneously quantify the time-course gene expression profiles for both C. albicans and zebrafish during C. albicans infection. With the help of simultaneous host-pathogen interaction microarrays and other high-throughput omics data, the early-stage and late-stage infection protein interaction networks in both C. albicans and zebrafish can be constructed. Protein-protein interactions (PPIs) are at the core of the intercellular interactions that control major biological cellular functions. Differential PPIs imply mechanistic changes resulting from an organism’s response to environmental conditions [401]. In the case of host-pathogen interaction an examination of the differential PPIs at different stages of infection can show how the host attempts to respond to the pathogen and how the pathogen responds within the host [401]. Consequently, changes in PPIs during infection may affect the pathogenesis of pathogens, while the reconfiguration of the PPIs in the host may reflect the activation of defensive mechanisms against pathogens. Using these constructed PPI networks, proteins with significant changes in their interaction profiles are investigated as candidates that play important roles in infection pathology. Furthermore, from the identification of such significant proteins, the C. albicans functional modules involved in pathogenesis and the zebrafish
functional modules associated with defense against C. albicans infection can be identified by the systematic method. By understanding the underlying pathogenic/defensive interaction mechanisms between the host and pathogen during C. albicans infection, biologists and clinicians may better appreciate how the pathogen infects its host and thereby identify significant biomarkers as drug targets for devising effective therapeutic strategies to prevent the loss of life in cases of C. albicans infection [389].
9.2 Material and methods

9.2.1 Omics data selection and database mining

In order to investigate important cellular function modules in C. albicans-zebrafish interactions, high-throughput omics data from many different sources are integrated, including simultaneous time-course gene expression profiles of C. albicans and zebrafish interactions obtained from two-sided microarray data, PPI information from Homo sapiens and Saccharomyces cerevisiae, and ortholog data between humans and zebrafish and between S. cerevisiae and C. albicans. The time-course gene expression microarray data are obtained from the GEO database (accession number: GSE32119). Experiments have been performed to obtain in vivo genome-wide gene expression profiles simultaneously for both C. albicans and zebrafish during C. albicans-zebrafish interactions. Wild type AB strain zebrafish are intraperitoneally injected with C. albicans cell suspensions (SC5314 strain), and gene expression in both C. albicans and zebrafish is then assessed at nine subsequent times: 0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h post infection (hpi); this experiment has been performed three times in total [374]. At present, since few protein interaction maps are available for either C. albicans or zebrafish, PPI information for these organisms is inferred from the interactomes of S. cerevisiae and humans with the help of ortholog data [4]. The PPI data of both S. cerevisiae and humans are acquired from the Biological General Repository for Interaction Datasets (http://thebiogrid.org/) [402]. The ortholog data pertaining to C. albicans and S. cerevisiae are retrieved from the Candida Genome Database (CGD) (http://www.candidagenome.org/) [403]; the ortholog data pertaining to zebrafish and humans are taken from the Zebrafish Model Organism Database (http://zfin.org) [376] and the InParanoid database (http://InParanoid.sbc.su.se) [375]. Further, gene annotations of C.
albicans and zebrafish are obtained from CGD; the Gene Ontology (GO) database (http://www.geneontology.org/) [404]; and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) (http://david.abcc.ncifcrf.gov/) [405].
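The ortholog-based inference described above (often called interolog mapping) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `infer_interologs` and the identifiers `y1`, `c2a`, and so on are hypothetical, and real inputs would be the BioGRID interaction lists and the ortholog tables from CGD, ZFIN, or InParanoid.

```python
def infer_interologs(reference_ppis, ortholog_map):
    """Transfer each reference interaction to the target species via orthologs.

    reference_ppis: iterable of (protein_a, protein_b) in the reference species.
    ortholog_map: dict from a reference protein to a list of target orthologs.
    """
    inferred = set()
    for a, b in reference_ppis:
        for target_a in ortholog_map.get(a, ()):
            for target_b in ortholog_map.get(b, ()):
                # Store undirected pairs in a canonical (sorted) order.
                inferred.add(tuple(sorted((target_a, target_b))))
    return inferred

# Hypothetical reference interactions and ortholog assignments.
yeast_ppis = [("y1", "y2"), ("y2", "y3")]
orthologs = {"y1": ["c1"], "y2": ["c2a", "c2b"]}  # y3 has no ortholog
candidate_ppis = infer_interologs(yeast_ppis, orthologs)
```

An interaction is transferred only when both partners have orthologs, so the pair involving `y3` contributes nothing, while one-to-many orthology expands a single reference interaction into several putative ones.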
9.2.2 Selection of protein pool

The overall flowchart of the proposed approach is shown in Fig. 9.1. For both host data and pathogen data, gene expression profiles, ortholog data, and PPI data are used to construct dynamic PPI networks. In order to integrate gene expression profiles and PPI information, gene expression values are overlaid on the corresponding proteins as the protein expression levels [406]. Since the systems approach adopted in this study is based on dynamic PPI networks, the extent of coverage of the interactome needs to be considered when selecting proteins of interest. For C. albicans the PPI information is inferred from the
FIGURE 9.1 A flowchart for construction of PPI networks and determination of enriched functional modules in host-pathogen interaction by comparing the early-stage and late-stage PPI networks. This figure shows the adopted approach in flowchart form. Blue boxes show the data sought in this study. Orange boxes indicate the steps used in the data-gathering process. Purple boxes represent the results of each processing step, and the green box denotes the final result of the whole approach [412]. PPI, Protein-protein interaction.
interactions of S. cerevisiae, the best studied model system. However, the zebrafish has a much lower overall coverage in terms of protein interaction maps than C. albicans. Therefore the selection of the protein pool is different for C. albicans and zebrafish. One-way analysis of variance (ANOVA) is employed to select differentially expressed proteins in C. albicans. The null hypothesis of ANOVA assumes that the average expression level of a protein is the same at every time point [282]. Proteins with Bonferroni adjusted P values
of less than 0.1 are selected in the protein pool as target proteins. For zebrafish the protein pool includes all proteins, even those that are not differentially expressed. Since PPI networks are used in this chapter, those target proteins for which PPI information is not available are filtered out of the protein pool.
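To make the selection criterion concrete, the sketch below computes the one-way ANOVA F statistic for a single protein whose expression is grouped by time point. It is a minimal illustration: the function name `one_way_anova_f` and the expression values are hypothetical, and converting F into a P-value (and then applying the Bonferroni adjustment with the 0.1 threshold) would require the F distribution from a statistics library, which is not shown here.

```python
def one_way_anova_f(groups):
    """F statistic testing whether all group (time-point) means are equal."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical expression of one protein: three time points, three replicates each.
expression = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat = one_way_anova_f(expression)
```

A large F value indicates that the mean expression differs between time points by more than the replicate-to-replicate variation would explain, which is what marks a protein as differentially expressed.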
9.2.3 Construction of protein-protein interaction networks

The systems biology strategy is to identify proteins that are significant in PPI network reconfiguration during the infection process and then to investigate the enriched functional modules composed of these significant proteins. For this reason, early- and late-stage PPI networks for both C. albicans and zebrafish are constructed for PPI network configuration comparison. Previous histological analysis has shown that the first zebrafish dies 5 h after being infected with C. albicans (1 × 10^8 CFU), and about 50% of zebrafish die by 18 hpi [320]. Therefore the gene expression data taken at nine time points after infection are separated into two groups: one contains the 0.5-4 hpi data, the early stage of infection, and the other comprises the 4-18 hpi data, the late stage of infection. The PPI networks constructed from gene expression data within the 0.5-4 hpi period are designated as the early-stage PPI networks, and the gene expression data after 4 hpi are used to construct the late-stage PPI networks. With data pertaining to the proteins in the protein pool and the PPI data obtained from the database mining, a candidate PPI network for both postinfection stages is constructed for C. albicans and zebrafish by linking the proteins with the PPI information. However, under the specific conditions of the infection process, these candidate PPI networks may be inappropriate because they are constructed from data obtained under all possible experimental or biological conditions in the literature and databases. Therefore these candidate PPI networks need to be refined into suitable PPI networks that occur specifically during infection, using the two-sided gene expression profiles of C. albicans and zebrafish. In this chapter a discrete dynamic model is employed to identify the PPI networks that occur in the infection of zebrafish by C. albicans [82] (see Supplementary methods in Appendix 1).
Based on the time-course microarray data, the system parameter estimation method in Eqs. (9.A1)-(9.A4) and the model selection measure, the Akaike Information Criterion (AIC) in Eq. (9.A5), are then used to detect significant PPIs by pruning the false positives in the candidate PPI network to obtain the real PPI networks [40,81] (see Supplementary methods in Appendix 1). In this way, with different sets of microarray data (0.5-4 hpi for the early stage and 4-18 hpi for the late stage), two refined PPI networks are constructed as real PPI networks for both C. albicans and zebrafish in the early and late stages of C. albicans infection of zebrafish. These early- and late-stage PPI networks are compared with each other to find the network reconfigurations of essential cellular function modules and thus to investigate the host/pathogen mechanism during C. albicans infection of zebrafish.
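The partition of the nine sampling times into the two stages can be written down explicitly. The sketch below assumes that the 4 hpi sample belongs to both windows, which is one reading of the 0.5-4 hpi and 4-18 hpi ranges; the function name `split_stages` and the `boundary` parameter are introduced here for illustration only.

```python
# Sampling times in hours post infection (hpi), as listed in Section 9.2.1.
TIME_POINTS_HPI = [0.5, 1, 2, 4, 6, 8, 12, 16, 18]

def split_stages(time_points, boundary=4):
    """Split time points into early and late stages sharing the boundary sample."""
    early = [t for t in time_points if t <= boundary]
    late = [t for t in time_points if t >= boundary]
    return early, late

early_stage, late_stage = split_stages(TIME_POINTS_HPI)
```

Under this assumption the early-stage network is identified from four time points and the late-stage network from six; if the boundary sample were assigned to only one stage, the corresponding list would be one element shorter.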
9.2.4 Network reconfiguration between the early and late stages of the infection process

Living organisms can take appropriate actions to respond to diverse environmental changes and internal cellular perturbations. Through the adjustment of their molecular
II. Systems Infection Microbiology
208
9. Essential functional modules for pathogenic and defensive mechanisms via host/pathogen
interactions, organisms tend to maintain a proper, beneficial, or stable state in response to these changes in such conditions [407]. Therefore the PPI network changes with these PPI variations over time to balance out the effects of environmental changes; that is, the PPI network reconfigures as corresponding protein interactions change to respond to the different environmental and internal cellular conditions brought about by the infection process. A matrix indicating significant PPIs in the real PPI network is constructed from those identified protein interaction abilities (see Supplementary methods in Appendix 1). The established PPI interaction matrix of a real PPI network can be thus represented: 2 3 b11 b12 ? b1K 6 b21 b22 ? b2K 7 6 7 (9.1) 4 ^ ^ & ^ 5 bK1 bK2 ? bKK where bij denotes the identified interaction ability between proteins i and j, and K represents the number of proteins in the real PPI network. Therefore the interaction matrix of the differential PPI network by comparing the early-stage with late-stage PPI networks is represented as follows: 2 3 2 3 b11;2l 2 b11;1l b12;2l 2 b12;1l ? b1K;2l 2 b1K;1l d11;l d12;l ? d1K;l 6 d21;l d22;l ? d2K;l 7 6 b21;2l 2 b21;1l b22;2l 2 b22;1l ? b2K;2l 2 b2K;1l 7 756 7 (9.2) Dl 5 6 4 ^ 5 ^ & ^ 5 4 ^ ^ & ^ dK1;l dK2;l ? dKK;l bK1;2l 2 bK1;1l bK2;2l 2 bK2;1l ? bKK;2l 2 bKK;1l where dij;l represents the change of protein interaction ability of the ith organism system between the late-stage PPI network and the early-stage PPI network for protein i and protein j, dij;1l and dij;2l indicate the identified protein interaction ability between protein i and protein j for the early-stage PPI network and the late-stage PPI network of the ith organism, respectively, and i can represent the pathogen or host. Therefore for each organism system, a matrix Dl is established to show the differential PPI network between the earlystage and late-stage PPI networks. 
In addition, the structural variation of each protein between the early-stage and late-stage PPI networks can be determined from the differential PPI network. The following structure variation value (SVV) is employed as an index to quantify the PPI network reconfiguration between these two stages in the infection process of C. albicans:

$$
SVV_l =
\begin{bmatrix}
SVV_{1,l} \\
SVV_{2,l} \\
\vdots \\
SVV_{K,l}
\end{bmatrix}
\tag{9.3}
$$

where $SVV_{i,l} = \sum_{j=1}^{K} \left| d_{ij,l} \right|$, l = host or pathogen, and i = 1, ..., K; that is, the reconfiguration of protein i of the lth organism is calculated as the sum of the absolute values of the ith row of $D_l$ in (9.2), and the reconfiguration of the whole PPI network of the lth organism is represented by the vector $SVV_l$. For protein i of the lth organism, $SVV_{i,l}$ indicates the extent of the structural change of that protein between the early stage and the late stage of the infection process.
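As a small worked illustration of Eqs. (9.2) and (9.3), the sketch below computes a differential matrix (late minus early) and the row-wise absolute sums for a three-protein network. The interaction abilities are made-up numbers chosen only to keep the arithmetic easy to follow; the function names are likewise hypothetical.

```python
# Illustrative sketch of Eqs. (9.2) and (9.3) with invented interaction abilities.

def differential_matrix(b_early, b_late):
    """D = B_late - B_early, element by element (Eq. 9.2)."""
    return [[late - early for early, late in zip(row_e, row_l)]
            for row_e, row_l in zip(b_early, b_late)]


def structure_variation_values(d):
    """SVV_i = sum over j of |d_ij|, i.e. the absolute sum of row i (Eq. 9.3)."""
    return [sum(abs(x) for x in row) for row in d]


b_early = [[0.0, 0.5, 0.0],
           [0.5, 0.0, -0.25],
           [0.0, -0.25, 0.0]]
b_late = [[0.0, 0.0, 0.75],
          [0.0, 0.0, -0.25],
          [0.75, -0.25, 0.0]]

d = differential_matrix(b_early, b_late)
svv = structure_variation_values(d)
print(d)    # [[0.0, -0.5, 0.75], [-0.5, 0.0, 0.0], [0.75, 0.0, 0.0]]
print(svv)  # [1.25, 0.5, 0.75]
```

In this toy case the first protein loses one interaction and gains another, so it has the largest SVV even though its number of partners is unchanged.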
9.2.5 Investigation of significant functional modules in the infection process
During the host–pathogen interaction in the infection of zebrafish by C. albicans, the pathogen modifies its PPI network for invasive purposes, while the host adjusts its PPI network to defend itself against the pathogen. The participation of a protein in a specific biological process is correlated with the changes that the protein undergoes in the PPI network during that process. In this study, changes in the PPI structures of both organisms (i.e., the differential PPI networks) between the early and late stages of infection reveal proteins with significant SVVs, which are then considered to play important roles in the infection process. Furthermore, cellular function modules made up of proteins with significant SVVs are regarded as important factors in the specific biological behavior of C. albicans infection. Since no zebrafish has died in the early stage (0.5–4 hpi) and the infected fish start to die in the late stage (4–18 hpi), these significant functional modules are considered to be important in conferring virulence on C. albicans. In the case of zebrafish, the significant functional modules are possibly associated with defensive mechanisms by which certain biological processes are activated or inhibited in response to C. albicans infection. In order to determine the significance of the SVV of a given protein, an empirical P value is computed. A null distribution of SVVs is created from the SVVs of random PPI networks. The random PPI networks are generated by permuting the network structure with the network size being constrained; that is, the Erdős–Rényi random graph model is employed to create random PPI networks with the same number of protein interactions. Hence, the SVVs for each protein in the random PPI networks can be computed.
With 100,000 iterations, the P value of a given SVV is calculated as the fraction of random PPI networks in which the SVV is at least as large as the SVV in the real PPI network. SVVs with P values ≤ 0.05 are considered to be significant, and the corresponding proteins are assumed to undergo significant changes in their interaction characteristics during the infection process of C. albicans. Once proteins with significant SVVs are found in the C. albicans–zebrafish interaction, gene annotations from the GO and DAVID databases are used to form enriched functional modules composed of these proteins. As time and environment change, the activated functional modules in an organism may also change. It is believed that the functional modules composed of proteins with significantly elevated SVVs in C. albicans are responsible for its pathogenesis. Similarly, the enriched functional modules in infected zebrafish might be considered the major functional modules for the defensive response induced by C. albicans infection.
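The empirical P value described above can be sketched as a permutation test. The code below is one possible reading, not the procedure used in the study: it assumes that each random network is drawn by placing the observed differential edge values on randomly chosen protein pairs (an Erdős–Rényi G(n, M) graph with the same number of interactions), and it ignores diagonal entries. The function names, the six-protein network, and the reduced iteration count are all hypothetical.

```python
import random
from itertools import combinations

# Illustrative sketch; the randomization scheme is an assumption (see lead-in).

def svv_from_edges(n, edges):
    """SVV per protein: sum of |d_ij| over the edges touching each protein."""
    svv = [0.0] * n
    for (i, j), d in edges.items():
        svv[i] += abs(d)
        svv[j] += abs(d)
    return svv


def empirical_p_values(n, edges, iterations=100_000, seed=0):
    """Fraction of random networks whose SVV is >= the observed SVV, per protein."""
    rng = random.Random(seed)
    observed = svv_from_edges(n, edges)
    all_pairs = list(combinations(range(n), 2))
    weights = list(edges.values())
    exceed = [0] * n
    for _ in range(iterations):
        # G(n, M): same number of edges, placed on randomly chosen protein pairs
        pairs = rng.sample(all_pairs, len(weights))
        random_svv = svv_from_edges(n, dict(zip(pairs, weights)))
        for k in range(n):
            if random_svv[k] >= observed[k]:
                exceed[k] += 1
    return [count / iterations for count in exceed]


# Hypothetical differential edges for six proteins (indices 0..5).
toy_edges = {(0, 1): 1.0, (0, 2): -0.75, (0, 3): 0.5, (4, 5): 0.125}
p_values = empirical_p_values(6, toy_edges, iterations=2000)
significant = [k for k, p in enumerate(p_values) if p <= 0.05]
print(p_values, significant)
```

Protein 0, which carries the three largest changes, is rarely matched by a random network, whereas proteins 4 and 5 are matched in most of them; a threshold of 0.05 then separates the two cases.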
9.3 Essential functional modules for pathogenic and defensive mechanisms
9.3.1 Construction of dynamic protein–protein interaction networks and identification of significant proteins in Candida albicans infection
Using the computational methods outlined previously, several functional modules tentatively accounting for C. albicans pathogenicity and the associated defensive responses of zebrafish are investigated for their possible roles in the infection of zebrafish by C. albicans. For C. albicans the PPI network consists of 1369 differentially expressed proteins selected for the protein pool. For zebrafish a total of 7861 proteins are selected for the protein pool for PPI network construction, regardless of whether they are differentially expressed. Since a candidate PPI network is constructed from target proteins together with PPI information obtained from database mining, it contains a large number of false positives. The candidate PPI network is therefore refined by pruning false positives using the microarray data of C. albicans infection. In this way the refined networks are established as the real early-stage and late-stage PPI networks of both C. albicans and zebrafish. For C. albicans the early-stage PPI network is constructed from 1318 proteins and 2902 PPIs, and the late-stage PPI network comprises 1301 proteins and 4045 PPIs. For zebrafish, there are 13,399 PPIs among 6689 proteins in the early-stage PPI network and 18,807 PPIs among 7023 proteins in the late-stage PPI network. The extent to which a protein is considered significant in the infection process is based on the changes in its PPIs, that is, on the differential PPI networks (Figs. 9.A1 and 9.A2: the differential PPI networks obtained from the early- and late-stage PPI networks of C. albicans and zebrafish, respectively). In other words, significant proteins are identified by comparing the edge variations of the PPI networks between the two stages of the infection process, as revealed by SVVs in the differential PPI networks with P values ≤ 0.05 (distributions of SVVs for C. albicans and zebrafish are shown in Fig. 9.A3 in Appendix 2). In this way, 139 C.
albicans proteins are found to be of significance during the course of infection, and 380 zebrafish proteins are identified as playing important roles in defensive processes against C. albicans. Some functionally enriched modules comprising these SVV-significant proteins will be discussed in the following sections.
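The module enrichment reported in the following sections relies on Fisher's exact test. A minimal sketch of a one-sided version, computed from the hypergeometric distribution, is given below. Whether the original analysis used a one-sided or two-sided test is not stated, so the one-sided form is an assumption, and the counts in the example are invented.

```python
from math import comb

# Illustrative sketch of a one-sided Fisher's exact test for enrichment.

def fisher_enrichment_p(k, n, K, N):
    """P(X >= k) when n proteins are drawn from N, of which K carry the annotation.

    k: annotated proteins among the n significant proteins
    n: number of significant proteins
    K: annotated proteins in the background
    N: background size
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 3 of 4 significant proteins share an annotation that
# 5 of the 20 background proteins carry.
p_module = fisher_enrichment_p(k=3, n=4, K=5, N=20)
print(p_module)  # 155/4845, about 0.032
```

With these invented counts the module would pass a P < 0.05 threshold; in practice tools such as DAVID apply their own variant of the test and a multiple-testing correction.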
9.3.2 Investigation of essential cellular function modules for pathogenic and defensive mechanisms in Candida albicans infection with zebrafish
9.3.2.1 Functionally enriched Candida albicans modules for pathogenic mechanism in the infection process of C. albicans with zebrafish
The 139 SVV-significant proteins identified in C. albicans can be divided into nine cellular function modules using annotations from the GO database. The nine cellular function modules are associated with hyphal morphogenesis, ion and small molecule transport, protein secretion, shifts in carbon utilization, stress responses, protein metabolism and catabolism, signal transduction, transcription-related processes, and other processes (proteins not belonging to the abovementioned functional modules or lacking GO annotation). In Fig. 9.2, eight of the nine functional modules (all except other processes) are statistically enriched beyond what is expected by chance (P < 0.05, Fisher's exact test). Some of these nine functional modules are associated with general biological processes. Therefore in this chapter we focus on only four of these modules: those playing a role in hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization. From Fig. 9.2, we can infer that these four enriched functional
FIGURE 9.2 The functional modules composed of 139 Candida albicans proteins found to be significant for pathogenic mechanisms in the infection process. This figure shows the differential PPI network constructed from the 139 C. albicans proteins significant in the infection process and the interactions among them. Red and blue edges indicate positive and negative $d_{ij,l}$ values, respectively, as calculated using Eq. (9.2). The orange nodes represent the significant proteins, that is, proteins with SVV P values ≤ 0.05. There are nine enriched C. albicans functional modules occurring in host–pathogen interactions, playing roles in such processes as hyphal morphogenesis, ion and small molecule transport, protein secretion, shifts in carbon utilization, stress responses, protein metabolism and catabolism, signal transduction, transcription-related processes, and other processes. The functional modules marked with blue circles are investigated in this chapter. The figure is created using the Cytoscape plugin Cerebral [88,413]. The names of the proteins have been omitted for simplicity [412]. PPI, Protein–protein interaction; SVV, structure variation value.
modules account for the pathogenesis of C. albicans in infecting zebrafish; they are discussed further below (Table 9.1).
1. Hyphal morphogenesis: The fungal pathogen C. albicans can grow in various morphological forms. The pathogen can exist in yeast form or undergo morphogenesis to develop pseudohyphae or polarized hyphae. It has been demonstrated that the morphogenetic plasticity of C. albicans correlates closely with its pathogenicity, and the phenotypic switch from the yeast to the hyphal form is a crucial factor in C. albicans pathogenesis [396]. In general, mutant strains of C. albicans that are defective in forming hyphae are less virulent in animal models [285,408]. From a systematic perspective, considering the change of network structure between the two infection stages, several proteins with significant SVVs are included in the
TABLE 9.1 Enriched functional modules for pathogenic mechanisms in Candida albicans and the corresponding significant proteins shown in Fig. 9.2 during the host–pathogen interaction [412].
Functional module | Protein symbol
Hyphal morphogenesis | Cas4, Als3, Ifd6, Kis1, Mac4, Ndt80, orf19.6705
Ion and small molecule transport | Tna1, Agp2, Can1, Mac1, Als3, orf19.3769, Git1, Hut1, Mcd4, Mep1, Mrs7, orf19.1403, orf19.1427, orf19.2322.3, orf19.3132, orf19.3558, orf19.4897, Seo1, Sfc1, Vcx1
Protein secretion | Sap4, Sap5, Sap6, Sro77, Sys3, Ddi1, orf19.3247, orf19.7261, orf19.7604, orf19.841, Plb2, Prd1, Sec20, Spc3
Shifts in carbon utilization | Lat1, orf19.3782, Tes15, orf19.4121, Tes1, Pyc2, Acc1, Agp2, Pox13
functional module, such as Cas4 and Als3 (Table 9.1). PAG1, also known as CAS4, is a gene in the RAM network, a conserved signaling network that regulates polarized morphogenesis. The CAS4 mutant has been shown to be hypersensitive to cell wall-perturbing agents and to lose cell polarity [409]. Further, the hyphal form can interact with the yeast and pseudohyphal forms to produce biofilms, which can act as a source of recurrent infection and play an important role in resistance to antifungal agents [410]. Als3 is involved in biofilm formation, and an ALS3 mutant has been shown to be biofilm-defective in vitro [411]. In summary, morphogenesis is a crucial virulence-determining factor revealed by the network structure reconfiguration from the early to the late stage of C. albicans infection.
Note to Table 9.1: This table lists four of the functional modules considered to be significant in C. albicans pathogenesis and the corresponding proteins with associated GO annotation.
2. Ion and small molecule transport: C. albicans can prosper in various niches within its host, such as the mouth, the gastrointestinal tract, and the vagina. These niches are characterized by diverse environments, with extreme variation in pH and nutrient composition. Consequently, C. albicans must either adapt or, more likely, change its niche in order to survive, possibly resulting in damage to the host tissue. Several proteins with significant SVVs are included in this cellular function module, such as Tna1, Agp2, and Can1, which are responsible for nicotinic acid, carnitine, and amino acid transport, respectively (Table 9.1). These transporters show significant interaction variations in network structure from the early to the late stage of C. albicans infection, which suggests that a reorganization of substrate uptake and utilization occurs. This reorganization could also enable C. albicans to make use of available nutrients from its host, allowing it to survive in a hostile microenvironment. Furthermore, using GO annotations, proteins such as Als3, Mac1, and orf19.3769, which are involved in the transport of metal ions such as iron, copper, and zinc, are also identified. Copper, zinc, and iron are nutritionally essential trace elements, also referred to as micronutrients. These minerals are required for the growth and optimal function of many organisms, and both an excess and a deficiency of them have adverse effects on such organisms.
Thus the maintenance of an adequate supply of micronutrients is very important for the viability of C. albicans. Previous research has shown that calprotectin from the cytoplasm of neutrophils can inhibit C. albicans growth through competition for zinc [414]. Consequently, adequate zinc levels are required for C. albicans growth. Further, superoxide dismutases cofactored with copper and zinc (Cu/ZnSOD) are found in C. albicans [415]. These enzymes play critical roles in antioxidant defense when cells are exposed to oxygen. Antioxidants inhibit oxidation reactions, which can produce free radicals that lead to cell damage or death of C. albicans. A previous study has demonstrated that C. albicans lacking Cu/ZnSOD is more susceptible to macrophages and that its virulence is attenuated in mice [416]. Hence the uptake of copper and zinc appears to be important for C. albicans growth and the progression of infection. It is well known that C. albicans possesses iron acquisition mechanisms, which are essential for hyphal growth in the infection process [417]. Such mechanisms can also deprive the host of iron, thereby exerting a harmful effect on zebrafish. Als3 is a hyphal-associated adhesin and invasin in C. albicans and is essential for ferritin binding to the external hyphal layer [191]. Ferritin is an iron-containing host protein and therefore a potential iron source for pathogens in the infection process. Previous studies have indicated that C. albicans mutants lacking ALS3 display defective ferritin-binding abilities and that this also attenuates the pathogenic damage done to oral epithelial cells [417]. This distinctive iron-utilization characteristic can contribute to the survival and pathogenesis of the organism in the host and might also play a crucial role in hyphal formation, adhesion, and invasion during host–pathogen interactions.
Altogether, ion and small molecule uptake and utilization can enable the pathogen to adapt to, invade, or even damage the host in C. albicans infection.
3. Protein secretion: According to the differential PPI networks from the early to the late infection stage, proteins playing a role in protein secretion, such as Sap4 to Sap6, Sro77, and Sys3, show significant network structure variations (Table 9.1). Every cell is contained within a membrane that separates its interior from the external environment. Hence, the protein-secretion system, which transports molecules from the interior of microbes to the external environment, is an important pathogenic mechanism by which microbes can adapt to and survive in their host cell environments [418]. Several adhesins and extracellularly secreted hydrolases, such as secreted aspartyl proteinases, secreted lipases, and phospholipase B [419], are identified as contributing to the virulence of C. albicans, and these proteins are known to facilitate nutrient supply, adhesion to host cells, tissue invasion, and even host cell damage. Dynamic changes to cell surface components and to proteins released from the pathogen into the host cell environment can play crucial roles in host–pathogen interactions during the C. albicans infection process. This implies that the virulence characteristics of C. albicans are closely related to its protein secretion mechanisms. The Golgi apparatus is also known to be involved in the secretion mechanism. Studies have demonstrated that in C. albicans the Golgi complex, consisting of puncta, is redistributed to the distal portion of the extending hyphae, whereas it is randomly distributed throughout the cytoplasm in the yeast form [336]. Since the Golgi apparatus is considered to be a locus of biomolecule manufacture, the relocation of the Golgi to the distal hyphal tip during hyphal formation implies that post-Golgi secretory vesicles need not undergo long-distance transport from the cell body to the growing apical tip. Consequently, such Golgi redistribution provides more rapid apical growth and makes the infection process more efficient [336,418]. It also appears that the clustered Golgi puncta could give C. albicans the ability to secrete virulence-related proteins that adhere to, invade, or damage host tissue during host–pathogen interactions. However, except for the well-known example of Sap4 to Sap6 [420], there is as yet no explicit evidence that mutant strains deficient in these identified protein secretion-related genes exhibit reduced virulence. Further studies are still needed to fully investigate the relationship between protein secretion and the pathogenesis of C. albicans infection. Despite this, protein secretion mechanisms enable C. albicans to adapt to, survive in, and invade the host and therefore play a crucial role in the host–pathogen interaction process during C. albicans infection.
4. Shifts in carbon utilization: When host immune cells recognize and attach to the pathogen-associated molecular patterns (PAMPs) of pathogens, phagocytosis is activated, and the pathogens are engulfed by the cell membranes of phagocytes to form an internal phagosome. Phagocytosis is an important cellular process used by hosts to destroy and remove pathogens. Some studies have shown that the phagosome of the host creates a nutrient-poor environment for pathogens [421]. Moreover, C. albicans has been shown to go through carbon starvation and glucose deprivation after internalization by macrophages [421].
Since glucose generally serves as the preferred source of energy and as a precursor for the synthesis of several other substances, the glucose-deficient environment of the macrophage requires the pathogen to modulate its carbon metabolism, for both energy and biosynthesis, after phagocytosis. Previous studies have shown that genes controlling the glyoxylate cycle and gluconeogenesis for the assimilation of two-carbon compounds are activated after C. albicans is exposed to macrophages. Further, acetyl-CoA is a precursor that drives the glyoxylate cycle or gluconeogenesis and is derived from fatty acids of either C. albicans or the macrophage [421]. In this chapter, by looking at proteins with significant SVVs, a functional module of carbon utilization-related genes including Lat1, orf19.3782, and Tes15 is determined (Table 9.1). Annotation data from the CGD reveal that Lat1 and orf19.3782 are associated with acetyl-CoA biosynthesis or transport. Tes15, orf19.4121, and Tes1 are involved in acyl-CoA metabolic processes; acyl-CoA is a coenzyme involved in the metabolism of fatty acids. Pyc2 is involved in gluconeogenesis. Acc1, Agp2, and Pox13 are involved in fatty acid biosynthesis, metabolic processes, and oxidation. During the early stage of infection, C. albicans is internalized by macrophages, and some acetyl-CoA-, acyl-CoA-, and fatty acid-associated proteins, such as Lat1, Tes15, and Agp2, are activated (Fig. 9.3). During the late stage of C. albicans infection, however, macrophages are killed and C. albicans starts to disperse. When this happens, glucose is used as the chief carbon source, causing the acetyl-CoA-associated and fatty acid-associated proteins to decrease (Fig. 9.3). However, the trends in the gene expression profiles of Tes15, orf19.4121, Tes1, and Pox13 differ from those of the other five proteins (Fig. 9.3).
Accordingly, this gradually increasing and subsequently decreasing expression profile can suggest the induction of acetyl-CoA synthesis during infection. A possible
FIGURE 9.3 Gene expression profiles of Candida albicans proteins in the functional module underlying shifts in carbon utilization. (A) Lat1, (B) orf19.3782, (C) Tes15, (D) orf19.4121, (E) Tes1, (F) Pyc2, (G) Acc1, (H) Agp2, and (I) Pox13 [412].
explanation is that some C. albicans cells are still phagocytosed, and consequently fatty acid-associated proteins are needed to produce glucose. Altogether, rapid adaptation to ever-changing host environments, such as the reorganization of carbon utilization, can enable C. albicans to survive in and infect the host.
9.3.2.2 Functionally enriched zebrafish modules for defensive mechanism during Candida albicans infection
To examine which functional modules of zebrafish are induced by C. albicans infection, the bioinformatics database DAVID [405] and GO annotations are used for the analysis of zebrafish proteins. By applying functional annotation clustering in DAVID, the 380 proteins identified with significant SVVs can be classified into 10 cellular function modules. These cellular function modules represent immune response, apoptosis mechanism, ion transport, protein secretion, hemostasis-related processes, signal transduction, transcription-related processes, embryonic morphogenesis and development, metabolism and catabolism, and other processes (proteins not belonging to the abovementioned cellular function modules or without GO annotation) (Fig. 9.4). Among these 10 cellular function modules, 9 (all except other processes) are statistically enriched (P < 0.05, Fisher's exact test). Because some functional modules represent general biological processes, in this chapter we focus on only five functional modules: immune response, apoptosis mechanism, ion transport, protein secretion, and hemostasis-related processes (Fig. 9.4). The first four cellular function modules are considered to play defensive roles in the battle of the host against the pathogen during C. albicans infection. The cellular function
FIGURE 9.4 The cellular function modules composed of 380 zebrafish proteins found to be significant for defensive mechanisms in the infection process. This figure shows the differential PPI network constructed from the 380 infection-significant zebrafish proteins and their interactions. Red and blue edges indicate positive and negative $d_{ij,l}$ values, respectively, as calculated using Eq. (9.2). The orange nodes represent significant proteins, that is, proteins with SVV P values ≤ 0.05. There are 10 enriched zebrafish functional modules occurring in host–pathogen interactions, including those underlying immune response, apoptosis mechanism, ion transport, protein secretion, hemostasis-related processes, signal transduction, transcription-related processes, embryonic morphogenesis and development, metabolism and catabolism, and other processes. The cellular function modules marked with blue circles are investigated in this chapter. The figure is created using the Cytoscape plugin Cerebral [88,413]. The protein names have been omitted for simplicity [412]. PPI, Protein–protein interaction; SVV, structure variation value.
module of the hemostasis-related processes can be used to explain the pathological outcome, that is, the fatal bleeding of zebrafish found in the C. albicans–zebrafish infection model in this chapter. These defensive cellular function modules are discussed further below (Table 9.2).
1. Immune response: It is well known that C. albicans can be both a harmless commensal organism and a fatal pathogen. The transition from commensal organism to pathogen depends on the interaction between C. albicans and the host innate immune system. Several proteins with significant SVVs, identified from the network structure variations in Eq. (9.3) during infection, are involved in the immune response, such as Tlr2 and B2m (Table 9.2). During the interaction between host and pathogen, after the pattern recognition receptors (PRRs) of the host cells have recognized fungal PAMPs, an innate response is usually triggered to combat the pathogen. These PRRs include various toll-like receptors (TLRs), which are expressed by different cell types, such as macrophages, monocytes, and dendritic cells, and are the primary immune sensors for the detection of invading pathogens [422]. One of the mechanisms involved in the innate immune recognition of fungal pathogens is mediated by the Dectin-1/Tlr2 receptor complex, which recognizes β-glucan, a major component of the cell wall of C. albicans [423]. Further, the activation of macrophages and dendritic cells expressing TLRs also triggers the adaptive immune response. Beta-2 microglobulin (B2m), which is identified as a protein with a significant SVV in this chapter, is associated with MHC (major histocompatibility complex) class I molecules, which help T cells recognize antigens. It has been shown that B2m-knockout mice cannot express MHC class I molecules and lack CD8+ and natural killer T cells [424].
If the host immune system is dysfunctional, pathogens invade easily, resulting in severe damage or even life-threatening systemic infection. The enrichment of the cellular function module underlying the immune response reinforces the crucial role that the immune system plays in the defensive mechanisms against invasive C. albicans in the battle between C. albicans and zebrafish in the infection process.
TABLE 9.2 Enriched cellular function modules for defensive mechanisms in zebrafish and the corresponding significant proteins in Fig. 9.4 during the host–pathogen interaction [412].
Functional module | Protein symbol
Immune response | Tlr2, B2m, Akt2, Akt2l, Apaf1, Cxcr3.2, Pik3r3a, Sigirr, Ticam1, Tlr20a, Vtna
Apoptosis mechanism | Akt2, Akt2l, Apaf1, Cdk5, Gdnfa, Nras, Phlda3, Pik3r3a, Plcg2, Prkar2ab, Sgk1, Tax1bp1a
Ion transport | Tfa, Abcc9, Cacnb3a, Cacnb4b, Clk2a, Cox5ab, Grid2, Grin1a, Grin1b, Kcnh1, Kcnq1, Sfxn1, si:ch211-12e13.7, si:ch211-258f14.5, Slc12a3, Slc26a6l, Slc39a6, Trpc6a, Trpm7, zgc:109934, zgc:162160
Protein secretion | Rab2a, Rab3da, Rab3db, Rab6ba, Rab8a, Rab10, Rab11a, Rab35, Ap1m2, Ap3m1, Atg4c, Bcap31, Naca, Nup85, Ramp2, Scamp2, Snx17, Tnpo2, Trpc4apa, zgc:113338
Hemostasis-related processes | Calcrla, Lama4, Acvrl1, Bmp4, Cdh2, Csrp1a, Ell, Gata2a, Hapln1b, Hopx, Nr2f2, Nrxn3a, Nrxn3b, Plxnb2a, Rab11
2. Apoptosis mechanism: Apoptosis is a biological process leading to cell death, and its proper modulation is essential for the survival of the host. However, hosts and pathogens can each induce apoptosis in different ways to gain their optimal benefit. Pathogens have evolved several strategies to induce or inhibit host cell apoptosis, allowing them to cope with the innate response and favoring their further spread into the host tissues [425,426]. In contrast, hosts defend against C. albicans infection by inducing apoptosis in infected cells and inhibiting apoptosis in immune cells [427,428]. It has been shown that the outer surface of the cell wall of C. albicans is covered with phospholipomannan, which binds to the membranes of macrophages and stimulates Tlr2-mediated apoptosis [429]. This can result in macrophage apoptosis and enhance the survival of C. albicans [429,430]. Conversely, several studies have shown that the resistance of monocytes to C. albicans-induced apoptosis may limit pathogen replication, protect monocyte viability, and therefore enhance the host-defense response [428,431]. In addition, although there is no explicit evidence of direct apoptotic effects on the nonphagocytic cells of the host in C. albicans infection, it has been shown that apoptosis gives infected intestinal epithelial cells a protective mechanism against invasive enteric pathogens by destroying infected or damaged epithelia [432]. Thus we suggest that zebrafish could induce apoptosis in C. albicans-infected cells to prevent pathogen dissemination or to kill pathogens in the infected cells. Taken together, during the C. albicans–zebrafish interaction process, it seems that C. albicans can interfere with the apoptosis mechanism of zebrafish to escape from host defense and infect host cells, while zebrafish can manipulate apoptosis to help eliminate C. albicans in the infected cells.
At present, no evidence exists to indicate that mutants in which these identified apoptosis-related genes are knocked out have higher mortality rates or are more susceptible to pathogen infection; however, it is reasonable to conjecture that dysfunctions in zebrafish apoptosis mechanisms will weaken the ability of the fish to defend against C. albicans, and that zebrafish lacking such genes will be infected more severely and eventually die. Thus the enriched cellular function module pertaining to apoptosis might play an important role in the complex host-pathogen interaction process during C. albicans infection.
3. Ion transport: From the investigation of significant SVVs, several proteins involved in calcium, potassium, iron, and zinc ion transport have been found to be significant in infection (Table 9.2). Previous studies have shown that the regulation of Ca2+ and K+ signaling pathways is involved in T lymphocyte activation [433]. In addition, intracellular calcium and potassium ion homeostasis could also influence apoptosis [434]. Consequently, proper calcium and potassium ion transport can assist the immune response and apoptosis in zebrafish in response to C. albicans infection. In addition to calcium and potassium ions, other ion transporters participate in the transport of micronutrients such as iron and zinc. Micronutrients are nutrients that are required in small quantities to support normal physiological function. Pathogens have also developed certain strategies to deprive the host of these micronutrients in order to promote the growth and pathogenesis of C. albicans in the infection process. Conversely, hosts remove micronutrients from invading pathogens, that is, making these micronutrients
II. Systems Infection Microbiology
unavailable to the pathogens, an idea termed nutritional immunity. Iron is a crucial cofactor for several proteins and enzymes; consequently, it is involved in numerous cellular functions and metabolic pathways [424]. A well-studied topic of nutritional immunity is the iron-withholding defense system [435]. Using the approach in this chapter, it is shown that transferrin-a (Tfa), a protein related to iron transport, undergoes a significant protein interaction change. Hosts have several iron-withholding mechanisms, and one of them acts through the host iron-binding proteins, the transferrins [417]. Therefore transferrin, responsible for iron scavenging in plasma and lymph, has antimicrobial activity in the host-pathogen interaction. A recent study has shown that there is also competition for micronutrients other than iron (e.g., zinc) during the host-pathogen interaction process [436]. Zinc, also crucial for living organisms, plays an important role in the immune system, and zinc deficiency can induce broad-spectrum defects in both innate and adaptive immunity [437]. Zinc sequestration by the host might inhibit microbial growth and protect against C. albicans infection. A previous study has shown that calprotectin, a neutrophil-derived protein, can compete with C. albicans for the zinc that C. albicans needs for growth [438]. Accordingly, from the perspective of the ion transport cellular function module, it is reasonable to infer that normal ion transport systems, which limit micronutrient availability and are required for optimal immune or apoptotic function, can help zebrafish in defending against C. albicans infection.
4. Protein secretion: Similar to the situation in C. albicans, protein secretion or protein transport is identified as an enriched cellular function module during C. albicans infection in zebrafish.
Several proteins, especially those belonging to the Rab family, have shown significant network structure variations during the host-pathogen interaction process (Table 9.2). The Rab family is part of the Ras superfamily of small GTPases and functions in the regulation of intracellular vesicle trafficking and protein transport between different organelles and various secretory vesicles [439,440]. Although there is no direct evidence linking zebrafish Rab family proteins with fungal infections, Rab GTPases have been found to be involved in the process of pathogen infection in many other organisms. In Caenorhabditis elegans, the small GTPase Rab1 is shown to control innate immunity by regulating antimicrobial peptide gene expression [441]. In red drum (Sciaenops ocellatus), it has recently been reported that Rab1 could regulate intracellular bacterial infection and thus is likely to play a role in bacteria-induced host immune defense [439]. In mammals, Rab5 and Rab7 have been shown to regulate the early events of HIV-1 infection in human placental cells [442]. Further, Rab5 and Rab7 are demonstrated to affect the entry and transport of some viruses and bacteria [439]. These results indicate that Rab proteins are functionally associated with the endocytosis and trafficking of intracellular pathogens, and that pathogens have evolved corresponding strategies to modulate Rab functions [443]. Based on the findings from other organisms and the fact that many Rab proteins are identified as SVV-significant in this chapter, we can infer that Rab proteins in zebrafish also play important roles in host resistance against microbial infections. Further studies are still needed to characterize the cellular functions of zebrafish Rab proteins, which will help to clarify the relationship between protein secretion and host response during infection.
5. Hemostasis-related processes: In this chapter, the pathological outcome of the host-pathogen interaction in zebrafish is found to be massive fatal bleeding. Based on the pathological outcome and the fact that C. albicans may cause deep-seated infection and disruption of endothelial surfaces [444], it seems that damage to host endothelial cells or blood vessels might occur during the interaction of C. albicans with zebrafish. From the molecular perspective, several proteins important in hemostasis-related processes are found to have significant SVVs (Table 9.2). We find that zebrafish protein Calcrla, previously named Crlr, and protein Lama4 belong to this functional module. The calcitonin receptor-like receptor (Crlr) is a main endothelial cell receptor, which is involved in cardiovascular homeostasis. In zebrafish, it has been demonstrated that mutation of crlr, which is associated with vascular development and angiogenesis, can lead to atrophy of the trunk dorsal aorta or the lack of blood circulation [445]. In this chapter, from the decreased Calcrla edge numbers in the protein interaction network dynamics, we might infer that Calcrla-employing biological processes could be attenuated in the late stage of infection. Blood vessels are composed of two major cell types: endothelial cells and periendothelial cells. Besides these cellular components, certain structural elements are also involved in the preservation of vascular integrity, such as adherens junctions, basement membranes, and the extracellular matrix [446]. Laminins are components of the basement membrane. Zebrafish with morpholino knockdown of lama4 have been shown to undergo cardiac dysfunction and embryonic hemorrhage [447]. In this chapter the gene expression of the identified zebrafish laminin, alpha 4 (Lama4), is found to decline as infection advances (Fig. 9.5), indicating that vascular integrity may not be maintained in the C. albicans infection process. Therefore, from the behavior of the hemostasis-related functional module identified, we might speculate that C. albicans can penetrate endothelial cells and invade deeper tissues in zebrafish. In addition, the blood vessels of zebrafish are found to be damaged, and vascular homeostasis cannot be maintained during the late stage of C. albicans infection.

FIGURE 9.5 The gene expression profile of zebrafish laminin, alpha 4 (Lama4) [412].
The table lists five of the functional modules considered to be essential in the C. albicans-zebrafish interaction and the corresponding proteins with associated GO annotation.
9.4 Discussion

The importance of the host-pathogen interaction process during infection has long been apparent to biologists and clinicians. It is essential to understand the cellular function factors that could determine the virulence of pathogens during an infection process. At the same time, the host defensive mechanisms and the pathogenic mechanisms of damage, disease, and even mortality of the host are also of interest to researchers. Once the underlying molecular mechanisms are uncovered, it should become possible to develop various therapeutic strategies for drug design to prevent tissue damage and death caused by C. albicans infection. In this chapter the pathogenic functional modules of C. albicans active during infection and the corresponding defensive functional modules of zebrafish induced to respond to the pathogenic threat are investigated using simultaneous host-pathogen interaction microarray data from both the systematic and molecular viewpoints. Through two-side gene expression profiles, PPI information obtained from database mining, and discrete dynamic interaction models, PPI networks of pathogen and host are constructed at two different infection stages. By comparing the host/pathogen PPI networks at the early and late infection stages to generate a differential host/pathogen PPI network, the host/pathogen PPI network reconfiguration and those proteins showing significant interaction changes during the infection period are then determined. Furthermore, enriched cellular function modules among those proteins identified as playing significant roles are investigated in detail by GO annotation. Hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization are found to be the most important molecular mechanisms of pathogenesis in C. albicans infection.
Simultaneously, immune responses, apoptosis, ion transport, and protein secretion are found to be crucial molecular defensive mechanisms induced in zebrafish in response to the pathogen. Further, we conjecture from the functional module of hemostasis that C. albicans can damage the blood vessels of zebrafish, resulting in irreparable vascular destruction, which is consistent with the pathological outcome of fatal hemorrhage. Biological systems are highly dynamic entities that continuously respond to environmental changes. However, few studies have investigated network reconfiguration or network rewiring to clarify these cellular responses [401]. The method employed in this chapter has been shown to be useful in constructing host/pathogen PPI networks and identifying the essential cellular function modules for pathogenic and defensive mechanisms in an infection process based on differential host/pathogen network analyses. Such an approach could highlight those PPIs that have changed dramatically across different conditions and is potentially suitable for comparing networks under different cellular responses. However, there are still some drawbacks to be addressed. First, the PPI data for C. albicans and zebrafish used in this chapter are inferred from the PPIs of S. cerevisiae and humans with the help of corresponding ortholog data. Even though the imprecision in PPI information could lead to deviations between the constructed PPI network and the real situation, AIC is used to detect significant interactions and prune the false positives under the specific condition of the infection process with the help of gene expression profiles,
that is, the potential false positive PPIs that arise from ortholog-based inference could be pruned by AIC. In this situation the effect of the imprecise PPI information will be minimized. However, high-coverage and reliable protein interaction maps for C. albicans and zebrafish will still benefit the construction of PPI networks and the investigation of essential cellular function modules in C. albicans infection in the future. Second, gene expression profiles are overlaid to estimate the expression of their corresponding proteins. However, there are several steps involved in the synthesis of proteins from mRNAs [448,449]. Using gene expression values as protein expression levels without any modification may result in inaccuracies in the identified PPI networks. Once high-throughput protein expression data are available, a great improvement in PPI network construction can be expected. Third, the interactions of C. albicans with epithelial cells during the infection process can be roughly divided into three major steps: adhesion, invasion, and damage [143,146]. In turn, the host employs the corresponding defensive mechanisms in response to invasive fungal infections, perhaps beginning with recognition, followed by defensive responses, until eventually a victor emerges from the competition of infection. However, due to limitations in the number of time points in the microarray data, the infection process is divided into only early and late stages in this chapter. If more time points in the microarray data could be obtained during the C. albicans infection process, especially targeting the adhesion, invasion, and damage stages, more detailed stage-specific host-pathogen interactions could be investigated. In this way the invasive and defensive strategies taken by C. albicans and zebrafish, respectively, could be more specifically investigated.
Humans face a large number of challenges from pathogen infections during the course of a lifetime. Consequently, the investigation of molecular mechanisms of life-threatening infection is crucial. For human fungal pathogens the relevant research into infection could provide knowledge about PPI network structures, infectious mechanisms, and bioecology [450]. Through a deeper understanding of the infection mechanisms of pathogens, new therapeutic strategies against invasive microorganisms also become possible. At present, only a few studies have explored the systems biology of pathogen infection and the host responses simultaneously. Hence, a comprehensive picture combining the pathogenic mechanisms of the pathogen and the defensive mechanisms of its host could provide significant biomarkers as drug targets for a novel antifungal drug discovery strategy to treat and prevent serious infectious disease, even mortality. From the pathogen perspective the result of the analysis of C. albicans pathogenesis in this chapter could highlight some functional modules associated with hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization, all playing crucial roles in invasion and damage to host cells. The significant molecules involved in these processes might be considered as potential biomarkers and drug targets for drug discovery. From the host perspective the immune response, apoptosis, ion transport, protein secretion, and hemostasis-related processes are considered to be crucial molecular mechanisms for defense and survival in C. albicans infections. Consequently, proteins involved in these cellular function modules could be therapeutically protected to prevent the irreparable damage caused by C. albicans infection.
Most recently, we have developed a computational framework to construct interspecies PPI networks [6], which can be integrated with the current methodology to investigate the interspecies cellular function modules for
molecular mechanisms during infection in the future. With the help of more detailed cellular function modules that yield significant biomarkers as drug targets, it is hoped that the therapeutic treatment of life-threatening infection can be developed further and that mortality rates due to infection can ultimately be decreased.
9.5 Conclusion

In this chapter, with the help of simultaneous host-pathogen interaction two-side microarrays for both C. albicans and zebrafish, we investigate essential cellular function modules for pathogenic and defensive mechanisms in C. albicans infections using differential PPI network analysis. The early- and late-stage PPI networks for both organisms are constructed first. We then determine the network reconfiguration to identify the proteins with significant interaction variations during C. albicans infection and to extract the enriched cellular function modules among these proteins. The hyphal morphogenesis, ion and small molecule transport, protein secretion, and shifts in carbon utilization functional modules in C. albicans are found to play important roles in pathogen invasion and damage to host cells. The zebrafish cellular function modules involved in immune response, apoptosis mechanism, ion transport, protein secretion, and hemostasis-related processes are induced as significant defensive mechanisms during C. albicans infection. The essential cellular function modules thus determined could provide more insight into the molecular mechanisms of the infection process and thereby could be considered as biomarkers and drug targets to help devise potential therapeutic strategies to treat C. albicans infection.
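The chapter extracts enriched cellular function modules from the proteins with significant interaction changes by GO annotation, but this section does not state which enrichment statistic is used. As a minimal sketch, assuming the common one-sided hypergeometric test (an assumption, not necessarily the author's method), the p-value of seeing at least k annotated proteins among n significant ones can be computed as follows. The function name and all argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_annotated, n_selected, n_hits):
    """One-sided hypergeometric p-value P(X >= n_hits).

    n_background: proteins in the whole PPI network
    n_annotated:  background proteins carrying the GO term
    n_selected:   proteins flagged as significant (e.g. by SVV)
    n_hits:       flagged proteins that carry the GO term
    This test is an assumed choice; the source text does not name one.
    """
    upper = min(n_annotated, n_selected)
    favourable = sum(
        comb(n_annotated, x) * comb(n_background - n_annotated, n_selected - x)
        for x in range(n_hits, upper + 1)
    )
    return favourable / comb(n_background, n_selected)


# Toy numbers only: 10 proteins, 4 annotated, 3 selected, 2 of them annotated.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
print(p_value)
```

A small p-value suggests that the GO term is over-represented among the selected proteins; in practice a multiple-testing correction across GO terms would also be needed.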
9.6 Appendix

9.6.1 Appendix 1: Supplementary method

9.6.1.1 Details of protein interaction network construction

A discrete dynamic model is employed to determine the PPI networks induced in the infection of zebrafish by C. albicans. For a target protein i in the candidate PPI network, the dynamic model of the activity of protein i is calculated as follows [82]:

$$z_i[t+1] = z_i[t] + \sum_{j=1}^{M_i} b_{ij}\, z_i[t]\, z_j[t] + \alpha_i x_i[t] - \beta_i z_i[t] + h_i + \omega_i[t] \tag{9.A1}$$

where $z_i[t]$ represents the protein expression level of the target protein i at time t, $b_{ij}$ denotes the ability of the jth interactive protein to interact with target protein i, $z_j[t]$ indicates the expression level of protein j which interacts with the target protein i at time t, $\alpha_i$ represents the translation effect of mRNA to protein, $x_i[t]$ denotes the corresponding mRNA expression level of the target protein i, $\beta_i$ denotes the degradation rate of i, $h_i$ indicates the basal level of protein i, which denotes other unknown effects, and $\omega_i[t]$ represents the stochastic measurement noise. The biological significance of Eq. (9.A1) is that the protein expression level of target protein i at a later time $t+1$ is a function of the protein
expression level occurring at time t, the interactions with $M_i$ interactive proteins, the process of translation from mRNA, the effects of protein degradation, the basal level of protein i, and the stochastic measurement noises [82]. To identify the associated parameters using microarray data, a constrained least squares parameter estimation was adopted [82]. Eq. (9.A1) can be represented in the following regression form:

$$z_i[t+1] = \begin{bmatrix} z_i[t]z_1[t] & \cdots & z_i[t]z_{M_i}[t] & x_i[t] & z_i[t] & 1 \end{bmatrix} \begin{bmatrix} b_{i1} \\ \vdots \\ b_{iM_i} \\ \alpha_i \\ 1-\beta_i \\ h_i \end{bmatrix} + \omega_i[t] = \varphi_i[t]\,\theta_i + \omega_i[t] \tag{9.A2}$$

where $\varphi_i[t]$ represents the regression data vector and $\theta_i$ denotes the kinetic parameter vector to be estimated. In order to avoid the danger of overfitting the estimated parameters, the original data points are interpolated to L data points by the cubic spline method. In other words, there are $\{z_i[l+1], \varphi_i[l]\}$ data point pairs for $l \in \{1, \ldots, L-1\}$. Hence Eq. (9.A2) can be written in the following algebraic form for target protein i:

$$Z_i = \Phi_i \theta_i + \Omega_i \tag{9.A3}$$

where

$$Z_i = \begin{bmatrix} z_i[2] \\ \vdots \\ z_i[L] \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} \varphi_i[1] \\ \vdots \\ \varphi_i[L-1] \end{bmatrix}, \quad \Omega_i = \begin{bmatrix} \omega_i[1] \\ \vdots \\ \omega_i[L-1] \end{bmatrix}.$$

In this case the parameter estimation problem for target protein i in the candidate PPI network can be represented by the following constrained least squares minimization equation:

$$\min_{\theta_i} \frac{1}{2} \left\lVert \Phi_i \theta_i - Z_i \right\rVert_2^2 \quad \text{such that } C\theta_i \le d \tag{9.A4}$$

where $C = \mathrm{diag}[0 \cdots 0 \;\; {-1} \;\; 0 \;\; {-1}]$ and $d = [0 \cdots 0]^T$, constraining the parameters $\alpha_i$ and $h_i$ to be nonnegative. Once the interaction abilities $b_{ij}$ are estimated protein by protein in the candidate PPI network using the constrained least squares parameter estimation method, the AIC [40,81] is applied to prune those insignificant interactions due to false positives in the candidate PPI network by the system order detection technique. AIC, which includes both estimated residual error and model complexity in one statistic, quantifies the relative goodness of fit of a model. For a protein interaction model with $M_i$ interaction parameters (or proteins) to fit with data from L samples, the AIC can be written as follows [40,81]:

$$\mathrm{AIC}(M_i) = \log\left( \frac{1}{L} (Z_i - \hat{Z}_i)^T (Z_i - \hat{Z}_i) \right) + \frac{2M_i}{L} \tag{9.A5}$$

where $\hat{Z}_i = \Phi_i \hat{\theta}_i$, and $\hat{\sigma}_i^2 = \frac{1}{L} (Z_i - \hat{Z}_i)^T (Z_i - \hat{Z}_i)$ is the estimated residual error. As the residual error $\hat{\sigma}_i^2$ decreases, the AIC decreases. In contrast, while the number of interactive proteins (or parameters) $M_i$ increases, the AIC increases. The minimization in Eq. (9.A5) will indicate the real model order of the protein interaction system. With the statistical selection of $M_i$ interactive proteins by minimization of the AIC, the insignificant proteins more
than $M_i$ could be considered as false positives and should be pruned from the candidate PPI network to obtain the real PPI network. Hence, AIC can be adopted to select the model order, filtering out insignificant protein interactions in the candidate PPI network and producing a more realistic PPI network based on the estimated interaction abilities ($b_{ij}$) obtained from time-profile microarray data. In this way, with different sets of microarray data (0.5 to 4 hpi for the early stage and 4 to 18 hpi for the late stage), two PPI networks are constructed for the early and late stages of C. albicans infection of zebrafish by removing insignificant interactions through AIC for both organisms.
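The model update of Eq. (9.A1) and the order selection of Eq. (9.A5) can be sketched in a few lines. This is an illustrative sketch only: it takes already-fitted profiles as input rather than solving the constrained least squares problem of Eq. (9.A4), it omits the noise term, and all function names and numbers are hypothetical.

```python
import math


def protein_step(z_i, z_partners, b_i, alpha_i, x_i, beta_i, h_i):
    """Deterministic part of Eq. (9.A1) for one target protein.

    z_partners and b_i list the levels z_j[t] and interaction abilities b_ij
    of the M_i interactive proteins; the noise term omega_i[t] is left out.
    """
    interaction = sum(b * z_i * z_j for b, z_j in zip(b_i, z_partners))
    return z_i + interaction + alpha_i * x_i - beta_i * z_i + h_i


def aic(z_obs, z_fit, n_interactions):
    """Eq. (9.A5): log of the mean squared residual plus 2 * M_i / L.

    Undefined for a perfect fit (zero residual), as in the formula itself.
    """
    n_samples = len(z_obs)
    sigma2 = sum((o - f) ** 2 for o, f in zip(z_obs, z_fit)) / n_samples
    return math.log(sigma2) + 2.0 * n_interactions / n_samples


def select_order(z_obs, fits_by_order):
    """Return the number of interactions M_i whose fit minimizes the AIC.

    fits_by_order maps a candidate M_i to the fitted profile obtained with
    that many interactive proteins (computed elsewhere).
    """
    return min(fits_by_order, key=lambda m: aic(z_obs, fits_by_order[m], m))


# Toy profiles: a third interaction barely improves the fit, so AIC picks 2.
observed = [1.0, 2.0, 3.0, 4.0]
candidate_fits = {
    1: [1.5, 2.5, 2.5, 3.5],
    2: [1.1, 1.9, 3.1, 3.9],
    3: [1.09, 1.91, 3.09, 3.91],
}
print(select_order(observed, candidate_fits))
```

The example shows the trade-off described above: the residual term rewards the better fit of the larger models, while the 2M_i/L term penalizes each added interaction.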
9.6.2 Appendix 2: Supplemental figures

Differential PPI network of Candida albicans and zebrafish

FIGURE A9.1 The differential PPI network obtained from the early and late stage PPI networks of Candida albicans. Red and blue edges indicate positive and negative $d_{ij,l}$ values, respectively, calculated using Eq. (9.2). The protein names have been omitted for simplicity [412]. PPI, Protein-protein interaction.
FIGURE A9.2 The differential PPI network obtained from the early and late stage PPI networks of zebrafish. Red and blue edges indicate positive and negative $d_{ij,l}$ values, respectively, calculated using Eq. (9.2). The protein names have been omitted for simplicity [412]. PPI, Protein-protein interaction.
FIGURE A9.3 Distributions of SVVs for Candida albicans and zebrafish. (A) C. albicans and (B) zebrafish [412]. SVVs, Structure variation values.
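The figures above rest on two quantities, the differential edge values and the structure variation values (SVVs). Eq. (9.2) and the SVV definition are not reproduced in this section, so the sketch below assumes that a differential edge is the late-stage interaction ability minus the early-stage one, and that the SVV of a protein is the sum of the absolute differential values over its edges. Both definitions, the function names, and the toy proteins "A", "B", "C" are assumptions for illustration.

```python
def differential_network(early, late):
    """Edge-wise difference (late minus early) over the union of edges.

    Each network maps (target_protein, partner_protein) to an interaction
    ability; an edge pruned in one stage is treated as 0 in that stage.
    """
    edges = set(early) | set(late)
    return {edge: late.get(edge, 0.0) - early.get(edge, 0.0) for edge in edges}


def structure_variation_values(diff):
    """Sum of absolute differential values per target protein (assumed SVV)."""
    svv = {}
    for (target, _partner), value in diff.items():
        svv[target] = svv.get(target, 0.0) + abs(value)
    return svv


# Toy networks with made-up values, not data from the chapter.
early_stage = {("A", "B"): 0.5, ("A", "C"): 0.25}
late_stage = {("A", "B"): -0.25, ("C", "B"): 0.5}

diff = differential_network(early_stage, late_stage)
svv = structure_variation_values(diff)
for protein in sorted(svv, key=svv.get, reverse=True):
    print(protein, svv[protein])
```

Under these assumptions, proteins at the top of the ranking are the ones whose interactions are rewired most between the early and late stages.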
CHAPTER 10

The role of inflammation and immune response in cerebella wound-healing mechanism after traumatic injury in zebrafish

10.1 Introduction

Traumatic brain injury (TBI), also known as intracranial injury, is a major cause of death and disability worldwide, especially in children and young adults. Recent data have shown that approximately 1.7 million people suffer a TBI annually in the United States [451]. TBI is mainly caused by an impact on the head or a penetrating head injury that disrupts normal brain functions. Upon TBI the direct damage and the subsequent secondary injury in the brain often end in chronic neurological disorders. The secondary injury after brain injury is multifactorial and is characterized by a complex multicellular process, including apoptosis, inflammation, immune response, proliferation of glial cells, and increased progenitor cell activity, which could lead to an increase in neurogenesis [452]. At present, most of the underlying molecular restoration mechanisms are still unclear, and this leads to less effective therapeutic treatment strategies for TBI in humans. The central nervous system (CNS) is very important to most organisms. Once it is damaged, there will be lethal effects if it cannot be healed or regenerated. Recently, advances in neuroscience research have led to the development of innovative therapeutic treatment strategies that aim to regenerate the damaged CNS. Tissue regeneration is one of the most interesting biological phenomena. At present, the molecular and cellular mechanisms by which regeneration takes place are still unclear [453]. The ability to regenerate lost or damaged parts of the CNS has been seen in varying degrees in many organisms. In the past the adult mammalian CNS was viewed as a system without the capacity for regeneration once it is damaged [454].
In contrast to adult mammals with their limited ability to regenerate the CNS, Danio rerio (zebrafish) can constitutively produce new neurons along the rostrocaudal brain axis throughout its lifespan. Other organs also have an extensive regenerative capacity to respond to injuries such as trauma, lesion, and ischemic episodes
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00004-3
© 2021 Elsevier Inc. All rights reserved.
by producing new tissues to replace the lost ones. Adult zebrafish still have the extraordinary ability to regenerate their damaged fins, skin, and heart [455,456]. Moreover, they can regenerate several organs or tissues in the nervous system: spinal cord, photoreceptor, retina, cerebellum, and optic nerve [456-461]. This feature of the adult zebrafish brain depends on the presence of neural stem cell niches, in which stem cells continuously proliferate in an environment permissive for neurogenesis [462]. Further, the zebrafish immune system is remarkably similar to mammalian immune systems. Overall, the zebrafish genetic map has demonstrated highly conserved synteny with the human genome [319]. Consequently, the zebrafish has emerged as a powerful vertebrate model for the elucidation of the molecular and cellular mechanisms of regeneration [455], and numerous studies have already used this species as a model to study the regeneration of the nervous system [462]. In this chapter a TBI experiment on zebrafish by stab lesion is established, and the gene expression levels are measured through time-course microarray experiments from the injury time to the recovery stage. With the time-course microarray experiments the molecular mechanisms of cerebellar wound healing are studied by a systems biology method using dynamic network modeling [82]. The dynamic network model integrates the information from mining various protein-protein interaction (PPI) databases with the identification from time-course microarray data in this chapter. The resultant PPI network serves as a basis to investigate the molecular mechanisms of cerebellar wound healing. Meanwhile, the evolution of the behavior of zebrafish from the injury time to the recovery stage is observed under the confocal microscope, video-recorded, and quantitatively analyzed to measure the degree of disability of zebrafish caused by the injuries.
This analysis helps to focus on those genes involved in the recovery process if a high correlation is shown between the gene expression profile and the zebrafish movement index (ZMI). Based on the correlation between the gene expression profiles and the ZMI, we focus on only three groups of genes, that is, the acutely activated group and the groups positively and negatively correlated to ZMI. To extract systematic knowledge from the time-course expression data, several systems biology tools, for example, STEM [463] and PANTHER [464], are employed in this chapter besides the dynamic network modeling method. Several significantly enriched pathways in the above three groups are identified, for example, the chemokine signaling pathway (inflammation-related), the phosphatidylinositide 3-kinase (PI3K) signaling pathway (cell cycle-related), and the axon guidance pathway. After the injury of the cerebellum, these pathways are coordinated by inflammation and immune response as a defense mechanism to interact with cell cycle pathways and neurotransmitter-related pathways for neurogenesis and angiogenesis in the restoration of the injured brain. Then, with the dynamic cellular PPI network as a backbone, the cross talks that coordinate these pathways during the recovery process and the schematic diagram of wound healing-related cellular pathways are also presented in this chapter. Specific nodes within the PPI network of signaling pathways of the injured cerebellum are PI3K, PAK2, and PLXNA3, which coordinate the inflammation and immune response signaling to interact with cell cycle pathways for neurogenesis and angiogenesis, which are crucial for the restoration of the injured brain. Further, we also find that these pathways may stimulate subnetworks essential for neurogenesis and angiogenesis. These findings not only confirm that the proposed systems biology methods are successful but also improve the confidence in the result of the wet experiment design employed. Some
III. Systematic Inflammation and Immune Response in Restoration and Regeneration Process
interactions, such as cross talks that coordinate the signaling pathways for wound healing, are also identified in this chapter, which will aid in illustrating these complex molecular restoration mechanisms in the future. These findings can provide insight into the role that inflammation and immune response play in the molecular restoration mechanisms after a traumatic event to the brain and open up a new opportunity to devise therapeutic treatment strategies for TBI in humans.
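The grouping of genes by their correlation with the ZMI can be sketched as follows. The section does not give the correlation measure or the cutoff, so the Pearson coefficient and the threshold of 0.8 used here are assumptions, as are the function names and toy profiles. The acutely activated group is not covered because its criterion is not stated in this section.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return cov / (norm_x * norm_y)


def group_by_zmi(expression, zmi, threshold=0.8):
    """Split genes into positively, negatively, and weakly correlated groups.

    expression maps a gene name to its time-course profile, sampled at the
    same time points as the ZMI. The threshold is an assumed cutoff.
    """
    groups = {"positive": [], "negative": [], "other": []}
    for gene, profile in expression.items():
        r = pearson(profile, zmi)
        if r >= threshold:
            groups["positive"].append(gene)
        elif r <= -threshold:
            groups["negative"].append(gene)
        else:
            groups["other"].append(gene)
    return groups


# Toy profiles with made-up values, not data from the chapter.
zmi_profile = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_expression = {
    "gene_up": [2.0, 4.0, 6.0, 8.0, 10.0],
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],
    "gene_flat": [1.0, 3.0, 1.0, 3.0, 1.0],
}
print(group_by_zmi(toy_expression, zmi_profile))
```

Genes in the positive group rise as movement recovers, and genes in the negative group fall; both are candidates for involvement in the recovery process.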
10.2 Materials and methods for constructing protein-protein interaction network of cerebellar wound-healing process in zebrafishes

10.2.1 Stab lesion assay and time-course microarray experiments in zebrafish traumatic brain injury model

The zebrafish line Tg(kdr:EGFP) and wild-type zebrafish are used to perform the stab lesion assay, microarray, and behavior video-tracking experiments. The body length of all zebrafish used in this chapter is 2.8 to 3.8 cm. Before the stab lesion assay, the zebrafish are kept under a light cycle of 14 h light and 10 h dark at a temperature of 28°C. In the stab lesion assay, 6-month-old adult zebrafish are anesthetized by immersion for 5 min in aquarium water in which 200 ppm tricaine (MS222; Sigma, St. Louis, MO, United States) is dissolved. A syringe needle (27G; Thermo Scientific, United States) is then pushed vertically through the cranial surface into the zebrafish cerebellum to a depth of 1.5 mm (Fig. 10.1). There are three large scales on the skull, which match the positions of the two optic tectum hemispheres and the cerebellum. The pigment accumulating around the three scales also helps to locate the lesion site on the brain of the wild-type zebrafish. The injury depth is controlled by wrapping the needle with a plastic tube so that only 1.5 mm of the needle tip is left exposed for stabbing. The lesion depth is also examined in sagittal brain sections (see Fig. 10.A1 in Appendix). The injured zebrafish are then put back into fresh aquarium water for recovery. Zebrafish that have apparently ceased bleeding are selected for the subsequent experiments. The control group is not anesthetized; every other fish is anesthetized with tricaine only while the lesion is made. Tricaine anesthesia is a standard procedure in zebrafish experiments. After the lesion, the whole sample preparation process does not use any anesthetic drugs.
Before the zebrafish are sacrificed, ice-cold water is used to immobilize them and the cerebellum is quickly dissected out. The dissected cerebellums are collected into a 1.5 mL microcentrifuge tube with 200 µL TRIzol (Invitrogen, Carlsbad, CA, United States) (8 cerebellums per tube). After homogenization, each tube is filled with TRIzol up to 1 mL. Because no anesthetic is used after the lesion, the anesthetic process should have little influence on the gene expression profiles. The experimental procedures have been approved by the committee for the use of laboratory animals at National Tsing Hua University (IACUC number: 10140). For the microarray experiments, zebrafish are euthanized by prolonged immersion in ice-cold water and then dissected quickly to harvest the brains. Each dissection takes about 4 min. A surgical knife is used to cut out the cerebellums. The separated cerebellums are kept in RNAlater (Ambion, Inc., Austin, Texas, United States). The cerebellums are collected at the following time points: Control (no injury),
III. Systematic Inflammation and Immune Response in Restoration and Regeneration Process
10. The role of inflammation and immune response in cerebella wound-healing mechanism
FIGURE 10.1 The diagram of the stab lesion assay. (A) Schematic diagram of the stab lesion. A 27 G syringe is used to create a cranial wound of depth 1.5 mm in the cerebellum. (B–E) Bright field images of the cranium (B, D) and exposed brains (C, E) of control fish and after the stab lesion. (B) and (C) show the intact cranium and cerebellum of the control (no stab lesion), and (D) and (E) show the injured cranium and cerebellum of an experimental animal at 0.5 hpl. The fresh wound can be seen clearly on both the cranium and the cerebellum (D and E; the white arrow shows the lesion site on the cranium and the black arrow shows the wound on the exposed cerebellum). doi:10.1371/journal.pone.0097902.g001 [493]. Ce, Cerebellum; OB, olfactory bulb; OT, optic tectum; Tel, telencephalon; SC, spinal cord.
0.25, 1, 3, 6, 10, 15, 21, and 28 dpl (the recovery process is also monitored using immunohistochemistry staining; see Fig. 10.A2). At each time point the sample consists of eight cerebellums from male zebrafish. Microarray analysis of the samples is guided by WELGENE Biotech CO., LTD. The quality assurance and quality control data for the samples are given in Table 10.A1 in Appendix. The regeneration of the cerebellum and the time-course microarray data in the cerebellar wound-healing process are shown in Fig. 10.2. The microarray dataset used for reconstructing the PPI network in this chapter can be retrieved as Dataset GSE56375 from the NCBI GEO repository website (http://www.ncbi.nlm.nih.gov/geo/).
10.2.2 Experiments for zebrafish movement index
For the ZMI experiments, zebrafish are placed in a glass tank (18 × 12 × 15 cm) filled with 2.5 L aquarium water. For each time point, 3 male adult zebrafish (with body length ranging from 2.8 to 3.8 cm) are transferred to the observation tank. Their behavior is then recorded with a video camera (HDR-XR200; Sony, Japan), which is placed above the tank and set to record trails of 5 min duration (Fig. 10.3A–C). The first video of zebrafish behavior is recorded after placing the zebrafish in aquarium water for 25 min and the second video is recorded after 10 min. Three behavioral patterns are chosen as criteria for evaluating the movement of zebrafish, including total swimming distance (distance index,
FIGURE 10.2 Regeneration of cerebellum at different days post lesion and the time-course microarray data in the cerebellar wound-healing process. (A–H) Bright field images of the craniums (A, C, E, G) and cerebellums (B, D, F, H) at different time points post lesion. The cranium and cerebellum are intact before the stab lesion (A, B). At 3 dpl the wound can be seen on the cranium and cerebellum (C, D). At 7 dpl the cranium is sealed and a scar is observed in the cerebellum (E, F). At 14 dpl the scar is hardly seen (scale bar, 100 µm). (I) The differentially expressed genes (≥ 1.5 or ≤ 0.67 fold change) are hierarchically clustered (5839 genes). Columns represent the days post lesion (dpl), and rows represent the genes. The blocks indicate temporally upregulated and downregulated genes, respectively. The color bar represents the log value of the ratio relative to the intensity at 0 dpl. doi:10.1371/journal.pone.0097902.g002 [493].
FIGURE 10.3 Equipment setup for behavior analysis. (A) Close-up views of the video camera. (B) Close-up views of the aquarium. (C) The camera and aquarium setup. (D) Sample swim paths of zebrafish after TBI. (E) The quantitative behavior index (ZMI) of injured zebrafish describes the degree of disability in the behavior of the injured zebrafish. The greater the index value, the greater the degree of disability. doi:10.1371/journal.pone.0097902.g003 [493]. ZMI, Zebrafish movement index.
DI), single direction turning (turn direction index, TDI), and turning angle (turn angle index, TAI) [465,466]. The formula for each index is given in the following, where the subscripts C and I stand for control and injured fish, respectively:
$$\mathrm{ZMI} = \mathrm{DI} + \mathrm{TDI} + \mathrm{TAI}$$
where
$$\mathrm{DI} = \frac{\text{swimming distance}_C - \text{swimming distance}_I}{\text{swimming distance}_C + \text{swimming distance}_I}$$
$$\mathrm{TDI} = \frac{\#\text{ of right turn}_I - \#\text{ of left turn}_I}{\#\text{ of right turn}_I + \#\text{ of left turn}_I} - \frac{\#\text{ of right turn}_C - \#\text{ of left turn}_C}{\#\text{ of right turn}_C + \#\text{ of left turn}_C}$$
$$\mathrm{TAI} = \frac{(\#\text{ of fish with turn angle} > 100)_C - (\#\text{ of fish with turn angle} > 100)_I}{(\#\text{ of fish with turn angle} > 100)_C + (\#\text{ of fish with turn angle} > 100)_I}$$
Note that the values of the above three indices are between 0 and 1 and will be equal to 0 when the behaviors of injured fish perfectly match to control fish. Multiple zebrafish tracking
FIGURE 10.4 Flowchart for constructing the wound healing–related PPI network using a dynamic network model and big data mining. The network is constructed from PPI information obtained by database mining and from system identification based on the time-course microarray data (as shown in the boxes). The open access systems biology tools STEM and PANTHER are used to identify significant temporal patterns and enriched cerebellar wound healing–related pathways. doi:10.1371/journal.pone.0097902.g004 [493]. PPI, Protein–protein interaction.
and behavior analysis software from prof. YC Chen’s Laboratory (Department of Electrical Engineering, NTHU) are used to analyze the swimming distance, turn direction, and turn angle for each zebrafish in the video clips. An example of a swim path diagram used for this behavior analysis is shown in Fig. 10.3D.
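The three indices can be computed directly from per-group summary values. The following is a minimal Python sketch that follows the formulas as printed above; the function name, the argument names, and the numbers in the usage lines are hypothetical and are not taken from the tracking software used in this chapter.

```python
def zebrafish_movement_index(dist_c, dist_i,
                             right_c, left_c, right_i, left_i,
                             large_turn_c, large_turn_i):
    """Return (DI, TDI, TAI, ZMI) for control (c) and injured (i) fish.

    dist_*       : total swimming distance
    right_*/left_*: number of right / left turns
    large_turn_* : number of fish with turn angle > 100
    All denominators are assumed to be nonzero (an assumption of this sketch).
    """
    # Distance index: relative loss of swimming distance after injury.
    di = (dist_c - dist_i) / (dist_c + dist_i)
    # Turn direction index: turning bias of injured fish minus that of controls.
    tdi = ((right_i - left_i) / (right_i + left_i)
           - (right_c - left_c) / (right_c + left_c))
    # Turn angle index: relative change in the number of large-angle turners.
    tai = (large_turn_c - large_turn_i) / (large_turn_c + large_turn_i)
    return di, tdi, tai, di + tdi + tai


# Hypothetical summary values for one time point.
di, tdi, tai, zmi = zebrafish_movement_index(100, 50, 5, 5, 8, 2, 3, 1)
print(round(zmi, 3))
```

When the injured fish behave exactly like the controls, every numerator vanishes and the function returns zero for all three indices, in line with the note above.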
10.2.3 Big data mining for candidate protein–protein interaction network
Several types of data are integrated to construct the cerebellar wound-healing molecular network, including time-course expression profiles (Fig. 10.6), zebrafish and human PPIs (from BioGRID and Reactome) [73,467], zebrafish–human gene ortholog data (ZFIN) [376], and zebrafish gene annotations retrieved from gene ontology (GO) [273]. The human and zebrafish PPIs collected from the databases serve as a collection of candidate PPIs for the cerebellar wound-healing network. However, little information is available on zebrafish PPIs. Therefore part of the zebrafish PPIs is inferred from human PPIs based on ortholog information. This yields PPI information (inferred from human + zebrafish) together with time-course expression profiles. To set up the protein pool, the expression profiles are used to select differentially expressed proteins according to fold change: a protein is identified as differentially expressed if its fold change is ≥ 1.5 or ≤ 0.67 compared to the control time point. These differentially expressed proteins are assumed to be most likely involved in the
FIGURE 10.5 The zebrafish cerebellar wound healing–related cellular PPI network constructed by dynamic network modeling via microarray data and big data mining. The cerebellar wound healing–related PPI network of zebrafish contains 5270 PPIs among 802 proteins. The red, green, and blue nodes belong to groups N, A, and P (see Fig. 10.6 and the context for details). The information to draw the network is summarized in Appendix. doi:10.1371/journal.pone.0097902.g005 [493]. PPI, Protein–protein interaction.
cerebellar wound-healing process. Based on the protein pool and PPI information, a candidate PPI network for the cerebellar wound-healing process is then constructed.
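The candidate network construction described in this subsection can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and data layouts are hypothetical, and the rule that a protein enters the pool when its fold change crosses a threshold at any time point is an assumption, since the text does not state how the time points are combined.

```python
def differentially_expressed(profiles, control="control", up=1.5, down=0.67):
    """profiles: {gene: {time_label: intensity}}.

    Returns the genes whose fold change versus the control is >= up or
    <= down at one or more time points (assumed selection rule).
    """
    pool = set()
    for gene, series in profiles.items():
        base = series[control]
        for label, value in series.items():
            if label != control and (value / base >= up or value / base <= down):
                pool.add(gene)
                break
    return pool


def infer_zebrafish_ppis(human_ppis, human_to_zebrafish):
    """Map human PPIs onto zebrafish genes through an ortholog table.

    human_to_zebrafish: {human_gene: [zebrafish_gene, ...]}.
    """
    inferred = set()
    for a, b in human_ppis:
        for za in human_to_zebrafish.get(a, ()):
            for zb in human_to_zebrafish.get(b, ()):
                if za != zb:
                    inferred.add(frozenset((za, zb)))
    return inferred


def candidate_network(zebrafish_ppis, inferred_ppis, pool):
    """Keep only the interactions whose endpoints are both in the protein pool."""
    edges = {frozenset(edge) for edge in zebrafish_ppis} | set(inferred_ppis)
    return {edge for edge in edges if edge <= pool}


# Hypothetical toy data.
profiles = {
    "g1": {"control": 10, "1dpl": 16},   # fold change 1.6 -> selected
    "g2": {"control": 10, "1dpl": 11},   # fold change 1.1 -> not selected
    "g3": {"control": 10, "1dpl": 6},    # fold change 0.6 -> selected
}
pool = differentially_expressed(profiles)
inferred = infer_zebrafish_ppis([("H1", "H2"), ("H2", "H3")],
                                {"H1": ["g1"], "H2": ["g3"], "H3": ["g2"]})
print(candidate_network([], inferred, pool))
```

Edges are stored as unordered pairs so that an interaction reported in both directions, or in both the zebrafish and the ortholog-inferred sets, is counted once.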
10.2.4 Dynamic network modeling for constructing cerebellar wound-healing protein–protein interaction network
Since the candidate PPI network is assembled from a wide variety of biological experimental conditions and/or ortholog information, it may contain a large number of protein interactions irrelevant to the cerebellar wound-healing process (i.e., false-positive interactions). To develop a realistic PPI network for the cerebellar wound-healing process, the false-positive interactions in the candidate PPI network need to be pruned, using a system identification method and a system order detection method with the real microarray data, to obtain a real PPI network (see Fig. 10.5). The dynamic PPI network model is given in the following:
FIGURE 10.6 Significant temporal patterns in the wound-healing process. (A) Using STEM, some significant temporal patterns are identified (color background) (FDR corrected P-value < .05). According to the STEM results and ZMI, we focus on groups 29, 40, and 11, which are positively correlated with ZMI (B), related to acute response (C), and negatively correlated with ZMI (D), respectively. doi:10.1371/journal.pone.0097902.g006 [493]. ZMI, Zebrafish movement index.
$$y_p[t+1] = y_p[t] + \sum_{q=1}^{N} b_{pq}\, y_p[t]\, y_q[t] + \alpha_p x_p[t] - \beta_p y_p[t] + \omega_p[t+1]$$
where $y_p[t]$ and $y_q[t]$ denote the protein activity levels of the target protein p and of the qth protein interacting with protein p at time t, respectively; $b_{pq}$ represents the ability of the qth interactive protein to interact with protein p; $\alpha_p$ represents the translation effect from mRNA to protein p; $x_p[t]$ denotes the mRNA expression level of protein p; $\beta_p$ represents the degradation effect of protein p; and $\omega_p[t+1]$ denotes stochastic noise. Rewriting the above dynamic PPI equation in regression form, we obtain:
$$y_p[t+1] = \begin{bmatrix} y_p[t]y_1[t] & \cdots & y_p[t]y_N[t] & x_p[t] & y_p[t] \end{bmatrix} \begin{bmatrix} b_{p1} \\ \vdots \\ b_{pN} \\ \alpha_p \\ 1-\beta_p \end{bmatrix} + \omega_p[t+1] = f_p[t]\,\theta_p + \omega_p[t+1]$$
By collecting the data from every time point, the interactions between the target protein $y_p$ and its interaction candidates ($y_1, \ldots, y_N$) can be identified through the system identification and AIC system order detection methods. The technical details of constructing
PPI network are described in methods A1, A2, and A3 in Appendix. Finally, the real PPI network is constructed.
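As a rough illustration of the identification step, the regression form above can be fitted by least squares and candidate interactions retained according to an AIC score. The sketch below is not the procedure of methods A1, A2, and A3, which are not reproduced here: it assumes ordinary least squares, a greedy forward search over candidates, and the common form AIC = log(residual variance) + 2k/n. All function names are hypothetical.

```python
import numpy as np


def build_regression(y, x, p, partners):
    """Build f_p[t] and the target y_p[t+1] for a given set of partners.

    y: array (n_proteins, T) of protein activity levels
    x: array (T,) of mRNA levels of protein p
    """
    yp = y[p, :-1]
    columns = [yp * y[q, :-1] for q in partners]      # y_p[t] * y_q[t]
    F = np.column_stack(columns + [x[:-1], yp])       # ..., x_p[t], y_p[t]
    return F, y[p, 1:]


def aic_score(F, target):
    """Least-squares fit; returns (AIC, theta) with AIC = log(s2) + 2k/n."""
    theta, *_ = np.linalg.lstsq(F, target, rcond=None)
    resid = target - F @ theta
    n = len(target)
    s2 = max(resid @ resid / n, 1e-12)                # guard against log(0)
    return np.log(s2) + 2 * F.shape[1] / n, theta


def select_interactions(y, x, p, candidates):
    """Greedy forward selection: add the partner that lowers AIC the most."""
    chosen = []
    best, theta = aic_score(*build_regression(y, x, p, chosen))
    improved = True
    while improved:
        improved = False
        for q in candidates:
            if q in chosen:
                continue
            score, th = aic_score(*build_regression(y, x, p, chosen + [q]))
            if score < best - 1e-9:
                best, theta, pick, improved = score, th, q, True
        if improved:
            chosen.append(pick)
    # theta = [b_pq for q in chosen] + [alpha_p, 1 - beta_p]
    return chosen, theta
```

Candidates that are never selected correspond to interactions pruned as false positives in this sketch. With only nine time points per gene, as in the dataset described above, such a search can easily overfit, so the penalty term and the stopping rule matter more than in this toy form.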
10.2.5 Systems biology tools and statistics
After refining the candidate PPI network, two open access programs are employed to analyze the time-course expression profiles. To identify significant temporal patterns in the time-course expression data, the STEM tool is employed. The default settings are used for the analysis, and the significant patterns (with the Bonferroni corrected P-value < .05) in Fig. 10.6A are highlighted with color background and ordered by the corrected P-value. To identify significantly enriched functions in the specific pattern groups identified by STEM, the enrichment and ontology analyses are performed through the website of protein analysis through evolutionary relationships (PANTHER) [464]. PANTHER can map gene lists to GO molecular function and biological process categories, as well as to PANTHER biological pathways. Moreover, PANTHER can display the results in pathway diagrams to enable visualization of the relationships between genes in known pathways. The PANTHER tool is used here to identify the significantly enriched pathways in a group of proteins or genes. The default settings are again used for the analysis, and the Bonferroni correction for multiple testing is used to correct the original P-value. The significantly enriched pathways are determined with the corrected P-value < .05. For the validation, the sign test is used to justify the strong correlation between the temporal expression profiles and ZMI at the 5% significance level, where the null hypothesis is the probability of having positive correlation equal to 0 and the alternative hypothesis is the probability of having positive correlation not equal to 0.
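For reference, the Bonferroni correction mentioned above multiplies each raw P-value by the number of tests and caps the result at 1. The following minimal Python sketch shows that arithmetic; the function name and the example P-values are hypothetical, and STEM and PANTHER apply the correction internally, so this is not their code.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (corrected P-values, significance flags) for a list of raw P-values."""
    m = len(p_values)
    corrected = [min(1.0, p * m) for p in p_values]
    significant = [c < alpha for c in corrected]
    return corrected, significant


# Hypothetical raw P-values for three tested pathways.
corrected, significant = bonferroni([0.001, 0.02, 0.5])
print(significant)
```

A raw P-value of 0.02 would pass an uncorrected 5% threshold but fails after correction for three tests, which is the behavior the corrected threshold is meant to enforce.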
10.3 The role of inflammation and immune response in the cerebellar wound-healing process
10.3.1 The significant temporal patterns and PPI network for the cerebellar wound-healing process in the zebrafish TBI model
The first principal target of this chapter is to construct an integrated cerebellar wound healing–related cellular PPI network. Various types of data are integrated with the dynamic network model and microarray data to construct the cerebellar wound healing–related cellular PPI network under this framework (Fig. 10.4). There are 802 proteins (nodes) and 5270 interactions (edges) in the final PPI network (i.e., the real PPI network refined by AIC system order detection; Fig. 10.5). This PPI network serves as the backbone of the following analysis. Information from different systems biology tools can be integrated into this PPI network under this framework. First, the temporal information from the time-course microarray data in Fig. 10.2 is used to build the PPI network through dynamic network modeling. If the time-course microarray data are compared to two published datasets that use zebrafish as a dynamic system to study the regeneration of neural systems (spinal cord and optic nerve) [460,468], it is found that our data are consistent with the reported findings
(the correlation coefficients are about 0.5, 0.7) although the experimental conditions (the lesion schemes) and the objects of study (the injured parts of neural systems) are different (see Figs. 10.A3 and 10.A4 in Appendix). Therefore this can serve as an independent validation of our results. Next, STEM is used to find significant temporal patterns in the time-course microarray data, and several significant temporal patterns are revealed (Fig. 10.6). The temporal pattern of group 29 (group P hereafter) shows a strong positive correlation with the ZMI (comparing Fig. 10.3E to Fig. 10.6B, the Pearson correlation coefficient is 0.7427 ± 0.1315, and the P-value for the sign test is less than .05). The temporal pattern of group 11 (group N hereafter) is just the reverse, that is, it is highly negatively correlated with the ZMI (comparing Fig. 10.3E to Fig. 10.6D). The temporal pattern of group 40 (group A hereafter) is found to be related to acute responses during the wound-healing process (Fig. 10.6C), since its expression levels change earlier than those of groups P and N [469,470]. Second, the qualitative functional information is integrated to investigate the molecular systems mechanisms of the cerebellar wound-healing process. For the genes in group A, some of the most common GO biological processes are related to the regulation of MAP kinase activity, regulation of the MAPK cascade, phosphorylation, and dephosphorylation. For the genes in group P, some of the most common GO biological processes are related to the regulation of the cell cycle, meiosis, DNA replication, and organelle organization. For the genes in group N, some of the most common GO biological processes are related to cation transport, potassium ion transport, ion transmembrane transport, and nervous system development. Further enrichment and ontology analysis of these biological processes in wound healing is given in the sequel.
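The correlation between a group of temporal expression profiles and the ZMI can be sketched as follows. This Python example assumes that the reported value (0.7427 ± 0.1315) is the mean and standard deviation of per-gene Pearson coefficients, which the text does not state explicitly; the function name and toy data are hypothetical.

```python
import numpy as np


def correlation_with_index(profiles, index):
    """Pearson correlation of each expression profile with a behavior index.

    profiles: array (n_genes, n_timepoints); index: array (n_timepoints,).
    Returns (per-gene r, mean of r, sample standard deviation of r).
    """
    profiles = np.asarray(profiles, dtype=float)
    index = np.asarray(index, dtype=float)
    pc = profiles - profiles.mean(axis=1, keepdims=True)   # center each gene
    ic = index - index.mean()                              # center the index
    r = (pc @ ic) / (np.linalg.norm(pc, axis=1) * np.linalg.norm(ic))
    return r, r.mean(), r.std(ddof=1)


# Hypothetical index and two toy profiles: one rising with it, one falling.
zmi = np.array([0.0, 0.4, 0.9, 0.6, 0.2])
genes = np.vstack([2.0 * zmi + 1.0, -zmi + 3.0])
r, r_mean, r_sd = correlation_with_index(genes, zmi)
print(r)
```

A profile that is a positive linear function of the index gives r = 1, the pattern expected for group P, while a negative linear function gives r = -1, the pattern expected for group N.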
10.3.2 Significant signaling pathways in the wound-healing process
The presence of temporally coregulated proteins may imply the involvement of common biological pathways in response to TBI. Therefore, to evaluate the biological functions of temporally coregulated proteins, pathway analysis is conducted using PANTHER. Based on the results, the roles of the significantly enriched pathways of groups A, N, and P in the zebrafish TBI model are investigated to understand the molecular mechanism of the cerebellar wound-healing process.
Significantly enriched pathways of group A: the acute inflammation and immune response in the wound-healing process. Examples of the pathways are given in Fig. 10.A5 in Appendix. These pathways are primarily involved in the acute inflammation and immune response to TBI (the details are listed in Table 10.A2 in Appendix).
1. Endogenous cannabinoid signaling: An endogenous cannabinoid (2-AG) has been implicated in the neuroprotective mechanism immediately after TBI [471]. The endocannabinoid system is also implicated in pain relief, the blocking of working memory, immune responses, etc. The endocannabinoid system has, therefore, been proposed as a potential target for acute therapies of TBI [472].
2. PI3K/PKB pathway: Phosphoinositide 3-kinase (PI3K) can catalyze the production of phosphatidylinositol-3,4,5-trisphosphate, which has been involved in several essential
biological processes, including cell survival, regulation of gene expression and cell metabolism, and cytoskeletal rearrangements. A study has reported that PKB, which belongs to group A, could modulate the macrophage inflammatory response to Francisella infection and give a survival advantage in mice [473].
3. Heterotrimeric G-protein signaling pathway: As signal transducers, G proteins can communicate signals from many hormones, neurotransmitters, and other signaling factors [474], which occur ubiquitously in the cerebellar wound-healing process.
10.3.3 Significantly enriched signaling pathways of group P: Having positive correlation with ZMI in the wound-healing process
The primarily involved signaling pathways are given in Fig. 10.A6 in Appendix and are investigated in the following (the details are listed in Table 10.A3 in Appendix). They are found either to play important roles in cytoskeleton regulation, angiogenesis, and inflammation or to reflect the cellular processes by which the behavioral disability could increase initially and then decrease.
1. Cell cycle and DNA replication: An ordered series of events leads to the replication of cells through a series of integrated protein–protein and protein–DNA interactions and enzymatic reactions that ensure the accuracy of DNA replication. This may imply the occurrence of angiogenesis and neurogenesis at the damaged site.
2. Parkinson’s disease pathway, Huntington’s disease pathway, and Alzheimer’s disease–presenilin pathway: Parkinson’s, Huntington’s, and Alzheimer’s diseases are the three most well-known neurodegenerative diseases. Patients with neurodegenerative diseases suffer from the progressive loss of cellular structure or function of neurons, including the death of neurons. Current research is identifying many similarities between these diseases at the level of subcellular function. For example, mitochondrial dysfunction and oxidative stress play an important role in the pathogenesis of neurodegenerative diseases [475]. The appearance of these signaling pathways also reveals some relationships between the secondary damage of cerebellar injury and neurodegenerative diseases.
3. Inflammation mediated by chemokine and cytokine signaling pathway, T cell activation, and B cell activation: After binding to a family of G-protein-coupled seven-transmembrane receptors, chemokines could control and manage the trafficking and migration of immune cells.
This pathway can illustrate the chemokine-induced adhesion and migration of leukocytes, which could result in infiltration to the tissue and transcriptional activation to enable the recruitment of more leukocytes for protection from infection in the cerebellar wound-healing process [476,477]. Therefore the inhibition of specific chemokines and receptors could prevent the excessive recruitment of leukocytes to effectively control the scale of inflammation. T and B cells are the two most abundant lymphocytes. T cell activation refers to a process in which mature T cells can express antigen-specific T cell receptors on their surface to recognize their cognate antigens and respond by entering the cell cycle, secreting cytokines or lytic enzymes, and initiating the cell-based functions of the immune system. B cell activation brings on the process by which a pre-B cell matures into a plasma cell.
The signaling pathways for this inflammation resolution in the wound-healing process of traumatic injury in zebrafish are very complex.
4. Cytoskeletal regulation by Rho GTPase and Axon guidance: In cerebellum regeneration processes such as wound healing and angiogenesis, several cell types, including neural stem cells, leukocytes, lymphocytes, fibroblasts, neuronal cells, epithelial cells, and endothelial cells, need to change their morphology in order to migrate from the stem cell niche and to extravasate from the vasculature in the wound-healing process of traumatic injury in zebrafish. The change in cell morphology is a multistep cellular process involving changes in the cytoskeleton, cell-substrate adhesions, and extracellular matrix [478]. The “cytoskeletal regulation by Rho GTPase” pathway is such a cellular change, regulated by Rho GTPase. The integrin signaling pathway is then triggered when integrins in the cell membrane bind to extracellular matrix components [479]. Furthermore, axons are guided along specific tracts by attractive and repulsive cues during the establishment of the nervous system network in the cerebellar wound-healing process of zebrafish.
10.3.4 Significantly enriched signaling pathways of group N: Having negative correlation with ZMI in the wound-healing process
The following signaling pathways are significantly enriched by the genes in group N (Fig. 10.A7 and Table 10.A4 in Appendix). These neurotransmitter-related signaling pathways appear to be inhibited during the wound-healing process. They may not be as crucial for regenerability as the abovementioned pathways, but they support the functional recovery of neural transmission in the cerebellar wound-healing process. Some examples of these neurotransmitter-related signaling pathways are Beta1 adrenergic receptor signaling pathway, Beta2 adrenergic receptor signaling pathway, synaptic vesicle trafficking, muscarinic acetylcholine receptor 2 and 4 signaling pathway, metabotropic glutamate receptor group II signaling pathway, 5-HT1 type receptor-mediated signaling pathway, metabotropic glutamate receptor group III signaling pathway, Enkephalin release pathway, GABA-B receptor II signaling pathway, Beta3 adrenergic receptor signaling pathway, dopamine receptor-mediated signaling pathway, opioid prodynorphin pathway, opioid proenkephalin pathway, 5-HT4 type receptor-mediated signaling pathway, muscarinic acetylcholine receptor 1 and 3 signaling pathway, ionotropic glutamate receptor signaling pathway, and opioid proopiomelanocortin pathway. These signaling pathways are all related to neurotransmitters and receptors in neural transmission, and their roles in the restoration and neurogenesis of the cerebellar wound-healing process of zebrafish need further elaboration.
10.3.5 Cross talks between the three groups of proteins of protein–protein interaction network in the cerebellar wound-healing process of zebrafish
Moreover, the orchestrated interactions (cross talks) among signaling pathways are more important than the individual pathways themselves. The cross talks among signal/regulatory pathways have been found to coordinate efficient inflammatory responses to different stimuli in
infectious process [3,5]. Consequently, we first examine the cross talks between the three groups of proteins, extracted from the dynamic PPI network (Fig. 10.5), in the cerebellar wound-healing process of zebrafish. In groups A and P, that is, the proteins whose temporal expression profiles are positively correlated with the ZMI and the proteins in the acute phase, there are two major sub-PPI networks (AP1 and AP2) emerging from the PPI network (see Fig. 10.7A; ZFIN symbols for the edges between two groups are given in Table 10.A5 in Appendix). In the subnetwork AP1, cross talks are observed between the PI3K signaling pathway enriched in group A and the cell cycle signaling pathway and immune-related signaling pathway enriched in group P. This observation supports the understanding that PI3K signaling pathways are involved in several crucial recovery processes, including cell proliferation and immune response in the cerebellar wound-healing process of zebrafish. In the subnetwork AP2 the cannabinoid signaling pathway cross talks with the neurotransmitter-related signaling pathways. The cannabinoid signaling pathway has shown its importance in neurogenesis [471,480]. This may imply that neurotransmitter-related signaling pathways play a role in neurogenesis, which may be regulated by the cannabinoid signaling pathway and has not been fully explored [481,482]. In groups A and N, there are also two major subnetworks (AN1 and AN2) (Fig. 10.7B; ZFIN symbols for the edges between two groups are given in Table 10.A5 in Appendix). The subnetwork AN1 contains interactions among cellular communication pathways because G-protein pathways are enriched in both groups. In the subnetwork AN2, G-protein pathways and T and B cell activation pathways enriched in group A have links to neurotransmitter-related signaling pathways enriched in group N. This cross talk between the immune system and neurotransmitters is an interesting area in the cerebellar wound-healing process [483–485].
Third, there are four major subnetworks in groups N and P (NP1 to NP4) (Fig. 10.7C; ZFIN symbols for the edges between two groups are given in Table 10.A5 in Appendix). In the subnetwork NP1, neurotransmitter-related signaling pathways have links to the cell cycle signaling pathway, which constitutes another interesting cross talk [481,482]. In the subnetwork NP2, inflammation-related signaling pathways of group N have cross talks with the G-protein and Alzheimer’s pathways. Neurodegenerative diseases are often associated with prolonged brain injury. Nevertheless, the molecular mechanisms by which prolonged brain injury could induce neurodegenerative diseases are still unclear. The PPI network suggests that inflammation may be involved in the initiation of neurodegenerative diseases. The subnetwork NP3 has many gene regulatory interactions, for example, transcription, which are enriched in both groups. By examining the adjacent nodes of NP3, we find that the subnetwork NP3 links to the cell cycle pathway through nodes that are not included in the three groups. NP3 may play a role in regulating and communicating with other genes that do not belong to the three groups. The subnetwork NP4 is found to be dominated by the axon guidance pathway, which can be seen as an indication of neurogenesis or functional recovery in the cerebellar wound-healing process of zebrafish. By examining the adjacent nodes of NP4, the axon guidance pathway is found to link indirectly to the G-protein pathways through nodes not included in the three groups. In summary, we have developed a schematic diagram of cross talks among the signaling pathways enriched in the three groups that may explain the underlying cellular molecular mechanism of the cerebellar wound-healing process of zebrafish (Fig. 10.8A).
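The between-group edges discussed in this subsection can be extracted from an edge list once every protein is labeled with its group. The Python sketch below illustrates the idea; the function name, the gene symbols, and the group assignments are hypothetical and do not reproduce Table 10.A5.

```python
def cross_talk_edges(edges, group_of):
    """Collect PPI edges whose endpoints lie in two different groups.

    edges: iterable of (protein_a, protein_b)
    group_of: {protein: group label}; proteins without a label are skipped.
    Returns {(group_1, group_2): [edges]} with the group pair sorted.
    """
    result = {}
    for a, b in edges:
        ga, gb = group_of.get(a), group_of.get(b)
        if ga is None or gb is None or ga == gb:
            continue                      # unlabeled or within-group edge
        result.setdefault(tuple(sorted((ga, gb))), []).append((a, b))
    return result


# Hypothetical toy network.
edges = [("akt", "ccnd1"), ("akt", "pik3"), ("cnr1", "gabbr1"), ("x", "akt")]
group_of = {"akt": "A", "pik3": "A", "ccnd1": "P", "cnr1": "A", "gabbr1": "N"}
print(cross_talk_edges(edges, group_of))
```

Edges that touch unlabeled proteins are dropped here, so the indirect links through nodes outside the three groups (as noted for NP3 and NP4) would need a separate neighbor search.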
In general, cross talks exist not only among these three groups but also in the whole PPI network. Next, by examining the cross talks of the following three pathways (Fig. 10.8B) significantly enriched in the whole network, we can find connections among inflammation, neurogenesis, and angiogenesis, which are crucial for restoration of the injured brain. In the chemokine signaling pathway, a pro-inflammatory pathway, chemokine (C-X-C motif) receptors 3 and 4 (CXCR3 and CXCR4) and chemokine (C-C motif) receptor 7 (CCR7) are found in the PPI network and are considered a starting point for activating a series of inflammatory events (e.g., the activation of the STAT and NF-κB signaling pathways). PI3K is in the middle of this signaling pathway and is also involved in another signaling pathway, the PI3K signaling pathway. In the PI3K signaling pathway, several receptor tyrosine kinases (FGFR, NGFR, and PDGFR), insulin receptor substrate 1 (IRS1), cholinergic receptor muscarinic 2 (CHRM2), and growth factor receptor-bound protein 2 (GRB2) are all found in the PPI network. The signal is transduced to AKT through PI3K and then promotes cell cycle progression through cyclin D1 (CCND1), which is a sign of new cell generation. More interestingly, PI3K also functions in the axon guidance pathway by regulating the cytoskeleton. In the axon guidance pathway, the semaphorin-related proteins integrin beta 1 (ITGB1) and plexin A3 (PLXNA3) in the PPI network are receptors for semaphorins that activate downstream regulators: ras-related C3 botulinum toxin substrate 1 (RAC1), dihydropyrimidinase-like 2 (DPYSL2 or CRMP2), and p21 protein (Cdc42/Rac)-activated kinase 2 (PAK2). PAK2 is also involved in vascular stabilization, which may be a crucial function after angiogenesis. DPYSL2 could control the branching guidance and the number of neurites.
PLXNA3 is found to be related to the morphogenesis of motor neurons and thus could contribute to the recovery of zebrafish movement. This pathway validation supports the PPI network and reveals the cellular relationships among inflammation, neurogenesis, and angiogenesis in the cerebellar wound-healing process from the aspect of signaling pathway cross talks.
10.4 Discussion and conclusion
TBI is one of the leading causes of disability in the United States [451]. However, the complex and intricate nature of the brain makes it very difficult to address this problem.
FIGURE 10.7 Cross talks among significantly enriched pathways in the three significant temporal groups of proteins. (A) Cross talks between groups A and P: cross talks between PI3K pathways (enriched in group A) and the cell cycle (enriched in group P) dominate subnetwork AP1. Cross talks between endocannabinoid pathways (enriched in group A) and neurotransmitter-related pathways (enriched in group P) dominate subnetwork AP2. (B) Cross talks between groups A and N: Cross talks between G-protein pathways (enriched in proteins from both groups A and N) dominate subnetwork AN1. The cross talks between G-proteins and T and B cell activation pathways (enriched in group A) and neurotransmitter-related pathways (enriched in group N) dominate the subnetwork AN2. (C) Cross talks between groups N and P: Cross talks between neurotransmitter-related pathways (enriched in group N) and cell cycle pathways (enriched in group P) dominate subnetwork NP1. Cross talks between G-protein and neurodegenerative diseases pathways (enriched in group N) and inflammation pathways (enriched in group P) dominate subnetwork NP2. Cross talks between transcription pathways (enriched in both groups N and P) dominate subnetwork NP3. Cross talks between axon guidance pathways (enriched in both groups N and P) dominate subnetwork NP4. Red lines indicate the cross talks between groups and are listed in Table 10.A5 in Appendix. doi:10.1371/journal.pone.0097902.g007 [493].
FIGURE 10.8 Schematic diagram of wound healing-related cellular pathways and their cross talks among inflammation, neurogenesis, and angiogenesis. (A) Pathways shown in blocks were enriched in the subnetworks (Fig. 10.7). Their cross talks are observed in the cerebellar wound-healing PPI network of zebrafish (solid arrows indicate cross talks involving proteins in groups A, N, and P; dashed arrows indicate cross talks involving proteins not in these groups). Cell communication and defense mechanisms (inflammation and immune response) are established after brain injury. Cell communication is related to neurotransmitter-related pathways (through AP2 and AN2), the axon guidance pathway (through NP4 indirectly), and itself (through AN1). Defense mechanisms are related to the cell cycle pathway (through AP1), neurotransmitter-related pathways (through AN2), the cell communication pathway (through NP2), and neurodegenerative diseases (through NP2). Cell cycle pathways also interact with gene regulation pathways through NP3 indirectly. (B) Cross talks are found among the pathways of inflammation, neurogenesis, and angiogenesis in the cerebellar wound-healing process. Inflammation is mediated by the chemokine signaling pathway. The generation of new neural and vascular cells is mediated by the PI3K-AKT signaling pathway and the axon guidance pathway. PI3K in the chemokine signaling pathway also regulates the PI3K-AKT signaling pathway and the axon guidance pathway. PAK2 in the axon guidance pathway also maintains vascular stability. doi:10.1371/journal.pone.0097902.g008 [493].
At present, there are no FDA-approved pharmacological therapies [486]. However, several recent advances have been made, through mass spectrometry, bioinformatics tools, and systems biology, in discovering significant biomarkers to serve as potential therapeutic targets [487]. In this chapter, zebrafish, which can recover from cerebellar injuries, are used to investigate potential signaling pathways and their cross talks that may be crucial for TBI recovery and can also be considered useful biomarkers. Based on a set of time-course microarray experiments and the integration of omics data extracted from databases, a zebrafish PPI network of the cerebellar wound-healing process is constructed by a systems biology method. By analyzing the temporal expression patterns, three major protein groups (the acute, positively related, and negatively related groups, i.e., Groups A, P, and N) are then identified. Within each group, several significantly enriched signaling pathways are found, and their relationships with the cerebellar wound-healing process are confirmed through a literature survey. The cross talks of these significantly enriched signaling pathways are also investigated to reveal the cellular relationships among inflammation, neurogenesis, and angiogenesis in the constructed zebrafish PPI network of the cerebellar wound-healing process (see Fig. 10.8A). After an injury of the cerebellum, cellular communication (i.e., G-protein and endocannabinoid pathways) and defense mechanisms (inflammation and immune responses) are induced to coordinate the subsequent recovery processes. G-protein pathways and endocannabinoid pathways could interact with neurotransmitter-related pathways (through subnetworks AP2 and AN2). Those neurotransmitter-related pathways could then also activate cell cycle pathways (through subnetwork NP1), which are essential in neurogenesis.
At the same time, the G-protein pathways could also indirectly initiate axon guidance, which is maintained by cross talks between groups N and P (subnetwork NP4). The defense mechanisms, that is, inflammation and immune responses, could interact with cell cycle pathways and neurotransmitter-related pathways for restoration through subnetworks AP1 and AN2, respectively. Inflammation can also be interpreted as a necessary and sufficient condition for neurogenesis [488]. Hence, there should exist cross talks between the inflammation and cell cycle pathways for coordination in the cerebellar wound-healing process of zebrafish. Moreover, inflammation also seems to be linked to neurodegenerative diseases, for example, Alzheimer's, Parkinson's, and Huntington's diseases (through subnetwork NP2), which are common in brain injury patients [489]. The cross talks between immune responses and neurotransmitter-related pathways have also been reported in the literature to provide an opportunity for TBI therapy. The pathway validation could reconfirm the PPI network and also reveal the cross talks among inflammation, neurogenesis, and angiogenesis, which are essential in TBI recovery. PI3K (Fig. 10.8B) could integrate the information from chemokines, growth factors, and G-protein coupled receptors and further transduce it to NF-kB, cyclins, and PAK2. It seems to play a central role in regulating cytokine production, cell cycle progression, and axon outgrowth. Besides PI3K, we could also find some potential targets for further experimental designs, for example, steroid biosynthesis, hormone biosynthesis, and the Notch signaling pathway, which are all enriched in the PPI network. The systems biology methodology can systematically analyze high-throughput data and has also been suggested as a promising way to discover biomarkers as drug targets for TBI [490]. The PPI network refined by microarray data in this chapter has revealed the cross talks between several pathways, including some well-known cross talks that coordinate the immune response and cell communication processes for regeneration. The results in this chapter have also indicated cross talks between the immune response process and neurodegenerative diseases, which have also been demonstrated by Feala et al. [490]. Neurotransmitters may also be involved in the TBI-recovery process through their effects on cell cycle regulation [481], immune responses [491], and cell communication [492]. However, the roles of the neurotransmitter-related pathways in the TBI process have been less addressed. It is, therefore, interesting to study the roles of neurotransmitters in neuron regeneration. Finally, the cross talks and plausible nodes identified in this chapter will be helpful in deciphering the molecular interactions between these pathways or cells in the wound-healing process and may make the therapeutic treatment of human CNS regeneration possible in the future.
10.5 Appendix
10.5.1 Method A1: Dynamic model of the wound healing-related cellular protein-protein interaction network
In the above candidate PPI network, the candidate interactions could be modeled as a dynamic interactive system, in which the interactive proteins and mRNA are the inputs and the protein activities are the outputs. In particular, for a target protein p that interacts with N proteins in the candidate PPI network, the dynamic interactive model of the protein p is given as follows:
$$y_p[t+1] = y_p[t] + \sum_{q=1}^{N} b_{pq}\, y_p[t]\, y_q[t] + \alpha_p x_p[t] - \beta_p y_p[t] + \omega_p[t+1] \tag{10.A1}$$
where $y_p[t]$ and $y_q[t]$ denote the protein levels at time t of the target protein p and of the qth protein interacting with protein p, respectively; $b_{pq}$ represents the ability of the qth interactive protein to interact with protein p; $\alpha_p$ is the translation effect from the mRNA of gene p to protein p; $x_p[t]$ denotes the mRNA expression level of gene p; $\beta_p$ indicates the degradation effect of protein p; and $\omega_p[t+1]$ denotes stochastic noise. The rate of a PPI is proportional to the product of the concentrations of the two proteins involved [283], that is, it is proportional to the probability of molecular collisions between the two proteins. The interaction is, therefore, modeled as a nonlinear product of the two protein concentrations. For example, in the PPI network, the phosphorylation of $y_p[t]$ by kinase $y_q[t]$ is proportional to the product of the concentration of kinase $y_q[t]$ and that of its substrate $y_p[t]$ [283]. The biological meaning of Eq. (10.A1) is that the protein level $y_p$ at time $t+1$ is its protein level at time t, plus the effects of interactions with the N interactive proteins, plus the translation product from the mRNA, minus the degradation effect, plus stochastic noise. Because protein interactions are undirected, there is no direction between interacting proteins in the candidate PPI network. The interaction parameter $b_{pq}$, translation parameter $\alpha_p$, and decay rate $\beta_p$ are estimated from the microarray data, as described later.
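The update rule in Eq. (10.A1) can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this chapter: the function name `step_protein_level`, its argument names, and the numerical values in the usage note are hypothetical, and the noise term is assumed here to be zero-mean Gaussian, which the text does not specify.

```python
import random


def step_protein_level(y_p, y_partners, b_pq, alpha_p, x_p, beta_p,
                       noise_sd=0.0, rng=None):
    """One discrete-time update of Eq. (10.A1) for a single target protein p.

    y_p        : protein level of the target protein at time t
    y_partners : levels y_q[t] of the N proteins interacting with p
    b_pq       : interaction abilities, one per partner
    alpha_p    : translation effect from mRNA to protein
    x_p        : mRNA expression level of gene p at time t
    beta_p     : degradation effect of protein p
    noise_sd   : standard deviation of the noise (Gaussian is an assumption)
    """
    if len(y_partners) != len(b_pq):
        raise ValueError("one interaction parameter is needed per partner")
    # Interaction term: sum over q of b_pq * y_p[t] * y_q[t]
    interaction = sum(b * y_p * y_q for b, y_q in zip(b_pq, y_partners))
    # Stochastic noise term omega_p[t+1]; zero when noise_sd is 0
    rng = rng or random.Random()
    noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return y_p + interaction + alpha_p * x_p - beta_p * y_p + noise
```

With the hypothetical values `step_protein_level(1.0, [0.5, 2.0], [0.1, -0.05], 0.3, 2.0, 0.2)`, the interaction term is -0.05, the translation term is 0.6, and the degradation term is 0.2, so the noise-free update gives approximately 1.35.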
10.5.2 Method A2: Interaction parameter identification using the time-profile microarray data
The interaction parameters of the dynamic PPI network model are identified using the microarray data collected as described before. The protein interaction parameters are identified using a least squares parameter estimation scheme [280]. Eq. (10.A1) can be rewritten in the following regression form:
$$y_p[t+1] = \begin{bmatrix} y_p[t]y_1[t] & \cdots & y_p[t]y_N[t] & x_p[t] & y_p[t] \end{bmatrix} \begin{bmatrix} b_{p1} \\ \vdots \\ b_{pN} \\ \alpha_p \\ 1-\beta_p \end{bmatrix} + \omega_p[t+1] = \phi_p[t]\,\theta_p + \omega_p[t+1] \tag{10.A2}$$
where $\phi_p[t]$ represents the regression vector, which is computed directly from the microarray data points, and $\theta_p$ is the parameter vector to be estimated. The dataset used for parameter estimation is small, so to avoid overfitting in the parameter estimation process, the cubic spline method [308] is used to interpolate enough time points for the gene expression data. For simplicity, over the time points $t = 1, \ldots, L$, Eq. (10.A2) can be presented as follows:
$$Y_p = \Phi_p \theta_p + \Omega_p \tag{10.A3}$$
where $Y_p = [\,y_p[2]\ \ y_p[3]\ \cdots\ y_p[L]\,]^T$, $\Phi_p = [\,\phi_p^T[1]\ \ \phi_p^T[2]\ \cdots\ \phi_p^T[L-1]\,]^T$, $\Omega_p = [\,\omega_p[2]\ \ \omega_p[3]\ \cdots\ \omega_p[L]\,]^T$, and L is the number of microarray data points after cubic spline interpolation. The least squares parameter estimation routine can then be formulated as follows:
$$\min_{\theta_p} \left\lVert Y_p - \Phi_p \theta_p \right\rVert_2^2 \tag{10.A4}$$
Because there is no large-scale measurement of protein expression available, the mRNA expression profiles in Fig. 10.2 are used to substitute for protein expression levels when identifying the protein interaction parameters. Although mRNA expression levels do not reflect protein expression levels exactly, there are partially positive correlations between them [321,322]. The least squares minimization scheme (10.A4) can be solved using the active set method for quadratic programming [280]. The PPI parameter $b_{pq}$ is estimated for each protein in the candidate PPI network using the time-profile microarray data. Nevertheless, since the candidate PPI network is constructed using data from a variety of biological experiments under a range of conditions, or inferred from orthology data, it may contain many PPIs irrelevant to the wound-healing process (i.e., false positives). The false-positive interaction parameters $\hat{b}_{pq}$ are, therefore, pruned using the model order (true interaction number) detection method described in the following.
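The regression form (10.A2) to (10.A4) can be illustrated with a short Python sketch. It is a simplified stand-in under stated assumptions: it solves an unconstrained ordinary least squares problem through the normal equations rather than the active set quadratic programming method cited above, it omits the cubic spline interpolation step, and all function names (`build_regression`, `solve_linear`, `least_squares`, `estimate_parameters`) are hypothetical.

```python
def build_regression(y_p, y_partners, x_p):
    """Stack phi_p[t] rows and y_p[t+1] targets as in Eq. (10.A3).

    y_p        : list of levels of the target protein, length L
    y_partners : list of N lists, each the levels of one interacting protein
    x_p        : list of mRNA levels of gene p, length L
    """
    L = len(y_p)
    Phi = [[y_p[t] * y_q[t] for y_q in y_partners] + [x_p[t], y_p[t]]
           for t in range(L - 1)]
    Y = [y_p[t + 1] for t in range(L - 1)]
    return Phi, Y


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("regression matrix is rank deficient")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def least_squares(Phi, Y):
    """Minimize ||Y - Phi theta||^2 through the normal equations."""
    k = len(Phi[0])
    AtA = [[sum(row[i] * row[j] for row in Phi) for j in range(k)]
           for i in range(k)]
    AtY = [sum(row[i] * y for row, y in zip(Phi, Y)) for i in range(k)]
    return solve_linear(AtA, AtY)


def estimate_parameters(y_p, y_partners, x_p):
    """Return b_pq, alpha_p, and beta_p recovered from theta_p."""
    Phi, Y = build_regression(y_p, y_partners, x_p)
    theta = least_squares(Phi, Y)
    n = len(y_partners)
    # theta = [b_p1 ... b_pN, alpha_p, 1 - beta_p], as in Eq. (10.A2)
    return {"b": theta[:n], "alpha": theta[n], "beta": 1.0 - theta[n + 1]}
```

Note that the last entry of $\theta_p$ is $1-\beta_p$, so the decay rate is recovered as one minus the estimate. A constrained solver would be needed to enforce sign restrictions on the parameters, which this sketch does not attempt.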
10.5.3 Method A3: Determination of realistic interaction pairs
When the PPI parameters $\hat{b}_{pq}$ are identified, the Akaike information criterion (AIC) in (7.2) is used to select the model order and thereby identify realistic interactions in the cerebellar wound healing-related PPI network, that is, to determine the interaction number N for target protein p in Eq. (10.A1). AIC should reach a minimum around the correct PPI number [40,81]. Therefore AIC can be used to detect the model order (number of PPIs) based on the protein interaction abilities $\hat{b}_{pq}$ identified previously. In this way, AIC model order detection is used to prune the false-positive interactions of the candidate PPI network, one protein at a time, using the time profiles of the microarray data, which yields a refined and realistic PPI network.
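The model order detection described above can be sketched as follows. Eq. (7.2) is not reproduced in this section, so the sketch assumes a commonly used form, AIC(N) = log(RSS/L) + 2N/L, where RSS is the residual sum of squares of the fitted model with N interactions and L is the number of data points. Ranking interactions by the magnitude of their estimated abilities before pruning is also an assumption made for illustration. The function names and the numbers in the usage note are hypothetical.

```python
import math


def aic(rss, n_samples, n_params):
    """Assumed AIC form: log of the residual variance plus a complexity penalty."""
    return math.log(rss / n_samples) + 2.0 * n_params / n_samples


def select_model_order(rss_by_order, n_samples):
    """Pick the interaction number N with the smallest AIC.

    rss_by_order maps a candidate N to the residual sum of squares
    obtained when the model is refitted with N interactions.
    """
    scores = {n: aic(rss, n_samples, n) for n, rss in rss_by_order.items()}
    best = min(scores, key=scores.get)
    return best, scores


def prune_interactions(b_hat, keep):
    """Keep the `keep` interactions with the largest absolute ability."""
    ranked = sorted(b_hat.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return dict(ranked[:keep])
```

For instance, with hypothetical residuals `{1: 4.0, 2: 1.0, 3: 0.95, 4: 0.94}` and 20 data points, the fit improves sharply from one to two interactions and only marginally afterward, so the penalty term makes N = 2 the minimizer, and `prune_interactions` would then retain the two strongest interactions.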
TABLE 10.A1 RNA QA/QC information [493].
| Sample name | RNA conc. (ng/µL) | RNA quantity (ng) | OD260/280 ratio | Bioanalyzer chip lane location | 28S/18S ratio | RIN |
| --- | --- | --- | --- | --- | --- | --- |
| 6 hpl | 227.12 | 2044.08 | 2.12 | 1 | 1.8 | 9.4 |
| 1 dpl | 279.09 | 2511.81 | 2.11 | 2 | 1.3 | N/A |
| 3 dpl | 429.08 | 3861.72 | 2.12 | 3 | 1.0 | 9.4 |
| 6 dpl | 308.67 | 2778.03 | 2.13 | 4 | 1.3 | 9.7 |
| 10 dpl | 372.76 | 3354.84 | 2.14 | 5 | 1.4 | 9.6 |
| 15 dpl | 510.94 | 4598.46 | 2.13 | 6 | 0.9 | 9.1 |
| 21 dpl | 354.08 | 3186.72 | 2.1 | 7 | 1.5 | 8.6 |
| 28 dpl | 468.07 | 4212.63 | 2.13 | 8 | 1.5 | 9.3 |
| Control | 576.71 | 5190.39 | 2.17 | 9 | 1.4 | 9.0 |
TABLE 10.A2 The significantly enriched pathways in group A [493].
| Pathways | P-value |
| --- | --- |
| Endogenous_cannabinoid_signaling | 7.17E-06 |
| PI3 kinase pathway | 1.87E-05 |
| Androgen/estrogene/progesterone biosynthesis | 1.90E-05 |
| GABA-B_receptor_II_signaling | 3.47E-05 |
| Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway | 4.50E-05 |
| Thyrotropin-releasing hormone receptor signaling pathway | 2.88E-04 |
| Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase cascade | 6.87E-04 |
| Metabotropic glutamate receptor group II pathway | 1.26E-03 |
| Tricarboxylic acid (TCA) cycle | 1.69E-03 |
| Circadian clock system | 3.47E-03 |
TABLE 10.A3 The significantly enriched pathways in group P [493].
| Pathways | P-value |
| --- | --- |
| Cell cycle | 4.94E-08 |
| Parkinson disease | 3.20E-05 |
| Integrin signaling pathway | 3.43E-05 |
| De novo pyrimidine deoxyribonucleotide biosynthesis | 2.68E-04 |
| Inflammation mediated by chemokine and cytokine signaling pathway | 4.15E-04 |
| Axon guidance mediated by semaphorins | 6.53E-04 |
| Cytoskeletal regulation by Rho GTPase | 8.56E-04 |
| T cell activation | 8.63E-04 |
| p53 pathway | 2.02E-03 |
| B cell activation | 3.29E-03 |
| Alzheimer disease-presenilin pathway | 5.94E-03 |
| Huntington disease | 5.98E-03 |
TABLE 10.A4 The significantly enriched pathways in group N [493].
| Pathways | P-value |
| --- | --- |
| Muscarinic acetylcholine receptor 2 and 4 signaling pathway | 3.56E-10 |
| Heterotrimeric G-protein signaling pathway-Gi alpha and Gs alpha mediated pathway | 2.84E-08 |
| GABA-B_receptor_II_signaling | 9.11E-08 |
| Dopamine receptor-mediated signaling pathway | 1.39E-07 |
| 5HT1 type receptor-mediated signaling pathway | 2.63E-07 |
| Beta1 adrenergic receptor signaling pathway | 3.49E-07 |
| Beta2 adrenergic receptor signaling pathway | 3.74E-07 |
| Histamine H2 receptor-mediated signaling pathway | 1.41E-06 |
| Endothelin signaling pathway | 4.66E-06 |
| Enkephalin release | 5.30E-06 |
| Metabotropic glutamate receptor group II pathway | 8.41E-06 |
| Muscarinic acetylcholine receptor 1 and 3 signaling pathway | 2.46E-05 |
| Beta3 adrenergic receptor signaling pathway | 5.54E-05 |
| Alpha adrenergic receptor signaling pathway | 6.49E-05 |
| Adrenaline and noradrenaline biosynthesis | 7.01E-05 |
| Metabotropic glutamate receptor group III pathway | 7.30E-05 |
| Opioid prodynorphin pathway | 1.07E-04 |
| 5HT4 type receptor-mediated signaling pathway | 1.07E-04 |
| 5-Hydroxytryptamine biosynthesis | 1.24E-04 |
| Opioid proopiomelanocortin pathway | 1.29E-04 |
| Opioid proenkephalin pathway | 1.29E-04 |
| Synaptic_vesicle_trafficking | 1.54E-04 |
| Transcription regulation by bZIP transcription factor | 4.92E-04 |
| Heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway | 5.89E-04 |
| Nicotine pharmacodynamics pathway | 7.76E-04 |
| Alzheimer disease-amyloid secretase pathway | 8.28E-04 |
| Oxytocin receptor-mediated signaling pathway | 8.28E-04 |
| Thyrotropin-releasing hormone receptor signaling pathway | 9.08E-04 |
| Gonadotropin releasing hormone receptor pathway | 1.02E-03 |
| 5HT2 type receptor-mediated signaling pathway | 1.15E-03 |
| Parkinson disease | 2.24E-03 |
| Hedgehog signaling pathway | 2.83E-03 |
| Apoptosis signaling pathway | 2.87E-03 |
| Histamine synthesis | 3.99E-03 |
| Inflammation mediated by chemokine and cytokine signaling pathway | 4.11E-03 |
| Corticotropin releasing factor receptor signaling pathway | 4.41E-03 |
TABLE 10.A5 ZFIN symbols of nodes in the subnetworks [493].
| Subnetwork | Group A | Group N | Group A | Group N |
| --- | --- | --- | --- | --- |
| AN1, AN2 | cxcl12b | chrm2a | trio | srgap2b |
| | prok1 | agt | gabbr1b | adra2b |
| | gabbr1b | chrm2a | gabbr1b | si:dkey-114o13.3 |
| | trio | ablim1a | oprm1 | chrm2a |
| | gabbr1b | gng7 | arhgef18a | srgap2b |
| | cnr1 | chrm2a | gna15.1 | prkceb |
| | gna15.1 | cck | gnb2 | gng7 |
| | si:ch211-234p18.3 | srgap2b | oprm1 | adra2b |
| | cxcl12b | adra2b | cnr1 | adra2b |
| | gnb2 | si:dkey-114o13.3 | gna15.1 | agt |
| | cnr1 | agt | prok1 | cck |
| | cxcl12b | agt | gabbr1b | agt |
| | arhgef9a | srgap2b | oprm1 | agt |
| | pde4bb | adcy2b | calm2a | adcy2b |
| | pde4cb | adcy2b | slc3a2b | adcy2b |
| Subnetwork | Group A | Group P | Group A | Group P |
| --- | --- | --- | --- | --- |
| AP1 | trio | rac2 | si:ch211-234p18.3 | arhgap4a |
| | sh3gl2 | tpst2 | arhgef9a | tagapa |
| | trio | hmha1 | sh3gl2 | zgc:66125 |
| | trio | arhgap15 | tubb1 | actb1 |
| | pfkfb4l | prkacba | si:ch211-234p18.3 | arhgap15 |
| | pde4bb | prkacba | nfkbiab | tpst2 |
| | fancd2 | tpst2 | tubb1 | tuba8l2 |
| | sh3gl2 | ap1s3b | pde4cb | prkacba |
| | trio | nck1a | trio | pak2a |
| | arhgef9a | arhgap4a | rab43 | ccnd1 |
| | zgc:113263 | pcna | trio | tagapa |
| | tubb1 | tuba1b | pfkfb2a | prkacba |
| | si:ch211-234p18.3 | hmha1 | rps6kb1a | zgc:92237 |
| | arhgef9a | hmha1 | pfkfb2a | gpib |
| | si:ch211-234p18.3 | tagapa | si:ch211-234p18.3 | rac2 |
| | arhgef9a | arhgap15 | zgc:113263 | pold1 |
| | tubb1 | actb2 | arhgef18a | arhgap15 |
| | arhgef18a | hmha1 | arhgef9a | rac2 |
| | arhgef18a | rac2 | arhgef18a | arhgap4a |
| | pfkfb4l | gpib | fancd2 | ube2t |
| | arhgef18a | tagapa | kpna1 | casp3a |
| | trio | arhgap4a | | |
| AP2 | cnr1 | ccr9a | cxcl12b | si:dkey-217m5.3 |
| | gnb2 | gng12a | cnr1 | si:dkey-217m5.3 |
| | gabbr1b | si:dkey-217m5.3 | gabbr1b | gng12a |
| | oprm1 | si:dkey-217m5.3 | oprm1 | ccr9a |
| | gabbr1b | ccr9a | cxcl12b | ccr9a |
| Subnetwork | Group N | Group P | Group N | Group P |
| --- | --- | --- | --- | --- |
| NP1 | fes | rras | eef1a1b | zgc:92237 |
| | prkar2ab | prkacba | eef1a1b | rpl5a |
| | adcy2b | prkacba | eef1a1b | rps23 |
| | fes | dpysl2b | eef1a1b | rps17 |
| | cdk5r1b | dpysl2b | eef1a1b | rps28 |
| | ndufa4 | hadhaa | eef1a1b | rps20 |
| | ndufa4 | hadhb | eef1a1b | rps12 |
| | eef1a1b | eef1g | eef1a1b | rpl4 |
| | ywhag2 | cep72 | eef1a1b | rps2 |
| | ablim1a | nck1a | ywhag2 | prkacba |
| | fes | pak2a | ywhag2 | nek2 |
| | eef1a1b | tpst2 | ywhag2 | cenpj |
| | eef1a1b | rpl28 | ywhag2 | tuba8l2 |
| | eef1a1b | rpl15 | ywhag2 | tubb5 |
| | eef1a1b | rps15a | ywhag2 | mapre1a |
| | eef1a1b | rpsa | ywhag2 | plk1 |
| NP2 | adra2b | ccr9a | chrm2a | si:dkey-217m5.3 |
| | agt | ccr9a | adra2b | si:dkey-217m5.3 |
| | agt | si:dkey-217m5.3 | chrm2a | ccr9a |
| NP3 | polr2c | polr2eb | polr2c | slbp |
| | polr2c | snrpf | polr2c | snrpd2 |
| | polr2c | ptbp1a | | |
| NP4 | srgap2b | arhgdia | srgap2b | rac2 |
| | srgap2b | arhgap4a | srgap2b | net1 |
| | srgap2b | arhgdig | | |
FIGURE 10.A1 The wrapped needle and the sagittal brain sections after injury. The injury depth is controlled by wrapping the needle with a plastic tube so that only 1.5 mm of the needle tip is left for stabbing. The lesion depth is also examined in sagittal brain sections. The white dashed line surrounds the intact cerebellum, and the yellow dashed rectangles in the other pictures mark the lesion depth. The lesion caused by the modified needle only penetrates the cerebellum region and does not wound other brain parts [493].
FIGURE 10.A2 3D image of blood vessels and proliferating cells during regeneration [493]. (A) 3D image of blood vessels (green) and proliferating cells (red) before lesion. (B–I) Represent 1-, 3-, 5-, 7-, 11-, 14-, 21-, and 28-dpl cerebellum, respectively. (B) At 1 dpl, a wound is clearly visible in the cerebellum where the signals of blood vessels and proliferating cells disappear. (C–F) At 3 to 11 dpl, blood vessels regrow in the wound, and proliferating cells accumulate at the wound. (G) There is a gap in the wound where blood vessels and proliferating cells are fewer. (H–I) Blood vessels and proliferating cells fill in the wound, and the PCNA signals decrease compared with 11 or 14 dpl. Immunohistochemistry staining is performed following the standard procedure. Zebrafish brains are harvested and fixed with 4% paraformaldehyde (Merck) overnight at 4 °C. The brains are then washed four times for 15 min in PBST and permeabilized in ice-cold acetone for 90 min at 4 °C. They are then washed four times for 15 min with maleic acid buffer. Blocking buffer (2% BSA and 2% goat serum in maleic acid buffer) is used to block the brains for 4 h at room temperature. Primary antibodies, rabbit anti-eGFP (1:600; Novus Biologicals, Littleton, Colorado, United States) and mouse anti-PCNA (sc-56, 1:50; Santa Cruz Biotechnology, Santa Cruz, California, United States), are added to the blocking buffer, and the brains are incubated overnight at 4 °C. The samples are then washed four times for 30 min with maleic acid buffer. Secondary antibodies, goat anti-rabbit 488 (1:200; KPL, Gaithersburg, Maryland, United States) and goat anti-mouse 549 (1:200; KPL, Gaithersburg, Maryland, United States), are added to the blocking buffer and incubated with the brains for 2 h. After that, the brains are washed four times with maleic acid buffer for 30 min each time. Finally, the brains are stored in FocusClear (CelExplorer Labs, Hsinchu, Taiwan) at 4 °C. Fluorescence images are taken using an A1R confocal microscope (Nikon, Japan).
FIGURE 10.A3 The published time-course microarray data for validation [493]. The published data of the optic nerve regeneration is GSE19298 [460]. In this study, microarray analysis is performed on total RNA extracted from whole eye following optic nerve crush or sham surgery at defined intervals (4, 12, and 21 days). The x axis is our time-course microarray data (after normalizing). The y axis is the published time-course microarray data (after normalizing). The Pearson correlation coefficient is denoted as r.
FIGURE 10.A4 The published time-course microarray data for validation [493]. The published data of the spinal cord regeneration is GSE39295 [468]. In this study, the spinal cord is injured by crushing dorsoventrally for 1 s at the level of the 15th/16th vertebrae. The wound is then sealed by placing a suture. Both spinal cord injured and sham operated fish are allowed to regenerate, and the progress of regeneration is observed 1, 3, 7, 10, and 15 days after injury. Zebrafish are anesthetized deeply for 5 min in 0.1% tricaine (MS222; Sigma, United States), and approximately 1 mm of spinal cord, both rostrally and caudally from the injury epicenter, is dissected out from 50–60 fish in each batch and pooled for RNA extraction. The x axis is our time-course microarray data (after normalizing). The y axis is the published time-course microarray data (after normalizing). The Pearson correlation coefficient is denoted as r.
FIGURE 10.A5 Examples of the overrepresented pathways enriched by proteins from group A and involved in the acute inflammation and immune response during the wound-healing process. These pathways are primarily involved in the acute inflammation and immune response to TBI. (A) Endogenous cannabinoid signaling pathway, (B) PI3 kinase pathway, and (C) heterotrimeric G-protein signaling pathway-Gq alpha and Go alpha mediated pathway. Proteins in group A are shown in red [493].
FIGURE 10.A6 Examples of the overrepresented pathways enriched by proteins from group P and positively correlated with ZMI. These pathways either play an important role in cytoskeleton regulation, angiogenesis, and inflammation or reflect the process in which behavioral disability increases initially and then decreases. (A) Cell cycle pathway and (B) Parkinson disease pathway. Proteins in group P are shown in red [493].
FIGURE 10.A7 Examples of the overrepresented pathways enriched by proteins from group N and negatively correlated with ZMI. These pathways support the functional recovery of neural transmission. (A) Muscarinic acetylcholine receptor 2 and 4 signaling pathway and (B) GABA-B receptor II signaling pathway. Proteins in group N are shown in red [493].
CHAPTER 11
Key immune molecular biomarkers in the pathomechanisms of early cardioembolic stroke: Multidatabase mining and systems biology approach
11.1 Introduction
Proinflammatory cytokines in mice have been found to be negative regulators of progenitor proliferation [494], which is a crucial step in brain regeneration in the zebrafish model [495]. However, inflammation has been shown to be necessary and sufficient for enhancing the proliferation of neural progenitors and subsequent neurogenesis [488]. Because proinflammatory cytokines promote inflammation, these findings have stimulated debate about the role of inflammation in stroke recovery, and more advanced studies into the relationship between inflammation and stroke recovery are thus necessary. The close relationship between inflammation and the immune system implies that the role of the immune system in strokes is worth reexamining from a systems biology perspective. A recent genome-wide high-throughput experiment has examined patients with cardioembolic (CE) stroke at ≤3, 5, and 24 h after stroke onset and compared this group with a vascular risk factor control group of patients without symptomatic vascular diseases [496]. Some roles of inflammation and immune response in the cerebellar wound-healing mechanism after traumatic injury in zebrafish were discussed in the previous chapter. However, this chapter has discovered some significant differences in the expression of genes related to cell death, coagulation, and inflammatory pathways. Therefore the roles of inflammation and immune responses in CE stroke still need to be elucidated. The present study has therefore carried out a further exploration of the pathophysiology of ischemic stroke by investigating the roles of immune-related molecular mechanisms and their relationships with other molecular functions in early CE stroke.
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00020-1
Systemic inflammation is found to be linked to the occurrence of strokes and may involve not only peripheral cells, such as leukocytes, but also brain cells, such as glia, endothelial cells, and neurons [497]. Recent evidence has suggested that components of the immune system are intimately involved in all stages of the ischemic cascade, from the acute intravascular events triggered by the interruption of the blood supply to the parenchymal processes leading to brain damage and the ensuing tissue repair [498]. The interactions between innate immune cells in the brain have also been better understood in recent years, prompting the realization that each of these cell types can contribute to the development of inflammation in the brain [499]. Multiprotein complexes, known as inflammasomes, can process damage-associated molecular patterns to trigger an effector response [500]. The pathophysiological processes following strokes are very complex and extensive and include bioenergetic failure, the loss of cell ion homeostasis, acidosis, the increase of intracellular calcium levels, excitotoxicity, reactive oxygen species-mediated toxicity, the generation of arachidonic acid products, cytokine-mediated cytotoxicity, the activation of neuronal and glial cells, complement activation, the disruption of the blood-brain barrier, and the infiltration of leukocytes [500]. Intravenous recombinant tissue plasminogen activator (tPA), which is used to induce thrombolysis following a thrombotic occlusion, is currently the only pharmacological agent approved for acute stroke therapy [497]. Nevertheless, a major limitation of tPA therapy is its narrow therapeutic window of 3–4.5 h [498,500]. A better understanding of the processes involved is therefore urgently required to identify significant biomarkers as drug targets for developing enhanced therapies. One way is to construct the underlying protein-protein interaction (PPI) networks (PPINs) based on database mining and high-throughput datasets from the blood of normal and stroke subjects, which can be used to investigate the differences between pre- and poststroke states and between pre- and posttreatment states under the effects of the standard tPA treatment.
This chapter first focuses on those immune-related cellular functions (e.g., B- and T-cell activation, inflammation, and interleukin (IL) signaling pathway) that are significantly enriched in the constructed PPINs. The interactions of inflammation- and immune-related functions with other significantly enriched cellular functions (e.g., ubiquitin proteasome pathway and multiple growth factors pathways) in the constructed PPINs are then identified at different stages of CE stroke. Using network comparisons between the cellular functional networks, the interactions of immune-related functions with other cellular functions in the early stages of CE stroke are discussed, and their roles in the pathomechanisms of stroke progression are further explored. In addition, proteins with estimated basal level changes in the core PPINs at different stages are used to investigate the roles of microRNA (miRNA) and methylation regulations in stroke pathogenesis. Finally, several potential druggable targets are also proposed based on their significance in the core PPINs and on the literature. The results provide a more comprehensive understanding of stroke pathophysiology from the perspective of systems biology and shed light on the development of targeted therapy for strokes based on core PPIN markers.
11.2 Immune events in pathomechanisms of early cardioembolic stroke
11.2.1 Protein–protein interaction networks at different stages of cardioembolic stroke
This chapter aimed at exploring the pathomechanisms of ischemic stroke by investigating the roles of immune-related functions and their relationships with other cellular molecular functions after CE stroke. To this end, we first utilize the corresponding microarray data and
III. Systematic Inflammation and Immune Response in Restoration and Regeneration Process
TABLE 11.1 Number of proteins and interactions in four stage-specific constructed protein–protein interaction networks of cardioembolic stroke [9].

Stage          Number of proteins    Number of interactions
Control (C)    11,554                91,729
≤ 3 h (I)      9433                  52,295
5 h (II)       9432                  52,774
24 h (III)     9339                  51,989

protein interaction models to construct PPINs and then compare them to examine the changes in cellular functions and core networks during the early stage of CE stroke. In network construction, microarray data (GSE58294 [496]) are used to identify the interaction activities between proteins (see Section 11.3 for details). Four PPINs were constructed from the microarray data for the four corresponding stages of CE stroke [C: control; I: ≤ 3 hps (hours poststroke); II: 5 hps; III: 24 hps]. The basic information for the constructed PPINs is shown in Table 11.1. It should be noted that all genes and their expression profiles were used to obtain the resultant PPINs, in contrast to a previous procedure [82,501] in which only differentially expressed genes (DEGs) are selected. PPINs in the present study thus represent interactive networks of all proteins, while the previously published PPINs are networks of only those proteins with significantly differential levels of expression. Since some proteins do not typically show significant changes in intracellular levels but do play an essential role under altered conditions, we suggest that our more systematic approach to PPIN construction is more appropriate. In each constructed PPIN, about 9% of nodes have significantly differential expression (Bonferroni-corrected P-value ≤ .05) during early CE stroke, which implies that about 90% of nodes are neglected when only DEGs are considered. Using a bioinformatics classification system and principal network projection, each of the constructed PPINs can be presented at two levels of granularity: a network of enriched cellular functions and a network of core proteins. In total, 38 enriched cellular functions are included in the four constructed PPINs. These enriched cellular functions are categorized into four groups based on their biological significance: immune, neuro/hormone, growth/death, and general pathways.
Inflammation, IL, B- and T-cell activation, and Toll-like receptors (TLRs) belong to the immune group; neurodegenerative diseases, dopamine, corticotropin, endothelin, and acetylcholine-related pathways belong to the neuro/hormone group; multiple growth factor-related pathways and apoptosis belong to the growth/death group; and ubiquitin, G-protein-coupled receptor, transcription, and integrin-related pathways belong to the general pathways group. When the cellular function networks at different stages are compared, the presence of cellular functions and the changes in the interactions between them indicate the roles played by the enriched cellular functions in the pathomechanisms of early CE strokes. In addition to cellular function networks, principal network projection is used to extract the main features of the constructed PPINs. Inspired by image compression and facial recognition, singular value decomposition (SVD) is used to extract so-called eigen-interactions, which can be used to represent the majority of the interactions in the constructed
11. Key immune molecular biomarkers in the pathomechanisms of early cardioembolic stroke
PPINs. Core proteins, the interactions of which have high similarity to the principal eigen-interactions, are then used to form core PPINs at each stroke stage. A comparative analysis of stage-specific core PPINs allows the identification of key molecules (biomarkers) in the progression of CE strokes and their evaluation as potential drug targets for future systems drug design. The goal of explaining the cellular functions and molecular mechanisms in the early pathogenesis of CE strokes can thus be achieved using this approach.
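The idea of extracting eigen-interactions by SVD and ranking proteins by their similarity to them can be illustrated with a small sketch. It is a toy under stated assumptions, not the procedure used for the results in this chapter: the scoring rule (the norm of each protein's interaction row projected onto the leading right singular vectors), the 85% energy threshold, and the names `core_proteins`, `energy`, `top`, and `N_toy` are all hypothetical.

```python
import numpy as np

def core_proteins(N, energy=0.85, top=3):
    """Rank proteins by how strongly their interaction rows align with
    the leading singular directions of the interaction matrix N (P x P)."""
    U, s, Vt = np.linalg.svd(N)
    # Smallest k whose squared singular values reach the energy fraction
    # (the 0.85 default is an assumed threshold).
    frac = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(frac, energy)) + 1, len(s))
    # Project each protein's interaction row onto the top-k right
    # singular vectors and score it by the norm of the projection.
    proj = N @ Vt[:k].T
    score = np.linalg.norm(proj, axis=1)
    top_idx = np.argsort(-score)[:top]
    return k, top_idx, score

# Toy network: protein 0 is a hub linked to four others with activity 2.
N_toy = np.zeros((5, 5))
N_toy[0, 1:] = 2.0
N_toy[1:, 0] = 2.0
k, top_idx, score = core_proteins(N_toy, top=1)
```

In this toy network, the hub protein receives the highest score, which matches the intuition that core proteins are those whose interactions dominate the principal structure of the network.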
11.2.2 Changes in cellular functions and proteins immediately after cardioembolic strokes
By comparing the cellular function network of Stage I with that of Stage C (control), we obtained the differential cellular function network for Stage C to I (Fig. 11.1A). Blood coagulation and the endothelin signaling pathway are conspicuous for their roles in vascular regulation. Although the role of the endothelin signaling pathway in the pathogenesis of CE stroke is unclear, endothelin 1 is found to be involved in the regulation of basilar constriction, and the dysregulation of basilar artery function may worsen the stroke injury [502]. While blood coagulation is harmful to stroke patients [503], a coagulation cascade can activate inflammatory and immune responses. The IL signaling pathway (IL1, IL2, IL6, and
FIGURE 11.1 Differential cellular function and core PPI networks from Stage C to I. (A) Differential cellular function network. Node colors indicate the biological significance of the enriched cellular functions. Red: immune; blue: neuro/hormone; green: general pathway; yellow: growth/death. Colors of links indicate changes in interaction ability from Stage C to I. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference of interaction ability from Stage C to I; (B) differential core PPI network. Node colors indicate changes in basal level, representing changes in miRNA and methylation regulations from Stage C to I. Green: lowered; red: elevated. Link colors indicate changes in the interaction ability from Stage C to I. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference in interaction ability from Stage C to I [9]. miRNA, MicroRNA; PPI, protein–protein interaction.
IL10 are found in the constructed PPINs) is associated with the TLR signaling pathway and T-cell activation via general transcription regulation and the ubiquitin proteasome pathway, respectively. This could explain the role of ILs in the activation of the inflammation- and immune-related responses after the onset of the stroke. The TLR signaling pathway is found to have direct links to Huntington’s disease; so does T-cell activation, via the p38 mitogen-activated protein kinase pathway. This suggests that the IL signaling pathway could play a role in neuroprotective processes poststroke [504]. The TLR signaling pathway is found to interact with the angiotensin II-stimulated signaling and the fibroblast growth factor (FGF) signaling pathway. Angiotensin II is a significant causative factor in the cerebrovascular effects of hypertension [505], and its signaling has a downregulated interaction with the TLR signaling pathway. The FGF signaling pathway can signify a good prognosis [506] and may lead to angiogenesis and neuroprotection after strokes [507]. Both the TLR signaling pathway and T-cell activation can interact with the FAS signaling pathway, which has negative effects on neuroprotection and can cause cell death. The tight regulation of the FAS signaling pathway by the inflammation- and immune-related pathways can be seen in this differential cellular function network. The transforming growth factor (TGF)-β signaling pathway has a downregulated interaction with B-cell activation, which can indicate that the ability of TGF-β signaling to limit inflammation is reduced after strokes. In summary, after CE stroke the inflammation- and immune-related pathways are intertwined with the neurodegeneration and cell death pathways and exert a combination of adverse and beneficial actions.
In addition to the differential cellular function network for C to I, the differential core PPIN can further reveal the cellular molecular mechanisms that operate immediately after strokes. In contrast to the cellular function networks, the node color in the diagrammatic representation of differential core PPINs (Fig. 11.1B) indicates changes in the basal level ($\beta_i$) of proteins (green and red indicate the lowered and elevated basal levels, respectively). The proteins F2, GP5, SERPINC1, and THBD, which are involved in blood coagulation, bridge the complement system to other proteins, including SPP1 and YWHAZ. SPP1 is a cytokine that activates interferon γ (IFNγ) and IL12. SPP1 also links to a group of proteins related to the antigen-presenting process and T-cell activation, that is, the HLA protein family. YWHAZ and YWHAE, two general signal transduction proteins belonging to the 14-3-3 protein family, are also involved in the FGF signaling pathway and Parkinson’s disease. In particular, YWHAE can also bridge the antigen-presenting process and the control of protein synthesis and turnover. In the group of proteins controlling protein synthesis, function, and turnover, RPS4Y1, a cytoplasmic ribosomal protein, is the product of a Y-linked gene. It and its interchangeable counterpart, RPS4X, are found to be overexpressed in new-onset heart failure [508], which can help explain the presence of RPS4Y1 at the onset of CE strokes, especially its higher basal level poststroke. Not surprisingly, several proteins involved in the regulation of transcription and translation are present (EIF3E, EIF3A, and GTF2B), as several ribosomal proteins are related to protein synthesis. The ubiquitin proteasome pathway not only controls protein synthesis and turnover but also participates in neurodegenerative diseases [509].
UBC (Ubiquitin C), the central protein in the ubiquitin proteasome pathway, can interact with several proteins related to inflammation (ACTA2), the heme synthesis pathway (FECH), the TGF-β signaling pathway (DUSP14), and the PI3K-Akt signaling pathway (RHEB). The ubiquitin proteasome pathway is thus involved in the protein synthesis and the
turnover of several cellular functions that are crucial to stroke status immediately after stroke onset. In summary, the differential core PPIN has revealed large changes immediately after stroke onset in the interactions between basal levels of inflammation- and immune-related cellular functions, as well as in UBC- and RPS4Y1-related protein synthesis and turnover.
11.2.3 Changes in cellular functions and proteins after tissue plasminogen activator treatment
By comparing the cellular function network of Stage II with that of Stage I, we obtain a differential cellular function network for I to II (Fig. 11.2A). The major difference between the two stages is the application of tPA treatment: Stage I is untreated and Stage II is treated. The immediate effects of tPA treatment on the cellular functions and proteins can be observed in the differential cellular function and core PPINs for I to II. In the differential cellular function network for I to II, almost all interactions between cellular functions display reversed changes. These include the interactions between blood coagulation and the endothelin signaling pathway, between the IL signaling and ubiquitin proteasome pathways, between T-cell activation and the ubiquitin proteasome pathway, between Huntington’s disease and the TLR pathway, and between the TLR and FAS signaling pathways. The tPA treatment could not only alter the direction of interaction changes but also strongly enhance the effects of inflammation- and immune-related cellular functions, that is, more cellular functions are connected to these cellular functions and more
FIGURE 11.2 Differential cellular function and core PPI networks from Stage I to II. (A) Differential cellular function network. Node colors indicate the biological significance of the enriched cellular functions. Red: immune; blue: neuro/hormone; green: general pathway; yellow: growth/death. Link colors indicate changes in interaction ability from Stage I to II. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference in interaction ability from Stage I to II; (B) differential core PPI network. Node colors indicate changes in basal level, representing changes in miRNA and methylation regulations from Stage I to II. Green: lowered; red: elevated. Link colors indicate changes in the interaction ability from Stage I to II. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference in interaction ability from Stage I to II [9]. miRNA, MicroRNA; PPI, protein–protein interaction.
interactions are added between the enriched cellular functions. Further, blood coagulation following tPA treatment has connections to the dopamine receptor-mediated signaling pathway and Parkinson’s disease, both of which are related to neurodegenerative diseases. The treatment can cause hemorrhagic side effects [510], which can be explained by the upregulated interaction between the TLR and angiotensin signaling pathways, causing blood vessel instability. The integrin signaling pathway, which plays a crucial role in vascular stability, can interact with Parkinson’s disease, blood coagulation, and the ubiquitin proteasome pathway. The interactions of the ubiquitin proteasome pathway with the TLR signaling pathway, T-cell activation, and the IL signaling pathway are also downregulated. This may be harmful to the regeneration-promoting ability of inflammation- and immune-related cellular functions. In summary, the tPA treatment is shown to reverse most of the trends in interaction activity changes poststroke but may also cause a worsened prognosis.
The differential core PPIN for I to II is shown in Fig. 11.2B, with the same node and edge styles as in Fig. 11.1B. The main difference between these two core PPINs is the separation of HLA-DRB4 from UBC- and RPS4Y1-related cellular functions, caused by the absence of YWHAE and FECH. YWHAE belongs to the FGF signaling pathway and Parkinson’s disease, and FECH is related to iron regulation. While UBC- and RPS4Y1-related cellular functions continue interacting with a similar set of proteins as in the C to I network, most of the interacting proteins show changes in basal level ($\beta_i$), for example, RHEB, ACTA1, TAGLN, and RPS4Y1. Basal level changes of these four proteins cause a dramatic change in protein synthesis and turnover following tPA treatment.
In summary, the tPA treatment is found to affect the integrity of the core PPIN and to reverse basal level changes in comparison to the differential core PPIN for C to I.
11.2.4 Changes in cellular functions and proteins in early tissue plasminogen activator treatment
By comparing the cellular function network of Stage III with that of Stage II, we can obtain a differential cellular function network for II to III (Fig. 11.3A). Since the major difference between the cellular function networks for these stages is the time after tPA treatment, the differential cellular function network can be used to assess the effects of early tPA treatment. A reverse trend in interaction changes between enriched cellular functions in comparison to the differential core PPINs for I to II can demonstrate decay in the treatment effect over time. The emergence of the Wnt signaling pathway and the platelet-derived growth factor (PDGF) signaling pathway is noteworthy because of their important roles in neuroprotection, regeneration, and vascular growth. In this differential cellular function network, interactions of the Wnt signaling pathway with the TLR and endothelin signaling pathways are upregulated, while interactions with the dopamine receptor-mediated signaling pathway and the transcription regulation by bZIP transcription factor are downregulated. These findings could support the interpretation that the Wnt signaling pathway plays a crucial role in the immune-related cellular functions and neurodegenerative diseases [511]. The PDGF signaling pathway has downregulated interactions with the transcription regulation by bZIP transcription factor, blood coagulation, and the muscarinic acetylcholine receptor
FIGURE 11.3 Differential cellular function and core PPI networks for Stage II to III. (A) Differential cellular function network. Node colors indicate the biological significance of the enriched cellular functions. Red: immune; blue: neuro/hormone; green: general pathway; yellow: growth/death. Link colors indicate changes in interaction ability from Stage II to III. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference in interaction ability from Stage II to III; (B) Differential core PPI network. Node colors indicate changes in basal level, representing changes in miRNA and methylation regulations from Stage II to III. Green: lowered; red: elevated. Link colors indicate changes in the interaction ability from Stage II to III. Blue: downregulated; orange: upregulated. Link width indicates the absolute value of the difference in interaction ability from Stage II to III [9]. miRNA, MicroRNA; PPI, protein–protein interaction.
(mAChR) signaling pathway, and upregulated interactions with the corticotropin-releasing factor receptor (CRFR) signaling pathway. This suggests that the PDGF signaling pathway may be a crucial cellular function mechanism for the effectiveness of tPA treatment [512] and that the mAChR and CRFR signaling pathways may employ different cellular function mechanisms to achieve their neuroprotective roles. In summary, the tPA treatment has been shown to connect several cellular functions to achieve its therapeutic effect over time, and to establish tight connections between these cellular functions.
As mentioned previously, the differential core PPIN for II to III (Fig. 11.3B) reveals the molecular mechanisms of early tPA treatment, illustrating how the tPA takes effect after the treatment for 20 h. The differential core PPIN retains the same components (HLA-DRB4, C4BPA, UBC, and RPS4Y1) as the previous differential core PPINs, but these proteins are now disconnected. Several key molecules can be distinguished in this differential core PPIN. The antigen-presenting related HLA class II proteins HLA-DRB4 and HLA-DQA1 have been implicated in heart disease and ischemic stroke [513,514], but the specifics of how tPA treatment affects HLA class II proteins are still unknown. ORM1, an acute-phase plasma protein, is present at an increased level due to acute inflammation; its basal level and downregulated interaction with C4BPA, a multimeric protein that controls the activation of the complement cascade, may be a result of the decayed effectiveness of tPA [515]. FOXA1 and NKX3-1 are two transcription factors active in prostate tumor progression through cooperation with the androgen receptor, which is neuroprotective in strokes [516]. Instead of interacting directly, UBC and RPS4Y1 are connected indirectly through
EIF2S3, SNW1, HSP90AA1, and SLC7A9. SNW1 is involved in the Notch signaling pathway, which can cause an unusual susceptibility to strokes [517] and may promote cell proliferation and differentiation after strokes [518]. SLC7A9 can mediate the transport of cysteine and control the level of homocysteine in the blood, which is an indicator for vascular diseases and stroke [519]. In summary, the differential core PPIN for II to III is found to be more fragmented than the previous ones, which points to the more extensive range of functions affected by strokes and tPA treatment.
11.2.5 Pathomechanisms of early stroke and potential drug targets
The comparative analysis of the cellular function and core PPINs provides insights into the pathomechanisms of CE stroke and how the standard tPA treatment affects the stroke progression. Fig. 11.4 summarizes the findings on the pathomechanism of early CE stroke in this chapter. After the onset of CE stroke, changes in blood vessels activate the endothelin and blood coagulation functions. Via the interactions with the coagulation cascade, inflammation- and immune-related functions are then activated to rectify the abnormal status caused by an obstruction in a blood vessel. These cellular functions are activated as compensators attempting to rectify the abnormality by interacting with protein synthesis and turnover, neurodegeneration, cell death, and proliferation. Nevertheless, these cellular interactions are not always beneficial for the poststroke recovery. In particular, some neurodegenerative diseases are connected to inflammation- and immune-related functions [493]. Moreover, endothelin, blood coagulation, and inflammation- and immune-related functions are dependent on the feedback from protein synthesis and turnover, cell death, and proliferation. Under the tPA treatment, blood coagulation, B- and T-cell activation, and protein synthesis are affected (indicated by &- in Fig. 11.4), and the pathomechanism of CE stroke is thus also subject to this interference.
FIGURE 11.4 Diagram of the pathomechanism of early CE stroke and the enriched cellular functions regulated by the tPA treatment and miRNAs, showing cellular functions and proteins affected after the CE stroke onset. Symbol arrows indicate where tPA treatment (&-) and miRNA regulation (x-) can interfere with the progression of stroke and potential targeted cellular functions and proteins [9]. CE, Cardioembolic; miRNA, microRNA; tPA, tissue plasminogen activator.
Although this interference can briefly
relieve the symptoms caused by the vessel obstruction, several neurodegenerative diseases can emerge in the cellular function networks after the tPA treatment. Beyond the pathomechanisms that determine how CE strokes and the tPA treatment affect the physiology, a comparison of protein basal levels between subsequent stages also provides an insight into epigenetic miRNA and methylation regulation in the stroke pathophysiology. Recent studies have shown that miRNA expression can be altered in response to ischemic stroke in animal models [520]. The miRNA–target pairs [521] consistent with a dysregulation of miRNA following ischemic stroke [522] and resulting from the comparative core PPIN analysis are summarized in Table 11.2. These potential miRNA/epigenetic regulations of the enriched cellular functions are also indicated in Fig. 11.4 (by x-). In addition to the miRNA regulation of protein
TABLE 11.2 Potential microRNA and methylation regulations in early cardioembolic stroke pathophysiology [9].

Target protein   Regulation type              Cellular function                Literature validation

Lowered level
BCAT1            miR-21, 25, 140, 146a        Cell growth                      [521,522] and references therein
AKAP12           miR-29b-1, 181a, 183, 335    Cell growth                      [521,522] and references therein
DUSP14           miR-16, 26b                  Signaling pathway                [521,522] and references therein
FECH             miR-16, 25, 124              Heme synthesis                   [521,522] and references therein
H1F0             miR-181a, 494                Histones                         [521,522] and references therein
TAGLN            miR-26b, 149                 Undetermined                     [521,522] and references therein
UBE2O            miR-328, 335                 Protein synthesis and turnover   [521,522] and references therein
RPS4Y1ᵃ          miR-19b                      Protein synthesis and turnover   [521,522] and references therein
SPP1ᵃ            miR-146a, 335                Coagulation                      [521,522] and references therein
C4BPA            Hypermethylation             Complement system                [525]
CD3G             Hypermethylation             Complement system                [526]
DEPDC7           Hypermethylation             Protein synthesis and turnover   [528]
FECH             Hypermethylation             Protein synthesis and turnover   [529]
HLA-DQA1         Hypermethylation             Leukocyte activation             [530]
NKX3-1           Hypermethylation             Protein synthesis and turnover   [532]

Elevated level
RHEB             miR-18a, 155                 Cell growth                      [521,522] and references therein
RPL27            miR-186                      Protein synthesis and turnover   [521,522] and references therein
ACTA2            Hypomethylation              Inflammation                     [524]
CENPK            Hypomethylation              Cell growth                      [527]
HLA-DRB4         Hypomethylation              Leukocyte activation             [531]

ᵃ Indicates the selected potential drug targets.

basal levels, methylation regulation is also a potential epigenetic mechanism that can change protein basal levels after the stroke onset. Although studies of methylation in strokes have indicated a range of changes in the protein basal level [523], the targets of methylation regulation have not yet been the subject of a dedicated study. Proteins that have large basal level changes and are not miRNA targets (ACTA2 [524], C4BPA [525], CD3G [526], CENPK [527], DEPDC7 [528], FECH [529], HLA-DQA1 [530], HLA-DRB4 [531], and NKX3-1 [532]) can be considered as potential targets of methylation regulation. Based on the specific targets of miRNA regulation and the position of the target proteins in the core PPINs, several potential drug targets can be selected (marked in Table 11.2). SPP1 in blood coagulation could bridge the complement system and antigen presentation, and this connection can activate IFNγ and IL12, making it a potential treatment candidate. Another potential target is RPS4Y1, a male-specific protein that may play a role in the higher susceptibility of males to stroke. Finally, the possibility that the targets of miRNA and methylation regulation are identical cannot be ruled out, and other factors, such as differential gene regulation through transcription factors, can also change protein basal levels. The molecular mechanisms of miRNA and methylation regulation after the CE stroke onset still need further investigation in future studies.
11.3 Material and methods of PPI network construction and principal network projection
The analysis flowchart (microarray data preprocessing, PPIN construction, principal network projection, and comparative network analysis) is summarized in Fig. 11.5.
11.3.1 Microarray data for early cardioembolic stroke
The microarray dataset for early CE stroke [Gene Expression Omnibus (GEO) Accession No. GSE58294 [496]] contains gene expression data from the blood of both subjects with
FIGURE 11.5 Flowchart of the early CE stroke model analysis process, consisting of data preprocessing, PPI network construction, principal network projection, and comparative analysis of cellular function and core PPI networks [9]. CE, Cardioembolic; PPI, protein–protein interaction.
CE stroke and a vascular risk factor control group without symptomatic vascular diseases. We analyzed 23 control samples (C) and 23 CE stroke samples for each of three time points [i.e., ≤ 3 (I), 5 (II), and 24 hps (III)]. GC robust multiarray average (GCRMA) empirical-Bayes background adjustment, quantile normalization, and median-polish summarization are performed on the raw data (CEL files) using MATLAB.
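Of the three preprocessing steps, quantile normalization is the easiest to illustrate in a few lines. The following is a minimal Python sketch of that single step only (the chapter's processing was done in MATLAB, and background adjustment and median-polish summarization are not shown). The function name `quantile_normalize` and the toy matrix `X` are hypothetical, and ties within a sample are not handled specially.

```python
import numpy as np

def quantile_normalize(X):
    """Quantile-normalize the columns (samples) of a probes x samples matrix."""
    order = np.argsort(X, axis=0)        # per-sample ordering of probes
    ranks = np.argsort(order, axis=0)    # rank of each probe within its sample
    # Reference distribution: mean of the sorted values across samples.
    mean_quantiles = np.sort(X, axis=0).mean(axis=1)
    # Replace every value by the reference value of the same rank.
    return mean_quantiles[ranks]

# Toy data: 4 probes x 3 samples (made-up values, no ties within a sample).
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.5, 6.0],
              [4.0, 2.0, 8.0]])
Xn = quantile_normalize(X)
```

After normalization every sample has the same empirical distribution, while the ranking of probes within each sample is preserved.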
11.3.2 Protein–protein interaction network construction
The microarray data processing yields 23,520 gene expression levels at four stages (C, I, II, III). Due to the computational complexity of considering all interactions among all proteins, candidate protein interactions mined from PPI databases are used as the candidate PPIN for the subsequent PPIN construction. Since the candidate PPIN considered for the CE stroke condition contains many false-positive PPIs, further pruning using real microarray data is necessary to complete PPIN construction. The details are described in the following sections.
11.3.3 Candidate protein–protein interaction network construction via multidatabase mining
To reduce computational complexity, candidate PPIs have to be provided prior to identifying interaction activities in the protein interaction model. The candidate PPIN is collected from 10 frequently used PPI databases (BIND [533], BioGRID [73], DIP [534], HPRD [535], I2D [536], IntAct [537], MINT [538], PIP [539], Reactome [540], and STRING [541]), which all consist of PPIs based on computational predictions and biological experiments. These candidate PPIs are then pruned using the corresponding microarray data at different stages of CE stroke to construct realistic stage-specific PPINs. The candidate PPIN contains 15,017 proteins and 319,362 PPIs.
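Merging several database exports into one candidate PPIN amounts to taking the union of their interaction lists. The sketch below assumes that each source has already been reduced to pairs of protein symbols and that interactions are treated as undirected. The function name `merge_candidate_ppis` and the two toy edge lists are hypothetical and do not reproduce the content of any of the databases listed above.

```python
def merge_candidate_ppis(*edge_lists):
    """Union of undirected PPIs from several sources, without self-interactions."""
    merged = set()
    for edges in edge_lists:
        for a, b in edges:
            if a != b:                       # drop self-interactions
                merged.add(frozenset((a, b)))  # (a, b) and (b, a) coincide
    return merged

# Toy edge lists standing in for two database exports (illustrative only).
db_one = [("UBC", "RHEB"), ("SPP1", "F2")]
db_two = [("RHEB", "UBC"), ("UBC", "FECH"), ("UBC", "UBC")]

candidate = merge_candidate_ppis(db_one, db_two)
proteins = set().union(*candidate)
```

Storing each interaction as a `frozenset` removes duplicates that differ only in the order of the two proteins, so the size of `candidate` is the number of distinct candidate PPIs.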
11.3.4 Protein interaction model
A protein interaction model is then introduced to describe the PPIs at a specific stage (labeled C, I, II, or III). Assuming there are P proteins in the candidate PPIN (P = 15,017 in this chapter), the interactions of a target protein i with other proteins in the mth sample can be formulated as follows:

$$y_i(m) = \sum_{k=1}^{P} \alpha_{ik}\, y_k(m) + \beta_i + \varepsilon_i(m) \tag{11.1}$$

where $y_i(m)$ is the level of target protein i in the mth sample; $\alpha_{ik}$ is the interaction activity of target protein i with interacting protein k; $y_k(m)$ is the level of protein k in the mth sample ($\alpha_{ik} = 0$ if there is no interaction between protein k and target protein i, or if i = k, that is, a protein has no self-interaction); $\beta_i$ is the basal level of target protein i ($\beta_i \ge 0$); and $\varepsilon_i(m)$ is the measurement noise or the stochastic noise from the environment and/or model uncertainty. Eq. (11.1) indicates that the level of target protein i is associated with its interacting proteins, basal level, and stochastic noise. By augmenting the levels of protein i in M samples (M = 23 in this chapter), that is, by letting $y_i = [y_i(1) \cdots y_i(M)]^T$, $\forall i = 1, \ldots, P$, Eq. (11.1) can be written in regression vector form:

$$y_i = \Phi_i \theta_i + E_i \tag{11.2}$$

where $\Phi_i = [y_1 \cdots y_P \;\; 1]$, $\theta_i = [\alpha_{i1} \cdots \alpha_{iP} \;\; \beta_i]^T$, and $E_i = [\varepsilon_i(1) \cdots \varepsilon_i(M)]^T$. The next step is to estimate the unknown $\theta_i$ in Eq. (11.2) based on the microarray data. This can be achieved by solving a least squares optimization with linear constraints, as follows:

$$\min_{\theta_i} \lVert \Phi_i \theta_i - y_i \rVert_2^2 \quad \text{such that } \beta_i \ge 0 \tag{11.3}$$

The active-set algorithm [259] is used for parameter estimation.
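For a single target protein, the constrained problem has only one inequality constraint (the basal level must be nonnegative), so the active-set idea reduces to two cases: either the unconstrained least squares solution already satisfies the constraint, or the basal level is fixed at zero and the interaction activities are refitted. The sketch below illustrates this special case with NumPy. It is not the solver of [259], and the function name `fit_interaction_model` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_interaction_model(Y_partners, y_target):
    """Least squares fit of the interaction model with basal level >= 0.

    Y_partners: (M, K) levels of K candidate partner proteins in M samples.
    y_target:   (M,)   levels of the target protein.
    Returns (alpha, beta).
    """
    M = Y_partners.shape[0]
    # The appended column of ones carries the basal level beta.
    Phi = np.hstack([Y_partners, np.ones((M, 1))])
    theta, *_ = np.linalg.lstsq(Phi, y_target, rcond=None)
    if theta[-1] >= 0:
        return theta[:-1], float(theta[-1])
    # Constraint active: beta sits on the boundary 0, refit alpha alone.
    alpha, *_ = np.linalg.lstsq(Y_partners, y_target, rcond=None)
    return alpha, 0.0

# Synthetic check with M = 23 samples and two partners (made-up numbers).
rng = np.random.default_rng(0)
Y = rng.normal(size=(23, 2))
true_alpha = np.array([0.8, -0.5])
alpha1, beta1 = fit_interaction_model(Y, Y @ true_alpha + 1.5)
alpha2, beta2 = fit_interaction_model(Y, Y @ true_alpha - 1.0)
```

In the first call the basal level is recovered as a positive value; in the second call the unconstrained estimate would be negative, so the fit returns a basal level of zero.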
11.3.5 Model order detection and identification
Since the PPIs in the candidate PPIN are based on a wide variety of biological experimental conditions and computational predictions in databases, there exist a large number of false-positive PPIs. These false-positive PPIs have to be screened further using microarray data for CE strokes to obtain realistic PPINs for specific biological stroke stages. The Akaike information criterion (AIC) is used to select the true interaction model order (i.e., the real number of proteins interacting with protein i) [81]. For a protein interaction model for target protein i with order L, where $L \in \{0, \ldots, P\}$, that is, L proteins interact with target protein i, the AIC value is calculated as follows:

$$\mathrm{AIC}_i(L) = \log\left(\frac{1}{M}\lVert \Phi_i \hat{\theta}_i - y_i \rVert_2^2\right) + \frac{2L}{M} \tag{11.4}$$

where $\hat{\theta}_i$ is the solution of parameter estimation in Eq. (11.3). According to the theory of system identification [81], the true system order should minimize the AIC value in Eq. (11.4).
By forward selection and backward elimination, the model order $L$ with the lowest AIC value for the protein interaction model of target protein $i$ can be obtained. After completing model order detection and identification, the estimated parameters are further tested for their significance using Student's t-test with the null hypothesis $\alpha_{ij} = 0$ and a P-value threshold of .05. The following is the pseudo-code we used for the model order test based on the minimum AIC value. Details of the network construction can be found in the network construction section of the supplementary files.

Require: candidate network, gene expression profiles at a specific stage
for all protein i in the candidate network do
    y_i ← expression profiles of protein i
    Φ_i ← expression profiles of all proteins interacting with protein i in the candidate network
    function AICSTEPWISE(Φ_i, y_i)
        Start with forward selection and, after each candidate (other than the first one) is added to the model, perform backward elimination to see if any of the selected candidates can be eliminated without increasing the AIC value.
        return θ_i
    end function
    function TTEST(Φ_i, y_i, θ_i)
        Calculate the P-value for each interaction activity α_ij in θ_i and delete it if the P-value ≥ .05.
        return θ_i
    end function
end for

The AIC stepwise procedure for determining the order of the A1BG protein interaction model is given as an example in Table 11.A1 in the Appendix. Finally, by assembling the estimated parameters $\alpha_{ij}$, $i, j = 1, \ldots, P$, into a matrix, the resulting PPINs at the four stages can be represented as $N_C$, $N_I$, $N_{II}$, and $N_{III}$.
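The stepwise procedure above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used for the chapter: it uses ordinary least squares (the constraint on the basal level in Eq. (11.3) is ignored), it accepts a forward or backward move only when the AIC strictly decreases, and it omits the subsequent t-test. The function names `aic` and `aic_stepwise` are hypothetical.

```python
import numpy as np

def aic(Phi, y, selected):
    """AIC of Eq. (11.4) for a model using the columns in `selected` plus a basal term."""
    M = len(y)
    X = np.column_stack([Phi[:, selected], np.ones(M)])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((X @ theta - y) ** 2)
    return np.log(rss / M) + 2 * len(selected) / M

def aic_stepwise(Phi, y):
    """Forward selection with backward elimination after each addition."""
    selected, remaining = [], list(range(Phi.shape[1]))
    best = aic(Phi, y, selected)  # initial model: basal level only
    while remaining:
        # Forward step: try each remaining candidate, keep the best one.
        scores = {k: aic(Phi, y, selected + [k]) for k in remaining}
        k_add = min(scores, key=scores.get)
        if scores[k_add] >= best:
            break  # no candidate decreases the AIC
        selected.append(k_add)
        remaining.remove(k_add)
        best = scores[k_add]
        # Backward step: skipped while only one candidate is in the model.
        while len(selected) > 1:
            drops = {k: aic(Phi, y, [s for s in selected if s != k]) for k in selected}
            k_drop = min(drops, key=drops.get)
            if drops[k_drop] >= best:
                break  # no elimination decreases the AIC
            selected.remove(k_drop)
            remaining.append(k_drop)
            best = drops[k_drop]
    return sorted(selected), best
```

Because every accepted move strictly lowers the AIC, the loop cannot cycle and terminates after a finite number of steps.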
11.3.6 Core protein–protein interaction network projection
11.3.6.1 Cellular function networks
To better capture the essential information in the constructed PPINs, two different levels of analysis are used to explore the cellular function and molecular mechanism relationships at different stages of early stroke. First, the proteins in the constructed PPINs can be divided into several groups according to the PANTHER function classification system [542], based on the cellular functions to which they belong. The cellular function networks at each stage consist of these enriched cellular functions and the interactions between them. The interaction activity between two enriched cellular functions is obtained by summing the interaction activities between member proteins of the two cellular functions. The up- and downregulated interactions between enriched cellular functions can be observed by taking the difference between the cellular function networks of two stages.
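The aggregation from protein level to cellular function level can be illustrated with a short sketch. It assumes that the PPIN of one stage is available as a square matrix of interaction activities and that the function membership is given as a mapping from function name to protein indices; the name `function_network` and the example function labels are hypothetical.

```python
import numpy as np

def function_network(A, groups):
    """Sum protein-level interaction activities into a function-level network.

    A:      (P, P) matrix of interaction activities for one stage.
    groups: dict mapping a cellular function name to a list of protein indices.
    Returns the sorted function names and the function-by-function matrix.
    """
    names = sorted(groups)
    F = np.zeros((len(names), len(names)))
    for a, fa in enumerate(names):
        for b, fb in enumerate(names):
            # Sum over all member pairs (protein in fa, protein in fb).
            F[a, b] = A[np.ix_(groups[fa], groups[fb])].sum()
    return names, F

# Toy example with three proteins and two hypothetical functions.
A_stage = np.array([[0.0, 1.0, 2.0],
                    [1.0, 0.0, 3.0],
                    [2.0, 3.0, 0.0]])
names, F_stage = function_network(A_stage, {"apoptosis": [0, 1], "immune": [2]})
```

The difference between two stages is then simply the difference of the two function-level matrices, provided both are built with the same groups.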
11.3.6.2 Core protein–protein interaction networks
Second, due to the large size of the constructed PPINs, their crucial components can be more effectively illustrated by a core PPIN. A core PPIN is defined as a PPIN of core proteins, the interactions of which are similar to the principal eigen-interactions of the constructed PPINs. SVD is applied to determine the eigen-interactions $v_i$ by extracting the main features of the constructed PPINs. Given that $N$ is the matrix representation of the PPIN at different stages of CE stroke (i.e., $N = N_C$, $N_I$, $N_{II}$, or $N_{III}$ in this chapter), the SVD of $N$ is:

$$N = U \Sigma V^T \tag{11.5}$$

where $U$ and $V$ are unitary matrices and $\Sigma$ is a diagonal matrix with the singular values $\sigma_i$ of $N$ on its diagonal. The eigen-interactions $v_i$ are the columns of $V$, that is, $V = [v_1 \cdots v_P]$, with the corresponding singular values $\sigma_i$ such that $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_P$. The percentage of the PPIN explained by the $i$th eigen-interaction can be calculated as follows:

$$r_i = \frac{\sigma_i^2}{\sum_{j=1}^{P} \sigma_j^2} \times 100\% \tag{11.6}$$

We can then choose $M$ principal eigen-interactions to meet a heuristic condition that will be application dependent. In this chapter, we choose the smallest $M$ such that $\sum_{i=1}^{M} r_i \ge 85\%$, a condition commonly used in principal component analysis. A core PPIN can then be constructed by selecting proteins based on the similarity of their interactions to the principal eigen-interactions $v_1, \ldots, v_M$. The inner product between the protein interactions ($[\alpha_{i1} \cdots \alpha_{iP}]$) and the eigen-interactions ($v_i$) is used to evaluate the similarity of the interactions. Proteins with a similarity above some threshold ($> 6$ in this chapter, based on the number of nodes in the resulting core networks) are called core proteins, and the PPIN formed by the core proteins and the interactions between them is called the core PPIN.
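The projection in Eqs. (11.5) and (11.6) can be sketched with NumPy as follows. The text does not state how the inner products with the $M$ principal eigen-interactions are combined into one similarity value; the sketch assumes the Euclidean norm of the $M$ projections, which is only one possible choice. The function name `core_proteins` and its parameters are hypothetical.

```python
import numpy as np

def core_proteins(N, explained=85.0, threshold=6.0):
    """Select core proteins of a PPIN matrix N by principal network projection.

    Assumption: the similarity of protein i is the Euclidean norm of the
    inner products between row i of N and the M principal eigen-interactions.
    """
    U, s, Vt = np.linalg.svd(N)               # Eq. (11.5); rows of Vt are v_i
    r = 100.0 * s**2 / np.sum(s**2)           # Eq. (11.6), in percent
    # Smallest M whose cumulative percentage reaches `explained`.
    M = int(np.searchsorted(np.cumsum(r), explained)) + 1
    M = min(M, len(s))
    projections = N @ Vt[:M].T                # inner products with v_1 ... v_M
    similarity = np.linalg.norm(projections, axis=1)
    return np.flatnonzero(similarity > threshold), r, M

# Toy example: one dominant interaction pattern.
core, r, M = core_proteins(np.diag([10.0, 1.0, 0.1]))
```

In this toy case only the first protein exceeds the threshold, because its row is aligned with the single principal eigen-interaction.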
11.4 Conclusion
In this chapter, PPINs for four stages of stroke pathogenesis are constructed in a systems biology framework, based on multidatabase mining, microarray data, and protein interaction models. Cellular function classification and principal network projection are employed to derive cellular function networks and core PPINs. Comparative PPIN analysis is then used to investigate the underlying molecular mechanisms of stroke pathogenesis at the cellular function and protein levels. The configuration of enriched cellular functions after stroke onset can suggest a reasonable pathogenesis mechanism (Fig. 11.4). Core PPIN markers of miRNA and methylation regulation are also proposed as potential therapeutic drug targets. Our results provide a direction for future study in stroke pathogenesis and treatment.
11.5 Appendix
TABLE 11.A1 The Akaike information criterion (AIC) stepwise procedure for determining the order of the A1BG protein interaction model [9].
Step | Model | AIC value
Initial | $y_i(m) = \beta_i + \varepsilon_i(m)$ | 0.4556
First forward selection | $y_i(m) = \alpha_{ik} y_k(m) + \beta_i + \varepsilon_i(m)$, $k = 1, \ldots, 9$. When $k = 6$, the AIC value is minimized. | −3.2079
Second forward selection | $y_i(m) = \alpha_{ik} y_k(m) + \alpha_{i6} y_6(m) + \beta_i + \varepsilon_i(m)$, $k = 1, \ldots, 5, 7, 8, 9$. When $k = 3$, the AIC value is minimized. | −3.2230
First backward elimination | $y_i(m) = \alpha_{ik} y_k(m) + \beta_i + \varepsilon_i(m)$, $k = 3, 6$. No $k$ needs to be eliminated to decrease the AIC value. |
Third forward selection | $y_i(m) = \alpha_{ik} y_k(m) + \alpha_{i3} y_3(m) + \alpha_{i6} y_6(m) + \beta_i + \varepsilon_i(m)$, $k = 1, 2, 4, 5, 7, 8, 9$. No $k$ needs to be selected to decrease the AIC value. |
CHAPTER 12
Cross-talk network biomarkers of pathogen–host interaction network from innate to adaptive immunity
12.1 Introduction
Our immune system protects us from deadly threats from pathogens. To function effectively, the immune system has to detect the invasion of exogenous pathogens, watch for the pathogenic conversion of endogenous microbes, communicate the threats to the other organs in our bodies, for example, the nervous [543–546] and digestive systems [547–549], and then coordinate these systems to evade the threats. Obviously, the immune system cannot function alone. In the past, studies of the immune system [550–552] focused on the molecular functions and cellular constitution of the immune system itself, and on the physiological effects of immune-related molecules and cells. However, the immune system is one part of a biological organism. Hence, from a systematic perspective, we should consider all organic systems as a whole and not view the immune system in isolation. Immune-related molecules (e.g., chemokines, cytokines, and interferons) and cell types (e.g., lymphocytes, monocytes, and mast cells) have been commonly studied with respect to the molecular functions and cellular constitution of the immune system. After the first line of the host defense mechanisms (i.e., innate immunity) is activated, several cell types (e.g., macrophages, dendritic cells, and natural killer cells) are recruited to protect the host from pathogen invasion and eliminate the threats from pathogens. The recognition of pathogen-associated molecular patterns and/or damage-associated molecular patterns by pattern recognition receptors (PRRs) (e.g., toll-like receptors and C-type lectin receptors) [364,552] can be viewed as the starting point of a series of complex defensive mechanisms.
Systems Immunology and Infection Microbiology. DOI: https://doi.org/10.1016/B978-0-12-816983-4.00002-X
The PRRs initiate downstream pathways that promote the activation of other parts of the innate immune system and the clearance of pathogens (e.g., the production and secretion of cytokines, chemokines, and chemotactic cues to recruit more leukocytes). At the same time, the macrophages and dendritic cells are responsible for presenting antigens to induce the synthesis of some antibodies specific to the presented antigens if it is
the first exposure of the host to the pathogen (Fig. 12.1A). Alternatively, if it is not the first exposure of the host to the pathogen, the existing immunological memory cells proliferate and induce the synthesis of antibodies (Fig. 12.1A). In short, the interplay between T cells, B cells, macrophages, and dendritic cells is elaborately coordinated at the physiological level. For the treatment of infectious diseases, current drug targets focus on key molecules rather than on the cell level. Therefore the investigation of systematic offensive and defensive mechanisms at the molecular level is the most important topic from a drug discovery and systems design perspective. To target molecules and functional modules for systems drug design and therapy against infectious diseases, a holistic picture of the molecular interaction networks and network biomarkers is indispensable. Compared with the immune system of the host, pathogenic mechanisms, not to mention interspecies protein–protein interactions (PPIs) between the host and pathogen, have attracted less attention. The battle between host and pathogen is a two-sided affair: that is, the interplay between the host and pathogen shapes the whole infection process, from the first exposure to the pathogen to the final outcome of the infection process [553]. About a decade ago, however, the traditional viewpoint of treating the host and pathogen separately shifted to a more holistic viewpoint that includes both players in the infection process. This transition results from (1) the realization of the indispensability of pathogen–host interactions (PHIs) in infectious diseases and (2) the advent of omics biotechnology to quantify genes, transcripts, and proteins at the whole cell/organism
FIGURE 12.1 Study design and the flowchart of PH-PPIN construction. (A) The first and second exposures induced the innate and adaptive immune responses, respectively, and the two-sided temporal gene expression profiles were recorded by microarray experiments (the red rectangles are the observation windows of microarray experiments). (B) The flowchart delineates the procedures used in this chapter. Selected proteins of interest based on the microarray data formed a protein pool. The PPI candidates collected from the database mining and ortholog information were further pruned into the innate and adaptive dynamic PH-PPINs by the dynamic interaction model, system order detection method, and microarray data. Finally, the interaction variation scores were used to evaluate the significance of proteins in the interaction difference network, which was derived from the two constructed PH-PPINs. PH-PPIN, Pathogen–host protein–protein interaction network.
IV. Systems Innate and Adaptive Immunity in the Infection Process
levels [554]. This could permit a comprehensive interrogation of both the pathogen and host at the whole-genome, transcriptome, and proteome levels. Despite the tremendous advances in understanding pathogenic mechanisms and the subsequent triumphs in drug development [555], the remaining drug issues (e.g., drug resistance) of infectious diseases have become more troublesome. The dynamic and complex interactions between the host and pathogen may partially explain why certain drugs are often not effective in vivo [556]. Hence, to investigate the host–pathogen cross-talk mechanism in infection processes from a systematic perspective, in this chapter, we constructed dynamic pathogen–host PPI networks (PH-PPINs) of innate and adaptive immunity. To obtain systematic molecular interaction networks for drug-targeted therapy, we utilized the Candida albicans–zebrafish infection model [320]. We measured the two-sided temporal gene expression profiles of C. albicans and zebrafish during the infection process, constructed the interspecies PPIs using a dynamic interaction model, and identified the cross-talk network biomarkers with the proposed interaction variation scores (Fig. 12.1B). Given the success of the C. albicans–zebrafish infection model [320] as well as its amenability to genetic manipulation [557], the zebrafish is a promising new model organism for studying immunity. Furthermore, the zebrafish and human immune systems are remarkably similar, and more than 75% of human genes implicated in diseases have counterparts in zebrafish [558]. This provides a strong connection between the zebrafish and humans with respect to pathogenic mechanisms as well as host immune responses, which are important for biomedical applications. The immune system of zebrafish, like that of other vertebrates, can be divided into two subsystems: innate (unspecific) and adaptive (specific) immunity [364].
The first dataset (GSE32119, [374]) we used to construct a dynamic PH-PPIN was measured from the two-sided gene expression profiles during the first 18 h after zebrafish was first exposed to C. albicans to induce primary responses. The second dataset (GSE51603, [10]) was measured from the two-sided gene expression profiles during the first 42 h after zebrafish was secondarily exposed to C. albicans to induce secondary responses. To extract the interaction information from the time-course microarray data, two dynamic PH-PPINs were constructed for the innate and adaptive immunity in the infection process. By evaluating the interaction variations based on the corresponding interaction variation scores, some critical proteins and the corresponding cross-talk network biomarkers of larger interaction variations in the infection process were identified. These cross-talk network biomarkers suggest the pathogenic and defensive strategies taken by the host and pathogen during the infection process. Thus these cross-talk network biomarkers could be potential drug targets when battling infectious diseases [554].
12.2 Material and methods
12.2.1 Overview of microarray data
In this chapter, there were two microarray datasets: one was the two-sided temporal gene expression profiles of the host (zebrafish) and pathogen (C. albicans) in the period after first exposure (GSE32119, [374]), which were used to record the PHI information of innate immunity; the other was the two-sided temporal expression profiles of the host and pathogen in the period after secondary exposure (GSE51603, [10]), which were used to record the PHI information of adaptive immunity. In the first dataset, an experiment was performed to simultaneously profile the genome-wide gene expressions of innate immunity in both C. albicans and zebrafish during the infection process. C. albicans (SC5314 strain) was intraperitoneally injected into adult AB strain zebrafish. The second dataset measured the genome-wide gene expressions of adaptive immunity in both C. albicans and zebrafish after the second exposure to C. albicans, 14 days after the first exposure. Then, a two-step homogenization/mRNA extraction procedure was performed using the whole zebrafish infected with C. albicans. This method provides separate pools of gene transcripts from the host and the pathogen, which allow individual estimates of specific gene expression profiles in either the host or the pathogen using sequence-targeted probes derived from the individual genomes. Agilent in situ oligonucleotide microarrays, which cover 6202 and 26,206 genes for C. albicans and zebrafish, respectively, were used to profile temporal gene expressions; the first dataset consisted of three replicates of each organism measured at 9 time points (0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h postinjection), and the second dataset consisted of two replicates of each organism measured at 8 time points (2, 6, 12, 18, 24, 30, 36, and 42 h post-reinjection). Both datasets were prepared under similar experimental conditions.
12.2.2 Protein pool selection and database mining
Two construction steps need to be completed before a dynamic PPI network with a dynamic interaction model can be constructed. The first step is to assemble a protein pool from which the nodes in the PPI networks will be selected, and the second step is to obtain all possible PPIs among the proteins in the protein pool by integrating the interaction information from database mining. Here, our protein pool consisted of the union of the differentially expressed genes in the first and second datasets and the differentially expressed genes between the first and second microarray datasets. To select the differentially expressed genes within each dataset, the P-value of an ANOVA test was computed to determine whether the average expression levels differed over time (i.e., for the first dataset, the null hypothesis was $\mu_1 = \cdots = \mu_9$, that is, the average expression levels were the same at all 9 time points; for the second dataset, the null hypothesis was $\mu_1 = \cdots = \mu_8$, that is, the average expression levels were the same at all 8 time points), and the proteins with a Bonferroni-corrected P-value < .05 were included in the protein pool. In addition, the genes in the top 5% of the expression difference between the first and second datasets were selected for the protein pool. Next, to identify all possible interactions between the proteins in the protein pool, interaction information for the zebrafish–zebrafish, C. albicans–C. albicans, and zebrafish–C. albicans pairs is needed. However, information about these three kinds of interactions is scarce, which makes it impossible to collect all possible interactions between the proteins in the protein pool directly.
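The time-course selection criterion for the protein pool (one-way ANOVA across time points followed by a Bonferroni correction) can be sketched as follows. The sketch assumes that the expression data of one dataset are arranged as an array of shape (genes, time points, replicates) and that the Bonferroni correction multiplies each P-value by the number of genes tested; the function name `select_time_varying_genes` is hypothetical. It uses `scipy.stats.f_oneway` for the ANOVA test.

```python
import numpy as np
from scipy.stats import f_oneway

def select_time_varying_genes(expr, alpha=0.05):
    """Return indices of genes whose mean expression differs over time.

    expr: array of shape (genes, time points, replicates).
    Null hypothesis per gene: mu_1 = ... = mu_T (equal means at all time points).
    Assumption: Bonferroni correction over the number of genes tested.
    """
    n_genes = expr.shape[0]
    # Each gene contributes one group of replicates per time point.
    pvals = np.array([f_oneway(*gene).pvalue for gene in expr])
    return np.flatnonzero(pvals * n_genes < alpha)

# Toy data: gene 0 follows a clear time trend, the others are noise.
rng = np.random.default_rng(0)
expr = rng.normal(scale=0.1, size=(3, 9, 3))
expr[0] += np.arange(9)[:, None]
selected_genes = select_time_varying_genes(expr)
```

The additional selection of the top 5% of genes by expression difference between the two datasets is not shown, since the text does not specify how that difference is measured.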
To overcome this issue, the protein interaction information from human and yeast was used because of their similarity to our study subjects (zebrafish and C. albicans) and their data availability. To infer the possible protein interactions of the study subjects (zebrafish and C. albicans), the ortholog information in the InParanoid database [375] was used to convert the protein interactions of human and yeast [374,540] into the protein interactions of zebrafish and C. albicans. It should be noted that the protein interactions inferred from the ortholog-based method were derived under different experimental conditions. Consequently, the data do not accurately reflect the actual biological condition of the PHIs during the C. albicans infection process; that is, false-positive protein interactions exist in the complete set of inferred possible protein interactions of zebrafish and C. albicans, and these false-positive protein interactions need to be identified and removed using real microarray data. Therefore the false-positive interactions were deleted from the candidate PPIs, and the realistic pathogen–host cross-talk PPI networks in innate and adaptive immunity were then constructed using the two-sided microarray data and the dynamic PPI model described in the following section.
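The ortholog-based conversion described above can be illustrated with a minimal sketch. It assumes that the reference interactions (human or yeast) are given as protein pairs and that the ortholog information is given as a mapping from each reference protein to its orthologs in the study species. The function name `infer_interactions` and all protein identifiers are hypothetical placeholders, not entries from InParanoid.

```python
def infer_interactions(reference_ppis, orthologs):
    """Transfer reference-species PPIs to the study species via orthologs.

    reference_ppis: iterable of (a, b) protein pairs in the reference species.
    orthologs:      dict mapping a reference protein to a set of orthologs
                    in the study species.
    Returns a set of unordered candidate pairs (frozensets).
    """
    candidates = set()
    for a, b in reference_ppis:
        # A pair is transferred only if both partners have orthologs.
        for x in orthologs.get(a, ()):
            for y in orthologs.get(b, ()):
                if x != y:  # assumption: self-interactions are not transferred
                    candidates.add(frozenset((x, y)))
    return candidates

# Hypothetical identifiers: H* are reference proteins, z* are study-species proteins.
candidate_ppis = infer_interactions(
    [("H1", "H2"), ("H2", "H3")],
    {"H1": {"z1"}, "H2": {"z2a", "z2b"}},
)
```

The second reference pair is dropped because one partner has no ortholog. Candidates obtained this way still contain false positives, which is why the pruning with microarray data described in the text is needed.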
12.2.3 Construction of pathogen–host protein–protein interaction network
To construct the interspecies PPI network from the protein pool and candidate PPIs, the dynamic PPI model was used to determine the realistic PH-PPIN, one protein at a time. Given that the total numbers of host and pathogen proteins are $N$ and $M$, respectively, the dynamic interaction model for a host target protein $i$ in the PH-PPIN is given as follows [6]:

$$p_i^{(h)}[k+1] = \sigma_i^{(h)} p_i^{(h)}[k] + \sum_{n=1}^{N} \alpha_{in}^{(h)} p_n^{(h)}[k] + \sum_{m=1}^{M} \gamma_{im} p_m^{(p)}[k] + \beta_i^{(h)} + E_i^{(h)}[k+1] \tag{12.1}$$

where $p_i^{(h)}[k]$ is the protein level of the host target protein $i$ at time $k$, $E_i^{(h)}[k]$ is the environmental noise at time $k$, $\sigma_i^{(h)}$ is the self-regulation ability of the host target protein $i$, $\alpha_{in}^{(h)}$ is the interaction strength between the host protein $n$ and the host target protein $i$, $\gamma_{im}$ is the interaction strength between the pathogen protein $m$ and the host target protein $i$, and $\beta_i^{(h)}$ is the basal level of the host target protein $i$. Similarly, the dynamic interaction model of a pathogen target protein $j$ can be described as follows:

$$p_j^{(p)}[k+1] = \sigma_j^{(p)} p_j^{(p)}[k] + \sum_{m=1}^{M} \alpha_{jm}^{(p)} p_m^{(p)}[k] + \sum_{n=1}^{N} \gamma_{jn} p_n^{(h)}[k] + \beta_j^{(p)} + E_j^{(p)}[k+1] \tag{12.2}$$

where $p_j^{(p)}[k]$ is the protein level of the pathogen target protein $j$ at time $k$, $E_j^{(p)}[k]$ is the environmental noise at time $k$, $\sigma_j^{(p)}$ is the self-regulation ability of the pathogen target protein $j$, $\alpha_{jm}^{(p)}$ is the interaction strength between the pathogen protein $m$ and the pathogen target protein $j$, $\gamma_{jn}$ is the interaction strength between the host protein $n$ and the pathogen target protein $j$, and $\beta_j^{(p)}$ is the basal level of the pathogen target protein $j$.
The biological significance of this formulation is that the protein level of the host (pathogen) target protein $i$ ($j$) at the future time $k+1$ is determined by its current protein level (at time $k$) with the self-regulation ability $\sigma_i^{(h)}$ ($\sigma_j^{(p)}$), the interaction strengths between the host (pathogen) target protein $i$ ($j$) and the proteins of the same species, $\alpha_{in}^{(h)}$ ($\alpha_{jm}^{(p)}$), and of the other species, $\gamma_{im}$ ($\gamma_{jn}$), the basal level $\beta_i^{(h)}$ ($\beta_j^{(p)}$), and the environmental noise $E_i^{(h)}$ ($E_j^{(p)}$) in the future. Due to the unavailability of proteomic data, the expression levels measured by the two-sided microarray experiments were used to represent the protein levels in the dynamic interaction model. The dynamic interaction model for the host target protein $i$ can be further rewritten into a concise form as follows:

$$p_i^{(h)} = \Phi_i \theta_i^{(h)} + E_i^{(h)} \tag{12.3}$$

where

$$p_i^{(h)} = \left[ p_i^{(h)}[1] \;\cdots\; p_i^{(h)}[K] \right]^T, \quad \theta_i^{(h)} = \left[ \alpha_{i1}^{(h)} \;\cdots\; \alpha_{iN}^{(h)} \;\; \gamma_{i1} \;\cdots\; \gamma_{iM} \;\; \sigma_i^{(h)} \;\; \beta_i^{(h)} \right]^T, \quad E_i^{(h)} = \left[ E_i^{(h)}[1] \;\cdots\; E_i^{(h)}[K] \right]^T,$$

$$\text{and} \quad \Phi_i = \begin{bmatrix} p_1^{(h)}[0] & \cdots & p_N^{(h)}[0] & p_1^{(p)}[0] & \cdots & p_M^{(p)}[0] & p_i^{(h)}[0] & 1 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ p_1^{(h)}[K-1] & \cdots & p_N^{(h)}[K-1] & p_1^{(p)}[K-1] & \cdots & p_M^{(p)}[K-1] & p_i^{(h)}[K-1] & 1 \end{bmatrix}$$
The dynamic interaction model for the pathogen can also be rewritten into a similar form. The only unknown parameter vector $\theta_i^{(h)}$ can then be estimated using parameter estimation methods, such as least-squares estimation. As noted above, because large-scale measurements of host and pathogen protein levels are lacking, the temporal gene expression profiles were used as a substitute for protein activities to identify the parameter vector $\theta_i^{(h)}$ in the model. Furthermore, to prune the false-positive PPIs in the candidate PH-PPIN, the Akaike information criterion (AIC) was introduced to detect the true model order (the number of interactions). The model order with the minimum AIC was taken as the true model order and used as the criterion for deleting false-positive interactions in the candidate PH-PPINs. Hence, the final dynamic PH-PPINs encompass the dynamic interaction model of each protein at its minimum AIC value, with the false-positive PPIs removed. Finally, after identifying the parameters for each protein in the protein pools, the identified interaction parameters ($\alpha_{in}^{(h)}$, $\alpha_{jm}^{(p)}$, $\gamma_{im}$, and $\gamma_{jn}$) formed the final dynamic PH-PPIN.
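A minimal sketch of the least-squares identification of Eq. (12.3) for one host target protein is given below. It assumes that the expression profiles are stored as arrays with one row per protein and one column per time point, that only the candidate interaction partners of the target protein enter the regression matrix, and that the time index is treated as uniform. The function name `fit_host_protein` and its arguments are hypothetical, and the AIC-based pruning is not included.

```python
import numpy as np

def fit_host_protein(i, host, pathogen, host_partners, pathogen_partners):
    """Least-squares estimate of theta_i^(h) in Eq. (12.3).

    host:     (N, K+1) array of host expression levels at times 0..K.
    pathogen: (M, K+1) array of pathogen expression levels at times 0..K.
    host_partners / pathogen_partners: index lists of candidate interactors
    of host protein i (assumption: i itself is not in host_partners).
    """
    K = host.shape[1] - 1
    Phi = np.column_stack([
        host[host_partners, :K].T,          # columns for alpha_in^(h)
        pathogen[pathogen_partners, :K].T,  # columns for gamma_im
        host[i, :K],                        # column for sigma_i^(h)
        np.ones(K),                         # column for beta_i^(h)
    ])
    theta, *_ = np.linalg.lstsq(Phi, host[i, 1:], rcond=None)
    n_h, n_p = len(host_partners), len(pathogen_partners)
    return {"alpha": theta[:n_h], "gamma": theta[n_h:n_h + n_p],
            "sigma": theta[-2], "beta": theta[-1]}
```

With 9 or 8 time points per dataset, only a few parameters can be identified per protein, which is one reason the candidate interactions must be restricted and then pruned by model order detection.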
12.2.4 Interaction variation score calculation
To identify the network biomarkers in the PH-PPINs, the interaction variation scores (IVSs) were calculated for proteins to identify those associated with a marked transition of the PHIs from innate to adaptive immunity. The proteins in the PH-PPINs with the largest PPI variations from innate to adaptive immunity can be considered as the cross-talk network biomarkers in the entire infection process and as significant drug targets. Therefore we investigated these cross-talk network biomarkers in the following. The IVS is a measurement of the variation of the interaction strength under a biological condition transition. According to the dynamic interaction models, the constructed PH-PPIN under a specific condition (innate or adaptive) can be written as follows:

$$
\begin{bmatrix} p_1^{(h)}[k+1] \\ \vdots \\ p_N^{(h)}[k+1] \\ p_1^{(p)}[k+1] \\ \vdots \\ p_M^{(p)}[k+1] \end{bmatrix}
=
\begin{bmatrix}
\sigma_1^{(h)} & \cdots & \alpha_{1N}^{(h)} & \gamma_{11} & \cdots & \gamma_{1M} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\alpha_{N1}^{(h)} & \cdots & \sigma_N^{(h)} & \gamma_{N1} & \cdots & \gamma_{NM} \\
\gamma_{11} & \cdots & \gamma_{1N} & \sigma_1^{(p)} & \cdots & \alpha_{1M}^{(p)} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\gamma_{M1} & \cdots & \gamma_{MN} & \alpha_{M1}^{(p)} & \cdots & \sigma_M^{(p)}
\end{bmatrix}
\begin{bmatrix} p_1^{(h)}[k] \\ \vdots \\ p_N^{(h)}[k] \\ p_1^{(p)}[k] \\ \vdots \\ p_M^{(p)}[k] \end{bmatrix}
+
\begin{bmatrix} \beta_1^{(h)} \\ \vdots \\ \beta_N^{(h)} \\ \beta_1^{(p)} \\ \vdots \\ \beta_M^{(p)} \end{bmatrix}
+
\begin{bmatrix} E_1^{(h)}[k+1] \\ \vdots \\ E_N^{(h)}[k+1] \\ E_1^{(p)}[k+1] \\ \vdots \\ E_M^{(p)}[k+1] \end{bmatrix}
\tag{12.4}
$$

where the notations are the same as in the dynamic interaction models. The previous equation can be written in a more concise form:

$$p[k+1] = A\,p[k] + \beta + E[k+1] \tag{12.5}$$

where $A$ is the systematic interaction matrix of the PH-PPIN constructed under a specific condition. The interaction difference of the two PH-PPINs between innate and adaptive immunity can be calculated in the following interaction difference matrix form:

$$D_{\text{adaptive-innate}} = A_{\text{adaptive}} - A_{\text{innate}} \tag{12.6}$$

If the variation of the interaction strength of a protein is larger during a biological condition transition (innate to adaptive immunity in this chapter), this may imply that the protein plays a more important role in the transition from innate to adaptive immunity. Therefore the IVS used to evaluate the interaction variability of a protein in the transition from innate to adaptive immunity can be defined as follows:

$$\mathrm{IVS}_p = \frac{\sum_{q=1}^{Q} d_{pq}}{\text{Degree of protein } p} \tag{12.7}$$

where $d_{pq}$ is the $pq$-entry of $D_{\text{adaptive-innate}}$ and $Q$ is the number of columns of $D_{\text{adaptive-innate}}$ ($Q = N + M$ for the matrix in Eq. (12.4)), so that $\mathrm{IVS}_p$ is the average interaction variation of protein $p$ in the transition from innate to adaptive immunity. The degree of protein $p$ is the number of nonzero elements in the $p$th row of the interaction difference matrix $D_{\text{adaptive-innate}}$. Those proteins with larger IVSs are considered as significant proteins that play an important role in the transition from innate to adaptive immunity in the infection process.
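Eqs. (12.6) and (12.7) can be computed directly once the two interaction matrices are available, as in the sketch below. It assumes that both matrices are indexed over the same ordered set of proteins (for example, the union of the two networks, with zeros for absent interactions). The function name `interaction_variation_scores` is hypothetical, and the optional `absolute` flag, which sums the magnitudes of the entries instead of the signed entries, is an illustrative variant that is not stated in the text.

```python
import numpy as np

def interaction_variation_scores(A_innate, A_adaptive, absolute=False):
    """IVS of Eq. (12.7) for every protein (row) of the difference matrix.

    Assumption: both matrices share the same protein ordering.
    absolute=True is a hypothetical variant using |d_pq| instead of d_pq.
    """
    D = np.asarray(A_adaptive, dtype=float) - np.asarray(A_innate, dtype=float)  # Eq. (12.6)
    degree = np.count_nonzero(D, axis=1)       # nonzero entries in each row
    total = (np.abs(D) if absolute else D).sum(axis=1)
    ivs = np.zeros(D.shape[0])
    # Proteins without any changed interaction keep a score of zero.
    np.divide(total, degree, out=ivs, where=degree > 0)
    return ivs

# Toy example with three proteins.
A_in = [[0, 1, 0], [0, 0, 0], [2, 0, 0]]
A_ad = [[0, 3, 1], [0, 0, 0], [2, 0, 0]]
scores = interaction_variation_scores(A_in, A_ad)
```

Ranking the proteins by these scores (for example with `np.argsort(-scores)`) gives the candidates with the largest interaction variation.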
12.3 Investigating PH-PPINs for cross-talk network markers from innate to adaptive immunity
12.3.1 The pathogen–host protein–protein interaction networks of innate and adaptive immunity
In this chapter, we aimed to investigate the systematic offensive and defensive mechanisms of pathogen and host, respectively, at the molecular level. In particular, we aimed to understand the roles of PHIs in innate and adaptive immunity from a systems biology perspective. The cross talk between the host and pathogen was recorded based on the two-sided temporal gene expression profiles of C. albicans and zebrafish that were simultaneously measured during the primary and secondary response periods in the infection process. During the two periods (the red rectangles in Fig. 12.1A), the observed variations in the gene expression levels were mainly due to innate and adaptive immunity, respectively. We further selected 1620 proteins of interest for the protein pool, including those with differentially expressed features, that is, the top 5% of the expression level difference between the two datasets. The comparison of their temporal profiles (Fig. 12.2A) implied that their expression patterns changed: the activation of a group of pathogen genes was delayed and the repression of a group of host genes was advanced. The changes in the gene expression patterns implied that the PHIs in these two periods should have corresponding variations. To determine the variations of the
FIGURE 12.2 Temporal gene expression profiles of the proteins of interest and the constructed innate and adaptive dynamic PH-PPINs. (A) The horizontal axis indicates the sampling time points in the microarray experiments. The vertical axis shows the genes clustered according to their expression patterns in innate immunity. (B) The innate and adaptive PH-PPINs consist of PPIs in three domains: pathogen–pathogen, pathogen–host, and host–host. PH-PPIN, Pathogen–host protein–protein interaction network.
underlying PHIs, 26,060 PPI candidates inferred from the database mining and ortholog information were further pruned using the dynamic interaction models, model order detection method, and two-sided microarray data (Fig. 12.1B) and then the innate and adaptive dynamic PH-PPINs were formed (Fig. 12.2B). In particular, the two constructed
PH-PPINs were the underlying molecular mechanisms used to explain the observed changes in the gene expression patterns in the infection process. The resultant PH-PPINs consisted of 1512 proteins (1431 C. albicans proteins; 81 zebrafish proteins) and 5721 PPIs (5510 intracellular interactions inside C. albicans; 145 interspecies interactions; and 66 intracellular interactions inside zebrafish) for innate immunity, and 1578 proteins (1480 C. albicans proteins; 98 zebrafish proteins) and 3755 PPIs (3577 intracellular interactions inside C. albicans; 96 interspecies interactions; and 82 intracellular interactions inside zebrafish) for adaptive immunity. Looking at the amount of variation in the nodes and edges of the pathogen, although most of the pathogenic nodes are shared between innate and adaptive immunity, the number of edges changed from 5510 to 3577, and only 1203 edges are shared. This implies that the pathogen may use almost the same set of proteins (~85%) but with different links to interact with the host and to regulate cellular functions within the pathogen itself in response to various challenges from innate and adaptive immunity. In contrast, the host may use a different strategy, since a different distribution of node and edge numbers was found when compared with the pathogen. In the zebrafish, there are three more significantly enriched cellular functions (angiogenesis, coagulation, and circadian clock) in the adaptive PH-PPIN when compared with the innate PH-PPIN (metabolic processes, immune responses, and apoptosis). In addition, in C. albicans, there are two more significantly enriched cellular functions (circadian clock and filament growth) when compared with the innate PH-PPIN (response to stimulus, redox status, and budding). The new cellular functions in the adaptive PH-PPIN compared with the innate PH-PPIN indicated changes in the response strategies of the host and pathogen.
To efficiently identify and evaluate the significance of proteins in the innate and adaptive dynamic PH-PPINs, we took the difference between the two PH-PPINs to obtain an interaction difference network (IDN) (Fig. 12.3), that is, the matrix D in Eq. (12.6), and then used interaction variation scores (IVSs) to evaluate the interaction variations of proteins in the IDN.
12.3.2 Identifying cross-talk network biomarkers in the interaction difference network using interaction variation score
Cell signaling depends on dynamic PPINs [559]. Hence, the interaction variations in the PPINs indicate changes in cell signaling and the corresponding consequences for cellular functions. To illustrate the variation of PPINs, we adopted the notations of node color, edge color, and edge line style shown in Fig. 12.3A to indicate the existence of proteins and their interactions, and the variation of interactions from the innate to the adaptive dynamic PH-PPIN. Further, to focus on the proteins with significant variations, the IVS given in Eq. (12.7) was used to evaluate the average interaction variation of a protein, that is, the ratio of the total interaction variation of a protein to the number of links possessed by the protein. Hence, the IVS quantifies the extent of the interaction variations, which may signify the importance of a protein in the transition from innate to adaptive immunity, that is, in the IDN between the innate and adaptive PH-PPINs (Fig. 12.3B). In the following, we focus on the proteins with the 10 highest IVSs in the three domains, that is, the host–host (zebrafish–zebrafish), pathogen–pathogen (C. albicans–C. albicans), and pathogen–host (C. albicans–zebrafish) domains, and identify the cross-talk network biomarkers in these domains.
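The verbal definition of the IVS can be sketched in a few lines of code. The sketch below assumes that the difference matrix is the entry-wise difference (adaptive minus innate) of two symmetric interaction-strength matrices, that variations are taken as absolute values, and that the links of a protein are the nonzero entries of its row in D; Eqs. (12.6) and (12.7) are not reproduced here, so these choices, the function names, and the toy values are assumptions rather than the chapter's exact formulas.

```python
def symmetric_matrix(n, weights):
    """Build an n x n symmetric matrix from {(i, j): strength}."""
    m = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        m[i][j] = m[j][i] = w
    return m


def interaction_difference(innate, adaptive):
    """Entry-wise difference D = adaptive - innate (assumed form of the IDN matrix)."""
    n = len(innate)
    return [[adaptive[i][j] - innate[i][j] for j in range(n)] for i in range(n)]


def total_variation(diff):
    """Sum of absolute interaction variations per protein."""
    return [sum(abs(d) for d in row) for row in diff]


def interaction_variation_scores(diff):
    """Total variation of each protein divided by its number of links in D."""
    scores = []
    for row in diff:
        links = [abs(d) for d in row if d != 0]
        scores.append(sum(links) / len(links) if links else 0.0)
    return scores


# Toy example: protein 0 is a hub whose four links each change by 0.5;
# proteins 1 and 2 gain one strong new link (1.5) in the adaptive network.
innate = symmetric_matrix(5, {(0, 1): 1.0, (0, 2): 1.0, (0, 3): 1.0, (0, 4): 1.0})
adaptive = symmetric_matrix(5, {(0, 1): 1.5, (0, 2): 1.5, (0, 3): 1.5,
                                (0, 4): 1.5, (1, 2): 1.5})
D = interaction_difference(innate, adaptive)
totals = total_variation(D)
ivs = interaction_variation_scores(D)
print("total variation:", totals)
print("IVS:", ivs)
```

In this toy case the hub and proteins 1 and 2 have the same total variation, yet the averaging ranks proteins 1 and 2 higher, because their variation is concentrated in fewer links.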
IV. Systems Innate and Adaptive Immunity in the Infection Process
12. Cross-talk network biomarkers of pathogen–host interaction network from innate to adaptive immunity
FIGURE 12.3 The IDN between innate and adaptive immunity. (A) The following notations are used in the IDN. Node color: blue, red, and purple indicate the presence of a protein in innate immunity only, adaptive immunity only, and both innate and adaptive immunity, respectively. Edge color: blue and red indicate attenuated and enhanced interactions, respectively. Edge style: dashed, dotted, and solid lines indicate the existence of an interaction in innate immunity only, adaptive immunity only, and both innate and adaptive immunity, respectively. (B) The IDN between innate and adaptive immunity consists of the interactions in three domains: the pathogen–pathogen, pathogen–host, and host–host domains. The round and square nodes indicate the pathogen and host proteins, respectively. IDN, Interaction difference network.
12.3.3 The cross-talk network biomarkers in the host–host domain
In the host–host domain of the IDN, the 10 significant proteins with the highest IVSs are closely related to the innate and adaptive immune responses. When these 10 proteins and their first neighbors are extracted from the IDN, they form five components in the host–host domain (Fig. 12.4A). The largest component consists of f2, LOC798231, LOC793315, ace2, gnai1, and their first neighbors (the left part of Fig. 12.4A). gnat2, a host G-protein that forms one end of the interspecies interaction, has connections with chemokine-related proteins (ccl-c5a and si:dkey-269d20.3) and chemotaxis-related proteins (ENSDARP00000105159 and ENSDARP0000111107). The angiogenesis- and coagulation-related proteins (agt, ace2, f2, and ENSDARP00000098661) are connected to these chemokine-related proteins. This component also contains two other proteins, that is, serine proteinase inhibitor (serpinc1) and prokineticin (ENSDARP00000109666). The roles of angiogenesis, coagulation, and chemokines are manifested in the innate and
FIGURE 12.4 The cross-talk network biomarkers in the host–host and pathogen–pathogen domains. (A) Chemokines, the complement system, and angiogenesis and coagulation are the three major cross-talk network biomarkers in the host–host domain of the IDN owing to the higher IVSs of their members. (B) Redox status and pathogen expansion are the two major cross-talk network biomarkers in the pathogen–pathogen domain of the IDN owing to the higher IVSs of their members. The shadowed nodes represent the proteins with the 10 highest IVSs in their domains. IDN, Interaction difference network; IVS, interaction variation score.
adaptive immunity in this component. The second component mainly consists of complement proteins (c7b, c8g, c8a, c8b, and c9) and vitronectins (vtna and vtnb). Because of the well-known roles of the complement system in immunity, vitronectins have recently attracted much attention in the field of immunity [560]. cd36 and the apolipoproteins (apob1, apoba, and apobb) form the third component (the lower right part of Fig. 12.4A). CD36 plays a pivotal role in macrophage foam-cell formation and atherogenesis, which is reduced by apolipoproteins. Although the last two components are much less documented, the roles of versican (vcanb) and tank in inflammation have been reported [561].
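The extraction step used in this section (take the proteins with the highest IVSs, add their first neighbors, and split the result into connected components) can be sketched as follows. This is a minimal illustration on a toy graph with hypothetical node labels and scores; the helper names are assumptions and the actual analysis pipeline is not specified in the text.

```python
from collections import deque


def top_k_with_neighbors(adjacency, scores, k):
    """Return the k highest-scoring nodes and the set of those nodes plus first neighbors."""
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    keep = set(top)
    for node in top:
        keep.update(adjacency.get(node, ()))
    return top, keep


def connected_components(adjacency, nodes):
    """Connected components of the subgraph induced by `nodes` (breadth-first search)."""
    unseen = set(nodes)
    components = []
    while unseen:
        start = unseen.pop()
        component, queue = [start], deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency.get(current, ()):
                if neighbor in unseen:
                    unseen.remove(neighbor)
                    component.append(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    # Largest component first; ties broken alphabetically for reproducibility.
    return sorted(components, key=lambda c: (-len(c), c))


# Toy interaction difference network (hypothetical labels and scores).
adjacency = {
    "A": {"B", "C"}, "B": {"A"}, "C": {"A"},
    "D": {"E"}, "E": {"D", "F"}, "F": {"E"},
    "G": {"H"}, "H": {"G"},
}
scores = {"A": 0.9, "D": 0.8, "F": 0.3, "E": 0.2,
          "B": 0.1, "C": 0.15, "G": 0.05, "H": 0.04}

top, kept = top_k_with_neighbors(adjacency, scores, k=2)
components = connected_components(adjacency, kept)
print(top, components)
```

Note that node F is excluded in this toy case because it is a second neighbor of a top-scoring node, which mirrors the restriction to first neighbors described above.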
12.3.4 The cross-talk network biomarkers in the pathogen–pathogen domain
In the pathogen–pathogen domain, the 10 significant proteins with the highest IVSs and their first neighbors form a single component (see Fig. 12.4B). In this component the significance of redox status in the innate and adaptive immune responses is reemphasized [6]. ERG1, CAL0005908, MET10, and GCV3 are all related to the redox status of C. albicans. In addition, CAL0005225, ERG1, and SDS24 are related to the expansion of C. albicans through their cellular functions in budding, filament growth, and the cell cycle, respectively. In particular, MET10 is related to the response to stress from the host and environment. Another major cellular function in this component is transferase activity. MET2 is a homoserine acetyltransferase that can transform homoserine, a toxin for C. albicans, and is important for C. albicans survival. ARG3 facilitates the production of citrulline, which can induce pseudohyphal morphogenesis. The morphological transformation of C. albicans has been proven to be significant in its pathogenesis. The hydrolase CAF16 exerts its influence on RNA polymerase II, although the specific target genes are still unknown.
12.3.5 The cross-talk network biomarkers in the pathogen–host domain
In the pathogen–host domain, we also selected 10 significant proteins from both the host and pathogen. These interspecies proteins form cross talk that is more complicated than that in the pathogen–pathogen domain (Fig. 12.5). A possible molecular mechanism for the correlation between redox status in the host and pathogen is shown in the pathogen–host domain, that is, the interaction between thioredoxin (txn) and ribonucleotide reductase 1 (RNR1). In addition to its role in the redox status, RNR1 could also influence the iron utilization, filament growth, and cell cycle of C. albicans. This implies that the effect of redox status on the pathogen is multifaceted. Compared with the chemokine-related cellular functions in the host–host domain, the role of chemokine-related cellular functions in the PHIs of the pathogen–host domain is more interesting. CAG1, one protein through which chemokine-related cellular functions affect the pathogen, is related to the hyphal growth, mating, and biofilm formation of the pathogen, which are all important in pathogenesis. In contrast to the appearance of redox status and chemokines in the pathogen–pathogen and host–host domains, respectively, gene transcription and the circadian clock have only been seen in the pathogen–host domain. Interactions
FIGURE 12.5 The cross-talk network biomarkers in the pathogen–host domain. Redox status, circadian rhythm, gene transcription, and chemokines are the four major cross-talk network biomarkers in the pathogen–host domain. The shadowed nodes represent the proteins with the 10 highest IVSs in the pathogen–host domain. The round and square nodes indicate the pathogen and host proteins, respectively. IVS, Interaction variation score.
between TAF60, gtf2a2, and polr2e are also found to emerge in adaptive immunity. TAF60, a transcription factor, is responsible for the drug responses in the pathogen, and gtf2a2 and polr2e are related to gene transcription in the host. Their interactions indicate a possible molecular mechanism by which the PHIs affect the gene expression level. In addition to gene transcription, the circadian clock is an interesting cellular function in the host and pathogen. The circadian clock-related proteins of the host (cry2a, cry2b, and per2) and pathogen (HRR25) form a subnetwork in the pathogen–host domain. The circadian rhythms in the host and pathogen are correlated, and numerous cellular functions of the pathogen (yeast-hyphal switch, gene transcription, pathogenesis, etc.) are affected through HRR25. In summary, we found that the proteins with the most variable interactions in the IDN are the key elements related to chemokines, angiogenesis, coagulation, redox status, pathogen expansion, gene transcription, and circadian clock functions, that is, the so-called cross-talk network biomarkers. These cross-talk network biomarkers change significantly in the transition from innate to adaptive immunity in the infection process and are potential drug targets for treatment and vaccination. To further evaluate the plausibility of the cross-talk network biomarkers, we selected angiogenesis, coagulation, redox status, and circadian clock because of their systemic influence and investigated the interplay between these biomarkers based on the IDN.
12.3.6 The interplay among the cross-talk network biomarkers
In addition to the interplay among individual proteins, a more interesting and important aspect is the interplay among the cross-talk network biomarkers in the host and pathogen. After we determined the functions of the proteins with the highest interaction variations in the different domains (see previous section), several cross-talk network biomarkers emerged, that is, circadian clock, redox status, coagulation, and angiogenesis. The roles of these functional modules have been reported in several studies [562–565]. However, little is known about the interplay among those cross-talk network biomarkers, which manifests the linkage between clinical symptoms and perturbations of molecular functions. In particular, how the PHI variations influence the outcomes of the infection process is still unclear. In this section, considering the results of the previous section, we focus on the interplay of the four cross-talk network biomarkers (i.e., circadian clock, redox status, coagulation, and angiogenesis). The circadian clock and redox status cross-talk network biomarkers form a subnetwork, as do the angiogenesis and coagulation cross-talk network biomarkers (the leftmost and rightmost layers in Fig. 12.6, respectively). The circadian clock-related host proteins (arntl1b, cry2a, cry2b, per1b, and per2) link to each other and all link to HRR25, the circadian clock-related pathogen protein, which, in turn, links to DOG1. DOG1 has been shown to play a role in high oxidative stress tolerance. Then, two redox status-related host proteins, mapk12b and txn, have connections to HOG1 and RNR1, respectively: RNR1 impacts the pathogen expansion, and HOG1 is essential in the oxidative stress response and chlamydospore formation in C. albicans.
Thus the redox status cross-talk network biomarkers of the host and pathogen correlate with the circadian clock cross-talk biomarkers of the host and pathogen (see the subnetwork in the leftmost layer in Fig. 12.6).
FIGURE 12.6 The interplay between cross-talk network biomarkers and pathogenic functions. The subnetworks were extracted from the IDN in Fig. 12.3. The cross-talk network biomarkers circadian clock and redox status, and angiogenesis and coagulation are located on the left and right sides, respectively. These cross-talk network biomarkers interact with the pathogenic functional modules through a layer of proteins that are related to energy consumption and epigenetic modifications. IDN, Interaction difference network.
On the other side, among the host proteins (agt, f2, f2rl3, notch1b, and serpinc1), agt and notch1b are related to angiogenesis, and f2, f2rl3, and serpinc1 are related to coagulation. Angiogenesis can be activated and suppressed by a variety of immunological factors, for example, IL6, IL17, IFN, IL12, IL21, and IL27 [563], and has been implicated in pathogenesis [566]. In addition, coagulation is necessary for the defense mechanisms in infectious lung disease [567]. These two cross-talk network biomarkers are linked and the interactions are enhanced in innate versus adaptive immunity (see the subnetwork at the rightmost layer in Fig. 12.6). The two subnetworks in the rightmost and leftmost layers of Fig. 12.6 each link to distinct proteins. The proteins that interact with the circadian clock and redox status cross-talk network biomarkers (pygb, gys2, and azi1 in the host; PHR2, DCP2, RNR1, PUF3, and HOG1 in the pathogen) are mainly linked to energy consumption (phosphatases and glycosidases) and transcriptional regulation. The proteins that interact with the coagulation and angiogenesis cross-talk network biomarkers are associated mainly with chemokine function (ccl-c5a, si:dkey-269d20.3), histone deacetylation (RPD31), and the G protein–coupled receptor (gnai1). In the central layers the effects exerted by the four cross-talk network biomarkers converge onto the pathogenesis, responses to environmental stress, and cell cycle in the pathogen (the center of Fig. 12.6). In summary, we computationally established the links between the important cellular functions in the pathogen (pathogenesis, responses to environmental stress, and cell cycle) and
several systemic cross-talk network biomarkers (circadian rhythm, redox status, angiogenesis, and coagulation) in the host and pathogen, which may underpin the molecular mechanisms of clinical symptoms of candidiasis.
12.4 Discussion and conclusion
We have presented PH-PPINs that were generated from dynamic interaction models and two-sided microarray data during the innate and adaptive response periods. The dramatic change in the number of PHIs from innate to adaptive immunity (145 interactions in innate immunity, 96 interactions in adaptive immunity, and 36 shared interactions), together with the fact that almost the same nodes appear in both innate and adaptive immunity, suggests that the pathogen uses almost the same subset of proteins, but with different interactions, to respond to the two different defense mechanisms of the host (i.e., innate and adaptive immunity). On the other hand, although the strategies used by the host were quite subtle, the temporal gene expression profiles show that the expression patterns of the host genes change from innate to adaptive immunity (Fig. 12.2A). Once the lack of PPI information for zebrafish is remedied, the resultant PH-PPIN can provide further insight into the response strategies of the host and pathogen. We are therefore able to quantitatively investigate the interaction variations from innate to adaptive immunity in the infection process by following the clues regarding the variations in the number of interactions and the interaction strengths identified by the dynamic interaction models and two-sided microarray data. To focus our investigation on a smaller and more meaningful range, we utilized the IVS to evaluate the average interaction variation of a protein in the transition from innate to adaptive immunity in the infection process. Because it is an average, the IVS rules out the possibility of a large score being caused by many small interaction variations, a weakness of the carcinogenesis relevance value of [568]. Hence, the IVS could better focus on proteins with large interaction variations.
Further, we could visualize the interaction difference matrix from innate to adaptive immunity as an IDN (Fig. 12.3B), which can be divided into three domains according to the types of interactions involved. For the three domains, we focused on the proteins with the 10 highest IVSs to investigate their interaction variations and identify the cross-talk network biomarkers. Not surprisingly, several immune-related and pathogenic cross-talk network biomarkers have emerged: chemokines, cytokines, the complement system, pathogen expansion, and redox status. Nevertheless, three additional cross-talk network biomarkers (circadian clock, angiogenesis, and coagulation) have been found owing to the larger interaction variations of their components. Although these cellular functions are not totally new in immunity research, the cross talk among these cross-talk network biomarkers is a novel contribution of this chapter. In particular, the influences of circadian clock, redox status, angiogenesis, and coagulation are systemic. The samples and sampling time points of the microarray data provide an opportunity to gain more insight into the pathogenic and defensive mechanisms by which these systemic cross-talk network biomarkers interact. The whole fish body samples could provide a holistic view of the systemic variations of the transcriptomes from the innate to the adaptive immune response. The observation windows of the microarray experiments (Fig. 12.1A) could
reveal the involvement of the circadian clock in innate and adaptive immunity, which may be concealed if there are not enough sampling points over several days. Thus we could identify several significant proteins and cross-talk network biomarkers in the three domains based on their larger interaction variations from innate to adaptive immunity and then explore them by taking a closer look at the IDN. In the infection process the host and pathogen are both indispensable. In line with this idea, the PHIs are critical in the pathogenesis of infectious diseases, and we could find the connections between pathogenic functions (including the functions used by the pathogen to respond to stresses and to proliferate) and the identified systemic cross-talk network biomarkers. In Fig. 12.6 the identified cross-talk network biomarkers converge on the pathogenic cellular functions. This may provide more insight into the different molecular mechanisms by which a pathogen interacts with host defensive mechanisms and a host clears a pathogen in cases with or without immunological memory. Chemokines are related to the angiogenesis and coagulation cross-talk network biomarkers (see Fig. 12.4A). These cross-talk network biomarkers could induce some epigenetic control on the gene transcription of the host and pathogen (see Figs. 12.5 and 12.6). Then, the pathogenic cellular functions, for example, budding and filament growth, could be activated to promote the transformation of the yeast to the hyphal form, a significant marker of pathogenesis. Meanwhile, C. albicans could proliferate and expand its territory. Moreover, the responses of the pathogen to the environmental stresses could increase energy consumption and then affect the internal redox status of the pathogen. In addition, the redox status cross-talk network biomarker of the host could respond to the pathogenic functions. Finally, the circadian clock cross-talk network biomarker could be recruited into the PH-PPINs.
The host circadian proteins (arntl1b, cry2a, cry2b, per1b, and per2) and pathogen circadian protein (HRR25) are tightly connected. In summary, our findings could underpin the criticality of the circadian clock cross-talk network biomarker in terms of the type of immune response generated by an organism [565] and further show how the circadian clock, redox status, angiogenesis, and coagulation cross-talk network biomarkers are tightly coupled with pathogenesis and the host immune systems. This could provide an opportunity to design new and efficient therapeutic guidelines for network biomarkers of drug targets and the time window for treatments.
CHAPTER 13
The coordination of defensive and offensive molecular mechanisms in the innate and adaptive host–pathogen interaction networks
13.1 Introduction
Research on host–pathogen interactions (HPIs) in the infection process has recently been highlighted [320,369,556,569]. However, the gap between infection-activated molecular mechanisms and physiological phenomena limits the translation of knowledge about HPIs into biomedical applications [570,571]. Hence, dual transcriptome data are used to simultaneously record the temporal gene expression profiles of the host (zebrafish) and pathogen (Candida albicans) during the innate and adaptive phases of the C. albicans infection. These infection experiments allow the analysis of the coordination of host and pathogen defensive and offensive molecular mechanisms in both phases of infection. In particular, dynamic host–pathogen protein–protein interaction (PPI) networks (HP-PPINs) can be used to bridge the gap between the infection-activated molecular mechanisms and the physiological phenomena. Moreover, these dynamic HP-PPINs can quantitatively delineate the effects of current protein levels on the expression of other proteins [493] and can, therefore, be used to characterize the defensive and offensive molecular mechanisms behind the network interactions of host and pathogen proteins during the infection process. Hence, relating the infection-activated molecular mechanisms to the physiological phenomena using dynamic HP-PPINs may give impetus to biomedical applications arising from the investigation of HPIs. The infection process has been described as a battle or tug of war between host and pathogen [572,573]. From the host perspective, innate and adaptive immunity are sequentially activated from pathogen exposure to disease recovery and are considered the two major phases of the battle.
At the beginning, innate immunity mediates the first line of host defensive molecular mechanisms, including the pathogen recognition via the actions of several cell types such as macrophages, dendritic cells, and natural killer cells
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00014-6
© 2021 Elsevier Inc. All rights reserved.
13. The coordination of defensive and offensive molecular mechanisms
that are recruited to the sites of infection to eliminate pathogens. The recognition of pathogen-associated and/or damage-associated molecular patterns by pattern recognition receptors (PRRs), including toll-like receptors and C-type lectin receptors, can be viewed as the origin of the subsequent complex molecular events in the infection process [364,552]. PRRs initiate the downstream signaling pathways that activate the innate immune system to get rid of pathogens through the production and secretion of cytokines, chemokines, and chemotactic cues that recruit more leukocytes [400]. Consequently, macrophages and dendritic cells process and present antigens to T cells and induce the adaptive phase. During this phase, B and T cells specifically and efficiently get rid of pathogens by producing antibodies and inducing specific types of cells to attack pathogens [574]. To counter these distinct host responses during the two phases of the infection process, pathogens probably respond with two corresponding molecular strategies, although these molecular strategies are still poorly understood. From the pathogen perspective, the molecular mechanisms involved in resource acquisition and utilization for offensive cellular functions are much clearer in C. albicans than in zebrafish. Nevertheless, both pathogens and hosts need resources to support their vital functions, leading to competition under the conditions of resource limitation in infected hosts. Among such resources, iron is an important nutrient for pathogenic microbes and plays crucial roles in multiple cellular processes [575]. Consequently, C. albicans has several strategies for obtaining iron from specific host molecules, leading to virulence and disease [576].
Glucose also plays important roles as a carbon and energy source and as a morphogen that affects the yeast-to-hyphae transition, which is a crucial determinant of optimal virulence in the host [577–581]. The mechanisms of iron and glucose competition in pathogens have been qualitatively analyzed in terms of iron-mediated gene expression [10,242]. In this study, by contrast, dual transcriptome data of host and pathogen and a dynamic interaction model enable quantitative descriptions of the molecular mechanisms associated with pathogen resource competition and interspecies cross talk with host counterparts from the systems biology perspective during the innate and adaptive phases. Furthermore, the corresponding host responses to pathogen resource competition, which are less studied, are identified and analyzed systematically based on the constructed innate and adaptive HP-PPINs. Defenses and offenses of host and pathogen have been the typical objects of study in infectious diseases [582,583], but the coordination of defenses and offenses in each phase with other atypical molecular mechanisms remains poorly characterized. Therefore after the defensive and offensive molecular mechanisms of the host and pathogen are identified during the innate and adaptive phases, the cross talk between these molecular mechanisms can be further investigated by observing and comparing the interaction strengths between the proteins and cellular functions in the HP-PPINs. In the innate pathogen exposure phase, host molecular mechanisms lack specificity for the pathogen, in this case C. albicans. Nevertheless, immunological memory of the initial pathogen challenge can modulate subsequent specific host molecular mechanisms, although the ensuing coordination of host molecular mechanisms has not been made clear.
Hence, in this chapter, we first propose a systematic method for the quantitative analysis of interaction strengths in the constructed PPI networks, then examine the defensive and offensive molecular mechanisms of the host and pathogen during the
innate and adaptive phases, and finally investigate interspecies and intraspecies cross talk (Fig. 13.1). In addition, host subnetworks adjacent to HPIs indicate that neuroimmune molecular mechanisms may also be mediated by immunological memory. The comparisons between the inferred PPI networks in both phases therefore allow the division of host–pathogen dynamic interaction systems into innate and adaptive loops. These observations of cross talk between known pathways indicate the mechanisms by which immunological memory in the adaptive loop can coordinate host molecular mechanisms to achieve specific defense against the pathogen. In particular, pathogens can enhance intraspecific cross talk and abrogate host apoptosis to cope with the enhanced host defensive mechanisms during the adaptive phase. These observations of host–pathogen dynamic interaction systems may form the basis for future biomedical applications.
13.2 Materials and methods to coordinate defensive molecular mechanisms in innate and adaptive host–pathogen networks
13.2.1 Transcriptome datasets
The transcriptome datasets include the simultaneously recorded temporal expression profiles of zebrafish and C. albicans during the innate phase immediately after the first pathogen exposure (GSE32119 [8,374]) and the simultaneously recorded temporal expression profiles of zebrafish and C. albicans during the adaptive response to the secondary pathogen exposure (GSE51603 [10]). Details of the experimental procedures are described in a previous study [320]. In the first dataset, C. albicans (SC5314 strain) is intraperitoneally injected into adult AB-strain zebrafish (first exposure), and microarray experiments are performed to simultaneously profile genome-wide expression in C. albicans and zebrafish during the innate phase of the infection process. The second dataset includes the genome-wide expression data for C. albicans and zebrafish during the adaptive phase after the secondary exposure (14 days after the first exposure) to C. albicans. Then, a two-step homogenized mRNA extraction procedure is performed using the whole C. albicans-infected zebrafish. This approach provides separate pools of gene transcripts from hosts and pathogens, and the individual estimates of the corresponding specific gene expression profiles of sequence-targeted probes are derived from the individual genomes. Agilent in situ oligonucleotide microarrays covering 6202 and 26,206 genes for C. albicans and zebrafish, respectively, are used to record the temporal gene expression. The first dataset comprises three replicates of host and pathogen gene expression data from 0.5, 1, 2, 4, 6, 8, 12, 16, and 18 h postinjection, and the second dataset comprises two replicates of host and pathogen gene expression data from 2, 6, 12, 18, 24, 30, 36, and 42 h postreinjection.
13.2.2 Dynamic protein–protein interaction model
To construct a dynamic PPI network by a systems biology method, a candidate network of PPIs is first built by mining several PPI databases (BioGRID [584], STRING [585], and REACTOME [540]), and the false-positive interactions are then eliminated to obtain the resulting PPI networks based on temporal microarray data, the dynamic PPI model, and a model order detection method.
FIGURE 13.1 Overview of the dynamic HP-PPIN construction and analysis procedure. (A) The zebrafish–C. albicans candidate PPI network is constructed based on PPI information from the BioGRID, STRING, and REACTOME databases and ortholog information from the InParanoid database. (B and C) Dual transcriptome data from the innate and adaptive phases are used to prune the candidate network, to identify interaction strengths between proteins in the dynamic interaction model, and to obtain the dynamic innate and adaptive HP-PPINs, respectively. Blue and green nodes represent host and pathogen proteins, respectively. (D) Functional enrichment analyses revealed defensive, offensive, and atypical functions in host and pathogen [594]. HP-PPIN, Host–pathogen protein–protein interaction network; PPIN, protein–protein interaction network.
13.2.3 Candidate protein–protein interaction network
To construct a candidate PPI network, interaction information for zebrafish–zebrafish, C. albicans–C. albicans, and zebrafish–C. albicans protein pairs is collected first. However, the available information for all three interaction types is incomplete, which results in incomplete candidate networks. Further, it is computationally infeasible to consider all possible interactions between proteins. Thus the interaction information from human and yeast (Saccharomyces cerevisiae), which bear genetic similarities to zebrafish and C. albicans, is used to fill the information gaps. To infer the candidate interactions of zebrafish and C. albicans, ortholog information from InParanoid [375] is employed to convert the interactions of human and yeast proteins into those of zebrafish and C. albicans proteins (Fig. 13.1A). In total, 1216 pathogen proteins and 1087 host proteins are included in the candidate PPI network, comprising 5347 host–host interactions, 3634 HPIs, and 16,622 pathogen–pathogen interactions. Interactions collected from databases and inferred by the ortholog-based methods are derived under various experimental conditions and may not be present during the C. albicans infection process. Consequently, these false-positive interactions in the candidate PPI network should be identified and removed to obtain the real PPI network using the corresponding experimental data and the model order detection method (Fig. 13.1B). To prune the false-positive PPI information in the candidate network, system identification techniques are applied to the microarray data. After the false-positive interactions are deleted from the candidate PPI network, the dynamic HP-PPINs in the innate and adaptive phases are generated using the dual transcriptome data of host and pathogen and the following dynamic interaction model.
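The ortholog-based conversion described above can be illustrated with a minimal sketch: each interaction between two source-species proteins is mapped to every pair of their target-species orthologs. The identifiers, the dictionary layout, and the function name are hypothetical; this is not the InParanoid data format, and the chapter does not state how one-to-many orthologs or self-pairs were handled, so expanding all ortholog combinations and dropping self-pairs are assumptions.

```python
from itertools import product


def transfer_interactions(source_ppis, orthologs):
    """Map source-species PPIs to candidate target-species PPIs via orthologs.

    source_ppis: iterable of (protein_a, protein_b) pairs in the source species.
    orthologs:   dict mapping a source protein to a list of target orthologs.
    Pairs with a protein lacking an ortholog are skipped; self-pairs are dropped.
    """
    inferred = set()
    for a, b in source_ppis:
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:
                inferred.add(frozenset((ta, tb)))
    return inferred


# Hypothetical identifiers: "h*" are source (e.g., human) proteins,
# "z*" are their target (e.g., zebrafish) orthologs.
source_ppis = [("hA", "hB"), ("hB", "hC"), ("hA", "hD")]
orthologs = {"hA": ["zA"], "hB": ["zB1", "zB2"], "hC": ["zC"]}  # hD has no ortholog

candidates = transfer_interactions(source_ppis, orthologs)
for pair in sorted(tuple(sorted(p)) for p in candidates):
    print(pair)
```

Because every ortholog combination is expanded, one source interaction can yield several candidate interactions, which is one reason the candidate network contains false positives that must later be pruned against the expression data.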
13.2.4 Dynamic interaction model and model order detection method
The total numbers of host and pathogen proteins are denoted as N and M, respectively, and the dynamic interaction model of a host protein i in the HP-PPIN is described as follows [6]:

$$p_i^{(h)}[k+1] = \sigma_i^{(h)} p_i^{(h)}[k] + \sum_{n=1}^{N} \alpha_{in}^{(h)} p_n^{(h)}[k]\, p_i^{(h)}[k] + \sum_{m=1}^{M} \gamma_{im}\, p_m^{(p)}[k]\, p_i^{(h)}[k] + \beta_i^{(h)} + \varepsilon_i^{(h)}[k+1] \tag{13.1}$$

where $p_i^{(h)}[k]$ denotes the expression of host protein i at time k, $\varepsilon_i^{(h)}[k]$ denotes the measurement noise at time k, $\sigma_i^{(h)}$ indicates the transition ability from the current (at time k) to the future (at time k + 1) expression level of host protein i, and $\alpha_{in}^{(h)}$ indicates the interaction strength between host proteins n and i. Hence, if there is no interaction between host proteins n and i, $\alpha_{in}^{(h)} = 0$. We also assume that there is no self-interaction of protein i ($\alpha_{ii}^{(h)} = 0$). $\gamma_{im}$ indicates the interaction strength between pathogen protein m and host protein i; if there is no interaction between pathogen protein m and host protein i, $\gamma_{im} = 0$. $\beta_i^{(h)}$ indicates the basal level of host protein i. The term "transition ability" is coined to evaluate the impact of current protein levels ($p_i^{(h)}[k]$) on future protein levels ($p_i^{(h)}[k+1]$). Multiple factors, such as protein degradation rates, can affect the transition abilities.
Similarly, the dynamic interaction model of a pathogen protein j in the HP-PPIN can be described as follows:

$$p_j^{(p)}[k+1] = \sigma_j^{(p)} p_j^{(p)}[k] + \sum_{m=1}^{M} \alpha_{jm}^{(p)} p_m^{(p)}[k]\, p_j^{(p)}[k] + \sum_{n=1}^{N} \gamma_{jn}\, p_n^{(h)}[k]\, p_j^{(p)}[k] + \beta_j^{(p)} + \varepsilon_j^{(p)}[k+1] \tag{13.2}$$
where $p_j^{(p)}[k]$ denotes the expression of pathogen protein j at time k, $\varepsilon_j^{(p)}[k]$ denotes the measurement noise at time k, $\sigma_j^{(p)}$ indicates the transition ability from the current (at time k) to the future (at time k + 1) protein level of pathogen protein j, $\alpha_{jm}^{(p)}$ indicates the interaction strength between pathogen proteins m and j ($\alpha_{jm}^{(p)} = 0$ if there is no interaction between pathogen proteins m and j; $\alpha_{jj}^{(p)} = 0$), $\gamma_{jn}$ indicates the interaction strength between host protein n and pathogen protein j ($\gamma_{jn} = 0$ if there is no interaction between host protein n and pathogen protein j), and $\beta_j^{(p)}$ indicates the basal expression of pathogen protein j.
The dynamic interaction model has a direct biological interpretation. The future (at time k + 1) expression of host (pathogen) protein i (j) is determined by its current (at time k) expression through the transition ability $\sigma_i^{(h)}$ ($\sigma_j^{(p)}$), by the interactions between host (pathogen) protein i (j) and other host (pathogen) proteins with interaction strengths $\alpha_{in}^{(h)}$ ($\alpha_{jm}^{(p)}$), by the interactions between host (pathogen) protein i (j) and pathogen (host) proteins with interaction strengths $\gamma_{im}$ ($\gamma_{jn}$), by its basal level $\beta_i^{(h)}$ ($\beta_j^{(p)}$), and by the measurement noise $\varepsilon_i^{(h)}$ ($\varepsilon_j^{(p)}$). Therefore the dynamic interaction model for host protein i over K + 1 time points (k = 1, ..., K + 1) can be rewritten in the following regression form:

$$p_i^{(h)} = \Phi_i^{(h)} \theta_i^{(h)} + \varepsilon_i^{(h)} \tag{13.3}$$

where

$$p_i^{(h)} = \begin{bmatrix} p_i^{(h)}[2] & \cdots & p_i^{(h)}[K+1] \end{bmatrix}^T, \qquad \varepsilon_i^{(h)} = \begin{bmatrix} \varepsilon_i^{(h)}[2] & \cdots & \varepsilon_i^{(h)}[K+1] \end{bmatrix}^T,$$

$$\theta_i^{(h)} = \begin{bmatrix} \alpha_{i1}^{(h)} & \cdots & \alpha_{iN}^{(h)} & \gamma_{i1} & \cdots & \gamma_{iM} & \sigma_i^{(h)} & \beta_i^{(h)} \end{bmatrix}^T,$$

and

$$\Phi_i^{(h)} = \begin{bmatrix}
p_1^{(h)}[1]\,p_i^{(h)}[1] & \cdots & p_N^{(h)}[1]\,p_i^{(h)}[1] & p_1^{(p)}[1]\,p_i^{(h)}[1] & \cdots & p_M^{(p)}[1]\,p_i^{(h)}[1] & p_i^{(h)}[1] & 1 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
p_1^{(h)}[K]\,p_i^{(h)}[K] & \cdots & p_N^{(h)}[K]\,p_i^{(h)}[K] & p_1^{(p)}[K]\,p_i^{(h)}[K] & \cdots & p_M^{(p)}[K]\,p_i^{(h)}[K] & p_i^{(h)}[K] & 1
\end{bmatrix}$$
The dynamic model for pathogen protein j over K + 1 time points (k = 1, ..., K + 1) can also be rewritten in a similar regression form:

$$p_j^{(p)} = \Phi_j^{(p)} \theta_j^{(p)} + \varepsilon_j^{(p)} \tag{13.4}$$

where

$$p_j^{(p)} = \begin{bmatrix} p_j^{(p)}[2] & \cdots & p_j^{(p)}[K+1] \end{bmatrix}^T, \qquad \varepsilon_j^{(p)} = \begin{bmatrix} \varepsilon_j^{(p)}[2] & \cdots & \varepsilon_j^{(p)}[K+1] \end{bmatrix}^T,$$

$$\theta_j^{(p)} = \begin{bmatrix} \alpha_{j1}^{(p)} & \cdots & \alpha_{jM}^{(p)} & \gamma_{j1} & \cdots & \gamma_{jN} & \sigma_j^{(p)} & \beta_j^{(p)} \end{bmatrix}^T,$$

and

$$\Phi_j^{(p)} = \begin{bmatrix}
p_1^{(p)}[1]\,p_j^{(p)}[1] & \cdots & p_M^{(p)}[1]\,p_j^{(p)}[1] & p_1^{(h)}[1]\,p_j^{(p)}[1] & \cdots & p_N^{(h)}[1]\,p_j^{(p)}[1] & p_j^{(p)}[1] & 1 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
p_1^{(p)}[K]\,p_j^{(p)}[K] & \cdots & p_M^{(p)}[K]\,p_j^{(p)}[K] & p_1^{(h)}[K]\,p_j^{(p)}[K] & \cdots & p_N^{(h)}[K]\,p_j^{(p)}[K] & p_j^{(p)}[K] & 1
\end{bmatrix}$$
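As a rough numerical illustration of the regression form in Eq. (13.3), the sketch below assembles the regression matrix for one host protein from synthetic expression data and fits the parameters. It is only a sketch under stated assumptions: it uses unconstrained ordinary least squares from NumPy as a stand-in for the constrained solver used in the chapter, it drops the self-interaction column to enforce $\alpha_{ii}^{(h)} = 0$, and the function name `build_regression` and all numerical values are invented for the example.

```python
import numpy as np


def build_regression(host, pathogen, i):
    """Assemble the regression matrix and target vector for host protein i.

    host:     array of shape (N, K+1), host expression over time points 1..K+1.
    pathogen: array of shape (M, K+1), pathogen expression over the same points.
    Columns of Phi: host-host products (self excluded), pathogen-host products,
    the current level of protein i, and a constant for the basal level.
    """
    n_host = host.shape[0]
    p_i = host[i, :-1]                              # p_i[k], k = 1..K
    others = [n for n in range(n_host) if n != i]   # alpha_ii = 0 by assumption
    phi = np.column_stack([
        (host[others, :-1] * p_i).T,                # p_n[k] * p_i[k]
        (pathogen[:, :-1] * p_i).T,                 # p_m[k] * p_i[k]
        p_i,                                        # multiplies sigma_i
        np.ones_like(p_i),                          # multiplies beta_i
    ])
    target = host[i, 1:]                            # p_i[k+1], k = 1..K
    return phi, target


# Synthetic, noise-free data generated from known (invented) parameters.
rng = np.random.default_rng(0)
K = 40
host = rng.uniform(0.5, 1.5, size=(3, K + 1))
pathogen = rng.uniform(0.5, 1.5, size=(2, K + 1))
alpha = np.array([0.10, -0.05])    # host-host strengths for proteins 1 and 2
gamma = np.array([-0.10, 0.05])    # pathogen-host strengths
sigma, beta = 0.5, 1.0
i = 0
host[i, 0] = 1.0
for k in range(K):
    drive = alpha @ host[1:, k] + gamma @ pathogen[:, k]
    host[i, k + 1] = (sigma + drive) * host[i, k] + beta

Phi, target = build_regression(host, pathogen, i)
theta_hat, *_ = np.linalg.lstsq(Phi, target, rcond=None)
theta_true = np.concatenate([alpha, gamma, [sigma, beta]])
print(np.round(theta_hat, 4))
```

Because the synthetic data contain no noise, the fitted vector should match the generating parameters up to numerical precision; with real transcriptome data the fit would be approximate, and sign or bound constraints would require a constrained solver.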
The unknown parameters $\theta_i^{(h)}$ ($\theta_j^{(p)}$) can then be estimated using the least-squares estimation method described in Chapter 2. These constrained least-squares problems are solved using the lsqlin function in MATLAB with the active-set algorithm. The residual sum of squares (RSS) is calculated for each gene, and the distribution of the RSS values is plotted. Some examples of the comparison between measurement and estimation are presented in Fig. 13.A1 in the Appendix.
Because host and pathogen protein expression data are unavailable, gene expression levels measured in the dual transcriptome data are used as a substitute for protein levels to estimate the parameters $\theta_i^{(h)}$ ($\theta_j^{(p)}$) in the dynamic interaction models. Although protein and gene expression levels do not always correspond accurately, gene expression is a reasonable proxy for protein expression in the parameter estimation problems of Eqs. (13.3) and (13.4) [586]. Before parameter estimation, the expression data are interpolated using cubic spline interpolation to avoid overfitting. To limit unnecessary model complexity, the Akaike information criterion (AIC) is introduced to detect the model order (the number of interactions) during the parameter estimation of the dynamic HP-PPINs in Eqs. (13.1) and (13.2). For each host protein i with N′ interacting host proteins and M′ interacting pathogen proteins, the AIC value of its dynamic interaction model can be calculated as follows [40]:

$$\mathrm{AIC}_i(N' + M') = \log\left( \frac{1}{K+1} \left\| p_i^{(h)} - \Phi_i^{(h)} \hat{\theta}_i^{(h)} \right\|_2^2 \right) + \frac{2(N' + M')}{K+1} \tag{13.5}$$

where $\hat{\theta}_i^{(h)}$ denotes the estimated parameters under the assumption that there are N′ interacting host proteins and M′ interacting pathogen proteins, and K + 1 is the size of the data used to estimate $\hat{\theta}_i^{(h)}$. For each pathogen protein j with N′ interacting host proteins and M′ interacting pathogen proteins, the AIC value of its dynamic interaction model can be calculated as follows:

$$\mathrm{AIC}_j(N' + M') = \log\left( \frac{1}{K+1} \left\| p_j^{(p)} - \Phi_j^{(p)} \hat{\theta}_j^{(p)} \right\|_2^2 \right) + \frac{2(N' + M')}{K+1} \tag{13.6}$$
where $\hat{\theta}_j^{(p)}$ denotes the estimated parameters assuming N′ interacting host proteins and M′ interacting pathogen proteins, and K + 1 is the size of the data used to estimate $\hat{\theta}_j^{(p)}$. The model order (the number of interactions) with the minimum AIC value is used as the criterion to prune false-positive interactions in the candidate HP-PPIN. Specifically, insignificant estimated interaction strengths that fall outside the model order are considered false-positive interactions and are pruned from the candidate HP-PPIN to obtain the realistic HP-PPINs. Hence, the resulting dynamic HP-PPINs comprise dynamic HPI models whose model orders have the minimum AIC value. Finally, after the parameters in the dynamic interaction models are identified for each host and pathogen protein based on the two-sided transcriptome data of host and pathogen from the innate and adaptive phases, the identified interaction parameters ($\alpha_{in}^{(h)}$, $\alpha_{jm}^{(p)}$, $\gamma_{im}$, and $\gamma_{jn}$) complete the resulting dynamic innate and adaptive HP-PPINs (Fig. 13.1C).
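The AIC-based pruning described in this section can be sketched as follows. The chapter does not state how candidate interactions are ordered before the model order is chosen, so this example assumes one plausible scheme: fit the full model, rank the interaction terms by the absolute value of their estimates, and refit nested models that keep the top q interactions. The AIC is computed in the form of Eq. (13.5), with the number of regression rows used as the data size. The function names `aic` and `select_order` and the synthetic data are illustrative assumptions.

```python
import numpy as np


def aic(residual, n_interactions):
    """AIC in the form of Eq. (13.5); data size = number of regression rows."""
    n = residual.size
    return np.log(residual @ residual / n) + 2.0 * n_interactions / n


def select_order(phi_inter, phi_base, target):
    """Choose which candidate interactions to keep by minimum AIC.

    phi_inter: (rows, Q) regressors for the Q candidate interactions.
    phi_base:  (rows, 2) regressors for the transition ability and basal level.
    Assumed scheme: rank interactions by |estimate| in the full model, then
    compare nested models that retain the top q interactions, q = 0..Q.
    """
    n_candidates = phi_inter.shape[1]
    full = np.column_stack([phi_inter, phi_base])
    theta_full, *_ = np.linalg.lstsq(full, target, rcond=None)
    ranking = np.argsort(-np.abs(theta_full[:n_candidates]))
    best = None
    for q in range(n_candidates + 1):
        cols = ranking[:q]
        x = np.column_stack([phi_inter[:, cols], phi_base])
        theta, *_ = np.linalg.lstsq(x, target, rcond=None)
        score = aic(target - x @ theta, q)
        if best is None or score < best[0]:
            best = (score, sorted(cols.tolist()))
    return best


# Synthetic example: 5 candidate interactions, only indices 0 and 3 are real.
rng = np.random.default_rng(1)
rows = 60
phi_inter = rng.normal(size=(rows, 5))
phi_base = np.column_stack([rng.normal(size=rows), np.ones(rows)])
true_strengths = np.array([2.0, 0.0, 0.0, -1.5, 0.0])
target = (phi_inter @ true_strengths
          + phi_base @ np.array([0.5, 1.0])
          + 0.05 * rng.normal(size=rows))

score, kept = select_order(phi_inter, phi_base, target)
print("kept interaction indices:", kept)
```

Interactions outside the selected set would be treated as false positives and removed from the candidate network. Note that AIC can occasionally retain a spurious term, so the selected set should contain the true interactions but is not guaranteed to equal them.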
13.3 Defensive and offensive molecular mechanisms based on the innate and adaptive HP-PPINs
13.3.1 Overview of dynamic innate and adaptive host–pathogen protein–protein interaction networks
In this chapter, two dynamic PPINs are constructed for two biologically related conditions: the innate and adaptive phases. Dual transcriptome data from host and pathogen in each phase are used to estimate interaction strengths between proteins in the dynamic innate and adaptive HP-PPINs. In these analyses the magnitudes of the edges (interaction strengths) in the constructed HP-PPINs are used as quantitative estimates of the effect of one protein on its interacting proteins, and the changes in edges between phases imply that the constructed HP-PPINs are dynamic systems. Table 13.1 summarizes the basic information for these dynamic HP-PPINs, including the numbers of nodes and edges in the dynamic innate and adaptive HP-PPINs (Fig. 13.1C). Although most host and pathogen proteins are common to the innate and adaptive phases, their interactions differ significantly between the two phases. Hence, both host and pathogen use different interactions between similar sets of proteins to respond to challenges during the innate and adaptive phases. GO annotation and functional enrichment analyses of the proteins in the constructed HP-PPINs (Fig. 13.1D) are therefore performed to identify the main biological processes involved in the innate and adaptive phases (Fig. 13.2). During the innate phase, all interspecies interactions are negative, indicating that the host and pathogen could inhibit each other. In contrast, some positive interspecies interactions are identified in the adaptive phase, indicating enhanced host offenses that are specific to C. albicans. In addition, some atypical cellular functions relating to immunity are indicated in the host, namely, the gonadotropin-releasing hormone receptor pathway, Parkinson's disease, and circadian clock systems. These neuroimmune functions of the host may be modulated by the immunological memory [587–590], in addition to typical host immune-related cellular functions such as inflammation, integrin signaling, and angiogenesis. Consequently, further examination of the HP-PPINs of biological processes enriched in the innate and adaptive phases may point out the changes required to coordinate host and pathogen molecular mechanisms from the innate to the adaptive phase.

TABLE 13.1 Node and edge information of dynamic innate and adaptive host–pathogen protein–protein interaction networks.
Node                 Innate specific   Common   Adaptive specific
Host                 55                856      130
Pathogen             30                1102     77
Edge                 Innate specific   Common   Adaptive specific
Host–host            981               633      865
Host–pathogen        570               155      374
Pathogen–pathogen    2356              714      1664

FIGURE 13.2 HP-PPINs of biological processes enriched in innate and adaptive phases. Innate (A) and adaptive (B) phase-specific networks of enriched biological processes are generated from dynamic innate and adaptive HP-PPINs (Fig. 13.1C), respectively. Node sizes of biological processes indicate the numbers of included proteins. Blue and green nodes represent host and pathogen biological processes, respectively, and red and blue edges represent positive and negative interaction strengths between corresponding connecting processes, respectively. Darker edges indicate larger absolute interaction strengths between two biological processes. The cellular functions in open brackets are subjects of subsequent analyses [594]. HP-PPIN, host–pathogen protein–protein interaction network.
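To make the contrast between shared proteins and shared interactions in Table 13.1 concrete, the short calculation below computes, for each row of the table, the fraction of entries common to both phases out of all entries observed in either phase. The counts are taken from Table 13.1; the choice of this particular overlap measure and the name `common_fraction` are assumptions made for illustration.

```python
# Counts from Table 13.1: (innate specific, common, adaptive specific).
table_13_1 = {
    "host nodes": (55, 856, 130),
    "pathogen nodes": (30, 1102, 77),
    "host-host edges": (981, 633, 865),
    "host-pathogen edges": (570, 155, 374),
    "pathogen-pathogen edges": (2356, 714, 1664),
}


def common_fraction(innate_only, common, adaptive_only):
    """Share of entries present in both phases among all entries in either phase."""
    return common / (innate_only + common + adaptive_only)


fractions = {name: common_fraction(*counts) for name, counts in table_13_1.items()}
for name, value in fractions.items():
    print(f"{name:26s} {value:.1%}")
```

Under this measure, more than 80% of host and pathogen proteins are shared between the phases, whereas well under 30% of the interactions in each class are shared, which is the quantitative basis for the statement that similar protein sets are rewired between the innate and adaptive phases.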
13.3.2 Interspecies cross talk between host immune-related molecular mechanisms and their pathogen counterparts
Hosts can activate and coordinate innate and/or adaptive immune-related components immediately after pathogen invasion according to their previous experiences of pathogen exposure. Moreover, both innate and adaptive immune systems act as defensive mechanisms against pathogen invasion and contain several coordinated molecular mechanisms, including angiogenesis, inflammation, and integrin signaling. Nevertheless, the coordination of these defensive molecular mechanisms with their pathogen counterparts has received less attention; that is, the pathogen functions that interact with host defensive mechanisms, and the types of these interactions, are poorly known. Thus, to gain insight into the interactions among immune-related cellular functions and with the pathogen, 120 and 126 host proteins are initially selected from the constructed dynamic innate and adaptive HP-PPINs, respectively, based on the Gene Ontology (GO) annotation GO:0002376 (immune system process). Accordingly, pathogen counterparts to the immune-related proteins are defined as the ensemble of pathogen proteins with direct connections to immune-related proteins. These host and pathogen proteins and their interactions are further examined in the following analysis.
Only 4 and 10 host proteins are found to be specific to the innate and adaptive phases, respectively, and 116 host proteins are found to be common to both. These host immune-related proteins are then divided into several cellular functions according to the PANTHER classification system [464] as follows: angiogenesis, inflammation mediated by chemokines and cytokines, notch, interferon-γ (IFN-γ), integrin, toll receptor, blood coagulation, α-adrenergic receptor (α-AR), TGF-β, and apoptosis signaling pathways. Moreover, the corresponding proteins in the pathogen counterpart are classified into transferase, transporter, oxidoreductase, and hydrolase activities based on GO annotations. Accordingly, these 14 functions (10 in the host and 4 in the pathogen) are organized into subnetworks by summarizing the interaction strengths between members of these functions in the innate and adaptive phases, respectively. Fig. 13.3 shows the connectivity between host immune-related molecular mechanisms and their pathogen counterparts during the innate and adaptive phases at the cellular functional and molecular levels.
During the innate phase, angiogenesis, apoptosis, and notch signaling pathways are the host cellular functions with direct pathogen interactions (Fig. 13.3A). Angiogenesis can access information about C. albicans from multiple growth factor signaling pathways that are responsible for pathogen recognition and the initiation of innate immune responses [591]. Specifically, Notch1b in the angiogenesis and notch signaling pathways interacts negatively with the pathogen transferase Cdc4, which is also involved in filamentous growth and cell cycle phase transition.
Moreover, Fosab in angiogenesis and apoptosis interacts positively with Cek1, which promotes C. albicans survival under unfavorable conditions [592]. In the apoptosis hub, Hspa8 has multiple interactions with pathogen proteins, with negative interaction strengths between Hspa8 and the pathogen glucose transporter (Hgt6) but positive interaction strengths with pathogen transporters for iron (C1_09210C_A), succinate (Sfc1), H+/Ca2+ (Vcx1), and PI3P (Vps17; Table 13.2). The role of the differential interactions between the heat shock protein and the glucose and other transporters during the innate phase is still unclear.
During the adaptive phase, the number of host immune-related cellular functions that interact with pathogen counterparts is increased in comparison with the innate phase (Fig. 13.3C). Host immune-related cellular functions including IFN-γ, TGF-β, and toll receptor signaling pathways interact with the pathogen in addition to angiogenesis, apoptosis, and notch signaling pathways. Mapk12b in the IFN-γ and toll receptor signaling pathways interacts positively with Hog1, which is a pathogen MAP kinase of osmotic, heavy metal, and core stress responses. Further, in the TGF-β signaling pathway, TANK-binding kinase 1 (Tbk1) interacts positively with C4_03600C_A, which is related to protein sumoylation in yeast and probably inhibits C. albicans growth and adaptation [593].
In comparisons of the subnetworks shown in Fig. 13.3B and D, a repressilator structure among the angiogenesis, IFN-γ, and inflammation signaling pathways of the host emerges in both phases. Because the host can exhibit stable oscillations of immune responses through the repressilator structure, immune responses can be maintained in proportion to the stimulus from the pathogen, thus preserving the continuity of other host cellular functions. Consequently, the repressilator can be viewed as a self-protection mechanism of the host under conditions of no stimulation and no immune responses, and in the presence of pathogens, appropriate immune responses could also avoid unnecessary damage to host tissues. In addition to the repressilator structure, cross talk among pathogen cellular functions is greater during the innate phase than during the adaptive phase. This phenomenon may reflect the effect of immunological memory, which can drive changes in the coordination of host immune-related functions, enabling the specific inhibition of the cross talk between pathogen cellular functions during the adaptive phase. This immunological memory can also enable the apoptotic Hspa8 to interact negatively with a wider range of pathogen transporters than only the glucose transporter targeted in the innate phase, causing a more substantial restriction of nutrient availability and pathogen growth. In addition to transporters, energy metabolism is also affected by host immune-related functions, warranting a subsequent focus on the resource competition-related molecular mechanisms of pathogens and their host counterparts.

FIGURE 13.3 Interspecies cross talk of host immune-related cellular functions and their pathogen counterparts during innate and adaptive phases at functional and molecular levels. (A) and (C) denote connectivity between the known pathways of innate and adaptive phases, respectively, and red and blue boxes represent positive and negative interaction strengths, respectively, between the members of two pathways. Darker boxes represent larger absolute interaction strengths between two pathways. Names of host and pathogen functions are indicated with upper and lower case letters, respectively. (B) and (D) denote the subnetworks of host immune-related functions and their pathogen counterparts during the innate and adaptive phases, respectively, and blue and green nodes represent host and pathogen proteins, respectively. Red and blue edges represent positive and negative interaction strengths, respectively, between connecting proteins [594].

TABLE 13.2 Downregulated and upregulated interactions between the cellular functions and members of cellular functions in the intraspecific and interspecific cross talk of host immune-related cellular functions and their pathogen counterparts [594].
Names of cellular functions are indicated in bold (upper case: host and lower case: pathogen). * Indicates the yeast ortholog of Candida albicans protein.
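The step in Section 13.3.2 that organizes functions into subnetworks "by summarizing the interaction strengths between members of these functions" can be sketched as a simple aggregation. The example assumes that the summary is a plain sum of protein-level strengths over all pairs of distinct functions; the chapter does not specify the summary statistic, so this is only one possibility. The protein-to-function assignments follow the text (upper case for host functions, lower case for pathogen functions), while the numerical strengths and the function name `summarize_cross_talk` are invented.

```python
from collections import defaultdict


def summarize_cross_talk(edges, membership):
    """Aggregate protein-level interaction strengths to the function level.

    edges:      iterable of (protein_a, protein_b, strength).
    membership: dict mapping a protein to the set of functions it belongs to.
    Returns {(function_x, function_y): summed strength} for distinct functions.
    """
    totals = defaultdict(float)
    for a, b, strength in edges:
        for fa in membership.get(a, ()):
            for fb in membership.get(b, ()):
                if fa != fb:
                    totals[tuple(sorted((fa, fb)))] += strength
    return dict(totals)


# Function membership follows the text; strengths are illustrative values only.
membership = {
    "Notch1b": {"ANGIOGENESIS", "NOTCH"},
    "Hspa8": {"APOPTOSIS"},
    "Cdc4": {"transferase"},
    "Hgt6": {"transporter"},
    "Sfc1": {"transporter"},
}
edges = [
    ("Notch1b", "Cdc4", -0.5),   # negative interaction, as described for the innate phase
    ("Hspa8", "Hgt6", -0.25),    # negative interaction with the glucose transporter
    ("Hspa8", "Sfc1", 0.5),      # positive interaction with the succinate transporter
]

function_level = summarize_cross_talk(edges, membership)
for pair, strength in sorted(function_level.items()):
    print(pair, strength)
```

A sum lets interactions of opposite sign within the same pair of functions partially cancel, as with the two Hspa8 edges here; a mean or a separate tally of positive and negative edges would be equally reasonable design choices.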
13.3.3 Interspecies cross talk of pathogen resource competition-related molecular mechanisms and their host counterparts
To examine this interspecies cross talk, 10 and 13 genes related to iron and glucose competition, respectively, are selected from the constructed dynamic HP-PPINs based on the Candida Genome Database (CGD) [377]. The protein products of these 23 genes are involved in iron and glucose utilization in C. albicans and are hence referred to as resource competition-related proteins. Host counterparts are defined as the ensemble of host proteins with direct connections to pathogen resource competition-related proteins in the innate and adaptive HP-PPINs and are further divided into cytokinesis, translation, circadian, glycogen, and apoptosis functional groups according to GO annotations. The resulting seven cellular functions are then organized into the subnetworks of the innate and adaptive phases. The connectivity between pathogen resource competition-related molecular mechanisms and host counterparts in the innate and adaptive phases is indicated at the cellular functional and molecular levels (Fig. 13.4).
During the innate phase, both iron and glucose competition interact positively with host cytokinesis, while iron competition interacts negatively with host translation and glucose competition interacts positively with host glycogen metabolic processes (Fig. 13.4A). In particular, Vps28, an endosomal sorting complex component required for the ESCRT-I transport pathway, and Pgi1, which participate in the iron and glucose competition, are found to interact positively with the host actin protein Actb1 and to promote host cytokinesis (Table 13.3), possibly leading to the failure of cytokinesis under conditions of limited resources [595]. Besides, among the iron competition mechanisms, Sla1 is found to interact negatively with the host translation termination protein Gspt1, possibly allowing the disruption of the translation termination process and cell cycle arrest [596]. In agreement, Gsy1 has been reported to promote glycogen biosynthesis and to affect host translation initiation (Ddx18) through the host glycogen metabolic process (Ppp1cab) [597].
During the adaptive phase, host translation processes are not regulated by pathogen resource competition-related cellular functions. While the effects of resource competition on host cytokinesis are reduced, host apoptosis signaling is also affected by the resource competition-related cellular functions (Fig. 13.4C). Snf7, a component of the endosomal sorting complex required for the ESCRT-III transport pathway, interacts negatively with the host actin protein Actb1 to inhibit host cytokinesis. The heat shock proteins Hspa8 and Hspa5 also interact negatively with pathogen resource competition-related cellular functions, suggesting that the pathogen may abrogate host cell apoptosis to achieve a successful invasion during the adaptive phase [598].
A comparison of the two subnetworks of resource competition-related proteins and host counterparts (Fig. 13.4B and D) points out the effects of immunological memory on the coordination of host cytokinesis, translation, and apoptosis and reveals the interactions with pathogen iron and glucose competition. During the innate phase the pathogen can promote host cytokinesis, compete for host resources, and interfere with host translation processes, and the resulting pressure on the resource supply weakens host immunity. Nevertheless, during the adaptive phase, the host has the benefit of immunological memory to avoid the resource restriction. Afterward, pathogen resource competition-related molecular mechanisms engage in intraspecific cross talk to ensure a sufficient resource supply, and the pathogen then blocks the host apoptosis signaling pathway to avoid an attack from host cellular functions and achieve a successful invasion [569,598]. These effects of immunological memory on the coordination of defensive and offensive molecular mechanisms in the host and pathogen reveal some relationships between immunological memory and the host cellular functions described earlier.

FIGURE 13.4 Interspecies cross talk of pathogen resource competition-related cellular functions and host counterparts during innate and adaptive phases at functional and molecular levels. Parts (A) and (C) indicate the connectivity between known pathways in innate and adaptive phases, respectively, and red and blue boxes represent positive and negative interaction strengths between the members of the pathways, respectively. Darker boxes represent larger absolute interaction strengths between pathways. Names of host and pathogen cellular functions are denoted in upper and lower case letters, respectively. Parts (B) and (D) denote the subnetworks of pathogen resource competition-related cellular functions and their host counterparts during innate and adaptive phases, respectively. Blue and green nodes represent host and pathogen proteins, respectively, and red and blue edges represent positive and negative interaction strengths, respectively, between connecting proteins [594].

TABLE 13.3 Downregulated and upregulated interactions between the cellular functions and members of two cellular functions in the intraspecific and interspecies cross talk of pathogen resource competition-related cellular functions and host counterparts [594].
Names of functions are indicated in bold (upper case: host and lower case: pathogen).
13.3.4 Impacts of immunological memory on host systems
To further investigate the impact of immunological memory on host systems, host proteins are ranked according to their relative interaction strengths in the innate and adaptive HP-PPINs, and significant influences of immunological memory on interaction strengths are indicated. This comparative analysis can distinguish proteins that are affected by the immunological memory, and the top 10-ranked innate- and adaptive-specific host proteins with direct interactions with pathogens are selected. However, because no significant cellular functions are enriched among the selected proteins, their first nearest neighbors are further considered. Among the innate-specific host proteins, fibroblast, vascular endothelial, and epidermal growth factor signaling pathways are found to be downregulated, indicating positive interaction strengths during the innate phase and zero strength during the adaptive phase. Further, the apoptosis and circadian clock systems are upregulated, indicating negative interaction strengths during the innate phase and zero strength during the adaptive phase. Among the adaptive-specific host proteins, the circadian clock system and the pentose phosphate pathway are upregulated (interaction strength: zero in the innate phase and positive in the adaptive phase), and the circadian clock system and adrenaline biosynthesis are downregulated (interaction strength: zero in the innate phase and negative in the adaptive phase). The pentose phosphate pathway proteins Slc18a2 and Synap23.2 are also involved in serotonin (5HT) receptor signaling, which could influence circadian rhythms.
In addition to the innate- and adaptive-specific host proteins, the proteins involved in common HPIs are further divided into groups of the most and least varied proteins according to the changes in interaction strengths between the innate and adaptive phases. Among the most varied proteins, apoptosis and Parkinson's disease signaling pathways are found to be upregulated, whereas circadian clock and Parkinson's disease signaling pathways are found to be downregulated.
Moreover, among the least varied proteins, DNA replication and Parkinson's disease signaling pathways are found to be upregulated, and the Parkinson's disease signaling pathway is also found to be downregulated. Thus different proteins of the Parkinson's disease signaling pathway are downregulated and upregulated, suggesting tight modulation in both phases [599]. In addition, proteins of α-AR signaling, 5HT receptor signaling, and the circadian clock system are found to be closely related. Indeed, circadian 5HT production is reportedly regulated by adrenergic signaling [600]; the 5HT and circadian systems of the brain are extensively interconnected [601]; and adrenergic nerves are shown to govern circadian leukocyte recruitment to tissues [602]. Therefore the circadian clock, Parkinson's disease, 5HT, and adrenergic signaling pathways are important in HPIs and in the defensive and offensive cellular functions of hosts and pathogens.
Under conditions of poor adaptation of the host system to a specific pathogen, the host can coordinate its molecular responses in the innate loop and form an immunological memory of the pathogen (Fig. 13.5) according to the host system responses and pathogen characteristics during the entry into the adaptive loop. Therefore the coordination of molecular mechanisms (the cellular functions related to the host block in Fig. 13.5) is informed by the corresponding immunological memory, which can operate via an adaptive feedback loop to modulate interactions between host molecular mechanisms. This regulation enables the host to identify and adapt to pathogen stimuli and furnishes leukocytes with a repertoire of specific antibodies. In adaptive control systems, a feedback loop is used to achieve adaptation when the controlled system dynamics and external disturbances are unknown. Similar to an adaptive control system, immunological memory seems to play the role of an adaptive feedback mechanism of the host system that can modulate host molecular mechanisms by regulating the interaction strengths between host and pathogen proteins. In subsequent pathogen challenges, the modulated host cellular functions then exhibit more specific and efficient responses against the pathogen, and finally the enhanced intraspecies pathogen cross talk can disrupt host apoptosis to evade host responses. Therefore both the defensive and offensive cellular functions of the host and pathogen and the neuroimmune cellular functions are modulated by the immunological memory. These neuroimmune cellular functions suggest that the nervous and endocrine systems are also coordinated by the immunological memory, and as a result, the subsequent host defensive and offensive molecular mechanisms are coordinated accordingly.

FIGURE 13.5 Schematic structure of dynamic HPI systems. The dynamic HPI system can be divided into innate and adaptive loops. In the innate loop the initial invasion leads to antigen presentation to the host, resource competition, and interference with host cellular functions. The host then defends itself using the innate immune response, and pathogenic antigens are identified. In the subsequent adaptive loop, the immunological memory of pathogenic antigens regulates the coordination of host cellular functions based on the identified antigens and host responses. The coordinated cellular functions listed on the left- and right-hand sides of the dashed line are typical and atypical to immunity, respectively, reflecting systematic immune organization [594]. HPI, host–pathogen interaction.
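The comparative ranking used in Section 13.3.4 can be sketched as follows. The sketch assumes that each host protein is scored by the summed absolute change of its host–pathogen interaction strengths between the two phases, and that a protein counts as innate specific or adaptive specific when it has host–pathogen interactions in only one phase; the chapter does not give the exact ranking measure, so both choices, the function name `classify_and_rank`, and the toy data are assumptions.

```python
def classify_and_rank(innate, adaptive):
    """Group host proteins by phase specificity and rank them by change.

    innate, adaptive: dict host_protein -> {pathogen_protein: strength}.
    A missing interaction is treated as zero strength.
    Returns a dict of lists of (protein, change), sorted by decreasing change.
    """
    report = {"innate_specific": [], "adaptive_specific": [], "common": []}
    for protein in set(innate) | set(adaptive):
        before = innate.get(protein, {})
        after = adaptive.get(protein, {})
        partners = set(before) | set(after)
        change = sum(abs(after.get(m, 0.0) - before.get(m, 0.0)) for m in partners)
        if before and after:
            group = "common"
        elif before:
            group = "innate_specific"
        else:
            group = "adaptive_specific"
        report[group].append((protein, change))
    for entries in report.values():
        entries.sort(key=lambda item: (-item[1], item[0]))
    return report


# Toy interaction strengths with placeholder protein names.
innate = {
    "hostA": {"pat1": -0.5},
    "hostB": {"pat1": 0.25, "pat2": -0.25},
    "hostC": {"pat2": -0.75},
}
adaptive = {
    "hostB": {"pat1": -0.5, "pat2": -0.25},
    "hostC": {"pat2": -0.5},
    "hostD": {"pat3": 0.5},
}

ranking = classify_and_rank(innate, adaptive)
for group, entries in ranking.items():
    print(group, entries)
```

Within the common group, the head of the sorted list corresponds to the "most varied" proteins and the tail to the "least varied" proteins; a sign change, such as the one for `hostB` and `pat1` here, contributes a large change under this measure.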
13.4 Discussion The present HP-PPINs can highlight the coordination of host and pathogen defensive and offensive molecular mechanisms during innate and adaptive phases and suggest new directions for HPI studies. Particularly, subnetworks of host immunerelated cellular functions and their pathogen counterparts point out the presence of a repressilator structure comprising angiogenesis, IFN-γ, and inflammation signaling and suggest potential
IV. Systems Innate and Adaptive Immunity in the Infection Process
13.4 Discussion
313
therapeutic strategies for resolving inflammation. Once pathogen invasion is detected, the repressilator structure can be initiated to modulate the durations and magnitudes of inflammatory responses according to the interspecific cross talk with pathogen transferases and the host Notch, blood coagulation, and α-AR signaling pathways. Consequently, the host can actively regulate the resolution of inflammation, even under persistent pathogen stimuli [603]. The apoptosis of activated inflammatory cells is crucial to the resolution of inflammation and is found to be a hub in the present subnetworks (Fig. 13.3). Therefore these apoptotic heat shock proteins play important roles in the interspecies cross talk with pathogen resource transporters. Moreover, the comparisons of innate and adaptive HP-PPINs point out differential interactions between heat shock proteins and pathogen glucose transporters as well as iron, succinate, and PI3P transporters, reflecting the regulation of immunological memory and its effects on strategies for resource acquisition and utilization. These observations also point out how these pathogen resource competition-related cellular functions affect host physiology [604].
In contrast with the molecular mechanisms that are regulated by immunological memory, the complement system, a part of the innate immune system, is considered to be a component without adaptation. In agreement, the comparative analyses of innate and adaptive HP-PPINs have shown that the complement system is relatively invariant, and it has therefore been omitted from the networks shown in Fig. 13.3. Nevertheless, a closer examination of innate and adaptive HP-PPINs (Fig. 13.1C) has shown interactions of the complement system with various signaling pathways, including the inflammation, plasminogen activating, EGFR, and integrin signaling pathways. The complement system may therefore interact mainly with host proteins: with the exception of Pkc1 in C. albicans, the first and second nearest neighbors of the complement system in both innate and adaptive HP-PPINs are all host proteins. Nevertheless, the concept of “trained immunity” or “innate immune memory” has been proposed previously [605], warranting a more thorough assessment of the invariance of the complement system in the innate and adaptive phases.
The subnetwork of pathogen resource competition-related cellular functions and their host counterparts has shown the involvement of host translation, cytokinesis, and glycogen metabolism in the ensuing interspecific cross talk. During the innate phase the pathogen could restrict the resource supply to the host by activating cytoskeleton synthesis and thus promoting host glycogenesis and cytokinesis. This coordination may weaken host immunity. However, during the adaptive phase, the pathogen response could increase the host immune activity by both enhancing the cross talk between iron and glucose competition mechanisms and inhibiting apoptosis. These processes are probably characteristic of the changes in pathogen offensive strategies from the innate to the adaptive phase. Besides iron, various micronutrients and trace elements, such as copper, zinc, and magnesium, have recently been shown to be involved in the regulation of virulence and transcription in C. albicans. Nevertheless, insufficient cellular function annotations are available for copper (0), zinc (3), and magnesium (0), compared with those for iron (70) in the CGD, warranting further studies of these micronutrients. In addition, competition with the host endogenous microbiome still needs to be examined using the systems biology approach.
The present innate and adaptive HP-PPINs have pointed out the effects of immunological memory on the interspecific and intraspecific cross talk. Specifically, during the innate
phase, the host can adapt specifically to the pathogen through antigen presentation on dendritic cells and antibody selection in leukocytes. This immunological memory allows more powerful and effective responses during subsequent exposures to the same pathogen, leading to increased numbers of HPIs and decreased numbers of intraspecific cross talks between pathogen cellular functions. Consequently, various novel predictions are suggested by the previously unrecognized cross talk between known pathways, and systematic analyses of the host proteins involved constitute a new research direction [606,607]. Even though the proteins in the most and least varied groups are common to both dynamic innate and adaptive HP-PPINs, the differences in interaction strengths suggest the roles of these proteins in HPIs. In particular, proteins in the least varied group are involved in the core conserved molecular mechanisms of the innate and adaptive phases, whereas those with large changes (from positive to negative interaction strengths or vice versa) could provide more specific and effective responses against the pathogen in accordance with immunological memory. Further, neuroimmune cellular functions such as the circadian clock, Parkinson’s disease, 5-HT, and adrenergic signaling pathways are found to be related to the interspecific cross talk; they can affect the infection process and be regulated during the adaptation of the pathogen and the evolution of immunological memory. Nevertheless, in the presently constructed HP-PPINs, many genes lack specific cellular function annotations, which limits the present interpretations. Hence, more evidence for cellular function annotations may lead to the identification of new cellular functions that have the potential to affect host immunity and will help validate the present connections between hosts and pathogens.
In summary, Fig. 13.5 depicts the schematic structure of dynamic HPI systems based on the observations at the cellular function and molecular levels. The schematic structure emphasizes the dynamic system viewpoint on the HPI systems and integrates the cellular functions used by the host and pathogen to interact with each other during the innate and adaptive phases into a self-tuning control system, which consists of the innate and adaptive loops. During the innate phase, pathogen invasions are the inputs that drive the self-tuning HPI systems. The invasion and resource competition could activate typical and atypical host cellular functions. In turn, these typical and atypical cellular functions respond to the pathogen, which completes the innate loop. Immunological memory of the pathogen forms on the basis of the host responses and pathogenic antigens and changes the interactions between host proteins; the coordination of the host cellular functions is also changed after the innate phase. During the adaptive phase, the challenges of the pathogen can activate both the innate and adaptive loops. The coordination of host cellular functions is changed to exert specific effects on the pathogen, which are observed in the comparison of the innate and adaptive networks. Therefore the host system can compute the characteristics on the basis of its responses and the pathogen inputs and tune itself through the immunological memory. In addition to the coordination of typical molecular mechanisms that are subject to immunological memory, several neuroimmune-related cellular functions emerge as putative targets of immunological memory. As a result, the present analyses can expand the known influence of immunological memory and form the basis for new directions in vaccine design. Further studies of these identified cellular functions and proteins may promote the translation of host–pathogen relationships to biomedical applications [608].
13.5 Appendix
FIGURE 13.A1 The RSSs and goodness of fit. (A) The distribution of the RSS $\lVert p_i - \Phi_i \hat{\theta}_i \rVert_2^2$ and of the log of the RSS. (B and C) The comparisons between measured and estimated expression profiles with the three largest RSSs during the innate (B) and adaptive (C) phases, respectively [594]. RSS, Residual sum of squares.
CHAPTER 14
The significant signaling pathways and their cellular functions in innate and adaptive immune responses during infection process
14.1 Introduction of innate and adaptive immune systems
In humans and other vertebrates, the immune system provides the body's natural capability to resist and defend against invasion by pathogenic microbes. In vertebrates the immune system can generally be separated into two main categories, that is, the innate immune system and the adaptive immune system [609]. The former is responsible for nonspecific immune responses and serves as the front line for rapid defense against foreign invading pathogens [610]. In contrast to the innate immune system, the adaptive immune system consists of highly specialized systemic cells and sophisticated defensive processes that are capable of preventing, or at least restricting, specific pathogen invasion. The most significant difference between the innate and adaptive responses is that adaptive immunity results in the formation of immunological memory after an initial response to a specific pathogen, leading to an enhanced immune response upon subsequent exposure to the same pathogen.
The zebrafish (Danio rerio) has become a powerful model organism for biomedical research in recent years because of its high reproductive rate and low maintenance cost [362]. Indeed, the use of zebrafish to study immunity against infectious diseases, including those due to bacterial or viral infections, is rapidly increasing [316,317]. Since zebrafish possess both innate and adaptive immune systems, they are a particularly suitable model organism for investigating immune mechanisms in vertebrates and mammals [316]. Candida albicans is a fungal pathogen that can grow in both yeast and filamentous forms, causing opportunistic oral and genital infections in humans [312]. Remarkably, C. albicans has the ability to adapt to diverse environmental changes, including fluctuations in temperature, nutrients, and pH levels, making it relatively difficult to treat in hosts [611]. Thus investigating the molecular mechanisms by which the zebrafish immune system responds
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00022-5
© 2021 Elsevier Inc. All rights reserved.
to C. albicans infection is essential to the development of novel therapeutic strategies against infectious diseases in humans.
The transforming growth factor-β (TGF-β) signaling pathway is found to be essential in regulating the immune response to infection [612], and defective TGF-β signaling could lead to several systemic autoimmune defects. In the normal pathway, TGF-β signaling mediates its effect through the SMAD pathway [613]. Several studies have revealed that TGF-β signaling could suppress immune responses [612,614,615]. Conversely, immune cells could also promote TGF-β signaling [615]. Thus the molecular mechanisms involved in the TGF-β signaling pathway, which could regulate host tolerance as well as innate and adaptive immunity, are important areas for research. However, the complex role of TGF-β signaling in maintaining the balance of the immune system is still poorly understood.
In this chapter the main objective is to identify the significant proteins or functional modules involved in the zebrafish immune response to primary and secondary infection with C. albicans. By constructing two zebrafish intracellular protein–protein interaction (PPI) networks for the primary and secondary infection, we compare and identify the significant proteins, molecular processes, and defense/offensive mechanisms between these two infection phases. In particular, this chapter illustrates the roles that TGF-β signaling plays in innate and adaptive immune responses.
14.2 Materials and methods
14.2.1 Zebrafish strain and maintenance
Male, wild-type AB strain zebrafish are used in the study. The zebrafish are adults approximately 9 months old, weighing 0.33–0.37 g. Fish are maintained in 10 L tanks at 28.5 °C under a 14/10 h day/night cycle.
14.2.2 Candida albicans strain and growth conditions
The SC5314 strain of C. albicans is used in this chapter. A single colony from fresh YPD agar plates (1% yeast extract, 2% peptone, 2% dextrose, 1.5% agar) is inoculated into 5 mL YPD broth and then incubated with shaking at 180 rpm at 30 °C for 24 h. Cells are harvested by centrifugation, washed once with sterile PBS or Hank’s balanced salt solution (HBSS), and then resuspended in sterile PBS or HBSS. Suspensions of C. albicans cells are diluted with PBS or HBSS and then injected into zebrafish.
14.2.3 Infection and survival assay
Zebrafish are anesthetized by immersion in water containing 0.17 g/mL tricaine (Sigma, United States) and then intraperitoneally injected with 1 × 10⁵ (primary infection) and 1 × 10⁷ (secondary infection) CFU of C. albicans at days 0 and 14, respectively, by using a 26.5-gauge syringe (Hamilton Syringe 701N). After infection, fish are immediately transferred to separate 10 L tanks to recover and are maintained with daily water changes. The tanks are housed in an incubator with a 14/10 h day/night cycle at 28.5 °C. The fish are closely monitored, and mortality is determined every hour.
14.2.4 Purification of Candida albicans and zebrafish RNA
C. albicans-infected zebrafish are immersed in Trizol reagent (Invitrogen, United States) and then ground in liquid nitrogen using a small mortar and pestle. The ground sample is then disrupted by using a MagNA Lyser System (Roche) with glass beads (cat. no. G8772-100G, Sigma) and shaken at 5000 rpm for 15 s. After phase separation by the addition of chloroform, the total RNA is purified using an RNeasy Mini Kit (Qiagen, Germany). The purified RNA is quantified by its optical density at 260 nm using an ND-1000 spectrophotometer (Nanodrop Technology, United States), and the RNA quality is then analyzed using a Bioanalyzer 2100 (Agilent Technologies, United States) with an RNA 6000 Nano LabChip kit (Agilent Technologies).
14.2.5 Microarray experiments
Total RNA (1 μg) is amplified using a Quick-Amp labeling kit (Agilent Technologies) and labeled with Cy3 (CyDye, PerkinElmer, United States) during the in vitro transcription process. For the C. albicans and zebrafish arrays, 0.625 and 1.65 μg of Cy3 cRNA, respectively, are fragmented to an average size of approximately 50–100 nucleotides by incubation with the fragmentation buffer at 60 °C for 30 min. The fragmented and labeled cRNA is then hybridized to an oligomicroarray at 60 °C for 17 h. The microarrays are washed, dried by using a nitrogen gun, and scanned for Cy3 at 535 nm by using an Agilent microarray scanner (Agilent Technologies, United States). Scanned images are then analyzed by Feature Extraction 9.5.3 software (Agilent Technologies), and image analysis and normalization software are also employed to quantify the signal and background intensities for each feature. The raw data have been uploaded to the NCBI GEO database under accession number GSE51603. The time points of the time-course microarray data for the primary infection are 1, 2, 3, 6, and 14 dpi, and those for the secondary infection are 14.2, 14.6, 14.12, 14.18, 15, 15.6, 15.12, and 15.18 dpi (Fig. 14.1). Each time point consists of two replicates, with 10 zebrafish in each replicate, as well as a control group under comparable conditions. While the microarray dataset presented here is newly generated and first reported in this chapter, it is also the most recent report from a series of pathogen–host interaction studies completed by our research group. Routine validation assays, including histological analysis, have already been performed and reported previously to ensure the quality and reproducibility of these data [320,374,493]. The results of the current study are highly consistent with these published reports, which focused on the primary infection.
14.3 Investigating the defense/offensive strategies of innate and adaptive immunity
14.3.1 Method for strategies of innate and adaptive immunity
In this section the method of constructing dynamic intracellular PPI networks for investigating the defense/offensive strategies of innate and adaptive immunity is introduced.
The method employed for constructing the dynamic intracellular PPI networks is separated into three key steps: (1) the data selection and preprocessing, (2) the selection of the target protein pool, and (3) the construction of the real PPI networks for zebrafish. The overall framework for the proposed method is given in Fig. 14.2. Our strategy is first to collect all intracellular protein interactions in databases into a candidate intracellular PPI network for zebrafish. The candidate intracellular PPI network is then validated and pruned, since it cannot exactly represent the actual intracellular protein interactions during C. albicans infection. A dynamic model is used to describe the candidate intracellular protein interactions. By using microarray data for C. albicans-infected zebrafish, the interaction abilities in the dynamic model can be determined. Significant PPIs are then identified on the basis of these interaction abilities to obtain the real intracellular PPI network. The same procedure is applied to construct the zebrafish intracellular PPI networks for primary and secondary infection on the basis of time-course microarray data obtained from the corresponding experiments. Details of the procedures used to construct the dynamic intracellular PPI networks of zebrafish are described in the following sections.
14.3.2 Dataset selection and target protein pool determination
Three types of data are used in the proposed PPI network construction method: (1) time-course microarray profiles of gene expression of C. albicans-infected zebrafish, (2) PPI data of Homo sapiens, and (3) orthologous gene data between zebrafish and H. sapiens. There are two sets of time points, for the primary and secondary C. albicans infection, at which microarray time-profile data for zebrafish gene expression are obtained (primary infection: 1, 2, 3, 6, and 14 dpi (days postinfection); secondary infection: 14.1, 14.25, 14.5, 14.75, 15, 15.25, 15.5, and 16 dpi; see Fig. 14.1). The manipulation of the animal model is approved by the Institutional Animal Care and Use Committee of National Tsing Hua University (IRB Approval No. 09808). The PPI data of H. sapiens are extracted from the Biological General Repository for Interaction Datasets (BioGRID) database (http://thebiogrid.org/) [73]. The gene ortholog data of zebrafish and humans are obtained from the InParanoid database (http://inparanoid.sbc.su.se/) [375]. For both the primary and secondary infection, one-way ANOVA is applied to the microarray time-series profiles of gene expression to select differentially expressed proteins. The P-value threshold is set at .01 with Student's t-test for the protein pool selection. Here the expression of each protein is treated as the expression of its corresponding gene, and the gene pool is viewed as the protein pool. A total of 422 and 1284 proteins are thus identified as differentially expressed for the primary and secondary infection, respectively. After these target protein pools for the primary and secondary infection are determined, candidate PPI networks are constructed on the basis of the protein pools and the PPI information available from data mining. A total of 420 and 2312 PPIs, respectively, are included in the candidate PPI networks by integrating the databases (BioGRID and InParanoid7).
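The construction of a candidate network from these three data types can be sketched as follows. This is only a minimal illustration, assuming that a human PPI is transferred to zebrafish whenever both partners have orthologs inside the target protein pool. The function name `build_candidate_ppis`, the toy gene lists, and the exclusion of self-interactions are assumptions made here for illustration and are not taken from the original pipeline.

```python
from itertools import product


def build_candidate_ppis(human_ppis, orthologs, protein_pool):
    """Map human PPIs onto zebrafish proteins and keep pairs inside the pool.

    human_ppis   : iterable of (human_a, human_b) pairs (e.g., from BioGRID)
    orthologs    : dict mapping a human gene to a set of zebrafish orthologs
    protein_pool : set of differentially expressed zebrafish proteins
    """
    candidates = set()
    for human_a, human_b in human_ppis:
        # Zebrafish orthologs of each partner that passed the pool selection
        fish_a = orthologs.get(human_a, set()) & protein_pool
        fish_b = orthologs.get(human_b, set()) & protein_pool
        for a, b in product(fish_a, fish_b):
            if a != b:  # assumption: self-interactions are ignored
                candidates.add(tuple(sorted((a, b))))
    return candidates


# Toy inputs, purely illustrative (not the chapter's data)
human_ppis = [("ACVR1B", "SMAD2"), ("ACVR1B", "SMAD7"), ("TP53", "MDM2")]
orthologs = {
    "ACVR1B": {"acvr1b"}, "SMAD2": {"smad2"}, "SMAD7": {"smad7"},
    "TP53": {"tp53"}, "MDM2": {"mdm2"},
}
protein_pool = {"acvr1b", "smad2", "tp53", "mdm2"}  # smad7 not in the pool

candidate_ppis = build_candidate_ppis(human_ppis, orthologs, protein_pool)
print(sorted(candidate_ppis))
```

In this toy case the ACVR1B–SMAD7 interaction is dropped because its zebrafish ortholog is not in the protein pool, which mirrors how the pool restricts the candidate network.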
14.3.3 Construction of zebrafish intracellular protein–protein interaction networks
These candidate PPI networks are obtained by including all available PPIs and therefore cannot truly represent the zebrafish intracellular PPI networks under our experimental
FIGURE 14.1 Flowchart for the construction of the zebrafish intracellular PPI network. PPI, Protein–protein interaction. One-way analysis of variance (ANOVA) is applied to select the target protein pools. PPIs of H. sapiens data are obtained from the BioGRID database, and the orthologous information between zebrafish and H. sapiens is extracted from the InParanoid database. By using a dynamic model, we are able to construct the candidate PPI networks for zebrafish. As these candidate PPI networks could not truly represent the intracellular PPIs in zebrafish, we use the interaction ability identification method to determine the significant interactions. The refined intracellular PPI networks could thus be constructed from these significant interactions [10].
settings. Hence, false-positive PPIs must be pruned on the basis of our experimental data with the help of dynamic modeling of the PPI networks. The dynamic PPI model for the kth zebrafish target protein in the intracellular PPI network can be represented by the following equation:
$$p_k(t+1) = p_k(t) - K_k p_k(t) + \sum_{m=1}^{M_k} b_{km}\, p_k(t)\, p_m(t) + q_k + n_k(t) \tag{14.1}$$
where $p_k(t)$ denotes the protein level of the kth zebrafish target protein at time t; $M_k$ denotes the number of PPIs in zebrafish for the kth target protein; $K_k$ is the degradation effect for the kth zebrafish target protein; and $p_m(t)$ represents the protein level of the mth
FIGURE 14.2 The bar depicts time points at which data are collected to construct the zebrafish intracellular PPI networks for primary and secondary infection. For primary infection, zebrafish are injected with 1 × 10⁵ CFU (colony-forming units) Candida albicans, and time-course microarray data are collected at 1, 2, 3, 6, and 14 dpi. After 14 days, zebrafish are injected with 1 × 10⁷ CFU C. albicans. The time points of the time-course microarray data for secondary infection are 14.1, 14.25, 14.5, 14.75, 15, 15.25, 15.12, and 15.6 dpi [10]. PPI, Protein–protein interaction.
zebrafish protein that can potentially interact with the kth target protein, and $b_{km}$ denotes the corresponding interaction ability between the two proteins. The basal level is denoted by $q_k$, and the stochastic noise due to model uncertainty and measurement error of the microarray data is denoted by $n_k(t)$. The regulatory parameters can be determined with the help of time-course microarray data. To identify the parameters in the model, the gene expression profiles are used as substitutes for the protein activity levels. Eq. (14.1) can be rewritten in the regression form as follows:
$$p_k(t+1) = \begin{bmatrix} p_k(t) & p_k(t)p_1(t) & \cdots & p_k(t)p_{M_k}(t) & 1 \end{bmatrix} \begin{bmatrix} 1-K_k \\ b_{k1} \\ \vdots \\ b_{kM_k} \\ q_k \end{bmatrix} + n_k(t) = \psi_k(t)\theta_k + n_k(t) \tag{14.2}$$
where $\psi_k(t)$ represents the regression vector and $\theta_k$ is the parameter vector for the kth zebrafish target protein, which is to be estimated. We also employ the cubic spline method to interpolate additional time points within the microarray data to avoid over-fitting. Eq. (14.2) for different time points can be arranged as follows:
$$\begin{bmatrix} p_k(t_2) \\ p_k(t_3) \\ \vdots \\ p_k(t_L) \end{bmatrix} = \begin{bmatrix} \psi_k(t_1) \\ \psi_k(t_2) \\ \vdots \\ \psi_k(t_{L-1}) \end{bmatrix} \theta_k + \begin{bmatrix} n_k(t_1) \\ n_k(t_2) \\ \vdots \\ n_k(t_{L-1}) \end{bmatrix} \tag{14.3}$$
By defining the notations $P_k = [\,p_k(t_2) \cdots p_k(t_L)\,]^T$, $\Psi_k = [\,\psi_k^T(t_1) \cdots \psi_k^T(t_{L-1})\,]^T$, and $N_k = [\,n_k(t_1) \cdots n_k(t_{L-1})\,]^T$, Eq. (14.3) can then be further rewritten in the linear regression form:
$$P_k = \Psi_k \theta_k + N_k \tag{14.4}$$
where the parameters can be identified by solving a constrained least-squares problem, as described in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology. After the parameters are identified, Akaike’s information criterion
(AIC) in Chapter 2, Biological Network Modeling and System Identification in Systems Immunology and Infection Microbiology, is employed to select the significant PPIs [81]. The AIC combines the estimated residual error and the model complexity in one statistic, and it is minimized at the true number of interactions. Therefore, by using the AIC, we can detect the true interacting partners of each target protein. The refined PPI network is then obtained by pruning the insignificant, false-positive interactions of each target protein one by one via the AIC scheme.
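As a rough sketch of how the regression form in Eq. (14.2) and an AIC-based selection might be combined for a single target protein, consider the toy example below. Several simplifying assumptions are made and should not be read as the authors' implementation: ordinary (unconstrained) least squares is swapped in for the constrained least-squares problem of Chapter 2; the AIC is taken in the common Gaussian form L ln(RSS/L) + 2n, which may differ in scaling from the form given in Chapter 2; all partner subsets are searched exhaustively instead of being pruned one by one; and the expression profiles are synthetic. All function names are hypothetical.

```python
import math
from itertools import combinations


def solve_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, targets))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def fit_target(p_k, partners, subset):
    """Fit Eq. (14.2) for one target, using only the partners listed in subset."""
    steps = len(p_k) - 1
    # Regression vector: [p_k(t), p_k(t) p_m(t) for m in subset, 1]
    rows = [[p_k[t]] + [p_k[t] * partners[m][t] for m in subset] + [1.0]
            for t in range(steps)]
    targets = p_k[1:]
    theta = solve_least_squares(rows, targets)
    rss = sum((y - sum(c * x for c, x in zip(theta, r))) ** 2
              for r, y in zip(rows, targets))
    rss = max(rss, 1e-12)  # guard against log(0) on noise-free data
    aic = steps * math.log(rss / steps) + 2 * len(theta)  # assumed AIC form
    return aic, theta


def select_partners(p_k, partners):
    """Return (AIC, subset, theta) for the partner subset with minimum AIC."""
    best = None
    for size in range(len(partners) + 1):
        for subset in combinations(range(len(partners)), size):
            aic, theta = fit_target(p_k, partners, subset)
            if best is None or aic < best[0]:
                best = (aic, subset, theta)
    return best


# Synthetic profiles: partner 0 truly interacts with the target, partner 1 does not.
p1 = [1.0, 0.2, 1.5, 0.4, 1.8, 0.1, 1.2, 0.6, 1.9, 0.3, 1.4, 0.8]
p2 = [0.5, 0.6, 0.9, 1.0, 0.5, 0.6, 0.9, 1.0, 0.5, 0.6, 0.9, 1.0]
p_k = [2.0]
for t in range(12):
    noise = 0.01 * math.sin(1.7 * t)  # small deterministic stand-in for n_k(t)
    # Eq. (14.1) with K_k = 0.5, b_k1 = 0.1, q_k = 1.0
    p_k.append(p_k[t] - 0.5 * p_k[t] + 0.1 * p_k[t] * p1[t] + 1.0 + noise)

best_aic, best_subset, best_theta = select_partners(p_k, [p1, p2])
print("selected partners:", best_subset)
print("theta = [1 - K_k, b_km ..., q_k]:", [round(v, 3) for v in best_theta])
```

The exhaustive search is only practical for a handful of candidate partners; for the candidate networks described here, a stepwise scheme such as the one-by-one pruning mentioned above would be needed.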
14.3.4 Investigating the zebrafish intracellular protein–protein interaction networks for primary and secondary infection
We identify 57 proteins and 341 PPIs in the constructed zebrafish intracellular PPI networks during the primary infection, and 90 proteins and 385 PPIs during the secondary infection (see Figs. 14.A1 and 14.A2 in Appendix for the complete networks). A comparison between the two constructed PPI networks indicates that 20 proteins are common to both PPI networks and that 37 and 70 proteins are specific to primary and secondary infection, respectively. The cellular functions of the 37 proteins that are specific to primary infection are mapped to the following main biological processes: metabolic processes (30.1%), cellular processes (14%), cell communication (10.3%), the cell cycle (8.8%), and the immune response (6.6%). Similarly, the 70 proteins identified to be specific to secondary infection are involved in metabolic processes (26.1%), cellular processes (16.7%), cell communication (13%), developmental processes (9.1%), and transport (5.1%).
14.3.5 Centrality analysis of the zebrafish intracellular protein–protein interaction networks for primary and secondary infection
To gain insight into the constructed PPI networks, we have conducted a centrality analysis for both intracellular PPI networks. In particular, two common network centralities are considered, that is, the node degree and the betweenness centrality. For the two constructed zebrafish intracellular PPI networks, 57 and 90 target proteins are identified in primary and secondary C. albicans infection, respectively. The top-ranking hub proteins selected by node degree are listed in Tables 14.1 and 14.2, which also include their potential roles during primary and secondary infection. Several proteins during primary infection are found to be related to immunity according to their Gene Ontology functional annotations. For example, Gch2, which is involved in adaptive immunity and responds to interferon (IFN)-γ stimulation, is identified as a hub protein. Another hub protein in primary infection is Hsp90a.1, which is involved in Fc-γ receptor signaling. It is interesting to see that Fc-γ receptor signaling may play an essential role in antigen presentation in the primary C. albicans infection in zebrafish. The betweenness centrality of each node in both PPI networks is also calculated. Betweenness measures the number of shortest paths that pass through a node [616].
TABLE 14.1 Hub proteins identified in the zebrafish intracellular protein–protein interaction network for primary infection [10].
| Rank | Zebrafish protein | Degree | GO functional annotation |
|---|---|---|---|
| 1 | Tfap2a | 10 | Regulation of cell differentiation |
| 2 | Gch2 | 10 | Response to interferon-γ |
| 3 | Hsp90a.1 | 10 | Fc-γ receptor signaling |
| 4 | Acvr1b | 9 | TGF-β receptor activity |
| 5 | Btg2 | 9 | Negative regulation of cell proliferation |
| 6 | Cct5 | 9 | Response to virus |
| 7 | Clock | 9 | Circadian rhythm |
| 8 | Fkbp5 | 9 | Heat shock protein binding |
| 9 | Fos | 9 | Innate immune response |
| 10 | Ncstn | 9 | T cell proliferation |
GO, Gene Ontology; TGF, transforming growth factor.
TABLE 14.2 Hub proteins identified in the zebrafish intracellular protein–protein interaction network for secondary infection [10].
| Rank | Zebrafish protein | Degree | GO functional annotation |
|---|---|---|---|
| 1 | Psmd1 | 16 | Antigen processing and presentation of exogenous peptide antigen via MHC class I |
| 2 | Fos | 12 | Innate immune response |
| 3 | Psmd13 | 11 | Antigen processing and presentation of exogenous peptide antigen via MHC class I |
| 4 | Ndrg1 | 10 | Mast cell activation |
| 5 | Casp2 | 10 | Regulation of apoptosis |
| 6 | Gch2 | 10 | Response to interferon-γ |
| 7 | Ncstn | 9 | T cell proliferation |
| 8 | Acvr1b | 9 | TGF-β receptor activity |
| 9 | Uba5 | 9 | Protein ubiquitination |
| 10 | Usp14 | 8 | Regulation of proteasomal protein catabolic process |
MHC, Major histocompatibility complex; GO, Gene Ontology; TGF, transforming growth factor.
We have listed the top 10 proteins ranked by betweenness centrality for both the primary and secondary infection PPI networks in Table 14.3. In general, many hub proteins selected on the basis of node degree during the primary infection are also selected on the basis of betweenness centrality. During the secondary infection, however, only one hub protein that has a node degree larger than 9 is selected on the basis of betweenness centrality. This is because the PPI network for the primary infection is more compact than that for the secondary infection. During the primary infection, hub proteins such as Elavl1, Btg2, Fkbp5, Hug, and Hsp90a.1 are also at the top of the betweenness ranking. Zgc:63606 connects to the two hub proteins Elavl1 and Hug, leading to a large betweenness centrality. Similarly, Zgc:153257 is the bridge between two subnetworks, leading to a larger betweenness. Even though they rank high in betweenness centrality, the cellular functions of these two zebrafish proteins are still unknown, and they may be good candidates for future experiments to verify their relations with immunity. In the case of the secondary infection, because the PPI network is more dispersed and contains many small networks without a connection to the main network, the betweenness centrality ranking is dominated by the nodes in the main network. Many hub proteins with a high degree are not in the top list ranked by betweenness since they are located in the smaller subnetworks that are not connected to the main network, leading to a smaller number of shortest paths passing through these proteins. On the other hand, although Tp53, Dhfr, and Hspd1 have a degree of only 2, they act as bridges between subnetworks within the main network, leading to a larger betweenness, and they may be significant candidates for further investigation.
TABLE 14.3 Proteins identified in the zebrafish intracellular protein–protein interaction network for the primary infection and secondary infection are ranked by betweenness centrality [10].
| Rank | Primary infection: zebrafish protein | Betweenness centrality | Degree | Secondary infection: zebrafish protein | Betweenness centrality | Degree |
|---|---|---|---|---|---|---|
| 1 | Elavl1 | 6048.44 | 9 | Psmd1 | 16588.53 | 16 |
| 2 | Zgc:63606 | 5972.60 | 5 | Creb1 | 15020.0 | 5 |
| 3 | Btg2 | 5092.00 | 9 | Psme3 | 14903.00 | 4 |
| 4 | Zgc:153257 | 5016.00 | 2 | Tp53 | 14670.00 | 2 |
| 5 | Fkbp5 | 4542.00 | 9 | Cebpb | 14334.60 | 6 |
| 6 | Hug | 4353.72 | 10 | Mdm2 | 13140.00 | 3 |
| 7 | Cul1a | 3992.00 | 3 | Casp3a | 13080.00 | 6 |
| 8 | Hsp90a.1 | 3642.00 | 10 | Dhfr | 13032.00 | 2 |
| 9 | LOC563808 | 3641.05 | 5 | Hspd1 | 12922.00 | 2 |
| 10 | Esr1 | 2856.00 | 2 | Bcl2 | 12612.34 | 4 |
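The contrast between node degree and betweenness discussed in this section can be illustrated on a small toy graph. The sketch below uses Brandes' algorithm for unweighted, undirected graphs; it is not the software used for Table 14.3, and the node names are hypothetical. The graph has a main component in which a degree-2 node (X) bridges two clusters, and a separate small component with a degree-4 hub (H).

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint
    return {v: b / 2 for v, b in score.items()}


edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),      # cluster 1
    ("C", "X"), ("X", "D"),                  # X bridges the two clusters
    ("D", "E"), ("D", "F"), ("E", "F"),      # cluster 2
    ("H", "h1"), ("H", "h2"), ("H", "h3"), ("H", "h4"),  # isolated hub
]
adj = build_adjacency(edges)
degree = {v: len(neighbors) for v, neighbors in adj.items()}
bc = betweenness(adj)
for v in sorted(adj, key=lambda v: bc[v], reverse=True)[:4]:
    print(v, "degree:", degree[v], "betweenness:", bc[v])
```

Here the bridge X ranks first by betweenness although it has the lowest degree among the non-leaf nodes, whereas the hub H, confined to its small component, ranks lower despite its higher degree.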
14.3.6 Investigation of proteins common to protein–protein interaction networks for primary and secondary infection
This section focuses on the 20 identified proteins that are common to both primary and secondary infection in our constructed PPI networks. To identify the proteins that play crucial but distinct roles in innate and adaptive immunity, the PPI linkages between primary and secondary infection are compared for each of these common proteins. Based on the number of changes in the PPI linkages, these proteins are also ranked to identify the top 10 proteins whose PPI linkages differ most between the primary and secondary infection (Table 14.4).
TABLE 14.4 Zebrafish proteins identified to have the most significant changes in the number of protein–protein interaction linkages between primary and secondary Candida albicans infection [10].
| Rank | Zebrafish protein | Primary infection | Secondary infection | Changes | Involved biological process | References |
|---|---|---|---|---|---|---|
| 1 | Psmd13 | 5 | 12 | 17 | Adaptive immunity | [617] |
| 2 | Psmd1 | 6 | 11 | 17 | Adaptive immunity | [617] |
| 3 | Casp2 | 5 | 11 | 16 | Apoptosis | [618] |
| 4 | Ncstn | 5 | 11 | 16 | Immune recognition | [619] |
| 5 | Wdr82 | 6 | 10 | 16 | Unknown | |
| 6 | Fos | 3 | 9 | 12 | Innate immunity | |
| 7 | Acvr1b | 1 | 8 | 9 | Induction of apoptosis; activation of mast cells; regulation of immunoglobulin | [620]; [621]; [622] |
| 8 | Ampste24 | 4 | 4 | 8 | Unknown | |
| 9 | Loc553343 | 3 | 4 | 7 | Unknown | |
| 10 | Hsp90a.1 | 2 | 4 | 6 | Apoptosis; antigen presentation | [623,624]; [625] |
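One way such a ranking could be computed is sketched below. It assumes that the change count of a protein is the number of linkages found in only one of the two networks, which is one reading of Table 14.4 (there the Changes column equals the sum of the two preceding columns); whether the original analysis counted linkages in exactly this way is an assumption. The function name `linkage_changes` and the partners ProtA and ProtB are hypothetical.

```python
def linkage_changes(primary_edges, secondary_edges, common_proteins):
    """Rank proteins by the number of linkages present in only one network.

    Returns rows of (protein, primary-only count, secondary-only count, total),
    sorted by the total in descending order.
    """
    def partners(edges):
        table = {}
        for a, b in edges:
            table.setdefault(a, set()).add(b)
            table.setdefault(b, set()).add(a)
        return table

    prim, sec = partners(primary_edges), partners(secondary_edges)
    rows = []
    for protein in common_proteins:
        only_primary = prim.get(protein, set()) - sec.get(protein, set())
        only_secondary = sec.get(protein, set()) - prim.get(protein, set())
        rows.append((protein, len(only_primary), len(only_secondary),
                     len(only_primary) + len(only_secondary)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Toy networks, purely illustrative (not the chapter's networks)
primary = [("Acvr1b", "Smad2"), ("Acvr1b", "ProtA"), ("Fos", "ProtB")]
secondary = [("Acvr1b", "Smad2"), ("Acvr1b", "Smad3"),
             ("Acvr1b", "Smad7"), ("Fos", "ProtB")]

ranking = linkage_changes(primary, secondary, ["Fos", "Acvr1b"])
for row in ranking:
    print(row)
```

Linkages shared by both networks, such as Acvr1b–Smad2 in the toy data, do not contribute to the change count under this reading.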
14.4 The roles of significant signaling pathways in the innate and adaptive immune responses
14.4.1 The TGF-β pathway is involved in the control of the primary and secondary immune response
Acvr1b (ALK4, activin receptor type 1B) is identified as a hub protein in both the primary and secondary infection (Tables 14.1 and 14.2). Activins are members of the TGF-β superfamily that act as local regulators of biological processes and are associated with cell growth and differentiation [620]. Correspondingly, activin is crucial to the control of innate and adaptive immune responses [615]. To regulate the immune response, TGF-β could mediate its effects via SMAD proteins [613]. To clarify the role of TGF-β in the innate and adaptive immune response, we focus on the protein interactions of SMAD proteins and Acvr1b in the primary and secondary infection and compare the differences between the initial and recurring infections. In the primary infection,
IV. Systems Innate and Adaptive Immunity in the Infection Process
FIGURE 14.3 Interactions of Acvr1b with other proteins in the primary infection. Acvr1b is identified to be a hub protein during both primary and secondary infection. The R-SMAD Smad2 is found to be a key protein in primary infection. The identification of R-SMAD in both primary and secondary infection suggests that the activation of TGF-β signaling requires the SMAD pathway [10].
the SMAD protein Smad2 has been found to interact with Acvr1b, whereas in the secondary infection, Smad2, Smad3, and Smad7 have been found to interact with Acvr1b. The SMAD proteins can be divided into three major groups according to their function: the receptor-regulated SMADs (R-SMADs), the common mediator SMAD (Co-SMAD), and the inhibitory SMADs (I-SMADs) [613]. The R-SMADs Smad2 and Smad3, which upon phosphorylation can interact with the Co-SMAD and translocate to the nucleus [614], are identified as key proteins in the primary and secondary infection. However, Smad7, an I-SMAD, is found to interact with Acvr1b during the secondary, but not the primary, infection (Figs. 14.3 and 14.4). Smad7 has been reported to play an essential role in the negative regulation of TGF-β signaling by interfering with the binding of TGF-β to type I receptors [626]. Furthermore, we have compared the expression profile of Smad7 over time during the primary and secondary infection to see whether it changes significantly between the two (Fig. 14.5). The time course shows a significant increasing trend in the expression of the inhibitory Smad7 during the secondary infection, suggesting that TGF-β signaling is suppressed in the secondary infection relative to the primary infection. This agrees with previous findings suggesting that Smad7 is involved in the reciprocal inhibition of TGF-β and IFN-γ [627].
14. The significant signaling pathways and their cellular functions
FIGURE 14.4 Interactions of Acvr1b with other proteins in secondary infection. Acvr1b, Smad2, and Smad3 are found to be important in secondary infection. Smad7, an inhibitory SMAD protein, interacted with Acvr1b in the secondary but not the primary infection. This disparity suggests that TGF-β signaling may play a role in the control of innate and adaptive immune responses [10].
In the reciprocal inhibition of TGF-β and IFN-γ, Smad7 is the key component responsible for polarizing responses toward either immunity or tolerance to infection. More specifically, Smad7 can suppress the TGF-β signaling pathway, which initiates infection tolerance, and it can also promote the immunity triggered through the IFN-γ signaling pathway. This dual role of Smad7 in determining whether immunity or pathogen tolerance occurs is of interest because it suggests a significant mechanism that controls the immune response. The gene expression time course in the primary and secondary infection shows that Smad7 expression, which stays at the basal level in the primary infection, increases rapidly during the secondary infection (Fig. 14.5). The difference in Smad7 expression between the primary and secondary infections may indicate that after the initial low-dose infection, the zebrafish immune system is able to tolerate the invading pathogen, thereby shifting the immune response toward infection tolerance. However, in the secondary infection with a lethal pathogen dose, the increased Smad7 expression suggests that the immune response is triggered to defend against the invading pathogen; thus the pathway responsible for infection tolerance is inhibited in this phase (Fig. 14.6). In addition, TGF-β has also been suggested to inhibit the cellular function of inflammatory cells and immune responses [612,628]. The cellular regulation of TGF-β and its relationships with various immune cells are depicted in Fig. 14.6. Under normal conditions, there is a feedback system that characterizes the relationship between the TGF-β pathway, the innate immune response, and the adaptive immune response (Fig. 14.6). Even though immune cells can secrete cytokines that promote TGF-β signaling, TGF-β signaling can inhibit the activation of these immune cells, thus acting as a feedback system.
During the secondary infection, Smad7 is found to suppress TGF-β signaling, thereby attenuating the inhibition
FIGURE 14.5 The expression profile of Smad7 in the primary and secondary infection. Upon comparing the time course of Smad7 expression between the primary and secondary infection, it is clear that Smad7 expression remained stable in primary infection. During the secondary infection, however, the level of Smad7 expression could increase significantly by an average of 14%. This difference suggests that Smad7 may play a key role in the control of innate and adaptive immune responses [10].
FIGURE 14.6 Smad7 is the key to the regulation of tolerance and immunity. (1) A previous study has reported that Smad7 is a mediator that controls the tolerance and immunity pathways [627]. The time course of Smad7 expression demonstrates a significant increase during secondary infection, suggesting that in zebrafish injected with a higher pathogen dose, the immune response is triggered to defend against the invading pathogen rather than to promote infection tolerance. (2) According to a previous study [615], the regulation of TGF-β, the innate immune response, and the adaptive immune response may be considered as a feedback system. (3) Smad7 is found to be a key protein during secondary infection. Smad7 attenuates TGF-β-mediated inhibition of cells in the adaptive immune system, resulting in the proliferation of T and B cells that promote host defense by the adaptive immune system challenged with a higher dose of the pathogen [10].
of immune cells (Fig. 14.6). Consequently, the increased proliferation of immune cells such as the T and B cells in the adaptive immune response could promote the defense against the invading pathogen. In summary, the identification of Acvr1b in the primary and secondary infection suggests that TGF-β signaling is indeed involved in the control of innate and adaptive immune responses. Furthermore, the discovery that Smad7 could interact with Acvr1b only during secondary infection suggests that TGF-β could control immune responses via a SMAD-dependent pathway. Therefore the control mechanism can be described as a feedback system involving the TGF-β signaling and the adaptive immune response (Fig. 14.6).
14.4.2 The role of proteasome in controlling the adaptive immune response

Psmd1 and Psmd13, 26S proteasome regulatory subunits, are identified as significant primarily during the secondary infection. Proteasomal activity has been linked to inflammatory and autoimmune diseases such as systemic lupus erythematosus and rheumatoid arthritis because of its role in activating an antiapoptotic and proinflammatory regulator of cytokine expression [629]. Therefore the identification of Psmd1 and Psmd13 in the constructed intracellular PPI networks for the secondary infection indicates that the proteasome system plays a pivotal role in the zebrafish immune response. Furthermore, the numbers of linkages for Psmd1 and Psmd13 are found to increase significantly from the primary to the secondary infection (from 6 to 11 and from 5 to 12, respectively; Table 14.4), suggesting that the proteasome is more active during the secondary infection and is therefore more important in the adaptive immune response of zebrafish.
14.4.3 The regulation of apoptosis in the primary and secondary infection

Many of the 10 most significant hub proteins discussed above are related to the apoptotic process, as shown in Table 14.1. Further investigation has revealed that apoptosis is activated during the primary infection but is inhibited during the secondary infection. In the constructed zebrafish intracellular PPI networks, 3 of the 10 hub proteins (Casp2, Acvr1b, and Hsp90a.1) are involved in apoptosis (Table 14.4). For Casp2 the number of linkages is found to increase significantly during the secondary infection relative to the primary infection (from 5 to 11). The caspase family of proteins plays a dominant role in activating apoptosis [630]. The analysis of Casp2 protein interactions revealed that it interacts with Bcl2 during the secondary but not the primary infection (Figs. 14.7 and 14.8). Bcl2 is a member of the Bcl2 family that regulates cell death by inhibiting the apoptotic process [631]. Thus the finding that Casp2 and Bcl2 interact during the secondary infection suggests that apoptosis is suppressed during the secondary infection, in contrast to the induction of apoptosis during the primary infection. Acvr1b, a type 1B activin receptor, has been shown to be related to the apoptotic process in both the primary and secondary infection. Activins are members of the TGF-β superfamily and are local regulators of biological processes that are associated with cell growth and differentiation [620]. The TGF-β pathway is also involved in inducing
FIGURE 14.7 Protein interactions of caspase 2 in the primary infection. Caspase 2 is identified to be a hub protein during both the primary and secondary infection. Bcl2, an antiapoptotic protein, is not found to interact with Casp2 during the primary infection [10].
apoptosis, and the SMAD family of molecules can act as key signal transducers during this apoptotic process [632]. The Smad7 protein is found to interact with Acvr1b during the secondary but not the primary infection (Figs. 14.3 and 14.4). Smad7 is an inhibitory protein that can interfere with the phosphorylation of pathway-restricted SMAD proteins such as Smad2 and Smad3 by binding to the type I receptors [614,633]. Therefore the interaction between Acvr1b and Smad7 supports our observation that apoptosis is inhibited during the secondary C. albicans infection. Hsp90a.1, a heat shock protein, is identified as a key hub protein in the zebrafish intracellular PPI networks. The number of its linkages is found to increase significantly from the primary to the secondary infection. Hsp70 and Hsp90 can interact directly with proteins regulating the programmed cell death machinery and can thus block the apoptotic process [623]. The identification of Hsp90a.1 as an important protein mainly during the secondary infection in our constructed network again suggests that apoptosis is inhibited during the secondary infection. Furthermore, Hsp90 is found to stabilize the 26S proteasome (Psmd1 and Psmd13, which are 26S proteasome proteins, are among the 10 hub proteins listed in Table 14.4), and it could thereby enable the cell to remove unwanted or harmful proteins.
FIGURE 14.8 Protein interactions of caspase 2 in the secondary infection. Caspase 2 is identified to be a hub protein during both the primary and secondary infection. Bcl2, an antiapoptotic protein, is found to interact with Casp2 only during the secondary infection, suggesting that the apoptotic process is inhibited in the secondary infection [10].
In summary, it is intriguing that apoptosis-related proteins such as Casp2, Acvr1b, and Hsp90a.1 are more prominent during the secondary than the primary infection. Increasing evidence supports a crucial role for apoptosis in innate and adaptive immunity during infection [426,569,634,635]. The results indicate that apoptosis is inhibited in the secondary but not the primary infection, suggesting that during infection, apoptosis can be adopted as an offensive or defensive strategy by the pathogen or zebrafish, respectively.
14.4.4 The identification of Ncstn for relationship between bacteria- and fungus-induced immune responses

Ncstn, a part of the γ-secretase protein complex, is found to play a significant role during both the primary and secondary infection in the constructed PPI networks. Together with the related components of γ-secretase, Ncstn can generate a peptide epitope that facilitates the immune recognition of intracellular mycobacteria through major histocompatibility complex II-dependent priming of T cells [619]. Such pathogen recognition mechanisms are crucial to adaptive immunity in the host. The identification of Ncstn during the C. albicans infection of zebrafish suggests that Ncstn responds not only to bacterial infection but also to fungal infection.

Taken together, the initial investigation of the constructed PPI networks for the primary and secondary infection has revealed that the immune responses activated after the secondary infection are generally stronger. As shown in the in vivo experiment, zebrafish infected with 1 × 10^5 CFU C. albicans have a higher survival rate and survive longer after the secondary infection with a more lethal C. albicans dose (1 × 10^7 CFU) compared with
FIGURE 14.9 Zebrafish with an activated adaptive immune response have a higher survival rate following the secondary high-dose infection with Candida albicans. This figure depicts the zebrafish survival rate versus time. In this experiment, zebrafish were infected with a nonlethal low dose of the live yeast form of C. albicans (10^5 CFU) or injected with PBS. Two weeks later, all fish were infected with a higher dose of C. albicans (10^7 CFU). Zebrafish inoculated with a low dose of C. albicans survived longer than the PBS group, demonstrating that zebrafish can activate adaptive immunity to defend against repeated C. albicans infection [10].
zebrafish without the prior infection (Fig. 14.9). Identification of the aforementioned hub proteins in the constructed zebrafish intracellular PPI networks encourages us to explore how the zebrafish immune system responds to infection and whether the response differs between the primary and secondary infection.

Note that the dynamic modeling approach is not free from errors. False-positive and false-negative interactions in the initial putative PPI network can affect the accuracy of our constructed network. To minimize the effect of false-positive interactions, we applied AIC in the last step of network construction to prune the false-positive interactions based on the true model order selection. False-negative interactions are harder to avoid because, if a PPI link is missing from the initial putative network, there is no effective method to recover it. Therefore we used the BioGRID and InParanoid7 databases, the most comprehensive PPI databases available, to build the initial candidate network. However, PPI links may still be missing in BioGRID and InParanoid7. Such errors can be reduced as more PPI databases become available and are integrated to form a comprehensive initial candidate network.
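The AIC-based pruning mentioned above can be illustrated with a small, self-contained sketch. It is not the chapter's implementation: it assumes a linear regression of a target protein's signal on its candidate partners, uses the common system-identification form AIC(p) = ln(residual variance) + 2p/N, and searches all subsets exhaustively, which is practical only for a handful of candidates. The partner names and the toy signals are hypothetical.

```python
import itertools
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residual_variance(columns, y):
    """Least-squares fit of y on the regressor columns; mean squared residual."""
    n = len(y)
    if not columns:
        return sum(v * v for v in y) / n
    gram = [[sum(ci[t] * cj[t] for t in range(n)) for cj in columns]
            for ci in columns]
    rhs = [sum(ci[t] * y[t] for t in range(n)) for ci in columns]
    theta = solve_linear(gram, rhs)
    res = [y[t] - sum(th * c[t] for th, c in zip(theta, columns))
           for t in range(n)]
    return sum(r * r for r in res) / n


def aic(columns, y):
    """Assumed AIC form: ln(residual variance) + 2p/N (fails on a perfect fit)."""
    return math.log(residual_variance(columns, y)) + 2 * len(columns) / len(y)


def select_by_aic(candidates, y):
    """Return the set of candidate names whose regression minimizes AIC."""
    names = sorted(candidates)
    best = None
    for k in range(len(names) + 1):
        for subset in itertools.combinations(names, k):
            score = aic([candidates[name] for name in subset], y)
            if best is None or score < best[0]:
                best = (score, subset)
    return set(best[1])


# Toy signals (hypothetical): p1 and p2 truly drive y, p3 is a false positive.
candidates = {
    "p1": [1, 1, 1, 1, -1, -1, -1, -1],
    "p2": [1, 1, -1, -1, 1, 1, -1, -1],
    "p3": [1, -1, 1, -1, 1, -1, 1, -1],
}
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [0.8 * a - 0.5 * b + e
     for a, b, e in zip(candidates["p1"], candidates["p2"], noise)]
selected = select_by_aic(candidates, y)
```

In this toy case the spurious partner p3 does not reduce the residual variance, so the 2p/N penalty removes it and only p1 and p2 are retained. A false-negative link cannot be recovered this way, because a partner absent from `candidates` is never tested.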
14.5 Conclusion

Using dynamic interaction modeling and time-course microarray data, we constructed intracellular PPI networks for the primary and secondary infection of zebrafish with C. albicans. Using these PPI networks, we examined how immune responses in zebrafish are triggered against the primary and secondary infection. We have identified 341 and 359 intracellular PPIs in the PPI networks for the primary and secondary infection, respectively. Hub proteins of each network are also identified. By comparing the two constructed PPI networks, the 10 proteins with the most significant changes in linkage between the primary and secondary infection are also determined. These proteins might play crucial roles in the immune response of zebrafish during the
infection; thus, the biological and molecular processes in which these proteins participate during the primary and secondary infection are investigated. TGF-β signaling and apoptosis are found to be two of the main functional modules in the primary and secondary infection. Smad7, an I-SMAD protein, is found to be important in TGF-β signaling in the secondary infection only. Smad7 can interfere with R-SMAD phosphorylation and thereby attenuate TGF-β signaling. Therefore the role of Smad7 in the secondary infection suggests that the attenuated suppression of immune cells enables the adaptive immune response to defend against the high-dose secondary infection. We have also identified a feedback system that describes the relationship between TGF-β signaling and the immune response, and we have discovered several crucial proteins (Casp2, Acvr1b, and Hsp90a.1) associated with apoptosis. As the most significant proteins in the secondary infection are involved in the inhibition of apoptosis, the apoptotic process might be an important mechanism in the zebrafish immune response against C. albicans, particularly during the primary infection. These initial in silico analyses might encourage further experimental investigation of the roles played by apoptosis in the innate and adaptive immune response of zebrafish. We believe that new insights revealed by this work may lead to therapeutic advances and the improved design of drugs for the continuous battle against infectious diseases.
14.6 Appendix
FIGURE 14.A1 The primary infection network [10].
FIGURE 14.A2 The secondary infection network [10].
CHAPTER 15

Genetic-and-epigenetic host/pathogen networks for cross-talk mechanisms in human macrophages and dendritic cells during Mtb infection

15.1 Introduction to tuberculosis infected by Mycobacterium tuberculosis

Tuberculosis (TB) is an ancient disease of humankind, accounting for a large number of deaths over the years. According to the World Health Organization (WHO) Global Health Observatory (GHO) data for 2014, nearly 9.6 million new cases of TB are identified each year, with almost 1.5 million TB-related deaths. Approximately one-third of the global population is infected with this bacterium but remains asymptomatic, a condition known as latent TB [636]. Of those with latent TB, only 5%-10% will develop active TB disease in their lifetime [637]. The main etiologic agent of TB is Mycobacterium tuberculosis (Mtb), which was first identified as a pathogen by Robert Koch in 1882 [638]. Dendritic cells (DCs), which are also involved in the first line of immune defense, can protect tissues from extraneous bacilli or viral infection. DCs can also phagocytose Mtb, process the bacilli, and present the mycobacterial antigens on the plasma membrane. The DCs may migrate to the regional lymph nodes, where they prime T cells by presenting antigens on MHC (major histocompatibility complex) and secrete cytokines. DCs in the peripheral tissue can capture and process antigens in the following ways: (1) micropinocytosis [639]; (2) endocytosis via receptors such as the mannose receptor, DEC-205, or DC-specific ICAM-3-grabbing nonintegrin (DC-SIGN) [640]; (3) Fc receptors and complement receptors (CR3), which mediate the efficient internalization of immune complexes or bacteria [641]; (4) phagocytosis of viruses or bacteria [642]; and (5) TLR (Toll-like receptor)-mediated pathogen recognition [643]. At present, the interactions between DCs and Mtb are still not fully understood and some reports are contradictory.
For example, after interacting with pathogens, DCs mature and then migrate to lymph nodes, where they prime T cells by presenting antigens on MHC and secrete cytokines such as IL-12 (interleukin 12) [644]. By comparison, a study has reported that the
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00010-9
interaction of Mtb with TLRs induces high IL-12 secretion during the early infection, whereas the interaction with DC-SIGN results in high IL-10 secretion [645]. After Mtb is phagocytosed by alveolar macrophages (Mφs), the ligation of TLR-2 and TLR-4 can stimulate the secretion of proinflammatory cytokines such as TNF-α (tumor necrosis factor α), IL-1β, and IL-12 [646]. TNF-α plays an essential role in the immune response to Mtb infection by stimulating neutrophils and Mφs in an autocrine and paracrine fashion to induce apoptosis and the production of reactive nitrogen intermediates (RNIs). Nitric oxide, an iNOS product, is found to be highly toxic to intracellular mycobacteria [647]. NOS2-deficient mice have shown an increased susceptibility to mycobacterial infection [648]. Some studies have reported that TNF-α is crucial in preventing the reactivation of latent TB in nonhuman primate and mouse models [649,650]. IL-1β is found to be correlated with disease activity and fever in infected patients [651]. IL-12 is crucial in driving T helper type 1 (Th1) differentiation and IFN-γ (interferon γ) production. The increased susceptibility of IL-12p40 gene-deficient mice to Mtb infection further supports an essential role for IL-12 in the protective immune response against Mtb [652]. Nevertheless, IFN-γ is found to be the most important cytokine in the immune response to mycobacteria, and it plays a role in activating Mφs to produce reactive oxygen and nitrogen species. IFN-γ-deficient mice are found to be highly susceptible to Mtb infection and produce less NOS2 [653]. The host immune response is often insufficient for handling Mtb infection because the bacterium has developed sophisticated defense mechanisms, such as blocking maturation, lysosomal fusion, and acidification, to survive in Mφs and enhance bacterial growth.
The 19-kDa lipoprotein of Mtb can interact with host antigen-presenting cells via TLR-1/2 to reduce antigen processing and MHC-II expression [654], rather than inhibiting cytokine production [655]. ESAT-6 has a similar effect through TLR-2 [656]. Lipoarabinomannan (LAM) is a major cell wall component of Mtb that binds to DC-SIGN. The binding of LAM can inhibit DC maturation, decrease IL-12 production, and then induce DCs to secrete IL-10 [657]. IL-10 is an immunosuppressive cytokine, and its induction by Mtb can allow the survival of the bacteria [658]. Blocking the accumulation of ATPases and GTPases in the vacuole can interfere with the cellular function of the phagosome by preventing the decrease in pH needed to kill the bacteria [659]. Other pathogenic mechanisms of Mtb immune evasion have been thoroughly outlined in a previous review [660]. Although a few studies have investigated the effects of human microRNAs (miRNAs) on Mtb-infected Mφs [661,662] and DCs [663,664], it remains important to identify the cross-talk genome-wide genetic-and-epigenetic interspecies networks (GWGEINs) between pathogen and host in both Mφs and DCs during Mtb infection. In addition, the different offense and defense mechanisms between Mφs versus Mtb and DCs versus Mtb will play crucial roles in the evaluation of potential drugs for treating human cells during the early Mtb infection. In this chapter, we identify the cross-talk GWGEINs in both human Mφs and DCs during early Mtb infection through a systems biology approach. We propose the common and different pathways of the host-and-pathogen core networks (HPCNs), extracted from the cross-talk GWGEINs, to investigate the cellular mechanisms of both host and pathogen in Mφs and DCs during the early Mtb infection.
Further, we also discuss the cross-talks between host and pathogen and infer the core network biomarkers to get an insight into the corresponding offense and defense mechanisms both in Mφs versus Mtb and DCs versus Mtb. Consequently, based on the core network biomarkers of offense
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
and defense mechanisms among Mφs, DCs, and Mtb, we can propose the potential multiple drug targets and suggest the potential multimolecule drug for the therapeutic treatment of human cells during the early Mtb infection.
15.2 Materials and methods for constructing cross-talk GWGEINs and their core networks

15.2.1 Overview of the construction processes of cross-talk GWGEINs in Mφs and DCs infected with Mtb

The flowchart of the strategy for constructing cross-talk GWGEINs and the HPCNs in Mφs and DCs during the early Mtb infection is shown in Fig. 15.1. The cross-talk GWGEIN is composed of the host/pathogen gene/miRNA regulation networks (GRNs), the host/pathogen protein-protein interaction networks (PPINs), the interspecies PPINs, and the regulation networks of host-miRNAs on host-/pathogen-genes. The construction of the GWGEIN and HPCN can be divided into the following three steps: (1) big data mining and data preprocessing for the candidate cross-talk GWGEIN; (2) the identification of the real cross-talk GWGEIN by applying the system identification method and the system order detection scheme to the genome-wide microarray data of Mφs, DCs, and Mtb during the early Mtb infection; and (3) HPCN construction by applying the principal network projection (PNP) method to extract the core of the real cross-talk GWGEIN. This allows the identification of the differential cross-talk mechanisms between Mφs and DCs during the early Mtb infection.
15.2.2 Big data mining and data preprocessing

In this chapter, identifying the real cross-talk GWGEIN between human cells and Mtb requires the simultaneously measured activities of host and pathogen during the infection process. Several gene expression profiles are available only on the host side [665] or only on the pathogen side [666] during Mtb infection, so those data are not suitable for identifying the offense and defense mechanisms between host and pathogen during the infection process. Consequently, the microarray raw data obtained from a previous study investigating Mφs and DCs infected with Mtb [667] are the only data set that provides all the data necessary to build the system model of the cross-talk GWGEIN. This is also a limitation of the system model, and the resulting assumptions will need to be revised as more data become available. The construction of candidate GWGEINs requires several association databases as building blocks. We use network models and genome-wide high-throughput data to identify the regulative and interactive abilities in the candidate GWGEIN, and we prune the false positives of the candidate GWGEIN by deleting the insignificant components that fall outside the system order determined by the Akaike information criterion (AIC). The GWGEINs in Mφs and DCs are then constructed to represent the real cellular networks of the biological systems. Finally, we extract HPCNs from GWGEINs by PNP to investigate the defense mechanisms of the host and pathogen in Mφs and DCs during Mtb infection. The multiple drug targets are also
FIGURE 15.1 Flowchart of host/pathogen cross-talk networks constructed using the systems biology approach.
selected based on core network markers of HPCNs for potential multimolecule drug design through drug design literature search [11].

The raw data are divided into two parts. The first part includes the mRNA and miRNA expression levels of human Mφs and DCs at 0, 4, 18, 48 h postinfection with Mtb H37Rv
(ArrayExpress accession number E-MEXP-3521). The second part includes the Mtb transcriptional mRNA expression levels in Mφs and DCs at 0, 1, 4, 18 h postinfection (ArrayExpress accession number E-BUGS-58; http://www.ebi.ac.uk/arrayexpress/experiments/browse.html). The probing platforms used for the host and pathogen are Affymetrix GeneChip Human Genome HG-U133A and M. tuberculosis/Mycobacterium bovis 10944 YBv2_1_1, which contain 22283 and 10944 probes, respectively. The data of human Mφs at 0, 4, 18 h and Mtb at 0, 1, 4, 18 h during the infection process are employed to identify the real cross-talk GWGEIN between Mφs and Mtb, while the data of human DCs at 0, 4, 18 h and Mtb at 0, 1, 4, 18 h during the infection process are employed to identify the real cross-talk GWGEIN between DCs and Mtb. Because the system identification method needs a sufficient number of data points, we have applied cubic spline interpolation to the human Mφ and DC data at 0, 4, 18, 48 h to obtain enough data points between 0 and 18 h and thereby avoid overfitting in the parameter identification process. Constructing the candidate cross-talk GWGEIN requires the mining of big data obtained from the corresponding experiments or computational predictions. The big data, which are collected from several databases, are as follows. The host candidate GRN needs miRNA-to-gene regulatory associations from TargetScanHuman [668] and transcription factor (TF)-to-gene regulatory associations from GSEA [669,670], HTRIdb [671], and ITFP [672]. The host candidate PPIN needs PPI associations from BioGRID [73]. The pathogen candidate GRN needs TF-to-gene regulatory associations from TbDb [673-675] and the data in Ref. [676], as well as the host-miRNA-to-pathogen-gene associations in Ref. [677]. The pathogen candidate PPIN needs PPI associations from STRING [585] and the study in Ref. [678].
The host-pathogen candidate PPIN needs interspecies PPI associations from the studies in Refs. [393,679]. To support the inferred epigenetic DNA methylation, the genome-wide DNA methylation profiles of monocytes (GSE70478) [680] and DCs (GSE64177) [681] infected with Mtb (with sample sizes of 10 and 6, respectively) are used. According to the DNA methylation analysis of monocytes and macrophages, only 27 CpG sites have displayed differential DNA methylation during the maturation step [682]. As a result, the DNA methylation analysis between monocytes and DCs can serve as a proxy for that between Mφs and DCs. Methylation data have been analyzed using one-way analysis of variance statistics. To integrate the above big data, including the genome-wide expression data, the data for constructing the candidate cross-talk GWGEIN, and the genome-wide DNA methylation profiles, Matlab's text-file and string manipulation tools are used to unify the gene names based on the gene symbols in the National Center for Biotechnology Information's (NCBI's) Gene database. Consequently, in the host candidate GRN, we obtain 445,335 TF-gene pairs and 411,418 miRNA-gene pairs. In the pathogen candidate GRN, we obtain 85,916 TF-gene pairs and 12 miRNA-gene pairs. In the candidate PPINs, we obtain 1,101,388 host PPI pairs, 1868 pathogen PPI pairs, and 10,536 host-pathogen interspecies PPI pairs. Finally, we integrate these candidate pairs into the candidate cross-talk GWGEIN (Fig. 15.1). The candidate cross-talk GWGEIN is then used to identify the real cross-talk GWGEINs in Mφs and DCs by applying the system identification method and the system order detection scheme to the genome-wide microarray data of Mφs, DCs, and Mtb during the early Mtb infection.
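The cubic spline interpolation of the host time course described in this section can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: the chapter does not state which boundary condition was used, so the sketch assumes a natural spline (zero second derivative at both ends), and the expression values are made-up placeholders for a single gene measured at 0, 4, 18, and 48 h.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    xs must be strictly increasing and contain at least two knots.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # m[i] is the second derivative at knot i; natural ends: m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i]
                      - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm on the symmetric tridiagonal system.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        m[n - 1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def evaluate(x):
        # Locate the interval [xs[i], xs[i + 1]] containing x (clamped).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    return evaluate


# Hypothetical expression values for one host gene at the measured time points.
times_h = [0, 4, 18, 48]
expression = [1.0, 2.5, 2.0, 1.2]
spline = natural_cubic_spline(times_h, expression)

# Hourly samples between 0 and 18 h, the window used for identification.
hourly = [spline(t) for t in range(0, 19)]
```

The 48 h measurement is kept as a knot even though only the 0 to 18 h window is resampled, because it constrains the shape of the curve near 18 h. Interpolated points add no new biological information; they only regularize the time grid for the identification step.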
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
15. Genetic-and-epigenetic host/pathogen networks for cross-talk mechanisms
15.2.3 Dynamic models of the cross-talk GWGEIN for Mφs, DCs, and Mtb during the early infection process

Since all connections in the candidate GWGEIN are obtained from a number of plausible predictions in human cells and Mtb, we then construct the dynamic models of GWGEIN, which could characterize the molecular mechanisms in GWGEIN, to prune false-positive connections using genome-wide microarray data and finally obtain the real cross-talk GWGEIN between Mφs and Mtb (or DCs and Mtb) during the infection process. For the GRN of host-genes in GWGEIN, the dynamic model of the ith host-gene is described as the following stochastic dynamic equation:

$$x_i(t+1) = x_i(t) + \sum_{j=1}^{J_i} a_{ij}\, y_j(t) - \sum_{k=1}^{K_i} c_{ik}\, x_i(t)\, \mathrm{miRNA}_k(t) - \lambda_i x_i(t) + \delta_i + \omega_i(t), \quad \text{for } i = 1, \ldots, I \tag{15.1}$$

where $x_i(t)$, $y_j(t)$, and $\mathrm{miRNA}_k(t)$ denote the expression levels of the ith host-gene, the jth host-TF, and the kth host-miRNA at time t, respectively; $a_{ij}$ and $-c_{ik}$ indicate the abilities of the jth host-TF regulation and the kth host-miRNA repression on the ith host-gene; $\delta_i$ and $-\lambda_i$ denote the basal level and the degradation rate of the ith host-gene, respectively; $J_i$ and $K_i$ represent the numbers of host-TFs and miRNAs regulating the ith host-gene in the candidate GWGEIN; and $\omega_i(t)$ is the stochastic noise of the ith host-gene due to the modeling residue. The dynamic model of host-genes in (15.1) could characterize molecular regulatory mechanisms, including the transcription regulations by $\sum_{j=1}^{J_i} a_{ij} y_j(t)$, the miRNA repressions by $-\sum_{k=1}^{K_i} c_{ik} x_i(t)\, \mathrm{miRNA}_k(t)$, the mRNA degradation by $-\lambda_i x_i(t)$, the basal level by $\delta_i$, and the stochastic noise by $\omega_i(t)$. Due to the direct effects of DNA methylation on the binding affinities of RNA polymerase to target genes [683], we can assume that the change of basal level $\delta_i$ between the Mtb-infected Mφs and the Mtb-infected DCs in the dynamic model (15.1) indicates the occurrence of methylation on the ith host-gene. For the PPIN of host proteins in GWGEIN, the dynamic model of the jth host-protein can be described as the following stochastic iterative dynamic equation:

$$y_j(t+1) = y_j(t) + \sum_{l=1}^{J_j} b_{jl}\, y_j(t)\, y_l(t) + \sum_{m=1}^{M_j} d_{jm}\, y_j(t)\, w_m(t) + \alpha_j x_j(t) - \gamma_j y_j(t) + \kappa_j + \varpi_j(t), \quad \text{for } j = 1, \ldots, J \tag{15.2}$$

where $y_j(t)$, $y_l(t)$, $x_j(t)$, and $w_m(t)$ denote the expression levels of the jth host-protein, the lth host-protein, the jth host-gene, and the mth pathogen-protein at time t, respectively; $b_{jl}$ and $d_{jm}$ indicate the interactive abilities between the jth host-protein and the lth host-protein and between the jth host-protein and the mth pathogen-protein, respectively; $\alpha_j$, $-\gamma_j$, and $\kappa_j$ denote the translation rate, the degradation rate, and the basal level of the jth host-protein; $J_j$ and $M_j$ signify the numbers of host proteins and pathogen proteins interacting with the jth host-protein in the candidate GWGEIN; and $\varpi_j(t)$ represents the stochastic noise due to modeling residue. The dynamic model of host proteins in (15.2) could characterize molecular mechanisms, including the intraspecies host PPIs by $\sum_{l=1}^{J_j} b_{jl} y_j(t) y_l(t)$, the interspecies PPIs by $\sum_{m=1}^{M_j} d_{jm} y_j(t) w_m(t)$, the protein translation by $\alpha_j x_j(t)$, the protein degradation by $-\gamma_j y_j(t)$, the basal level by $\kappa_j$, and the stochastic noise by $\varpi_j(t)$.
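To make the structure of the host-gene model (15.1) concrete, the following minimal sketch evaluates one time step for a single gene. The function name, argument names, and all numerical values are hypothetical and chosen only for illustration; the noise term defaults to zero.

```python
def host_gene_step(x_i, y, mirna, a_i, c_i, lam_i, delta_i, noise=0.0):
    """One discrete-time update of Eq. (15.1) for a single host gene.

    x_i     : current expression of the gene
    y       : expression levels of its candidate TFs
    mirna   : expression levels of its candidate miRNAs
    a_i     : TF regulatory abilities a_ij (any sign)
    c_i     : miRNA repression abilities c_ik (nonnegative)
    lam_i   : degradation rate, delta_i : basal level
    """
    tf_regulation = sum(a * yj for a, yj in zip(a_i, y))
    mirna_repression = sum(c * x_i * mk for c, mk in zip(c_i, mirna))
    return x_i + tf_regulation - mirna_repression - lam_i * x_i + delta_i + noise


# Hypothetical gene with two TFs and one miRNA.
x_next = host_gene_step(
    x_i=2.0, y=[1.0, 3.0], mirna=[0.5],
    a_i=[0.4, -0.1], c_i=[0.2], lam_i=0.3, delta_i=0.5,
)
# x_next is approximately 1.8: 2.0 + 0.1 (TFs) - 0.2 (miRNA) - 0.6 (degradation) + 0.5 (basal)
```

The host-protein model (15.2) has the same shape, with the TF sum replaced by the two bilinear PPI sums and an extra translation term.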
15.2 Materials and methods for constructing cross-talk GWGEINs and their core networks
For the GRN of pathogen genes in GWGEIN, the dynamic model of the nth pathogen-gene is described as the following stochastic dynamic regulatory equation:

$$v_n(t+1) = v_n(t) + \sum_{m=1}^{M_n} e_{nm}\, w_m(t) - \sum_{k=1}^{K_n} g_{nk}\, v_n(t)\, \mathrm{miRNA}_k(t) - \rho_n v_n(t) + \eta_n + \psi_n(t), \quad \text{for } n = 1, \ldots, N \tag{15.3}$$

where $v_n(t)$ denotes the expression level of the nth pathogen-gene at time t; $e_{nm}$ and $-g_{nk}$ indicate the regulatory abilities of the mth pathogen-TF regulation and the kth host-miRNA repression on the nth pathogen-gene, respectively; $\eta_n$ and $-\rho_n$ are the basal level and the degradation rate of the nth pathogen-gene, respectively; $M_n$ and $K_n$ are the numbers of pathogen-TFs and host-miRNAs regulating the nth pathogen-gene in the candidate GWGEIN; and $\psi_n(t)$ denotes the stochastic noise due to the modeling residue. The dynamic regulatory model of pathogen genes in (15.3) could characterize molecular pathogenic mechanisms, including the pathogen transcription regulations by $\sum_{m=1}^{M_n} e_{nm} w_m(t)$, the host-miRNA repressions by $-\sum_{k=1}^{K_n} g_{nk} v_n(t)\, \mathrm{miRNA}_k(t)$, the pathogen mRNA degradation by $-\rho_n v_n(t)$, the pathogen basal level by $\eta_n$, and the stochastic noise by $\psi_n(t)$. Since it has been confirmed that miRNAs can exist extracellularly and circulate in body fluid [684,685], we can consider the repressions of host-miRNAs on pathogen-genes in the dynamic regulatory model (15.3) of GWGEIN. For the PPIN of pathogen proteins in GWGEIN, the dynamic interactive model of the mth pathogen-protein is described as the following stochastic dynamic interactive equation:

$$w_m(t+1) = w_m(t) + \sum_{q=1}^{M_m} h_{mq}\, w_m(t)\, w_q(t) + \sum_{j=1}^{J_m} d_{mj}\, w_m(t)\, y_j(t) + \beta_m v_m(t) - \sigma_m w_m(t) + \varepsilon_m + \tau_m(t), \quad \text{for } m = 1, \ldots, M \tag{15.4}$$

where $w_q(t)$ and $w_m(t)$ denote the expression levels of the qth pathogen-protein and the mth pathogen-protein at time t; $h_{mq}$ and $d_{mj}$ indicate the interactive abilities between the mth pathogen-protein and the qth pathogen-protein and between the mth pathogen-protein and the jth host-protein, respectively; $\beta_m$, $-\sigma_m$, and $\varepsilon_m$ represent the translation rate from the mRNA of the mth pathogen-gene, and the degradation rate and the basal level of the mth pathogen-protein; $M_m$ and $J_m$ denote the numbers of pathogen proteins and host proteins interacting with the mth pathogen-protein in the candidate GWGEIN; and $\tau_m(t)$ is the stochastic noise due to modeling residue and measurement noise. The dynamic interactive model of pathogen proteins in (15.4) could characterize molecular interactive mechanisms, including the intraspecies pathogen PPIs by $\sum_{q=1}^{M_m} h_{mq} w_m(t) w_q(t)$, the interspecies PPIs by $\sum_{j=1}^{J_m} d_{mj} w_m(t) y_j(t)$, the protein translation by $\beta_m v_m(t)$, the protein degradation by $-\sigma_m w_m(t)$, the basal level by $\varepsilon_m$, and the stochastic noise by $\tau_m(t)$.
15.2.4 System identification method of the dynamic models of GWGEIN

After constructing the stochastic dynamic Eqs. (15.1)–(15.4) in the GWGEIN, we could apply a system identification method to Eqs. (15.1)–(15.4) to identify their respective parameters. The host GRN Eq. (15.1) could be rewritten as the following linear regression form:

$$x_i(t+1) = \begin{bmatrix} y_1(t) & \cdots & y_{J_i}(t) & x_i(t)\,\mathrm{miRNA}_1(t) & \cdots & x_i(t)\,\mathrm{miRNA}_{K_i}(t) & x_i(t) & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iJ_i} \\ -c_{i1} \\ \vdots \\ -c_{iK_i} \\ 1-\lambda_i \\ \delta_i \end{bmatrix} + \omega_i(t) \tag{15.5}$$

which could be simply represented as follows:

$$x_i(t+1) = \varphi_i^{HG}(t)\,\theta_i^{HG} + \omega_i(t), \quad \text{for } i = 1, \ldots, I \tag{15.6}$$

where $\varphi_i^{HG}(t)$ represents the regression vector obtained from the corresponding expression data; and $\theta_i^{HG}$ denotes the unknown parameter vector of the host gene i in the host GRN to be estimated. The equations in (15.6) can be augmented for $F_i$ time points of the ith host-gene as follows:

$$\begin{bmatrix} x_i(t_2) \\ x_i(t_3) \\ \vdots \\ x_i(t_{F_i+1}) \end{bmatrix} = \begin{bmatrix} \varphi_i^{HG}(t_1) \\ \varphi_i^{HG}(t_2) \\ \vdots \\ \varphi_i^{HG}(t_{F_i}) \end{bmatrix} \theta_i^{HG} + \begin{bmatrix} \omega_i(t_1) \\ \omega_i(t_2) \\ \vdots \\ \omega_i(t_{F_i}) \end{bmatrix} \tag{15.7}$$

For simplicity, we use the notations $X_i^{HG}$, $\Phi_i^{HG}$, and $\Gamma_i^{HG}$ to represent (15.7) as follows:

$$X_i^{HG} = \Phi_i^{HG}\theta_i^{HG} + \Gamma_i^{HG} \tag{15.8}$$

where

$$X_i^{HG} = \begin{bmatrix} x_i(t_2) \\ x_i(t_3) \\ \vdots \\ x_i(t_{F_i+1}) \end{bmatrix}, \quad \Phi_i^{HG} = \begin{bmatrix} \varphi_i^{HG}(t_1) \\ \varphi_i^{HG}(t_2) \\ \vdots \\ \varphi_i^{HG}(t_{F_i}) \end{bmatrix}, \quad \text{and} \quad \Gamma_i^{HG} = \begin{bmatrix} \omega_i(t_1) \\ \omega_i(t_2) \\ \vdots \\ \omega_i(t_{F_i}) \end{bmatrix}.$$

The system identification problem of the host GRN Eq. (15.1) can then be formulated by solving the following constrained least square parameter estimation problem:

$$\min_{\theta_i^{HG}} \frac{1}{2}\left\| \Phi_i^{HG}\theta_i^{HG} - X_i^{HG} \right\|_2^2 \quad \text{subject to} \quad \left[\begin{array}{ccc|ccc|c|c} 0 & \cdots & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & -1 \end{array}\right] \theta_i^{HG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \end{bmatrix} \tag{15.9}$$

in which the first block of zero columns has width $J_i$ and the identity block has size $K_i$.
By solving the constrained optimization problem in (15.9), we can identify the parameters in the host GRN Eq. (15.1) and simultaneously guarantee the nonpositive host-miRNA repression $-c_{ik} \le 0$, the nonpositive host-gene degradation $-\lambda_i \le 0$, and the nonnegative host-gene basal level $\delta_i \ge 0$. Similarly, the host PPIN Eq. (15.2) can be rewritten in the following linear regression form:

$$y_j(t+1) = \begin{bmatrix} y_j(t)y_1(t) & \cdots & y_j(t)y_{J_j}(t) & y_j(t)w_1(t) & \cdots & y_j(t)w_{M_j}(t) & x_j(t) & y_j(t) & 1 \end{bmatrix} \begin{bmatrix} b_{j1} \\ \vdots \\ b_{jJ_j} \\ d_{j1} \\ \vdots \\ d_{jM_j} \\ \alpha_j \\ 1-\gamma_j \\ \kappa_j \end{bmatrix} + \varpi_j(t) \tag{15.10}$$

which could be simply represented as follows:

$$y_j(t+1) = \varphi_j^{HP}(t)\,\theta_j^{HP} + \varpi_j(t), \quad \text{for } j = 1, \ldots, J \tag{15.11}$$

where $\varphi_j^{HP}(t)$ represents the regression vector obtained from the corresponding expression data; and $\theta_j^{HP}$ denotes the unknown parameter vector of the host-protein j in the host PPIN to be estimated. The equations in (15.11) can be augmented for $F_j$ time points of the jth host-protein as follows:

$$Y_j^{HP} = \Phi_j^{HP}\theta_j^{HP} + \Gamma_j^{HP} \tag{15.12}$$

where

$$Y_j^{HP} = \begin{bmatrix} y_j(t_2) \\ y_j(t_3) \\ \vdots \\ y_j(t_{F_j+1}) \end{bmatrix}, \quad \Phi_j^{HP} = \begin{bmatrix} \varphi_j^{HP}(t_1) \\ \varphi_j^{HP}(t_2) \\ \vdots \\ \varphi_j^{HP}(t_{F_j}) \end{bmatrix}, \quad \text{and} \quad \Gamma_j^{HP} = \begin{bmatrix} \varpi_j(t_1) \\ \varpi_j(t_2) \\ \vdots \\ \varpi_j(t_{F_j}) \end{bmatrix}.$$

The system identification problem of the host PPIN Eq. (15.2) can then be formulated by solving the following constrained least square parameter estimation problem:

$$\min_{\theta_j^{HP}} \frac{1}{2}\left\| \Phi_j^{HP}\theta_j^{HP} - Y_j^{HP} \right\|_2^2 \quad \text{subject to} \quad \left[\begin{array}{ccc|ccc} 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & 0 & -1 \end{array}\right] \theta_j^{HP} \le \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \tag{15.13}$$

in which the block of zero columns has width $J_j + M_j$. By solving the parameter estimation problem in (15.13), we can estimate the parameters in the host PPIN Eq. (15.2) and simultaneously guarantee the nonnegative host-protein coding rate $\alpha_j \ge 0$, the nonpositive host-protein degradation $-\gamma_j \le 0$, and the nonnegative host-protein basal level $\kappa_j \ge 0$.
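The stacking step of Eqs. (15.5)–(15.8) can be sketched with NumPy as follows. The helper name `build_host_gene_regression` and the synthetic data are assumptions made for illustration. The check at the end uses plain unconstrained least squares on noise-free data only to confirm that the columns of the regression matrix line up with the parameter vector; it does not impose the constraints of (15.9).

```python
import numpy as np

def build_host_gene_regression(x, Y, R):
    """Stack Eq. (15.5) over time: rows are phi_i(t), targets are x_i(t+1).

    x : (T,)     expression of host gene i
    Y : (T, J_i) expression of its candidate TFs
    R : (T, K_i) expression of its candidate miRNAs
    """
    x_now = x[:-1]
    Phi = np.column_stack([
        Y[:-1],                   # y_j(t)               -> a_ij
        x_now[:, None] * R[:-1],  # x_i(t) * miRNA_k(t)  -> -c_ik
        x_now,                    # x_i(t)               -> 1 - lambda_i
        np.ones_like(x_now),      # constant             -> delta_i
    ])
    X = x[1:]
    return Phi, X

# Synthetic, noise-free trajectory generated from known (hypothetical) parameters.
rng = np.random.default_rng(0)
T = 30
Y = rng.uniform(0.5, 1.5, size=(T, 2))
R = rng.uniform(0.5, 1.5, size=(T, 1))
theta_true = np.array([0.3, -0.2, -0.1, 1 - 0.4, 0.5])  # [a_i1, a_i2, -c_i1, 1-lambda_i, delta_i]
x = np.empty(T)
x[0] = 1.0
for t in range(T - 1):
    phi_t = np.concatenate([Y[t], x[t] * R[t], [x[t], 1.0]])
    x[t + 1] = phi_t @ theta_true

Phi, X = build_host_gene_regression(x, Y, R)
theta_hat, *_ = np.linalg.lstsq(Phi, X, rcond=None)
```

The host PPIN regression (15.10) follows the same pattern, with bilinear columns $y_j(t)y_l(t)$ and $y_j(t)w_m(t)$ and an extra column for $x_j(t)$.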
By the same process as in the host GRN Eq. (15.5), the pathogen GRN Eq. (15.3) can be rewritten as the following regression form:

$$v_n(t+1) = \begin{bmatrix} w_1(t) & \cdots & w_{M_n}(t) & v_n(t)\,\mathrm{miRNA}_1(t) & \cdots & v_n(t)\,\mathrm{miRNA}_{K_n}(t) & v_n(t) & 1 \end{bmatrix} \begin{bmatrix} e_{n1} \\ \vdots \\ e_{nM_n} \\ -g_{n1} \\ \vdots \\ -g_{nK_n} \\ 1-\rho_n \\ \eta_n \end{bmatrix} + \psi_n(t) \tag{15.14}$$

which could be simply represented as follows:

$$v_n(t+1) = \varphi_n^{PG}(t)\,\theta_n^{PG} + \psi_n(t), \quad \text{for } n = 1, \ldots, N \tag{15.15}$$

where $\varphi_n^{PG}(t)$ represents the regression vector obtained from the corresponding expression data; and $\theta_n^{PG}$ denotes the unknown parameter vector of the pathogen gene n in the pathogen GRN equation to be estimated. The equations in (15.15) can be augmented for $F_n$ time points of the nth pathogen-gene as follows:

$$V_n^{PG} = \Phi_n^{PG}\theta_n^{PG} + \Gamma_n^{PG} \tag{15.16}$$

where

$$V_n^{PG} = \begin{bmatrix} v_n(t_2) \\ v_n(t_3) \\ \vdots \\ v_n(t_{F_n+1}) \end{bmatrix}, \quad \Phi_n^{PG} = \begin{bmatrix} \varphi_n^{PG}(t_1) \\ \varphi_n^{PG}(t_2) \\ \vdots \\ \varphi_n^{PG}(t_{F_n}) \end{bmatrix}, \quad \text{and} \quad \Gamma_n^{PG} = \begin{bmatrix} \psi_n(t_1) \\ \psi_n(t_2) \\ \vdots \\ \psi_n(t_{F_n}) \end{bmatrix}.$$

The system identification problem of the pathogen GRN Eq. (15.3) can then be formulated by the following constrained least square parameter estimation problem:

$$\min_{\theta_n^{PG}} \frac{1}{2}\left\| \Phi_n^{PG}\theta_n^{PG} - V_n^{PG} \right\|_2^2 \quad \text{subject to} \quad \left[\begin{array}{ccc|ccc|c|c} 0 & \cdots & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & -1 \end{array}\right] \theta_n^{PG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \end{bmatrix} \tag{15.17}$$

in which the first block of zero columns has width $M_n$ and the identity block has size $K_n$. By solving the problem in (15.17), we can identify the parameters in the pathogen GRN Eq. (15.3) and simultaneously guarantee the nonpositive host-miRNA repression $-g_{nk} \le 0$, the nonpositive pathogen-gene degradation $-\rho_n \le 0$, and the nonnegative pathogen-gene basal level $\eta_n \ge 0$.
Following the same process as in the host PPIN Eq. (15.10), the pathogen PPIN Eq. (15.4) can be reformulated in the following linear regression form:

$$w_m(t+1) = \begin{bmatrix} w_m(t)w_1(t) & \cdots & w_m(t)w_{M_m}(t) & w_m(t)y_1(t) & \cdots & w_m(t)y_{J_m}(t) & v_m(t) & w_m(t) & 1 \end{bmatrix} \begin{bmatrix} h_{m1} \\ \vdots \\ h_{mM_m} \\ d_{m1} \\ \vdots \\ d_{mJ_m} \\ \beta_m \\ 1-\sigma_m \\ \varepsilon_m \end{bmatrix} + \tau_m(t) \tag{15.18}$$

which could be simply represented as follows:

$$w_m(t+1) = \varphi_m^{PP}(t)\,\theta_m^{PP} + \tau_m(t), \quad \text{for } m = 1, \ldots, M \tag{15.19}$$

where $\varphi_m^{PP}(t)$ denotes the regression vector obtained from the corresponding expression data; and $\theta_m^{PP}$ denotes the unknown parameter vector of the pathogen protein m in the pathogen PPIN to be identified. The equations in (15.19) can be augmented for $F_m$ time points of the mth pathogen-protein as follows:

$$W_m^{PP} = \Phi_m^{PP}\theta_m^{PP} + \Gamma_m^{PP} \tag{15.20}$$

where

$$W_m^{PP} = \begin{bmatrix} w_m(t_2) \\ w_m(t_3) \\ \vdots \\ w_m(t_{F_m+1}) \end{bmatrix}, \quad \Phi_m^{PP} = \begin{bmatrix} \varphi_m^{PP}(t_1) \\ \varphi_m^{PP}(t_2) \\ \vdots \\ \varphi_m^{PP}(t_{F_m}) \end{bmatrix}, \quad \text{and} \quad \Gamma_m^{PP} = \begin{bmatrix} \tau_m(t_1) \\ \tau_m(t_2) \\ \vdots \\ \tau_m(t_{F_m}) \end{bmatrix}.$$

The system identification problem of the pathogen PPIN Eq. (15.4) can then be formulated by solving the following constrained least square parameter estimation problem:

$$\min_{\theta_m^{PP}} \frac{1}{2}\left\| \Phi_m^{PP}\theta_m^{PP} - W_m^{PP} \right\|_2^2 \quad \text{subject to} \quad \left[\begin{array}{ccc|ccc} 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & 0 & -1 \end{array}\right] \theta_m^{PP} \le \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \tag{15.21}$$

in which the block of zero columns has width $M_m + J_m$. By solving the problem in (15.21), we can identify the parameters in the pathogen PPIN Eq. (15.4) and simultaneously guarantee the nonnegative pathogen-protein coding rate $\beta_m \ge 0$, the nonpositive pathogen-protein degradation $-\sigma_m \le 0$, and the nonnegative pathogen-protein basal level $\varepsilon_m \ge 0$. For the accuracy of the system identification method, we need to interpolate extra time points (five times the number of parameters to be estimated in $\theta_i^{HG}$ for the host GRN, $\theta_j^{HP}$ for the host PPIN, $\theta_n^{PG}$ for the pathogen GRN, and $\theta_m^{PP}$ for the pathogen PPIN) by using the cubic spline method to avoid overfitting in the parameter estimation process [40]. The solutions of the above constrained least square parameter estimation problems in (15.9), (15.13), (15.17), and (15.21) can be obtained by using the function lsqlin in the MATLAB Optimization Toolbox, which is based on a reflective Newton method for minimizing a quadratic function [686]. The connection weights of the cross-talk GWGEIN in Mφs and Mtb (or DCs and Mtb) during the infection process can finally be solved one gene at a time (or one protein at a time) by using the corresponding microarray data.
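Because every inequality in (15.9), (15.13), (15.17), and (15.21) acts on a single parameter, each problem is a least squares fit with simple bounds. The sketch below solves a synthetic instance of (15.9) in Python with SciPy's `lsq_linear`; this is a stand-in chosen for illustration, not the MATLAB lsqlin call used in the text, and the regression matrix, sizes, and parameter values are all made up.

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(1)
J_i, K_i, F_i = 2, 1, 40  # hypothetical numbers of TFs, miRNAs, and time points

# Synthetic stand-in for Phi_i^HG and X_i^HG; real columns come from expression data.
Phi = rng.normal(size=(F_i, J_i + K_i + 2))
theta_true = np.array([0.3, -0.2, -0.1, 0.6, 0.5])  # [a_i1, a_i2, -c_i1, 1-lambda_i, delta_i]
X = Phi @ theta_true + 0.01 * rng.normal(size=F_i)

# Bounds equivalent to the inequality system in (15.9):
#   a_ij free, -c_ik <= 0, 1 - lambda_i <= 1, delta_i >= 0
lower = np.concatenate([np.full(J_i, -np.inf), np.full(K_i, -np.inf), [-np.inf, 0.0]])
upper = np.concatenate([np.full(J_i, np.inf), np.zeros(K_i), [1.0, np.inf]])

# lsq_linear minimizes 0.5 * ||Phi @ theta - X||^2 subject to lower <= theta <= upper.
result = lsq_linear(Phi, X, bounds=(lower, upper))
theta_hat = result.x
```

The protein problems (15.13) and (15.21) differ only in which entries are bounded: the translation rate and basal level from below by zero, and the self term from above by one.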
15.2.5 System order detection scheme of the dynamic system models of GWGEIN

Because the candidate cross-talk GWGEIN is obtained from all computational and experimental predictions, we apply a system order detection scheme to the host GRN model in (15.6), the host PPIN model in (15.11), the pathogen GRN model in (15.15), and the pathogen PPIN model in (15.19) to prune false positives in the candidate cross-talk GWGEIN using the microarray data of Mφs, DCs, and Mtb. Based on the theory of system identification [40], the insignificant parameters in the system models of GWGEIN that are out of system order (association number in network) are deleted according to the Akaike information criterion (AIC). As a result, false positives of the candidate cross-talk GWGEIN are deleted by AIC using the microarray data of Mφs, DCs, and Mtb, and we then obtain the real cross-talk GWGEINs between Mφs and Mtb and between DCs and Mtb during the infection process. In the host GRN model (15.6), the AIC of the ith host-gene could be defined by the following equation [687,688]:

$$\mathrm{AIC}_i^{HG}(J_i+K_i) = \log\!\left(\frac{1}{F_i}\sum_{t=t_1}^{t_{F_i}} \left(x_i(t+1)-\varphi_i^{HG}(t)\hat\theta_i^{HG}\right)^{T}\left(x_i(t+1)-\varphi_i^{HG}(t)\hat\theta_i^{HG}\right)\right) + \frac{2(J_i+K_i)}{F_i} \tag{15.22}$$

where $\hat\theta_i^{HG}$ denotes the estimated parameters of the ith host-gene obtained by solving the parameter estimation problem in (15.9); and the estimated residual error is $\hat\sigma_{HG,i}^2 = \sum_{t=t_1}^{t_{F_i}} \left(x_i(t+1)-\varphi_i^{HG}(t)\hat\theta_i^{HG}\right)^{T}\left(x_i(t+1)-\varphi_i^{HG}(t)\hat\theta_i^{HG}\right)/F_i$. By the tradeoff between the residual error in the first term and the parameter association number in the second term of (15.22), the minimum $\mathrm{AIC}_i^{HG}$ in (15.22) can be achieved at the number $J_i + K_i$ of the real gene/miRNA regulations in the host GRN [40,687,688]. Therefore, after deleting insignificant false positives out of $J_i + K_i$, the real host GRN of the cross-talk GWGEIN can be solved by the minimum $\mathrm{AIC}_i^{HG}$ in (15.22) one gene at a time.

In a similar way, in the host PPIN model (15.11), the AIC of the jth host-protein could be defined by the following equation:

$$\mathrm{AIC}_j^{HP}(J_j+M_j) = \log\!\left(\frac{1}{F_j}\sum_{t=t_1}^{t_{F_j}} \left(y_j(t+1)-\varphi_j^{HP}(t)\hat\theta_j^{HP}\right)^{T}\left(y_j(t+1)-\varphi_j^{HP}(t)\hat\theta_j^{HP}\right)\right) + \frac{2(J_j+M_j)}{F_j} \tag{15.23}$$

where $\hat\theta_j^{HP}$ denotes the estimated parameters of the jth host-protein obtained by solving the parameter estimation problem in (15.13); and the estimated residual error is $\hat\sigma_{HP,j}^2 = \sum_{t=t_1}^{t_{F_j}} \left(y_j(t+1)-\varphi_j^{HP}(t)\hat\theta_j^{HP}\right)^{T}\left(y_j(t+1)-\varphi_j^{HP}(t)\hat\theta_j^{HP}\right)/F_j$. By the tradeoff between the residual error in the first term and the parameter association number in the second term of (15.23), the minimum $\mathrm{AIC}_j^{HP}$ in (15.23) can be achieved at the number $J_j + M_j$ of the real PPIs in the host PPIN. Therefore, after deleting insignificant false positives out of $J_j + M_j$, the real host PPIN of the cross-talk GWGEIN can be solved by the minimum $\mathrm{AIC}_j^{HP}$ in (15.23) one protein at a time. Similar to the above definitions for the host GRN (15.22) and PPIN (15.23), the AICs of the nth pathogen-gene and the mth pathogen-protein could be respectively defined by the following equations:

$$\mathrm{AIC}_n^{PG}(M_n+K_n) = \log\!\left(\frac{1}{F_n}\sum_{t=t_1}^{t_{F_n}} \left(v_n(t+1)-\varphi_n^{PG}(t)\hat\theta_n^{PG}\right)^{T}\left(v_n(t+1)-\varphi_n^{PG}(t)\hat\theta_n^{PG}\right)\right) + \frac{2(M_n+K_n)}{F_n} \tag{15.24}$$

$$\mathrm{AIC}_m^{PP}(M_m+J_m) = \log\!\left(\frac{1}{F_m}\sum_{t=t_1}^{t_{F_m}} \left(w_m(t+1)-\varphi_m^{PP}(t)\hat\theta_m^{PP}\right)^{T}\left(w_m(t+1)-\varphi_m^{PP}(t)\hat\theta_m^{PP}\right)\right) + \frac{2(M_m+J_m)}{F_m} \tag{15.25}$$

where $\hat\theta_n^{PG}$ and $\hat\theta_m^{PP}$ denote the estimated parameters of the nth pathogen-gene and the mth pathogen-protein obtained by solving the problems in (15.17) and (15.21), respectively; and the estimated residual errors of the nth pathogen-gene and the mth pathogen-protein are $\hat\sigma_{PG,n}^2 = \sum_{t=t_1}^{t_{F_n}} \left(v_n(t+1)-\varphi_n^{PG}(t)\hat\theta_n^{PG}\right)^{T}\left(v_n(t+1)-\varphi_n^{PG}(t)\hat\theta_n^{PG}\right)/F_n$ and $\hat\sigma_{PP,m}^2 = \sum_{t=t_1}^{t_{F_m}} \left(w_m(t+1)-\varphi_m^{PP}(t)\hat\theta_m^{PP}\right)^{T}\left(w_m(t+1)-\varphi_m^{PP}(t)\hat\theta_m^{PP}\right)/F_m$, respectively. By the tradeoff between the residual error and the parameter association number, the minimum $\mathrm{AIC}_n^{PG}$ in (15.24) and the minimum $\mathrm{AIC}_m^{PP}$ in (15.25) can be achieved at the number $M_n + K_n$ of the real gene/miRNA regulations in the pathogen GRN and at the number $M_m + J_m$ of the real PPIs in the pathogen PPIN, respectively. Therefore the real pathogen GRN and PPIN of the cross-talk GWGEIN can be solved by the minimum $\mathrm{AIC}_n^{PG}$ in (15.24) and the minimum $\mathrm{AIC}_m^{PP}$ in (15.25), respectively. By applying a system identification method and a system order detection scheme to the dynamic models of the cross-talk GWGEIN to prune false positives in the candidate cross-talk GWGEIN using the microarray data of Mφs, DCs, and Mtb, we can then identify the real cross-talk GWGEINs between Mφs and Mtb (Fig. 15.2) and between DCs and Mtb (Fig. 15.3) during the infection process. Information about the nodes and edges of the GWGEINs is shown in Tables 15.1 and 15.2, respectively. Because the host–pathogen interaction process in Mtb infection is very complex, it is difficult to investigate the pathogenic and defense mechanisms between host and pathogen from the GWGEINs in Figs. 15.2 and 15.3. In this situation, we could apply the PNP method to the real cross-talk GWGEINs to extract the principal network structures of the real networks.
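The AIC trade-off in (15.22) can be illustrated with a small NumPy sketch. The text does not spell out the order in which candidate connections are tested, so two things here are assumptions made only for this example: candidates are ranked by the absolute weight of a full fit, and plain unconstrained least squares replaces the constrained estimates of (15.9). The function names and synthetic data are hypothetical.

```python
import numpy as np

def aic(residuals, n_params):
    """AIC in the form of Eq. (15.22): log mean squared residual + 2 * n_params / F."""
    F = len(residuals)
    return np.log(np.mean(residuals ** 2)) + 2.0 * n_params / F

def select_order(Phi_reg, base, target):
    """Choose how many candidate regulators to keep by minimizing AIC.

    Phi_reg : (F, P) candidate regulator columns
    base    : (F, B) always-kept columns (self term and constant)
    target  : (F,)   next-step expression values
    Ranking by |weight| of the full fit is an assumption of this sketch.
    """
    P = Phi_reg.shape[1]
    full = np.column_stack([Phi_reg, base])
    w_full, *_ = np.linalg.lstsq(full, target, rcond=None)
    order = np.argsort(-np.abs(w_full[:P]))
    best_score, best_k = np.inf, 0
    for k in range(P + 1):
        cols = np.column_stack([Phi_reg[:, order[:k]], base])
        w, *_ = np.linalg.lstsq(cols, target, rcond=None)
        score = aic(target - cols @ w, k)
        if score < best_score:
            best_score, best_k = score, k
    return sorted(order[:best_k].tolist()), best_score

# Synthetic example: six candidate regulators, of which only columns 1 and 4 are real.
rng = np.random.default_rng(0)
F, P = 60, 6
Phi_reg = rng.normal(size=(F, P))
base = np.column_stack([rng.normal(size=F), np.ones(F)])
target = (1.0 * Phi_reg[:, 1] - 0.8 * Phi_reg[:, 4]
          + 0.6 * base[:, 0] + 0.5 + 0.05 * rng.normal(size=F))
kept, best_aic = select_order(Phi_reg, base, target)
```

Dropping either real regulator raises the residual term far more than the penalty it saves, so both are retained. AIC is known to admit an occasional spurious regulator, so `kept` may contain extra indices on some data.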
FIGURE 15.2 The real cross-talk GWGEIN in Mφs. It represents the real genetic-and-epigenetic network connection of host and pathogen in Mφs infected with Mtb. The edges in red represent the gene regulations of host-miRNAs on host- or pathogen-genes. The edges in blue represent the PPINs of host, pathogen, and host–pathogen. The edges in gray represent the gene regulations of TFs on genes of host or pathogen [11]. GWGEIN, Genome-wide genetic-and-epigenetic interspecies network; miRNA, microRNA; Mtb, Mycobacterium tuberculosis; PPIN, protein–protein interaction network; TF, transcription factor.

FIGURE 15.3 The real cross-talk GWGEIN in DCs. It represents the real genetic-and-epigenetic network connection of host and pathogen in DCs infected with Mtb. The edges in red represent the gene regulations of host-miRNAs on host- or pathogen-genes. The edges in blue represent the PPINs of host, pathogen, and host–pathogen. The edges in gray represent the gene regulations of TFs on genes of host or pathogen [11]. DC, Dendritic cell; GWGEIN, genome-wide genetic-and-epigenetic interspecies network; miRNA, microRNA; Mtb, Mycobacterium tuberculosis; PPIN, protein–protein interaction network; TF, transcription factor.

TABLE 15.1 Information regarding the nodes of the genome-wide genetic-and-epigenetic interspecies networks between Mφs and Mtb (Mycobacterium tuberculosis) and between dendritic cells (DCs) and Mtb [11].

| | Host-miRNAs | Host-TFs | Pathogen-TFs | Host-genes | Pathogen-genes |
|---|---|---|---|---|---|
| Mφs versus Mtb | 23 | 82 | 92 | 9274 | 3070 |
| DCs versus Mtb | 21 | 63 | 87 | 9827 | 3050 |

miRNA, MicroRNA; TF, transcription factor.

TABLE 15.2 Information regarding the edges of the genome-wide genetic-and-epigenetic interspecies networks between Mφs and Mtb (Mycobacterium tuberculosis) and between dendritic cells (DCs) and Mtb [11].

| | Mφs versus Mtb | DCs versus Mtb |
|---|---|---|
| Host-miRNA repressions of host-genes | 387 | 44 |
| Host-miRNA repressions of pathogen-genes | 1 | 0 |
| Host-TF regulations of host-genes | 554 | 364 |
| Pathogen-TF regulations of pathogen-genes | 3468 | 3476 |
| Host-TF and pathogen-protein interactions | 45 | 24 |
| Pathogen-TF and host-protein interactions | 1 | 1 |
| Host PPIs | 78,502 | 101,452 |
| Pathogen PPIs | 5758 | 5783 |
| Host–pathogen PPIs | 1086 | 1169 |

miRNA, MicroRNA; PPI, protein–protein interaction; TF, transcription factor.

15.2.6 Core network extraction from the real cross-talk GWGEIN by applying the PNP method

Before using the PNP method, it is necessary to construct a combined network matrix H containing the estimated parameters in the real cross-talk GWGEIN as follows:

$$H = \begin{bmatrix}
\hat b_{11} & \cdots & \hat b_{1J} & \hat d_{11} & \cdots & \hat d_{1M} \\
\vdots & \hat b_{jl} & \vdots & \vdots & \hat d_{jm} & \vdots \\
\hat b_{J1} & \cdots & \hat b_{JJ} & \hat d_{J1} & \cdots & \hat d_{JM} \\
\hat d_{11} & \cdots & \hat d_{1J} & \hat h_{11} & \cdots & \hat h_{1M} \\
\vdots & \hat d_{mj} & \vdots & \vdots & \hat h_{mq} & \vdots \\
\hat d_{M1} & \cdots & \hat d_{MJ} & \hat h_{M1} & \cdots & \hat h_{MM} \\
\hat a_{11} & \cdots & \hat a_{1J} & 0 & \cdots & 0 \\
\vdots & \hat a_{ij} & \vdots & \vdots & \ddots & \vdots \\
\hat a_{J1} & \cdots & \hat a_{JJ} & 0 & \cdots & 0 \\
0 & \cdots & 0 & \hat e_{11} & \cdots & \hat e_{1M} \\
\vdots & \ddots & \vdots & \vdots & \hat e_{nm} & \vdots \\
0 & \cdots & 0 & \hat e_{M1} & \cdots & \hat e_{MM} \\
-\hat c_{11} & \cdots & -\hat c_{J1} & -\hat g_{11} & \cdots & -\hat g_{M1} \\
\vdots & -\hat c_{ik} & \vdots & \vdots & -\hat g_{nk} & \vdots \\
-\hat c_{1K} & \cdots & -\hat c_{JK} & -\hat g_{1K} & \cdots & -\hat g_{MK}
\end{bmatrix} \in \mathbb{R}^{(2J+2M+K)\times(J+M)}$$

where $\hat a_{ij}$ and $-\hat c_{ik}$ are obtained in $\hat\theta_i^{HG}$ estimated in (15.9) and pruned by (15.22); $\hat b_{jl}$ and $\hat d_{jm}$ are obtained in $\hat\theta_j^{HP}$ estimated in (15.13) and pruned by (15.23); $\hat e_{nm}$ and $-\hat g_{nk}$ are obtained in $\hat\theta_n^{PG}$ estimated in (15.17) and pruned by (15.24); and $\hat h_{mq}$ and $\hat d_{mj}$ are obtained in $\hat\theta_m^{PP}$ estimated in (15.21) and pruned by (15.25). $\hat a_{ij}$ and $\hat e_{nm}$ represent transcriptional regulatory abilities in the intraspecies GRNs of host and pathogen, respectively; $\hat b_{jl}$ and $\hat h_{mq}$ represent interactive abilities in the intraspecies PPINs of host and pathogen, respectively; $-\hat c_{ik}$ and $-\hat g_{nk}$ denote miRNA repression abilities in the intraspecies GRNs of host and pathogen, respectively; and $\hat d_{jm}$ and $\hat d_{mj}$ denote the interactive abilities between the jth host-protein and the mth pathogen-protein in the interspecies PPIN between host and pathogen. The estimated weights of the network connections in the intraspecies GRNs, the intraspecies PPINs, and the interspecies PPIN therefore construct the network matrix H of the real cross-talk GWGEIN. If a connection does not exist in the candidate cross-talk GWGEIN or has been pruned by AIC, the corresponding parameter in the network matrix H is padded with zero. PNP is then applied to H to extract the HPCN of the real cross-talk GWGEIN. PNP is based on the singular value decomposition of H as follows:

$$H = U D V^{T} \tag{15.26}$$

where $U \in \mathbb{R}^{(2J+2M+K)\times(J+M)}$; $V \in \mathbb{R}^{(J+M)\times(J+M)}$; and $D = \mathrm{diag}(d_1, \ldots, d_s, \ldots, d_{J+M})$ consists of the $J + M$ singular values of H in descending order, that is, $d_1 \ge \cdots \ge d_s \ge \cdots \ge d_{J+M}$. Note that $\mathrm{diag}(d_1, d_2)$ denotes the diagonal matrix of $d_1$ and $d_2$. The eigenexpression fraction ($E_s$) is defined as follows:

$$E_s = \frac{d_s^2}{\sum_{s=1}^{J+M} d_s^2} \tag{15.27}$$

We select the top Q singular vectors of H with the minimal Q such that $\sum_{s=1}^{Q} E_s \ge 0.85$, which means that the Q principal components contain 85% of the core network structure of the GWGEIN from the perspective of energy. The projection (T) of each row in H to the top Q singular vectors of V is defined as follows:

$$T(p, s) = h_p \times v_s, \quad \text{for } p = 1, \ldots, (2J + 2M + K) \text{ and } s = 1, \ldots, Q \tag{15.28}$$

where $h_p$ and $v_s$ denote the pth row vector of H and the sth column vector of V, respectively. We then define the 2-norm projection value of each node, that is, gene/miRNA/protein, in the real cross-talk GWGEIN to the core network structure consisting of the top Q singular vectors as follows:

$$D(p) = \left[\sum_{s=1}^{Q} T(p, s)^2\right]^{1/2}, \quad \text{for } p = 1, \ldots, (2J + 2M + K) \tag{15.29}$$

If D(p) is close to zero, the pth node (i.e., its network connections) is almost independent of the core network structure consisting of the top Q singular vectors. Since the purpose of identifying the core network is to investigate the offense and defense mechanisms between host and pathogen from the perspective of signal transduction pathways, the proteins with top D(p) from receptors to TFs and their connected miRNAs and genes are chosen as the core proteins/genes/miRNAs to construct the core network. Therefore we extract the HPCNs from the real cross-talk GWGEINs between Mφs and Mtb and between DCs and Mtb (Figs. 15.4 and 15.5, respectively). Consequently, HPCNs possess the principal network structure that represents the core GWGEINs during Mtb infection.
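The projection steps (15.26)–(15.29) map directly onto a thin singular value decomposition. The following NumPy sketch uses a hypothetical function name and a toy 3 × 2 matrix in place of the real network matrix H; the 0.85 threshold is taken from the text.

```python
import numpy as np

def principal_network_projection(H, energy=0.85):
    """Return the projection value D(p) of every row of H and the number Q of
    singular vectors retained, following Eqs. (15.26)-(15.29)."""
    U, d, Vt = np.linalg.svd(H, full_matrices=False)   # H = U diag(d) V^T, d descending
    E = d ** 2 / np.sum(d ** 2)                        # eigenexpression fractions E_s
    Q = int(np.searchsorted(np.cumsum(E), energy) + 1) # minimal Q with sum(E[:Q]) >= energy
    Q = min(Q, len(d))
    T = H @ Vt[:Q].T                                   # T(p, s) = h_p v_s
    D = np.sqrt(np.sum(T ** 2, axis=1))                # 2-norm projection value per row
    return D, Q

# Toy network matrix: the first row carries 90% of the energy.
H_demo = np.array([[3.0, 0.0],
                   [0.0, 1.0],
                   [0.0, 0.0]])
D_demo, Q_demo = principal_network_projection(H_demo)
# Q_demo is 1 and D_demo is [3, 0, 0]: only the first node projects onto the core structure.
```

Rows with the largest D(p) are the candidates for the core network. The further restriction to proteins on receptor-to-TF pathways described in the text is a biological filter that this sketch does not attempt.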
FIGURE 15.4 HPCN in Mφs infected with Mtb. HPCN is extracted from GWGEIN in Mφs during the early Mtb infection by PNP. It represents the principal genetic-and-epigenetic network connection of host and pathogen in Mφs infected with Mtb. The edges in red represent the gene regulations of host-miRNAs on host- or pathogen-genes. The edges in blue represent the PPINs of host, pathogen, and host–pathogen. The edges in gray represent the gene regulations of TFs on genes of host or pathogen [11]. GWGEIN, Genome-wide genetic-and-epigenetic interspecies network; HPCN, host–pathogen core network; Mtb, Mycobacterium tuberculosis; PNP, principal network projection; TF, transcription factor.

FIGURE 15.5 HPCN in DCs infected with Mtb. HPCN is extracted from GWGEIN in DCs during the early Mtb infection by PNP. It represents the principal genetic-and-epigenetic network connection of host and pathogen in DCs infected with Mtb. The edges in red represent the gene regulations of host-miRNAs on host- or pathogen-genes. The edges in blue represent the PPINs of host, pathogen, and host–pathogen. The edges in gray represent the gene regulations of TFs on genes of host or pathogen [11]. DC, Dendritic cell; GWGEIN, genome-wide genetic-and-epigenetic interspecies network; HPCN, host–pathogen core network; miRNA, microRNA; Mtb, Mycobacterium tuberculosis; PNP, principal network projection; PPIN, protein–protein interaction network; TF, transcription factor.
15.3 Investigating pathogenic/host defense mechanism to identify drug targets

15.3.1 GWGEINs of Mφs and DCs infected with Mtb

The real GWGEINs of Mφs and DCs are shown in Figs. 15.2 and 15.3, respectively. The numbers of nodes and edges are shown in Tables 15.1 and 15.2, respectively. There is no significant difference in the number of nodes between Mφs and DCs. However, the edges of the two cell types in Table 15.2 differ remarkably in host-miRNA repressions of host-genes (DC: 44, Mφ: 387) and in host PPIs (DC: 101,452, Mφ: 78,502). The results show that there are more miRNA regulations in Mφs than in DCs. Because more genes are inhibited by miRNAs in Mφs, this could result in fewer PPIs in Mφs than in DCs. Interestingly, the identified PPIs between pathogen-TFs and host proteins in both cell types (i.e., Rv1423 negatively interacts with BNIP3L: −16 in Mφs and −27 in DCs) indicate that Rv1423 may interfere with BNIP3L, which is involved in apoptosis in Mφs and DCs. The functional networks of GWGEINs in Mφs and DCs are also plotted in Fig. 15.6. The number of genes participating in the innate immune response, antigen processing and presentation, cytokine production, and apoptosis is found to be much higher in DCs than in Mφs, suggesting that DCs are more responsive to Mtb infection than Mφs. However, the GWGEINs are still too complex for us to investigate the defense mechanisms between host and pathogen. Therefore we extract HPCNs from the GWGEINs in both Mφs and DCs during the early Mtb infection via the PNP method (Figs. 15.4 and 15.5).

FIGURE 15.6 The functional networks of GWGEINs in Mφs and DCs during the early Mtb infection. After constructing the GWGEINs of Mφs and DCs during the Mtb infection, we use GO to analyze the biological processes in both cell types. The number below each function represents the number of genes in the GWGEINs that participate in the cellular function. By comparing the gene number of each cellular function between Mφs and DCs, we find that DCs are able to control Mtb infection more effectively than Mφs [11]. DC, Dendritic cell; GO, gene ontology; GWGEIN, genome-wide genetic-and-epigenetic interspecies network; Mtb, Mycobacterium tuberculosis.
15.3.2 HPCNs in Mφs and DCs infected with Mtb 15.3.2.1 The biological processes of the host core networks in both cell types By using the PNP method, we can extract the HPCNs in Mφs and DCs infected with Mtb as shown in Figs. 15.4 and 15.5, respectively. In order to get an overview of the molecular mechanisms of the host during infection, we could use gene ontology (GO) to analyze the biological processes of the host core networks in Mφs and DCs. In addition, based on the PPIs of HPCNs, we can construct the functional networks of HPCNs in Mφs and DCs as shown in Fig. 15.7. We use GO to analyze the biological processes of HPCNs in Mφs and DCs during the Mtb infection. The gene number participating in cell growth in Mφs is higher than in DCs, whereas the gene number participating in apoptosis in DCs is higher than Mφs. Thus DCs
FIGURE 15.7 The functional networks of HPCNs in Mφs and DCs during early Mtb infection. DC, Dendritic cell; HPCN, host-pathogen core network; Mtb, Mycobacterium tuberculosis.
tend to undergo apoptosis and can avoid the influence of Mtb, whereas Mφs tend toward cell growth and can be easily influenced by Mtb. Protein ubiquitination and ion homeostasis are present only in Mφs and are required for Mφs to adapt to the rapid changes during infection. The IFN-γ-mediated and TLR signaling pathways are present only in DCs. In addition, protein deacetylation in DCs may protect DCs from the Mtb infectious mechanisms [11]. The result indicates that the number of genes participating in the innate immune response is higher in DCs than in Mφs, and the 110 innate immune response genes in DCs are transcriptionally affected by 700 positive regulations and 653 negative regulations. To adapt to Mtb infection, posttranslational epigenetic modifications can be found in HPCNs, such as ubiquitination in Mφs and deacetylation in DCs. These epigenetic modifications can be detected by the basal level of κj in the host-protein expression model in (15.2). The host proteins (IRF2 and SUMO1) in the IFN-γ-mediated pathway, regulated by the deacetylase protein (HDAC1), the SUMO proteins (SUMO1, SUMO2, and SUMO3), and the ubiquitin proteins [UBC (ubiquitin C) and MUL1], and the host-protein (UBC) in the TLR signaling pathway, regulated by the deacetylase protein (HDAC1), the acetyltransferase protein (DLAT), the methyltransferase-associated protein (MTAP), the SUMO proteins (SUMO1, SUMO2, and SUMO3), and the ubiquitin proteins (UBC and MUL1), are present in DCs and can activate the immune response. Furthermore, the host proteins (AP3D1, ATG5, and PTPRC) in ion homeostasis, regulated by the ubiquitin protein (UBC) and the SUMO protein (SUMO2), are induced in Mφs to counteract the increase in metal ions during infection. Mφs can induce apoptosis during infection, but Mtb can block this process. In Mφs, the number of cell growth-related genes is higher than the number of apoptosis-related genes (Fig. 15.7), which could facilitate the survival
of Mtb in Mφs. However, in DCs, there are more apoptotic genes than cell growth genes. Furthermore, in Mφs, two proapoptotic proteins (MOAP1 and APAF1) are involved in apoptosis, while, in DCs, three proapoptotic proteins (BCL2, CFLAR, and AIFM1) and one antiapoptotic protein (XIAP) are involved in apoptosis. We have further identified that the host-protein SKP1, involved in the innate immune response, can positively regulate the proapoptotic proteins in DCs. This indicates that DCs are less susceptible than Mφs and readily induce apoptosis mediated by the innate immune response as a means of eliminating Mtb.
15.3.2.2 Host-pathogen cross-talk interactions in both cell types
The cross-talk between pathogen and host has been widely investigated. Nevertheless, the real genetic and epigenetic network connections and host-pathogen PPIs are still unclear. Here, we compare HPCNs to show the common and different cross-talk interactions between Mφs and DCs during Mtb infection (Fig. 15.8). It has been shown that 126 Mtb genes, found by the analysis of transposon mutant pools to be required for survival, are constitutively expressed rather than regulated, at least in primary mouse Mφs [689]. Seven Mtb proteins (Rv1049, Rv1681, Rv1337, Rv3868, Rv0928, Rv3283, and Rv3269) of the HPCN in Mφs infected with Mtb (Fig. 15.4) and three Mtb proteins (Rv1049, Rv0082, and Rv3369) of the HPCN in DCs infected with Mtb (Fig. 15.5), which are encoded by Mtb genes required for the survival of Mtb in the host, have been identified by this systems biology method. In addition, by identifying mutations that alter the phenotypic consequence of inactivating a gene of interest (i.e., genetic interactions), 66 Mtb genes encoding proteins that associate to form the multisubunit transporters required for Mtb survival in the host have also been found [690]. Two Mtb proteins (Rv2004c and Rv3877) of the HPCN in Mφs infected with Mtb (Fig. 15.4) and one Mtb protein (Rv0427c) of the HPCN in DCs infected with Mtb (Fig. 15.5), which are encoded by Mtb genes involved in the transporter assembly required for Mtb survival in the host, have also been identified by this systems biology method. Nevertheless, the comparison between the two HPCNs in Mφs and DCs infected with Mtb (Fig. 15.8) does not contain any crucial Mtb gene/protein required for the survival of Mtb in the host. The comparison result in Fig. 15.8 contains 11 differentially expressed pathogen proteins and 17 nondifferentially expressed pathogen proteins between the two host cells. The reason is that Fig. 15.8 comprises the proteins or genes that have the most differential interactions or regulations between the two HPCNs in Mφs and DCs infected with Mtb. This suggests that the Mtb essential genes/proteins are involved in the HPCNs of both Mφs and DCs and, from the perspective of signal transduction pathways, assist the different defense mechanisms in the respective cell types. In Fig. 15.8, there are three pathogen proteins (Rv0667, Rv0762c, and Rv1438) interacting with host proteins [UBC and EGFR (epidermal growth factor receptor)]. Further, there are other pathogen proteins (Rv2404c, Rv1696, and Rv1098c) that may help Rv0667, Rv0762c, and Rv1438 promote host-pathogen interactions. This suggests that Mφs are more easily influenced by Mtb than DCs during the early infection, and therefore specific interactions are present only in Mφs. Moreover, Rv2404c, Rv1696, and Rv1098c, whose expression is significantly activated in Mφs (P-value < .05) but not in DCs, promote cross-talk interactions in Mφs and thereby influence Mφ invasion. In addition, Rv2404c, Rv0667,
FIGURE 15.8 Comparison of the HPCNs in Mφs and DCs infected with Mtb during the early infection. HPCNs contain the major structure of GWGEINs extracted via the PNP method, which allows the investigation of the underlying mechanisms of the host and pathogen. HPCNs could highlight the different signaling pathways between Mφs and DCs during Mtb infection. The upper half shows the pathogen network and the lower half the host network. The edges with solid black lines represent presence in both cell types. The names in red, blue, and black represent the proteins specific to Mφs, DCs, and both cell types, respectively. Lightning bolts denote proteins with epigenetic modifications, indicated by significant basal level changes between Mφs and DCs [11]. DC, Dendritic cell; GWGEIN, genome-wide genetic-and-epigenetic interspecies network; HPCN, host-pathogen core network; PNP, principal network projection.
Rv0762c, Rv1098c, Rv1696, and Rv1438 are localized on the cell wall. This is consistent with the pathogenesis of Mtb, in which membrane proteins could influence the host immune response and enhance pathogen survival. The host-protein UBC is associated with the TLR signaling pathway. UBC is negatively affected by the pathogen via Rv0667, Rv0762c, and Rv1438 in Mφs, while it is positively affected by the pathogen protein Rv1438 in DCs. Consequently, Mtb can alter the host immune system in both Mφs and DCs by binding to UBC. Further, UBC plays a role in phagocytosis as well as in antigen processing and presentation [691]. Accordingly, Mtb could evade host antigen processing and presentation by binding to UBC in both Mφs and DCs. Although Mtb could affect UBC in both cell types, the three negative interactions of the pathogen with UBC in Mφs might have more influence on the immune system and antigen presentation than the interaction in DCs. This accounts for why Mφs are found to be more susceptible to Mtb than DCs. EGFR is involved in cell growth. The pathogen protein Rv1438 can interact with EGFR in both Mφs and DCs and affect the process of cell growth (Fig. 15.8). By doing this, Mtb can survive in host cells without being killed by apoptosis. Consistent with Mtb promoting cell growth, a recent study has shown that the inhibition of EGFR could restrict the growth of Mtb [692]. In addition, EGFR is also involved in the MAPK signaling pathway, which transduces extracellular signals from receptors on the membrane to the DNA in the nucleus of the cell. It is also identified that the oxidative stress responsive proteins (OXSR1 and OSGIN1) are negatively affected by UBC in Mφs. Moreover, it has been proposed that Helicobacter pylori infection could induce oxidative stress [693], which could also increase the DNA mutation risk by inhibiting oxidative stress responsive proteins [694].
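The cell-type comparison described above amounts to set operations on signed interspecies edges. The minimal sketch below encodes only the UBC and EGFR interactions stated in the text; the sign of the Rv1438-EGFR interaction is not given, so it is stored as None. The data layout and function names are illustrative assumptions, not the authors' implementation.

```python
# Edges are stored as {(pathogen_protein, host_protein): sign}.
# Signs: -1 negative, +1 positive, None when the text gives no sign.
macrophage_edges = {
    ("Rv0667", "UBC"): -1,
    ("Rv0762c", "UBC"): -1,
    ("Rv1438", "UBC"): -1,
    ("Rv1438", "EGFR"): None,
}
dendritic_edges = {
    ("Rv1438", "UBC"): +1,
    ("Rv1438", "EGFR"): None,
}

def compare_edges(edges_a, edges_b):
    """Split edges into shared, A-specific, and B-specific sets and
    flag shared edges whose known sign differs between the networks."""
    shared = set(edges_a) & set(edges_b)
    return {
        "shared": shared,
        "only_a": set(edges_a) - set(edges_b),
        "only_b": set(edges_b) - set(edges_a),
        "sign_flipped": {
            e for e in shared
            if None not in (edges_a[e], edges_b[e])
            and edges_a[e] != edges_b[e]
        },
    }

def negative_inputs(edges, host_protein):
    """Count negative pathogen interactions targeting one host protein."""
    return sum(
        1 for (_, host), sign in edges.items()
        if host == host_protein and sign == -1
    )

result = compare_edges(macrophage_edges, dendritic_edges)
```

With these inputs, the Rv0667-UBC and Rv0762c-UBC edges are Mφ-specific, the Rv1438-UBC edge is shared but changes sign, and `negative_inputs` returns three negative interactions on UBC in Mφs versus none in DCs, mirroring the qualitative argument in the text.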
The interaction of the pathogen protein Rv1438 may cause the dysregulation of EGFR in Mφs and influence the MAPK signaling pathway. Although Rv1438 could interact with EGFR during the Mtb infection of both Mφs and DCs, the expression of Rv1438 is higher in Mtb infecting Mφs, and the oxidative stress responsive proteins (OXSR1 and OSGIN1) are negatively affected by UBC, causing more reactive intermediates that could lead to mutations in Mφs during the Mtb infection. EGFR is not only associated with cell growth but also involved in the PI3K-Akt signaling pathway, which can play a significant role in the signal transduction pathways of cytokines. Rv1438 can interrupt the immune response, causing a delay in inflammation and thus expediting the invasion of Mtb during the early infection. Further, it has been reported that Rv1438 is an important gene for the in vitro growth of Mtb H37Rv [695]. Altogether, Rv1438 can induce EGFR dysregulation through the interspecies interaction between them, and could interrupt antigen presentation via UBC. Owing to its participation in several cellular functions in the host, Rv1438 could be a potential drug target for the future treatment of TB.
15.3.2.3 Host responses in Mφs and DCs during Mtb infection
Human cells possess the ability to change the behavior of proteins in order to adapt to rapidly changing circumstances. This can be referred to as epigenetics or posttranslational modification, including methylation, sumoylation, ubiquitination, and acetylation. The epigenetic influence on proteins is more efficient than the genetic influence on DNA transcription for adaptation to the changing environment during infection.
In both cell types, there are proteins involved in ubiquitination such as UBC and CUL3, in deacetylation such as HDAC1 and SIRT7, and in sumoylation such as SUMO1, SUMO2, and SUMO3 (Fig. 15.8). Ubiquitination can affect proteins in many ways, such as inducing their degradation via the proteasome and altering their cellular location. As shown in Fig. 15.8, UBC has more specific edges in Mφs than in DCs, and CUL3 participates in DCs more than in Mφs. Further, the expression of HDAC1 is higher in DCs, and there are differential basal levels of UBC between Mφs and DCs. This indicates that HDAC1 has a higher activity in Mφs and may affect UBC, which is involved in several immune responses, through deacetylation, resulting in a change in basal levels. Furthermore, it has been reported that HDAC inhibitors could inhibit the host immune response against microbial pathogens in Mφs and DCs [696]. Specifically, HDACs could act through functions such as histone deacetylation to enhance the immune response in Mφs against bacterial infection. UBC may be affected not only by deacetylation but also by sumoylation. Further, the proteins ELAVL1 and SUMO1/2/3 may be influenced by deacetylation via their interaction with HDAC1. Oxidative stress reflects the imbalance between reactive oxygen species (ROS) and the biological system’s ability to detoxify the reactive intermediates or to repair the resulting damage. High ROS production and disturbances in the normal redox state could lead to oxidative stress. Oxidative stress is considered a primary response of the immune system, and the induction of ROS during infection could help the immune system to kill pathogens. Nevertheless, the simultaneous production of ROS and free radicals could damage cellular components, including proteins, lipids, and DNA.
Amyloid precursor protein (APP) is involved in the response to oxidative stress, and the APP receptor can receive the oxidative stress signals induced by the immune system during the Mtb infection. The higher APP expression in Mφs than in DCs indicates that there is more oxidative stress in Mφs (Fig. 15.9A). The high ROS production not only helps the host kill the pathogen but also causes DNA damage, and therefore the host cell must typically inhibit ROS production. The tyrosine kinase SYK also plays a role in ROS production [697]. Consequently, APP can signal via SYK to induce ROS production. mir-224 has been characterized as an inhibitor of ROS production through silencing SYK, but the low expression of mir-224 in Mφs indicates that Mφs still need SYK to produce ROS. The inhibition of the overproduction of ROS is observed only in Mφs, emphasizing the difference in defense mechanisms between Mφs and DCs. In particular, Mφs prioritize killing the pathogen, whereas DCs prioritize antigen presentation to activate an adaptive immune response. The deacetylation of UBC may enhance the downstream immune response to protect the host from the invasion of Mtb. Nevertheless, the pathogen proteins Rv0667, Rv0762c, and Rv1438 may interfere with the production of SYK by interacting with UBC to prevent ROS-mediated elimination of the pathogen. CD84, a member of the signaling lymphocyte activation molecule (SLAM) family, can mediate several immune responses, including T cell/B cell activation and antibody production (Fig. 15.9A). CD84 is a homophilic receptor expressed on T cells, B cells, DCs, monocytes, and Mφs. The expression of CD84 can increase the activation of T cells, B cells, and DCs. SYK is also involved in the adaptive immune response. CD84 affects SYK by interacting with UBC in Mφs, thus mediating the adaptive immune response.
This is a mechanism used by Mφs to activate the adaptive immune response during the
FIGURE 15.9 The signaling pathways of the host molecular mechanisms based on HPCNs in Fig. 15.8 in Mφs and DCs during the Mtb infection. (A) ROS production is present only in Mφs and may be influenced by Mtb proteins through the interaction with UBC. In addition, CD84 signals to SYK to activate lymphocytes through UBC. The deacetylation of UBC avoids the influence from Mtb proteins. (B) The high activity of DNA repair in Mφs reflects DNA damage caused by the oxidative stress in Mφs and is dysregulated by Mtb proteins in spite of the deacetylation of PMS2P1. (C) The induction of cell growth in Mφs may be influenced by Mtb proteins through the interaction with UBC despite the ubiquitination of PRKAR1A and EGFR methylation. DCs avoid cell growth through the inhibition by mir-612 and mir-636. (D) Differential regulation of RPS10, which is involved in the cell growth of both cell types, may be influenced by Mtb proteins through the interaction with UBC. Although RPS10 induces the ubiquitination in Mφs, Mtb proteins can influence RPS10 by inducing its activity in both cell types. (E) High expression of PFDN5 in Mφs indicates that Mφs tend to correct unfolded or misfolded proteins, whereas PFDN5 is inhibited by mir-224 in DCs. The accumulation of unfolded or misfolded proteins and antigens in DCs leads to the formation of the DALIS, which delays degradation via the proteasome from the early stage until the late stage of DC maturation [11]. DALIS, Dendritic cell aggresome-like induced structures; DC, dendritic cell; HPCN, host-pathogen core network; Mtb, Mycobacterium tuberculosis; ROS, reactive oxygen species; UBC, ubiquitin C.
infection. Nevertheless, the activation of adaptive immunity can be blocked by the pathogen via the interaction with UBC to prevent antibody production and promote its survival. Other differences in the response to oxidative stress initiated in Mφs and DCs were also identified. RPL13A plays a role in ubiquitination and assists ETS1 in its translocation to the nucleus for transcription (Fig. 15.9B). The PMS2P1 gene is regulated by ETS1 in Mφs and DCs. The high expression of PMS2P1 in Mφs might be due to the high activity of ETS1. The main cellular function of PMS2P1 is to repair the DNA damage caused by oxidative stress. Consequently, the high expression of PMS2P1, which is required for DNA repair, can indicate that there is more oxidative stress in Mφs. Mtb proteins can also account for the high expression of PMS2P1 in Mφs. PMS2P1 can also induce deacetylation to control its activity [698]; nevertheless, Mtb proteins can counteract this process, causing high expression of PMS2P1. High DNA repair activity is an important mechanism of cancer progression. Further, ETS1 can function as an oncogene to drive tumorigenesis [699]. The high expression of ETS1 and PMS2P1 as well as the influence from Mtb proteins may play a crucial role in the progression from TB to lung cancer in Mφs. PRKAR1A encodes the type 1α regulatory subunit (RIα) of cAMP-dependent protein kinase. The RIα protein is upregulated in many cancer cell lines, suggesting its potential role in cell cycle regulation and growth. It has been shown that the overexpression of RIα in lung cancer [700] indicates the growth of lung cancer cells. EGFR methylation negatively mediates the EGFR downstream pathway [701]. Nevertheless, Mtb proteins can affect the cellular function of PRKAR1A by interacting with EGFR and UBC despite the methylation of EGFR and the ubiquitination of PRKAR1A (Fig. 15.9C) [702].
The significant change (P < 5.67 × 10⁻¹⁴) in the gene expression profiles of EGFR between Mtb-infected Mφs and DCs could support this finding. PRKAR1A has higher expression in Mφs than in DCs, which may cause cell growth in Mφs. Cell growth can benefit the survival of the pathogen. Pathogen proteins such as Rv0667, Rv1438, and Rv0762c may interfere with the activity of PRKAR1A by interacting with UBC. The high expression of mir-612 and mir-636 in DCs could inhibit PRKAR1A expression, whereas PRKAR1A is highly expressed in Mφs owing to the low mir-612 and mir-636 expression in this cell type. This suggests that DCs are able to avoid the dysregulation of cell growth caused by the pathogen, which could indirectly reduce the ability of Mtb to reside in DCs. Ribosomal protein S10 (RPS10) is also related to cell growth, as it plays a crucial role in ribosome biogenesis [703]. RPS10 is regulated by TFs (YBX1, SREBF1, and ETS1) in both Mφs and DCs (Fig. 15.9D). The ubiquitination of RPS10 can interfere with its activity, which is associated with cell growth in Mφs [704]. Nevertheless, the cellular function of RPS10 can be indirectly blocked by Mtb proteins through the interaction with UBC, as a means of controlling cell growth in Mφs. The ubiquitination of YBX1 could facilitate its transport from the cytosol to the nucleus for transcriptional regulation. In Fig. 15.9E, PFDN5 is regulated by YBX1 and mir-224, but mir-224 is expressed at higher levels in DCs than in Mφs, causing the higher expression of PFDN5 in Mφs. The decreased expression of PFDN5, which plays a role in protein folding, together with the oxidative stress in DCs could result in the induction of misfolded proteins. For the maintenance of cellular homeostasis, misfolded proteins must be ubiquitinated for degradation via the ubiquitin-proteasome pathway. It has been reported that the
ubiquitinated misfolded proteins can accumulate in DCs to form a DC aggresome-like induced structure (DALIS), and the ubiquitinated proteins in DALIS are protected from degradation via the proteasome during the early stages of DC maturation [705]. At the late stage of maturation, the proteasome actively participates in the removal of DALIS [705]. In addition, DALIS could act as an Ag (antigen of Mtb) storage center during DC maturation to prioritize the degradation of proteins in response to infection [706]. Altogether, the formation of DALIS helps DCs promote antigen processing during DC maturation for the presentation of the peptides on MHC molecules to B cells.
15.3.2.4 The defense mechanisms of Mtb in Mφs and DCs
Rv0667 (RpoB), a DNA-directed RNA polymerase that catalyzes the transcription of DNA to RNA, plays a crucial role in the Mtb infection. Many Mtb proteins interact with Rv0667 through PPIs in Mtb infecting Mφs and DCs (Fig. 15.8). Rv0667 is highly expressed in Mtb infecting Mφs, and it interferes with the cellular function of the host-protein UBC to facilitate Mtb invasion (Fig. 15.10A). In addition, the high expression of Rv1438, Rv1098c, and Rv2404c could increase the activity of Rv0667 in Mφs. Owing to its role in the Mtb infection, RpoB is the target of the drug rifampicin. Mutations arise spontaneously and are then selected under the drug selection pressure. DNA damage could probably increase the overall mutation rate. The function of Rv0762c is still unknown at present. It has been suggested that Rv0762c functions in fatty acid metabolism as an additional bacterial adaptation for Mtb to survive novel host-derived pressures within the phagosomal environment [707]. As shown in Fig. 15.10A, Rv0762c may influence the host-gene RPS10, which is involved in the cellular metabolic process, via its interaction with UBC and ETS1 in Mφs.
Another metabolic enzyme, Rv1438 (TpiA), triosephosphate isomerase, is an important enzyme for gluconeogenesis and glycolysis and is crucial for the survival of Mtb [708]. The varying basal expression levels of Rv1438 may result from acetylation [709], which can enhance its interaction with host proteins. For example, Rv1438 could interact with EGFR and UBC in Mφs and DCs (Fig. 15.10A). Rv1438 may also influence host defense mechanisms through EGFR and UBC in both cell types. Copper (Cu) is a crucial element for the growth and development of most organisms, including bacteria. It has been shown that Cu in Mtb binds to various enzymes, including cytochrome c oxidase and Cu/Zn superoxide dismutase. Further, Cu helps Mtb resist oxidative stress [710], suggesting that Cu is crucial for Mtb survival. Nevertheless, an overload of Cu is toxic in most systems. A study reported that the concentration of Cu is dramatically increased in phagolysosomes of mouse Mφs after infection with several Mycobacterium species [711]. Another study has demonstrated that guinea pigs could respond to the Mtb infection by increasing the Cu concentration in the lung lesions, and this is consistent with a reduction in bacterial burden [712]. These findings have demonstrated that Cu is employed by the host immune system to control Mtb infection, and Mtb has evolved Cu homeostasis as a defense mechanism. The rv0969 (ctpV) gene is part of a Cu-induced operon and encodes a Cu-specific inner membrane efflux pump that can transport excess Cu out of Mtb in order to prevent its toxicity. In order to maintain Cu homeostasis in Mtb, it is possible that the bacterium has a repressor of Rv0969 in the absence of Cu. Rv0967 (CsoR) can repress the expression of cso (copper sensitive operon) to induce
FIGURE 15.10 The signaling pathways of pathogen defense mechanisms based on HPCNs in Fig. 15.8 in Mφs and DCs during Mtb infection. (A) The proteins Rv0667, Rv1696, Rv2404c, Rv0762c, Rv1438, and Rv1098c are expressed on the cell membrane of Mtb, and Rv0667 and Rv1438 are essential for the survival of Mtb. These proteins may counteract the defense mechanisms of both cell types through interactions with EGFR and UBC. Therefore these proteins, with higher expression and more specific edges in Mφs than in DCs, could cause the dysfunction of cell growth and DNA repair in Mφs. Some of these proteins have differential basal levels, whereas only Rv1438 has been found to undergo acetylation, which could facilitate its interaction with host proteins. (B) The metal-dependent homeostasis signaling pathway plays an important protective role to counteract the defense mechanisms of host cells. However, the metal-dependent homeostasis signaling pathway of Mtb varies between Mφs and DCs. Mtb evolves this pathway to counteract the metal burst from host cells and induce metal homeostasis in Mφs and DCs during infection. The cellular function of Rv0970 is still unknown, whereas rv0970 is regulated by different TFs of Mtb in Mφs and DCs, and Rv0970 interacts after translation with Rv0967 and Rv0969, members of the metal-dependent homeostasis signaling pathway. Rv1675c regulates itself and interacts with the metal-dependent proteins or TFs in Mφs. Furthermore, Rv1675c is highly expressed and has specific edges in Mφs, where it plays an important role in the metal-dependent homeostasis signaling pathway. The inhibition of rv0353 by mir-636 in Mφs could decrease antibody production [11]. DC, Dendritic cell; EGFR, epidermal growth factor receptor; HPCN, host-pathogen core network; Mtb, Mycobacterium tuberculosis; TF, transcription factor; UBC, ubiquitin C.
Cu in Mtb [713]. As shown in Fig. 15.10B, Rv0967 and Rv0969 are similarly expressed in Mφs and DCs. This suggests that Cu homeostasis of Mtb may be maintained during its infection of both cell types. Rv0970 is an integral membrane protein; however, its main cellular function is currently unknown. The rv0970 gene is regulated by several TFs in both cell types, specifically Rv0324 and Rv1423 in DCs and Rv0081 and Rv2324 in Mφs, resulting in no significant differential expression of Rv0970. This indicates that Rv0970 is regulated differently in Mφs and DCs. Furthermore, Rv0970 interacts with Rv0967 and Rv0969 in both cell types and may participate in Cu homeostasis. Thus Rv0970 possibly plays an essential role in Cu homeostasis in both cell types and controls different responses to different circumstances in Mφs and DCs. The interspecies interaction between bacteria and host dates from their evolutionary origin, and these mechanisms are reflected in the evolution of the human (homeostasis or the immune system). Most of the bacteria that live with and within the human are part of the superorganism. The host employs different mechanisms to limit the growth of Mtb, such as nutrient and oxygen limitation, acidic pH, and the formation of reactive oxygen intermediates/reactive nitrogen intermediates (RNI), which could force Mtb to become dormant. Rv0081 is upregulated in multiple latency models and is shown to be regulated by the dormancy survival regulator (DosR) [714]. DosR is crucial for the survival of Mtb during anaerobic dormancy and could mediate the entrance into and persistence throughout the dormant state [715]. Another study has demonstrated that the control of bacterial replication during latency in mice requires IFN-γ, TNF-α, and nitric oxide (NO) [716]. In addition, the expression of Rv0081 can be induced by the exposure of Mtb to NO. These studies suggest that Rv0081 may play a crucial role in the dormancy of Mtb.
Nevertheless, the dormancy mechanism of Rv0081 is not demonstrated in our HPCNs because at the 18 h time point, granulomas have not yet formed in Mφs. Rv0081 has another role as a member of the ArsR/SmtB family of metal-dependent transcriptional regulators. As a result, it functions as a regulator of the metal homeostasis of Mtb for preventing metal stress within the host [717]. As seen in Fig. 15.10B, Rv0081 has more specific PPI edges and regulatory genes in Mφs, which could suggest that Rv0081 acts as a metal-dependent transcriptional regulator of Mtb specifically within Mφs. Another TF, Rv0324, with an N-terminal HTH ArsR-type DNA-binding domain, might act as an ArsR metal-dependent transcriptional regulator [718]. Rv0324 has more specific PPI edges and regulatory genes in DCs, which could suggest that Rv0324 acts as a metal-dependent transcriptional regulator of Mtb specifically within DCs (Fig. 15.10B). Therefore Mtb could utilize a different metal-dependent homeostasis pathway within the two cell types, which may result from the adaptive response that allows Mtb to survive in different circumstances. Rv2234 (PtpA), a protein tyrosine phosphatase, has been shown to be an important enzyme for the survival of Mtb in Mφs [719]. Furthermore, NO and ROS have been shown to reduce the Rv2234 activity in Mφs, and this could disrupt the growth of Mtb in Mφs [720]. In Fig. 15.10B, the high expression of Rv2234 in Mφs is not inhibited by NO or ROS. We also find that Rv2234 can interact with Rv0081 and Rv0324, which are metal-dependent transcriptional regulators in Mφs and DCs, respectively. This suggests that Rv2234 may be involved in the metal-dependent pathway in both cell types. Rv1173 (FbiC) is crucial for F420 production and participates in the F420 biosynthetic pathway [721]. F420 is converted by F420-dependent glucose-6-phosphate dehydrogenase
into H2F420 [722]. One mechanism for host killing of Mtb is phagosome acidification. The NO induced in acidified phagosomes during the activation of Mφs is converted to NO2. Mtb is more sensitive to NO2 than to NO under aerobic conditions and has evolved a mechanism to decrease the antibacterial action of Mφs by converting NO2 back to NO via H2F420 [723]. Though NO is known to kill Mtb under both aerobic and hypoxic conditions, NO2 is found to be more toxic than NO under aerobic conditions [724]. The high expression of Rv1173 in Mtb in Mφs compared to DCs indicates a greater nitrosative stress in Mφs (Fig. 15.8). The induction of Rv1173 can increase the production of H2F420 to convert NO2 to NO, protecting Mtb from the nitrosative burst in Mφs under aerobic conditions. Rv1675c (Cmr) is a CRP/FNR family TF that is expressed in response to cAMP levels [724]. cAMP is a common second messenger molecule that plays a crucial role in catabolite repression, virulence, and signaling pathways in many bacterial pathogens, including Mtb [725]. During the Mφ infection, Mtb can produce a cAMP burst within Mφs to promote the survival of Mtb [726]. In Fig. 15.10B, the high expression of Rv1675c in Mφs reflects the high cAMP level within Mφs. In addition, Rv1675c interacts with many proteins and has more specific edges in Mφs than in DCs. Rv1675c also interacts with the three metal-dependent proteins Rv0081, Rv0324, and Rv0967 mentioned previously. This implies that Rv1675c may participate in the metal-dependent pathway that could facilitate Mtb resistance to the induction of metal toxicity within Mφs during the early infection, although this has not yet been demonstrated. A recent study has reported that Rv1675c can regulate rv1675c and is involved in the Cmr-mediated gene regulation (Fig. 15.10B) [727]. Furthermore, the interaction of Rv1675c with Rv0667 may indirectly influence the host in the HPCN (Fig. 15.8).
Taken together, Rv1675c may participate in Mtb host manipulation through Rv0667 and may help protect Mtb from metal toxicity by interacting with metal-dependent proteins in Mφs. This suggests that Rv1675c is crucial for Mtb survival during the early infection in Mφs and could be selected as a potential drug target. Rv0353 is an Mtb antigen that can be processed and presented on MHC molecules, which could promote the production of antibodies against Mtb. It has been shown that Rv0353 favors the activation of DCs during the early Mtb infection [728]. However, mir-636 can inhibit rv0353 in Mφs (Fig. 15.10B). Furthermore, the antigen-processing capacity of Mφs is reduced by Mtb proteins, causing a reduction in antibody production. In comparison, DCs can still process Rv0353 and present it on MHC molecules. Therefore the repression of mir-636 and the Mtb membrane proteins shown in Fig. 15.10A may restore the antigen-processing capacity of Mφs and increase the production of antibodies. Furthermore, Rv0353 also interacts with metal-dependent TFs, including Rv0081 and Rv0324 (Fig. 15.10B), indicating the participation of Rv0353 in the metal-dependent homeostasis signaling pathways in both cell types.
15.3.2.5 Overview of the defense mechanisms of the host and pathogen and the dysfunctions of the host in Mφs and DCs
Fig. 15.11 provides an overview of the previously mentioned defense mechanisms in Mφs and DCs. The diagram demonstrates that the same mechanisms employed by both cell types have different activities because of the influence of Mtb or as a result of miRNA inhibition.
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
15.3 Investigating pathogenic/host defense mechanism to identify drug targets
First, we discuss the host mechanisms and the defensive Mtb proteins that counteract them. SYK is involved in ROS production and is present only in Mφs. It acts as part of the immune response in Mφs to eliminate Mtb. Nevertheless, SYK can be influenced by Mtb proteins, indicating that Mtb can interfere with ROS production in Mφs. Another defense mechanism of host cells is metal toxicity, which is utilized by both cell types. However, it seems that Mtb has evolved the metal-dependent homeostasis signaling pathway to counteract metal toxicity in both cell types. The upper figure summarizes the defense mechanisms of the host in Mφs during the early Mtb infection. The defensive proteins (Rv1438, Rv0762c, and Rv0667) and the metal-dependent homeostasis signaling pathway of Mtb within Mφs could counteract the host defense mechanisms. The lower figure summarizes the defense mechanisms of the host in
FIGURE 15.11
The defense mechanisms of the host and pathogen and the dysfunction of the host in Mφs and DCs during the early Mtb infection. DC, Dendritic cell; Mtb, Mycobacterium tuberculosis.
DCs during the early Mtb infection. The defensive proteins (Rv1438, Rv0762c, and Rv0667) and metal-dependent homeostasis signaling pathway of Mtb within DCs could influence host defense mechanisms. We observed that the defense mechanisms of Mφs are more easily influenced by Mtb than DCs, indicating that Mφs are more susceptible to Mtb than DCs. In addition, the dysfunction in Mφs such as DNA repair and cell growth caused by Mtb may easily cause the accumulation of mutations in Mφs. Thus Mtb infection of Mφs may promote the progression from TB to lung cancer [11]. The defense mechanisms of the host can be counteracted by Mtb, causing the dysfunction of host cells. PMS2P1 functions in DNA repair with its high expression in Mφs and can be influenced by Mtb proteins, causing the DNA repair dysfunction in Mφs. PRKAR1A is involved in cell growth and is highly expressed in Mφs. Its expression is reduced in DCs because of the mir-612 and mir-636 inhibition, which suggests that Mtb can easily dysregulate the cell growth in Mφs. RPS10 is also involved in cell growth, is present in both cell types, and can be influenced by Mtb proteins in both cell types. PFDN5 could function to correct unfolded or misfolded proteins. It is also found to be highly expressed in Mφs and poorly expressed in DCs because of mir-224 inhibition. Without the inhibition of PFDN5 by miRNA in Mφs, the correction of unfolded or misfolded proteins would be constitutively active in Mφs, which could contribute to progression from TB to cancer in Mφs. UBC is also found to be involved in the antigen processing and presentation and can be influenced by Mtb proteins in both cell types. There are more Mtb proteins to interfere with UBC in Mφs, which suggests that DCs are able to induce more antigen processing and presentation than Mφs. Mtb can induce cellular dysfunctions in DNA repair and cell growth in Mφs. 
This suggests that Mtb-infected epithelial cells may have similar dysfunctions in DNA repair and cell growth, which may contribute to the progression of lung cancer.
15.3.3 Drug targets, drug mining, and multimolecule drug design
A major preventative agent against TB is Bacille Calmette-Guérin (BCG), an attenuated vaccine strain derived from M. bovis by Calmette and Guérin. BCG vaccination is typically administered at birth and is highly effective in preventing the development of TB. In general, BCG efficacy decreases over time, and the protection in adults is not as effective as in children [729]. Some studies have shown that BCG could protect humans for only 10–20 years [730–732]. Although several new drugs have been developed and are successful against TB, the development of multidrug-resistant TB (MDR-TB) and extensively drug-resistant TB (XDR-TB) could still limit their efficacy [733,734]. MDR-TB is resistant to isoniazid and rifampicin, which are still the two most effective anti-TB drugs [735]. XDR-TB is resistant to these as well as to second-line drugs [736]. Because of its essential role in the defense against Mtb infection in Mφs, we could predict Rv1675c as a potential drug target. The inhibition of Rv1675c may interfere with the autoregulation and metal-dependent pathways of Mtb in Mφs. Further, the cell wall or cell membrane proteins Rv1098c, Rv0967, Rv0667, Rv1696, and Rv2404c have been identified using automated two-dimensional capillary high-performance liquid chromatography (LC) coupled with mass spectrometry (MS) (2DLC/MS) in TubercuList [737]. In
TubercuList, Rv1098c and Rv0667 have also been identified as essential genes by Himar1-based transposon mutagenesis in the H37Rv strain [738], while Rv1696 is required for growth in C57BL/6J mouse spleen, as shown by transposon site hybridization (TraSH) in H37Rv [739]. Nevertheless, in TubercuList, Rv0967 and Rv2404c have been identified as nonessential genes by Himar1-based transposon mutagenesis in the H37Rv strain [738], and as nonessential genes for the in vitro growth of H37Rv by sequencing of Himar1-based transposon mutagenesis [695]. Consequently, we suggested that Rv1098c, Rv0667, and Rv1696 are important for the survival of Mtb in both cell types and can be easily targeted by drugs. Since we have identified that the pathogen-TF Rv0081 is positively affected by Rv0969 during the Mtb infection of Mφs, the pathogen-TFs Rv0081 and Rv0324 could be positively affected by Rv0969 during the Mtb infection of DCs. When Rv0969 is targeted by the proposed drugs, the expression levels of the pathogen proteins Rv1675c and Rv0970 could be attenuated during the Mtb infection of Mφs and DCs (Fig. 15.10). Consequently, the inhibition of the drug targets Rv0967, Rv0969, and Rv0970 could lead to the combined inhibition of Cu homeostasis during the Mtb infection of Mφs and DCs. These proteins might also become potential drug targets of multimolecule drug design for future therapy of Mtb infection. The repression of mir-636 and Mtb membrane proteins may help Mφs increase antibody production. Nevertheless, there is currently no drug to inhibit mir-636. After predicting potential multiple drug targets, we could then design a potential multimolecule drug for these targets through drug database mining and literature review. At present, a database of drugs targeting Mtb proteins has not been constructed, so we explored studies predicting that some drugs may inhibit the potential multiple drug targets.
A study based on the comparison of binding sites of existing drugs for human use against the entire structural proteome of the pathogen could predict Lopinavir as a drug against the Mtb protein Rv1438 (TpiA) [740]. Another study has reported that TMC207 targets heavy metal P-type ATPases, including Rv0969 (CtpV) [741,742], which might mediate the Mtb P-type ATPase activation and enhance the metal toxicity to eliminate Mtb during the metal burst in Mφs and DCs. Furthermore, the copper-boosting compounds ATSM and GTSM have been reported to increase the Cu level in Mtb rather than disrupting Cu homeostasis [743]. Finally, we combine these drugs to generate a potential multimolecule drug, as shown in Fig. 15.12, for the potential multiple drug targets, which could be efficient for the treatment of Mtb infection in both cell types. The existing drug molecule Lopinavir has been predicted to target the Mtb protein Rv1438 (TpiA), which could interrupt the interactions between Rv1438 and host proteins [740]. Another study has reported that the molecule TMC207 could target heavy metal P-type ATPases, including Rv0969 (CtpV) [741,742], which may mediate the Mtb P-type ATPase activation and enhance the metal toxicity to eliminate Mtb during the metal burst in both cell types. Further, the copper-boosting compounds ATSM and GTSM have been reported to increase the copper (Cu) level in Mtb instead of disrupting Cu homeostasis [743]. Consequently, these three molecules could be combined as a potential multimolecule drug for the treatment of both Mφs and DCs during early Mtb infection. In addition, the results suggest that Rv1098c may help Rv0667, Rv0762c, and Rv1438 promote the host-pathogen interaction and may help Rv0353 promote the production of antibodies against Mtb. When mir-636 has been suggested as a potential drug target to
FIGURE 15.12 The multimolecule drug for the potential multiple drug targets [11].
help Mφs increase antibody production through the activation of Rv0353, Rv1098c plays an essential role in increasing the antibody production in Mφs. Even if Rv1098c has been suggested as a potential drug target to help both cell types attenuate the survival of Mtb, these two drugs that can inhibit mir-636 and Rv1098c, respectively, may produce an inhibitory action to treat the Mtb-infected Mφs.
15.4 Conclusion TB is a global disease, accounting for almost 2 million deaths per year. Even if drugs such as isoniazid, rifampin, and pyrazinamide have been used for curing TB, there is an urgent need to identify new drugs because of the presence of drug-resistant strains. In this chapter, we introduce a systems biology approach, big database mining, and microarray data to construct GWGEINs in both Mφs and DCs during the early Mtb infection. We have investigated differences for the biomarkers in the defense mechanisms of the host and pathogen in Mφs and DCs during the early Mtb infection by analyzing HPCNs extracted from GWGEINs via the PNP method. In addition to the production of cytokines by the host, oxidative stress is another strategy used to kill pathogens. Oxidative stress is identified in Mφs and DCs, but it is higher in Mφs. However, the pathogen can influence the activity of SYK, which is involved in the ROS production through interacting with UBC in Mφs. In addition, ROS is also detrimental to the host since it could cause DNA damage [667]. Therefore the Mtb-mediated dysfunction of DNA repair might contribute to the progression of lung cancer due to the long-term accumulation of mutations in DNA and the interspecies interaction between Rv1438 and EGFR. Furthermore, Mtb could also influence the cell growth mediators RPS10 and PRKAR1A, via interacting with UBC in Mφs. In contrast, DCs are less susceptible to Mtb. mir-612 can inhibit the expression of PRKAR1A to reduce cell growth in DCs. DCs could collect antigens as well as the misfolded and unfolded proteins to form DALIS, which is then degraded at the late stage of DC maturation. Antigen peptides are then loaded onto MHC molecules and presented to B cells for the antibody production. The difference in the defense mechanisms between Mtb-infected Mφs and DCs could be observed in our HPCNs. 
Though the role of antibody production is still unclear as the control of Mtb requires cellular immunity, especially after the antigen presentation by Mφs, the genes involved in the oxidative stress have been also reported to contribute to the difference in the defense mechanisms between Mtb-infected Mφs and DCs [667]. Another strategy employed by the host to kill pathogens is the metal burst in Mtb. Though metals are needed for the survival of bacteria, the overload of metals will become toxic. Mtb has evolved adaptive mechanisms to overcome the excess or absence of Cu in Mtb in order to maintain the Cu homeostasis. Rv0969 is found to act as an efflux pump to transport the excess Cu out of Mtb, and Rv0967 can repress the expression of cso in the absence of Cu. In addition, Rv0081 and Rv0324 are metal-dependent transcriptional factors that participate in the regulation of rv0970. Even if the cellular function of Rv0970 is still unknown, we predict that it may play an essential role in the metal-dependent homeostasis pathway. The difference between metal-dependent homeostasis pathways specific in Mφs and DCs is also shown in HPCNs in Fig. 15.10B. Even if the cellular function of Rv1675c is still unknown, we find that it can participate in the metal-dependent homeostasis pathway with a higher expression and more specific interactions in Mφs, indicating its crucial defensive role in Mφs. Consequently, Rv1675c will become a potential drug target. Another potential drug target is Rv1438, which can interact with the host proteins EGFR and UBC, and is crucial for the survival of Mtb. The membrane proteins Rv1098c, Rv0967,
Rv0969, Rv0970, Rv0667, Rv1696, and Rv2404c are essential for the survival of Mtb in both cell types and can be easily targeted by drugs. We have also observed that Mφs are more susceptible to Mtb than DCs, and the dysfunctions in Mφs such as DNA repair, cell growth, and the constitutive correction of unfolded or misfolded proteins may easily cause the accumulation of mutations in Mφs. Therefore the Mtb-induced dysfunctions in Mφs suggest that the same dysfunctions may be present in Mtb-infected epithelial cells and contribute to the progression of lung cancer. Accordingly, we have also designed a potential multimolecule drug for the potential drug targets we proposed. Nevertheless, due to the lack of a database of drugs that target Mtb proteins, we explored several studies addressing the crucial protein Rv1438, the metal-dependent protein Rv0969, and the increase in Cu levels in Mtb, as biomarkers for drug targets from which the multimolecule drug was designed (Fig. 15.12). In the pathogen-gene regulation model, we could demonstrate that host-miRNA can inhibit pathogen-gene interactions. However, an influence of these pathogen-gene interactions on host-miRNA is still possible, although it has not yet been elucidated. As a result, because novel cross-talk mechanisms between the host and pathogen remain to be identified, the systems biology models used in this chapter can be improved.
CHAPTER 16
Investigating the host/pathogen cross-talk mechanism during Clostridium difficile infection for drug targets by constructing genetic-and-epigenetic interspecies networks using systems biology method
16.1 Introduction
Clostridium difficile (C. difficile) is characterized as the major infectious cause of antibiotic-associated diarrhea and is the etiologic agent of pseudomembranous colitis. This pathogen is a rod-shaped, Gram-positive, anaerobic, spore-forming bacterium. It was first identified by Hall and O’Toole in 1935, but no further studies linked the bacterium to human disease until 1978 [744,745]. Today, C. difficile is recognized as the major pathogen responsible for nosocomial antibiotic-associated diarrhea. C. difficile infection (CDI) could induce symptoms ranging from mild diarrhea to pseudomembranous colitis to toxic megacolon, and even death. Over the past two decades, the morbidity and severity of CDI have been increasing worldwide. In the United States the annual reported cases of CDI grew approximately threefold from 1996 to 2005 and twofold from 2001 to 2010 [746,747]. Nearly half a million American patients have suffered from CDI, resulting in a corresponding US$2 billion additional economic burden annually [748]. In 2002 the presence of a hypervirulent strain of C. difficile (BI/NAP1/027) and the subsequent outbreaks in Europe and North America also reflected the alarming threat of this rising enteric pathogen [749]. The Caco-2 cell line, which has been widely used as a model of the human epithelial barrier, is derived from human epithelial colorectal adenocarcinoma. Although it is isolated from a colorectal carcinoma, Caco-2 cells could differentiate to form an epithelial cell
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00018-3
© 2021 Elsevier Inc. All rights reserved.
16. Investigating the host/pathogen cross-talk mechanism
barrier that expresses tight junctions and microvilli when culturing with permeable filters and specific conditions [750]. Epithelial cells could form the first physical line in the immune defense against bacterial or viral infection. However, in the case of CDI, this monolayer can be breached by the pathogen, allowing C. difficile to enter mucosa and stimulate mucosal immune cells such as monocytes and neutrophils. The epithelial barrier breakdown could also facilitate neutrophil influx in the mucosa and gut lumen, a clinical feature of CDI, leading to the formation of pseudomembranes with the dead epithelial cells in severe cases. CDI usually occurs after a disturbance of the normal gut microbiome following an antibiotic treatment. After the disruption of the microbiota, C. difficile can colonize to the intestinal epithelial cells and produce pathogenic factors to breach the barrier. The major toxins of C. difficile are enterotoxin CD0663 (TcdA) and cytotoxin CD0660 (TcdB). Both toxins could enter host cells via the receptor-mediated endocytosis and are cytotoxic to the host tissue by inactivating small Rho GTPases (RAC1, RHOA, and CDC42) [751]. The glucosylation-dependent inactivation of Rho GTPases could result in the actin cytoskeleton depolymerization and tight junction breakdown. In addition, toxins could also stimulate the release of cytokines and proinflammatory mediators, including IL-8/CXCL8, MIP-2, and MCP-1. These cytokines could recruit monocytes and neutrophils for phagocytosis and trigger a severe inflammatory response. While the cytopathic effects of TcdA and TcdB are widely investigated, other interspecies cross-talk mechanisms and pathogenic factors that contribute to CDI remain largely unknown. The increasing morbidity of CDI in low-risk groups, including patients without recent exposure to antibiotics, and young healthy adults [752] could lead to the discovery of other mechanisms and virulence factors contributing to this phenomenon. 
However, C. difficile has not been fully investigated due to the difficulties in its genetic manipulation, which make it hard to generate isogenic strains for further study. In this case the genome-wide genetic-and-epigenetic interspecies networks (GEINs) constructed using systems biology methods could provide us with a systematic view of the cellular activities inside the pathogen. In a previous study the glucosylation of RHOA was shown to reach saturation at 60 min post infection, and all GTPases (CDC42, RAC1, and RHOA) also lose their enzyme activity at this time point [753,754]. In addition, the MTT-dependent cell viability assays presented in our data source study [755] have demonstrated that significant cell death also occurs at 60 min post infection. Based on these observations, we define the early (0–60 min) and late (30–120 min) stages of CDI to investigate the progression of cross-talk molecular mechanisms between the two species. The 30 min overlap allows us to observe the causality and coherence of cross-talk molecular mechanisms. Another molecular mechanism regulating colonic gene expression is the microRNA (miRNA) system. miRNAs are small noncoding RNA molecules (~22 nucleotides) that bind to mRNA via complementary base pairing, resulting in mRNA silencing in human cells. Interestingly, a recent study indicates that the host could utilize miRNA silencing to shape the gut microbiota, including the Clostridium genus [685], suggesting that miRNAs play a crucial role in not only host gene repression but also microbiota shaping. Although there are few studies investigating the effects of miRNAs on C. difficile-infected Caco-2 cells, and no existing studies have reported the miRNA silencing with
regard to C. difficile, it is still necessary to create the GEINs of the host and pathogen during CDI. In addition to small noncoding RNAs, long noncoding RNAs (lncRNAs) have also attracted significant attention in recent years. Unlike miRNAs, lncRNAs are too large (~200–1000 nucleotides) to pass through the bacterial cell wall and cell membrane for pathogen-gene regulation. In human cells, lncRNAs participate in gene regulation in a similar but more complex manner than miRNAs [756], controlling various cellular responses. Furthermore, other epigenetic regulations such as DNA methylation and histone modification could confer rapid and strong cellular responses to the corresponding bacterial invasion. These epigenetic activities (miRNA repression, DNA methylation, and histone modification) could alter the behavior of host cells in response to the corresponding bacterial infection. In order to investigate the progression of host-pathogen cross-talk mechanisms, as well as how these epigenetic activities contribute to the progression during CDI, we identified the GEINs in both the host and pathogen during the early and late stages of CDI. We could then extract the host-pathogen core networks (HPCNs) from the GEINs to investigate the core pathways involved in the cellular responses of the host and pathogen during the early and late stages of CDI. In addition, we have also discussed the offensive and defensive mechanisms employed by the host and pathogen, respectively, as well as the crucial events contributing to the progression of CDI during the early and late stages of infection. Our previous studies have demonstrated that systems biology approaches are powerful tools to investigate complex biological networks, especially the cross-talk molecular mechanisms between different species [6,7,11,757]. In this chapter, we show that, at the early stage of CDI, the C.
difficile pathogenic factors CD0663, CD0660, and CD0478 can induce the production of reactive oxygen species (ROS) via a RAC1-related pathway, the disturbance of cytoskeleton homeostasis via GTPase inactivation, and the dysfunction of chaperone activities via acetylation. In response to these pathogenic effects, Caco-2 cells trigger autophagy and the DNA damage response to restore the injured components, and the immune response to deplete C. difficile. The depletion of C. difficile has been found to be durable and enhanced during the late stage of infection, resulting in a strong inflammatory response. However, the severe inflammation in turn could cause tissue damage in the host. Moreover, the accumulated cellular stress [oxidative stress and endoplasmic reticulum (ER) stress] is also found to result in the apoptosis of Caco-2 cells. Based on our results, we propose some pathogen proteins as potential drug targets due to their crucial roles during CDI. These include the cell-wall proteins CD2787, CD0237, and CD0440; the growth-essential proteins CD1275 and CD2781; the anti-ROS enzymes CD2356, CD0171, and CD0179; and the sporulation-related proteins CD1214, CD2629, and CD2643. In addition, we also aim to restore the expression of host dysfunction proteins (RHOA, CDC42, RAC1, HSP90B1, HSPA5, and HSP90B2P) and to repress pathogenic effect-related proteins (NFKB1, REL, and IL-8) induced by C. difficile. We also introduce a multimolecule drug containing E64, IgY, REP3123, camptothecin, and apigenin for these drug targets to treat CDI. This multimolecule drug can inhibit the multiple potential targets described previously as well as restore the gene expression homeostasis of host dysfunction proteins and inflammation-related proteins.
16.2 Materials and methods
16.2.1 Overview of the construction of GEINs and HPCNs in Caco-2 cells during the early and late stages of CDI
To investigate the progression of the cross-talk mechanisms between Caco-2 cells and C. difficile during the early and late stages of CDI, we identified the GEINs and extracted HPCNs in Caco-2 cells and C. difficile during the infection. Fig. 16.1 shows a flowchart of how the HPCNs of the host and pathogen at the early and late stages of infection are obtained by big data mining, model construction, and network identification in order to investigate the cross-talk molecular mechanisms and infer potential drug targets. These processes can be divided into four steps: (1) big data mining and data preprocessing of host/pathogen gene/miRNA expression data; (2) the construction of the candidate GEIN, which consists of candidate host/pathogen intraspecies protein-protein interaction (PPI) networks (PPINs), candidate interspecies PPINs between host and pathogen, candidate host/pathogen gene/miRNA regulation networks (GRNs), candidate miRNA regulation networks of host-miRNAs on host/pathogen genes, and candidate lncRNA regulation networks of host-lncRNAs on host-genes; (3) the identification of the real GEINs of each stage from the candidate GEIN via the system identification method and system order detection scheme, using the genome-wide microarray data of Caco-2 cells and C. difficile during infection; and (4) the extraction of HPCNs from the real GEINs using the principal network projection (PNP) method. We then investigated the crucial molecular mechanisms that contribute to the progression of C. difficile infection and finally inferred some potential drug targets for multimolecule drug design.
16.2.2 Big data mining and data preprocessing of host/pathogen gene/miRNA microarray data
To identify the cross-talk activities between the host and pathogen during the infection, it is necessary to simultaneously measure the gene expression of the host and pathogen. However, C. difficile is notoriously difficult to cultivate and isolate and is extremely sensitive to oxygen. It must be cultured together with human cells in an anaerobic environment. Only a few studies have investigated the gene expression of epithelial cells exposed to C. difficile toxins [758], and there is no existing transcription profile of C. difficile during the invasion of human cells except the raw data of one study [755]. Therefore, in the present study, the raw data obtained from the previous study investigating the transcription profile of both Caco-2 cells and C. difficile [755] is the only available dataset providing sufficient two-sided information for constructing the candidate GEIN. The microarray raw data has two parts. The first part contains the mRNA/miRNA expression profiles of three biological replicates of the Caco-2 cell line at 0, 30, 60, and 120 min post infection with C. difficile 630. Each biological replicate contains two technical replicates (GEO accession number GSE18407). The Caco-2 cell line was cultured in Dulbecco’s modified Eagle’s medium at 37 °C prior to infection. The second part contains the mRNA expression profiles of three biological replicates of C. difficile 630 in Caco-2 cells at 0, 30, 60, and 120 min post infection (GEO accession number GSE18407; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18407). Furthermore, biological replicates 1, 2, and 3 of
FIGURE 16.1 Flowchart of the systems biology approach used to construct GEINs for HPCNs, and to investigate the cross-talk mechanisms during CDI for drug targets and drug discovery. The blue gray blocks represent the external information, including the big data mining for constructing candidate GEIN, microarray data identification, and the surveyed literature for drug design; the rounded rectangular blocks denote the schemes and methods utilized to construct the candidate GEIN and real cross-talk GEINs at the early and late stages of CDI and then extract the HPCNs of each stage; and the light yellow blocks are the identified real GEINs and the cross-talk HPCNs at the early and late stages of infection, as well as the inferred drug targets and proposed multimolecule drug [12]. CDI, Clostridium difficile infection; GEIN, genetic-and-epigenetic interspecies network.
Caco-2 cells correspond to the biological replicates 1, 2, and 3 of C. difficile, respectively. The platforms used for the host and pathogen were the Phalanx Human OneArray and the C. difficile 630/QCD32g58 array, which include 39,200 and 13,824 probes, respectively. The microarray data were validated using qRT-PCR. Because each biological replicate of Caco-2 cells contains two technical replicates, we took the average of the microarray data of the two technical replicates for further network identification. The expression data of human Caco-2 cells at 0, 30, and 60 min and C. difficile at 0, 30, and 60 min during the infection were utilized to identify the real GEINs of each biological replicate during the early stage (0–60 min) of CDI. In addition, the expression data of human Caco-2 cells at 30, 60, and 120 min and C. difficile at 30, 60, and 120 min during the infection were utilized to identify the real GEINs of each biological replicate during the late stage (30–120 min) of CDI, where biological replicates 1, 2, and 3 in the early stage of infection correspond to biological replicates 1, 2, and 3 in the late stage of infection, respectively. To obtain a sufficient number of data points for the following network identification method, we could apply the cubic spline interpolation method to the expression data at 0, 30, 60, and 120 min to avoid an overfitting problem in the network identification process.
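The preprocessing described above (averaging the two technical replicates, interpolating the four sampled time points with a cubic spline, and reading off the early and late windows) can be sketched as follows. This is a minimal illustration rather than the procedure actually used for the dataset: the natural boundary condition, fitting one spline over all four time points, the 10 min resampling step, the function names, and the expression values are all assumptions made for the example.

```python
from bisect import bisect_right


def average_replicates(rep_a, rep_b):
    """Average two technical replicates measured at the same time points."""
    return [(a + b) / 2.0 for a, b in zip(rep_a, rep_b)]


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    Assumption: natural boundary conditions (zero second derivative at both ends).
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients c.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-interval coefficients b, c, d.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval containing x, clamped to the first/last interval.
        j = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def resample(spline, start, stop, step=10.0):
    """Sample the spline on an evenly spaced grid from start to stop (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    return [(start + k * step, spline(start + k * step)) for k in range(count)]


# Sampling times in minutes post infection, as in the dataset description.
TIMES = [0.0, 30.0, 60.0, 120.0]

# Hypothetical expression values of one gene in one biological replicate.
tech_rep_1 = [1.00, 2.40, 2.10, 3.60]
tech_rep_2 = [1.20, 2.20, 2.30, 3.40]

mean_expr = average_replicates(tech_rep_1, tech_rep_2)
spline = natural_cubic_spline(TIMES, mean_expr)

early = resample(spline, 0.0, 60.0)    # early stage: 0 to 60 min
late = resample(spline, 30.0, 120.0)   # late stage: 30 to 120 min
```

The two windows share the samples between 30 and 60 min, which mirrors the 30 min overlap between the early and late stages defined in the text.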
16.2.3 Construction of a candidate genetic-and-epigenetic interspecies network
To construct the candidate GEIN, big data collected from experiments or computational predictions are required. The source databases and literature of the big data are described as follows. The PPI information required for the host candidate PPIN was obtained from DIP [759], BIND [533], IntAct [537], MINT [538], and the physical interaction part of BioGRID [584], since the genetic interactions of BioGRID are inferred indirectly from experiments. For the host candidate GRN, the transcription factors (TFs)/lncRNAs/complexes and their downstream-regulated genes were obtained from HTRIdb [671] and ITFP [672], and the miRNA-to-gene regulatory associations were obtained from TargetScanHuman [668], starBase v2 [760], and CircuitDB [761]. The PPI information required for the pathogen candidate PPIN was obtained from STRING [585]. For the pathogen candidate GRN and the host–pathogen cross-talk candidate PPIN and GRN, no existing database can provide sufficient information or predictions for network construction. Thus we explored studies about host–pathogen interspecies PPIs [751,762–774] and utilized the sequence homology between C. difficile and Escherichia coli [775], as well as between C. difficile and Homo sapiens [775], along with the reported interspecies PPIN between E. coli and H. sapiens [776] and the human intraspecies PPIs (DIP, BIND, BioGRID, IntAct, and MINT), to construct the host–pathogen candidate interspecies PPIN. Similarly, we surveyed the literature reporting pathogen gene regulatory pairs [777–786] and utilized the sequence homology between C. difficile and E. coli [775], as well as between C. difficile and H. sapiens [775], along with the E. coli intraspecies GRN obtained from RegulonDB [787] and the human intraspecies GRN (HTRIdb and ITFP), to build the pathogen candidate intraspecies GRN. Finally, we utilized the sequence homology between C. difficile and H. sapiens, along with the miRNA regulatory information obtained from TargetScanHuman [668], starBase v2 [760], and CircuitDB [761], to create the candidate GRN of host-miRNAs targeting pathogen genes. The detailed construction procedures of the host–pathogen candidate interspecies PPIN, the pathogen candidate intraspecies GRN, and the candidate GRN of host-miRNAs targeting pathogen genes are shown in Fig. 16.2.
16.2 Materials and methods
FIGURE 16.2 The construction schemes of the candidate GEIN: (A) host–pathogen candidate interspecies PPIN; (B) pathogen candidate intraspecies GRN; and (C) candidate GRN of host-miRNAs targeting pathogen genes [12]. GEIN, Genetic-and-epigenetic interspecies network; PPIN, protein–protein interaction network.
In total, we obtained 144,361 TF/lncRNA/complex–gene pairs and 5961 miRNA–gene pairs for the host candidate GRN, and 1265 TF–gene pairs and 96 miRNA–gene pairs for the pathogen candidate GRN. For the PPINs, we inferred 3,425,976 PPIs for the host candidate PPIN, 290,018 PPIs for the pathogen candidate PPIN, and 17,068 interspecies candidate PPIs between host and pathogen during CDI.
16.2.4 Dynamic models of GEINs for Caco-2 cells and Clostridium difficile during infection
Since the candidate GEIN was constructed from big data drawn from numerous databases, experimental datasets, and the literature, it inevitably contains some false-positive information. To remove the effect of this false-positive information, dynamic models were constructed to characterize the molecular mechanisms of GEINs and to prune the false positives in the candidate GEIN using the real two-sided microarray data, thus producing the real GEINs for Caco-2 cells and C. difficile during the infection process. For the PPIN of host proteins in the candidate GEIN, the dynamic interaction model of the ith host protein can be described by the following dynamic interactive equation:

$$p_i^H(t+1) = p_i^H(t) + \sum_{f=1}^{F_i} a_{if}^H\, p_i^H(t)\, p_f^H(t) + \sum_{q=1}^{Q_i} c_{iq}^H\, p_i^H(t)\, p_q^P(t) + \alpha_i^H g_i^H(t) - \gamma_i^H p_i^H(t) + \kappa_i^H + \varpi_i^H(t), \tag{16.1}$$

$$\text{for } i = 1, 2, \ldots, I; \quad \alpha_i^H \ge 0 \ \text{and} \ -\gamma_i^H \le 0$$

where $p_i^H(t)$, $p_f^H(t)$, $g_i^H(t)$, and $p_q^P(t)$ represent the expression levels of the ith host protein, the fth host protein, the ith host gene, and the qth pathogen protein at time t, respectively; $a_{if}^H$ and $c_{iq}^H$ denote the interactive ability between the ith and fth host proteins and between the ith host protein and the qth pathogen protein, respectively; $F_i$ and $Q_i$ signify the numbers of host proteins and pathogen proteins that interact with the ith host protein; $\alpha_i^H$, $-\gamma_i^H$, and $\kappa_i^H$ indicate the translation rate from the corresponding mRNA, the degradation rate, and the basal level of the ith host protein, respectively. In general, the basal level $\kappa_i^H$ in (16.1) represents an unknown activity affecting the expression of the ith host protein other than those mentioned previously, such as epigenetic acetylation and ubiquitination. $\varpi_i^H(t)$ denotes the stochastic noise of the ith host protein at time t. The biological meaning of (16.1) is that the expression level of the ith host protein can be affected by various molecular mechanisms, including the host intraspecies PPIs $\sum_{f=1}^{F_i} a_{if}^H p_i^H(t) p_f^H(t)$, the interspecies PPIs $\sum_{q=1}^{Q_i} c_{iq}^H p_i^H(t) p_q^P(t)$, protein translation $\alpha_i^H g_i^H(t)$, protein degradation $-\gamma_i^H p_i^H(t)$, the basal level $\kappa_i^H$, and the corruption of stochastic noise $\varpi_i^H(t)$. In addition, the translation rate should be constrained to be nonnegative and the protein degradation rate should be constrained to be nonpositive in real PPIs.
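As a worked illustration of Eq. (16.1), the following Python sketch evaluates one time step of the host protein model. The function name `host_protein_step` and all numerical values are hypothetical and chosen only so that the arithmetic is easy to follow; they are not estimates from the study.

```python
import numpy as np

def host_protein_step(p_i, p_host, p_path, g_i, a, c, alpha, gamma, kappa, noise=0.0):
    """One step of Eq. (16.1) for the ith host protein.

    p_host and p_path hold the levels of the F_i host and Q_i pathogen partners;
    a and c hold the matching interaction abilities.
    """
    ppi_host = p_i * np.dot(a, p_host)    # sum_f a_if * p_i * p_f
    ppi_cross = p_i * np.dot(c, p_path)   # sum_q c_iq * p_i * p_q
    translation = alpha * g_i             # alpha_i * g_i, with alpha_i >= 0
    degradation = -gamma * p_i            # -gamma_i * p_i, with gamma_i >= 0
    return p_i + ppi_host + ppi_cross + translation + degradation + kappa + noise

# Hypothetical values: two host partners and one pathogen partner.
p_next = host_protein_step(
    p_i=2.0,
    p_host=np.array([1.0, 3.0]),
    p_path=np.array([0.5]),
    g_i=4.0,
    a=np.array([0.1, -0.2]),
    c=np.array([0.4]),
    alpha=0.5,
    gamma=0.1,
    kappa=0.3,
)
```

With these values the terms are 2.0 (current level), -1.0 (host PPIs), 0.4 (interspecies PPI), 2.0 (translation), -0.2 (degradation), and 0.3 (basal level), which sum to 3.5.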
For the GRN of host genes in the candidate GEIN, the dynamic regulatory model of the jth host gene can be described as follows:

$$g_j^H(t+1) = g_j^H(t) + \sum_{i=1}^{I_j} b_{ji}^H p_i^H(t) + \sum_{n=1}^{N_j} e_{jn}^H l_n^H(t) + \sum_{i'=1}^{I'_j} \sum_{i''=1}^{I''_j} x^H_{j(I''_j(i'-1)+i'')}\, p^H_{i'}(t)\, p^H_{i''}(t) - \sum_{k=1}^{K_j} d_{jk}^H g_j^H(t) m_k^H(t) - \lambda_j^H g_j^H(t) + \delta_j^H + \varepsilon_j^H(t), \tag{16.2}$$

$$\text{for } j = 1, 2, \ldots, J; \quad -d_{jk}^H \le 0 \ \text{and} \ -\lambda_j^H \le 0$$

where $g_j^H(t)$, $p_i^H(t)$, $m_k^H(t)$, and $l_n^H(t)$ indicate the expression levels of the jth host gene, the ith host TF, the kth host miRNA, and the nth host lncRNA at time t, respectively; $b_{ji}^H$, $-d_{jk}^H$, and $e_{jn}^H$ represent the regulation ability of the ith host TF, the kth host miRNA, and the nth host lncRNA on the jth host gene, respectively; $I_j$, $K_j$, and $N_j$ denote the numbers of host-TFs, host-miRNAs, and host-lncRNAs, respectively, which could regulate the expression level of the jth host gene; $p^H_{i'}(t) p^H_{i''}(t)$ implies the ith host complex, where $p^H_{i'}(t)$ and $p^H_{i''}(t)$ represent subunits 1 and 2 of the ith host complex, respectively; $I'_j$ and $I''_j$ are the same and denote the number of host complexes that could regulate the jth host gene in the candidate GRN; $x^H_{j(I''_j(i'-1)+i'')}$ signifies the regulation ability of the ith host complex on the jth host gene; $-\lambda_j^H$ and $\delta_j^H$ indicate the degradation rate and the basal level of the jth host gene, respectively. In general, the basal level $\delta_j^H$ in (16.2) denotes an unknown regulation other than those mentioned previously, such as DNA methylation. $\varepsilon_j^H(t)$ represents the stochastic noise due to modeling residue and measuring error at time t. Notably, for the ith host complex acting on the jth host gene, the index $I''_j(i'-1)+i''$ of the regulation ability $x^H_{j(I''_j(i'-1)+i'')}$ guarantees the proper coordinate of the regulation ability of the ith host complex $p^H_{i'}(t) p^H_{i''}(t)$ in the host GRN system matrix of the jth host gene, that is, the regulation abilities of host complexes on the jth host gene can be aligned to a one-row matrix: $\left[ x^H_{j1}, \ldots, x^H_{j(I''_j)}, x^H_{j(I''_j+1)}, \ldots, x^H_{j(2I''_j)}, x^H_{j(2I''_j+1)}, \ldots, x^H_{j(I''_j(i'-1)+i'')}, \ldots, x^H_{j(I''_j I'_j)} \right]$. Therefore the biological meaning of Eq. (16.2) is that the expression level of the jth host gene can be regulated by numerous molecular mechanisms, including the host TF regulations $\sum_{i=1}^{I_j} b_{ji}^H p_i^H(t)$, the host lncRNA regulations $\sum_{n=1}^{N_j} e_{jn}^H l_n^H(t)$, the host complex regulations $\sum_{i'=1}^{I'_j} \sum_{i''=1}^{I''_j} x^H_{j(I''_j(i'-1)+i'')} p^H_{i'}(t) p^H_{i''}(t)$, the host miRNA repressions $-\sum_{k=1}^{K_j} d_{jk}^H g_j^H(t) m_k^H(t)$, the mRNA degradation effect $-\lambda_j^H g_j^H(t)$, the basal level $\delta_j^H$, and the corruption of stochastic noise $\varepsilon_j^H(t)$. Similar to the protein model, the miRNA repression ability and the gene degradation rate should be constrained to be nonpositive. Since DNA methylation can directly influence the binding affinities of RNA polymerases to target genes [683], we assumed that regulation by methyltransferase could cause a significant change of the basal level $\delta_j^H$, so that a change of $\delta_j^H$ between the early stage and the late stage of CDI in the dynamic model (16.2) implies the occurrence of methylation at the jth host gene in the infection process.
In (16.2) the expression of the kth host miRNA $m_k^H(t)$ and the nth host lncRNA $l_n^H(t)$ at time t can also be regulated by other regulators. Therefore the dynamic regulatory
equation of the kth host miRNA was modeled as follows:

$$m_k^H(t+1) = m_k^H(t) + \sum_{i=1}^{I_k} y_{ki}^H p_i^H(t) - \mu_k^H m_k^H(t) + \phi_k^H + \varsigma_k^H(t), \quad \text{for } k = 1, 2, \ldots, K \ \text{and} \ -\mu_k^H \le 0 \tag{16.3}$$

where $m_k^H(t)$ and $p_i^H(t)$ denote the expression levels of the kth host miRNA and the ith host TF at time t, respectively; $y_{ki}^H$ represents the regulatory ability of the ith host TF on the kth host miRNA; $I_k$ signifies the number of host-TFs that regulate the expression level of the kth host miRNA; $-\mu_k^H$ and $\phi_k^H$ indicate the miRNA degradation rate and the basal level of the kth host miRNA, respectively; and $\varsigma_k^H(t)$ implies the stochastic noise at time t. The dynamic model of host-miRNAs in (16.3) could characterize molecular regulatory mechanisms, including the transcription regulations $\sum_{i=1}^{I_k} y_{ki}^H p_i^H(t)$, the miRNA degradation effect $-\mu_k^H m_k^H(t)$, the basal level $\phi_k^H$, and the corruption of stochastic noise $\varsigma_k^H(t)$. In addition, the degradation rate should be constrained to be nonpositive. Similarly, the dynamic model of the nth host lncRNA in the candidate GEIN can be described by the dynamic regulatory equation as follows:

$$l_n^H(t+1) = l_n^H(t) + \sum_{i=1}^{I_n} z_{ni}^H p_i^H(t) - \chi_n^H l_n^H(t) + \rho_n^H + \vartheta_n^H(t), \quad \text{for } n = 1, 2, \ldots, N \ \text{and} \ -\chi_n^H \le 0 \tag{16.4}$$

where $l_n^H(t)$ and $p_i^H(t)$ represent the expression levels of the nth host lncRNA and the ith host TF at time t, respectively; $z_{ni}^H$ denotes the regulation ability of the ith host TF on the nth host lncRNA; $I_n$ signifies the number of host-TFs regulating the expression level of the nth host lncRNA; $-\chi_n^H$ and $\rho_n^H$ indicate the degradation rate and the basal level of the nth host lncRNA, respectively; and $\vartheta_n^H(t)$ is the stochastic noise due to the modeling residue and measurement noise. The dynamic regulatory model of host-lncRNAs in (16.4) could characterize molecular regulatory mechanisms, including the transcription regulations $\sum_{i=1}^{I_n} z_{ni}^H p_i^H(t)$, the degradation effect $-\chi_n^H l_n^H(t)$, the basal level $\rho_n^H$, and the corruption of stochastic noise $\vartheta_n^H(t)$. Furthermore, the constraint of this model is that the lncRNA degradation rate should be nonpositive. For the PPIN of pathogen proteins in the candidate GEIN, the dynamic interactive model of the qth pathogen protein can be described by the following equation:

$$p_q^P(t+1) = p_q^P(t) + \sum_{o=1}^{O_q} a_{qo}^P\, p_q^P(t)\, p_o^P(t) + \sum_{i=1}^{I_q} c_{qi}^P\, p_q^P(t)\, p_i^H(t) + \alpha_q^P g_q^P(t) - \gamma_q^P p_q^P(t) + \kappa_q^P + \varpi_q^P(t), \tag{16.5}$$

$$\text{for } q = 1, 2, \ldots, Q; \quad \alpha_q^P \ge 0 \ \text{and} \ -\gamma_q^P \le 0$$

where $p_q^P(t)$, $p_o^P(t)$, $g_q^P(t)$, and $p_i^H(t)$ represent the expression levels of the qth pathogen protein, the oth pathogen protein, the qth pathogen gene, and the ith host protein at time t, respectively; $a_{qo}^P$ and $c_{qi}^P$ denote the interactive ability between the qth pathogen protein and the oth pathogen protein, and between the qth pathogen protein and the ith host protein, respectively; $O_q$ and $I_q$ denote the numbers of pathogen proteins and host proteins
that interact with the qth pathogen protein, respectively; $\alpha_q^P$, $-\gamma_q^P$, and $\kappa_q^P$ indicate the translation rate, the degradation rate, and the basal level of the qth pathogen protein, respectively; and $\varpi_q^P(t)$ denotes the stochastic noise of the qth pathogen protein at time t. The biological meaning of Eq. (16.5) is that the expression level of the qth pathogen protein can be affected by various molecular interactive mechanisms, including the intraspecies PPIs $\sum_{o=1}^{O_q} a_{qo}^P p_q^P(t) p_o^P(t)$, the interspecies PPIs $\sum_{i=1}^{I_q} c_{qi}^P p_q^P(t) p_i^H(t)$, protein translation $\alpha_q^P g_q^P(t)$, protein degradation $-\gamma_q^P p_q^P(t)$, the basal level $\kappa_q^P$, and the corruption of stochastic noise $\varpi_q^P(t)$. Similar to the host protein dynamic model, the translation rate should be constrained to be nonnegative and the protein degradation rate should be constrained to be nonpositive. For the GRN of pathogen genes in the candidate GEIN, the dynamic regulatory model of the hth pathogen gene can be described as follows:

$$g_h^P(t+1) = g_h^P(t) + \sum_{q=1}^{Q_h} b_{hq}^P p_q^P(t) - \sum_{k=1}^{K_h} d_{hk}^P g_h^P(t) m_k^H(t) - \lambda_h^P g_h^P(t) + \delta_h^P + \varepsilon_h^P(t), \tag{16.6}$$

$$\text{for } h = 1, 2, \ldots, H; \quad -d_{hk}^P \le 0 \ \text{and} \ -\lambda_h^P \le 0$$

where $g_h^P(t)$, $p_q^P(t)$, and $m_k^H(t)$ indicate the expression levels of the hth pathogen gene, the qth pathogen TF, and the kth host miRNA at time t, respectively; $b_{hq}^P$ and $-d_{hk}^P$ represent the regulatory ability of the qth pathogen TF and the kth host miRNA on the hth pathogen gene, respectively; $Q_h$ and $K_h$ denote the numbers of pathogen TFs and host-miRNAs that regulate the hth pathogen gene; $-\lambda_h^P$ and $\delta_h^P$ indicate the degradation rate and the basal level of the hth pathogen gene, respectively; and $\varepsilon_h^P(t)$ represents the stochastic noise due to the modeling residue and measurement noise at time t. The biological meaning of Eq. (16.6) is that the expression level of the hth pathogen gene can be regulated by various molecular mechanisms, including the pathogen TF regulations $\sum_{q=1}^{Q_h} b_{hq}^P p_q^P(t)$, the host miRNA repressions $-\sum_{k=1}^{K_h} d_{hk}^P g_h^P(t) m_k^H(t)$, mRNA degradation $-\lambda_h^P g_h^P(t)$, the basal level $\delta_h^P$, and the corruption of stochastic noise $\varepsilon_h^P(t)$. In addition, the host miRNA repression rate and the degradation rate should be constrained to be nonpositive.
Remark 16.1: Host-miRNAs in (16.3) and host-lncRNAs in (16.4) could in principle be regulated by regulators other than host-TFs, such as host-miRNAs and host-lncRNAs. However, no miRNA-to-miRNA, miRNA-to-lncRNA, lncRNA-to-miRNA, or lncRNA-to-lncRNA regulatory relationships were found in the candidate GEIN by big data mining. Therefore no corresponding terms exist in Eqs. (16.3) and (16.4). GEIN, Genetic-and-epigenetic interspecies network; lncRNAs, long noncoding RNAs; miRNAs, microRNAs; TFs, transcription factors.
Remark 16.2: The epigenetic regulations of the candidate GEIN in Ref. [11] only considered the regulation of miRNAs. In this study the epigenetic regulations of the candidate GEIN in (16.1)–(16.6) consider miRNAs, lncRNAs, acetylation, ubiquitination, and methylation. Furthermore, the transcriptional regulation of miRNA and lncRNA genes via TFs
allows us to construct the dynamic regulatory models of host-miRNAs and lncRNAs as shown in (16.3) and (16.4), which are absent in Ref. [11]. Therefore the proposed GEIN in this study contains more epigenetic information for cellular mechanisms in infection than that in Ref. [11]. GEIN, Genetic-and-epigenetic interspecies network; lncRNAs, long noncoding RNAs; miRNAs, microRNAs.
16.2.5 Parameter estimation of the dynamic models of candidate GEIN via the system identification method
To identify the parameters precisely, we applied a system identification method to the dynamic genetic and epigenetic Eqs. (16.1)–(16.6) of the candidate GEIN. We rewrote the host PPIN dynamic Eq. (16.1) in the following linear regression form:

$$p_i^H(t+1) = \begin{bmatrix} p_i^H(t) p_1^H(t) & \cdots & p_i^H(t) p_{F_i}^H(t) & p_i^H(t) p_1^P(t) & \cdots & p_i^H(t) p_{Q_i}^P(t) & g_i^H(t) & p_i^H(t) & 1 \end{bmatrix} \begin{bmatrix} a_{i1}^H \\ \vdots \\ a_{iF_i}^H \\ c_{i1}^H \\ \vdots \\ c_{iQ_i}^H \\ \alpha_i^H \\ 1 - \gamma_i^H \\ \kappa_i^H \end{bmatrix} + \varpi_i^H(t) \triangleq \varphi_i^{HP}(t)\, \theta_i^{HP} + \varpi_i^H(t), \quad \text{for } i = 1, 2, \ldots, I \tag{16.7}$$

where $\varphi_i^{HP}(t)$ represents the regression vector that can be obtained from the microarray expression data, and $\theta_i^{HP}$ is the unknown parameter vector to be estimated for the ith host protein in the host PPIN. Eq. (16.7) of the ith host protein can be augmented for $T_i$ data points in the following form:

$$\begin{bmatrix} p_i^H(t_2) \\ p_i^H(t_3) \\ \vdots \\ p_i^H(t_{T_i} + 1) \end{bmatrix} = \begin{bmatrix} \varphi_i^{HP}(t_1) \\ \varphi_i^{HP}(t_2) \\ \vdots \\ \varphi_i^{HP}(t_{T_i}) \end{bmatrix} \theta_i^{HP} + \begin{bmatrix} \varpi_i^H(t_1) \\ \varpi_i^H(t_2) \\ \vdots \\ \varpi_i^H(t_{T_i}) \end{bmatrix}, \quad \text{for } i = 1, 2, \ldots, I \tag{16.8}$$

which could be simply represented as follows:

$$P_i^H = \Phi_i^{HP} \theta_i^{HP} + \Gamma_i^{HP}, \quad \text{for } i = 1, 2, \ldots, I \tag{16.9}$$

where

$$P_i^H = \begin{bmatrix} p_i^H(t_2) \\ p_i^H(t_3) \\ \vdots \\ p_i^H(t_{T_i} + 1) \end{bmatrix}, \quad \Phi_i^{HP} = \begin{bmatrix} \varphi_i^{HP}(t_1) \\ \varphi_i^{HP}(t_2) \\ \vdots \\ \varphi_i^{HP}(t_{T_i}) \end{bmatrix}, \quad \Gamma_i^{HP} = \begin{bmatrix} \varpi_i^H(t_1) \\ \varpi_i^H(t_2) \\ \vdots \\ \varpi_i^H(t_{T_i}) \end{bmatrix}.$$
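The stacking in (16.7)–(16.9) can be sketched in Python as follows. This is only an illustration under assumed array layouts: the function name `build_host_protein_regression`, the row-per-time-point convention, and the random toy data are all hypothetical.

```python
import numpy as np

def build_host_protein_regression(p_i, p_host, p_path, g_i):
    """Build Phi (regression matrix) and P (one-step-ahead targets) for one host protein.

    p_i, g_i: arrays of shape (T + 1,); p_host: (T + 1, F_i); p_path: (T + 1, Q_i).
    Column order follows (16.7): host PPIs, interspecies PPIs, g_i, p_i, constant 1.
    """
    p_i = np.asarray(p_i, dtype=float)
    g_i = np.asarray(g_i, dtype=float)
    p_host = np.asarray(p_host, dtype=float)
    p_path = np.asarray(p_path, dtype=float)
    cur = p_i[:-1, None]                  # p_i(t) at the first T time points
    Phi = np.hstack([
        cur * p_host[:-1],                # p_i(t) * p_f(t)
        cur * p_path[:-1],                # p_i(t) * p_q(t)
        g_i[:-1, None],                   # g_i(t), multiplies alpha_i
        cur,                              # p_i(t), multiplies (1 - gamma_i)
        np.ones_like(cur),                # constant, multiplies kappa_i
    ])
    P = p_i[1:]                           # p_i(t + 1)
    return Phi, P

# Toy data: 6 time points, 2 host partners, 1 pathogen partner (random values).
rng = np.random.default_rng(0)
p_host = rng.uniform(0.5, 1.5, size=(6, 2))
p_path = rng.uniform(0.5, 1.5, size=(6, 1))
g_i = rng.uniform(0.5, 1.5, size=6)
p_i = rng.uniform(0.5, 1.5, size=6)
Phi, P = build_host_protein_regression(p_i, p_host, p_path, g_i)
```

Each row of `Phi` is one regression vector and each entry of `P` is the matching next-step expression level, so six time points yield five regression rows.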
Therefore the parameters in the vector $\theta_i^{HP}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_i^{HP}} \left\| \Phi_i^{HP} \theta_i^{HP} - P_i^H \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \end{bmatrix} \theta_i^{HP} \le \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{16.10}$$

The parameters in the host PPIN dynamic Eq. (16.1) can be estimated by solving the constrained least squares problem (16.10) with the lsqlin function of the MATLAB Optimization Toolbox, which simultaneously guarantees that the host protein translation rate $\alpha_i^H$ is nonnegative and the host protein degradation rate $-\gamma_i^H$ is nonpositive, that is, $\alpha_i^H \ge 0$ and $-\gamma_i^H \le 0$. Similarly, the host GRN dynamic Eq. (16.2) could be represented in the following linear regression form:

$$g_j^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_j}^H(t) & l_1^H(t) & \cdots & l_{N_j}^H(t) & p_1^H(t) p_1^H(t) & \cdots & p_{I'_j}^H(t) p_{I''_j}^H(t) & g_j^H(t) m_1^H(t) & \cdots & g_j^H(t) m_{K_j}^H(t) & g_j^H(t) & 1 \end{bmatrix} \begin{bmatrix} b_{j1}^H \\ \vdots \\ b_{jI_j}^H \\ e_{j1}^H \\ \vdots \\ e_{jN_j}^H \\ x_{j1}^H \\ \vdots \\ x_{j(I'_j \times I''_j)}^H \\ -d_{j1}^H \\ \vdots \\ -d_{jK_j}^H \\ 1 - \lambda_j^H \\ \delta_j^H \end{bmatrix} + \varepsilon_j^H(t) \triangleq \varphi_j^{HG}(t)\, \theta_j^{HG} + \varepsilon_j^H(t), \quad \text{for } j = 1, 2, \ldots, J \tag{16.11}$$

where $\varphi_j^{HG}(t)$ represents the regression vector that can be obtained from the microarray expression data, and $\theta_j^{HG}$ is the unknown regulatory parameter vector to be estimated for the jth host gene in the host GRN. Eq. (16.11) of the jth host gene can be augmented for $T_j$ data points in the following form:

$$G_j^H = \Phi_j^{HG} \theta_j^{HG} + \Gamma_j^{HG}, \quad \text{for } j = 1, 2, \ldots, J \tag{16.12}$$

where

$$G_j^H = \begin{bmatrix} g_j^H(t_2) \\ g_j^H(t_3) \\ \vdots \\ g_j^H(t_{T_j} + 1) \end{bmatrix}, \quad \Phi_j^{HG} = \begin{bmatrix} \varphi_j^{HG}(t_1) \\ \varphi_j^{HG}(t_2) \\ \vdots \\ \varphi_j^{HG}(t_{T_j}) \end{bmatrix}, \quad \Gamma_j^{HG} = \begin{bmatrix} \varepsilon_j^H(t_1) \\ \varepsilon_j^H(t_2) \\ \vdots \\ \varepsilon_j^H(t_{T_j}) \end{bmatrix}.$$
Therefore the regulatory parameters in the vector $\theta_j^{HG}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_j^{HG}} \left\| \Phi_j^{HG} \theta_j^{HG} - G_j^H \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0 \end{bmatrix} \theta_j^{HG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{16.13}$$

The regulatory parameters in the host GRN dynamic Eq. (16.2) can be estimated by solving the above constrained optimization problem (16.13), which simultaneously guarantees that the host miRNA repression ability $-d_{jk}^H$ is nonpositive and the host gene degradation rate $-\lambda_j^H$ is nonpositive, that is, $-d_{jk}^H \le 0$ for $k = 1, \ldots, K_j$ and $-\lambda_j^H \le 0$. The dynamic model of host-miRNAs in Eq. (16.3) can also be rewritten in the following linear regression form:

$$m_k^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_k}^H(t) & m_k^H(t) & 1 \end{bmatrix} \begin{bmatrix} y_{k1}^H \\ \vdots \\ y_{kI_k}^H \\ 1 - \mu_k^H \\ \phi_k^H \end{bmatrix} + \varsigma_k^H(t) \triangleq \varphi_k^{HM}(t)\, \theta_k^{HM} + \varsigma_k^H(t), \quad \text{for } k = 1, 2, \ldots, K \tag{16.14}$$

where $\varphi_k^{HM}(t)$ represents the regression vector that can be obtained from the microarray expression data, and $\theta_k^{HM}$ is the unknown regulatory parameter vector to be estimated for the kth host miRNA in the host GRN. Eq. (16.14) of the kth host miRNA can be augmented for $T_k$ data points in the following form:

$$M_k^H = \Phi_k^{HM} \theta_k^{HM} + \Gamma_k^{HM}, \quad \text{for } k = 1, 2, \ldots, K \tag{16.15}$$

where

$$M_k^H = \begin{bmatrix} m_k^H(t_2) \\ m_k^H(t_3) \\ \vdots \\ m_k^H(t_{T_k} + 1) \end{bmatrix}, \quad \Phi_k^{HM} = \begin{bmatrix} \varphi_k^{HM}(t_1) \\ \varphi_k^{HM}(t_2) \\ \vdots \\ \varphi_k^{HM}(t_{T_k}) \end{bmatrix}, \quad \Gamma_k^{HM} = \begin{bmatrix} \varsigma_k^H(t_1) \\ \varsigma_k^H(t_2) \\ \vdots \\ \varsigma_k^H(t_{T_k}) \end{bmatrix}.$$

Therefore the regulatory parameters in the vector $\theta_k^{HM}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_k^{HM}} \left\| \Phi_k^{HM} \theta_k^{HM} - M_k^H \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 \end{bmatrix} \theta_k^{HM} \le 1 \tag{16.16}$$

The parameters in the host miRNA dynamic Eq. (16.3) can be estimated by solving the constrained least squares problem (16.16), which simultaneously guarantees that the host miRNA degradation rate $-\mu_k^H$ is nonpositive, that is, $-\mu_k^H \le 0$.
Similarly, the dynamic model of host-lncRNAs in Eq. (16.4) could be rewritten in the following linear regression form:

$$l_n^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_n}^H(t) & l_n^H(t) & 1 \end{bmatrix} \begin{bmatrix} z_{n1}^H \\ \vdots \\ z_{nI_n}^H \\ 1 - \chi_n^H \\ \rho_n^H \end{bmatrix} + \vartheta_n^H(t) \triangleq \varphi_n^{HL}(t)\, \theta_n^{HL} + \vartheta_n^H(t), \quad \text{for } n = 1, 2, \ldots, N \tag{16.17}$$

where $\varphi_n^{HL}(t)$ denotes the regression vector that can be obtained from the microarray expression data, and $\theta_n^{HL}$ is the unknown regulatory parameter vector to be estimated for the nth host lncRNA in the host GRN. Eq. (16.17) of the nth host lncRNA can be augmented for $T_n$ data points in the following form:

$$L_n^H = \Phi_n^{HL} \theta_n^{HL} + \Gamma_n^{HL}, \quad \text{for } n = 1, 2, \ldots, N \tag{16.18}$$

where

$$L_n^H = \begin{bmatrix} l_n^H(t_2) \\ l_n^H(t_3) \\ \vdots \\ l_n^H(t_{T_n} + 1) \end{bmatrix}, \quad \Phi_n^{HL} = \begin{bmatrix} \varphi_n^{HL}(t_1) \\ \varphi_n^{HL}(t_2) \\ \vdots \\ \varphi_n^{HL}(t_{T_n}) \end{bmatrix}, \quad \Gamma_n^{HL} = \begin{bmatrix} \vartheta_n^H(t_1) \\ \vartheta_n^H(t_2) \\ \vdots \\ \vartheta_n^H(t_{T_n}) \end{bmatrix}.$$

Therefore the regulatory parameters in the vector $\theta_n^{HL}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_n^{HL}} \left\| \Phi_n^{HL} \theta_n^{HL} - L_n^H \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 \end{bmatrix} \theta_n^{HL} \le 1 \tag{16.19}$$

The regulatory parameters in the host lncRNA dynamic Eq. (16.4) can be estimated by solving the constrained least squares problem (16.19), which simultaneously guarantees that the host lncRNA degradation rate $-\chi_n^H$ is nonpositive, that is, $-\chi_n^H \le 0$. By the same process as for the host PPIN dynamic Eq. (16.1), we rewrote the pathogen PPIN Eq. (16.5) in the following linear regression form:

$$p_q^P(t+1) = \begin{bmatrix} p_q^P(t) p_1^P(t) & \cdots & p_q^P(t) p_{O_q}^P(t) & p_q^P(t) p_1^H(t) & \cdots & p_q^P(t) p_{I_q}^H(t) & g_q^P(t) & p_q^P(t) & 1 \end{bmatrix} \begin{bmatrix} a_{q1}^P \\ \vdots \\ a_{qO_q}^P \\ c_{q1}^P \\ \vdots \\ c_{qI_q}^P \\ \alpha_q^P \\ 1 - \gamma_q^P \\ \kappa_q^P \end{bmatrix} + \varpi_q^P(t) \triangleq \varphi_q^{PP}(t)\, \theta_q^{PP} + \varpi_q^P(t), \quad \text{for } q = 1, 2, \ldots, Q \tag{16.20}$$
where $\varphi_q^{PP}(t)$ represents the regression vector that can be obtained from the microarray expression data, and $\theta_q^{PP}$ is the unknown interactive parameter vector to be estimated for the qth pathogen protein in the pathogen PPIN. Eq. (16.20) of the qth pathogen protein can be augmented for $T_q$ data points in the following form:

$$P_q^P = \Phi_q^{PP} \theta_q^{PP} + \Gamma_q^{PP}, \quad \text{for } q = 1, 2, \ldots, Q \tag{16.21}$$

where

$$P_q^P = \begin{bmatrix} p_q^P(t_2) \\ p_q^P(t_3) \\ \vdots \\ p_q^P(t_{T_q} + 1) \end{bmatrix}, \quad \Phi_q^{PP} = \begin{bmatrix} \varphi_q^{PP}(t_1) \\ \varphi_q^{PP}(t_2) \\ \vdots \\ \varphi_q^{PP}(t_{T_q}) \end{bmatrix}, \quad \Gamma_q^{PP} = \begin{bmatrix} \varpi_q^P(t_1) \\ \varpi_q^P(t_2) \\ \vdots \\ \varpi_q^P(t_{T_q}) \end{bmatrix}.$$

Therefore the interactive parameters in the vector $\theta_q^{PP}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_q^{PP}} \left\| \Phi_q^{PP} \theta_q^{PP} - P_q^P \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \end{bmatrix} \theta_q^{PP} \le \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{16.22}$$

The interactive parameters in the pathogen PPIN dynamic Eq. (16.5) can be estimated by solving the constrained least squares problem (16.22), which simultaneously guarantees that the pathogen protein translation rate $\alpha_q^P$ is nonnegative and the pathogen protein degradation rate $-\gamma_q^P$ is nonpositive, that is, $\alpha_q^P \ge 0$ and $-\gamma_q^P \le 0$. Finally, we rewrote the pathogen GRN Eq. (16.6) in the linear regression form:

$$g_h^P(t+1) = \begin{bmatrix} p_1^P(t) & \cdots & p_{Q_h}^P(t) & g_h^P(t) m_1^H(t) & \cdots & g_h^P(t) m_{K_h}^H(t) & g_h^P(t) & 1 \end{bmatrix} \begin{bmatrix} b_{h1}^P \\ \vdots \\ b_{hQ_h}^P \\ -d_{h1}^P \\ \vdots \\ -d_{hK_h}^P \\ 1 - \lambda_h^P \\ \delta_h^P \end{bmatrix} + \varepsilon_h^P(t) \triangleq \varphi_h^{PG}(t)\, \theta_h^{PG} + \varepsilon_h^P(t), \quad \text{for } h = 1, 2, \ldots, H \tag{16.23}$$

where $\varphi_h^{PG}(t)$ represents the regression vector that can be obtained from the microarray expression data, and $\theta_h^{PG}$ is the unknown regulatory parameter vector to be estimated for the hth pathogen gene in the pathogen GRN. Eq. (16.23) of the hth pathogen gene can be augmented for $T_h$ data points in the following form:

$$G_h^P = \Phi_h^{PG} \theta_h^{PG} + \Gamma_h^{PG}, \quad \text{for } h = 1, 2, \ldots, H \tag{16.24}$$
where

$$G_h^P = \begin{bmatrix} g_h^P(t_2) \\ g_h^P(t_3) \\ \vdots \\ g_h^P(t_{T_h} + 1) \end{bmatrix}, \quad \Phi_h^{PG} = \begin{bmatrix} \varphi_h^{PG}(t_1) \\ \varphi_h^{PG}(t_2) \\ \vdots \\ \varphi_h^{PG}(t_{T_h}) \end{bmatrix}, \quad \Gamma_h^{PG} = \begin{bmatrix} \varepsilon_h^P(t_1) \\ \varepsilon_h^P(t_2) \\ \vdots \\ \varepsilon_h^P(t_{T_h}) \end{bmatrix}.$$

Thus the regulatory parameters in the vector $\theta_h^{PG}$ can be estimated by solving the following constrained least squares estimation problem:

$$\min_{\theta_h^{PG}} \left\| \Phi_h^{PG} \theta_h^{PG} - G_h^P \right\|_2^2 \quad \text{subject to} \quad \begin{bmatrix} 0 & \cdots & 0 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0 \end{bmatrix} \theta_h^{PG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{16.25}$$

The regulatory parameters in the pathogen GRN dynamic Eq. (16.6) can be obtained by solving the abovementioned constrained least squares estimation problem (16.25), which simultaneously guarantees that the host miRNA repression $-d_{hk}^P$ is nonpositive and the pathogen gene degradation effect $-\lambda_h^P$ is nonpositive, that is, $-d_{hk}^P \le 0$ and $-\lambda_h^P \le 0$.
As mentioned previously, to avoid the overfitting problem in the parameter identification process, we applied cubic spline interpolation to generate extra data points (five times the number of parameters in the parameter vector to be estimated, that is, $\theta_i^{HP}$ in the host PPIN, $\theta_j^{HG}$ in the host GRN, $\theta_k^{HM}$ in the host miRNA dynamic model, $\theta_n^{HL}$ in the host lncRNA dynamic model, $\theta_q^{PP}$ in the pathogen PPIN, and $\theta_h^{PG}$ in the pathogen GRN). Therefore, with the microarray expression data, we could solve the constrained least squares estimation problems in (16.10), (16.13), (16.16), (16.19), (16.22), and (16.25) and identify the parameters in the GEINs gene by gene (or protein by protein) via the lsqlin function of the MATLAB Optimization Toolbox. Since genome-wide measurement of protein expression in Caco-2 cells and C. difficile has not yet been realized, and about 73% of the variance of protein abundance can be explained by the corresponding mRNA abundance [788], the microarray data of gene expression can be used in place of protein expression, providing sufficient information for solving the constrained least squares parameter estimation problems in (16.10), (16.13), (16.16), (16.19), (16.22), and (16.25).
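The constrained fit in (16.10) can be sketched in Python. The source solves it with MATLAB's lsqlin; the sketch below swaps in SciPy's `lsq_linear` instead, which is possible here because each constraint row in (16.10) touches a single parameter and therefore reduces to a simple bound. The function name `fit_host_protein`, the parameter layout, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import lsq_linear

def fit_host_protein(Phi, P, n_interactions):
    """Bounded least squares for theta = [a..., c..., alpha, 1 - gamma, kappa].

    n_interactions = F_i + Q_i, so alpha sits at index n_interactions and
    (1 - gamma) at index n_interactions + 1, as in the ordering of (16.7).
    """
    n_params = Phi.shape[1]
    lower = np.full(n_params, -np.inf)
    upper = np.full(n_params, np.inf)
    lower[n_interactions] = 0.0        # alpha >= 0
    upper[n_interactions + 1] = 1.0    # 1 - gamma <= 1, i.e. -gamma <= 0
    return lsq_linear(Phi, P, bounds=(lower, upper)).x

# Synthetic, noise-free data with two interaction terms (illustrative only).
rng = np.random.default_rng(0)
Phi = rng.normal(size=(40, 5))
theta_true = np.array([0.3, -0.2, 0.5, 0.9, 0.1])
theta_hat = fit_host_protein(Phi, Phi @ theta_true, n_interactions=2)

# Data generated with a negative "translation rate": the bound keeps alpha >= 0.
theta_infeasible = np.array([0.3, -0.2, -0.5, 0.9, 0.1])
theta_clipped = fit_host_protein(Phi, Phi @ theta_infeasible, n_interactions=2)
```

When the unconstrained solution already satisfies the bounds, the fit recovers it; otherwise the offending parameter is held at or inside its bound. The GRN problems (16.13) and (16.25) would need upper bounds of zero on the miRNA repression entries in the same way.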
16.2.6 Pruning false positives in candidate GEIN for real GEIN via system order detection scheme
Because the candidate GEIN contains false-positive information obtained from computational, experimental, and homology-dependent predictions, we need to apply a system order detection scheme to the network models in (16.1)–(16.6) to prune these false positives. Therefore we applied the Akaike information criterion (AIC) to delete the insignificant parameters that were out of the system order in the models of the candidate GEIN [40], using the real microarray data of Caco-2 cells and C. difficile in the early and late stages of infection. We thereby obtained the real GEINs in the early and late stages of CDI.
For the host PPIN model in (16.9), the AIC of the ith host protein in the PPIN can be defined as a function of the system interaction order as follows [40]:

$$\mathrm{AIC}_i^{HP}(F_i, Q_i) = \log\left( \frac{1}{T_i} \left( P_i^H - \Phi_i^{HP} \hat{\theta}_i^{HP} \right)^T \left( P_i^H - \Phi_i^{HP} \hat{\theta}_i^{HP} \right) \right) + \frac{2(F_i + Q_i)}{T_i} \tag{16.26}$$

where $\hat{\theta}_i^{HP}$ represents the estimated parameter vector of the ith host protein obtained by solving the constrained optimization problem in (16.10), and $(\hat{\sigma}_i^{HP})^2 = \frac{1}{T_i} \left( P_i^H - \Phi_i^{HP} \hat{\theta}_i^{HP} \right)^T \left( P_i^H - \Phi_i^{HP} \hat{\theta}_i^{HP} \right)$ is the estimated residual error. When we lower the system interaction order ($F_i + Q_i$), the corresponding residual error increases. Likewise, when we attempt to minimize the residual error, the system order increases. Therefore we need to trade off the residual error against the system order to achieve the minimum value of $\mathrm{AIC}_i^{HP}$ at the real system order. According to the theory of system identification, the real system order $F_i + Q_i$ of the real host PPIN of the GEIN minimizes $\mathrm{AIC}_i^{HP}(F_i, Q_i)$ [11,40,687]. Based on the theory of AIC, host proteins whose interaction abilities are out of $F_i$, as well as pathogen proteins whose interaction abilities are out of $Q_i$, should be considered false positives and pruned away from the candidate PPIs to obtain the real PPIs, protein by protein, in the GEIN. Similarly, for the host GRN model in (16.12), the AIC of the gene regulations of the jth host gene can be defined as follows:

$$\mathrm{AIC}_j^{HG}(I_j, N_j, I'_j I''_j, K_j) = \log\left( \frac{1}{T_j} \left( G_j^H - \Phi_j^{HG} \hat{\theta}_j^{HG} \right)^T \left( G_j^H - \Phi_j^{HG} \hat{\theta}_j^{HG} \right) \right) + \frac{2(I_j + N_j + I'_j I''_j + K_j)}{T_j} \tag{16.27}$$

where $\hat{\theta}_j^{HG}$ denotes the estimated regulatory parameter vector of the jth host gene obtained by solving the constrained least squares problem in (16.13), and $(\hat{\sigma}_j^{HG})^2 = \frac{1}{T_j} \left( G_j^H - \Phi_j^{HG} \hat{\theta}_j^{HG} \right)^T \left( G_j^H - \Phi_j^{HG} \hat{\theta}_j^{HG} \right)$ is the estimated residual error. We then achieved the minimum value of $\mathrm{AIC}_j^{HG}$ to get the corresponding real system order $I_j + N_j + (I'_j I''_j) + K_j$ and to prune the false-positive regulations of the candidate GRN, gene by gene, for the real host GRN of the GEIN. In addition, for the host miRNA model in (16.15) and the lncRNA model in (16.18), the AICs of the kth host miRNA and the nth host lncRNA can be defined as follows, respectively:

$$\mathrm{AIC}_k^{HM}(I_k) = \log\left( \frac{1}{T_k} \left( M_k^H - \Phi_k^{HM} \hat{\theta}_k^{HM} \right)^T \left( M_k^H - \Phi_k^{HM} \hat{\theta}_k^{HM} \right) \right) + \frac{2(I_k)}{T_k} \tag{16.28}$$

$$\mathrm{AIC}_n^{HL}(I_n) = \log\left( \frac{1}{T_n} \left( L_n^H - \Phi_n^{HL} \hat{\theta}_n^{HL} \right)^T \left( L_n^H - \Phi_n^{HL} \hat{\theta}_n^{HL} \right) \right) + \frac{2(I_n)}{T_n} \tag{16.29}$$

where $\hat{\theta}_k^{HM}$ and $\hat{\theta}_n^{HL}$ denote the estimated regulatory parameter vectors of the kth host miRNA, obtained by solving the constrained optimization problem in (16.16), and of the nth host lncRNA, obtained by solving the constrained optimization problem in (16.19), respectively; $(\hat{\sigma}_k^{HM})^2 = (1/T_k) \left( M_k^H - \Phi_k^{HM} \hat{\theta}_k^{HM} \right)^T \left( M_k^H - \Phi_k^{HM} \hat{\theta}_k^{HM} \right)$ and $(\hat{\sigma}_n^{HL})^2 = (1/T_n) \left( L_n^H - \Phi_n^{HL} \hat{\theta}_n^{HL} \right)^T \left( L_n^H - \Phi_n^{HL} \hat{\theta}_n^{HL} \right)$
represent the estimated residual errors of the $k$th host miRNA and the $n$th host lncRNA, respectively. Therefore, by the trade-off between system order and residual error in the corresponding AIC, we can achieve the minimum $\mathrm{AIC}_k^{HM}$ and $\mathrm{AIC}_n^{HL}$ to obtain their corresponding real system orders $I_k$ and $I_n$, and prune the false positives beyond $I_k$ and $I_n$ to obtain the real host miRNA regulations and the real host lncRNA regulations of GEIN, respectively.

Finally, we applied the same AIC-based pruning process used for the host GEIN to prune the false positives from the pathogen candidate GEIN. For the pathogen protein interaction model in (16.21) and the pathogen gene regulatory model in (16.24), the AIC of PPIs of the $q$th pathogen protein and of gene regulations of the $h$th pathogen gene can be defined, respectively, as follows:

$$\mathrm{AIC}_q^{PP}\left(O_q, I_q\right) = \log\left(\frac{1}{T_q}\left(P_q^P - \Phi_q^{PP}\hat{\theta}_q^{PP}\right)^T\left(P_q^P - \Phi_q^{PP}\hat{\theta}_q^{PP}\right)\right) + \frac{2\left(O_q + I_q\right)}{T_q} \tag{16.30}$$

$$\mathrm{AIC}_h^{PG}\left(Q_h, K_h\right) = \log\left(\frac{1}{T_h}\left(G_h^P - \Phi_h^{PG}\hat{\theta}_h^{PG}\right)^T\left(G_h^P - \Phi_h^{PG}\hat{\theta}_h^{PG}\right)\right) + \frac{2\left(Q_h + K_h\right)}{T_h} \tag{16.31}$$

where $\hat{\theta}_q^{PP}$ and $\hat{\theta}_h^{PG}$ denote the estimated parameter vectors of the $q$th pathogen protein, obtained by solving the constrained optimization problem in (16.22), and of the $h$th pathogen gene, obtained by solving the constrained optimization problem in (16.25), respectively; $\hat{\sigma}_q^{PP,2} = \frac{1}{T_q}\left(P_q^P - \Phi_q^{PP}\hat{\theta}_q^{PP}\right)^T\left(P_q^P - \Phi_q^{PP}\hat{\theta}_q^{PP}\right)$ and $\hat{\sigma}_h^{PG,2} = \frac{1}{T_h}\left(G_h^P - \Phi_h^{PG}\hat{\theta}_h^{PG}\right)^T\left(G_h^P - \Phi_h^{PG}\hat{\theta}_h^{PG}\right)$ are the estimated residual errors of the $q$th pathogen protein and the $h$th pathogen gene, respectively. Thus, by the trade-off between system order and residual error in AIC, we could achieve the minimum $\mathrm{AIC}_q^{PP}$ and $\mathrm{AIC}_h^{PG}$ to obtain their corresponding real system orders $O_q + I_q$ and $Q_h + K_h$, and prune the false positives to obtain the real PPIN and GRN of the real pathogen GEIN, respectively.

After identifying the system order and pruning the false positives of the candidate GEINs, we finally obtained the real GEINs of the early and late stages of CDI for each replicate. Since the real GEINs are still very complex, it is difficult to investigate the precise host-pathogen interaction mechanisms from these networks. We therefore applied the PNP method to extract the core network structures of these GEINs to help investigate the cross-talk mechanisms at different stages of CDI.
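The trade-off described above, in which the residual error falls while the order penalty rises, can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: ordinary least squares stands in for the constrained problems referenced in the text, the regression matrix `Phi` and response `y` are synthetic, and the greedy backward-elimination loop in `prune_by_aic` is a hypothetical pruning strategy, not necessarily the order-detection procedure used in this chapter.

```python
import numpy as np

def aic_score(y, Phi):
    """log(residual variance) + 2 * order / T, mirroring the form of (16.27)-(16.31)."""
    T, order = Phi.shape
    # Assumption: plain least squares replaces the constrained estimation in the text.
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ theta
    sigma2 = float(resid @ resid) / T
    return np.log(sigma2) + 2.0 * order / T

def prune_by_aic(y, Phi):
    """Hypothetical backward elimination: drop one regressor at a time while AIC decreases."""
    active = list(range(Phi.shape[1]))
    best = aic_score(y, Phi[:, active])
    while len(active) > 1:
        trials = []
        for r in active:
            cols = [c for c in active if c != r]
            trials.append((aic_score(y, Phi[:, cols]), r))
        cand, drop = min(trials)
        if cand >= best:          # no single removal lowers AIC any further
            break
        best = cand
        active.remove(drop)
    return active, best

# Synthetic example: 8 candidate regulators, only columns 0, 3 and 5 truly act.
rng = np.random.default_rng(0)
T = 60
Phi = rng.normal(size=(T, 8))
theta_true = np.zeros(8)
theta_true[[0, 3, 5]] = [1.5, -2.0, 1.0]
y = Phi @ theta_true + 0.05 * rng.normal(size=T)

full_aic = aic_score(y, Phi)
kept, pruned_aic = prune_by_aic(y, Phi)
print(sorted(kept), round(full_aic, 3), round(pruned_aic, 3))
```

In this toy setting the three true regressors are always retained, because removing any of them inflates the residual term far more than the penalty term shrinks; spurious regressors are removed only when doing so lowers the score.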
16.2.7 Extracting core network structures from real GEINs via principal network projection method

To prepare for applying the PNP method to extract the HPCNs from the real GEINs, we need to construct a combined network matrix $H$ that contains all estimated parameters in the real GEIN as follows:

$$H = \begin{bmatrix} H_{hp,hp} & H_{hp,pp} & 0 & 0 & 0 \\ H_{pp,hp} & H_{pp,pp} & 0 & 0 & 0 \\ H_{hg,hp} & 0 & H_{hg,hm} & H_{hg,hl} & H_{hg,hc} \\ 0 & H_{pg,pp} & H_{pg,hm} & 0 & 0 \\ H_{hm,hp} & 0 & 0 & 0 & 0 \\ H_{hl,hp} & 0 & 0 & 0 & 0 \end{bmatrix} \in \mathbb{R}^{(2I+2Q+K+N)\times(I+Q+K+N+I'I'')}$$

where

$$H_{hp,hp} = \begin{bmatrix} \hat{a}_{11}^H & \cdots & \hat{a}_{1I}^H \\ \vdots & \hat{a}_{if}^H & \vdots \\ \hat{a}_{I1}^H & \cdots & \hat{a}_{II}^H \end{bmatrix},\quad H_{hp,pp} = \begin{bmatrix} \hat{c}_{11}^H & \cdots & \hat{c}_{1Q}^H \\ \vdots & \hat{c}_{iq}^H & \vdots \\ \hat{c}_{I1}^H & \cdots & \hat{c}_{IQ}^H \end{bmatrix},\quad H_{pp,hp} = \begin{bmatrix} \hat{c}_{11}^P & \cdots & \hat{c}_{1I}^P \\ \vdots & \hat{c}_{qi}^P & \vdots \\ \hat{c}_{Q1}^P & \cdots & \hat{c}_{QI}^P \end{bmatrix},$$

$$H_{pp,pp} = \begin{bmatrix} \hat{a}_{11}^P & \cdots & \hat{a}_{1Q}^P \\ \vdots & \hat{a}_{qo}^P & \vdots \\ \hat{a}_{Q1}^P & \cdots & \hat{a}_{QQ}^P \end{bmatrix},\quad H_{hg,hp} = \begin{bmatrix} \hat{b}_{11}^H & \cdots & \hat{b}_{1I}^H \\ \vdots & \hat{b}_{ji}^H & \vdots \\ \hat{b}_{I1}^H & \cdots & \hat{b}_{II}^H \end{bmatrix},\quad H_{hg,hm} = \begin{bmatrix} -\hat{d}_{11}^H & \cdots & -\hat{d}_{1K}^H \\ \vdots & -\hat{d}_{jk}^H & \vdots \\ -\hat{d}_{I1}^H & \cdots & -\hat{d}_{IK}^H \end{bmatrix},$$

$$H_{hg,hl} = \begin{bmatrix} \hat{e}_{11}^H & \cdots & \hat{e}_{1N}^H \\ \vdots & \hat{e}_{jn}^H & \vdots \\ \hat{e}_{I1}^H & \cdots & \hat{e}_{IN}^H \end{bmatrix},\quad H_{hg,hc} = \begin{bmatrix} \hat{x}_{11}^H & \cdots & \hat{x}_{1\,I'I''}^H \\ \vdots & \hat{x}_{j(I''_j(i'-1)+i'')}^H & \vdots \\ \hat{x}_{I1}^H & \cdots & \hat{x}_{I\,I'I''}^H \end{bmatrix},\quad H_{pg,pp} = \begin{bmatrix} \hat{b}_{11}^P & \cdots & \hat{b}_{1Q}^P \\ \vdots & \hat{b}_{hq}^P & \vdots \\ \hat{b}_{Q1}^P & \cdots & \hat{b}_{QQ}^P \end{bmatrix},$$

$$H_{pg,hm} = \begin{bmatrix} -\hat{d}_{11}^P & \cdots & -\hat{d}_{1K}^P \\ \vdots & -\hat{d}_{hk}^P & \vdots \\ -\hat{d}_{Q1}^P & \cdots & -\hat{d}_{QK}^P \end{bmatrix},\quad H_{hm,hp} = \begin{bmatrix} \hat{y}_{11}^H & \cdots & \hat{y}_{1I}^H \\ \vdots & \hat{y}_{ki}^H & \vdots \\ \hat{y}_{K1}^H & \cdots & \hat{y}_{KI}^H \end{bmatrix},\quad\text{and}\quad H_{hl,hp} = \begin{bmatrix} \hat{z}_{11}^H & \cdots & \hat{z}_{1I}^H \\ \vdots & \hat{z}_{ni}^H & \vdots \\ \hat{z}_{N1}^H & \cdots & \hat{z}_{NI}^H \end{bmatrix}.$$

$\hat{a}_{if}^H$ and $\hat{c}_{iq}^H$ could be obtained from $\hat{\theta}_i^{HP}$ by solving the parameter estimation problem in (16.10) and pruning false positives by the AIC method in (16.26); $\hat{b}_{ji}^H$, $-\hat{d}_{jk}^H$, $\hat{e}_{jn}^H$, and $\hat{x}_{j(I''_j(i'-1)+i'')}^H$ could be obtained from $\hat{\theta}_j^{HG}$ by solving the parameter estimation problem in (16.13) and pruning false positives by the AIC method in (16.27); $\hat{y}_{ki}^H$ could be obtained from $\hat{\theta}_k^{HM}$ by solving the parameter estimation problem in (16.16) and pruning false positives by the AIC method in (16.28); $\hat{z}_{ni}^H$ could be obtained from $\hat{\theta}_n^{HL}$ by solving the parameter estimation problem in (16.19) and pruning false positives by the AIC method in (16.29); $\hat{c}_{qi}^P$ and $\hat{a}_{qo}^P$ could be obtained from $\hat{\theta}_q^{PP}$ by solving the parameter estimation problem in (16.22) and pruning false positives by the AIC method in (16.30); and $\hat{b}_{hq}^P$ and $-\hat{d}_{hk}^P$ could be obtained from $\hat{\theta}_h^{PG}$ by solving the parameter estimation problem in (16.25) and pruning false positives by the AIC method in (16.31).

$\hat{a}_{if}^H$ and $\hat{a}_{qo}^P$ represent the corresponding interactive abilities of intraspecies PPIs in the host and pathogen PPINs; $\hat{c}_{iq}^H$ and $\hat{c}_{qi}^P$ denote the corresponding interactive abilities between host protein and pathogen protein in the interspecies PPIN; $\hat{b}_{ji}^H$ and $\hat{b}_{hq}^P$ represent the corresponding regulatory abilities of intraspecies TF regulations in the host and pathogen GRNs; $-\hat{d}_{jk}^H$ and $-\hat{d}_{hk}^P$ denote the corresponding repression abilities of host miRNA on host gene and pathogen gene, respectively; $\hat{e}_{jn}^H$ and $\hat{x}_{j(I''_j(i'-1)+i'')}^H$ indicate the corresponding regulatory abilities of host lncRNA and host complex with regard to host gene in the intraspecies host GRN; $\hat{y}_{ki}^H$ and $\hat{z}_{ni}^H$ represent the corresponding regulatory abilities of host TF on host miRNA and lncRNA in the host intraspecies GRN, respectively. All these weighted connections (links) constitute the combined network matrix $H$. Note that if a connection has been removed by AIC or was not built in the candidate GEIN by big data mining, the corresponding location in matrix $H$ is set to zero. We
then extracted the core components of GEIN using the PNP method, a principal network structure projection method that uses the principal singular values to reduce the network dimension by deleting insignificant structures. The combined network matrix $H$ can be represented in singular value decomposition form as follows:

$$H = U \times D \times V^T \tag{16.32}$$

where $U \in \mathbb{R}^{(2I+2Q+K+N)\times(I+Q+K+N+I'I'')}$; $V \in \mathbb{R}^{(I+Q+K+N+I'I'')\times(I+Q+K+N+I'I'')}$; and $D = \mathrm{diag}\left(d_1, \ldots, d_s, \ldots, d_{I+Q+K+N+I'I''}\right)$ is the diagonal matrix of $d_1, \ldots, d_{I+Q+K+N+I'I''}$, which contains the $I+Q+K+N+I'I''$ singular values of the combined network matrix $H$ in descending order, that is, $d_1 \ge \cdots \ge d_s \ge \cdots \ge d_{I+Q+K+N+I'I''}$. In addition, we define the eigenexpression fraction ($E_s$) in the following normalized form:

$$E_s = \frac{d_s^2}{\sum_{s=1}^{I+Q+K+N+I'I''} d_s^2} \tag{16.33}$$

To guarantee the integrity of the network structure, we select the minimum $S$ such that $\sum_{s=1}^{S} E_s \ge 0.85$, that is, the top $S$ principal components contain 85% of the network structure of GEIN from the energy perspective. The projections of $H$ onto the top $S$ singular vectors of $U$ and $V$ are then defined, respectively, as follows:

$$V_L(w_L, s) = h_{:,w_L}^T \times u_{:,s} \quad\text{and}\quad V_R(w_R, s) = h_{w_R,:} \times v_{:,s} \tag{16.34}$$

$$\text{for } w_L = 1, \ldots, I+Q+K+N+I'I'';\ w_R = 1, \ldots, 2I+2Q+K+N;\ \text{and } s = 1, \ldots, S$$

where $h_{:,w_L}$, $h_{w_R,:}$, $u_{:,s}$, and $v_{:,s}$ represent the $w_L$th column of $H$, the $w_R$th row of $H$, the $s$th column of $U$, and the $s$th column of $V$, respectively. We further defined the 2-norm projection value of each node (protein/gene/miRNA/lncRNA/complex) in GEIN onto the top $S$ left-singular vectors and right-singular vectors as follows:

$$D_L(w_L) = \left[\sum_{s=1}^{S}\left[V_L(w_L, s)\right]^2\right]^{1/2},\qquad D_R(w_R) = \left[\sum_{s=1}^{S}\left[V_R(w_R, s)\right]^2\right]^{1/2} \tag{16.35}$$

$$\text{for } w_L = 1, \ldots, I+Q+K+N+I'I'' \text{ and } w_R = 1, \ldots, 2I+2Q+K+N$$

The physical meaning of (16.35) is that if the projection value $D_L(w_L)$ is close to zero, the corresponding $w_L$th node is almost independent of the core network reconstructed from the top $S$ singular vectors; the larger the projection value, the larger the contribution of the node to the core network. The same relationship holds between $D_R(w_R)$ and the $w_R$th node. Finally, we can extract the HPCNs from the GEINs of the early and late stages of CDI, respectively, by assessing the projection value of each node in (16.35). Since the purpose of this chapter is to identify the cross-talk mechanisms that contribute to the progression of CDI, we
targeted the core host/pathogen proteins with the top projection values, together with their connected TFs/miRNAs/lncRNAs/complexes, to form the HPCN for further systematic investigation.
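The PNP steps in (16.32) to (16.35) can be sketched numerically as follows. This is a minimal illustration, not the chapter's implementation: the matrix `H` is a small random stand-in for the combined network matrix, the function name `pnp_projection` is hypothetical, and one column is zeroed by hand to mimic a node with no surviving links.

```python
import numpy as np

def pnp_projection(H, energy=0.85):
    """Return S, the projection values D_L (per column) and D_R (per row), and E_s."""
    # Thin SVD, H = U @ diag(d) @ Vt, with singular values in descending order (16.32).
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    E = d**2 / np.sum(d**2)                              # eigenexpression fractions (16.33)
    S = int(np.searchsorted(np.cumsum(E), energy)) + 1   # smallest S with cumulative E >= energy
    S = min(S, d.size)
    VL = H.T @ U[:, :S]      # VL[wL, s] = h[:, wL]^T u[:, s]   (16.34)
    VR = H @ Vt[:S].T        # VR[wR, s] = h[wR, :] v[:, s]     (16.34)
    DL = np.linalg.norm(VL, axis=1)                      # 2-norm projection values (16.35)
    DR = np.linalg.norm(VR, axis=1)
    return S, DL, DR, E

# Toy stand-in for H: 6 target rows, 5 source columns; column 2 has no links at all.
rng = np.random.default_rng(1)
H = rng.normal(size=(6, 5))
H[:, 2] = 0.0

S, DL, DR, E = pnp_projection(H)
ranking = np.argsort(DL)[::-1]   # columns ordered by contribution to the core network
print(S, np.round(DL, 3), ranking)
```

The zeroed column receives a projection value of exactly zero, which matches the reading of (16.35) given above: a node with no weighted connections contributes nothing to the core structure, while nodes with larger projection values rank first when core nodes are selected.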
16.3 Investigating the cross-talk mechanism by constructing the genetic-and-epigenetic interspecies network

16.3.1 The identified GEINs at the early and late stages of Clostridium difficile infection

By applying a system identification method and a system order detection scheme to two-sided microarray data (Fig. 16.1), we identified the early-stage and late-stage GEINs of three biological replicates and visualized them with the network visualization software Cytoscape [88] (Fig. 16.3). The numbers of identified nodes and edges are shown in Tables 16.1 and 16.2, respectively. In all three replicates, the number of host TF nodes at the early stage is higher than that at the late stage. In addition, the identified edges in Table 16.2 show marked differences in host-TF-to-host-gene regulation between the two stages. These results suggest that the activities of host TFs are more abundant during the early stage. The host complex also exhibits similar behavior, and the activity of the complex at the early stage is much more abundant than at the late stage. Since the complexes in our model are composed of two TFs, it is plausible that the performance of complexes is also influenced by TFs. In contrast to TFs, the number of host lncRNAs is greater at the late stage than at the early stage. However, the decreased number of lncRNA-to-gene regulations and the increased number of TF-to-lncRNA regulations imply that host lncRNAs play the role of target gene rather than that of regulator at the late stage (Table 16.2).

E_R1 and L_R1 represent the early stage and late stage of replicate 1, respectively. Similarly, E_R2 and L_R2 denote the early stage and late stage of replicate 2, and E_R3 and L_R3 are the early stage and late stage of replicate 3. HP denotes host cytosolic protein (excluding host receptor and host TF); HR, HT, HM, HL, and HC represent host receptor, host TF, host miRNA, host lncRNA, and host complex, respectively; PP means pathogen protein excluding pathogen TF; and PT denotes pathogen TF. The arrow lines in the first column represent transcriptional/posttranscriptional regulations; the solid lines in the first column denote PPIs; HG signifies host genes; and PG signifies pathogen genes.

To further characterize genes in Caco-2 cells according to their functional groups, we performed an enrichment analysis with the functional annotation tool DAVID [789] on the conserved target genes among all three replicates, based on the biological process categories of the GO database and the protein information resources of the Swiss-Prot database (Table 16.3). The early stage of CDI was characterized by the disturbance of cell shape and the epithelial cell barrier, as well as immune activation and metal binding, which plays an important role in the scramble for metallic nutrients between the host and pathogen. At the late stage, the enrichment of inflammation-related functions and molecule secretion/transport suggests that a strong inflammatory response is triggered to eliminate the pathogen.

The early stage of CDI is characterized by changes in cell shape and tight junctions, which could result from the activities of GTPases and pathogen toxins. The metal-binding ability is crucial for both host and pathogen cells owing to its important role in the scramble for metallic nutrients and the transport of toxic molecules, including ROS. An
FIGURE 16.3 The real GEINs of three replicates during the early and late stages of CDI. These figures show the identified real genome-wide GEINs of each replicate during the early and late stages of infection. The gray lines represent protein-protein interactions; the red lines denote transcriptional regulation; and the green lines signify miRNA repression [12]. CDI, Clostridium difficile infection; GEIN, genetic-and-epigenetic interspecies network; miRNA, microRNA.
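TABLE 16.1 The number of identified nodes in the early stage and late stage of each replicate [12].

| Nodes | Candidates | E_R1 | L_R1 | E_R2 | L_R2 | E_R3 | L_R3 |
|---|---|---|---|---|---|---|---|
| HP | 14,686 | 14,263 | 14,565 | 13,823 | 14,799 | 14,160 | 14,608 |
| HR | 2,137 | 2,053 | 2,070 | 2,039 | 2,083 | 2,055 | 2,068 |
| HT | 1,780 | 860 | 815 | 1,047 | 685 | 944 | 805 |
| HM | 14 | 14 | 13 | 14 | 14 | 14 | 14 |
| HL | 223 | 44 | 64 | 21 | 74 | 44 | 54 |
| HC | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| PP | 3,585 | 3,503 | 3,400 | 3,454 | 3,509 | 3,531 | 3,527 |
| PT | 71 | 56 | 44 | 39 | 57 | 57 | 59 |
| Total nodes | 22,499 | 20,796 | 20,974 | 20,440 | 21,224 | 20,808 | 21,138 |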
TABLE 16.2 The number of identified edges in the early stage and late stage of each replicate [12].

| Edges | Candidates | E_R1 | L_R1 | E_R2 | L_R2 | E_R3 | L_R3 |
|---|---|---|---|---|---|---|---|
| HT → HG | 138,486 | 20,800 | 18,594 | 26,695 | 16,483 | 21,770 | 18,827 |
| HT → HM | 31 | 17 | 6 | 4 | 6 | 7 | 8 |
| HT → HL | 218 | 47 | 76 | 34 | 90 | 55 | 64 |
| HL → HG | 184 | 12 | 3 | 27 | 8 | 7 | 15 |
| HM → HG | 5,961 | 357 | 165 | 1,163 | 176 | 379 | 198 |
| HC → HG | 5,442 | 1,171 | 1,304 | 1,826 | 347 | 1,175 | 738 |
| HM → PP | 96 | 4 | 5 | 0 | 2 | 19 | 7 |
| PT → PG | 1,265 | 543 | 449 | 262 | 526 | 671 | 619 |
| HG – HG | 3,425,976 | 109,102 | 41,123 | 183,116 | 29,978 | 104,647 | 34,759 |
| HG – PG | 17,068 | 610 | 2,264 | 1,333 | 226 | 544 | 145 |
| PG – PG | 290,018 | 9,826 | 36,279 | 41,960 | 7,543 | 14,202 | 3,989 |
| Total edges | 3,884,745 | 142,489 | 100,268 | 256,420 | 55,385 | 143,476 | 59,369 |
immune response is also activated to eliminate the pathogen. At the late stage of infection, the abundant cellular processes, including macrophage activation, phagocytosis, and inflammatory response, indicate that a strong inflammatory response is triggered. The cofactor transport and secretion-related processes also contribute to cytokine production and secretion. Since GEINs are very complex, it is difficult to investigate the precise host-pathogen interaction process from these networks directly. We therefore applied the PNP method to extract the core nodes and construct the corresponding HPCNs from the early-stage and late-stage GEINs of Caco-2 cells during CDI.
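Returning to the counts in Tables 16.1 and 16.2, the early-versus-late comparisons drawn from them can be re-tallied per replicate with a few lines of code. The numbers below are copied from the two tables; the dictionary layout and the helper `early_higher` are illustrative choices only.

```python
# (early, late) counts for replicates 1, 2 and 3, copied from Tables 16.1 and 16.2.
counts = {
    "HT nodes":    [(860, 815), (1047, 685), (944, 805)],
    "HL nodes":    [(44, 64), (21, 74), (44, 54)],
    "HT-HG edges": [(20800, 18594), (26695, 16483), (21770, 18827)],
    "HT-HL edges": [(47, 76), (34, 90), (55, 64)],
    "HL-HG edges": [(12, 3), (27, 8), (7, 15)],
    "HC-HG edges": [(1171, 1304), (1826, 347), (1175, 738)],
}

def early_higher(pairs):
    """For each replicate, report whether the early-stage count exceeds the late-stage count."""
    return [early > late for early, late in pairs]

summary = {name: early_higher(pairs) for name, pairs in counts.items()}
for name, flags in summary.items():
    print(f"{name:12s} early > late per replicate: {flags}")
```

On these numbers, host TF nodes and HT-HG edges are higher at the early stage in all three replicates, and HL nodes and HT-HL edges are higher at the late stage in all three. The HL-HG and HC-HG edge counts each have one replicate that runs against the overall direction (replicate 3 and replicate 1, respectively), so those two trends are best read as holding in two of the three replicates.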
TABLE 16.3 The functional enrichment analysis of the conserved target genes among three replicates based on GO terms and protein information resources of Swiss-Prot [12].

| Stage | Category | Term | P-value |
|---|---|---|---|
| Early stage | SP_PIR_KEYWORDS | Metal binding | 5.73E-05 |
| Early stage | SP_PIR_KEYWORDS | Disease mutation | 0.002178622 |
| Early stage | GOTERM_BP_FAT | GO:0048858~cell projection morphogenesis | 0.004720934 |
| Early stage | GOTERM_BP_FAT | GO:0002252~immune effector process | 0.021024346 |
| Early stage | SP_PIR_KEYWORDS | Tight junction | 0.04930793 |
| Late stage | SP_PIR_KEYWORDS | Secreted | 8.84E-04 |
| Late stage | GOTERM_BP_FAT | GO:0042116~macrophage activation | 0.002396512 |
| Late stage | GOTERM_BP_FAT | GO:0006954~inflammatory response | 0.007684734 |
| Late stage | GOTERM_BP_FAT | GO:0051181~cofactor transport | 0.009323758 |
| Late stage | GOTERM_BP_FAT | GO:0006910~phagocytosis, recognition | 0.040000612 |
The real GEINs in Fig. 16.3 of all three replicates during the early stage of CDI are combined in this figure. The gray lines represent PPIs; the red lines denote transcriptional regulation; and the green lines signify miRNA repression [12].

The real GEINs in Fig. 16.3 of all three replicates during the late stage of CDI are combined in this figure. The gray lines represent PPIs; the red lines denote transcriptional regulation; and the green lines signify miRNA repression.

This HPCN was extracted from the real GEIN in Fig. 16.4 by the PNP method. The gray lines represent PPIs; the red lines denote transcriptional regulation; and the green lines signify miRNA repression.

This HPCN was extracted from the real GEIN in Fig. 16.5 by the PNP method. The gray lines represent PPIs; the red lines denote transcriptional regulation; and the green lines signify miRNA repression.

16.3.1.1 The host-pathogen core networks (HPCNs) during the infection of C. difficile

16.3.1.1.1 Construction of HPCNs to investigate the epigenetic activities in host core networks of CDI
Applying the PNP method to GEINs in Fig. 16.3, we could assess the projection value of each node for the construction of HPCN. Host/pathogen proteins with top 2000 projection values based on intraspecies ranking in all three replicates and their connected genes/miRNAs/lncRNAs/complex were selected as core nodes of GEINs of each stage. Since the identified GEINs in Fig. 16.3 belong to three biological replicates from the same cell line, the identified differential interactions and regulations can be viewed as the
FIGURE 16.4 The real genome-wide GEIN of Caco-2 cells among all three replicates during the early stage of CDI. CDI, Clostridium difficile infection; GEIN, genetic-and-epigenetic interspecies network.
FIGURE 16.5 The real genome-wide GEIN of Caco-2 cells among all three replicates during the late stage of CDI [12]. CDI, Clostridium difficile infection; GEIN, genetic-and-epigenetic interspecies network.
adaptability of cells facing stress and stimuli in different replicates. For more complete information, the combinations of these interactions/regulations across the three replicates are considered as the real GEINs of the early and late stages, as shown in Figs. 16.4 and 16.5, respectively. Next, we extracted core nodes from the real GEINs in Figs. 16.4 and 16.5 using the PNP method to form the HPCNs of the early and late stages, as shown in Figs. 16.6 and 16.7, respectively. Comparing Figs. 16.6 and 16.7, the number of core proteins in the HPCN of the early stage of CDI (Fig. 16.6) is higher than that in the HPCN of late-stage CDI (Fig. 16.7). This results from the redundancy of cellular processes utilized by cells. In the early stage of CDI, cells perform cellular functions via similar molecules, giving these conserved proteins high projection values. In the late stage of infection, however, the alternative routes of the offensive and defensive mechanisms utilized by cells result in different projection values among replicates. Only the conserved proteins within the top 2000 places were selected as core nodes in the HPCNs of late-stage CDI. In order to adapt to CDI, some posttranslational epigenetic modifications in host cells can also be found in the HPCNs. These epigenetic modifications can be detected through the basal level $\kappa_i^H$ in the host protein expression dynamic Eq. (16.1) in the materials and methods section. During the early stage of CDI (Fig. 16.6), host mitogen-activated protein kinase (MAPK) pathway members (UBA52 and HSPA5) can be regulated by the deacetylase protein (HDAC11) and the ubiquitin protein (UBE2D3). In addition, the host proteins [DUSP6 and EGFR (epidermal growth factor receptor)] involved in the interleukin-3, -5 signaling pathway can also be regulated by the methyltransferase protein (PRDM14), the deubiquitinase
FIGURE 16.6 HPCN of Caco-2 cells during the early stage of CDI [12]. CDI, Clostridium difficile infection.
FIGURE 16.7 HPCN of Caco-2 cells during the late stage of CDI [12]. CDI, Clostridium difficile infection.
protein (OTUB1), and the ubiquitin protein (UCHL5). These activities in the MAPK pathway and interleukin-related pathway can trigger the immune response of host cells to the invasive bacterium. This finding is consistent with the functional enrichment analysis of the early-stage GEIN in Table 16.3. Furthermore, in Fig. 16.6, the small GTPase (CDC42) and downstream effectors (GRB2 and KBTBD7) that participate in the GTPase signaling pathway could be regulated by the deacetylase protein (HDAC4) and ubiquitin proteins (UBA52 and USP43). This suggests that the dynamics of cytoskeleton homeostasis can be affected by C. difficile toxins and host epigenetic activities, which influences the change of cell shape displayed in Table 16.3. Similarly, at the late stage of CDI (Fig. 16.7), the subunits (NFKB1 and REL) of the NF-κB complex are regulated by acetyltransferase proteins (GCNT2 and B3GNT6) and the methyltransferase protein (PRDM14), resulting in the assembly of the NF-κB complex and the induction of apoptosis in host cells. In addition to posttranslational modification, our results demonstrate that the host heat shock protein (HSP) HSPA5 is DNA methylated in the HPCN at the early stage of CDI. A change in the methylation level of HSPA5 has been proposed in Ref. [790] via NOMe-Seq analysis of HSPA5 in intestinal disease, suggesting that altered nucleosome positioning could induce differences in accessibility. Similarly, in the late-stage HPCN shown in Fig. 16.7, the significant difference in the basal level of the NF-κB subunit gene NFKB1 reflects a change of methylation level, which is consistent with a previous study showing that NFKB1 is hypomethylated in intestinal inflammation [791]. These identified
epigenetic activities demonstrate the ability of host cells to alter the behavior of proteins in order to adapt to the bacterial infection.

16.3.1.1.2 Comparison of the pathogen core networks with previously predicted core genes in Clostridium difficile
During the pathogenesis of the bacterium, essential genes are required for the survival and growth of the pathogen. The absence of essential genes results in a decreased growth rate or death of the organism. The pathogen core networks extracted from GEINs display not only the essential proteins required for the survival of the pathogen but also the important enzymes that contribute to the pathogenesis and defense mechanism of the pathogen against various stress conditions. Here, we compare HPCNs with the existing predicted essential proteins of C. difficile to show the difference and advantage of HPCNs. Recently, by applying flux balance analysis (FBA) and synthetic accessibility (SA) to a curated C. difficile metabolic network, Larocque et al. predicted 76 essential C. difficile genes [775]. Seven C. difficile proteins (CD2664, CD2335, CD3550, CD0198, CD1225, CD0130, and CD0123) in the HPCN of the early stage of CDI (Fig. 16.6), and three C. difficile proteins (CD2588, CD1816, and CD0130) in the HPCN of the late stage of CDI (Fig. 16.7), encoded by C. difficile genes required for the survival of the pathogen, have been identified in this study. Moreover, applying the transposon-directed insertion site sequencing (TraDIS) to the C. difficile transposon mutant library led to the identification of 404 genes with no transposon insertion in the library, which can be considered as the essential genes for the C. difficile growth [792]. A total of 13 C. difficile proteins (CD2664, CD2335, CD0067, CD3550, CD3540, CD0198, CD1255, CD2714, CD3256, CD1316, CD0095, CD2739, and CD0123) in the HPCN of the early stage of CDI (Fig. 16.6) and 15 C. difficile proteins (CD3170, CD2588, CD2744, CD2771, CD2462, CD3304, CD2781, CD3540, CD1145, CD2461, CD2793, CD0059, CD1275, CD0052, and CD1767) in the HPCN of the late stage have been identified in this chapter as the products of the essential genes for the growth of C. difficile. 
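The overlap between the two essential-gene sets quoted above can be tallied directly with set operations. The gene identifiers below are copied from the preceding paragraph; the variable names are illustrative only.

```python
# C. difficile proteins in the HPCNs that match the FBA/SA prediction (76 essential genes).
early_fba = {"CD2664", "CD2335", "CD3550", "CD0198", "CD1225", "CD0130", "CD0123"}
late_fba = {"CD2588", "CD1816", "CD0130"}

# C. difficile proteins in the HPCNs that match the TraDIS screen (404 essential genes).
early_tradis = {"CD2664", "CD2335", "CD0067", "CD3550", "CD3540", "CD0198", "CD1255",
                "CD2714", "CD3256", "CD1316", "CD0095", "CD2739", "CD0123"}
late_tradis = {"CD3170", "CD2588", "CD2744", "CD2771", "CD2462", "CD3304", "CD2781",
               "CD3540", "CD1145", "CD2461", "CD2793", "CD0059", "CD1275", "CD0052",
               "CD1767"}

early_both = early_fba & early_tradis    # supported by both studies
late_both = late_fba & late_tradis
early_total = early_fba | early_tradis   # supported by at least one study
late_total = late_fba | late_tradis

print(len(early_total), sorted(early_both))
print(len(late_total), sorted(late_both))
```

The unions contain 15 early-stage and 17 late-stage genes, with five early-stage genes (CD0123, CD0198, CD2335, CD2664, and CD3550) and one late-stage gene (CD2588) supported by both studies.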
Overall, 15 pathogen genes in the HPCN of the early stage of CDI, and 17 pathogen genes in the HPCN of the late stage, were previously identified as essential genes of C. difficile (Table 16.4).

TABLE 16.4 List of previously reported essential genes of Clostridium difficile in HPCNs [12].

| Stage | Identified C. difficile essential genes of the HPCNs |
|---|---|
| Early stage | CD2664^c, CD2335^c, CD3550^c, CD0198^c, CD1225^a, CD0130^a, CD0123^c, CD0067^b, CD3540^b, CD1255^b, CD2714^b, CD3256^b, CD1316^b, CD0095^b, and CD2739^b |
| Late stage | CD2588^c, CD1816^a, CD0130^a, CD3170^b, CD2744^b, CD2771^b, CD2462^b, CD3304^b, CD2781^b, CD3540^b, CD1145^b, CD2461^b, CD2793^b, CD0059^b, CD1275^b, CD0052^b, and CD1767^b |

a: Genes represent the C. difficile essential genes identified in the study [793].
b: Genes denote the C. difficile essential genes identified in the study [584].
c: Genes signify the C. difficile essential genes reported by both Refs. [584,793].

Besides these previously identified essential proteins, HPCNs also provide numerous crucial C. difficile enzymes that participate in the offensive and defensive mechanisms utilized by the pathogen, such as the well-known toxins (CD0660 and CD0663); toxin regulators (CD0659, CD0661, and CD0664); cell-wall proteins (CD2787, CD1987, CD0237, and CD0440)
that provide cell-adhesion abilities; defensive proteins (CD0141, CD0171, CD2115, CD1631, and CD1690) against ROS; and sporulation-related proteins (CD1214, CD2643, CD2629, and CD1511). The presence of these proteins in HPCNs demonstrates that there are various cross-talk activities between the host and pathogen in our biological system model.

The upper clump represents the pathogen core pathways, and the lower clump signifies the host core pathways at the late stage of CDI. The gray solid lines denote PPIs; the red arrow lines are transcriptional regulation; the green dotted lines signify protein translation; the gray-blue dashed lines represent protein secretion; and the purple clump with arrow and dashed lines indicates the activity of ROS. The abundant activities of CD0663 and the acetylation of FN1 result in enhanced ROS production and a strong inflammatory response, while these countermechanisms in turn increase the cellular stress of host cells. The ER stress response indicates that the accumulated cellular stress could endanger the host cell. Therefore the tissue damage caused by severe inflammation and the accumulated cellular stress eventually trigger the apoptosis process of host cells. In addition, C. difficile utilizes the DNA damage response and antioxidative proteins against human-produced ROS, and reduces toxin production and cell growth rate for sporulation to transform into the endospore form.

16.3.1.1.3 Cross-talk mechanisms among host-pathogen interactions and their validations
The cross talk between host and pathogen has been extensively investigated. However, the epigenetic modulation and interspecies PPIs of CDI are still largely unknown. To further investigate the offensive and defensive mechanisms between the host and pathogen, we rearranged the HPCNs in Figs. 16.6 and 16.7 from the perspective of signal transduction pathways. The rearranged core signal transduction pathways of the HPCNs in Figs. 16.6 and 16.7 at the early and late stages of CDI are shown in Figs. 16.8 and 16.9, respectively. To validate the host-pathogen PPIs identified in Figs. 16.8 and 16.9, we surveyed the existing literature for studies reporting recognized host-pathogen interactions during CDI. The interaction (CD2787, FN1) in both Figs. 16.8 and 16.9 has been experimentally verified [762]. CD2787 can promote cell adhesion by binding to FN1 and degrading host extracellular matrix proteins. Meanwhile, the interaction (CD0660, HSP90B1) in Fig. 16.8 and the interactions (CD0660, HSP90B2P), (CD0663, HSP90B1), and (CD0663, HSP90B2P) in Fig. 16.9 are supported by Refs. [763,764], which demonstrate that C. difficile toxins can enter host cells through GP96. Similarly, in Fig. 16.8, C. difficile toxin B can enter the host cell through FPR1 (CD0660, FPR1), as validated in Ref. [765]. Finally, the well-known cytotoxic interactions [(CD0660, CDC42), (CD0663, CDC42), (CD0660, RHOA), and (CD0663, RHOA)] in Fig. 16.8, and (CD0663, RAC1) in Fig. 16.9, are also supported by Refs. [753,794,795]. Considering that host-pathogen interspecies interactions play a central role in bacterial invasion, we aim to investigate the cross talk between C. difficile proteins and host plasma membrane proteins. Once the bacterium is attached to the surface of Caco-2 cells in the early stage of infection (Fig. 16.8), the cell surface-associated cysteine protease CD2787 (cwp84) of the bacterium interacts with fibronectin 1 (FN1) on the host membrane.
This negative interaction (CD2787, FN1) and the MIB2-induced ubiquitination of FN1 suggest that CD2787 could exert a degrading activity on FN1, which is consistent with previous studies [762]. Since FN1 is involved in the maintenance of cell shape, it can bind the cell surface and various compounds, including actin. The degradation of FN1 could result in the morphological
FIGURE 16.8 Core pathways rearranged from the HPCN in Fig. 16.6 at the early stage of CDI. The upper clump represents the pathogen core pathways and the lower clump signifies the host core pathways at the early stage of CDI. The gray solid lines denote protein-protein interactions; the red arrow lines are transcriptional regulation; the green dotted lines signify protein translation; the gray-blue dashed lines represent protein secretion; and the purple clump with arrow and dashed lines indicates the activity of ROS. The pathogenic factors (CD0660, CD0663, and CD0478) of C. difficile trigger ROS production and dysfunction of protein folding in Caco-2 cells. Therefore host cells employ autophagy and the DNA damage response to remove induced cellular injuries and activate the immune response to eliminate the pathogen. In response, C. difficile generates various antioxidative proteins to counteract the host-produced ROS [12]. CDI, Clostridium difficile infection; ROS, reactive oxygen species.
FIGURE 16.9 Core pathways rearranged from the HPCN in Fig. 16.7 at the late stage of CDI [12]. CDI, Clostridium difficile infection.
change of Caco-2 cell or the disturbance of the actin cytoskeleton. There are other C. difficile proteins (CD0237 and CD0440) that may help CD2787 with cell adhesion. This is consistent with the pathogenesis of C. difficile since both CD0237 and CD0440 are cell-wall proteins. In addition, there are three pathogen proteins (CD0660, CD0663, and CD0478) secreted by C. difficile that could interact with four host receptors (FPR1, SCARA3, HSP90B1, and ARRB2). FPR1 has been reported as the receptor of CD0660 (TcdB) [765]. Its interaction with toxins (CD0660, CD0663) and the ubiquitination modified by ARIH2 demonstrate that C. difficile toxins can enter the host-cell cytoplasm via the receptor-mediated endocytosis. Another identified CD0660 receptor, HSP90B1, encodes a member of HSP 90 kDa family and plays a role in protein folding. However, this protein is modified by the acetylation but is not regulated by any deacetylase in the GEIN, which would impair the chaperone ability of HSP90B1 and thus result in a misfolded-protein formation [796]. Arrestin Beta 2 (ARRB2) participates in the MAPK signaling pathway to cause a specific dampening of cellular responses to extracellular stimuli. The interaction (CD0478, ARRB2) and USP8-mediated ubiquitination of ARRB2 suggest that ARRB2 may help CD0478 enter the host cell and then trigger downstream activities. While in the genetic and epigenetic core pathways at the late stage of CDI shown in Fig. 16.9, a new pathogen protein CD1466 was secreted out by C. difficile. CD1466 encoding an ATP-binding protein belongs to the ABC-type transport system, which is the same as CD0478 in the early stage. CD1466 can interact with host receptors SNW1 and EGFR, causing various cellular responses in the host cell. For example, SNW1 can enter the cytoplasm and then interact with TFs via acetylation to trigger numerous immunity-related processes (Fig. 16.9). 
In addition, EGFR is responsible for many host cellular responses, including the GTPase activity, immune response, inflammation, and apoptosis. In Fig. 16.9 the target TFs of EGFR-mediated pathways include NFIC and NFKB1, which are both key TFs for the inflammatory response in our results. The strong inflammatory response triggered by CD1466 is used by host cells to remove invasive pathogens. However, severe inflammation can in turn induce tissue damage of host cells. The trade-off of inflammation in CDI will be discussed in later sections. In addition to CD1466, CD0663 and CD0660 are also secreted in the late stage of CDI. CD0663 can bind to CD46, HSP90B1, HSP90B2P, and RAC1. CD46 is a regulatory component of the complement system that could trigger various cellular responses. The activated complement system could also release some specific proteins that can recruit phagocytic cells such as neutrophils to clear toxins and whole microbes. The interaction between CD0663 and CD46 indicates that pathogen CD0663 could be the extracellular stimulus of CD46, and potentially responsible for these cellular processes. HSP90B2P is influenced by CD0663 in a similar manner to HSP90B1 in the early stage of CDI. This member of the HSP 90 kDa family plays a crucial role in the pathways identified in Fig. 16.9. The acetylation of HSP90B2P and its interaction with toxins (CD0660 and CD0663) impair its chaperone ability. In addition, the interaction between CD0663 and RAC1 has been well characterized by the glucosylation of the latter protein, producing uridine 5′-diphosphate (UDP) and resulting in disturbances of cytoskeleton homeostasis. Interestingly, we noticed that there are more host proteins influenced by CD0663 than CD0660 in the late stage of infection. In contrast, CD0660 binds more host proteins than CD0663 in the early stage (Fig. 16.8). These results could provide an explanation for the perennial argument about the cytotoxic responsibility of CD0660 (TcdB) and CD0663 (TcdA),
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
16. Investigating the host/pathogen cross-talk mechanism
suggesting that both toxins are essential for pathogenesis. CD0660 functions prior to CD0663 and triggers rapid responses in the early stage of infection, and CD0663 works actively in the late stage. This role change may result from the CD2973-induced acetylation of CD0663 in the late stage. The rise of CD0663 activity is the turning point for the progression of CDI. Another host protein, FN1, also changes its role at this stage. FN1 was repressed by the ubiquitination and CD2787-induced degradation in Fig. 16.8, but the CD46-triggered activation and NAT8L-induced acetylation of FN1 could increase the corresponding expression levels (P-value < 8 × 10⁻³, based on one-way analysis of variance) despite the influence of CD2787, thus allowing FN1 to trigger the downstream activities in the late stage of CDI. To further understand this, we separate the rearranged core pathways in Figs. 16.8 and 16.9 based on different cellular responses and their corresponding pathways for further discussion. For the early stage of CDI the genetic and epigenetic scheme of the core pathways (Fig. 16.8) can be separated into three parts: the offensive mechanism of the pathogen and the corresponding pathogenesis of host cells (Fig. 16.10A); the remedial actions employed by host cells in response to pathogen-induced injuries (Fig. 16.10B); and the counter mechanisms of Caco-2 cells and the defense mechanisms of C. difficile (Fig. 16.10C). During the late stage of CDI the core pathways (shown in Fig. 16.9) can be separated into two parts: the strong cellular responses triggered by epigenetic acetylation in host cells for depleting the tenacious pathogen at the late stage of CDI (Fig. 16.11A) and the severe inflammation and apoptosis processes of Caco-2 cells as well as the endospore formation of C. difficile (Fig. 16.11B).
16.3.1.2 A precise view of pathogenic effects and host responses at the early stage of C. difficile invasion
16.3.1.2.1 Pathogenic factors utilized by C. difficile and the resulting pathogenesis in Caco-2 cells
During the CDI the first event that initiates the pathogenesis is cell adhesion of the two organisms. CD2787 plays an important role in cell adhesion by binding FN1, resulting in the degradation of the extracellular matrix proteins of host cells [762]. CD2787 also interacts with the cytotoxin CD0660 (TcdB) to promote the secretion of toxins, and signals via the toxin regulator CD0664 (TcdC) to enhance the production of CD0660. The enterotoxin CD0663 (TcdA) and cytotoxin CD0660 (TcdB) are the two major C. difficile pathogenic factors, and the main causes of clinical symptoms of CDI. In Fig. 16.10A, we show that CD0660 can activate CD0663 via CD1625, a histidine kinase that plays a major role in the signal transduction of prokaryotes. Another route for CD0660 to control toxin production is through interaction with the pathogen transcriptional regulator CD1064. This protein could regulate a wide range of genes, including CD0660 and CD0663. Similarly, CD0663 (TcdA) can enhance the toxin production via CD1064 and CD0664. It should be noted that the toxin activator, CD0659, could regulate CD0664. The higher expression of CD0660 and CD0663 in the early stage of CDI (CD0660: P-value < 5 × 10⁻³, CD0663: P-value < 4 × 10⁻³) reflects the higher activities of toxins at this stage. We also determined that CD2787 could activate CD0478 through the histidine kinase CD1352. CD0478 is an ATP-binding cassette transporter that plays a role in the antibiotic resistance of C. difficile [797]. We also determined that this protein is the ligand of some human cell receptors and is responsible for the subsequent response of human cells in CDI, suggesting its potential immunogenicity.
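The expression comparisons quoted in this section are stated to rest on one-way analysis of variance. As a minimal sketch of what that test computes, the following Python function returns the F statistic and its degrees of freedom for a set of sample groups. The expression values and the names `early` and `late` are hypothetical and chosen only for illustration; they are not data from this study. Converting F into a P-value additionally requires the F distribution, which is left out here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups.

    Assumes at least two groups and non-zero within-group variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical normalized expression values of one gene (illustration only)
early = [8.1, 7.9, 8.3]
late = [5.2, 5.0, 5.4]
f_stat, df_b, df_w = one_way_anova_f([early, late])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F relative to the F distribution with the reported degrees of freedom corresponds to a small P-value, which is the quantity given in parentheses throughout the text.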
FIGURE 16.10 The host/pathogen cross-talk mechanism through core pathways at the early stage of CDI: (A) the secreted pathogenic factors of Clostridium difficile and their induced cytopathic effect in Caco-2 cells; (B) the remedial schemes employed by host cells in response to C. difficile toxins; (C) offensive mechanisms via host-secreted ROS, miRNAs, and immune response, and thus the defense mechanism of C. difficile at the early stage of CDI [12]. CDI, Clostridium difficile infection; ROS, reactive oxygen species.
FIGURE 16.11 The host/pathogen cross-talk mechanism through core pathways at the late stage of CDI: (A) the enhanced ROS production and cellular stress of host cells, and the anti-ROS mechanism of C. difficile; (B) the stress-induced apoptosis of Caco-2 cells and the leaving of C. difficile. The red arrow lines denote transcriptional regulation; the gray solid lines indicate the protein-protein interaction; the green dotted lines signify protein translation; the gray-blue dashed lines indicate protein secretion; and the purple clump with arrow and dashed lines represents the activity of ROS. The definitions of node symbols are the same as those defined in Fig. 16.10. In part (A) the abundant activity of CD0663 and the acetylation of FN1 increase the production of ROS and trigger the formation and transport of cytokines, inducing a strong inflammatory response. However, the presence of the ER stress response reflects that the accumulated cellular stresses also threaten the survival of the host cell. Meanwhile, C. difficile employs the DNA damage response and antioxidative proteins to counteract the oxidative stress. In part (B) the severe inflammation, accumulated oxidative stress and ER stress, and the activity of the NF-κB complex trigger the apoptosis of Caco-2 cells. In addition, C. difficile actively transforms into endospores to lie dormant and avoid the hazardous environment [12]. CDI, Clostridium difficile infection; ROS, reactive oxygen species.
Secreted pathogenic factors require host receptor-mediated endocytosis to enter the host cell. FPR1 is a G protein-coupled receptor that plays an important role in the activation of immune-related phagocytic cells. However, this receptor has also been reported to help toxins enter host cells via receptor-mediated endocytosis, as discussed previously. FN1 can negatively interact with its downstream protein PTEN, a tumor suppressor protein
(Fig. 16.10A). PTEN inhibition has been reported to activate the small GTPase RAC1 [798]. Decreased PTEN expression (P-value < 3 × 10⁻³) could lead to an increased expression of RAC1 (P-value < 2 × 10⁻³) during the CDI. This inverse correlation suggests that FPR1 can interact with PTEN to affect RAC1 expression. RAC1 is characterized by its participation in the assembly of the NADPH oxidase complex and thus the production of ROS in the CD0660-infected Caco-2 cells [799]. Interestingly, our results demonstrate that another glycosylation-independent pathway exists for the RAC1 regulation of ROS production. Fig. 16.10A shows that RAC1 can activate RPS3, which plays a role in acetylation, thereby mediating ETS1 translocation to the nucleus for transcription. The SYK gene is regulated by ETS1 and is also involved in the ROS production. Higher expression of SYK in the early stage of CDI (P-value < 3 × 10⁻³) reflects a higher oxidative stress caused by the ROS. Oxidative stress reflects the imbalance of the redox state in a biological system. The overproduction of ROS or insufficient reducing agents in the system could lead to oxidative stress. Oxidative stress is usually induced by the host to eliminate the infecting bacteria. However, the simultaneous production of ROS and free radicals could cause severe damage to cellular components, including lipids, proteins, and DNA. In the case of CDI, toxins could hijack the RAC1-dependent pathway to generate ROS, resulting in the oxidative-stress-mediated necrosis of host cells. HSPs are usually highly expressed in cells under stress. Recent studies focus on their chaperone function to assist with the folding and refolding of proteins. The acetylation of HSP90B1 and its interaction with CD0660 impair the chaperone ability of HSP90B1, resulting in the misfolded-protein generation and accumulation (Fig. 16.10A). Generally, HSP90 proteins could cooperate with HSP70 proteins to carry out their chaperone activities.
We also determined that TSPAN7 could facilitate HSP90B1 translocation to the cytoplasm, and HSP90B1 can then work together with HSPA5 (Fig. 16.10A). Unfortunately, unlike HSP90 proteins, the deacetylation of HSPA5 will switch its folding ability into a degrading ability [800]. Moreover, the UBE2D3-induced ubiquitination of HSPA5 results in its low expression in the early stage of CDI (P-value < 2 × 10⁻³). These effects could demonstrate that HSPA5 provides little chaperone ability and its degrading activity is also limited. Other regulators of HSPA5 are also identified in our results. In Fig. 16.10A, the HSPA5 gene is upregulated by PRRX2 and silenced by miR155HG, but the high level of miR155HG and DNA methylation of HSPA5 could induce a low expression of HSPA5 in the early stage of CDI. This failed regulation also results from the CD2787-induced degradation of FN1, which could decrease the expression of downstream PRRX2. Since the maintenance of the proteome is one of the most important tasks of the cell, the disturbance of protein homeostasis could be deadly. Once these misfolded proteins begin to aggregate, cells could employ various strategies, including autophagy, to eliminate these aggregated components. Serine/Threonine Kinase 11 (STK11) is responsible for autophagy activation in response to the aggregation of misfolded proteins. The deacetylated HSPA5 can regulate the expression of STK11 via GATA1. STK11 is normally repressed by miR73HG. However, the high expression of STK11 (P-value < 3 × 10⁻³) could demonstrate that host cells require autophagy to remove these misfolded proteins. ARRB2, another receptor-binding protein affected by FN1 and CD0478, can directly trigger STK11 and indirectly activate HSPA5 via the signalosome subunit COPS5 to promote the autophagy response.
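The ROS route described above for Fig. 16.10A (FN1 acting negatively on PTEN, PTEN inhibition activating RAC1, then RAC1 to RPS3 to ETS1 to SYK and ROS) can be read as a signed directed graph. The sketch below is one simplified way to encode that reading and to compute the net sign of a path; it is not the network-identification method used in this chapter. The names `EDGES` and `net_effect` are hypothetical, and treating the ETS1-to-SYK and SYK-to-ROS links as positive is an assumption, since the text only says that SYK is regulated by ETS1 and is involved in ROS production.

```python
from collections import deque

# Signed edges read from the description of Fig. 16.10A.
# +1 = activation, -1 = inhibition. The signs of ETS1 -> SYK and
# SYK -> ROS are assumptions made for this illustration.
EDGES = {
    "FN1": [("PTEN", -1)],
    "PTEN": [("RAC1", -1)],
    "RAC1": [("RPS3", +1)],
    "RPS3": [("ETS1", +1)],
    "ETS1": [("SYK", +1)],
    "SYK": [("ROS", +1)],
}


def net_effect(source, target, edges=EDGES):
    """Breadth-first search from source to target.

    Returns (path, sign), where sign is the product of the edge signs
    along the first path found, or None if target is unreachable.
    """
    queue = deque([(source, [source], 1)])
    visited = {source}
    while queue:
        node, path, sign = queue.popleft()
        if node == target:
            return path, sign
        for nxt, edge_sign in edges.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [nxt], sign * edge_sign))
    return None


path, sign = net_effect("FN1", "ROS")
print(" -> ".join(path), "| net sign:", sign)
```

Two consecutive inhibitions multiply to a positive net sign, which is the same reasoning behind the inverse correlation between PTEN and RAC1 expression noted in the text.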
FIGURE 16.10 (legend, continued) The red arrow lines denote transcriptional regulation; the gray solid lines indicate the PPI; the green dotted lines signify protein translation; the gray-blue dashed lines indicate protein secretion; and the purple clump with arrow and dashed lines represents the activity of ROS. In part (A), C. difficile toxins hijack a RAC1-related pathway to promote ROS production in host cells. The significant change of acetylation level of HSPs (HSP90B1 and HSPA5) and their interactions with toxins impair their chaperone activity, resulting in the formation of misfolded protein and the induction of autophagy. As the remedial schemes for these injuries, in part (B), Caco-2 cells employ the DNA damage response and autophagy to deplete the damage caused by pathogens and innate immune response to remove the invasive bacterium. The alternative routes of these counter mechanisms and the miRNA silencing used by host cells to interfere with the activities of pathogen are also displayed in part (C). Furthermore, the scheme in part (C) shows the multiple pathways of C. difficile against the oxidative stress presented by Caco-2 cells.
16.3.1.2.2 Caco-2 cells adopt autophagy, DNA damage response, and the activation of PAK1 and GRB2 as remedial schemes in response to pathogen-induced damage
SCARA3, an ROS scavenger protein, can detect host-produced ROS and is affected by CD0660. This protein is employed by host cells to counteract various stress conditions, including overproduction and secretion of ROS. SCARA3 can work alone or together with the following pathways to counteract stress. Considering ROS-induced DNA damage and cellular component injury, LRRK2 is activated by SCARA3 to remove these unworkable components (Fig. 16.10B). LRRK2 encodes a leucine-rich repeat kinase and is also responsible for the autophagy initiation. Once activated by SCARA3, LRRK2 can respond to the upstream signals by performing its own function and by activating YY1. YY1, although normally silenced by methylation, miR4500HG, and ETS1, could play an important role in DNA repair. In addition, this protein can eliminate the misfolded proteins and damaged components via the ubiquitination-mediated autophagy. Besides direct activation, LRRK2 can enhance the expression of YY1 by repressing ETS1 (Fig. 16.10B). The high expression level of LRRK2 in the early stage of CDI (P-value < 2 × 10⁻³) and an increased amount of YY1 during the infection process (P-value < 1 × 10⁻³) could demonstrate this relationship. These identified regulatory interactions suggest that host cells could recruit numerous proteins and genes to maintain the redox-state balance and protein homeostasis. Over the past decade, C. difficile toxins (CD0660, CD0663) have been characterized by their capacity for the glycosyltransferase-dependent inactivation of host Rho family GTPases (RHOA, CDC42, RAC1). It has been shown that these modifications could result in cytoskeleton rearrangement, severe inflammation, and subsequent apoptosis in human cells [751]. In Fig. 16.10B, CD0660 and CD0663 interact with RHOA and impair the binding ability of RHOA to its downstream effectors, causing the actin cytoskeleton dysfunction.
However, the same interaction with CDC42 fails to block the binding to its downstream effector PAK1, and this may be due to the acetylation of CDC42. Even so, the CDC42-activated kinase 1 (PAK1) and the subsequent GRB2 remain at low expression levels due to their methylation and ubiquitination at this stage. Both PAK1 and GRB2 play dual roles in the actin cytoskeleton dynamics and activation of the immune response such as the interleukin-23 signaling pathway. Their low expression levels (PAK1: P-value < 2 × 10⁻³, GRB2: P-value < 4 × 10⁻³) indicate that these activities are limited.
Interestingly, we noticed that PRRX2 functions as a double-edged sword. The repression of PRRX2, resulting from CD2787-mediated degradation of FN1, not only decreases the efficiency of chaperone functions as previously mentioned but also unlocks the repression of PAK1 and GRB2 genes. This process can restore the cytoskeleton homeostasis and initiate the interleukin-23 signaling pathway.
16.3.1.2.3 The offensive mechanisms of Caco-2 cells and the defense mechanisms of C. difficile at the early stage of CDI
The evolutionary war between microbes and humans has lasted for thousands of years. In addition to the defense mechanisms developed against microbial infection, host cells also evolved strategies to fight back and even kill the invasive microorganisms. However, unlike pathogens, lipids and proteins from human cells cannot easily pass through bacterium cell walls and membranes. These offensive interactions exist but still remain largely unknown and generally species specific. The major means employed by host cells include the production of tiny and toxic molecules (ROS), the recruitment of phagocytes (neutrophils, macrophages), and the newly detected miRNA regulation [685]. The production of ROS and immune response initiation have been discussed in previous sections. The alternative routes of these offensive processes are also revealed in Fig. 16.10C and are described in detail in the following paragraphs. Through its interaction with FN1 and CD0478, Arrestin Beta 2 (ARRB2) can signal through the COP9 signalosome subunit COPS5, which plays a role in multiple signaling pathways, to positively activate GRB2. GRB2 participates in cytoskeleton homeostasis and the innate immune response. GRB2 can activate ETS1 to regulate the SYK expression and thus the ROS production. The cooperation of ROS and innate immune processes forms an offensive mechanism to deplete pathogens. ARRB2 can also transmit stimulation signals to the androgen receptor (AR). Once stimulated, AR dissociates from some accessory proteins, translocates to the nucleus, and then stimulates the transcription of PAK1 and miR155HG. Both PAK1 and miR155HG are involved in the innate immune response. The feature of this pathway is the activation of miR155HG. The high expression level of this miRNA could increase the probability of passing through pathogen cell barriers (cell walls and cell membranes) to silence pathogen genes. The relationship between host miRNAs and pathogen genes has recently been reported [685].
This seminal observation demonstrated that the host can regulate and shape gut microbiota through miRNAs. We have observed that miR155HG and miR1 can silence CD3256 and CD0130, respectively. CD3256 and CD0130 could function in pathogen pathways to counteract host-produced ROS. This indicates that host cells can interfere with pathogen defense mechanisms and enhance the host’s ability to eliminate C. difficile. Oxidative stress is usually employed by host cells to kill pathogens, but C. difficile toxins always enhance ROS production through a RAC1-mediated pathway, causing injury to the host and DNA damage. This suggests that C. difficile may employ some defensive mechanisms to avoid or defend against ROS. When ROS is released at the interface between the host and pathogen, pathogen cell-wall proteins are the first to be encountered. In Fig. 16.10C the cell-wall protein CD2787 (cwp84) and CD0440 (cwp27) can positively activate CD0812, which is a universal stress protein. Universal stress proteins can promote endurance under stress conditions such as heat, nutrient starvation, chemical agents, and
oxidative stress. The homoserine kinase CD2119, which participates in several essential metabolic pathways, is also employed by CD0440 to activate CD2356. CD2356 is a thioredoxin reductase that can remove superoxide radicals and balance the redox state of pathogens. A high expression level of CD2356 in this stage (P-value < 6 × 10⁻⁴) could function to resolve oxidative stress. In addition to performing its own activity, CD2356 can urge CD0130 to interact with CD0141. CD0130 encodes an S-adenosylmethionine synthase and has been identified as an essential survival gene of C. difficile, and the absence of this gene will result in gaps in metabolic networks and biomass decrease [775]. Its metal ion-binding ability can help CD0141 with exporting copper ions. Copper (Cu) is an essential element for most species, including bacteria. However, overexposure to Cu is toxic. In fact, copper and its alloys are natural antimicrobial materials that were long used as bactericides before antibiotics were discovered. The toxicity of the copper ion is primarily due to its reaction with human-produced H2O2 (a member of ROS) to generate the hydroxyl radical (•OH), which could damage pathogen lipids, proteins, and DNA. To reduce this risk the copper homeostasis protein CD0141 is activated to export copper from the pathogen. Interestingly, we have identified that CD0130 is silenced by the host miRNA miR1. This regulation could reduce the probability of CD0130-CD0141 cooperation, thus interfering with pathogen defense mechanisms. Unfortunately, the expression of CD0130 remains high in the early stage of CDI (P-value < 3 × 10⁻³), and this may be due to the lack of miR1 upregulation. CD0237 (FliD) encodes a flagellar protein that together with CD2787 could play a role in C. difficile adhesion. In Fig. 16.10C, this cell surface protein responds to ROS by activating CD1185 and CD2753.
CD1185 is a diguanylate kinase signaling protein that participates in the formation of c-di-GMP, a ubiquitous second messenger involved in bacterial biofilm formation and pathogen aggregation [801]. The aggregated pathogen cells could also promote the formation of biofilm. CD2753 is also a c-di-GMP-signaling component that is specific to C. difficile 630. The assembly of these second messenger subunits could also induce various cellular responses. One downstream TF, CD2643, which normally regulates the sporulation process, is activated to transcribe CD1185, creating a self-activation loop. CD1185 is originally methylated, but its high expression (P-value < 2 × 10⁻³) at the early stage for the establishment of the feedback loop demonstrates the high activity of c-di-GMP and thus the formation of biofilm, which is a powerful scheme against stress conditions, including the oxidative stress. CD2115, a copper-transporting ATPase, is activated by CD2753, playing a similar role as CD0141. The existence of similar pathways modulating copper homeostasis indicates the important role of the Cu ion in the oxidative stress and highlights the redundancy employed by pathogens to counteract the oxidative stress. Furthermore, in Fig. 16.10C, CD0237 (FliD) can signal via its downstream flagellum subunit CD0745, a putative chemotaxis protein, to trigger the CD1128-mediated DNA replication. CD1128 is the DNA polymerase I (PolA) of C. difficile, and the initiation of DNA replication could trigger the bacterial reproduction. The increased amount of pathogen could promote the concentration of toxins and the resistance against stress conditions. Further, CD1128 is methylated by CD2726, but its high expression level (P-value < 5 × 10⁻³) and the positive interaction with CD0745 suggest that the pathogen still recruits this protein for DNA replication. CD1128 then positively activates the transcriptional
regulator CD1064 to regulate CD2356. Another CD1128-mediated anti-ROS pathway includes CD3256 and CD0171. CD3256 (valS) is a valine-tRNA ligase and has been predicted to be an essential gene for the growth of C. difficile [792]. Its ability to activate CD0171 is important for the pathogen since CD0171 could encode a key redox-sensing regulator. CD0171 is only active as a repressor when the intracellular NADH/NAD+ ratio is low, thus regulating the redox state of the pathogen [802]. Interestingly, we have observed that the host could interfere with this antioxidative defense process through miRNA silencing. miR155HG is upregulated in the host cell (P-value < 4 × 10⁻³) and identified as a repressor of CD3256. The inhibition of CD3256 by miR155HG could result in not only a redox-state disturbance, but also a pathogen-biomass decrease since CD3256 was predicted as an essential component for the growth of C. difficile [792].
16.3.1.3 The strong cellular activities of Caco-2 cells and the infection results of host and pathogen at the late stage of infection
16.3.1.3.1 The emphasized ROS production and stress accumulation of host cells, and the failure of antioxidative defense mechanisms in C. difficile
When CDI proceeds to the late stage, pathogen toxins exist in host cells, and the bacteria continue to secrete pathogenic factors into the interface between the two species. In this situation the scattered CD0663 binds the host membrane protein CD46, triggering various cellular responses and the complement system. As discussed previously, the CD2973-induced acetylation of CD0663 could promote its activity in the late stage. This in turn increases the probability of CD0663-CD46 interaction and the following responses. CD46 could activate these processes through FN1 (Fig. 16.11A). For example, FN1 can trigger the ROS production by activating NOX5. NOX5 is a key member of the NADPH oxidase complex and is also responsible for the superoxide generation. Unlike the indirect transcriptional regulation utilized in the early stage, ROS production via NOX5 in the late stage could provide a much more rapid and efficient antimicrobial effect since it is directly activated by host-cell-surface FN1 and immediately released into the interface. RAB33B encodes a small GTP-binding protein and plays a significant role in vesicular transport in protein secretion, such as cytokine release. RAB33B is repressed by methylation, but its high expression level (P-value < 1 × 10⁻³) and the positive interaction with FN1 indicate that host cells could enhance protein transport through RAB33B. In addition to direct interaction, FN1 can signal through the nuclear factor NFIC to upregulate the expression of RAB33B (Fig. 16.11A). Another important downstream protein of FN1 is C22orf28. Also referred to as RTCB, C22orf28 was recently reported as an essential component in the ER stress response [803]. The ER stress response is a cellular stress response resulting from the accumulation of unfolded proteins in the ER. The major aim of the ER stress response is to refold or degrade misfolded proteins, thus relieving the ER stress.
If this objective cannot be achieved within a certain time span, cells then activate apoptosis. The presence of C22orf28 demonstrates that the prolonged accumulation of misfolded proteins from the early stage could induce ER stress and threaten the survival of host cells. To relieve the stress, host cells then employ lncRNA HOTAIR to activate C22orf28, but the decreased level of HOTAIR (P-value < 3 × 10⁻³) may in turn limit the activity of C22orf28. Overall, FN1 can be
activated by CD46 and modified by the NAT8L-triggered acetylation, which could enhance the expression of FN1 and therefore promote cellular processes such as ROS production, ER stress response, and vesicular transport. Therefore the presence of CD46 and the acetylation of FN1 are considered as critical events during the progression from the early stage to the late stage of CDI. Similar to the early stage, the chaperone function of HSP90B1 is impaired by the acetylation and interaction with C. difficile toxin CD0663, which results in a misfolded-protein formation. This result suggests that HSP90B1 is a conserved receptor for C. difficile toxins. To improve the impaired chaperone ability, HSP90B1 urges the TF GATA2 to regulate its own expression. The acetylation of GATA2 also enhances its transcription ability to regulate the transcription of both NOX5 to improve ROS production and RAB33B to strengthen cytokine release. In addition to HSP90B1, another HSP 90 kDa member, HSP90B2P, is influenced by C. difficile toxins (Fig. 16.11A). C. difficile toxins (CD0663, CD0660) can directly interact with HSP90B2P to repress the protein-folding ability of HSP90B2P in the host cell. The acetylation of HSP90B2P also impairs its own chaperone function. In that case, HSP90B2P shows a similar response as HSP90B1. It can upregulate the expression of NOX5 through AR, the acetylation of which could promote its regulatory ability. HSP90B2P also activates the nuclear factor NFIC via the signal transduction protein VCAM1. The activated NFIC could then upregulate the expression of RAB33B. These same responses presented by HSP90B1 and HSP90B2P demonstrate that C. difficile toxins, whether intra or extracellular, could affect the cellular function of HSP90 proteins in a similar manner. The third manner in which toxins affect HSP90 proteins is through the inactivation of RAC1. 
The toxin-induced glycosylation of RAC1 could lead to the production of uridine 5′-diphosphate (UDP), a putative "danger signal" and the ligand for the receptor P2RY6 [804]. The secreted UDP can alert neighboring cells by binding to P2RY6 and therefore activate HSP90B2P-related processes. Host miRNAs, as mentioned previously, can pass through pathogen cell walls and membranes, enter the cytosol, and then silence pathogenic genes. In the late stage of infection, host miRNAs, miR4500HG and miR16, can inhibit the pathogen genes CD1631 and CD1690, respectively. Both silenced genes could encode the protective proteins against oxidative stress (CD1631: superoxide dismutase, CD1690: thioredoxin). This result is similar to the previous stage, suggesting that host miRNAs may target anti-ROS genes in C. difficile. In response to the enhanced ROS production and the activities of the complement system, C. difficile could employ multiple methods to counteract host mechanisms. In Fig. 16.11A the flagellar sigma factor CD0266 (FliA) replaces CD2787 to control toxin production and induce defense mechanisms against ROS. It can directly interact with CD0660 (TcdB) or activate another toxin, CD0663 (TcdA), via the transmembrane protein CD2337. The downstream target of CD0663 is a GTP-sensing transcriptional repressor CD1275 (CodY). This protein plays an important role in the pathogen growth repression and its corresponding gene has been predicted as an essential gene for the growth of C. difficile [792]. The presence of this protein indicates that pathogens then transform from the rapid growth to the stationary phase, which reflects the hazardous conditions presented by the host cell. Interestingly, the high expression of CD1275 (P-value < 1 × 10⁻³) and the decreased
expression of CD0660 (P-value < 3 × 10⁻³) could demonstrate that CD1275 in turn inhibits the expression of toxins, which is consistent with a previous study [777]. CD2337 can also activate the metabolic dehydratase CD2341 and consequently its downstream essential protein CD0130. CD0130 plays a central role in this pathway to trigger various responses. However, a recent study has indicated that the acetylation of the CD0130 homolog in E. coli could impair the corresponding enzymatic activity [805]. Fig. 16.11A shows that CD0130 can activate the release of pathogenic factor CD1466 and signal via CD1931 (LexA) to regulate the toxin CD0663. CD1931 is a transcriptional repressor that can repress toxins and a number of genes involved in the DNA damage response [778]. The impaired enzymatic activity of CD0130 could result in low activity of CD1931 (P-value < 1 × 10⁻³) and thus initiate the DNA damage response to repair ROS-induced DNA damage. CD0130 also employs alternative proteins to counteract ROS, such as superoxide dismutase CD1631, a conserved and powerful antioxidative protein, and CD1287, a Fur family transcriptional regulator. CD1287 can enhance the expression of CD1631 and CD0179, another defensive protein against H2O2, to eliminate ROS [806]. However, these protective activities are limited by the acetylation of CD0130. Furthermore, DNA methylation of CD0179 and host miR4500HG-induced silencing of CD1631 could also repress antioxidative defense mechanisms, resulting in DNA damage and damage to other cellular components of the pathogen. Although the cellular function of CD1412 is currently unknown, our results suggest that it may participate in the defense mechanism triggered by CD0266 (Fig. 16.11A). Once activated by CD0266, CD1412 then triggers CD0179 and the downstream CD1690 protein. CD0179 encodes a glutamate dehydrogenase and is secreted by the pathogen, forming a protective layer against H2O2 [806]. 
Its downstream target, thioredoxin CD1690, can cooperate with CD0179 to remove ROS around C. difficile. The activation of the CD1412-mediated pathway and the high expression of CD1287 indicate that C. difficile could recruit these proteins against oxidative stress. However, the DNA methylation of CD0179 and the host miRNA-induced silencing of CD1631 and CD1690 limit the efficiency of this defense.
16.3.1.3.2 The apoptosis process triggered by severe inflammation, accumulated cellular stresses of Caco-2 cells, and the exit of C. difficile via endospore formation
Stimulated by CD1466, SNW1 participates in the NOTCH signaling pathway to modulate the NF-κB activity. Fig. 16.11B demonstrates that SNW1 can enhance the transcription ability of GATA2 and therefore improve the RAB33B-mediated protein secretion. In addition, SNW1 can activate NFKB1 through the diacylglycerol kinase DGKE, which is involved in several immune-related pathways, including the interleukin-3 and interleukin-5 signaling pathways. The activation of NFKB1 is an important event of CDI, and the acetylation of this NF-κB subunit could enhance its transcription ability to upregulate IL-8, a major cytokine for neutrophil recruitment. Tight junction breakdown occurs at the early stage, allowing the recruited neutrophils to enter the lumen and then clear pathogens by phagocytosis. However, neutrophil infiltration can also cause severe tissue damage in the host. In fact, neutrophil infiltration is an important clinical feature induced by CDI. Another important property of NF-κB is its ability to regulate apoptosis. In Fig. 16.11B the chaperone ability of HSP90B2P is impaired by acetylation and the interaction with toxins. The prolonged ER and oxidative stress, initiated in the early stage, eventually trigger apoptosis via the signaling proteins
VCAM1 and N4BP3. NFKB1 can upregulate its own expression and, together with REL, another NF-κB subunit activated by HSP90B2P, promote the NF-κB complex activity, thus initiating apoptosis. NFKB1 and REL are normally methylated and silenced by miR143HG and miR155HG, respectively. However, their high expression level (P-value < 1 × 10⁻³) indicates that host cells can undergo apoptosis in the final step of infection. Moreover, HSP90B2P utilizes ETS1 to enhance the expression of the long noncoding RNA MIAT, which has also been reported to participate in apoptosis in epithelial cells [793]. High levels of ROS and scattered neutrophils could threaten the survival of C. difficile, forcing pathogens to leave the infection site. The decreased expression of the pathogen cell surface protein CD2787 (P-value < 4 × 10⁻³) decreases the adhesion ability of C. difficile. Another cell-wall protein, CD1987, utilizes the transmembrane protein CD2781 to initiate the CD1128-related DNA replication and the subsequent sporulation. CD1128 encodes DNA polymerase I (PolA) of C. difficile, and the acetylation of this protein can affect the replication timing. DNA replication in the bacterium triggers either pathogen reproduction or sporulation. Endospores are special survival structures of some bacteria, including the Clostridium genus. They are highly resistant to ultraviolet radiation, heat, desiccation, chemical agents, and oxidative stress, allowing bacteria to lie dormant even for centuries. Fig. 16.11B demonstrates that C. difficile can initiate the sporulation process via CD1214. CD1214 is the stage 0 sporulation protein that helps pathogens transform into endospores. CD1214 is originally methylated during infection but is activated by CD1128 and thus upregulates its own expression to enhance its activity. In addition, CD2787 can trigger another sporulation pathway via CD2247. CD2247 encodes a serine peptidase, which plays a crucial role in C. difficile spore germination. 
Our results show that CD2247 also controls the sporulation process. For instance, it can activate the biosynthesis of spore coat protein CD1511 and improve the activity of CD2629, which is an ATPase required for the spore coat formation and proper localization. CD2247 can also activate a crucial sporulation regulator CD2643 (SigE) [779], and this sigma factor can upregulate the expression of CD2629 (P-value < 8 × 10⁻³) and cooperate with CD1214. These activities suggest that pathogens could actively transform into the endospore form in the late stage of infection.
16.3.1.4 Drug target prediction and multimolecule drug design for treating Clostridium difficile infection
At present, the major agents used to treat CDI include metronidazole, vancomycin, and fidaxomicin. In patients with mild symptoms the guidelines from the Society for Healthcare Epidemiology of America and the Infectious Diseases Society of America recommend treating CDI with metronidazole and vancomycin. However, both antibiotics result in side effects and the recurrence rate still remains unacceptably high [807]. A meta-analysis has demonstrated that CDI recurs in between 13% and 50% of all patients after an initial episode [808]. In addition, vancomycin and fidaxomicin are the only two FDA-approved drugs used to treat CDI, and the corresponding relapse rates are 24% and 13%, respectively [809]. Most recurrent cases are induced by spore-mediated reinfection, suggesting the importance of endospores in CDI. Considering the crucial role of CD2787 (cwp84) in pathogenesis at the early stage of CDI, we recommend CD2787 as a potential drug target in the prevention of CDI.
Moreover, other cell-wall proteins participating in the infectious process such as CD0440, CD0237, CD0266, and CD1987 are also potential drug targets since the inhibition of these cell surface proteins may reduce the occurrence of cell adhesion. In Figs. 16.8 and 16.9, four pathogen genes (CD0130, CD3256, CD1275, and CD2781) were previously predicted as essential survival/growth genes by applying FBA and SA to a curated C. difficile metabolic network, as well as transposon-directed insertion site sequencing (TraDIS) to the C. difficile transposon mutant library [775,792]. However, CD0130 and CD3256 have important human homologs, MAT2A and VARS, respectively [775]. The inhibition of these proteins may result in an unpredictable dysfunction of host cells. Therefore CD1275 and CD2781 are more conservative choices for further drug design. Similarly, proteins involved in the defense mechanisms employed by C. difficile against ROS, including CD0141 and CD2115 in the early stage of CDI, and proteins responsible for redox-state homeostasis (CD1690 and CD1631) in the late stage of CDI, have human homologs [775]. Therefore the three remaining anti-ROS enzymes (CD2356, CD0171, and CD0179) are recommended as potential targets to repress the defense mechanisms of C. difficile. The heavy economic burden of CDI results from its high recurrence rate, suggesting that the inhibition of spore formation of C. difficile is a feasible therapeutic strategy to reduce the recurrence rate of CDI. Therefore the key sporulation pathway members, including CD1214, CD2629, and CD2643, could become targets of further multimolecule drug design. In addition to the inhibition of pathogen mechanisms, another strategy of drug design is the promotion of host defense mechanisms. It has been reported that a low dose (0.2 ng/mL) of CD0660 is sufficient to induce significant cell rounding in vitro [810], suggesting that the amplification of host defense is required. 
In our results the major pathogenic effects induced by C. difficile pathogenic factors include the tight junction/cytoskeleton breakdown and the accumulation of ER stress, which result from the inactivation of Rho GTPases (RHOA, CDC42, and RAC1) and the dysfunction of chaperone proteins (HSP90B1, HSPA5, and HSP90B2P), respectively. Therefore we aim to increase the expression of these genes to maintain the activities of these dysfunctional components. The severe inflammation and apoptosis process, induced by NFKB1, REL, and IL-8, are responsible for cell death at the late stage of CDI. Though the immune response and inflammation are important counter mechanisms against bacteria, the overexpression of the related genes could cause tissue damage and apoptosis of host cells. Therefore we want to repress the expression of NFKB1, REL, and IL-8 to relieve these activities. After predicting these potential multimolecule drug targets, we then perform drug mining from databases and literature review to design a multimolecule drug that acts on them. Since there is currently no drug database for drugs targeting C. difficile proteins, we explore studies that predict drugs inhibiting these potential drug targets. One study has demonstrated that CD2787 (Cwp84) is a cysteine protease [762], and the specific cysteine protease inhibitor E64 can repress the activity of CD2787 and its degrading ability toward ECM proteins [762]. Another report has suggested that chicken-produced protein-specific egg yolk antibodies (IgY) can be considered as a potential therapy for CDI [811]. These protein-specific antibodies could affect C. difficile surface proteins, including CD2787, CD0239, and CD0237, especially the purified CD0237-specific IgY. Furthermore, REP 3123 dihydrochloride (REP3123) has been reported to inhibit the toxin production and spore formation in C. difficile, thereby reducing
pathogenesis in a hamster model [812]. In the case of host cells, we have applied the host core network biomarkers of HPCNs to the database extracted from the Connectivity Map (CMap) [813]. CMap contains the genome-wide expression data of five cell lines (HL60, MCF7, PC3, SKMEL5, and ssMCF7) in response to 1327 drugs. Since both Caco-2 and MCF7 are epithelial cells, we have adopted the MCF7 data in CMap for the correlation computation. Here, we aim to increase the expression of the dysfunctional proteins (RHOA, CDC42, RAC1, HSP90B1, HSPA5, and HSP90B2P) and repress the expression of the inflammation-related proteins (NFKB1, REL, and IL-8) after administration of the drugs. The full list of potential drugs and the corresponding correlation values is given in Table 16.5. The drugs with top correlation values that satisfy all criteria (positive for dysfunctional proteins and negative for inflammation-related proteins), camptothecin and apigenin, are thus selected as potential drugs for treating the cytopathic effects induced by C. difficile. In addition to its regulatory effect on the abovementioned genes, camptothecin has anticancer, antiviral, and antifungal properties. A recent study has also investigated the antibacterial ability of camptothecin, highlighting the potential antibiotic usage of this drug [814]. Similarly, as a natural product derived from many plants, apigenin has also been reported as a novel antiresistance antibiotic [815]. The combination of camptothecin and apigenin can enhance the expression of dysfunctional proteins, inhibit inflammation-related proteins, and confer antimicrobial abilities against C. difficile. The E64 inhibitor and the FliD-specific IgY can inhibit the activities of cell surface proteins CD2787 and CD0237 of C. difficile, and REP3123 can repress the toxin production and spore formation of C. difficile. 
The combination of camptothecin and apigenin can thus upregulate the expression of dysfunctional proteins and downregulate the inflammation- and apoptosis-related proteins. Finally, we could combine these drugs as the potential multimolecule drug shown in Fig. 16.12 for the predicted multiple drug targets. This multimolecule drug could efficiently prevent and eliminate C. difficile and exert remedial effects that restore gene expression homeostasis. The cysteine protease inhibitor E64 and CD0237-specific IgY can inhibit the activities of CD2787 and CD0237, thus interfering with cell adhesion and cell surface protein maturation [762]. In addition, according to our results, the inhibition of CD2787 and CD0237 will limit toxin production and biofilm formation, reducing not only the probability of cell adhesion but also the cytotoxicity of C. difficile. REP3123 can repress the spore formation and toxin production of C. difficile. The repressed toxin production could limit the progression of pathogenesis, and the inhibition of sporulation could prevent spore-mediated reinfection. Furthermore, the combination of human drugs (camptothecin and apigenin) can promote the expression of dysfunctional proteins (RHOA, CDC42, RAC1, HSP90B1, HSPA5, and HSP90B2P) and repress inflammation-related proteins (NFKB1, REL, and IL-8) against the severe pathogenic effects induced by C. difficile. They could also provide potential antibiotic activity based on recent studies. Our results suggest that CD2356, CD0171, and CD0179 could participate in the defense mechanisms of C. difficile against oxidative stress. The cooperation among these proteins could then provide a well-designed protection against human-produced ROS. The inhibition of these antioxidative proteins can facilitate the host's ability to eliminate pathogens, and the scattered ROS can induce the rapid necrosis of pathogen cells. 
Therefore we recommend CD2356, CD0171, CD1064, and CD0179 as potential drug targets for further drug design.
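The homolog-based filtering used in this subsection (keep candidate pathogen proteins, then set aside those reported to have human homologs) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene identifiers and their homolog status are taken from the text above, whereas the function name and the data layout are hypothetical and not part of the original analysis.

```python
# Illustrative sketch only: the function name and data layout are assumptions.
# Gene identifiers and homolog status are those stated in the text [775].

# Pathogen proteins reported in the text to have human homologs.
HAS_HUMAN_HOMOLOG = {
    "CD0130",  # homolog: MAT2A
    "CD3256",  # homolog: VARS
    "CD0141", "CD2115",  # anti-ROS proteins, early stage
    "CD1690", "CD1631",  # redox homeostasis proteins, late stage
}


def filter_targets(candidates, has_human_homolog):
    """Return the candidates without a reported human homolog, in input order."""
    return [gene for gene in candidates if gene not in has_human_homolog]


# Essential survival/growth genes predicted by FBA, SA, and TraDIS.
essential_genes = ["CD0130", "CD3256", "CD1275", "CD2781"]

# Anti-ROS defense proteins discussed in the text.
anti_ros_genes = ["CD0141", "CD2115", "CD1690", "CD1631",
                  "CD2356", "CD0171", "CD0179"]

essential_targets = filter_targets(essential_genes, HAS_HUMAN_HOMOLOG)
anti_ros_targets = filter_targets(anti_ros_genes, HAS_HUMAN_HOMOLOG)

if __name__ == "__main__":
    print(essential_targets)  # ['CD1275', 'CD2781']
    print(anti_ros_targets)   # ['CD2356', 'CD0171', 'CD0179']
```

The two resulting lists match the conservative choices named in the text: CD1275 and CD2781 among the essential genes, and CD2356, CD0171, and CD0179 among the anti-ROS enzymes.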
TABLE 16.5 The full table of potential drugs and the corresponding correlation values.

| Drug | HSPA5 | RHOA | CDC42 | RAC1 | HSP90B1 | HSP90B2P | NFKB1 | REL | IL-8 |
|---|---|---|---|---|---|---|---|---|---|
| Carcinine | 0.6 | 0.08 | 0.02 | 0.15 | 0.08 | 0.02 | -0.03 | -0.04 | -0.02 |
| Carbarsone | 0.08 | 0.09 | 0.04 | 0.09 | 0.09 | 0.09 | -0.05 | -0.04 | -0.02 |
| Carisoprodol | 0.09 | 0.14 | 0.06 | 0.10 | 0.13 | 0.14 | -0.03 | -0.05 | -0.07 |
| dl-alpha Tocopherol | 0.10 | 0.12 | 0.03 | 0.10 | 0.11 | 0.16 | -0.05 | -0.04 | -0.06 |
| Apigenin | 0.15 | 0.19 | 0.07 | 0.15 | 0.14 | 0.20 | -0.04 | -0.04 | -0.07 |
| Ginkgolide A | 0.05 | 0.04 | 0.05 | 0.09 | 0.10 | 0.15 | -0.04 | -0.04 | -0.04 |
| Lysergol | 0.04 | 0.03 | 0.03 | 0.03 | 0.02 | 0.07 | -0.08 | -0.05 | -0.06 |
| Piperine | 0.03 | 0.06 | 0.02 | 0.11 | 0.07 | 0.19 | -0.04 | -0.04 | -0.02 |
| Pepstatin | 0.05 | 0.08 | 0.05 | 0.13 | 0.10 | 0.22 | -0.05 | -0.01 | 0 |
| Diazoxide | 0.06 | 0.08 | 0.02 | 0.06 | 0.06 | 0.08 | -0.02 | -0.06 | -0.09 |
| Clindamycin | 0.03 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 | -0.04 | -0.05 | -0.12 |
| Nicergoline | 0.04 | 0.08 | 0.04 | 0.08 | 0.10 | 0.11 | -0.03 | -0.05 | -0.03 |
| Hexamethonium bromide | 0.03 | 0.06 | 0.01 | 0.03 | 0.05 | 0.14 | -0.06 | -0.05 | -0.06 |
| Sulfaguanidine | 0.02 | 0.05 | 0.05 | 0.10 | 0.02 | 0.06† | 0 | -0.03 | -0.01 |
| Hydroxyzine | 0.11 | 0.10 | 0.01 | 0.08 | 0.08 | 0.08 | 0 | 0 | -0.02 |
| Procaine | 0.03 | 0.06 | 0.04 | 0.02 | 0.04 | 0.03 | -0.05 | -0.01 | 0 |
| Benzonatate | 0.05 | 0.09 | 0.08 | 0.05 | 0.09 | 0.07 | -0.02 | 0 | -0.02 |
| Imipramine | 0.09 | 0.07 | 0.05 | 0.03 | 0.10 | 0.04 | -0.04 | -0.03 | -0.03 |
| Trimetazidine | 0 | 0.01 | 0 | 0.03 | 0.01 | 0.05 | -0.04 | -0.04 | -0.04 |
| Noscapine | 0.12 | 0.11 | 0.07 | 0.05 | 0.11 | 0.06 | 0 | -0.01 | -0.08 |
| Monocrotaline | 0.09 | 0.09 | 0.05 | 0.06 | 0.07 | 0.07† | -0.03 | -0.04 | -0.09 |
| Dacarbazine | 0.04 | 0.04 | 0 | 0.03 | 0.02 | 0.02 | -0.05 | -0.06 | -0.08 |
| Cytisine | 0.07 | 0.08 | 0.05 | 0.05 | 0.06 | 0.07 | -0.03 | -0.02 | -0.03 |
| Hydrocotarnine | 0.13 | 0.12 | 0.07 | 0.07 | 0.09 | 0.07 | -0.01 | -0.04 | -0.09 |

† For the drugs from hexamethonium bromide to hydrocotarnine, the available text gives 10 HSP90B2P values and 14 NFKB1 values. The two positive entries in the NFKB1 sequence (0.06 and 0.07) are assumed to belong to HSP90B2P at the marked positions, so the HSP90B2P and NFKB1 columns for these 12 drugs are uncertain.
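The sign criterion used to screen Table 16.5 (positive correlation for every dysfunctional protein and negative correlation for every inflammation-related protein) can be illustrated with a minimal Python sketch. The correlation values below are those listed in Table 16.5 for three of the drugs; the function names, the dictionary layout, and the strict inequality are assumptions made for illustration and are not taken from the original analysis, which additionally ranks the passing drugs by correlation value.

```python
# Illustrative sketch only: names and the strict-sign rule are assumptions,
# not the published screening procedure.

DYSFUNCTION_GENES = ["HSPA5", "RHOA", "CDC42", "RAC1", "HSP90B1", "HSP90B2P"]
INFLAMMATION_GENES = ["NFKB1", "REL", "IL-8"]

# Correlation values for three drugs, as listed in Table 16.5.
CORRELATIONS = {
    "Apigenin": {
        "HSPA5": 0.15, "RHOA": 0.19, "CDC42": 0.07, "RAC1": 0.15,
        "HSP90B1": 0.14, "HSP90B2P": 0.20,
        "NFKB1": -0.04, "REL": -0.04, "IL-8": -0.07,
    },
    "Pepstatin": {
        "HSPA5": 0.05, "RHOA": 0.08, "CDC42": 0.05, "RAC1": 0.13,
        "HSP90B1": 0.10, "HSP90B2P": 0.22,
        "NFKB1": -0.05, "REL": -0.01, "IL-8": 0.0,
    },
    "Clindamycin": {
        "HSPA5": 0.03, "RHOA": 0.05, "CDC42": 0.04, "RAC1": 0.04,
        "HSP90B1": 0.04, "HSP90B2P": 0.05,
        "NFKB1": -0.04, "REL": -0.05, "IL-8": -0.12,
    },
}


def satisfies_criteria(profile):
    """True if all dysfunction genes are positive and all inflammation genes negative."""
    upregulated = all(profile[gene] > 0 for gene in DYSFUNCTION_GENES)
    repressed = all(profile[gene] < 0 for gene in INFLAMMATION_GENES)
    return upregulated and repressed


def screen_drugs(correlations):
    """Return the drugs whose correlation profile satisfies every sign criterion."""
    return [drug for drug, profile in correlations.items()
            if satisfies_criteria(profile)]


if __name__ == "__main__":
    print(screen_drugs(CORRELATIONS))  # ['Apigenin', 'Clindamycin']
```

Under the strict inequality assumed here, pepstatin is excluded because its IL-8 correlation is exactly 0. Passing the sign screen is only a necessary condition; the text selects among the passing drugs by the magnitude of their correlation values.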
FIGURE 16.12 The potential multimolecule drug for treatment of Clostridium difficile [12].
16.4 Discussion and conclusion
Using a systems biology approach to construct genome-wide GEINs via big data mining and two-sided genome-wide data identification, we have investigated the pathogenic model of CDI (Fig. 16.13). In Fig. 16.13, solid arrow lines represent pathogen infection and host cellular responses with literature support, and dashed arrow lines represent the identified/predicted responses. The red lines denote crucial epigenetic activities. In addition, the identified important epigenetic activities are labeled by stars. During the early stage of infection in Fig. 16.13A, epigenetic activities play a central role in the beginning of pathogenesis. The acetylation of HSP90B1 and the deacetylation of HSPA5, and the interaction of these HSPs with pathogen toxins (CD0660 and CD0663), could impair the chaperone activity of the host cell, resulting in the formation and accumulation of misfolded proteins. In addition, C. difficile toxins (CD0660 and CD0663) display their cytotoxicity by inactivating Rho GTPases (RHOA, CDC42) via glucosylation, causing the disturbance of actin cytoskeleton homeostasis and tight junction breakdown [751]. Another central epigenetic activity is ubiquitination. The ubiquitination-mediated endocytosis of FPR1 allows C. difficile toxins to enter host cells, and then CD0660 and CD0663 hijack a RAC1-mediated pathway to
FIGURE 16.13 Overview of the pathogenic model of Clostridium difficile infection [12]: (A) early stage and (B) late stage.
activate SYK-dependent ROS production [765,799]. ROS production is a primary response of the immune system, but the toxin-activated overproduction of ROS and free radicals could damage host lipids, proteins, and DNA. To repair the damaged DNA, SCARA3 and YY1 are activated to initiate the DNA damage response. The dysfunction of HSPs together with YY1 can trigger the autophagy process, removing misfolded proteins and ROS-induced damage [816]. As a countermeasure, host miRNA can pass through the pathogen cell wall and then repress pathogen genes, including the antioxidative proteins CD3256 and CD0130, to promote the eliminating ability of host cells. When the infection proceeds to the late stage (Fig. 16.13B), several events together result in the turning point of CDI. The acetylation of CD0663 can promote the activity of this toxin and thus increase the probability that CD0663 binds to CD46. CD46 can activate the complement system and enhance ROS production via NOX5. The complement system guides neutrophils, which can be recruited by interleukin-8 to remove toxins and pathogens [817]. In addition, FN1 can be activated by CD46 and modified by acetylation, and these interactions could enhance the activity of FN1 and its downstream inflammatory responses. Furthermore, the newly produced CD1466 displays its immunogenicity by interacting with SNW1 and EGFR, thus inducing cytokine production and inflammation. Taken together, the presence of CD1466 and CD46, and the acetylation of CD0663 and FN1, could result in enhanced oxidative stress and a severe inflammatory response. Meanwhile, the acetylation of HSPs (HSP90B1 and HSP90B2P) and their interactions with toxins (CD0663 and CD0660) could impair the chaperone activity, aggravating the accumulated ER stress [796]. These activities in turn increase the cellular stress of the host cell. 
For example, the neutrophil infiltration and the increased NADPH oxidase activity could cause tissue damage in the host, and the presence of C22orf28 reflects high ER stress [803]. On the other hand, these processes are also life threatening to C. difficile. To counteract these stresses, C. difficile can utilize numerous redox-related proteins, including superoxide dismutase CD1631, extracellular glutamate dehydrogenase CD0179, and thioredoxin CD1690, in the defense against oxidative stress. Finally, the accumulated oxidative and ER stress trigger the apoptosis process via the NF-κB complex. For C. difficile the high levels of ROS and scattered neutrophils threaten the survival of the pathogen. Therefore C. difficile transforms into endospores by activating the sporulation pathway and then lies dormant until the next infection. As discussed previously, CDI is characterized by cytoskeleton dysfunction, severe inflammation, and subsequent apoptosis. The actin cytoskeleton breakdown is mainly induced by CD0663 and CD0660, but the correlation between cytoskeleton dysfunction and apoptosis remains unclear. Here, we demonstrate that the main cause of apoptosis during CDI is the accumulation of cellular stress, including oxidative and ER stress. The emergence of oxidative and ER stress has been reported in studies using CD0660-infection models [810,818] but has attracted little attention and follow-up research. We have demonstrated that Caco-2 cells can activate the DNA damage response and autophagy to counteract these stressors in the early stage of CDI. Unfortunately, the accumulated stress and tissue damage caused by severe inflammation could induce host cells to undergo apoptosis in the late stage of infection. There has been a long-standing debate about whether CD0660 or CD0663 is responsible for the cytotoxicity in host cells. Some early studies have reported that only CD0660 is
essential for the virulence of C. difficile and that a number of CDI cases are caused by CD0663-negative and CD0660-positive strains [819]. However, a later study using toxin knockdown technology on C. difficile 630 has indicated that both CD0660 and CD0663 are responsible for disease [820]. Our identified results also support this conclusion. During the early stage of CDI, CD0660 functions prior to CD0663 and triggers numerous host responses such as ROS production and chaperone dysfunction. In the late stage, acetylation enhances the activity of CD0663, which takes the place of CD0660, inducing a severe inflammatory response and aggravating ER stress. The CD0663-activating ability of CD0660 identified in our toxin regulation pathways is also consistent with this temporal relationship. Without CD0660, CD0663 may not be sufficient to initiate pathogenesis in host cells in the early stage of CDI, and the low expression level of CD0660 cannot trigger the subsequent apoptosis in the absence of CD0663 during the late stage of infection. In contrast to the pathogenic effects on the host, the intraspecies interactions and cellular mechanisms inside C. difficile are largely unknown. We have found that C. difficile could generate redox-balancing proteins against oxidative stress and utilize toxin production and bacterial reproduction as offensive mechanisms at the early stage of CDI. During the late stage of infection, our results have indicated that C. difficile could utilize anti-ROS proteins, including CD0179, CD1631, and CD1690, as well as the DNA damage response to counteract the oxidative stress presented by host cells. The decreased activities of toxin production, bacterial cell growth, and endospore formation also demonstrate that C. difficile can actively transform into endospores to leave the infection site. 
Finally, the molecular mechanisms of progression from the early stage to the late stage of CDI are investigated, and the potential drug targets and corresponding multimolecule drug are also proposed for therapeutic design. The crucial events of progression in CDI, such as the acetylation of CD0663 and FN1, the chaperone dysfunction of HSPs, and the pathogen silencing induced by host miRNA, are mainly caused by the epigenetic regulations identified in the proposed systems biology method. These results suggest that epigenetic regulation plays an important role in the progression of infection since these cellular activities could change the cellular functions of host cells in a more rapid and efficient manner than the adjustment of gene regulation. To the best of our knowledge, there are few studies focusing on the epigenetic modulation of pathogenic and offensive mechanisms in host cells infected by C. difficile. In addition, no existing whole-epigenomic data of the host cells can be considered as the basis of such studies. We have shown that epigenetic regulation plays an important role in host/pathogen cross-talk mechanisms in CDI, which could provide a novel direction for deeper studies on molecular mechanisms for drug target predictions and multiple drug discovery. With the identification of GEINs and HPCNs our knowledge of the bioinformatics of C. difficile and the core proteins is bound to increase, driving further experimental hypotheses and investigation directions for cross-talk mechanisms, including the PPIN and gene regulation network (GRN) between host and C. difficile during infection. These further studies will complement and complete our genetic-and-epigenetic network, providing a novel basis for developing the whole-genome cellular network of CDI.
The upper panel (A) displays the dominant role of epigenetic activity and the cross-talk interplay mechanisms between Caco-2 cells and C. difficile during the early stage of infection. 
The lower panel (B) shows the molecular and epigenetic activities during the
late stage of CDI and the outcome of both organisms. The cross marks signify the dysfunction or the breakdown of proteins and structures accordingly; the solid arrow lines represent the protein interaction/cellular response with literature support; the dashed arrow lines denote the identified/predicted response; the red lines denote crucial epigenetic activities; the red dots in the ER of the Caco-2 cell signify misfolded proteins; and the definitions of epigenetic activities are the same as those defined in Fig. 16.8. In panel (A), the epigenetic activities dominate the initialization of host responses toward bacterial invasion. The acetylation/deacetylation of HSPs leads to the formation of misfolded proteins, the ubiquitination of FPR1 allows pathogen toxins to enter Caco-2 cells via endocytosis and then trigger ROS overproduction, and the glucosylation of GTPases impairs the homeostasis of the cytoskeleton. In this situation, Caco-2 cells utilize the DNA damage response and autophagy as defense mechanisms and eliminate pathogens via ROS with the promotion of miRNA silencing. Between the two stages, the acetylation of CD0663 and FN1 and the presence of CD46 and CD1466 serve as the turning points of the progression of infection, resulting in enhanced ROS production and severe inflammation as shown in panel (B). The acetylation activity results in the dysfunction of HSPs and the assembly of the NF-κB complex, inducing not only the production of IL-8, which recruits neutrophils, but also the apoptosis process of host cells. To avoid the recruited phagocytic neutrophils and human-produced ROS, C. difficile cells activate the sporulation pathway to transform into endospores.
CHAPTER 17
Investigating the common pathogenic mechanism for drug design between different strains of Candida albicans infection in OKF6/TERT-2 cells by comparing their genetic and epigenetic host/pathogen networks: Big data mining and computational systems biology approaches
17.1 Introduction
Candida albicans is considered a commensal fungus. Nonetheless, it can become an opportunistic pathogenic fungus depending on the host’s condition [821]. This pathogen is a Gram-positive fungus that adapts to both anaerobic and aerobic conditions. Moreover, C. albicans exists in the oral and vaginal mucosa and the gastrointestinal tract of many individuals [821], especially on oral epithelial cells. Common diseases include pseudomembranous candidiasis and denture-associated erythematous candidiasis. Pseudomembranous candidiasis is also known as thrush, a condition that mainly affects infants and immunocompromised patients. C. albicans infection occurs not only in the patients mentioned above but also in up to 90% of HIV-infected patients. In addition, receiving cytotoxic chemotherapy leads to severe immunosuppression and promotes mucosal damage. Meanwhile, susceptibility to C. albicans is associated with factors such as a disturbed balance of the oral bacterial community, the failure of saliva to eliminate the pathogen, and the long-term use of broad-spectrum antibiotics [822]. Furthermore, both C. albicans and hosts require some resources to support vital functions,
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00013-4
© 2021 Elsevier Inc. All rights reserved.
leading to competition under the conditions of resource limitation in the infected hosts. C. albicans will further overgrow on the oral epithelial cells and cause oral candidiasis. Denture-associated erythematous candidiasis can also occur on oral prosthetic devices among elderly people. Because of poor cleaning, C. albicans can also attach to medical equipment and oral prosthetic devices in the hospital. Patients may then become infected, which is associated with poor hygiene or mucosal damage in the oral cavity. SC5314 and WO-1 are two common strains of C. albicans employed in the laboratory for clinical research. Compared to SC5314, WO-1 switches from the white cell form to the opaque cell form at high frequency [823]. In addition, the genome of C. albicans SC5314 has been sequenced in previous studies, so C. albicans SC5314 is frequently used as a wild-type control in laboratories [403]. Although previous studies could not indicate why C. albicans separated into different strains, strains SC5314 and WO-1 are estimated to be separated from each other by a divergence time of 1 million years [824]. Both strains may persist in the human body through continued evolution or adaptation to the host microenvironment. The OKF6/TERT-2 cell line, used as a model of human oral epithelial cells, is derived from human oral keratinocytes. In the past, studies usually used the TR146 cell line for experiments, including pathogen infection [825]. However, TR146 is not considered to represent human oral keratinocytes or to be a true model. With advances in biotechnology, biological model systems have matured. The OKF6/TERT-2 cell line can be used in a 3D system that resembles the commercially available system based on the cell line TR146 [826]. It forms a multilayer epithelial structure whose layers are organized similarly to the cells in native oral mucosa. 
Therefore it better represents normal submucosa and true human mucosa. In the immune system, epithelial cells form the first line of defense against bacterial infection. Nonetheless, during C. albicans infection, this epithelial surface layer can be disrupted by the pathogen's hyphae or cell surface proteins, allowing C. albicans to enter the oral mucosa and activate oral mucosal immune cells such as macrophages, neutrophils, and dendritic cells. Moreover, some cell surface proteins of C. albicans can degrade host cell surface proteins, so that the whole C. albicans cell can invade the host cell. C. albicans infection often arises after disturbance of the normal oral microbiome, for example in immunocompromised patients such as HIV-infected patients, or after broad-spectrum antibiotic treatment. Once the immune system is weakened or the oral microbiota is disturbed, C. albicans can colonize oral epithelial cells, grow hyphae that penetrate the cells, and produce pathogenic factors that degrade the barrier. However, a single pathogenic mechanism of C. albicans cannot induce endocytosis by itself, although endocytosis may indirectly result in invasion of the host cell. The major pathogenic factors of C. albicans fall into two groups. One group comprises the cell wall proteins orf19.1816 (ALS3) and orf19.1321 (HWP1). Previous studies indicate that orf19.1816 induces endocytosis by binding host cell receptors such as ERBB2, HSP90B1, CDH1, and CDH2, and it is therefore regarded as an initiator of infection [827-829]. Moreover, orf19.1321 plays an important role in cell adhesion and biofilm formation [830]. The other group comprises secreted pathogenic factors such as orf19.5714 (SAP1), orf19.3708 (SAP2), orf19.6001 (SAP3), orf19.5716 (SAP4), orf19.5585 (SAP5), and orf19.5542 (SAP6).
These pathogenic factors are reported to induce an inflammatory response and the degradation of host cell surface
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
proteins. These pathogenic factors recruit neutrophils and macrophages to eliminate the pathogen and induce a critical inflammatory response. Moreover, not only pathogenic factors but also the hyphal growth of C. albicans can trigger an inflammatory response. The morphological transition of C. albicans converts the yeast form into a filamentous form, namely, hyphae. The hyphae of C. albicans have been identified as a virulence factor in previous studies [7]. Further, C. albicans searches for nutrient sources or metal ions through hyphal growth, so that hyphae actively penetrate the host cell and induce an inflammatory response in it. In addition, C. albicans is a polymorphic pathogen that occurs as yeast, pseudohyphae, true hyphae, and biofilm [4]. Yeast cells are further differentiated into white cells and opaque cells. Because C. albicans is polymorphic, some of its transcription factors (TFs) must regulate these forms. Two such TFs are orf19.610 (EFG1) and orf19.4433 (CPH1). According to previous studies, orf19.610 is the most important TF and regulates morphogenesis, including hyphae, biofilm, and white cell formation [831]. Moreover, orf19.610 also modulates the switch from white cell to opaque cell. Therefore C. albicans WO-1 may be regulated by this TF. On the other hand, orf19.4433 contributes to pheromone-stimulated biofilm formation and galactose metabolism. The pheromone-stimulated biofilm in turn allows C. albicans to produce the next generation [4]. Although C. albicans has been investigated in numerous previous studies, interspecies cross-talk mechanisms, apart from those mentioned above, remain less explored. Interspecies mechanisms have mostly been discussed between C. albicans and other animals, such as zebrafish in the previous chapters. Nonetheless, the importance of host-pathogen interactions was recently highlighted in a previous study [832].
In that study, dynamic network modeling, protein interaction databases, and dual transcriptome data from zebrafish and C. albicans during infection were used to infer infection-activated host-pathogen dynamic interaction networks [832]. The study supplied a host/pathogen protein-protein interaction (PPI) model of zebrafish and C. albicans for understanding the innate and adaptive networks of zebrafish, but zebrafish physiology still differs from that of humans. In addition, earlier human cell culture techniques, such as those based on TR146, were less mature, so it was difficult to simulate a true human model. Pathogen control is a difficult problem in addition to cell culture technology. Because C. albicans WO-1 is susceptible to its microenvironment, it is rarely used in the laboratory. Although biotechnology has progressed gradually and cell culture techniques are constantly updated, interspecies infection is still difficult to investigate. In this case the genome-wide genetic-and-epigenetic interspecies networks (GEINs) established by a systems biology approach might offer a systematic view of host cellular functions under pathogen infection. Therefore we should use human cells as a true model and employ systems biology to comprehensively understand the infection process. Furthermore, other molecular mechanisms that modulate gene expression are the epigenetic regulations by microRNAs (miRNAs) and long noncoding RNAs (lncRNAs). MiRNAs are short noncoding RNA molecules (21-23 nucleotides). By binding their target mRNAs through the RNA-induced silencing complex, miRNAs silence host cell genes and slow cellular processes. Surprisingly, recent studies suggest that host cells also employ miRNA silencing to repress C. albicans genes [833], indicating that miRNAs play an important role in both pathogen and host gene modulation.
Moreover, lncRNAs have also been shown in recent years to play an important role in the epigenetic regulation of host cells. In controlling various cellular
responses, lncRNAs could participate in gene regulation in a manner similar to miRNAs but through a more complicated process. However, recent research does not suggest that lncRNAs modulate fungal genes to control pathogen responses. Furthermore, we also consider other epigenetic modifications, such as histone modification and DNA methylation, that shape cellular responses to pathogen invasion and host defense. These activities alter the cross-talk between pathogen infection and host cell defense. To investigate the common and specific host-pathogen cross-talk mechanisms and epigenetic activities that contribute to infection progression for different strains of C. albicans, we first identify the cross-talk GEINs between human oral epithelial cells (OKF6/TERT-2 cells) and C. albicans SC5314 as well as C. albicans WO-1. Next, we extract the host-pathogen core networks (HPCNs) from the GEINs and project the HPCNs onto KEGG pathways to discover the core pathways involved in the common and specific cellular responses of host and pathogen for the different strains during the infection process. In addition, we discuss the offense and defense mechanisms between the host and the different strains of C. albicans and identify some important biomarkers for the pathogenesis of C. albicans infection. In this chapter, we indicate that orf19.5034 (YBP1) has anti-ROS (reactive oxygen species) ability, and that orf19.939 (NAM7), orf19.2087 (SAS2), orf19.1093 (FLO8), and orf19.1854 (HHF22) play an important common role in hyphal growth and pathogenesis in the infection of C. albicans SC5314 and C. albicans WO-1. Moreover, orf19.5585 (SAP5), orf19.5542 (SAP6), and orf19.4519 (SUV3) promote biofilm formation. In addition, orf19.7247 (RIM101) coordinates other pathogen proteins in the degradation of the host cell protein CDH1.
Further, previous studies have indicated that orf19.1816 (ALS3), orf19.610 (EFG1), orf19.1321 (HWP1), orf19.4433 (CPH1), and orf19.723 (BCR1) play important roles in endocytosis and morphological transformation, which is confirmed by our results [4,411,830,831]. On the basis of our results, we propose the above pathogen proteins as potential drug targets because of their crucial roles in the mechanisms common to different strains of C. albicans. Multiple-molecule drugs such as Terbinafine, Cerulenin, Tunicamycin, Tetrandrine, and Tetracycline are therefore introduced to target multiple potential drug targets for the therapeutic treatment of different strains of C. albicans.
17.2 Materials and methods
17.2.1 Overview of the construction of genetic and epigenetic interspecies networks and host-pathogen core cross-talk networks in OKF6/TERT-2 cells during the infection of C. albicans SC5314 and C. albicans WO-1
To investigate the common and specific pathogenic mechanisms of C. albicans SC5314 and C. albicans WO-1 during the infection of human oral epithelial OKF6/TERT-2 cells, we identify the cross-talk GEINs and extract the HPCNs between human and C. albicans SC5314 as well as C. albicans WO-1, respectively. Fig. 17.1 shows the flowchart for constructing the HPCNs between host and pathogen during the infection of C. albicans SC5314 and C. albicans WO-1 via big data mining, model construction, and network identification, in order to investigate pathogenic mechanisms and infer potential drug targets. From Fig. 17.1, we may consider the constructions of GEINs and HPCNs under the
FIGURE 17.1 The flowchart of the systems biology method applied to construct GEINs for extracting HPCNs to discover the common and specific pathogenic mechanisms during the infection of Candida albicans SC5314 and C. albicans WO-1 for drug targets and potential common molecule drugs [13]. The gray blocks indicate the big data for constructing candidate GEIN and the blue blocks represent the input information, including microarray raw data and the surveyed literature for drug design in this chapter; the blocks with gray frame denote systems biology approach utilized to construct real cross-talk GEINs in the infection of C. albicans SC5314 and C. albicans WO-1 and then extract the HPCNs of two replicates via PNP method; the white rounded rectangular blocks are the corresponding results obtained from these processes and the inferred drug targets and proposed common multiple-molecule drugs. GEINs, Genetic-and-epigenetic interspecies networks; PNP, principal network projection.
following steps:
(1) big data mining and data preprocessing of host/pathogen gene/miRNA expression;
(2) the construction of the candidate GEIN, which consists of candidate host/pathogen intraspecies PPI networks (PPINs), candidate interspecies PPINs between host and pathogen, candidate host/pathogen gene/miRNA regulation networks (GRNs), candidate miRNA regulation networks of host miRNAs on host/pathogen genes, and candidate lncRNA regulation networks of host lncRNAs on host/pathogen genes;
(3) the network identification process for detecting the real interspecies GEINs via the system identification method and the system order detection scheme, which prune the false positives in the candidate interspecies GEINs by using the two-sided genome-wide NGS (next-generation sequencing) data of OKF6/TERT-2 cells and C. albicans during infection (see Sections 17.6.2-17.6.4);
(4) the extraction of the HPCNs by applying the principal network projection (PNP) method to the real interspecies GEINs (see Section 17.6.5).
Therefore we can project the true HPCNs onto KEGG pathways to obtain the core cross-talk pathways, compare them to investigate the crucial common and specific pathogenic mechanisms that contribute to the infection progression of C. albicans SC5314 and C. albicans WO-1, respectively, and infer the common network biomarker as the potential multiple drug targets for common multiple drug design.
17.2.2 Data preprocessing of microarray data for human and pathogen
To identify the cross-talk activities between host and pathogen during infection, it is necessary to measure the gene expression of host and pathogen simultaneously. C. albicans SC5314 and C. albicans WO-1 must each infect human cells separately. C. albicans WO-1 is difficult to control and extremely sensitive to an anaerobic environment, so it easily changes its morphology from white cell to opaque cell. In previous studies, the catalog of transcriptional changes was incomplete owing to the limitations of microarrays, namely a limited dynamic range and poor sensitivity for low-abundance transcripts. In the present study, the raw data from a previous study of the transcription profiles of both immortalized oral epithelial cells (OKF6/TERT-2 cell line) and C. albicans form the only available dataset that provides sufficient information for constructing the candidate GEIN [834]. The dataset not only includes simultaneous human and pathogen gene expression but also contains different species. The two-sided NGS data published by Liu et al. have two parts [834]. The first part includes the mRNA/miRNA expression profiles of two biological replicates of the OKF6/TERT-2 cell line at 90, 300, and 480 min postinfection with C. albicans SC5314. The OKF6/TERT-2 cells were cultured in Dulbecco's modified Eagle's medium without serum at 37 °C before infection. The second part includes the mRNA expression profiles of two biological replicates of C. albicans SC5314 in OKF6/TERT-2 cells at 90, 300, and 480 min postinfection (GEO accession number GSE56093; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE56093). Likewise, the NGS data of C. albicans WO-1 were obtained with the same procedure as for C. albicans SC5314. Strain SC5314 was originally recovered from a patient with generalized candidiasis.
On the other hand, strain WO-1 was isolated from the blood and lungs of a patient suffering from systemic candidiasis. The sequencing platform used for both host and pathogen is Illumina HiSeq 2000. The expression data were verified by qRT-PCR.
Meanwhile, the cubic spline method is needed for data interpolation and extrapolation, both to obtain a sufficient number of data points over these time courses and to obtain information after 8 h postinfection, so as to avoid overfitting when performing the system identification process that constructs the real GEINs from the corresponding two-sided expression data. Hence, the cubic spline method is employed to interpolate and extrapolate the two-sided expression data from 0 min to 12 h.
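The interpolation and extrapolation step can be illustrated with a minimal sketch. The chapter does not state which boundary conditions or software were used, so the natural boundary condition (zero second derivative at the end knots), the function name natural_cubic_spline, and the expression values below are assumptions made only for illustration. Outside the measured range, the sketch simply continues the cubic of the nearest end segment.

```python
def natural_cubic_spline(xs, ys):
    """Return f(x) for the natural cubic spline through (xs, ys).

    Assumption: natural boundary conditions; the source only says
    "cubic spline". Extrapolation reuses the end-segment polynomial.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for the second derivatives m[0..n].
    a = [0.0] * (n + 1)
    b = [1.0] * (n + 1)
    c = [0.0] * (n + 1)
    d = [0.0] * (n + 1)
    for i in range(1, n):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n + 1):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * (n + 1)
    m[n] = d[n] / b[n]
    for i in range(n - 1, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        # Locate the segment; points outside the knots use an end segment.
        i = 0
        while i < n - 1 and x > xs[i + 1]:
            i += 1
        t0 = xs[i + 1] - x
        t1 = x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return evaluate


# Measured time points from the text (minutes postinfection);
# the expression values are hypothetical.
times = [90.0, 300.0, 480.0]
expression = [2.0, 5.0, 3.0]
spline = natural_cubic_spline(times, expression)
# Resample every 30 min from 0 min to 12 h (720 min).
resampled = [(t, spline(float(t))) for t in range(0, 721, 30)]
```

With only three measured time points per profile, values outside 90 to 480 min are extrapolations and should be read with corresponding caution.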
17.3 Investigating pathogenic mechanism of C. albicans infection by comparing genetic and epigenetic interspecies networks
17.3.1 The identified interspecies genetic and epigenetic interspecies networks under the infection of Candida albicans SC5314 and Candida albicans WO-1
The interspecies GEINs of the two replicates of C. albicans SC5314 infection of OKF6/TERT-2 cells are shown in Fig. 17.A3A and B, respectively, visualized with the network software Cytoscape [88]. Similarly, the interspecies GEINs of the two replicates of C. albicans WO-1 infection of OKF6/TERT-2 cells are shown in Fig. 17.A3C and D, respectively. The numbers of identified nodes and edges are shown in Tables 17.1 and 17.2, respectively. For the two replicates of C. albicans SC5314, replicate 1 has more nodes than replicate 2, and the identified edges in Table 17.2 show important differences between infected individuals. In contrast, for the two replicates of C. albicans WO-1, the number of nodes in replicate 1 is similar to that in replicate 2, and the identified edges in Table 17.2 differ little between infected individuals. Overall, the infection progression of C. albicans WO-1 is more stable than that of C. albicans SC5314.
TABLE 17.1 Information about the numbers of nodes of candidate interspecies genetic-and-epigenetic interspecies networks (GEINs) and real interspecies GEINs by the proposed system identification method in the infection of Candida albicans SC5314 and C. albicans WO-1 of two replicates [13].
Nodes        Candidates_SC5314  SC5314_R1  SC5314_R2  Candidates_WO-1  WO-1_R1  WO-1_R2
HP           23,217             15,817     9979       20,694           14,682   15,621
HR           3025               1960       1141       2488             1719     1743
HT           5649               855        357        4696             384      427
HM           607                385        370        412              331      342
HL           333                166        101        258              131      129
PP           3766               2414       2195       3736             3413     3411
PT           487                254        255        483              253      280
Total nodes  37,229             21,851     14,398     32,767           20,913   21,953
SC5314_R1 and SC5314_R2 represent SC5314 of replicate 1 and SC5314 of replicate 2, respectively. Similarly, WO-1_R1 and WO-1_R2 denote WO-1 of replicate 1 and WO-1 of replicate 2, respectively. HP denotes host protein (excluding host receptor and host TF); HR, HT, HM, and HL represent host receptor, host TF, host miRNA, and host lncRNA, respectively. PP means pathogen protein excluding pathogen TF and PT represents pathogen TF.
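The statement that WO-1 infection is more stable between replicates than SC5314 infection can be checked with simple arithmetic on the "Total nodes" row of Table 17.1. The sketch below is only an illustration; the relative-difference measure and the names used are assumptions, not a statistic defined in the chapter.

```python
# "Total nodes" of the identified GEINs from Table 17.1: (replicate 1, replicate 2).
TOTAL_NODES = {
    "SC5314": (21851, 14398),
    "WO-1": (20913, 21953),
}


def relative_difference(a, b):
    """Absolute difference between two counts, relative to the larger one."""
    return abs(a - b) / max(a, b)


for strain, (r1, r2) in TOTAL_NODES.items():
    print(f"{strain}: replicate difference = {relative_difference(r1, r2):.1%}")
```

Under this measure the two SC5314 replicates differ by roughly a third of their nodes, whereas the two WO-1 replicates differ by about 5%.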
TABLE 17.2 Information about the identified number of edges of candidate interspecies genetic-and-epigenetic interspecies networks (GEINs) and real interspecies GEINs by the proposed system identification method in the infection of Candida albicans SC5314 and C. albicans WO-1 of two replicates [13].
Edges   Candidates_SC5314  SC5314_R1  SC5314_R2  Candidates_WO-1  WO-1_R1  WO-1_R2
HT-HG   152,491            8036       5039       141,001          7963     7638
HT-HM   1347               154        21         901              62       42
HT-HL   271                162        98         200              129      124
HL-HG   37                 1          0          35               0        0
HM-HG   170,671            2401       2408       118,017          2799     3744
HM-HL   130                4          3          78               2        5
HM-HM   45                 6          1          21               0        4
HT-PG   21,328             374        242        20,177           586      489
HM-PG   23,730             394        328        16,639           543      771
HL-PG   1                  0          0          1                0        0
PT-HM   48                 9          1          29               0        1
PT-HG   8910               208        169        8726             220      248
PT-PG   86,491             7225       5112       85,211           5723     7167
HG—HG   6,449,171          30,892     24,665     5,521,448        119,623  127,687
HG—PG   1,615,845          10,248     7540       1,453,353        45,214   47,146
PG—PG   132,600            4830       3988       131,485          33,643   34,707
“-” Transcriptional or posttranscriptional regulation; “—” protein-protein interaction; HG, host genes; PG, pathogen genes.
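To give a feel for how strongly the identification step prunes the candidate network, the fraction of candidate edges retained in each replicate can be computed directly from Table 17.2. The sketch below copies three rows of the table; the dictionary layout and the function name retained_fraction are illustrative assumptions.

```python
# Selected rows of Table 17.2: (candidate edges, replicate 1, replicate 2).
# "HG PPI" and "PG PPI" stand for the host-host and pathogen-pathogen
# protein-protein interaction rows of the table.
EDGES = {
    "SC5314": {
        "HT-HG": (152491, 8036, 5039),
        "HG PPI": (6449171, 30892, 24665),
        "PG PPI": (132600, 4830, 3988),
    },
    "WO-1": {
        "HT-HG": (141001, 7963, 7638),
        "HG PPI": (5521448, 119623, 127687),
        "PG PPI": (131485, 33643, 34707),
    },
}


def retained_fraction(strain, edge_type):
    """Fraction of candidate edges kept in replicate 1 and replicate 2."""
    candidate, rep1, rep2 = EDGES[strain][edge_type]
    return rep1 / candidate, rep2 / candidate


for strain in EDGES:
    for edge_type in EDGES[strain]:
        f1, f2 = retained_fraction(strain, edge_type)
        print(f"{strain} {edge_type}: R1 {f1:.2%}, R2 {f2:.2%}")
```

For every row shown, only a small fraction of the candidate edges survives identification, and the host PPI row keeps a larger share in the WO-1 replicates than in the SC5314 replicates.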
Therefore, compared with C. albicans WO-1, infection with C. albicans SC5314 more readily produces differences between individuals. To further characterize gene functions in OKF6/TERT-2 cells during infection by the two strains of C. albicans, we analyzed the specific host cellular functions and the functional abundance of related pathways for the conserved host target genes among the two replicates on the basis of Gene Ontology (GO) terms, using the DAVID analysis [835]. In Table 17.3, the infection progression of C. albicans SC5314 is characterized by the redistribution of the epithelial cell barrier, cell shape, and cell adhesion, which can activate the oxidation response associated with the inflammatory response and metal binding; these could play an important part in the struggle for nutrients and metals between host and pathogen. The infection progression of C. albicans WO-1 is similar to that of C. albicans SC5314. In addition to host gene functions, pathogen gene functions were also examined. Based on the Candida Genome Database, we performed the specific pathogen function and functional abundance analysis of the conserved pathogen target genes among the two replicates based on GO terms, using the GO Term Finder. In Table 17.4 the gene functions of C. albicans
TABLE 17.3 The specific and common host cellular functions and functional abundance analysis of related pathways of the conserved host target-genes among two replicates in the infection of Candida albicans SC5314 and WO-1 on the basis of GO (Gene Ontology) terms by applying the DAVID analysis [13].
Strain  Category          Term                                        P-value
SC5314  GOTERM_MF_DIRECT  GO:0016712 oxidoreductase activity (0.47%)  0.00298310389230338
SC5314  GOTERM_MF_DIRECT  GO:0003676 nucleic acid binding (5.5%)      0.0129970927095585
SC5314  UP_SEQ_FEATURE    Metal ion-binding (0.43%)                   0.0176534099242106
SC5314  GOTERM_BP_DIRECT  GO:0016338 cell-adhesion (0.3%)             0.0181497359081671
SC5314  KEGG_PATHWAY      hsa03030:DNA replication (0.39%)            0.034840174
WO-1    GOTERM_MF_FAT     GO:0008135 nucleic acid binding (0.89%)     0.00700213339377146
WO-1    UP_SEQ_FEATURE    Metal ion-binding (0.45%)                   0.00288724333819777
WO-1    KEGG_PATHWAY      hsa03030:DNA replication (0.45%)            0.015170596995848
WO-1    SP_PIR_KEYWORDS   Cell adhesion (2.64%)                       0.0332759579717729
WO-1    GOTERM_MF_FAT     GO:0016651 oxidoreductase activity (0.7%)   0.0337336128643766
SC5314 are characterized by the epithelial cell barrier, so that C. albicans SC5314 can undergo morphological transition through structural molecule activity and molecular function regulation. Besides, C. albicans SC5314 can invade oral epithelial cells via induced endocytosis and active penetration, which rely on hydrolase activity, protein binding, and structural molecule activity. The gene functions of C. albicans WO-1 are the same as those of C. albicans SC5314. Because the interspecies GEINs are very complicated, it is hard to investigate the accurate common and specific pathogenic mechanisms directly from these networks. We therefore used the PNP approach (see Section 17.6.5) to extract the core of the interspecies GEINs and obtain the corresponding host-pathogen core networks (HPCNs) in Figs. 17.A4 and 17.A5 (see Appendix) from the GEINs of OKF6/TERT-2 cells during the infection of C. albicans SC5314 and C. albicans WO-1, respectively. With this approach, we can further investigate the common and specific pathogenic mechanisms of C. albicans infection. The C. albicans SC5314 infection is characterized by the transformation of host cell shape, which results in GTPase activity because pathogen proteins adhere to the host cell surface. When C. albicans SC5314 induces endocytosis, the cytoskeleton and cytoplasm of the host cell change, leading to DNA replication and nucleic acid binding that activate gene expression. In addition, metal ion-binding ability plays a crucial role for both human and pathogen because of its function in the search for metallic nutrients. Through this process, the host cell also produces ROS-related molecules, although the ions are toxic. Applying the ROS-related molecules also leads to
TABLE 17.4 The specific and common pathogen functions and functional abundance analysis of related pathways of the conserved pathogen target-genes among two replicates in the infection of Candida albicans SC5314 and WO-1 on the basis of GO (Gene Ontology) terms by applying the CGD GO Term Finder analysis [13].
Strain  GOID   GO term                              P-value
SC5314  5198   Structural molecule activity (6.8%)  3.18E-19
SC5314  5515   Protein binding (12.2%)              8.85E-15
SC5314  16740  Transferase activity (15.6%)         4.38E-10
SC5314  98772  Molecular function regulator (5.7%)  2.53E-8
SC5314  16787  Hydrolase activity (16.3%)           5.32E-05
WO-1    5515   Protein binding (12.2%)              2.14E-17
WO-1    16787  Hydrolase activity (17.7%)           1.13E-14
WO-1    16740  Transferase activity (15.8%)         6.16E-13
WO-1    98772  Molecular function regulator (5.7%)  7.02E-10
WO-1    5198   Structural molecule activity (5.4%)  4.98E-06
eliminating C. albicans SC5314. Similarly, the C. albicans WO-1 infection is characterized by the transformation of host cell shape, which results in GTPase activity because pathogen proteins adhere to the host cell surface. When C. albicans WO-1 induces endocytosis, the cytoskeleton and cytoplasm of the host cell change, leading to DNA replication and nucleic acid binding that activate gene expression. In addition, metal ion-binding ability plays a crucial role for both human and pathogen because of its function in the search for metallic nutrients. Through this process, the host cell also produces ROS-related molecules, although the ions are toxic. Applying the ROS-related molecules also leads to eliminating C. albicans WO-1. The pathogenic function of C. albicans SC5314 is characterized by the alteration of pathogen cell morphology, including structural molecule activity, molecular function regulation, and transferase activity. C. albicans is also affected by transferase activity, which produces more hyphal growth, so hyphal growth does not necessarily require C. albicans TFs. Moreover, hydrolase activity leads to ion production by modulating the pathogen cell. Similarly, the function of C. albicans WO-1 is characterized by the alteration of pathogen cell morphology, including structural molecule activity, molecular function regulation, and transferase activity. C. albicans WO-1 also has hydrolase activity and transferase activity. Moreover, adaptation to the environment triggers additional cellular functions such as the transformation to the opaque cell.
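The P-values in Tables 17.3 and 17.4 come from GO term enrichment tests run in DAVID and the CGD GO Term Finder. As a rough illustration of what such a P-value measures, the sketch below computes a one-sided hypergeometric tail probability. This is a textbook form of the test and not necessarily the exact statistic either tool reports (DAVID, for example, reports a modified Fisher exact score); the function name and the gene counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 6000 genes in the background, 400 annotated with a
# term, 300 conserved target genes, 40 of which carry the term.
p_value = hypergeom_enrichment_p(6000, 400, 300, 40)
print(f"enrichment P-value = {p_value:.3g}")
```

A small P-value indicates that the term is annotated to more of the selected genes than would be expected by chance given the background.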
17.3.2 The host-pathogen core cross-talk networks during the infection of C. albicans SC5314 and C. albicans WO-1
Applying the PNP approach to the interspecies GEINs in Fig. 17.A3A and B, we evaluate the projection value of each node by Eq. (17.A40) to build the HPCN. Host proteins with the top 5000 projection values and pathogen proteins with the top 1500 projection values, based on intraspecies ranking in both replicates, together with their connected genes/miRNAs/lncRNAs, are selected as core network nodes of the interspecies GEINs of the two replicates. Since the interspecies GEINs identified in Fig. 17.A3A and B come from two biological replicates of the same cell line, the differential regulations and interactions identified can be regarded as the adaptability of cells confronting stimulus and stress in different replicates. For more complete information, the combination of the interactions and regulations in the two replicates is viewed as the real interspecies GEIN, as shown in Fig. 17.A3E. Next, we extract core nodes from the real interspecies GEIN in Fig. 17.A3E by the PNP approach in Section 17.6.5 to construct the HPCN for the infection of C. albicans SC5314, as shown in Fig. 17.A4. Likewise, the HPCN of C. albicans WO-1 is constructed by the same procedure and shown in Fig. 17.A5. Comparing Figs. 17.A4 and 17.A5, the number of core proteins in the HPCN during the infection of C. albicans WO-1 is higher than that in the HPCN of C. albicans SC5314 infection. Because C. albicans SC5314 infection produces larger individual differences in host cells, the projection values differ more between replicates. In contrast, C. albicans WO-1 infection of OKF6/TERT-2 cells mostly triggers cellular functions through the same molecules, so a given molecule tends to have a high projection value in both replicates.
Hence, only the pathogen proteins within the top 1500 projection values and the host proteins within the top 5000 projection values are selected as core network nodes of the HPCNs in the infection of C. albicans SC5314. To further investigate the common and specific genetic and epigenetic progression mechanisms between C. albicans SC5314 and C. albicans WO-1 from Figs. 17.A4 and 17.A5, we project the HPCNs onto KEGG pathways and construct host core signaling pathways for the different strains by selecting core membrane proteins, including core receptors, core TFs, core proteins, and their regulatory miRNAs and lncRNAs. Likewise, we construct pathogen core signaling pathways for the different strains by selecting cell wall core proteins, signal transmission core pathogen proteins, and core pathogen TFs. However, these core proteins alone are not sufficient to construct a complete signal transduction pathway. We still need to consider the role of epigenetic modifications such as acetylation, methylation, ubiquitination, and phosphorylation. These modifications can contribute to the basal levels χ_i^H and χ_j^P in the host and pathogen PPIN in Eqs. (17.A1) and (17.A2), respectively, because the basal levels are used to model unknown interactions other than those included in these equations. When the PPI basal level of a core protein exceeds a threshold during infection progression, we speculate that the protein may be affected by an epigenetic modification such as acetylation, methylation, ubiquitination, or phosphorylation. Similarly, core genes whose basal level exceeds a threshold during infection progression are speculated to be influenced by DNA methylation. The core pathways of each strain in infection progression are described in the following and shown in Figs. 17.2 and 17.3.
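The selection rules described above can be sketched in a few lines. The sketch assumes that "in all two replicates" means a node must rank within the top k in both replicates (an intersection), and that the real GEIN is the union of the edges identified in the two replicates. The function names, the toy projection values, and the basal-level threshold are hypothetical, and the projection values themselves would come from Eq. (17.A40), which is not reproduced here.

```python
def top_k(projection, k):
    """Names of the k nodes with the largest projection values."""
    ranked = sorted(projection, key=projection.get, reverse=True)
    return set(ranked[:k])


def core_nodes(replicate_projections, k):
    """Nodes ranked within the top k in every replicate (assumed reading)."""
    return set.intersection(*(top_k(p, k) for p in replicate_projections))


def merged_edges(*replicate_edges):
    """Union of the edges identified in each replicate."""
    return set().union(*replicate_edges)


def exceeds_basal_threshold(basal_levels, threshold):
    """Nodes whose absolute basal level is above a chosen threshold."""
    return {node for node, level in basal_levels.items() if abs(level) > threshold}


# Toy example with k = 2; the chapter uses k = 5000 for host proteins
# and k = 1500 for pathogen proteins.
rep1 = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}
rep2 = {"A": 0.8, "B": 0.1, "C": 0.6, "D": 0.3}
core = core_nodes([rep1, rep2], k=2)
edges = merged_edges({("A", "B")}, {("A", "C"), ("A", "B")})
flagged = exceeds_basal_threshold({"A": 1.4, "B": -0.2, "C": -2.0}, threshold=1.0)
```

Under the intersection reading, replicates that agree with each other yield more core nodes, which is consistent with the larger HPCN reported for WO-1.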
[Figure 17.2 graphic: network diagram of Candida albicans SC5314 (pathogen) and host core pathways. Functional labels: DNA damage response, hyphal growth, endocytosis, hydrolase release, biofilm formation, response to ROS, cell adhesion, ROS production, apoptosis, ECM degradation, autophagy, inflammatory response, innate immune response. Legend entries: TFs, host miRNAs, cytosolic proteins, ligands, genes, membrane proteins, acetylation, methylation, phosphorylation, ubiquitination, DNA methylation. Compartment labels: pathogen cell wall, host plasma membrane, host nuclear membrane.]
FIGURE 17.2 The core cross-talk pathways extracted and rearranged based on KEGG pathways from the cross-talk HPCN in Fig. 17.A4 during the Candida albicans SC5314 infection. The upper layer shows the pathogen core pathways and the lower layer the host core pathways during the C. albicans SC5314 infection. The gray lines represent protein-protein interactions; the red arrow lines are transcriptional regulations; the green dotted lines denote protein translation; the black dashed lines indicate protein secretion; the blue lines with circle endpoints represent miRNA repression; and the circles with purple frames and arrow lines represent the production of and response to ROS. The pathogenic factor orf19.1816 (ALS3) of C. albicans SC5314 induces endocytosis. The TFs orf19.1623 (CAP1) and orf19.5034 (YBP1) are pathogenic factors of C. albicans SC5314 that respond to host-produced ROS. Moreover, OKF6/TERT-2 cells use autophagy and the immune response to recruit immune cells such as macrophages and neutrophils to eliminate C. albicans SC5314. Finally, endoplasmic reticulum stress reflects the accumulated cellular stress and host cell extrusion, so that host cells produce a severe inflammatory response and undergo apoptosis. In addition, the pathogenic factors orf19.5585 (SAP5) and orf19.5542 (SAP6) also trigger the inflammatory response and apoptosis of the host cell [13]. miRNA, MicroRNA; ROS, reactive oxygen species.
17.3 Investigating pathogenic mechanism of C. albicans infection by comparing genetic and epigenetic interspecies networks
17.3.3 Analysis of core interspecies pathways to investigate host/pathogen cross-talk and common and specific pathogenic progression mechanisms during C. albicans SC5314 infection
As shown in Fig. 17.2, during the infection progression of C. albicans SC5314, orf19.1816 (ALS3) plays a significant role in cell adhesion and the induction of endocytosis by interacting with EGFR, ERBB2 (also known as HER2), CDH1 (also known as E-cadherin), HSP90B1, and TJAP1 (also known as TJP4) [176,827-829]. In addition, orf19.1321 (HWP1) plays an important role in cell adhesion and the maintenance of the hyphal cell wall [830]. Orf19.610 (EFG1) and orf19.4433 (CPH1) are important TFs that regulate biofilm formation, hyphal growth, and virulence [4,830,831]. As a pathogen cell surface protein, orf19.1816 (ALS3) interacts with orf19.610 (EFG1) and orf19.723 (BCR1) and thereby modulates these two TFs [411,836]. The TF orf19.610 (EFG1), which is also mediated by orf19.454 (SFL1), positively regulates the cell adhesion and endocytosis-related gene orf19.1816 (ALS3) and the biofilm-related gene orf19.1854 (HHF22). The TF orf19.723 (BCR1), which is also triggered by the TF orf19.5908 (TEC1), negatively regulates the cell adhesion and biofilm-related gene orf19.1321 (HWP1) (P-value < 1 × 10^-16). On the other hand, another pathogen cell surface protein, orf19.1321 (HWP1), triggers the TFs orf19.5908 (TEC1), orf19.610 (EFG1), orf19.723 (BCR1), and orf19.4433 (CPH1) through the signaling proteins orf19.3969 (SFL2), orf19.454 (SFL1), and orf19.1093 (FLO8) [356,837]. The TF orf19.4433 (CPH1) positively regulates the gene orf19.3760 (DLH1) (P-value < 1.5 × 10^-6), the endocytosis-related gene orf19.578, and the genes orf19.5542 (SAP6) and orf19.5585 (SAP5), which are related to hydrolytic activity and biofilm formation.
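The signed TF-to-gene relationships listed in this paragraph can be held in a small lookup structure. The following is only a minimal sketch: the edge list is transcribed by hand from the paragraph (gene symbols are used where the text gives them), and the function names `targets_by_tf` and `regulators_of` are hypothetical helpers introduced for illustration, not part of the method used in this chapter.

```python
from collections import defaultdict

# Signed TF -> target edges transcribed from the paragraph above.
# +1 = positive regulation, -1 = negative regulation.
# This is an illustrative subset, not the full network of Fig. 17.2.
REGULATION = [
    ("EFG1", "ALS3", +1),
    ("EFG1", "orf19.1854", +1),
    ("BCR1", "HWP1", -1),
    ("CPH1", "DLH1", +1),
    ("CPH1", "orf19.578", +1),
    ("CPH1", "SAP6", +1),
    ("CPH1", "SAP5", +1),
]


def targets_by_tf(edges):
    """Group targets under each TF, keeping the regulation sign."""
    grouped = defaultdict(dict)
    for tf, gene, sign in edges:
        grouped[tf][gene] = sign
    return dict(grouped)


def regulators_of(edges, gene):
    """Return the (TF, sign) pairs that regulate the given gene."""
    return [(tf, sign) for tf, target, sign in edges if target == gene]


table = targets_by_tf(REGULATION)
hwp1_regulators = regulators_of(REGULATION, "HWP1")
```

Querying `regulators_of(REGULATION, "HWP1")` returns the single negative edge from BCR1, which matches the statement in the text.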
Furthermore, the TF orf19.4433 (CPH1) negatively regulates the DNA damage-related gene orf19.666, which is also positively regulated by the TF orf19.5908 (TEC1); the cell adhesion and endocytosis-related gene orf19.1816 (ALS3); and the hyphal growth-related gene orf19.2614 (RSR1). The other two pathogen cell surface proteins are orf19.578 and orf19.1059 (HHF1). Orf19.578 binds to ARRB2 to induce endocytosis and exocytosis. After receiving the corresponding signal, orf19.578 can activate the TFs orf19.1623 (CAP1) and orf19.5908 (TEC1) through the signal transduction proteins orf19.1418 (SEC15), orf19.2614 (RSR1), orf19.390 (CDC42), orf19.7105 (FAR1), orf19.666, orf19.3760 (DLH1), orf19.2119 (NDT80), orf19.454 (SFL1), and orf19.1854 (HHF22). The TF orf19.1623 (CAP1) positively regulates the hyphal growth and biofilm-related gene orf19.4519 (SUV3). The TF orf19.5908 (TEC1) positively regulates the gene orf19.666 and negatively regulates orf19.1854 (HHF22). Finally, some C. albicans SC5314 cells enter the host cell through the induced endocytosis, while the others form colony morphology through the binding of orf19.1059 to IL15RA. After receiving the corresponding signal, orf19.1059 (HHF1) can trigger the TFs orf19.1623 (CAP1) and orf19.5908 (TEC1) via the following signal transduction proteins: orf19.3954 (PSD2), which is affected by orf19.1631-induced methylation; orf19.1631 (ERG6), which is affected by orf19.3964-induced methylation; orf19.6082, which is influenced by orf19.169 (CHO2)-induced methylation; orf19.3964 (ASH2), which is influenced by orf19.1631-induced methylation and orf19.705-induced acetylation; orf19.705 (GCN5); orf19.5034 (YBP1); orf19.2616 (UGT51C1); orf19.2306; orf19.4392 (DEM1); orf19.1854 (HHF22); orf19.666; orf19.3760 (DLH1); orf19.2119 (NDT80); and orf19.454 (SFL1) [838,839]. In addition, orf19.2087 (SAS2), which is affected by orf19.705-induced acetylation, triggers the TF orf19.1623 (CAP1) indirectly and binds to
V. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism During Bacterial Infection on Human Cells
17. Investigating the common pathogenic mechanism for drug design
orf19.939 (NAM7), which interacts with orf19.1093 (FLO8). Therefore orf19.939 (NAM7), which interacts with the abovementioned proteins orf19.1093 (FLO8) and orf19.2087 (SAS2), can activate downstream proteins such as orf19.2995, which is affected by orf19.3295-induced methylation; orf19.3111 (PRA1), which promotes hyphal growth at neutral and alkaline pH; and orf19.581, which activates the abovementioned orf19.2995 and orf19.3111 (PRA1) and is itself methylated by orf19.2575. When hyphae begin to grow, C. albicans SC5314 can release proteases such as orf19.5585 (SAP5) and orf19.5542 (SAP6) from the hyphae. Moreover, the protease orf19.5585 (SAP5) could cooperate with the pathogen protein orf19.7247 (RIM101) to degrade CDH1. Accordingly, some C. albicans SC5314 proteins can easily enter the host cell and initiate the degradation of membrane proteins involved in the extracellular matrix (ECM).
In response to pathogen infection, receptors such as EGFR, ERBB2, CDH1, HSP90B1, and TJAP1 at the host membrane interact with pathogen cell wall proteins to begin inducing endocytosis. First, the receptor EGFR could activate the downstream proteins BPHL and AVEN to modulate the TFs JUN (also known as c-Jun), which is affected by USP12PX-induced ubiquitination and PRMT1-induced methylation, and ETS1, respectively (see Fig. 17.2) [840]. The TF ETS1 (also known as ETS-1) positively regulates the gene BPHL, which is involved in ROS production, and the inflammation-related gene CCDC22 (CCDC22: P-value < 7 × 10^-3). The other TF, JUN, positively regulates ECM degradation through the gene MMP12 and negatively regulates ROS production through the apoptosis-inducing gene PPARD.
Second, the receptor ERBB2 could trigger the TFs NFKB1 (a subunit of NF-κB), YBX1 (also known as YBX-1), and GATA1 through the following signal transduction proteins: RER1; HIST1H4B, which is influenced by NEURL4-induced ubiquitination and PLA2G12B-induced phosphorylation; UBC, which is affected by OTUB1-induced ubiquitination and INPP4B-induced phosphorylation and interacts with BPHL; SSR4; LIPE; MAPK1 (also known as p38, ERK, PRKM1); GRB2; UCN2; SH2D1A; RAB12; DHX9 (also known as DDX9); C18orf8; and EED, which modulates both YBX1 and GATA1 [841]. The TF NFKB1 could negatively regulate the gene DEFB4A, which is related to resistance against fungal infection [842]. Both YBX1 and GATA1 could positively regulate the gene IL20 (YBX1: P-value < 1 × 10^-6; GATA1: P-value < 1 × 10^-3), which is involved in initiating the innate immune response. However, the TF YBX1, with SHMT1-induced methylation and USP24-induced ubiquitination, could also negatively regulate the innate immune and inflammation-related gene UCN2 (P-value < 8 × 10^-6). In addition, the TF GATA1 could simultaneously positively regulate the gene AVEN, which can eliminate the fungus by autophagy, and the apoptosis and inflammation-related gene TNFAIP8L1 [843]. Third, the receptor HSP90B1, which is affected by USP11-induced ubiquitination, binds to the TF FOXA1 and to TMEM205, which could modulate the TF JUN and plays an important role in the response to chemotherapeutic agents [844]. The TF FOXA1 then negatively regulates the apoptosis-related gene SERPINF1 [845]. Fourth, the receptor CDH1, which is affected by NAT10-induced acetylation and binds to the protease orf19.5585 (SAP5), triggers the TFs JUN and FOS (also known as c-Fos) through the signal transduction protein CCDC22 [846]. The TF FOS, affected by UBFD1-induced ubiquitination, also negatively regulates the gene MMP12 mentioned earlier [847]. Finally, the receptor TJAP1 modulates the TFs ETS1, YBX1, GATA1, and JUN through the following signal transduction proteins: MAPK6 (also known as ERK3); TNFAIP8L1; MRPL50; MYC (also known as C-Myc), which is affected by NAALADL1-induced acetylation and PDPR-induced phosphorylation and activates the downstream proteins VCAM1 and AVEN; GPR89A; UCN2; VCAM1; HMGN1P4, which is affected by KDM4A-induced methylation; MAPK14 (also known as p38, PRKM14); CTSH; and AVEN (see Fig. 17.2). In addition, the receptor ARRB2, which is affected by HDAC1-induced acetylation and HERC5-induced ubiquitination, binds to orf19.578 to induce endocytosis and activates the downstream proteins SSR4 and C18orf8 to modulate the TF NFKB1 [848]. On the other hand, the receptor IL15RA receives the cell colony signal from orf19.1059 (HHF1) and interacts with MYC. Eventually, in response to the signals released by C. albicans SC5314, the receptor C3 (also known as C3a and C3b) binds to orf19.5542 (SAP6) to activate the downstream protein IL1B (also known as IL-1β), which is also affected by PPP2CA-induced phosphorylation and interacts with orf19.5585 (SAP5) [830,849,850]. When IL1B is stimulated by these signals, it can evoke apoptosis and the inflammatory response [851].
[Figure 17.3: network diagram of the Candida albicans WO-1 core pathways (upper layer) and the host core pathways (lower layer). Pathogen cellular functions shown: hydrolase release, DNA damage response, hyphal growth, cell adhesion, endocytosis, biofilm formation, white-opaque switch, response to ROS. Host cellular functions shown: apoptosis, ROS production, autophagy, inflammatory response, innate immune response, ECM degradation. Legend categories: ligands, genes, membrane proteins, cytosolic proteins, TFs, host miRNAs, acetylation, methylation, phosphorylation, ubiquitination, DNA methylation. Node labels omitted; see caption.]
FIGURE 17.3 The core cross-talk pathways extracted and rearranged, based on KEGG pathways, from the cross-talk HPCN in Fig. 17.A5 during Candida albicans WO-1 infection. The upper layer shows the pathogen core pathways and the lower layer shows the host core pathways during C. albicans WO-1 infection. Gray lines represent protein-protein interactions; red arrow lines represent transcriptional regulation; green dotted lines denote protein translation; black dashed lines indicate protein secretion; blue lines with circle endpoints represent miRNA repression; and circles with a purple frame and arrow lines represent the production of, activity of, and response to ROS. The pathogenic factor CAWG_02005 (ALS3) of C. albicans WO-1 induces endocytosis. The TFs CAWG_02548 (CAP1) and CAWG_00057 (YBP1) are pathogenic factors of C. albicans WO-1 that respond to the ROS produced by the host. Moreover, OKF6/TERT-2 cells use autophagy and the immune response to recruit immune cells such as macrophages and neutrophils to eliminate C. albicans WO-1. Finally, endoplasmic reticulum stress reflects the accumulated cellular stress and host cell extrusion, so that host cells produce a severe inflammatory response and undergo apoptosis. In addition, the pathogenic factors CAWG_05098 (SAP6) and CAWG_05066 (SAP5) also lead to the inflammatory response and apoptosis of the host cell [13]. miRNA, MicroRNA; ROS, reactive oxygen species.
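The host receptor pathways described above can be read as a directed graph in which signals pass from a membrane receptor through signaling proteins to TFs. The sketch below is a hedged illustration only: the edge list is a hand-transcribed subset of the SC5314 interactions named in the text (C18orf8 is written as labelled in Fig. 17.2), and `reachable_tfs` is a hypothetical helper that uses a plain breadth-first search, which is not necessarily how the core pathways in this chapter were extracted.

```python
from collections import deque

# Directed host interactions transcribed from a subset of the text
# (C. albicans SC5314 infection). Illustrative, not complete.
EDGES = {
    "EGFR": ["BPHL", "AVEN"],
    "BPHL": ["JUN"],
    "AVEN": ["ETS1"],
    "CDH1": ["CCDC22"],
    "CCDC22": ["JUN", "FOS"],
    "HSP90B1": ["FOXA1", "TMEM205"],
    "TMEM205": ["JUN"],
    "ARRB2": ["SSR4", "C18orf8"],
    "SSR4": ["NFKB1"],
    "C18orf8": ["NFKB1"],
}
# Nodes that the text describes as transcription factors.
TFS = {"JUN", "ETS1", "FOS", "FOXA1", "NFKB1"}


def reachable_tfs(receptor, edges=EDGES, tfs=TFS):
    """Breadth-first search from a receptor; return the TFs it can reach."""
    seen = {receptor}
    queue = deque([receptor])
    found = set()
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in tfs:
                found.add(nxt)
            queue.append(nxt)
    return sorted(found)


egfr_tfs = reachable_tfs("EGFR")
cdh1_tfs = reachable_tfs("CDH1")
```

With this subset, EGFR reaches ETS1 and JUN, and CDH1 reaches FOS and JUN, in line with the "First" and "Fourth" pathways in the text.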
Apart from the epigenetic regulations in the core pathways mentioned earlier, we also identified epigenetic regulation by miRNAs, including miR-30B, miR31, miR3941, miR143HG, miR1972-2, and miR548D2. MiR-30B can inhibit AVEN during C. albicans SC5314 infection to reduce autophagy and apoptosis [852]. UCN2 and CCDC22 are silenced by miR31 and miR1972-2, respectively, to decrease the inflammatory response. Through the silencing of UCN2 by miR31, the innate immune response can also be decreased. IL1B can be stimulated to initiate apoptosis and the inflammatory response, but miR3941 can inhibit IL1B to reduce tissue necrosis. Moreover, we found that miR548D2 inhibits orf19.3292 to strengthen the effect of the ROS first produced by the host cell. In addition, orf19.7292 is silenced by miR143HG to reduce hyphal growth of the pathogen. Finally, one gene, UBC, has a basal level exceeding the threshold, indicating that it might have undergone DNA methylation during infection progression [853].
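The miRNA repression relationships in the preceding paragraph can be tabulated and inverted to ask which miRNA silences a given target. This is a minimal sketch under stated assumptions: the mapping is transcribed from the paragraph, the helper names are hypothetical, and treating any target written with an "orf19." identifier as a pathogen gene is a convention assumed here for illustration.

```python
# miRNA -> repressed targets, transcribed from the paragraph above.
MIRNA_TARGETS = {
    "miR30B": ["AVEN"],
    "miR31": ["UCN2"],
    "miR1972-2": ["CCDC22"],
    "miR3941": ["IL1B"],
    "miR548D2": ["orf19.3292"],
    "miR143HG": ["orf19.7292"],
}


def repressors_by_target(mirna_targets):
    """Invert the mapping: target -> list of miRNAs that repress it."""
    inverted = {}
    for mirna, targets in mirna_targets.items():
        for target in targets:
            inverted.setdefault(target, []).append(mirna)
    return inverted


def is_pathogen_gene(name):
    """Assumption: pathogen genes are those written as 'orf19.<number>'."""
    return name.startswith("orf19.")


by_target = repressors_by_target(MIRNA_TARGETS)
# Host miRNAs whose listed targets include a pathogen gene.
cross_species = sorted(
    mirna
    for mirna, targets in MIRNA_TARGETS.items()
    if any(is_pathogen_gene(t) for t in targets)
)
```

Here `cross_species` picks out miR143HG and miR548D2, the two host miRNAs that the text reports as acting on pathogen genes.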
17.3.4 Analysis of core interspecies pathways to investigate host/pathogen cross-talk and common and specific pathogenic mechanisms during C. albicans WO-1 infection
As the core cross-talk pathways in Fig. 17.3 show, most interactions and regulations are the same as in Fig. 17.2. Accordingly, we discuss only the interactions and regulations in C. albicans WO-1 infection that differ from those in C. albicans SC5314 infection. For pathogen interactions, the CAWG_01529 (orf19.2575)-induced methylation of CAWG_00472 (orf19.581) simultaneously influences the upstream proteins CAWG_3130 (PRA1) and CAWG_01080 (orf19.2995) and the downstream protein CAWG_00418 (WOR1). Nonetheless, CAWG_00418 (WOR1) is subject not only to the protein CAWG_04472 (orf19.581) but also to a CO2-rich microenvironment, so that CAWG_00418 (WOR1) can switch white cells to opaque cells. Furthermore, CAWG_02083 (EFG1) is also the main regulator of the white-to-opaque switch gene CAWG_00418 (WOR1), so that yeast cells of C. albicans WO-1 can switch from white to opaque [295]. Other pathogen interactions indicate that CAWG_04469 (orf19.578) cannot trigger any TFs through the interaction between CAWG_05375 (FAR1) and CAWG_00299 (orf19.666) but interacts with CAWG_00581 (CDC42) directly. Moreover, CAWG_04844 (ASH2) does not bind to CAWG_01970 (GCN5), owing to the reduced epigenetic modification of CAWG_01970 (GCN5). Finally, because of the acetylation of CAWG_00969 (HHF1), CAWG_04836 (PSD2) could directly regulate CAWG_02542-induced methylation and be indirectly mediated by acetylation to increase interactions with CAWG_04444 (NAM7). On the other hand, in the pathogen regulations, CAWG_00682 (CPH1) does not regulate the DNA damage-related gene CAWG_00299 (orf19.666), and the TF CAWG_02766 (TEC1) does not regulate the DNA damage-related gene CAWG_01979 (orf19.666). Instead, the TF CAWG_02766 (TEC1) could modulate cellular functions such as hyphal growth, biofilm formation, and the white cell pheromone response. Given these regulatory functions of the TF CAWG_02766 (TEC1), our result may imply that C. albicans WO-1 could switch to the opaque cell type during infection progression to reduce the regulation of CAWG_01979 (HHF22).
On the host side, similarly, we discuss only the interactions and regulations in C. albicans WO-1 infection (Fig. 17.3) that differ. For host interactions, in C. albicans WO-1 infection MRPL50 directly triggers the TF FOXA1, rather than the protein MYC (also known as C-Myc) as in C. albicans SC5314 infection. Moreover, in Fig. 17.3 EED does not simultaneously trigger the two TFs GATA1 and YBX1. Instead, the TF YBX1 is activated by the protein AVEN. Owing to different epigenetic modifications of MYC, MYC cannot trigger the protein VCAM1, but the downstream protein interactions increase further. For example, AVEN triggers MAPK14 (also known as p38, PRKM14) via VCAM1. Furthermore, the receptor EGFR, with NAALADL2-induced acetylation and OTUD3-induced ubiquitination, activates fewer proteins, namely only AVEN, whereas in C. albicans SC5314 infection EGFR triggers two proteins, BPHL and AVEN. Therefore the activity of EGFR in Fig. 17.3 is limited by epigenetic modification. Moreover, BPHL binds to GRB2 via UBC under C. albicans WO-1 infection. We can infer that UBC, affected by NAALADL2-induced acetylation and PPP4R4-induced phosphorylation, could activate different interactions. By contrast, the receptor HSP90B1 cannot transmit stimulation signals to the TF GATA1 because of GAPDHP24-induced phosphorylation. Therefore we can infer that this phosphorylation is inhibitory. Finally, HGSNAT-induced acetylation and USP41-induced ubiquitination of the receptor ARRB2 could activate downstream proteins such as LIPE, DHX9, and RAB12. Conversely, in C. albicans SC5314 infection, ARRB2 with the same kinds of epigenetic modification could trigger different proteins such as SSR4 and C18orf8. It is noted that different epigenetic proteins could alter TF regulation to strengthen the corresponding cellular response. Subsequently, whereas HSP90B1 with GAPDHP24-induced phosphorylation further triggers the regulation of the TF ETS1, HSP90B1 with ubiquitination does not activate the TF ETS1 in C. albicans SC5314 infection. We may speculate that ubiquitination may degrade a receptor and thereby reduce its regulation of TFs. In addition, CCDC22 receives a signal from the receptor CDH1, which is influenced by MTAP-induced methylation and USP47-induced ubiquitination, resulting in repression involving different miRNAs such as miR210. In conclusion, the results in Fig. 17.3 show that distinct epigenetic modifications lead to different host cellular responses. The details are discussed in the next section.
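The strain-specific differences in host interactions described in this subsection amount to comparing two edge sets. The following sketch illustrates that comparison with ordinary set operations; the edges are a small, hand-picked subset of the interactions named in the text, and `compare_networks` is a hypothetical helper rather than the procedure used to build Figs. 17.2 and 17.3.

```python
# Host interaction edges (source, target) named in the text for each strain.
# Only a few edges are included; the sets are illustrative, not complete.
SC5314_EDGES = {
    ("EGFR", "BPHL"),
    ("EGFR", "AVEN"),
    ("MRPL50", "MYC"),
    ("ARRB2", "SSR4"),
    ("ARRB2", "C18orf8"),
}
WO1_EDGES = {
    ("EGFR", "AVEN"),
    ("MRPL50", "FOXA1"),
    ("ARRB2", "LIPE"),
    ("ARRB2", "DHX9"),
    ("ARRB2", "RAB12"),
}


def compare_networks(edges_a, edges_b):
    """Split two edge sets into shared edges and edges unique to each."""
    return {
        "common": edges_a & edges_b,
        "only_a": edges_a - edges_b,
        "only_b": edges_b - edges_a,
    }


diff = compare_networks(SC5314_EDGES, WO1_EDGES)
# Receptors or proteins whose downstream partners differ between strains.
rewired_sources = sorted({src for src, _ in diff["only_a"] | diff["only_b"]})
```

In this subset the only shared edge is EGFR to AVEN, while ARRB2, EGFR, and MRPL50 each have at least one strain-specific partner.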
17.4 Discussion
From Figs. 17.3 and 17.4, we identified the host/pathogen cross-talk core pathways during the infection of C. albicans SC5314 and C. albicans WO-1, respectively, by a systems biology method. Overall, the pathogenic mechanisms of the two strains of C. albicans differ only slightly, for example in the white-to-opaque switch. However, previous studies have indicated that C. albicans WO-1 switches from white cells to opaque cells at a high frequency compared with C. albicans SC5314 [854]. Therefore we may speculate from the information in Tables 17.1 and 17.2 that even if C. albicans switches to the opaque cell type, it still acts through a common pathogenic mechanism. In addition, C. albicans usually remains in the white cell type in a normal microenvironment. Perhaps opaque cells exist only in harsh environments, which would explain why C. albicans usually maintains the white cell type. Previous studies have suggested that opaque cells keep their cell type under certain conditions, including carbon dioxide, anaerobic growth, and acidic pH (pH < 7), so that opaque cells can undergo sexual reproduction [855]. In addition to the studies mentioned, the MTLa locus of C. albicans WO-1 is absent in all strains of C. albicans. Therefore we can also infer that C. albicans WO-1 may switch from opaque cells to white cells immediately in response to the external microenvironment. On the other hand, mutation and genetic diversity arise more easily in C. albicans SC5314 than in C. albicans WO-1 [854]. From the information in Tables 17.1 and 17.2, C. albicans WO-1 is more stable than C. albicans SC5314. Hence, pathogen proteins within the top 1500 projection values and host proteins within the top 5000 projection values in the PNP projection method [see Eq. (17.A40)] can be analyzed to discover the common pathogenic mechanism.
Next, from Figs. 17.2 and 17.3, we extracted the specific core pathways to investigate the infection mechanisms of the different strains of C. albicans. Following the infection from outside to inside the host cell, we can infer that C. albicans changes from harmless to harmful on the oral epithelial cell. Therefore we split Figs. 17.2 and 17.3 into the following three figures to discuss the common and specific pathogenic mechanisms of the two strains of C. albicans and how they lead, step by step, to cell apoptosis and the inflammatory response. We could potentially develop common drugs based on the common pathogenic mechanism to treat infection by different strains of C. albicans. Details are discussed in the following subsections.
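The selection of core nodes by projection value mentioned above (top 1500 pathogen proteins and top 5000 host proteins) is, at its simplest, a top-k ranking. The sketch below assumes that the projection values from Eq. (17.A40) have already been computed and are available as a name-to-score mapping; the scores and node names shown are invented toy values, and `top_k_by_projection` is a hypothetical helper.

```python
import heapq


def top_k_by_projection(projection_values, k):
    """Return the k node names with the largest projection values."""
    ranked = heapq.nlargest(k, projection_values.items(), key=lambda item: item[1])
    return [name for name, _ in ranked]


# Toy, made-up projection values; real values would come from Eq. (17.A40).
toy_pathogen = {"node_a": 0.91, "node_b": 0.12, "node_c": 0.57, "node_d": 0.33}

# The chapter uses k = 1500 (pathogen) and k = 5000 (host); k = 2 here
# only keeps the example small.
core_nodes = top_k_by_projection(toy_pathogen, k=2)
```

If k exceeds the number of nodes, `heapq.nlargest` simply returns all nodes in descending order of score, so the same call works for small networks.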
17.4.1 Defense mechanism of OKF6/TERT-2 cell and the offense mechanism of different strains of C. albicans at host cell surface
As shown in Fig. 17.4A, C. albicans SC5314 is commensal on human oral epithelial cells. However, the pathogen cell surface protein orf19.1816 (ALS3) binds to the receptors CDH1 (also known as E-cadherin), ERBB2 (also known as HER2), EGFR, and HSP90B1, so that these receptors are degraded and endocytosis is induced. As the infection progresses, C. albicans SC5314 enters the host cell through endocytosis and begins invasion. The receptor CDH1, with NAT10-induced acetylation, activates the downstream TFs JUN (also known as c-Jun) and FOS (also known as c-Fos) through the signal transduction protein CCDC22, which is involved in trafficking between the trans-Golgi network and vesicles in the cell periphery. The TF FOS, affected by UBFD1-induced ubiquitination, negatively regulates the gene MMP12, which is positively regulated by the TF JUN, resulting in the degradation of the ECM. The TF JUN, affected by USP12PX-induced ubiquitination and PRMT1-induced methylation, could positively regulate the gene PPARD so that host cells produce ROS to eliminate C. albicans SC5314. Moreover, HSP90B1 with USP11-induced ubiquitination could bind to the protein TMEM205, which is related to chemotherapeutic agents, to trigger the TF JUN. Previous studies have indicated that after chemotherapy, C. albicans invades host cells and recovers in immunocompromised patients [822]. Next, the receptor EGFR activates the TF JUN via BPHL to regulate biological oxidations and hydrolase activity. Finally, the receptor ERBB2 triggers the TFs YBX1 and GATA1 through a sequence of signal transduction proteins: RER1, which regulates ER proteins; HIST1H4B, which is affected by NEURL4-induced ubiquitination and PLA2G12B-induced phosphorylation and is involved in histone binding; SSR4; GRB2, which influences cell death; UCN2; SH2D1A, which is related to immune regulatory interactions; RAB12, which is involved in autophagy modulation; and EED. The TF YBX1, with SHMT1-induced methylation and USP24-induced ubiquitination, negatively regulates the innate immune-related gene UCN2. The TF GATA1 positively regulates the gene AVEN, which can eliminate C. albicans SC5314 by autophagy. However, because the pathogen is not yet regarded as a danger signal, miR-30B represses the gene AVEN to reduce autophagy.
Then, because the regulation of the gene MMP12 results in the degradation of ECM components such as collagen, vitronectin, laminin, and fibronectin, C. albicans SC5314 becomes starved for nutrient or carbon sources. C. albicans SC5314 begins to grow hyphae and invade epithelial cells. Orf19.1816 (ALS3) can trigger the TFs orf19.610 (EFG1) and orf19.723 (BCR1) by binding to them directly. The TF orf19.610 (EFG1) can positively regulate the gene orf19.1816 (ALS3) so that C. albicans can invade the host cell and strengthen cell adhesion to anchor to the host cell surface. The TF orf19.723 (BCR1) can negatively regulate the gene orf19.1321 (HWP1), which is related to biofilm formation and cell adhesion. In addition, the pathogen receptor orf19.1321 (HWP1) can activate two downstream pathogen proteins: orf19.3969 (SFL2), which regulates C. albicans morphogenesis through the signal transduction protein orf19.454 (SFL1) to trigger the TF orf19.610 (EFG1), and orf19.1093 (FLO8), which affects hyphal formation by regulating the TFs orf19.610 (EFG1), orf19.723 (BCR1), and orf19.4433 (CPH1). The TF orf19.4433 (CPH1) also negatively regulates the hyphal growth-related gene orf19.2614 (RSR1). According to previous studies, yeast cells of C. albicans could damage macrophages via hyphal growth to reduce the destruction of C. albicans [607].
[Figure 17.4: two-panel network diagram. (A) Candida albicans SC5314; (B) Candida albicans WO-1. Each panel shows the pathogen layer (hyphal growth, cell adhesion, endocytosis), the pathogen cell wall, the host plasma membrane, and the host nuclear membrane, with the host cellular functions ROS production, ECM degradation, innate immune response, and autophagy. Node labels omitted; see caption.]
FIGURE 17.4 The specific and common host defense mechanisms in the infection of different strains of Candida albicans, extracted from Figs. 17.2 and 17.3. (A) The defense strategy of OKF6/TERT-2 cells against C. albicans SC5314 at the beginning of infection. (B) The defense strategy of OKF6/TERT-2 cells against C. albicans WO-1 at the beginning of infection. Red arrow lines represent transcriptional regulation; gray solid lines represent protein-protein interactions; green dotted lines indicate protein translation; blue lines with circle endpoints represent miRNA repression. In (A), when C. albicans SC5314 infects OKF6/TERT-2 cells, host cells mount an immune response, autophagy, and ROS production to defend against the pathogen. Moreover, C. albicans SC5314 begins to grow hyphae and adhere to host cell surface proteins, increasing invasion in search of nutrient sources. The induction of endocytosis also benefits C. albicans SC5314 invasion. Similarly, in (B), when C. albicans WO-1 infects OKF6/TERT-2 cells, host cells mount an immune response, autophagy, and ROS production to defend against the pathogen. Moreover, C. albicans WO-1 begins to grow hyphae and adhere to host cell surface proteins, increasing invasion in search of nutrient sources. The induction of endocytosis also benefits C. albicans WO-1 invasion [13]. miRNA, MicroRNA; ROS, reactive oxygen species.
As shown in Fig. 17.4B, C. albicans WO-1 is commensal on human oral epithelial cells. However, the pathogen cell surface protein CAWG_02005 (ALS3) can bind to the receptors CDH1 (also known as E-cadherin), ERBB2 (also known as HER2), HSP90B1, and EGFR, so that these receptors are degraded and endocytosis is induced. As the infection progresses, C. albicans WO-1 enters the host cell through endocytosis and begins invasion. In contrast to C. albicans SC5314 infection, the receptor CDH1, with MTAP-induced methylation and USP47-induced ubiquitination, can activate the downstream TFs JUN (also known as c-Jun) and FOS (also known as c-Fos) through CCDC22, which is involved in trafficking between the trans-Golgi network and vesicles in the cell periphery.
Therefore the TF FOS, with NAA16-induced acetylation and USP31-induced ubiquitination, can positively regulate the gene MMP12, resulting in degradation of the ECM. The TF JUN, with PPP3CA-induced phosphorylation and USP34-induced methylation, can positively regulate the gene PPARD to produce ROS to eliminate C. albicans WO-1. Moreover, the receptor HSP90B1, affected by GAPDHP24-induced phosphorylation, can bind to the protein TMEM205, which is related to chemotherapeutic agents, to trigger the TF JUN. Previous studies have indicated that after chemotherapy, C. albicans WO-1 invades host cells again and recovers in immunocompromised patients [822]. In addition, the receptor ERBB2 can trigger the TF GATA1 through a sequence of signaling proteins: RER1, which regulates ER proteins; HIST1H4B, which is influenced by UBE2J2-induced ubiquitination and SSH2-induced phosphorylation and is involved in histone binding; SSR4; GRB2, which could influence cell death; UCN2; SH2D1A, which is related to immune regulatory interactions; RAB12, which is involved in autophagy modulation; and EED. Finally, in contrast to C. albicans SC5314 infection, the receptor EGFR, with NAALADL2-induced acetylation and OTUD3-induced ubiquitination, activates the downstream protein AVEN to modulate the TF YBX1. The TF YBX1, which is influenced by OTUD3-induced ubiquitination and TPMTP2-induced methylation, positively regulates the innate immune-related gene UCN2. The TF GATA1 negatively regulates the gene AVEN, which eliminates C. albicans WO-1 by autophagy. However, miR-30B represses the gene AVEN to reduce autophagy. Then, because the regulation of the gene MMP12 results in the degradation of ECM components such as collagen, vitronectin, laminin, and fibronectin, C. albicans WO-1 becomes starved for nutrient or carbon sources. C. albicans WO-1 begins to grow hyphae and invade the host cell. CAWG_02005 (ALS3) triggers the TFs CAWG_02083 (EFG1) and CAWG_01948 (BCR1) by binding to them directly. The TF CAWG_02083 (EFG1) can positively regulate the gene CAWG_02005 (ALS3) so that C. albicans could invade the host cell and strengthen cell adhesion to anchor to the cell. The TF CAWG_01948 (BCR1) can positively regulate the gene CAWG_03451 (HWP1), which is related to biofilm formation and cell adhesion. In addition, the pathogen receptor CAWG_03451 (HWP1) can activate the downstream pathogen protein CAWG_04849 (SFL2), which could regulate C. albicans morphogenesis through the signal transduction protein CAWG_01914 (SFL1) to trigger the TF CAWG_02083 (EFG1); the receptor HWP1 can also trigger the signaling protein CAWG_04944 (FLO8), which could influence hyphal formation by modulating the TFs CAWG_02083 (EFG1), CAWG_01948 (BCR1), and CAWG_00682 (CPH1). The TF CAWG_00682 (CPH1) can also positively regulate the hyphal growth-related gene CAWG_01560 (RSR1). According to previous research, yeast cells of C. albicans could damage macrophages via hyphal growth to reduce the destruction of C. albicans [607].
In conclusion, when C. albicans SC5314 and WO-1 first infect the host cell, the host immune system can defend against them. Although the hyphae of C. albicans can destroy macrophages, host cells combine other cellular mechanisms, such as autophagy, ROS production, and the immune response, to resist them. At this point, C. albicans SC5314 and WO-1 can be regarded as commensal pathogens at the adhesion stage on the host cell. Furthermore, comparing the two strains, the expression of MMP12 and PPARD is higher under C. albicans SC5314 infection than under C. albicans WO-1 infection. By contrast, the expression of AVEN is lower under C. albicans SC5314 infection than under C. albicans WO-1 infection. In the experimental data, the gene PPARD shows a significant change of expression during the infection progression of both strains. As a result, the intensity of degradation caused by C. albicans SC5314 is stronger than that caused by C. albicans WO-1. Correspondingly, ROS production is also stronger in C. albicans SC5314 infection than in C. albicans WO-1 infection. Hence, C. albicans WO-1 is easily eliminated through regulation of the expression of the host gene AVEN. In addition, the expression levels of CDH1 and ERBB2 are higher under C. albicans SC5314 infection than under C. albicans WO-1 infection. In this situation, C. albicans SC5314 infection induces relatively strong endocytosis and epigenetic modification. C. albicans WO-1 infection, however, brings about more protein folding in the host cell through higher expression of HSP90B1 due to phosphorylation. That is, compared with C. albicans SC5314 infection, C. albicans WO-1 infection might lead to more misfolded protein formation in the host cell.
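Several TF-to-gene regulations discussed in this subsection keep the same partners in both strains but change sign. A hedged way to list such cases is to store each regulation as a signed edge per strain and compare the signs. The dictionaries below are transcribed by hand from the SC5314 and WO-1 descriptions in this subsection, and `sign_flips` is a hypothetical helper introduced only for illustration.

```python
# Signed regulations (TF, gene) -> +1 (positive) or -1 (negative),
# transcribed from the descriptions of each strain in this subsection.
SC5314_SIGNS = {
    ("FOS", "MMP12"): -1,
    ("JUN", "PPARD"): +1,
    ("YBX1", "UCN2"): -1,
    ("GATA1", "AVEN"): +1,
    ("EFG1", "ALS3"): +1,
    ("BCR1", "HWP1"): -1,
    ("CPH1", "RSR1"): -1,
}
WO1_SIGNS = {
    ("FOS", "MMP12"): +1,
    ("JUN", "PPARD"): +1,
    ("YBX1", "UCN2"): +1,
    ("GATA1", "AVEN"): -1,
    ("EFG1", "ALS3"): +1,
    ("BCR1", "HWP1"): +1,
    ("CPH1", "RSR1"): +1,
}


def sign_flips(signs_a, signs_b):
    """Return edges present in both networks whose sign differs."""
    shared = signs_a.keys() & signs_b.keys()
    return sorted(edge for edge in shared if signs_a[edge] != signs_b[edge])


flipped = sign_flips(SC5314_SIGNS, WO1_SIGNS)
conserved = sorted(
    edge
    for edge in SC5314_SIGNS.keys() & WO1_SIGNS.keys()
    if SC5314_SIGNS[edge] == WO1_SIGNS[edge]
)
```

In this transcription, JUN to PPARD and EFG1 to ALS3 are conserved, while the remaining five regulations change sign between the two strains.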
17.4.2 OKF6/TERT-2 cell confronts different strains of C. albicans by strong reactive oxygen species and microenvironment response
As shown in Fig. 17.5A, owing to the degradation of the ECM, C. albicans proceeds to the invasion stage, and the host cell membrane begins to be destroyed by the hyphae of C. albicans SC5314. The host cell generates more and more ROS. In Fig. 17.5A, the pathogen cell surface protein orf19.1816 (ALS3) binds to the host cell receptor EGFR, so that EGFR is degraded and ROS production is induced in the host cell. After receiving the corresponding signal, EGFR triggers TFs NFKB1 (a subunit of NF-κB) and GATA1 to regulate biological oxidations and hydrolase activity through a sequence of signal transduction proteins: BPHL; UBC, with OTUB1-induced ubiquitination and INPP4B-induced phosphorylation; LIPE, which is involved in lipid metabolism and binds DHX9 and MAPK1 (also known as p38, ERK, PRKM1); DHX9, which can mediate TLR4 signaling and NF-κB activation; MAPK1, which participates in the EGFR-related signaling pathway to regulate cell survival and differentiation; UCN2, SH2D1A, and RAB12, all described earlier; EED; and C18orf18. TF GATA1 positively regulates the immune-related gene IL20 to recruit more macrophages and proinflammatory cytokines. Another TF, NFKB1, negatively regulates the fungal infection-related gene DEFB4A. Since C. albicans is a fungus that stains Gram-positive, the negative regulation of DEFB4A by NFKB1 ultimately contributes to resisting pathogen invasion and inflammation. In addition, some C. albicans SC5314 cells remain at the cell surface to form colonies through hyphal growth and yeast cells. Meanwhile, the pathogen membrane protein orf19.1059 (HHF1), which is related to colony morphology, interacts with the host cell receptor IL15RA. After receiving the signal, IL15RA binds to MYC.
MYC, with NAALADL1-induced acetylation and PDPR-induced phosphorylation, can then activate the downstream signaling protein AVEN to trigger TF ETS1. TF ETS1 positively regulates the genes BPHL, mentioned previously, and CCDC22, which is involved in the activation of proinflammatory NF-κB signaling. Therefore genes CCDC22 and DEFB4A generate an inflammatory response under the regulation of TFs ETS1 and NFKB1, respectively. Nonetheless, the inflammatory response in the invasion phase does not become excessive because miR1979-2 represses gene CCDC22. However, the pathogen starts to antagonize the immune system of the host cell. After orf19.1059 (HHF1) interacts with the host cell receptor IL15RA, it triggers TF orf19.1623 (CAP1) through a sequence of downstream signal transduction proteins: orf19.3954 (PSD2), with orf19.1631-induced methylation; orf19.1631 (ERG6), affected by orf19.3964-induced methylation and bound directly to orf19.3964; orf19.6082, influenced by orf19.169-induced (CHO2) methylation; orf19.3964 (ASH2), with orf19.1631-induced
[Figure 17.5: network diagrams of host-pathogen signaling during invasion. Panel (A): Candida albicans SC5314; panel (B): Candida albicans WO-1. Each panel is organized into layers (pathogen, pathogen cell wall, host plasma membrane, host, host nuclear membrane). Annotated pathogen functions: DNA damage response, hyphal growth, response to ROS, biofilm formation, white-opaque switch (WOR1). Annotated host functions: ROS production, inflammatory response, innate immune response.]
FIGURE 17.5 Continuous ROS and stress production as a defense mechanism in host cells, and the corresponding anti-ROS and offense mechanisms of Candida albicans. (A) OKF6/TERT-2 cells antagonize C. albicans SC5314 invasion. (B) OKF6/TERT-2 cells antagonize C. albicans WO-1 invasion. The red arrow lines represent transcriptional regulation; the gray solid lines signify protein-protein interactions; the green dotted lines indicate protein translation; the blue lines with circle endpoints represent miRNA repression. The circles with purple frames and arrow lines represent the production activity and response of ROS. In (A), as more C. albicans SC5314 gradually invade, the host cell continues ROS production and increases stress on C. albicans SC5314. Therefore C. albicans SC5314 needs to execute a DNA damage response and resist ROS. Next, C. albicans SC5314 continues hyphal growth to form biofilm. Owing to hyphal growth, host cells are pressed, which causes an inflammatory response and cellular stress. At this time, OKF6/TERT-2 cells remain in an unbalanced state. In (B), as more C. albicans WO-1 gradually invade, the host cell continues ROS production and increases stress. Therefore C. albicans WO-1 needs to execute a DNA damage response and resist ROS. Next, C. albicans WO-1 continues hyphal growth to form biofilm. Owing to hyphal growth, host cells are pressed, which causes an inflammatory response and cellular stress. At this time, OKF6/TERT-2 cells remain in an unbalanced state. Eventually, C. albicans WO-1 senses carbon dioxide to transform white cells [13]. miRNA, MicroRNA; ROS, reactive oxygen species.
methylation and orf19.705-induced acetylation; and orf19.705 (GCN5) and orf19.5034 (YBP1), which stabilize TF orf19.1623 (CAP1) and react to oxidative stress. Another orf19.1059-related (HHF1) pathway binds to orf19.1854 (HHF22) by interacting with the signaling protein orf19.2616 (UGT51C1) to modulate TF orf19.1623 (CAP1). The orf19.1631 (ERG6) protein is related to drug resistance, such as azole resistance and fluconazole induction, through its own methyltransferase activity. TF orf19.1623 (CAP1) positively regulates the hyphal growth- and biofilm-related gene orf19.4519 (SUV3) to confront ROS production by the host cell, because C. albicans SC5314 can form colonies at the host cell surface. Moreover, orf19.1816 (ALS3) interacts with TF orf19.610 (EFG1), so that orf19.610 positively regulates the biofilm- and drug resistance-related gene orf19.1854 (HHF22). Finally, the pathogen receptor orf19.1321 (HWP1) activates TF orf19.5908 (TEC1) through a sequence of signal transduction proteins, orf19.3969 (SFL2) and orf19.454 (SFL1), which also trigger TF orf19.610 (EFG1). TF orf19.5908 (TEC1) positively regulates gene orf19.666 and negatively regulates orf19.1854 (HHF22). Owing to the host defense mechanism and the microenvironment, orf19.666 induces a DNA damage response so that C. albicans SC5314 carries out DNA repair. However, orf19.1321 (HWP1) can also execute some cellular functions without activating TFs. In response to the microenvironment, the pathogen receptor orf19.1321 (HWP1) binds to orf19.939 (NAM7) via orf19.1093 (FLO8) to stimulate orf19.3111 (PRA1), which directly results in hyphal growth at neutral and alkaline pH. Previous research has suggested that hyphae can grow and elongate under neutral and alkaline pH conditions [855]. Therefore orf19.3111 (PRA1) promotes the elongation of hyphae, leading to biofilm formation.
Moreover, orf19.3111 (PRA1) is not only influenced by the microenvironment but also activated through orf19.2575-induced methylation of orf19.581, which increases hyphal growth. Eventually, not only the host defense mechanism but also host miRNAs withstand the C. albicans SC5314 invasion. Since orf19.3292 is a peptide-methionine (R)-S-oxide reductase, it combines with thioredoxin to decrease the ROS concentration. Therefore miR548D2 can inhibit orf19.3292 to strengthen ROS and eliminate the pathogen. Moreover, orf19.7292 (ARP2) is a component of the Arp2/3 complex, which is required for virulence, hyphal growth, and cell wall/cytoskeleton organization. miR143HG then silences gene orf19.7292 (ARP2) to reduce hyphal growth and further prevent biofilm formation.
As shown in Fig. 17.5B, owing to the degradation of the ECM, C. albicans proceeds to the invasion stage, and the host cell membrane begins to be destroyed by the hyphae of C. albicans WO-1. The host cell generates more and more ROS. In Fig. 17.5B the pathogen cell surface protein CAWG_02005 (ALS3) binds not to EGFR but to the host cell receptor ERBB2, so that ERBB2 is degraded and ROS production is induced. After receiving the corresponding signal, ERBB2 triggers TFs NFKB1 (a subunit of NF-κB) and GATA1 through a sequence of signal transduction proteins: RER1; UBC, with NAALADL2-induced acetylation and PPP4R4-induced phosphorylation; BPHL, which regulates biological oxidations and hydrolase activity and binds to GRB2; LIPE, which is involved in lipid metabolism and binds DHX9 and MAPK1 (also known as p38, ERK, PRKM1); DHX9, which mediates TLR4 signaling and NF-κB activation; MAPK1, which participates in the EGFR-related signaling pathway and the regulation of cell survival and differentiation; UCN2, which accepts signals from GRB2 and MAPK1; SH2D1A and RAB12, both described earlier; EED; and C18orf18. Compared to C. albicans SC5314, receptor
17.4 Discussion
ERBB2 instead of receptor EGFR can activate the BPHL-related pathway to strengthen ROS production through the epigenetically modified signaling protein UBC. TF GATA1 can positively regulate the immune-related gene IL20 to recruit more macrophages and proinflammatory cytokines. Another TF, NFKB1, can negatively regulate the fungal infection-related gene DEFB4A. Since C. albicans is a fungus that stains Gram-positive, the negative regulation of DEFB4A by NFKB1 can ultimately contribute to resisting pathogen invasion and inflammation. In addition, some C. albicans WO-1 cells remain at the cell surface to form colonies through hyphal growth. Meanwhile, the pathogen membrane protein CAWG_00969 (HHF1), which is related to colony morphology, interacts with the host cell receptor IL15RA. After receiving the signal, IL15RA binds to the signal transduction protein MYC, which, with ELP6-induced acetylation and MBD5-induced methylation, activates the downstream protein AVEN and interacts with RAB12 to trigger TFs ETS1 and GATA1. TF ETS1 positively regulates gene BPHL, as mentioned previously, and negatively regulates CCDC22, which is involved in the activation of proinflammatory NF-κB signaling. Therefore genes CCDC22 and DEFB4A generate an inflammatory response under the regulation of TFs ETS1 and NFKB1, respectively. Nonetheless, the inflammatory response in this phase does not become excessive because miR210 represses gene CCDC22. Compared to C. albicans SC5314, MYC regulated by methylation instead of phosphorylation can activate RAB12 to trigger the downstream TFs. However, the pathogen begins to antagonize the immune system of the host cell.
After CAWG_00969 (HHF1), with CAWG_03659-induced (NAT4) acetylation, interacts with the receptor IL15RA, it triggers TF CAWG_02548 (CAP1) through the downstream signal transduction proteins CAWG_01562 (UGT51C1); CAWG_01979 (HHF22), with CAWG_03824-induced acetylation; CAWG_03824 (SAS2), with CAWG_01970-induced acetylation; CAWG_01970 (GCN5), with CAWG_03824-induced acetylation and CAWG_01059 (SPP1) methylation; and CAWG_00057 (YBP1), which is influenced by CAWG_01970-induced acetylation to stabilize TF CAWG_02548 (CAP1) and react to oxidative stress so that hyphae grow. Compared to C. albicans SC5314, the other CAWG_00969-related (HHF1) pathway cannot trigger TF CAWG_02548. In addition, both CAWG_01970 (GCN5) and CAWG_00057 (YBP1) are influenced by acetylation, which strengthens the modulation of TF CAWG_02548 (CAP1) to induce hyphal growth or elongation. Next, our results indicate that the route from CAWG_00969 (HHF1) through the signal transduction proteins CAWG_04836 (PSD2), with CAWG_02542-induced methylation; CAWG_02542 (ERG6), with CAWG_05150-induced methylation (SWD3); CAWG_01344 (orf19.6082), with CAWG_01594-induced methylation (CHO2); and CAWG_04844 (ASH2), with CAWG_01059-induced methylation (SPP1), is not sufficient to interact with the downstream protein CAWG_01970 (GCN5). We can infer that CAWG_04844 (ASH2) may not receive enough epigenetic modification or other protein activation, but the CAWG_02542 (ERG6) protein is still related to drug resistance through its own methyltransferase activity and cannot influence the drug resistance of endurance. In addition, the influence of ROS is decreased by the repression of CAWG_04844 (ASH2). TF CAWG_02548 (CAP1) negatively regulates the hyphal growth- and biofilm-related gene CAWG_04191 (SUV3) to confront ROS production by the host cell, because C. albicans WO-1 can form colonies at the cell surface.
Moreover, CAWG_02005 (ALS3) interacts with TF CAWG_02083 (EFG1), so that CAWG_02083 (EFG1) positively regulates the biofilm- and drug resistance-related gene CAWG_01979 (HHF22) and the main activator of the white-to-opaque switch, gene
CAWG_00418 (WOR1). Finally, the pathogen receptor CAWG_03451 (HWP1) activates TF CAWG_02766 (TEC1) through the signal transduction proteins CAWG_04849 (SFL2) and CAWG_01914 (SFL1), which also trigger TF CAWG_02083 (EFG1). TF CAWG_02766 (TEC1) positively regulates gene CAWG_00299 (orf19.666). Owing to the host defense mechanism and the microenvironment, CAWG_00299 (orf19.666) generates a DNA damage response so that C. albicans WO-1 carries out DNA repair. However, CAWG_03451 (HWP1) can also execute some cellular functions without activating TFs. In response to the microenvironment, the pathogen receptor CAWG_03451 (HWP1) binds to CAWG_04444 (NAM7) by interacting with CAWG_04944 (FLO8) to stimulate CAWG_03130 (PRA1), which directly results in pH-dependent hyphal growth. Previous research has suggested that hyphae can grow and elongate under neutral and alkaline pH conditions [855]. Therefore CAWG_03130 (PRA1) promotes hyphal elongation to form biofilm. Moreover, CAWG_04472 (orf19.581), with CAWG_01529-induced methylation, simultaneously increases the expression levels of CAWG_00418 (WOR1) and CAWG_03130 (PRA1) to trigger hyphal growth and the white-to-opaque switch. Furthermore, CAWG_00418 (WOR1) is not only influenced by the CO2-induced or acidic microenvironment but also indirectly activated by the protein CAWG_04944 (FLO8), which is also related to the CO2-induced white-opaque switch and virulence. Eventually, not only the host defense mechanism but also host miRNAs withstand the C. albicans WO-1 invasion. Since CAWG_01270 (orf19.3292) is a peptide-methionine (R)-S-oxide reductase, it combines with thioredoxin to decrease the ROS concentration. Therefore miR548D2 inhibits orf19.3292 to strengthen ROS and eliminate the pathogen. Moreover, orf19.7292 (ARP2) is a component of the Arp2/3 complex, which is required for virulence, hyphal growth, and cell wall/cytoskeleton organization. miR143HG then silences gene CAWG_02173 (ARP2) to reduce hyphal growth and further prevent biofilm formation.
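The HWP1-centered routes described above for C. albicans WO-1 can be read as a small directed graph. The following is a minimal sketch, not the authors' method: it only transcribes the interactions stated in the text (HWP1 to SFL2 to SFL1 to TEC1/EFG1, and HWP1 to FLO8 to NAM7 to PRA1) and uses an ordinary breadth-first search to list the route from the receptor to a chosen target. The dictionary name WO1_SIGNALING and the function shortest_path are hypothetical names introduced for illustration, and the direction assigned to each interaction is an assumption based on the order in which the text lists the proteins.

```python
from collections import deque

# Directed interactions transcribed from the text for C. albicans WO-1.
# Edge directions are an assumption: they follow the order in which the
# text lists the proteins, from the receptor HWP1 toward TFs or PRA1.
WO1_SIGNALING = {
    "HWP1": ["SFL2", "FLO8"],
    "SFL2": ["SFL1"],
    "SFL1": ["TEC1", "EFG1"],
    "FLO8": ["NAM7"],
    "NAM7": ["PRA1"],
}


def shortest_path(graph, source, target):
    """Breadth-first search: return one shortest path as a list, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# List the route from the receptor HWP1 to each downstream target.
for target in ("TEC1", "EFG1", "PRA1"):
    print(target, shortest_path(WO1_SIGNALING, "HWP1", target))
```

In this encoding the route to PRA1 passes through no TF, which corresponds to the statement that HWP1 can execute some cellular functions without activating TFs.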
In the invasion phase, the common pathogenic mechanisms of C. albicans SC5314 and C. albicans WO-1 are hyphal growth and elongation, which lead to biofilm formation. In addition, both strains can resist ROS and immune cells such as macrophages and neutrophils. However, the host cell remains in an unstable condition between acidic and alkaline pH. Because of lipid metabolism and the ROS response, lipid depletion can cause CO2 production. In addition, excessive ROS also produces CO2 and hydrogen ions, generating acidic substances. With these materials, the host cell gradually becomes acidic, and C. albicans WO-1 exploits them to switch white cells. Acidic substances, however, repress rather than activate hyphal growth. In conclusion, the host cell needs a balance between acidic and neutral pH. As more and more hyphae press on the host cell and penetrate the cytoplasm, the host cell gradually generates an inflammatory response and recruits more cytokines to eliminate C. albicans. At this moment, C. albicans can still be eliminated because the biofilm is immature. Moreover, the experimental data show that the expression of BPHL and DEFB4A is higher under C. albicans SC5314 than under C. albicans WO-1. By contrast, the expression of IL20 is lower under C. albicans SC5314 than under C. albicans WO-1. In addition, IL20 expression changes significantly in the initial infection. Therefore the innate immune response also plays an important role at this stage in eliminating C. albicans. Under C. albicans SC5314, the host cell resists the pathogen more easily through host gene expression. By contrast, C. albicans WO-1 is difficult to antagonize, so it invades the host cell rapidly. In addition, the expression of IL15RA is higher under C. albicans WO-1 than under C. albicans SC5314 according to the data from this
experiment. This suggests that C. albicans WO-1 binds readily to IL15RA, so that C. albicans WO-1 quickly forms cell colonies. As a result, C. albicans WO-1 is harder to wipe out. Although the host cell strives to eliminate C. albicans, it remains in an unbalanced state, with changing pH and CO2 production. Hence, the pathogen protein orf19.3111 (PRA1) induces hyphal growth in both strains. According to the experimental data, the expression of orf19.3111 (PRA1) changes significantly during the infection.
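The strain differences in transcriptional regulation described in this subsection can be collected in a small comparison. The following is a minimal sketch, not part of the original analysis: the signed TF-to-gene edges are transcribed from the text describing Fig. 17.5A and Fig. 17.5B, with +1 standing for positive and -1 for negative regulation. The names SC5314_EDGES, WO1_EDGES, and compare_regulation are hypothetical and introduced only for illustration, and gene symbols are used for both strains in place of the orf19 and CAWG identifiers.

```python
# Signed TF -> gene regulation stated in the text for the invasion stage.
# +1 = positive regulation, -1 = negative regulation (illustrative encoding).
SC5314_EDGES = {
    ("GATA1", "IL20"): +1,
    ("NFKB1", "DEFB4A"): -1,
    ("ETS1", "BPHL"): +1,
    ("ETS1", "CCDC22"): +1,
    ("CAP1", "SUV3"): +1,
    ("EFG1", "HHF22"): +1,
    ("TEC1", "orf19.666"): +1,
    ("TEC1", "HHF22"): -1,
}

WO1_EDGES = {
    ("GATA1", "IL20"): +1,
    ("NFKB1", "DEFB4A"): -1,
    ("ETS1", "BPHL"): +1,
    ("ETS1", "CCDC22"): -1,
    ("CAP1", "SUV3"): -1,
    ("EFG1", "HHF22"): +1,
    ("EFG1", "WOR1"): +1,
    ("TEC1", "orf19.666"): +1,
}


def compare_regulation(edges_a, edges_b):
    """Return (sign-flipped shared edges, edges only in a, edges only in b)."""
    shared = edges_a.keys() & edges_b.keys()
    flipped = sorted(edge for edge in shared if edges_a[edge] != edges_b[edge])
    only_a = sorted(edges_a.keys() - edges_b.keys())
    only_b = sorted(edges_b.keys() - edges_a.keys())
    return flipped, only_a, only_b


flipped, only_sc5314, only_wo1 = compare_regulation(SC5314_EDGES, WO1_EDGES)
print("Sign differs between strains:", flipped)
print("Stated only for SC5314:", only_sc5314)
print("Stated only for WO-1:", only_wo1)
```

A table of this kind makes the contrasts in the text easy to audit, for example the opposite signs reported for ETS1 on CCDC22 and for CAP1 on SUV3, and the EFG1 regulation of WOR1 that is described only for WO-1.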
17.4.3 Released pathogenic factor and accumulated cellular response result in apoptosis and inflammatory response further leading to necrosis
As shown in Fig. 17.6A, owing to extensive invasion by C. albicans SC5314 and excessive ROS production, cellular stress gradually increases. Because of the excess ROS, the host cell produces a stronger inflammatory response. Once the pathogen cell surface protein orf19.1816 (ALS3) binds to the host receptors EGFR, TJAP1, and HSP90B1, these receptors ultimately cause apoptosis and an inflammatory response. First, the receptor EGFR triggers TF ETS1 via the signaling protein AVEN, which is modulated by MYC with NAALADL1-induced acetylation and PDPR-induced phosphorylation. TF ETS1 positively regulates gene CCDC22. Second, the receptor TJAP1 (also known as TJP4) activates TFs ETS1, JUN, and GATA1 through a sequence of signal transduction proteins: MAPK6 (also known as ERK3); TNFAIP8L1; MRPL50, which is related to the maintenance of cell organelles; MYC, which is affected by NAALADL1-induced acetylation and PDPR-induced phosphorylation and can activate three downstream proteins, VCAM1, AVEN, and GPR89A, the last of which interacts with VCAM1 and reduces intracellular pH; HMGN1P4, with KDM4A-induced methylation; MAPK14; CTSH; AVEN; and EED. TF JUN, with USP12PX-induced ubiquitination and PRMT1-induced methylation, negatively regulates gene PPARD to influence ROS production and apoptosis and gene MMP12 to influence ECM degradation. TF GATA1 positively regulates the inflammation-related gene TNFAIP8L1 and the apoptosis-related gene AVEN. Finally, the receptor HSP90B1, with USP11-induced ubiquitination, interacts directly with TF FOXA1. TF FOXA1 negatively regulates the apoptosis-related gene SERPINF1. In addition, another pathway also activates TF GATA1 through ARRB2.
When the pathogen membrane protein orf19.578 binds to the receptor ARRB2, which is affected by HDAC1-induced acetylation and HERC5-induced ubiquitination, ARRB2 activates TF GATA1 via a signal transduction pathway that includes SSR4; GRB2, which can induce cell death; UCN2; SH2D1A; RAB12; and VCAM1-activating EED. Not only these receptors but also the complement component C3 (which is cleaved into C3a and C3b) induces an inflammatory response. Because C. albicans SC5314 releases the proteases orf19.5542 (SAP6) and orf19.5585 (SAP5), C3 is degraded by orf19.5542 (SAP6) and triggers IL1B (also known as IL-1β). IL1B is affected by PPP2CA-induced phosphorylation and binds directly to both orf19.5542 (SAP6) and orf19.5585 (SAP5), leading to an inflammatory response and cell apoptosis. However, because of the excessive inflammatory reaction and cellular stress, miRNAs must repress this response. miR1979-2 and miR3941 inhibit genes CCDC22 and IL1B, respectively. Next, miR-30B represses AVEN to reduce apoptosis. Although the host cell decreases apoptosis, C. albicans SC5314 continues to invade. The protein orf19.1816 (ALS3) binds to TF orf19.610 (EFG1) directly
[Figure 17.6: network diagrams of host-pathogen signaling during host cell damage. Panel (A): Candida albicans SC5314; panel (B): Candida albicans WO-1. Each panel is organized into layers (pathogen, pathogen cell wall, host plasma membrane, host, host nuclear membrane). Annotated pathogen functions: hyphal growth, biofilm formation, endocytosis, DNA damage response, hydrolase release, white-opaque switch. Annotated host functions: apoptosis, autophagy, ROS production, inflammatory response, ECM degradation, innate immune response.]
FIGURE 17.6 Candida albicans releases pathogenic factors, and the accumulated cellular stress in the host cell results in apoptosis and an inflammatory response leading to necrosis. The red arrow lines represent transcriptional regulation; the gray solid lines signify protein-protein interactions; the green dotted lines indicate protein translation; the blue lines with circle endpoints represent miRNA repression. In (A), C. albicans SC5314
so that C. albicans SC5314 quickly carries out cellular functions such as biofilm formation and endocytosis. TF orf19.610 (EFG1) positively regulates the endocytosis-related gene orf19.1816 (ALS3) and the biofilm-related gene orf19.1854 (HHF22). In addition, orf19.1321 (HWP1) interacts with orf19.3969 (SFL2) to modulate TF orf19.4433 (CPH1). TF orf19.4433 (CPH1) then positively regulates the endocytosis-related gene orf19.578, the DNA damage-related gene orf19.666, and the hydrolytic activity- and biofilm-related genes orf19.5542 (SAP6) and orf19.5585 (SAP5), but negatively regulates the hyphal growth-related gene orf19.2614 (RSR1). Therefore C. albicans SC5314 can extensively invade or damage the host cell. Apart from this, the pathogen membrane protein orf19.578 can trigger TF orf19.5908 (TEC1) through a sequence of downstream signal transduction proteins: orf19.1418 (SEC15), which is involved in hyphal branch growth; orf19.2614 (RSR1), which has the same function as orf19.1418 (SEC15); orf19.390 (CDC42), which maintains hyphal growth and has Rho-type GTPase activity; orf19.7105 (FAR1); orf19.666; orf19.3760 (DLH1); orf19.2119 (NDT80), for the wild-type drug resistance; and orf19.454 (SFL1), which receives signals from orf19.3969 (SFL2) to regulate morphogenesis through the activation of TFs orf19.610 (EFG1) and orf19.5908 (TEC1). TF orf19.5908 (TEC1) negatively regulates genes orf19.1854 (HHF22), which is involved in biofilm formation, and orf19.666. However, owing to extensive invasion by C. albicans SC5314 and biofilm formation, C. albicans SC5314 can readily execute DNA repair via orf19.666, which is regulated by two TFs, to restore its chromosomes. Through these cellular functions, especially biofilm formation, C. albicans SC5314 finally causes apoptosis and cell necrosis. The hyphae of C. albicans SC5314 actively penetrate and destroy the mitochondria or nucleus. Subsequently, the hyphae cross one another and form biofilm.
Whether biofilm forms outside or inside the host cell membrane, it leads to cell necrosis and a stress response because the host cell covered by C. albicans SC5314 remains in a hypoxic and acidic condition. Therefore C. albicans SC5314 tends toward sexual reproduction and escapes drug control. Previous studies have shown that C. albicans biofilm can antagonize drugs, so that the pathogen cannot be eliminated and develops drug resistance [856]. In addition, ROS production does not eliminate C. albicans because, as previous studies have indicated, biofilm formation readily counteracts it [857]. Therefore oxidative stress from ROS generation is harmful to the host cell. Besides producing drug-resistant pathogen, switching white cells to opaque cells can generate more yeast cells through biofilm formation. In addition, while the host membrane is covered by the C. albicans biofilm, the host cell experiences more cell stress and inflammatory response
brings about extensive invasion, and host cells generate an increasingly strong inflammatory response via hyphal elongation. Following this, hyphae press on the host cell so that the host cell triggers apoptosis. C. albicans SC5314 releases pathogenic factors that further contribute to the inflammatory response and apoptosis. ER stress across the whole host cell threatens host cell survival. Nonetheless, C. albicans SC5314 readily forms biofilm and undergoes sexual reproduction, producing more yeast cells in the host cell. Next, yeast cells can colonize or invade other host cells. In (B), C. albicans WO-1 brings about extensive invasion, and host cells generate an increasingly strong inflammatory response via hyphal elongation. Following this, hyphae press on the host cell so that the host cell triggers apoptosis. C. albicans releases pathogenic factors that further contribute to the inflammatory response and apoptosis. ER stress across the whole host cell threatens host cell survival. Nonetheless, C. albicans WO-1 readily forms biofilm and undergoes sexual reproduction, producing more yeast cells in the host cell. Next, yeast cells can colonize or invade other host cells. Finally, the whole microenvironment can produce another cell type, such as the opaque cell, because the whole host cell remains in an anaerobic or acidic condition after the last response. Therefore C. albicans WO-1 yields more opaque cells [13]. miRNA, MicroRNA.
caused by pathogenic factors and hyphae. Finally, the host cell accumulates considerable cell stress, which causes apoptosis. Moreover, because the host cell is damaged by hyphae, the inflammatory response causes necrosis.
As shown in Fig. 17.6B, owing to extensive invasion by C. albicans WO-1 and excessive ROS production, cellular stress gradually increases. Because of the excess ROS, the host cell produces a stronger inflammatory response. As the pathogen cell surface protein CAWG_02005 (ALS3) binds to the host cell membrane receptors EGFR, TJAP1, HSP90B1, and ARRB2, these host receptors ultimately cause apoptosis and an inflammatory response. First, the receptor EGFR, with NAALADL2-induced acetylation and OTUD3-induced ubiquitination, triggers TF ETS1 via the signal transduction protein AVEN, which is also modulated by MYC, itself influenced by ELP6-induced acetylation and MBD5-induced methylation. TF ETS1 negatively regulates genes CCDC22 and HSP90B1. By contrast, the receptor HSP90B1 brings about protein folding and an immune response under C. albicans WO-1 infection. Second, the receptor TJAP1 (also known as TJP4) activates TF FOXA1 through a sequence of signal transduction proteins: MAPK6 (also known as ERK3), TNFAIP8L1, and MRPL50, which is related to the maintenance of cell organelles. TF FOXA1 positively regulates the autophagy-related gene VMP1. However, compared to C. albicans SC5314, autophagy increases cellular stress at this phase. Finally, the receptor HSP90B1, which is affected by GAPDHP24-induced phosphorylation, triggers TF JUN through the signal transduction protein TMEM205. TF JUN, with PPP3CA-induced phosphorylation and USP34-induced methylation, positively regulates the apoptosis-related gene PPARD.
Finally, as CAWG_04469 (orf19.578) binds to the receptor ARRB2, ARRB2, with HGSNAT-induced acetylation and USP41-induced ubiquitination, activates TF GATA1 through RAB12, which is involved in autophagy modulation, and EED in the signaling pathway. Compared to C. albicans SC5314, the receptor ARRB2 can trigger TF GATA1 with fewer signaling proteins because it is influenced by a stronger epigenetic modification. TF GATA1 negatively regulates the apoptosis-related gene AVEN and the inflammation-related gene TNFAIP8L1. In addition, another pathway also activates TFs JUN, FOXA1, and GATA1. MYC, with epigenetic modification, activates TFs JUN, FOXA1, and GATA1 through the signal transduction proteins GPR89A, which activates TF FOXA1 and connects to VCAM1; VCAM1, which activates TF GATA1 by binding EED; HMGN1P4, with USP25-induced ubiquitination; MAPK14; and CTSH. In Fig. 17.6B, methylated MYC interacts with RAB12 under infection with C. albicans WO-1 but not with C. albicans SC5314. Not only these receptors but also the complement component C3 (which is cleaved into C3a and C3b) induces an inflammatory response. Because C. albicans WO-1 releases the proteases CAWG_05066 (SAP5) and CAWG_05098 (SAP6), C3 is degraded by CAWG_05098 (SAP6) to trigger IL1B (also known as IL-1β). IL1B is influenced by PPP1R15A-induced phosphorylation and binds to both CAWG_05066 (SAP5) and CAWG_05098 (SAP6), which directly leads to an inflammatory response and cell apoptosis. However, because of the excessive inflammatory reaction and cellular stress, miRNAs must repress the inflammatory response. The only difference in miRNAs under infection with C. albicans WO-1 is that miR210 silences CCDC22. Although the host cell decreases apoptosis, C. albicans WO-1 continues to invade. CAWG_02005 (ALS3) binds to TF CAWG_02083 (EFG1) directly so that C. albicans WO-1 carries out cellular
functions quickly such as biofilm formation and endocytosis. TF CAWG_02083 (EFG1) can positively regulate the endocytosis-related gene CAWG_02005 (ALS3), the biofilm-related gene CAWG_01979 (HHF22), and the main white-to-opaque switch gene CAWG_00418 (WOR1). In addition, the receptor CAWG_03451 (HWP1) can modulate TFs CAWG_00682 (CPH1), CAWG_02083 (EFG1), and CAWG_02766 (TEC1), mediated by the proteins CAWG_04849 (SFL2) and CAWG_01914 (SFL1). TF CAWG_00682 (CPH1) can then positively regulate the hyphal growth-related gene CAWG_01560 (RSR1) and the hydrolytic activity- and biofilm-related genes CAWG_05098 (SAP6) and CAWG_05066 (SAP5), but negatively regulate the endocytosis-related gene CAWG_04469 (orf19.578). By contrast, TF CAWG_00682 (CPH1) does not regulate the DNA damage-related gene CAWG_00299 (orf19.666) under infection with C. albicans WO-1. We infer that CAWG_00682 (CPH1) is not activated indirectly by cell cycle proteins such as CAWG_05375 (FAR1) and CAWG_03794 (NDT80). Since these proteins may execute cellular functions in the white cell, C. albicans WO-1 transforms into opaque cells, further reducing the expression of these proteins. TF CAWG_02766 (TEC1) can continuously and positively regulate the DNA damage-related gene CAWG_00299 (orf19.666). Compared to SC5314, CAWG_02766 (TEC1) does not regulate gene CAWG_01979 (HHF22). Perhaps white cells of C. albicans WO-1 mostly transform into opaque cells, so that TF CAWG_02766 (TEC1) reduces its modulation of CAWG_01979 (HHF22). Hence, C. albicans WO-1 can extensively invade or damage the host cell through different cell types. However, in contrast to C. albicans SC5314, CAWG_04469 (orf19.578) does not trigger any TFs. Instead, CAWG_04469 (orf19.578) strengthens downstream proteins to execute hyphal growth indirectly. By interacting with CAWG_04469 (orf19.578), CAWG_03378 (SEC15) can enhance the expression of gene CAWG_01560 (RSR1).
CAWG_00581 (CDC42), via CAWG_04469 (orf19.578) and CAWG_05375 (FAR1), could also increase the expression of CAWG_01560 (RSR1) to elongate and grow hyphae. Through these cellular functions, especially biofilm formation, C. albicans WO-1 finally causes apoptosis and cell necrosis. In its hyphal form, C. albicans WO-1 will actively penetrate and destroy the mitochondria or nucleus of the host cell. Following the infection, hyphae will cross-talk with each other and then form biofilm. Whether the biofilm forms outside or inside the host cell, it will cause cell necrosis because a host cell covered by C. albicans WO-1 remains in a hypoxic and acidic condition. According to Ref. [823], C. albicans WO-1 can easily transform white cells into opaque cells. Therefore, compared to C. albicans SC5314, C. albicans WO-1 is more inclined toward sexual reproduction and is less easily controlled by drugs. Previous studies have indicated that C. albicans in biofilm can antagonize drugs, so that it is not eliminated and develops drug resistance [856]. In addition, ROS production will not eliminate C. albicans, because previous studies have indicated that biofilm formation can easily resist it [857]. Therefore oxidative stress via ROS generation will instead be harmful to the host cell. Switching white cells to opaque cells not only produces a drug-resistant pathogen but can also readily generate different cell types. In addition, when the host membrane is covered by C. albicans biofilm, more cell stress and a stronger inflammatory response emerge, caused by pathogenic factors and hyphae. Ultimately, the host cell accumulates substantial cell stress, which then causes apoptosis. Moreover, because hyphae damage the host cell, the inflammatory response will cause necrosis. In the host cell damage phase, both C. albicans SC5314 and C. albicans WO-1 finally lead to cell apoptosis and necrosis. However, it is still hard for C. albicans SC5314 to switch from white
17. Investigating the common pathogenic mechanism for drug design
cell to opaque cell. Nevertheless, although C. albicans WO-1 switches white cells to opaque cells quickly, C. albicans SC5314 still has a chance to switch to opaque cells. Perhaps the microenvironment continuously promotes this cellular function. Conversely, C. albicans WO-1 in the opaque cell form can undergo sexual reproduction, which strengthens its genetic diversity and its own stability. Our results in Tables 17.1 and 17.2 can be interpreted as reflecting this stability in the infection progression of C. albicans WO-1. Moreover, the experimental data show that the gene expressions of AVEN and TNFAIP8L1 under the infection of C. albicans SC5314 are lower than those under C. albicans WO-1. By contrast, the receptor expression of EGFR under the infection of C. albicans SC5314 is higher than that under C. albicans WO-1. In addition, the protein expressions of MAPK6 and MAPK14 under the C. albicans SC5314 condition are higher than those under the C. albicans WO-1 condition. Therefore the inflammatory response and apoptosis caused by C. albicans WO-1 are stronger than those caused by C. albicans SC5314. Under the infection of C. albicans SC5314, however, the host cell can resist the pathogen more easily through the combined expression of the abovementioned proteins. By contrast, C. albicans WO-1 is hard to eliminate, so the host cell needs a stronger immune or inflammatory reaction. Furthermore, different strains of C. albicans could induce inflammation by releasing pathogenic factors. The experimental data show that the expression of orf19.5542 (SAP6) changes significantly during the infection progression of the different strains. In this way, the pathogen protein orf19.5542 (SAP6) can act together with the expression of receptor C3 to induce the inflammatory response indirectly. In addition, the expression level of receptor C3 under C. albicans WO-1 is higher than that under C. albicans SC5314; that is, C. albicans WO-1 can be considered to generate a still stronger inflammatory response and apoptosis.
As discussed earlier, we can summarize the genetic and epigenetic pathogenic mechanisms in the infection progression of different strains of C. albicans in Fig. 17.7. The results in Fig. 17.7 suggest that the differences specific to the infection mechanism of C. albicans SC5314 are as follows: TF orf19.4433 (CPH1) regulates gene orf19.666 to cause DNA damage response; JUN with methylation and ubiquitination regulates gene MMP12 to cause ECM degradation and gene PPARD to cause ROS production and apoptosis; FOS with ubiquitination regulates gene MMP12 to cause ECM degradation; FOXA1 regulates gene SERPINF1 to cause apoptosis; and mir1979-2 represses gene CCDC22 to cause inflammatory response. The differences specific to the infection mechanism of C. albicans WO-1 are as follows: TF CAWG_02083 (EFG1) regulates gene CAWG_00418 (WOR1) to cause white-opaque switch; FOXA1 regulates gene VMP1 to cause autophagy; FOS with ubiquitination and methylation regulates gene MMP12 to cause ECM degradation; and mir210 represses gene CCDC22 to cause inflammatory response.
17.4.4 Prediction of drug target proteins and multiple-molecules drug design for the infection of different strains of C. albicans
Currently, the major drugs employed to treat C. albicans infection include Amphotericin B, Fluconazole, and Caspofungin [858]. However, C. albicans will generate drug resistance by
[Figure 17.7 image: two network panels, Candida albicans SC5314 and Candida albicans WO-1. Node labels include TEC1, CPH1, EFG1, BCR1, HWP1, ALS3, HHF22, SAP5, SAP6, RSR1, SUV3, ARP2, WOR1, orf19.578, orf19.666, orf19.3292, ETS1, GATA1, NFKB1, JUN, FOS, FOXA1, YBX1, CCDC22, BPHL, TNFAIP8L1, AVEN, IL20, IL1B, UCN2, DEFB4A, PPARD, MMP12, SERPINF1, VMP1, miR30B, miR3941, miR1979-2, miR548D2, miR143HG, and miR210. Function labels: biofilm formation, hyphal growth, endocytosis, cell adhesion, hydrolase release, DNA damage response, white-opaque switch, response to ROS, ROS production, ECM degradation, inflammatory response, innate immune response, autophagy, and apoptosis.]
FIGURE 17.7 Summary of the common and specific genetic and epigenetic pathogenic mechanisms in the infection of different strains of Candida albicans. The green rectangular blocks denote the differential regulations and functions between different strains of C. albicans [13].
forming biofilm. In addition, a balance is normally maintained between host defense and the fungus. Once this balance is destroyed, C. albicans will invade the host cell and finally produce biofilm. Following this, C. albicans will cause diseases such as thrush and denture-associated erythematous. Treatment of C. albicans therefore has to prevent hyphal growth and biofilm production. Nonetheless, C. albicans easily causes reinfection after treatment in patients who remain immunocompromised, such as patients undergoing chemotherapy and HIV-infected patients. Furthermore, current therapeutic treatments with drugs such as Amphotericin B, Fluconazole, and Caspofungin have side effects. Therefore we also have to find other drugs that reduce reinfection and side effects. Finally, C. albicans WO-1 transforms its cell type, for example into the opaque cell. In the future the therapeutic treatment of C. albicans WO-1 will need a new direction for drug targets. Based on the above results, the infection of OKF6/TERT-2 cells by different strains of C. albicans can be used to investigate the common pathogenic mechanism and to predict drug targets for the design of a multiple-molecule drug. We consider the important roles of orf19.1816 (CAWG_02005 in C. albicans WO-1, i.e., ALS3), orf19.610 (CAWG_02083 in C. albicans WO-1, i.e., EFG1), orf19.1321 (CAWG_03451 in C. albicans WO-1, i.e., HWP1), orf19.4433 (CAWG_00682 in C. albicans WO-1, i.e., CPH1), orf19.1623 (CAWG_02548 in C. albicans WO-1, i.e., CAP1), and orf19.723 (CAWG_01948 in C. albicans WO-1, i.e., BCR1). The cellular functions of these significant pathogen proteins include hyphal growth, endocytosis, and biofilm formation. Thus these pathogen TFs and pathogen cell surface proteins play a very important role in the pathogenic mechanism during the infection of C. albicans. Therefore Amphotericin B, Fluconazole, and Caspofungin, mentioned earlier, remain feasible drug treatments.
Next, we look for other pathogen proteins as drug targets for new drug discovery, according to their functions and roles in the pathogenic mechanism of C. albicans infection. Based on the above results, the following proteins play important roles in hyphal growth and biofilm formation: orf19.2614 (CAWG_01560 in C. albicans WO-1, i.e., RSR1), orf19.7292 (CAWG_02173 in C. albicans WO-1, i.e., ARP2), orf19.4519 (CAWG_04191 in C. albicans WO-1, i.e., SUV3), orf19.1854 (CAWG_01979 in C. albicans WO-1, i.e., HHF22), orf19.5542 (CAWG_05098 in C. albicans WO-1, i.e., SAP6), and orf19.5585 (CAWG_05066 in C. albicans WO-1, i.e., SAP5). Moreover, we also investigate other proteins involved in defense mechanisms such as the ROS response. The anti-ROS pathogen proteins are orf19.1623 (CAP1), orf19.5034 (CAWG_00057 in C. albicans WO-1, i.e., YBP1), and orf19.3292 (CAWG_01270 in C. albicans WO-1). In addition, during infection progression C. albicans exploits pathogen proteins such as orf19.7247 (CAWG_00020 in C. albicans WO-1, i.e., RIM101), which could use orf19.5585 (SAP5) to degrade host cell surface proteins. Finally, the following pathogen proteins interact simultaneously with many proteins involved in morphological transformation, such as GTPase activity and the influence of the microenvironment, so they can be considered significant proteins that trigger TFs indirectly: orf19.2087 (CAWG_03824 in C. albicans WO-1, i.e., SAS2), orf19.666 (CAWG_00299 in C. albicans WO-1), orf19.1093 (CAWG_04944 in C. albicans WO-1, i.e., FLO8), and orf19.939 (CAWG_04444 in C. albicans WO-1, i.e., NAM7). All the abovementioned pathogen proteins could be considered potential common drug targets for therapeutic treatment of the infection of different strains of C. albicans. After identifying these common drug targets, we then consult drug databases and research reviews to design a multiple-molecule drug that targets different strains of C. albicans.
Because no existing drug database covers drugs targeting C. albicans
proteins, we mine previous studies that predict drugs able to inhibit these common potential drug targets. Previous studies have demonstrated that orf19.7247 (RIM101) can induce hyphal growth and degradation of host cell receptors [859]. In one previous study, drug targets of C. albicans were inferred from the sequence homology between C. albicans and Saccharomyces cerevisiae. Consequently, we can find five drugs to eliminate C. albicans for new therapy. First, Tunicamycin could repress the function of the pathogen protein orf19.7247, reducing both the induction of hyphal growth and the coordination of pathogen proteins for the degradation of the host cell protein CDH1 [860]. Second, research has shown that Terbinafine can inhibit the activity of orf19.5034 and its anti-ROS role in stabilizing the pathogen TF orf19.1623 (CAP1), so that the pathogen can be eliminated early via ROS production [839,860]. Third, Terbinafine also represses the activity of orf19.1854 (HHF22), which is related to the hyphal growth and biofilm formation functions. Terbinafine also directly or indirectly influences orf19.939 (NAM7) and orf19.2087 (SAS2), reducing their chance of triggering TFs [860]. Finally, Cerulenin could affect the expression levels of the pathogen proteins orf19.939 and orf19.4519 (SUV3) to decrease biofilm formation [860]. Furthermore, Tetracycline could inhibit orf19.5585 (SAP5) and orf19.5542 (SAP6), so that C. albicans would not form biofilm or release pathogenic factors [861]. Aspartic proteinase inhibitors could perhaps also be employed against orf19.5585 and orf19.5542 [862]. Ciclopirox olamine is also a broad-spectrum antibiotic that targets orf19.939, orf19.1321 (HWP1), orf19.5585, and orf19.5542 [863]. Lastly, Tetrandrine can play an important role in inhibiting orf19.1816 (ALS3), orf19.610 (EFG1), and orf19.5908 (TEC1) to reduce the regulatory ability of pathogen TFs [864].
Nevertheless, previous studies have also shown that prolonged use of broad-spectrum antibiotics could lead to an impaired immune response [865]. We therefore do not consider the broad-spectrum antibiotics, because an immunocompromised response will cause reinfection. Other pathogen proteins are also targeted by azole compounds, especially Fluconazole. Nonetheless, orf19.1623 (CAP1), orf19.390 (CDC42), and orf19.578 are homologs of the important human proteins CAPZA1, CDC42, and GRTP1, respectively. The repression of these proteins may cause unpredictable dysfunction of the host cell, especially in GTPase activity and actin growth. Therefore orf19.5034 (YBP1) is considered a better drug target than orf19.1623 (CAP1). Eventually, Cerulenin, Tunicamycin, Tetracycline, Tetrandrine, and Terbinafine are combined with the previous three drugs as a potential common multiple-molecule drug (Fig. 17.8) for the therapeutic treatment of the two strains of C. albicans, based on the above-predicted drug targets. However, Tetracycline, although approved by the FDA, has the side effect of colitis. In addition, orf19.2614 (RSR1) and orf19.7292 (ARP2) are involved in hyphal growth and are considered important virulence factors, but we cannot find a drug for them. Moreover, orf19.666 participates in many pathogen protein interactions and DNA responses. We therefore recommend these pathogen proteins as potential drug targets for further drug design. In addition, no drug has yet been explored for the pathogen protein orf19.4884 (WOR1) of C. albicans WO-1. In conclusion, our results show that orf19.2614 (RSR1), orf19.666, orf19.7292 (ARP2), and orf19.4884 (WOR1) will be significant drug targets for the design of new common multiple drugs to efficiently eliminate both C. albicans SC5314 and C. albicans WO-1.
[Figure 17.8 content; chemical structure images omitted. "FDA" is the label shown beside the drug in the figure.]
Drug name | Drug target
Fluconazole | orf19.1816, orf19.610, orf19.1321, orf19.939, orf19.2087, orf19.1093, orf19.3292, orf19.5034
Amphotericin B | orf19.5034, orf19.939, orf19.2087, orf19.1093, orf19.3292
Caspofungin | orf19.610, orf19.1321, orf19.5034, orf19.939, orf19.2087, orf19.1093, orf19.3292, orf19.723
Terbinafine (FDA) | orf19.5034, orf19.939, orf19.2087, orf19.1093, orf19.1854
Tetracycline (FDA) | orf19.5585, orf19.5542
Cerulenin | orf19.939, orf19.4519
Tunicamycin | orf19.7247
Tetrandrine | orf19.1816, orf19.610, orf19.5908
FIGURE 17.8 The potential common multiple-molecule drugs for the treatment of infection by different strains of Candida albicans. The top three drugs, Fluconazole, Amphotericin B, and Caspofungin, are applied to treat oral C. albicans infection in patients. Terbinafine can inhibit the activity of orf19.5034 (YBP1), orf19.939 (NAM7), orf19.2087 (SAS2), orf19.1093 (FLO8), and orf19.1854 (HHF22). Tetracycline can inhibit orf19.5585 (SAP5) and orf19.5542 (SAP6) of C. albicans, preventing biofilm formation and the release of pathogenic factors. Cerulenin can affect the expression level of the pathogen proteins orf19.939 (NAM7) and orf19.4519 (SUV3). Tunicamycin can repress the pathogen protein orf19.7247 (RIM101) to reduce its ability to coordinate pathogen proteins for the degradation of the host cell protein CDH1. Tetrandrine can inhibit orf19.1816 (ALS3), orf19.610 (EFG1), and orf19.5908 (TEC1) to reduce the regulatory ability of pathogen TFs. Moreover, other pathogen proteins are also targeted by various azoles. These drugs are therefore combined as a multiple-molecule drug to eliminate both strains of C. albicans simultaneously [13]. TF, Transcription factor.
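The drug-to-target assignments in Fig. 17.8 can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis: the dictionary is transcribed from the figure, while the function name `target_coverage` and the variable names are illustrative assumptions. It inverts the mapping to show which drugs are listed against each target and confirms that the four targets proposed in the text for future drug design have no listed drug.

```python
# Drug-to-target assignments transcribed from Fig. 17.8 (orf19 identifiers).
DRUG_TARGETS = {
    "Fluconazole": {"orf19.1816", "orf19.610", "orf19.1321", "orf19.939",
                    "orf19.2087", "orf19.1093", "orf19.3292", "orf19.5034"},
    "Amphotericin B": {"orf19.5034", "orf19.939", "orf19.2087",
                       "orf19.1093", "orf19.3292"},
    "Caspofungin": {"orf19.610", "orf19.1321", "orf19.5034", "orf19.939",
                    "orf19.2087", "orf19.1093", "orf19.3292", "orf19.723"},
    "Terbinafine": {"orf19.5034", "orf19.939", "orf19.2087",
                    "orf19.1093", "orf19.1854"},
    "Tetracycline": {"orf19.5585", "orf19.5542"},
    "Cerulenin": {"orf19.939", "orf19.4519"},
    "Tunicamycin": {"orf19.7247"},
    "Tetrandrine": {"orf19.1816", "orf19.610", "orf19.5908"},
}

# Targets the text proposes for future drug design: RSR1, orf19.666, ARP2, WOR1.
PROPOSED_TARGETS = {"orf19.2614", "orf19.666", "orf19.7292", "orf19.4884"}


def target_coverage(drug_targets):
    """Invert the drug -> targets mapping into target -> sorted list of drugs."""
    coverage = {}
    for drug, targets in drug_targets.items():
        for target in targets:
            coverage.setdefault(target, []).append(drug)
    return {target: sorted(drugs) for target, drugs in coverage.items()}


coverage = target_coverage(DRUG_TARGETS)
# Proposed targets that no drug in the figure is listed against.
uncovered = sorted(PROPOSED_TARGETS - set(coverage))

for target in sorted(coverage):
    print(target, "<-", ", ".join(coverage[target]))
print("No listed drug:", ", ".join(uncovered))
```

Inverting the mapping makes redundancy visible: a target listed against several drugs (for example orf19.939) is hit by multiple components of the combination, whereas a target listed against a single drug depends entirely on that component.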
17.5 Conclusion
The pathogenic mechanism during C. albicans infection and the resistance mechanism of host cells are complicated and intertwined. For host cells, hyphal growth and hydrolase-triggering virulence factors have been extensively investigated, but few studies have concentrated on the cross-talk mechanism between human cells and C. albicans. Here, based on big data mining and the system identification method, we have investigated GEINs between host OKF6/TERT-2 cells and C. albicans during C. albicans infection. Moreover, our results could also distinguish the common and specific pathogenic mechanisms of different strains, in order to investigate the systematic resistance and infection mechanisms of host cells and C. albicans, respectively. In this chapter, based on system identification via mRNA/miRNA and lncRNA expression profiles in the NGS data, we have established the host/pathogen cross-talk GEINs for the infection of different strains of C. albicans. The important epigenetic modifications of infection progression in different strains of C. albicans, such as the ubiquitination and acetylation of ARRB2 and orf19.578; the induced endocytosis of CDH1, HSP90B1, EGFR, and ERBB2; and host miRNA repression of pathogens, are mostly investigated through the epigenetic interactions and regulations identified in interspecies cross-talk GEINs by the proposed systems biology approach. Furthermore, the core signal pathways are all identified through KEGG pathways via the core cross-talk GEINs projected from the GEINs by the PNP method. These results indicate that epigenetic interactions and regulations play a significant role in the pathogenic mechanism of different C. albicans strains. In the past, few studies have concentrated on the defense mechanism in the human cell infected by C. albicans and the offense mechanism of C. albicans via pathogen epigenetic modification; previous studies only discovered epigenetic modification of C. albicans that did not infect host cells [866]. In addition, the few studies about the interaction with host cells are limited to host cell surface proteins and do not indicate the downstream signaling transduction proteins and target genes that generate the corresponding cellular response. Our results have shown that epigenetic regulations play an important role in the common and specific pathogenic mechanisms of C. albicans, especially C. albicans WO-1, which will offer a new direction for drug targets and designs because it transforms its cell type rapidly during the infection process. At the same time, with the identification of cross-talk GEINs and HPCNs, the bioinformatics of C. albicans infection will advance toward discovering more potential drugs for therapy. In the future the four pathogen proteins orf19.2614 (RSR1), orf19.666, orf19.7292 (ARP2), and orf19.4884 (WOR1) will be significant drug targets for the design of new common multiple drugs to efficiently eliminate both C. albicans SC5314 and C. albicans WO-1.
17.6 Appendix
17.6.1 Construction of candidate interspecies genetic and epigenetic interspecies networks via big data mining
The candidate interspecies GEIN is constructed as shown in Fig. 17.1 through big data mining from numerous databases that contain many experimental data and
bioinformatics predictions. The host candidate PPIN, which requires PPI information, was obtained from MINT [537,538], DIP [759], BIND [533], IntAct [537], and BioGRID [584]. The host candidate GRN, which requires information on TFs/lncRNAs and their downstream-regulated genes, was obtained from CircuitDB 2 and ITFP [672,761], while the regulatory associations from miRNAs to genes are from CircuitDB 2 and TargetScan. The candidate pathogen PPIN is obtained from BioGRID. As for the host-pathogen interspecies candidate PPIN and GRN and the pathogen candidate intraspecies GRN, at present no existing database can provide sufficient information or predictions for constructing the candidate interspecies network. Meanwhile, the number of interactions in the pathogen candidate PPIN is not sufficient to construct the real GEIN. Accordingly, we need to infer the putative interspecies and intraspecies PPIs [827]. A similar concept is also applied to the interspecies and intraspecies GRNs. For example, as illustrated in Fig. 17.A1, we assume that protein A' and protein B' of S. cerevisiae are known to interact based on databases such as SGD, BioGRID, String, and Reactome [584,867,868]. Next, we identify that C. albicans protein A is homologous to S. cerevisiae protein A' and that C. albicans protein B is homologous to S. cerevisiae protein B', via the sequence homology acquired from the InParanoid database. Hence, we can infer that C. albicans A interacts with C. albicans B. Thus we could utilize the sequence homology between C. albicans SC5314 and S. cerevisiae to construct the putative pathogen candidate intraspecies PPIN [585,867,868]. In addition, we also found previous studies about pathogen-pathogen intraspecies PPIs [869-871]. Because C. albicans SC5314 and Saccharomyces are similar species, this approach is suitable for inferring more PPIs. Similarly, this method is also applied to host-pathogen interspecies PPIs, using the sequence homology between C. albicans SC5314 and Homo sapiens.
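The homology-based transfer described above (protein A' interacts with protein B' in the template species, A is homologous to A' and B to B', so A is inferred to interact with B) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `infer_interactions`, the data layout, and the toy identifiers are assumptions, and the chapter's actual pipeline draws its interactions and homolog pairs from the databases cited in the text.

```python
from itertools import product


def infer_interactions(template_ppis, homologs):
    """Transfer template-species interactions to a target species.

    template_ppis: iterable of (protein_a, protein_b) pairs known to
        interact in the template species (e.g., S. cerevisiae).
    homologs: dict mapping each template protein to the set of its
        sequence homologs in the target species (e.g., C. albicans).
    Returns a set of unordered target-species pairs (frozensets).
    """
    inferred = set()
    for a_template, b_template in template_ppis:
        a_targets = homologs.get(a_template, ())
        b_targets = homologs.get(b_template, ())
        # Every homolog of A' is paired with every homolog of B'.
        for a, b in product(a_targets, b_targets):
            inferred.add(frozenset((a, b)))
    return inferred


# Toy example with hypothetical identifiers.
yeast_ppis = [("Sc_A", "Sc_B"), ("Sc_B", "Sc_C")]
yeast_to_candida = {"Sc_A": {"Ca_A"}, "Sc_B": {"Ca_B"}}  # Sc_C has no homolog

candidate_ppin = infer_interactions(yeast_ppis, yeast_to_candida)
print(candidate_ppin)
```

Pairs inferred this way are only candidates; as the text notes, false positives are pruned later using expression data.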
In addition, we also found previous studies about host-pathogen interspecies PPIs [197,829,872,873]. Combining these previous studies with the sequence homology, we could construct the host-pathogen candidate interspecies PPIN. The sequence homology information for the three species (H. sapiens, C. albicans, and S. cerevisiae) was acquired from the InParanoid database. Likewise, we combined the literature reporting pathogen gene regulatory pairs with the sequence homology between C. albicans SC5314 and S. cerevisiae to construct the pathogen candidate intraspecies GRN based on S. cerevisiae intraspecies GRNs (Yeastract) [874-877]. Similarly, on the basis of CircuitDB 2 and ITFP, we have used the sequence homology between C. albicans SC5314 and H. sapiens to construct the host-pathogen candidate interspecies GRN. Finally, we have also utilized the sequence homology between C. albicans SC5314 and H. sapiens to construct the candidate GRN of host miRNAs targeting pathogen genes, based on databases such as TargetScan and CircuitDB 2. The detailed construction procedures of the host-pathogen candidate interspecies PPIN, the pathogen candidate intraspecies PPIN, the pathogen candidate intraspecies GRN, the host-pathogen candidate GRN, and the candidate GRN of host miRNAs targeting pathogen genes are shown in Fig. 17.A1 at the end of the Appendix. The procedure for constructing the candidate interspecies GEIN between C. albicans WO-1 and H. sapiens is the same as that for C. albicans SC5314 and H. sapiens, as shown in Fig. 17.A2. As a result, in the intraspecies candidate PPIN, we obtained 132,600 PPI pairs of C. albicans SC5314 and 6,449,171 PPI pairs of human; in the interspecies candidate PPIN, we obtained 1,615,845 PPI pairs between human and C. albicans SC5314. In the intraspecies candidate
FIGURE 17.A1 The construction procedures of the intraspecies candidate GEIN. (A) Candida albicans SC5314-C. albicans SC5314 candidate intraspecies PPIN; (B) host-C. albicans SC5314 candidate interspecies
GRN, we obtained 86,491 TF-gene pairs of C. albicans SC5314, 152,491 TF-gene pairs of human cells, 170,671 miRNA-gene pairs of human cells, 37 lncRNA-gene pairs of human cells, 45 miRNA-miRNA pairs of human cells, 130 miRNA-lncRNA pairs of human cells, 271 TF-lncRNA pairs of human cells, and 1347 TF-miRNA pairs of human cells; in the interspecies candidate GRN, we obtained 8910 pairs between C. albicans SC5314 TF and human gene, 48 pairs between C. albicans SC5314 TF and human miRNA, 21,328 pairs between human TF and C. albicans SC5314 gene, 23,730 pairs between human miRNA and C. albicans SC5314 gene, and 1 pair between human lncRNA and C. albicans SC5314 gene. Likewise, in the intraspecies candidate PPIN, we obtained 131,485 PPI pairs of C. albicans WO-1 and 5,521,448 PPI pairs of human cells; in the interspecies candidate PPIN, we obtained 1,453,353 PPI pairs between human and C. albicans WO-1. In the intraspecies candidate GRN, we obtained 85,211 TF-gene pairs of C. albicans WO-1, 141,001 TF-gene pairs of human cells, 118,017 miRNA-gene pairs of human cells, 35 lncRNA-gene pairs of human cells, 21 miRNA-miRNA pairs of human cells, 78 miRNA-lncRNA pairs of human cells, 200 TF-lncRNA pairs of human cells, and 901 TF-miRNA pairs of human cells; in the interspecies candidate GRN, we obtained 8726 pairs between C. albicans WO-1 TF and human gene, 29 pairs between C. albicans WO-1 TF and human miRNA, 20,177 pairs between human TF and C. albicans WO-1 gene, 16,639 pairs between human miRNA and C. albicans WO-1 gene, and 1 pair between human lncRNA and C. albicans WO-1 gene.
In conclusion, we built the candidate interspecies GEINs from the many candidate interaction and regulation pairs mentioned earlier. In the following section, we then identify the real interspecies GEINs by pruning the false positives from the corresponding candidate interspecies GEIN via the system order detection scheme and the system identification approach, using the genome-wide microarray data of OKF6/TERT-2 cells and C. albicans SC5314. Similarly, the candidate interspecies GEIN between human and C. albicans WO-1 was constructed by the same process as for C. albicans SC5314 infection.
PPIN; (C) candidate GRN of host-TFs targeting C. albicans SC5314-genes; (D) candidate GRN of host-miRNAs targeting C. albicans SC5314-genes; (E) candidate GRN of C. albicans SC5314 TFs targeting host-genes; (F) candidate GRN of C. albicans SC5314 TFs targeting host-miRNAs; (G) candidate GRN of C. albicans SC5314 TFs targeting C. albicans SC5314-genes. The sequence-homolog pairs between Saccharomyces cerevisiae and C. albicans SC5314, and between C. albicans SC5314 and Homo sapiens, were acquired from the reference and met the standard of sequence homology (E-value < 10^-25, identity > 30%, overlap > 80%). With the assistance of the existing S. cerevisiae intraspecies PPIN and host intraspecies PPIN, we can infer the potential C. albicans SC5314 intraspecies PPIN and human-C. albicans SC5314 interspecies PPIN as shown in (A, B), respectively; similarly, we can infer the potential human to C. albicans SC5314 interspecies GRN by sequence homology, CircuitDB2, ITFP, and TargetScan as shown in (C, D), respectively; similarly, we can infer the potential C. albicans SC5314 to human interspecies GRN by sequence homology, CircuitDB2, ITFP, and TargetScan as shown in (E, F), respectively; finally, we can infer the potential C. albicans SC5314 intraspecies GRN by sequence homology and Yeastract [13]. GEIN, Genetic-and-epigenetic interspecies network; GRN, gene regulation network; miRNA, microRNA; PPIN, protein-protein interaction network; TF, transcription factor.
FIGURE 17.A2 The construction procedures of the candidate GEIN. (A) Candida albicans WO-1-C. albicans WO-1 candidate intraspecies PPIN; (B) host-C. albicans WO-1 candidate interspecies PPIN; (C) candidate GRN of host-TFs targeting C. albicans WO-1-genes; (D) candidate GRN of host-miRNAs targeting C. albicans WO-1-genes; (E) candidate GRN of C. albicans WO-1 TFs targeting host-genes; (F) candidate GRN of C. albicans WO-1 TFs targeting host miRNAs; (G) candidate GRN of C. albicans WO-1 TFs targeting C. albicans WO-1-genes. The sequence-homolog pairs between Saccharomyces cerevisiae and C. albicans WO-1, and between C. albicans WO-1 and Homo sapiens, were acquired from the reference and met the standard of sequence homology (E-value < 10^-25, identity > 30%, overlap > 80%). With the assistance of the existing S. cerevisiae intraspecies PPIN and host intraspecies PPIN, we can infer the potential C. albicans WO-1 intraspecies PPIN and human-C. albicans WO-1 interspecies PPIN as shown in (A, B), respectively; similarly, we can infer the potential human to C. albicans WO-1 interspecies GRN by sequence homology, CircuitDB2, ITFP, and TargetScan as shown in (C, D), respectively; similarly, we can infer the potential C. albicans WO-1 to human interspecies GRN by sequence homology, CircuitDB2, ITFP, and TargetScan as shown in (E, F), respectively; finally, we can infer the potential C. albicans WO-1 intraspecies GRN by sequence homology and Yeastract [13]. GEIN, Genetic-and-epigenetic interspecies network; GRN, gene regulation network; miRNA, microRNA; PPIN, protein-protein interaction network; TF, transcription factor.
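The sequence-homology standard quoted in the two captions can be expressed as a simple filter over homolog hits. The sketch below is hypothetical: the thresholds are read from the captions as E-value below 10^-25, identity above 30%, and overlap above 80% (the comparison symbols are garbled in the source, so this reading is an assumption), and the `HomologHit` record and its field names are invented for illustration rather than taken from any database format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomologHit:
    """One candidate homolog pair with its alignment statistics (hypothetical layout)."""
    query: str
    subject: str
    e_value: float
    identity_pct: float
    overlap_pct: float


def passes_homology_standard(hit, max_e_value=1e-25,
                             min_identity_pct=30.0, min_overlap_pct=80.0):
    """Return True if the hit meets all three thresholds (strict inequalities)."""
    return (hit.e_value < max_e_value
            and hit.identity_pct > min_identity_pct
            and hit.overlap_pct > min_overlap_pct)


# Toy hits with hypothetical identifiers and statistics.
hits = [
    HomologHit("Ca_A", "Sc_A", 1e-40, 55.0, 92.0),   # meets all thresholds
    HomologHit("Ca_B", "Sc_B", 1e-10, 60.0, 95.0),   # E-value too large
    HomologHit("Ca_C", "Sc_C", 1e-50, 25.0, 90.0),   # identity too low
]

accepted = [hit for hit in hits if passes_homology_standard(hit)]
print([(hit.query, hit.subject) for hit in accepted])
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive the size of the candidate network is to the chosen cutoffs.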
17.6.2 Dynamic models of candidate interspecies genetic and epigenetic interspecies networks for OKF6/TERT-2 cells and C. albicans during the infection
The candidate interspecies GEIN is composed of experimental results and computational predictions from numerous databases, experimental datasets, and the literature. Therefore the candidate interspecies GEIN contains a number of false-positive regulations and interactions. To reduce the effect of this false-positive information, we built dynamic models to characterize the molecular mechanisms of GEINs and to prune the false positives in the interspecies candidate GEIN, producing the real interspecies GEINs for OKF6/TERT-2 cells and C. albicans during the infection process. We could then extract the core interspecies GEINs by the PNP scheme to characterize the principal pathogenic mechanisms in GEINs during C. albicans infection. The PPIs of human-protein i in the candidate PPIN can be described by the following stochastic dynamic interactive equation:

$$p_i^H(t+1) = p_i^H(t) + \sum_{n=1}^{N_i} a_{in}^H p_i^H(t) p_n^H(t) + \sum_{j=1}^{J_i} b_{ij}^H p_i^H(t) p_j^P(t) + \alpha_i^H g_i^H(t) - \beta_i^H p_i^H(t) + \chi_i^H + \omega_i^H(t),$$
$$\text{for } i = 1, 2, \ldots, I;\quad \alpha_i^H \ge 0 \text{ and } -\beta_i^H \le 0 \tag{17.A1}$$

where $p_i^H(t)$, $p_n^H(t)$, $g_i^H(t)$, and $p_j^P(t)$ indicate the expression levels of the ith host protein, the nth host protein, the ith host gene, and the jth pathogen protein at time t, respectively; $a_{in}^H$ and $b_{ij}^H$ represent the interactive ability between the ith and nth host proteins and between the ith host protein and the jth pathogen protein, respectively; $\alpha_i^H$, $-\beta_i^H$, and $\chi_i^H$ signify the translation rate from the corresponding mRNA, the degradation rate, and the basal expression level of the ith host protein, respectively; $N_i$ and $J_i$ denote the number of host proteins and pathogen proteins that interact with the ith host protein, respectively; $\omega_i^H(t)$ indicates the stochastic noise of the expression level of the ith host protein at time t. The biological meaning of Eq. (17.A1) is that the expression level of the ith host protein can be affected by various molecular mechanisms, including the host intraspecies PPIs by $\sum_{n=1}^{N_i} a_{in}^H p_i^H(t) p_n^H(t)$, the interspecies PPIs by $\sum_{j=1}^{J_i} b_{ij}^H p_i^H(t) p_j^P(t)$, the protein translation by $\alpha_i^H g_i^H(t)$, the protein degradation by $-\beta_i^H p_i^H(t)$, the basal expression level by $\chi_i^H$, and the stochastic noise by $\omega_i^H(t)$. Furthermore, the protein degradation rate $\beta_i^H$ and the translation rate $\alpha_i^H$ should both be limited to be nonnegative in real PPIs. The PPIs of pathogen-protein j in the candidate PPIN can be described by the following stochastic dynamic interactive equation:
$$
p_j^P(t+1) = p_j^P(t) + \sum_{o=1}^{O_j} c_{jo}^P\, p_j^P(t)\, p_o^P(t) + \sum_{i=1}^{I_j} d_{ji}^P\, p_j^P(t)\, p_i^H(t) + \delta_j^P g_j^P(t) - \varepsilon_j^P p_j^P(t) + \chi_j^P + \omega_j^P(t),
\quad \text{for } j = 1, 2, \ldots, J;\ \delta_j^P \ge 0 \text{ and } -\varepsilon_j^P \le 0 \tag{17.A2}
$$

where $p_j^P(t)$, $p_o^P(t)$, $g_j^P(t)$, and $p_i^H(t)$ indicate the expression levels of the jth pathogen protein, the oth pathogen protein, the jth pathogen gene, and the ith host protein at time t, respectively; $c_{jo}^P$ and $d_{ji}^P$ represent the interactive ability between the jth and oth pathogen proteins and between the jth pathogen protein and the ith host protein, respectively; $\delta_j^P$, $-\varepsilon_j^P$, and $\chi_j^P$ signify the translation rate, the degradation rate, and the basal expression level of the jth pathogen protein, respectively; $O_j$ and $I_j$ denote the numbers of pathogen proteins and host proteins that interact with the jth pathogen protein; and $\omega_j^P(t)$ indicates the stochastic noise of the expression level of the jth pathogen protein at time t. The biological meaning of Eq. (17.A2) is that the expression level of the jth pathogen protein can be affected by various molecular mechanisms, including the pathogen intraspecies PPIs through $\sum_{o=1}^{O_j} c_{jo}^P p_j^P(t) p_o^P(t)$, the interspecies PPIs through $\sum_{i=1}^{I_j} d_{ji}^P p_j^P(t) p_i^H(t)$, the protein translation through $\delta_j^P g_j^P(t)$, the protein degradation through $-\varepsilon_j^P p_j^P(t)$, the basal expression level $\chi_j^P$, and the stochastic noise $\omega_j^P(t)$.
Furthermore, as in the host protein dynamic model, both the protein degradation rate $\varepsilon_j^P$ and the translation rate $\delta_j^P$ should be constrained to be nonnegative in real PPIs.

The transcriptional regulations of host-gene k in the candidate GRN can be described by the following stochastic dynamic regulatory equation:

$$
g_k^H(t+1) = g_k^H(t) + \sum_{i=1}^{I_k} e_{ki}^H\, p_i^H(t) - \sum_{l=1}^{L_k} f_{kl}^H\, g_k^H(t)\, m_l^H(t) + \sum_{m=1}^{M_k} h_{km}^H\, l_m^H(t) + \sum_{j=1}^{J_k} n_{kj}^H\, p_j^P(t) - \varphi_k^H g_k^H(t) + \phi_k^H + \varpi_k^H(t),
\quad \text{for } k = 1, 2, \ldots, K;\ -f_{kl}^H \le 0 \text{ and } -\varphi_k^H \le 0 \tag{17.A3}
$$

where $g_k^H(t)$, $p_i^H(t)$, $m_l^H(t)$, $l_m^H(t)$, and $p_j^P(t)$ indicate the expression levels of the kth host gene, the ith host TF, the lth host miRNA, the mth host lncRNA, and the jth pathogen TF at time t, respectively; $e_{ki}^H$, $-f_{kl}^H$, $h_{km}^H$, and $n_{kj}^H$ represent the regulatory ability of the ith host TF, the lth host miRNA, the mth host lncRNA, and the jth pathogen TF on the kth host gene, respectively; and $-\varphi_k^H$ and $\phi_k^H$ signify the degradation rate and the basal expression level of the kth host gene, respectively. The basal level in Eq. (17.A3) accounts for unknown regulations other than those listed above, for example, DNA methylation and other epigenetic regulatory activities. $I_k$, $L_k$, $M_k$, and $J_k$ denote the numbers of host TFs, host miRNAs, host lncRNAs, and pathogen TFs, respectively, that regulate the expression level of the kth host gene; $\varpi_k^H(t)$ indicates the stochastic noise of the expression level of the kth host gene at time t. The biological meaning of Eq. (17.A3) is that the expression level of the kth host gene can be regulated by various molecular mechanisms, including the host TF regulations through $\sum_{i=1}^{I_k} e_{ki}^H p_i^H(t)$, the host miRNA repressions through $-\sum_{l=1}^{L_k} f_{kl}^H g_k^H(t) m_l^H(t)$, the host lncRNA regulations through $\sum_{m=1}^{M_k} h_{km}^H l_m^H(t)$, the pathogen TF regulations through $\sum_{j=1}^{J_k} n_{kj}^H p_j^P(t)$, the mRNA degradation through $-\varphi_k^H g_k^H(t)$, the basal expression level $\phi_k^H$, and the stochastic noise $\varpi_k^H(t)$. In addition, as in the protein models, the host gene degradation rate $-\varphi_k^H$ and the host-miRNA regulatory ability $-f_{kl}^H$ should both be constrained to be nonpositive.
The transcriptional regulations of host-miRNA l in the candidate GRN can be described by the following stochastic dynamic regulatory equation:

$$
m_l^H(t+1) = m_l^H(t) + \sum_{i=1}^{I_l} o_{li}^H\, p_i^H(t) - \sum_{r=1}^{R_l} q_{lr}^H\, m_l^H(t)\, m_r^H(t) + \sum_{j=1}^{J_l} r_{lj}^P\, p_j^P(t) - \gamma_l^H m_l^H(t) + \kappa_l^H + \eta_l^H(t),
\quad \text{for } l = 1, 2, \ldots, L;\ -q_{lr}^H \le 0 \text{ and } -\gamma_l^H \le 0 \tag{17.A4}
$$

where $m_l^H(t)$, $p_i^H(t)$, $m_r^H(t)$, and $p_j^P(t)$ indicate the expression levels of the lth host miRNA, the ith host TF, the rth host miRNA, and the jth pathogen TF at time t, respectively; $o_{li}^H$, $-q_{lr}^H$, and $r_{lj}^P$ represent the regulatory ability of the ith host TF, the rth host miRNA, and the jth pathogen TF on the lth host miRNA, respectively; $-\gamma_l^H$ and $\kappa_l^H$ signify the miRNA degradation rate and the basal expression level of the lth host miRNA, respectively; $I_l$, $R_l$, and $J_l$ denote the numbers of host TFs, host miRNAs, and pathogen TFs, respectively, that regulate the expression level of the lth host miRNA; and $\eta_l^H(t)$ indicates the stochastic noise of the lth host miRNA at time t. The biological meaning of Eq. (17.A4) is that the expression level of the lth host miRNA can be regulated by various molecular mechanisms, including the host TF regulations through $\sum_{i=1}^{I_l} o_{li}^H p_i^H(t)$, the host-miRNA repressions through $-\sum_{r=1}^{R_l} q_{lr}^H m_l^H(t) m_r^H(t)$, the pathogen TF regulations through $\sum_{j=1}^{J_l} r_{lj}^P p_j^P(t)$, the miRNA degradation through $-\gamma_l^H m_l^H(t)$, the basal expression level $\kappa_l^H$, and the stochastic noise $\eta_l^H(t)$. In addition, as in the host gene model, the host-miRNA degradation rate $-\gamma_l^H$ and the miRNA regulatory ability $-q_{lr}^H$ should both be constrained to be nonpositive.

The transcriptional regulations of host-lncRNA m in the candidate GRN can be described by the following stochastic dynamic regulatory equation:

$$
l_m^H(t+1) = l_m^H(t) + \sum_{i=1}^{I_m} s_{mi}^H\, p_i^H(t) - \sum_{l=1}^{L_m} t_{ml}^H\, l_m^H(t)\, m_l^H(t) - \mu_m^H l_m^H(t) + \pi_m^H + \vartheta_m^H(t),
\quad \text{for } m = 1, 2, \ldots, M;\ -t_{ml}^H \le 0 \text{ and } -\mu_m^H \le 0 \tag{17.A5}
$$

where $l_m^H(t)$, $p_i^H(t)$, and $m_l^H(t)$ indicate the expression levels of the mth host lncRNA, the ith host TF, and the lth host miRNA at time t, respectively; $s_{mi}^H$ and $-t_{ml}^H$ represent the regulatory ability of the ith host TF and the lth host miRNA on the mth host lncRNA, respectively; $-\mu_m^H$ and $\pi_m^H$ signify the degradation rate and the basal expression level of the mth host lncRNA, respectively; $I_m$ and $L_m$ denote the numbers of host TFs and host miRNAs, respectively, that regulate the expression level of the mth host lncRNA; and $\vartheta_m^H(t)$ indicates the stochastic noise of the mth host lncRNA at time t. The biological meaning of Eq. (17.A5) is that the expression level of the mth host lncRNA can be regulated by various molecular mechanisms, including the host TF regulations through $\sum_{i=1}^{I_m} s_{mi}^H p_i^H(t)$, the host-miRNA repressions through $-\sum_{l=1}^{L_m} t_{ml}^H l_m^H(t) m_l^H(t)$, the lncRNA degradation through $-\mu_m^H l_m^H(t)$, the basal expression level $\pi_m^H$, and the stochastic noise $\vartheta_m^H(t)$. In addition, as in the host gene model, the host lncRNA degradation rate $-\mu_m^H$ and the host-miRNA regulatory ability $-t_{ml}^H$ should both be constrained to be nonpositive.

The transcriptional regulations of pathogen-gene n in the candidate GRN can be described by the following stochastic dynamic regulatory equation:

$$
g_n^P(t+1) = g_n^P(t) + \sum_{i=1}^{I_n} u_{ni}^P\, p_i^H(t) - \sum_{l=1}^{L_n} v_{nl}^P\, g_n^P(t)\, m_l^H(t) + \sum_{m=1}^{M_n} w_{nm}^P\, l_m^H(t) + \sum_{j=1}^{J_n} x_{nj}^P\, p_j^P(t) - \varphi_n^P g_n^P(t) + \phi_n^P + \omega_n^P(t),
\quad \text{for } n = 1, 2, \ldots, N;\ -v_{nl}^P \le 0 \text{ and } -\varphi_n^P \le 0 \tag{17.A6}
$$

where $g_n^P(t)$, $p_i^H(t)$, $m_l^H(t)$, $l_m^H(t)$, and $p_j^P(t)$ indicate the expression levels of the nth pathogen gene, the ith host TF, the lth host miRNA, the mth host lncRNA, and the jth pathogen TF at time t, respectively; $u_{ni}^P$, $-v_{nl}^P$, $w_{nm}^P$, and $x_{nj}^P$ represent the regulatory ability of the ith host TF, the lth host miRNA, the mth host lncRNA, and the jth pathogen TF on the nth pathogen gene, respectively; $-\varphi_n^P$ and $\phi_n^P$ signify the degradation rate and the basal expression level of the nth pathogen gene, respectively; $I_n$, $L_n$, $M_n$, and $J_n$ denote the numbers of host TFs, host miRNAs, host lncRNAs, and pathogen TFs, respectively, that regulate the expression level of the nth pathogen gene; and $\omega_n^P(t)$ indicates the stochastic noise of the expression level of the nth pathogen gene at time t. The biological meaning of Eq. (17.A6) is that the expression level of the nth pathogen gene can be regulated by various molecular mechanisms, including the host TF regulations through $\sum_{i=1}^{I_n} u_{ni}^P p_i^H(t)$, the host-miRNA repressions through $-\sum_{l=1}^{L_n} v_{nl}^P g_n^P(t) m_l^H(t)$, the host-lncRNA regulations through $\sum_{m=1}^{M_n} w_{nm}^P l_m^H(t)$, the pathogen TF regulations through $\sum_{j=1}^{J_n} x_{nj}^P p_j^P(t)$, the mRNA degradation through $-\varphi_n^P g_n^P(t)$, the basal expression level $\phi_n^P$, and the stochastic noise $\omega_n^P(t)$. In addition, as in the host gene regulatory model, the mRNA degradation rate $-\varphi_n^P$ and the host-miRNA regulatory ability $-v_{nl}^P$ should both be constrained to be nonpositive.

Remark 17.1 The dynamic models of the interspecies GEIN above are also employed for the infection of C. albicans WO-1. Since C. albicans WO-1 and C. albicans SC5314 have the same candidate GEIN, the dynamic models between human and C. albicans WO-1 follow those of the interspecies GEIN for the infection of C. albicans SC5314. In addition, because no lncRNA-to-miRNA, lncRNA-to-lncRNA, or pathogen-TF-to-lncRNA regulations were found in the candidate interspecies GEIN by big data mining, Eqs. (17.A4) and (17.A5) contain no corresponding regulatory terms for either strain of C. albicans.
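As an illustration of how the discrete-time models above are evaluated, the following minimal Python sketch performs one update of the host protein model in Eq. (17.A1) for a single protein. The function name, argument names, and numerical values are hypothetical and chosen only for illustration; they are not taken from the dataset used in this chapter.

```python
def host_protein_step(p_i, host_partners, pathogen_partners, g_i,
                      a, b, alpha, beta, chi, noise=0.0):
    """One update p_i(t) -> p_i(t+1) following the structure of Eq. (17.A1).

    p_i               : expression of the i-th host protein at time t
    host_partners     : expressions p_n(t) of interacting host proteins
    pathogen_partners : expressions p_j(t) of interacting pathogen proteins
    g_i               : expression of the i-th host gene (mRNA) at time t
    a, b              : interaction abilities (same lengths as the partner lists)
    alpha, beta, chi  : translation rate, degradation rate, basal level
    noise             : one realisation of the stochastic term (assumed given)
    """
    if alpha < 0 or beta < 0:
        # Eq. (17.A1) requires alpha >= 0 and -beta <= 0.
        raise ValueError("alpha and beta must be nonnegative")
    intra = sum(a_n * p_i * p_n for a_n, p_n in zip(a, host_partners))
    inter = sum(b_j * p_i * p_j for b_j, p_j in zip(b, pathogen_partners))
    return p_i + intra + inter + alpha * g_i - beta * p_i + chi + noise


# Illustrative values only (hypothetical, not measured data).
next_level = host_protein_step(
    p_i=1.0, host_partners=[2.0, 0.5], pathogen_partners=[1.5], g_i=2.0,
    a=[0.1, -0.2], b=[0.3], alpha=0.5, beta=0.25, chi=0.1)
print(next_level)
```

The other five models, Eqs. (17.A2)-(17.A6), have the same additive structure, so an analogous function would differ only in which product terms are summed.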
17.6.3 Parameter estimation of the dynamic models of candidate interspecies genetic and epigenetic interspecies network by system identification approach
After constructing the dynamic model equations (17.A1)-(17.A6) of the candidate interspecies GEINs, we identify the interactive parameters of the PPIN in (17.A1) and (17.A2) and the regulatory parameters of the GRN in (17.A3)-(17.A6) by applying a system identification approach to two-sided microarray data. Pruning the false positives in the candidate interspecies GEINs then yields the real interspecies GEINs during the infection process. Accordingly, we rewrite the host PPIN dynamic equation in the linear regression form below [40,878,879]:
$$
p_i^H(t+1) = \begin{bmatrix} p_i^H(t)p_1^H(t) & \cdots & p_i^H(t)p_{N_i}^H(t) & p_i^H(t)p_1^P(t) & \cdots & p_i^H(t)p_{J_i}^P(t) & g_i^H(t) & p_i^H(t) & 1 \end{bmatrix}
\begin{bmatrix} a_{i1}^H \\ \vdots \\ a_{iN_i}^H \\ b_{i1}^H \\ \vdots \\ b_{iJ_i}^H \\ \alpha_i^H \\ 1 - \beta_i^H \\ \chi_i^H \end{bmatrix} + \omega_i^H(t)
\triangleq \psi_i^{HP}(t)\,\theta_i^{HP} + \omega_i^H(t), \quad \text{for } i = 1, 2, \ldots, I \tag{17.A7}
$$

where $\psi_i^{HP}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_i^{HP}$ is the unknown interaction parameter vector to be estimated for the ith host protein in the host PPIN. The expression (17.A7) of the ith host protein can be augmented for $Y_i$ time points in the following form:

$$
\begin{bmatrix} p_i^H(t_2) \\ p_i^H(t_3) \\ \vdots \\ p_i^H(t_{Y_i+1}) \end{bmatrix}
= \begin{bmatrix} \psi_i^{HP}(t_1) \\ \psi_i^{HP}(t_2) \\ \vdots \\ \psi_i^{HP}(t_{Y_i}) \end{bmatrix} \theta_i^{HP}
+ \begin{bmatrix} \omega_i^H(t_1) \\ \omega_i^H(t_2) \\ \vdots \\ \omega_i^H(t_{Y_i}) \end{bmatrix}, \quad \text{for } i = 1, 2, \ldots, I \tag{17.A8}
$$

which could be simply represented by:

$$
P_i^H = \Phi_i^{HP}\theta_i^{HP} + \Omega_i^{HP}, \quad \text{for } i = 1, 2, \ldots, I \tag{17.A9}
$$

where $P_i^H = [\,p_i^H(t_2)\ \ p_i^H(t_3)\ \cdots\ p_i^H(t_{Y_i+1})\,]^T$, $\Phi_i^{HP} = [\,\psi_i^{HP}(t_1)^T\ \ \psi_i^{HP}(t_2)^T\ \cdots\ \psi_i^{HP}(t_{Y_i})^T\,]^T$, and $\Omega_i^{HP} = [\,\omega_i^H(t_1)\ \ \omega_i^H(t_2)\ \cdots\ \omega_i^H(t_{Y_i})\,]^T$.
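The stacking in (17.A7)-(17.A9) can be sketched in a few lines of Python. This is a minimal illustration under the assumption that each expression profile is available as a plain list sampled at consecutive time points; the function name and the toy numbers are hypothetical.

```python
def build_host_protein_regression(p_i, host_partners, pathogen_partners, g_i):
    """Assemble the regression matrix and response vector of (17.A8)-(17.A9).

    p_i, g_i          : lists of length Y + 1 (time series of protein i and gene i)
    host_partners     : list of time series, one per interacting host protein
    pathogen_partners : list of time series, one per interacting pathogen protein
    Returns (Phi, P): Phi has Y rows, each laid out as in (17.A7);
    P holds the one-step-ahead values p_i(t_{k+1}).
    """
    n_steps = len(p_i) - 1
    Phi, P = [], []
    for k in range(n_steps):
        row = [p_i[k] * series[k] for series in host_partners]       # p_i * p_n terms
        row += [p_i[k] * series[k] for series in pathogen_partners]  # p_i * p_j terms
        row += [g_i[k], p_i[k], 1.0]  # columns for alpha, (1 - beta), chi
        Phi.append(row)
        P.append(p_i[k + 1])
    return Phi, P


# Toy series (hypothetical values): three time points give two regression rows.
Phi, P = build_host_protein_regression(
    p_i=[1.0, 2.0, 3.0],
    host_partners=[[0.5, 0.5, 0.5]],
    pathogen_partners=[[2.0, 1.0, 0.0]],
    g_i=[1.0, 1.0, 1.0])
print(Phi, P)
```

The column order matters: the last three columns line up with $\alpha_i^H$, $1 - \beta_i^H$, and $\chi_i^H$ in the parameter vector, which is what the inequality constraints of the estimation problem act on.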
Therefore the interaction parameters in the vector $\theta_i^{HP}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_i^{HP}} \frac{1}{2}\left\| \Phi_i^{HP}\theta_i^{HP} - P_i^H \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \end{bmatrix} \theta_i^{HP} \le \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{17.A10}
$$

We can acquire the interaction parameters in the host PPIN equation (17.A1) by solving the parameter estimation problem in (17.A10) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the host protein translation rate $\alpha_i^H$ is nonnegative and the host protein degradation rate $-\beta_i^H$ is nonpositive; that is to say, $\alpha_i^H \ge 0$ and $-\beta_i^H \le 0$.
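To make the inequality constraint in (17.A10) concrete, the following Python sketch builds the constraint matrix and right-hand side in the form $A\theta \le b$ and checks a candidate parameter vector against them. It only illustrates the constraint layout; it does not solve the least-squares problem, which the text performs with lsqlin. The function names and example vectors are hypothetical.

```python
def host_protein_constraints(n_host, n_pathogen):
    """Return (A, b) such that A @ theta <= b encodes alpha >= 0 and -beta <= 0.

    theta is ordered as in (17.A7):
    [a_1..a_N, b_1..b_J, alpha, 1 - beta, chi].
    """
    n_params = n_host + n_pathogen + 3
    row_alpha = [0.0] * n_params
    row_alpha[n_host + n_pathogen] = -1.0      # -alpha <= 0
    row_beta = [0.0] * n_params
    row_beta[n_host + n_pathogen + 1] = 1.0    # (1 - beta) <= 1
    return [row_alpha, row_beta], [0.0, 1.0]


def is_feasible(A, b, theta, tol=1e-12):
    """Check every row of A @ theta <= b (with a small numerical tolerance)."""
    return all(sum(a_k * t_k for a_k, t_k in zip(row, theta)) <= b_k + tol
               for row, b_k in zip(A, b))


A, b = host_protein_constraints(n_host=2, n_pathogen=1)
# Hypothetical parameter vector: alpha = 0.5, 1 - beta = 0.75 (beta = 0.25).
print(is_feasible(A, b, [0.1, -0.2, 0.3, 0.5, 0.75, 0.1]))
```

The second row bounds $1 - \beta_i^H$ by 1 rather than bounding $\beta_i^H$ directly because the regression form absorbs the $p_i^H(t)$ term of (17.A1) into the coefficient $1 - \beta_i^H$.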
Similarly, we rewrite the pathogen PPIN dynamic interactive equation in the following linear regression form:

$$
p_j^P(t+1) = \begin{bmatrix} p_j^P(t)p_1^P(t) & \cdots & p_j^P(t)p_{O_j}^P(t) & p_j^P(t)p_1^H(t) & \cdots & p_j^P(t)p_{I_j}^H(t) & g_j^P(t) & p_j^P(t) & 1 \end{bmatrix}
\begin{bmatrix} c_{j1}^P \\ \vdots \\ c_{jO_j}^P \\ d_{j1}^P \\ \vdots \\ d_{jI_j}^P \\ \delta_j^P \\ 1 - \varepsilon_j^P \\ \chi_j^P \end{bmatrix} + \omega_j^P(t)
\triangleq \psi_j^{PP}(t)\,\theta_j^{PP} + \omega_j^P(t), \quad \text{for } j = 1, 2, \ldots, J \tag{17.A11}
$$

where $\psi_j^{PP}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_j^{PP}$ is the unknown parameter vector to be estimated for the jth pathogen protein in the pathogen PPIN. Eq. (17.A11) of the jth pathogen protein can be augmented for $Y_j$ time points in the following form:

$$
\begin{bmatrix} p_j^P(t_2) \\ p_j^P(t_3) \\ \vdots \\ p_j^P(t_{Y_j+1}) \end{bmatrix}
= \begin{bmatrix} \psi_j^{PP}(t_1) \\ \psi_j^{PP}(t_2) \\ \vdots \\ \psi_j^{PP}(t_{Y_j}) \end{bmatrix} \theta_j^{PP}
+ \begin{bmatrix} \omega_j^P(t_1) \\ \omega_j^P(t_2) \\ \vdots \\ \omega_j^P(t_{Y_j}) \end{bmatrix}, \quad \text{for } j = 1, 2, \ldots, J \tag{17.A12}
$$

which could be simply represented by:

$$
P_j^P = \Phi_j^{PP}\theta_j^{PP} + \Omega_j^{PP}, \quad \text{for } j = 1, 2, \ldots, J \tag{17.A13}
$$

where $P_j^P = [\,p_j^P(t_2)\ \ p_j^P(t_3)\ \cdots\ p_j^P(t_{Y_j+1})\,]^T$, $\Phi_j^{PP} = [\,\psi_j^{PP}(t_1)^T\ \ \psi_j^{PP}(t_2)^T\ \cdots\ \psi_j^{PP}(t_{Y_j})^T\,]^T$, and $\Omega_j^{PP} = [\,\omega_j^P(t_1)\ \ \omega_j^P(t_2)\ \cdots\ \omega_j^P(t_{Y_j})\,]^T$.

Next, the parameters in the vector $\theta_j^{PP}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_j^{PP}} \frac{1}{2}\left\| \Phi_j^{PP}\theta_j^{PP} - P_j^P \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 1 & 0 \end{bmatrix} \theta_j^{PP} \le \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{17.A14}
$$
We can acquire the interaction parameters in the pathogen PPIN equation (17.A2) by solving the parameter estimation problem in (17.A14) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the pathogen protein translation rate $\delta_j^P$ is nonnegative and the pathogen protein degradation rate $-\varepsilon_j^P$ is nonpositive; that is to say, $\delta_j^P \ge 0$ and $-\varepsilon_j^P \le 0$.

Similarly, we rewrite the host GRN dynamic regulatory equation in (17.A3) in the following linear regression form:

$$
g_k^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_k}^H(t) & g_k^H(t)m_1^H(t) & \cdots & g_k^H(t)m_{L_k}^H(t) & l_1^H(t) & \cdots & l_{M_k}^H(t) & p_1^P(t) & \cdots & p_{J_k}^P(t) & g_k^H(t) & 1 \end{bmatrix}
\begin{bmatrix} e_{k1}^H \\ \vdots \\ e_{kI_k}^H \\ -f_{k1}^H \\ \vdots \\ -f_{kL_k}^H \\ h_{k1}^H \\ \vdots \\ h_{kM_k}^H \\ n_{k1}^H \\ \vdots \\ n_{kJ_k}^H \\ 1 - \varphi_k^H \\ \phi_k^H \end{bmatrix} + \varpi_k^H(t)
\triangleq \psi_k^{HG}(t)\,\theta_k^{HG} + \varpi_k^H(t), \quad \text{for } k = 1, 2, \ldots, K \tag{17.A15}
$$

where $\psi_k^{HG}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_k^{HG}$ is the unknown parameter vector to be estimated for the kth host gene in the host GRN. Eq. (17.A15) of the kth host gene can be augmented for $Y_k$ time points in the following form:

$$
\begin{bmatrix} g_k^H(t_2) \\ g_k^H(t_3) \\ \vdots \\ g_k^H(t_{Y_k+1}) \end{bmatrix}
= \begin{bmatrix} \psi_k^{HG}(t_1) \\ \psi_k^{HG}(t_2) \\ \vdots \\ \psi_k^{HG}(t_{Y_k}) \end{bmatrix} \theta_k^{HG}
+ \begin{bmatrix} \varpi_k^H(t_1) \\ \varpi_k^H(t_2) \\ \vdots \\ \varpi_k^H(t_{Y_k}) \end{bmatrix}, \quad \text{for } k = 1, 2, \ldots, K \tag{17.A16}
$$

which could be simply represented by:

$$
G_k^H = \Phi_k^{HG}\theta_k^{HG} + \Omega_k^{HG}, \quad \text{for } k = 1, 2, \ldots, K \tag{17.A17}
$$

where $G_k^H = [\,g_k^H(t_2)\ \ g_k^H(t_3)\ \cdots\ g_k^H(t_{Y_k+1})\,]^T$, $\Phi_k^{HG} = [\,\psi_k^{HG}(t_1)^T\ \ \psi_k^{HG}(t_2)^T\ \cdots\ \psi_k^{HG}(t_{Y_k})^T\,]^T$, and $\Omega_k^{HG} = [\,\varpi_k^H(t_1)\ \ \varpi_k^H(t_2)\ \cdots\ \varpi_k^H(t_{Y_k})\,]^T$.
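Hence, the regulatory parameters in the vector $\theta_k^{HG}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_k^{HG}} \frac{1}{2}\left\| \Phi_k^{HG}\theta_k^{HG} - G_k^H \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix}
0 & \cdots & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0
\end{bmatrix} \theta_k^{HG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{17.A18}
$$

Here the column blocks have widths $I_k$, $L_k$, $M_k$, $J_k$, and 2; the identity block acts on the entries $-f_{k1}^H, \ldots, -f_{kL_k}^H$ of $\theta_k^{HG}$, and the last row acts on $1 - \varphi_k^H$.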
We can acquire the regulatory parameters in the host GRN equation (17.A3) by solving the parameter estimation problem in (17.A18) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the host gene degradation rate $-\varphi_k^H$ and the host-miRNA repression rate $-f_{kl}^H$ are nonpositive; that is to say, $-f_{kl}^H \le 0$ for $l = 1, \ldots, L_k$ and $-\varphi_k^H \le 0$.

Similarly, we rewrite the host-miRNA dynamic regulatory equation in the linear regression form below:

$$
m_l^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_l}^H(t) & m_l^H(t)m_1^H(t) & \cdots & m_l^H(t)m_{R_l}^H(t) & p_1^P(t) & \cdots & p_{J_l}^P(t) & m_l^H(t) & 1 \end{bmatrix}
\begin{bmatrix} o_{l1}^H \\ \vdots \\ o_{lI_l}^H \\ -q_{l1}^H \\ \vdots \\ -q_{lR_l}^H \\ r_{l1}^P \\ \vdots \\ r_{lJ_l}^P \\ 1 - \gamma_l^H \\ \kappa_l^H \end{bmatrix} + \eta_l^H(t)
\triangleq \psi_l^{HM}(t)\,\theta_l^{HM} + \eta_l^H(t), \quad \text{for } l = 1, 2, \ldots, L \tag{17.A19}
$$

where $\psi_l^{HM}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_l^{HM}$ is the unknown parameter vector to be estimated for the lth host miRNA in the host GRN. Eq. (17.A19) of the lth host miRNA can be augmented for $Y_l$ time points in the following form:

$$
\begin{bmatrix} m_l^H(t_2) \\ m_l^H(t_3) \\ \vdots \\ m_l^H(t_{Y_l+1}) \end{bmatrix}
= \begin{bmatrix} \psi_l^{HM}(t_1) \\ \psi_l^{HM}(t_2) \\ \vdots \\ \psi_l^{HM}(t_{Y_l}) \end{bmatrix} \theta_l^{HM}
+ \begin{bmatrix} \eta_l^H(t_1) \\ \eta_l^H(t_2) \\ \vdots \\ \eta_l^H(t_{Y_l}) \end{bmatrix}, \quad \text{for } l = 1, 2, \ldots, L \tag{17.A20}
$$

which could be simply represented by:

$$
M_l^H = \Phi_l^{HM}\theta_l^{HM} + \Omega_l^{HM}, \quad \text{for } l = 1, 2, \ldots, L \tag{17.A21}
$$

where $M_l^H = [\,m_l^H(t_2)\ \ m_l^H(t_3)\ \cdots\ m_l^H(t_{Y_l+1})\,]^T$, $\Phi_l^{HM} = [\,\psi_l^{HM}(t_1)^T\ \ \psi_l^{HM}(t_2)^T\ \cdots\ \psi_l^{HM}(t_{Y_l})^T\,]^T$, and $\Omega_l^{HM} = [\,\eta_l^H(t_1)\ \ \eta_l^H(t_2)\ \cdots\ \eta_l^H(t_{Y_l})\,]^T$.
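Next, the regulatory parameters in the vector $\theta_l^{HM}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_l^{HM}} \frac{1}{2}\left\| \Phi_l^{HM}\theta_l^{HM} - M_l^H \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix}
0 & \cdots & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0
\end{bmatrix} \theta_l^{HM} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{17.A22}
$$

Here the column blocks have widths $I_l$, $R_l$, $J_l$, and 2; the identity block acts on the entries $-q_{l1}^H, \ldots, -q_{lR_l}^H$ of $\theta_l^{HM}$, and the last row acts on $1 - \gamma_l^H$.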
We can acquire the regulatory parameters in the host-miRNA GRN equation (17.A4) by solving the parameter estimation problem in (17.A22) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the host-miRNA degradation rate $-\gamma_l^H$ and the host-miRNA repression rate $-q_{lr}^H$ are nonpositive; that is to say, $-q_{lr}^H \le 0$ for $r = 1, \ldots, R_l$ and $-\gamma_l^H \le 0$.

Similarly, we rewrite the host-lncRNA dynamic regulatory equation in the following linear regression form:

$$
l_m^H(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_m}^H(t) & l_m^H(t)m_1^H(t) & \cdots & l_m^H(t)m_{L_m}^H(t) & l_m^H(t) & 1 \end{bmatrix}
\begin{bmatrix} s_{m1}^H \\ \vdots \\ s_{mI_m}^H \\ -t_{m1}^H \\ \vdots \\ -t_{mL_m}^H \\ 1 - \mu_m^H \\ \pi_m^H \end{bmatrix} + \vartheta_m^H(t)
\triangleq \psi_m^{HL}(t)\,\theta_m^{HL} + \vartheta_m^H(t), \quad \text{for } m = 1, 2, \ldots, M \tag{17.A23}
$$

where $\psi_m^{HL}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_m^{HL}$ is the unknown parameter vector to be estimated for the mth host lncRNA in the host GRN. Eq. (17.A23) of the mth host lncRNA can be augmented for $Y_m$ time points in the following form:

$$
\begin{bmatrix} l_m^H(t_2) \\ l_m^H(t_3) \\ \vdots \\ l_m^H(t_{Y_m+1}) \end{bmatrix}
= \begin{bmatrix} \psi_m^{HL}(t_1) \\ \psi_m^{HL}(t_2) \\ \vdots \\ \psi_m^{HL}(t_{Y_m}) \end{bmatrix} \theta_m^{HL}
+ \begin{bmatrix} \vartheta_m^H(t_1) \\ \vartheta_m^H(t_2) \\ \vdots \\ \vartheta_m^H(t_{Y_m}) \end{bmatrix}, \quad \text{for } m = 1, 2, \ldots, M \tag{17.A24}
$$

which could be simply represented by:

$$
L_m^H = \Phi_m^{HL}\theta_m^{HL} + \Omega_m^{HL}, \quad \text{for } m = 1, 2, \ldots, M \tag{17.A25}
$$

where $L_m^H = [\,l_m^H(t_2)\ \ l_m^H(t_3)\ \cdots\ l_m^H(t_{Y_m+1})\,]^T$, $\Phi_m^{HL} = [\,\psi_m^{HL}(t_1)^T\ \ \psi_m^{HL}(t_2)^T\ \cdots\ \psi_m^{HL}(t_{Y_m})^T\,]^T$, and $\Omega_m^{HL} = [\,\vartheta_m^H(t_1)\ \ \vartheta_m^H(t_2)\ \cdots\ \vartheta_m^H(t_{Y_m})\,]^T$.
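Thence, the regulatory parameters in the vector $\theta_m^{HL}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_m^{HL}} \frac{1}{2}\left\| \Phi_m^{HL}\theta_m^{HL} - L_m^H \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix}
0 & \cdots & 0 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1 & 0 & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0
\end{bmatrix} \theta_m^{HL} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{17.A26}
$$

Here the column blocks have widths $I_m$, $L_m$, and 2; the identity block acts on the entries $-t_{m1}^H, \ldots, -t_{mL_m}^H$ of $\theta_m^{HL}$, and the last row acts on $1 - \mu_m^H$.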
We can acquire the regulatory parameters in the host-lncRNA GRN equation (17.A5) by solving the parameter estimation problem in (17.A26) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the host lncRNA degradation rate $-\mu_m^H$ and the host-miRNA repression rate $-t_{ml}^H$ are nonpositive; that is to say, $-t_{ml}^H \le 0$ for $l = 1, \ldots, L_m$ and $-\mu_m^H \le 0$.

Finally, we rewrite the pathogen gene dynamic regulatory equation in the following linear regression form:

$$
g_n^P(t+1) = \begin{bmatrix} p_1^H(t) & \cdots & p_{I_n}^H(t) & g_n^P(t)m_1^H(t) & \cdots & g_n^P(t)m_{L_n}^H(t) & l_1^H(t) & \cdots & l_{M_n}^H(t) & p_1^P(t) & \cdots & p_{J_n}^P(t) & g_n^P(t) & 1 \end{bmatrix}
\begin{bmatrix} u_{n1}^P \\ \vdots \\ u_{nI_n}^P \\ -v_{n1}^P \\ \vdots \\ -v_{nL_n}^P \\ w_{n1}^P \\ \vdots \\ w_{nM_n}^P \\ x_{n1}^P \\ \vdots \\ x_{nJ_n}^P \\ 1 - \varphi_n^P \\ \phi_n^P \end{bmatrix} + \omega_n^P(t)
\triangleq \psi_n^{PG}(t)\,\theta_n^{PG} + \omega_n^P(t), \quad \text{for } n = 1, 2, \ldots, N \tag{17.A27}
$$

where $\psi_n^{PG}(t)$ represents the regression vector that can be obtained from the microarray expression data and $\theta_n^{PG}$ is the unknown regulatory parameter vector to be estimated for the nth pathogen gene in the pathogen GRN. Eq. (17.A27) of the nth pathogen gene can be augmented for $Y_n$ time points in the following form:

$$
\begin{bmatrix} g_n^P(t_2) \\ g_n^P(t_3) \\ \vdots \\ g_n^P(t_{Y_n+1}) \end{bmatrix}
= \begin{bmatrix} \psi_n^{PG}(t_1) \\ \psi_n^{PG}(t_2) \\ \vdots \\ \psi_n^{PG}(t_{Y_n}) \end{bmatrix} \theta_n^{PG}
+ \begin{bmatrix} \omega_n^P(t_1) \\ \omega_n^P(t_2) \\ \vdots \\ \omega_n^P(t_{Y_n}) \end{bmatrix}, \quad \text{for } n = 1, 2, \ldots, N \tag{17.A28}
$$

which could be simply represented by:

$$
G_n^P = \Phi_n^{PG}\theta_n^{PG} + \Omega_n^{PG}, \quad \text{for } n = 1, 2, \ldots, N \tag{17.A29}
$$

where $G_n^P = [\,g_n^P(t_2)\ \ g_n^P(t_3)\ \cdots\ g_n^P(t_{Y_n+1})\,]^T$, $\Phi_n^{PG} = [\,\psi_n^{PG}(t_1)^T\ \ \psi_n^{PG}(t_2)^T\ \cdots\ \psi_n^{PG}(t_{Y_n})^T\,]^T$, and $\Omega_n^{PG} = [\,\omega_n^P(t_1)\ \ \omega_n^P(t_2)\ \cdots\ \omega_n^P(t_{Y_n})\,]^T$.
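Hence, the parameters in the vector $\theta_n^{PG}$ can be estimated by solving the following constrained least-squares estimation problem:

$$
\min_{\theta_n^{PG}} \frac{1}{2}\left\| \Phi_n^{PG}\theta_n^{PG} - G_n^P \right\|_2^2
\quad \text{subject to} \quad
\begin{bmatrix}
0 & \cdots & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0
\end{bmatrix} \theta_n^{PG} \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{17.A30}
$$

Here the column blocks have widths $I_n$, $L_n$, $M_n$, $J_n$, and 2; the identity block acts on the entries $-v_{n1}^P, \ldots, -v_{nL_n}^P$ of $\theta_n^{PG}$, and the last row acts on $1 - \varphi_n^P$.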
We can acquire the regulatory parameters in the pathogen GRN equation (17.A6) by solving the parameter estimation problem in (17.A30) with the function lsqlin in the MATLAB optimization toolbox, while simultaneously ensuring that the pathogen gene degradation rate $-\varphi_n^P$ and the host-miRNA repression rate $-v_{nl}^P$ are nonpositive; that is to say, $-v_{nl}^P \le 0$ for $l = 1, \ldots, L_n$ and $-\varphi_n^P \le 0$.

To prevent overfitting in the parameter estimation problems above and to obtain accurate system identification results, we apply cubic spline interpolation to generate extra data points, five times the number of parameters in the corresponding parameter vector to be estimated (that is, $\theta_i^{HP}$ in the human PPIN, $\theta_j^{PP}$ in the pathogen PPIN, $\theta_k^{HG}$ in the human-gene GRN, $\theta_l^{HM}$ in the human-miRNA GRN, $\theta_m^{HL}$ in the human-lncRNA GRN, and $\theta_n^{PG}$ in the pathogen-gene GRN). Then, with the microarray expression data, the constrained least-squares parameter estimation problems in (17.A10), (17.A14), (17.A18), (17.A22), (17.A26), and (17.A30) can be solved gene by gene (or protein by protein) with the function lsqlin in the MATLAB optimization toolbox to identify the parameters in the GEINs.

Moreover, genome-wide protein expression measurements for OKF6/TERT-2 cells and C. albicans are not yet available, whereas about 73% of the variance in protein abundance can be explained by the corresponding mRNA abundance [839]. The microarray gene expression data can therefore substitute for protein expression and provide sufficient information for solving the constrained least-squares parameter estimation problems in (17.A10), (17.A14), (17.A18), (17.A22), (17.A26), and (17.A30).
17.6.4 Trimming false positives in candidate genetic and epigenetic interspecies networks by system order detection scheme
Because candidate GEINs contain many false-positive interactions and regulations acquired from computational, experimental, and homology-dependent predictions in the database mining process, we must apply a system order detection scheme to the host PPI model in (17.A1), the pathogen PPI model in (17.A2), the host-gene GRN model in (17.A3), the host-miRNA GRN model in (17.A4), the host-lncRNA model in (17.A5), and the pathogen-gene GRN model in (17.A6) to prune these false positives in the candidate GEINs. Hence, we apply the Akaike information criterion (AIC) to delete the insignificant parameters beyond the system order of the candidate GEINs, using the real microarray data of OKF6/TERT-2 cells during C. albicans SC5314 and C. albicans WO-1 infection, respectively. For the host PPI model in (17.A9), the AIC of the host PPIs of the ith host protein can be defined as a function of the system interaction order as follows [356,837,838]:

$$
\mathrm{AIC}_i^{HP}(N_i, J_i) = \log\left( \frac{1}{T_i} \left( P_i^H - \Phi_i^{HP}\hat{\theta}_i^{HP} \right)^T \left( P_i^H - \Phi_i^{HP}\hat{\theta}_i^{HP} \right) \right) + \frac{2(N_i + J_i)}{T_i} \tag{17.A31}
$$

where $\hat{\theta}_i^{HP}$ represents the estimated interactive parameters of human-protein i from the solution of the parameter estimation problem in (17.A10), and the covariance of the estimated residual error is $(\sigma_i^{HP})^2 = (1/T_i)\,(P_i^H - \Phi_i^{HP}\hat{\theta}_i^{HP})^T (P_i^H - \Phi_i^{HP}\hat{\theta}_i^{HP})$. According to system identification theory [838], the real system order $N_i + J_i$ of the real PPIs of protein i in the host PPIN minimizes $\mathrm{AIC}_i^{HP}(N_i, J_i)$. By this system order detection method, host proteins whose insignificant interaction abilities fall outside $N_i$ and pathogen proteins whose insignificant interaction abilities fall outside $J_i$ should be considered false positives and trimmed from the candidate PPIs of the ith protein. By the same procedure, we could obtain the real host PPIs protein by protein in the GEINs.

Similarly, for the pathogen PPIN model in (17.A13), the AIC of pathogen-protein j could be defined as follows [40]:

$$
\mathrm{AIC}_j^{PP}(O_j, I_j) = \log\left( \frac{1}{T_j} \left( P_j^P - \Phi_j^{PP}\hat{\theta}_j^{PP} \right)^T \left( P_j^P - \Phi_j^{PP}\hat{\theta}_j^{PP} \right) \right) + \frac{2(O_j + I_j)}{T_j} \tag{17.A32}
$$

where $\hat{\theta}_j^{PP}$ denotes the estimated interactive parameters of pathogen-protein j obtained from the solution of the parameter estimation problem in (17.A14), and the covariance of the estimated residual error is $(\sigma_j^{PP})^2 = (1/T_j)\,(P_j^P - \Phi_j^{PP}\hat{\theta}_j^{PP})^T (P_j^P - \Phi_j^{PP}\hat{\theta}_j^{PP})$. We could minimize $\mathrm{AIC}_j^{PP}$ in (17.A32) to obtain the real interaction numbers $O_j$ and $I_j$ of the real PPIs of protein j in the pathogen PPIN. Therefore the real system orders $O_j$ and $I_j$ can be used to prune the false-positive interactions of the candidate PPIN protein by protein, yielding the real pathogen PPIN of the GEINs.
By a similar procedure, for the host-gene regulation model in (17.A17), the AIC of host-gene k could be defined as follows [40]:

$$
\mathrm{AIC}_k^{HG}(I_k, L_k, M_k, J_k) = \log\left( \frac{1}{T_k} \left( G_k^H - \Phi_k^{HG}\hat{\theta}_k^{HG} \right)^T \left( G_k^H - \Phi_k^{HG}\hat{\theta}_k^{HG} \right) \right) + \frac{2(I_k + L_k + M_k + J_k)}{T_k} \tag{17.A33}
$$

where $\hat{\theta}_k^{HG}$ denotes the estimated regulatory parameters of host-gene k obtained from the solution of the parameter estimation problem in (17.A18), and the covariance of the estimated residual error is $(\sigma_k^{HG})^2 = (1/T_k)\,(G_k^H - \Phi_k^{HG}\hat{\theta}_k^{HG})^T (G_k^H - \Phi_k^{HG}\hat{\theta}_k^{HG})$. We could minimize $\mathrm{AIC}_k^{HG}$ in (17.A33) to obtain the real regulation numbers $I_k$, $L_k$, $M_k$, and $J_k$ of the real regulations of host-gene k in the host-gene GRN. Therefore the corresponding real system orders $I_k$, $L_k$, $M_k$, and $J_k$ can be used to prune the false-positive regulations of the candidate GRN gene by gene, yielding the real host-gene GRN of the GEINs.

By a similar procedure, for the host-miRNA regulation model in (17.A21), the AIC of host-miRNA l could be defined as follows [838]:

$$
\mathrm{AIC}_l^{HM}(I_l, R_l, J_l) = \log\left( \frac{1}{T_l} \left( M_l^H - \Phi_l^{HM}\hat{\theta}_l^{HM} \right)^T \left( M_l^H - \Phi_l^{HM}\hat{\theta}_l^{HM} \right) \right) + \frac{2(I_l + R_l + J_l)}{T_l} \tag{17.A34}
$$

where $\hat{\theta}_l^{HM}$ denotes the estimated regulatory parameters of host-miRNA l obtained from the solution of the parameter estimation problem in (17.A22), and the covariance of the estimated residual error is $(\sigma_l^{HM})^2 = (1/T_l)\,(M_l^H - \Phi_l^{HM}\hat{\theta}_l^{HM})^T (M_l^H - \Phi_l^{HM}\hat{\theta}_l^{HM})$. We could minimize $\mathrm{AIC}_l^{HM}$ in (17.A34) to obtain the real regulation numbers $I_l$, $R_l$, and $J_l$ of the real regulations of host-miRNA l in the host-miRNA GRN. Therefore the corresponding real system orders $I_l$, $R_l$, and $J_l$ can be used to prune the false-positive regulations miRNA by miRNA, yielding the real host-miRNA GRN of the GEINs.

By a similar procedure, for the host-lncRNA regulation model in (17.A25), the AIC of host-lncRNA m could be defined as follows [40]:

$$
\mathrm{AIC}_m^{HL}(I_m, L_m) = \log\left( \frac{1}{T_m} \left( L_m^H - \Phi_m^{HL}\hat{\theta}_m^{HL} \right)^T \left( L_m^H - \Phi_m^{HL}\hat{\theta}_m^{HL} \right) \right) + \frac{2(I_m + L_m)}{T_m} \tag{17.A35}
$$

where $\hat{\theta}_m^{HL}$ stands for the estimated regulatory parameters of host-lncRNA m obtained from the solution of the parameter estimation problem in (17.A26), and the covariance of the estimated residual error is $(\sigma_m^{HL})^2 = (1/T_m)\,(L_m^H - \Phi_m^{HL}\hat{\theta}_m^{HL})^T (L_m^H - \Phi_m^{HL}\hat{\theta}_m^{HL})$. We could minimize $\mathrm{AIC}_m^{HL}$ in (17.A35) to obtain the real regulation numbers $I_m$ and $L_m$ of the real regulations of host-lncRNA m in the host-lncRNA GRN. Therefore the corresponding real system orders $I_m$ and $L_m$ can be used to prune the false-positive regulations lncRNA by lncRNA, yielding the real host-lncRNA GRN of the GEINs.
Finally, for the pathogen-gene regulation model in (17.A29), the AIC of pathogen-gene $n$ could be defined as follows [838]:

$$AIC_n^{PG}(I_n, L_n, M_n, J_n) = \log\left(\frac{1}{T_n}\left(G_n^P - \Phi_n^{PG}\hat{\theta}_n^{PG}\right)^T\left(G_n^P - \Phi_n^{PG}\hat{\theta}_n^{PG}\right)\right) + \frac{2(I_n + L_n + M_n + J_n)}{T_n} \tag{17.A36}$$
where $\hat{\theta}_n^{PG}$ stands for the estimated regulatory parameters of pathogen-gene $n$ obtained from the solution of the parameter estimation problem in (17.A30), and the covariance of the estimated residual error is $(\sigma_n^{PG})^2 = \frac{1}{T_n}(G_n^P - \Phi_n^{PG}\hat{\theta}_n^{PG})^T(G_n^P - \Phi_n^{PG}\hat{\theta}_n^{PG})$. We could minimize $AIC_n^{PG}$ in (17.A36) to obtain the real regulation numbers $I_n$, $L_n$, $M_n$, and $J_n$ of pathogen-gene $n$ in the pathogen-gene GRN. Therefore the corresponding real system order $I_n$, $L_n$, $M_n$, and $J_n$ can be used to prune the false-positive regulations one pathogen gene at a time to obtain the real pathogen-gene GRN of GEINs. After applying this AIC approach to identify the system order and prune the false positives of the candidate GEINs, we eventually obtain the real interspecies GEINs of the OKF6/TERT-2 cells under infection by C. albicans SC5314 and C. albicans WO-1 for each replicate, as shown in Fig. 17.A3. Because of the complexity of the real interspecies GEINs, it is very difficult to investigate the host–pathogen cross-talk mechanisms accurately from these networks. We thereby introduce the PNP approach to extract the core network structures of these interspecies GEINs, which helps to investigate the cross-talk mechanisms and to gain insight into the pathogenesis of infection by different C. albicans strains.
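The AIC-based order detection in (17.A33)-(17.A36) can be illustrated with a small numerical sketch. The code below is not the authors' implementation: it assumes an ordinary least-squares fit in place of the constrained parameter estimation problems, and it ranks candidate regulators by the magnitude of their coefficients in the full fit before scanning model sizes, which is a simplification. The names `aic` and `select_order` and the synthetic data are hypothetical.

```python
import numpy as np

def aic(y, Phi):
    """AIC = log(residual variance) + 2 * (number of parameters) / T."""
    T = len(y)
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares estimate (assumption)
    resid = y - Phi @ theta
    sigma2 = float(resid @ resid) / T                # (1/T) * residual^T residual
    return np.log(sigma2) + 2.0 * Phi.shape[1] / T, theta

def select_order(y, Phi):
    """Rank candidate regulators by |theta| in the full fit, then keep the
    top-n subset whose AIC is smallest (illustrative pruning strategy)."""
    _, theta_full = aic(y, Phi)
    ranked = np.argsort(-np.abs(theta_full))
    best_score, best_keep = np.inf, ranked[:1]
    for n in range(1, Phi.shape[1] + 1):
        keep = np.sort(ranked[:n])
        score, _ = aic(y, Phi[:, keep])
        if score < best_score:
            best_score, best_keep = score, keep
    return best_keep, best_score

# Synthetic data: 6 candidate regulators, of which only 0 and 3 truly act on the target.
rng = np.random.default_rng(0)
T = 200
Phi = rng.normal(size=(T, 6))
y = 1.5 * Phi[:, 0] - 2.0 * Phi[:, 3] + 0.1 * rng.normal(size=T)

kept, score = select_order(y, Phi)
print("regulators kept after pruning:", kept.tolist())
```

Because AIC penalizes each extra parameter by only 2/T, it can occasionally retain a spurious regulator; the true regulators, however, are always kept in this example because dropping either one inflates the residual variance sharply.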
17.6.5 Extracting core network structures from real genetic and epigenetic interspecies networks by using principal network projection approach
Before applying the PNP approach to extract the host–pathogen core networks (HPCNs) from the real interspecies GEINs, it is essential to construct a combined network matrix $H$ of a real interspecies GEIN. The combined network matrix $H$ includes all estimated interaction and regulation parameters in the real interspecies GEIN as follows:

$$H = \begin{bmatrix} H_{hp,hp} & H_{hp,pp} & 0 & 0 \\ H_{pp,hp} & H_{pp,pp} & 0 & 0 \\ H_{hg,hp} & H_{hg,pp} & H_{hg,hm} & H_{hg,hl} \\ H_{hm,hp} & H_{hm,pp} & H_{hm,hm} & 0 \\ H_{hl,hp} & 0 & H_{hl,hm} & 0 \\ H_{pg,hp} & H_{pg,pp} & H_{pg,hm} & H_{pg,hl} \end{bmatrix} \in \mathbb{R}^{(2I+2J+L+M) \times (I+J+L+M)}$$
FIGURE 17.A3 The real interspecies GEINs of two replicates during the infection of OKF6/TERT-2 cells with Candida albicans SC5314 and C. albicans WO-1, respectively. Parts (A and B) show the identified real genome-wide interspecies GEINs of each replicate during C. albicans SC5314 infection. Parts (C and D) show the identified real genome-wide interspecies GEINs of each replicate during C. albicans WO-1 infection. The real interspecies GEINs of the two replicates in (A and B) are integrated in (E), and those of the two replicates in (C and D) are integrated in (F). The gray lines indicate protein–protein interaction; the red lines represent transcriptional regulation, and the blue lines denote miRNA repression [13]. GEIN, Genetic-and-epigenetic interspecies network; miRNA, microRNA.
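where

$$H_{hp,hp} = \begin{bmatrix} \hat{a}^H_{11} & \cdots & \hat{a}^H_{1I} \\ \vdots & \hat{a}^H_{in} & \vdots \\ \hat{a}^H_{I1} & \cdots & \hat{a}^H_{II} \end{bmatrix}, \quad H_{hp,pp} = \begin{bmatrix} \hat{b}^H_{11} & \cdots & \hat{b}^H_{1J} \\ \vdots & \hat{b}^H_{ij} & \vdots \\ \hat{b}^H_{I1} & \cdots & \hat{b}^H_{IJ} \end{bmatrix}, \quad H_{pp,hp} = \begin{bmatrix} \hat{d}^P_{11} & \cdots & \hat{d}^P_{1I} \\ \vdots & \hat{d}^P_{ji} & \vdots \\ \hat{d}^P_{J1} & \cdots & \hat{d}^P_{JI} \end{bmatrix},$$

$$H_{pp,pp} = \begin{bmatrix} \hat{c}^P_{11} & \cdots & \hat{c}^P_{1J} \\ \vdots & \hat{c}^P_{jo} & \vdots \\ \hat{c}^P_{J1} & \cdots & \hat{c}^P_{JJ} \end{bmatrix}, \quad H_{hg,hp} = \begin{bmatrix} \hat{e}^H_{11} & \cdots & \hat{e}^H_{1I} \\ \vdots & \hat{e}^H_{ki} & \vdots \\ \hat{e}^H_{I1} & \cdots & \hat{e}^H_{II} \end{bmatrix}, \quad H_{hg,pp} = \begin{bmatrix} \hat{n}^H_{11} & \cdots & \hat{n}^H_{1J} \\ \vdots & \hat{n}^H_{kj} & \vdots \\ \hat{n}^H_{I1} & \cdots & \hat{n}^H_{IJ} \end{bmatrix},$$

$$H_{hg,hm} = \begin{bmatrix} -\hat{f}^H_{11} & \cdots & -\hat{f}^H_{1L} \\ \vdots & -\hat{f}^H_{kl} & \vdots \\ -\hat{f}^H_{I1} & \cdots & -\hat{f}^H_{IL} \end{bmatrix}, \quad H_{hg,hl} = \begin{bmatrix} \hat{h}^H_{11} & \cdots & \hat{h}^H_{1M} \\ \vdots & \hat{h}^H_{km} & \vdots \\ \hat{h}^H_{I1} & \cdots & \hat{h}^H_{IM} \end{bmatrix}, \quad H_{hm,hp} = \begin{bmatrix} \hat{o}^H_{11} & \cdots & \hat{o}^H_{1I} \\ \vdots & \hat{o}^H_{li} & \vdots \\ \hat{o}^H_{L1} & \cdots & \hat{o}^H_{LI} \end{bmatrix},$$

$$H_{hm,pp} = \begin{bmatrix} \hat{r}^P_{11} & \cdots & \hat{r}^P_{1J} \\ \vdots & \hat{r}^P_{lj} & \vdots \\ \hat{r}^P_{L1} & \cdots & \hat{r}^P_{LJ} \end{bmatrix}, \quad H_{hm,hm} = \begin{bmatrix} -\hat{q}^H_{11} & \cdots & -\hat{q}^H_{1L} \\ \vdots & -\hat{q}^H_{lr} & \vdots \\ -\hat{q}^H_{L1} & \cdots & -\hat{q}^H_{LL} \end{bmatrix}, \quad H_{hl,hp} = \begin{bmatrix} \hat{s}^H_{11} & \cdots & \hat{s}^H_{1I} \\ \vdots & \hat{s}^H_{mi} & \vdots \\ \hat{s}^H_{M1} & \cdots & \hat{s}^H_{MI} \end{bmatrix},$$

$$H_{hl,hm} = \begin{bmatrix} -\hat{t}^H_{11} & \cdots & -\hat{t}^H_{1L} \\ \vdots & -\hat{t}^H_{ml} & \vdots \\ -\hat{t}^H_{M1} & \cdots & -\hat{t}^H_{ML} \end{bmatrix}, \quad H_{pg,hp} = \begin{bmatrix} \hat{u}^P_{11} & \cdots & \hat{u}^P_{1I} \\ \vdots & \hat{u}^P_{ni} & \vdots \\ \hat{u}^P_{J1} & \cdots & \hat{u}^P_{JI} \end{bmatrix}, \quad H_{pg,pp} = \begin{bmatrix} \hat{x}^P_{11} & \cdots & \hat{x}^P_{1J} \\ \vdots & \hat{x}^P_{nj} & \vdots \\ \hat{x}^P_{J1} & \cdots & \hat{x}^P_{JJ} \end{bmatrix},$$

$$H_{pg,hm} = \begin{bmatrix} -\hat{v}^P_{11} & \cdots & -\hat{v}^P_{1L} \\ \vdots & -\hat{v}^P_{nl} & \vdots \\ -\hat{v}^P_{J1} & \cdots & -\hat{v}^P_{JL} \end{bmatrix}, \quad H_{pg,hl} = \begin{bmatrix} \hat{w}^P_{11} & \cdots & \hat{w}^P_{1M} \\ \vdots & \hat{w}^P_{nm} & \vdots \\ \hat{w}^P_{J1} & \cdots & \hat{w}^P_{JM} \end{bmatrix}$$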
where $\hat{a}^H_{in}$ and $\hat{b}^H_{ij}$ could be acquired in $\hat{\theta}_i^{HP}$ by solving the parameter estimation problem in (17.A10) and pruning false positives by the AIC method in (17.A31); $\hat{d}^P_{ji}$ and $\hat{c}^P_{jo}$ could be acquired in $\hat{\theta}_j^{PP}$ by solving the parameter estimation problem in (17.A14) and pruning false positives by the AIC method in (17.A32); $\hat{e}^H_{ki}$, $\hat{n}^H_{kj}$, $-\hat{f}^H_{kl}$, and $\hat{h}^H_{km}$ could be acquired in $\hat{\theta}_k^{HG}$ by solving the parameter estimation problem in (17.A18) and pruning false positives by the AIC method in (17.A33); $\hat{o}^H_{li}$, $\hat{r}^P_{lj}$, and $-\hat{q}^H_{lr}$ could be acquired in $\hat{\theta}_l^{HM}$ by solving the parameter estimation problem in (17.A22) and pruning false positives by the AIC method in (17.A34); $\hat{s}^H_{mi}$ and $-\hat{t}^H_{ml}$ could be acquired in $\hat{\theta}_m^{HL}$ by solving the parameter estimation problem in (17.A26) and pruning false positives by the AIC method in (17.A35); and $\hat{u}^P_{ni}$, $\hat{x}^P_{nj}$, $-\hat{v}^P_{nl}$, and $\hat{w}^P_{nm}$ could be acquired in $\hat{\theta}_n^{PG}$ by solving the parameter estimation problem in (17.A30) and pruning false positives by the AIC method in (17.A36). $\hat{a}^H_{in}$ and $\hat{c}^P_{jo}$ denote the intraspecies interactive abilities in the host and pathogen PPINs during the pathogen infection, respectively; $\hat{b}^H_{ij}$ and $\hat{d}^P_{ji}$ represent the interactive abilities between host protein $i$ and pathogen protein $j$ in the interspecies PPIN; $\hat{e}^H_{ki}$, $\hat{o}^H_{li}$, $\hat{s}^H_{mi}$, and $\hat{u}^P_{ni}$ denote the regulatory
abilities of human TF $i$ to regulate human-gene $k$, human-miRNA $l$, human-lncRNA $m$, and pathogen-gene $n$, respectively, in the human-gene GRN, human-miRNA GRN, human-lncRNA GRN, and pathogen-gene GRN. $\hat{n}^H_{kj}$, $\hat{r}^P_{lj}$, and $\hat{x}^P_{nj}$ signify the regulatory abilities of pathogen TF $j$ to regulate human-gene $k$, human-miRNA $l$, and pathogen-gene $n$, respectively, in the human-gene GRN, human-miRNA GRN, and pathogen-gene GRN during the pathogen infection. $-\hat{f}^H_{kl}$, $-\hat{q}^H_{lr}$, $-\hat{t}^H_{ml}$, and $-\hat{v}^P_{nl}$ correspond to the repression abilities of human-miRNA $l$ to inhibit human-gene $k$, human-miRNA $r$, human-lncRNA $m$, and pathogen-gene $n$, respectively, in the human-gene GRN, human-miRNA GRN, human-lncRNA GRN, and pathogen-gene GRN. $\hat{h}^H_{km}$ and $\hat{w}^P_{nm}$ indicate the regulatory abilities of human lncRNA $m$ to regulate human-gene $k$ and pathogen-gene $n$, respectively, in the human-gene GRN and pathogen-gene GRN. All of these estimated interactions and regulations compose the combined network matrix $H$. Note that if connections or regulations have been removed via AIC, or were never included in the candidate GEIN via big data mining, the corresponding parameters in matrix $H$ are padded with zeros. We thereby extract the core components of the interspecies GEIN by the PNP approach, a network structure projection approach that uses the principal singular values to reduce network dimension by deleting insignificant structures. Accordingly, the combined network matrix $H$ can be expressed in the following singular value decomposition form:

$$H = U \times D \times V^T \tag{17.A37}$$
where $U \in \mathbb{R}^{(2I+2J+L+M) \times (I+J+L+M)}$, $V \in \mathbb{R}^{(I+J+L+M) \times (I+J+L+M)}$, and $D = \mathrm{diag}(d_1, \ldots, d_s, \ldots, d_{I+J+L+M})$ is the diagonal matrix containing the $I+J+L+M$ singular values of the combined network matrix $H$ in descending order, that is, $d_1 \ge \cdots \ge d_s \ge \cdots \ge d_{I+J+L+M}$. Note that $\mathrm{diag}(d_1, d_2)$ signifies the diagonal matrix of $d_1$ and $d_2$. We then define the eigenexpression fraction $E_s$ as the normalization of the squared singular values:

$$E_s = \frac{d_s^2}{\sum_{s=1}^{I+J+L+M} d_s^2}, \quad s = 1, 2, \ldots, I+J+L+M \tag{17.A38}$$
From the viewpoint of energy, we need to keep the main system energy of the whole network structure in the PNP projection. Therefore we choose the minimum $Z$ such that $\sum_{s=1}^{Z} E_s \ge 0.85$, that is, the top $Z$ singular vectors of the network matrix $H$ retain at least 85% of the network structure of the interspecies GEIN from the viewpoint of energy. Next, we define the projection of $H$ onto the top $Z$ singular vectors of $V$ as follows:

$$V_R(w_R, s) = h_{w_R,:} \times v^T_{:,s}, \quad w_R = 1, \ldots, 2I+2J+L+M \ \text{ and } \ s = 1, \ldots, Z \tag{17.A39}$$

where $h_{w_R,:}$ and $v^T_{:,s}$ denote the $w_R$th row of $H$ and the $s$th row of $V$, respectively. Eventually, we define and apply the two-norm projection value of each node, including
FIGURE 17.A4 Cross-talk HPCN of OKF6/TERT-2 cells during the infection of Candida albicans SC5314. This cross-talk HPCN is extracted from the real interspecies GEIN in Fig. 17.A3E via the PNP method. The gray lines denote protein–protein interaction; the red lines represent transcriptional regulation, and the blue lines indicate miRNA repression [13]. GEIN, Genetic-and-epigenetic interspecies network; miRNA, microRNA; PNP, principal network projection.
FIGURE 17.A5 Cross-talk HPCN of OKF6/TERT-2 cells during the infection of Candida albicans WO-1. This cross-talk HPCN is extracted from the real interspecies GEIN in Fig. 17.A3F via the PNP method. The gray lines denote protein–protein interaction; the red lines represent transcriptional regulation, and the blue lines indicate miRNA repression [13]. GEIN, Genetic-and-epigenetic interspecies network; miRNA, microRNA; PNP, principal network projection.
gene, miRNA, lncRNA, and protein in the real interspecies GEIN, to the top $Z$ right-singular vectors as follows:

$$D(w_R) = \left[\sum_{s=1}^{Z} \left[V_R(w_R, s)\right]^2\right]^{1/2}, \quad w_R = 1, \ldots, 2I+2J+L+M \tag{17.A40}$$

The meaning of (17.A40) is that if the projection value $D(w_R)$ approaches zero, the corresponding node $w_R$ is not important and is almost independent of the core network composed of the top $Z$ singular vectors; the larger the projection value, the larger the contribution of the node to the core network. Consequently, we can extract the host–pathogen core networks (HPCNs) from the GEINs of the C. albicans SC5314 and C. albicans WO-1 infections, as shown in Figs. 17.A4 and 17.A5, respectively, by evaluating the projection value of each node in (17.A40). Since the aim of this chapter is to identify the common and specific pathogenic mechanisms of infection progression of C. albicans SC5314 and C. albicans WO-1, we target the core host/pathogen proteins with the highest projection values, together with their connecting TFs/miRNAs/lncRNAs, to form the HPCN so that we can systematically investigate pathogenic mechanisms for drug design.
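The PNP steps in (17.A37)-(17.A40) can be sketched numerically as follows. This is a minimal illustration rather than the authors' code: it assumes the projection is taken onto the top $Z$ right-singular vectors, which `numpy.linalg.svd` returns as the rows of `Vt`, and the function name `principal_network_projection` and the toy matrix are hypothetical.

```python
import numpy as np

def principal_network_projection(H, energy=0.85):
    """Return (Z, D): Z is the smallest number of singular vectors whose
    eigenexpression fractions sum to at least `energy`; D[w] is the 2-norm
    projection of row w of H onto the top-Z right-singular vectors."""
    U, d, Vt = np.linalg.svd(H, full_matrices=False)   # d is in descending order
    E = d**2 / np.sum(d**2)                            # eigenexpression fractions, cf. (17.A38)
    Z = int(np.searchsorted(np.cumsum(E), energy)) + 1 # minimal Z with cumulative E >= energy
    Z = min(Z, len(d))
    proj = H @ Vt[:Z].T                                # row-wise projections, cf. (17.A39)
    D = np.sqrt(np.sum(proj**2, axis=1))               # two-norm projection values, cf. (17.A40)
    return Z, D

# Toy combined network matrix: 4 nodes (rows) and 2 regulator columns.
H = np.array([[3.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [4.0, 0.0]])
Z, D = principal_network_projection(H)
ranking = np.argsort(-D)   # nodes ordered from largest to smallest projection value
print("Z =", Z, "projection values =", D, "ranking =", ranking)
```

In this toy case the first singular value carries about 96% of the energy, so a single singular vector suffices, and the nodes in rows 0 and 3 dominate the core network while rows 1 and 2 project to zero.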
CHAPTER 18
Constructing host/pathogen genetic-and-epigenetic networks for investigating molecular mechanisms to identify drug targets in the infection of Epstein–Barr virus via big data mining and genome-wide NGS data identification
18.1 Introduction
Epstein–Barr virus (EBV), also known as human herpesvirus 4, was first identified in 1964 by M. Anthony Epstein et al. Working on Burkitt's lymphoma (BL), they demonstrated that the malignant cells could contain viral particles with the characteristic herpesvirus morphology, providing the first evidence of a tumor-associated virus in humans. EBV virions have a linear, double-stranded DNA genome about 172 kb in length that encodes more than 80 genes [880]. The DNA is surrounded by a protein nucleocapsid, which is enclosed by a layer of proteinaceous material called the tegument. The tegument is in turn enclosed by an envelope containing lipids, surface glycoproteins, and membrane proteins, which are necessary for the infection of human B cells [881]. EBV is a ubiquitous virus that infects more than 90% of the population worldwide. EBV mainly infects human B lymphocytes and epithelial cells, and it is associated with a variety of malignancies, including BL, Hodgkin's lymphoma (HL), gastric cancer (GC), nasopharyngeal carcinoma (NPC), T/NK cell lymphoma, and AIDS- or transplantation-associated lymphoma [882]. Like other herpesviruses, EBV exists in both latent and lytic phases with respect to viral gene expression [883]. Once infection occurs, EBV could establish a
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00001-8
lifelong latency in the infected cells, predominantly in human B cells, and might persist in the host for life. In the latent phase, the genomic DNA of EBV forms an episome that persists in memory B cells, and only a limited subset of viral latent genes is expressed. Thus the human immune system cannot target the infected cells easily, which allows EBV to evade human immune responses, so this latent mode of infection helps EBV to persist permanently. In contrast, during the lytic phase of infection, nearly all viral lytic genes of EBV are transcribed, the lytic replication cycle takes place, and finally many progeny virus particles are produced and released to cause other primary infections [884]. The lytic cycle is essential because it produces infectious viral particles, enabling the virus to spread from cell to cell and from host to host. Lytic infection might occur in tonsillar plasma cells and in differentiated oropharyngeal epithelial cells. In vitro assays indicate that hypoxia, B cell receptor (BCR) stimulation, and transforming growth factor-beta (TGF-β) can also induce a lytic replication cycle under some circumstances [885]. EBV infects human B lymphocytes, also called B cells, and uses oriLyt as the origin of replication during its lytic cycle. In the infected B cell, EBV can receive the signals required for the cleavage and packaging of lytically replicated DNA [886]. EBV can reactivate from the memory B cell pool into lytic replication, and upon plasma cell differentiation, infectious virion production is initiated [887]. Lytic reactivation causes a cascade of viral lytic gene expression in a temporally regulated manner in three stages, the immediate-early (IE), early (E), and late (L) stages, which are accompanied by the replication of viral genomes and the production of viral particles.
Following EBV genome encapsidation, DNA packaging, and virion release, new infectious virions can infect new cells in the same host as well as new hosts. EBV is an oncogenic herpesvirus, and its gene transcription is regulated by epigenetic mechanisms to maintain a persistent infection and to evade the human immune system. Epigenetic silencing of the dysregulated genes by methylation leads to tumorigenesis. EBV can employ various gene expression programs that are essential for maintaining viral persistence and latency in many human cell types and microenvironments. EBV latent genomes can assemble into chromatin structures with different histone and epigenetic modification patterns that can regulate viral gene expression. Epigenetic regulators such as acetyltransferases and ubiquitin proteins could also influence EBV pathogenesis by evading human immune detection, performing antiapoptosis, and driving human cell carcinogenesis. Epigenetic modifications are considered to increase the diversity and maintain the stability of gene expression programs [888], and to affect both EBV and human cellular genomes in the infected B cells. A few studies have examined the EBV pathogenic mechanism via the interspecies networks and pathways between human and EBV, but intensive research at a comprehensive system-network level is still needed. Thus we identify the genome-wide interspecies genetic-and-epigenetic networks (GIGENs) between human and EBV at the first and second infection stages during the EBV infection via the systems biology approach. In addition, in order to find more specific interactions, regulations, and gene/protein functions between human and EBV, we extract the host–virus core networks (HVCNs) from the GIGENs to provide more information on drug targets for multimolecule drug design.
Among these pathways in the HVCNs, we want to explore the detailed core pathways to investigate the relationship between the defensive and
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
offensive mechanisms of the human host and the attack and antagonism strategies of EBV, so we extract the host–virus core pathways (HVCPs) from the HVCNs. The HVCPs also contribute to a detailed understanding of the significant events and their corresponding molecular mechanisms at the first and second infection stages. The systems biology approach has been considered a powerful method to construct multiple and complex biological networks for investigating not only intraspecies but also interspecies molecular mechanisms by big data mining and two-sided next-generation sequencing (NGS) data. In this chapter, we summarize our results on the pathogenic and defensive mechanisms based on HVCNs and HVCPs during EBV infection as follows. EBV can exploit viral EBNA2 to evade immune apoptosis by interacting with CD46 and to prevent the viral translocation from being interrupted by SNHG5. Viral Zta can activate EBV early lytic genes, inhibit the human acetylation of NAT1 via viral miR-BART1-3p, and suppress the human energy metabolism of CLOCK with viral miR-BART14. Viral miR-BART5 can operate the antiapoptotic response to protect EBV from human immune attacks. Besides, viral miR-BART1-3p can also perform antiapoptosis and may hijack the acetylation function by repressing NAT1, so that EBV can exploit other acetyltransferases to perform acetylation at the first and second infection stages. In addition, viral EBNA3B may hijack the ubiquitination function of RNF41 by interacting with PSME3, so that EBV can exploit other ubiquitin proteins to operate ubiquitination at the first and second infection stages. Thus EBV can block the autophagy mechanism through the inhibition of ATG5 by viral miR-BART14 and through acetylation and ubiquitination that restrict the expression of BECN1 and RAB7A. Then, viral proteins (EBNA3B, BALF4, and BDLF4) can hijack the autophagosomes. Viral EBNA2 can also promote the proliferation of infected cells.
Viral BHRF1 can prevent premature death of the human cell during virus production, and viral BFRF3 can participate in the assembly of the infectious particles. Viral BDLF4 may collaborate with BALF4 and EBNA3B to promote viral production and the intracellular transportation of virions. LMP1 is found to prevent the apoptosis of the infected B cells and to drive their proliferation. BFLF2 participates in virion nuclear egress, and the viral receptor BBRF3 is found to be crucial for virion assembly and egress. Viral LMP2B and LMP1 could function to activate human B cells by interacting with BLLF2, and both of them work with EBNA3B so that EBNA3B can mediate immune-evasive transport via autophagic vesicles. RBPMS could act as the latent–lytic switch in EBV by interacting with STAT3. EBV utilizes LMP1 to maintain the transformational state in latency and to help virions release from B cells in the lytic phase, and LMP1 could enhance the envelopment pathways during the lytic phase. EBV is found to increase the immunosuppressive ability of viral BCRF1 through the degradation of HNRNPU via ubiquitination. Viral EBNA1 can mediate the disruption of promyelocytic leukemia (PML) to block the proapoptotic signal. Then, the proapoptosis of ARRB2 is inhibited through ubiquitination and the suppression of EBNA1. Viral miR-BART14 can repress AFG3L1P to trigger proapoptosis. Viral miR-BART10 could repress PCBP2 to prevent virion production from being restricted. EBV is also found to exploit ubiquitination to degrade BAX and to decrease the expression of STAT3 and PCBP2, which could participate in the transition from the lytic phase into the latent phase. Viral BNLF2B and miR-BART1-3p are found to exploit PRKACB to promote the envelope assembly and intracellular transport of virions. EBV is able to promote the tRNA
splicing of FAM98B through the degradation of NFATC2 to increase the expression of the late lytic genes and the genetic diversity related to cell packaging, assembly, and cell transport. The activated EBNA1 can help virion production and promote cell transportation via TRIM3. Viral miR-BART14 can help CLIC5 operate cell transport by repressing AFG3L1P, and EBV can help NRP1 activate cell transport through the degradation of STAT3. In this chapter, we discuss the potential viral drug target proteins and miRNAs, which are inferred from the HVCNs, HVCPs, and a review of the EBV-related literature, and we propose multimolecule drugs composed of Thymoquinone (TQ), Valpromide (VPM), and Zebularine (Zeb) as therapeutic inhibitors of EBV-associated malignancies. These multimolecule drugs can inhibit the switch from the latent phase into the lytic phase during viral reactivation and also suppress the expression of some critical EBV lytic genes/proteins during the EBV infection to interrupt the production of virions, interfere with the transportation of viral particles, and destroy the viral defensive mechanisms.
18.2 Materials and methods
18.2.1 Overview of the construction for interspecies GIGENs in human B cells infected with EBV during the lytic production phase
The flowchart of the procedure for constructing the interspecies GIGENs, the HVCNs, and the HVCPs in human B cells infected with EBV at the first and second infection stages in the lytic phase is shown in Fig. 18.1. The interspecies GIGENs are composed of human/EBV gene/miRNA/lncRNA regulatory networks (GRNs), human/EBV protein–protein interaction networks (PPINs), the interspecies PPINs, and the interspecies GRNs. The construction of the interspecies GIGENs, the HVCNs, and the HVCPs proceeds in the following steps: (1) big data mining and data preprocessing for establishing the candidate interspecies GIGEN; (2) identification of the real interspecies GIGENs by pruning false positives from the candidate interspecies GIGEN via the system identification approach and the system order detection scheme, using the genome-wide NGS data of human B cells and EBV during the EBV lytic reactivated infection; and (3) extraction of the HVCNs by applying the principal network projection (PNP) method to the real interspecies GIGENs. These procedures could identify the crucial and specific interspecies mechanisms at both infection stages during the EBV lytic phase.
18.2.2 Big data mining and data preprocessing of NGS data for human and EBV and methylation data for human
NGS datasets were obtained from the NCBI Gene Expression Omnibus (GEO). A study by Tina O'Grady et al. has demonstrated that EBV reactivation includes the ordered induction of approximately 90 viral genes that are involved in the production of infectious virions [889]. They found extensive bidirectional transcription stretching across nearly the entire genome, estimated that probably hundreds more EBV genes are expressed during EBV reactivation than had previously been known, and suggested that the complexity of the viral genome during EBV reactivation might be much greater. They have
[Figure 18.1 flowchart. Inputs: EBV intraspecies, human intraspecies, and human-EBV interspecies databases of protein–protein interaction, transcriptional regulation, and miRNA repression; NGS data of EBV and of human B cells at 0, 1/12, 0.5, 1, 2, 4, 8, 24, 48 h postreactivation; human genome-wide DNA methylation profiles; literature information for drug design. Steps: big data mining → candidate GIGEN → defined dynamic models of GIGEN → system identification approach and system order detection scheme → GIGENs at the first and second infection stages (with validation of epigenetic DNA methylation) → principal network projection (PNP) → HVCNs at the first and second infection stages → HVCPs at the first and second infection stages → predicted multiple drug targets → designed multimolecule drugs.]
FIGURE 18.1 The flowchart of constructing the interspecies GIGEN and HVCP for multiple drug targets and potential multimolecule drugs via the systems biology approach. The blocks filled with gray indicate the input information exploited in this process, including big data mining to establish the candidate GIGEN, NGS data to acquire the gene expression of human and EBV during the lytic phase, the genome-wide DNA methylation profiles to verify the epigenetic regulation of DNA methylation on human genomes, and the literature information about multimolecule drugs for multimolecule drug design based on the predicted drug targets; the blocks with a gray frame represent the systems biology approach exploited to provide the identified information in the proposed results; and the white blocks with a solid line frame are the corresponding results obtained from these processes [14]. EBV, Epstein–Barr virus; GIGEN, genome-wide interspecies genetic-and-epigenetic network; HVCP, host–virus core pathway; NGS, next-generation sequencing.
changed our view of the virion production process during the EBV lytic phase. The NGS data from this study cover both the human (hg19 assembly) and the Akata EBV genomes over the time course and are available through the GEO series with accession number GSE52490 [889]. The raw data of the NGS dataset have two parts. One part contains the gene expression levels of human B cells at 0, 1/12, 0.5, 1, 2, 4, 8, 24, and 48 h postreactivation with the EBV lytic infection. The other part contains the gene expression levels of EBV at 0, 1/12, 0.5, 1, 2, 4, 8, 24, and 48 h postreactivation in the EBV lytic phase. The dataset contains 44,446 probes in human and 134 probes in EBV. From the literature review we know that the EBV lytic phase can be classified into three stages, the immediate-early (IE), early (E), and late (L) stages, and the expression dynamics have been shown to be consistent with those previously observed via microarray technology [889]. Thus we sketch the changes in the expression levels of typical lytic genes according to this classification; the result is shown in Fig. 18.2, which is drawn on the basis of the classification from literature reviews [890]. Fig. 18.2 suggests that cubic spline interpolation is needed to obtain a sufficient number of data points over these time courses, as well as information after 48 h postreactivation, in order to avoid overfitting when the system identification approach is applied to construct the real GIGENs from the corresponding gene expression data. Therefore the cubic spline method is applied to interpolate these data from 0 to 72 h, and the lytic phase is classified into the first infection stage from 0 to 24 h and the second infection stage from 8 to 72 h based on the changes of the peak waves in Fig. 18.2.
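The interpolation step can be sketched as follows. The source does not state the spline boundary conditions or how values beyond 48 h are obtained, so this sketch makes two labeled assumptions: it uses a natural cubic spline (zero second derivative at both ends), and it extends the last cubic segment to reach 72 h. The function name `natural_cubic_spline` and the RPKM values are hypothetical and purely illustrative.

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Build a natural cubic spline through (x, y) and return an evaluator.
    Points outside [x[0], x[-1]] are evaluated on the nearest end segment."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    h = np.diff(x)
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    A[0, 0] = 1.0          # natural boundary: second derivative is zero at the first knot
    A[-1, -1] = 1.0        # and at the last knot
    for i in range(1, n - 1):
        A[i, i - 1] = h[i - 1]
        A[i, i] = 2.0 * (h[i - 1] + h[i])
        A[i, i + 1] = h[i]
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1])
    m = np.linalg.solve(A, rhs)   # second derivatives at the knots

    def evaluate(t):
        t = np.asarray(t, dtype=float)
        i = np.clip(np.searchsorted(x, t, side="right") - 1, 0, n - 2)
        left, right = t - x[i], x[i + 1] - t
        return ((m[i] * right**3 + m[i + 1] * left**3) / (6.0 * h[i])
                + (y[i] / h[i] - m[i] * h[i] / 6.0) * right
                + (y[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * left)

    return evaluate

# Sampling times from the text (hours postreactivation); RPKM values are made up.
t_obs = np.array([0, 1 / 12, 0.5, 1, 2, 4, 8, 24, 48], dtype=float)
rpkm_obs = np.array([1.0, 1.2, 2.5, 6.0, 20.0, 55.0, 90.0, 140.0, 120.0])

spline = natural_cubic_spline(t_obs, rpkm_obs)
t_grid = np.arange(0, 73)          # hourly grid from 0 to 72 h
rpkm_grid = spline(t_grid)
print(rpkm_grid.shape)
```

Values between 48 and 72 h are extrapolations and should be treated with caution, since a cubic segment can drift far from plausible expression levels away from the measured time points.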
In addition, analysis of variance (ANOVA) is applied to the NGS data of human and EBV mRNA expression to evaluate the P-values of differential expression between the first and second infection stages. The candidate interspecies GIGEN is constructed through big data mining from numerous databases that contain experimental data and bioinformatic (computational) predictions. The human candidate PPIN was obtained from BioGRID [891], DIP [534], BIND [533], IntAct [892], and VirusMINT [893]. The human candidate GRN comprises transcription factors (TFs)/TF complexes regulating genes, lncRNAs regulating genes, and miRNAs repressing genes, which are available from HTRIdb [671], ITFP [672], TargetScan (http://www.targetscan.org/), and CircuitsDB 2 [761]. The interspecies candidate PPIN and the EBV candidate PPIN require interspecies and intraspecies interactions, which are obtained from VirusMentha [894], CDFD (http://www.cdfd.org.in/labpages/computational_biology.html), Virhostome [895], IMEx [896], and PSICQUIC [897]. The interspecies candidate GRN and the EBV candidate GRN involve interspecies and intraspecies TF-gene regulations and miRNA-gene repressions, which are collected from VIRmiRNA [898], ViRBase [899], miRecords [900], starBase v2.0 [901], and miRTarBase [521]. In order to support the inference of human target genes that are subject to the epigenetic regulation of DNA methylation, we exploited the genome-wide DNA methylation profiles of B cells and immortalized B cells (GSE41957) [902], uninfected and infected with EBV, respectively (with a sample size of 6). We applied ANOVA to these DNA methylation data. We took the analytical results of the DNA methylation profiles in the uninfected and infected conditions to represent the regulation of DNA methylation at both infection stages.
In addition, DNA methylation of human genes was determined by the differentially significant changes of the basal level $\delta_k^{(h)}$ between the first and second infection stages in the dynamic model Eq. (18.3).
[Figure 18.2 plot. Three panels of RPKM (y-axis) versus time in hours (x-axis, 0 to 48 h). Immediate-early (IE): BRLF1, BZLF1, BMLF1. Early (E): BMRF1, BBLF3, BBLF4, BGLF5, BNLF2A, BSLF1. Late (L): BCRF1, BVRF2, BDLF1, BLLF1, BCLF1.]
FIGURE 18.2 The changes of gene expression levels of typical lytic genes based on the classification from literature review. The NGS data of these typical lytic genes are quantified as RPKM at every time point and are classified into the immediate-early (IE), early (E), and late (L) stages of the lytic phase on the basis of the classification from literature review [14]. NGS, Next-generation sequencing; RPKM, reads per kilobase per million mapped reads.
As a result, in the intraspecies candidate PPIN, we obtained 301 EBV PPI pairs and 23,570,918 human PPI pairs; in the interspecies candidate PPIN, we obtained 5135 human-EBV PPI pairs. In the intraspecies candidate GRN, we obtained 5 EBV TF-gene pairs, 67 EBV miRNA-gene pairs, 906,611 human TF-gene pairs, 817,900 human miRNA-gene pairs, and 1948 human lncRNA-gene pairs; in the interspecies candidate GRN, we obtained 1252 EBV TF-human gene pairs, 39,772 EBV miRNA-human gene pairs, 1355 human TF-EBV gene pairs, and 1718 human miRNA-EBV gene pairs. The intraspecies human candidate GRN contains three human TF complexes. The first is ARNT::AHR, which has 6368 human TF-gene pairs; the second is HIF1A::ARNT, which has 1011 human TF-gene pairs; and the third is NFE2L1::MAFG, which has 5787 human TF-gene pairs. In conclusion, we built the candidate interspecies GIGEN from the candidate pairs listed above, and then identified the real interspecies GIGENs by pruning the false positives from the candidate interspecies GIGEN via the system identification approach and the system order detection scheme, using the genome-wide NGS data of human B cells and EBV at both infection stages during the EBV lytic phase.
18.2.3 Dynamic models of the interspecies GIGENs for human B cells and EBV during the lytic infection process
The candidate interspecies GIGEN is composed of experimental and computational predictions, which will result in a number of false-positive interactions and regulations. The false positives of the candidate interspecies GIGEN should therefore be pruned to construct real interspecies GIGENs by using the genome-wide NGS data of human B cells and EBV through the system identification approach and the system order detection scheme. We could then extract the core GIGENs by the PNP scheme to characterize the principal biological mechanisms in GIGENs [11,903]. The PPI of human-protein $i$ in the candidate PPIN can be described as the following stochastic dynamic equation:
$$p_i^{(h)}(t+1) = p_i^{(h)}(t) + \sum_{n=1}^{N_i} \alpha_{in}^{(h)} p_n^{(h)}(t)\, p_i^{(h)}(t) + \sum_{j=1}^{J_i} \gamma_{ij}^{(h)} p_j^{(\nu)}(t)\, p_i^{(h)}(t) - \sigma_i^{(h)} p_i^{(h)}(t) + \lambda_i^{(h)} g_i^{(h)}(t) + \beta_i^{(h)} + \varepsilon_i^{(h)}(t), \tag{18.1}$$
for $i = 1, 2, \ldots, I$; $-\sigma_i^{(h)} \le 0$; and $\lambda_i^{(h)} \ge 0$,
where $p_i^{(h)}(t)$, $p_n^{(h)}(t)$, $g_i^{(h)}(t)$, and $p_j^{(\nu)}(t)$ indicate the expression levels of human-protein $i$, human-protein $n$, human-gene $i$, and EBV-protein $j$ at time $t$, respectively; $\alpha_{in}^{(h)}$ and $\gamma_{ij}^{(h)}$ represent the interactive abilities between human-protein $n$ and human-protein $i$ and between EBV-protein $j$ and human-protein $i$, respectively; $-\sigma_i^{(h)}$, $\lambda_i^{(h)}$, and $\beta_i^{(h)}$ denote the degradation rate, the translation effect, and the basal level of human-protein $i$, respectively; the basal level $\beta_i^{(h)}$ denotes the interactions with unknown factors, for example, acetylation and ubiquitination; $N_i$ and $J_i$ are the numbers of human proteins and EBV proteins interacting with human-protein $i$ in the candidate GIGEN, respectively; and $\varepsilon_i^{(h)}(t)$ is the stochastic noise of human-protein $i$ owing to model uncertainty or other uncertain factors at time $t$.
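As an illustration only, the deterministic part of Eq. (18.1) can be evaluated for one time step as in the following Python sketch. The function name `protein_step`, its argument names, and all numerical values are hypothetical and are not taken from the study; the noise term is passed in explicitly and defaults to zero.

```python
def protein_step(p_i, human_partners, ebv_partners, alpha, gamma,
                 sigma, lam, g_i, beta, noise=0.0):
    """One discrete-time update of Eq. (18.1) for a single human protein i.

    human_partners / ebv_partners hold p_n^(h)(t) and p_j^(v)(t);
    alpha / gamma hold the matching interaction abilities.
    """
    if sigma < 0 or lam < 0:
        # Eq. (18.1) requires -sigma <= 0 and lambda >= 0.
        raise ValueError("sigma and lam must be nonnegative")
    intra = sum(a * p_n * p_i for a, p_n in zip(alpha, human_partners))
    inter = sum(c * p_j * p_i for c, p_j in zip(gamma, ebv_partners))
    return p_i + intra + inter - sigma * p_i + lam * g_i + beta + noise


# Toy values chosen only to exercise the formula (hypothetical).
p_next = protein_step(p_i=2.0, human_partners=[1.0, 3.0], ebv_partners=[0.5],
                      alpha=[0.1, -0.05], gamma=[0.2],
                      sigma=0.3, lam=0.4, g_i=5.0, beta=0.1)
```

The EBV-protein model in Eq. (18.2) has the same structure, so the same function could be reused with the roles of host and virus exchanged.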
Note that the biological interaction mechanism of human proteins in (18.1) involves the intraspecies human PPIs by $\sum_{n=1}^{N_i} \alpha_{in}^{(h)} p_n^{(h)}(t)\, p_i^{(h)}(t)$, the interspecies PPIs by $\sum_{j=1}^{J_i} \gamma_{ij}^{(h)} p_j^{(\nu)}(t)\, p_i^{(h)}(t)$, the protein degradation by $-\sigma_i^{(h)} p_i^{(h)}(t)$, the protein translation by $\lambda_i^{(h)} g_i^{(h)}(t)$, the basal level by $\beta_i^{(h)}$, and the stochastic noise by $\varepsilon_i^{(h)}(t)$. The PPI of EBV-protein $j$ in the candidate PPIN can be described as the following stochastic dynamic equation:
$$p_j^{(\nu)}(t+1) = p_j^{(\nu)}(t) + \sum_{m=1}^{M_j} \alpha_{jm}^{(\nu)} p_m^{(\nu)}(t)\, p_j^{(\nu)}(t) + \sum_{i=1}^{I_j} \gamma_{ji}^{(\nu)} p_i^{(h)}(t)\, p_j^{(\nu)}(t) - \sigma_j^{(\nu)} p_j^{(\nu)}(t) + \lambda_j^{(\nu)} g_j^{(\nu)}(t) + \beta_j^{(\nu)} + \varepsilon_j^{(\nu)}(t), \tag{18.2}$$
for $j = 1, 2, \ldots, J$; $-\sigma_j^{(\nu)} \le 0$; and $\lambda_j^{(\nu)} \ge 0$, where $p_j^{(\nu)}(t)$, $p_m^{(\nu)}(t)$, $g_j^{(\nu)}(t)$, and $p_i^{(h)}(t)$ signify the expression levels of EBV-protein $j$, EBV-protein $m$, EBV-gene $j$, and human-protein $i$ at time $t$, respectively; $\alpha_{jm}^{(\nu)}$ and $\gamma_{ji}^{(\nu)}$ show the interactive abilities between EBV-protein $m$ and EBV-protein $j$ and between human-protein $i$ and EBV-protein $j$, respectively; $-\sigma_j^{(\nu)}$, $\lambda_j^{(\nu)}$, and $\beta_j^{(\nu)}$ correspond to the degradation rate, the translation effect, and the basal level of EBV-protein $j$, respectively; $M_j$ and $I_j$ stand for the numbers of EBV proteins and human proteins interacting with EBV-protein $j$ in the candidate GIGEN, respectively; and $\varepsilon_j^{(\nu)}(t)$ is the stochastic noise of EBV-protein $j$ owing to model uncertainty or other uncertain factors at time $t$. Note that the biological interaction mechanism of EBV proteins in (18.2) involves the intraspecies EBV PPIs by $\sum_{m=1}^{M_j} \alpha_{jm}^{(\nu)} p_m^{(\nu)}(t)\, p_j^{(\nu)}(t)$, the interspecies PPIs by $\sum_{i=1}^{I_j} \gamma_{ji}^{(\nu)} p_i^{(h)}(t)\, p_j^{(\nu)}(t)$, the protein degradation by $-\sigma_j^{(\nu)} p_j^{(\nu)}(t)$, the protein translation by $\lambda_j^{(\nu)} g_j^{(\nu)}(t)$, the basal level by $\beta_j^{(\nu)}$, and the stochastic noise by $\varepsilon_j^{(\nu)}(t)$. The GRN of human-gene $k$ in the candidate GRN can be described as the following stochastic dynamic equation:
$$\begin{aligned} g_k^{(h)}(t+1) = {} & g_k^{(h)}(t) + \sum_{i=1}^{I_k} a_{ki}^{(h)} p_i^{(h)}(t) + \sum_{i'=1}^{I'_k} \sum_{i''=1}^{I''_k} \zeta_{k(I''_k(i'-1)+i'')}^{(h)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t) \\ & - \sum_{r=1}^{R_k} b_{kr}^{(h)} w_r^{(h)}(t)\, g_k^{(h)}(t) + \sum_{l=1}^{L_k} c_{kl}^{(h)} o_l^{(h)}(t) + \sum_{j=1}^{J_k} d_{kj}^{(h)} p_j^{(\nu)}(t) \\ & - \sum_{q=1}^{Q_k} e_{kq}^{(h)} w_q^{(\nu)}(t)\, g_k^{(h)}(t) - \mu_k^{(h)} g_k^{(h)}(t) + \delta_k^{(h)} + \omega_k^{(h)}(t), \end{aligned} \tag{18.3}$$
for $k = 1, 2, \ldots, K$; $-b_{kr}^{(h)} \le 0$; $-e_{kq}^{(h)} \le 0$; and $-\mu_k^{(h)} \le 0$,
where $g_k^{(h)}(t)$, $p_i^{(h)}(t)$, $p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, $w_r^{(h)}(t)$, $o_l^{(h)}(t)$, $p_j^{(\nu)}(t)$, and $w_q^{(\nu)}(t)$ indicate the expression levels of human-gene $k$, human-TF $i$, human-TF complex $i'{::}i''$, human-miRNA $r$, human-lncRNA $l$, EBV-TF $j$, and EBV-miRNA $q$ at time $t$, respectively; human-TF complex $i'{::}i''$ is composed of human-TF $i'$ and human-TF $i''$; $a_{ki}^{(h)}$, $\zeta_{k(I''_k(i'-1)+i'')}^{(h)}$, $-b_{kr}^{(h)}$, $c_{kl}^{(h)}$, $d_{kj}^{(h)}$, and $-e_{kq}^{(h)}$ represent the regulatory abilities of human-TF $i$ regulation, human-TF complex $i'{::}i''$ regulation, human-miRNA $r$ repression, human-lncRNA $l$ regulation, EBV-TF $j$ regulation, and EBV-miRNA $q$ repression on human-gene $k$, respectively; $-\mu_k^{(h)}$ and $\delta_k^{(h)}$ denote the degradation rate and the basal level
of human-gene $k$, respectively. Regarding the regulation ability $\zeta_{k(I''_k(i'-1)+i'')}^{(h)}$ of the human TF complex on human-gene $k$, the index $I''_k(i'-1)+i''$ gives the coordinate of the regulation ability of the human TF complex $p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$ in the human GRN system matrix of human-gene $k$; that is, the regulation abilities of human TF complexes on human-gene $k$ can be arranged as a one-row matrix as follows: $\zeta_{k1}^{(h)}, \zeta_{k2}^{(h)}, \ldots, \zeta_{k(I''_k)}^{(h)}, \zeta_{k(I''_k+1)}^{(h)}, \zeta_{k(I''_k+2)}^{(h)}, \ldots, \zeta_{k(2I''_k)}^{(h)}, \zeta_{k(2I''_k+1)}^{(h)}, \zeta_{k(2I''_k+2)}^{(h)}, \ldots, \zeta_{k(3I''_k)}^{(h)}, \ldots, \zeta_{k(I''_k(i'-1)+i'')}^{(h)}, \ldots, \zeta_{k(I''_k I'_k)}^{(h)}$. The basal level $\delta_k^{(h)}$ denotes regulations from other unknown regulators. $I_k$, $I'_k$, $I''_k$, $R_k$, $L_k$, $J_k$, and $Q_k$ are the numbers of human TFs, human TF complex subunits $i'$, human TF complex subunits $i''$, human miRNAs, human lncRNAs, EBV TFs, and EBV miRNAs regulating human-gene $k$ in the candidate GIGEN, respectively; and $\omega_k^{(h)}(t)$ is the stochastic noise of human-gene $k$ owing to model uncertainty or other uncertain factors at time $t$, for example, methylation and histone modification. Note that the biological regulatory mechanism of human genes in (18.3) involves human-TF transcription regulations by $\sum_{i=1}^{I_k} a_{ki}^{(h)} p_i^{(h)}(t)$, human-TF complex transcription regulations by $\sum_{i'=1}^{I'_k} \sum_{i''=1}^{I''_k} \zeta_{k(I''_k(i'-1)+i'')}^{(h)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, human-miRNA repressions by $-\sum_{r=1}^{R_k} b_{kr}^{(h)} w_r^{(h)}(t)\, g_k^{(h)}(t)$, human-lncRNA regulations by $\sum_{l=1}^{L_k} c_{kl}^{(h)} o_l^{(h)}(t)$, EBV-TF transcription regulations by $\sum_{j=1}^{J_k} d_{kj}^{(h)} p_j^{(\nu)}(t)$, EBV-miRNA repressions by $-\sum_{q=1}^{Q_k} e_{kq}^{(h)} w_q^{(\nu)}(t)\, g_k^{(h)}(t)$, the mRNA degradation by $-\mu_k^{(h)} g_k^{(h)}(t)$, the basal level by $\delta_k^{(h)}$, and the stochastic noise by $\omega_k^{(h)}(t)$. It seems reasonable to suppose that DNA methylation is robustly connected with gene expression changes and associated with the transcriptional activity of human genes.
DNA methylation influences the dynamics and stability of RNA polymerase II elongation, so that intragenic DNA methylation could coordinate differential gene expression via alternative promoters or splicing [904]. Thus we supposed that the differential changes of the basal level $\delta_k^{(h)}$ of human-gene $k$ between the first and second infection stages in Eq. (18.3) are mainly due to DNA methylation during the EBV lytic infection. The GRN of EBV-gene $s$ in the candidate GRN can be described as the following stochastic dynamic equation:
$$\begin{aligned} g_s^{(\nu)}(t+1) = {} & g_s^{(\nu)}(t) + \sum_{i=1}^{I_s} a_{si}^{(\nu)} p_i^{(h)}(t) + \sum_{i'=1}^{I'_s} \sum_{i''=1}^{I''_s} \zeta_{s(I''_s(i'-1)+i'')}^{(\nu)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t) \\ & - \sum_{r=1}^{R_s} b_{sr}^{(\nu)} w_r^{(h)}(t)\, g_s^{(\nu)}(t) + \sum_{l=1}^{L_s} c_{sl}^{(\nu)} o_l^{(h)}(t) + \sum_{j=1}^{J_s} d_{sj}^{(\nu)} p_j^{(\nu)}(t) \\ & - \sum_{q=1}^{Q_s} e_{sq}^{(\nu)} w_q^{(\nu)}(t)\, g_s^{(\nu)}(t) - \mu_s^{(\nu)} g_s^{(\nu)}(t) + \delta_s^{(\nu)} + \omega_s^{(\nu)}(t), \end{aligned} \tag{18.4}$$
for $s = 1, 2, \ldots, S$; $-b_{sr}^{(\nu)} \le 0$; $-e_{sq}^{(\nu)} \le 0$; and $-\mu_s^{(\nu)} \le 0$,
where $g_s^{(\nu)}(t)$ signifies the expression level of EBV-gene $s$ at time $t$; $a_{si}^{(\nu)}$, $\zeta_{s(I''_s(i'-1)+i'')}^{(\nu)}$, $-b_{sr}^{(\nu)}$, $c_{sl}^{(\nu)}$, $d_{sj}^{(\nu)}$, and $-e_{sq}^{(\nu)}$ show the regulatory abilities of human-TF $i$ regulation, human-TF complex $i'{::}i''$ regulation, human-miRNA $r$ repression, human-lncRNA $l$ regulation, EBV-TF $j$ regulation, and EBV-miRNA $q$ repression on EBV-gene $s$, respectively; $-\mu_s^{(\nu)}$ and $\delta_s^{(\nu)}$ correspond to the degradation rate and the basal level of EBV-gene $s$, respectively; and $\omega_s^{(\nu)}(t)$ is the stochastic noise of EBV-gene $s$ owing to model uncertainty or other uncertain factors at time $t$. Note that the biological regulatory mechanism of EBV genes in (18.4) involves human-TF transcription regulations by $\sum_{i=1}^{I_s} a_{si}^{(\nu)} p_i^{(h)}(t)$, human-TF complex transcription regulations by $\sum_{i'=1}^{I'_s} \sum_{i''=1}^{I''_s} \zeta_{s(I''_s(i'-1)+i'')}^{(\nu)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, human-miRNA repressions by
$-\sum_{r=1}^{R_s} b_{sr}^{(\nu)} w_r^{(h)}(t)\, g_s^{(\nu)}(t)$, human-lncRNA regulations by $\sum_{l=1}^{L_s} c_{sl}^{(\nu)} o_l^{(h)}(t)$, EBV-TF transcription regulations by $\sum_{j=1}^{J_s} d_{sj}^{(\nu)} p_j^{(\nu)}(t)$, EBV-miRNA repressions by $-\sum_{q=1}^{Q_s} e_{sq}^{(\nu)} w_q^{(\nu)}(t)\, g_s^{(\nu)}(t)$, the mRNA degradation by $-\mu_s^{(\nu)} g_s^{(\nu)}(t)$, the basal level by $\delta_s^{(\nu)}$, and the stochastic noise by $\omega_s^{(\nu)}(t)$.
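The flattened TF-complex index $I''(i'-1)+i''$ shared by the GRN models in Eqs. (18.3) and (18.4) can be illustrated with a short Python sketch. The helper names `complex_index` and `complex_term` and the toy numbers are hypothetical; the sketch only shows how a pair of subunit indices maps to one position in the one-row matrix of complex regulation abilities.

```python
def complex_index(i1, i2, n_second):
    """1-based flattened index I''(i' - 1) + i'' used for the zeta parameters.

    i1 is the first subunit index i', i2 the second subunit index i'',
    and n_second is I'', the number of second subunits.
    """
    return n_second * (i1 - 1) + i2


def complex_term(zeta_row, p_first, p_second):
    """Double sum over i' and i'' of zeta * p_{i'}(t) * p_{i''}(t)."""
    n_second = len(p_second)
    total = 0.0
    for i1, pa in enumerate(p_first, start=1):
        for i2, pb in enumerate(p_second, start=1):
            idx = complex_index(i1, i2, n_second)
            total += zeta_row[idx - 1] * pa * pb  # Python lists are 0-based
    return total


# Two first subunits and three second subunits give 2 * 3 = 6 parameters.
zeta = [1.0, 0.0, 0.0, 0.0, 0.0, 2.0]
value = complex_term(zeta, p_first=[1.0, 2.0], p_second=[1.0, 1.0, 3.0])
```

Storing the complex parameters in one row this way lets them sit alongside the single-TF parameters in one parameter vector, which is what the later linear regression form relies on.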
The GRN of human-lncRNA $z$ in the candidate GRN can be described as the following stochastic dynamic equation:
$$\begin{aligned} g_z^{(L)}(t+1) = {} & g_z^{(L)}(t) + \sum_{i=1}^{I_z} a_{zi}^{(L)} p_i^{(h)}(t) + \sum_{i'=1}^{I'_z} \sum_{i''=1}^{I''_z} \zeta_{z(I''_z(i'-1)+i'')}^{(L)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t) \\ & - \sum_{r=1}^{R_z} b_{zr}^{(L)} w_r^{(h)}(t)\, g_z^{(L)}(t) + \sum_{l=1}^{L_z} c_{zl}^{(L)} o_l^{(h)}(t) + \sum_{j=1}^{J_z} d_{zj}^{(L)} p_j^{(\nu)}(t) \\ & - \sum_{q=1}^{Q_z} e_{zq}^{(L)} w_q^{(\nu)}(t)\, g_z^{(L)}(t) - \mu_z^{(L)} g_z^{(L)}(t) + \delta_z^{(L)} + \omega_z^{(L)}(t), \end{aligned} \tag{18.5}$$
for $z = 1, 2, \ldots, Z$; $-b_{zr}^{(L)} \le 0$; $-e_{zq}^{(L)} \le 0$; and $-\mu_z^{(L)} \le 0$,
where $g_z^{(L)}(t)$ stands for the expression level of human-lncRNA $z$ at time $t$; $a_{zi}^{(L)}$, $\zeta_{z(I''_z(i'-1)+i'')}^{(L)}$, $-b_{zr}^{(L)}$, $c_{zl}^{(L)}$, $d_{zj}^{(L)}$, and $-e_{zq}^{(L)}$ indicate the regulatory abilities of human-TF $i$ regulation, human-TF complex $i'{::}i''$ regulation, human-miRNA $r$ repression, human-lncRNA $l$ regulation, EBV-TF $j$ regulation, and EBV-miRNA $q$ repression on human-lncRNA $z$, respectively; $-\mu_z^{(L)}$ and $\delta_z^{(L)}$ denote the degradation rate and the basal level of human-lncRNA $z$, respectively; and $\omega_z^{(L)}(t)$ is the stochastic noise of human-lncRNA $z$ owing to model uncertainty or other uncertain factors at time $t$. Note that the biological regulatory mechanism of human lncRNAs in (18.5) involves human-TF transcription regulations by $\sum_{i=1}^{I_z} a_{zi}^{(L)} p_i^{(h)}(t)$, human-TF complex transcription regulations by $\sum_{i'=1}^{I'_z} \sum_{i''=1}^{I''_z} \zeta_{z(I''_z(i'-1)+i'')}^{(L)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, human-miRNA repressions by $-\sum_{r=1}^{R_z} b_{zr}^{(L)} w_r^{(h)}(t)\, g_z^{(L)}(t)$, human-lncRNA regulations by $\sum_{l=1}^{L_z} c_{zl}^{(L)} o_l^{(h)}(t)$, EBV-TF transcription regulations by $\sum_{j=1}^{J_z} d_{zj}^{(L)} p_j^{(\nu)}(t)$, EBV-miRNA repressions by $-\sum_{q=1}^{Q_z} e_{zq}^{(L)} w_q^{(\nu)}(t)\, g_z^{(L)}(t)$, the lncRNA degradation by $-\mu_z^{(L)} g_z^{(L)}(t)$, the basal level by $\delta_z^{(L)}$, and the stochastic noise by $\omega_z^{(L)}(t)$. The GRN of human-miRNA $f$ in the candidate GRN can be described as the following stochastic dynamic equation:
$$\begin{aligned} x_f^{(h)}(t+1) = {} & x_f^{(h)}(t) + \sum_{i=1}^{I_f} a_{fi}^{(h)} p_i^{(h)}(t) + \sum_{i'=1}^{I'_f} \sum_{i''=1}^{I''_f} \zeta_{f(I''_f(i'-1)+i'')}^{(h)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t) \\ & - \sum_{r=1}^{R_f} b_{fr}^{(h)} w_r^{(h)}(t)\, x_f^{(h)}(t) + \sum_{j=1}^{J_f} d_{fj}^{(h)} p_j^{(\nu)}(t) \\ & - \sum_{q=1}^{Q_f} e_{fq}^{(h)} w_q^{(\nu)}(t)\, x_f^{(h)}(t) - \rho_f^{(h)} x_f^{(h)}(t) + \eta_f^{(h)} + \psi_f^{(h)}(t), \end{aligned} \tag{18.6}$$
for $f = 1, 2, \ldots, F$; $-b_{fr}^{(h)} \le 0$; $-e_{fq}^{(h)} \le 0$; and $-\rho_f^{(h)} \le 0$,
where $x_f^{(h)}(t)$ represents the expression level of human-miRNA $f$ at time $t$; $a_{fi}^{(h)}$, $\zeta_{f(I''_f(i'-1)+i'')}^{(h)}$, $-b_{fr}^{(h)}$, $d_{fj}^{(h)}$, and $-e_{fq}^{(h)}$ mean the regulatory abilities of human-TF $i$ regulation, human-TF
complex $i'{::}i''$ regulation, human-miRNA $r$ repression, EBV-TF $j$ regulation, and EBV-miRNA $q$ repression on human-miRNA $f$, respectively; $-\rho_f^{(h)}$ and $\eta_f^{(h)}$ signify the degradation rate and the basal level of human-miRNA $f$, respectively; and $\psi_f^{(h)}(t)$ is the stochastic noise of human-miRNA $f$ owing to model uncertainty or other uncertain factors at time $t$. Note that the biological regulatory mechanism of human miRNAs in (18.6) involves human-TF transcription regulations by $\sum_{i=1}^{I_f} a_{fi}^{(h)} p_i^{(h)}(t)$, human-TF complex transcription regulations by $\sum_{i'=1}^{I'_f} \sum_{i''=1}^{I''_f} \zeta_{f(I''_f(i'-1)+i'')}^{(h)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, human-miRNA repressions by $-\sum_{r=1}^{R_f} b_{fr}^{(h)} w_r^{(h)}(t)\, x_f^{(h)}(t)$, EBV-TF transcription regulations by $\sum_{j=1}^{J_f} d_{fj}^{(h)} p_j^{(\nu)}(t)$, EBV-miRNA repressions by $-\sum_{q=1}^{Q_f} e_{fq}^{(h)} w_q^{(\nu)}(t)\, x_f^{(h)}(t)$, the miRNA degradation by $-\rho_f^{(h)} x_f^{(h)}(t)$, the basal level by $\eta_f^{(h)}$, and the stochastic noise by $\psi_f^{(h)}(t)$. The GRN of EBV-miRNA $u$ in the candidate GRN can be described as the following stochastic dynamic equation:
$$\begin{aligned} x_u^{(\nu)}(t+1) = {} & x_u^{(\nu)}(t) + \sum_{i=1}^{I_u} a_{ui}^{(\nu)} p_i^{(h)}(t) + \sum_{i'=1}^{I'_u} \sum_{i''=1}^{I''_u} \zeta_{u(I''_u(i'-1)+i'')}^{(\nu)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t) \\ & - \sum_{r=1}^{R_u} b_{ur}^{(\nu)} w_r^{(h)}(t)\, x_u^{(\nu)}(t) + \sum_{j=1}^{J_u} d_{uj}^{(\nu)} p_j^{(\nu)}(t) \\ & - \sum_{q=1}^{Q_u} e_{uq}^{(\nu)} w_q^{(\nu)}(t)\, x_u^{(\nu)}(t) - \rho_u^{(\nu)} x_u^{(\nu)}(t) + \eta_u^{(\nu)} + \psi_u^{(\nu)}(t), \end{aligned} \tag{18.7}$$
for $u = 1, 2, \ldots, U$; $-b_{ur}^{(\nu)} \le 0$; $-e_{uq}^{(\nu)} \le 0$; and $-\rho_u^{(\nu)} \le 0$,
where $x_u^{(\nu)}(t)$ shows the expression level of EBV-miRNA $u$ at time $t$; $a_{ui}^{(\nu)}$, $\zeta_{u(I''_u(i'-1)+i'')}^{(\nu)}$, $-b_{ur}^{(\nu)}$, $d_{uj}^{(\nu)}$, and $-e_{uq}^{(\nu)}$ correspond to the regulatory abilities of human-TF $i$ regulation, human-TF complex $i'{::}i''$ regulation, human-miRNA $r$ repression, EBV-TF $j$ regulation, and EBV-miRNA $q$ repression on EBV-miRNA $u$, respectively; $-\rho_u^{(\nu)}$ and $\eta_u^{(\nu)}$ stand for the degradation rate and the basal level of EBV-miRNA $u$, respectively; and $\psi_u^{(\nu)}(t)$ is the stochastic noise of EBV-miRNA $u$ owing to model uncertainty or other uncertain factors at time $t$. Note that the biological regulatory mechanism of EBV miRNAs in (18.7) involves human-TF transcription regulations by $\sum_{i=1}^{I_u} a_{ui}^{(\nu)} p_i^{(h)}(t)$, human-TF complex transcription regulations by $\sum_{i'=1}^{I'_u} \sum_{i''=1}^{I''_u} \zeta_{u(I''_u(i'-1)+i'')}^{(\nu)} p_{i'}^{(h)}(t)\, p_{i''}^{(h)}(t)$, human-miRNA repressions by $-\sum_{r=1}^{R_u} b_{ur}^{(\nu)} w_r^{(h)}(t)\, x_u^{(\nu)}(t)$, EBV-TF transcription regulations by $\sum_{j=1}^{J_u} d_{uj}^{(\nu)} p_j^{(\nu)}(t)$, EBV-miRNA repressions by $-\sum_{q=1}^{Q_u} e_{uq}^{(\nu)} w_q^{(\nu)}(t)\, x_u^{(\nu)}(t)$, the miRNA degradation by $-\rho_u^{(\nu)} x_u^{(\nu)}(t)$, the basal level by $\eta_u^{(\nu)}$, and the stochastic noise by $\psi_u^{(\nu)}(t)$.
18.2.4 System identification approach of the dynamic models of GIGENs
After establishing the stochastic dynamic models in Eqs. (18.1)–(18.7) of the GIGENs, we identify the interactive parameters of the PPIN in (18.1) and (18.2) and the regulatory parameters of the GRN in (18.3)–(18.7) by using the system identification approach to solve these parameter estimation problems for the purpose of pruning the false positives in these infection conditions. Thus we rewrite human PPIN Eq. (18.1) as the following linear
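regression form:
$$p_i^{(h)}(t+1) = \Big[\, p_1^{(h)}(t) p_i^{(h)}(t) \ \cdots\ p_{N_i}^{(h)}(t) p_i^{(h)}(t) \ \ \ p_1^{(\nu)}(t) p_i^{(h)}(t) \ \cdots\ p_{J_i}^{(\nu)}(t) p_i^{(h)}(t) \ \ \ g_i^{(h)}(t) \ \ \ p_i^{(h)}(t) \ \ \ 1 \,\Big] \begin{bmatrix} \alpha_{i1}^{(h)} \\ \vdots \\ \alpha_{iN_i}^{(h)} \\ \gamma_{i1}^{(h)} \\ \vdots \\ \gamma_{iJ_i}^{(h)} \\ \lambda_i^{(h)} \\ 1 - \sigma_i^{(h)} \\ \beta_i^{(h)} \end{bmatrix} + \varepsilon_i^{(h)}(t), \tag{18.8}$$
for $i = 1, 2, \ldots, I$; $-\sigma_i^{(h)} \le 0$; and $\lambda_i^{(h)} \ge 0$,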
which could be simply represented as follows:
$$p_i^{(h)}(t+1) = \varphi_i^{HP}(t)\, \theta_i^{HP} + \varepsilon_i^{(h)}(t), \tag{18.9}$$
for $i = 1, 2, \ldots, I$; $-\sigma_i^{(h)} \le 0$; and $\lambda_i^{(h)} \ge 0$,
where $\varphi_i^{HP}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\theta_i^{HP}$ denotes the unknown interaction parameter vector of human-protein $i$ in the human PPIN to be estimated. Eq. (18.9) could be augmented for $Y_i$ data points of human-protein $i$ as follows:
$$\begin{bmatrix} p_i^{(h)}(t_2) \\ p_i^{(h)}(t_3) \\ \vdots \\ p_i^{(h)}(t_{Y_i+1}) \end{bmatrix} = \begin{bmatrix} \varphi_i^{HP}(t_1) \\ \varphi_i^{HP}(t_2) \\ \vdots \\ \varphi_i^{HP}(t_{Y_i}) \end{bmatrix} \theta_i^{HP} + \begin{bmatrix} \varepsilon_i^{(h)}(t_1) \\ \varepsilon_i^{(h)}(t_2) \\ \vdots \\ \varepsilon_i^{(h)}(t_{Y_i}) \end{bmatrix}, \tag{18.10}$$
for $i = 1, 2, \ldots, I$; $-\sigma_i^{(h)} \le 0$; and $\lambda_i^{(h)} \ge 0$,
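where $Y_i$ is the number of data points of protein expression. Thus we defined the notations $P_i^{(h)}$, $\Phi_i^{HP}$, and $\Xi_i^{HP}$ to represent Eq. (18.10) as follows:
$$P_i^{(h)} = \Phi_i^{HP} \theta_i^{HP} + \Xi_i^{HP}, \quad \text{for } i = 1, 2, \ldots, I;\ -\sigma_i^{(h)} \le 0;\ \text{and } \lambda_i^{(h)} \ge 0, \tag{18.11}$$
where
$$P_i^{(h)} = \begin{bmatrix} p_i^{(h)}(t_2) \\ p_i^{(h)}(t_3) \\ \vdots \\ p_i^{(h)}(t_{Y_i+1}) \end{bmatrix}, \quad \Phi_i^{HP} = \begin{bmatrix} \varphi_i^{HP}(t_1) \\ \varphi_i^{HP}(t_2) \\ \vdots \\ \varphi_i^{HP}(t_{Y_i}) \end{bmatrix}, \quad \Xi_i^{HP} = \begin{bmatrix} \varepsilon_i^{(h)}(t_1) \\ \varepsilon_i^{(h)}(t_2) \\ \vdots \\ \varepsilon_i^{(h)}(t_{Y_i}) \end{bmatrix}.$$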
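The stacking in Eqs. (18.8)–(18.11) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names `regression_row` and `build_regression`, the list-based data layout, and the toy time series are all assumptions made for the example.

```python
def regression_row(p_i, human_partners, ebv_partners, g_i):
    """Regression vector of Eq. (18.8) at one time point.

    Column order follows the parameter vector:
    [alpha_1..alpha_N, gamma_1..gamma_J, lambda, 1 - sigma, beta].
    """
    return ([p_n * p_i for p_n in human_partners]
            + [p_j * p_i for p_j in ebv_partners]
            + [g_i, p_i, 1.0])


def build_regression(p_series, partner_series, ebv_series, g_series):
    """Stack Y rows into (Phi, P) as in Eq. (18.11).

    Each *_series is indexed by time point; Y + 1 samples give Y rows,
    because row t pairs the regressors at t with the response at t + 1.
    """
    Phi, P = [], []
    for t in range(len(p_series) - 1):
        Phi.append(regression_row(p_series[t], partner_series[t],
                                  ebv_series[t], g_series[t]))
        P.append(p_series[t + 1])
    return Phi, P


# Three time points, one human partner, one EBV partner (toy values).
Phi, P = build_regression(p_series=[1.0, 2.0, 3.0],
                          partner_series=[[1.0], [2.0], [0.0]],
                          ebv_series=[[0.5], [0.5], [0.0]],
                          g_series=[4.0, 5.0, 0.0])
```

Note that the coefficient on $p_i^{(h)}(t)$ is $1-\sigma_i^{(h)}$ rather than $-\sigma_i^{(h)}$, because the $p_i^{(h)}(t)$ term on the right-hand side of Eq. (18.1) is absorbed into the same column.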
Next, we can formulate the parameter estimation of $\theta_i^{HP}$ by the following constrained least square parameter estimation problem:
$$\min_{\theta_i^{HP}} \frac{1}{2} \left\| \Phi_i^{HP} \theta_i^{HP} - P_i^{(h)} \right\|_2^2 \quad \text{subject to } A\, \theta_i^{HP} \le b, \tag{18.12}$$
where
$$A = \begin{bmatrix} \overbrace{0\ \cdots\ 0}^{N_i + J_i} & -1 & 0 & 0 \\ 0\ \cdots\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
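Because the two inequality rows in (18.12) only bound single entries of the parameter vector ($\lambda_i^{(h)} \ge 0$ and $1-\sigma_i^{(h)} \le 1$), the problem can be illustrated as a bound-constrained least-squares fit. The sketch below uses plain projected gradient descent in Python; this is a swapped-in illustrative technique, not the solver used in the study, and the function name `bounded_least_squares`, the step-size rule, the iteration count, and the toy data are all assumptions.

```python
def bounded_least_squares(Phi, y, lower, upper, iters=5000):
    """Minimize 0.5 * ||Phi @ theta - y||^2 subject to lower <= theta <= upper.

    Projected gradient descent: take a gradient step, then clip each
    coordinate back into its bounds. Phi must contain a nonzero entry.
    """
    n = len(Phi[0])
    # 1 / ||Phi||_F^2 is a conservative step size, since the squared
    # Frobenius norm is at least the largest eigenvalue of Phi^T Phi.
    step = 1.0 / sum(v * v for row in Phi for v in row)
    theta = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(iters):
        resid = [sum(row[k] * theta[k] for k in range(n)) - yi
                 for row, yi in zip(Phi, y)]
        grad = [sum(Phi[m][k] * resid[m] for m in range(len(Phi)))
                for k in range(n)]
        theta = [min(max(th - step * gk, lo), hi)
                 for th, gk, lo, hi in zip(theta, grad, lower, upper)]
    return theta


INF = float("inf")
# Toy layout [alpha, lambda, 1 - sigma, beta] with an identity regressor.
Phi_toy = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
y_toy = [0.7, -1.0, 2.0, 0.3]
theta_hat = bounded_least_squares(Phi_toy, y_toy,
                                  lower=[-INF, 0.0, -INF, -INF],  # lambda >= 0
                                  upper=[INF, INF, 1.0, INF])     # 1 - sigma <= 1
```

In the toy run the unconstrained fit would give $\lambda = -1$ and $1-\sigma = 2$; the bounds pull these to $0$ and $1$, which is the role the constraints play in keeping translation rates nonnegative and degradation rates nonpositive.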
By solving the parameter estimation problem in (18.12) with the help of the function lsqlin in the MATLAB Optimization Toolbox, we can then acquire the interaction parameters in human PPIN Eq. (18.1) and concurrently ensure the human-protein translation rate $\lambda_i^{(h)}$ to be a nonnegative value and the human-protein degradation rate $-\sigma_i^{(h)}$ to be a nonpositive value; that is to say, $\lambda_i^{(h)} \ge 0$ and $-\sigma_i^{(h)} \le 0$. Similarly, EBV PPIN Eq. (18.2) is rewritten as the following linear regression form:
$$p_j^{(\nu)}(t+1) = \Big[\, p_1^{(\nu)}(t) p_j^{(\nu)}(t) \ \cdots\ p_{M_j}^{(\nu)}(t) p_j^{(\nu)}(t) \ \ \ p_1^{(h)}(t) p_j^{(\nu)}(t) \ \cdots\ p_{I_j}^{(h)}(t) p_j^{(\nu)}(t) \ \ \ g_j^{(\nu)}(t) \ \ \ p_j^{(\nu)}(t) \ \ \ 1 \,\Big] \begin{bmatrix} \alpha_{j1}^{(\nu)} \\ \vdots \\ \alpha_{jM_j}^{(\nu)} \\ \gamma_{j1}^{(\nu)} \\ \vdots \\ \gamma_{jI_j}^{(\nu)} \\ \lambda_j^{(\nu)} \\ 1 - \sigma_j^{(\nu)} \\ \beta_j^{(\nu)} \end{bmatrix} + \varepsilon_j^{(\nu)}(t), \tag{18.13}$$
for $j = 1, 2, \ldots, J$; $-\sigma_j^{(\nu)} \le 0$; and $\lambda_j^{(\nu)} \ge 0$,
which could be simply represented as follows:
$$p_j^{(\nu)}(t+1) = \varphi_j^{\nu P}(t)\, \theta_j^{\nu P} + \varepsilon_j^{(\nu)}(t), \tag{18.14}$$
for $j = 1, 2, \ldots, J$; $-\sigma_j^{(\nu)} \le 0$; and $\lambda_j^{(\nu)} \ge 0$,
where $\varphi_j^{\nu P}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\theta_j^{\nu P}$ denotes the unknown interaction parameter vector of EBV-protein $j$ in the EBV PPIN to be estimated. Eq. (18.14) could be augmented for $Y_j$ data points of EBV-protein $j$ as follows:
$$\begin{bmatrix} p_j^{(\nu)}(t_2) \\ p_j^{(\nu)}(t_3) \\ \vdots \\ p_j^{(\nu)}(t_{Y_j+1}) \end{bmatrix} = \begin{bmatrix} \varphi_j^{\nu P}(t_1) \\ \varphi_j^{\nu P}(t_2) \\ \vdots \\ \varphi_j^{\nu P}(t_{Y_j}) \end{bmatrix} \theta_j^{\nu P} + \begin{bmatrix} \varepsilon_j^{(\nu)}(t_1) \\ \varepsilon_j^{(\nu)}(t_2) \\ \vdots \\ \varepsilon_j^{(\nu)}(t_{Y_j}) \end{bmatrix}, \tag{18.15}$$
for $j = 1, 2, \ldots, J$; $-\sigma_j^{(\nu)} \le 0$; and $\lambda_j^{(\nu)} \ge 0$,
where $Y_j$ is the number of data points of protein expression. Thus we define the notations $P_j^{(\nu)}$, $\Phi_j^{\nu P}$, and $\Xi_j^{\nu P}$ to represent Eq. (18.15) as follows:
$$P_j^{(\nu)} = \Phi_j^{\nu P} \theta_j^{\nu P} + \Xi_j^{\nu P}, \quad \text{for } j = 1, 2, \ldots, J;\ -\sigma_j^{(\nu)} \le 0;\ \text{and } \lambda_j^{(\nu)} \ge 0, \tag{18.16}$$
where
$$P_j^{(\nu)} = \begin{bmatrix} p_j^{(\nu)}(t_2) \\ p_j^{(\nu)}(t_3) \\ \vdots \\ p_j^{(\nu)}(t_{Y_j+1}) \end{bmatrix}, \quad \Phi_j^{\nu P} = \begin{bmatrix} \varphi_j^{\nu P}(t_1) \\ \varphi_j^{\nu P}(t_2) \\ \vdots \\ \varphi_j^{\nu P}(t_{Y_j}) \end{bmatrix}, \quad \Xi_j^{\nu P} = \begin{bmatrix} \varepsilon_j^{(\nu)}(t_1) \\ \varepsilon_j^{(\nu)}(t_2) \\ \vdots \\ \varepsilon_j^{(\nu)}(t_{Y_j}) \end{bmatrix}.$$
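Next, we can formulate the parameter estimation of $\theta_j^{\nu P}$ by the following constrained least square parameter estimation problem:
$$\min_{\theta_j^{\nu P}} \frac{1}{2} \left\| \Phi_j^{\nu P} \theta_j^{\nu P} - P_j^{(\nu)} \right\|_2^2 \quad \text{subject to } A\, \theta_j^{\nu P} \le b, \tag{18.17}$$
where
$$A = \begin{bmatrix} \overbrace{0\ \cdots\ 0}^{M_j + I_j} & -1 & 0 & 0 \\ 0\ \cdots\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$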
By solving the problem in (18.17), we can then acquire the interaction parameters in EBV PPIN Eq. (18.2) and concurrently ensure the EBV-protein translation rate $\lambda_j^{(\nu)}$ to be a nonnegative value and the EBV-protein degradation rate $-\sigma_j^{(\nu)}$ to be a nonpositive value; that is to say, $\lambda_j^{(\nu)} \ge 0$ and $-\sigma_j^{(\nu)} \le 0$. Following the same process as for the PPIN, human-gene GRN Eq. (18.3) can be rewritten as follows:
$$\begin{aligned} g_k^{(h)}(t+1) = \Big[\, & p_1^{(h)}(t) \ \cdots\ p_{I_k}^{(h)}(t) \ \ \ p_1^{(h)}(t) p_1^{(h)}(t) \ \cdots\ p_{I'_k}^{(h)}(t) p_{I''_k}^{(h)}(t) \ \ \ w_1^{(h)}(t) g_k^{(h)}(t) \ \cdots\ w_{R_k}^{(h)}(t) g_k^{(h)}(t) \\ & o_1^{(h)}(t) \ \cdots\ o_{L_k}^{(h)}(t) \ \ \ p_1^{(\nu)}(t) \ \cdots\ p_{J_k}^{(\nu)}(t) \ \ \ w_1^{(\nu)}(t) g_k^{(h)}(t) \ \cdots\ w_{Q_k}^{(\nu)}(t) g_k^{(h)}(t) \ \ \ g_k^{(h)}(t) \ \ \ 1 \,\Big] \\ & \times \Big[\, a_{k1}^{(h)} \ \cdots\ a_{kI_k}^{(h)} \ \ \zeta_{k11}^{(h)} \ \cdots\ \zeta_{kI'_k I''_k}^{(h)} \ \ {-b_{k1}^{(h)}} \ \cdots\ {-b_{kR_k}^{(h)}} \ \ c_{k1}^{(h)} \ \cdots\ c_{kL_k}^{(h)} \ \ d_{k1}^{(h)} \ \cdots\ d_{kJ_k}^{(h)} \ \ {-e_{k1}^{(h)}} \ \cdots\ {-e_{kQ_k}^{(h)}} \ \ 1 - \mu_k^{(h)} \ \ \delta_k^{(h)} \,\Big]^{T} + \omega_k^{(h)}(t), \end{aligned} \tag{18.18}$$
for $k = 1, 2, \ldots, K$; $-b_{kr}^{(h)} \le 0$; $-e_{kq}^{(h)} \le 0$; and $-\mu_k^{(h)} \le 0$,
which could be simply represented as follows:
$$g_k^{(h)}(t+1) = \varphi_k^{HG}(t)\, \theta_k^{HG} + \omega_k^{(h)}(t), \tag{18.19}$$
for $k = 1, 2, \ldots, K$; $-b_{kr}^{(h)} \le 0$; $-e_{kq}^{(h)} \le 0$; and $-\mu_k^{(h)} \le 0$,
where $\varphi_k^{HG}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\theta_k^{HG}$ denotes the unknown parameter vector of human-gene $k$ in the human-gene GRN to be estimated. Eq. (18.19) could be augmented for $Y_k$ data points of human-gene $k$ as follows:
$$\begin{bmatrix} g_k^{(h)}(t_2) \\ g_k^{(h)}(t_3) \\ \vdots \\ g_k^{(h)}(t_{Y_k+1}) \end{bmatrix} = \begin{bmatrix} \varphi_k^{HG}(t_1) \\ \varphi_k^{HG}(t_2) \\ \vdots \\ \varphi_k^{HG}(t_{Y_k}) \end{bmatrix} \theta_k^{HG} + \begin{bmatrix} \omega_k^{(h)}(t_1) \\ \omega_k^{(h)}(t_2) \\ \vdots \\ \omega_k^{(h)}(t_{Y_k}) \end{bmatrix}, \tag{18.20}$$
for $k = 1, 2, \ldots, K$; $-b_{kr}^{(h)} \le 0$; $-e_{kq}^{(h)} \le 0$; and $-\mu_k^{(h)} \le 0$, where $Y_k$ is the number of data points of gene expression. Thus we define the notations $G_k^{(h)}$, $\Phi_k^{HG}$, and $\Xi_k^{HG}$ to represent Eq. (18.20) as follows:
$$G_k^{(h)} = \Phi_k^{HG} \theta_k^{HG} + \Xi_k^{HG}, \tag{18.21}$$
for $k = 1, 2, \ldots, K$; $-b_{kr}^{(h)} \le 0$; $-e_{kq}^{(h)} \le 0$; and $-\mu_k^{(h)} \le 0$,
where
$$G_k^{(h)} = \begin{bmatrix} g_k^{(h)}(t_2) \\ g_k^{(h)}(t_3) \\ \vdots \\ g_k^{(h)}(t_{Y_k+1}) \end{bmatrix}, \quad \Phi_k^{HG} = \begin{bmatrix} \varphi_k^{HG}(t_1) \\ \varphi_k^{HG}(t_2) \\ \vdots \\ \varphi_k^{HG}(t_{Y_k}) \end{bmatrix}, \quad \Xi_k^{HG} = \begin{bmatrix} \omega_k^{(h)}(t_1) \\ \omega_k^{(h)}(t_2) \\ \vdots \\ \omega_k^{(h)}(t_{Y_k}) \end{bmatrix}.$$
Next, we can formulate the parameter estimation of $\theta_k^{HG}$ by the following constrained least square parameter estimation problem:
$$\min_{\theta_k^{HG}} \frac{1}{2} \left\| \Phi_k^{HG} \theta_k^{HG} - G_k^{(h)} \right\|_2^2 \quad \text{subject to } A\, \theta_k^{HG} \le b, \tag{18.22}$$
where
$$A = \begin{bmatrix} \mathbf{0} & \mathbf{I}_{R_k} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{Q_k} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ 1 \end{bmatrix},$$
in which the column blocks of $A$ have widths $I_k + I'_k I''_k$, $R_k$, $L_k + J_k$, $Q_k$, $1$, and $1$; $\mathbf{I}_{R_k}$ and $\mathbf{I}_{Q_k}$ are identity blocks; and $\mathbf{0}$ denotes a zero block of conforming size.
By solving the problem in (18.22), we can then acquire the regulatory parameters in the human-gene GRN Eq. (18.3) and concurrently ensure the human-miRNA repression ability $-b_{kr}^{(h)}$ to be a nonpositive value, the EBV-miRNA repression ability $-e_{kq}^{(h)}$ to be a nonpositive value, and the human-gene degradation rate $-\mu_k^{(h)}$ to be a nonpositive value; that is to say, $-b_{kr}^{(h)} \le 0$, $-e_{kq}^{(h)} \le 0$, and $-\mu_k^{(h)} \le 0$.
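The block structure of the constraint pair $(A, b)$ in (18.22) can be made concrete with a small Python sketch. The function name `grn_constraints` and its argument names are hypothetical; the sketch assumes the parameter ordering of Eq. (18.18), namely TF, TF-complex, human-miRNA, lncRNA, EBV-TF, and EBV-miRNA blocks followed by $1-\mu_k^{(h)}$ and $\delta_k^{(h)}$.

```python
def grn_constraints(n_tf, n_complex, n_mir, n_lnc, n_ebv_tf, n_ebv_mir):
    """Build (A, b) so that A @ theta <= b encodes the sign constraints.

    Rows: one per human miRNA (-b <= 0), one per EBV miRNA (-e <= 0),
    and a final row for the degradation entry (1 - mu <= 1).
    """
    n_cols = n_tf + n_complex + n_mir + n_lnc + n_ebv_tf + n_ebv_mir + 2
    mir_start = n_tf + n_complex
    ebv_mir_start = mir_start + n_mir + n_lnc + n_ebv_tf
    decay_col = ebv_mir_start + n_ebv_mir  # column of 1 - mu

    constrained = (list(range(mir_start, mir_start + n_mir))
                   + list(range(ebv_mir_start, ebv_mir_start + n_ebv_mir)))
    A, b = [], []
    for col in constrained:
        row = [0.0] * n_cols
        row[col] = 1.0          # picks out the entry -b_kr or -e_kq
        A.append(row)
        b.append(0.0)
    last = [0.0] * n_cols
    last[decay_col] = 1.0       # picks out the entry 1 - mu_k
    A.append(last)
    b.append(1.0)
    return A, b


# One TF, one complex, two human miRNAs, one lncRNA, no EBV TF, one EBV miRNA.
A, b = grn_constraints(1, 1, 2, 1, 0, 1)
```

The same builder would apply to the EBV-gene and human-lncRNA problems, since they share this parameter layout; for the miRNA models, which have no lncRNA block, `n_lnc` would be set to zero.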
Similarly, EBV-gene GRN Eq. (18.4) is rewritten in the following linear regression form:
$$\begin{aligned} g_s^{(\nu)}(t+1) = \Big[\, & p_1^{(h)}(t) \ \cdots\ p_{I_s}^{(h)}(t) \ \ \ p_1^{(h)}(t) p_1^{(h)}(t) \ \cdots\ p_{I'_s}^{(h)}(t) p_{I''_s}^{(h)}(t) \ \ \ w_1^{(h)}(t) g_s^{(\nu)}(t) \ \cdots\ w_{R_s}^{(h)}(t) g_s^{(\nu)}(t) \\ & o_1^{(h)}(t) \ \cdots\ o_{L_s}^{(h)}(t) \ \ \ p_1^{(\nu)}(t) \ \cdots\ p_{J_s}^{(\nu)}(t) \ \ \ w_1^{(\nu)}(t) g_s^{(\nu)}(t) \ \cdots\ w_{Q_s}^{(\nu)}(t) g_s^{(\nu)}(t) \ \ \ g_s^{(\nu)}(t) \ \ \ 1 \,\Big] \\ & \times \Big[\, a_{s1}^{(\nu)} \ \cdots\ a_{sI_s}^{(\nu)} \ \ \zeta_{s11}^{(\nu)} \ \cdots\ \zeta_{sI'_s I''_s}^{(\nu)} \ \ {-b_{s1}^{(\nu)}} \ \cdots\ {-b_{sR_s}^{(\nu)}} \ \ c_{s1}^{(\nu)} \ \cdots\ c_{sL_s}^{(\nu)} \ \ d_{s1}^{(\nu)} \ \cdots\ d_{sJ_s}^{(\nu)} \ \ {-e_{s1}^{(\nu)}} \ \cdots\ {-e_{sQ_s}^{(\nu)}} \ \ 1 - \mu_s^{(\nu)} \ \ \delta_s^{(\nu)} \,\Big]^{T} + \omega_s^{(\nu)}(t), \end{aligned} \tag{18.23}$$
for $s = 1, 2, \ldots, S$; $-b_{sr}^{(\nu)} \le 0$; $-e_{sq}^{(\nu)} \le 0$; and $-\mu_s^{(\nu)} \le 0$, which could be simply represented as follows:
$$g_s^{(\nu)}(t+1) = \varphi_s^{\nu G}(t)\, \theta_s^{\nu G} + \omega_s^{(\nu)}(t), \tag{18.24}$$
for $s = 1, 2, \ldots, S$; $-b_{sr}^{(\nu)} \le 0$; $-e_{sq}^{(\nu)} \le 0$; and $-\mu_s^{(\nu)} \le 0$, where $\varphi_s^{\nu G}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\theta_s^{\nu G}$ denotes the unknown parameter vector of EBV-gene $s$ in the EBV-gene GRN to be estimated. Eq. (18.24) could be augmented for $Y_s$ data points of EBV-gene $s$ as follows:
$$\begin{bmatrix} g_s^{(\nu)}(t_2) \\ g_s^{(\nu)}(t_3) \\ \vdots \\ g_s^{(\nu)}(t_{Y_s+1}) \end{bmatrix} = \begin{bmatrix} \varphi_s^{\nu G}(t_1) \\ \varphi_s^{\nu G}(t_2) \\ \vdots \\ \varphi_s^{\nu G}(t_{Y_s}) \end{bmatrix} \theta_s^{\nu G} + \begin{bmatrix} \omega_s^{(\nu)}(t_1) \\ \omega_s^{(\nu)}(t_2) \\ \vdots \\ \omega_s^{(\nu)}(t_{Y_s}) \end{bmatrix}, \tag{18.25}$$
for $s = 1, 2, \ldots, S$; $-b_{sr}^{(\nu)} \le 0$; $-e_{sq}^{(\nu)} \le 0$; and $-\mu_s^{(\nu)} \le 0$, where $Y_s$ is the number of data points of gene expression. Thus we define the notations $G_s^{(\nu)}$, $\Phi_s^{\nu G}$, and $\Xi_s^{\nu G}$ to represent Eq. (18.25) as follows:
$$G_s^{(\nu)} = \Phi_s^{\nu G} \theta_s^{\nu G} + \Xi_s^{\nu G}, \tag{18.26}$$
for $s = 1, 2, \ldots, S$; $-b_{sr}^{(\nu)} \le 0$; $-e_{sq}^{(\nu)} \le 0$; and $-\mu_s^{(\nu)} \le 0$,
where
$$G_s^{(\nu)} = \begin{bmatrix} g_s^{(\nu)}(t_2) \\ g_s^{(\nu)}(t_3) \\ \vdots \\ g_s^{(\nu)}(t_{Y_s+1}) \end{bmatrix}, \quad \Phi_s^{\nu G} = \begin{bmatrix} \varphi_s^{\nu G}(t_1) \\ \varphi_s^{\nu G}(t_2) \\ \vdots \\ \varphi_s^{\nu G}(t_{Y_s}) \end{bmatrix}, \quad \Xi_s^{\nu G} = \begin{bmatrix} \omega_s^{(\nu)}(t_1) \\ \omega_s^{(\nu)}(t_2) \\ \vdots \\ \omega_s^{(\nu)}(t_{Y_s}) \end{bmatrix}.$$
Next, we can formulate the parameter estimation of $\theta_s^{\nu G}$ by the following constrained least square parameter estimation problem:
$$\min_{\theta_s^{\nu G}} \frac{1}{2} \left\| \Phi_s^{\nu G} \theta_s^{\nu G} - G_s^{(\nu)} \right\|_2^2 \quad \text{subject to } A\, \theta_s^{\nu G} \le b, \tag{18.27}$$
where
$$A = \begin{bmatrix} \mathbf{0} & \mathbf{I}_{R_s} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{Q_s} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ 1 \end{bmatrix},$$
in which the column blocks of $A$ have widths $I_s + I'_s I''_s$, $R_s$, $L_s + J_s$, $Q_s$, $1$, and $1$; $\mathbf{I}_{R_s}$ and $\mathbf{I}_{Q_s}$ are identity blocks; and $\mathbf{0}$ denotes a zero block of conforming size.
By solving the problem in (18.27), we can then acquire the regulatory parameters in EBV-gene GRN Eq. (18.4) and concurrently ensure the human-miRNA repression ability $-b_{sr}^{(\nu)}$ to be a nonpositive value, the EBV-miRNA repression ability $-e_{sq}^{(\nu)}$ to be a nonpositive value, and the EBV-gene degradation rate $-\mu_s^{(\nu)}$ to be a nonpositive value; that is to say, $-b_{sr}^{(\nu)} \le 0$, $-e_{sq}^{(\nu)} \le 0$, and $-\mu_s^{(\nu)} \le 0$. Likewise, human-lncRNA GRN Eq. (18.5) can be rewritten as follows:
$$\begin{aligned} g_z^{(L)}(t+1) = \Big[\, & p_1^{(h)}(t) \ \cdots\ p_{I_z}^{(h)}(t) \ \ \ p_1^{(h)}(t) p_1^{(h)}(t) \ \cdots\ p_{I'_z}^{(h)}(t) p_{I''_z}^{(h)}(t) \ \ \ w_1^{(h)}(t) g_z^{(L)}(t) \ \cdots\ w_{R_z}^{(h)}(t) g_z^{(L)}(t) \\ & o_1^{(h)}(t) \ \cdots\ o_{L_z}^{(h)}(t) \ \ \ p_1^{(\nu)}(t) \ \cdots\ p_{J_z}^{(\nu)}(t) \ \ \ w_1^{(\nu)}(t) g_z^{(L)}(t) \ \cdots\ w_{Q_z}^{(\nu)}(t) g_z^{(L)}(t) \ \ \ g_z^{(L)}(t) \ \ \ 1 \,\Big] \\ & \times \Big[\, a_{z1}^{(L)} \ \cdots\ a_{zI_z}^{(L)} \ \ \zeta_{z11}^{(L)} \ \cdots\ \zeta_{zI'_z I''_z}^{(L)} \ \ {-b_{z1}^{(L)}} \ \cdots\ {-b_{zR_z}^{(L)}} \ \ c_{z1}^{(L)} \ \cdots\ c_{zL_z}^{(L)} \ \ d_{z1}^{(L)} \ \cdots\ d_{zJ_z}^{(L)} \ \ {-e_{z1}^{(L)}} \ \cdots\ {-e_{zQ_z}^{(L)}} \ \ 1 - \mu_z^{(L)} \ \ \delta_z^{(L)} \,\Big]^{T} + \omega_z^{(L)}(t), \end{aligned} \tag{18.28}$$
for $z = 1, 2, \ldots, Z$; $-b_{zr}^{(L)} \le 0$; $-e_{zq}^{(L)} \le 0$; and $-\mu_z^{(L)} \le 0$,
which could be simply represented as follows:
$$g_z^{(L)}(t+1) = \varphi_z^{HL}(t)\, \theta_z^{HL} + \omega_z^{(L)}(t), \tag{18.29}$$
for $z = 1, 2, \ldots, Z$; $-b_{zr}^{(L)} \le 0$; $-e_{zq}^{(L)} \le 0$; and $-\mu_z^{(L)} \le 0$, where $\varphi_z^{HL}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\theta_z^{HL}$ denotes the unknown parameter vector of human-lncRNA $z$ in the human-lncRNA GRN to be estimated. Eq. (18.29) could be augmented for $Y_z$ data points of human-lncRNA $z$ as follows:
$$\begin{bmatrix} g_z^{(L)}(t_2) \\ g_z^{(L)}(t_3) \\ \vdots \\ g_z^{(L)}(t_{Y_z+1}) \end{bmatrix} = \begin{bmatrix} \varphi_z^{HL}(t_1) \\ \varphi_z^{HL}(t_2) \\ \vdots \\ \varphi_z^{HL}(t_{Y_z}) \end{bmatrix} \theta_z^{HL} + \begin{bmatrix} \omega_z^{(L)}(t_1) \\ \omega_z^{(L)}(t_2) \\ \vdots \\ \omega_z^{(L)}(t_{Y_z}) \end{bmatrix}, \tag{18.30}$$
for $z = 1, 2, \ldots, Z$; $-b_{zr}^{(L)} \le 0$; $-e_{zq}^{(L)} \le 0$; and $-\mu_z^{(L)} \le 0$,
where $Y_z$ is the number of data points of gene expression. Thus we define the notations $G_z^{(L)}$, $\Phi_z^{HL}$, and $\Xi_z^{HL}$ to represent Eq. (18.30) as follows:
$$G_z^{(L)} = \Phi_z^{HL} \theta_z^{HL} + \Xi_z^{HL}, \tag{18.31}$$
for $z = 1, 2, \ldots, Z$; $-b_{zr}^{(L)} \le 0$; $-e_{zq}^{(L)} \le 0$; and $-\mu_z^{(L)} \le 0$, where
$$G_z^{(L)} = \begin{bmatrix} g_z^{(L)}(t_2) \\ g_z^{(L)}(t_3) \\ \vdots \\ g_z^{(L)}(t_{Y_z+1}) \end{bmatrix}, \quad \Phi_z^{HL} = \begin{bmatrix} \varphi_z^{HL}(t_1) \\ \varphi_z^{HL}(t_2) \\ \vdots \\ \varphi_z^{HL}(t_{Y_z}) \end{bmatrix}, \quad \Xi_z^{HL} = \begin{bmatrix} \omega_z^{(L)}(t_1) \\ \omega_z^{(L)}(t_2) \\ \vdots \\ \omega_z^{(L)}(t_{Y_z}) \end{bmatrix}.$$
Next, we can formulate the parameter estimation of $\theta_z^{HL}$ by the following constrained least square parameter estimation problem:
$$\min_{\theta_z^{HL}} \frac{1}{2} \left\| \Phi_z^{HL} \theta_z^{HL} - G_z^{(L)} \right\|_2^2 \quad \text{subject to } A\, \theta_z^{HL} \le b, \tag{18.32}$$
where
$$A = \begin{bmatrix} \mathbf{0} & \mathbf{I}_{R_z} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{I}_{Q_z} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & 1 & 0 \end{bmatrix}, \quad b = \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ 1 \end{bmatrix},$$
in which the column blocks of $A$ have widths $I_z + I'_z I''_z$, $R_z$, $L_z + J_z$, $Q_z$, $1$, and $1$; $\mathbf{I}_{R_z}$ and $\mathbf{I}_{Q_z}$ are identity blocks; and $\mathbf{0}$ denotes a zero block of conforming size.
By solving the problem in (18.32), we can then acquire the regulatory parameters in the human-lncRNA GRN Eq. (18.5) and concurrently ensure the human-miRNA repression ability $-b_{zr}^{(L)}$ to be a nonpositive value, the EBV-miRNA repression ability $-e_{zq}^{(L)}$ to be a nonpositive value, and the human-lncRNA degradation rate $-\mu_z^{(L)}$ to be a nonpositive value; that is to say, $-b_{zr}^{(L)} \le 0$, $-e_{zq}^{(L)} \le 0$, and $-\mu_z^{(L)} \le 0$.
Following the same process as for human-gene GRN Eq. (18.3), human-miRNA GRN Eq. (18.6) is rewritten as follows:
$$\begin{aligned} x_f^{(h)}(t+1) = \Big[\, & p_1^{(h)}(t) \ \cdots\ p_{I_f}^{(h)}(t) \ \ \ p_1^{(h)}(t) p_1^{(h)}(t) \ \cdots\ p_{I'_f}^{(h)}(t) p_{I''_f}^{(h)}(t) \ \ \ w_1^{(h)}(t) x_f^{(h)}(t) \ \cdots\ w_{R_f}^{(h)}(t) x_f^{(h)}(t) \\ & p_1^{(\nu)}(t) \ \cdots\ p_{J_f}^{(\nu)}(t) \ \ \ w_1^{(\nu)}(t) x_f^{(h)}(t) \ \cdots\ w_{Q_f}^{(\nu)}(t) x_f^{(h)}(t) \ \ \ x_f^{(h)}(t) \ \ \ 1 \,\Big] \\ & \times \Big[\, a_{f1}^{(h)} \ \cdots\ a_{fI_f}^{(h)} \ \ \zeta_{f11}^{(h)} \ \cdots\ \zeta_{fI'_f I''_f}^{(h)} \ \ {-b_{f1}^{(h)}} \ \cdots\ {-b_{fR_f}^{(h)}} \ \ d_{f1}^{(h)} \ \cdots\ d_{fJ_f}^{(h)} \ \ {-e_{f1}^{(h)}} \ \cdots\ {-e_{fQ_f}^{(h)}} \ \ 1 - \rho_f^{(h)} \ \ \eta_f^{(h)} \,\Big]^{T} + \psi_f^{(h)}(t), \end{aligned} \tag{18.33}$$
for $f = 1, 2, \ldots, F$; $-b_{fr}^{(h)} \le 0$; $-e_{fq}^{(h)} \le 0$; and $-\rho_f^{(h)} \le 0$,
which could be simply represented as follows:
$$
x_f^{(h)}(t+1) = \phi_f^{HM}(t)\,\tau_f^{HM} + \psi_f^{(h)}(t), \quad \text{for } f = 1, 2, \ldots, F;\; -b_{fr}^{(h)} \le 0,\; -e_{fq}^{(h)} \le 0,\; \text{and } -\rho_f^{(h)} \le 0 \tag{18.34}
$$
where $\phi_f^{HM}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\tau_f^{HM}$ denotes the unknown parameter vector of human-miRNA $f$ in human-miRNA GRN to be estimated. Eq. (18.34) could be augmented for $Y_f$ data points of human-miRNA $f$ as follows:
$$
\begin{bmatrix} x_f^{(h)}(t_2) \\ x_f^{(h)}(t_3) \\ \vdots \\ x_f^{(h)}(t_{Y_f}+1) \end{bmatrix}
= \begin{bmatrix} \phi_f^{HM}(t_1) \\ \phi_f^{HM}(t_2) \\ \vdots \\ \phi_f^{HM}(t_{Y_f}) \end{bmatrix} \tau_f^{HM}
+ \begin{bmatrix} \psi_f^{(h)}(t_1) \\ \psi_f^{(h)}(t_2) \\ \vdots \\ \psi_f^{(h)}(t_{Y_f}) \end{bmatrix},
\tag{18.35}
$$
for $f = 1, 2, \ldots, F$; $-b_{fr}^{(h)} \le 0$, $-e_{fq}^{(h)} \le 0$, and $-\rho_f^{(h)} \le 0$, where $Y_f$ is the number of data points of gene expression. Thus we define the notations $X_f^{(h)}$, $\Theta_f^{HM}$, and $\Gamma_f^{HM}$ to represent Eq. (18.35) as follows:
$$
X_f^{(h)} = \Theta_f^{HM}\tau_f^{HM} + \Gamma_f^{HM}, \quad \text{for } f = 1, 2, \ldots, F;\; -b_{fr}^{(h)} \le 0,\; -e_{fq}^{(h)} \le 0,\; \text{and } -\rho_f^{(h)} \le 0 \tag{18.36}
$$
where
$$
X_f^{(h)} = \begin{bmatrix} x_f^{(h)}(t_2) \\ x_f^{(h)}(t_3) \\ \vdots \\ x_f^{(h)}(t_{Y_f}+1) \end{bmatrix},\quad
\Theta_f^{HM} = \begin{bmatrix} \phi_f^{HM}(t_1) \\ \phi_f^{HM}(t_2) \\ \vdots \\ \phi_f^{HM}(t_{Y_f}) \end{bmatrix},\quad
\Gamma_f^{HM} = \begin{bmatrix} \psi_f^{(h)}(t_1) \\ \psi_f^{(h)}(t_2) \\ \vdots \\ \psi_f^{(h)}(t_{Y_f}) \end{bmatrix}.
$$
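As an illustration of how the stacked quantities in Eq. (18.36) could be assembled from time-series data, the following Python sketch builds the regression matrix and the shifted target vector for one human miRNA. The function name, argument names, and array layouts are hypothetical; the column order follows the regression vector in Eq. (18.33).

```python
import numpy as np

def stack_regression(x, p_h, w_h, p_v, w_v, complex_pairs=()):
    """Stack regression rows for one target (hypothetical helper).

    x: (T,) expression of the target miRNA; p_h: (T, I) human TFs;
    w_h: (T, R) human miRNAs; p_v: (T, J) EBV TFs; w_v: (T, Q) EBV miRNAs;
    complex_pairs: index pairs (i', i'') of human TFs forming a complex.
    """
    x = np.asarray(x, dtype=float)
    p_h, w_h, p_v, w_v = (np.asarray(m, dtype=float) for m in (p_h, w_h, p_v, w_v))
    x_t = x[:-1, None]  # x(t) as a column, used for the miRNA coupling terms
    blocks = [p_h[:-1]]
    if len(complex_pairs) > 0:
        blocks.append(np.column_stack(
            [p_h[:-1, i] * p_h[:-1, j] for i, j in complex_pairs]))
    blocks += [w_h[:-1] * x_t, p_v[:-1], w_v[:-1] * x_t, x_t, np.ones_like(x_t)]
    Theta = np.hstack(blocks)  # one row per time point t_1 ... t_Y
    X = x[1:]                  # targets x(t + 1)
    return Theta, X
```

Each row of `Theta` corresponds to one time point, and `X` holds the expression one step later, so an unconstrained fit would simply regress `X` on `Theta`.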
Next, we can formulate the parameter estimation of $\tau_f^{HM}$ by the following constrained least square parameter estimation problem:
$$
\min_{\tau_f^{HM}} \frac{1}{2}\left\| \Theta_f^{HM}\tau_f^{HM} - X_f^{(h)} \right\|_2^2 \quad \text{subject to } A\tau_f^{HM} \le b \tag{18.37}
$$
where
$$
A = \begin{bmatrix}
\mathbf{0}_{R_f\times(I_f+I'_fI''_f)} & \mathbf{I}_{R_f} & \mathbf{0}_{R_f\times J_f} & \mathbf{0}_{R_f\times Q_f} & \mathbf{0} & \mathbf{0}\\
\mathbf{0}_{Q_f\times(I_f+I'_fI''_f)} & \mathbf{0}_{Q_f\times R_f} & \mathbf{0}_{Q_f\times J_f} & \mathbf{I}_{Q_f} & \mathbf{0} & \mathbf{0}\\
\mathbf{0}_{1\times(I_f+I'_fI''_f)} & \mathbf{0}_{1\times R_f} & \mathbf{0}_{1\times J_f} & \mathbf{0}_{1\times Q_f} & 1 & 0
\end{bmatrix},\quad
b = \begin{bmatrix} \mathbf{0}_{R_f\times 1} \\ \mathbf{0}_{Q_f\times 1} \\ 1 \end{bmatrix}.
$$
Here $\mathbf{I}_n$ is the $n\times n$ identity matrix and $\mathbf{0}$ is a zero block of the indicated (or conforming) size; the column blocks of $A$ have widths $I_f+I'_fI''_f$, $R_f$, $J_f$, $Q_f$, 1, and 1.
By solving the problem in (18.37), we can then acquire the regulatory parameters in the human-miRNA GRN equation in (18.6) and concurrently ensure that the human-miRNA repression ability $-b_{fr}^{(h)}$, the EBV-miRNA repression ability $-e_{fq}^{(h)}$, and the human-miRNA degradation rate $-\rho_f^{(h)}$ are nonpositive; that is to say, $-b_{fr}^{(h)} \le 0$, $-e_{fq}^{(h)} \le 0$, and $-\rho_f^{(h)} \le 0$. Similarly, we can rewrite EBV-miRNA GRN Eq. (18.7) as the following linear regression form:
$$
x_u^{(\nu)}(t+1) = \Big[\, p_1^{(h)}(t) \;\cdots\; p_{I_u}^{(h)}(t) \;\; p_1^{(h)}(t)p_1^{(h)}(t) \;\cdots\; p_{I'_u}^{(h)}(t)p_{I''_u}^{(h)}(t) \;\; w_1^{(h)}(t)x_u^{(\nu)}(t) \;\cdots\; w_{R_u}^{(h)}(t)x_u^{(\nu)}(t) \;\; p_1^{(\nu)}(t) \;\cdots\; p_{J_u}^{(\nu)}(t) \;\; w_1^{(\nu)}(t)x_u^{(\nu)}(t) \;\cdots\; w_{Q_u}^{(\nu)}(t)x_u^{(\nu)}(t) \;\; x_u^{(\nu)}(t) \;\; 1 \,\Big]
\begin{bmatrix} a_{u1}^{(\nu)} \\ \vdots \\ a_{uI_u}^{(\nu)} \\ \zeta_{u11}^{(\nu)} \\ \vdots \\ \zeta_{uI'_uI''_u}^{(\nu)} \\ -b_{u1}^{(\nu)} \\ \vdots \\ -b_{uR_u}^{(\nu)} \\ d_{u1}^{(\nu)} \\ \vdots \\ d_{uJ_u}^{(\nu)} \\ -e_{u1}^{(\nu)} \\ \vdots \\ -e_{uQ_u}^{(\nu)} \\ 1-\rho_u^{(\nu)} \\ \eta_u^{(\nu)} \end{bmatrix} + \psi_u^{(\nu)}(t),
\tag{18.38}
$$
for $u = 1, 2, \ldots, U$; $-b_{ur}^{(\nu)} \le 0$, $-e_{uq}^{(\nu)} \le 0$, and $-\rho_u^{(\nu)} \le 0$,
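which could be simply represented as follows:
$$
x_u^{(\nu)}(t+1) = \phi_u^{\nu M}(t)\,\tau_u^{\nu M} + \psi_u^{(\nu)}(t), \quad \text{for } u = 1, 2, \ldots, U;\; -b_{ur}^{(\nu)} \le 0,\; -e_{uq}^{(\nu)} \le 0,\; \text{and } -\rho_u^{(\nu)} \le 0 \tag{18.39}
$$
where $\phi_u^{\nu M}(t)$ indicates the regression vector obtained from the corresponding expression data, and $\tau_u^{\nu M}$ denotes the unknown parameter vector of EBV-miRNA $u$ in EBV-miRNA GRN to be estimated. Eq. (18.39) could be augmented for $Y_u$ data points of EBV-miRNA $u$ as follows:
$$
\begin{bmatrix} x_u^{(\nu)}(t_2) \\ x_u^{(\nu)}(t_3) \\ \vdots \\ x_u^{(\nu)}(t_{Y_u}+1) \end{bmatrix}
= \begin{bmatrix} \phi_u^{\nu M}(t_1) \\ \phi_u^{\nu M}(t_2) \\ \vdots \\ \phi_u^{\nu M}(t_{Y_u}) \end{bmatrix} \tau_u^{\nu M}
+ \begin{bmatrix} \psi_u^{(\nu)}(t_1) \\ \psi_u^{(\nu)}(t_2) \\ \vdots \\ \psi_u^{(\nu)}(t_{Y_u}) \end{bmatrix},
\tag{18.40}
$$
for $u = 1, 2, \ldots, U$; $-b_{ur}^{(\nu)} \le 0$, $-e_{uq}^{(\nu)} \le 0$, and $-\rho_u^{(\nu)} \le 0$, where $Y_u$ is the number of data points of gene expression. Thus we define the notations $X_u^{(\nu)}$, $\Theta_u^{\nu M}$, and $\Gamma_u^{\nu M}$ to represent Eq. (18.40) as follows:
$$
X_u^{(\nu)} = \Theta_u^{\nu M}\tau_u^{\nu M} + \Gamma_u^{\nu M}, \quad \text{for } u = 1, 2, \ldots, U;\; -b_{ur}^{(\nu)} \le 0,\; -e_{uq}^{(\nu)} \le 0,\; \text{and } -\rho_u^{(\nu)} \le 0 \tag{18.41}
$$
where
$$
X_u^{(\nu)} = \begin{bmatrix} x_u^{(\nu)}(t_2) \\ x_u^{(\nu)}(t_3) \\ \vdots \\ x_u^{(\nu)}(t_{Y_u}+1) \end{bmatrix},\quad
\Theta_u^{\nu M} = \begin{bmatrix} \phi_u^{\nu M}(t_1) \\ \phi_u^{\nu M}(t_2) \\ \vdots \\ \phi_u^{\nu M}(t_{Y_u}) \end{bmatrix},\quad
\Gamma_u^{\nu M} = \begin{bmatrix} \psi_u^{(\nu)}(t_1) \\ \psi_u^{(\nu)}(t_2) \\ \vdots \\ \psi_u^{(\nu)}(t_{Y_u}) \end{bmatrix}.
$$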
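Next, we can formulate the parameter estimation of $\tau_u^{\nu M}$ by the following constrained least square parameter estimation problem:
$$
\min_{\tau_u^{\nu M}} \frac{1}{2}\left\| \Theta_u^{\nu M}\tau_u^{\nu M} - X_u^{(\nu)} \right\|_2^2 \quad \text{subject to } A\tau_u^{\nu M} \le b \tag{18.42}
$$
where
$$
A = \begin{bmatrix}
\mathbf{0}_{R_u\times(I_u+I'_uI''_u)} & \mathbf{I}_{R_u} & \mathbf{0}_{R_u\times J_u} & \mathbf{0}_{R_u\times Q_u} & \mathbf{0} & \mathbf{0}\\
\mathbf{0}_{Q_u\times(I_u+I'_uI''_u)} & \mathbf{0}_{Q_u\times R_u} & \mathbf{0}_{Q_u\times J_u} & \mathbf{I}_{Q_u} & \mathbf{0} & \mathbf{0}\\
\mathbf{0}_{1\times(I_u+I'_uI''_u)} & \mathbf{0}_{1\times R_u} & \mathbf{0}_{1\times J_u} & \mathbf{0}_{1\times Q_u} & 1 & 0
\end{bmatrix},\quad
b = \begin{bmatrix} \mathbf{0}_{R_u\times 1} \\ \mathbf{0}_{Q_u\times 1} \\ 1 \end{bmatrix}.
$$
Here $\mathbf{I}_n$ is the $n\times n$ identity matrix and $\mathbf{0}$ is a zero block of the indicated (or conforming) size; the column blocks of $A$ have widths $I_u+I'_uI''_u$, $R_u$, $J_u$, $Q_u$, 1, and 1.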
By solving the parameter estimation problem in (18.42), we can then acquire the regulatory parameters in EBV-miRNA GRN Eq. (18.7) and concurrently ensure that the human-miRNA repression ability $-b_{ur}^{(\nu)}$, the EBV-miRNA repression ability $-e_{uq}^{(\nu)}$, and the EBV-miRNA degradation rate $-\rho_u^{(\nu)}$ are nonpositive; that is to say, $-b_{ur}^{(\nu)} \le 0$, $-e_{uq}^{(\nu)} \le 0$, and $-\rho_u^{(\nu)} \le 0$.

To obtain accurate results from the system identification approach, we interpolate extra data points via the cubic spline method mentioned previously, so that the number of data points is five times the number of parameters in the corresponding parameter vector to be estimated ($\theta_i^{HP}$ in the human PPIN, $\theta_j^{\nu P}$ in the EBV PPIN, $\theta_k^{HG}$ in the human-gene GRN, $\theta_s^{\nu G}$ in the EBV-gene GRN, $\theta_z^{HL}$ in the human-lncRNA GRN, $\tau_f^{HM}$ in the human-miRNA GRN, and $\tau_u^{\nu M}$ in the EBV-miRNA GRN). This prevents the parameter estimation from overfitting because of a lack of data points. The solutions of the constrained least square parameter estimation problems in (18.12), (18.17), (18.22), (18.27), (18.32), (18.37), and (18.42) can then be obtained from NGS expression data by using the function lsqlin in the MATLAB Optimization Toolbox.

Whether genome-wide mRNA expression measurements can describe the protein behaviors in human B cells and EBV remains an unsettled question. However, mRNA abundance has been reported to explain over 73% of the variance in protein abundance [788]; that is to say, the protein behaviors can be approximately described by their corresponding gene expressions. As a result, the NGS data of gene expressions can replace protein expressions in the constrained least square parameter estimation problems that need protein expressions in (18.12), (18.17), (18.22), (18.27), (18.32), (18.37), and (18.42).
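The interpolation and constrained fitting steps described above can be sketched in Python with SciPy instead of MATLAB's lsqlin. This is a sketch under stated assumptions, not the procedure used in the text: because each row of $A$ selects a single coefficient, the constraint $A\theta \le b$ is expressed here as upper bounds and passed to `scipy.optimize.lsq_linear`; the function names and index arguments are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import lsq_linear

def interpolate_series(t, y, n_points):
    """Resample one expression profile on a denser grid with a cubic spline."""
    spline = CubicSpline(t, y)
    t_dense = np.linspace(t[0], t[-1], n_points)
    return t_dense, spline(t_dense)

def solve_bounded_ls(Theta, X, nonpos_idx, decay_idx):
    """Minimize ||Theta @ theta - X||^2 with sign constraints as bounds.

    nonpos_idx: positions of the (-b) and (-e) coefficients, bounded above by 0.
    decay_idx:  position of the (1 - degradation) coefficient, bounded above by 1.
    """
    n = Theta.shape[1]
    lower = np.full(n, -np.inf)
    upper = np.full(n, np.inf)
    upper[list(nonpos_idx)] = 0.0
    upper[decay_idx] = 1.0
    result = lsq_linear(Theta, X, bounds=(lower, upper))
    return result.x
```

A general inequality-constrained solver would be needed if $A$ ever coupled several coefficients in one row; for the selector-type $A$ shown in (18.32), (18.37), and (18.42), bounds are sufficient.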
18.2.5 System order detection scheme of the dynamic models of GIGENs

Because the candidate GIGEN constructed by database mining from computational and experimental predictions contains many false-positive interactions and regulations, we apply the system order detection scheme to the human PPIN model in (18.11), EBV PPIN model in (18.16), human-gene GRN model in (18.21), EBV-gene GRN model in (18.26), human-lncRNA GRN model in (18.31), human-miRNA GRN model in (18.36), and EBV-miRNA GRN model in (18.41) to prune the false positives in the candidate GIGEN by using the NGS data of human B cells and EBV. According to the Akaike information criterion (AIC), the insignificant parameters in the models of GIGENs can be deleted so that we finally acquire the real interspecies GIGENs during the EBV lytic phase. In human PPIN model (18.11), the AIC of human-protein $i$ is defined as follows [6,7,11,757]:
$$
AIC_i^{HP}(N_i, J_i) = \log\left( \frac{1}{Y_i}\left(P_i^{(h)} - \Phi_i^{HP}\hat\theta_i^{HP}\right)^T \left(P_i^{(h)} - \Phi_i^{HP}\hat\theta_i^{HP}\right) \right) + \frac{2(N_i + J_i)}{Y_i} \tag{18.43}
$$
where $\hat\theta_i^{HP}$ indicates the estimated parameters of human-protein $i$ obtained from the solutions of the parameter estimation problem in (18.12), and the estimated residual error is $\hat\kappa_{HP,i}^2 = \frac{1}{Y_i}\left(P_i^{(h)} - \Phi_i^{HP}\hat\theta_i^{HP}\right)^T\left(P_i^{(h)} - \Phi_i^{HP}\hat\theta_i^{HP}\right)$. According to system identification theory, AIC is a trade-off between the estimated residual error and the parameter-associated error, and it achieves its minimum at the real system order (i.e., the number of parameters). The minimum of $AIC_i^{HP}$ in (18.43) is therefore attained at the number $N_i + J_i$ of the real PPIs of protein $i$ in the human PPIN. The insignificant interactions out of $N_i$ and $J_i$ should be deleted as false positives from the PPIs of protein $i$. Applying the same procedure protein by protein, we obtain the real human PPIN.

Similarly, in the EBV PPIN model in (18.16), the AIC of EBV-protein $j$ is defined as follows:
$$
AIC_j^{\nu P}(M_j, I_j) = \log\left( \frac{1}{Y_j}\left(P_j^{(\nu)} - \Phi_j^{\nu P}\hat\theta_j^{\nu P}\right)^T \left(P_j^{(\nu)} - \Phi_j^{\nu P}\hat\theta_j^{\nu P}\right) \right) + \frac{2(M_j + I_j)}{Y_j} \tag{18.44}
$$
where $\hat\theta_j^{\nu P}$ denotes the estimated parameters of EBV-protein $j$ obtained from the solutions of the parameter estimation problem in (18.17), and the estimated residual error is $\hat\kappa_{\nu P,j}^2 = \frac{1}{Y_j}\left(P_j^{(\nu)} - \Phi_j^{\nu P}\hat\theta_j^{\nu P}\right)^T\left(P_j^{(\nu)} - \Phi_j^{\nu P}\hat\theta_j^{\nu P}\right)$. The minimum of $AIC_j^{\nu P}$ in (18.44) is attained at the number $M_j + I_j$ of the real PPIs of protein $j$ in the EBV PPIN. The insignificant PPIs out of $M_j$ and $I_j$ are pruned away protein by protein to obtain the real EBV PPIN. Following a similar procedure as in the PPINs, the AIC of human-gene $k$ in human-gene GRN model (18.21) is defined as the following equation:
$$
AIC_k^{HG}(I_k, I'_kI''_k, R_k, L_k, J_k, Q_k) = \log\left( \frac{1}{Y_k}\left(G_k^{(h)} - \Phi_k^{HG}\hat\theta_k^{HG}\right)^T \left(G_k^{(h)} - \Phi_k^{HG}\hat\theta_k^{HG}\right) \right) + \frac{2(I_k + I'_kI''_k + R_k + L_k + J_k + Q_k)}{Y_k} \tag{18.45}
$$
where $\hat\theta_k^{HG}$ represents the estimated parameters of human-gene $k$ obtained from the solutions of the parameter estimation problem in (18.22), and the estimated residual error is $\hat\kappa_{HG,k}^2 = \frac{1}{Y_k}\left(G_k^{(h)} - \Phi_k^{HG}\hat\theta_k^{HG}\right)^T\left(G_k^{(h)} - \Phi_k^{HG}\hat\theta_k^{HG}\right)$. It can be realized that the minimum $AIC_k^{HG}$ in (18.45) can be achieved at the number $I_k + I'_kI''_k + R_k + L_k + J_k + Q_k$ of the real gene/miRNA/lncRNA regulations of gene $k$ in the human-gene GRN. The insignificant regulations out of $I_k$, $I'_kI''_k$, $R_k$, $L_k$, $J_k$, and $Q_k$ should be pruned one gene by one gene to obtain the real human-gene GRN. Similarly, AIC of EBV-gene $s$ in EBV-gene GRN model (18.26) can be defined in the following equation:
$$
AIC_s^{\nu G}(I_s, I'_sI''_s, R_s, L_s, J_s, Q_s) = \log\left( \frac{1}{Y_s}\left(G_s^{(\nu)} - \Phi_s^{\nu G}\hat\theta_s^{\nu G}\right)^T \left(G_s^{(\nu)} - \Phi_s^{\nu G}\hat\theta_s^{\nu G}\right) \right) + \frac{2(I_s + I'_sI''_s + R_s + L_s + J_s + Q_s)}{Y_s} \tag{18.46}
$$
where $\hat\theta_s^{\nu G}$ means the estimated parameters of EBV-gene $s$ obtained from the solutions of the parameter estimation problem in (18.27), and the estimated residual error is $\hat\kappa_{\nu G,s}^2 = \frac{1}{Y_s}\left(G_s^{(\nu)} - \Phi_s^{\nu G}\hat\theta_s^{\nu G}\right)^T\left(G_s^{(\nu)} - \Phi_s^{\nu G}\hat\theta_s^{\nu G}\right)$. It can be realized that the minimum $AIC_s^{\nu G}$ in (18.46) can be achieved at the number $I_s + I'_sI''_s + R_s + L_s + J_s + Q_s$ of the real gene/miRNA/lncRNA regulations of EBV-gene $s$ in EBV-gene GRN. Therefore the insignificant regulations out of real regulation orders $I_s$, $I'_sI''_s$, $R_s$, $L_s$, $J_s$, and $Q_s$ should be pruned away one gene by one gene to obtain the real EBV-gene GRN. Likewise, AIC of human-lncRNA $z$ in human-lncRNA GRN model (18.31) is given by the following equation:
$$
AIC_z^{HL}(I_z, I'_zI''_z, R_z, L_z, J_z, Q_z) = \log\left( \frac{1}{Y_z}\left(G_z^{(L)} - \Phi_z^{HL}\hat\theta_z^{HL}\right)^T \left(G_z^{(L)} - \Phi_z^{HL}\hat\theta_z^{HL}\right) \right) + \frac{2(I_z + I'_zI''_z + R_z + L_z + J_z + Q_z)}{Y_z} \tag{18.47}
$$
where $\hat\theta_z^{HL}$ signifies the estimated parameters of human-lncRNA $z$ obtained from the solutions of the parameter estimation problem in (18.32), and the estimated residual error is $\hat\kappa_{HL,z}^2 = \frac{1}{Y_z}\left(G_z^{(L)} - \Phi_z^{HL}\hat\theta_z^{HL}\right)^T\left(G_z^{(L)} - \Phi_z^{HL}\hat\theta_z^{HL}\right)$. It can be realized that the minimum $AIC_z^{HL}$ in (18.47) can be achieved at the number $I_z + I'_zI''_z + R_z + L_z + J_z + Q_z$ of the real gene/miRNA/lncRNA regulations of human-lncRNA $z$ in the human-lncRNA GRN. Therefore the insignificant regulations out of real regulation orders $I_z$, $I'_zI''_z$, $R_z$, $L_z$, $J_z$, and $Q_z$ should be pruned one human-lncRNA by one human-lncRNA to obtain the real human-lncRNA GRN. Following the same procedure as in human-gene GRN model (18.21), AIC of human-miRNA $f$ in human-miRNA GRN model (18.36) is given as follows:
$$
AIC_f^{HM}(I_f, I'_fI''_f, R_f, J_f, Q_f) = \log\left( \frac{1}{Y_f}\left(X_f^{(h)} - \Theta_f^{HM}\hat\tau_f^{HM}\right)^T \left(X_f^{(h)} - \Theta_f^{HM}\hat\tau_f^{HM}\right) \right) + \frac{2(I_f + I'_fI''_f + R_f + J_f + Q_f)}{Y_f} \tag{18.48}
$$
where $\hat\tau_f^{HM}$ shows the estimated parameters of human-miRNA $f$ obtained from the solutions of the parameter estimation problem in (18.37), and the estimated residual error is $\hat\kappa_{HM,f}^2 = \frac{1}{Y_f}\left(X_f^{(h)} - \Theta_f^{HM}\hat\tau_f^{HM}\right)^T\left(X_f^{(h)} - \Theta_f^{HM}\hat\tau_f^{HM}\right)$. It can be realized that the minimum $AIC_f^{HM}$ in (18.48) can be achieved at the number $I_f + I'_fI''_f + R_f + J_f + Q_f$ of the real gene/miRNA regulations of human-miRNA $f$ in the human-miRNA GRN. Therefore the insignificant regulations out of real regulation orders $I_f$, $I'_fI''_f$, $R_f$, $J_f$, and $Q_f$ should be pruned one human-miRNA by one human-miRNA to obtain the real human-miRNA GRN.
Similarly, we can define AIC of EBV-miRNA $u$ in EBV-miRNA GRN model (18.41) as the following equation:
$$
AIC_u^{\nu M}(I_u, I'_uI''_u, R_u, J_u, Q_u) = \log\left( \frac{1}{Y_u}\left(X_u^{(\nu)} - \Theta_u^{\nu M}\hat\tau_u^{\nu M}\right)^T \left(X_u^{(\nu)} - \Theta_u^{\nu M}\hat\tau_u^{\nu M}\right) \right) + \frac{2(I_u + I'_uI''_u + R_u + J_u + Q_u)}{Y_u} \tag{18.49}
$$
where $\hat\tau_u^{\nu M}$ stands for the estimated parameters of EBV-miRNA $u$ obtained from the solutions of the parameter estimation problem in (18.42), and the estimated residual error is $\hat\kappa_{\nu M,u}^2 = \frac{1}{Y_u}\left(X_u^{(\nu)} - \Theta_u^{\nu M}\hat\tau_u^{\nu M}\right)^T\left(X_u^{(\nu)} - \Theta_u^{\nu M}\hat\tau_u^{\nu M}\right)$. It can be realized that the minimum $AIC_u^{\nu M}$ in (18.49) can be achieved at the number $I_u + I'_uI''_u + R_u + J_u + Q_u$ of the real gene/miRNA regulations of EBV-miRNA $u$ in the EBV-miRNA GRN. Therefore the insignificant regulations out of the real system orders $I_u$, $I'_uI''_u$, $R_u$, $J_u$, and $Q_u$ should be pruned away one EBV-miRNA by one EBV-miRNA to obtain the real EBV-miRNA GRN.

Therefore, after applying the system identification approach and the system order detection scheme to prune the false positives of the candidate GIGEN with the NGS data of human B cells and EBV, we can identify the real interspecies GIGENs at the first infection stage (as shown in Fig. 18.3, drawn by Cytoscape [88]) and at the second infection stage (as shown in Fig. 18.4) during the EBV lytic infection. The numbers of nodes and edges of the candidate GIGEN from databases and of the real GIGENs at the first and second infection stage are presented in Tables 18.1 and 18.2, respectively. However, the real GIGENs shown in Figs. 18.3 and 18.4 are too complicated for us to investigate the lytic replication, production, and cytolytic mechanisms between human and EBV during the lytic infection. Therefore we need to extract the HVCNs, which contain the principal network structures of the real networks, from the real GIGENs at both infection stages in the EBV lytic phase by using the PNP method.

FIGURE 18.3 The real interspecies GIGEN of the first infection stage in the lytic phase. The nodes with red frame correspond to the proteins/TF/miRNAs of EBV; the nodes with blue frame indicate the receptors/proteins/TFs/miRNAs/lncRNAs of human; the edges in green denote the PPIs of human, EBV, human-EBV; the edges in purple represent the miRNA repressions of miRNAs on genes of intraspecies and interspecies; the edges in black mean the transcriptional regulations of TFs on genes of intraspecies and interspecies; and the edges in orange signify the lncRNA regulations of lncRNAs on genes of human [14]. Node counts shown in the figure: 2500 human receptors, 16,230 human proteins, 385 human TFs, 687 human miRNAs, 91 human lncRNAs, 80 EBV proteins, 1 EBV TF, and 30 EBV miRNAs. Edge counts: 39,990 PPIs, 7539 miRNA repressions, 8616 transcriptional regulations, and 24 lncRNA regulations. EBV, Epstein–Barr virus; GIGEN, genome-wide interspecies genetic-and-epigenetic network; PPI, protein–protein interaction; TF, transcription factor.
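The AIC-based system order detection in (18.43) to (18.49) can be illustrated with a short Python sketch. The AIC formula follows the equations above; the search strategy (greedy backward elimination with unconstrained least squares refits) is an assumption made here for illustration only, since the text does not spell out how candidate orders are enumerated, and the function names are hypothetical.

```python
import numpy as np

def aic(residual, n_params):
    """AIC = log(residual' residual / Y) + 2 * n_params / Y, as in (18.43)."""
    y = residual.size
    return np.log(residual @ residual / y) + 2.0 * n_params / y

def prune_by_aic(Theta, X):
    """Drop regressors one at a time while doing so lowers the AIC.

    Assumed strategy: greedy backward elimination with ordinary least
    squares refits (the sign constraints are ignored in this sketch).
    """
    def score(cols):
        theta, *_ = np.linalg.lstsq(Theta[:, cols], X, rcond=None)
        return aic(X - Theta[:, cols] @ theta, len(cols))

    active = list(range(Theta.shape[1]))
    best = score(active)
    improved = True
    while improved and len(active) > 1:
        improved = False
        for c in list(active):
            trial = [k for k in active if k != c]
            s = score(trial)
            if s < best:
                best, active, improved = s, trial, True
                break
    return active, best
```

The regressors left in `active` play the role of the retained interactions or regulations; those removed correspond to links pruned as false positives.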
18.2.6 Extracting core network from the real interspecies GIGEN by using the PNP method

It is essential to establish an integrated system network matrix H of a real GIGEN before we apply the PNP method to extract the core GIGENs from the real GIGENs. In addition,
FIGURE 18.4 The real interspecies GIGEN of the second infection stage in the lytic phase. The nodes with red frame correspond to the proteins/TF/miRNAs of EBV; the nodes with blue frame indicate the receptors/proteins/TFs/miRNAs/lncRNAs of human; the edges in green denote the PPIs of human, EBV, human-EBV; the edges in purple represent the miRNA repressions of miRNAs on genes of intraspecies and interspecies; the edges in black mean the transcriptional regulations of TFs on genes of intraspecies and interspecies; and the edges in orange signify the lncRNA regulations of lncRNAs on genes of human [14]. Node counts shown in the figure: 2446 human receptors, 15,946 human proteins, 366 human TFs, 655 human miRNAs, 85 human lncRNAs, 75 EBV proteins, 1 EBV TF, and 30 EBV miRNAs. Edge counts: 28,414 PPIs, 5494 miRNA repressions, 6381 transcriptional regulations, and 12 lncRNA regulations. EBV, Epstein–Barr virus; GIGEN, genome-wide interspecies genetic-and-epigenetic network; PPI, protein–protein interaction; TF, transcription factor.

TABLE 18.1 Information concerning the number of nodes of candidate genome-wide interspecies genetic-and-epigenetic networks at the first and second infection stage [14].

| Nodes | Candidates | First infection stage | Second infection stage |
|-------|------------|-----------------------|------------------------|
| V_T   | 1          | 1                     | 1                      |
| V_M   | 43         | 30                    | 30                     |
| V_P   | 85         | 80                    | 75                     |
| H_T   | 2688       | 385                   | 366                    |
| H_M   | 1326       | 687                   | 655                    |
| H_L   | 186        | 91                    | 85                     |
| H_P   | 18,227     | 16,230                | 15,946                 |
| H_R   | 2880       | 2500                  | 2446                   |
| Total | 43,689     | 20,004                | 19,604                 |

EBV, Epstein–Barr virus; H_L, lncRNAs of human; H_M, miRNAs of human; H_P, proteins of human; H_R, receptors of human; H_T, TFs of human; TF, transcription factor; V_M, miRNAs of EBV; V_P, proteins of EBV; V_T, TFs of EBV.
TABLE 18.2 Information concerning the number of edges of candidate genome-wide interspecies genetic-and-epigenetic networks at the first and second infection stage [14].

| Edges     | Candidates | First infection stage | Second infection stage |
|-----------|------------|-----------------------|------------------------|
| V_P ↹ V_P | 301        | 58                    | 44                     |
| V_T → V_M | 5          | 0                     | 1                      |
| V_M ⊣ V_G | 67         | 1                     | 2                      |
| H_P ↹ H_P | 23,570,918 | 39,846                | 28,325                 |
| H_T → H_G | 897,805    | 8424                  | 6222                   |
| H_T → H_M | 7471       | 86                    | 71                     |
| H_T → H_L | 1335       | 79                    | 65                     |
| H_M ⊣ H_G | 815,889    | 6582                  | 4837                   |
| H_M ⊣ H_M | 215        | 6                     | 3                      |
| H_M ⊣ H_L | 1796       | 26                    | 16                     |
| H_L → H_G | 1948       | 24                    | 12                     |
| V_P ↹ H_P | 5135       | 86                    | 45                     |
| V_T → H_G | 1252       | 15                    | 14                     |
| V_M ⊣ H_G | 39,558     | 914                   | 620                    |
| V_M ⊣ H_M | 39         | 2                     | 4                      |
| V_M ⊣ H_L | 175        | 6                     | 7                      |
| H_T → V_G | 680        | 6                     | 4                      |
| H_T → V_M | 675        | 6                     | 4                      |
| H_M ⊣ V_G | 1708       | 1                     | 4                      |
| H_M ⊣ V_M | 10         | 1                     | 1                      |
| Total     | 25,346,982 | 56,169                | 40,301                 |

→, Transcriptional regulations; ↹, PPIs; ⊣, miRNA repressions; EBV, Epstein–Barr virus; H_G, genes of human; H_L, lncRNAs of human; H_M, miRNAs of human; H_P, proteins of human; H_R, receptors of human; H_T, TFs of human; PPIs, protein–protein interactions; TF, transcription factor; V_G, genes of EBV; V_M, miRNAs of EBV; V_P, proteins of EBV; V_T, TFs of EBV.
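the system network matrix $H$ involves the whole estimated system parameters in the real GIGENs as follows:
$$
H = \begin{bmatrix}
0 & H_{\nu p,\nu p} & 0 & H_{\nu p,hp} & 0 & 0 \\
0 & H_{hp,\nu p} & 0 & H_{hp,hp} & 0 & 0 \\
H_{\nu m,\nu m} & H_{\nu m,\nu p} & H_{\nu m,hm} & H_{\nu m,hp} & H_{\nu m,hc} & 0 \\
H_{\nu g,\nu m} & H_{\nu g,\nu p} & H_{\nu g,hm} & H_{\nu g,hp} & H_{\nu g,hc} & H_{\nu g,hl} \\
H_{hm,\nu m} & H_{hm,\nu p} & H_{hm,hm} & H_{hm,hp} & H_{hm,hc} & 0 \\
H_{hg,\nu m} & H_{hg,\nu p} & H_{hg,hm} & H_{hg,hp} & H_{hg,hc} & H_{hg,hl} \\
H_{hl,\nu m} & H_{hl,\nu p} & H_{hl,hm} & H_{hl,hp} & H_{hl,hc} & H_{hl,hl}
\end{bmatrix} \in \mathbb{R}^{(2J+2I+Q+R+L)\times(J+I+Q+R+L+I'I'')}
$$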
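where the blocks of $H$ are
$$
H_{\nu p,\nu p} = \begin{bmatrix} \hat\alpha^{(\nu)}_{11} & \cdots & \hat\alpha^{(\nu)}_{1J} \\ \vdots & \hat\alpha^{(\nu)}_{jm} & \vdots \\ \hat\alpha^{(\nu)}_{J1} & \cdots & \hat\alpha^{(\nu)}_{JJ} \end{bmatrix},\quad
H_{\nu p,hp} = \begin{bmatrix} \hat\gamma^{(\nu)}_{11} & \cdots & \hat\gamma^{(\nu)}_{1I} \\ \vdots & \hat\gamma^{(\nu)}_{ji} & \vdots \\ \hat\gamma^{(\nu)}_{J1} & \cdots & \hat\gamma^{(\nu)}_{JI} \end{bmatrix},
$$
$$
H_{hp,\nu p} = \begin{bmatrix} \hat\gamma^{(h)}_{11} & \cdots & \hat\gamma^{(h)}_{1J} \\ \vdots & \hat\gamma^{(h)}_{ij} & \vdots \\ \hat\gamma^{(h)}_{I1} & \cdots & \hat\gamma^{(h)}_{IJ} \end{bmatrix},\quad
H_{hp,hp} = \begin{bmatrix} \hat\alpha^{(h)}_{11} & \cdots & \hat\alpha^{(h)}_{1I} \\ \vdots & \hat\alpha^{(h)}_{in} & \vdots \\ \hat\alpha^{(h)}_{I1} & \cdots & \hat\alpha^{(h)}_{II} \end{bmatrix},
$$
$$
H_{\nu m,\nu m} = \begin{bmatrix} -\hat e^{(\nu)}_{11} & \cdots & -\hat e^{(\nu)}_{1Q} \\ \vdots & -\hat e^{(\nu)}_{uq} & \vdots \\ -\hat e^{(\nu)}_{Q1} & \cdots & -\hat e^{(\nu)}_{QQ} \end{bmatrix},\quad
H_{\nu m,\nu p} = \begin{bmatrix} \hat d^{(\nu)}_{11} & \cdots & \hat d^{(\nu)}_{1J} \\ \vdots & \hat d^{(\nu)}_{uj} & \vdots \\ \hat d^{(\nu)}_{Q1} & \cdots & \hat d^{(\nu)}_{QJ} \end{bmatrix},\quad
H_{\nu m,hm} = \begin{bmatrix} -\hat b^{(\nu)}_{11} & \cdots & -\hat b^{(\nu)}_{1R} \\ \vdots & -\hat b^{(\nu)}_{ur} & \vdots \\ -\hat b^{(\nu)}_{Q1} & \cdots & -\hat b^{(\nu)}_{QR} \end{bmatrix},
$$
$$
H_{\nu m,hp} = \begin{bmatrix} \hat a^{(\nu)}_{11} & \cdots & \hat a^{(\nu)}_{1I} \\ \vdots & \hat a^{(\nu)}_{ui} & \vdots \\ \hat a^{(\nu)}_{Q1} & \cdots & \hat a^{(\nu)}_{QI} \end{bmatrix},\quad
H_{\nu m,hc} = \begin{bmatrix} \hat\zeta^{(\nu)}_{11} & \cdots & \hat\zeta^{(\nu)}_{1I'I''} \\ \vdots & \hat\zeta^{(\nu)}_{u(I''(i'-1)+i'')} & \vdots \\ \hat\zeta^{(\nu)}_{Q1} & \cdots & \hat\zeta^{(\nu)}_{QI'I''} \end{bmatrix},
$$
$$
H_{\nu g,\nu m} = \begin{bmatrix} -\hat e^{(\nu)}_{11} & \cdots & -\hat e^{(\nu)}_{1Q} \\ \vdots & -\hat e^{(\nu)}_{sq} & \vdots \\ -\hat e^{(\nu)}_{J1} & \cdots & -\hat e^{(\nu)}_{JQ} \end{bmatrix},\quad
H_{\nu g,\nu p} = \begin{bmatrix} \hat d^{(\nu)}_{11} & \cdots & \hat d^{(\nu)}_{1J} \\ \vdots & \hat d^{(\nu)}_{sj} & \vdots \\ \hat d^{(\nu)}_{J1} & \cdots & \hat d^{(\nu)}_{JJ} \end{bmatrix},\quad
H_{\nu g,hm} = \begin{bmatrix} -\hat b^{(\nu)}_{11} & \cdots & -\hat b^{(\nu)}_{1R} \\ \vdots & -\hat b^{(\nu)}_{sr} & \vdots \\ -\hat b^{(\nu)}_{J1} & \cdots & -\hat b^{(\nu)}_{JR} \end{bmatrix},
$$
$$
H_{\nu g,hp} = \begin{bmatrix} \hat a^{(\nu)}_{11} & \cdots & \hat a^{(\nu)}_{1I} \\ \vdots & \hat a^{(\nu)}_{si} & \vdots \\ \hat a^{(\nu)}_{J1} & \cdots & \hat a^{(\nu)}_{JI} \end{bmatrix},\quad
H_{\nu g,hc} = \begin{bmatrix} \hat\zeta^{(\nu)}_{11} & \cdots & \hat\zeta^{(\nu)}_{1I'I''} \\ \vdots & \hat\zeta^{(\nu)}_{s(I''(i'-1)+i'')} & \vdots \\ \hat\zeta^{(\nu)}_{J1} & \cdots & \hat\zeta^{(\nu)}_{JI'I''} \end{bmatrix},\quad
H_{\nu g,hl} = \begin{bmatrix} \hat c^{(\nu)}_{11} & \cdots & \hat c^{(\nu)}_{1L} \\ \vdots & \hat c^{(\nu)}_{sl} & \vdots \\ \hat c^{(\nu)}_{J1} & \cdots & \hat c^{(\nu)}_{JL} \end{bmatrix},
$$
$$
H_{hm,\nu m} = \begin{bmatrix} -\hat e^{(h)}_{11} & \cdots & -\hat e^{(h)}_{1Q} \\ \vdots & -\hat e^{(h)}_{fq} & \vdots \\ -\hat e^{(h)}_{R1} & \cdots & -\hat e^{(h)}_{RQ} \end{bmatrix},\quad
H_{hm,\nu p} = \begin{bmatrix} \hat d^{(h)}_{11} & \cdots & \hat d^{(h)}_{1J} \\ \vdots & \hat d^{(h)}_{fj} & \vdots \\ \hat d^{(h)}_{R1} & \cdots & \hat d^{(h)}_{RJ} \end{bmatrix},\quad
H_{hm,hm} = \begin{bmatrix} -\hat b^{(h)}_{11} & \cdots & -\hat b^{(h)}_{1R} \\ \vdots & -\hat b^{(h)}_{fr} & \vdots \\ -\hat b^{(h)}_{R1} & \cdots & -\hat b^{(h)}_{RR} \end{bmatrix},
$$
$$
H_{hm,hp} = \begin{bmatrix} \hat a^{(h)}_{11} & \cdots & \hat a^{(h)}_{1I} \\ \vdots & \hat a^{(h)}_{fi} & \vdots \\ \hat a^{(h)}_{R1} & \cdots & \hat a^{(h)}_{RI} \end{bmatrix},\quad
H_{hm,hc} = \begin{bmatrix} \hat\zeta^{(h)}_{11} & \cdots & \hat\zeta^{(h)}_{1I'I''} \\ \vdots & \hat\zeta^{(h)}_{f(I''(i'-1)+i'')} & \vdots \\ \hat\zeta^{(h)}_{R1} & \cdots & \hat\zeta^{(h)}_{RI'I''} \end{bmatrix},
$$
$$
H_{hg,\nu m} = \begin{bmatrix} -\hat e^{(h)}_{11} & \cdots & -\hat e^{(h)}_{1Q} \\ \vdots & -\hat e^{(h)}_{kq} & \vdots \\ -\hat e^{(h)}_{I1} & \cdots & -\hat e^{(h)}_{IQ} \end{bmatrix},\quad
H_{hg,\nu p} = \begin{bmatrix} \hat d^{(h)}_{11} & \cdots & \hat d^{(h)}_{1J} \\ \vdots & \hat d^{(h)}_{kj} & \vdots \\ \hat d^{(h)}_{I1} & \cdots & \hat d^{(h)}_{IJ} \end{bmatrix},\quad
H_{hg,hm} = \begin{bmatrix} -\hat b^{(h)}_{11} & \cdots & -\hat b^{(h)}_{1R} \\ \vdots & -\hat b^{(h)}_{kr} & \vdots \\ -\hat b^{(h)}_{I1} & \cdots & -\hat b^{(h)}_{IR} \end{bmatrix},
$$
$$
H_{hg,hp} = \begin{bmatrix} \hat a^{(h)}_{11} & \cdots & \hat a^{(h)}_{1I} \\ \vdots & \hat a^{(h)}_{ki} & \vdots \\ \hat a^{(h)}_{I1} & \cdots & \hat a^{(h)}_{II} \end{bmatrix},\quad
H_{hg,hc} = \begin{bmatrix} \hat\zeta^{(h)}_{11} & \cdots & \hat\zeta^{(h)}_{1I'I''} \\ \vdots & \hat\zeta^{(h)}_{k(I''(i'-1)+i'')} & \vdots \\ \hat\zeta^{(h)}_{I1} & \cdots & \hat\zeta^{(h)}_{II'I''} \end{bmatrix},\quad
H_{hg,hl} = \begin{bmatrix} \hat c^{(h)}_{11} & \cdots & \hat c^{(h)}_{1L} \\ \vdots & \hat c^{(h)}_{kl} & \vdots \\ \hat c^{(h)}_{I1} & \cdots & \hat c^{(h)}_{IL} \end{bmatrix},
$$
$$
H_{hl,\nu m} = \begin{bmatrix} -\hat e^{(L)}_{11} & \cdots & -\hat e^{(L)}_{1Q} \\ \vdots & -\hat e^{(L)}_{zq} & \vdots \\ -\hat e^{(L)}_{L1} & \cdots & -\hat e^{(L)}_{LQ} \end{bmatrix},\quad
H_{hl,\nu p} = \begin{bmatrix} \hat d^{(L)}_{11} & \cdots & \hat d^{(L)}_{1J} \\ \vdots & \hat d^{(L)}_{zj} & \vdots \\ \hat d^{(L)}_{L1} & \cdots & \hat d^{(L)}_{LJ} \end{bmatrix},\quad
H_{hl,hm} = \begin{bmatrix} -\hat b^{(L)}_{11} & \cdots & -\hat b^{(L)}_{1R} \\ \vdots & -\hat b^{(L)}_{zr} & \vdots \\ -\hat b^{(L)}_{L1} & \cdots & -\hat b^{(L)}_{LR} \end{bmatrix},
$$
$$
H_{hl,hp} = \begin{bmatrix} \hat a^{(L)}_{11} & \cdots & \hat a^{(L)}_{1I} \\ \vdots & \hat a^{(L)}_{zi} & \vdots \\ \hat a^{(L)}_{L1} & \cdots & \hat a^{(L)}_{LI} \end{bmatrix},\quad
H_{hl,hc} = \begin{bmatrix} \hat\zeta^{(L)}_{11} & \cdots & \hat\zeta^{(L)}_{1I'I''} \\ \vdots & \hat\zeta^{(L)}_{z(I''(i'-1)+i'')} & \vdots \\ \hat\zeta^{(L)}_{L1} & \cdots & \hat\zeta^{(L)}_{LI'I''} \end{bmatrix},\quad
H_{hl,hl} = \begin{bmatrix} \hat c^{(L)}_{11} & \cdots & \hat c^{(L)}_{1L} \\ \vdots & \hat c^{(L)}_{zl} & \vdots \\ \hat c^{(L)}_{L1} & \cdots & \hat c^{(L)}_{LL} \end{bmatrix}.
$$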
Here, $\hat\alpha_{in}^{(h)}$ and $\hat\gamma_{ij}^{(h)}$ are the corresponding components in $\hat\theta_i^{HP}$ obtained by solving the parameter estimation problem in (18.12) and the system order detection problem in (18.43); $\hat\alpha_{jm}^{(\nu)}$ and $\hat\gamma_{ji}^{(\nu)}$ are the corresponding components in $\hat\theta_j^{\nu P}$ obtained by solving (18.17) and (18.44); $\hat a_{ki}^{(h)}$, $\hat\zeta_{k(I''(i'-1)+i'')}^{(h)}$, $-\hat b_{kr}^{(h)}$, $\hat c_{kl}^{(h)}$, $\hat d_{kj}^{(h)}$, and $-\hat e_{kq}^{(h)}$ are the corresponding components in $\hat\theta_k^{HG}$ obtained by solving (18.22) and (18.45); $\hat a_{si}^{(\nu)}$, $\hat\zeta_{s(I''(i'-1)+i'')}^{(\nu)}$, $-\hat b_{sr}^{(\nu)}$, $\hat c_{sl}^{(\nu)}$, $\hat d_{sj}^{(\nu)}$, and $-\hat e_{sq}^{(\nu)}$ are the corresponding components in $\hat\theta_s^{\nu G}$ obtained by solving (18.27) and (18.46); $\hat a_{zi}^{(L)}$, $\hat\zeta_{z(I''(i'-1)+i'')}^{(L)}$, $-\hat b_{zr}^{(L)}$, $\hat c_{zl}^{(L)}$, $\hat d_{zj}^{(L)}$, and $-\hat e_{zq}^{(L)}$ are the corresponding components in $\hat\theta_z^{HL}$ obtained by solving (18.32) and (18.47); $\hat a_{fi}^{(h)}$, $\hat\zeta_{f(I''(i'-1)+i'')}^{(h)}$, $-\hat b_{fr}^{(h)}$, $\hat d_{fj}^{(h)}$, and $-\hat e_{fq}^{(h)}$ are the corresponding components in $\hat\tau_f^{HM}$ obtained by solving (18.37) and (18.48); and $\hat a_{ui}^{(\nu)}$, $\hat\zeta_{u(I''(i'-1)+i'')}^{(\nu)}$, $-\hat b_{ur}^{(\nu)}$, $\hat d_{uj}^{(\nu)}$, and $-\hat e_{uq}^{(\nu)}$ are the corresponding components in $\hat\tau_u^{\nu M}$ obtained by solving (18.42) and (18.49).

$\hat\alpha_{in}^{(h)}$ and $\hat\alpha_{jm}^{(\nu)}$ indicate the intraspecies interactive abilities in the human and EBV PPINs during the EBV infection, respectively; $\hat\gamma_{ij}^{(h)}$ and $\hat\gamma_{ji}^{(\nu)}$ denote the interspecies interactive abilities between human-protein $i$ and EBV-protein $j$ in the human and EBV PPINs. $\hat a_{ki}^{(h)}$, $\hat a_{si}^{(\nu)}$, $\hat a_{zi}^{(L)}$, $\hat a_{fi}^{(h)}$, and $\hat a_{ui}^{(\nu)}$ represent the transcriptional regulatory abilities of human-TF $i$ to regulate human-gene $k$, EBV-gene $s$, human-lncRNA $z$, human-miRNA $f$, and EBV-miRNA $u$ in the human-gene GRN, EBV-gene GRN, human-lncRNA GRN, human-miRNA GRN, and EBV-miRNA GRN during the EBV infection, respectively. $\hat\zeta_{k(I''(i'-1)+i'')}^{(h)}$, $\hat\zeta_{s(I''(i'-1)+i'')}^{(\nu)}$, $\hat\zeta_{z(I''(i'-1)+i'')}^{(L)}$, $\hat\zeta_{f(I''(i'-1)+i'')}^{(h)}$, and $\hat\zeta_{u(I''(i'-1)+i'')}^{(\nu)}$ mean the transcriptional regulatory abilities of human-TF complex $i'{::}i''$ to regulate the same five targets in the same five networks, respectively. $\hat d_{kj}^{(h)}$, $\hat d_{sj}^{(\nu)}$, $\hat d_{zj}^{(L)}$, $\hat d_{fj}^{(h)}$, and $\hat d_{uj}^{(\nu)}$ signify the transcriptional regulatory abilities of EBV-TF $j$ to regulate human-gene $k$, EBV-gene $s$, human-lncRNA $z$, human-miRNA $f$, and EBV-miRNA $u$, respectively. $\hat c_{kl}^{(h)}$, $\hat c_{sl}^{(\nu)}$, and $\hat c_{zl}^{(L)}$ show the regulatory abilities of human-lncRNA $l$ to regulate human-gene $k$, EBV-gene $s$, and human-lncRNA $z$ in the human-gene GRN, EBV-gene GRN, and human-lncRNA GRN, respectively. $-\hat b_{kr}^{(h)}$, $-\hat b_{sr}^{(\nu)}$, $-\hat b_{zr}^{(L)}$, $-\hat b_{fr}^{(h)}$, and $-\hat b_{ur}^{(\nu)}$ correspond to the repression abilities of human-miRNA $r$ to inhibit human-gene $k$, EBV-gene $s$, human-lncRNA $z$, human-miRNA $f$, and EBV-miRNA $u$, respectively. $-\hat e_{kq}^{(h)}$, $-\hat e_{sq}^{(\nu)}$, $-\hat e_{zq}^{(L)}$, $-\hat e_{fq}^{(h)}$, and $-\hat e_{uq}^{(\nu)}$ stand for the repression abilities of EBV-miRNA $q$ to inhibit human-gene $k$, EBV-gene $s$, human-lncRNA $z$, human-miRNA $f$, and EBV-miRNA $u$ in the human-gene GRN, EBV-gene GRN, human-lncRNA GRN, human-miRNA GRN, and EBV-miRNA GRN during the EBV infection, respectively.

The estimated weights (i.e., parameters) of the network links in intraspecies PPINs, intraspecies GRNs, interspecies PPINs, and interspecies GRNs thereby compose the system network matrix $H$ of the real GIGENs. In the network matrix $H$ the corresponding parameter is zero if a link does not appear in the candidate GIGEN or has been pruned via AIC. Then, we extract the core
network of the real GIGEN by applying PNP to network matrix $H$. PNP is realized on the basis of the singular value decomposition of $H$ in the following:
$$
H = UDV^T \tag{18.50}
$$
where $U \in \mathbb{R}^{(2J+2I+Q+R+L)\times(J+I+Q+R+L+I'I'')}$; $V \in \mathbb{R}^{(J+I+Q+R+L+I'I'')\times(J+I+Q+R+L+I'I'')}$; and $D = \mathrm{diag}(d_1, \ldots, d_s, \ldots, d_{J+I+Q+R+L+I'I''})$ includes the $J+I+Q+R+L+I'I''$ singular values of $H$ in descending order, that is, $d_1 \ge \cdots \ge d_s \ge \cdots \ge d_{J+I+Q+R+L+I'I''}$. Notably, $\mathrm{diag}(d_1, d_2)$ indicates the diagonal matrix of $d_1$ and $d_2$. Besides, we can define the eigenexpression fraction ($E_s$) for the normalization of singular values as follows:
$$
E_s = \frac{d_s^2}{\sum_{s=1}^{J+I+Q+R+L+I'I''} d_s^2} \tag{18.51}
$$
From the perspective of energy, we need to maintain the system energy of the whole network structure. Thus we choose the top $K$ singular vectors of network matrix $H$ with the minimum $K$ such that $\sum_{s=1}^{K} E_s \ge 85\%$, so that the core network structure of the GIGENs, which is composed of these top $K$ principal components, represents at least 85% of the energy. Next, we define the projection ($T$) of network matrix $H$ to these top $K$ singular vectors of $U$ and $V$ as shown in the following:
$$
\begin{aligned}
T_R(r, s) &= h_{r,:} \times v_{:,s}, && \text{for } r = 1, \ldots, (2J+2I+Q+R+L) \text{ and } s = 1, \ldots, K \\
T_L(l, s) &= h_{:,l}^T \times u_{:,s}, && \text{for } l = 1, \ldots, (J+I+Q+R+L+I'I'') \text{ and } s = 1, \ldots, K
\end{aligned}
\tag{18.52}
$$
where $h_{r,:}$, $v_{:,s}$, $h_{:,l}$, and $u_{:,s}$ denote the $r$th row of $H$, the $s$th column of $V$, the $l$th column of $H$, and the $s$th column of $U$, respectively. Finally, we define the 2-norm projection value of each node, including gene, miRNA, lncRNA, protein, and protein complex in the real GIGEN, onto the top $K$ right singular vectors and the top $K$ left singular vectors in the following:
$$
\begin{aligned}
D_R(r) &= \left[ \sum_{s=1}^{K} T_R(r, s)^2 \right]^{1/2}, && \text{for } r = 1, \ldots, (2J+2I+Q+R+L) \\
D_L(l) &= \left[ \sum_{s=1}^{K} T_L(l, s)^2 \right]^{1/2}, && \text{for } l = 1, \ldots, (J+I+Q+R+L+I'I'')
\end{aligned}
\tag{18.53}
$$
If the projection value $D_R(r)$ or $D_L(l)$ approaches zero, the contribution of the corresponding $r$th node or $l$th node, respectively, is insignificant and almost independent of the core network structure composed of the top $K$ singular vectors [11,688]. Consequently, we build the core networks that consist of the core proteins, genes, and miRNAs by selecting the proteins, genes, and miRNAs with the top projection values in (18.53) from receptors to TFs and their associated genes and miRNAs. The HVCNs between human and EBV extracted from the real GIGENs by the abovementioned PNP method at the first and second infection stage are presented in Figs. 18.5 and 18.6, respectively, and the numbers of nodes and edges of the HVCNs at the first and second infection stage are given in Tables 18.5 and 18.6, respectively.
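The PNP steps in (18.50) to (18.53) can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, `H` is any real matrix with at least as many rows as columns, and the 85% energy threshold is passed as a parameter.

```python
import numpy as np

def principal_network_projection(H, energy=0.85):
    """Return K and the node projection values D_R (rows) and D_L (columns)."""
    # Economy-size SVD: H = U @ diag(d) @ Vt, singular values in descending order.
    U, d, Vt = np.linalg.svd(H, full_matrices=False)
    # Eigenexpression fractions E_s as in (18.51).
    E = d ** 2 / np.sum(d ** 2)
    # Smallest K whose cumulative fraction reaches the energy threshold.
    K = int(np.searchsorted(np.cumsum(E), energy) + 1)
    K = min(K, d.size)
    # Projections as in (18.52): rows of H onto columns of V, columns of H onto columns of U.
    T_R = H @ Vt[:K].T
    T_L = H.T @ U[:, :K]
    # 2-norm projection values as in (18.53).
    D_R = np.linalg.norm(T_R, axis=1)
    D_L = np.linalg.norm(T_L, axis=1)
    return K, D_R, D_L

H = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.1],
              [0.0, 0.0, 0.0]])
K, D_R, D_L = principal_network_projection(H)
```

Nodes would then be ranked by `D_R` or `D_L`, with values near zero indicating nodes that contribute little to the core structure; how many top-ranked nodes to keep is a separate choice that this sketch does not make.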
FIGURE 18.5 HVCN at the first infection stage in the lytic phase. The nodes with red frame correspond to the proteins/TF/miRNAs of EBV; the nodes with blue frame indicate the receptors/proteins/TFs/miRNAs/lncRNAs of human; the edges in green denote the PPIs of human, EBV, human-EBV; the edges in purple represent the miRNA repressions of miRNAs on genes of intraspecies and interspecies; and the edges in black mean the transcriptional regulations of TFs on genes of intraspecies and interspecies [14]. Node counts shown in the figure: 82 human receptors, 166 human proteins, 20 human TFs, 96 human miRNAs, 14 human lncRNAs, 34 EBV proteins, 1 EBV TF, and 10 EBV miRNAs. Edge counts: 588 PPIs, 225 miRNA repressions, and 170 transcriptional regulations. EBV, Epstein–Barr virus; HVCN, host–virus core network; PPI, protein–protein interaction; TF, transcription factor.
18.3 Investigating interspecies molecular mechanisms for human B lymphocytes infected with EpsteinBarr virus 18.3.1 GIGENs of the first and the second infection stage in the lytic phase of B cells infected with EBV The GIGENs of the first and second infection stage are shown in Figs. 18.3 and 18.4, respectively. The number of nodes and edges are displayed in Tables 18.1 and 18.2, respectively. Among these edges, there are three human TF complexes identified in the real GIGENs. The first one is ARNT::AHR, which has 31 human TF-gene pairs at the first infection stage and 15 pairs at the second infection stage; the second one is HIF1A::ARNT, which has 16 human TF-gene pairs at the first infection stage and 3 pairs at the second infection stage; and the third one is NFE2L1::MAFG, which has 38 human TF-gene pairs at the first infection stage and 54 pairs at the second infection stage. There were no remarkable differences in the number of nodes between the first and second infection stage during the lytic phase. Nevertheless, the edges between the both infection stages in Table 18.2 could demonstrate significant differences in the human PPIs (first: 39,846/second: 28,325), interspecies
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
18. Constructing host/pathogen genetic-and-epigenetic networks
[Figure 18.6 network summary. Nodes: 95 human receptors, 148 human proteins, 14 human TFs, 118 human miRNAs, 14 human lncRNAs, 12 EBV proteins, 11 EBV miRNAs. Edges: 435 PPIs, 226 miRNA repressions, 81 transcriptional regulations, 1 lncRNA regulation.]

FIGURE 18.6 HVCN at the second infection stage in the lytic phase. Nodes with red frames are the proteins and miRNAs of EBV; nodes with blue frames are the receptors, proteins, TFs, miRNAs, and lncRNAs of human; green edges denote the PPIs within human, within EBV, and between human and EBV; purple edges represent the intraspecies and interspecies miRNA repressions of genes; black edges represent the intraspecies and interspecies transcriptional regulations of genes by TFs; and the orange edge represents the regulation of a human gene by a human lncRNA [14]. EBV, Epstein-Barr virus; HVCN, host-virus core network; PPI, protein-protein interaction; TF, transcription factor.
PPIs (first: 86/second: 45), and EBV-miRNAs to human genes (first: 914/second: 620). These results suggest that interactions within human and between human and EBV are more numerous at the first infection stage, which contributes to enhancing the transcription and replication of viral particles through the human machinery, while EBV protects itself from silencing through its own miRNAs and inhibits human biological processes such as immune responses, apoptosis, autophagy, and metabolism. We also performed DAVID [789] analyses of the target genes in the GIGENs to identify the specific functions at the first and second infection stages; the results are presented in Tables 18.3 and 18.4. At the first infection stage, Table 18.3 indicates that EBV starts lytic replication at the Ori-Lyt site, the initiation site for viral lytic replication, and initiates the viral life cycle, including decoding of genome information, translation of viral mRNA by human ribosomes, genome replication, and assembly and release of viral particles. The product encoded by BZLF1, an immediate-early gene of EBV in the lytic phase, readily binds to hypermethylated response elements; in addition, a viral early protein is able to enhance the posttranscriptional modification of EBV gene expression.
TABLE 18.3 The specific cellular functions and related pathways of target genes in genome-wide interspecies genetic-and-epigenetic networks at the first infection stage by applying the DAVID analysis [14].

First infection stage
Category | Term | P-value
GOTERM_BP_DIRECT | GO:0000082 - G1/S transition of mitotic cell cycle | 6.36E-12
GOTERM_BP_DIRECT | GO:0019058 - viral life cycle | 3.3E-10
GOTERM_BP_DIRECT | GO:0045815 - positive regulation of gene expression, epigenetic | 4.58E-10
GOTERM_BP_DIRECT | GO:0006414 - translational elongation | 7.18E-10
GOTERM_BP_DIRECT | GO:0006996 - organelle organization | 7.93E-10
GOTERM_BP_DIRECT | GO:0043488 - regulation of mRNA stability | 4.5E-9
TABLE 18.4 The specific cellular functions and related pathways of target genes in genome-wide interspecies genetic-and-epigenetic networks at the second infection stage by applying the DAVID analysis [14].

Second infection stage
Category | Term | P-value
GOTERM_BP_DIRECT | GO:0006351 - transcription, DNA-templated | 7.22E-11
GOTERM_BP_DIRECT | GO:0006614 - SRP-dependent cotranslational protein targeting to membrane | 1.23E-6
GOTERM_BP_DIRECT | GO:0006334 - nucleosome assembly | 1.48E-5
GOTERM_BP_DIRECT | GO:0042787 - protein ubiquitination involved in ubiquitin-dependent protein catabolic process | 1.87E-5
GOTERM_BP_DIRECT | GO:0097193 - intrinsic apoptotic signaling pathway | 3.85E-5
GOTERM_BP_DIRECT | GO:0018279 - protein N-linked glycosylation via asparagine | 5.17E-5

SRP, signal recognition particle.
Beyond that, overexpression of some viral early proteins causes the translational elongation factor to be phosphorylated, which could increase the output and stability of nuclear mRNA. At this stage, human B cells enter the G1/S transition of the cell cycle, in which DNA replication is initiated, and prepare for mitosis while forming organelles and structures such as autophagosomes, ribosomes, and the cytoskeleton. At the second infection stage, the main event is the release of new virions across the plasma membrane of B cells to infect other uninfected B cells or epithelial cells. Table 18.4 suggests that, after replication ends, nucleosomes are formed to protect the integrity of the viral genome and are then packaged, assembled, and transported out of the cell. The virions then depend on the signal recognition particle (SRP) to be targeted to the membrane. Receptors for advanced glycation end products are present on the cell surface; after the products and receptors form complexes, these complexes can activate the
intracellular signaling pathways to initiate reactions within the cells, such as the intrinsic apoptotic signaling pathway or the ubiquitin-dependent protein catabolic pathway that degrades target proteins. However, the GIGENs do not provide sufficiently concrete information for investigating the lytic replication and cytolytic mechanisms between human and EBV during the lytic infection. Therefore we need to extract HVCNs from the GIGENs at both infection stages in the EBV lytic phase via the PNP method, as shown in Figs. 18.5 and 18.6, respectively.
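The DAVID P-values in Tables 18.3 and 18.4 measure how strongly a GO term is over-represented among the target genes. DAVID reports a modified, more conservative Fisher exact P-value (the EASE score); as a simplified stand-in, the following minimal Python sketch computes the plain hypergeometric upper-tail probability. The function name and all gene counts are hypothetical and are not taken from the analysis in [14].

```python
from math import comb


def enrichment_p_value(hits: int, list_size: int, term_size: int, background: int) -> float:
    """Upper-tail hypergeometric probability P(X >= hits).

    hits       -- target genes annotated with the GO term
    list_size  -- number of target genes submitted
    term_size  -- background genes annotated with the GO term
    background -- total number of background genes
    """
    hits = max(hits, 0)
    upper = min(list_size, term_size)
    total = comb(background, list_size)
    favourable = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return favourable / total


# Hypothetical counts (assumption): 400 target genes, 20 of which carry a
# term that annotates 150 of 20,000 background genes.
p = enrichment_p_value(hits=20, list_size=400, term_size=150, background=20000)
print(f"enrichment P-value = {p:.3g}")
```

A small P-value here only says that the term appears among the target genes more often than random sampling from the background would explain; it does not by itself indicate whether the process is activated or inhibited.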
18.4 HVCNs at the first and second infection stage in the lytic phase of B cells infected with EBV

18.4.1 The significant cellular processes of the HVCNs in the lytic replication cycle

By using the PNP method, we extracted the HVCNs in B cells infected with EBV at the first and second infection stages in the lytic phase, as shown in Figs. 18.5 and 18.6, respectively. The numbers of nodes and edges are given in Tables 18.5 and 18.6, respectively. After the PNP method is applied to the real GIGENs, no edges of human TF complexes regulating target genes remain in the HVCNs. There is a remarkable difference in the number of EBV protein nodes (first infection stage: 34/second infection stage: 12) during the lytic phase. This difference may reflect the importance of the EBV early expressed genes, which must drive the cellular replication functions that produce new viral particles while preventing the premature death of the human cells during virus production; however, owing to the viral life cycle of EBV, the EBV late expressed genes may

TABLE 18.5 Information concerning the number of nodes of the host-virus core networks at the first and second infection stage [14].

Nodes | First infection stage | Second infection stage
V_T | 1 | 0
V_M | 10 | 11
V_P | 34 | 12
H_T | 20 | 14
H_M | 96 | 118
H_L | 14 | 14
H_P | 166 | 148
H_R | 82 | 95
Total | 423 | 412

EBV, Epstein-Barr virus; H_L, lncRNAs of human; H_M, miRNAs of human; H_P, proteins of human; H_R, receptors of human; H_T, TFs of human; TF, transcription factor; V_M, miRNAs of EBV; V_P, proteins of EBV; V_T, TFs of EBV.
TABLE 18.6 Information concerning the number of edges of the host-virus core networks at the first and second infection stage [14].

Edges | First infection stage | Second infection stage
V_P ↹ V_P | 42 | 8
V_T - V_M | 0 | 0
V_M ⊣ V_G | 1 | 0
H_P ↹ H_P | 510 | 419
H_T - H_G | 126 | 67
H_T - H_M | 24 | 7
H_T - H_L | 16 | 4
H_M ⊣ H_G | 190 | 185
H_M ⊣ H_M | 4 | 1
H_M ⊣ H_L | 4 | 5
H_L - H_G | 0 | 1
V_P ↹ H_P | 36 | 8
V_T - H_G | 3 | 0
V_M ⊣ H_G | 22 | 23
V_M ⊣ H_M | 1 | 4
V_M ⊣ H_L | 1 | 6
H_T - V_G | 0 | 1
H_T - V_M | 1 | 2
H_M ⊣ V_G | 1 | 1
H_M ⊣ V_M | 1 | 1
Total | 983 | 743

-, transcriptional regulations; ↹, PPIs; ⊣, miRNA repressions; EBV, Epstein-Barr virus; H_G, genes of human; H_L, lncRNAs of human; H_M, miRNAs of human; H_P, proteins of human; H_R, receptors of human; H_T, TFs of human; TF, transcription factor; V_G, genes of EBV; V_M, miRNAs of EBV; V_P, proteins of EBV; V_T, TFs of EBV.
be expressed at lower levels than the early expressed genes in preparation for entering the latent phase, in which nearly all viral genes are silenced to evade the human immune system. Furthermore, the edge counts in Table 18.6 differ markedly between the two infection stages in intraspecies PPIs (human, first: 510/second: 419; EBV, first: 42/second: 8), interspecies PPIs (first: 36/second: 8), human-TFs to human genes (first: 126/second: 67), human-TFs to human-miRNAs (first: 24/second: 7), and human-TFs to human-lncRNAs (first: 16/second: 4). These data indicate that EBV mainly affects human PPIs through protein-protein interactions with human proteins, and
TABLE 18.7 The specific cellular functions and related pathways of target genes in host-virus core networks at the first infection stage by applying the DAVID analysis [14].

First infection stage
Category | Term | P-value
GOTERM_BP_DIRECT | GO:0042493 - response to drug | 1.79E-08
GOTERM_BP_DIRECT | GO:0043388 - positive regulation of DNA binding | 1.37E-05
GOTERM_BP_DIRECT | GO:1902895 - positive regulation of pri-miRNA transcription from RNA polymerase II promoter | 6.55E-05
GOTERM_BP_DIRECT | GO:2000378 - negative regulation of reactive oxygen species metabolic process | 1.17E-04
GOTERM_BP_DIRECT | GO:0051090 - regulation of sequence-specific DNA binding transcription factor activity | 1.64E-04
GOTERM_BP_DIRECT | GO:0016236 - macroautophagy | 2.14E-04
TABLE 18.8 The specific cellular functions and related pathways of target genes in host-virus core networks at the second infection stage by applying the DAVID analysis [14].

Second infection stage
Category | Term | P-value
GOTERM_BP_DIRECT | GO:0006919 - activation of cysteine-type endopeptidase activity involved in apoptotic process | 2.37E-04
GOTERM_BP_DIRECT | GO:0032212 - positive regulation of telomere maintenance via telomerase | 3.20E-04
GOTERM_BP_DIRECT | GO:0051897 - positive regulation of protein kinase B signaling | 0.001802
GOTERM_BP_DIRECT | GO:0006954 - inflammatory response | 0.006715
GOTERM_BP_DIRECT | GO:0070374 - positive regulation of ERK1 and ERK2 cascade | 0.011796
GOTERM_BP_DIRECT | GO:0015031 - protein transport | 0.02314
human TFs in turn transcriptionally regulate human genes, miRNAs, and lncRNAs. To identify more specific functional mechanisms of the human genes during EBV infection, we also analyzed the target genes in the HVCNs at both infection stages with DAVID; the results are presented in Tables 18.7 and 18.8. In Table 18.7, two crucial cellular functions at the first infection stage are response to drug and macroautophagy. Drugs can change the activity of lytic genes and induce EBV reactivation from the latent phase into the lytic phase. Moreover, Faggioni et al. proposed that, during EBV replication, autophagy, a mechanism for degrading infecting microbes, is blocked at its late steps. Through this block, EBV can hijack the autophagic vesicles, which benefits viral production through their use in its intracellular transportation
[905]. In Table 18.8, protein transport and inflammatory response are two specific cellular functions at the second infection stage of EBV infection. New virions may be conveyed by the autophagic vesicles mentioned previously and then positioned at the plasma membrane. Upon membrane lysis, these virions are released outside the cell, ready to infect other uninfected cells. At that time, the human immediate defense system detects xenobiotics and elicits the inflammatory response to trigger the human immune reaction against infection by the new virions. To adapt to EBV infection, posttranslational epigenetic modifications can be found in the HVCNs that regulate some intracellular signaling pathways.
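The edge comparison between the two infection stages can be cross-checked with a few lines of arithmetic. The following sketch groups the rows of Table 18.6 by interaction type and sums them; the counts are transcribed from the table, while the dictionary layout and the function name are illustrative assumptions. Treating H_L-H_G as an lncRNA regulation follows the edge summaries given with Figs. 18.5 and 18.6.

```python
# Edge counts transcribed from Table 18.6 as (first stage, second stage).
# The grouping of H_L-H_G under "lncRNA regulation" is inferred from the
# edge summaries of Figs. 18.5 and 18.6; the layout is only illustrative.
EDGES = {
    "PPI": {
        "V_P-V_P": (42, 8), "H_P-H_P": (510, 419), "V_P-H_P": (36, 8),
    },
    "transcriptional regulation": {
        "V_T-V_M": (0, 0), "H_T-H_G": (126, 67), "H_T-H_M": (24, 7),
        "H_T-H_L": (16, 4), "V_T-H_G": (3, 0), "H_T-V_G": (0, 1),
        "H_T-V_M": (1, 2),
    },
    "miRNA repression": {
        "V_M-V_G": (1, 0), "H_M-H_G": (190, 185), "H_M-H_M": (4, 1),
        "H_M-H_L": (4, 5), "V_M-H_G": (22, 23), "V_M-H_M": (1, 4),
        "V_M-H_L": (1, 6), "H_M-V_G": (1, 1), "H_M-V_M": (1, 1),
    },
    "lncRNA regulation": {"H_L-H_G": (0, 1)},
}


def totals_by_type(edges):
    """Sum the first-stage and second-stage counts within each edge type."""
    return {
        edge_type: (
            sum(first for first, _ in rows.values()),
            sum(second for _, second in rows.values()),
        )
        for edge_type, rows in edges.items()
    }


by_type = totals_by_type(EDGES)
# Column-wise sum over all edge types gives the overall totals per stage.
grand_total = tuple(sum(column) for column in zip(*by_type.values()))

for edge_type, (first, second) in by_type.items():
    print(f"{edge_type:28s} first={first:4d} second={second:4d}")
print("total", grand_total)
```

Summed this way, the drop from the first to the second stage comes almost entirely from PPIs and transcriptional regulations, whereas the number of miRNA repressions stays nearly constant.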
18.4.2 The intracellular signaling pathways in HVCNs modified by the epigenetic regulation during the lytic infection

These epigenetic modifications can be attributed to the changes of the basal level β_i^(h) between the first and second infection stages in the dynamic model of human protein expression in Eq. (18.1). In the HVCN of the first infection stage in Fig. 18.5, the human proteins in the following pathways are regulated by epigenetic modifiers:

- Immune response signaling pathway: CARD9, PSMA3, RNF41, TRIM29, NFATC1, and STAT3 are regulated by the acetyltransferase proteins (NAT1 and KAT5), the deacetylase protein (HDAC8), the E3 ubiquitin ligase protein (SIAH1), and the methyltransferase-associated protein (MTHFD1L).
- Toll-like receptor signaling pathway: CARD9, MAPK10, and MAPK8 are regulated by the acetyltransferase protein (NAT1) and the deacetylase protein (HDAC8).
- Interleukin (IL)-3, -5 and GM-CSF signaling pathway: PSMA3, GRB2, TRIM29, and STAT3 are regulated by the E3 ubiquitin ligase protein (SIAH1), the acetyltransferase protein (NAT9), the deacetylase protein (HDAC8), and the methyltransferase-associated protein (MTHFD1L).
- Nuclear receptor transcription pathway: ESR1, ESR2, and AR are regulated by the ubiquitin-associated proteins (USP11 and UBE2C), the acetyltransferase proteins (NAT1 and NAT9), and the methyltransferase-associated protein (MTHFD1L).
- Apoptotic modulation signaling pathway: TP73, SLC25A6, and DAXX are regulated by the E3 ubiquitin ligase protein (SIAH1), the acetyltransferase proteins (KAT5 and NAT9), the deacetylase protein (HDAC8), and the ubiquitin-conjugating protein (UBE2E1).
- Integration of energy metabolism pathway: SLC25A6, HPD, PPCDC, CHST14, and PLCD1 are regulated by the acetyltransferase proteins (KAT5 and NAT9), the deacetylase protein (HDAC8), and the ubiquitin-conjugating protein (UBE2E1).
- IL-2 pathway: GRB2, NFATC1, MAPK10, MAPK8, and STAT3 are regulated by the acetyltransferase protein (NAT9), the deacetylase protein (HDAC8), and the methyltransferase-associated protein (MTHFD1L).
- Viral mRNA translation signaling pathway: MRPL2 and HPD are regulated by the deacetylase protein (HDAC8).

At the first infection stage, these pathways not only assist the translation of viral proteins but also provide a better microenvironment for infected cell proliferation. In addition, these processes are accompanied by immune response, regulation of apoptosis, and metabolism of products, which supply the energy needed by many cellular functions.
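The pathway-to-regulator relationships listed above for the first infection stage form a small many-to-many mapping, and tallying it shows which epigenetic modifier touches the most pathways. The following is a minimal sketch; the regulator lists are transcribed from the text, while the variable names and the idea of ranking regulators by pathway count are illustrative assumptions rather than part of the analysis in [14].

```python
from collections import Counter

# Pathway -> epigenetic regulators, transcribed from the first-infection-stage
# description; names of variables are illustrative only.
FIRST_STAGE_REGULATORS = {
    "immune response": ["NAT1", "KAT5", "HDAC8", "SIAH1", "MTHFD1L"],
    "Toll-like receptor": ["NAT1", "HDAC8"],
    "IL-3, -5 and GM-CSF": ["SIAH1", "NAT9", "HDAC8", "MTHFD1L"],
    "nuclear receptor transcription": ["USP11", "UBE2C", "NAT1", "NAT9", "MTHFD1L"],
    "apoptotic modulation": ["SIAH1", "KAT5", "NAT9", "HDAC8", "UBE2E1"],
    "integration of energy metabolism": ["KAT5", "NAT9", "HDAC8", "UBE2E1"],
    "IL-2": ["NAT9", "HDAC8", "MTHFD1L"],
    "viral mRNA translation": ["HDAC8"],
}

# Count, for each regulator, the number of pathways in which it appears.
pathways_per_regulator = Counter(
    regulator
    for regulators in FIRST_STAGE_REGULATORS.values()
    for regulator in regulators
)

for regulator, n_pathways in pathways_per_regulator.most_common():
    print(regulator, n_pathways)
```

In this tally the deacetylase HDAC8 appears in seven of the eight listed pathways, more than any other modifier.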
In addition, in the HVCN of the second infection stage in Fig. 18.6, the human proteins in the following pathways are regulated by epigenetic modifiers:

- TWEAK pathway: IKBKB and CRADD are regulated by the methyltransferase protein (FTSJ1) and the ubiquitin ligase protein (UBE3A).
- Apoptosis and survival caspase cascade pathway: BAX and CRADD are regulated by the ubiquitin-conjugating protein (UBE2C) and the ubiquitin ligase protein (UBE3A).
- mRNA splicing pathway: YBX1 and PCBP2 are regulated by the ubiquitin-conjugating protein (UBE2C) and the methyltransferase protein (NSUN2).
- PI3K-Akt signaling pathway: YBX1 and EGFR are regulated by the methyltransferase proteins (FTSJ1 and NSUN2).
- IL-3, -5 and GM-CSF signaling pathway and GPCR pathway: ARRB2 and EGFR are regulated by the methyltransferase proteins (FTSJ1 and NSUN2).
- Metabolism signaling pathway: FUT8 and PPARD are regulated by the ubiquitin ligase protein (UBE3A).

Likewise, the metabolism signaling pathway can generate the energy needed to transport new virions to the membrane and release them into the extracellular space. After cell lysis, the apoptotic program drives the lytically infected cells into apoptosis. From the modified pathways at both infection stages, we speculate that, in order to propagate progeny virus, EBV actively exploits human cells to promote infected cell proliferation and to transport and release progeny virus during the lytic phase. Meanwhile, EBV is attacked by the immune response and apoptosis, so it has to develop defense mechanisms for survival.
However, the detailed mechanism of lytic replication by EBV is difficult to understand at the molecular level from the HVCNs, so we extracted HVCPs from the HVCNs during the lytic phase by further using the significant changes in gene expression between the first and second infection stages, assessed through the corresponding P-values, together with the PNP method to find the threshold value, as shown in Figs. 18.7 and 18.9.
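The selection step described above, which keeps only nodes whose expression changes significantly between the two stages and whose PNP projection is large enough, can be pictured as a two-criterion filter. The sketch below is only an illustration under stated assumptions: the node names, P-values, projection scores, and both cutoffs are hypothetical, and requiring both criteria jointly is an assumption rather than the documented procedure of [14].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    p_value: float     # significance of the expression change between stages
    projection: float  # PNP projection score (hypothetical values below)


def select_core_pathway(nodes, p_cutoff=0.05, projection_cutoff=0.5):
    """Keep nodes passing both the P-value and the projection cutoffs.

    Both cutoffs are hypothetical defaults chosen for illustration.
    """
    return [
        node.name
        for node in nodes
        if node.p_value <= p_cutoff and node.projection >= projection_cutoff
    ]


# Hypothetical candidate nodes from a core network.
candidates = [
    Node("node_1", p_value=1e-12, projection=0.91),  # passes both criteria
    Node("node_2", p_value=0.30, projection=0.88),   # expression change not significant
    Node("node_3", p_value=1e-4, projection=0.12),   # projection too small
    Node("node_4", p_value=0.02, projection=0.64),   # passes both criteria
]

core = select_core_pathway(candidates)
print(core)
```

Tightening either cutoff shrinks the resulting pathway, which is the practical reason a threshold has to be chosen before the core pathways can be drawn.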
18.5 HVCPs at the first and second infection stage during the lytic replication cycle

18.5.1 The new virion production through host-virus cross-talk interactions at the first infection stage

Although a large number of studies have examined the cross talk between virus and host, little is known about host-virus PPIs and the real genetic-and-epigenetic network communication. Thus, by further using the significant changes in gene expression between the first and second infection stages via P-values and the PNP method, we extracted HVCPs during the lytic phase, divided into the first and second infection stages as shown in Figs. 18.7 and 18.9, respectively. The EBV intraspecies connections, including {LMP1, BKRF2}, {LMP1, BLLF2, EBNA3B}, {BCLF1, BFRF3}, and {EBNA2, Zta}, and the interspecies connections, including {MAPK7, BDLF1} and {PSMA3, BVRF1}, in Fig. 18.7 or Fig. 18.9 can be validated by the literature [906]. Human cells are able to alter the behavior of proteins in order to accommodate rapidly varying circumstances, while EBV, including its miRNAs (also called miR-BARTs), also has ways to manipulate the operations of viral proteins and even human proteins in order to survive and propagate progeny in this microenvironment. In particular, EBV
[Figure 18.7 graphic. Node classes: ligands, membrane proteins, cytosolic proteins, TFs, microRNAs, genes. Edge classes: PPIs, translocations, positive regulations in GRN, negative regulations in GRN, miRNA repressions, acetylation, deacetylation, ubiquitination, DNA methylation. Annotated cellular functions: proapoptosis, latency maintenance, mitochondrial product release, autophagy in immunity, antiapoptosis, viral translocation breakpoint, viral release, energy metabolism, cell proliferation, antiinflammation.]

FIGURE 18.7 The HVCP in B cells infected with EBV at the first infection stage in the lytic phase. The solid lines indicate the protein-protein interactions; the dotted lines denote the translocations, including protein translations and miRNA transcriptions; the solid lines ending with an arrow, a bar, or a circle stand for positive transcriptional regulations, negative transcriptional regulations, and miRNA repressions, respectively; the dash-dot lines represent the gene functions that are inhibited; the bold lines represent the gene functions that are promoted; the stubby arrows beside the gene functions indicate whether these functions are repressed or enhanced [14]. EBV, Epstein-Barr virus; HVCP, host-virus core pathway.
microRNAs can achieve human immune evasion through antiapoptotic operations in the HVCPs [907]. Such mechanisms are commonly referred to as epigenetic or posttranslational modifications, including DNA methylation, ubiquitination, acetylation, and deacetylation. Epigenetic effects on proteins act more effectively than genetic effects on DNA transcription, so cells are able to adapt to diverse environments during the EBV lytic infection. In order to promote viral persistence, EBV has evolved various strategies to modulate the human immune response, including interfering with antigen presentation pathways, impairing apoptotic signaling pathways, and suppressing human immune cell function [908]. Following lytic reactivation of EBV from the latent infection into the lytic
[Figure 18.8 graphic: five pathway panels, (A) to (E), extracted from the HVCP in Fig. 18.7, with the same node and edge classes as in Fig. 18.7.]

FIGURE 18.8 The signaling pathways of the interspecies molecular mechanisms based on the HVCP in Fig. 18.7 at the first infection stage during the EBV infection. (A) The core pathways of promoting cell proliferation and the impairment of immune information by the EBNA2-mediated pathway with receptor CD46; (B) the blocked proapoptotic pathway of human by EBV through the changes of ubiquitination and acetylation; (C) the blocked autophagy mechanism by EBV through the involvement of viral BALF4, BDLF4, EBNA3B, miR-BART14, and miR-BART1-3p; (D) the complete progression of lytic production through the impairment of proapoptosis and the promotion of viral translocation and antiapoptosis; (E) the promotion of the integrated production of infectious virions by silencing autophagy and inhibiting the expression of STAT3 [14]. EBV, Epstein-Barr virus; HVCP, host-virus core pathway.
[Figure 18.9 graphic. Node classes: ligands, membrane proteins, cytosolic proteins, TFs, ncRNAs, genes. Edge classes: PPIs, translocations, positive regulations in GRN, negative regulations in GRN, miRNA repressions, acetylation, ubiquitination, DNA methylation. Annotated cellular functions: envelope assembly, tRNA splicing, immunosuppression, antiapoptosis, proapoptosis, mitochondrial fusion, lytic cycle repression, virion production, transport.]

FIGURE 18.9 The HVCP in B cells infected with EBV at the second infection stage in the lytic phase. The solid lines indicate the protein-protein interactions; the dotted lines denote the translocations, including protein translations and miRNA transcriptions; the solid lines ending with an arrow, a bar, or a circle stand for positive transcriptional regulations, negative transcriptional regulations, and miRNA repressions, respectively; the dash-dot lines represent the gene functions that are inhibited; the bold lines represent the gene functions that are promoted; the stubby arrows below the gene functions indicate whether these functions are repressed or enhanced [14]. EBV, Epstein-Barr virus; HVCP, host-virus core pathway.
infection, the viral immediate-early (IE) lytic genes, BZLF1 and BRLF1, are expressed first. They then collaboratively activate the promoters of the early (E) lytic genes, which encode the viral replication proteins. Following viral genome replication, the late (L) viral genes are transcribed. The late EBV genes encode structural proteins required for encapsidation of the viral genome into infectious viral particles [885]. From the HVCP of the first infection stage in the lytic phase (Fig. 18.7), we further investigated the cellular mechanisms of the first infection stage by dividing the HVCP in Fig. 18.7 into five parts, as shown in Fig. 18.8A-E. In Fig. 18.8A, the input signal, NFKB1,
playing a key role in regulating the immunological response to infections, mediates many biological processes such as inflammation, immunity, cell proliferation, and apoptosis. Here, through NFKB1, the human cell sends a cell proliferation signal to induce more immune cell growth and an immune-related signal to antagonize the EBV infection during lytic reactivation. The receptor CD46 receives the immunity and cell growth signals via NFKB1 binding. CD46 is a costimulatory factor for the development of T-helper cells through IL-10 release, suppressing immune responses to prevent autoimmunity. Normally, the immune evasion strategy of EBV appears to rely strongly on IL-10, so EBV itself also encodes an IL-10 homolog, which is expressed during the lytic phase [909]. Therefore the viral protein EBNA2 interacts with CD46 in order to exploit its immune regulation property and directly induce an immunosuppressive phenotype. As a result, CD46 is significantly downregulated (P-value = 5.73 × 10⁻¹⁶) by the NFKB1-mediated apoptosis, so the expression of the downstream human TF ETS1 is also downregulated (P-value = 3.76 × 10⁻⁴). The human genes NAT1 and SNHG5 are positively regulated by ETS1, while the human genes CD46 and RNF41 are negatively regulated by ETS1. The significantly low expression of NAT1 (P-value = 2.93 × 10⁻³⁷) could be due to the low activity of ETS1, inhibition by the viral IE protein Zta, repression by the viral miRNA miR-BART1-3p, and DNA methylation (P-value = 4.23 × 10⁻⁵). The main function of NAT1, an acetyltransferase protein functioning as a xenobiotic-metabolizing enzyme, is to impair xenobiotic substances by acetylation. The significantly low expression of SNHG5 (P-value = 1.32 × 10⁻¹³) could likewise be due to the low activity of ETS1, repression by the human miRNA miR4465, and DNA methylation. SNHG5, a long noncoding RNA, suppresses infected cell proliferation, migration, and invasion; it prevents the translocation of cell growth factors from the cytoplasm into the nucleus and interrupts viral translocation from the nucleus to the cytoplasm, so it functions as a translocation breakpoint in B cell lymphoma [910,911]. The gene of the human receptor CD46, as mentioned previously, is downregulated by the impairment of its ligand, NFKB1, and by DNA methylation (P-value = 3.89 × 10⁻⁵), so its operation on autophagy in immunity is reduced. Thus the human cell would exploit the functions of NAT1, SNHG5, and CD46 to defeat EBV during the lytic reactivation, but EBV successfully evades these attacks. In addition, the remarkably high expression of RNF41 (P-value = 8.21 × 10⁻⁸) is due to the weak transcriptional inhibition by ETS1 and the weak repression by miR98 (P-value = 4.84 × 10⁻¹⁹). RNF41 belongs to the family of RING finger-containing proteins and has been reported to be involved in TLR-mediated responses, growth regulation, and inflammatory responses by promoting the ubiquitination of target proteins [912,913]. Thus RNF41 could degrade several proteins in EBV-infected cells by ubiquitination. However, RNF41 itself would be degraded by ubiquitination when it functions as a receptor in the autophagy pathway blocked by EBV. In addition, by interacting with CD46, EBV successfully evades the immune attack mediated by NFKB1. The cell growth signal then induces the viral immediate-early protein Zta for the transcriptional activation of early lytic genes via the viral protein EBNA2, which appears to be more efficient in upregulating the genes involved in infected cell proliferation and survival [914]. Therefore the high activity of EBNA2 (P-value = 2.01 × 10⁻²⁴) enhances infected cell proliferation.
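The reasoning about ETS1 and its four targets can be made explicit with a small sign calculation: if a TF changes in one direction, a positively regulated target is expected to follow it and a negatively regulated target to move the opposite way. The following minimal sketch encodes the regulations and the observed expression changes described for Fig. 18.8A; the variable and function names are illustrative assumptions, and the model deliberately ignores miRNA repression, DNA methylation, and ligand effects.

```python
# Signed ETS1 -> target regulations described for Fig. 18.8A:
# +1 means positive regulation, -1 means negative regulation.
ETS1_TARGETS = {"NAT1": +1, "SNHG5": +1, "CD46": -1, "RNF41": -1}

# Observed direction of expression change reported in the text:
# -1 means low expression, +1 means high expression.
OBSERVED = {"NAT1": -1, "SNHG5": -1, "CD46": -1, "RNF41": +1}


def expected_change(tf_change: int, edge_sign: int) -> int:
    """Direction a target inherits from its TF: product of the two signs."""
    return tf_change * edge_sign


ETS1_CHANGE = -1  # ETS1 is downregulated at the first infection stage

predicted = {
    gene: expected_change(ETS1_CHANGE, sign)
    for gene, sign in ETS1_TARGETS.items()
}

# Targets whose observed change is not explained by ETS1 alone.
unexplained = [gene for gene in predicted if predicted[gene] != OBSERVED[gene]]
print(unexplained)
```

Only CD46 deviates from the ETS1-only prediction, which is consistent with the text attributing its low expression to the impairment of its ligand NFKB1 and to DNA methylation rather than to ETS1.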
EBNA2 is repressed by the viral microRNA miR-BART5, which collaborates with EBNA2 to control the
transcriptional regulation of lytic replicate genes about the infected cell proliferation. On the other hand, the immunity signal induces the immune-evasive strategy of EBV. EBV develops the resistance to apoptosis by counteracting the proapoptotic function of p53 with viral miRNA, miR-BART5 [915]. EBV-miR-BART5 functions as a role of antiapoptosis, while the high expression of EBV-BZLF1 (P-value 5 .023) has the ability to reduce the inflammatory responses, which will result in the operation of innate immune responses. EBV-miR-BART5 and EBV-BZLF1 are silenced by MIR346 and MIR1233-1, respectively, but the low expression of MIR346 and MIR1233-1 demonstrates that B cell still needs other proteins to inhibit the progression of the lytic replication cycle. The human genes (CLOCK and NAT1) are transcriptionally inhibited by EBV IE protein, Zta, and repressed by viral miRNA, miR-BART14 and miR-BART1-3p, respectively. CLOCK can result in the energy metabolism to induce the progression of apoptosis and autophagy, so the low activity of CLOCK cannot destroy viral proteins. A brief overview of Fig. 18.8A is given as follows: at the first infection stage of EBV infection, the human proteins (CLOCK, NAT1, SNHG5, CD46, and RNF41) are inhibited by epigenetic modifications or EBV proteins, or miRNAs or the low activity of TF, so that human is unable to make advantage of their operations to defeat EBV, while EBV develops the defense mechanism to antagonize the attacks through the responses of antiinflammation and antiapoptosis, and then this thereby enhances the ability of the infected cell proliferation via EBNA2 of EBV. In Fig. 18.8B, RELA is the subunit of NF-κB, related to many biological processes such as inflammation, immunity, differentiation, cell growth, tumorigenesis, and apoptosis. Its heterodimeric complex would induce ERBB3 to activate the pathways, which lead to cell apoptosis. 
Accordingly, the human receptor ERBB3 can transmit the signals from RELA via PNMA5 to the human TFs ZEB1 and MYC, which are able to induce proapoptosis, trigger the apoptotic process by releasing mitochondrial products, and inhibit viral antiapoptosis. On the contrary, the human TFs MYC and ZEB1 are subjected to proteolysis by the ubiquitin proteins MUL1 and UBE2E1, respectively, via ubiquitination by ubiquitin-proteasome pathway-related proteins, which decreases the transcriptional regulation of their target genes. Thus, at the first infection stage, miR185 and MYC, with low activity, are unable to drive proapoptotic development. The human gene MYC is also subject to the control of DNA methylation (P-value = 7.95 × 10^-5). In addition, the low activity of MYC at the first infection stage contributes to the low expression of SLC25A6, which also suffers from repression by human miR1244-1, transcriptional silencing by DNA methylation (P-value = 1.14 × 10^-2), and acetylation by the acetyltransferase protein KAT5. As a result, SLC25A6 cannot trigger apoptosis through the release of mitochondrial products, which indirectly suppresses the progression of proapoptosis. In addition, ZEB1, with low expression (P-value = 1.72 × 10^-13), is unable to transcriptionally inhibit EBV-miR-BART1-3p. This gives rise to a high expression of EBV-miR-BART1-3p, which promotes antiapoptosis against the innate immune responses and proapoptosis and successfully represses the expression of its target gene, NAT1. This prevents EBV from acetylation by NAT1 during the lytic replication. CHEK1 is required for the checkpoint that mediates cell cycle arrest in response to DNA damage or the presence of unreplicated DNA. CHEK1 promotes the activation of TP53, cell cycle arrest, and the suppression of infected cell proliferation. SLC25A6 is ubiquitously expressed in all tissues and is involved in the regulation of cell
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
18. Constructing host/pathogen genetic-and-epigenetic networks
viability and apoptosis triggering. CARD9 plays an important regulatory role in cell apoptosis and in the innate immune response to a number of intracellular viruses. Therefore the human proteins (CHEK1, SLC25A6, CARD9, and NAT1) form a signal transduction pathway that mediates cell apoptosis to promote the death of EBV-infected cells. However, because NAT1 is repressed by EBV miR-BART1-3p, the inactive acetylation by NAT1 results in CHEK1 and SLC25A6 being acetylated by the acetyltransferase proteins MGAT4B and KAT5, respectively, while the ubiquitin protein MIB2 ubiquitinates CARD9. These epigenetic modifications might cause the following three effects: CHEK1 cannot suppress the expression of viral proteins or the proliferation of infected cells; SLC25A6 cannot translocate ADP into mitochondria and ATP into the cytoplasm, which decreases the release of the mitochondrial products that trigger apoptosis; and CARD9 is unable to perform its immune functions to defeat EBV. These cellular conditions indicate that EBV miR-BART1-3p mediates the apoptotic dysfunction of human B cells for the purpose of viral survival. These findings suggest that EBV relies on its miRNAs to avoid apoptosis at the first stage of EBV infection and further blocks the human proapoptotic pathway through changes in ubiquitination and acetylation. There is growing evidence supporting the proviral role of the caspase/apoptotic pathway in viral replication [11,788,904].
Autophagy is a catabolic pathway that helps cells degrade and recycle nutrients under stress conditions to promote cell survival. Although autophagy generally functions as a defense mechanism against viral infection, many viruses have learned how to manipulate the autophagic pathway for their own benefit.
EBV exploits the autophagy mechanism for its intracellular transportation in order to enhance virion production, both by blocking autophagy at the final degradative step during the EBV lytic replication and by hijacking the autophagic vesicles. As shown in Fig. 18.8C, JAK2 mediates essential signaling events in both innate and adaptive immunity. FOS has been implicated as a regulator of cell differentiation, transformation, and proliferation; the expression of its gene has also been associated with apoptotic cell death and with a critical function in regulating the development of cells that form and maintain the cytoskeleton. CLOCK regulates circadian rhythms. CLOCK also controls a variety of physiological processes that are translated into rhythms in metabolism and behavior, including metabolism, blood pressure, endocrine, and immune function. In addition, RNF41 acts as an E3 ubiquitin ligase protein and regulates the degradation of target proteins. RNF41 plays a role in type I cytokine receptor signaling by controlling the balance between JAK2-associated cytokine receptor degradation and ectodomain shedding. GRB2 may elicit active programmed cell death by suppressing the proliferation signals of infected cells. ATG5 mediates the lipidation of LC3 (LC3-II), which is a marker of autophagy, and is important for the formation or elongation of autophagosomes. ATG5 is involved in several cellular processes, including lymphocyte development and proliferation, MHC II antigen presentation, autophagic vesicle formation, and apoptosis. On the one hand, the human receptors (RNF41, GRB2, and ATG5) interact with each other and relay immune or apoptotic signals from the ligands (JAK2, FOS, and CLOCK, respectively) to induce SLC25A6 via the TFs ESR1 and MYC, and SLC25A6 can then trigger apoptosis by releasing mitochondrial products to defeat the invasion of EBV (shown in
Fig. 18.8C). Besides, the human protein ATG5 is reported to participate in the initiation of autophagy, which enhances autophagic vesicle formation. Another autophagy-related protein, RAB7A, is essential for an intact autophagy mechanism, is necessary for the fusion between autophagosomes and lysosomes, and is involved in lysosome biogenesis. RPS14 catalyzes protein synthesis related to autophagosome formation, and RBPMS functions as a coactivator of transcriptional activity. Moreover, BECN1, which is essential for autophagy, mediates the nucleation and maturation of autophagosomes. On the other hand, the host exploits ligands (JAK2, FOS, and CLOCK) to transmit autophagy signals through receptors to induce autophagy-related proteins (ATG5, RAB7A, and BECN1), which operate autophagy as one of the offensive mechanisms against EBV infection during the lytic replication. However, EBV-miR-BART1-3p suppresses xenobiotic acetylation by repressing the human gene NAT1. With NAT1-mediated acetylation inactive, SLC25A6 and the autophagy-related RAB7A are acetylated by the acetyltransferase proteins KAT5 and CSGALNACT1, respectively. In addition, the human proteins RNF41, GRB2, and MYC are degraded through ubiquitination upon binding of the ubiquitin-proteasome pathway-related proteins UBE2K, USP46, and MUL1, respectively, so the low expression of these human proteins could reduce the transmission of apoptotic and autophagic signals. In addition, in Fig. 18.8C, EBV miR-BART14 represses the gene of the ligand CLOCK, which could cause the low expression of CLOCK (P-value = 6.16 × 10^-6); CLOCK then supports less energy metabolism and cannot relay the autophagic signal to the autophagy-related ATG5.
However, the low activity of ATG5 (P-value = 6.75 × 10^-32) is due not only to the low expression of its ligand, CLOCK, but also to transcriptional inhibition by DNA methylation and repression by the highly active human miR24-1 (P-value = 9.67 × 10^-7). This affects the formation or elongation of autophagosomes via ATG5, which could reduce autophagy in immunity. At the first infection stage, SLC25A6 has a low activity owing to acetylation, DNA methylation (P-value = 1.14 × 10^-2), repression by human miR1244-1, and reduced transcriptional regulation via the TFs ESR1 and MYC. This could leave SLC25A6 unable to induce apoptosis by releasing mitochondrial products and could affect the autophagic pathway (RNF41, RPS14, SLC25A6, RBPMS, and BECN1), so that BECN1 cannot successfully mediate the nucleation and maturation of autophagosomes. Besides, BECN1 is repressed by the highly expressed miR301B (P-value = .012) and is subject to DNA methylation, which could likewise reduce the contribution of BECN1 to autophagy in immunity (see Fig. 18.8C). The low expression of the autophagy-related RAB7A (P-value = 4.46 × 10^-8) is due to acetylation and repression by the highly active human miR4659A (P-value = .036). Downregulated RAB7A could further weaken lysosomal degradation by reducing the number of lysosomes and the fusion between autophagosomes and lysosomes, so the reduction of RAB7A could be an underlying mechanism by which the intact autophagy mechanism is blocked, as observed during the EBV lytic replication. In Fig. 18.8C, as EBV blocks the progression of autophagy, the viral protein EBNA3B negatively interacts with the human protein PSME3, which promotes ubiquitination and proteasomal degradation, and this interaction results in the inhibition of apoptosis. We speculated that viral
EBNA3B may hijack the ubiquitination function of RNF41 by negatively interacting with PSME3, so that EBV exploits other ubiquitin proteins to perform ubiquitination at the first and second infection stages in the lytic phase. In addition, EBNA3B positively interacts with the human protein RBPMS, a coactivator of transcriptional activity, so RBPMS helps EBV employ the autophagy-related BECN1. Furthermore, the viral receptor BALF4 is an envelope glycoprotein that can form spikes on the surface of the virion envelope, so BALF4 is essential for attachment to the autophagosome surface. We therefore suggest that BALF4 is involved in the fusion between EBV virions and autophagosome membranes, leading to EBV transportation to B cell membranes for lysis. Among these processes, another viral protein, BDLF4, is important for the EBV lytic replication cycle but is currently uncharacterized. Thus we speculated that BDLF4 collaborates with BALF4 and EBNA3B to block the autophagy mechanism and hijack the autophagic vesicles, thereby enhancing viral production and the intracellular transportation of virions. These results reveal that viral proteins (BALF4, BDLF4, and EBNA3B) and viral miRNAs (miR-BART14 and miR-BART1-3p) block the autophagy mechanism in B cells to interfere with viral antigen presentation and to avoid their own degradation (see Fig. 18.8C). Hence, the EBV-mediated blockade of autophagy can occur at different steps of the autophagy pathway, from the formation of the autophagosome to lysosomal degradation. Therefore understanding the relationship between EBV and autophagy can help identify new approaches to controlling EBV infection and lytic replication through autophagy control.
In Fig. 18.8D, BAX, a proapoptotic member of the BCL2 protein family, would accelerate programmed cell death and apoptosis.
Under stress conditions, BAX undergoes a conformational change that results in its translocation to the mitochondrial membrane and leads to the release of cytochrome C, which then triggers apoptosis. BAX acts as a ligand that binds to the human receptor CHMP5, which is involved in the degradation of surface receptor proteins and the formation of endocytic multivesicular bodies (MVBs), sorts endosomal cargo proteins into MVBs, and sometimes functions in membrane fission, such as the lysis of enveloped viruses. Thus CHMP5 can transport the proapoptotic signals triggered by BAX from outside the B cell membrane into the cytoplasm through endocytic MVBs and endosomal trafficking. Then, CHMP5 signals directly to the TF AR and indirectly, via JUN, which is involved in the TLR pathway, to the TF GATA2; accordingly, CHMP5 induces proapoptosis and the interruption of translocation and inhibits the antiapoptotic function (shown in Fig. 18.8D). However, CHMP5 has a low activity at the first infection stage because it is degraded via ubiquitination by the ubiquitin ligase protein HUWE1. The reduced expression of CHMP5 lowers the activity of the TFs GATA2 and AR, which weakens the transcriptional silencing of the human genes TGFB1I1 and PTK7. These genes are therefore highly expressed (P-value = 5.37 × 10^-3 and P-value = 2.96 × 10^-13, respectively), which could promote the antiapoptosis of EBV-infected B cells and protect EBV from cell death. On the one hand, the low activity of the lncRNA SNHG5 (P-value = 1.32 × 10^-13) is owing to the low expression of the human TF AR, repression by human miR4465, and DNA methylation, and it ultimately weakens the responses that interrupt the translocation of viral genes. On the other hand, the low activity of CHMP5 is because of the ubiquitination,
the reduction of transcriptional regulation by the TF AR, the repression by human miR200B, and DNA methylation. These effects could impair the proapoptotic function and avoid viral release at the first infection stage in the lytic phase, so that the complete viral lytic replication proceeds without disturbance. The results shown in Fig. 18.8D are mainly due to the ubiquitination of CHMP5. EBV mediates another pathway to induce antiapoptosis and to reduce proapoptosis and the interruption of viral translocation. APH1A, an endoprotease that catalyzes the intramembrane cleavage of integral proteins and membrane protein ectodomain proteolysis, binds to the receptor KANK2, which is involved in the control of cytoskeletal formation by regulating actin polymerization and the promotion of cell proliferation, as shown in Fig. 18.8D. KANK2 plays a role in the regulation of caspase-independent apoptosis. However, the highly expressed viral protein BNRF1 (P-value = 5.15 × 10^-4), a tegument protein that suppresses human intrinsic defenses to enhance the activation and transcription of viral early genes, could negatively interact with the receptor KANK2 to evade caspase-independent apoptosis. BNRF1 then interacts with human DAXX, which may regulate apoptosis in the cytoplasm, and thereby disrupts the complex between DAXX and ATRX. Suppressing the DAXX-ATRX-dependent deposition of histone H3.3 on the viral chromatin could allow viral transcription. The low expression of DAXX (P-value = 2.67 × 10^-3) is owing to the disruption by viral BNRF1, deacetylation by the histone deacetylase protein HDAC8, and the positive interaction with CHMP5, which has low activity. On the contrary, DAXX can negatively interact with CCDC136, a coiled-coil protein that can be assembled into a rope of intermediate filaments, an important component of the cytoskeleton that contributes to B cell proliferation.
Nevertheless, the highly expressed CCDC136 (P-value = 3.64 × 10^-60) negatively interacts with the human TF AR, and this interaction inactivates AR, which causes the results mentioned previously for Fig. 18.8D. In addition, LMP1 can activate the NF-κB signaling pathway and perform the antiapoptotic function [916] through the pathway composed of TRAF3 and CCDC136 to reduce the activity of the TF AR. This enables the downstream target gene PTK7 to operate antiapoptosis. The host is originally able to exert a proapoptotic influence on EBV proteins through CHMP5, which could receive the BAX proapoptotic signal by endocytosis, but the degradation of CHMP5 by ubiquitination and the antiapoptotic pathway mediated by the viral protein BNRF1 may inactivate the human TFs GATA2 and AR. Consequently, proapoptosis and the interruption of viral translocation are impaired, which promotes the development of antiapoptosis and the complete progression of lytic replication at the first infection stage during the EBV infection.
As shown in Fig. 18.8E, RAC3, a GTPase that belongs to the RAS superfamily of small GTP-binding proteins, can regulate a wide variety of processes, including the control of cell growth, cytoskeletal reorganization, the activation of protein kinases (PKs), differentiation, movement, and lipid vesicle transport. PSAP is a mitochondrial proapoptotic protein that forms a complex with BAX when apoptosis is induced [917]. RAC3 can transmit signals, including cell growth factor signals and the activation of PKs, to PTK7, which plays a role in antiapoptosis and is also the receptor for the viral protein BHRF1. PSAP, a proapoptotic protein acting as a ligand, could bind to the receptor TGFB1I1, which plays an antiapoptotic role against the proapoptotic signal from PSAP, so PSAP with low expression (P-value = 2.73 × 10^-7) could
negatively interact with the highly expressed TGFB1I1 (P-value = 5.37 × 10^-3). Thus the proapoptotic signal can induce the antiapoptotic function of TGFB1I1, which is also the receptor for the viral proteins BFLF2 and BNLF2A (shown in Fig. 18.8E). The cell growth signal from the receptor PTK7 is first received by the viral protein BHRF1, which can prevent the premature death of the human cell during virus production. Then, the viral protein BFRF3 could participate in the assembly of infectious particles by decorating the outer surface of the capsid shell, thus forming a layer between the capsid and the tegument. BFRF3 could interact with BCLF1 [906], which self-assembles to form an icosahedral capsid. The high expression of BCLF1 (P-value = 5.69 × 10^-4) could promote the protection of the viral genome by the capsid. The antiapoptotic signal could induce the viral protein BNLF2A to evade HLA class I-restricted T cell immunity and to prevent peptide transport mediated by the transporter associated with antigen processing (TAP) and the subsequent peptide loading. Viral BNLF2A would then induce the active SCO2, a copper chaperone that transports copper to the Cu site on cytochrome C oxidase subunit II (COX2), which helps the inner mitochondrial membrane carry out the necessary aerobic ATP production. Although SCO2 may trigger proapoptosis, it is degraded via ubiquitination by the ubiquitin ligase protein G2E3. Thus the proapoptotic function of SCO2 is inhibited, so EBV could again escape from the immune response and apoptosis. The viral protein BDLF4, mentioned previously for Fig. 18.8C, is important for the EBV lytic replication cycle. We speculate that BDLF4 collaborates with BALF4 and EBNA3B to block the autophagy mechanism and hijack the autophagic vesicles, thereby promoting viral production and the intracellular transportation of virions.
The viral receptor BKRF2 is required for the fusion between viral and plasma membranes that leads to EBV entry into the human B cell. Membrane fusion is mediated by the fusion machinery composed of gB (also called BALF4) and the heterodimer gH (also called BXLF2)/gL (also called BKRF2); BKRF2 may also be involved in the fusion between the virion envelope and the outer nuclear membrane during virion morphogenesis. Viral BKRF2 also interacts with viral BNLF2A, which encodes an inhibitor of TAP that aids immune evasion. In addition, viral BKRF2 interacts with the viral membrane protein LMP1 [906], which can act as a functional homolog of CD40 to prevent the apoptosis of infected B lymphocytes and drive their proliferation. LMP1 signaling leads to the upregulation of antiapoptotic proteins and provides cell growth signals in the infected cells. LMP1 can help viral EBNA3B with immune evasion and hijack the autophagy mechanism via viral BLLF2, an uncharacterized viral protein. Besides, the host receptor TGFB1I1 interacts with viral BFLF2 and plays a fundamental role in virion nuclear egress. NEC1 (also called BFLF2) interacts with the newly formed capsid within the human nucleus via the vertexes and directs it to the inner nuclear membrane through NEC2 (also called BFRF1). It could then induce the budding of the capsid at the inner nuclear membrane and its envelopment into the perinuclear space. The NEC1/NEC2 complex could promote the fusion between the enveloped capsid and the outer nuclear membrane and subsequently release the viral capsid into the cytoplasm, where it will bind the secondary budding sites in the human Golgi network. Therefore the antiapoptotic signal from human TGFB1I1 could promote the nuclear egress of EBV virions.
After that, another viral receptor, BBRF3, an envelope glycoprotein, is crucial for virion assembly and egress. BBRF3 is involved in capsid budding through the inner nuclear membrane during egress and participates in penetration through the plasma membrane during the lytic infection. BDRF1 receives the egress signal from its viral receptor, BBRF3. BDRF1 is a component of the molecular motor that translocates the viral genomic DNA into an empty capsid during DNA packaging. BDRF1 could form a tripartite terminase complex together with TRM1 (also called BALF3) and TRM2 in the human cytoplasm. Once the complex reaches the human nucleus, it interacts with the capsid portal vertex. This portal forms a ring through which genomic DNA is translocated into the capsid. BDRF1 carries an RNase activity that plays an important role in the cleavage of concatemeric viral DNA into unit-length genomes. The human ligand FOXP1 could transcriptionally repress various proapoptotic genes and collaborate with NF-κB signaling to promote B cell expansion by suppressing caspase-dependent apoptosis. The human receptor MAPK7, a member of the MAP kinase family, is involved in a wide variety of cellular processes such as cell proliferation, differentiation, transcription regulation, and cell survival. The ligand FOXP1 transmits the antiapoptotic signal to its receptor, MAPK7, and MAPK7, acting as a viral receptor, could promote signal transmission in the downstream signaling processes. Viral BDLF1 is a structural component of the icosahedral capsid. The capsid is composed of pentamers and hexamers of the major capsid protein (MCP), which are linked together by heterotrimers called triplexes. These triplexes consist of a single molecule of triplex protein 1 (also called BORF1) and two copies of triplex protein 2 (also called BDLF1). In addition, BORF1 is required for the efficient transportation of BDLF1 to the nucleus, which is the site of capsid assembly.
Thus the signal could promote the highly expressed BDLF1 (P-value = 9.65 × 10^-4) to exert its structural molecule activity, and BDLF1 could also positively interact with BORF1 to carry out viral capsid assembly, with BORF1 transporting BDLF1 to the nucleus. Then, BORF1 interacts with the highly expressed viral BALF3 (P-value = .0486), a component of the molecular motor. BALF3 (also called TRM1) could function with BDRF1 (also called TRM3) and TRM2, so that they collaboratively translocate EBV genomic DNA into the empty capsid during DNA packaging. BALF3 carries an endonuclease activity that plays an essential role in the cleavage of concatemeric viral DNA into unit-length genomes. As shown in Fig. 18.8E, the viral proteins BALF3, BDRF1, and BFLF2 together interact with another viral membrane protein, LMP2B. LMP2B downregulates the ability of LMP2A to block B cell activation. It is possible that LMP2B works in cooperation with LMP1 via viral BLLF2. The highly expressed LMP2B (P-value = .0142) downregulates the LMP2A-mediated interruption of B cell signaling, while LMP1 activates the B cell through the NF-κB, AP-1, and JAK/STAT pathways. Therefore EBV can exploit LMP2B and LMP1, which collaboratively interact with BLLF2 to work in concert with EBNA3B [906], to support the immune-evasive transport of complete virions via the autophagic vesicles hijacked by EBV. In addition, BLLF2, with high activity (P-value = 1.89 × 10^-3), positively interacts with the viral receptor BLLF1, which initiates virion attachment to the human B cell. This attachment triggers the fusion between the virion and the human membrane for the invasion of the human cell, but BLLF1 is degraded via ubiquitination by the ubiquitin ligase protein SIAH1,
so that BLLF1 can protect the progression of viral replication from the aggression and interruption of uninfected human B cells. Besides, viral BLLF1 negatively interacts with the human ubiquitin ligase protein SIAH1, which can decrease SIAH1-mediated ubiquitination and degradation and give SIAH1 a low activity (P-value = 8.42 × 10^-4). SIAH1 negatively interacts with the human histone deacetylase HDAC8, so the highly expressed HDAC8 (P-value = 6.39 × 10^-14) can inactivate the degradation function and keep viral proteins from being degraded. The human protein RBPMS, which is highly expressed (P-value = 2.35 × 10^-11) and interacts with the viral proteins EBNA3B, BALF4, and BDLF4, can act as a latent-lytic switch in EBV by negatively interacting with the human TF STAT3. STAT3 can transcriptionally activate cellular PCBP2, and PCBP2 is then able to repress the expression of EBV lytic genes [918]. Consequently, EBV not only utilizes viral proteins to control RBPMS and thereby manipulate the activity of STAT3 but also employs viral miR-BART1-3p to repress the expression of human NAT1, which performs acetylation, so that the reduction of acetylation via NAT1 can inactivate STAT3. As a result, owing to the negative regulation by human RBPMS, acetylation by the acetyltransferase LFNG, and DNA methylation (P-value = 8.62 × 10^-3), STAT3, with low activity (P-value = 1.35 × 10^-23), is less able to maintain latency, which contributes to the complete progression of the EBV lytic replication cycle. The main message of Fig. 18.8E is that EBV can reduce autophagy in immunity and silence human STAT3 through viral proteins and viral miRNA to maintain the complete replication of infectious virions, including antiapoptosis against immune responses, the cleavage of viral DNA into unit-length genomes, viral DNA packaging, capsid and tegument assembly, and the transportation of virions via autophagic vesicles.
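The inhibitory relations described for Fig. 18.8 can be easier to follow when collected into a small edge list. The following is only an illustrative sketch, not the network identification method used in this chapter: the `EDGES` list transcribes a handful of the regulator-target pairs stated in the text above, and the helper names `inhibitors_by_target` and `targets_of_mechanism` are hypothetical.

```python
from collections import defaultdict

# (regulator, target, mechanism) triples transcribed from the text on Fig. 18.8.
# This is a partial, hand-copied subset for illustration only.
EDGES = [
    ("EBV-miR-BART1-3p", "NAT1", "miRNA repression"),
    ("EBV-miR-BART14", "CLOCK", "miRNA repression"),
    ("miR1244-1", "SLC25A6", "miRNA repression"),
    ("miR24-1", "ATG5", "miRNA repression"),
    ("MUL1", "MYC", "ubiquitination"),
    ("UBE2E1", "ZEB1", "ubiquitination"),
    ("HUWE1", "CHMP5", "ubiquitination"),
    ("G2E3", "SCO2", "ubiquitination"),
    ("MIB2", "CARD9", "ubiquitination"),
    ("KAT5", "SLC25A6", "acetylation"),
    ("MGAT4B", "CHEK1", "acetylation"),
]


def inhibitors_by_target(edges):
    """Group (regulator, mechanism) pairs under each target they act on."""
    grouped = defaultdict(list)
    for regulator, target, mechanism in edges:
        grouped[target].append((regulator, mechanism))
    return dict(grouped)


def targets_of_mechanism(edges, mechanism):
    """Return the sorted, de-duplicated targets hit by one mechanism."""
    return sorted({target for _, target, m in edges if m == mechanism})


if __name__ == "__main__":
    # Host factors reported as ubiquitinated in the discussion of Fig. 18.8.
    print(targets_of_mechanism(EDGES, "ubiquitination"))
    # All listed inputs converging on SLC25A6.
    print(inhibitors_by_target(EDGES)["SLC25A6"])
```

Grouping by target makes it visible that a single host factor such as SLC25A6 is described as receiving several independent inhibitory inputs, which is the pattern the text emphasizes for the first infection stage.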
18.6 The transportation process of viral particles through host-virus cross-talk interactions at the second infection stage
The purpose of this section is to gain insight into the pathogenic mechanism of the human B cell infected with EBV at the second infection stage in the lytic phase (Fig. 18.9), which is split into three parts in Fig. 18.10A-C to help investigate the transportation of virions and the lysis of viral particles for further detailed analysis of cellular functions. In Fig. 18.10A, CCL19, a member of a family of secreted proteins, is involved in immunoregulatory and inflammatory processes. CCL19 plays a role in normal lymphocyte recirculation and homing, in the trafficking of T cells in the thymus, and in T cell and B cell migration to the secondary lymphoid organs. CCL19 transmits the immunoregulatory signal to TFEB via the human receptor ESR2. TFEB activates the expression of CD40L in T cells, thereby participating in T cell-dependent antibody responses in activated CD4(+) T cells. TFEB, with high activity (P-value = 1.34 × 10^-9), can activate the expression of many lysosomal genes, so it is able to positively regulate autophagy and proapoptosis against the EBV lytic infection. On the one hand, TFEB is able to directly induce FAM98B via the human TF NFATC2 to suppress the function of tRNA splicing, but human NFATC2 is degraded via ubiquitination by the ubiquitin ligase protein SIAH1, which leads to
FIGURE 18.10 The signaling pathways of the interspecies molecular mechanisms based on the HVCP in Fig. 18.9 at the second infection stage during the EBV infection. (A) The core pathways of the enhancement of antiapoptosis, immunosuppression, and genetic diversity pathways by EBV for packaging, assembly, and transport of viral particles; (B) the promotion of virion production, vesicle trafficking, releasing, and antiapoptosis pathways as a result of EBNA1-mediated PML disruption; (C) the maintenance of virion transportation by decreasing the repression of lytic cycle and increasing the operation of antiapoptosis [14]. EBV, Epstein-Barr virus; HVCP, host-virus core pathway; PML, promyelocytic leukemia. Figure legend: node types are ligands, membrane proteins, cytosolic proteins, TFs, ncRNAs, and genes; edge types are PPIs, translocations, positive regulations in GRN, negative regulations in GRN, miRNA repressions, acetylation, ubiquitination, and DNA methylation.
the low expression of NFATC2 (P-value = .002) and thereby decreases the transcriptional inhibition of FAM98B. Thus FAM98B enhances tRNA splicing to help the translation of some viral late lytic genes and to increase the genetic diversity related to packaging, assembly, and transportation. On the other hand, TFEB indirectly interacts with the human TF SPIB via HNRNPU, which is associated with pre-mRNA processing in the nucleus. HNRNPU affects pre-mRNA metabolism and transport to trigger the apoptotic process, but the low activity of HNRNPU (P-value = 7.73 × 10^-63) is due to ubiquitination by the ubiquitin ligase protein WWP1, which can decrease its apoptotic function and the transcriptional regulation of the TF SPIB. The low activity of SPIB (P-value = 3.91 × 10^-24) can reduce the transcriptional inhibition of FAM98B, which can also allow FAM98B to counteract the repression by human miR5586 and to enhance tRNA splicing, increasing the viral and human genetic diversity related to packaging, assembly, and transport and thereby helping virion production in the EBV infection. The lowly expressed SPIB could also decrease the transcriptional inhibition of the EBV protein BCRF1 and viral miR-BART1-3p. Viral BCRF1, with high expression (P-value = 1.97 × 10^-5), could encode the viral homolog of IL-10 (vIL-10) to suppress the immune responses in the EBV lytic phase [919], while viral miR-BART1-3p operates the antiapoptotic function against the innate immune responses and proapoptotic signals. Viral miR-BART1-3p would reduce the repression of human PRKACB, which regulates various cellular processes such as cell proliferation, microtubule dynamics, envelope disassembly and reassembly, intracellular transport mechanisms, and ion flux.
Viral miR-BART1-3p allows PRKACB, with high expression (P-value = 4.08 × 10⁻⁴), to carry out its normal functions of envelope assembly and intracellular transport and to promote viral progeny production and subsequent lysis. GABRG1, which belongs to the ligand-gated ion channel family, regulates ion channel activity to transmit the antiapoptotic signal to its receptor, TNFRSF10D, which is also a receptor for viral BNLF2B. TNFRSF10D, a member of the TNF-receptor superfamily, contains a truncated cytoplasmic death domain and hence cannot induce apoptosis; instead it acts as an inhibitor that protects EBV proteins against TRAIL-mediated apoptosis. TNFRSF10D interacts with the viral protein BNLF2B, which displays high similarity to BCRF1 of EBV and may, like BCRF1, mediate immune evasion to protect EBV [920]. Viral BNLF2B positively interacts with human PRKACB, and viral miR-BART1-3p reduces the repression of human PRKACB, so EBV could exploit PRKACB to promote the envelope assembly and intracellular transport of virions. Thus viral BNLF2B could transmit an antiapoptotic signal to promote viral particle production and to affect the transcriptional activity of the human TF TP73 through the human proteins HSD17B4, GLDN, and TUBGCP2. BAX is upregulated by the transcriptional regulation of the TF TP73, but BAX is also subject to repression by human miR296, degradation via ubiquitination by the ubiquitin-conjugating enzyme UBE2C, and DNA methylation (P-value = 1.93 × 10⁻³), and these factors could reduce the proapoptotic function of human BAX. Another EBV membrane protein, LMP1, functions as a CD40 functional homolog to prevent the apoptosis of infected B cells and drive their proliferation. LMP1 signaling leads to the upregulation of antiapoptotic proteins and provides cell growth signals in the infected cells.
LMP1 negatively interacts with human IKBKB, which phosphorylates the inhibitor in the inhibitor-NF-κB complex and can thereby cause the dissociation of the inhibitor and
the activation of NF-κB. IKBKB is activated by several stimuli such as inflammatory cytokines, viral products, DNA damage, and other cellular stresses. IKBKB can modify the target protein through the polyubiquitination of the inhibitors and their subsequent degradation by the proteasome. However, the proapoptotic function of human IKBKB is suppressed as a result of the negative interaction with LMP1 and the inactivation via acetylation by the acetyltransferase GALNT7, so the expression of IKBKB is low (P-value = 3.54 × 10⁻²⁰). This directly affects the operation of the human protein BAX, a member of the BCL2 protein family that acts as a proapoptotic regulator. Under stress conditions, BAX undergoes a conformational change that results in its translocation to the mitochondrial membrane and subsequently leads to the release of cytochrome C, which then triggers the proapoptotic function. However, because of the positive interaction with the low-expressed IKBKB, the transcriptional regulation by TP73, the repression by human miR296, the ubiquitination by the ubiquitin-conjugating enzyme UBE2C, and the DNA methylation (P-value = 1.93 × 10⁻³), proapoptotic BAX is inhibited by EBV proteins (BNLF2B and LMP1), so that the human cell additionally loses one of the offensive mechanisms that can antagonize the EBV lytic infection. Fig. 18.10A indicates that EBV not only evades immunity through the immunosuppression of viral BCRF1 and inhibits proapoptosis through viral BNLF2B and LMP1, but also exploits the antiapoptotic function of viral miR-BART1-3p to promote the propagation of EBV progeny. In Fig. 18.10B, the human receptor PML performs a DNA repair function via the DNA damage response signal, so that PML can rapidly promote apoptosis to defeat the invasion of EBV.
PML forms specific nuclear structures known as PML nuclear bodies (NBs), which are considered nuclear organizing centers. The formation of these bodies is necessary to mediate the functions of PML, which include significant roles in cell cycle progression, transcriptional regulation, activation of p53, DNA damage response or repair, apoptosis, and tumor suppression. In addition to these cellular functions, PML NBs participate in the interferon-mediated antiviral response and the innate immune response to inhibit viral lytic replication and transcription, and the upregulation of PML is considered a cellular mechanism for maintaining the latent EBV infection [921]. However, EBV-encoded proteins may disrupt PML NBs in order to promote the survival of cells with DNA damage by suppressing apoptosis. Viral EBNA1 is expressed in both the latent and lytic phases of infection. EBNA1 is necessary for the stable persistence of EBV episomes in proliferating cells and is the only EBV protein that is expressed in all EBV-associated tumors. Highly expressed EBNA1 (P-value = 8.81 × 10⁻³), which is not repressed by human miR127 in this system, impairs DNA repair, decreases the activation of p53 and apoptosis in response to DNA damage, and increases cell survival by inducing the silencing of PML proteins, so EBNA1 is able to protect virion production from apoptosis and promote the lytic infection. An interaction between viral EBNA1 and the human kinase CK2 is important for EBNA1 to disrupt PML NBs and degrade PML. EBNA1 increases the association of CK2 with PML and thus increases the ability of CK2 to phosphorylate PML. This phosphorylation is a modification that is well known to trigger the polyubiquitination and degradation of PML. Therefore PML disruption and silencing by EBNA1 constitute a defensive mechanism by which EBV may contribute to the progression of EBV-associated cancer [922].
The disrupted PML could positively interact with NUP155, a nucleoporin that plays an important role in the assembly and functioning of the nuclear pore complex (NPC), which regulates the movement of molecules across the nuclear envelope (NE). NUP155 participates in the formation of the double-membrane NE. However, NUP155 positively interacts with the disrupted human PML and is degraded via ubiquitination by the ubiquitin protein USP3, leading to the low expression of NUP155 (P-value = 9.79 × 10⁻¹²). This affects the transporter activity, the structural constituents of the nuclear pore, and the binding and translocation of proteins during nucleocytoplasmic transport, so the DNA damage signal cannot be successfully transported from the nucleus into the cytoplasm to induce DNA repair and trigger proapoptosis to block EBV. NUP155 negatively interacts with ABCA12, so the low-expressed NUP155 results in the high expression of ABCA12 (P-value = 5.57 × 10⁻⁵), a member of the superfamily of ATP-binding cassette (ABC) transporters, which can transport various molecules across extra- and intracellular membranes. The transporter ABCA12 then interacts with HOOK1, a member of the hook family of coiled-coil proteins, which bind to microtubules and organelles. HOOK1 links endocytic membrane trafficking to the microtubule cytoskeleton, and its complex can promote vesicle trafficking. HOOK1 interacts, through RASSF10, with TRIM3, a member of the cytoskeleton-associated recycling or transport (CART) complex, so that the two collaboratively mediate vesicular trafficking via TRIM3's association with the CART complex and cooperatively promote virion transport.
TCTN1 is a member of a family of secreted and transmembrane proteins that act as a barrier preventing the diffusion of transmembrane proteins, but TCTN1 is inactivated via acetylation by the acetyltransferase NAA16. TCTN1 therefore cannot control the diffusion of transmembrane proteins, and EBV hijacks this function to help virion transport. Low-expressed TCTN1 (P-value = 1.19 × 10⁻¹⁴) reduces the signal transmitted to the human TF YBX1 via TRIM46, and the resulting low expression of YBX1 (P-value = 1.8 × 10⁻¹⁹) decreases the transcriptional regulation of the human target gene ARRB2. The proapoptotic function of ARRB2 is inhibited because of the decreased transcriptional regulation by YBX1, the ubiquitination by the ubiquitin protein HERPUD1, and the DNA methylation (P-value = 3.05 × 10⁻²). ARRB2, translocated to the plasma membrane as a human receptor, receives signaling from the ligand CCL23, which can participate in immunoregulatory and inflammatory processes. CCL23 transmits signaling in response to inflammation to induce ARRB2 to perform its proapoptotic function, but the activity of ARRB2 is suppressed. In turn, ARRB2 induces the operation of human CARD8, which belongs to the caspase recruitment domain (CARD)-containing family of proteins. CARD8 can activate the expression of caspases or the NF-κB pathway and may be a component of the inflammasome, a protein complex that can participate in the activation of proinflammatory caspases. However, ARRB2 positively interacts with CARD8, so CARD8 is inactivated with low expression (P-value = 7.67 × 10⁻¹⁹), which contributes to the inactivation of proinflammatory responses and thereby promotes viral immune evasion. The inactivated CARD8 could lead to an active human FAM98B. FAM98B can enhance tRNA splicing to help the production of viral particles in the EBV lytic phase.
FAM98B drives the human TF CTCFL, with high expression (P-value = 1.49 × 10⁻⁵⁴), to transcriptionally inhibit human miR130B. Human miR130B promotes cell growth and self-renewal, and its normal
expression can increase cell viability, reducing cell death and decreasing the expression of apoptosis-related proteins [923]. Nevertheless, although the inhibition of miR130B may reduce cell viability and increase cell death and the expression of apoptosis-related proteins, it also relieves the repression of TRIM3 by miR130B. This results in the high expression of TRIM3 (P-value = 7.92 × 10⁻⁴⁰), which helps EBV transport virions. In summary, Fig. 18.10B indicates that the EBNA1-mediated silencing and disruption of PML are responsible for the successful progression of the EBV lytic cycle, which could promote virion production, vesicle trafficking, intracellular transport, and antiapoptosis during the lytic infection. In Fig. 18.10C, EIF2AK2 is a serine/threonine PK that is activated by autophosphorylation. EIF2AK2 plays an essential role in the innate immune response to viral infection, signal transduction regulation, apoptosis, and cell proliferation, so that EIF2AK2 can exert its antiviral activity on a wide range of DNA and RNA viruses, including EBV. EIF2AK2 binds to the human receptor IL10RA, a receptor for IL-10 that is structurally related to interferon receptors. IL10RA has been shown to mediate the immunosuppressive signal of IL-10 and thereby suppress the biosynthesis of proinflammatory cytokines. Hence, EIF2AK2 binds as a ligand to IL10RA, which is also a receptor of viral BCRF1, which can induce antiinflammatory factors that keep EBV proteins from being subjected to the apoptotic attack of the immune system. BCRF1, a late gene during the lytic phase, encodes viral IL-10 (vIL-10), which is a viral homolog of human IL-10 (hIL-10). vIL-10 is expressed not only in the early phase but also in the late phase of lytic virus production when human B cells are infected with EBV. BCRF1 performs cellular functions similar to those of hIL-10, which generally participates in immunosuppression.
BCRF1 can suppress the cytokine synthesis of interleukins and interferon and weaken the human natural killer cell and cytotoxic T cell (CTL) responses so that EBV can subsequently establish a latent infection. Thus viral BCRF1, with high expression (P-value = 1.97 × 10⁻⁵), is able to mimic the activity of IL10RA, indicating that BCRF1 may have a role in the interaction between EBV and the human immune system [924,925]. BCRF1 interacts with viral BPLF1, a large tegument protein that plays numerous roles in the EBV viral cycle. During EBV lysis, BPLF1, with high activity (P-value = 3.5 × 10⁻¹⁰¹), remains associated with the capsid while most of the tegument is detached, and it has a role in capsid transport toward the human membrane. BPLF1 would interact with BALF3 and then BORF1, as mentioned previously for the first infection stage. BALF3 interacts with BORF1 at the second infection stage, which mainly helps EBV complete the final steps of viral replication during the lytic infection, including DNA cleavage, DNA packaging, assembly of capsid and tegument, and intracellular transport to the human membrane in preparation for virion lysis. Viral BVRF1 acts as a checkpoint in the formation between the viral capsid and tegument. BVRF1 is a capsid vertex-specific component that has a role in EBV DNA encapsidation, ensuring accurate cleavage of the DNA genome and stabilizing capsids. In addition, viral BVRF1 could negatively interact with human PSMA3 [906] to avoid the degradation of EBV proteins, so PSMA3 has low activity (P-value = 1.28 × 10⁻²⁰). The human protein PSMA3 is a component of the multicatalytic proteasome, characterized by its ability to cleave viral peptides, and has an essential function in a modified proteasome, the immunoproteasome, and in the processing of class I MHC peptides. PSMA3 has an ATP-dependent proteolytic activity that can mediate
protein degradation and negatively regulates the membrane trafficking of the cell surface. A low-expressed PSMA3 cannot perform antigen presentation processing, so B cells infected with EBV cannot be presented to cytotoxic T cells, which could inhibit the cell-mediated immune response. Low-expressed PSMA3 positively interacts, via ST3GAL3, with the human TF STAT3, which also has low expression (P-value = 8.98 × 10⁻¹³), and STAT3 is subjected to ubiquitination by the ubiquitin protein USP33, so that human STAT3 has low activity (P-value = 1.35 × 10⁻²³), which reduces the suppression of lytic infection and decreases the transcriptional inhibition of the human target gene NRP1. Human NRP1 thereby has high activity (P-value = 1 × 10⁻⁸⁰), is able to perform its transport function, and helps EBV with virion translocation. Besides, LRRK2, which can act as a ligand, is present largely in the cytoplasm but is also associated with the mitochondrial outer membrane, and it positively regulates autophagy through a calcium-dependent signaling pathway. LRRK2 binds to the human receptor NRP1, which contains specific protein domains that allow it to participate in multiple types of signaling pathways controlling cell migration, cell survival, transport, and attraction. The block of autophagy by EBV at the first infection stage can cause the inactivation of LRRK2. The inactivated LRRK2 negatively interacts with its receptor, NRP1, to activate the transport function of NRP1. NRP1 transmits the signal to the human TF RUNX2 through positive interactions with BDH1 and then VCAM1, both of which have high activity (P-value = 8.83 × 10⁻⁷⁰ and P-value = 1.04 × 10⁻³⁹, respectively). However, human VCAM1 negatively interacts with RUNX2, which leads to the low expression of the TF RUNX2 (P-value = 5.62 × 10⁻⁴¹) and thereby decreases the transcriptional regulation of the human target gene AFG3L1P.
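As a rough illustration of the qualitative reasoning in the preceding paragraph, the following sketch propagates "high" and "low" states along the signed interactions from LRRK2 to RUNX2. The function name `propagate_states`, the +1/-1 encoding, and the multiplicative update rule are assumptions made only for this illustration; they are not the system identification method used to construct the network in this chapter.

```python
# Qualitative sign propagation along a chain of signed interactions.
# Encoding (an assumption for illustration): +1 = high activity, -1 = low.
# Edge signs are read from the text; this is not the chapter's
# network identification procedure.

def propagate_states(start_node, start_state, edges):
    """Propagate a qualitative state along (source, target, sign) edges.

    sign is +1 for a positive interaction and -1 for a negative one.
    Edges must be ordered so that each source already has a state.
    """
    states = {start_node: start_state}
    for source, target, sign in edges:
        states[target] = states[source] * sign
    return states


chain = [
    ("LRRK2", "NRP1", -1),   # inactivated LRRK2 negatively interacts with NRP1
    ("NRP1", "BDH1", +1),    # positive interaction
    ("BDH1", "VCAM1", +1),   # positive interaction
    ("VCAM1", "RUNX2", -1),  # VCAM1 negatively interacts with RUNX2
]

states = propagate_states("LRRK2", -1, chain)
for gene, state in states.items():
    print(gene, "high" if state > 0 else "low")
```

Under these assumptions the sketch reproduces the pattern described in the text: low LRRK2, high NRP1, BDH1, and VCAM1, and low RUNX2. It ignores the additional epigenetic and miRNA regulations that also act on these genes.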
AFG3L1P is a long noncoding RNA associated with mitochondrial fusion and protein import into the mitochondrial intermembrane space, which may trigger apoptotic processes. The low expression of AFG3L1P (P-value = 3.34 × 10⁻⁸) is due to the reduced transcriptional regulation by the TF RUNX2 and the repression by EBV miR-BART14, so human AFG3L1P is unable to trigger apoptosis to defeat viral proteins, owing to the indirect influence of the autophagy blocked by EBV and the direct inhibition by EBV miR-BART14. In Fig. 18.10C, the low-expressed lncRNA AFG3L1P decreases the transcriptional inhibition of the human target gene CLIC5, a member of the chloride intracellular channel (CLIC) family of chloride ion channels. CLIC5, which is related to actin-based cytoskeletal structures, can insert into membranes and form poorly selective ion channels that may also transport chloride ions. Hence, CLIC5, with high activity (P-value = 7.32 × 10⁻⁵), influences the activity and formation of chloride channels to enhance transport. In addition, the human proteins BAX and PCBP2 interact with VCAM1 and ST3GAL3, respectively. Normally, the activation, conformational change, and relocation of BAX from the cytosol to the mitochondria could cause the release of cytochrome C into the cytosol and trigger apoptosis, which can function as an offensive mechanism against the EBV lytic infection [926]. In addition, BAX interacts with PCBP2, and PCBP2 can regulate the susceptibility to lytic cycle activation signals and interact via ST3GAL3 with STAT3, which can mediate the repression of the EBV lytic cycle and the maintenance of EBV latency [927]. However, BAX is degraded via ubiquitination by the ubiquitin-conjugating enzyme UBE2C, which reduces the apoptotic effect on EBV proteins, while PCBP2 is subjected to the
ubiquitination also by UBE2C, the repression by human miR421 and EBV miR-BART10, and the DNA methylation (P-value = 1.07 × 10⁻⁴), all of which cause the low expression of PCBP2 (P-value = 5.58 × 10⁻⁷⁰) and the decreased repression of the lytic cycle, which contributes to the accomplishment of EBV lytic production and transportation. EBV miR-BART10, counteracting the low-activity human miR421 (P-value = 2.59 × 10⁻⁵), contributes to the inhibition of apoptosis. In summary of Fig. 18.10C, viral BCRF1 performs the immunosuppressive mechanism, which directly helps EBV with the final procedures of virion production and indirectly influences the transport of viral particles, the performance of the antiapoptotic function via EBV miR-BART14 and miR-BART10, the persistence of the lytic production cycle, and the progression of transportation by EBV miR-BART10. EBV can maintain the transportation of viral particles by decreasing the repression of the lytic cycle and increasing the operation of antiapoptosis.
18.7 Overview of the lytic infection molecular mechanism from the first to second infection stage in human B cells infected with EBV
The suppressed expression of EBV latent antigens is important for EBV-related tumors to evade immune surveillance. EBV lytic production can be triggered by a variety of inducer treatments, and the induced lytic genes lead to cytotoxic T lymphocyte (CTL) responses [928]. Once the resting memory B cells latently infected with EBV are reactivated and enter the lytic phase, most EBV lytic genes and a few latent genes are transcribed and expressed, and the human immune system simultaneously detects the antigens of EBV. This can trigger human immune responses such as antigen-presenting cells, cytotoxic T cells, proinflammatory cytokines, reactive oxygen species, and proapoptotic signals that mediate immune mechanisms, including proapoptosis, autophagy, the inflammatory response, the human intrinsic immune pathway, and the extrinsic immune pathway, in response to human B cells infected with EBV in the lytic phase. As shown in Fig. 18.11, on the one hand, EBV can mediate defensive mechanisms against the human immune response at the first infection stage. Viral protein EBNA2 evades immune apoptosis by interacting with the human receptor CD46, whose immune inhibition property induces an immunosuppressive phenotype, and then reduces the operation of human autophagy in the immunity of CD46 and the interruption of viral translocation of SNHG5. The viral IE protein Zta interacts with EBNA2 [906], transcriptionally activates EBV early lytic genes, inhibits the human acetylation of NAT1 together with viral miR-BART1-3p, and, together with viral miR-BART14, suppresses the human energy metabolism of CLOCK, which may trigger apoptosis. Also, BZLF1 itself performs an antiinflammatory response to the human immune system. Viral miR-BART5 is found to mediate an antiapoptotic response to protect EBV from human immune attacks.
Besides, the viral miRNA miR-BART1-3p has a role in antiapoptosis and may hijack the acetylation function of NAT1 by repressing the gene NAT1, so that EBV can exploit other acetyltransferases to perform acetylation at both infection stages. In addition, viral protein EBNA3B may hijack the ubiquitination function of RNF41 by negatively interacting with the human protein PSME3, so that EBV can exploit other ubiquitin proteins to perform ubiquitination at both infection stages. EBV decreases the transcriptional regulation of human TFs (ZEB1 and MYC) and the expression
FIGURE 18.11 Overview of molecular mechanisms in EBV lytic infection and the significant network marker for potential multimolecule drug design. [Schematic not reproduced; it is organized into the first infection stage, the transition stage, and the second infection stage.] Text boxes in the figure state, in order: (1) EBV exploits viral proteins, miRNAs, and even epigenetic regulations to inhibit the operations of host proteins, which contributes to the promotion of antiapoptosis, immune evasion, and the hijacking of the ubiquitination, acetylation, and autophagy mechanisms. (2) The upregulated LMP1 at the first infection stage participates in the envelopment pathways at the nuclear membrane or in the cytoplasm and promotes the transformation of the infected host B cell into the second infection stage. (3) EBV operates epigenetic regulations with viral proteins and miRNAs to suppress the antiviral functions of host proteins, which maintains the lytic phase and accomplishes the progression of virion production and transportation. The red words indicate the potential drug target proteins for multimolecule drug design; the green words represent the molecular mechanisms being hijacked by EBV; the blue and pink arrows denote the cellular functions being inhibited or promoted, respectively; the yellow arrow indicates the progression from the first into the second infection stage in the lytic phase. Upon EBV reactivation into the lytic phase, the human immune system can detect the EBV antigens, trigger the human immune responses, and then inhibit the progression of EBV lytic replication. However, EBV mediates defensive mechanisms against the human immune response and thereby protects complete virion production and transportation from human immune interference at both infection stages. In addition, the transition stage of EBV depends on the activation of late lytic genes and the operation of LMP1 function [14]. EBV, Epstein–Barr virus.
of human receptors (RNF41 and GRB2) via ubiquitination to indirectly inhibit the human proapoptotic function of MYC and reduce the release of mitochondrial products of SLC25A6 that can trigger proapoptosis. EBV can thus block the autophagy mechanism in order to hijack autophagy, through the indirect inhibition of the autophagy-associated proteins ATG5, BECN1, and RAB7A by viral miR-BART14 and through acetylation and ubiquitination that directly and indirectly restrict the other autophagy-related proteins (BECN1 and RAB7A). Then, viral proteins (EBNA3B, BALF4, and BDLF4) could hijack the autophagy-related autophagosomes that contribute to intracellular vesicle trafficking. In addition, EBV reduces the proapoptosis and viral release of CHMP5 and the translocational disruption of SNHG5 and increases the antiapoptotic function of TGFB1I1 and PTK7 through the ubiquitination of CHMP5 and the suppression of human intrinsic defenses by viral protein BNRF1, which also enhances the activation and transcription of viral early genes. The antiapoptotic signal induces viral protein BNLF2A to evade HLA class I-restricted T cell immunity and to prevent TAP-mediated peptide transport and the subsequent loading. In addition, the degradation of viral protein BLLF1 via ubiquitination can protect the progression of viral replication from the aggression and interruption of uninfected human B cells. On the other hand, under the protection of the defensive mechanisms of the EBV proteins and miRNAs mentioned previously, EBV can securely produce viral particles without disturbance at the first infection stage. Viral protein EBNA2 can also promote the proliferation of infected cells. Viral protein BHRF1 can prevent the premature death of the human cell during virus production. Then, viral protein BFRF3 can participate in the assembly of infectious particles by decorating the outer surface of the capsid shell and thus forming a layer between the capsid and the tegument.
BFRF3 interacts with BCLF1, which can self-assemble to form an icosahedral capsid. Viral protein BDLF4 is important for the EBV lytic replication cycle and collaborates with BALF4 and EBNA3B to promote viral production and the intracellular transportation of virions. Membrane fusion is mediated by BALF4, and the heterodimer BXLF2/BKRF2 may also be involved in the fusion between the virion envelope and the outer nuclear membrane during virion morphogenesis. LMP1 acts as a CD40 functional homolog to prevent the apoptosis of infected B cells and drive their proliferation. LMP1 signaling can lead to the upregulation of antiapoptotic proteins and provide cell growth signals in the infected cells. BFLF2 plays a fundamental role in EBV virion nuclear egress. The viral receptor BBRF3, a necessary lytic replication protein, is an envelope glycoprotein and is crucial for virion assembly and egress. BDRF1 can translocate viral genomic DNA into the empty capsid during DNA packaging, and BDRF1 forms a tripartite terminase complex together with BALF3 and TRM2 in the human cytoplasm. The viral icosahedral capsid is composed of pentamers and hexamers of the MCP, which are linked together by triplexes. These triplexes consist of BORF1 and BDLF1. In addition, BORF1 is required for the efficient transport of BDLF1 to the nucleus, which is the site of capsid assembly. Viral BALF3 functions with BDRF1 and TRM2, so that they can collaboratively translocate EBV genomic DNA into the empty capsid during DNA packaging. The viral membrane proteins LMP2B and LMP1 can collaboratively activate human B cells by interacting with BLLF2, and both work in concert with EBNA3B so that EBNA3B can mediate the immune-evasive transport of complete virions via autophagic vesicles.
RBPMS, interacting with viral proteins (EBNA3B, BALF4, and BDLF4), acts as the latent–lytic switch in EBV by negatively interacting with STAT3. The inactivated STAT3 cannot transcriptionally activate cellular PCBP2, and PCBP2 in turn cannot repress the expression of EBV lytic genes. This increases the transcriptional activation of EBV late lytic genes, which may contribute to the complete progression of the EBV lytic production cycle into the late stage. A viral membrane protein present at both infection stages may play a crucial role in the transformation during the lytic phase of viral production. EBV utilizes significantly different expression levels of LMP1 for two purposes during its life cycle. First, in the latent infection of B cells, the steady-state expression of LMP1 is important for maintaining the transformational state. Second, in the EBV lytic production cycle, the induced LMP1 supports efficient virion release from B cells. Once EBV lytic replication is induced, the expression of LMP1 is upregulated to activate TRAF3 for the enhancement of B cell proliferation and the inhibition of proapoptosis. Here we indicated that gene products of LMP1 are responsible for the efficient virus release in B cells. It is feasible that viral LMP1 enhances the envelopment, deenvelopment, and reenvelopment pathways either at the nuclear membrane or in the cytoplasm during the lytic phase [929]. The viral membrane protein LMP1 can promote the transformation of human B cells infected with EBV from the first into the second infection stage and prevents the infected B cells from undergoing apoptosis. On the one hand, EBV maintains the defensive mechanism to protect complete virion production and transportation from human immune interference. Viral proteins (LMP1 and BNLF2B) indirectly decrease the proapoptotic function of BAX. BNLF2B displays high similarity to BCRF1 and may mediate immunosuppression to protect EBV.
EBV is shown to increase the antiapoptotic performance of viral miR-BART1-3p and the immunosuppression of viral protein BCRF1 as an immune evasive mechanism through the degradation of HNRNPU via ubiquitination. The viral antigen EBNA1 can exert antiapoptosis by mediating the disruption and silencing of PML to block the proapoptotic signal. Then, the proapoptosis of ARRB2 is inhibited through ubiquitination and the indirect suppression by EBNA1. Viral miR-BART14 can repress the expression of AFG3L1P, which can trigger proapoptosis. Viral miR-BART10 exploits the antiapoptotic response to antagonize human proapoptosis and represses the expression of PCBP2 so that the virion production of the lytic cycle is not restricted. EBV also exploits ubiquitination to degrade BAX, which has a role in proapoptosis, and to decrease the expression of the proteins STAT3 and PCBP2, which participate in the transformation from the lytic phase into the latent phase. On the other hand, EBV promotes transport processes to maintain the production of viral particles. Viral BNLF2B and miR-BART1-3p exploit PRKACB to promote the envelope assembly and intracellular transport of virions. EBV can promote the tRNA splicing function of FAM98B through the degradation of the TF NFATC2 via ubiquitination, increasing the expression of late lytic genes and the genetic diversity related to cell packaging, assembly, and cell transport. The activated EBNA1 can help virion production and indirectly promote the cell transportation of TRIM3. Viral miR-BART14 can help CLIC5 perform its cell transport function by repressing the expression of AFG3L1P, and EBV can help NRP1 activate its cell transport function by degrading STAT3 via
ubiquitination. Viral BPLF1 remains associated with the capsid while most of the tegument is detached and has a role in capsid transport toward the plasma membrane. BALF3 can interact with BORF1 at the second infection stage, which mainly helps EBV complete the final steps of viral production during the lytic infection. Viral BVRF1 acts as a checkpoint at the interface between the viral capsid and the tegument and has a role during EBV DNA encapsidation, maintaining accurate cleavage of the DNA genome to stabilize capsids. These results suggest that EBV can exploit viral proteins and miRNAs to develop a defensive mechanism against multiple attacks by the human immune system, promote virion production via the activity of viral proteins, and facilitate the transportation of viral particles by activating the expression of some human genes. A better understanding of host-virus cross-talk interactions could help in the design of new therapeutic drugs against EBV-associated malignancies.
18.8 Drug target proteins and multimolecule drug design
This section investigates potential drug target proteins and proposes a multimolecule drug design aimed at human B cells in the EBV lytic phase. As a γ-herpesvirus, EBV plays dual roles in its life cycle. One is latency established in the human host, with only some latent genes being expressed, and the other is the reactivated lytic infection, which contributes to new virion production and transportation. Because most viral proteins are expressed during the lytic phase, EBV has to develop defensive mechanisms to antagonize the human immune system. Our therapeutic strategy is therefore to apply the proposed multimolecule drug to EBV-infected cells in order to block reactivation into the lytic phase, interrupt the production of virions, interfere with the transportation of viral particles, and destroy viral defensive mechanisms. Reactivation of EBV into the lytic phase also provides an approach to promote EBV-dependent killing of infected cells. This method, called induced lytic therapy, needs drugs and other compounds that can induce EBV reactivation without unacceptable cytotoxicity to normal cells. Current induced lytic strategies have used a PK encoded by the EBV early lytic gene BGLF4, which can convert the nucleoside analog ganciclovir (GCV) into a cytotoxic form that promotes apoptosis and kills infected cells. The phosphorylated GCV can also be transferred to adjacent cells via gap junctions, so ganciclovir phosphorylation could cause “bystander killing” of a number of infected cells and even normal cells [885]. Ganciclovir, the classic antiviral drug, can successfully suppress the activity of the viral DNA polymerase and eradicate virus-infected cells.
However, phosphorylated ganciclovir could not only lead to much greater cell death through bystander killing but also inhibit the activity of human cellular DNA polymerase. Besides, ganciclovir is effective only against EBV lytic-infected cells. Because EBV-positive tumor cells are mainly in the EBV latent infection, ganciclovir by itself is not useful for treating EBV-positive tumors [930,931]. We consider that the viral proteins EBNA2 and Zta have the primary role in the initiation of the EBV lytic phase, and we therefore predict EBNA2 and Zta as potential drug targets in the progression of the EBV lytic phase. EBNA2 is efficient in upregulating
genes involved in infected cell proliferation and survival, and EBNA2 can evade human immune attacks by interacting with CD46 (Fig. 18.8A). Besides, the switch between the latent phase and the lytic phase of EBV infection can be induced by the expression of the immediate-early gene product Zta, a TF that is able to induce the entire program of EBV lytic gene expression [890,932]. We have also suggested that the EBV membrane proteins LMP1 and LMP2B and the EBV nuclear antigen EBNA1 could be predicted as potential drug targets because they participate not only in the reactivation from the latent phase into the lytic phase but also in the defensive mechanisms for virion production at both infection stages, as shown in Figs. 18.7 and 18.9, respectively. LMP2B works in cooperation with LMP1 via viral BLLF2 to enhance the activity of human B cells and facilitate the production and transportation of viral particles. In addition, among all EBV-encoded proteins, LMP1 performs an antiapoptotic function at both infection stages during the lytic phase and plays a central role in the propagation of EBV-associated lymphoma. The conventional treatment of EBV-associated malignancies cannot prevent tumor metastasis, recurrence, and disease progression, so therapeutic strategies that target EBV-encoded proteins may increase the cure rate and provide a clinical benefit [933]. The viral protein EBNA1, another prime target for therapeutic intervention, acts as the major switch that regulates EBV gene activity and enables EBV to remain dormant in the human host. EBNA1 is crucial for the virus to reproduce via antiapoptosis. Knocking out EBNA1 could thereby destroy EBV and control the growth of EBV-associated cancer [934].
To develop anti-EBV drugs and construct drug databases for targeting EBV proteins, researchers have carried out complex screening processes to find small molecules that can chemically bind to viral proteins and inhibit their function. We therefore mined drugs from the literature reported by those researchers in order to design a multimolecule drug that could act on these potential drug targets. In the study of Ismail et al., the activity of the potent herbal extract drug Thymoquinone (TQ) on EBV was assessed in vitro [935]. TQ was tested for cytotoxicity on human cells of BL and some other EBV-related lymphomas. TQ was found to efficiently inhibit the RNA expression of the viral genes EBNA2, LMP1, and EBNA1. In particular, EBNA2 expression was the most strongly affected, which indicates that EBNA2 might make a main contribution to the potency of thymoquinone against EBV-infected cells. Their results suggest that thymoquinone has the potential to efficiently inhibit the growth of EBV-infected B cells [935]. Another drug is Valpromide (VPM), which is not an HDAC inhibitor and which can block EBV reactivation. VPM could prevent the expression of viral BZLF1, which mediates lytic reactivation. VPM does not activate the expression of some cellular immediate-early genes, including FOS and EGR1, that are upstream of the EBV lytic cycle; instead, it can decrease their activities. Thus VPM can selectively suppress both viral and cellular gene expression. VPM represents a new class of antiviral agents that prevent the initiation of the EBV lytic phase. VPM will be useful in exploring the mechanism of EBV lytic reactivation and may have therapeutic application [936]. A third drug is Zebularine (Zeb), a DNA methyltransferase inhibitor (DNMTi), which can induce the expression of E-cadherin, a cellular gene frequently silenced by hypermethylation in cancers.
Zebularine can decrease the upregulation of the viral genes LMP2A, LMP2B, and EBNA2 and thereby avoid the switch from the latent phase into the lytic phase upon cross-linking with the
FIGURE 18.12 The multimolecule drug design for the predicted drug targets. Thymoquinone (TQ) is found to inhibit the RNA expression of viral EBNA2, LMP1, and EBNA1; Valpromide (VPM) can prevent the gene expression of viral BZLF1; Zebularine (Zeb) can decrease the upregulation of viral LMP2A, LMP2B, and EBNA2. These molecular drugs can block the reactivation, interrupt the viral production of virions, interfere with the transportation of viral particles, and destroy viral defensive mechanisms [14].
lytic inducers such as the B cell receptors. Zebularine could also be exploited against EBV-associated tumors, because it does not induce the switch from the latent phase into the lytic phase that may cause secondary EBV-related malignancies [937]. Consequently, we could integrate these drugs into a potential multimolecule drug for the predicted drug targets, as shown in Fig. 18.12. We consider that the multimolecule drug can inhibit the activities of viral proteins and thus play a potent role in counteracting the reactivation from latency into the lytic phase during EBV infection of human B cells, which supports its further development as an inhibitor of EBV-associated malignancies.
18.9 Discussion
EBV has evolved mechanisms that facilitate its survival and propagation by remarkably modifying the transcriptional spectrum and protein content of the human cells it infects. Modifications of the human transcriptome and proteome are mediated by EBV-encoded modifier molecules that can modulate human cells via various mechanisms. From the DAVID analyses of the real GIGENs in Tables 18.3 and 18.4 and of the HVCNs in Tables 18.7 and 18.8, together with the HVCPs at the first infection stage (Figs. 18.7 and 18.8) and at the second infection stage (Figs. 18.9 and 18.10), we have indicated that, at the first infection stage, EBV exploits not only its own viral proteins but also human cellular proteins to mediate the initiation of the EBV life cycle, the genetic and epigenetic regulations, the responses to inducing agents or drugs, the immune evasion from human proapoptotic, metabolic, and autophagy processes, and the blocking and hijacking of the autophagy mechanism to support virion production and transportation. At the second infection stage, EBV could manipulate its interactions with human proteins to promote DNA packaging and the envelope assembly of viral particles, antiinflammatory and antiapoptotic responses, and, importantly, the transportation of new virions.
In addition, from the HVCPs at both infection stages, we have suggested that EBV could regulate epigenetic modifications and hijack autophagy to benefit viral production and transportation during the lytic phase. However, whether EBV manipulates these mechanisms to eliminate some human cellular proteins in turn or to promote proapoptosis in human cells still needs further study. The identified results denote that acetylation, ubiquitination, and DNA methylation affecting human cellular proteins are beneficial to EBV, and that a number of acetyltransferases and ubiquitin proteins at the first infection stage are regulated by or interact with EBV miRNAs, TFs, and viral proteins. Another mechanism in the HVCPs during the lytic phase indicates that EBV primarily employs its own viral proteins to produce new virions, which are then transported to the cell membrane and released with lysis, mediated by human cellular proteins. This could contribute to the evasion of viral proteins from triggered human immune responses such as immune cell functions, proapoptotic pathways, and antigen processing and presentation pathways. EBV was the first human virus found to encode viral miRNAs, noncoding RNAs that posttranscriptionally regulate gene expression to control human cellular events. We have proposed that EBV-encoded miRNAs could participate in interactions between human cells and EBV, including immune evasion, maintenance of the survival of infected cells, and the regulation of EBV and human genes potentially involved in the pathogenesis of EBV-related diseases [908]. At both infection stages during the lytic phase, there are four identified viral miRNAs: miR-BART1-3p, miR-BART5, miR-BART14, and miR-BART10. They are involved in the antiapoptotic process, infected cell proliferation, and envelope assembly, which maintain the persistence of the lytic phase in the identified HVCPs. Renne et al.
presented that EBV miRNAs mainly perform three classes of cellular functions. The first is the regulation of cell growth and survival, the second is the regulation of innate immunity and inflammation, and the third comprises the functions of orthologs and variants that could increase the functional genetic diversity of EBV [938]. The hijacking of epigenetic regulation is essential to the ability of EBV to establish lifelong latency and facilitate malignant B cell transformation. Understanding how specific epigenetic regulations promote the progression of lymphomas, autoimmunity, and EBV-associated cancers is fundamental to designing novel therapeutic interventions that improve the cure rate of these often fatal diseases [939]. Thus, from the HVCPs in Figs. 18.8 and 18.10, we can predict other drug targets among EBV proteins and those four viral miRNAs. The viral proteins BALF4, EBNA3B, and BDLF4 may block the autophagy mechanism and hijack autophagic vesicles for virion transportation; BNRF1 can inhibit human intrinsic defenses to enhance the activation and transcription of viral early lytic genes; BNLF2A can reduce the effect of antigen-presenting cells and thereby evade T cell immune responses; BLLF2 mediates the interaction between the EBV membrane proteins LMP1 and LMP2B and may participate in the hijacking of the autophagy mechanism; and BNLF2B is similar to viral BCRF1, and both can perform immune evasion to protect EBV from human immune surveillance. Therefore, if we can design other multimolecule drugs for these drug targets in the future, including EBV proteins and, particularly, EBV miRNAs, then we can rapidly detect viral lytic proteins and interrupt the development of the EBV lytic phase. Besides, EBV can exploit some virus-encoded proteins that are homologs of human proteins to interact with human receptors for the purpose of evading and even
FIGURE 18.13 The protein structures of EBV interleukin-10 and human interleukin-10. Shown are the structures of the virus-encoded interleukin-10, BCRF1 (vIL-10; PDB ID 1vlk, left), and its homolog, human interleukin-10 (hIL-10; PDB ID 2h24, right), from the PDBe database. Each represents a single copy of EBV or human IL10 viewed from the front [14]. EBV, Epstein-Barr virus; PDBe, Protein Data Bank in Europe.
inhibiting the human immune responses. We analyzed the structural similarity between viral BCRF1 and human cellular IL10 using the Protein Data Bank in Europe (PDBe) database and its tool PDBeFold. The structure alignment shows 94% amino acid sequence identity and 71% secondary structure identity between vIL-10 and hIL-10. As shown in Fig. 18.13, viral BCRF1 is a homolog of human IL10 and a functionally critical cytokine regulator of immune tolerance that interacts with the human receptor IL10RA at the second infection stage (Fig. 18.9). This potentially crucial form of EBV immune modulation and immune suppression is due to EBV-encoded proteins similar to human cytokines, termed virokines [940]. It may be very important to discover more interactions via homologs between EBV and human in the future. Another strategy is to induce the latent form of infection into the lytic phase so as to cause EBV-associated cell death. However, compared with the period of lytic reactivation and production, the period of latency in the EBV life cycle is much longer, and the expressed EBV latent proteins are limited, so these two features of latency may contribute to the accuracy and effectiveness of multimolecule drug design. A promising drug design for the EBV latent phase could prevent latently infected cells from acquiring mutations that result in EBV-associated malignancies, such as EBV-positive tumor cells that are primarily initiated in the latent form of EBV infection.
18.10 Conclusion
The World Health Organization has defined EBV as a class I carcinogen, and it is estimated to account for a small but significant portion of all human cancers. EBV may persist in the human body for decades and cause the infected cells to become cancerous. It is estimated that EBV leads to nearly 400,000 cases of cancer each year, including BL, HL, gastric carcinoma, and nasopharyngeal carcinoma. To maintain viral persistence in human B cells, EBV has evolved various strategies to manipulate the human immune responses, including the restriction of immune cell function, the blocking of
apoptotic pathways, and the interference with antigen presentation functions. In this chapter, we investigate the pathogenic mechanisms by which dysregulations and dysfunctions of human B cells and immune modulation by EBV can contribute to the development of EBV lytic replication and production. Reactivation of EBV-infected human B cells into the lytic phase initiates viral replication and infectious virion production. At that time, a number of EBV lytic genes are successively expressed and activated, so the human immune system can detect the infection-associated viral proteins and thereby mount immune responses. The viral nuclear antigen EBNA2 can impair human immune signaling by interacting with the receptor CD46, which could promote infected cell proliferation and support the transcription of the viral immediate-early lytic gene BZLF1. Zta can activate early lytic genes and induce viral antiapoptosis. EBV can exploit the epigenetic changes due to ubiquitination and acetylation to block the proapoptotic pathway via viral miR-BART1-3p, which can inhibit the expression of NAT1 and affect human proapoptotic proteins and genes that are subject to DNA methylation, acetylation, and ubiquitination. The human host attempts to use the autophagy mechanism to eliminate EBV proteins, but EBV can not only block autophagy but also hijack autophagic vesicles to facilitate viral transport. The host induces proapoptosis and the interruption of viral translocation through CHMP5 and the lncRNA SNHG5, respectively; however, the epigenetic effects of ubiquitination, deacetylation, and DNA methylation influence these pathways, and viral BNRF1 destroys the DAXX complex in the proapoptotic pathway. Thus these host/virus mechanisms contribute to the complete progression of lytic production through the impairment of proapoptosis and the promotion of viral translocation and antiapoptosis.
EBV at the first infection stage has to protect the infected human B cells from elimination by autophagy and keep them in the lytic phase until the production and transportation of new infectious virions are complete. Thus EBV inhibits and hijacks autophagy via BECN1 and silences the expression of STAT3, which in its activated state can repress the expression and activity of lytic genes. These defensive mechanisms of EBV maintain the lytic infection and promote viral production. The viral membrane proteins LMP1 and BNLF2B and viral miR-BART1-3p can perform an antiapoptotic function, and viral BCRF1 mediates immunosuppression by releasing viral IL10 to escape immune surveillance. In addition, EBV could exploit the transcriptionally activated FAM98B, which increases tRNA splicing and thereby provides greater genetic diversity of proteins for the packaging, assembly, and transportation of new viral particles. The disruption of PML by viral EBNA1 contributes to the inhibition of proapoptosis, and EBNA1 promotes virion production and indirectly facilitates vesicle trafficking and virion release through TRIM3 and the viral membrane protein LMP1. Virion transportation must be maintained by decreasing the repression of the lytic cycle and increasing antiapoptosis. The epigenetic modifications of ubiquitination and DNA methylation and the repression of PCBP2 by viral miR-BART10 can decrease the ability of PCBP2 to inhibit the lytic phase, and viral miR-BART14 can repress the expression of the lncRNA AFG3L1P, which can trigger proapoptosis. EBV can indirectly exploit the expressed NRP1 and CLIC5 to promote the transportation of new virions.
Finally, through the real GIGENs, HVCNs, and HVCPs identified via dynamic system models, big database mining, and NGS data, we obtain more information and a new perspective on interspecies and intraspecies protein-protein interactions, on signaling transduction pathways from receptors to TFs, and on the transcriptional regulation of target genes, including epigenetic miRNA and lncRNA regulation, compared with the related literature. In addition, we have proposed a multimolecule drug design by exploring the EBV drug target proteins from the HVCNs. These multimolecule drugs (Fig. 18.12) can inhibit the reactivation from the latent form into the lytic form in the EBV life cycle and silence some critical EBV lytic genes/proteins that drive reactivation into the lytic phase, virion production, the transportation of viral particles, and EBV defensive mechanisms during the lytic infection. In the future, we will search for and collect more information about lncRNAs and open reading frames, and more predicted links between human and EBV, to improve the models and results in this chapter. This may reveal other novel cross-talk mechanisms between human and EBV and more EBV pathogenic mechanisms during lytic reactivation, lytic infection, and even the latent phase and its carcinogenesis.
CHAPTER 19
Human immunodeficiency virus-human interaction networks investigating pathogenic mechanism via for drug discovery: A systems biology approach
19.1 Introduction
The human immunodeficiency virus (HIV) was first identified in 1983, and it quickly became a pandemic. According to the Joint United Nations Programme on HIV/Acquired Immune Deficiency Syndrome (AIDS), approximately 35 million people worldwide are living with HIV infection. HIV prevention and treatment became important at the beginning of this century. HIV infection can deplete human CD4+ T cells, which are critical for immune responses. Thus HIV infection impairs the immune system and causes AIDS. Although HIV can now be managed as a chronic disease through the use of highly active antiretroviral therapy (HAART) [941], curing the infection remains a significant challenge. It has been demonstrated that a mutation in HIV-1 conferring drug resistance can occur within a single day [942]. Because of its high rates of mutation, methylation, and recombination, HIV can quickly evolve resistance to antiretroviral drugs. A clear understanding of the molecular mechanism of HIV infection is required for efficient antiretroviral drug design with minimal side effects; our current lack of such an understanding is one reason why the disease remains unresolved under antiretroviral therapy. Avoiding the side effects of anti-HIV drugs requires uncovering the pathogenic mechanism of HIV infection from a genome-wide perspective. MicroRNAs (miRNAs), which are small noncoding RNA molecules (19–22 nucleotides in length), bind to target mRNAs through complementary base pairing to inhibit mRNA translation. MiRNAs may play an important role in HIV infection [903,943], because they can regulate many biological processes, including virus replication and translation [944]. Members of the same miRNA family that have distinct sequences and
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00009-2
expression patterns can contribute to the diversity in posttranscriptional modification and biological processes. It has been reported that HIV-1 does not encode any miRNAs [945]. Thus it is important to investigate the dynamic changes in human miRNA regulation in CD4+ T cells during HIV-1 infection from the reverse transcription stage to the late stage, in addition to the changes in protein-protein interactions (PPIs). Genome-wide expression data of human CD4+ T cells and HIV-1 gene expression data during HIV-1 infection have been reported [946]. Three stages of HIV-1 infection have been characterized by analyzing a time course of human gene/miRNA expression and viral gene expression over the 24 h after infection. The first (or early) stage, also called the reverse transcription stage, occurs 2–12 h postinfection, during which the virus reverse transcribes its single-stranded RNA genome into double-stranded cDNA. The second stage, or the integration/replication stage, occurs 6–18 h after infection, during which the newly synthesized viral cDNA is integrated into the host genome. The late stage occurs 16–24 h after infection, at which point viral particles are assembled and packaged for release. The genome-wide data have also shown the effect of miRNA regulation [688,947] and DNA methylation [687,948] on gene expression. Therefore molecular mechanisms including miRNA regulation and DNA methylation are considered in the investigation of the pathogenic mechanism of HIV infection in human CD4+ T cells. Since Mohammadi and colleagues have investigated the profound perturbation of cellular physiology during HIV infection in terms of several regulatory mechanisms and have generated genome-wide time-course expression data for both HIV and the host SupT1 cell line [946], it is now possible to investigate the changes in the protein-protein and miRNA interactions (PPMIs) in mock-infected and HIV-1-infected CD4+ T cells.
In these experiments, polybrene was added and HIV-1 spinoculation was applied to increase the efficiency of infection [949]. Mock-infected cells were also treated with polybrene and mock spinoculation. PPMI has been used to investigate the topology and function of networks [950,951] and drug targets [952] in humans from a graph theory perspective. PPMI has also been used to analyze how miRNAs regulate biological processes through signaling pathways in human hepatocarcinogenesis [953] and renal carcinogenesis [954]. In this chapter, we analyzed genome-wide expression data [946] in human CD4+ T cells and HIV-1 to construct genome-wide PPMI networks at three stages of HIV-1 infection. As a control, we constructed three-stage PPMI networks in mock-infected cells. The identified genome-wide PPMI networks could provide an opportunity to unravel the pathogenic mechanism of HIV infection and aid in the discovery of new anti-HIV drugs. In a recent study, DNA methylation in HIV-1-infected CD4+ T cells was examined [955]. Another recent study has reported that DNA methylation can directly affect the binding affinities of RNA polymerase and miRNAs in primary human somatic and germline cells [683] and indirectly affect PPIs [956]. In this chapter, we applied a dynamic system and a system identification method to construct the genome-wide PPMI networks of mock- and HIV-infected CD4+ T cells. Methylation in CD4+ T cells was inferred from the identified parameters of the dynamic system and was validated against the literature. Using the genome-wide DNA methylation profiles of HIV-infected and HIV-uninfected human B cells, we have validated the identified effect of DNA methylation on the pathogenic mechanism of HIV-1 infection in CD4+ T cells. T cells (or thymus cells) and B cells (or bone marrow-derived cells) are the major cells in the human immune response.
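The three stage windows described in this Introduction overlap, so a single sampled time point can contribute to two stage-specific networks. The following minimal Python sketch illustrates this bookkeeping, reading the windows as 2–12 h, 6–18 h, and 16–24 h postinfection. The function names and the 2-h sampling grid in the usage note are illustrative assumptions, not part of the original analysis.

```python
# Stage windows in hours postinfection, read from the stage description above.
STAGE_WINDOWS = {
    "reverse transcription": (2, 12),
    "integration/replication": (6, 18),
    "late": (16, 24),
}

def stages_at(hour):
    """Return every stage whose window contains the given time point."""
    return [name for name, (start, end) in STAGE_WINDOWS.items()
            if start <= hour <= end]

def split_time_course(hours):
    """Group sampled time points by stage; a point may fall in two stages."""
    return {name: [h for h in hours if start <= h <= end]
            for name, (start, end) in STAGE_WINDOWS.items()}

# Hypothetical sampling grid (every 2 h), used only for illustration.
if __name__ == "__main__":
    for stage, points in split_time_course(range(2, 25, 2)).items():
        print(stage, points)
```

With this reading, a sample taken 8 h postinfection would be used for both the reverse transcription and the integration/replication networks.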
19.2 Investigate pathogenic mechanisms at different stages of human immunodeficiency virus infection
19.2.1 Functional analysis of the core PPMI networks at three infection stages
By applying a system identification method and a system order detection scheme to the dynamic models of the PPMI network at each stage of HIV-1 infection, using the high-throughput gene expression data from HIV-1 and CD4+ T cells (see Section 19.4), we obtained the PPMI networks at the three HIV-1 infection stages. Based on the identified PPMI networks, we used an infection score to identify the core PPMI network, consisting of the top 10% of host proteins, which have specific biological functions, at each stage of HIV-1 infection (Fig. 19.1; see Section 19.4). The core PPMI networks include 247 host proteins and 15 host miRNAs at the reverse transcription stage (Fig. 19.2A); 245 host proteins, 25 host miRNAs, and 16 viral proteins at the integration/replication stage (Fig. 19.2B); and 217 host proteins, 16 host miRNAs, and 16 viral proteins at the late stage (Fig. 19.2C). We also obtained 47 common proteins (Table 19.1; see Section 19.4), 200 specific proteins at the reverse transcription stage, 198 specific proteins at the integration/replication stage, and 170 specific proteins at the late stage. We removed the specific proteins that did not connect to any other protein or miRNA in the corresponding specific PPMI networks, and then we obtained the specific PPMI networks at the reverse transcription (Fig. 19.A1), integration/replication (Fig. 19.A2), and late (Fig. 19.A3) stages by connecting specific proteins to interacting host miRNAs and viral proteins. Similarly, the connections between common proteins and their interacting host miRNAs and viral proteins constitute the specific core PPMI network at each infection stage (Fig. 19.3A–C).
We applied the GO (Gene Ontology) tool DAVID (Database for Annotation, Visualization and Integrated Discovery) (https://david.ncifcrf.gov/) to the host core proteins to identify the top categories based on P-value in Swiss-Prot and Protein Information Resource (SP-PIR) keywords, Biological Process, and Online Mendelian Inheritance in Man DISEASE. The SP-PIR keywords include molecular mechanisms, such as methylation, acetylation, and ubiquitin-like (Ubl) conjugation, that participate in the epigenetic alteration of gene expression. The epigenetic alteration of gene expression in CD4+ T cells has been observed during HIV-1 infection [957]. We assumed that the molecular mechanisms of epigenetic alterations, such as DNA methylation and histone modification, could lead to basal changes, that is, δi in Eq. (19.1), in gene expression during HIV-1 infection [958]. The functional categories of the core proteins can help us determine the pathogenic and defensive mechanisms in HIV-1 infection. The top functional categories of the core proteins at the reverse transcription stage (Fig. 19.2A) are acetylation (P < 7.56 × 10⁻⁴⁵), negative regulation of macromolecule metabolic process (P < 2.87 × 10⁻¹¹), Ubl protein conjugation (P < 6.96 × 10⁻¹¹), and host-virus interaction (P < 3.64 × 10⁻⁹). One hundred and thirty-seven core proteins are found to be associated with acetylation and Ubl conjugation in the molecular mechanisms of epigenetics. The negative regulation of macromolecule metabolic processes could inhibit the degradation of host proteins. It has been reported that several human proteins, such as HDAC1, Gemin2, HuR, AKAP149, and DHX9, are required for the reverse transcription of HIV-1 [959].
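The relation between the core, common, and specific protein counts reported above can be checked with simple set operations: the common proteins are the intersection of the three stage-wise core sets, and the specific proteins of a stage are its core proteins minus the common ones. The sketch below uses placeholder identifiers whose set sizes mirror the reported counts; the function name and identifiers are hypothetical.

```python
def split_common_specific(core_by_stage):
    """Split stage-wise core protein sets into common and stage-specific parts.

    core_by_stage maps a stage name to the set of core protein identifiers.
    """
    common = set.intersection(*core_by_stage.values())
    specific = {stage: proteins - common
                for stage, proteins in core_by_stage.items()}
    return common, specific

# Placeholder identifiers only; sizes mirror the counts reported in the text.
shared = {f"COMMON_{i}" for i in range(47)}
core = {
    "reverse transcription": shared | {f"RT_{i}" for i in range(200)},
    "integration/replication": shared | {f"IR_{i}" for i in range(198)},
    "late": shared | {f"LATE_{i}" for i in range(170)},
}

common, specific = split_common_specific(core)
for stage in core:
    print(stage, len(core[stage]), len(common), len(specific[stage]))
```

The sizes are consistent with the text: 247 − 47 = 200, 245 − 47 = 198, and 217 − 47 = 170.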
Therefore the core proteins in the PPMI network at the reverse transcription stage could implicate epigenetic alteration of human proteins during viral infection, and the inhibition of human protein degradation may help increase the efficiency of the reverse transcription of HIV-1.
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
FIGURE 19.1 Flowchart for the construction of the common core and specific interspecies PPMI networks. Interspecies PPMI networks for each stage of HIV infection were constructed from protein-protein interaction, HIV-host interaction, miRNA regulation, and time-course gene expression data. Based on this information and the protein pool selected by data mining, putative interspecies PPMIs were assembled into the candidate interspecies PPMI networks. Then, using the coupled dynamic models (19.1) and (19.2) and the gene expression profiles, the interaction strengths of the edges in these networks were estimated by the constrained least-squares parameter estimation method in (19.A3). The candidate interspecies PPMI networks were pruned and refined by system order detection. By comparing HIV- and mock-infected cells, the infection indication matrix D in (19.5) was obtained for each stage of infection. Infection scores were computed for each protein, and the top 10% of proteins with the highest infection scores were selected as the core proteins. Finally, intersecting the networks from each stage of infection yielded the core proteins of the common core host-pathogen interaction networks at the three stages. The rest of the proteins were included in the specific host-pathogen interaction network at each infection stage [18]. HIV, Human immunodeficiency virus; PPMI, protein-protein and miRNA interaction.
FIGURE 19.2 The interspecies host-pathogen interaction networks at the reverse transcription stage (A), integration/replication stage (B), and late stage (C). The top 10% of host proteins with the highest infection scores are selected as the core proteins of this early host-pathogen interaction network [15].
The top categories of the core proteins at the integration/replication stage (Fig. 19.2B) include acetylation (P < 2.68 × 10^-52), negative regulation of macromolecule metabolic process (P < 4.08 × 10^-14), negative regulation of apoptosis (P < 1.29 × 10^-10), and host-virus interaction (P < 2.07 × 10^-9). It has also been observed that HIV-1 replication could induce the proliferation of CD4+ T cells, which is mediated by the effect of HIV-1 Vpr on apoptosis [960,961]. The host proteins identified in the core PPMI network at the integration/replication stage suggest that the epigenetic alteration of human proteins, the effects of HIV-1 proteins, and the reduced degradation of human proteins might lead to HIV-1 integration and replication and to host cell proliferation. The top categories of the core proteins at the late stage (Fig. 19.2C) include retroviral genome replication (P < .027), hepatocellular carcinoma (HCC) (P < .040), telomere capping (P < .053), and protein repair (P < .053). Interestingly, the roles of the identified core host proteins at the late stage include retroviral genome replication. Confocal microscopy of live HIV-infected cells has shown that HIV replicates in antigen-presenting cells during the late stage of infection before spreading to T cells [962]. The core proteins of the PPMI network at the late stage also suggest that the epigenetic alterations of human genes accumulated through the previous stages could cause incorrect folding of human proteins, which could induce crucial repair functions through the SUMOylation, ubiquitination, and proteasome pathways [687,963]. The accumulated misfolded proteins lead to endoplasmic reticulum stress, which has a profound effect on cancer cell proliferation [687,963,964]. Telomere capping is then induced to stabilize cell proliferation. The HCC-associated proteins in the core PPMI network at the late stage could contribute to mortality in HIV-infected patients [965].
19. Human immunodeficiency virus-human interaction networks
TABLE 19.1 The 47 proteins in the core host-pathogen cross-talk network and the infection scores at different stages of human immunodeficiency virus infection.
                 Infection score
Core protein     Reverse transcription stage   Integration/replication stage   Late stage
APP              8.453                         8.909                           8.714
BAG6             6.307                         6.068                           6.016
CLN3             6.901                         6.118                           6.830
CSNK1E           6.552                         7.028                           7.176
CSNK2A1          6.660                         6.711                           6.413
CSNK2B           6.505                         6.717                           6.912
DBN1             5.804                         6.293                           6.472
DDB1             5.948                         6.492                           6.348
EEF1A1           6.277                         5.719                           6.804
ELAVL1           10.092                        9.862                           11.208
EP300            6.616                         6.947                           6.082
EWSR1            5.825                         5.792                           6.256
FN1              9.521                         8.667                           9.609
GATAD2A          6.277                         6.463                           7.805
GRB2             6.737                         7.200                           6.857
HDAC1            7.207                         7.329                           8.144
HDAC11           6.884                         6.576                           5.952
HDAC3            6.059                         6.730                           7.587
HGS              6.017                         6.372                           6.520
HSPA8            5.611                         5.701                           6.976
IQCB1            6.325                         6.225                           6.170
LATS2            5.902                         8.290                           6.222
LIG4             5.952                         5.744                           6.329
MEPCE            5.644                         5.786                           6.845
MYC              6.936                         7.082                           7.226
NEDD4            7.454                         7.014                           8.436
NPM1             6.006                         6.701                           6.059
OXSR1            5.951                         6.083                           6.117
PAXIP1           6.506                         6.184                           6.984
POT1             6.912                         7.045                           6.579
PPP1CA           6.271                         6.578                           6.061
RELA             5.592                         6.726                           6.238
RUVBL1           6.676                         5.986                           6.514
SH3KBP1          5.772                         6.134                           5.989
SMAD4            6.412                         5.739                           6.858
SQSTM1           8.372                         8.363                           7.937
STMN1            5.623                         6.896                           9.061
TAF9             6.169                         6.295                           6.543
TERF2            7.003                         7.539                           7.536
TP53             7.142                         7.398                           6.845
TUBA1B           6.678                         6.490                           6.052
TUT1             7.961                         10.285                          9.482
UQCRB            5.921                         6.045                           6.767
VHL              8.921                         9.145                           10.007
VIM              6.111                         6.338                           6.768
YWHAB            6.100                         6.340                           6.395
YWHAZ            6.228                         6.700                           6.084
The core set of 47 proteins common to the interaction networks throughout the viral life cycle. The names of the core proteins and the corresponding infection scores at different stages of infection are shown [15].
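As a small worked example of how the infection scores in Table 19.1 can be read, the sketch below finds, for a six-protein subset copied from the table, the highest-scoring protein at each stage and the protein whose score changes most across stages. The helper names are hypothetical and the subset was chosen only for illustration; the conclusions printed apply to this subset.

```python
# Infection scores copied from Table 19.1 for a six-protein subset, in the
# order (reverse transcription, integration/replication, late).
STAGES = ("reverse transcription", "integration/replication", "late")
subset = {
    "ELAVL1": (10.092, 9.862, 11.208),
    "TUT1": (7.961, 10.285, 9.482),
    "VHL": (8.921, 9.145, 10.007),
    "FN1": (9.521, 8.667, 9.609),
    "STMN1": (5.623, 6.896, 9.061),
    "LATS2": (5.902, 8.290, 6.222),
}


def top_protein(scores, stage_index):
    """Protein with the highest infection score at the given stage."""
    return max(scores, key=lambda name: scores[name][stage_index])


def largest_change(scores):
    """Protein whose score range (max minus min over stages) is largest."""
    return max(scores, key=lambda name: max(scores[name]) - min(scores[name]))


top_by_stage = {
    stage: top_protein(subset, i) for i, stage in enumerate(STAGES)
}
most_variable = largest_change(subset)

for stage, name in top_by_stage.items():
    print(f"{stage}: {name}")
print("largest change across stages:", most_variable)
```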
19.3 HIV/human interaction networks for multiple drug designs at three infection stages
19.3.1 Functional analysis of the common core PPMI network at the three infection stages
The top categories for the 47 common proteins are acetylation (P < 4.59 × 10^-11), host-virus interaction (P < 1.69 × 10^-8), negative regulation of macromolecule metabolic process (P < 3.05 × 10^-8), and Ubl conjugation (P < 7.27 × 10^-6). EEF1A1 and NEDD4 interacting with Gag can contribute to Ubl conjugation and negative regulation of macromolecule metabolic process at the reverse transcription stage. Because no viral expression data are available at the reverse transcription stage, we cannot use system modeling to identify the direct targets of the viral proteins at this early infection stage. A previous report has shown that, in chronic untreated HIV infection, the more Gag epitopes are targeted by the immune response, the lower the viral load [968]. It has also been reported that Gag directly interacts with EEF1A1 and NEDD4 in T cells [967,968]. We have therefore identified EEF1A1 and NEDD4 as direct targets of HIV-1 infection in the core PPMI network at the reverse transcription stage (Fig. 19.3A). CSNK2A1 and RUVBL1, which interact with NEDD4, can promote the ubiquitination and degradation of NEDD4 [969], while SMAD4 and TAF9, which interact with EEF1A1, are transcriptional regulator proteins with the potential to mediate downstream pathways [970,971]. It has been reported that Gag and the synthesis of K63-linked ubiquitin chains by NEDD4 family members are critical for the ability to stimulate virus replication, budding, and infection [978]. However, the signaling pathways in host target cells that influence virus replication, budding, and infection are not completely understood. According to the results in Fig.
19.3A, we could suggest that the NEDD4 protein may be activated at the reverse transcription stage and the late stage in preparation for virus replication, budding, and infection. The ubiquitination and degradation of NEDD4 by CSNK2A1 and RUVBL1 can inhibit these functions once they are activated at the reverse transcription stage (Fig. 19.3A). This suggestion is also supported by observations in SHIV-infected CD4+ T cells in Indian rhesus macaques (Macaca mulatta) [968]. The hypermethylation of STMN1, CSNK2A1, YWHAB, and EEF1A1 and the miRNAs miR-421, miR-320a, miR-449a, miR-671-5p, and miR-140-5p mainly regulate cellular functions, including epigenetic regulation, protein degradation, and HIV reverse transcription, during early HIV-1 infection. By comparing the methylation profiles of HIV-infected and non-HIV-infected human B cells (see Section 19.4), we could observe significant methylation differences in STMN1 (P < 2.99 × 10^-6), CSNK2A1 (P < 5.59 × 10^-7), YWHAB (P < 1.33 × 10^-14), and EEF1A1 (P < 6.77 × 10^-9). APP, VIM, OXSR1, and miR-326 could contribute to the negative regulation of apoptosis via DNA methylation and interspecies PPIs at the integration/replication stage. In the common core PPMI network at the integration/replication stage (Fig. 19.3B), we could find several miRNAs, including miR-18a*, miR-18b, miR-20b, miR-22, miR-96, miR-101, miR-142-3p, miR-200b, and miR-326, that directly interact with HIV proteins. Among them, only miR-326 is significantly expressed in the HIV-infected T cells at the integration/replication stage. It has been observed that antagomirs, which could knock down
FIGURE 19.3 The core host-pathogen interaction networks during the reverse transcription stage (A), the integration/replication stage (B), and the late stage (C). By selecting the intersecting core host proteins from all three stages of infection, 47 host proteins are identified as core proteins (Table 19.1). Interactions among these core proteins during the reverse transcription stage (A), the integration/replication stage (B), and the late stage (C) are depicted. Pink and red labels denote the hypo- and hypermethylated proteins, respectively, that is, pink and red labels denote that the fold changes from the basal level, δi, are, respectively, less or greater than 35 and 100 in (19.1) between mock- and HIV-infected cells [18]. HIV, Human immunodeficiency virus.
miR-326, increased HIV-1 replication in CD4+ T cells [973]. Therefore we could suggest that these HIV-interacting miRNAs modulate HIV-1 replication in human T cells. Hypermethylation of APP, VIM, and OXSR1 is also observed in the core PPMI network at the integration/replication stage (Fig. 19.3B). Significant methylation differences are observed in the DNA methylation profiles of APP (P < 1.46 × 10^-6), VIM (P < 7.75 × 10^-7), and OXSR1 (P < 9.83 × 10^-8). The methylated VIM interacts with HDAC1, which can participate in four of the top functional categories, namely acetylation, host-virus interaction, negative regulation of macromolecule metabolic process, and apoptosis, at the integration/replication stage (Fig. 19.3B). Methylated OXSR1 is found to interact with
HSPA8, which can participate in acetylation and host-virus interaction. Methylated APP is found to interact with TAF9, which can participate in the negative regulation of apoptosis. It has been reported that HDAC1 is associated with acetylation and viral replication in HIV-1-infected T cells [974]. It has also been shown that HSPA8 is associated with the promotion of viral replication in HIV-1-infected T cells [975]. In addition, a review has proposed that TAF9 may be involved in the transcription and replication of HIV genes in HIV-infected T cells [976]. TAF9 is also the output of the Notch signaling pathway, which can control cell apoptosis [978]. A recent study has suggested that HIV proteins could alter normal apoptotic signaling, resulting in an increased viral load in T cells and the formation of viral reservoirs that ultimately increase infectivity [984]. Based on these results, we could suggest that epigenetic alterations in APP, VIM, and OXSR1 and the repression of miR-326 play an important role in negatively regulating cell apoptosis in the common core PPMI network at the integration/replication stage. The significantly upregulated methylation-associated proteins APP and ELAVL1 can contribute to human carcinogenesis and neuropathogenesis at the late stage. In the common core PPMI network at the late stage (Fig. 19.3C), we have identified numerous miRNAs, including miR-18b, miR-19b, miR-20b, miR-197, miR-214, and miR-324-5p, that could directly regulate viral proteins. Among these miRNAs, miR-214 and miR-324-5p interact with the most viral proteins and have been proposed to target HIV-1 and inhibit virion production [979–982]. The hypermethylation of APP, GATAD2A, STMN1, and TUT1 was identified in the common core PPMI network at the late stage.
Comparison of the methylation profiles in HIV-uninfected and HIV-infected B cells has shown significant differences in the methylation profiles of APP (P < 1.46 × 10^-6), GATAD2A (P < 1.71 × 10^-10), STMN1 (P < 2.99 × 10^-6), and TUT1 (P < 1.07 × 10^-9). Interestingly, six methylation-associated host proteins/miRNAs, namely the APP-interacting miRNA miR-1260b, the GATAD2A-interacting proteins ELAVL1 and HDAC3, the STMN1-interacting protein FN1, and the TUT1-interacting protein MEPCE and miRNA miR-342-3p, are found to be closely related to human carcinogenesis [983–990]. The common core proteins APP, OXSR1, TAF9, and ELAVL1 are significantly upregulated at the late stage of infection. APP and OXSR1 have been shown to be overexpressed in neuropathogenesis [991,992], and late HIV infection can also induce neuropathogenesis [993]. In addition, because TAF9 depletion leads to apoptotic cell death [994] and TAF9 activity is essential for cell viability [995], we could suggest that TAF9 overexpression is involved in regulating carcinogenesis at the late stage of HIV-1 infection.
19.3.2 Functional analysis of host-pathogen interaction networks
Before conducting the functional analysis of the specific core PPMI networks at the three infection stages, we investigated the effect of the direct interactions between host and viral proteins on host functions at the integration/replication and late stages. Because we only have the expression data for unspliced pre-mRNA, multiple-spliced RNA, and single-spliced RNA, we analyzed the impact of these three viral RNAs on host cellular functions. Based on the identified PPMI networks at the integration/replication
(Fig. 19.2B) and late (Fig. 19.2C) stages, we could obtain the interactions between host proteins and the viral proteins encoded by these three viral RNAs. The results show that most host proteins only interact with the viral proteins encoded by a single viral RNA at the integration/replication (Fig. 19.4A) and late (Fig. 19.4B) stages. At the integration/replication stage (Fig. 19.4A), the host proteins that directly interact with viral proteins are categorized by the GO tool DAVID into the top four cellular functions: acetylation (P < 4.75 × 10^-21), host-virus interaction (P < 8.61 × 10^-10), regulation of apoptosis (P < 3.94 × 10^-9), and negative regulation of macromolecule metabolic process (P < 2.07 × 10^-7). The viral proteins encoded by unspliced pre-mRNA could interact with 9, 1, 1, and 2 host proteins involved in acetylation, host-virus interaction, regulation of apoptosis, and negative regulation of macromolecule metabolic process, respectively. The viral proteins encoded by single-spliced RNA could interact with 9, 2, 2, and 3 host proteins involved in acetylation, host-virus interaction, regulation of apoptosis, and negative regulation of macromolecule metabolic process, respectively. The viral proteins encoded by multiple-spliced RNA could interact with 5, 3, and 2 host proteins involved in acetylation, regulation of apoptosis, and negative regulation of macromolecule metabolic process, respectively. Therefore we can suggest that most HIV proteins use these posttranslational modifications to target host signaling pathways, a finding that is also supported by [996]. In addition, the proteins encoded by single-spliced RNA and multiple-spliced RNA might increasingly contribute to the regulation of apoptosis and the negative regulation of macromolecule metabolic process, leading to a high carcinogenic risk [997]. At the late infection stage (Fig.
19.4B), the host proteins that directly interact with viral proteins are categorized into the top four cellular functions: acetylation (P < 3.17 × 10^-7), host-virus interaction (P < 4.42 × 10^-6), regulation of protein catabolic process (P < 4.94 × 10^-6), and response to radiation (P < 9.4 × 10^-4). The viral proteins encoded by the unspliced pre-mRNA are found to interact with 5 and 1 host proteins involved in acetylation and host-virus interaction, respectively. The viral proteins encoded by single-spliced RNA could interact with 3, 2, 3, and 2 host proteins involved in acetylation, host-virus interaction, regulation of protein catabolic process, and response to radiation, respectively. The viral proteins encoded by multiple-spliced RNA are found to interact with 7, 1, 1, and 2 host proteins involved in acetylation, host-virus interaction, regulation of protein catabolic process, and response to radiation, respectively. Because a number of tumors occur at higher rates in HIV-positive individuals than in HIV-negative individuals [998,999], studies of HIV-positive individuals treated by radiation therapy have been reported. HIV-infected patients may be more sensitive to ionizing radiation than noninfected patients [1000]. These results suggest that the viral proteins encoded by the single-spliced and multiple-spliced RNAs could contribute to the radiation sensitization of HIV-positive individuals. In addition, the data show that the viral proteins encoded by unspliced pre-mRNA have a smaller effect on human proteins at the late stage than at the integration/replication stage. This may be because the HIV protein Gag is trafficked and targeted to the plasma membrane for retroviral assembly and release at the late stage of infection [1001].
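The category P-values quoted in this section come from DAVID, which reports a modified Fisher exact test (the EASE score). As a simplified illustration of the underlying idea only, the sketch below computes a plain one-sided hypergeometric tail probability for term enrichment; it is an assumption-laden stand-in, not DAVID's exact statistic, and the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, hits):
    """One-sided enrichment P-value, P(X >= hits).

    `selected` genes are drawn without replacement from `population`
    background genes, of which `annotated` carry the functional term;
    `hits` is the number of selected genes that carry the term.
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("inconsistent counts")
    lower = max(hits, 0)
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability mass of every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical counts: 47 core proteins, 9 of which carry a term that
# annotates 400 of 20000 background genes.
p_value = hypergeom_enrichment_p(20000, 400, 47, 9)
print(f"P = {p_value:.3g}")
```

A small P-value here means that the term is carried by more of the selected proteins than random sampling from the background would be expected to produce; the choice of background population strongly affects the result.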
FIGURE 19.4 Interspecies interactions between viral and host proteins at the integration/replication stage (A) and the late stage (B) [18].
19.3.3 Functional analysis of the specific PPMI networks at the three infection stages
We have obtained the top three functional categories at each stage, which are protein kinase activity (P < 2.56 × 10^-4), RNA splicing (P < 4.53 × 10^-4), and macromolecular complex subunit organization (P < 5.29 × 10^-4) at the reverse transcription stage (Fig. 19.5A); steroid hormone receptor binding (P < 4.05 × 10^-6), response to DNA damage stimulus (P < 4.19 × 10^-6), and regulation of DNA replication (P < 6.43 × 10^-5) at the integration/replication stage (Fig. 19.5B); and prostate cancer (P < 1.43 × 10^-5), protein biosynthesis (P < 3.18 × 10^-5), and chronic myeloid leukemia (P < 4.02 × 10^-5) at the late stage (Fig. 19.5C). The DNA methylated proteins TRIM28, MAZ, and SLC9A3R1 interacting with MAPK14, CDK13, and PRKACA can contribute to HIV replication, mRNA splicing, and T-cell immune responses at the reverse transcription stage. At the reverse transcription stage, several HIV
FIGURE 19.5 The identified functional core modules of HIV-1-infected CD4+ T cells at the reverse transcription stage (A), integration/replication stage (B), and late stage (C). We have applied DAVID to the specific PPMI networks in Figs. 19.A1–19.A3 to identify the three main cellular functions of the proteins during the reverse transcription stage (A), the integration/replication stage (B), and the late stage (C), respectively [18]. DAVID, Database for Annotation, Visualization and Integrated Discovery; HIV, human immunodeficiency virus; PPMI, protein-protein and miRNA interaction.
proteins associated with the mature reverse transcription complex [1002] and host protein kinases, such as the cAMP-dependent protein kinases [1003] and cyclin-dependent kinases [1004], can play an important role in the reverse transcription of the full-length viral RNA and spliced RNAs of HIV-1 [1005]. The identified functional core modules shown in Fig. 19.5A can characterize the molecular mechanisms that occur at the reverse transcription stage. Comparison of the methylation profiles in HIV-uninfected and HIV-infected cells can identify several hypermethylated proteins, including TRIM28 (P < 1.17 × 10^-11), MAZ (P < 1.34 × 10^-9), SLC9A3R1 (P < 2.56 × 10^-9), H2AFX (P < 4.11 × 10^-8), SNRNP200 (P < 5.38 × 10^-6), and BCAS2 (P < 1.89 × 10^-5), at the reverse transcription stage (Fig. 19.5A). The hypermethylated proteins TRIM28, MAZ, and SLC9A3R1 can interact with ADRBK1, MAPK14, CDK13, and PRKACA, which are involved in protein kinase activity. It has been reported that CDK13 plays an important role in regulating both HIV replication [1006] and mRNA splicing [1007] at the reverse transcription stage. MAPK14 and PRKACA have been shown to induce T-cell immune responses in HIV-infected patients [1008]. In the functional core modules of the specific PPMI network at the reverse transcription stage, we can identify the epigenetic regulation of TRIM28, which has also been shown to epigenetically regulate HIV-1 transcription [1009]. The results also show that miR-140-5p inhibits the expression of CSK. Like LYP [1010], the repressed CDK5 could inhibit T-cell activation and play an important role in the development of autoimmune disorders [1011] when dissociated from CSK at the reverse transcription stage. Therefore we could propose that the miR-140-5p-mediated repression of CSK translation contributes to autoimmune disorders [1012] and the repression of T-cell activation in HIV-1-infected T cells.
By using a miRNA array, miR-140-5p has been found to be differentially expressed between HCV/HIV coinfection and healthy controls, between HCV infection and healthy controls, and between enteroviral infection and healthy controls [1013,1014]. In addition, we could show that the inhibition of miR-320a derepresses POLR2A, macrophage migration inhibitory factor (MIF), SYF2, and SMARCAD1 (Fig. 19.5A). POLR2A is essential for the survival of lymphocytes [1015]. MIF contributes to tumorigenesis in T cells by facilitating tumor proliferation [1016]. The HIV-induced production of SYF2 could be involved in the impaired proliferation capacity of T cells at the early stage of HIV infection [1017]. SMARCAD1 plays a critical role in the efficient transmission of epigenetic information in stem cells [1018]. Therefore we could propose that the inhibition of miR-320a derepresses abnormal cell proliferation and epigenetic regulation in T cells. Biological experiments have also shown that miR-320a is downregulated in response to HIV-1 Tat [1019]. Moreover, we could also show that miR-941 inhibits PPP2R1A, leading to disruption of the PPI between PPP2R1A and AKT1. HIV-activated AKT1 has been implicated in the pathogenesis of T cells, which are the primary targets of HIV and respond to infection by providing several critical immune signals [1020]. It has also been reported that the disruption of the PPI between PPP2R1A and AKT1 contributes to cell survival [1021]. Therefore we could suggest that miR-941 regulation induces pathogenesis, activates critical immune signals, and contributes to cell survival. Using Affymetrix expression array data, significant downregulation of miR-941 has also been observed in HIV-1-infected CD4+ T cells as compared to HIV-1-infected CD8+ T cells [1022].
The DNA methylated proteins CDK7 and TOPBP1 interacting with RUVBL2 and UBR5 contribute to the loss of fat-free mass and fat mass at the integration/replication stage. At the integration/replication stage, it has been inferred that the modifications of steroid hormones,
particularly glucocorticoids and androgens, are associated with the loss of fat-free mass and fat mass observed in untreated HIV-infected patients [1023–1026]. The effects of steroid hormone modifications and DNA repair factors on HIV-1 long terminal repeat activity [1027] and viral DNA [1028], respectively, may contribute to viral replication. The identified functional core modules in Fig. 19.5B also characterize the molecular mechanisms at the integration/replication stage. Comparison of the methylation profiles in HIV-uninfected and HIV-infected cells could also identify several hypermethylated proteins, including CASP3 (P < 1.02 × 10^-7), CDK7 (P < 4.09 × 10^-9), and TOPBP1 (P < 9.88 × 10^-6), at the integration/replication stage (Fig. 19.5B). The hypermethylated proteins CDK7 and TOPBP1 interact with RUVBL2 and UBR5, respectively. Genome-wide screening technologies applied to HIV-infected human cells have shown that UBR5 and RUVBL2 are involved in the folding and assembly of proteins [1029] and in DNA repair [1028], respectively, at the integration/replication stage. UBR5 has also been identified as being involved in the intracellular steroid hormone receptor signaling pathway [1030]. The effects of methylation on DNA repair [1031,1032] and posttranslational modification [1033,1034] in lymphocytes have also been discussed. We could show that miR-331-3p inhibits CTNNB1, which leads to an interaction between PHB2 and HIV proteins. It has been reported that PHB2 interacts with HIV-1 Env to promote replicative spread at the integration/replication stage [1035]. Thus the inhibition of CTNNB1 can also promote HIV-1 replication [1036]. An inverse relationship between PHB2 and CTNNB1 was also identified in cell migration [1037]. Therefore we propose that the inhibition of CTNNB1 by miR-331-3p can promote HIV-1 replication and replicative spread.
We have observed that the interaction of IGF1R with miR-320a and SHC1 is disrupted by inhibition of miR-320a and SHC1 expression. Inhibition of SHC1 has been shown to lead to more sustained HIV-1 replication and reduced apoptosis of the infected T cells [1038]. It has been shown that IGF1R is essential for cell growth and the mediation of antiapoptotic signals in AIDS/HIV-related Kaposi's sarcoma [1039]. Activated SHC1 has also been shown to induce the Ras/MAP kinase pathway, which can trigger cell cycle arrest through IGF1R [1040]. Therefore we propose that inhibition of miR-320a and SHC1 can contribute to cell proliferation and dysregulation of the cell cycle in HIV-1-infected cells at the integration/replication stage. The DNA methylated proteins CRK, DARS, and KRAS interacting with SOS1, EPRS, and RAF1 can contribute to cell proliferation, angiogenesis, and cell transformation associated with cancer development at the late stage. At the late stage of infection, several malignancies, especially prostate carcinoma in men [1041] and chronic myeloid leukemia [1042], are known to occur more frequently. The increased carcinogenesis in HIV-infected patients is thought to be due to prolonged survival in the era of HAART. It has also been observed that, in the prostate, the number of CD4+ T cells in the tumor is significantly greater than that in the surrounding benign tissue [1043]. Based on the identified functional core modules of the HIV-1-infected CD4+ T cells at the late infection stage (Fig. 19.5C), we propose that prostate carcinoma and chronic myeloid leukemia in HIV-infected patients are due to the pathogenesis of HIV infection, which leads to a progressive depletion of CD4+ T-cell populations. We have compared the synthesis of CD4 protein products in HIV-uninfected and HIV-infected CD4+ T cells [1044]. The increased levels of cellular proteins have been suggested to interact with HIV-1 proteins participating in HIV infection [1044].
The identified
functional core modules shown in Fig. 19.5C can characterize the molecular mechanisms occurring at the late stage of HIV infection. Comparison of the methylation profiles in HIV-uninfected and HIV-infected cells could also identify several hypermethylated proteins, including CRK (P < 1.81 × 10^-4), DARS (P < 1.96 × 10^-8), and KRAS (P < 1.69 × 10^-11), at the late stage (Fig. 19.5C). The interaction of the hypermethylated and activated host protein CRK with SOS1 has been reported to promote cell proliferation and increase cell transformation and tumorigenicity in lymphocytes [1045,1046]. The methylated and activated DARS has been reported to be involved in the translation of HIV proteins during HIV particle assembly and budding at the late infection stage [1047,1048]. DARS and EPRS activation plays a critical role in the formation of a protein complex called GAIT (IFN-gamma-activated inhibitor of translation), which is involved in translation silencing and the inhibition of tumor angiogenesis [1049]. We suggest that the inhibition of EPRS could lead to the angiogenesis associated with cancer development. The results show that the methylated and activated KRAS interacts with an activated RAF1, which has been reported to enhance HIV-1 infectivity and to function in virus assembly and release from infected cells [1050,1051]. In addition, we could show that miR-341-5p inhibits CCNE1. It has been reported that the activated CCNE1 is an essential downstream effector of the retinoblastoma (RB) tumor suppressor pathway [1052,1053]. Disruption of the RB pathway has been reported in human cancers [1054]. Therefore the result suggests that the inhibition of CCNE1 could result in carcinogenesis at the late stage of HIV-1 infection. The pathogenic mechanisms shown in Fig. 19.5A–C during the three stages of HIV-1 infection are summarized in Fig. 19.6A–C.
At the reverse transcription stage, we could identify three main cellular functions, protein kinase activity, RNA splicing, and macromolecular complex subunit organization, that are induced by the specific proteins in HIV-1-infected T cells compared to mock-infected T cells (Fig. 19.6A). Epigenetic regulation of TRIM28, MAZ, and SLC9A3R1 can contribute to HIV replication, mRNA splicing, and the immune responses of T cells. Regulation of miR-140-5p, miR-320a, and miR-941 is involved in the development of autoimmune disorders, tumor proliferation, and the pathogenesis of T cells at the reverse transcription stage. At the integration/replication stage, we have identified three main functions, steroid hormone receptor binding, response to DNA damage stimulus, and regulation of DNA replication, that are induced by the specific proteins in HIV-1-infected T cells when compared with mock-infected T cells (Fig. 19.6B). The epigenetic regulation of CDK7 and TOPBP1 contributes to DNA repair and to the posttranslational modification, folding, and assembly of host proteins. The regulation of miR-331-3p and miR-320a is involved in HIV-1 replication, replicative spread, antiapoptosis, cell proliferation, and dysregulation of the cell cycle at the integration/replication stage. At the late stage, we have identified prostate cancer, protein biosynthesis, and chronic myeloid leukemia as the three top functional categories of the specific proteins in HIV-1-infected T cells when compared with mock-infected T cells (Fig. 19.6C). The epigenetic regulation of CRK, DARS, and KRAS contributes to cell proliferation, angiogenesis, the translation of HIV proteins during HIV particle assembly and budding, and increased cell transformation and tumorigenicity; it also enhances HIV-1 infectivity and functions in virus assembly and release. miR-341-5p is involved in carcinogenesis at the late stage of HIV-1 infection.
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
FIGURE 19.6 Pathogenic mechanisms based on the specific PPMI networks at the three stages of HIV infection. Based on the pathogenic mechanisms shown in Fig. 19.5A–C, the identified core pathogenic mechanisms in the reverse transcription (A), integration/replication (B), and late (C) stages are determined. The pink and red labels denote the identified hypo- and hypermethylated proteins, respectively [18]. HIV, Human immunodeficiency virus; PPMI, protein–protein and miRNA interaction.
19.3.4 Network-based pathway enrichment analysis
Furthermore, we could apply network-based pathway enrichment analysis [1055] to the HIV-1 interacting proteins at the integration/replication and late stages (Fig. 19.2B and C) to explore the gene sets that are directly regulated by HIV-1 in this study. In Fig. 19.2B and C, we could identify 393 and 227 proteins that directly interact with HIV-1 proteins at the integration/replication and late stages, respectively. By applying the network-based pathway enrichment analysis based on the identified host interactions in (19.1) at each infection stage, the top 10 significant gene sets at the integration/replication and late stages are shown in Table 19.2, and other significant gene sets are shown in Table 19.A1. The top five significant gene sets at the integration/replication stage are BIOCARTA_CHREBP2_PATHWAY, PID_FAK_PATHWAY, PID_RB_1PATHWAY, PID_NFAT_3PATHWAY, and BIOCARTA_AKAPCENTROSOME_PATHWAY. It has been observed that CHREBP is significantly upregulated in a mouse model (transgenic mice expressing Vpr), in which Vpr can also induce hepatic steatosis by dysregulating PPARα- and LXRα-mediated gene expression in the liver [1056]. The majority (65%) of patients had steatosis [1057]. In HIV-1-infected CD4+ T cells, it has been suggested that HIV-1 Nef may displace FAK from CD4 to protect the cells from apoptosis [1058]. It has been proposed that the RB protein induced by HIV viremia in monocytes exposed to HIV-1 could mediate protection from activation-induced apoptosis [1059]. It has been confirmed that the nuclear factor of activated T cells is required for optimal latent virus reactivation in memory CD4+ T cells [1060]. It has also been reported that the HIV-1 Vpr protein could hijack centrosome functions to alter cell cycle regulation [1061].
Moreover, the top five significant gene sets at the late stage are BIOCARTA_LONGEVITY_PATHWAY, BIOCARTA_NDKDYNAMIN_PATHWAY, ST_B_CELL_ANTIGEN_RECEPTOR, REACTOME_RECYCLING_PATHWAY_OF_L1, and PID_PI3KCI_PATHWAY. It has been observed that the human longevity-associated hormones, human growth hormone and IGF-1, can regulate HIV-1-specific T-cell responses [1062–1063]. The participation of the NDK dynamin pathway in the HIV-1 envelope has also been proposed [1064]. The association between CD4+ T cells and B cell dysfunction in HIV-1 infection has also been discussed [1065]. Significantly elevated levels of the calcium-binding myelomonocytic protein calprotectin (L1 protein) in people with AIDS compared with controls have been quantitatively measured [1066]. Moreover, HIV-1 Nef, which has a potent role in both the early and late phases of the viral life cycle, could recruit and phosphorylate the tyrosine kinase ZAP-70 to downregulate MHC class I in primary CD4+ T cells [1067].
19.3.5 Multiple drug combinations for treatment at each stage of HIV-1 infection based on the specific PPMI networks
Finally, in order to identify a drug combination for the treatment of patients at each stage of HIV-1 infection, we could use a strategy to discover a drug combination in the functional core modules of the specific PPMI network (Fig. 19.5A–C) at each HIV infection stage (see Section 19.4). At the reverse transcription stage the gene CPNE3 is highly expressed and ADRBK1, CSK, PRKCZ, SLC9A3R1, CTCF, H2AFX, NDUFS8, PTBP1, and miR-320a are highly repressed in the HIV-1-infected T cells. Therefore, based on the functional core modules shown in Fig. 19.5A, we propose a combination of thalidomide,
TABLE 19.2 Top 10 significant canonical pathway gene sets found enriched by GSEA analysis [15].

| Enriched gene sets interacted with HIV proteins at integration/replication stage | Size | z-Score | Enriched gene sets interacted with HIV proteins at late stage | Size | z-Score |
|---|---|---|---|---|---|
| BIOCARTA_CHREBP2_PATHWAY | 16 | 6.12 | BIOCARTA_LONGEVITY_PATHWAY | 7 | 5.93 |
| PID_FAK_PATHWAY | 17 | 5.97 | BIOCARTA_NDKDYNAMIN_PATHWAY | 8 | 5.1 |
| PID_RB_1PATHWAY | 28 | 5.93 | ST_B_CELL_ANTIGEN_RECEPTOR | 18 | 4.73 |
| PID_NFAT_3PATHWAY | 27 | 5.86 | REACTOME_RECYCLING_PATHWAY_OF_L1 | 9 | 4.7 |
| BIOCARTA_AKAPCENTROSOME_PATHWAY | 6 | 5.72 | PID_PI3KCI_PATHWAY | 18 | 4.69 |
| ST_INTEGRIN_SIGNALING_PATHWAY | 31 | 5.69 | REACTOME_FORMATION_OF_TUBULIN_FOLDING_INTERMEDIATES_BY_CCT_TRIC | 6 | 4.59 |
| REACTOME_RNA_POL_II_PRE_TRANSCRIPTION_EVENTS | 19 | 5.68 | REACTOME_IL_6_SIGNALING | 3 | 4.5 |
| KEGG_HEMATOPOIETIC_CELL_LINEAGE | 11 | 5.68 | PID_CXCR3_PATHWAY | 16 | 4.43 |
| REACTOME_SEMA4D_IN_SEMAPHORIN_SIGNALING | 8 | 5.6 | REACTOME_INTEGRATION_OF_PROVIRUS | 3 | 4.4 |
| REACTOME_CELL_CELL_COMMUNICATION | 20 | 5.59 | PID_PRL_SIGNALING_EVENTS_PATHWAY | 6 | 4.37 |

HIV, Human immunodeficiency virus.
oxaprozin, and metformin as a multiple molecule drug to treat patients with HIV-1 infection at the reverse transcription stage (Table 19.3). At the integration/replication stage, NCOA3 is highly expressed and STAT5B, CTNNB1, KAT5, MMS19, DYRK2, TERF2IP, MLH1, SHC1, PPP2CA, and miR-320a are highly repressed in the HIV-1-infected T cells. Therefore, according to the functional core modules in Fig. 19.5B, we propose a combination of quercetin, nifedipine, and fenbendazole as a multiple molecule drug to treat patients with HIV-1 infection at the integration/replication stage (Table 19.3). At the late stage, KRAS, RAF1, CRK, CTBP1, and DARS are highly expressed and MAPK3, CCNE1, HSP90AB1, CTNNB1, BCL2L1, EIF5A, RPL6, EPRS, RPS2, and EEF1D are highly repressed in the HIV-1-infected T cells. Based on the functional core modules shown in Fig. 19.5C, we propose a combination of staurosporine, quercetin, prednisolone, and flufenamic acid as a multiple molecule drug to treat patients with HIV-1 infection at the late stage (Table 19.3).
19.4 Methods
A flowchart of the strategy used to construct the common core and specific interspecies PPMI networks is shown in Fig. 19.1. In this chapter, we first use big data, including intraspecies and interspecies PPI [584,1068] and miRNA regulation [1069,1070], to construct the candidate PPMI network for HIV-1-infected CD4+ T cells. We then apply a dynamic system and system identification method to identify the genome-wide PPMI networks of the HIV-infected and HIV-uninfected T cells during three stages of infection. To extract the core networks of the genome-wide PPMI networks, we define an infection score based on the identified protein and miRNA interactions in the genome-wide networks. We suggest that the proteins with high infection scores could be involved in the changes of biological processes between mock- and HIV-infected cells. Finally, we apply big mechanism analysis to unravel the pathologic mechanisms of HIV-1 infection during these three stages using a GO tool and to identify the core proteins with an aim to discover new anti-HIV multiple molecule drugs. We use the drug response genome-wide microarray data [813] from a promyelocytic cell line to design three multiple molecule drug combinations for the treatment of patients at three stages of HIV-1 infection.
19.4.1 Protein pool selection
In this chapter, we want to identify the host–pathogen interactions at three stages of HIV-1 infection, which orchestrate the host response to infection at each stage. Therefore two-sided gene expression data, which simultaneously measure the expression levels of host and pathogen genes at three stages during the infection process, are required. Mohammadi and colleagues [946] generated two-sided genome-wide time-course expression data for HIV and human mRNA and human miRNA in HIV- and mock-infected T cells. The expression data at several time points were set to zero due to the low quality of the samples (according to Mohammadi and colleagues) [946]. The HIV expression data include three transcript types: unspliced pre-mRNA, single-spliced RNA, and multiple-spliced RNA. The expression of viral and human genes was
TABLE 19.3 The strategy to identify potential drug combinations for treating HIV-1-infected patients at each stage of HIV-1 infection is based on the drug response genome-wide microarray data.

| | Reverse transcription stage | Integration/replication stage | Late stage |
|---|---|---|---|
| The highly expressed genes for potential inhibition strategy of multiple drug design | CPNE3 | NCOA3 | KRAS, RAF1, CRK, CTBP1, DARS |
| The suppressed genes for potential activation strategy of multiple drug design | ADRBK1, CSK, PRKCZ, SLC9A3R1, CTCF, H2AFX, NDUFS8, PTBP1, miR-320a | STAT5B, CTNNB1, KAT5, MMS19, DYRK2, TERF2IP, MLH1, SHC1, PPP2CA, miR-320a | MAPK3, CCNE1, HSP90AB1, CTNNB1, BCL2L1, EIF5A, RPL6, EPRS, RPS2, EEF1D |
| The potential multiple drug combination | Thalidomide, oxaprozin, metformin | Quercetin, nifedipine, fenbendazole | Staurosporine, quercetin, prednisolone, flufenamic acid |

We use the correlation coefficients between drug–gene pairs (see Section 19.4) to determine their suitability for the treatment of HIV. If the correlation coefficient of a drug–gene pair is larger than 0.3, expression of the gene can be activated by the drug, whereas if the correlation coefficient of a drug–gene pair is less than 0.1, expression of the gene can be inhibited by the drug. Based on these criteria, we obtain a drug combination of thalidomide, oxaprozin, and metformin that could be used to treat patients with HIV-1 infection at the reverse transcription stage, a combination of quercetin, nifedipine, and fenbendazole that could be used as a multiple molecule drug to treat patients at the integration/replication stage, and a combination of staurosporine, quercetin, prednisolone, and flufenamic acid that could be used as a multiple molecule drug to treat patients at the late stage [15].
measured by real-time polymerase chain reaction and high-throughput RNA sequencing, respectively. If more two-sided genome-wide time-course expression data at the three stages of HIV-1 infection become available in the future, they could provide more reliable and accurate estimates. Host proteins are selected by one-way analysis of variance (ANOVA), where the null hypothesis is that the average expression of a protein is the same in HIV- and mock-infected cells. Host proteins with a P-value greater than .05 were pooled. A total of 3234 genes are identified, and an additional 159 miRNAs that potentially interact with these genes are added to the pool. All 16 HIV proteins are produced via alternative splicing of viral mRNA. Capsid, Gag, matrix, nucleocapsid, p1, p6, and pol are produced from the unspliced pre-mRNA (~9 kb); Rev, Tat, and Nef are produced from the multiple-spliced RNA (~2 kb); and Env, gp120, gp41, Vif, Vpr, and Vpu are produced from the single-spliced RNA (~4 kb). The genome-wide expression data include the expression of unspliced pre-mRNA, multiple-spliced RNA, and single-spliced RNA. Therefore every protein from the same RNA group (unspliced pre-mRNA, multiple-spliced RNA, or single-spliced RNA) has the same expression data. Expression data for the three HIV transcripts are used to represent the expression of the 16 HIV proteins and to identify the PPMI network. The genome-wide time-course expression data for HIV-1 and for HIV-infected and mock-infected CD4+ T cells show three infection stages. The reverse transcription, integration/replication, and late stages occur 2–12, 6–18, and 16–24 h after infection, respectively. The three-stage expression data allow us to identify the PPMI network at each stage by applying the system identification method and system order detection scheme to the dynamic system models described next.
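The assignment of one transcript time course to several viral proteins, as described above, can be sketched in a few lines of Python. This is only an illustration: the dictionary keys, the function name `protein_expression`, and the toy expression values are assumptions introduced here, and only the grouping of the 16 proteins follows the text.

```python
# Grouping of the 16 HIV proteins by source transcript, as listed in the text.
TRANSCRIPT_PROTEINS = {
    "unspliced": ["Capsid", "Gag", "matrix", "nucleocapsid", "p1", "p6", "pol"],
    "multiple_spliced": ["Rev", "Tat", "Nef"],
    "single_spliced": ["Env", "gp120", "gp41", "Vif", "Vpr", "Vpu"],
}


def protein_expression(transcript_expression):
    """Give every HIV protein the time course of the transcript it comes from."""
    expression = {}
    for transcript, proteins in TRANSCRIPT_PROTEINS.items():
        for protein in proteins:
            # Copy the list so proteins do not share one mutable object.
            expression[protein] = list(transcript_expression[transcript])
    return expression


# Hypothetical time courses (arbitrary units), for illustration only.
toy_transcripts = {
    "unspliced": [0.0, 0.2, 1.5],
    "multiple_spliced": [0.1, 0.9, 1.1],
    "single_spliced": [0.0, 0.4, 1.3],
}
hiv_expression = protein_expression(toy_transcripts)
```

Under this scheme, proteins such as Rev, Tat, and Nef are indistinguishable by expression alone, so any difference in their identified interactions comes from the candidate network rather than from the expression data.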
To validate our DNA methylation findings in human immune cells during HIV-1 infection, we could apply one-way ANOVA to the genome-wide DNA methylation profiles of HIV-infected and HIV-uninfected human B cells [1071]. We could find significant differences in the DNA methylation profiles between the HIV-infected and non-HIV-infected human B cells (P < .05).
19.4.2 Reconstruction of the candidate PPMI network through data mining
Human PPIs and miRNA regulation are collected from big datasets, including the Biological General Repository for Interaction Datasets (BioGRID; http://thebiogrid.org/), Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu), Biomolecular Interaction Network Database (BIND; http://bind.ca), IntAct (http://www.ebi.ac.uk/intact/), molecular INTeraction database (MINT; http://mint.bio.uni-roma2.it/mint/), TargetScanHuman (http://www.targetscan.org/), and CircuitsDB 2 (http://penelope.unito.it/circuitsdb2), which contain 8,578,807 protein interaction pairs and 492,550 regulatory miRNAs. HIV protein interaction data and interspecies HIV–host protein interaction data are obtained from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov/genome/viruses/retroviruses/hiv1/interactions/) [1068], VirHostNet 2.0 (http://virhostnet.prabi.fr), the VirusMINT database (http://mint.bio.uniroma2.it/virusmint/), and the pathogen–host interaction search tool (PHISTO; www.phisto.org/), which contain 14,437 interactions between human and HIV proteins and 21 HIV PPI pairs.
19.4.3 Dynamic system models of the host–pathogen interspecies PPMI network
To identify the interspecies PPMI network from the candidate interspecies PPMI network, we apply the following dynamic models to characterize the molecular mechanisms, including PPIs and miRNA regulations, in the interspecies PPMI network. For the host the dynamic model of the ith host protein $y_i$ in the interspecies PPMI network can be described by the following regression form:

$$y_i(t+1) = y_i(t) + \sum_{n=1}^{N} a_{in} y_n(t) + \sum_{k=1}^{K} b_{ik} h_k(t) - \sum_{z=1}^{Z} c_{iz} R_z(t) - \lambda_i y_i(t) + \delta_i + \omega_i(t) \tag{19.1}$$
where $y_i(t)$ and $y_n(t)$ are the expression levels of the ith and nth host proteins at time t for $i, n = 1, \ldots, N$, respectively; $a_{in}$ and $b_{ik}$ are the interactions between the ith and nth host proteins and between the kth viral protein and the ith host protein, respectively; $h_k(t)$ and $R_z(t)$ are the expression levels of the kth viral protein and the zth host miRNA at time t, respectively; $-c_{iz}$ is the negative regulatory ability of the zth miRNA on the ith host protein; $-\lambda_i$ and $\delta_i$ represent the degradation effect and the basal expression level of the ith host protein, respectively; $\omega_i(t)$ denotes the stochastic noise of the ith host protein at time t; and N, K, and Z indicate the total numbers of host proteins, viral proteins, and host miRNAs in the candidate PPMI network, respectively. For the virus the dynamic model of the jth viral protein $h_j$ in the interspecies PPMI network can be described by the following regression form:

$$h_j(t+1) = h_j(t) + \sum_{n=1}^{N} \alpha_{jn} y_n(t) + \sum_{k=1}^{K} \beta_{jk} h_k(t) - \sum_{z=1}^{Z} \gamma_{jz} R_z(t) - \varepsilon_j h_j(t) + \kappa_j + w_j(t) \tag{19.2}$$

where $h_j(t)$ and $w_j(t)$ are the expression level and stochastic noise of the jth viral protein at time t for $j = 1, \ldots, K$, respectively; $\alpha_{jn}$ and $\beta_{jk}$ are the interactions between the jth viral protein and the nth host protein and between the jth and kth viral proteins, respectively; $-\gamma_{jz}$ indicates the negative regulatory ability of the zth host miRNA on the jth viral protein; and $-\varepsilon_j$ and $\kappa_j$ are the degradation effect and the basal expression level of the jth viral protein, respectively. In other words, the coupled dynamic models in (19.1) and (19.2) characterize the molecular mechanisms in HIV-1 and CD4+ T cells, including the interspecies cross-talk between viral and host proteins using the terms $\sum_{k=1}^{K} b_{ik} h_k(t)$ in (19.1) and $\sum_{n=1}^{N} \alpha_{jn} y_n(t)$ in (19.2), the intraspecies interactions in host and viral cells using the terms $\sum_{n=1}^{N} a_{in} y_n(t)$ in (19.1) and $\sum_{k=1}^{K} \beta_{jk} h_k(t)$ in (19.2), respectively, the host miRNA negative regulation of host and viral proteins using the terms $-\sum_{z=1}^{Z} c_{iz} R_z(t)$ in (19.1) and $-\sum_{z=1}^{Z} \gamma_{jz} R_z(t)$ in (19.2), respectively, and the host and viral protein degradations using the terms $-\lambda_i y_i(t)$ in (19.1) and $-\varepsilon_j h_j(t)$ in (19.2), respectively. Furthermore, we apply the following criteria to the models in (19.1) and (19.2) to identify the parameters and prune the candidate interspecies PPMI network using genome-wide temporal expression data.
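To make the structure of the host model (19.1) concrete, the following minimal Python sketch evaluates a single time step for one host protein. The function name `host_step`, its argument names, and all numerical values are hypothetical and chosen only for illustration; the viral model (19.2) has the same form with the roles of the host and viral terms exchanged.

```python
def host_step(y, h, R, a_i, b_i, c_i, lam_i, delta_i, i, noise=0.0):
    """One step of model (19.1) for host protein i.

    y, h, R : current expression of host proteins, viral proteins, host miRNAs
    a_i, b_i, c_i : interaction weights of protein i (rows of a, b, c)
    lam_i, delta_i : degradation rate and basal expression level of protein i
    """
    ppi = sum(a * yn for a, yn in zip(a_i, y))          # host-host PPIs
    cross_talk = sum(b * hk for b, hk in zip(b_i, h))   # virus-host PPIs
    mirna = sum(c * rz for c, rz in zip(c_i, R))        # miRNA repression
    return y[i] + ppi + cross_talk - mirna - lam_i * y[i] + delta_i + noise


# Toy state: two host proteins, one viral protein, one host miRNA.
y, h, R = [1.0, 2.0], [0.5], [1.0]
infected = host_step(y, h, R, a_i=[0.0, 0.1], b_i=[0.4], c_i=[0.2],
                     lam_i=0.3, delta_i=0.05, i=0)
# Setting the cross-talk weights to zero removes the viral contribution.
mock = host_step(y, h, R, a_i=[0.0, 0.1], b_i=[0.0], c_i=[0.2],
                 lam_i=0.3, delta_i=0.05, i=0)
```

The difference between `infected` and `mock` in this toy case is exactly the cross-talk term, which is the quantity the network comparison in the chapter is designed to expose.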
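19.4.4 Use of an infection score to identify the core proteins involved in the changes of the biological processes between the interspecies PPMI networks of mock- and HIV-infected CD4+ T cells at each infection stage
After reconstructing the interspecies PPMI network, we could obtain three pruned interspecies PPMI networks in HIV-infected cells at the three infection stages, and three networks in mock-infected cells for the three infection stages. In the host models (19.1) of the mock-infected cells, the interspecies interactions between HIV and T cells, $b_{ik}$ for $i = 1, \ldots, N$ and $k = 1, \ldots, K$, are set to zero. To define an infection score based on the differences between the two interspecies PPMI networks of HIV-infected and mock-infected cells, we first define the network matrix of HIV-infected cells, $A_\nu$, according to $\hat{\theta}_i$ in (19.A3) as follows:

$$A_\nu \triangleq \begin{bmatrix} \hat{a}_{11,\nu} & \cdots & \hat{a}_{1N,\nu} & \hat{b}_{11,\nu} & \cdots & \hat{b}_{1K,\nu} & \hat{c}_{11,\nu} & \cdots & \hat{c}_{1Z,\nu} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \hat{a}_{N1,\nu} & \cdots & \hat{a}_{NN,\nu} & \hat{b}_{N1,\nu} & \cdots & \hat{b}_{NK,\nu} & \hat{c}_{N1,\nu} & \cdots & \hat{c}_{NZ,\nu} \end{bmatrix} \tag{19.3}$$

Similarly, we define the network matrix, $A_o$, of mock-infected cells as follows:

$$A_o \triangleq \begin{bmatrix} \hat{a}_{11,o} & \cdots & \hat{a}_{1N,o} & 0 & \cdots & 0 & \hat{c}_{11,o} & \cdots & \hat{c}_{1Z,o} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \hat{a}_{N1,o} & \cdots & \hat{a}_{NN,o} & 0 & \cdots & 0 & \hat{c}_{N1,o} & \cdots & \hat{c}_{NZ,o} \end{bmatrix} \tag{19.4}$$

Furthermore, we define the infection indication matrix D based on the interaction differences between the networks of mock- and HIV-infected cells as follows:

$$D \triangleq |A_\nu - A_o| = \begin{bmatrix} d_{11} & \cdots & d_{1j} & \cdots & d_{1(N+K+Z)} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d_{i1} & \cdots & d_{ij} & \cdots & d_{i(N+K+Z)} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ d_{N1} & \cdots & d_{Nj} & \cdots & d_{N(N+K+Z)} \end{bmatrix} = \left| \begin{bmatrix} \hat{a}_{11,\nu} - \hat{a}_{11,o} & \cdots & \hat{a}_{1N,\nu} - \hat{a}_{1N,o} & \hat{b}_{11,\nu} & \cdots & \hat{b}_{1K,\nu} & \hat{c}_{11,\nu} - \hat{c}_{11,o} & \cdots & \hat{c}_{1Z,\nu} - \hat{c}_{1Z,o} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \hat{a}_{N1,\nu} - \hat{a}_{N1,o} & \cdots & \hat{a}_{NN,\nu} - \hat{a}_{NN,o} & \hat{b}_{N1,\nu} & \cdots & \hat{b}_{NK,\nu} & \hat{c}_{N1,\nu} - \hat{c}_{N1,o} & \cdots & \hat{c}_{NZ,\nu} - \hat{c}_{NZ,o} \end{bmatrix} \right| \tag{19.5}$$

where the absolute value is taken element-wise and $d_{ij}$ denotes the difference in the connection from the jth node to the ith node between the PPMI networks of mock- and HIV-infected cells. To determine the core proteins involved in the changes of biological processes between the networks in mock- and HIV-infected cells, we define the infection score of the ith protein, $I_i$, as follows:

$$I_i \triangleq \log_2 \mathrm{link}_i + \frac{\sum_{j=1}^{N+K+Z} d_{ij}}{\mathrm{link}_i} \tag{19.6}$$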
where $\mathrm{link}_i$ indicates the number of links to the ith protein; $\sum_{j=1}^{N+K+Z} d_{ij}/\mathrm{link}_i$ represents the average weight difference of the ith protein between the networks of mock- and HIV-infected cells; and $\log_2 \mathrm{link}_i$ is a compensation term to weight the hubs in the differential PPMI network. High infection scores suggest differences in the corresponding biological processes between mock- and HIV-infected cells. Because the contribution of the connected proteins to the ith protein depends on their connection weights and their expression levels, the hub proteins have a small average weight difference in the distribution plot of $\sum_{j=1}^{N+K+Z} d_{ij}/\mathrm{link}_i$ among proteins (Fig. 19.A4). The hub proteins could participate in the changes of biological processes between the networks in mock- and HIV-infected cells. Therefore we define the infection score in (19.6), which consists of the average weight difference $\sum_{j=1}^{N+K+Z} d_{ij}/\mathrm{link}_i$ and the compensation term $\log_2 \mathrm{link}_i$, to bring the infection scores of the hubs closer to the center of the infection score distribution. In this chapter the core cross-talk network defined at each infection stage comprises the host proteins with the highest $I_i$ (top 10%) and their interacting host miRNAs and viral proteins (Fig. 19.2A–C).
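The infection score can be illustrated with a short Python sketch. It assumes the additive reading of (19.6), that is, the compensation term plus the average weight difference, and it assumes that the number of links of a protein is the count of nonzero entries in its row in either network; both choices, like the function names and toy matrices, are assumptions made for this example rather than details confirmed by the source.

```python
import math


def infection_scores(A_hiv, A_mock):
    """Infection score per host protein from two network matrices.

    Each row holds the weights [a_i1..a_iN, b_i1..b_iK, c_i1..c_iZ] of one
    host protein; A_mock has zeros in the b (virus-host) columns.
    """
    scores = []
    for row_v, row_o in zip(A_hiv, A_mock):
        diffs = [abs(v - o) for v, o in zip(row_v, row_o)]
        # Assumption: a link exists if the edge is present in either network.
        links = sum(1 for v, o in zip(row_v, row_o) if v != 0 or o != 0)
        if links == 0:
            scores.append(0.0)  # isolated protein: no score contribution
        else:
            scores.append(math.log2(links) + sum(diffs) / links)
    return scores


def top_fraction(scores, fraction=0.10):
    """Indices of the proteins with the highest scores (at least one)."""
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:keep]


# Toy example: 2 host proteins, columns = [a_i1, a_i2, b_i1, c_i1].
A_hiv = [[0.0, 0.5, 0.3, 0.2], [0.5, 0.0, 0.0, 0.0]]
A_mock = [[0.0, 0.1, 0.0, 0.2], [0.1, 0.0, 0.0, 0.0]]
scores = infection_scores(A_hiv, A_mock)
core = top_fraction(scores)
```

In this toy case the first protein has three links and a larger summed difference, so it ranks first and would be the one retained in the core cross-talk network.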
19.4.5 Extraction of the common and specific PPMI networks during the HIV-1 infection
Based on the core cross-talk networks in the reverse transcription (Fig. 19.2A), integration/replication (Fig. 19.2B), and late infection (Fig. 19.2C) stages, we could find a set of 47 proteins that are common to all three networks (Table 19.1), which are designated as the common PPMI network of the host–pathogen interaction along with their interacting host miRNAs and viral proteins (Fig. 19.3A–C). All other proteins, including the interacting host miRNAs and viral proteins, are members of the specific PPMI networks at each infection stage (Figs. 19.A1–19.A3). In a genome-wide siRNA screen to knock down more than 20,000 human genes, 932 proteins are found to be essential for HIV replication [1072–1074]. These factors are called HIV dependency factors and are highlighted in the results (Figs. 19.2–19.4 and Figs. 19.A1–19.A4).
19.4.6 Design strategy for determining multiple molecule drug combinations for treating patients with HIV-1 infection in CD4+ T cells
We apply Pearson correlation to the drug response genome-wide microarray data [813] to evaluate the effects of 1327 drugs on 14,825 genes/miRNAs. If the correlation coefficient of a drug–gene (or drug–miRNA) pair is close to 1, gene expression can be activated by an increased concentration of the drug. Conversely, if the correlation coefficient of a drug–gene (or drug–miRNA) pair is close to −1, gene expression can be inhibited by an increased concentration of the drug. Therefore we can determine multiple
molecule drug combinations that activate the highly repressed genes and inhibit the highly expressed genes at a particular infection stage as a potential treatment for HIV patients.
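The drug–gene screening step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy dose and expression series, and in particular the cut-off values passed to `classify` are hypothetical parameters chosen for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


def classify(drug_dose, gene_expr, activate_above, inhibit_below):
    """Label a drug-gene pair by the sign and size of its correlation."""
    r = pearson(drug_dose, gene_expr)
    if r > activate_above:
        return "activates"
    if r < inhibit_below:
        return "inhibits"
    return "no clear effect"


# Toy data: expression measured at four increasing drug concentrations.
dose = [1.0, 2.0, 3.0, 4.0]
repressed_gene = [2.0, 4.0, 6.0, 8.0]   # rises with dose
overexpressed_gene = [8.0, 6.0, 4.0, 2.0]  # falls with dose
# Illustrative cut-offs only.
up = classify(dose, repressed_gene, activate_above=0.3, inhibit_below=-0.3)
down = classify(dose, overexpressed_gene, activate_above=0.3, inhibit_below=-0.3)
```

A drug would then be a candidate for a given stage if it is labeled as activating for the highly repressed genes and inhibiting for the highly expressed genes of that stage.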
19.5 Conclusion
The mechanisms underlying HIV invasion and the host response are complex. HIV replication requires host machinery, and how the virus takes over and repurposes host resources has been extensively investigated for many years. Although some HIV therapies could increase the life spans of HIV-infected patients, curing HIV infection is still a big challenge. Thus understanding the interactions between HIV and host cells may identify new therapeutic targets. In this chapter, we provide a global view of PPMI using a systems biology approach. By comparing the PPMIs between HIV- and mock-infected cells, the core PPMI network was extracted for each stage of infection; the intersection of these networks comprises the core set of proteins that define the entire infection cycle. Analysis of the common core PPMI network could yield new insight into the race between virus invasion and host defense. We have also analyzed the role of each viral protein during infection, as well as the role of epigenetic and miRNA regulation. Elucidation of the cellular processes associated with each viral protein may reveal how HIV hijacks host resources to facilitate infection, and investigation of miRNA and epigenetic regulation illustrates how miRNAs and methylation affect infection. Finally, we have proposed multiple drug combinations for treating HIV-1-infected patients at each stage of infection. Thus the proposed systems biology methods can identify the functional core modules of host–pathogen networks as candidate drug targets for HIV therapy.
19.6 Abbreviations
AIC       Akaike information criterion
AIDS      acquired immune deficiency syndrome
ANOVA     analysis of variance
BIND      Biomolecular Interaction Network Database
BioGRID   the Biological General Repository for Interaction Datasets
CCR5      CC chemokine receptor 5
DIP       Database of Interacting Proteins
GAIT      IFN-gamma-activated inhibitor of translation
GO        Gene Ontology
HCC       hepatocellular carcinoma
HIV       human immunodeficiency virus
MIF       macrophage migration inhibitory factor
MINT      molecular INTeraction database
miRNA     microRNA
PHISTO    pathogen–host interaction search tool
PPIs      protein–protein interactions
PPMI      protein–protein and miRNA interaction
SP-PIR    Swiss-Prot and Protein Information Resource
Ubl       ubiquitin-like
19.7 Appendix
19.7.1 Identification of the real PPMI network by pruning false-positive interactions through system identification and system order detection
A candidate interspecies protein–protein and miRNA interaction (PPMI) network constructed through big data mining includes many false-positive interactions (edges). To prune the false-positive interactions and obtain the real interspecies PPMI networks at the three human immunodeficiency virus (HIV) infection stages, the two coupled linear dynamic models in (19.1) and (19.2) are proposed as a mathematical description of the interspecies protein–protein interactions (PPIs) between virus and host. To identify the parameters, including $a_{in}$, $b_{ik}$, and $c_{iz}$ in model (19.1) and $\alpha_{jn}$, $\beta_{jk}$, and $\gamma_{jz}$ in model (19.2), using genome-wide temporal expression data for host cells and real-time polymerase chain reaction temporal expression data for HIV, (19.1) can be rewritten in the following regression form:

$$y_i(t+1) = \begin{bmatrix} y_1(t) & \cdots & y_{N_i}(t) & h_1(t) & \cdots & h_{K_i}(t) & -R_1(t) & \cdots & -R_{Z_i}(t) & y_i(t) & 1 \end{bmatrix} \begin{bmatrix} a_{i1} \\ \vdots \\ a_{iN_i} \\ b_{i1} \\ \vdots \\ b_{iK_i} \\ c_{i1} \\ \vdots \\ c_{iZ_i} \\ (1-\lambda_i) \\ \delta_i \end{bmatrix} + \omega_i(t) \triangleq \varphi_i(t)\theta_i + \omega_i(t) \tag{19.A1}$$

where $\varphi_i(t)$ is the regression vector that can be obtained from the expression data and $\theta_i$ is the parameter vector to be estimated for protein i of host cells. We define $N_i$, $K_i$, and $Z_i$ as the numbers of nonzero parameters in $a_{in}$, $b_{ik}$, and $c_{iz}$, respectively. In other words, we have to identify $N_i + K_i + Z_i + 2$ parameters in $\theta_i$. To improve the parameter estimation accuracy and avoid overfitting, the temporal expression data of the ith protein are interpolated to $L_i$ data points, $L_i = 5(N_i + K_i + Z_i + 2)$, by applying the cubic spline method. When the temporal expression data $y_i(t)$, $h_k(t)$, and $R_z(t)$ of protein i are interpolated to $L_i$ time points, Eq. (19.A1) is rewritten in the following form for the ith protein in host cells:

$$Y_i = \Phi_i \theta_i + E_i \tag{19.A2}$$

where

$$Y_i = \begin{bmatrix} y_i(2) \\ \vdots \\ y_i(L_i) \end{bmatrix}, \quad \Phi_i = \begin{bmatrix} \varphi_i(1) \\ \vdots \\ \varphi_i(L_i - 1) \end{bmatrix}, \quad \text{and} \quad E_i = \begin{bmatrix} \omega_i(1) \\ \vdots \\ \omega_i(L_i - 1) \end{bmatrix}.$$
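Thus the parameter estimation problem for $\hat{\theta}_i$ could be solved by applying the following constrained least squares minimization criterion:

$$\hat{\theta}_i = \arg\min_{\theta_i} \frac{1}{2} \left\| \Phi_i \theta_i - Y_i \right\|_2^2$$

subject to

$$\begin{bmatrix} 0 & \cdots & 0 & 0 & \cdots & 0 & -1 & \cdots & 0 & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & -1 & 0 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 1 & 0 \\ 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & \cdots & 0 & 0 & -1 \end{bmatrix} \theta_i \le \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \end{bmatrix} \tag{19.A3}$$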
The constraints enforce nonpositive microRNA regulation ($-c_{iz} \le 0$), nonpositive host protein degradation ($-\lambda_i \le 0$), and a nonnegative host protein basal expression level ($\delta_i \ge 0$). The estimation problem could be solved using the constrained least squares scheme based on a reflective Newton method for minimizing a quadratic function in the MATLAB optimization toolbox, and the optimal parameter $\hat{\theta}_i$ is then obtained. Furthermore, a system order detection scheme, the Akaike information criterion (AIC), is applied to prune the false-positive connections in the PPMI network at each infection stage. As the number of interaction parameters increases, the AIC value first decreases and reaches its minimum at the true number of interaction parameters. Therefore the false-positive interactions in the candidate PPMI network are pruned by using the AIC value to delete the insignificant interactions beyond the true number of interaction parameters, which yields the real PPMI network at each HIV infection stage. Because the genome-wide host and viral protein expression levels during HIV-1 infection have not yet been measured and 73% of the variance in protein abundance can be explained by mRNA abundance, mRNA expression profiles are frequently used as a substitute for the protein expression profiles. Furthermore, if the relationship between the protein expression y(t) and the mRNA expression x(t) is linear, that is, $y(t) = \alpha x(t)$, then $\alpha$ cancels out on both sides of Eq. (19.A1). Therefore the estimation of the parameters $a_{in}$, $b_{ik}$, and $c_{iz}$ is not influenced by replacing the protein expression with the mRNA expression.
Remark 1 To obtain a symmetric matrix of PPIs, we define $a_{in} = \{ a_{ni} \mid |a_{ni}| > |a_{in}| \}$, that is, the interaction of larger magnitude is used for both directions. Student's t-test is used to calculate the statistical significance (P-value) of the parameters in (19.1) and (19.2) in the real PPMI network under the null hypothesis $H_0$: $a_{in} = 0$ and $b_{ik} = 0$ in (19.1) or $\alpha_{jn} = 0$ and $\beta_{jk} = 0$ in (19.2).
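The estimation and order-detection steps can be sketched in pure Python under clearly stated assumptions. The chapter solves (19.A3) with a reflective Newton method in MATLAB; the sketch below swaps in plain projected gradient descent, which handles the same constraints because they reduce to simple bounds on individual entries of $\theta_i$. The AIC expression used here, $\log(\mathrm{RSS}/L) + 2p/L$, is a commonly used form and is an assumption, since the source does not print the formula. All function names and toy data are hypothetical.

```python
import math

INF = float("inf")


def predict(Phi, theta):
    return [sum(p * t for p, t in zip(row, theta)) for row in Phi]


def rss(Phi, Y, theta):
    """Residual sum of squares of the regression Y ~ Phi * theta."""
    return sum((f - y) ** 2 for f, y in zip(predict(Phi, theta), Y))


def bounded_least_squares(Phi, Y, lower, upper, iters=2000):
    """Minimize 0.5*||Phi*theta - Y||^2 with lower <= theta <= upper."""
    # 1/||Phi||_F^2 never exceeds 1/lambda_max(Phi^T Phi), so it is a safe step.
    step = 1.0 / sum(v * v for row in Phi for v in row)
    theta = [min(max(0.0, lo), hi) for lo, hi in zip(lower, upper)]
    for _ in range(iters):
        resid = [f - y for f, y in zip(predict(Phi, theta), Y)]
        grad = [sum(row[c] * r for row, r in zip(Phi, resid))
                for c in range(len(theta))]
        # Gradient step followed by projection back onto the bounds.
        theta = [min(max(t - step * g, lo), hi)
                 for t, g, lo, hi in zip(theta, grad, lower, upper)]
    return theta


def aic(rss_value, n_samples, n_params):
    """Assumed AIC form: log of residual variance plus a complexity penalty."""
    return math.log(rss_value / n_samples) + 2.0 * n_params / n_samples


# Toy data for one host protein with one partner and one miRNA.
y_n = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
R = [0.5, 0.4, 0.6, 0.5, 0.3, 0.7]
y_i = [1.0, 1.1, 1.05, 0.95, 1.0, 1.2]
Y = [1.1, 1.05, 0.95, 1.0, 1.2, 0.9]
# Columns follow (19.A1): [y_n(t), -R(t), y_i(t), 1] for theta = [a, c, 1-lambda, delta].
Phi = [[yn, -r, yi, 1.0] for yn, r, yi in zip(y_n, R, y_i)]
lower = [-INF, 0.0, -INF, 0.0]   # c >= 0 and delta >= 0
upper = [INF, INF, 1.0, INF]     # (1 - lambda) <= 1
theta_hat = bounded_least_squares(Phi, Y, lower, upper)
model_aic = aic(rss(Phi, Y, theta_hat), len(Y), len(theta_hat))
```

In an order-detection loop, candidate edges would be removed one at a time, the model refitted, and the parameter set with the smallest AIC value retained; an extra parameter is kept only if it lowers the residual enough to offset the $2/L$ penalty.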
The parameter identification procedures for the host proteins in (19.1) and the viral proteins in (19.2) are the same. We then obtain the pruned interspecies PPMI networks for both the host and virus at the three HIV infection stages. To compare the networks of the HIV- and mock-infected CD4+ T cells, we use the differential PPMI network between the HIV- and mock-infected host cells at each stage, and the infection score of a pathway is defined as the summation of the infection scores of its proteins in Eq. (19.6). The sizes and infection scores of the pathways at different stages are given in Table 19.A1.
TABLE 19.A1 Canonical pathway gene sets enriched by GSEA analysis at integration/replication and late stages [15].

| Enriched gene sets interacted with HIV proteins at integration/replication stage | Size | z-Score | Enriched gene sets interacted with HIV proteins at late stage | Size | z-Score |
|---|---|---|---|---|---|
| BIOCARTA_CHREBP2_PATHWAY | 16 | 6.12 | BIOCARTA_LONGEVITY_PATHWAY | 7 | 5.93 |
| PID_FAK_PATHWAY | 17 | 5.97 | BIOCARTA_NDKDYNAMIN_PATHWAY | 8 | 5.1 |
| PID_RB_1PATHWAY | 28 | 5.93 | ST_B_CELL_ANTIGEN_RECEPTOR | 18 | 4.73 |
| PID_NFAT_3PATHWAY | 27 | 5.86 | REACTOME_RECYCLING_PATHWAY_OF_L1 | 9 | 4.7 |
| BIOCARTA_AKAPCENTROSOME_PATHWAY | 6 | 5.72 | PID_PI3KCI_PATHWAY | 18 | 4.69 |
| ST_INTEGRIN_SIGNALING_PATHWAY | 31 | 5.69 | REACTOME_FORMATION_OF_TUBULIN_FOLDING_INTERMEDIATES_BY_CCT_TRIC | 6 | 4.59 |
| REACTOME_RNA_POL_II_PRE_TRANSCRIPTION_EVENTS | 19 | 5.68 | REACTOME_IL_6_SIGNALING | 3 | 4.5 |
| KEGG_HEMATOPOIETIC_CELL_LINEAGE | 11 | 5.68 | PID_CXCR3_PATHWAY | 16 | 4.43 |
| REACTOME_SEMA4D_IN_SEMAPHORIN_SIGNALING | 8 | 5.6 | REACTOME_INTEGRATION_OF_PROVIRUS | 3 | 4.4 |
| REACTOME_CELL_CELL_COMMUNICATION | 20 | 5.59 | PID_PRL_SIGNALING_EVENTS_PATHWAY | 6 | 4.37 |
| KEGG_ARRHYTHMOGENIC_RIGHT_VENTRICULAR_CARDIOMYOPATHY_ARVC | 13 | 5.59 | KEGG_ALDOSTERONE_REGULATED_SODIUM_REABSORPTION | 8 | 4.23 |
| BIOCARTA_CK1_PATHWAY | 8 | 5.53 | REACTOME_POST_CHAPERONIN_TUBULIN_FOLDING_PATHWAY | 6 | 4.09 |
| KEGG_HYPERTROPHIC_CARDIOMYOPATHY_HCM | 14 | 5.43 | BIOCARTA_LYM_PATHWAY | 3 | 3.87 |
| REACTOME_RNA_POL_II_TRANSCRIPTION_PRE_INITIATION_AND_PROMOTER_OPENING | 11 | 5.42 | KEGG_ENDOCYTOSIS | 58 | 3.84 |
| KEGG_BASAL_TRANSCRIPTION_FACTORS | 10 | 5.39 | PID_HDAC_CLASSIII_PATHWAY | 8 | 3.72 |
| REACTOME_SEMA4D_INDUCED_CELL_MIGRATION_AND_GROWTH_CONE_COLLAPSE | 6 | 5.37 | BIOCARTA_PTEN_PATHWAY | 8 | 3.65 |
| ST_JNK_MAPK_PATHWAY | 20 | 5.21 | REACTOME_CLEAVAGE_OF_GROWING_TRANSCRIPT_IN_THE_TERMINATION_REGION_ | 17 | 3.65 |
| BIOCARTA_KERATINOCYTE_PATHWAY | 23 | 5.21 | BIOCARTA_LAIR_PATHWAY | 3 | 3.62 |
| PID_PI3KCI_AKT_PATHWAY | 15 | 5.15 | BIOCARTA_MCALPAIN_PATHWAY | 4 | 3.56 |
| BIOCARTA_G2_PATHWAY | 8 | 5.13 | KEGG_PENTOSE_PHOSPHATE_PATHWAY | 7 | 3.49 |
| REACTOME_TRANSCRIPTIONAL_REGULATION_OF_WHITE_ADIPOCYTE_DIFFERENTIATION | 22 | 5.09 | REACTOME_MRNA_3_END_PROCESSING | 13 | 3.47 |
| PID_ATR_PATHWAY | 15 | 5.07 | REACTOME_RETROGRADE_NEUROTROPHIN_SIGNALLING | 5 | 3.36 |
| BIOCARTA_AKAP95_PATHWAY | 3 | 5.03 | REACTOME_SYNTHESIS_OF_PIPS_AT_THE_PLASMA_MEMBRANE | 9 | 3.35 |
| REACTOME_TRANSPORT_OF_MATURE_MRNA_DERIVED_FROM_AN_INTRONLESS_TRANSCRIPT | 13 | 4.92 | BIOCARTA_ARF_PATHWAY | 4 | 3.34 |
| BIOCARTA_MAL_PATHWAY | 11 | 4.81 | PID_KIT_PATHWAY | 23 | 3.32 |
| REACTOME_TRANSPORT_OF_RIBONUCLEOPROTEINS_INTO_THE_HOST_NUCLEUS | 11 | 4.81 | SIG_PIP3_SIGNALING_IN_B_LYMPHOCYTES | 10 | 3.31 |
| KEGG_NUCLEOTIDE_EXCISION_REPAIR | 16 | 4.79 | PID_INTEGRIN2_PATHWAY | 4 | 3.31 |
| REACTOME_SLC_MEDIATED_TRANSMEMBRANE_TRANSPORT | 22 | 4.71 | KEGG_MTOR_SIGNALING_PATHWAY | 21 | 3.25 |
| PID_INTEGRIN3_PATHWAY | 5 | 4.7 | REACTOME_IL_RECEPTOR_SHC_SIGNALING | 9 | 3.25 |
| PID_HIV_NEF_PATHWAY | 20 | 4.68 | BIOCARTA_CARDIACEGF_PATHWAY | 6 | 3.24 |
19.7 Appendix
FIGURE 19.A1 The specific host-pathogen cross-talk network at the late stage of infection [15].
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
19. Human immunodeficiency virus-human interaction networks
FIGURE 19.A2 The distribution plots of $I_i$ and $\sum_{j=1}^{N+K+Z} d_{ij}/\mathrm{link}_i$ among proteins [15].
FIGURE 19.A3 The specific host-pathogen cross-talk network marker at the late stage of infection. The symbols are the same as those in Fig. 19.5 [15].
FIGURE 19.A4 The specific host-pathogen cross-talk network at the integration/replication stage [15].
CHAPTER 20
Systems multiple-molecule drug design in infectious diseases: Drug-design specifications approach
20.1 Introduction
Chapters 15–19 introduced multiple-molecule drug design for infectious diseases, which targets multiple biomarkers identified from the pathogenic and host-defensive mechanisms of infectious diseases. In this chapter, we introduce a systems multiple-molecule drug-design methodology for infectious diseases with fewer side effects, based on drug-design specifications from the conventional system engineering design perspective. For almost a century, drug discovery was driven by the quest for magic bullets, which primarily followed the idea of “one drug, one target, one disease” [1075,1076]. Nevertheless, this concept is far from the biological reality: even the most successful rationally designed drugs show quite promiscuous binding behavior, that is, drugs rarely bind specifically to a single target, which challenges the concept of a magic bullet. In spite of the considerable progress in genome- and proteome-based high-throughput screening methods and rational drug design, the number of successful single-target drugs did not increase appreciably during the past decade. Several highly efficient drugs, such as steroidal anti-inflammatory drugs, salicylate, metformin, or Gleevec, affect many targets simultaneously [1075]. In addition, combination therapy, which represents another form of multiple-molecule drugs, is used increasingly to treat many types of diseases, such as AIDS, cancer, and atherosclerosis [1076]. Therefore the use of multiple molecules is apparently an evolutionary success story of systems medicine design. In addition, traditional medical treatments often use multiple-component extracts of natural products, and the genetic and epigenetic cellular network model suggests that the partial inhibition of a surprisingly small number of targets can be more efficient than the complete inhibition of a single target.
Recently, more analyses of drugs and drug-target networks have indeed shown a rich pattern of interactions among drugs and their targets [1076,1077]. Drug side effects are therefore hard to avoid under the traditional single-target drug paradigm of magic bullets, since drugs acting on a single target seem to be the exception [1078,1079].
Systems Immunology and Infection Microbiology DOI: https://doi.org/10.1016/B978-0-12-816983-4.00019-5
© 2021 Elsevier Inc. All rights reserved.
In general, drug side effects are very complex phenomenological observations in the therapeutic treatment of human diseases. They have been attributed to a number of molecular scenarios, including the interaction with primary or additional targets, downstream pathway perturbations, kinetic and dosage effects, drug-drug interference, insufficient metabolization, the effect of active metabolites, and the aggregation or irreversible target binding of the drug [1078]. Among them, the direct interaction with proteins seems to be one of the most important scenarios. Usually, the unexpected activities derived from off-targets are called drug side effects, which are unwanted and harmful to patients [1079]. In addition, the wrong selection of a molecular target implies a lack of expected efficacy, which is the most important cause of failure in clinical trials [1079,1080]. The lack of efficacy is even more prominent when dealing with complex and multifactorial diseases. Therefore the selection of the right drug targets requires a complete understanding of the entire interconnected molecular system, in which molecular drug targets play very specific roles in a large and precise molecular machinery. Consequently, we need novel systems computational methods for drug-target identification, including molecular networks that better represent the biological system in which clinical trials intervene [1081]. Further, a network-based representation enables the integration of multiple sources of information, such as protein-protein interaction (PPI), target druggability assessment, gene-disease association, compound-protein interaction, or protein-side effect association, which eventually resembles the reality in which the choice of a molecular target is based on multiple distinct factors [1082]. Often, the selected molecular drug target has multiple cellular functions in infectious diseases.
As a result, the inhibition or activation of the drug target can lead to severe side effects that are not compensated for by the therapeutic benefits [1083]. A less harmful alternative consists in the specific regulation of the molecular interactions that are associated with the therapeutic treatment of infectious diseases. Therefore the application of host/pathogen cellular network biology can help identify new host/pathogen PPIs and gene regulations amenable to disruption by small-molecule treatment. Unfortunately, targeting PPIs or gene regulations is a very challenging task [1084], and progress is still limited to certain classes of PPIs or gene regulations, especially in infectious diseases [1085]. Drug-target interactions are discussed in more detail from the perspective of cellular networking in Refs. [1084,1086–1088]. In this chapter, we focus on the host/pathogen network-based approach to identify multiple drug targets by investigating the pathogenic and host-defensive mechanisms in the infectious process. Based on the above analyses, integrating computational network-based approaches for multiple drug targets with drug data mining for multiple-molecule drugs of infectious diseases could not only reduce the time and expense of the preclinical stages but also lead to more precise medicines with fewer side effects, which would eventually translate into lower attrition rates. From the flowchart of systems drug discovery of infectious diseases in Fig. 20.1, the first step is to construct a candidate host/pathogen genetic and epigenetic network (GEN) for infectious diseases by big database mining.
Since the candidate host/pathogen GEN contains a large number of false positives, a network identification method is needed to prune them and obtain the real host/pathogen GENs at every stage of infection. This is done by a parameter estimation scheme and a system order detection method applied to genome-wide two-sided high-throughput data from different stages of infection (i.e., two-sided microarray data, next-generation sequencing data, or protein PCR (polymerase chain reaction) data [879]).
FIGURE 20.1 The flowchart of systems drug discovery of infectious diseases based on design specifications for multiple-molecule drugs through big data mining, network identification, multiple drug target selection, and drug data mining [16].
Since the host/pathogen GENs of infectious diseases are still very complex, it is not easy to find the pathogenic and host-defensive mechanisms in them and to identify drug targets for drug discovery. A principal network projection (PNP) method is therefore employed to extract core host/pathogen GENs from the host/pathogen GENs, based on the principal component analysis (PCA) of host/pathogen cross-talk GENs at different stages of infection [9,687,903,1089]. Then, we project the core host/pathogen cross-talk GENs onto KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways (i.e., denotation based on KEGG pathways) to obtain the core host/pathogen cross-talk pathways, from which we can investigate the pathogenic mechanism and host-defensive mechanism based on the changes in the core host/pathogen cross-talk pathways between stages of infection. We can also identify significant biomarkers as multiple drug targets based on the pathogenic and host-defensive mechanisms in the infectious process. To design multiple-molecule drugs with fewer side effects that repress the pathogenic targets and restore the dysfunctional drug targets of host cells to their normal cellular functions, highly activated drug targets must be inhibited, highly repressed drug targets must be activated, and housekeeping genes or proteins of host cells, as well as those without differential expression, should be influenced as little as possible. These therapeutic requirements and limitations can be regarded as the drug-design specifications of systems multiple-molecule drug design with fewer side effects, analogous to the design specifications used to achieve multiple purposes in system engineering design [1090]. The Connectivity Map (CMAP) and DGIdb databases provide a large number of compounds (drug molecules), some of which have been approved by the FDA and some of which have not yet been approved. By drug data mining of these databases, we can find the compounds (drug molecules) that best meet the drug-design specifications and use them as multiple-molecule drugs for the therapeutic treatment of infectious diseases with fewer side effects.
Finally, some design examples of multiple-molecule drugs with fewer side effects for infectious diseases are introduced to illustrate the proposed systematic design specifications scheme. Selected via drug data mining, these drugs repress the expression of pathogenic genes, activate the expression of inhibited genes, and restore the drug targets of host cells to their normal expression without influencing the expression of housekeeping genes and proteins.
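The PNP step mentioned in this introduction can be illustrated with a small numerical sketch. The chapter does not spell out the computation here, so the formulation below is an assumption: the identified network is stored as a matrix `H` whose rows are nodes and whose entries are interaction strengths, a singular value decomposition is taken, enough components are kept to retain a chosen fraction of the energy (the 0.85 default is hypothetical), and each node is scored by the 2-norm of its projection onto the retained components. The function name and the toy matrix are illustrative only.

```python
import numpy as np

def principal_network_projection(H, energy=0.85):
    """Score each node (row of H) by its projection onto the principal subspace.

    H      : (nodes x parameters) matrix of identified interaction strengths
    energy : fraction of squared singular-value energy to retain (assumed value)
    """
    # Singular value decomposition of the network matrix
    _, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Smallest number of components whose cumulative energy reaches `energy`
    frac = np.cumsum(s**2) / np.sum(s**2)
    m = min(int(np.searchsorted(frac, energy)) + 1, len(s))
    # 2-norm of each row's projection onto the top-m right singular vectors
    scores = np.linalg.norm(H @ Vt[:m].T, axis=1)
    return scores, m

# Hypothetical 4-node network matrix, for illustration only
H = np.array([
    [2.0, 1.5, 0.0, 0.0],
    [1.0, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.5, 0.2],
    [0.0, 0.0, 0.05, 0.01],
])
scores, m = principal_network_projection(H)
core_order = np.argsort(scores)[::-1]  # nodes ranked from most to least "core"
```

Nodes with the largest scores would be kept as the core network; where to cut the ranking is a further design choice that this sketch leaves open.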
20.2 Systems drug-design method in infectious diseases
20.2.1 Systems multiple drug design of infectious diseases with fewer side effects: Drug-design specification approach
From the systems biology perspective [879], an infectious disease is caused by a perturbation of the GEN of host cells by pathogenic genes in the infection process. The first task of systems medicine is therefore to design multiple-molecule drugs that repress the pathogenic genes and remove the perturbation, restoring the perturbed GEN of host cells to its normal cellular functions. In general, the perturbed GEN is very complex, and it is not easy to restore all of its cellular functions. In this situation, we can restore to normal the cellular dysfunctions of some significant drug targets selected from network biomarkers, based on the host-defensive mechanism and the pathogenic molecular mechanism of the infectious disease. To investigate the pathogenic molecular mechanism and the host-defensive mechanism, we need to construct the host/pathogen cross-talk GENs at different stages of infection by big data mining and a system identification method applied to two-sided genome-wide high-throughput data [879,1087–1089]. The systematic design procedure of multiple-molecule drugs for infectious diseases in Fig. 20.1 is discussed in detail in the following.
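As a rough illustration of the system identification step, the sketch below prunes candidate regulators of one target gene. It rests on assumptions not stated in this section: a linear regression model of the target's expression, least-squares parameter estimation, and the Akaike information criterion (AIC) as the order-detection rule, applied by greedy backward elimination. The data are synthetic and the names `x0` to `x3` are hypothetical.

```python
import numpy as np

def aic(y, X):
    """AIC of a least-squares fit of y on the columns of X plus a constant term."""
    n = len(y)
    Xb = np.column_stack([X, np.ones(n)])
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    resid = y - Xb @ theta
    # log residual variance plus a penalty on the number of parameters
    return np.log(np.mean(resid**2)) + 2 * Xb.shape[1] / n

def prune_regulators(y, X, names):
    """Greedy backward elimination: drop a candidate link whenever AIC decreases."""
    keep = list(range(X.shape[1]))
    best = aic(y, X[:, keep])
    improved = True
    while improved and keep:
        improved = False
        for j in list(keep):
            trial = [k for k in keep if k != j]
            score = aic(y, X[:, trial])
            if score < best:
                best, keep, improved = score, trial, True
                break
    return [names[k] for k in keep]

# Synthetic data: the target depends on x0 and x2 only; x1 and x3 are false positives
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=40)
kept = prune_regulators(y, X, ["x0", "x1", "x2", "x3"])
```

The true regulators are always retained in this setting because removing either one raises the residual variance sharply; whether a spurious candidate is dropped depends on the noise realization, which is why a criterion such as AIC reduces false positives rather than eliminating them outright.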
20.2.2 Identification of multiple drug targets for infectious diseases via systems biology method
Network models suggest that the partial inhibition and activation of a small number of drug targets can be more efficient than the complete inhibition of a single target, because cellular networks are robust and can prevent major changes caused by a single target [1091]. The success of combination therapies with multiple-molecule drugs suggests that systematic drug-design strategies should be directed against multiple drug targets, and that this novel drug-design paradigm might often result in more efficient molecules than the currently favored single-target drugs [1092,1093]. In general, the final effect of partial, but multiple, drug actions might often surpass that of a complete drug action at a single target. Therefore the future success of this novel drug-design paradigm will depend not only on a system model that identifies the correct multiple drug targets and finds their multiple fitting, low-affinity drug molecules but also on more efficient in vivo testing [1094,1095]. A comparison of various strategies suggests that multiple but partial attacks on carefully selected targets are almost inevitably more efficient than the knockout of a single, equally well-selected target [1091,1096–1098]. A plausible explanation for this higher efficiency is that even partial multitarget attacks can block a larger number of individual interactions or network links in the host/pathogen GENs of infectious diseases. The reason underlying the attack efficiency of multiple targets is easier to understand from the systematic point of view [1099–1101]. In general, multiple-target attacks are more effective because they affect the host/pathogen GEN at more sites when they are distributed over the entire host/pathogen cross-talk network of infectious diseases [1091,1093,1094].
To achieve better multiple-target attacks, structure- and ligand-based drug design has been further discussed in Refs. [1094,1095,1102]. Based on the above analysis, the first step of systems multiple drug design is to find a network biomarker to serve as multiple drug targets in the therapeutic treatment of infectious diseases. To identify such a network biomarker, we need to investigate the pathogenic and defensive mechanisms by comparing the molecular mechanisms of progression between consecutive stages of an infectious disease. Before that, we need to identify the host/pathogen GENs and extract the core host/pathogen cross-talk pathways by projecting the core host/pathogen GENs onto KEGG pathways. The steps for constructing host/pathogen GENs, core host/pathogen GENs, and core host/pathogen cross-talk pathways for the pathogenic and host-defensive mechanisms at different stages of an infectious disease by big data mining, system modeling, system identification, and PNP via two-sided high-throughput data are shown in Fig. 20.1. The procedure for constructing the core host/pathogen GENs of infectious diseases can be divided into the following four steps: (1) we employed big data mining and preprocessed the two-sided gene/miRNA (microRNA)/lncRNA expression data and DNA methylation data; (2) we then constructed candidate host/pathogen GENs from the candidate host/pathogen PPI network and the candidate host/pathogen gene/miRNA/lncRNA regulatory networks in infectious diseases; (3) we identified the real host/pathogen cross-talk GEN at each stage of an infectious disease by using the two-sided genome-wide high-throughput data of the infectious disease; (4) we then applied the PNP method to extract the core nodes of the host/pathogen GENs, such as core proteins, genes, miRNAs, and lncRNAs, to construct core host/pathogen GENs at different stages of an infectious disease [9,687,903,1089]. By comparing the core host/pathogen GENs between two consecutive stages of an infectious disease and projecting them onto KEGG pathways to obtain the core host/pathogen cross-talk signaling pathways, we can extract the differential core host/pathogen signaling pathways between stages of infection and gain insight into the pathogenic and host-defensive mechanisms of the infectious disease. Based on these mechanisms, we can select multiple biomarkers as multiple drug targets and design a multiple-molecule drug by applying drug database mining to the CMAP and DGIdb databases for the therapeutic treatment of the infectious disease [1100,1101]. CMAP provides the expression levels of 14,825 genes under 6100 different conditions covering 1327 different compounds (i.e., molecular drugs) at different concentrations [813]. The correlation coefficient between the expression level of a gene and the concentration of a compound denotes the relationship between the compound and the gene. If the correlation coefficient is greater than zero or some positive value, the gene is said to be upregulated by the compound. If the correlation coefficient is less than zero or some negative value, the gene is said to be downregulated by the compound. After computing the correlation coefficients between drug concentration and mRNA activity in the microarray data of CMAP, we can rank the compounds (molecular drugs) by the number of drug targets they successfully regulate and by the number of housekeeping genes of host cells that remain unaffected, since these genes should not be affected by the drug. How to design multiple-molecule drugs for infectious diseases with fewer side effects is discussed in the following sections.
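The correlation rule described above can be written down in a few lines. This is a minimal sketch that assumes a Pearson correlation between a compound's concentration series and a gene's expression series; the function name, the `threshold` parameter (the "some positive value" of the text, defaulting to zero), and the dose-response numbers are hypothetical and are not taken from CMAP.

```python
import numpy as np

def regulation_from_dose_response(concentration, expression, threshold=0.0):
    """Classify a gene's response to a compound from the Pearson correlation
    between compound concentration and gene expression level."""
    concentration = np.asarray(concentration, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # A flat profile has no defined correlation; treat it as unaffected
    if concentration.std() == 0 or expression.std() == 0:
        return 0.0, "unaffected"
    r = float(np.corrcoef(concentration, expression)[0, 1])
    if r > threshold:
        return r, "upregulated"
    if r < -threshold:
        return r, "downregulated"
    return r, "unaffected"

# Hypothetical dose-response profiles for one compound at four concentrations
dose = [0.1, 1.0, 5.0, 10.0]
gene_up = [1.0, 1.4, 2.1, 3.0]    # expression rises with dose
gene_down = [3.0, 2.2, 1.5, 1.1]  # expression falls with dose
gene_flat = [2.0, 2.0, 2.0, 2.0]  # expression does not change

label_up = regulation_from_dose_response(dose, gene_up)[1]
label_down = regulation_from_dose_response(dose, gene_down, threshold=0.3)[1]
label_flat = regulation_from_dose_response(dose, gene_flat)[1]
```

Raising `threshold` above zero trades sensitivity for robustness to noise in the expression measurements.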
20.2.3 Multiple-molecule drug design of infectious diseases with fewer side effects based on drug-design specifications
In this section, systems multiple-molecule drug design with fewer side effects is discussed based on the drug-design specifications introduced above: (1) the repression of pathogenic drug targets, (2) the restoration of the cellular dysfunctions of the selected multiple drug targets to normal cellular functions, and (3) fewer side effects on housekeeping and nondifferentially expressed genes or proteins of host cells. In the following, the proposed drug-design examples for infectious diseases are divided into two classes: one for a bacterial infectious disease and another for a viral infectious disease. For infectious diseases, we need to select multiple molecules that simultaneously destroy the cellular functions of the core toxic proteins of the pathogen and restore the host cellular dysfunctions caused by pathogenic toxicity to normal cellular functions, without side effects on host cells. They are discussed in detail in the following subsections.
20.2.4 Multiple-molecule drug design with fewer side effects in bacterial infection diseases
Clostridium difficile is the leading cause of nosocomial antibiotic-associated diarrhea and the major etiologic agent of pseudomembranous colitis. In severe cases, C. difficile infection (CDI) can cause toxic megacolon, intestinal perforation, and death, as shown in Chapter 16, Investigating the Host/Pathogen Cross-Talk Mechanism During Clostridium difficile Infection for Drug Targets by Constructing Genetic and Epigenetic Interspecies Networks Using Systems Biology Method. The intestinal epithelium is the first tissue encountered in the adhesion and colonization of C. difficile and serves as a physical defense barrier against infection. Based on the systems biology method via the host/pathogen two-sided microarray data, system identification, and the PNP method in Fig. 20.1, we can obtain the core cross-talk GENs of early and late CDI, which can be projected onto KEGG pathways to find the core host/pathogen cross-talk pathways for the infection and defense mechanisms of host and pathogen in early and late CDI. From the infectious and defensive mechanisms investigated through the two core cross-talk pathways at the early and late stages of CDI in Chapter 16, we found that the following two cell wall proteins: CD2787 and CD0237
(20.1)
play important roles in cell adhesion and the pathogen defense mechanism, and that the following crucial proteins [1089]: CD1214; CD2629; and CD2643
(20.2)
are employed by C. difficile for sporulation. Therefore, by drug data mining based on the proposed drug-design specifications, we propose a potential multiple-molecule drug containing E64, IgY, REP3123, camptothecin, and apigenin for the therapeutic treatment of CDI, owing to their abilities to inhibit the target proteins in (20.1) and (20.2) and to maintain the homeostasis of dysfunctional host proteins [1089]. The combination of camptothecin and apigenin can upregulate the expression of dysfunctional proteins and downregulate the inflammation- and apoptosis-related proteins of host cells. This multiple-molecule drug could efficiently prevent and eliminate C. difficile and provide some remedial effects that restore the gene expression homeostasis of host cells. The cysteine protease inhibitor E64 and the CD0237-specific IgY can inhibit the activities of CD2787 and CD0237 in (20.1), thus interfering with cell adhesion and cell surface protein maturation [762]. In addition, the inhibition of CD2787 and CD0237 in (20.1) will limit toxin production and biofilm formation, reducing not only the probability of cell adhesion but also the cytotoxicity of C. difficile. REP3123 can repress the spore formation and toxin production of C. difficile. The repressed toxin production could limit the progression of pathogenesis, and the inhibition of sporulation could prevent spore-mediated reinfection [1089]. Furthermore, the combination of human molecule drugs (camptothecin and apigenin) can promote the expression of dysfunctional proteins (RHOA, CDC42, RAC1, HSP90B1, HSPA5, and HSP90B2P) and repress inflammation-related proteins (NFKB1, REL, and IL-8) against the severe pathogenic effects induced by C. difficile. They could also provide potential antibiotic activity based on recent studies. The results in Ref. [1089] also suggest that CD2356, CD0171, and CD0179 could participate in the defense mechanisms of C. difficile against oxidative stress.
The cooperation among these proteins provides a well-designed protection against human-produced ROS. The inhibition of these antioxidative proteins could enhance the host's ability to eliminate pathogens, and the scattered ROS can induce the rapid necrosis of pathogen cells. Therefore CD2356, CD0171, CD1064, and CD0129 are recommended as potential drug targets for further drug design [1089].
20.2.5 Multiple-molecule drug design with fewer side effects in viral infection disease
HIV infection could deplete human CD4+ T cells, which are critical for immune responses. Thus HIV infection could impair the immune system and cause AIDS. HIV prevention and therapeutic treatment have become an important medical topic in this century. Although HIV can now be managed as a chronic disease through the use of highly active antiretroviral therapy [1100], curing HIV infection remains a significant challenge. Because a mutation in HIV-1 conferring drug resistance can occur within a single day owing to its high rates of mutation, methylation, and recombination, HIV can quickly evolve resistance to antiretroviral drugs. In this situation, a clear understanding of the molecular pathogenic mechanism and host-defensive mechanism of HIV infection is required for an efficient antiretroviral multiple-molecule drug design with fewer side effects. Based on the systems biology method, host/pathogen two-sided omics data, including time-course data from high-throughput sequencing and real-time PCR together with human miRNA and PPI data, are used to construct a host/pathogen protein-protein and miRNA interaction (PPMI) network of human CD4+ T cells during HIV-1 infection through system modeling and identification, as shown in Fig. 20.1. In Chapter 19, HIV-Human Interaction Networks Investigating Pathogenic Mechanism via for Drug Discovery: A Systems Biology Approach, the pathogenic mechanisms based on the specific PPMI networks at three stages, (1) the reverse transcription stage, (2) the integration/replication stage, and (3) the late stage, are shown in Fig. 19.6. At the reverse transcription stage, the following gene is highly expressed [1100]: CPNE3
(20.3)
and the following genes and miRNA are highly repressed in HIV-1-infected T cells [1100]: ADRBK1; CSK; PRKCZ; SLC9A3R1; CTCF; H2AFX; NDUFS8; PTBP1; miR-30a
(20.4)
Therefore the drug-design specifications are to discover multiple-molecule drugs that inhibit the highly expressed gene CPNE3 in (20.3) and activate the highly repressed genes in (20.4) without influencing the housekeeping genes of T cells. The strategy for identifying a potential multiple-molecule drug that meets these drug-design specifications at the reverse transcription stage is based on drug response genome-wide microarray data. We use the correlation coefficients of drug-gene pairs to determine their suitability for the treatment of HIV at the reverse transcription stage. If the correlation coefficient of a drug-gene pair is larger than 0.3, the expression of the gene can be activated by the drug, whereas if it is less than -0.3, the expression of the gene can be inhibited by the drug. If the correlation coefficient is about 0, the gene expression is not affected by the drug. Based on these drug-design specifications, we obtain a multiple-molecule drug including thalidomide, oxaprozin, and metformin to treat patients with HIV-1 infection at the reverse transcription stage. At the integration/replication stage, as shown in Chapter 19, the following gene is highly expressed [1100]: NCOA3
(20.5)
and the following genes and miRNA are highly repressed in HIV-1-infected T cells [1100]: STAT5B; CTNNB1; KAT5; MMS19; DYRK2; TERF2IP; MLH1; SHC1; PPP2CA; and miR-320a
(20.6)
We need to discover a potential multiple-molecule drug that represses the highly expressed gene in (20.5) and activates the highly repressed genes in (20.6), restoring the genes in (20.5) and (20.6) to their normal expression without influencing housekeeping genes. Using the correlation coefficients of drug-gene pairs, we propose a multiple-molecule drug combining quercetin, nifedipine, and fenbendazole to treat patients with HIV-1 infection at the integration/replication stage. As shown in Chapter 19, the following genes are highly expressed at the late stage of HIV-1 infection [1100]: KRAS; RAF1; CRK; CTBP1; DARS
(20.7)
and the following genes are highly repressed [1100]: MAPK3; CCNE1; HSP90AB1; CTNNB1; BCL2L1; EIF5A; RPL6; EPRS; RPS2; EEF1D
(20.8)
For the therapeutic treatment of HIV-1 infection at the late stage, we need to discover, by drug database mining, a multiple-molecule drug that represses the highly expressed genes in (20.7) and activates the highly repressed genes in (20.8), restoring them to their normal expression without influencing housekeeping genes. Using the correlation coefficients of drug-gene pairs, we propose a combination of staurosporine, quercetin, prednisolone, and flufenamic acid as the multiple-molecule drug to treat patients with HIV-1 infection at the late stage.
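The selection logic used in this section (correlation above 0.3 means activation, below -0.3 means inhibition, near 0 means no effect) can be sketched as a small search over compound combinations. Everything below other than the 0.3 threshold and the gene symbols CPNE3, CSK, and CTCF is hypothetical: the compound names, the correlation values, the choice of GAPDH as a stand-in housekeeping gene, and the ranking rule (most targets satisfied first, fewest housekeeping genes disturbed second). The sketch also ignores the case where one compound in a combination counteracts another.

```python
from itertools import combinations

def effect(r, threshold=0.3):
    """+1 if the gene is activated, -1 if inhibited, 0 if unaffected."""
    if r > threshold:
        return 1
    if r < -threshold:
        return -1
    return 0

def evaluate(combo, corr, to_inhibit, to_activate, housekeeping):
    """Return (number of targets satisfied, number of housekeeping genes disturbed)."""
    satisfied, disturbed = set(), set()
    for compound in combo:
        profile = corr[compound]
        satisfied |= {g for g in to_inhibit if effect(profile.get(g, 0.0)) == -1}
        satisfied |= {g for g in to_activate if effect(profile.get(g, 0.0)) == 1}
        disturbed |= {g for g in housekeeping if effect(profile.get(g, 0.0)) != 0}
    return len(satisfied), len(disturbed)

def best_combination(corr, to_inhibit, to_activate, housekeeping, size):
    """Exhaustively search combinations of a given size against the specifications."""
    def key(combo):
        hits, side = evaluate(combo, corr, to_inhibit, to_activate, housekeeping)
        return (hits, -side)
    return max(combinations(sorted(corr), size), key=key)

# Hypothetical drug-gene correlation coefficients, for illustration only
corr = {
    "compound_A": {"CPNE3": -0.6, "CSK": 0.1, "CTCF": 0.5, "GAPDH": 0.05},
    "compound_B": {"CPNE3": -0.1, "CSK": 0.7, "CTCF": 0.2, "GAPDH": 0.0},
    "compound_C": {"CPNE3": -0.5, "CSK": 0.6, "CTCF": 0.4, "GAPDH": 0.8},
}
specs = dict(to_inhibit=["CPNE3"], to_activate=["CSK", "CTCF"], housekeeping=["GAPDH"])
pair = best_combination(corr, size=2, **specs)
single_c = evaluate(("compound_C",), corr, **specs)
```

In this toy setting the single compound that hits every target also disturbs the housekeeping gene, so a pair of more selective compounds is preferred, which mirrors the rationale for multiple-molecule drugs given in this chapter.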
20.3 Discussion
In practical engineering designs [1090], the design specifications, including the desired targets, specific limitations, and performance indices, are given beforehand. For example, consider control engineers designing a controller for a missile to attack an aircraft. First, the engineers need to know the system model of the missile, which must be identified from experimental flight data. In general, the system model of a missile is a very complex dynamic system of nonlinear and partial differential equations. For the convenience of control design, the complex dynamic model must be reduced to a simple significant (core) model by a model reduction method via the PCA method [1086]. To attack an aircraft, a robust controller must be designed so that the missile robustly tracks the desired target under the design specifications while tolerating uncertainties, modeling errors, external disturbances, and physical constraints. Based on the systematic design methodology of meeting prescribed design specifications, the achievements of control system engineering design are outstanding [1090]. Even though biological systems are more complex than engineering systems, the systematic design methodology may be useful for the systems medicine design of infectious diseases. Based on this methodology, we use big data mining to construct a candidate host/pathogen GEN and then prune its false positives by host/pathogen two-sided genome-wide high-throughput data to obtain the host/pathogen GENs of an infectious disease, because biological systems are more complex than the physical systems of engineering fields and cannot be modeled by conventional physical laws. The identification of host/pathogen GENs at different stages of a disease by host/pathogen two-sided genome-wide high-throughput data is like the system identification of the missile model by experimental flight data. Complex physical systems (especially a nonlinear partial differential missile system) are not easy to design for and should be truncated to simpler significant models, for which a simple robust controller can be designed. Similarly, the host/pathogen GENs are still too complex to analyze for the pathogenic and host-defensive mechanisms needed to find a set of multiple drug targets for multiple-molecule drug design. Therefore the PNP method needs to be employed to extract core host/pathogen GENs for investigating the pathogenic and host-defensive mechanisms of an infectious disease and identifying its multiple drug targets.
Finally, a robust controller based on the simple significant system model is designed to achieve the desired target tracking under several design constraints and requirements (i.e., design specifications). Similarly, the proposed systems drug-design method needs drug-design specifications to achieve the desired therapeutic targets with fewer side effects: to activate drug targets that are highly repressed in the infectious disease, to inhibit drug targets that are highly activated in the infectious disease, and not to influence housekeeping genes or proteins. These prescribed therapeutic goals, constraints, and limitations, which could restore the cellular dysfunctions of drug targets to normal cellular functions with fewer side effects, are called the design specifications of systems medicine design. Therefore, based on the systematic design methodology in engineering, we propose a systems drug-design methodology in Fig. 20.1. First, we construct the real host/pathogen GENs by big database mining, system modeling, and high-throughput data identification. Then, we extract the core host/pathogen GENs by PNP, not only to investigate the pathogenic and host-defensive mechanisms of an infectious disease but also to identify the significant drug targets for multiple-molecule drug design. The activation of the highly repressed targets, the inhibition of the highly activated targets, and no influence on nondifferential and housekeeping genes are taken as the design specifications of systems medicine design. Finally, we employ systems biology and drug data mining to select a set of multiple molecules that meet these design specifications as potential multiple-molecule drugs for in vivo testing in the therapeutic treatment of an infectious disease.
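The drug-design specifications described above can be illustrated with a small sketch. It is only a toy model under stated assumptions, not the drug data mining procedure used in this chapter: the gene names, the molecule names, and the signed-effect encoding (+1 up-regulates, -1 down-regulates, 0 no effect) are all hypothetical, and the combined effect of several molecules is assumed to be a simple sum. The sketch enumerates small sets of candidate molecules and keeps those that activate the repressed target, inhibit the over-activated target, and leave the housekeeping gene untouched.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical design specification: desired direction of change per gene.
# +1 = activate (target is highly repressed in the disease)
# -1 = inhibit  (target is highly activated in the disease)
#  0 = do not influence (housekeeping or nondifferential gene)
SPEC = {"TARGET_A": +1, "TARGET_B": -1, "HOUSEKEEPING_C": 0}


@dataclass(frozen=True)
class Molecule:
    name: str
    effects: dict  # gene -> +1, -1, or 0; a missing gene means no effect


def net_effect(molecules, gene):
    """Sum the signed effects of a set of molecules on one gene (toy assumption)."""
    return sum(m.effects.get(gene, 0) for m in molecules)


def meets_spec(molecules, spec=SPEC):
    """Return True only if the molecule set satisfies every specification."""
    for gene, desired in spec.items():
        effect = net_effect(molecules, gene)
        if desired > 0 and effect <= 0:
            return False  # repressed target is not activated
        if desired < 0 and effect >= 0:
            return False  # over-activated target is not inhibited
        if desired == 0 and effect != 0:
            return False  # housekeeping gene is perturbed (side effect)
    return True


def select_combinations(candidates, max_size=2, spec=SPEC):
    """List the names of all molecule sets up to max_size that meet the spec."""
    selected = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            if meets_spec(combo, spec):
                selected.append(tuple(m.name for m in combo))
    return selected


# Hypothetical candidate molecules, standing in for entries mined from a drug database.
CANDIDATES = [
    Molecule("mol_1", {"TARGET_A": +1}),
    Molecule("mol_2", {"TARGET_B": -1}),
    Molecule("mol_3", {"TARGET_A": +1, "TARGET_B": -1, "HOUSEKEEPING_C": -1}),
]

if __name__ == "__main__":
    print(select_combinations(CANDIDATES))
```

In this toy setting no single molecule meets all three specifications: mol_3 reaches both targets but also perturbs the housekeeping gene, so only the pair of mol_1 and mol_2 is selected. This mirrors the motivation for multiple-molecule drugs, although a real selection would rely on measured drug-response profiles rather than signed integers.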
VI. Systematic Genetic and Epigenetic Pathogenic/Defensive Mechanism and Systems Drug Design
20.4 Conclusion
To make drugs more workable and safe in the therapeutic process of infectious diseases, a systems drug-design method is proposed for designing multiple-molecule drugs with fewer side effects. Following the systematic methodology of robust control design, in which desired targets are achieved under several design specifications through a simple model, simple core host/pathogen GENs are constructed by big database mining, system modeling, and the PNP method from host/pathogen two-sided genome-wide high-throughput data at different infectious stages. We then project the core host/pathogen GENs onto KEGG pathways to obtain core host/pathogen cross-talk pathways, investigate the pathogenic and host-defensive mechanisms, and identify the network biomarker from which a set of multiple drug targets can be selected for systems drug design in the therapeutic treatment of infectious diseases. As in systematic engineering design, the drug-design specifications are the activation of highly repressed targets and the inhibition of highly activated targets, so that cellular dysfunctions are restored to normal cellular functions, with few side effects on housekeeping and nondifferentially expressed genes or proteins. We then employ drug data mining to select adequate multiple drug molecules from drug databases that meet these drug-design specifications as a potential multiple-molecule drug with fewer side effects for the infectious disease. Two design examples, for bacterial and viral infectious diseases, are also given to illustrate the proposed systems multiple-molecule drug-design method.
References [1] Lund MN, Lundegaard C, Kesmir C, Brunak S. Immunological bioinformatics. Cambridge, MA: MIT Press; 2005. [2] De RK, Tomar N, editors. Immunoinformatics. 2nd ed. New York: Humana Press, Springer; 2014. [3] Chen BS, Yang SK, Lan CY, Chuang YJ. A systems biology approach to construct the gene regulatory network of systemic inflammation via microarray and databases mining. BMC Med Genomics 2008;1:46. [4] Wang YC, Lan CY, Hsieh WP, Murillo L, Agabian N, Chen BS. Global screening of potential Candida albicans biofilm-related transcription factors via network comparison. BMC Bioinformatics 2010;11:53. [5] Yang SK, Wang YC, Chao CC, Chuang YJ, Lan CY, Chen BS. Dynamic cross-talk analysis among TNF-R, TLR-4 and IL-1R signalings in TNFalpha-induced inflammatory responses. BMC Med Genomics 2010;3:19. [6] Wang YC, Lin C, Chuang MT, Hsieh WP, Lan CY, Chuang YJ, et al. Interspecies protein-protein interaction network construction for characterization of host-pathogen interactions: a Candida albicans-zebrafish interaction study. BMC Syst Biol 2013;7:79. [7] Wang YC, Huang SH, Lan CY, Chen BS. Prediction of phenotype-associated genes via a cellular network approach: a Candida albicans infection case study. PLoS One 2012;7(4):e35339. [8] Kuo ZY, Chuang YJ, Chao CC, Liu FC, Lan CY, Chen BS. Identification of infection-related and defenserelated genes via a dynamic host-pathogen interaction network using a C. albicans-zebrafish infection model. J Innate Immun 2013;5(2):13752. [9] Wu CC, Chen BS. Key immune events of the pathomechanisms of early cardioembolic stroke: multi-database mining and systems biology approach. Int J Mol Sci 2016;17(3):305. [10] Lin C, Lin CN, Wang YC, Liu FY, Chuang YJ, Lan CY, et al. The role of TGF-β signaling and apoptosis in innate and adaptive immunity in zebrafish: a systems biology approach. BMC Syst Biol 2014;8:116. [11] Li CW, Lee YL, Chen BS. 
Genetic-and-epigenetic interspecies networks for cross-talk mechanisms in human macrophages and dendritic cells during MTB infection. Front Cell Infect Microbiol 2016;6:124. [12] Li CW, Su MH, Chen BS. Investigation of the cross-talk mechanism in Caco-2 cells during Clostridium difficile infection by construction genetic and epigenetic interspecies networks: big data mining and genome-wide identification. Front Immunol 2017;8(901). [13] Yeh SJ, Yeh CC, Lan CY, Chen BS. Investigating the common pathogenic mechanism for drug design between different strains of Candida albicans infection in OKF6/TERT-2 cells by comparing their genetic and epigenetic interspecies networks: big data mining and computational system biology approaches. Front Immunol 2019;11(2):119. [14] Li CW, Jheng BR, Chen BS. Constructing the genome-wide interspecies genetic- and epigenetic-networks and the molecular mechanisms for human B lymphocytes infected with Epstein-Barr virus via big data mining and genome-wide identification. PLoS One 2018;13(8):e0202537. [15] Li CW, Chen BS. Investigating the shift of host-pathogen interaction network marker to reveal the pathogenic and host defense mechanisms in the HIV infection: a systems biology approach via big data mining. Curr HIV Res 2018;16:7795. [16] Chen BS. Systems multiple molecule drug design with less side-effects via drug data mining and genomewide data identification: drug design specifications approach. Australas Med J 2018;11(6):3619. [17] Johanson R. System modeling and identification. Prentice Hall; 1993. [18] Kitano H. Computational systems biology. Nature 2002;420:20610. [19] Kitano H. Systems biology: a brief overview. Science 2002;295:16624. [20] Lin LH, Lee HC, Li WH, Chen BS. Dynamic modeling of cis-regulatory circuits and gene expression prediction via cross-gene identification. BMC Bioinformatics 2005;6:258. [21] Vu TT, Vohradsky J. 
Nonlinear differential equation model for quantification of transcriptional regulation applied to microarray data of Saccharomyces cerevisiae. Nucleic Acids Res 2007;35:27987.
[22] Baldwin Jr. AS. The NF-kappa B and I kappa B proteins: new discoveries and insights. Annu Rev Immunol 1996;14:64983. [23] Ghosh S, May MJ, Kopp EB. NF-kappa B and Rel proteins: evolutionarily conserved mediators of immune responses. Annu Rev Immunol 1998;16:22560. [24] Hayden MS, Ghosh S. Signaling to NF-kappaB. Genes Dev 2004;18:2195224. [25] Makarov SS. NF-kappa B in rheumatoid arthritis: a pivotal regulator of inflammation, hyperplasia, and tissue destruction. Arthritis Res 2001;3:2006. [26] Coussens LM, Werb Z. Inflammation and cancer. Nature 2002;420:8607. [27] Calvano SE, Xiao W, Richards DR, Felciano RM, Baker HV, Cho RJ, et al. A network-based analysis of systemic inflammation in humans. Nature 2005;437:10327. [28] West MA, Shapiro MB, Nathens AB, Johnson JL, Moore EE, Minei JP, et al., Inflammation and the Host Response to Injury Collaborative Research Program. Inflammation and the host response to injury, a largescale collaborative project: Patient-oriented research core-standard operating procedures for clinical care. IV. Guidelines for transfusion in the trauma patient. J Trauma 2006;61(2):4369. [29] Tegner J, Yeung MK, Hasty J, Collins JJ. Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA 2003;100:59449. [30] Zou M, Conzen SD. A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 2005;21:719. [31] Santamaria P. Cytokines and chemokines in autoimmune disease: an overview. Adv Exp Med Biol 2003;520:17. [32] Foxwell BM, Bondeson J, Brennan F, Feldmann M. Adenoviral transgene delivery provides an approach to identifying important molecular processes in inflammation: evidence for heterogenecity in the requirement for NFkappaB in tumour necrosis factor production. Ann Rheum Dis Nov 2000;59(Suppl. 1):i549. [33] Kitano H, Oda K. Robustness trade-offs and host-microbial symbiosis in the immune system. 
Mol Syst Biol 2006;2:2006.0022. [34] Werner SL, Barken D, Hoffmann A. Stimulus specificity of gene expression programs determined by temporal control of IKK activity. Science 2005;309:185761. [35] Muzio M, Polentarutti N, Bosisio D, Prahladan MK, Mantovani A. Toll-like receptors: a growing family of immune receptors that are differentially expressed and regulated by different leukocytes. J Leukoc Biol 2000;67:4506. [36] Systems biology in practice: concepts, implementation and application, Wiley-Blackwell, vol. 43. Choice: Current Reviews for Academic Libraries; 2005. p. 685. [37] Bar-Joseph Z, Gerber GK, Lee TI, Rinaldi NJ, Yoo JY, Robert F, et al. Computational discovery of gene modules and regulatory networks. Nat Biotechnol 2003;21:133742. [38] Hood L. Systems biology: integrating technology, biology, and computation. Mech Ageing Dev 2003;124:916. [39] Davidson EH, McClay DR, Hood L. Regulatory gene networks and the properties of the developmental process. Proc Natl Acad Sci USA 2003;100:147580. [40] Johansson R. System modeling and identification. Prentice Hall; 1993. [41] Wu WS, Li WH, Chen BS. Computational reconstruction of transcriptional regulatory modules of the yeast cell cycle. BMC Bioinformatics 2006;7:421. [42] Pahl HL. Activators and target genes of Rel/NF-kappaB transcription factors. Oncogene 1999;18:685366. [43] Klipp E, Nordlander B, Kruger R, Gennemark P, Hohmann S. Integrative model of the response of yeast to osmotic shock. Nat Biotechnol 2005;23:97582. [44] Breitkreutz BJ, Stark C, Tyers M. Osprey: a network visualization system. Genome Biol 2003;4:R22. [45] Oyama N, Iwatsuki K, Homma Y, Kaneko F. Induction of transcription factor AP-2 by inflammatory cytokines in human keratinocytes. J Invest Dermatol 1999;113:6006. [46] Murakami S, Lefebvre V, de Crombrugghe B. Potent inhibition of the master chondrogenic factor Sox9 gene by interleukin-1 and tumor necrosis factor-alpha. J Biol Chem 2000;275:368792. 
[47] Fukuda K, Yoshida H, Sato T, Furumoto TA, Mizutani-Koseki Y, Suzuki Y, et al. Mesenchymal expression of Foxl1, a winged helix transcriptional factor, regulates generation and maintenance of gut-associated lymphoid organs. Dev Biol 2003;255:27889.
[48] Imagawa S, Nakano Y, Obara N, Suzuki N, Doi T, Kodama T, et al. A GATA-specific inhibitor (K-7174) rescues anemia induced by IL-1beta, TNF-alpha, or L-NMMA. FASEB J 2003;17:17424. [49] Koyano S, Saito Y, Sai K, Kurose K, Ozawa S, Nakajima T, et al. Novel genetic polymorphisms in the NR3C1 (glucocorticoid receptor) gene in a Japanese population. Drug Metab Pharmacokinet 2005;20:7984. [50] Nakano Y, Imagawa S, Matsumoto K, Stockmann C, Obara N, Suzuki N, et al. Oral administration of K11706 inhibits GATA binding activity, enhances hypoxia-inducible factor 1 binding activity, and restores indicators in an in vivo mouse model of anemia of chronic disease. Blood 2004;104:43007. [51] Choi SJ, Oba T, Callander NS, Jelinek DF, Roodman GD. AML-1A and AML-1B regulation of MIP-1alpha expression in multiple myeloma. Blood 2003;101:377883. [52] Hawkins GA, Amelung PJ, Smith RS, Jongepier H, Howard TD, Koppelman GH, et al. Identification of polymorphisms in the human glucocorticoid receptor gene (NR3C1) in a multi-racial asthma case and control screening panel. DNA Seq 2004;15:16773. [53] Kitano H. Biological robustness. Nat Rev Genet 2004;5:82637. [54] Albert R. Scale-free networks in cell biology. J Cell Sci 2005;118:494757. [55] Boldrick JC, Alizadeh AA, Diehn M, Dudoit S, Liu CL, Belcher CE, et al. Stereotyped and specific gene expression programs in human innate immune responses to bacteria. Proc Natl Acad Sci USA 2002;99:9727. [56] Le Y, Murphy PM, Wang JM. Formyl-peptide receptors revisited. Trends Immunol 2002;23:5418. [57] Kitano H. Robustness from top to bottom. Nat Genet 2006;38:133. [58] Zhang S, Jin G, Zhang XS, Chen L. Discovering functions and revealing mechanisms at molecular level from biological networks. Proteomics 2007;7:285669. [59] Theilgaard-Monch K, Porse BT, Borregaard N. Systems biology of neutrophil differentiation and immune response. Curr Opin Immunol 2006;18:5460. [60] Ichikawa JK, English SB, Wolfgang MC, Jackson R, Butte AJ, Lory S. 
Genome-wide analysis of host responses to the Pseudomonas aeruginosa type III secretion system yields synergistic effects. Cell Microbiol 2005;7:163546. [61] Eswarappa SM. Location of pathogenic bacteria during persistent infections: insights from an analysis using game theory. PLoS One 2009;4:e5383. [62] Smith KD, Bolouri H. Dissecting innate immune responses with the tools of systems biology. Curr Opin Immunol 2005;17:4954. [63] Girardin E, Grau GE, Dayer JM, Roux-Lombard P, Lambert PH. Tumor necrosis factor and interleukin-1 in the serum of children with severe infectious purpura. N Engl J Med 1988;319:397400. [64] Baglioni C. Mechanisms of cytotoxicity, cytolysis, and growth stimulation by TNF. In: Beutler B, editor. Tumor necrosis factors: the molecules and their emerging role in medicine. Raven Press, 1992. p. 4258. [65] Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C, et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum 1995;38:72735. [66] Clark IA. Along a TNF-paved road from dead parasites in red cells to cerebral malaria, and beyond. Parasitology 2009;136:145768. [67] Berk BC, Abe JI, Min W, Surapisitchat J, Yan C. Endothelial atheroprotective and anti-inflammatory mechanisms. Ann NY Acad Sci 2001;947:93109 discussion 10911. [68] Verstrepen JL, Bekaert T, Chau TL, Tavernier J, Chariot A, Beyaert R. TLR-4, IL-1R and TNF-R signaling to NF-kappaB: variations on a common theme. Cell Mol Life Sci 2008;65:296478. [69] Brockman JA, Scherer DC, McKinsey TA, Hall SM, Qi X, Lee WY, et al. Coupling of a signal response domain in I kappa B alpha to multiple pathways for NF-kappa B activation. Mol Cell Biol 1995;15:280918. [70] Li H, Lin X. Positive and negative signaling components involved in TNFalpha-induced NF-kappaB activation. Cytokine 2008;41:18. [71] Ogata H, Goto S, Sato K, Fujibuchi W, Bono H, Kanehisa M. KEGG: kyoto encyclopedia of genes and genomes. 
Nucleic Acids Res 1999;27:2934. [72] Keshava Prasad TS, Goel R, Kandasamy K, Keerthikumar S, Kumar S, Mathivanan S, et al. Human protein reference database—2009 update. Nucleic Acids Res 2009;37:D76772. [73] Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006;34:D5359.
[74] Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, et al. STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 2009;37:D41216. [75] Aggarwal BB. Signalling pathways of the TNF superfamily: a double-edged sword. Nat Rev Immunol 2003;3:74556. [76] Tracey KJ, Cerami A. Tumor necrosis factor: a pleiotropic cytokine and therapeutic target. Annu Rev Med 1994;45:491503. [77] Chen NJ, Chio II, Lin WJ, Duncan G, Chau H, Katz D, et al. Beyond tumor necrosis factor receptor: TRADD signaling in Toll-like receptors. Proc Natl Acad Sci USA 2008;105:1242934. [78] Ji H, Pettit A, Ohmura K, Ortiz-Lopez A, Duchatelle V, Degott C, et al. Critical roles for interleukin 1 and tumor necrosis factor alpha in antibody-induced arthritis. J Exp Med 2002;196:7785. [79] Lang T, Mansell A. The negative regulation of Toll-like receptor and associated pathways. Immunol Cell Biol 2007;85:42534. [80] Ludwig A, Fechner M, Wilck N, Meiners S, Grimbo N, Baumann G, et al. Potent anti-inflammatory effects of low-dose proteasome inhibition in the vascular system. J Mol Med (Berl) 2009;87:793802. [81] Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control 1974;19:71623. [82] Wang YC, Chen BS. Integrated cellular network of transcription regulations and protein-protein interactions. BMC Syst Biol 2010;4:20. [83] Turner NA, Mughal RS, Warburton P, O’Regan DJ, Ball SG, Porter KE. Mechanism of TNFalpha-induced IL1alpha, IL-1beta and IL-6 expression in human cardiac fibroblasts: effects of statins and thiazolidinediones. Cardiovasc Res 2007;76:8190. [84] Greenfeder SA, Nunes P, Kwee L, Labow M, Chizzonite RA, Ju G. Molecular cloning and characterization of a second subunit of the interleukin 1 receptor complex. J Biol Chem 1995;270:1375765. [85] Wesche H, Henzel WJ, Shillinglaw W, Li S, Cao Z. MyD88: an adapter that recruits IRAK to the IL-1 receptor complex. Immunity 1997;7:83747. 
[86] Medzhitov R, Preston-Hurlburt P, Kopp E, Stadlen A, Chen C, Ghosh S, et al. MyD88 is an adaptor protein in the hToll/IL-1 receptor family signaling pathways. Mol Cell 1998;2:2538. [87] Wada Y, Ohta Y, Xu M, Tsutsumi S, Minami T, Inoue K, et al. A wave of nascent transcription on activated human genes. Proc Natl Acad Sci USA 2009;106:1835761. [88] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003;13:2498504. [89] Hsu H, Shu HB, Pan MG, Goeddel DV. TRADD-TRAF2 and TRADD-FADD interactions define two distinct TNF receptor 1 signal transduction pathways. Cell 1996;84:299308. [90] Legler DF, Micheau O, Doucey MA, Tschopp J, Bron C. Recruitment of TNF receptor 1 to lipid rafts is essential for TNFalpha-mediated NF-kappaB activation. Immunity 2003;18:65564. [91] Tada K, Okazaki T, Sakon S, Kobarai T, Kurosawa K, Yamaoka S, et al. Critical roles of TRAF2 and TRAF5 in tumor necrosis factor-induced NF-kappa B activation and protection from cell death. J Biol Chem 2001;276:365304. [92] Yang J, Lin Y, Guo Z, Cheng J, Huang J, Deng L, et al. The essential role of MEKK3 in TNF-induced NFkappaB activation. Nat Immunol 2001;2:6204. [93] Blonska M, Shambharkar PB, Kobayashi M, Zhang D, Sakurai H, Su B, et al. TAK1 is recruited to the tumor necrosis factor-alpha (TNF-alpha) receptor 1 complex in a receptor-interacting protein (RIP)dependent manner and cooperates with MEKK3 leading to NF-kappaB activation. J Biol Chem 2005;280:4305663. [94] Ea CK, Deng L, Xia ZP, Pineda G, Chen ZJ. Activation of IKK by TNFalpha requires site-specific ubiquitination of RIP1 and polyubiquitin binding by NEMO. Mol Cell 2006;22:24557. [95] Hsu H, Huang J, Shu HB, Baichwal V, Goeddel DV. TNF-dependent recruitment of the protein kinase RIP to the TNF receptor-1 signaling complex. Immunity 1996;4:38796. [96] Wang C, Deng L, Hong M, Akkaraju GR, Inoue J, Chen ZJ. 
TAK1 is a ubiquitin-dependent kinase of MKK and IKK. Nature 2001;412:34651. [97] Stylianou E, Saklatvala J. Interleukin-1. Int J Biochem Cell Biol 1998;30:10759. [98] Akashi-Takamura S, Miyake K. Toll-like receptors (TLRs) and immune disorders. J Infect Chemother 2006;12:23340. [99] Takeda K, Akira S. TLR signaling pathways. Semin Immunol 2004;16:39.
[100] Martin MU, Wesche H. Summary and comparison of the signaling mechanisms of the Toll/interleukin-1 receptor family. Biochim Biophys Acta 2002;1592:26580. [101] Yamamoto M, Akira S. TIR domain-containing adaptors regulate TLR signaling pathways. Adv Exp Med Biol 2005;560:19. [102] Zhang G, Ghosh S. Negative regulation of Toll-like receptor-mediated signaling by Tollip. J Biol Chem 2002;277:705965. [103] Ye H, Arron JR, Lamothe B, Cirilli M, Kobayashi T, Shevde NK, et al. Distinct molecular mechanism for initiating TRAF6 signalling. Nature 2002;418:4437. [104] Cheng H, Addona T, Keshishian H, Dahlstrand E, Lu C, Dorsch M, et al. Regulation of IRAK-4 kinase activity via autophosphorylation within its activation loop. Biochem Biophys Res Commun 2007;352:60916. [105] Li X, Commane M, Burns C, Vithalani K, Cao Z, Stark GR. Mutant cells that do not respond to interleukin-1 (IL-1) reveal a novel role for IL-1 receptor-associated kinase. Mol Cell Biol 1999;19:464352. [106] Jiang Z, Ninomiya-Tsuji J, Qian Y, Matsumoto K, Li X. Interleukin-1 (IL-1) receptor-associated kinasedependent IL-1-induced signaling complexes phosphorylate TAK1 and TAB2 at the plasma membrane and activate TAK1 in the cytosol. Mol Cell Biol 2002;22:715867. [107] Yang K, Zhu J, Sun S, Tang Y, Zhang B, Diao L, et al. The coiled-coil domain of TRAF6 is essential for its auto-ubiquitination. Biochem Biophys Res Commun 2004;324:4329. [108] Lamothe B, Besse A, Campos AD, Webster WK, Wu H, Darnay BG. Site-specific Lys-63-linked tumor necrosis factor receptor-associated factor 6 auto-ubiquitination is a critical determinant of I kappa B kinase activation. J Biol Chem 2007;282:410212. [109] Sato S, Sanjo H, Takeda K, Ninomiya-Tsuji J, Yamamoto M, Kawai T, et al. Essential function for the kinase TAK1 in innate and adaptive immune responses. Nat Immunol 2005;6:108795. [110] Gottipati S, Rao NL, Fung-Leung WP. IRAK1: a critical signaling mediator of innate immunity. Cell Signal 2008;20:26976. 
[111] Thomas JA, Allen JL, Tsen M, Dubnicoff T, Danao J, Liao XC, et al. Impaired cytokine signaling in mice lacking the IL-1 receptor-associated kinase. J Immunol 1999;163:97884. [112] Wu H, Arron JR. TRAF6, a molecular bridge spanning adaptive immunity, innate immunity and osteoimmunology. Bioessays 2003;25:1096105. [113] Lomaga MA, Yeh WC, Sarosi I, Duncan GS, Furlonger C, Ho A, et al. TRAF6 deficiency results in osteopetrosis and defective interleukin-1, CD40, and LPS signaling. Genes Dev 1999;13:101524. [114] Naito A, Azuma S, Tanaka S, Miyazaki T, Takaki S, Takatsu K, et al. Severe osteopetrosis, defective interleukin-1 signalling and lymph node organogenesis in TRAF6-deficient mice. Genes Cell 1999;4:35362. [115] Bian ZM, Elner SG, Yoshida A, Kunkel SL, Su J, Elner VM. Activation of p38, ERK1/2 and NIK pathways is required for IL-1beta and TNF-alpha-induced chemokine expression in human retinal pigment epithelial cells. Exp Eye Res 2001;73:11121. [116] Nakano H, Oshima H, Chung W, Williams-Abbott L, Ware CF, Yagita H, et al. TRAF5, an activator of NFkappaB and putative signal transducer for the lymphotoxin-beta receptor. J Biol Chem 1996;271:146614. [117] Shinkura R, Kitada K, Matsuda F, Tashiro K, Ikuta K, Suzuki M, et al. Alymphoplasia is caused by a point mutation in the mouse gene encoding Nf-kappa b-inducing kinase. Nat Genet 1999;22:747. [118] Coornaert B, Carpentier I, Beyaert R. A20: central gatekeeper in inflammation and immunity. J Biol Chem 2009;284:821721. [119] Opipari Jr. AW, Hu HM, Yabkowitz R, Dixit VM. The A20 zinc finger protein protects cells from tumor necrosis factor cytotoxicity. J Biol Chem 1992;267:124247. [120] Lee EG, Boone DL, Chai S, Libby SL, Chien M, Lodolce JP, et al. Failure to regulate TNF-induced NFkappaB and cell death responses in A20-deficient mice. Science 2000;289:23504. [121] Yin L, Wu L, Wesche H, Arthur CD, White JM, Goeddel DV, et al. 
Defective lymphotoxin-beta receptorinduced NF-kappaB transcriptional activity in NIK-deficient mice. Science 2001;291:21625. [122] Yang J, Boerm M, McCarty M, Bucana C, Fidler IJ, Zhuang Y, et al. Mekk3 is essential for early embryonic cardiovascular development. Nat Genet 2000;24:30913. [123] Oda K, Kitano H. A comprehensive map of the Toll-like receptor signaling network. Mol Syst Biol 2006;2:2006.0015. [124] Frantz S, Kobzik L, Kim YD, Fukazawa R, Medzhitov R, Lee RT, et al. Toll4 (TLR4) expression in cardiac myocytes in normal and failing myocardium. J Clin Invest 1999;104:27180.
[125] Donato R. Intracellular and extracellular roles of S100 proteins. Microsc Res Tech 2003;60:54051. [126] Rafii S, Lyden D. S100 chemokines mediate bookmarking of premetastatic niches. Nat Cell Biol 2006;8:13213. [127] Moon A, Yong HY, Song JI, Cukovic D, Salagrama S, Kaplan D, et al. Global gene expression profiling unveils S100A8/A9 as candidate markers in H-ras-mediated human breast epithelial cell invasion. Mol Cancer Res 2008;6:154453. [128] Klune JR, Dhupar R, Cardinal J, Billiar TR, Tsung A. HMGB1: endogenous danger signaling. Mol Med 2008;14:47684. [129] Park JS, Svetkauskaite D, He Q, Kim JY, Strassheim D, Ishizaka A, et al. Involvement of Toll-like receptors 2 and 4 in cellular activation by high mobility group box 1 protein. J Biol Chem 2004;279:73707. [130] Bianchi ME, Manfredi AA. High-mobility group box 1 (HMGB1) protein at the crossroads between innate and adaptive immunity. Immunol Rev 2007;220:3546. [131] Evans PC, Taylor ER, Coadwell J, Heyninck K, Beyaert R, Kilshaw PJ. Isolation and characterization of two novel A20-like proteins. Biochem J 2001;357:61723. [132] Thomson W, Barton A, Ke X, Eyre S, Hinks A, Bowes J, et al. Mekk3 is essential for early embryonic cardiovascular development. Nat Genet 2000;24:30913. [133] Musone SL, Taylor KE, Lu TT, Nititham J, Ferreira RC, Ortmann W, et al. Multiple polymorphisms in the TNFAIP3 region are independently associated with systemic lupus erythematosus. Nat Genet 2008;40:10624. [134] The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007;447:66178. [135] Dagia NM, Goetz DJ. A proteasome inhibitor reduces concurrent, sequential, and long-term IL-1 betaand TNF-alpha-induced ECAM expression and adhesion. Am J Physiol Cell Physiol 2003;285: C81322. [136] Kalogeris TJ, Laroux FS, Cockrell A, Ichikawa H, Okayama N, Phifer TJ, et al. 
Effect of selective proteasome inhibitors on TNF-induced activation of primary and transformed endothelial cells. Am J Physiol 1999;276:C85664. [137] Cheong R, Bergmann A, Werner SL, Regal J, Hoffmann A, Levchenko A. Transient IkappaB kinase activity mediates temporal NF-kappaB dynamics in response to a wide range of tumor necrosis factor-alpha doses. J Biol Chem 2006;281:294550. [138] Chen BS, Wang YC. On the attenuation and amplification of molecular noise in genetic regulatory networks. BMC Bioinformatics 2006;7:52. [139] Pfaller M, Diekema D. Epidemiology of invasive candidiasis: a persistent public health problem. Clin Microbiol Rev 2007;20(1):13363. [140] Calderone RA, Fonzi WA. Virulence factors of Candida albicans. Trends Microbiol 2001;9(7):32735. [141] Karkowska-Kuleta J, Rapala-Kozik M, Kozik A. Fungi pathogenic to humans: molecular bases of virulence of Candida albicans, Cryptococcus neoformans and Aspergillus fumigatus. Acta Biochim Pol 2009;56(2):211. [142] Martin R, et al. Host-pathogen interactions and virulence-associated genes during Candida albicans oral infections. Int J Med Microbiol 2011;301(5):41722. [143] Zakikhany K, et al. In vivo transcript profiling of Candida albicans identifies a gene essential for interepithelial dissemination. Cell Microbiol 2007;9(12):293854. [144] Wächtler B, et al. From attachment to damage: defined genes of Candida albicans mediate adhesion, invasion and damage during interaction with oral epithelial cells. PLoS One 2011;6(2):e17046. [145] Dalle F, et al. Cellular interactions of Candida albicans with human oral epithelial cells and enterocytes. Cell Microbiol 2010;12(2):24871. [146] Zhu WD, Filler SG. Interactions of Candida albicans with epithelial cells. Cell Microbiol 2010;12(3):27382. [147] Kitano H. Foundations of systems biology. Cambridge, MA: MIT Press; 2001. [148] Barabasi A-L, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet 2004;5(2):10113. 
[149] Emmert-Streib F, Glazko GV. Network biology: a direct approach to study biological function. Wiley Interdiscip Rev: Syst Biol Med 2011;3(4):37991.
[150] Joyce AR, Palsson BØ. The model organism as a system: integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 2006;7(3):198210. [151] Vazquez A, et al. Global protein function prediction from protein-protein interaction networks. Nat Biotechnol 2003;21(6):697700. [152] Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol 2007;3(1). [153] Hu L, et al. Predicting protein phenotypes based on protein-protein interaction network. PLoS One 2011;6 (3):e17668. [154] Lee TI, et al. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 2002;298 (5594):799804. [155] Jones T, et al. The diploid genome sequence of Candida albicans. Proc Natl Acad Sci USA 2004;101 (19):732934. [156] Braun BR, et al. A human-curated annotation of the Candida albicans genome. PLoS Genet 2005;1(1):e1. [157] Teixeira MC, et al. The YEASTRACT database: a tool for the analysis of transcription regulatory associations in Saccharomyces cerevisiae. Nucleic acids Res 2006;34(Suppl. 1):D44651. [158] Arnaud MB, et al. Sequence resources at the Candida genome database. Nucleic Acids Res 2007;35(Suppl. 1): D4526. [159] Nobile CJ, et al. Biofilm matrix regulation by Candida albicans Zap1. PLoS Biol 2009;7(6):e1000133. [160] Borneman AR, et al. Divergence of transcription factor binding sites across related yeast species. Science 2007;317(5839):81519. [161] Tuch BB, et al. The evolution of combinatorial gene regulation in fungi. PLoS Biol 2008;6(2):e38. [162] Lavoie H, et al. Evolutionary tinkering with conserved components of a transcriptional regulatory network. PLoS Biol 2010;8(3):e1000329. [163] Liu TT, et al. Genome-wide expression and location analyses of the Candida albicans Tac1p regulon. Eukaryot Cell 2007;6(11):212238. [164] Znaidi S, et al. Genomewide location analysis of Candida albicans Upc2p, a regulator of sterol metabolism and azole drug resistance. Eukaryot Cell 2008;7(5):83647. [165] Sellam A, et al. 
Genome-wide mapping of the coactivator Ada2p yields insight into the functional roles of SAGA/ADA complex in Candida albicans. Mol Biol Cell 2009;20(9):2389400. [166] Znaidi S, et al. Identification of the Candida albicans Cap1p regulon. Eukaryot Cell 2009;8(6):80620. [167] Zordan RE, et al. Interlocking transcriptional feedback loops control white-opaque switching in Candida albicans. PLoS Biol 2007;5(10):e256. [168] Chen C, et al. An iron homeostasis regulatory circuit with reciprocal roles in Candida albicans commensalism and pathogenesis. Cell Host Microbe 2011;10(2):11835. [169] Schubert S, et al. Regulation of efflux pump expression and drug resistance by the transcription factors Mrr1, Upc2, and Cap1 in Candida albicans. Antimicrob Agents Chemother 2011;55(5):221223. [170] Sellam A, Tebbji F, Nantel A. Role of Ndt80p in sterol metabolism regulation and azole resistance in Candida albicans. Eukaryot Cell 2009;8(8):117483. [171] Hogues H, et al. Transcription factor substitution during the evolution of fungal ribosome regulation. Mol Cell 2008;29(5):55262. [172] Askew C, et al. The zinc cluster transcription factor Ahr1p directs Mcm1p regulation of Candida albicans adhesion. Mol Microbiol 2011;79(4):94053. [173] Askew C, et al. Transcriptional regulation of carbohydrate metabolism in the human pathogen Candida albicans. PLoS Pathog 2009;5(10):e1000612. [174] Schaller M, et al. Models of oral and vaginal candidiasis based on in vitro reconstituted human epithelia. Nat Protoc 2006;1(6):276773. [175] Fu Y, et al. Candida albicans Als1p: an adhesin that is a downstream effector of the EFG1 filamentation pathway. Mol Microbiol 2002;44(1):6172. [176] Phan QT, et al. Als3 is a Candida albicans invasin that binds to cadherins and induces endocytosis by host cells. PLoS Biol 2007;5(3):e64. [177] Zhao X, et al. Analysis of the Candida albicans Als2p and Als4p adhesins suggests the potential for compensatory function within the Als family. 
Microbiology 2005;151(5):161930. [178] Zhao X, Oh S-H, Hoyer LL. Unequal contribution of ALS9 alleles to adhesion between Candida albicans and human vascular endothelial cells. Microbiology 2007;153(7):234250.
References
[179] Kim SW, Joo YJ, Kim J. Asc1p, a ribosomal protein, plays a pivotal role in cellular adhesion and virulence in Candida albicans. J Microbiol 2010;48(6):8428. [180] Umeyama T, et al. Deletion of the CaBIG1 gene reduces β-1, 6-glucan synthesis, filamentation, adhesion, and virulence in Candida albicans. Infect Immun 2006;74(4):237381. [181] Palmer GE, Sturtevant JE. Random mutagenesis of an essential Candida albicans gene. Curr Genet 2004;46 (6):34356. [182] Warenda AJ, et al. Candida albicans septin mutants are defective for invasive growth and virulence. Infect Immun 2003;71(7):404551. [183] Clemente-Blanco A, et al. The Cdc14p phosphatase affects late cell-cycle events and morphogenesis in Candida albicans. J Cell Sci 2006;119(6):113043. [184] Singleton DR, Masuoka J, Hazen KC. Cloning and analysis of a Candida albicans gene that affects cell surface hydrophobicity. J Bacteriol 2001;183(12):35828. [185] Hope H, et al. Activation of Rac1 by the guanine nucleotide exchange factor Dck1 is required for invasive filamentous growth in the pathogen Candida albicans. Mol Biol Cell 2008;19(9):363851. [186] Thewes S, et al. In vivo and ex vivo comparative transcriptional profiling of invasive and non-invasive Candida albicans isolates identifies genes associated with tissue invasion. Mol Microbiol 2007;63(6):160628. [187] Zucchi PC, Davis TR, Kumamoto CA. A Candida albicans cell wall-linked protein promotes invasive filamentation into semi-solid medium. Mol Microbiol 2010;76(3):73348. [188] Chen X, Kumamoto CA. A conserved G protein (Drg1p) plays a role in regulation of invasive filamentation in Candida albicans. Microbiology 2006;152(12):3691700. [189] Li F, et al. Eap1p, an adhesin that mediates Candida albicans biofilm formation in vitro and in vivo. Eukaryot Cell 2007;6(6):9319. [190] Eck R, et al. A multicopper oxidase gene from Candida albicans: cloning, characterization and disruption. Microbiology 1999;145(9):241522. [191] Almeida RS, et al. 
The hyphal-associated adhesin and invasin Als3 of Candida albicans mediates iron acquisition from host ferritin. PLoS Pathog 2008;4(11):e1000217. [192] Miwa T, et al. Gpr1, a putative G-protein-coupled receptor, regulates morphogenesis and hypha formation in the pathogenic fungus Candida albicans. Eukaryot Cell 2004;3(4):91931. [193] Maidan MM, et al. The G protein-coupled receptor Gpr1 and the Gα protein Gpa2 act through the cAMP-protein kinase A pathway to induce morphogenesis in Candida albicans. Mol Biol Cell 2005;16(4):197186. [194] Ferreira C, et al. Candida albicans virulence and drug-resistance requires the O-acyltransferase Gup1p. BMC Microbiol 2010;10(1):238. [195] Zacchi LF, Schulz WL, Davis DA. HOS2 and HDA1 encode histone deacetylases with opposing roles in Candida albicans morphogenesis. PLoS One 2010;5(8):e12171. [196] Roth-Ben Arie Z, et al. Adhesion of Candida albicans mutant strains to host tissue. FEMS Microbiol Lett 1998;163(2):1217. [197] Sun JN, et al. Host cell invasion and virulence mediated by Candida albicans Ssa1. PLoS Pathog 2010;6(11):e1001181. [198] Younes S, et al. The Candida albicans Hwp2 is necessary for proper adhesion, biofilm formation and oxidative stress tolerance. Microbiol Res 2011;166(5):4306. [199] Gale CA, et al. Linkage of adhesion, filamentous growth, and virulence in Candida albicans to a single gene, INT1. Science 1998;279(5355):13558. [200] Rouabhia M, et al. Disruption of sphingolipid biosynthetic gene IPT1 reduces Candida albicans adhesion and prevents activation of human gingival epithelial cell innate immune defense. Med Mycol 2011;49(5):45866. [201] Badrane H, et al. Candida albicans IRS4 contributes to hyphal formation and virulence after the initial stages of disseminated candidiasis. Microbiology 2005;151(9):292331. [202] Newport G, et al. Inactivation of Kex2p diminishes the virulence of Candida albicans. J Biol Chem 2003;278(3):171320. [203] Herrero AB, et al. 
KRE5 gene null mutant strains of Candida albicans are avirulent and have altered cell wall composition and hypha formation properties. Eukaryot Cell 2004;3(6):142332. [204] Hope H, et al. The Candida albicans ELMO homologue functions together with Rac1 and Dck1, upstream of the MAP Kinase Cek1, in invasive filamentous growth. Mol Microbiol 2010;76(6):157290.
[205] Munro CA, et al. Mnt1p and Mnt2p of Candida albicans are partially redundant α-1,2-mannosyltransferases that participate in O-linked mannosylation and are required for adhesion and virulence. J Biol Chem 2005;280(2):105160. [206] Sandini S, et al. The MP65 gene is required for cell wall integrity, adherence to epithelial cells and biofilm formation in Candida albicans. BMC Microbiol 2011;11(1):106. [207] Wilson D, et al. Deletion of the high-affinity cAMP phosphodiesterase encoded by PDE2 affects stress responses and virulence in Candida albicans. Mol Microbiol 2007;65(4):84156. [208] Franke K, et al. The vesicle transport protein Vac1p is required for virulence of Candida albicans. Microbiology 2006;152(10):311121. [209] Hashash R, et al. Characterisation of Pga1, a putative Candida albicans cell wall protein necessary for proper adhesion and biofilm formation. Mycoses 2011;54(6):491500. [210] Calderon J, et al. PHR1, a pH-regulated gene of Candida albicans encoding a glucan-remodelling enzyme, is required for adhesion and invasion. Microbiology 2010;156(8):248494. [211] Yuan X, et al. The RIM101 signal transduction pathway regulates Candida albicans virulence during experimental keratomycosis. Investig Ophthalmol Vis Sci 2010;51(9):466876. [212] Hube B, et al. The role and relevance of phospholipase D1 during growth and dimorphism of Candida albicans. Microbiology 2001;147(4):87989. [213] Timpel C, et al. Multiple functions of Pmt1p-mediated Protein O-mannosylation in the fungal pathogen Candida albicans. J Biol Chem 1998;273(33):2083746. [214] Rouabhia M, et al. Virulence of the fungal pathogen Candida albicans requires the five isoforms of protein mannosyltransferases. Infect Immun 2005;73(8):457180. [215] Timpel C, et al. Morphogenesis, adhesive properties, and antifungal resistance depend on the Pmt6 protein mannosyltransferase in the fungal pathogen Candida albicans. J Bacteriol 2000;182(11):306371. [216] Soloviev DA, et al. 
Identification of pH-regulated antigen 1 released from Candida albicans as the major ligand for leukocyte integrin αMβ2. J Immunol 2007;178(4):203846. [217] Bassilana M, Arkowitz RA. Rac1 and Cdc42 have different roles in Candida albicans development. Eukaryot Cell 2006;5(2):3219. [218] de Boer AD, et al. The Candida albicans cell wall protein Rhd3/Pga29 is abundant in the yeast form and contributes to virulence. Yeast 2010;27(8):61124. [219] Watts H, et al. Altered adherence in strains of Candida albicans harbouring null mutations in secreted aspartic proteinase genes. FEMS Microbiol Lett 1998;159(1):12935. [220] Schaller M, et al. The secreted aspartyl proteinases Sap1 and Sap2 cause tissue damage in an in vitro model of vaginal candidiasis based on reconstituted human vaginal epithelium. Infect Immun 2003;71(6):322734. [221] Albrecht A, et al. Glycosylphosphatidylinositol-anchored proteases of Candida albicans target proteins necessary for both cellular processes and host-pathogen interactions. J Biol Chem 2006;281(2):68894. [222] Raman SB, et al. Candida albicans SET1 encodes a histone 3 lysine 4 methyltransferase that contributes to the pathogenesis of invasive candidiasis. Mol Microbiol 2006;60(3):697709. [223] Song W, Wang H, Chen J. Candida albicans Sfl2, a temperature-induced transcriptional regulator, is required for virulence in a murine gastrointestinal infection model. FEMS Yeast Res 2011;11(2):20922. [224] Spiering MJ, et al. Comparative transcript profiling of Candida albicans and Candida dubliniensis identifies SFL2, a C. albicans gene required for virulence in a reconstituted epithelial infection model. Eukaryot Cell 2010;9(2):25165. [225] Elson SL, et al. An RNA transport system in Candida albicans regulates hyphal morphology and invasive growth. PLoS Genet 2009;5(9):e1000664. [226] Heymann P, et al. 
The siderophore iron transporter of Candida albicans (Sit1p/Arn1p) mediates uptake of ferrichrome-type siderophores and is required for epithelial invasion. Infect Immun 2002;70(9):524655. [227] Yi S, et al. A Candida albicans-specific region of the α-pheromone receptor plays a selective role in the white cell pheromone response. Mol Microbiol 2009;71(4):92547. [228] Hiller E, et al. Candida albicans Sun41p, a putative glycosidase, is involved in morphogenesis, cell wall biogenesis, and biofilm formation. Eukaryot Cell 2007;6(11):205665. [229] Alvarez FJ, et al. The Sur7 protein regulates plasma membrane organization and prevents intracellular cell wall growth in Candida albicans. Mol Biol Cell 2008;19(12):521425. [230] Martínez-Esparza M, et al. Role of trehalose in resistance to macrophage killing: study with a tps1/tps1 trehalose-deficient mutant of Candida albicans. Clin Microbiol Infect 2007;13(4):38494.
[231] Alberti-Segui C, et al. Identification of potential cell-surface proteins in Candida albicans and investigation of the role of a putative cell-surface glycosidase in adhesion and virulence. Yeast 2004;21(4):285302. [232] Bruckmann A, et al. A phosphatidylinositol 3-kinase of Candida albicans influences adhesion, filamentous growth and virulence. Microbiology 2000;146(11):275564. [233] Park H, et al. Transcriptional responses of Candida albicans to epithelial and endothelial cells. Eukaryot Cell 2009;8(10):1498510. [234] Kumamoto CA. A contact-activated kinase signals Candida albicans invasive growth and biofilm development. Proc Natl Acad Sci USA 2005;102(15):557681. [235] Naglik JR, et al. Candida albicans interactions with epithelial cells and mucosal immunity. Microbes Infect 2011;13(12):96376. [236] Sudbery PE. Growth of Candida albicans hyphae. Nat Rev Microbiol 2011;9(10):73748. [237] Selvaggini S, et al. Independent regulation of chitin synthase and chitinase activity in Candida albicans and Saccharomyces cerevisiae. Microbiology 2004;150(4):9218. [238] Gottlieb S, et al. Adhesion of Candida albicans to epithelial cells effect of polyoxin D. Mycopathologia 1991;115(3):197205. [239] Tsai P-W, et al. Human antimicrobial peptide LL-37 inhibits adhesion of Candida albicans by interacting with yeast cell-wall carbohydrates. PLoS One 2011;6(3):e17755. [240] Hameed S, et al. Calcineurin signaling and membrane lipid homeostasis regulates iron mediated multidrug resistance mechanisms in Candida albicans. PLoS One 2011;6(4):e18684. [241] Heung LJ, Luberto C, Del Poeta M. Role of sphingolipids in microbial pathogenesis. Infect Immun 2006;74 (1):2839. [242] Lan CY, et al. Regulatory networks affected by iron availability in Candida albicans. Mol Microbiol 2004;53 (5):145169. [243] Leach MD, et al. 
Molecular and proteomic analyses highlight the importance of ubiquitination for the stress resistance, metabolic adaptation, morphogenetic regulation and virulence of Candida albicans. Mol Microbiol 2011;79(6):157493. [244] Gow N, et al. A hyphal-specific chitin synthase gene (CHS2) is not essential for growth, dimorphism, or virulence of Candida albicans. Proc Natl Acad Sci 1994;91(13):621620. [245] Martin SW, Konopka JB. Lipid raft polarization contributes to hyphal growth in Candida albicans. Eukaryot Cell 2004;3(3):67584. [246] Chou H, Glory A, Bachewich C. Orthologues of the anaphase-promoting complex/cyclosome coactivators Cdc20p and Cdh1p are important for mitotic progression and morphogenesis in Candida albicans. Eukaryot Cell 2011;10(5):696709. [247] Umeyama T, et al. Candida albicans protein kinase CaHsl1p regulates cell elongation and virulence. Mol Microbiol 2005;55(2):38195. [248] Nett JE, et al. Interface of Candida albicans biofilm matrix-associated drug resistance and cell wall integrity regulation. Eukaryot Cell 2011;10(12):16609. [249] Kitamura A, et al. Effect of β-1,6-glucan inhibitors on the invasion process of Candida albicans: potential mechanism of their in vivo efficacy. Antimicrobial Agents Chemother 2009;53(9):396371. [250] Hwang CS, et al. Ssn6, an important factor of morphological conversion and virulence in Candida albicans. Mol Microbiol 2003;47(4):102943. [251] Rizzetto L, Cavalieri D. Friend or foe: using systems biology to elucidate interactions between fungi and their hosts. Trends Microbiol 2011;19(10):50915. [252] Aderem A, et al. A systems biology approach to infectious disease research: innovating the pathogen-host research paradigm. MBio 2011;2(1):e0032510. [253] Odds FC, Brown AJ, Gow NA. Antifungal agents: mechanisms of action. Trends Microbiol 2003;11(6):2729. [254] Jacobsen ID, et al. Candida albicans dimorphism as a therapeutic target. Expert Rev. Anti Infect. Ther. 2012;10(1):8593. [255] Boysen JH, et al. 
Detection of protein-protein interactions through vesicle targeting. Genetics 2009;182(1):339. [256] Stynen B, Van Dijck P, Tournu H. A CUG codon adapted two-hybrid system for the pathogenic fungus Candida albicans. Nucleic Acids Res 2010;38(19):e184. [257] Chang YH, Wang YC, Chen BS. Identification of transcription factor cooperativity via stochastic system model. Bioinformatics 2006;22(18):227682.
[258] Werner E. An introduction to systems biology: design principles of biological circuits. Nature 2007;446 (7135):4934. [259] Gill PE, Murray W, Wright MH. Practical optimization, xvi. London; New York: Academic Press; 1981. p. 401. [260] Ihmels J, et al. Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet 2005;1(3):e39. [261] Murillo LA, et al. Genome-wide transcription profiling of the early phase of biofilm formation by Candida albicans. Eukaryot Cell 2005;4(9):156273. [262] Hube B. Candida: comparative and functional genomics. Horizon Scientific Press; 2007. [263] Calderone RA. Candida and candidiasis. 2002. [264] Seneviratne CJ, Jin L, Samaranayake LP. Biofilm lifestyle of Candida: a mini review. Oral Dis 2008;14 (7):58290. [265] Warnock DW. Trends in the epidemiology of invasive fungal infections. Nihon Ishinkin Gakkai Zasshi 2007;48(1):112. [266] Barnes RA. Early diagnosis of fungal infection in immunocompromised patients. J Antimicrob Chemother 2008;61(Suppl. 1):i36. [267] Richard ML, et al. Candida albicans biofilm-defective mutants. Eukaryot Cell 2005;4(8):1493502. [268] Blankenship JR, Mitchell AP. How to build a biofilm: a fungal perspective. Curr Opin Microbiol 2006;9 (6):58894. [269] Costerton JW, et al. Microbial biofilms. Annu Rev Microbiol 1995;49:71145. [270] Goffeau A, et al. Life with 6000 genes. Science 1996;274(5287):546 563-7. [271] Heckman DS, et al. Molecular evidence for the early colonization of land by fungi and plants. Science 2001;293(5532):112933. [272] Harbison CT, et al. Transcriptional regulatory code of a eukaryotic genome. Nature 2004;431 (7004):99104. [273] Ashburner M, et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000;25(1):259. [274] Berglund AC, et al. InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res 2008;36: D2636 (Database issue). 
[275] Borneman AR, et al. Transcription factor binding site identification in yeast: a comparison of high-density oligonucleotide and PCR-based microarray platforms. Funct Integr Genomics 2007;7(4):33545. [276] Wu WS, Li WH, Chen BS. Identifying regulatory targets of cell cycle transcription factors using gene expression and ChIP-chip data. BMC Bioinformatics 2007;8:188. [277] Alon U. An introduction to systems biology: design principles of biological circuits. CRC Press; 2006. [278] Chen KC, et al. A stochastic differential equation model for quantifying transcriptional regulatory network in Saccharomyces cerevisiae. Bioinformatics 2005;21(12):288390. [279] Chen HC, et al. Quantitative characterization of the transcriptional regulatory network in the yeast cell cycle. Bioinformatics 2004;20(12):191427. [280] Coleman T, Hulbert L. A direct active set algorithm for large sparse quadratic programs with simple bounds. Math Program 1989;45(13):373406. [281] Mendenhall W, Sincich T. Statistics for engineering and the sciences. Prentice-Hall, Inc; 2006. [282] Pagano M, Gauvreau K, Pagano M. Principles of biostatistics, 2. Pacific Grove, CA: Duxbury; 2000. [283] Emmert-Streib F. The chronic fatigue syndrome: a comparative pathway analysis. J Comput Biol 2007;14 (7):96172. [284] Park H, et al. Role of the fungal Ras-protein kinase A pathway in governing epithelial cell interactions during oropharyngeal candidiasis. Cell Microbiol 2005;7(4):499510. [285] Lo HJ, et al. Nonfilamentous C. albicans mutants are avirulent. Cell 1997;90(5):93949. [286] Liu H, Kohler J, Fink GR. Suppression of hyphal formation in Candida albicans by mutation of a STE12 homolog. Science 1994;266(5191):17236. [287] Doedt T, et al. APSES proteins regulate morphogenesis and metabolism in Candida albicans. Mol Biol Cell 2004;15(7):316780. [288] Biswas K, Rieger KJ, Morschhauser J. Functional analysis of CaRAP1, encoding the Repressor/activator protein 1 of Candida albicans. Gene 2003;307:1518.
[289] Nobile CJ, Mitchell AP. Regulation of cell-surface genes and biofilm formation by the C. albicans transcription factor Bcr1p. Curr Biol 2005;15(12):11505. [290] Schweizer A, et al. The TEA/ATTS transcription factor CaTec1p regulates hyphal development and virulence in Candida albicans. Mol Microbiol 2000;38(3):43545. [291] Uhl MA, et al. Haploinsufficiency-based large-scale forward genetic analysis of filamentous growth in the diploid human fungal pathogen C. albicans. EMBO J 2003;22(11):266878. [292] Garcia-Sanchez S, et al. Candida albicans biofilms: a developmental state associated with specific and stable gene expression patterns. Eukaryot Cell 2004;3(2):53645. [293] Singh P, et al. SKN7 of Candida albicans: mutant construction and phenotype analysis. Infect Immun 2004;72 (4):23904. [294] Rottmann M, et al. A screen in Saccharomyces cerevisiae identified CaMCM1, an essential gene in Candida albicans crucial for morphogenesis. Mol Microbiol 2003;47(4):94359. [295] Sonneborn A, Tebarth B, Ernst JF. Control of white-opaque phenotypic switching in Candida albicans by the Efg1p morphogenetic regulator. Infect Immun 1999;67(9):465560. [296] Chen J, et al. A conserved mitogen-activated protein kinase pathway is required for mating in Candida albicans. Mol Microbiol 2002;46(5):133544. [297] Ramage G, et al. The filamentation pathway controlled by the Efg1 regulator protein is required for normal biofilm formation and development in Candida albicans. FEMS Microbiol Lett 2002;214(1):95100. [298] Lewis RE, et al. Lack of catheter infection by the efg1/efg1 cph1/cph1 double-null mutant, a Candida albicans strain that is defective in filamentous growth. Antimicrob Agents Chemother 2002;46(4):11535. [299] Tripathi G, et al. Gcn4 co-ordinates morphogenetic and metabolic responses to amino acid starvation in Candida albicans. EMBO J 2002;21(20):544856. [300] Tirosh I, Bilu Y, Barkai N. Comparative biology: beyond sequence analysis. Curr Opin Biotechnol 2007;18 (4):3717. 
[301] Cuccato G, Della Gatta G, di Bernardo D. Systems and synthetic biology: tackling genetic networks and complex diseases. Heredity (Edinb) 2009;102(6):52732. [302] Qian J, et al. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. J Mol Biol 2001;314(5):105366. [303] Ji L, Tan KL. Identifying time-lagged gene clusters using gene expression data. Bioinformatics 2005;21 (4):50916. [304] Lu TK, Collins JJ. Dispersing biofilms with engineered enzymatic bacteriophage. Proc Natl Acad Sci USA 2007;104(27):11197202. [305] Wang X, Preston 3rd JF, Romeo T. The pgaABCD locus of Escherichia coli promotes the synthesis of a polysaccharide adhesin required for biofilm formation. J Bacteriol 2004;186(9):272434. [306] Itoh Y, et al. Depolymerization of beta-1,6-N-acetyl-D-glucosamine disrupts the integrity of diverse bacterial biofilms. J Bacteriol 2005;187(1):3827. [307] Bar-Joseph Z, et al. A new approach to analyzing gene expression time series data. In: Proceedings of the sixth annual international conference on computational biology. ACM; 2002. p. 3948. [308] De Boor C. A practical guide to splines. Math Comput Springer, 1978. [309] Burden RL, Faires JD. Numerical analysis PWS. Boston, MA: Kent Publishing Co.; 1989. [310] Altman D, Bland J. Statistics notes: diagnostic tests 1: sensitivity and specificity. BMJ 1994;308 (6943):1552. [311] Altman DG, Bland JM. Statistics notes: diagnostic tests 2: predictive values. BMJ 1994;309(6947):102. [312] Leroy O, et al. Epidemiology, management, and risk factors for death of invasive Candida infections in critical care: a multicenter, prospective, observational study in France (2005-2006). Crit Care Med 2009;37 (5):161218. [313] Kojic EM, Darouiche RO. Candida infections of medical devices. Clin Microbiol Rev 2004;17(2):25567. [314] Olorode OA, Okpokwasli GC. 
The efficacy of disinfectants on abattoirs’ Candida albicans isolates in Niger Delta region. F1000Res 2012;1:20. [315] Leberer E, et al. Virulence and hyphal formation of Candida albicans require the Ste20p-like protein kinase CaCla4p. Curr Biol 1997;7(8):53946. [316] Meeker ND, Trede NS. Immunology and zebrafish: spawning new models of human disease. Dev Comp Immunol 2008;32(7):74557.
[317] Sullivan C, Kim CH. Zebrafish as a model for infectious disease and immune function. Fish Shellfish Immunol 2008;25(4):34150. [318] Amsterdam A, Hopkins N. Mutagenesis strategies in zebrafish for identifying genes involved in development and disease. Trends Genet 2006;22(9):4738. [319] Postlethwait J, Amores A, Force A, Yan YL. The zebrafish genome. Methods Cell Biol 1998;60:14963. [320] Chao CC, et al. Zebrafish as a model host for Candida albicans infection. Infect Immun 2010;78 (6):251221. [321] Orntoft TF, et al. Genome-wide study of gene copy numbers, transcripts, and protein levels in pairs of noninvasive and invasive human transitional cell carcinomas. Mol Cell Proteom 2002;1(1):3745. [322] Newman JR, et al. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 2006;441(7095):8406. [323] Edwards Jr. JE, et al. Neutrophil-mediated protection of cultured human vascular endothelial cells from damage by growing Candida albicans hyphae. Blood 1987;69(5):14507. [324] Hummert S, et al. Game theoretical modelling of survival strategies of Candida albicans inside macrophages. J Theor Biol 2010;264(2):31218. [325] Efron B, Tibshirani RJ. An introduction to the bootstrap. Taylor & Francis; 1994. [326] Dyer SA, Dyer JS. Cubic-spline interpolation. 1. IEEE Instrum Meas Mag 2001;4(1):446. [327] Esser K, Brown AJP. The Mycota: a comprehensive treatise on fungi as experimental systems for basic and applied research. Springer-Verlag; 2006. [328] Yaar L, Mevarech M, Koltin Y. A Candida albicans RAS-related gene (CaRSR1) is involved in budding, cell morphogenesis and hypha development. Microbiology 1997;143(Pt 9):303344. [329] Woo M, Lee K, Song K. MYO2 is not essential for viability, but is required for polarized growth and dimorphic switches in Candida albicans. FEMS Microbiol Lett 2003;218(1):195202. [330] Dunkler A, Wendland J. 
Candida albicans Rho-type GTPase-encoding genes required for polarized cell growth and cell separation. Eukaryot Cell 2007;6(5):84454. [331] Jeong H, et al. Lethality and centrality in protein networks. Nature 2001;411(6833):412. [332] Roig P, Gozalbo D. Depletion of polyubiquitin encoded by the UBI4 gene confers pleiotropic phenotype to Candida albicans cells. Fungal Genet Biol 2003;39(1):7081. [333] Zou H, et al. Candida albicans Cyr1, Cap1 and G-actin form a sensor/effector apparatus for activating cAMP synthesis in hyphal growth. Mol Microbiol 2010;75(3):57991. [334] Shapiro RS, et al. Hsp90 orchestrates temperature-dependent Candida albicans morphogenesis via Ras1-PKA signaling. Curr Biol 2009;19(8):6219. [335] Reijnst P, Jorde S, Wendland J. Candida albicans SH3-domain proteins involved in hyphal growth, cytokinesis, and vacuolar morphology. Curr Genet 2010;56(4):30919. [336] Rida PC, et al. Yeast-to-hyphal transition triggers formin-dependent Golgi localization to the growing tip in Candida albicans. Mol Biol Cell 2006;17(10):436478. [337] Newport G, Agabian N. KEX2 influences Candida albicans proteinase secretion and hyphal formation. J Biol Chem 1997;272(46):2895461. [338] Shin DH, et al. Characterization of thiol-specific antioxidant 1 (TSA1) of Candida albicans. Yeast 2005;22 (11):90718. [339] Csank C, et al. Roles of the Candida albicans mitogen-activated protein kinase homolog, Cek1p, in hyphal development and systemic candidiasis. Infect Immun 1998;66(6):271321. [340] Brand A, et al. Calcium homeostasis is required for contact-dependent helical and sinusoidal tip growth in Candida albicans hyphae. Mol Microbiol 2009;71(5):115564. [341] Jiang W, et al. The topoisomerase I gene from Candida albicans. Microbiology 1997;143(Pt 2):37786. [342] Ghiselli G. SMC3 knockdown triggers genomic instability and p53-dependent apoptosis in human and zebrafish cells. Mol Cancer 2006;5:52. [343] Stockhammer OW, et al. 
Transcriptome analysis of Traf6 function in the innate immune response of zebrafish embryos. Mol Immunol 2010;48(13):17990. [344] Phelan PE, Mellon MT, Kim CH. Functional characterization of full-length TLR3, IRAK-4, and TRAF6 in zebrafish (Danio rerio). Mol Immunol 2005;42(9):105771. [345] Grassme H, Jendrossek V, Gulbins E. Molecular mechanisms of bacteria induced apoptosis. Apoptosis 2001;6(6):4415.
[346] Buffo J, Herman MA, Soll DR. A characterization of pH-regulated dimorphism in Candida albicans. Mycopathologia 1984;85(12):2130. [347] Weissman Z, Kornitzer D. A family of Candida cell surface haem-binding proteins involved in haemin and haemoglobin-iron utilization. Mol Microbiol 2004;53(4):120920. [348] Fratti RA, et al. Endothelial cell injury caused by Candida albicans is dependent on iron. Infect Immun 1998;66(1):1916. [349] Vidotto V, et al. Glucose influence on germ tube production in Candida albicans. Mycopathologia 1996;133 (3):1437. [350] Hudson DA, et al. Identification of the dialysable serum inducer of germ-tube formation in Candida albicans. Microbiology 2004;150(Pt 9):30419. [351] Paranjape V, Datta A. Role of nutritional status of the cell in pH regulated dimorphism of Candida albicans. FEMS Microbiol Lett 1991;64(23):3336. [352] Wheeler RT, Fink GR. A drug-sensitive genetic network masks fungi from the immune system. PLoS Pathog 2006;2(4):e35. [353] Wheeler RT, et al. Dynamic, morphotype-specific Candida albicans beta-glucan exposure during infection and drug treatment. PLoS Pathog 2008;4(12):e1000227. [354] Singleton DR, et al. Contribution of cell surface hydrophobicity protein 1 (Csh1p) to virulence of hydrophobic Candida albicans serotype A cells. FEMS Microbiol Lett 2005;244(2):3737. [355] Cerenius L, et al. Proteolytic cascades and their involvement in invertebrate immunity. Trends Biochem Sci 2010;35(10):57583. [356] Ryan O, et al. Global gene deletion analysis exploring yeast filamentous growth. Science 2012;337 (6100):13536. [357] Cohen ML. Changing patterns of infectious disease. Nature 2000;406(6797):7627. [358] Fauci AS, Touchette NA, Folkers GK. Emerging infectious diseases: a 10-year perspective from the National Institute of Allergy and Infectious Diseases. Emerg Infect Dis 2005;11(4):51925. [359] Morens DM, Folkers GK, Fauci AS. The challenge of emerging and re-emerging infectious diseases. Nature 2004;430(6996):2429. 
[360] Pfaller MA, Pappas PG, Wingard JR. Invasive fungal pathogens: current epidemiological trends. Clin Infect Dis 2006;43(Suppl. 1):S314. [361] Whiteway M, Bachewich C. Morphogenesis in Candida albicans. Annu Rev Microbiol 2007;61:52953. [362] Zon LI, Peterson RT. In vivo drug discovery in the zebrafish. Nat Rev Drug Discov 2005;4(1):3544. [363] Lieschke GJ, Currie PD. Animal models of human disease: zebrafish swim into view. Nat Rev Genet 2007;8 (5):35367. [364] Trede NS, et al. The use of zebrafish to understand immunity. Immunity 2004;20(4):36779. [365] van der Sar AM, et al. A star with stripes: zebrafish as an infection model. Trends Microbiol 2004;12 (10):4517. [366] Kanther M, Rawls JF. Host-microbe interactions in the developing zebrafish. Curr Opin Immunol 2010;22 (1):1019. [367] Segata N, et al. Computational meta’omics for microbial community studies. Mol Syst Biol 2013;9:666. [368] Sturdevant DE, et al. Host-microbe interaction systems biology: lifecycle transcriptomics and comparative genomics. Future Microbiol 2010;5(2):20519. [369] Westermann AJ, Gorski SA, Vogel J. Dual RNA-seq of pathogen and host. Nat Rev Microbiol 2012;10 (9):61830. [370] Dyer MD, Murali TM, Sobral BW. Computational prediction of host-pathogen protein-protein interactions. Bioinformatics 2007;23(13):i15966. [371] Lee SA, et al. Ortholog-based protein-protein interaction prediction and its application to inter-species interactions. BMC Bioinformatics 2008;9(Suppl. 12):S11. [372] Shea PR, et al. Interactome analysis of longitudinal pharyngeal infection of cynomolgus macaques by group A Streptococcus. Proc Natl Acad Sci USA 2010;107(10):46938. [373] Reid AJ, Berriman M. Genes involved in host-parasite interactions can be revealed by their correlated expression. Nucleic Acids Res 2013;41(3):150818. [374] Chen YY, et al. Dynamic transcript profiling of Candida albicans infection in zebrafish: a pathogen-host interaction study. PLoS One 2013;8(9):e72483.
[375] Ostlund G, et al. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 2010;38:D196203 (Database issue). [376] Bradford Y, et al. ZFIN: enhancements and updates to the Zebrafish Model Organism Database. Nucleic Acids Res 2011;39:D8229 (Database issue). [377] Inglis DO, et al. The Candida genome database incorporates multiple Candida species: multispecies search and analysis tools with curated gene and protein information for Candida albicans and Candida glabrata. Nucleic Acids Res 2012;40:D66774 (Database issue). [378] Spooner R, Yilmaz O. The role of reactive-oxygen-species in microbial persistence and inflammation. Int J Mol Sci 2011;12(1):33452. [379] Schoonbroodt S, Piette J. Oxidative stress interference with the nuclear factor-kappa B activation pathways. Biochem Pharmacol 2000;60(8):107583. [380] Allen LA, et al. Helicobacter pylori disrupts NADPH oxidase targeting in human neutrophils to induce extracellular superoxide release. J Immunol 2005;174(6):365867. [381] Harada T, Miyake M, Imai Y. Evasion of Legionella pneumophila from the bactericidal system by reactive oxygen species (ROS) in macrophages. Microbiol Immunol 2007;51(12):116170. [382] Tsunawaki S, et al. Fungal metabolite gliotoxin inhibits assembly of the human respiratory burst NADPH oxidase. Infect Immun 2004;72(6):337382. [383] Wellington M, Dolan K, Krysan DJ. Live Candida albicans suppresses production of reactive oxygen species in phagocytes. Infect Immun 2009;77(1):40513. [384] Nakagawa Y, Kanbe T, Mizuguchi I. Disruption of the human pathogenic yeast Candida albicans catalase gene decreases survival in mouse-model infection and elevates susceptibility to higher temperature and to detergents. Microbiol Immunol 2003;47(6):395403. [385] Frohner IE, et al. Candida albicans cell surface superoxide dismutases degrade host-derived reactive oxygen species to escape innate immune surveillance. Mol Microbiol 2009;71(1):24052. [386] Sim S, et al. 
NADPH oxidase-derived reactive oxygen species-mediated activation of ERK1/2 is required for apoptosis of human neutrophils induced by Entamoeba histolytica. J Immunol 2005;174(7):427988. [387] Yang TC, et al. Japanese encephalitis virus down-regulates thioredoxin and induces ROS-mediated ASK1ERK/p38 MAPK activation in human promonocyte cells. Microbes Infect 2010;12(89):64351. [388] Williams B, et al. Tipping the balance: Sclerotinia sclerotiorum secreted oxalic acid suppresses host defenses by manipulating the host redox environment. PLoS Pathog 2011;7(6):e1002107. [389] Durmus Tekir SD, Ulgen KO. Systems biology of pathogen-host interaction: networks of protein-protein interaction within pathogens and pathogen-human interactions in the post-genomic era. Biotechnol J 2013;8(1):8596. [390] Tierney L, et al. An interspecies regulatory network inferred from simultaneous RNA-seq of Candida albicans invading innate immune cells. Front Microbiol 2012;3:85. [391] Barabasi AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011;12(1):5668. [392] Pawson T, Linding R. Network medicine. FEBS Lett 2008;582(8):126670. [393] Davis FP, et al. Host pathogen protein interactions predicted by comparative modeling. Protein Sci 2007;16 (12):258596. [394] Berman J, Sudbery PE. Candida albicans: a molecular revolution built on lessons from budding yeast. Nat Rev Genet 2002;3(12):91830. [395] Naglik J, et al. Candida albicans proteinases and host/pathogen interactions. Cell Microbiol 2004;6 (10):91526. [396] Whiteway M, Oberholzer U. Candida morphogenesis and host-pathogen interactions. Curr Opin Microbiol 2004;7(4):3507. [397] Alarco AM, et al. Immune-deficient Drosophila melanogaster: a model for the innate immune response to human fungal pathogens. J Immunol 2004;172(9):56228. [398] Cotter G, Doyle S, Kavanagh K. Development of an insect model for the in vivo pathogenicity testing of yeasts. 
FEMS Immunol Med Microbiol 2000;27(2):1639. [399] Brothers KM, Newman ZR, Wheeler RT. Live imaging of disseminated candidiasis in zebrafish reveals role of phagocyte oxidase in limiting filamentous growth. Eukaryot Cell 2011;10(7):93244. [400] Kumar H, Kawai T, Akira S. Pathogen recognition by the innate immune system. Int Rev Immunol 2011;30 (1):1634.
[401] Ideker T, Krogan NJ. Differential network biology. Mol Syst Biol 2012;8:565. [402] Stark C, et al. The BioGRID interaction database: 2011 update. Nucleic Acids Res 2011;39:D698704 (Database issue). [403] Arnaud MB, et al. The Candida Genome Database (CGD), a community resource for Candida albicans gene and protein information. Nucleic Acids Res 2005;33:D35863 (Database issue). [404] Harris MA, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 2004;32: D25861 (Database issue). [405] Dennis Jr. G, et al. DAVID: database for annotation, visualization, and integrated discovery. Genome Biol 2003;4(5):P3. [406] Chuang HY, et al. Network-based classification of breast cancer metastasis. Mol Syst Biol 2007;3:140. [407] Hyduke DR, Palsson BØ. Towards genome-scale signalling-network reconstructions. Nat Rev Genet 2010;11 (4):297307. [408] Pukkila-Worley R, et al. Candida albicans hyphal formation and virulence assessed using a Caenorhabditis elegans infection model. Eukaryot Cell 2009;8(11):17508. [409] Song Y, et al. Role of the RAM network in cell polarity and hyphal morphogenesis in Candida albicans. Mol Biol Cell 2008;19(12):545677. [410] Chandra J, et al. Biofilm formation by the fungal pathogen Candida albicans: development, architecture, and drug resistance. J Bacteriol 2001;183(18):538594. [411] Nobile CJ, et al. Critical role of Bcr1-dependent adhesins in C. albicans biofilm formation in vitro and in vivo. PLoS Pathog 2006;2(7):e63. [412] Wang YC, Tsai IC, Lin C, Hsieh WP, Lan CY, Chuang YJ, et al. Essential functional modules for pathogenic and defensive mechanisms in Candida albicans infections. BioMed Res Int 2014;2014:15. [413] Barsky A, et al. Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 2007;23(8):10402. [414] Sohnle PG, Hahn BL, Santhanagopalan V. 
Inhibition of Candida albicans growth by calprotectin in the absence of direct contact with the organisms. J Infect Dis 1996;174(6):136971. [415] Hwang C-S, et al. Copper- and zinc-containing superoxide dismutase and its gene from Candida albicans. Biochim Biophys Acta 1999;1427(2):24555. [416] Hwang C-S, et al. Copper- and zinc-containing superoxide dismutase (Cu/ZnSOD) is required for the protection of Candida albicans against oxidative stresses and the expression of its full virulence. Microbiology 2002;148(11):370513. [417] Almeida RS, Wilson D, Hube B. Candida albicans iron acquisition within the host. FEMS Yeast Res 2009;9(7):100012. [418] Fonzi WA. The protein secretory pathway of Candida albicans. Mycoses 2009;52(4):291303. [419] Schaller M, et al. Hydrolytic enzymes as virulence factors of Candida albicans. Mycoses 2005;48(6):36577. [420] Sanglard D, et al. A triple deletion of the secreted aspartyl proteinase genes SAP4, SAP5, and SAP6 of Candida albicans causes attenuated virulence. Infect Immun 1997;65(9):353946. [421] Lorenz MC, Bender JA, Fink GR. Transcriptional response of Candida albicans upon internalization by macrophages. Eukaryot Cell 2004;3(5):107687. [422] Medzhitov R. Recognition of microorganisms and activation of the immune response. Nature 2007;449(7164):81926. [423] Netea MG, et al. Immune sensing of Candida albicans requires cooperative recognition of mannans and glucans by lectin and Toll-like receptors. J Clin Invest 2006;116(6):1642. [424] Schaible UE, Kaufmann SH. Iron and microbial infection. Nat Rev Microbiol 2004;2(12):94653. [425] Heussler VT, Küenzi P, Rottenberg S. Inhibition of apoptosis by intracellular protozoan parasites. Int J Parasitol 2001;31(11):116676. [426] Navarre WW, Zychlinsky A. Pathogen-induced apoptosis of macrophages: a common end for different pathogenic strategies. Cell Microbiol 2000;2(4):26573. [427] Dockrell D. The multiple roles of Fas ligand in the pathogenesis of infectious diseases.
Clin Microbiol Infect 2003;9(8):76679. [428] Kim HS, et al. Expression of genes encoding innate host defense molecules in normal human monocytes in response to Candida albicans. Infect Immun 2005;73(6):371424.
[429] Poulain D, Jouault T. Candida albicans cell wall glycans, host receptors and responses: elements for a decisive crosstalk. Curr Opin Microbiol 2004;7(4):3429. [430] Ibata-Ombetta S, et al. Candida albicans phospholipomannan promotes survival of phagocytosed yeasts through modulation of bad phosphorylation and macrophage apoptosis. J Biol Chem 2003;278 (15):1308693. [431] Heidenreich S, et al. Infection by Candida albicans inhibits apoptosis of human monocytes and monocytic U937 cells. J Leukoc Biol 1996;60(6):73743. [432] Kim JM, et al. Apoptosis of human intestinal epithelial cells after bacterial invasion. J Clin Invest 1998;102 (10):1815. [433] Cahalan MD, Chandy KG. Ion channels in the immune system as targets for immunosuppression. Curr Opin Biotechnol 1997;8(6):74956. [434] Yu SP, Canzoniero LM, Choi DW. Ion homeostasis and apoptosis. Curr Opin Cell Biol 2001;13(4):40511. [435] Weinberg ED. Iron availability and infection. Biochim Biophys Acta 2009;1790(7):6005. [436] Kehl-Fie TE, Skaar EP. Nutritional immunity beyond iron: a role for manganese and zinc. Curr Opin Chem Biol 2010;14(2):21824. [437] Shankar AH, Prasad AS. Zinc and immune function: the biological basis of altered resistance to infection. Am J Clin Nutr 1998;68(2):447S63S. [438] Lulloff SJ, Hahn BL, Sohnle PG. Fungal susceptibility to zinc deprivation. J Lab Clin Med 2004;144 (4):20814. [439] Hu Y-h, Deng T, Sun L. The Rab1 GTPase of Sciaenops ocellatus modulates intracellular bacterial infection. Fish Shellfish Immunol 2011;31(6):100512. [440] Stenmark H, Olkkonen VM. The Rab GTPase family. Genome Biol 2001;2(5):S3007. [441] Couillault C, et al. TLR-independent control of innate immunity in Caenorhabditis elegans by the TIR domain adaptor protein TIR-1, an ortholog of human SARM. Nat Immunol 2004;5(5):48894. [442] Vidricaire G, Tremblay MJ. Rab5 and Rab7, but not ARF6, govern the early events of HIV-1 infection in polarized human placental cells. J Immunol 2005;175(10):651730. 
[443] Brumell JH, Scidmore MA. Manipulation of Rab GTPase function by intracellular bacterial pathogens. Microbiol Mol Biol Rev 2007;71(4):63652. [444] Naglik JR, Challacombe SJ, Hube B. Candida albicans secreted aspartyl proteinases in virulence and pathogenesis. Microbiol Mol Biol Rev 2003;67(3):40028. [445] Nicoli S, et al. Calcitonin receptor-like receptor guides arterial differentiation in zebrafish. Blood 2008;111 (10):496572. [446] Luttun A, Verhamme P. Keeping your vascular integrity: what can we learn from fish? Bioessays 2008;30 (5):41822. [447] Knoll R, et al. Laminin-alpha4 and integrin-linked kinase mutations cause human cardiomyopathy via simultaneous defects in cardiomyocytes and endothelial cells. Circulation 2007;116(5):51525. [448] Friedman SM, Berezney R, Weinstein IB. Fidelity in protein synthesis the role of the ribosome. J Biol Chem 1968;243(19):50448. [449] Shiraishi T, Matsuyama S, Kitano H. Large-scale analysis of network bistability for human cancers. PLoS Comput Biol 2010;6(7):e1000851. [450] Dobson A. Population dynamics of pathogens with multiple host species. Am Nat 2004;164(S5):S6478. [451] Faul M, Xu L, Wald MM, Coronado V, Dellinger AM. Traumatic brain injury in the United States: national estimates of prevalence and incidence, 20022006. Inj Prev 2010;16:A268. [452] Kizil C, Kaslin J, Kroehne V, Brand M. Adult neurogenesis and brain regeneration in zebrafish. Dev Neurobiol 2012;72:42961. [453] Tanaka EM, Ferretti P. Considering the evolution of regeneration in the central nervous system. Nat Rev Neurosci 2009;10:71323. [454] Horner PJ, Gage FH. Regenerating the damaged central nervous system. Nature 2000;407:96370. [455] Antos CL, Tanaka EM. Vertebrates that regenerate as models for guiding stem cells. Cell Biol Stem Cell 2010;695:184214. [456] Poss KD, Wilson LG, Keating MT. Heart regeneration in zebrafish. Science 2002;298:218890. [457] Guo Y, Ma L, Cristofanilli M, Hart RP, Hao A, et al. 
Transcription factor Sox11b is involved in spinal cord regeneration in adult zebrafish. Neuroscience 2011;172:32941.
[458] Qin Z, Barthel LK, Raymond PA. Genetic evidence for shared mechanisms of epimorphic regeneration in zebrafish. Proc Natl Acad Sci USA 2009;106:931015. [459] Craig SE, Calinescu AA, Hitchcock PF. Identification of the molecular signatures integral to regenerating photoreceptors in the retina of the zebra fish. J Ocul Biol Dis Infor 2008;1:7384. [460] McCurley AT, Callard GV. Time course analysis of gene expression patterns in zebrafish eye during optic nerve regeneration. J Exp Neurosci 2010;2010:1733. [461] Cameron DA, Gentile KL, Middleton FA, Yurco P. Gene expression profiles of intact and regenerating zebrafish retina. Mol Vis 2005;11:77591. [462] Kishimoto N, Shimizu K, Sawamoto K. Neuronal regeneration in a zebrafish model of adult brain injury. Dis Model Mech 2012;5:2009. [463] Ernst J, Bar-Joseph Z. STEM: a tool for the analysis of short time series gene expression data. BMC Bioinformatics 2006;7:191. [464] Mi H, Muruganujan A, Thomas PD. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of phylogenetic trees. Nucleic Acids Res 2013;41:D37786. [465] Levin ED, Cerutti DT. Behavioral neuroscience of zebrafish. In: Buccafusco JJ, editor. Methods of behavior analysis in neuroscience. 2nd ed. Boca Raton, FL: CRC Press/Taylor & Francis; 2009. [466] Kalueff AV, Gebhardt M, Stewart AM, Cachat JM, Brimmer M, et al. Towards a comprehensive catalog of zebrafish behavior 1.0 and beyond. Zebrafish 2013;10:7086. [467] Croft D, O’Kelly G, Wu GM, Haw R, Gillespie M, et al. Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res 2011;39:D6917. [468] Hui SP, Sengupta D, Lee SG, Sen T, Kundu S, et al. Genome wide expression profiling during spinal cord regeneration identifies comprehensive cellular responses in zebrafish. PLoS One 2014;9:e84212. [469] Singer AJ, Clark RAF. Mechanisms of disease cutaneous wound healing. N Engl J Med 1999;341:73846. [470] Stadelmann WK, Digenis AG, Tobin GR. 
Physiology and healing dynamics of chronic cutaneous wounds. Am J Surg 1998;176:26s38s. [471] Mechoulam R, Panikashvili D, Shohami E. Cannabinoids and brain injury: therapeutic implications. Trends Mol Med 2002;8:5861. [472] Shohami E, Cohen-Yeshurun A, Magid L, Algali M, Mechoulam R. Endocannabinoids and traumatic brain injury. Br J Pharmacol 2011;163:140210. [473] Rajaram MVS, Ganesan LP, Parsa KVL, Butchar JP, Gunn JS, et al. Akt/protein kinase B modulates macrophage inflammatory response to Francisella infection and confers a survival advantage in mice. J Immunol 2006;177:631724. [474] Neves SR, Ram PT, Iyengar R. G protein pathways. Science 2002;296:16369. [475] Lin MT, Beal MF. Mitochondrial dysfunction and oxidative stress in neurodegenerative diseases. Nature 2006;443:78795. [476] Lukacs NW. Role of chemokines in the pathogenesis of asthma. Nat Rev Immunol 2001;1:10816. [477] Vicente-Manzanares M, Sancho D, Yanez-Mo M, Sanchez-Madrid F. The leukocyte cytoskeleton in cell migration and immune interactions. Int Rev Cytol 2002;216:23389. [478] Ridley AJ. Rho GTPases and cell migration. J Cell Sci 2001;114:271322. [479] Giancotti FG, Ruoslahti E. Transduction integrin signaling. Science 1999;285:102832. [480] Jiang W, Zhang Y, Xiao L, Van Cleemput J, Ji SP, et al. Cannabinoids promote embryonic and adult hippocampus neurogenesis and produce anxiolytic- and antidepressant-like effects. J Clin Invest 2005;115:310416. [481] Martins RAP, Pearson RA. Control of cell proliferation by neurotransmitters in the developing vertebrate retina. Brain Res 2008;1192:3760. [482] Berg DA, Belnoue L, Song HJ, Simon A. Neurotransmitter-mediated control of neurogenesis in the adult vertebrate brain. Development 2013;140:254861. [483] Qiu YH, Peng YP, Wang JH. Immunoregulatory role of neurotransmitters. Adv Neuroimmunol 1996;6:22331. [484] Blalock JE. Production of peptide hormones and neurotransmitters by the immune system. Chem Immunol 1992;52:124. 
[485] Webster JI, Tonelli L, Sternberg EM. Neuroendocrine regulation of immunity. Annu Rev Immunol 2002;20:12563.
[486] Guingab-Cagmat JD, Cagmat EB, Hayes RL, Anagli J. Integration of proteomics, bioinformatics, and systems biology in traumatic brain injury biomarker discovery. Front Neurol 2013;4:61. [487] Neselius S, Brisby H, Theodorsson A, Blennow K, Zetterberg H, et al. CSF-biomarkers in Olympic Boxing: diagnosis and effects of repetitive head trauma. PLoS One 2012;7. [488] Kyritsis N, Kizil C, Zocher S, Kroehne V, Kaslin J, et al. Acute inflammation initiates the regenerative response in the adult zebrafish brain. Science 2012;338:13536. [489] DeKosky ST, Blennow K, Ikonomovic MD, Gandy S. Acute and chronic traumatic encephalopathies: pathogenesis and biomarkers. Nat Rev Neurol 2013;9:192200. [490] Feala JD, Abdulhameed MD, Yu C, Dutta B, Yu X, et al. Systems biology approaches for discovering biomarkers for traumatic brain injury. J Neurotrauma 2013;30:110116. [491] Basu S, Dasgupta PS. Dopamine, a neurotransmitter, influences the immune system. J Neuroimmunol 2000;102:11324. [492] Piomelli D. The molecular logic of endocannabinoid signalling. Nat Rev Neurosci 2003;4:87384. [493] Wu CC, Tsai TH, Chang C, Lee TT, Lin C, Cheng IHJ, et al. On the crucial cerebellar wound healing-related pathways and their cross-talks after traumatic brain injury in Danio rerio. PLoS One 2014;9(6):e97902. [494] Iosif RE, Ekdahl CT, Ahlenius H, Pronk CJH, Bonde S, Kokaia Z, et al. Tumor necrosis factor receptor 1 is a negative regulator of progenitor proliferation in adult hippocampal neurogenesis. J Neurosci 2006;26:970312. [495] Shi WC, Fang ZB, Li L, Luo LF. Using zebrafish as the model organism to understand organ regeneration. Sci China Life Sci 2015;58:34351. [496] Stamova B, Jickling GC, Ander BP, Zhan XH, Liu DZ, Turner R, et al. Gene expression in peripheral immune cells following cardioembolic stroke is sexually dimorphic. PLoS One 2014;9:e102550. [497] Macrez R, Ali C, Toutirais O, Le Mauff B, Defer G, Dirnagl U, et al. 
Stroke and the immune system: from pathophysiology to new therapeutic strategies. Lancet Neurol 2011;10:47180. [498] Iadecola C, Anrather J. The immunology of stroke: from mechanisms to translation. Nat Med 2011;17:796808. [499] Walsh JG, Muruve DA, Power C. Inflammasomes in the CNS. Nat Rev Neurosci 2014;15:8497. [500] Fann DYW, Lee SY, Manzanero S, Chunduri P, Sobey CG, Arumugam TV. Pathogenesis of acute stroke and the role of inflammasomes. Ageing Res Rev 2013;12:94166. [501] Chen BS, Wu CC. Systems biology: an integrated platform for bioinformatics, systems synthetic biology and systems metabolic engineering. New York: Nova Publishers; 2014. p. 4763. [502] Coucha M, Li WG, Ergul A. The effect of endothelin receptor A antagonism on basilar artery endotheliumdependent relaxation after ischemic stroke. Life Sci 2012;91:67680. [503] Shahpouri MM, Mousavi SA, Khorvash F, Mousavi SM, Hoseini T. Anticoagulant therapy for ischemic stroke: a review of literature. J Res Med Sci 2012;17:396401. [504] Swardfager W, Winer DA, Herrmann N, Winer S, Lanctot KL. Interleukin-17 in post-stroke neurodegeneration. Neurosci Biobehav Rev 2013;37:43647. [505] Iadecola C, Gorelick PB. Hypertension, angiotensin, and stroke: beyond blood pressure. Stroke 2004;35:34850. [506] Wolf WA, Martin JL, Kartje GL, Farrer RG. Evidence for fibroblast growth factor-2 as a mediator of amphetamine-enhanced motor improvement following stroke. PLoS One 2014;9:e108031. [507] Issa R, AlQteishat A, Mitsios N, Saka M, Krupinski J, Tarkowski E, et al. Expression of basic fibroblast growth factor mRNA and protein in the human brain following ischaemic stroke. Angiogenesis 2005;8:5362. [508] Heidecker B, Lamirault G, Kasper EK, Wittstein IS, Champion HC, Breton E, et al. The gene expression profile of patients with new-onset heart failure reveals important gender-specific differences. Eur Heart J 2010;31:118896. [509] Atkin G, Paulson H. Ubiquitin pathways in neurodegenerative disease. 
Front Mol Neurosci 2014;7:63. [510] Brott T, Broderick J, Kothari R, ODonoghue M, Barsan W, Tomsick T, et al. Intracerebral hemorrhage after intravenous t-PA therapy for ischemic stroke. Stroke 1997;28:210918. [511] Marchetti B, Pluchino S. Wnt your brain be inflamed? Yes, it wnt!. Trends Mol Med 2013;19:14456. [512] Su EJ, Fredriksson L, Geyer M, Folestad E, Cale J, Andrae J, et al. Activation of PDGF-CC by tissue plasminogen activator impairs blood-brain barrier integrity during ischemic stroke. Nat Med 2008;14:7317.
[513] Prentice RL, Paczesny SJ, Aragaki A, Amon LM, Chen L, Pitteri SJ, et al. Novel proteins associated with risk for coronary heart disease or stroke among postmenopausal women identified by in-depth plasma proteome profiling. Genome Med 2010;2:48. [514] Shen N, Yan Z, He P. A study of the hereditary susceptibility of HLA-DQA1 to essential hypertension, athrothrombotic brain infarction and lacunar stroke. Zhonghua Yi Xue Za Zhi 2001;81:3525. [515] Madden B, Chebl RB. Hemi orolingual angioedema after TPA administration for acute ischemic stroke. West J Emerg Med 2015;16:1757. [516] Ayala P, Uchida M, Akiyoshi K, Cheng J, Hashimoto J, Jia TP, et al. Androgen receptor overexpression is neuroprotective in experimental stroke. Transl Stroke Res 2011;2:34657. [517] Arboleda-Velasquez JF, Zhou Z, Shin HK, Louvi A, Kim HH, Savitz SI, et al. Linking Notch signaling to ischemic stroke. Proc Natl Acad Sci USA 2008;105:485661. [518] Marumo T, Takagi Y, Muraki K, Hashimoto N, Miyamoto S, Tanigaki K. Notch signaling regulates nucleocytoplasmic olig2 translocation in reactive astrocytes differentiation after ischemic stroke. Neurosci Res 2013;75:2049. [519] Hankey GJ, Eikelboom JW. Homocysteine and stroke. Curr Opin Neurol 2001;14:95102. [520] Hwang JY, Aromolaran KA, Zukin RS. Epigenetic mechanisms in stroke and epilepsy. Neuropsychopharmacology 2013;38:16782. [521] Hsu SD, Lin FM, Wu WY, Liang C, Huang WC, Chan WL, et al. Mirtarbase: a database curates experimentally validated microRNAtarget interactions. Nucleic Acids Res 2011;39:D1639. [522] Koutsis G, Siasos G, Spengos K. The emerging role of microrna in stroke. Curr Top Med Chem 2013;13:157388. [523] Qureshi IA, Mehler MF. Emerging role of epigenetics in stroke part 1: DNA methylation and chromatin modifications. Arch Neurol (Chicago) 2010;67:131622. [524] Satish L, Gallo PH, Baratz ME, Johnson S, Kathju S. 
Reversal of TGF-β1 stimulation of α-smooth muscle actin and extracellular matrix components by cyclic amp in dupuytren’s-derived fibroblasts. BMC Musculoskelet Disord 2011;12:113. [525] Xie LL, Weichel B, Ohm JE, Zhang K. An integrative analysis of DNA methylation and RNA-seq data for human heart, kidney and liver. BMC Syst Biol 2011;5:S4. [526] Flanagan BF, Wotton D, Soong TW, Owen MJ. DNase hypersensitivity and methylation of the human CD3G and D-genes during T-cell development. Immunogenetics 1990;31:1320. [527] Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory sites characterizes dysregulation of cancer genes. Genome Biol 2013;14:R21. [528] Cordova-Palomera A, Fatjo-Vilas M, Palma-Gudiel H, Blasco-Fontecilla H, Kebir O, Fananas L. Further evidence of depdc7 DNA hypomethylation in depression: a study in adult twins. Eur Psychiat 2015;30:71518. [529] Onaga Y, Ido A, Uto H, Hasuike S, Kusumoto K, Moriuchi A, et al. Hypermethylation of the wild-type ferrochelatase allele is closely associated with severe liver complication in a family with erythropoietic protoporphyria. Biochem Biophys Res Commun 2004;321:8518. [530] Majumder P, Boss JM. DNA methylation dysregulates and silences the HLA-DQ locus by altering chromatin architecture. Genes Immun 2011;12:2919. [531] Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 2009;54:1539. [532] Asatiani E, Huang WX, Wang A, Ortner ER, Cavalli LR, Haddad BR, et al. Deletion, methylation, and expression of the NKX3.1 suppressor gene in primary human prostate cancer. Cancer Res 2005;65:116473. [533] Bader GD, Betel D, Hogue CWV. Bind: the biomolecular interaction network database. Nucleic Acids Res 2003;31:24850. [534] Xenarios I, Salwinski L, Duan XQJ, Higney P, Kim SM, Eisenberg D. Dip, the database of interacting proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res 2002;30:3035. 
[535] Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003;13:236371. [536] Brown KR, Jurisica I. Online predicted human interaction database. Bioinformatics 2005;21:207682.
[537] Orchard S, Ammari M, Aranda B, Breuza L, Briganti L, Broackes-Carter F, et al. The mintact project-intact as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res 2014;42:D35863. [538] Licata L, Briganti L, Peluso D, Perfetto L, Iannuccelli M, Galeota E, et al. Mint, the molecular interaction database: 2012 update. Nucleic Acids Res 2012;40:D85761. [539] McDowall MD, Scott MS, Barton GJ. Pips: human protein-protein interaction prediction database. Nucleic Acids Res 2009;37:D6516. [540] Croft D, Mundo AF, Haw R, Milacic M, Weiser J, Wu GM, et al. The reactome pathway knowledgebase. Nucleic Acids Res 2014;42:D4727. [541] Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. String v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 2013;41:D80815. [542] Mi HY, Muruganujan A, Casagrande JT, Thomas PD. Large-scale gene function analysis with the panther classification system. Nat Protoc 2013;8:155166. [543] Ren K, Dubner R. Interactions between the immune and nervous systems in pain. Nat Med 2010;16 (11):126776. [544] Capuron L, Miller AH. Immune system to brain signaling: neuropsychopharmacological implications. Pharmacol Ther 2011;130(2):22638. [545] Ransohoff RM, Brown MA. Innate immunity in the central nervous system. J Clin Invest 2012;122 (4):116471. [546] Nataf S. The sensory immune system: a neural twist to the antigenic discontinuity theory. Nat Rev Immunol 2014;14(4). [547] Sekirov SL, Russell LCM, Antunes, Finlay BB. Gut microbiota in health and disease. Physiological Rev 2010;90(3):859904. [548] Esplugues E, Huber S, Gagliani N, Hauser AE, Town T, Wan YSY, et al. Control of T(H)17 cells occurs in the small intestine. Nature 2011;475(7357):514114. [549] Lee WJ, Hase K. Gut microbiota-generated metabolites in animal health and disease. Nat Chem Biol 2014;10 (6):41624. [550] Sorokin L. The impact of the extracellular matrix on inflammation. 
Nat Rev Immunol 2010;10(10):71223. [551] Pulendran B, Ahmed R. Immunological mechanisms of vaccination. Nat Immunol 2011;12(6):50917. [552] Romani L. Immunity to fungal infections. Nat Rev Immunol 2011;11(4):27588. [553] Tierney L, Kuchler K, Rizzetto L, Cavalieri D. Systems biology of host-fungus interactions: turning complexity into simplicity. Curr Opin Microbiol 2012;15(4):4406. [554] Schmidt F, Volker U. Proteome analysis of host-pathogen interactions: investigation of pathogen responses to the host cell environment. Proteomics 2011;11(15):320311. [555] Arnold R, Boonen K, Sun MGF, Kim PM. Computational analysis of interactomes: current and future perspectives for bioinformatics approaches to model the host-pathogen interaction space. Methods 2012;57 (4):50818. [556] Meijer AH, Spaink HP. Host-pathogen interactions made transparent with the zebrafish model. Curr Drug Targets 2011;12(7):100017. [557] Gratacap RL, Wheeler RT. Utilization of zebrafish for intravital study of eukaryotic pathogen-host interactions. Dev Comp Immunol 2014;46(1):10815. [558] Schier AF. Genomics: zebrafish earns its stripes. Nature 2013;496(7446):4434. [559] Yasui N, Findlay GM, Gish GD, Hsiung MS, Huang J, Tucholska M, et al. Directed network wiring identifies a key protein interaction in embryonic stem cell differentiation. Mol Cell 2014;54(6):103441. [560] Gerold G, Abu Ajaj K, Bienert M, Laws HJ, Zychlinsky A, de Diego JL. A Toll-like receptor 2-integrin beta (3) complex senses bacterial lipopeptides via vitronectin. Nat Immunol 2008;9(7):7618. [561] Wight TN, Kang I, Merrilees MJ. Versican and the control of inflammation. Matrix Biol 2014;35:15261. [562] Massberg S, Grahl L, von Bruehl ML, Manukyan D, Pfeiler S, Goosmann C, et al. Reciprocal coupling of coagulation and innate immunity via neutrophil serine proteases. Nat Med 2010;16(8):88787. [563] Tartour E, Pere H, Maillere B, Terme M, Merillon N, Taieb J, et al. 
Angiogenesis and immunity: a bidirectional link potentially relevant for the monitoring of antiangiogenic therapy and the development of novel therapeutic combination with immunotherapy. Cancer Metastasis Rev 2011;30(1):8395. [564] Nathan C, Cunningham-Bussel A. Beyond oxidative stress: an immunologist’s guide to reactive oxygen species. Nat Rev Immunol 2013;13(5):34961.
[565] Curtis AM, Bellet MM, Sassone-Corsi P, O’Neill LAJ. Circadian clock proteins and immunity. Immunity 2014;40(2):17886. [566] Scaldaferri F, Vetrano S, Sans M, Arena V, Straface G, Stigliano E, et al. VEGF-A links angiogenesis and inflammation in inflammatory bowel disease pathogenesis. Gastroenterology 2009;136(2):58595. [567] Yang H, Ko HJ, Yang JY, Kim JJ, Seo SU, Park SG, et al. Interleukin-1 promotes coagulation, which is necessary for protective immunity in the lung against Streptococcus pneumoniae infection. J Infect Dis 2013;207(1):5060. [568] Wang YC, Chen BS. A network-based biomarker approach for molecular investigation and diagnosis of lung cancer. BMC Med Genomics 2011;4. [569] Ashida H, Mimuro H, Ogawa M, Kobayashi T, Sanada T, Kim M, et al. Host-pathogen interactions cell death and infection: a double-edged sword for host and pathogen survival. J Cell Biol 2011;195(6):93142. [570] Hawn TR, Shah JA, Kalman D. New tricks for old dogs: countering antibiotic resistance in tuberculosis with host-directed therapeutics. Immunol Rev 2015;264(1):34462. [571] Tong SYC, Chen LF, Fowler VG. Colonization, pathogenicity, host susceptibility, and therapeutics for Staphylococcus aureus: what is the clinical relevance? Semin Immunopathol 2012;34(2):185200. [572] Spaan AN, Surewaard BGJ, Nijland R, van Strijp JAG. Neutrophils versus Staphylococcus aureus: a biological tug of war. Annu Rev Microbiol 2013;67(67):629. [573] Yang DR, Zhu HZ. Hepatitis C virus and antiviral innate immunity: who wins at tug-of-war? World J Gastroenterol 2015;21(13):3786800. [574] Mantovani A, Cassatella MA, Costantini C, Jaillon S. Neutrophils in the activation and regulation of innate and adaptive immunity. Nat Rev Immunol 2011;11(8):51931. [575] Johnson EE, Wessling-Resnick M. Iron metabolism and the innate immune response to infection. Microbes Infect 2012;14(3):20716. [576] Kaba HEJ, Nimtz M, Muller PP, Bilitewski U.
Involvement of the mitogen activated protein kinase Hog1p in the response of Candida albicans to iron availability. BMC Microbiol 2013;13. [577] Basson NJ. Competition for glucose between Candida albicans and oral bacteria grown in mixed culture in a chemostat. J Med Microbiol 2000;49(11):96975. [578] Maidan MM, Thevelein JM, Van Dijck P. Carbon source induced yeast-to-hypha transition in Candida albicans is dependent on the presence of amino acids and on the G-protein-coupled receptor Gpr1. Biochem Soc Trans 2005;33:2913. [579] Sabina J, Brown V. Glucose Sensing Network in Candida albicans: a Sweet Spot for Fungal Morphogenesis. Eukaryot Cell 2009;8(9):131420. [580] Rodaki A, Bohovych IM, Enjalbert B, Young T, Odds FC, Gow NAR, et al. Glucose promotes stress resistance in the fungal pathogen Candida albicans. Mol Biol Cell 2009;20(22):484555. [581] Brown V, Sexton JA, Johnston M. A glucose sensor in Candida albicans. Eukaryot Cell 2006;5(10):172637. [582] Nitzan M, Fechter P, Peer A, Altuvia Y, Bronesky D, Vandenesch F, et al. A defense-offense multi-layered regulatory switch in a pathogenic bacterium. Nucleic Acids Res 2015;43(3):135769. [583] Chasman D, Ho YH, Berry DB, Nemec CM, MacGilvray ME, Hose J, et al. Pathway connectivity and signaling coordination in the yeast stress-activated signaling network. Mol Syst Biol 2014;10(11). [584] Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen DC, et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res 2015;43(D1):D4708. [585] Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: proteinprotein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015;43(D1):D44752. [586] Vogel C, Marcotte EM. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nat Rev Genet 2012;13(4):22732. [587] Ferro VA, Costa R, Carter KC, Harvey MJA, Waterston MM, Mullen AB, et al. 
Immune responses to a GnRH-based anti-fertility immunogen, induced by different adjuvants and subsequent effect on vaccine efficacy. Vaccine 2004;22(8):102431. [588] Kannarkat GT, Boss JM, Tansey MG. The role of innate and adaptive immunity in Parkinson’s disease. J Parkinsons Dis 2013;3(4):493514. [589] Molnar E, Swamy M, Holzer M, Beck-Garcia K, Worch R, Thiele C, et al. Cholesterol and sphingomyelin drive ligand-independent T-cell antigen receptor nanoclustering. J Biol Chem 2012;287(51):4266474. [590] Besedovsky L, Lange T, Born J. Sleep and immune function. Pflugers Arch 2012;463(1):12137.
References
625
[591] Frantz S, Vincent KA, Feron O, Kelly RA. Innate immunity and angiogenesis. Circ Res 2005;96(1):1526. [592] Gow NAR, van de Veerdonk FL, Brown AJP, Netea MG. Candida albicans morphogenesis and host defence: discriminating invasion from colonization. Nat Rev Microbiol 2012;10(2):11222. [593] Leach MD, Stead DA, Argo E, Brown AJP. Identification of sumoylation targets, combined with inactivation of SMT3, reveals the impact of sumoylation upon growth, morphology, and stress resistance in the pathogen Candida albicans. Mol Biol Cell 2011;22(5):687702. [594] Wu CC, Chen BS. Crosstalk network biomarkers of a pathogen-host interaction difference network from innate to adaptive immunity: C. albicans-zebrafish infection model. Biomed Eng Syst Technol, BIOSTEC 2015;574:190205. [595] Brown HM, Knowlton AE, Grieshaber SS. Chlamydial infection induces host cytokinesis failure at abscission. Cell Microbiol 2012;14(10):155467. [596] Brown JN, Palermo RE, Baskin CR, Gritsenko M, Sabourin PJ, Long JP, et al. Macaque proteome response to highly pathogenic avian influenza and 1918 reassortant influenza virus infections. J Virol 2010;84 (22):1205868. [597] Gao Y, Zhou YJ, Xie B, Zhang SF, Rahmeh A, Huang HS, et al. Protein phosphatase-1 is targeted to DNA polymerase delta via an interaction with the p68 subunit. Biochemistry 2008;47(43):1136776. [598] Hasnain SE, Begum R, Ramaiah KVA, Sahdev S, Shajil EM, Taneja TK, et al. Host-pathogen interactions during apoptosis. J Biosci 2003;28(3):34958. [599] Hamza TH, Zabetian CP, Tenesa A, Laederach A, Montimurro J, Yearout D, et al. Common genetic variation in the HLA region is associated with late-onset sporadic Parkinson’s disease. Nat Genet 2010;42 (9):7815. [600] Sun X, Deng J, Liu TC, Borjigin J. Circadian 5-HT production regulated by adrenergic signaling. Proc Natl Acad Sci USA 2002;99(7):468691. [601] Ciarleglio CM, Resuehr HES, McMahon DG. 
Interactions of the serotonin and circadian systems: nature and nurture in rhythms and blues. Neuroscience 2011;197:816. [602] Scheiermann C, Kunisaki Y, Lucas D, Chow A, Jang JE, Zhang D, et al. Adrenergic nerves govern circadian leukocyte recruitment to tissues. Immunity 2012;37(2):290301. [603] Buckley CD, Gilroy DW, Serhan CN, Stockinger B, Tak PP. The resolution of inflammation. Nat Rev Immunol 2013;13(1):5966. [604] Rolf J, Zarrouk M, Finlay DK, Foretz M, Viollet B, Cantrell DA. AMPKα1: a glucose sensor that controls CD8 T-cell memory. Eur J Immunol 2013;43(4):88996. [605] Netea MG, Latz E, Mills KH, O’Neill LA. Innate immune memory: a paradigm shift in understanding host defense. Nat Immunol 2015;16(7):6759. [606] Louveau A, Smirnov I, Keyes TJ, Eccles JD, Rouhani SJ, Peske JD, et al. Structural and functional features of central nervous system lymphatic vessels. Nature 2015;523 Advance online publication. [607] Du¨hring S, Germerodt S, Skerka C, Zipfel PF, Dandekar T, Schuster S. Host-pathogen interactions between the human innate immune system and Candida albicans—understanding and modeling defense and evasion strategies. Front Microbiol 2015;6:625 10.3389/fmicb.2015.00625. [608] Bevan MJ. Understand memory, design better vaccines. Nat Immunol 2011;12(6):4635. [609] Medzhitov R, Janeway Jr. CA. Innate immunity: impact on the adaptive immune response. Curr Opin Immunol 1997;9(1):49. [610] Akira S, Uematsu S, Takeuchi O. Pathogen recognition and innate immunity. Cell 2006;124(4):783801. [611] Mavor AL, Thewes S, Hube B. Systemic fungal infections caused by Candida species: epidemiology, infection process and virulence attributes. Curr Drug Targets 2005;6(8):86374. [612] Wan YY, Flavell RA. ‘Yin-Yang’ functions of transforming growth factor-beta and T regulatory cells in immune regulation. Immunol Rev 2007;220:199213. [613] Malhotra N, Kang J. SMAD regulatory networks construct a balanced immune system. Immunology 2013;139(1):110. 
[614] Heldin CH, Miyazono K, tenDijke P. TGF-beta signalling from cell membrane to nucleus through SMAD proteins. Nature 1997;390(6659):46571. [615] Sideras P, et al. Activin, neutrophils, and inflammation: just coincidence? Semin Immunopathol 2013;35 (4):48199. [616] Brandes U. A faster algorithm for betweenness centrality. J Math Sociol 2001;25(2):16377.
626
References
[617] Kloetzel PM. The proteasome and MHC class I antigen processing. Biochim Biophys Acta 2004;1695 (13):22533. [618] Gao Z, Shao Y, Jiang X. Essential roles of the Bcl-2 family of proteins in caspase-2-induced apoptosis. J Biol Chem 2005;280(46):382715. [619] Singh CR, et al. Cutting edge: nicastrin and related components of gamma-secretase generate a peptide epitope facilitating immune recognition of intracellular mycobacteria, through MHC class II-dependent priming of T cells. J Immunol 2011;187(11):54959. [620] Luisi S, et al. Expression and secretion of activin A: possible physiological and clinical implications. Eur J Endocrinol 2001;145(3):22536. [621] Malaviya R, Abraham SN. Mast cell modulation of immune responses to bacteria. Immunol Rev 2001;179:1624. [622] Ogawa K, Funaba M, Tsujimoto M. A dual role of activin A in regulating immunoglobulin production of B cells. J Leukoc Biol 2008;83(6):14518. [623] Joly AL, et al. Dual role of heat shock proteins as regulators of apoptosis and innate immunity. J Innate Immun 2010;2(3):23847. [624] Pandey P, et al. Negative regulation of cytochrome c-mediated oligomerization of Apaf-1 and activation of procaspase-9 by heat shock protein 90. EMBO J 2000;19(16):431022. [625] Nishikawa M, Takemoto S, Takakura Y. Heat shock protein derivatives for delivery of antigens to antigen presenting cells. Int J Pharm 2008;354(12):237. [626] Zhang S, et al. Smad7 antagonizes transforming growth factor beta signaling in the nucleus by interfering with functional Smad-DNA complex formation. Mol Cell Biol 2007;27(12):448899. [627] Yoshimura A, Wakabayashi Y, Mori T. Cellular and molecular basis for the regulation of inflammation by TGF-beta. J Biochem 2010;147(6):78192. [628] Yoshimura A, Muto G. TGF-beta function in immune suppression. Curr Top Microbiol Immunol 2011;350:12747. [629] Wang J, Maldonado MA. The ubiquitin-proteasome system and its role in inflammatory and autoimmune diseases. Cell Mol Immunol 2006;3(4):25561. 
[630] Budihardjo I, et al. Biochemical pathways of caspase activation during apoptosis. Annu Rev Cell Dev Biol 1999;15:26990. [631] Yang J, et al. Prevention of apoptosis by Bcl-2: release of cytochrome c from mitochondria blocked. Science 1997;275(5303):112932. [632] Schuster N, Krieglstein K. Mechanisms of TGF-beta-mediated apoptosis. Cell Tissue Res 2002;307(1):114. [633] Hayashi H, et al. The MAD-related protein Smad7 associates with the TGFbeta receptor and functions as an antagonist of TGFbeta signaling. Cell 1997;89(7):116573. [634] Moore KJ, Matlashewski G. Intracellular infection by Leishmania donovani inhibits macrophage apoptosis. J Immunol 1994;152(6):29307. [635] Keane J, et al. Infection by Mycobacterium tuberculosis promotes human alveolar macrophage apoptosis. Infect Immun 1997;65(1):298304. [636] Dye C, Scheele S, Dolin P, Pathania V, Raviglione RC, Project WGSM. Global burden of tuberculosis estimated incidence, prevalence, and mortality by country. J Am Med Assoc 1999;282:67786. [637] Vynnycky E, Fine PE. Lifetime risks, incubation period, and serial interval of tuberculosis. Am J Epidemiol 2000;152:24763. [638] Koch R. The etiology of tuberculosis. Rev Infect Dis 1982;4:12704. [639] Sallusto F, Cella M, Danieli C, Lanzavecchia A. Dendritic cells use macropinocytosis and the mannose receptor to concentrate macromolecules in the major histocompatibility complex class-II compartment down-regulation by cytokines and bacterial products. J Exp Med 1995;182:389400. [640] Geijtenbeek TB, Krooshoop DJ, Bleijs DA, Van Vliet SJ, Van Duijnhoven GC, Grabovsky V, et al. DC-SIGNICAM-2 interaction mediates dendritic cell trafficking. Nat Immunol 2000;1:3537. [641] Rescigno M, Granucci F, Ricciardi-Castagnoli P. Molecular events of bacterial-induced maturation of dendritic cells. J Clin Immunol 2000;20:1616. [642] Albert ML, Pearce SF, Francisco LM, Sauter B, Roy P, Silverstein RL, et al. 
Immature dendritic cells phagocytose apoptotic cells via alphavbeta5 and CD36, and cross-present antigens to cytotoxic T lymphocytes. J Exp Med 1998;188:135968.
References
627
[643] Visintin A, Mazzoni A, Spitzer JH, Wyllie DH, Dower SK, Segal DM. Regulation of Toll-like receptors in human monocytes and dendritic cells. J Immunol 2001;166:24955. [644] Lipscomb MF, Masten BJ. Dendritic cells: immune regulators in health and disease. Physiol Rev 2002;82:97130. [645] Kaufmann SHE, Schaible UE. A dangerous liaison between two major killers: Mycobacterium tuberculosis and HIV target dendritic cells through DC-SIGN. J Exp Med 2003;197:15. [646] Means TK, Wang S, Lien E, Yoshimura A, Golenbock DT, Fenton MJ. Human Toll-like receptors mediate cellular activation by Mycobacterium tuberculosis. J Immunol 1999;163:39207. [647] Chan J, Xing Y, Magliozzo RS, Bloom BR. Killing of virulent Mycobacterium tuberculosis by reactive nitrogen intermediates produced by activated murine macrophages. J Exp Med 1992;175:111122. [648] Garcia I, Guler R, Vesin D, Olleros ML, Vassalli P, Chvatchko Y, et al. Lethal Mycobacterium bovis bacillus Calmette Guerin infection in nitric oxide synthase 2-deficient mice: cell-mediated immunity requires nitric oxide synthase 2. Lab Invest 2000;80:138597. [649] Scanga CA, Mohan VP, Joseph H, Yu K, Chan J, Flynn JL. Reactivation of latent tuberculosis: variations on the Cornell murine model. Infect Immun 1999;67:45318. [650] Lin PL, Myers A, Smith L, Bigbee C, Bigbee M, Fuhrman C, et al. Tumor necrosis factor neutralization results in disseminated disease in acute and latent Mycobacterium tuberculosis infection with normal granuloma structure in a cynomolgus macaque model. Arthritis Rheum 2010;62:34050. [651] Tsao TC, Hong J, Huang C, Yang P, Liao SK, Chang KS. Increased TNF-alpha, IL-1 beta and IL-6 levels in the bronchoalveolar lavage fluid with the upregulation of their mRNA in macrophages lavaged from patients with active pulmonary tuberculosis. Tuber Lung Dis 1999;79:27985. [652] Cooper AM, Magram J, Ferrante J, Orme IM. 
Interleukin 12 (IL-12) is crucial to the development of protective immunity in mice intravenously infected with Mycobacterium tuberculosis. J Exp Med 1997;186:3945. [653] Flynn JL, Chan J, Triebold KJ, Dalton DK, Stewart TA, Bloom BR. An essential role for interferon gamma in resistance to Mycobacterium tuberculosis infection. J Exp Med 1993;178:224954. [654] Noss EH, Pai RK, Sellati TJ, Radolf JD, Belisle J, Golenbock DT, et al. Toll-like receptor 2-dependent inhibition of macrophage class II MHC expression and antigen processing by 19-kDa lipoprotein of Mycobacterium tuberculosis. J Immunol 2001;167:91018. [655] Sugawara I, Yamada H, Li C, Mizuno S, Takeuchi O, Akira S. Mycobacterial infection in TLR2 and TLR6 knockout mice. Microbiol Immunol 2003;47:32736. [656] Pathak SK, Basu S, Basu KK, Banerjee A, Pathak S, Bhattacharyya A, et al. Direct extracellular interaction between the early secreted antigen ESAT-6 of Mycobacterium tuberculosis and TLR2 inhibits TLR signaling in macrophages. Nat Immunol 2007;8:61018. [657] Van Kooyk Y, Geijtenbeek TB. DC-SIGN: escape mechanism for pathogens. Nat Rev Immunol 2003;3:697709. [658] Redford PS, Murray PJ, O’garra A. The role of IL-10 in immune regulation during M. tuberculosis infection. Mucosal Immunol 2011;4:26170. [659] Sturgill-Koszycki S, Schlesinger PH, Chakraborty P, Haddix PL, Collins HL, Fok AK, et al. Lack of acidification in Mycobacterium phagosomes produced by exclusion of the vesicular proton-ATPase. Science 1994;263:67881. [660] Sakamoto K. The pathology of Mycobacterium tuberculosis infection. Vet Pathol 2012;49:42339. [661] Rajaram MV, Ni B, Morris JD, Brooks MN, Carlson TK, Bakthavachalu B, et al. Mycobacterium tuberculosis lipomannan blocks TNF biosynthesis by regulating macrophage MAPK-activated protein kinase 2 (MK2) and microRNA miR-125b. Proc Natl Acad Sci USA 2011;108:1740813. [662] Kumar R, Halder P, Sahu SK, Kumar M, Kumari M, Jana K, et al. 
Identification of a novel role of ESAT-6dependent miR-155 induction during infection of macrophages with Mycobacterium tuberculosis. Cell Microbiol 2012;14:162031. [663] Chatterjee S, Dwivedi VP, Singh Y, Siddiqui I, Sharma P, Van Kaer L, et al. Early secreted antigen ESAT-6 of Mycobacterium tuberculosis promotes protective T helper 17 cell responses in a Toll-like receptor-2dependent manner. PLoS Pathog 2011;7:e1002378. [664] Singh Y, Kaul V, Mehra A, Chatterjee S, Tousif S, Dwivedi VP, et al. Mycobacterium tuberculosis controls microRNA-99b (miR-99b) expression in infected murine dendritic cells to modulate host immunity. J Biol Chem 2013;288:505661.
628
References
[665] Realegeno S, Kelly-Scumpia KM, Dang AT, Lu J, Teles R, Liu PT, et al. S100A12 is part of the antimicrobial network against Mycobacterium leprae in human macrophages. PLoS Pathog 2016;12. [666] Fontan PA, Aris V, Alvarez ME, Ghanny S, Cheng J, Soteropoulos P, et al. Mycobacterium tuberculosis sigma factor E regulon modulates the host inflammatory response. J Infect Dis 2008;198:87785. [667] Tailleux L, Waddell SJ, Pelizzola M, Mortellaro A, Withers M, Tanne A, et al. Probing host pathogen crosstalk by transcriptional profiling of both Mycobacterium tuberculosis and infected human dendritic cells and macrophages. PLoS One 2008;3:e1403. [668] Agarwal V, Bell GW, Nam JW, Bartel DP. Predicting effective microRNA target sites in mammalian mRNAs. eLife 2015;4. [669] Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1 alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003;34:26773. [670] Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA 2005;102:1554550. [671] Bovolenta LA, Acencio ML, Lemke N. HTRIdb: an open-access database for experimentally verified human transcriptional regulation interactions. BMC Genomics 2012;13(1):405. [672] Zheng G, Tu K, Yang Q, Xiong Y, Wei C, Xie L, et al. ITFP: an integrated platform of mammalian transcription factors. Bioinformatics 2008;24:241617. [673] Galagan J, Lyubetskaya A, Gomes A. ChIP-Seq and the complexity of bacterial transcriptional regulation. Curr Top Microbiol Immunol 2013;363:4368. [674] Galagan JE, Minch K, Peterson M, Lyubetskaya A, Azizi E, Sweet L, et al. The Mycobacterium tuberculosis regulatory network and hypoxia. Nature 2013;499:17883. [675] Jaini S, Lyubetskaya A, Gomes A, Peterson M, Park ST, Raman S, et al. 
Transcription factor binding site mapping using ChIP-Seq. Microbiol Spectr 2014;2. [676] Minch KJ, Rustad TR, Peterson EJR, Winkler J, Reiss DJ, Ma SY, et al. The DNA-binding network of Mycobacterium tuberculosis. Nat Commun 2015;6. [677] Guo W, Li JT, Pan X, Wei L, Wu JY. Candidate Mycobacterium tuberculosis genes targeted by human microRNAs. Protein Cell 2010;1:41921. [678] Wang Y, Cui T, Zhang C, Yang M, Huang Y, Li W, et al. Global protein-protein interaction network in the human pathogen Mycobacterium tuberculosis H37Rv. J Proteome Res 2010;9:666577. [679] Zhou H, Gao S, Nguyen NN, Fan M, Jin J, Liu B, et al. Stringent homology-based prediction of H. sapiensM. tuberculosis H37Rv protein-protein interactions. Biol Direct 2014;9:5. [680] Esterhuyse MM, Weiner 3rd J, Caron E, Loxton AG, Iannaccone M, Wagman C, et al. Epigenetics and proteomics join transcriptomics in the quest for tuberculosis biomarkers. MBio 2015;6 e01187-15. [681] Pacis A, Tailleux L, Morin AM, Lambourne J, Macisaac JL, Yotova V, et al. Bacterial infection remodels the DNA methylation landscape of human dendritic cells. Genome Res 2015;25:180111. [682] Vento-Tormo R, Company C, Rodriguez-Ubreva J, De La Rica L, Urquiza JM, Javierre BM, et al. IL-4 orchestrates STAT6-mediated DNA demethylation leading to dendritic cell differentiation. Genome Biol 2016;17:4. [683] Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 2007;39:45766. [684] Weber JA, Baxter DH, Zhang S, Huang DY, Huang KH, Lee MJ, et al. The microRNA spectrum in 12 body fluids. Clin Chem 2010;56:173341. [685] Liu S, Da Cunha AP, Rezende RM, Cialic R, Wei Z, Bry L, et al. The host shapes the gut microbiota via fecal microRNA. Cell Host Microbe 2016;19:3243. [686] Coleman TF, Li YY. A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. 
Siam J Optim 1996;6:104058. [687] Li CW, Chen BS. Network biomarkers of bladder cancer based on a genome-wide genetic and epigenetic network derived from next-generation sequencing data. Dis Markers 2016;2016:4149608. [688] Li CW, Wang WH, Chen BS. Investigating the specific core genetic-and-epigenetic networks of cellular mechanisms involved in human aging in peripheral blood mononuclear cells. Oncotarget 2016;7. [689] Rengarajan J, Bloom BR, Rubin EJ. Genome-wide requirements for Mycobacterium tuberculosis adaptation and survival in macrophages. Proc Natl Acad Sci USA 2005;102:832732.
References
629
[690] Joshi SM, Pandey AK, Capite N, Fortune SM, Rubin EJ, Sassetti CM. Characterization of mycobacterial virulence genes through genetic interaction mapping. Proc Natl Acad Sci USA 2006;103:117605. [691] Peters JM, Harris JR, Finley D. Ubiquitin and the biology of the cell. New York: Plenum Press; 1998. [692] Stanley SA, Barczak AK, Silvis MR, Luo SS, Sogi K, Vokes M, et al. Identification of host-targeted small molecules that restrict intracellular Mycobacterium tuberculosis growth. PLoS Pathog 2014;10. [693] Ding SZ, Minohara Y, Fan XJ, Wang J, Reyes VE, Patel J, et al. Helicobacter pylori infection induces oxidative stress and programmed cell death in human gastric epithelial cells. Infect Immun 2007;75:40309. [694] Mcadam E, Brem R, Karran P. Oxidative Stress-induced protein damage inhibits DNA repair and determines mutation risk and therapeutic efficacy. Mol Cancer Res 2016;14:61222. [695] Griffin JE, Gawronski JD, Dejesus MA, Ioerger TR, Akerley BJ, Sassetti CM. High-resolution phenotypic profiling defines genes essential for mycobacterial growth and cholesterol catabolism. PLoS Pathog 2011;7: e1002251. [696] Roger T, Lugrin J, Le Roy D, Goy G, Mombelli M, Koessler T, et al. Histone deacetylase inhibitors impair innate immune responses to Toll-like receptor agonists and to infection. Blood 2011;117:120517. [697] Romero MM, Basile JI, Lopez B, Ritacco V, Barrera L, Sasiain Mdel C, et al. Outbreaks of Mycobacterium tuberculosis MDR strains differentially induce neutrophil respiratory burst involving lipid rafts, p38 MAPK and Syk. BMC Infect Dis 2014;14:262. [698] Edwards RA, Witherspoon M, Wang K, Afrasiabi K, Pham T, Birnbaumer L, et al. Epigenetic repression of DNA mismatch repair by inflammation and hypoxia in inflammatory bowel disease-associated colorectal cancer. Cancer Res 2009;69:64239. [699] Wong KK, Carretero J, Li J, Hinds A, Ramirez MI, Williams MC, et al. 
ETS-1 regulates Twist-1 expression in non-small cell lung cancer (NSCLC) progression and metastasis. Am J Respir Crit Care Med 2011;183. [700] Young MR, Montpetit M, Lozano Y, Djordjevic A, Devata S, Matthews JP, et al. Regulation of Lewis lung carcinoma invasion and metastasis by protein kinase A. Int J Cancer 1995;61:1049. [701] Montero AJ, Diaz-Montero CM, Mao L, Youssef EM, Estecio M, Shen L, et al. Epigenetic inactivation of EGFR by CpG island hypermethylation in cancer. Cancer Biol Ther 2006;5:1494501. [702] Gu H, Li Q, Huang S, Lu W, Cheng F, Gao P, et al. Mitochondrial E3 ligase March5 maintains stemness of mouse ES cells via suppression of ERK signalling. Nat Commun 2015;6:7112. [703] Ren J, Wang Y, Liang Y, Zhang Y, Bao S, Xu Z. Methylation of ribosomal protein S10 by protein-arginine methyltransferase 5 regulates ribosome biogenesis. J Biol Chem 2010;285:12695705. [704] Buckley SM, Aranda-Orgilles B, Strikoudis A, Apostolou E, Loizou E, Moran-Crusio K, et al. Regulation of pluripotency and cellular reprogramming by the ubiquitin-proteasome system. Cell Stem Cell 2012;11:78398. [705] Lelouard H, Gatti E, Cappello F, Gresser O, Camosseto V, Pierre P. Transient aggregation of ubiquitinated proteins during dendritic cell maturation. Nature 2002;417:17782. [706] Canadien V, Tan T, Zilber R, Szeto J, Perrin AJ, Brumell JH. Cutting edge: microbial products elicit formation of dendritic cell aggresome-like induced structures in macrophages. J Immunol 2005;174:24715. [707] Rohde KH, Veiga DF, Caldwell S, Balazsi G, Russell DG. Linking the transcriptional profiles and the physiological states of Mycobacterium tuberculosis during an extended intracellular infection. PLoS Pathog 2012;8: e1002769. [708] Connor SE, Capodagli GC, Deaton MK, Pegan SD. Structural and functional characterization of Mycobacterium tuberculosis triosephosphate isomerase. Acta Crystallogr D Biol Crystallogr 2011;67:101722. 
[709] Kuhn ML, Zemaitaitis B, Hu LI, Sahu A, Sorensen D, Minasov G, et al. Structural, kinetic and proteomic characterization of acetyl phosphate-dependent bacterial protein acetylation. PLoS One 2014;9:e94816. [710] Piddington DL, Fang FC, Laessig T, Cooper AM, Orme IM, Buchmeier NA. Cu,Zn superoxide dismutase of Mycobacterium tuberculosis contributes to survival in activated macrophages that are generating an oxidative burst. Infect Immun 2001;69:49807. [711] Wagner D, Maser J, Lai B, Cai Z, Barry 3rd CE, Honer Zu Bentrup K, et al. Elemental analysis of Mycobacterium avium-, Mycobacterium tuberculosis-, and Mycobacterium smegmatis-containing phagosomes indicates pathogen-induced microenvironments within the host cell’s endosomal system. J Immunol 2005;174:1491500. [712] Wolschendorf F, Ackart D, Shrestha TB, Hascall-Dove L, Nolan S, Lamichhane G, et al. Copper resistance is essential for virulence of Mycobacterium tuberculosis. Proc Natl Acad Sci USA 2011;108:16216.
630
References
[713] Ward SK, Hoye EA, Talaat AM. The global responses of Mycobacterium tuberculosis to physiological levels of copper. J Bacteriol 2008;190:293946. [714] Balazsi G, Heath AP, Shi L, Gennaro ML. The temporal response of the Mycobacterium tuberculosis gene regulatory network during growth arrest. Mol Syst Biol 2008;4:225. [715] Leistikow RL, Morton RA, Bartek IL, Frimpong I, Wagner K, Voskuil MI. The Mycobacterium tuberculosis DosR regulon assists in metabolic homeostasis and enables rapid recovery from nonrespiring dormancy. J Bacteriol 2010;192:166270. [716] Voskuil MI, Schnappinger D, Visconti KC, Harrell MI, Dolganov GM, Sherman DR, et al. Inhibition of respiration by nitric oxide induces a Mycobacterium tuberculosis dormancy program. J Exp Med 2003;198:70513. [717] Campbell DR, Chapman KE, Waldron KJ, Tottey S, Kendall S, Cavallaro G, et al. Mycobacterial cells have dual nickel-cobalt sensors sequence relationships and metal sites of metal-responsive repressors are not congruent. J Biol Chem 2007;282:32298310. [718] Li J, Wang X, Gong W, Niu C, Zhang M. Crystallization and preliminary X-ray analysis of Rv1674c from Mycobacterium tuberculosis. Acta Crystallogr F Struct Biol Commun 2015;71:3547. [719] Koul A, Choidas A, Treder M, Tyagi AK, Drlica K, Singh Y, et al. Cloning and characterization of secretory tyrosine phosphatases of Mycobacterium tuberculosis. J Bacteriol 2000;182:542532. [720] Ecco G, Vernal J, Razzera G, Martins PA, Matiollo C, Terenzi H. Mycobacterium tuberculosis tyrosine phosphatase A (PtpA) activity is modulated by S-nitrosylation. Chem Commun (Camb) 2010;46:75013. [721] Choi KP, Kendrick N, Daniels L. Demonstration that fbiC is required by Mycobacterium bovis BCG for coenzyme F-420 and FO biosynthesis. J Bacteriol 2002;184:24208. [722] Purwantini E, Gillis TP, Daniels L. 
Presence of F420-dependent glucose-6-phosphate dehydrogenase in Mycobacterium and Nocardia species, but absence from Streptomyces and Corynebacterium species and methanogenic Archaea. FEMS Microbiol Lett 1997;146:12934. [723] Purwantini E, Mukhopadhyay B. Conversion of NO2 to NO by reduced coenzyme F-420 protects mycobacteria from nitrosative damage. Proc Natl Acad Sci USA 2009;106:63338. [724] Mccue LA, Mcdonough KA, Lawrence CE. Functional classification of cNMP-binding proteins and nucleotide cyclases with implications for novel regulatory pathways in Mycobacterium tuberculosis. Genome Res 2000;10:20419. [725] Mcdonough KA, Rodriguez A. The myriad roles of cyclic AMP in microbial pathogens: from signal to sword. Nat Rev Microbiol 2012;10:2738. [726] Agarwal N, Lamichhane G, Gupta R, Nolan S, Bishai WR. Cyclic AMP intoxication of macrophages by a Mycobacterium tuberculosis adenylate cyclase. Nature 2009;460:98102. [727] Ranganathan S, Bai G, Lyubetskaya A, Knapp GS, Peterson MW, Gazdik M, et al. Characterization of a cAMP responsive transcription factor, Cmr (Rv1675c), in TB complex mycobacteria reveals overlap with the DosR (DevR) dormancy regulon. Nucleic Acids Res 2015;. [728] Gupta D, Sharma S, Singhal J, Satsangi AT, Antony C, Natarajan K. Suppression of TLR2-induced IL-12, reactive oxygen species, and inducible nitric oxide synthase expression by Mycobacterium tuberculosis antigens expressed inside macrophages during the course of infection. J Immunol 2010;184:544455. [729] Andersen P, Doherty TM. The success and failure of BCG implications for a novel tuberculosis vaccine. Nat Rev Microbiol 2005;3:65662. [730] Comstock GW, Woolpert SF, Livesay VT. Tuberculosis studies in Muscogee County, Georgia. Twenty-year evaluation of a community trial of BCG vaccination. Public Health Rep 1976;91:27680. [731] Hart PD, Sutherland I. BCG and vole bacillus vaccines in the prevention of tuberculosis in adolescence and early adult life. Br Med J 1977;2:2935. 
[732] Sterne JAC, Rodrigues LC, Guedes IN. Does the efficacy of BCG decline with time since vaccination? Int J Tuberc Lung Dis 1998;2:2007. [733] Johnson R, Streicher EM, Louw GE, Warren RM, Van Helden PD, Victor TC. Drug resistance in Mycobacterium tuberculosis. Curr Issues Mol Biol 2006;8:97111. [734] Alexander PE, De P. The emergence of extensively drug-resistant tuberculosis (TB): TB/HIV coinfection, multidrug-resistant TB and the resulting public health threat from extensively drug-resistant TB, globally and in Canada. Can J Infect Dis Med Microbiol 2007;18:28991.
References
631
[735] Zignol M, Van Gemert W, Falzon D, Sismanidis C, Glaziou P, Floyd K, et al. Surveillance of antituberculosis drug resistance in the world: an updated analysis, 20072010. Bull World Health Organ 2012;90:11119. [736] Centers for Disease Control and Prevention (CDC). Emergence of Mycobacterium tuberculosis with extensive resistance to second-line drugs—worldwide, 20002004. MMWR Morb Mortal Wkly Rep 2006;55:3015. [737] Mawuenyega KG, Forst CV, Dobos KM, Belisle JT, Chen J, Bradbury EM, et al. Mycobacterium tuberculosis functional network analysis by global subcellular protein profiling. Mol Biol Cell 2005;16:396404. [738] Sassetti CM, Boyd DH, Rubin EJ. Genes required for mycobacterial growth defined by high density mutagenesis. Mol Microbiol 2003;48:7784. [739] Sassetti CM, Rubin EJ. Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci USA 2003;100:1298994. [740] Kinnings SL, Xie L, Fung KH, Jackson RM, Xie L, Bourne PE. The Mycobacterium tuberculosis drugome and its polypharmacological implications. PLoS Comput Biol 2010;6:e1000976. [741] Andries K, Verhasselt P, Guillemont J, Gohlmann HW, Neefs JM, Winkler H, et al. A diarylquinoline drug active on the ATP synthase of Mycobacterium tuberculosis. Science 2005;307:2237. [742] Novoa-Aponte L, Soto Ospina CY. Mycobacterium tuberculosis P-type ATPases: possible targets for drug or vaccine development. Biomed Res Int 2014;2014:296986. [743] Speer A, Shrestha TB, Bossmann SH, Basaraba RJ, Harber GJ, Michalek SM, et al. Copper-boosting compounds: a novel concept for antimycobacterial drug discovery. Antimicrob Agents Chemother 2013;57:108991. [744] Bartlett JG, et al. Antibiotic-associated pseudomembranous colitis due to toxin-producing clostridia. N Engl J Med 1978;298(10):5314. [745] Hall IC, O’Toole E. Intestinal flora in new-borin infants with a description of a new pathogenic anaerobe, Bacillus difficilis. Am J Dis Child 1935;49(2):390402. [746] Kelly CP, LaMont JT. 
Clostridium difficile—more difficult than ever. N Engl J Med 2008;359(18). [747] Reveles KR, et al. The rise in Clostridium difficile infection incidence among hospitalized adults in the United States: 2001-2010. Am J Infect Control 2014;42(10):102832. [748] McGlone SM, et al. The economic burden of Clostridium difficile. Clin Microbiol Infect 2012;18(3):2829. [749] Warny M, et al. Toxin production by an emerging strain of Clostridium difficile associated with outbreaks of severe disease in North America and Europe. Lancet 2005;366(9491):107984. [750] Hidalgo IJ, Raub TJ, Borchardt RT. Characterization of the human-colon carcinoma cell-line (Caco-2) as a model system for intestinal epithelial permeability. Gastroenterology 1989;96(3):73649. [751] Voth DE, Ballard JD. Clostridium difficile toxins: mechanism of action and role in disease. Clin Microbiol Rev 2005;18(2):247. [752] Pituch H. Clostridium difficile is no longer just a nosocomial infection or an infection of adults. Int J Antimicrob Agents 2009;33:S425. [753] Just I, et al. Glucosylation of Rho-proteins by clostridium-difficile toxin-B. Nature 1995;375(6531):5003. [754] Chaves-Olarte E, et al. R-Ras glucosylation and transient RhoA activation determine the cytopathic effect produced by toxin B variants from toxin A-negative strains of Clostridium difficile. J Biol Chem 2003;278 (10):795663. [755] Janvilisri T, Scaria J, Chang YF. Transcriptional profiling of Clostridium difficile and Caco-2 cells during Infection. J Infect Dis 2010;202(2):28290. [756] Geisler S, Coller J. RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol 2013;14(11):699712. [757] Hsu CY, et al. Systematic approach to Escherichia coli cell population control using a genetic lysis circuit. Bmc Syst Biol 2014;8(Suppl. 5):S7. [758] Wang X, et al. Cloning and variation of ground state intestinal stem cells. Gastroenterology 2015;148(4): S729. [759] Salwinski L, et al. 
The database of interacting proteins: 2004 update. Nucleic Acids Res 2004;32:D44951. [760] Li JH, et al. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2014;42(D1):D927. [761] Friard O, et al. CircuitsDB: a database of mixed microRNA/transcription factor feed-forward regulatory circuits in human and mouse. Bmc Bioinformatics 2010;11(1):435.
632
References
[762] Janoir C, et al. Cwp84, a surface-associated protein of Clostridium difficile, is a cysteine protease with degrading activity on extracellular matrix proteins. J Bacteriol 2007;189(20):717480. [763] Na X, et al. gp96 is a human colonocyte plasma membrane binding protein for Clostridium difficile toxin A. Infect Immun 2008;76(7):286271. [764] ChavesOlarte E, et al. Toxins A and B from Clostridium difficile differ with respect to enzymatic potencies, cellular substrate specificities, and surface binding to cultured cells. J Clin Invest 1997;100 (7):173441. [765] Goy SD, et al. Human neutrophils are activated by a peptide fragment of Clostridium difficile toxin B presumably via formyl peptide receptor. Cell Microbiol 2015;17(6):893909. [766] LaFrance ME, et al. Identification of an epithelial cell receptor responsible for Clostridium difficile TcdBinduced cytotoxicity. Proc Natl Acad Sci USA 2015;112(22):70738. [767] Yuan PF, et al. Chondroitin sulfate proteoglycan 4 functions as the cellular receptor for Clostridium difficile toxin B. Cell Res 2015;25(2):15768. [768] Papatheodorou P, et al. Clostridial glucosylating toxins enter cells via clathrin-mediated endocytosis. PLoS One 2010;5(5). [769] Haug G, Aktories K, Barth H. The host cell chaperone Hsp90 is necessary for cytotoxic action of the binary iota-like toxins. Infect Immun 2004;72(5):30668. [770] Berwin B, et al. Scavenger receptor-A mediates gp96/GRP94 and calreticulin internalization by antigenpresenting cells. EMBO J 2003;22(22):612736. [771] Liu TS, et al. Protective role of HSP72 against Clostridium difficile toxin A-induced intestinal epithelial cell dysfunction. Am J Physiol Cell Physiol 2003;284(4):C107382. [772] Tucker KD, Wilkins TD. Toxin-A of Clostridium difficile binds to the human carbohydrate antigens-I, antigens-X, and antigens-Y. Infect Immun 1991;59(1):738. [773] Kim H, et al. 
Clostridium difficile toxin A binds colonocyte Src causing dephosphorylation of focal adhesion kinase and paxillin. Exp Cell Res 2009;315(19):333644. [774] Calabi E, et al. Binding of Clostridium difficile surface layer proteins to gastrointestinal tissues. Infect Immun 2002;70(10):57708. [775] Larocque M, Chenard T, Najmanovich R. A curated C. difficile strain 630 metabolic network: prediction of essential targets and inhibitors. BMC Syst Biol 2014;8. [776] Krishnadev O, Srinivasan N. Prediction of protein-protein interactions between human host and a pathogen and its application to three pathogenic bacteria. Int J Biol Macromol 2011;48(4):61319. [777] Dineen SS, et al. Repression of Clostridium difficile toxin gene expression by CodY. Mol Microbiol 2007;66 (1):20619. [778] Walter BM, et al. The LexA regulated genes of the Clostridium difficile. BMC Microbiol 2014;14. [779] Fimlaid KA, et al. Global analysis of the sporulation pathway of Clostridium difficile. PLoS Genet 2013;9(8). [780] Antunes A, Martin-Verstraete I, Dupuy B. CcpA-mediated repression of Clostridium difficile toxin gene expression. Mol Microbiol 2011;79(4):88299. [781] El Meouche I, et al. Characterization of the SigD regulon of C. difficile and its positive control of toxin production through the regulation of tcdR. PLoS One 2013;8(12). [782] Matamouros S, England P, Dupuy B. Clostridium difficile toxin expression is inhibited by the novel regulator TcdC. Mol Microbiol 2007;64(5):127488. [783] Antunes A, et al. Global transcriptional control by glucose and carbon regulator CcpA in Clostridium difficile. Nucleic Acids Res 2012;40(21):1070118. [784] Novichkov PS, et al. RegPrecise 3.0-A resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 2013;14. [785] Mani N, Dupuy B. Regulation of toxin synthesis in Clostridium difficile by an alternative RNA polymerase sigma factor. Proc Natl Acad Sci USA 2001;98(10):58449. [786] Saujet L, et al. 
The key sigma factor of transition phase, SigH, controls sporulation, metabolism, and virulence factor expression in Clostridium difficile. J Bacteriol 2011;193(13):318696. [787] Salgado H, et al. RegulonDB v8.0: omics data sets, evolutionary conservation, regulatory phrases, crossvalidated gold standards and more. Nucleic Acids Res 2013;41(D1):D20313. [788] Lu P, Vogel C, Wang R, Yao X, Marcotte EM. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol 2007;25:11724.
[789] Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009;37(1):113. [790] Hesson LB, et al. Altered promoter nucleosome positioning is an early event in gene silencing. Epigenetics 2014;9(10):142230. [791] Fernandez-Jimenez N, et al. Coregulation and modulation of NF kappa B-related genes in celiac disease: uncovered aspects of gut mucosal inflammation. Hum Mol Genet 2014;23(5):1298310. [792] Dembek M, et al. High-throughput analysis of gene essentiality and sporulation in Clostridium difficile. Mbio 2015;6(2). [793] Shen Y, et al. Role of long non-coding RNA MIAT in proliferation, apoptosis and migration of lens epithelial cells: a clinical and in vitro study. J Cell Mol Med 2016;20(3):53748. [794] Just I, et al. The enterotoxin from Clostridium difficile (Toxa) monoglucosylates the Rho-proteins. J Biol Chem 1995;270(23):139326. [795] Chen SY, et al. The role of Rho GTPases in toxicity of Clostridium difficile toxins. Toxins 2015;7(12):525467. [796] Kovacs JJ, et al. HDAC6 regulates Hsp90 acetylation and chaperone-dependent activation of glucocorticoid receptor. Mol Cell 2005;18(5):6017. [797] Janvilisri T, et al. Microarray identification of Clostridium difficile core components and divergent regions associated with host origin. J Bacteriol 2009;191(12):388191. [798] Liliental J, et al. Genetic deletion of the Pten tumor suppressor gene promotes cell motility by activation of Rac1 and Cdc42 GTPases. Curr Biol 2000;10(7):4014. [799] Farrow MA, et al. Clostridium difficile toxin B-induced necrosis is mediated by the host epithelial cell NADPH oxidase complex. Proc Natl Acad Sci USA 2013;110(46):186749. [800] Seo JH, et al. ARD1-mediated Hsp70 acetylation balances stress-induced protein refolding and degradation. Nat Commun 2016;7. [801] Boyd CD, O’Toole GA. 
Second messenger regulation of biofilm formation: breakthroughs in understanding c-di-GMP effector systems. Annu Rev Cell Dev Biol 2012;28(28):43962. [802] Brekasis D, Paget MSB. A novel sensor of NADH/NAD(1) redox poise in Streptomyces coelicolor A3(2). EMBO J 2003;22(18):485665. [803] Kosmaczewski SG, et al. The RtcB RNA ligase is an essential component of the metazoan unfolded protein response. Embo Rep 2014;15(12):127885. [804] Hansen A, et al. The P2Y(6) receptor mediates Clostridium difficile toxin-induced CXCL8/IL-8 production and intestinal epithelial barrier dysfunction. PLoS One 2013;8(11). [805] Sun ML, et al. Lysine acetylation regulates the activity of Escherichia coli S-adenosylmethionine synthase. Acta Biochim Biophys Sin 2016;48(8):72331. [806] Girinathan BP, Braun SE, Govind R. Clostridium difficile glutamate dehydrogenase is a secreted enzyme that confers resistance to H2O2. Microbiology (Sgm) 2014;160:4755. [807] Shah D, et al. Clostridium difficile infection: update on emerging antibiotic treatment options and antibiotic resistance. Expert Rev Anti Infect Ther 2010;8(5):55564. [808] Garey KW, et al. Meta-analysis to assess risk factors for recurrent Clostridium difficile infection. J Hosp Infect 2008;70(4):298304. [809] Louie TJ, et al. Fidaxomicin versus vancomycin for Clostridium difficile infection. N Engl J Med 2011;364 (5):42231. [810] Sun CL, et al. Recombinant Clostridium difficile toxin B induces endoplasmic reticulum stress in mouse colonal carcinoma cells. Acta Biochim Biophys Sin 2014;46(11):97381. [811] Mulvey GL, et al. Therapeutic potential of egg yolk antibodies for treating Clostridium difficile infection. J Med Microbiol 2011;60(8):11817. [812] Ochsner UA, et al. Inhibitory effect of REP3123 on toxin and spore formation in Clostridium difficile, and in vivo efficacy in a hamster gastrointestinal infection model. J Antimicrob Chemother 2009;63(5):96471. [813] Lamb J, et al. 
The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 2006;313(5795):1929-35. [814] Dong QL, et al. Inhibitory effect of camptothecin against rice bacterial brown stripe pathogen Acidovorax avenae subsp avenae RS-2. Molecules 2016;21(8). [815] Morimoto Y, et al. Apigenin as an anti-quinolone-resistance antibiotic. Int J Antimicrob Agents 2015;46(6):666-73.
[816] Tesfazghi M, et al. The recruitment of the transcription factor YY1 to DNA damage sites in human cells. FASEB J 2015;29. [817] Baggiolini M, Clarklewis I. Interleukin-8, a chemotactic and inflammatory cytokine. FEBS Lett 1992;307 (1):97101. [818] Chumbler NM, et al. Clostridium difficile toxin B causes epithelial cell necrosis through an autoprocessingindependent mechanism. PLoS Pathog 2012;8(12). [819] Lyras D, et al. Toxin B is essential for virulence of Clostridium difficile. Nature 2009;458(7242):117681. [820] Kuehne SA, et al. The role of toxin A and toxin B in Clostridium difficile infection. Nature 2010;467 (7316):71197. [821] Hebecker B, et al. Pathogenicity mechanisms and host response during oral Candida albicans infections. Expert Rev Anti Infect Ther 2014;12(7):86779. Available from: https://doi.org/10.1586/ 14787210.2014.916210. [822] Farah CS, Lynch N, McCullough MJ. Oral fungal infections: an update for the general practitioner. Aust Dent J 2010;55(Suppl. 1):4854. Available from: https://doi.org/10.1111/j.1834-7819.2010.01198.x. [823] Slutsky B, Staebell M, et al. “White-opaque transition”: a second high-frequency switching system in Candida albicans. J Bacteriol 1987;169(1):18997 PMC211752. [824] Mishra PK, Baum M, Carbon J. DNA methylation regulates phenotype-dependent transcriptional activity in Candida albicans. Proc Natl Acad Sci USA 2011;108(29):1196570. Available from: https://doi.org/10.1073/ pnas.1109631108. [825] Shirtliff ME, Peters BM, Jabra-Rizk MA. Cross-kingdom interactions: Candida albicans and bacteria. FEMS Microbiol Lett 2009;299(1):18. Available from: https://doi.org/10.1111/j.1574-6968.2009.01668.x. [826] Dongari-Bagtzoglou A, Kashleva H. Development of a highly reproducible three-dimensional organotypic model of the oral mucosa. Nat Protoc 2006;1(4):201218. Available from: https://doi.org/10.1038/nprot.2006.323. [827] Mayer FL, Wilson D, Hube B. Candida albicans pathogenicity mechanisms. Virulence 2013;4(2):11928. 
Available from: https://doi.org/10.4161/viru.22913. [828] Goyer M, et al. Intestinal cell tight junctions limit invasion of Candida albicans through active penetration and endocytosis in the early stages of the interaction of the fungus with the intestinal barrier. PLoS One 2016;11(3):e0149159. Available from: https://doi.org/10.1371/journal.pone.0149159. [829] Zhu W, et al. EGFR and HER2 receptor kinase signaling mediate epithelial cell invasion by Candida albicans during oropharyngeal infection. Proc Natl Acad Sci USA 2012;109(35):141949. Available from: https://doi. org/10.1073/pnas. [830] Edwards Jr JE, Gaither TA, et al. Expression of specific binding sites on Candida with functional and antigenic characteristics of human complement receptors. J Immunol 1986;137(11):357783. [831] Sohn K, et al. EFG1 is a major regulator of cell wall dynamics in Candida albicans as revealed by DNA microarrays. J Bacteriol 2001;183(13):40903. Available from: https://doi.org/10.1046/j.13652958.2003.03300.x. [832] Wu C-C, Chen B-S. A systems biology approach to the coordination of defensive and offensive molecular mechanisms in the innate and adaptive hostpathogen interaction networks. PLoS One 2016;11(2):e0149303. Available from: https://doi.org/10.1371/journal.pone.0149303. [833] Bonazzi M, Cossart P. Impenetrable barriers or entry portals? The role of cellcell adhesion during infection. J Cell Biol 2011;195(3):34958. Available from: https://doi.org/10.1083/jcb.201106011. [834] Liu Y, et al. New signaling pathways govern the host response to C. albicans infection in various niches. Genome Res 2015;25(5):67989. Available from: https://doi.org/10.1101/gr.187427.114. [835] da Huang W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4(1):4457. Available from: https://doi.org/10.1038/ nprot.2008.211. [836] Moazeni M, et al. 
Down-regulation of the ALS3 gene as a consequent effect of RNA-mediated silencing of the EFG1 gene in Candida albicans. Iran Biomed J 2012;16(4):172-8. Available from: https://doi.org/10.6091/ibj.1093.2012. [837] Fanning S, et al. Divergent targets of Candida albicans biofilm regulator Bcr1 in vitro and in vivo. Eukaryot Cell 2012;11(7):896-904. Available from: https://doi.org/10.1128/EC.00103-12. [838] da Silva Dantas A, et al. Thioredoxin regulates multiple hydrogen peroxide-induced signaling pathways in Candida albicans. Acta Pharmacol Sin 2010;31(5):616-28. Available from: https://doi.org/10.1128/MCB.00313-10.
[839] Patterson MJ, et al. Ybp1 and Gpx3 signaling in Candida albicans govern hydrogen peroxide-induced oxidation of the Cap1 transcription factor and macrophage escape. Antioxid Redox Signal 2013;19(18):224460. Available from: https://doi.org/10.1089/ars.2013.5199. [840] Chakraborty A, et al. The E3 ubiquitin ligase Trim7 mediates c-Jun/AP-1 activation by Ras signalling. Nat Commun 2015;6:6782. Available from: https://doi.org/10.1038/ncomms7782. [841] Bhogaraju S, et al. Phosphoribosylation of ubiquitin promotes serine ubiquitination and impairs conventional ubiquitination. Cell 2016;167(6):16361649.e13. Available from: https://doi.org/10.1016/j. cell.2016.11.019. [842] Chung WO, Dale BA. Innate immune response of oral and foreskin keratinocytes: utilization of different signaling pathways by various bacterial species. Infect Immun 2004;72(1):3528. Available from: https:// doi.org/10.1128/IAI.72.1.352-358.2004. [843] Buck I, et al. The inhibitory effect of the proinflammatory cytokine TNFalpha on erythroid differentiation involves erythroid transcription factor modulation. Int J Oncol 2009;34(3):85360. Available from: https:// doi.org/10.3892/ijo_00000212. [844] Jin S, et al. JlpA of Campylobacter jejuni interacts with surface-exposed heat shock protein 90alpha and triggers signalling pathways leading to the activation of NF-kappaB and p38 MAP kinase in epithelial cells. Cell Microbiol 2003;5(3):16574. Available from: https://doi.org/10.1046/j.14625822.2003.00265.x. [845] Liu Z, Chen S. ER regulates an evolutionarily conserved apoptosis pathway. Biochem Biophys Res Commun 2010;400(1):348. Available from: https://doi.org/10.1016/j.bbrc.2010.07.132. [846] Geng F, et al. Multiple post-translational modifications regulate E-cadherin transport during apoptosis. Virulence 2012;125(Pt 11):261525. Available from: https://doi.org/10.1242/jcs.096735. [847] Jariel-Encontre I, et al. Complex mechanisms for c-fos and c-jun degradation. Mol Biol Rep 1997;24 (12):516. 
Available from: https://doi.org/10.1023/A:100680472. [848] Kang J, et al. A nuclear function of beta-arrestin1 in GPCR signaling: regulation of histone acetylation and gene transcription. Cell 2005;123(5):83347. Available from: https://doi.org/10.1016/j.cell.2005.09.011. [849] Tsoni SV, et al. Complement C3 plays an essential role in the control of opportunistic fungal infections. Infect Immun 2009;77(9):367985. Available from: https://doi.org/10.1128/IAI.00233-09. [850] Yano S, et al. Transcriptional responses of human epidermal keratinocytes to cytokine interleukin-1. J Cell Physiol 2008;214(1):113. Available from: https://doi.org/10.1002/jcp.21300. [851] Pietrella D, et al. The inflammatory response induced by aspartic proteases of Candida albicans is independent of proteolytic activity. Infect Immun 2010;78(11):475462. Available from: https://doi.org/10.1128/ IAI.00789-10. [852] Chen Z, Jin T, Lu Y. AntimiR-30b inhibits TNF-alpha mediated apoptosis and attenuated cartilage degradation through enhancing autophagy. Cell Physiol Biochem 2016;40(5):88394. Available from: https://doi. org/10.1159/000453147. [853] Verhelst K, et al. Linear ubiquitination in NF-kappaB signaling and inflammation: what we do understand and what we do not. Biochem Pharmacol 2011;82(9):105765. Available from: https://doi.org/10.1016/j. bcp.2011.07.066. [854] Hirakawa MP, et al. Genetic and phenotypic intra-species variation in Candida albicans. Genome Res 2015;25 (3):41325. Available from: https://doi.org/10.1101/gr.174623.114. [855] Du H, Huang G. Environmental pH adaption and morphological transitions in Candida albicans. Curr Genet 2016;62(2):2836. Available from: https://doi.org/10.1007/s00294-015-0540-8. [856] Sandai D, et al. Resistance of Candida albicans biofilms to drugs and the host immune system. Jundishapur J Microbiol 2016;9(11):e37385. Available from: https://doi.org/10.5812/jjm.37385. [857] Xie Z, et al. 
Candida albicans biofilms do not trigger reactive oxygen species and evade neutrophil killing. J Infect Dis 2012;206(12):1936-45. Available from: https://doi.org/10.1093/infdis/jis607. [858] Pappas PG, et al. Clinical practice guideline for the management of candidiasis: 2016 update by the Infectious Diseases Society of America. Clin Infect Dis 2016;62(4):e1-e50. Available from: https://doi.org/10.1093/cid/civ933. [859] Villar CC, et al. Mucosal tissue invasion by Candida albicans is associated with E-cadherin degradation, mediated by transcription factor Rim101p and protease Sap5p. Infect Immun 2007;75(5):2126-35. Available from: https://doi.org/10.1128/IAI.00054-07.
[860] Xu D, et al. Genome-wide fitness test and mechanism-of-action studies of inhibitory compounds in Candida albicans. PLoS Pathog 2007;3(6):e92. Available from: https://doi.org/10.1371/journal.ppat.0030092. [861] Staib P, et al. Tetracycline-inducible expression of individual secreted aspartic proteases in Candida albicans allows isoenzyme-specific inhibitor screening. Antimicrob Agents Chemother 2008;52(1):14656. Available from: https://doi.org/10.1128/AAC.01072-07. [862] Braga-Silva LA, Santos AL. Aspartic protease inhibitors as potential anti-Candida albicans drugs: impacts on fungal biology, virulence and pathogenesis. Curr Med Chem 2011;18(16):240119. Available from: https:// doi.org/10.2174/092986711795843182. [863] Niewerth M, et al. Ciclopirox olamine treatment affects the expression pattern of Candida albicans genes encoding virulence factors, iron metabolism proteins, and drug resistance factors. Antimicrob Agents Chemother 2003;47(6):180517. Available from: https://doi.org/10.1128/AAC.47.6.1805-1817.2003. [864] Zhao LX, et al. Effect of tetrandrine against Candida albicans biofilms. PLoS One 2013;8(11):e79671. Available from: https://doi.org/10.1371/journal.pone.0079671. [865] Soysa NS, Samaranayake LP, Ellepola ANB. Antimicrobials as a contributory factor in oral candidosis—a brief overview. Oral Dis 2008;14(2):13843. Available from: https://doi.org/10.1111/j.1601-0825.2006.01357. x. [866] Kim J, Lee JE, Lee JS. Histone deacetylase-mediated morphological transition in Candida albicans. J Microbiol 2015;53(12):80511. Available from: https://doi.org/10.1007/s12275-015-5488-3. [867] Cherry JM, et al. SGD: Saccharomyces Genome Database. Nucleic Acids Res 1998;26(1):739. Available from: https://doi.org/10.1093/nar/26.1.73. [868] Szklarczyk D, et al. The STRING database in 2017: quality-controlled proteinprotein association networks, made broadly accessible. Nucleic Acids Res 2017;45(D1):D3628. Available from: https://doi.org/10.1093/ nar/gkw937. 
[869] Bird AJ, et al. The Zap1 transcriptional activator also acts as a repressor by binding downstream of the TATA box in ZRT2. EMBO J 2004;23(5):112332. Available from: https://doi.org/10.1038/sj.emboj.7600122. [870] Chen H-F, Lan C-Y. Role of SFP1 in the regulation of Candida albicans biofilm formation. PLoS One 2015;10 (6):e0129903. Available from: https://doi.org/10.1371/journal.pone.0129903. [871] Colman-Lerner A, Chin TE, Brent R. Yeast Cbk1 and Mob2 activate daughter-specific genetic programs to induce asymmetric cell fates. Cell 2014;107(6):73950. Available from: https://doi.org/10.1016/S0092-8674 (01)00596-7. [872] Murciano C, et al. Evaluation of the role of Candida albicans agglutinin-like sequence (Als) proteins in human oral epithelial cell interactions. PLoS One 2012;7(3):e33362. Available from: https://doi.org/ 10.1371/journal.pone.0033362. [873] Taylor DM, et al. Characterizing the role of Hsp90 in production of heat shock proteins in motor neurons reveals a suppressive effect of wild-type Hsf1. Cell Stress Chaperones 2007;12(2):15162. Available from: https://doi.org/10.1379/CSC-254R.1. [874] Argimon S, et al. Developmental regulation of an adhesin gene during cellular morphogenesis in the fungal pathogen Candida albicans. Eukaryot Cell 2007;6(4):68292. Available from: https://doi.org/10.1128/ EC.00340-06. [875] Banerjee M, et al. Expression of UME6, a key regulator of Candida albicans hyphal development, enhances biofilm formation via Hgc1- and Sun41-dependent mechanisms. Eukaryot Cell 2013;12(2):22432. Available from: https://doi.org/10.1128/EC.00163-12. [876] Bastidas RJ, Heitman J, Cardenas ME. The protein kinase Tor1 regulates adhesin gene expression in Candida albicans. PLoS Pathog 2009;5(2):e1000294. Available from: https://doi.org/10.1371/journal.ppat.1000294. [877] Braun BR, Kadosh D, Johnson AD. NRG1, a repressor of filamentous growth in C. albicans, is downregulated during filament induction. EMBO J 2001;20(17):475361. 
Available from: https://doi.org/10.1093/emboj/20.17.4753. [878] Chen BS, Wu CC. Systems biology. New York: Nova Science; 2014. [879] Chen BS, Li CW. Big mechanisms in systems biology. New York: Academic Press; 2017. [880] Gru A, et al. The Epstein-Barr virus (EBV) in T cell and NK cell lymphomas: time for a reassessment. Curr Hematol Malig Rep 2015;10(4):456-67. [881] Odumade OA, Hogquist KA, Balfour HH. Progress and problems in understanding and managing primary Epstein-Barr virus infections. Clin Microbiol Rev 2011;24(1):193-209.
[882] Murata T, Tsurumi T. Switching of EBV cycles between latent and lytic states. Rev Med Virol 2014;24(3):142-53. [883] Hong GK, et al. The BRRF1 early gene of Epstein-Barr virus encodes a transcription factor that enhances induction of lytic infection by BRLF1. J Virol 2004;78(10):4983-92. [884] Murata T. Regulation of Epstein-Barr virus reactivation from latency. Microbiol Immunol 2014;58(6):307-17. [885] Kenney SC, Mertz JE. Regulation of the latent-lytic switch in Epstein-Barr virus. Semin Cancer Biol 2014;26:60-8. [886] Hammerschmidt W, Sugden B. Genetic analysis of immortalizing functions of Epstein Barr virus in human B lymphocytes. Nature 1989;340(6232):393-7. [887] Nowag H, et al. Macroautophagy proteins assist Epstein Barr virus production and get incorporated into the virus particles. EBioMedicine 2014;1(2):116-25. [888] Tempera I, Lieberman PM. Epigenetic regulation of EBV persistence and oncogenesis. Semin Cancer Biol 2014;26:22-9. [889] O’Grady T, et al. Global bidirectional transcription of the Epstein-Barr virus genome during reactivation. J Virol 2014;88(3):1604-16. [890] Kalla M, Gobel C, Hammerschmidt W. The lytic phase of Epstein-Barr virus requires a viral genome with 5-methylcytosine residues in CpG sites. J Virol 2012;86(1):447-58. [891] Oughtred R, et al. BioGRID: a resource for studying biological interactions in yeast. Cold Spring Harb Protoc 2016;2016(1):pdb.top080754. [892] Kerrien S, et al. IntAct: open source resource for molecular interaction data. Nucleic Acids Res 2007;35(Suppl. 1):D561-5. [893] Chatr-Aryamontri A, et al. VirusMINT: a viral protein interaction database. Nucleic Acids Res 2009;37(Suppl. 1):D669-73. [894] Calderone A, Licata L, Cesareni G. VirusMentha: a new resource for virus-host protein interactions. Nucleic Acids Res 2014;gku830. [895] Mei S, Zhang K. Computational discovery of Epstein-Barr virus targeted human genes and signalling pathways. Sci Rep 2016;6:30612. [896] Orchard S, et al.
Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods 2012;9(4):345-50. [897] del-Toro N, et al. A new reference implementation of the PSICQUIC web service. Nucleic Acids Res 2013;41(W1):W601-6. [898] Qureshi A, et al. VIRmiRNA: a comprehensive resource for experimentally validated viral miRNAs and their targets. Database 2014;2014:bau103. [899] Li Y, et al. ViRBase: a resource for virus-host ncRNA-associated interactions. Nucleic Acids Res 2014;gku903. [900] Xiao F, et al. miRecords: an integrated resource for microRNA-target interactions. Nucleic Acids Res 2009;37(Suppl. 1):D105-10. [901] Li J-H, et al. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein-RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2013;gkt1248. [902] Hernando H, et al. The B cell transcription program mediates hypomethylation and overexpression of key genes in Epstein-Barr virus-associated proliferative conversion. Genome Biol 2013;14(1):1. [903] Chen BS, Li CW. Constructing an integrated genetic and epigenetic cellular network for whole cellular mechanism using high-throughput next-generation sequencing data. BMC Syst Biol 2016;10(1):118. [904] Smith ZD, Meissner A. DNA methylation: roles in mammalian development. Nat Rev Genet 2013;14(3):204-20. [905] Granato M, et al. EBV blocks the autophagic flux and appropriates the autophagic machinery to enhance viral replication. J Virol 2014;02199-14:JVI. [906] Calderwood MA, et al. Epstein-Barr virus and virus human protein interaction maps. Proc Natl Acad Sci USA 2007;104(18):7606-11. [907] Albanese M, et al. Epstein-Barr virus microRNAs reduce immune surveillance by virus-specific CD8+ T cells. Proc Natl Acad Sci USA 2016;113(42):E6467-75.
[908] Hatton OL, et al. The interplay between Epstein-Barr virus and B lymphocytes: implications for infection, immunity, and disease. Immunol Res 2014;58(2-3):268-76. [909] Söllner J, et al. Concept and application of a computational vaccinology workflow. Immunome Res 2010;6(2):1. [910] Nakamura Y, et al. The GAS5 (growth arrest-specific transcript 5) gene fuses to BCL6 as a result of t(1;3)(q25;q27) in a patient with B-cell lymphoma. Cancer Genet Cytogenet 2008;182(2):144-9. [911] Zhao L, et al. Long non-coding RNA SNHG5 suppresses gastric cancer progression by trapping MTA2 in the cytosol. Oncogene 2016;35. [912] Zhang Y, et al. Association between RNF41 gene c.-206 T>A genetic polymorphism and risk of congenital heart diseases in the Chinese Mongolian population. Genet Mol Res: GMR 2016;15(2). [913] Wauman J, et al. RNF41 (Nrdp1) controls type 1 cytokine receptor degradation and ectodomain shedding. J Cell Sci 2011;124(6):921-32. [914] Rowe M, Raithatha S, Shannon-Lowe C. Counteracting effects of cellular Notch and Epstein-Barr virus EBNA2: implications for stromal effects on virus-host interactions. J Virol 2014;88(20):12065-76. [915] Choy EY-W, et al. An Epstein-Barr virus-encoded microRNA targets PUMA to promote host cell survival. J Exp Med 2008;205(11):2551-60. [916] Meckes Jr D, Raab-Traub N. Mining Epstein-Barr virus LMP1 signaling networks. J Carcinog Mutagene 2011;11:379. [917] Zeng L, et al. Cellular FLICE-like inhibitory protein (c-FLIP) and PS1-associated protein (PSAP) mediate presenilin 1-induced γ-secretase-dependent and -independent apoptosis, respectively. J Biol Chem 2015;290(30):18269-80. [918] Li X, Bhaduri-McIntosh S. A central role for STAT3 in gammaherpesvirus-life cycle and -diseases. Front Microbiol 2016;7. [919] Jochum S, et al. The EBV immunoevasins vIL-10 and BNLF2a protect newly infected B cells from immune recognition and elimination. PLoS Pathog 2012;8(5):e1002704. [920] Kamperschroer C, et al.
The genomic sequence of lymphocryptovirus from cynomolgus macaque. Virology 2016;488:28-36. [921] Sides MD, et al. Arsenic mediated disruption of promyelocytic leukemia protein nuclear bodies induces ganciclovir susceptibility in Epstein-Barr positive epithelial cells. Virology 2011;416(1):86-97. [922] Sivachandran N, Cao JY, Frappier L. Epstein-Barr virus nuclear antigen 1 hijacks the host kinase CK2 to disrupt PML nuclear bodies. J Virol 2010;84(21):11113-23. [923] Lai KW, et al. MicroRNA-130b regulates the tumour suppressor RUNX3 in gastric cancer. Eur J Cancer 2010;46(8):1456-63. [924] Hsu D-H, et al. Expression of interleukin-10 activity by Epstein-Barr virus protein BCRF1. Science 1990;250(4982):830. [925] Han L, et al. Sequence variation of Epstein-Barr virus (EBV) BCRF1 in lymphomas in non-endemic areas of nasopharyngeal carcinoma. Arch Virol 2015;160(2):441-5. [926] Park GB, et al. Melphalan-induced apoptosis of EBV-transformed B cells through upregulation of TAp73 and XAF1 and nuclear import of XPA. J Immunol 2013;191(12):6281-91. [927] Koganti S, et al. Cellular STAT3 functions via PCBP2 to restrain Epstein-Barr virus lytic activation in B lymphocytes. J Virol 2015;89(9):5002-11. [928] Seo J, et al. Cell cycle arrest and lytic induction of EBV-transformed B lymphoblastoid cells by a histone deacetylase inhibitor, trichostatin A. Oncol Rep 2008;19(1):93. [929] Ahsan N, et al. Epstein-Barr virus transforming protein LMP1 plays a critical role in virus production. J Virol 2005;79(7):4415-24. [930] Jung EJ, et al. Lytic induction and apoptosis of Epstein-Barr virus-associated gastric cancer cell line with epigenetic modifiers and ganciclovir. Cancer Lett 2007;247(1):77-83. [931] Jones K, et al. Sodium valproate in combination with ganciclovir induces lysis of EBV-infected lymphoma cells without impairing EBV-specific T-cell immunity. Int J Lab Hematol 2010;32(1p1):e169-74. [932] Kenney SC. Reactivation and lytic replication of EBV. Cambridge University Press; 2007.
[933] Hutajulu SH, et al. Therapeutic implications of Epstein-Barr virus infection for the treatment of nasopharyngeal carcinoma. Ther Clin Risk Manage 2014;10(September):721-36.
[934] Noh KW, Park J, Kang MS. Targeted disruption of EBNA1 in EBV-infected cells attenuated cell growth. Bmb Rep 2016;49(4):226-31. [935] Zihlif MA, et al. Thymoquinone efficiently inhibits the survival of EBV-infected B cells and alters EBV gene expression. Integr Cancer Ther 2013;12(3):257-63. [936] Gorres KL, et al. Valpromide inhibits lytic cycle reactivation of Epstein-Barr virus. mBio 2016;7(2):e00113-16. [937] Rao SP, et al. Zebularine reactivates silenced E-cadherin but unlike 5-azacytidine does not induce switching from latent to lytic Epstein-Barr virus infection in Burkitt’s lymphoma Akata cells. Mol Cancer 2007;6(1):1. [938] Zhu Y, et al. γ-Herpesvirus-encoded miRNAs and their roles in viral biology and pathogenesis. Curr Opin Virol 2013;3(3):266-75. [939] Alberghini F, et al. An epigenetic view of B-cell disorders. Immunol Cell Biol 2015;93(3):253-60. [940] Dreyfus DH. Gene sharing between Epstein-Barr virus and human immune response genes. Immunol Res 2016;19. [941] Jansen CA, Piriou E, De Cuyper IM, et al. Long-term highly active antiretroviral therapy in chronic HIV-1 infection: evidence for reconstitution of antiviral immunity. Antivir Ther 2006;11(1):105-16. [942] Perelson AS. Modelling viral and immune system dynamics. Nat Rev Immunol 2002;2(1):28-36. [943] Duskova K, Nagilla P, Le HS, et al. MicroRNA regulation and its effects on cellular transcriptome in Human Immunodeficiency Virus-1 (HIV-1) infected individuals with distinct viral load and CD4 cell counts. BMC Infect Dis 2013;13. [944] Sun GH, Li HT, Wu XW, et al. Interplay between HIV-1 infection and host microRNAs. Nucleic Acids Res 2012;40(5):2181-96. [945] Whisnant AW, Bogerd HP, Flores O, et al. In-depth analysis of the interaction of HIV-1 with cellular microRNA biogenesis and effector mechanisms. Mbio 2013;4(2):e00193-13. [946] Mohammadi P, Desfarges S, Bartha I, et al. 24 Hours in the life of HIV-1 in a T cell line. PLoS Pathog 2013;9:1. [947] Lu J, Clark AG.
Impact of microRNA regulation on variation in human gene expression. Genome Res 2012;22(7):1243-54. [948] Keshet I, Yisraeli J, Cedar H. Effect of regional DNA methylation on gene expression. Proc Natl Acad Sci USA 1985;82(9):2560-4. [949] O’Doherty U, Swiggard WJ, Malim MH. Human immunodeficiency virus type 1 spinoculation enhances infection through virus binding. J Virol 2000;74(21):10074-80. [950] Hsu CW, Juan HF, Huang HC. Characterization of microRNA-regulated protein-protein interaction network. Proteomics 2008;8(10):1975-9. [951] Liang H, Li WH. MicroRNA regulation of human protein-protein interaction network. RNA 2007;13(9):1402-8. [952] Wang C, Jiang W, Li W, et al. Topological properties of the drug targets regulated by microRNA in human protein-protein interaction network. J Drug Target 2011;19(5):354-64. [953] Zhang Y, Guo X, Xiong L, et al. Comprehensive analysis of microRNA-regulated protein interaction network reveals the tumor suppressive role of microRNA-149 in human hepatocellular carcinoma via targeting AKT-mTOR pathway. Mol Cancer 2014;13:253. [954] Redova M, Svoboda M, Slaby O. MicroRNAs and their target gene networks in renal cell carcinoma. Biochem Biophys Res Commun 2011;405(2):153-6. [955] Kauder SE, Bosque A, Lindqvist A, Planelles V, Verdin E. Epigenetic regulation of HIV-1 latency by cytosine methylation. PLoS Pathog 2009;5(6):e1000495. [956] Valinluck V, Tsai HH, Rogstad DK, et al. Oxidative damage to methyl-CpG sequences inhibits the binding of the methyl-CpG binding domain (MBD) of methyl-CpG binding protein 2 (MeCP2). Nucleic Acids Res 2004;32(14):4100-8. [957] Nakayama-Hosoya K, Ishida T, Youngblood B, et al. Epigenetic repression of interleukin 2 expression in senescent CD4+ T cells during chronic HIV type 1 infection. J Infect Dis 2015;211(1):28-39. [958] Gibney ER, Nolan CM. Epigenetics and gene expression. Heredity (Edinb) 2010;105(1):4-13. [959] Warren K, Warrilow D, Meredith L, Harrich D.
Reverse transcriptase and cellular factors: regulators of HIV-1 reverse transcription. Viruses 2009;1(3):87394.
640
References
[960] Bourbigot S, Beltz H, Denis J, et al. The C-terminal domain of the HIV-1 regulatory protein Vpr adopts an antiparallel dimeric structure in solution via its leucine-zipper-like domain. Biochem J 2005;387(Pt 2):33341. [961] Zhao RY, Li G, Bukrinsky MI. Vpr-host interactions during HIV-1 viral life cycle. J Neuroimmune Pharmacol 2011;6(2):21629. [962] Sherer NM, Lehmann MJ, Jimenez-Soto LF, et al. Visualization of retroviral replication in living cells reveals budding into multivesicular bodies. Traffic 2003;4(11):785801. [963] Wang Q, Mora-Jensen H, Weniger MA, et al. ERAD inhibitors integrate ER stress with an epigenetic mechanism to activate BH3-only protein NOXA in cancer cells. Proc Natl Acad Sci USA 2009;106(7):22005. [964] Clarke HJ, Chambers JE, Liniker E, Marciniak SJ. Endoplasmic reticulum stress in malignancy. Cancer Cell 2014;25(5):56373. [965] Puoti M, Bruno R, Soriano V, et al. Hepatocellular carcinoma in HIV-infected patients: epidemiological features, clinical presentation and outcome. AIDS 2004;18(17):228593. [966] Perez CL, Milush JM, Buggert M, et al. Targeting of conserved gag-epitopes in early HIV infection is associated with lower plasma viral load and slower CD4(1) T cell depletion. Aids Res Hum Retroviruses 2013;29 (3):60212. [967] Engeland CE, Brown NP, Borner K, et al. Proteome analysis of the HIV-1 Gag interactome. Virology 2014;460461:194206. [968] Lewis B, Whitney S, Hudacik L, et al. Nedd4-mediated increase in HIV-1 Gag and Env proteins and immunity following DNA-vaccination of BALB/c mice. PLoS One 2014;9(3):e91267. [969] Liu J, Wan LX, Liu PD, et al. SCF beta-TRCP-mediated degradation of NEDD4 inhibits tumorigenesis through modulating the PTEN/Akt signaling pathway. Oncotarget 2014;5(4):102636. [970] Cao Z, Kyprianou N. Mechanisms navigating the TGF-beta pathway in prostate cancer. Asian J Urol 2015;2 (1):1118. [971] Yang Z, Zhuan B, Yan Y, Jiang S, Wang T. 
Identification of gene markers in the development of smokinginduced lung cancer. Gene 2016;576(1 Pt 3):4517. [972] Weiss ER, Popova E, Yamanaka H, et al. Rescue of HIV-1 release by targeting widely divergent NEDD4type ubiquitin ligases and isolated catalytic HECT domains to Gag. PLoS Pathog 2010;6(9):e1001107. [973] Houzet L, Klase Z, Yeung ML, et al. The extent of sequence complementarity correlates with the potency of cellular miRNA-mediated restriction of HIV-1. Nucleic Acids Res 2012;40(22):1168496. [974] He GC, Margolis DM. Counterregulation of chromatin deacetylation and histone deacetylase occupancy at the integrated promoter of human immunodeficiency virus type 1 (HIV-1) by the HIV-1 repressor YY1 and HIV-1 activator Tat. Mol Cell Biol 2002;22(9):296573. [975] Spadoni JL, Rucart P, Le Clerc S, et al. Identification of genes whose expression profile is associated with non-progression towards AIDS using eQTLs. PLoS One 2015;10(9):e0136989. [976] Peterlin BM, Trono D. Hide, shield and strike back: how HIV-infected cells avoid immune eradication. Nat Rev Immunol 2003;3(2):97107. [977] Xie GQ, Yu ZS, Jia DY, Jiao RJ, Deng WM. E(y)1/TAF9 mediates the transcriptional output of Notch signaling in Drosophila. J Cell Sci 2014;127(17):38309. [978] Mbita Z, Hull R, Dlamini Z. Human immunodeficiency virus-1 (HIV-1)-mediated apoptosis: new therapeutic targets. Viruses (Basel) 2014;6(8):3181227. [979] Hayes AM, Qian SM, Yu LB, Boris-Lawrie K. Tat RNA silencing suppressor activity contributes to perturbation of lymphocyte miRNA by HIV-1. Retrovirology 2011;8:36. [980] Pedersen IM, Cheng G, Wieland S, et al. Interferon modulation of cellular microRNAs as an antiviral mechanism. Nature 2007;449(7164) 919-U13. [981] Santhakumar D, Forster T, Laqtom NN, et al. Combined agonist-antagonist genome-wide functional screening identifies broadly active antiviral microRNAs. Proc Natl Acad Sci USA 2010;107(31):138305. [982] Sanghvi VR, Steel LF. 
RNA silencing as a cellular defense against HIV-1 infection: progress and issues. FASEB J 2012;26(10):393745. [983] Denkert C, Koch I, von Keyserlingk N, et al. Expression of the ELAV-like protein HuR in human colon cancer: association with tumor stage and cyclooxygenase-2. Mod Pathol 2006;19(9):12619. [984] Kim HC, Choi KC, Choi HK, et al. HDAC3 selectively represses CREB3-mediated transcription and migration of metastatic breast cancer cells. Cell Mol Life Sci 2010;67(20):3499510.
References
641
[985] Li XR, Chu HJ, Lv T, et al. miR-342-3p suppresses proliferation, migration and invasion by targeting FOXM1 in human cervical cancer. FEBS Lett 2014;588(17):3298307. [986] Tai MC, Kajino T, Nakatochi M, et al. miR-342-3p regulates MYC transcriptional activity via direct repression of E2F1 in human lung cancer. Carcinogenesis 2015;36(12):146473. [987] Vo DT, Abdelmohsen K, Martindale JL, et al. The oncogenic RNA-binding protein musashi1 is regulated by HuR via mRNA translation and stability in glioblastoma cells. Mol Cancer Res 2012;10(1):14355. [988] Xhemalce B. From histones to RNA: role of methylation in cancer. Brief Funct Genomics 2013;12 (3):24453. [989] Xu LM, Li LQ, Li J, et al. Overexpression of miR-1260b in non-small cell lung cancer is associated with lymph node metastasis. Aging Dis 2015;6(6):47885. [990] Yoon SY, Kim JM, Oh JH, et al. Gene expression profiling of human HBV- and/or HCV-associated hepatocellular carcinoma cells using expressed sequence tags. Int J Oncol 2006;29(2):31527. [991] Arion D, Lewis DA. Altered expression of regulators of the cortical chloride transporters NKCC1 and KCC2 in schizophrenia. Arch Gen Psychiatry 2011;68(1):2131. [992] Sola C, Garcia-Ladona FJ, Mengod G, et al. Increased levels of the Kunitz protease inhibitor-containing beta APP mRNAs in rat brain following neurotoxic damage. Brain Res Mol Brain Res 1993;17(12):4152. [993] Rodriguez E, Plaud M, Romeu R, Skolasky R, Melendez L. Late H.I.V. infection modulates the expression and activity of Cathepsin B, and its inhibitors in macrophages: implications in neuropatho-genesis. Retrovirology 2010;7:1819. [994] Chen Z, Manley JL. Robust mRNA transcription in chicken DT40 cells depleted of TAF(II)31 suggests both functional degeneracy and evolutionary divergence. Mol Cell Biol 2000;20(14):506476. [995] Frontini M, Soutoglou E, Argentini M, et al. TAF9b (formerly TAF9L) is a bona fide TAF that has unique and overlapping roles with TAF9. Mol Cell Biol 2005;25(11):463849. 
[996] Ribet D, Cossart P. Pathogen-mediated posttranslational modifications: a re-emerging field. Cell 2010;143 (5):694702. [997] Dubrow R, Silverberg MJ, Park LS, Crothers K, Justice AC. HIV infection, aging, and immune function: implications for cancer risk and prevention. Curr Opin Oncol 2012;24(5):50616. [998] Engels EA, Pfeiffer RM, Goedert JJ, et al. Trends in cancer risk among people with AIDS in the United States 19802002. Aids 2006;20(12):164554. [999] Patel P, Hanson DL, Sullivan PS, et al. Incidence of types of cancer among HIV-infected persons compared with the general population in the United States, 19922003. Ann Intern Med 2008;148(10):72836. [1000] Formenti SC, Chak L, Gill P, Buess EM, Hill CK. Increased radio-sensitivity of normal tissue fibroblasts in patients with acquired-immunodeficiency-syndrome (AIDS) and with Kaposis-Sarcoma. Int J Radiat Biol 1995;68(4):41112. [1001] Ghanam RH, Samal AB, Fernandez TF, Saad JS. Role of the HIV-1 matrix protein in Gag intracellular trafficking and targeting to the plasma membrane for virus assembly. Front Microbiol 2012;3:55. [1002] Warrilow D, Tachedjian G, Harrich D. Maturation of the HIV reverse transcription complex: putting the jigsaw together. Rev Med Virol 2009;19(6):32437. [1003] Giroud C, Chazal N, Gay B, et al. HIV-1-associated PKA acts as a cofactor for genome reverse transcription. Retrovirology 2013;10. [1004] Leng J, Ho HP, Buzon MJ, et al. A cell-intrinsic inhibitor of HIV-1 reverse transcription in CD4(1) T cells from elite controllers. Cell Host Microbe 2014;15(6):71728. [1005] Liang C, Hu J, Russell RS, Kameoka M, Wainberg MA. Spliced human immunodeficiency virus type 1 RNA is reverse transcribed into cDNA within infected cells. Aids Res Hum Retroviruses 2004;20 (2):20311. [1006] Pak V, Eifler TT, Jager S, et al. CDK11 in TREX/THOC regulates HIV mRNA 30 end processing. Cell Host Microbe 2015;18(5):56070. [1007] Berro R, Pedati C, Kehn-Hall K, et al. 
CDK13, a new potential human immunodeficiency virus type 1 inhibitory factor regulating viral mRNA splicing. J Virol 2008;82(14):715566. [1008] Aandahl EM, Aukrust P, Skalhegg BS, et al. Protein kinase A type I antagonist restores immune responses of T cells from HIV-infected patients. FASEB J 1998;12(10):85562. [1009] Ellis J, Hotta A, Rastegar M. Retrovirus silencing by an epigenetic TRIM. Cell 2007;131(1):1314.
642
References
[1010] Vang T, Liu WH, Delacroix L, et al. LYP inhibits T-cell activation when dissociated from CSK. Nat Chem Biol 2012;8(5):43746. [1011] Pareek TK, Lam E, Zheng XJ, et al. Cyclin-dependent kinase 5 activity is required for T cell activation and induction of experimental autoimmune encephalomyelitis. J Exp Med 2010;207(11):250719. [1012] Iordache L, Launay O, Bouchaud O, et al. Autoimmune diseases in HIV-infected patients: 52 cases and literature review. Autoimmun Rev 2014;13(8):8507. [1013] Gupta P, Liu B, Wu JQ, et al. Genome-wide mRNA and miRNA analysis of peripheral blood mononuclear cells (PBMC) reveals different miRNAs regulating HIV/HCV co-infection. Virology 2014;450451:33649. [1014] Wu J, Shen L, Chen J, Xu H, Mao L. The role of microRNAs in enteroviral infections. Braz J Infect Dis 2015;19(5):51016. [1015] Xu X, Bieda M, Jin VX, et al. A comprehensive ChIP-chip analysis of E2F1, E2F4, and E2F6 in normal and tumor cells reveals inter-changeable roles of E2F family members. Genome Res 2007;17(11):155061. [1016] Choi S, Kim HR, Leng L, et al. Role of macrophage migration inhibitory factor in the regulatory T cell response of tumor-bearing mice. J Immunol 2012;189(8):390513. [1017] Ammar A, Sahraoui Y, Tsapis A, et al. Human immunodeficiency virus-infected adherent cell-derived inhibitory factor (P-29) inhibits normal T-cell proliferation through decreased expression of high-affinity interleukin-2 receptors and production of interleukin-2. J Clin Invest 1992;90(1):814. [1018] Rowbotham SP, Barki L, Neves-Costa A, et al. Maintenance of silent chromatin through replication requires SWI/SNF-like chromatin remodeler SMARCAD1. Mol Cell 2011;42(3):28596. [1019] Fatima M, Prajapati B, Saleem K, et al. Novel insights into role of miR-320a-VDAC1 axis in astrocytemediated neuronal damage in NeuroAIDS. Glia 2017;65(2):25063. [1020] Muthumani K, Shedlock DJ, Choo DK, et al. 
HIV-mediated phosphatidylinositol 3-kinase/serine-threonine kinase activation in APCs leads to programmed death-1 ligand upregulation and suppression of HIVspecific CD8 T cells. J Immunol 2011;187(6):293243. [1021] Myers AP. New strategies in endometrial cancer: targeting the PI3K/mTOR pathway—the devil is in the details. Clin Cancer Res 2013;19(19):526474. [1022] Fowler L, Conceicao V, Perera S, et al. First evidence for the disease-stage, cell-type, and virus specificity of microRNAs during human immunodeficiency virus type-1 infection. Med Sci 2016;4(2):E10. [1023] Christeff N, Gharakhanian S, Thobie N, Rozenbaum W, Nunez EA. Evidence for changes in adrenal and testicular steroids during HIV infection. J Acquir Immune Defic Syndr 1992;5(8):8416. [1024] Christeff N, Gherbi N, Mammes O, et al. Serum cortisol and DHEA concentrations during HIV infection. Psychoneuroendocrinology 1997;22(Suppl. 1):S1118. [1025] Christeff N, Melchior JC, de Truchis P, et al. Lipodystrophy defined by a clinical score in HIV-infected men on highly active antiretroviral therapy: correlation between dyslipidaemia and steroid hormone alterations. AIDS 1999;13(16):225160. [1026] Christeff N, Melchior JC, Mammes O, et al. Correlation between increased cortisol: DHEA ratio and malnutrition in HIV-positive men. Nutrition 1999;15(78):5349. [1027] Towers G, Harris J, Lang G, Collins MKL, Latchman DS. Retinoic acid inhibits both the basal activity and phorbol ester-mediated activation of the HIV long terminal repeat promoter. AIDS 1995;9(2):12936. [1028] Espeseth AS, Fishel R, Hazuda D, et al. siRNA screening of a targeted library of DNA repair factors in HIV infection reveals a role for base excision repair in HIV integration. PLoS One 2011;6(3):e17612. [1029] Bushman FD, Malani N, Fernandes J, et al. Host cell factors in HIV replication: meta-analysis of genomewide studies. PLoS Pathog 2009;5(5):e1000437. [1030] Shearer RF, Iconomou M, Watts CK, Saunders DN. 
Functional roles of the E3 ubiquitin ligase UBR5 in cancer. Mol Cancer Res 2015;13(12):152332. [1031] Drahovsky D, Lacko I, Wacker A. Enzymatic DNA methylation during repair synthesis in nonproliferating human peripheral lymphocytes. Biochim Biophys Acta 1976;447(2):13943. [1032] Esteller M, Gaidano G, Goodman SN, et al. Hypermethylation of the DNA repair gene O(6)-methylguanine DNA methyltransferase and survival of patients with diffuse large B-cell lymphoma. J Natl Cancer Inst 2002;94(1):2632. [1033] Baxter J, Sauer S, Peters A, et al. Histone hypomethylation is an indicator of epigenetic plasticity in quiescent lymphocytes. EMBO J 2004;23(22):446272.
References
643
[1034] Zhang Y, Zhao M, Sawalha AH, Richardson B, Lu Q. Impaired DNA methylation and its mechanisms in CD4(1)T cells of systemic lupus erythematosus. J Autoimmun 2013;41:929. [1035] Emerson V, Holtkotte D, Pfeiffer T, et al. Identification of the cellular prohibitin 1/prohibitin 2 heterodimer as an interaction partner of the C-terminal cytoplasmic domain of the HIV-1 glycoprotein. J Virol 2010;84 (3):135565. [1036] Kumar A, Zloza A, Moon RT, et al. Active beta-catenin signaling is an inhibitory pathway for human immunodeficiency virus replication in peripheral blood mononuclear cells. J Virol 2008;82(6):281320. [1037] Rajalingam K, Wunder C, Brinkmann V, et al. Prohibitin is required for Ras-induced Raf-MEK-ERK activation and epithelial cell migration. Nat Cell Biol 2005;7(8):83743. [1038] Benetti L, Calistri A, Ulivieri C, et al. Inhibition of ShcA isoforms p46/p52Shc enhances HIV-1 replication in CD4 1 T-lymphocytes. J Cell Physiol 2004;199(1):406. [1039] Catrina SB, Lewitt M, Massambu C, et al. Insulin-like growth factor-I receptor activity is essential for Kaposi’s sarcoma growth and survival. Br J Cancer 2005;92(8):146774. [1040] Woldt E, Matz RL, Terrand J, et al. Differential signaling by adaptor molecules LRP1 and ShcA regulates adipogenesis by the insulin-like growth factor-1 receptor. J Biol Chem 2011;286(19):1677582. [1041] Crum NF, Spencer CR, Amling CL. Prostate carcinoma among men with human immunodeficiency virus infection. Cancer 2004;101(2):2949. [1042] Schlaberg R, Fisher JG, Flamm MJ, et al. Chronic myeloid leukemia and HIV-infection. Leuk Lymphoma 2008;49(6):115560. [1043] Miller AM, Lundberg K, Ozenci V, et al. CD4(1)CD25(high) T cells are enriched in the tumor and peripheral blood of prostate cancer patients. J Immunol 2006;177(10):7398405. [1044] Hoxie JA, Alpers JD, Rackowski JL, et al. Alterations in T4 (CD4) protein and mRNA synthesis in cells infected with HIV. Science 1986;234(4780):11237. 
[1045] Birge RB, Kalodimos C, Inagaki F, Tanaka S. Crk and CrkL adaptor proteins: networks for physiological and pathological signaling. Cell Commun Signal 2009;7:13. [1046] Minegishi M, Tachibana K, Sato T, et al. Structure and function of Cas-L, a 105-kD Crk-associated substrate-related protein that is involved in beta 1 integrin-mediated signaling in lymphocytes. J Exp Med 1996;184(4):136575. [1047] Guo M, Shapiro R, Morris GM, Yang XL, Schimmel P. Packaging HIV virion components through dynamic equilibria of a human tRNA synthetase. J Phys Chem B 2010;114(49):162739. [1048] Mercenne G, Bernacchi S, Richer D, et al. HIV-1 Vif binds to APOBEC3G mRNA and inhibits its translation. Nucleic Acids Res 2010;38(2):63346. [1049] Park SG, Schimmel P, Kim S. Aminoacyl tRNA synthetases and their connections to disease. Proc Natl Acad Sci USA 2008;105(32):110439. [1050] Folgueira L, Algeciras A, MacMorran WS, Bren GD, Paya CV. The Ras-Raf pathway is activated in human immunodeficiency virus-infected monocytes and participates in the activation of NF-kappa B. J Virol 1996;70(4):23328. [1051] Lyman MG, Randall JA, Calton CM, Banfield BW. Localization of ERK/MAP kinase is regulated by the alphaherpesvirus tegument protein Us2. J Virol 2006;80(14):715968. [1052] Geng Y, Whoriskey W, Park MY, et al. Rescue of cyclin D1 deficiency by knockin cyclin E. Cell 1999;97 (6):76777. [1053] Lukas J, Herzinger T, Hansen K, et al. Cyclin E-induced S phase without activation of the pRb/E2F pathway. Genes Dev 1997;11(11):147992. [1054] Knudsen ES, Wang JY. Targeting the RB-pathway in cancer therapy. Clin Cancer Res 2010;16(4):10949. [1055] Liu L, Ruan J. Network-based pathway enrichment analysis. In: Proceedings IEEE International Conference on Bioinformatics and Biomedical. 2013. p. 21821. [1056] Agarwal N, Iyer D, Oplt T, et al. Mechanism of HIV-associated hepatic steatosis: role of HIV-1 accessory protein Vpr, PPARalpha and LXRalpha. Endocr Rev 2014;35:3. 
[1057] Sterling RK, Smith PG, Brunt EM. Hepatic steatosis in human immunodeficiency virus a prospective study in patients without viral hepatitis, diabetes, or alcohol abuse. J Clin Gastroenterol 2013;47(2):1827. [1058] Garron ML, Arthos J, Guichou JF, et al. Structural basis for the interaction between focal adhesion kinase and CD4. J Mol Biol 2008;375(5):13208.
644
References
[1059] Gekonge B, Raymond AD, Yin X, et al. Retinoblastoma protein induction by HIV viremia or CCR5 in monocytes exposed to HIV-1 mediates protection from activation-induced apoptosis: ex vivo and in vitro study. J Leukoc Biol 2012;92(2):397405. [1060] Bosque A, Planelles V. Induction of HIV-1 latency and reactivation in primary memory CD4 1 T cells. Blood 2009;113(1):5865. [1061] Afonso PV, Zamborlini A, Saib A, Mahieux R. Centrosome and retroviruses: the dangerous liaisons. Retrovirology 2007;4:27. [1062] Herasimtschuk AA, Hansen BR, Langkilde A, et al. Low-dose growth hormone for 40 weeks induces HIV1-specific T cell responses in patients on effective combination anti-retroviral therapy. Clin Exp Immunol 2013;173(3):44453. [1063] Quinn J, Astemborski J, Mehta SH, et al. HIV/hcv co-infection, liver disease progression, and age-related IGF-1 decline. Pathog Immun 2017;2(1):509. [1064] de la Vega M, Marin M, Kondo N, et al. Inhibition of HIV-1 endocytosis allows lipid mixing at the plasma membrane, but not complete fusion. Retrovirology 2011;8. [1065] Swingler S, Zhou J, Swingler C, et al. Evidence for a pathogenic determinant in HIV-1 Nef involved in B cell dysfunction in HIV/AIDS. Cell Host Microbe 2008;4(1):6376. [1066] Muller F, Froland SS, Aukrust P, Fagerhol MK. Elevated serum calprotectin levels in HIV-infected patients the calprotectin response during ZDV treatment is associated with clinical events. J Acquir Immune Defic Syndromes Hum Retrovirol 1994;7(9):9319. [1067] Hung CH, Thomas L, Ruby CE, et al. HIV-1 Nef assembles a Src family kinase-ZAP-70/Syk-PI3K cascade to downregulate cell-surface MHC-I. Cell Host Microbe 2007;1(2):12133. [1068] Fu W, Sanders-Beer BE, Katz KS, et al. Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res 2009;37:D41722. [1069] Hsu PWC, Lin LZ, Hsu SD, Hsu JBK, Huang HD. ViTa: prediction of host microRNAs targets on viruses. Nucleic Acids Res 2007;35:D3815. 
[1070] Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res 2014;42:D6873 (Database issue). [1071] Matsunaga A, Hishima T, Tanaka N, et al. DNA methylation profiling can classify HIV-associated lymphomas. AIDS 2014;28(4):50310. [1072] Brass AL, Dykxhoorn DM, Benita Y, et al. Identification of host proteins required for HIV infection through a functional genomic screen. Science 2008;319(5865):9216. [1073] Konig R, Zhou YY, Elleder D, et al. Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication. Cell 2008;135(1):4960. [1074] Zhou HL, Xu M, Huang Q, et al. Genome-scale RNAi screen for host factors required for HIV replication. Cell Host Microbe 2008;4(5):495504. [1075] Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P. Drug target identification using side-effect similarity. Science 2008;321(5886):2636. [1076] Yang L, Agarwal P. Systematic drug repositioning based on clinical side-effects. PLoS One 2011;6(12): e28025 1-9. [1077] Duran-Frigola M, Aloy P. Recycling side-effects into clinical markers for drug repositioning. Genome Med 2012;4(3):14. [1078] Hughes JP, Rees S, Kalindjian SB, Philpott KL. Principles of early drug discovery. Br J Pharmacol 2011;162:123949. [1079] Sliwoski G, Kothiwale S, Meiler J, Lowe Jr. EW. Computational methods in drug discovery. Pharm Rev 2014;66:33495. [1080] Hutchinson L, Kirk R. High drug attrition rates—where are we going wrong? Nature reviews. Clin Oncol 2011;8:18990. [1081] Csermely P, Korcsmacrps T, Kiss HJ, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013;138:133408. [1082] Lindsay MA. Target discovery. Nat Rev Drug Discov 2003;2:8318. [1083] Wang X, Thijssen B, Yu H. Target essentiality and certainty characterize drug side effects. PLoS Comput Biol 2013;9:e1003119.
References
645
[1084] Arkin MR, Tang Y, Wells JA. Small molecule inhibitors of protein-protein interactions: progressing toward the reality. Chem Biol 2014;21:110214. [1085] Scott DE, Bayly AR, Abell C, Skidmore J. Small molecules, big targets: drug discovery faces the proteinprotein interaction challenge. Nat Rev Drug Discov 2016;15. [1086] Wong YH, Chiun CC, Lin CL, et al. A new era for cancer target therapies: applying systems biology and computer-aided drug design to cancer therapies. Curr Pharm Biotechnol 2016;17(14):124667. [1087] Wong YH, Wu CC, Lai HY, et al. Identification of network-based biomarkers of cardioembolic stroke using a systems biology approach with time series data. BMC Syst Biol 2015;9(S4):121. [1088] Chu LH, Chen BS. Construction of cancer-perturbed protein-protein interaction network for discovery of apoptosis drug targets. BMC Syst Biol 2008;2(56):120. [1089] Li CW, Su MH, Chen BS. Investigation of the cross-talk mechanism in Caco-2 cells during Clostridium difficile infection through genetic-and-epigenetic interspecies networks: big data mining and genome-wide identification. Front Immunol 2017;8:901, 1-30. [1090] Chen BS, Wu CC. HN Robust design and its applications to control, signal processing, communication, systems and synthetic biology. New York: Nova Science Pub Inc.; 2016. [1091] Csermely P, Agoston V, Ponger S. The efficiency of multi-target drug: the network approach might help drug design. TREND Pharmacol Sci 2005;26(4):17882. [1092] Chiang JH, Cheng WS, Hood L, Tian Q. An epigenetic biomarker panel for glioblastoma multiform personalized medicine through DNA methylation analysis of human embryonic stem cell-like signature. J Integr Biol 2014;18(5):31023. [1093] Hood L, Flores M. A personal view on systems medicine and the emergence of proactive P4 medicine: predictive, preventive, personalized and participatory. N Biochnol 2012;29(6):61324. [1094] Wong YH, Lin CL, Chen TS, et al. 
Multiple target drug cocktail design for attacking the core network markers of your cancer using ligand-based and structure-based virtual screening methods. BMC Med Genomics 2015;8(Suppl. 4):S4 112, 123. [1095] Wong YH, Chen RH, Chen BS. Core and specific network markers of carcinogenesis from multiple cancer samples. J Theor Biol 2014;362(1734):2014. [1096] Dickson M, Gagnon JP. Key factor in the rising cost of new drug discovery and development. Nat Rev Drug Discov 2004;3:41729. [1097] Gonem M. Predicting drug-target interaction from chemical and genomic kernels using Bayesian matrix factorization. Bioinformatics 2012;28:230410. [1098] Ozturk H, Ozkirimli, Ozgur A. A comparative study of SMILES-based compound similarity function for drug-target interaction prediction. BMC Bioinformatics 2016;17:111. [1099] Wong YH, Wu CC, Lai HY, et al. Identification of network-based biomarkers of Cardioembolic stroke using a systems biology approach with time series data. BMC Syst Biol 2015;9(S6):54 121. [1100] Li CW, Chen BS. Investigating HIV-human interaction networks to unravel pathogenic mechanisms for drug discovery: a systems biology approach. Curr HIV Res 2018;16:7795. [1101] Li CW, Chen BS. Investigating the genome-wide genetic and epigenetic interspecies networks for crosstalk mechanisms and multi-molecule drug design in human macrophages and dentritic cells both infected with Mycobacterium tuberculosis. Front Cell Infect Microbiol 2016;6(124):125. [1102] Wong YH, Li CW, Chen BS. Evolution of network biomarkers from early to late stage bladder samples. Biomed Res Int 2014;2014:125 159078.
Index
Note: Page numbers followed by “f” and “t” refer to figures and tables, respectively.
A
A20 protein, 106–107
Acc1 protein, 214–215
Acetyl-CoA, 214–215
Acquired Immune Deficiency Syndrome (AIDS), 559
Active-set algorithm, 275
Acute stroke treatments, 263–264
Acvr1b, 326, 330–331
Adaptive immunity, 287, 297–299
Adhesion, 3–5, 117–118
Adhesive stage network, 165–166
Adr1 protein, 146
Agglutinin-like sequence (Als), 122–123
Agilent in situ oligonucleotide microarrays, 189–191, 283–284
Agp2 protein, 212–215
Akaike information criterion (AIC), 15–16, 25, 71, 92–93, 120–121, 140, 163, 194–195, 207, 248–249, 275–276, 286, 303–304, 322–323, 341–342, 391
  of human-gene, 512
  of human-lncRNA, 512–513
  stepwise procedure, 278t
  system order detection, 15
  value, 585
α-adrenergic receptor (α-AR), 305–306
Als3 protein, 124, 212–213
Amyloid precursor protein (APP), 362, 565–567
Analysis of variance (ANOVA), 161–162, 191, 205–207, 492–494, 579
Androgen receptor (AR), 413
Angiogenesis, 294
Angiotensin II, 266–267
Antagomirs, 565–567
Apoptosis, 98, 158, 310–311
  mechanism, 218
  process, 417–418
  regulation in primary and secondary infection, 330–332
Arrestin Beta 2 (ARRB2), 404–407, 413, 441–442, 453, 544–545
Aspergillus fumigatus, 196–198
ATG5 protein, 534
ATP-binding cassette transporters (ABC transporters), 544
  ABCA12, 544
Autophagy, 534
AVEN gene, 441–442
Axon guidance pathway, 230
B
B cell receptor stimulation (BCR stimulation), 490
B cells, 297–299, 490. See also Caco-2 cells
  activation, 266–267
Bacille Calmette–Guérin vaccination (BCG vaccination), 370
Bacterial endotoxin, 25
BBRF3 protein, 539
Bcl2 protein, 330
BCRF1 protein, 545–546
Beta-2 microglobulin (B2m), 217
β-glucan, 217
Betweenness, 323–325
BFRF3 protein, 549
BGLF4 protein, 551
Big data mining for candidate protein–protein interaction network, 235–236
Big database, 3–5, 9–10
Biofilm formation, 3–5, 135–136
  gene regulatory network, 152f
Biological General Repository for Interaction Datasets (BioGRID), 88–89, 91–92, 161, 189–191, 320
Biological systems, 87–88, 221–222
Biomedical applications, 297
BKRF2 protein, 538
BLLF1 protein, 539–540
BNIP3L protein, 356–357
BNLF2B protein, 550
Btg2 protein, 323–325
Burkitt’s lymphoma (BL), 489–490
BVRF protein, 545–546
C
c-jun N-terminal kinase (JNK), 87–88
C22orf28 protein, 415–416
C4BPA protein, 270–271
Caco-2 cells, 375–376
  adopting autophagy, DNA damage response, and activation of PAK1 and GRB2, 412–413
  cellular activities, 415–418
  GEINs and HPCNs in Caco-2 cells during early and late stages of CDI, 378
  pathogenic factors utilized by C. difficile and resulting pathogenesis in, 408–412
Caenorhabditis elegans, 188, 219
CAL0005908 protein, 291
Calcitonin receptor-like receptor (Crlr), 220
Can1 protein, 212–213
Candida albicans, 3–7, 9, 117–118, 135, 157–158, 187, 203–204, 297–299, 317–318, 427–428
  biofilm-related transcription factors, 144–147
    potential biofilm-related transcription factors in formation and development of biofilm, 144–146
    screening of potential Candida albicans biofilm-related transcription factors, 144
    statistical measures of screening test, 146–147
  biofilm-specific property, 135–136
  C. albicans–host interaction, 117–118
  Candida albicans–zebrafish infection model, 283
  candidate interspecies genetic and epigenetic interspecies networks via big data mining, 463–467
  core interspecies pathways analysis, 439–444
  data preprocessing of microarray data for human and pathogen, 432–433
  defense mechanism of OKF6/TERT-2 cell and offense mechanism of strains, 444–448
  dynamic intracellular hyphal PPI network, 178f
  dynamic models of candidate interspecies genetic and epigenetic interspecies networks, 468–471
  extracting core network structures, 481–486
  GEINs and HPCNs, 430–432
  genome for strain SC5314, 136
  identification of gene regulatory parameters, 149–150
    determination of significant gene regulations, 150
    statistical measures of screening test, 150–151
  infection-associated genes via cellular molecular network approach, 122–126
  methods of constructing cellular molecular networks in, 118–122
    constructing cellular molecular network, 120–121
    method overview and data selection, 118–120
    predicting infection-associated genes at different infection stages, 121–122
  OKF6/TERT-2 cell, 448–453
  parameter estimation of dynamic models, 471–478
  pathogenic mechanism of C. albicans infection, 433–444
  pathways related to hyphal growth, 179f
  prediction of drug target proteins and multiple-molecules drug design, 458–462
  purification, 319
  released pathogenic factor and accumulated cellular response, 453–458
  SC5314, 433–436, 444, 452–453
  strain and growth conditions, 318
  systems methods of screening biofilm-related transcription factors, 136–144
  trimming false positives, 479–481
  WO-1, 429, 433–436, 444, 452–453
Candida Genome Database (CGD), 119–120, 189–191, 205, 308–309
Candidate gene regulatory network via dynamic gene regulatory model, 36–38
Candidate genetic-and-epigenetic interspecies network, 380–382
Candidate genome-wide interspecies genetic and epigenetic network (Candidate GIGEN), 9–10
Candidate GRN, 15
Candidate inflammatory gene regulatory network, construction of, 25–36
  annotations of target genes, 28t
  flowchart for constructing GRN of inflammation, 26f
  genes, 27t
  inflammatory genes and regulators, 33t
Candidate intracellular protein interaction network, 320
Candidate PPMI network reconstruction through data mining, 579
Candidate protein–protein interaction networks, 90–92, 301
  construction via multidatabase mining, 274
  via dynamic interaction model, 92–93
  selection of protein pool for, 161–162
Cardioembolic stroke (CE stroke), 263
  changes in cellular functions and proteins, 266–268
    in early tissue plasminogen activator treatment, 269–271
    after tissue plasminogen activator treatment, 268–269
  immune events in pathomechanisms of early, 264–273
  material and methods of PPI network construction and principle network projection, 273–277
  pathomechanisms of early stroke and potential drug targets, 271–273
  protein–protein interaction networks at different stages of, 264–266
CAS4 mutant, 212
Casp2 protein, 330
Caspase recruitment domain (CARD), 544 545 CARD8, 544 545 CCDC136 protein, 537 CCDC22 protein, 444 447 CCL19 protein, 540 542 CD0237 cell wall protein, 8, 414 415 CD0660 protein, 407 408 CD0663 protein, 407 408, 416 417 CD0745 protein, 414 415 CD1128 protein, 414 415 CD1185 protein, 414 CD1214 cell wall protein, 8, 418 CD1412 protein, 417 CD1466 protein, 407 408 CD2115 protein, 414 CD2119 protein, 413 414 CD2247 protein, 418 CD2356 protein, 413 414 CD2629 cell wall protein, 8 CD2643 cell wall protein, 8 CD2643 protein, 414 CD2753 protein, 414 CD2787 cell wall protein, 8, 414, 418 420 CD36 protein, 290 291 CD46 protein, 407 408, 531 533 CD84 protein, 362 364 CDC20 protein, 124 125 Cdc4 protein, 306 CDK7 protein, 571 572 Cek1 protein, 306 Cell behavior, 104 105 types, 281 282 Cell-surface hydrophobicity 1 (Csh1), 175 Cellular function networks, 276 Cellular molecular network construction methods, 128 134 Central nervous system (CNS), 229 230 Centrality analysis of zebrafish intracellular protein protein interaction networks, 323 325 Cerebellar wound-healing process inflammation and immune response in, 238 244 cross talks, 241 244 negative correlation with ZMI in wound-healing process, 241 positive correlation with ZMI in wound-healing process, 240 241 signaling pathways in wound-healing process, 239 240 temporal patterns and PPI network, 238 239 in zebrafishes big data mining for candidate protein protein interaction network, 235 236
dynamic network modeling for constructing cerebellar wound-healing protein protein interaction network, 236 238 experiments for zebrafish movement index, 232 235 protein protein interaction network of, 231 238 stab lesion assay and time-course microarray experiments, 231 232 systems biology tools and statistics, 238 Cerulenin, 430 CHEK1 protein, 533 534 Chemokine networks, 45 61 signaling pathway, 230 Chemokine (C-C motif) receptor 7 (CCR7), 244 Chloride intracellular channel (CLIC), 546 CHMP5 protein, 536 Cholinergic receptor muscarinic 2 (CHRM2), 244 Chromatin immunoprecipitation (ChIP), 137 138 CHS2 gene, 123 125 Circadian clock systems, 310 311 Circadian clock-related host proteins, 293 Citrulline, 291 CLOCK, 534 535 Clostridium difficile, 8, 375, 596 597 big data mining and data preprocessing of host/ pathogen gene/miRNA microarray data, 378 380 candidate genetic-and-epigenetic interspecies network, 380 382 cross-talk mechanism by genetic-and-epigenetic interspecies network, 396 421 dynamic models of GEINs for Caco-2 cells and Clostridium difficile during infection, 382 386 extracting core network structures, 393 396 false positives in candidate GEIN for real GEIN via system order detection scheme, 391 393 GEINs and HPCNs in Caco-2 cells during early and late stages of CDI, 378 materials and methods, 378 396 offensive mechanisms of Caco-2 cells and defense mechanisms, 413 415 parameter estimation of dynamic models of candidate GEIN via system identification method, 386 391 pathogen core networks, 403 404 pathogenic effects and host responses, 408 415 Clostridium difficile infection (CDI), 8, 375 376 GEINs at early and late stages of Clostridium difficile infection, 396 421 and HPCNs in Caco-2 cells during early and late stages of CDI, 378
Combinatory therapy, 591 Common mediator SMAD (Co-SMAD), 326 327 Complement receptors (CR3), 339 Connectivity Map (CMap), 419 420, 592 594 Copper (Cu), 365 367, 413 414 Cu/ZnSOD, 212 213 COPS5 protein, 413 Core interspecies pathways analysis, 439 444 Core networks, 264 265 biomarker, 10 extraction from real cross-talk GWGEIN by applying PNP method, 351 355 Core protein protein interaction network, 277 cellular function networks, 276 projection, 276 277 Corticotropin-releasing factor receptor signaling pathway (CRFR signaling pathway), 269 270 Cph1 protein, 144 146 CRK protein, 572 573 Cross correlation, 68 Cross talks among host pathogen interactions and their validations, 404 408 among signaling pathways in inflammation, 90 95 analysis of protein protein interaction networks, 99 102 candidate protein protein interaction networks, 90 92 via dynamic interaction model, 92 93 cross-talk analysis by counting cross-talk ranking values, 93 95 GEINs at early and late stages of Clostridium difficile infection, 396 421 by genetic-and-epigenetic interspecies network, 396 421 mechanisms, 6 Cross-talk network biomarkers in host host domain, 290 291 in interaction difference network, 289 interplay among, 293 295 material and methods, 283 287 IVS calculation, 286 287 microarray data, 283 284 pathogen host protein protein interaction network, 285 286 protein pool selection and database mining, 284 285 in pathogen host domain, 292 293 in pathogen pathogen domain, 291 PH-PPINs for cross-talk network markers, 287 295 Cross-talk ranking value (CTRV), 93 94 counting for, 94f and link values, 100t
CSNK2A1 protein, 565 Cubic spline method, 69 Cyb5r2 protein, 196 198 Cyclin D1 (CCND1), 244 Cyp51 protein, 196 198 Cytochrome C oxidase subunit II (COX2), 537 538 Cytokine networks, 45 61 “Cytoskeletal regulation by Rho GTPase” pathway, 241 Cytoskeleton-associated recycling or transport complex (CART complex), 544 Cytotoxic T lymphocyte (CTL), 545, 547 Cytotoxin CD0660, 376
D Damage stage, 117 118 Danio rerio. See Zebrafish (Danio rerio) DARS protein, 572 573 Data mining, 24 25 Database for Annotation, Visualization, and Integrated Discovery (DAVID), 205, 561 DC aggresome-like induced structure (DALIS), 364 365 DC-specific ICAM 3 grabbing at the nonintegrin (DCSIGN), 339 Death domain (DD), 87 88 Defense mechanism, 108 Defense/offensive strategies of innate and adaptive immunity, 319 326 based on innate and adaptive HP-PPINs, 304 312 centrality analysis of zebrafish intracellular PPI networks, 323 325 dataset selection and target protein pool determination, 320 materials and methods in innate and adaptive host pathogen networks, 299 304 method for strategies, 319 320 proteins common to PPI networks for primary and secondary infection, 325 326 zebrafish intracellular PPI networks, 320 323 for primary and secondary infection, 323 Dendritic cells (DCs), 7 8, 281 282, 339 DGIdb databases, 592 594 Dhfr protein, 323 325 Diacylglycerol kinase (DGKE), 417 418 Differentially expressed genes (DEGs), 264 265 Dihydropyrimidinase-like 2 (DPYSL2), 244 Distance index (DI), 232 234 DNA methylation, 560 microarray, 23 24 DNA methyltransferase inhibitor (DNMTi), 552 553 DOG1 protein, 293 Dormancy survival regulator (DosR), 367
Drosophila melanogaster, 188 Drug data mining, 592 594 discovery, 591 drug-design specification approach, 594 side effects, 592 target prediction and multimolecule drug design, 418 421 proteins, 551 553 Dynamic Bayesian network approach (DBN approach), 23 24 Dynamic hyphal growth protein protein interaction network of Candida albicans, 166 169 Dynamic innate and adaptive host pathogen protein protein interaction networks, 304 305 Dynamic interaction model, 301 304 of qth pathogen protein, 384 385 Dynamic intracellular protein protein interaction networks, 169 172 Dynamic models of GEINs for Caco-2 cells and Clostridium difficile during infection, 382 386 of wound healing related cellular protein protein interaction network, 247 Dynamic network modeling for constructing cerebellar wound-healing PPI network, 236 238 Dynamic protein protein interaction model, 299 300 Dynamic system models for construction of organism protein interaction network during infection, 163 of host pathogen interspecies PPMI network, 580
E E-cadherin, 552 553 EBNA2 viral protein, 9 10, 551 552 Efg1 protein, 144 146 Efh1 protein, 144 146 EGFR protein, 361 EIF2AK2 protein, 545 Eigen-interactions, 265 266 Eigenexpression fraction, 394 395, 484, 519 520 ELAVL1 protein, 323 325, 567 Endocytosis via receptors, 339 Endogenous cannabinoid signaling, 239 Endoplasmic reticulum (ER), 377 stress response, 415 416 Ensembl database, 32 Entamoeba histolytica, 198 199 Enterotoxin CD0663, 376 Epigenetics, 528 531 Epigenomics, 2 3 Epithelial cells, 375 376
Epstein Barr virus (EBV), 9 10, 489 490. See also Human immunodeficiency virus (HIV) big data mining and data preprocessing of NGS data, 492 496 drug target proteins and multimolecule drug design, 551 553 dynamic models of interspecies GIGENs for human B cells and, 496 500 EBV-BZLF1, 532 533 EBV-miRBART5, 532 533 extracting core network from real interspecies GIGEN, 514 520 HVCNs, 524 528 HVCPs, 528 540 interspecies GIGENs in human B cells infected with, 492 interspecies molecular mechanisms for human B lymphocytes, 521 524 lytic infection molecular mechanism, 547 551 PPIN model, 511 512 system identification approach of dynamic models of GIGENs, 500 511 system order detection scheme of dynamic models of GIGENs, 511 514 transportation process of viral particles, 540 547 Erdős–Rényi random graph model, 209 ERG1, 291 Essential functional modules material and methods, 205 209 construction of protein protein interaction networks, 207 functional modules in infection process, 209 network reconfiguration, 207 208 omics data selection and database mining, 205 selection of protein pool, 205 207 for pathogenic and defensive mechanisms, 209 221 Extensive drug-resistance TB (XDR-TB), 370 Extracellular matrix (ECM), 439 441
F F2 protein, 267 268 False positives in candidate GEIN for real GEIN via system order detection scheme, 391 393 Fc receptors, 339 Ferritin, 212 213 Fgr15 protein, 146 Fibroblast growth factor signaling pathway (FGF signaling pathway), 266 267 Fibronectin 1 (FN1), 404 407 Fkbp5 protein, 323 325 Flux balance analysis (FBA), 403 FOS protein, 534 FOXA1 protein, 270 271
FOXL1 protein, 40 FOXP1 protein, 539 FPR protein, 63 65 FPR1, 410 411 Fruit fly, 204 Functional analysis of core PPMI networks at three infection stages, 561 567 of host pathogen interaction networks, 567 569 of specific PPMI networks at three infection stages, 570 574 Fungal pathogen, 135
G GABRG1 protein, 542 Gain-of-function subnetworks, 136, 141, 154f GAIT. See IFN-gamma-activated inhibitor of translation (GAIT) Galleria mellonella, 188 Ganciclovir (GCV), 551 Gastric cancer (GC), 489 490 GATA3 protein, 63 65 Gch2 protein, 323 Gcn4 protein, 146 GCV3 protein, 291 Gene connectivities, 46t, 61t Gene expression, 24 profile, 230 Gene Expression Omnibus (GEO), 88 89, 273 274, 492 494 Gene ontology (GO), 136, 161, 189 191, 205, 235 236, 357, 434 435, 561 Gene regulatory networks (GRNs), 3 5, 13, 23, 425 of biofilm and planktonic cells, 141 144 candidate, 15 in immune system of unactivated and inflammatory cells, 43t in inflammatory condition, 84f in normal condition, 85f reconstruction method, 139 140 system identification for, 13 17 least square parameter estimation method, 14 16 maximum likelihood parameter estimation method, 16 17 of systems inflammation in humans, 23 25 biological insight, 45 67 candidate gene regulatory network, 71 candidate gene regulatory network via dynamic gene regulatory model, 36 38 candidate inflammatory gene regulatory network construction, 25 36 construction of candidate gene networks of systematic inflammation, 68 69
dataset selection, 68, 72 dynamic regulatory model for gene regulatory network, 69 71 features of host response, 80t, 81t gene network construction, 72 identification of parameters and time delay, 74t inflammatory gene regulatory network construction in immune system, 38 45 inflammatory genes and regulators, 82t material and methods, 68 71 parameter estimation of inflammatory gene regulator model, 79t reconstruction errors via independent data, 79t Gene/miRNA/lncRNA regulation networks (GRNs), 9 10 Genetic and epigenetic network (GEN), 592 594 Genetic Regulatory Modules (GRAM), 25 Genetic-and-epigenetic host/pathogen networks materials and methods for constructing cross-talk GWGEINs and core networks, 341 355 pathogenic/host defense mechanism to identify drug targets, 356 372 TB, 339 341 Genetic-and-epigenetic interspecies networks (GEINs), 376, 429. See also Host-and-pathogen core networks (HPCNs) in Caco-2 cells during early and late stages of CDI, 378 dynamic models of GEINs for Caco-2 cells and Clostridium difficile during infection, 382 386 in OKF6/TERT-2 cells line, 430 432 Genome-wide genetic-and-epigenetic interspecies networks (GWGEINs), 7 8, 340 341 materials and methods for constructing cross-talk GWGEINs and core networks, 341 355 big data mining and data preprocessing, 341 343 construction processes of cross-talk GWGEINs in Mϕs and DCs infected with Mtb, 341 core network extraction from real cross-talk GWGEIN by applying PNP method, 351 355 dynamic models of cross-talk GWGEIN for Mϕs, DCs, and Mtb during early infection process, 344 345 system identification method of dynamic models, 345 350 system order detection scheme of dynamic system models, 350 351 of Mϕs and DCs infected with Mtb, 356 357 Genome-wide interspecies genetic-and-epigenetic networks (GIGENs), 490 491 extracting core network from real interspecies, 514 520 of first and second infection stage in lytic phase of B cells infected with EBV, 521 524
system identification approach of dynamic models of, 500 511 system order detection scheme of dynamic models of, 511 514 Global Health Observatory (GHO), 339 Glucose, 297 299 Golgi apparatus, 213 214 GP5 protein, 267 268 Growth factor receptor-bound protein 2 (GRB2), 244 GγP1 protein, 124 125
H Hank’s balanced salt solution (HBSS), 318 Heat shock protein (HSP), 401 403 Hsp70 protein, 124, 331 Hsp90, 331 Hsp90a.1, 323 325, 331 HSP90B1, 407 408, 411, 416, 441 442, 456 HSP90B2P, 407 408, 416 Hspa5, 309 310 HSPA5, 401 403 HSPA8, 309 310, 565 567 Hspd1, 323 325 Helicobacter pylori, 196 198, 361 Hemostasis-related processes, 220 Hepatocellular carcinoma (HCC), 563 Heterotrimeric G-protein signaling pathway, 240 Hht21 protein, 175 176 High-mobility group box 1 (HMGB1), 106 Highly active antiretroviral therapy (HAART), 559 Hill function, 139 140 HIST1H4B protein, 444 446 HLA-DQA1 protein, 270 271 HLA-DRB4 protein, 270 271 Hodgkin’s lymphoma (HL), 489 490 HOG1 protein, 293, 306 Homo sapiens, 161, 188 189 HOOK1 protein, 544 Host cell damage, 3 5 Host counterparts, 308 309 Host GRN dynamic regulatory equation, 474 Host-and-pathogen core networks (HPCNs), 7 9, 340 341, 377, 430, 481 484. See also Genetic-and-epigenetic interspecies networks (GEINs) in Caco-2 cells during early and late stages of CDI, 378 during infection of C. albicans SC5314 and C. albicans WO-1, 437 438 infection of C. difficile, 399 408 in Mϕs and DCs infected with Mtb, 357 370 biological processes of host core networks in cell types, 357 359 defense mechanisms of host and pathogen and dysfunctions of host in Mction of s, 368 370
defense mechanisms of Mtb in Min chanism, 365 368 host responses in Mnterrupt, 361 365 host pathogen cross-talk interactions in both cell types, 359 361 in OKF6/TERT-2 cells line, 430 432 Host-pathogen interaction network in Candida albicans zebrafish infection, 159 165 data selection and database mining, 161 determination of protein interaction pairings in infection PPI network, 163 164 dynamic system model for construction of organism protein interaction network, 163 interspecies protein protein interaction network between pathogen and host, 164 165 screening process of infection-related proteins, 159 160 selection of protein pool for candidate protein protein interaction networks, 161 162 simultaneous time-course microarray experiment, 159 pathogenic/offensive mechanism between Candida albicans and zebrafish in infection process, 165 175 Host pathogen interactions (HPIs), 3 5, 188, 297 Host pathogen protein protein interaction networks (HP-PPINs), 189 195, 297, 301 302 construction framework, 189 data mining and integration of two-sided microarray data, 189 191 defensive and offensive molecular mechanisms based on innate and adaptive, 304 312 identification of interactive abilities and determination of significant interactions, 193 195 during infection process of Candida albicans, 195 199 construction, 195 novel host/pathogen protein protein interaction network, 196 199 inference of putative interspecies and intracellular protein protein interactions, 191 192 multivariate dynamic modeling and identification of, 192 193 selection of protein pool, 191 Host virus core networks (HVCNs), 490 491 cellular processes, 524 527 at first and second infection stage in lytic phase of B cells infected with EBV, 524 528 intracellular signaling pathways in, 527 528 Host virus core pathways (HVCPs), 490 491 at first and second infection stage during lytic replication cycle, 528 540 virion production, 528 540
Hsl1 protein, 124 125, 171 172 Hubs, 169 170 Hug, 323 325 Human CD4+ T cells, 559 Human herpesvirus 4. See Epstein Barr virus (EBV) Human immunodeficiency virus (HIV), 203 204, 559. See also Epstein Barr virus (EBV) design strategy for determining multiple molecule drug combinations, 582 583 extraction of common and specific PPMI networks, 582 HIV-1 Nef, 575 HIV/human interaction networks for multiple drug designs, 565 577 functional analysis of core PPMI network, 565 567 functional analysis of host pathogen interaction networks, 567 569 functional analysis of specific PPMI networks at three infection stages, 570 574 multiple drug combinations, 575 577 network-based pathway enrichment analysis, 575 identification of real PPMI network, 584 590 methods, 577 583 pathogenic mechanisms, 561 564 Human umbilical vein endothelial cells (HUVEC), 95 96 HUWE1 protein, 536 Hypha-related signaling pathways, 169 Hyphal morphogenesis, 211 212 Hyphal stage network, 165 166
I ICAM-1 protein, 40 IFN-gamma-activated inhibitor of translation (GAIT), 572 573 IKBKB protein, 542 543 IL-1R accessory protein (IL-1RAcP), 94 95 IL-1R-activated kinase 1 (IRAK1), 99 102 Immune events in pathomechanisms of early cardioembolic stroke, 264 273 Immune response, 1 2, 158, 217, 230, 317 318 Immune system, 6 7 Immune-related molecules, 281 282 Immunological memory effect, 306 308 impacts on host systems, 310 312 Induced endocytosis, 124 Infection score, 581 582 Infection-associated genes via cellular molecular network approach, 122 126 investigation of Candida albicans adhesion associated genes, 122 124 investigation of Candida albicans damage stage-associated genes, 125 126
investigation of Candida albicans invasion stage-associated genes, 124 125 prediction of Candida albicans infection-associated genes, 122 Infectious diseases, 187 Infectious microbiology, 19 Inflammasomes, 263 264 Inflammation, 1 2, 5 6, 23 24, 106 107, 229 230, 263 Inflammatory gene regulatory network construction in immune system, 38 45 Inflammatory responses, 87 88 Inhibitor of NF-κB kinase (IKK), 88 activation, 97, 99 Inhibitory SMADs (I-SMAD), 326 327 Innate and adaptive immune systems, 317 318 defense/offensive strategies, 319 326 materials and methods, 318 319 Candida albicans strain and growth conditions, 318 infection and survival assay, 318 microarray experiments, 319 purification of Candida albicans and zebrafish RNA, 319 zebrafish strain and maintenance, 318 signaling pathways in, 326 333 Innate and adaptive loops, 297 299, 314 Innate immunity, 287, 297 299 Insulin receptor substrate 1 (IRS1), 244 Integrated genetic and epigenetic network (IGEN), 19 system identification for, 19 21 Integrin beta 1 (ITGB1), 244 Integrin signaling pathway, 241, 268 269 Interaction difference network (IDN), 289 Interaction parameter identification using time profiles microarray data, 248 Interaction variation scores (IVSs), 286 287 calculation, 286 287 Interferon γ (IFN-γ), 267 268, 305 306, 323, 340 Interleukin (IL), 264 IL-10, 340 IL-12, 339 340 IL1A, 45 61 IL1B, 45 61 Interleukin-1 receptor (IL-1R), 88, 98 99 Interspecies cross talk between host immune related molecular mechanisms and their pathogen counterparts, 305 308 of pathogen resource competition related molecular mechanisms and host counterparts, 308 310 Interspecies protein protein interaction network, 172 175 during infection, 165 166 between pathogen and host, 164 165
Intracranial injury. See Traumatic brain injury (TBI) Invasion stage, 3 5, 117 118 Ion transport, 218 219 IRA2 gene, 124 125 Iron, 218 219, 297 299 Isoniazid, 370 IκB protein, 24
K KANK2 protein, 537 Kex2 protein, 171 172 Kmo protein, 196 198 KRAS protein, 572 573 Kre1 protein, 172
L Laminin, alpha 4 (Lama4), 220 Lat1 protein, 214 215 Latent TB, 339 Least square parameter estimation method, 14 16 Legionella pneumophila, 196 198 Likelihood function, 70 Linear dynamic gene regulatory model, 13 Lipoarabinomannan (LAM), 340 Lipopolysaccharide (LPS), 24 Liquid chromatography (LC), 370 371 LL-37 peptide, 123 124 LMP1 protein, 542 543 LMP2B protein, 539 Log-likelihood function, 17 Long noncoding RNAs (lncRNAs), 377, 429 430 Loss-of-function subnetworks, 136, 141, 155f LRRK2 protein, 546 Lytic infection molecular mechanism, 547 551
M Mac1 protein, 212 213 Macrophages (Mϕs), 7 8, 281 282, 343 Major capsid protein (MCP), 539 Major histocompatibility complex class I molecules (MHC class I molecules), 217 MAP/ERK kinase kinase kinase 3 (MEKK3), 102 104 MAPK6 protein, 456 Mass spectrometry (MS), 370 371 Maximum likelihood estimation method (MLE method), 16 17 Mcm1 protein, 146 MET10 protein, 291 MET2 protein, 291 Metal toxicity, 369 Methylation regulation, 272 273 Methyltransferase-associated protein (MTAP), 358 359
MHC class I molecules. See Major histocompatibility complex class I molecules (MHC class I molecules) Microarray data, 68, 283 284 for early cardioembolic stroke, 273 274 experiments, 319 profiles, 24 25 technology, 23 Micronutrients, 212 213, 218 219 Micropinocytosis, 339 MicroRNAs (miRNAs), 5 6, 19, 264, 340 341, 376 377, 429 430, 559 560 dynamic model, 20 21 miR-30B, 442 miR-326, 565 567 mir-636, 371 372 miR-BARTs, 528 531 miR143HG, 442 miR1972 2, 442 miR31, 442 miR3941, 442 miR548D2, 442 Migration inhibitory factor (MIF), 570 571 Mitogen-activated protein kinase (MAPK), 98, 401 403 Model order detection and identification, 275 276 Model order detection method, 301 304 MTO1 gene, 124 125 Multidrug-resistance TB (MDR-TB), 370 Multiinput/single-output gene regulatory model, 67 stochastic process, 38 Multimolecule drug design, 340 341, 551 553 Multiple drug targets identification, 595 596 Multiple-molecule drug design of infectious diseases, 596 Multiprotein complexes, 263 264 Multivesicular bodies (MVBs), 536 Muscarinic acetylcholine receptor signaling pathway (mAChR signaling pathway), 269 270 Mycobacterium tuberculosis (Mtb), 339 341 construction processes of cross-talk GWGEINs in Mϕs and DCs infected with, 341 defense mechanisms of Mtb in Min chanism, 365 368 GWGEINs of Mϕs and DCs infected with, 356 357 HPCNs in Mϕs and DCs infected with, 357 370 infection, 7 8 MyD88 protein, 98, 105
N Nasopharyngeal carcinoma (NPC), 489 490 National Center for Biotechnology Information (NCBI), 343
Ncstn identification for relationship between bacteria- and fungus-induced immune responses, 332 333 Negative feedback controls of cross talks, 106 107 Negative predictive values, 151 Network-based pathway enrichment analysis, 575 Neurotransmitters, 246 247 Next-generation sequencing (NGS), 2 3, 19, 491 NF-κB inducing kinase (NIK), 98 NFKB1 gene, 401 403 Nitric oxide (NO), 367 NKX3-1 protein, 270 271 Nonhubs, 41 45 NP3 protein, 242 Nsdhl protein, 196 198 Nuclear bodies (NBs), 543 Nuclear envelope (NE), 544 Nuclear factor kappa-B (NF-κB), 87 88 activation, 97 pathway, 45 proteins, 23 Nuclear pore complex (NPC), 544 Nucleocapsid, 489 490 NUP155 protein, 544
O Offensive and defensive mechanism, 6 7 OKF6/TERT-2 cell line, 428, 448 453 Omics approaches, 1 2 data selection and database mining, 205 One-way ANOVA, 320 Open reading frames (ORFs), 136 orf19.3769, 212 213 Orf19.5438, 172 orf19.5627 gene, 123 124 orf19.6883, 124 125 oriLyt, 490 Ortholog-based PPI, 188 189 Oxidative stress, 413 414 OXSR1 protein, 565 567
P p21 protein (Cdc42/Rac) activated kinase 2 (PAK2), 244 p60 TNF receptor, 96 p80 TNF receptor, 96 Pathogen PPIN dynamic interactive equation, 473 Pathogen-associated molecular patterns (PAMPs), 104 105, 214 215 Pathogen host interactions (PHIs), 282 283 Pathogen host PPI networks (PH-PPINs), 282 283, 285 286 for cross-talk network markers, 287 295 of innate and adaptive immunity, 287 289
Pathogenic and defensive mechanisms, 196 198, 283 essential functional modules for, 209 221 dynamic protein protein interaction networks, 209 210 functionally enriched Candida albicans modules, 210 215 functionally enriched zebrafish modules, 215 221 identification of proteins in Candida albicans infection, 209 210 Pathogenic mechanism, 117 118, 121 122 Pathogenic/host defense mechanism to identify drug targets, 356 372 drug targets, drug mining, and multimolecule drug design, 370 372 GWGEINs of Mϕs and DCs infected with Mtb, 356 357 HPCNs in Mϕs and DCs infected with Mtb, 357 370 Pathogenic/offensive mechanism between Candida albicans and zebrafish in infection process, 165 175 dynamic hyphal growth protein protein interaction network of Candida albicans, 166 169 dynamic intracellular protein protein interaction networks, 169 172 interspecies protein protein interaction network, 172 175 interspecies protein protein interaction network during infection, 165 166 Pathogenicity, 157 Pathogens, 1 2, 87 88, 203 Pathomechanisms of early stroke and potential drug targets, 271 273 Pattern recognition receptors (PRRs), 217, 281 282, 297 299 PCBP2 protein, 491 492 PFDN5 protein, 364 365 Phagocytosis, 214 215 of viruses or bacteria, 339 PHO23 protein, 124 125 Phosphatidylinositide 3-kinases signaling pathway (PI3K signaling pathway), 230, 239 240 PI3K-Akt signaling pathway, 361 PI3K/PKB pathway, 239 240 Phospholipomannan, 218 Phosphorylation, 543 Planktonic gene regulatory network, 153f Platelet-derived growth factor signaling pathway (PDGF signaling pathway), 269 270 Plexin A3 (PLXNA3), 244 PMS2P1 gene, 364 pMtb infection, 361 365 POLR2A protein, 570 571 Polymerase chain reaction (PCR), 592 594 Positive predictive values, 151
Posttranslational modifications, 528 531 Pox1 3 protein, 214 215 Primary infection network, 334f Principal component analysis (PCA), 592 594 Principal network projection method (PNP method), 7 8, 265 266, 341, 351 354, 378, 492, 592 594 extracting core network structures from real GEINs via, 393 396 PRKAR1A protein, 364 Proinflammatory cytokine in mice, 263 Prokineticin, 290 291 Promyelocytic leukemia (PML), 491 492 Proteasome in controlling adaptive immune response, 330 Protein degradation rates, 301 302 interaction model, 275 network construction, 223 225 pool selection, 577 579 selection and database mining, 284 285 recruitment, 97 secretion, 213 214, 219 Protein analysis through evolutionary relationships (PANTHER), 238 Protein kinases (PKs), 98 99, 537 538 Protein protein and miRNA interactions (PPMIs), 560 Protein protein interaction (PPI), 3 5, 87, 188 189, 204 205, 230, 263 264, 282 283, 318, 592 construction, 207 material and methods of PPI network construction and principle network projection, 273 277 Protein protein interaction networks (PPINs), 7 8, 87, 117 118, 263 264 construction, 274 cross-talk analysis of, 99 102 at different stages of cardioembolic stroke, 264 266 system identification for, 18 19 at different time stages of inflammatory system, 95 96 dynamic progression, 102 104 identification of interactive parameters, 108 109 PSAP protein, 537 538 Pseudomembranous candidiasis, 427 428 PSMA3 protein, 545 546 Psmd1 protein, 330 Psmd13 protein, 330 PSME3 protein, 535 536 PTEN protein, 410 411
R RAB7A protein, 535 Rap1 protein, 146
Ras-related C3 botulinum toxin substrate (RAC), 244 RAC1 protein, 407 408, 410 411 RAC3 protein, 537 538 Ras-related protein (Rsr1), 166 169 Ras2 protein, 166 169 RBPMS protein, 491 492, 535 536, 550 Reactive nitrogen intermediates (RNIs), 340 Reactive oxygen species (ROS), 196 198, 362, 377 Real-time polymerase chain reaction (RT-PCR), 10 Realistic interaction pair determination, 248 250 Receptor of advance glycation end product (RAGE product), 106 Receptor-regulated SMADs (R-SMAD), 326 327 Reconstructed human oral epithelium (RHE), 119 120 Rel proteins, 23 RELA protein, 533 REP3123 protein, 420 Repressilator, 306 308 Resource competition related proteins, 308 309 Restoration mechanism, 229 231 Retinoblastoma (RB), 572 573 Rheumatoid arthritis, 23 Ribo-nucleotide reductase 1 (RNR1), 292 293 Ribosomal protein S10 (RPS10), 364 Rifampicin, 370 RNA sequencing (RNA-seq), 191 RNF41 protein, 534 RNR1 protein, 293 RPL13A protein, 364 RPS4X protein, 267 268 RPS4Y1 protein, 267 268, 270 271 RUVBL1 protein, 565 Rv0081 protein, 367 Rv0353 protein, 368 Rv0667 protein, 365 Rv0762c protein, 365 Rv0969 gene, 365 367 Rv1098c protein, 371 372 Rv1173 protein, 367 368 Rv1675c protein, 368 Rv2234 protein, 367
S S100A8/A9 protein, 106 Saccharomyces cerevisiae, 23, 119 120, 136, 138, 158, 188 189 SC5314 strain, 427 428 Sc5d protein, 196 198 Sclerotinia sclerotiorum, 198 199 Screening biofilm-related transcription factors, systems methods of, 136 144 data used in, 137 138
Screening (Continued) gene regulatory network reconstruction method, 139 140 gene regulatory networks of biofilm and planktonic cells, 141 144 selection scheme for transcription factors and target genes, 138 139 of potential Candida albicans biofilm-related transcription factors, 144 process of infection-related proteins, 159 160 SCS7 gene, 123 125 Secondary infection network, 335f Sensitivity, 150 Serine proteinase inhibitor (serpinc1), 290 291 Serine/Threonine Kinase 11 (STK11), 411 Serotonin receptor signaling (5HT receptor signaling), 310 311 SERPINC1 protein, 267 268 Sigmoid function, 139 140 Signaling lymphocyte activation molecule (SLAM), 362 364 Signaling pathways in innate and adaptive immune responses, 326 333 apoptosis regulation in primary and secondary infection, 330 332 Ncstn identification for relationship between bacteria- and fungus-induced immune responses, 332 333 proteasome in controlling adaptive immune response, 330 TGF-β pathway, 326 330 signaling transduction pathways cross talks among signaling pathways in inflammation, 90 95 determination of significant interaction pairs, 109 113 dynamic progression of protein protein interaction networks, 102 104 identification of interactive parameters of protein protein interaction networks, 108 109 negative feedback controls of cross talks, 106 107 signaling transduction, signaling pathways, and cross talks in inflammatory response, 95 102 specific architecture in signaling transduction network, 104 105 toll-like receptor 4 endogenous ligand, 105 106 in wound-healing process, 239 240 Singular value decomposition (SVD), 265 266 Skn7 protein, 146 Slc18a2 protein, 310 311 SLC25A6 protein, 533 534 SMAD proteins, 326 327 Smad7, 327, 329f, 330 331 SMARCAD1, 570 571
SMI1 protein, 124 125 Specificity, 150 Sphingolipids, 123 124 SPP1 protein, 267 268 STAT3 protein, 491 492 STRING database, 91 92 Stroke. See also Cardioembolic stroke (CE stroke) pathophysiology, 264, 272 273 progression mechanisms, 264 Structure variation value (SVV), 207 209 Student’s t-test, 194 195, 585 590 SUMO proteins, 358 359 SYK protein, 362 364 Synap23.2 protein, 310 311 Synthetic accessibility (SA), 403 System identification approach of dynamic models of GIGENs, 500 511 for gene regulatory network, 13 17 of IGEN, 19 21 method of dynamic models of GWGEINs, 345 350 of protein protein interaction network, 18 19 System order detection scheme of dynamic system models of GWGEIN, 350 351 Systematic inflammation, 23 24, 37 38 Systemic infection, 117 118, 135 Systemic inflammation, 263 264 Systems biology, 1 2, 23 24 approach, 491 methodology, 246 247 strategy, 207 tools and statistics, 238 Systems drug-design method in infectious diseases, 594 599 drug-design specification approach, 594 identification of multiple drug targets, 595 596 multiple-molecule drug design of infectious diseases, 596 multiple-molecule drug design with side effects, 596 599 Systems medicine, 3, 188 189
T T cells, 297 299 activation, 240 241 T helper type 1 (Th1), 340 TAF9 protein, 565 567 TANK-binding kinase 1 (Tbk1), 306 TCTN1 protein, 544 Tec1 protein, 146 Tegument, 489 490 Temporal Relationship Identification Algorithm prediction algorithm (TRIA prediction algorithm), 72, 164
Terbinafine, 430 Tes15 protein, 214 215 Tetracycline, 430 Tetrandrine, 430 TGFB1I1 protein, 537 538 THBD protein, 267 268 Thioredoxin (txn), 292 293 Thrush. See Pseudomembranous candidiasis Thymoquinone (TQ), 9 10, 492, 552 553 Tissue plasminogen activator (tPA), 263 264 changes in cellular functions and proteins after, 268 269 Tissue regeneration, 229 230 TJAP1 protein, 456 TMEM205 protein, 441 442, 444 447 Tna1 protein, 212 213 TNFR-associated factor 2 (TRAF2), 87 88 TNFRSF10D protein, 542 Toll-like receptors (TLRs), 217, 265 TLR-mediated pathogen recognition, 339 TLR2, 24 TLR4, 24, 45 61, 88 endogenous ligand, 105 106 signaling pathways, 98 99 TOPBP1 protein, 571 572 Tp53 protein, 172, 323 325 Traf6 protein, 99, 172 Transcription factors (TFs), 3 5, 13, 24, 117 118, 128, 343, 380, 428 429, 494 TF GATA1 protein, 451 TF NFKB1 protein, 451 Transcriptome datasets, 299 Transferrin-a (Tfa), 218 219 Transforming growth factor-β (TGF-β), 106, 266 267, 318, 326 330, 490 Transportation process of viral particles, 540 547 Transposon site hybridization (TraSH), 370 371 Transposon-directed insertion site sequencing (TraDIS), 403, 418 419 Traumatic brain injury (TBI), 229 inflammation and immune response in cerebellar wound-healing process, 238 244 protein protein interaction network of cerebellar wound-healing process in zebrafishes, 231 238 TRIM28 protein, 570 571 TRIM3 protein, 544 Tuberculosis (TB), 7 8, 339 Tumor necrosis factor (TNF), 87 88, 95t tumor necrosis factor α signaling pathway, 96 98, 106, 340 Tumor necrosis factor receptors (TNFR), 87 88 Tunicamycin, 430 Turn angle index (TAI), 232 234 Turn direction index (TDI), 232 234
U UBI4 gene, 123 125, 169 170, 175 176, 180f Ubiquitin C (UBC), 267 268, 270 271 Ubiquitin proteins, 358 359 Ubiquitination, 123 124
V Valpromide (VPM), 9 10, 492, 552 553 Vascular endothelium, 87 88 VEGF-A protein, 106 Versican, 290 291 VIM protein, 565 567 Viral miR-BART1 3p, 491 Vps28 (endosomal sorting complex), 308 309
W “Weak ties” in network theory, 41 45 Wnt signaling pathway, 269 270 WO-1 strain, 427 428 World Health Organization (WHO), 339
X XDR-TB. See Extensive drug-resistance TB (XDR-TB)
Y Yeast cell, 428 429 Yeast Search for Transcriptional Regulators And Consensus Tracking (YEASTRACT), 137 138 Yeast-to-hyphae transition, 297 299 YEASTRACT database, 119 120 YWHAE protein, 267 268 YWHAZ protein, 267 268
Z Zebrafish (Danio rerio), 158, 188, 204, 229 230, 317 dynamic intracellular PPI network, 179f infection model, 158 intracellular protein protein interaction networks, 320 323 for primary and secondary infection, 323 RNA, 319 stab lesion assay and time-course microarray experiments in, 231 232 strain and maintenance, 318 Zebrafish movement index (ZMI), 230 experiments for, 232 235 Zebularine (Zeb), 9 10, 492, 552 553 Zgc:153257 protein, 323 325 Zgc:63606 protein, 323 325 zgc:77112 protein, 196 198 Zta viral protein, 9 10, 551 552