English Pages 360 [349] Year 2022
Methods in Molecular Biology 2486
Jane P. F. Bai · Junguk Hur Editors
Systems Medicine
METHODS IN MOLECULAR BIOLOGY
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK
For further volumes: http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported by a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.
Systems Medicine Edited by
Jane P. F. Bai Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
Junguk Hur School of Medicine and Health Sciences, University of North Dakota, Grand Forks, ND, USA
ISSN 1064-3745 ISSN 1940-6029 (electronic)
ISBN 978-1-0716-2264-3 ISBN 978-1-0716-2265-0 (eBook)
https://doi.org/10.1007/978-1-0716-2265-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

Chapter 8 is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

Integration of advances in bioinformatics, computational methods, omics, systems biology, medical informatics, and artificial intelligence has advanced systems medicine and unprecedentedly accelerated pharmaceutical research and development of medical products for treating human diseases. Systems medicine has cost-effectively improved global health at a fast pace. This book provides a 360-degree review of technological advances in systems medicine and guides readers through all facets of systems medicine with respect to precision medicine. In brief, these chapters would be the foundation for the biomedical scientific community to continue to work together toward further advancing systems medicine for the goal of promoting global health.

This book is structured in four parts. The first part, “Scientific and Medical Advances,” consisting of three chapters, highlights the recent achievements in proteomics for biomarker identification, integration of omics and phenotypic data for precision medicine, as well as systems medicine-guided diagnosis and treatment of drug-induced Stevens-Johnson syndrome and toxic epidermal necrolysis.

The second part, “Acceleration of Pharmaceutical Research and Development,” includes four chapters. Three chapters present systems-based computational approaches (physiologically-based pharmacokinetic modeling, quantitative systems pharmacology modeling, and machine learning workflow and bioinformatics tools for toxicity analysis) for pharmaceutical research and drug development. The remaining chapter reviews systems approaches for controlling pandemic infectious diseases and the principles of optimizing systemic exposure and responses of drugs for drug repurposing.

The third part, “Tools and Methodologies,” comprises five chapters that overarchingly cover the computational tools and methodologies of network biology, quantitative systems toxicology for assessing drug-induced liver toxicity, a computational method for creating virtual patient populations for modeling and simulating patient response variabilities, biomedical ontologies for big data integration and multiscale modeling, bioinformatics methodologies for understanding diseases, and statistical concepts for leveraging complex data and models. These chapters of methods and tools are the foundation for continued advances in the research and development of safe and effective drugs.

The last part, “Systems Medicine to Address Unmet Medical Needs,” contains four chapters that discuss how systems medicine can address the unmet medical needs in neurological diseases such as diabetic peripheral neuropathy and amyotrophic lateral sclerosis and how personal dynamic data enable the promotion of scientific wellness, review the role of medical informatics in regulatory development of medical products and future opportunities and challenges, as well as identify the educational needs.

Silver Spring, MD, USA    Jane P. F. Bai
Grand Forks, ND, USA    Junguk Hur
Acknowledgments

We would like to thank the contributing authors for generously writing their chapters despite their busy schedules to share with readers their knowledge and expertise. Both Drs. Jane P.F. Bai and Junguk Hur donated the remuneration from Springer Nature for editing this book to Rackham Graduate School of the University of Michigan, Ann Arbor, Michigan, USA.
Contents

Preface
Acknowledgments
Contributors

PART I SCIENTIFIC AND MEDICAL ADVANCES

1 Mass Spectrometry–Based Proteomics for Biomarker Discovery
Zhijun Cao and Li-Rong Yu
2 Integration of Omics and Phenotypic Data for Precision Medicine
Juan Zhao, QiPing Feng, and Wei-Qi Wei
3 Stevens–Johnson Syndrome and Toxic Epidermal Necrolysis in the Era of Systems Medicine
Chun-Bing Chen, Chuang-Wei Wang, and Wen-Hung Chung

PART II ACCELERATION OF PHARMACEUTICAL RESEARCH AND DEVELOPMENT

4 Integration of Engineered Delivery with the Pharmacokinetics of Medical Candidates via Physiology-Based Pharmacokinetics
Yuching Yang and Xinyuan Zhang
5 Applications of Quantitative System Pharmacology Modeling to Model-Informed Drug Development
Andy Z. X. Zhu and Mark Rogge
6 Combating Viral Diseases in the Era of Systems Medicine
Jane P. F. Bai and Ellen Y. Guo
7 Toxicity Analysis of Pentachlorophenol Data with a Bioinformatics Tool Set
Natalia Polouliakh, Takeshi Hase, Samik Ghosh, and Hiroaki Kitano

PART III TOOLS AND METHODOLOGIES

8 Virtual Populations for Quantitative Systems Pharmacology Models
Yougan Cheng, Ronny Straube, Abed E. Alnaif, Lu Huang, Tarek A. Leil, and Brian J. Schmidt
9 Quantitative Systems Toxicology and Drug Development: The DILIsym Experience
Paul B. Watkins
10 Introduction to Genomic Network Reconstruction for Cancer Research
Guillermo de Anda-Jáuregui, Hugo Tovar, Sergio Alcalá-Corona, and Enrique Hernández-Lemus
11 Learning in Medicine: The Importance of Statistical Thinking
Massimiliano Russo and Bruno Scarpa
12 Development and Applications of Interoperable Biomedical Ontologies for Integrative Data and Knowledge Representation and Multiscale Modeling in Systems Medicine
Yongqun He

PART IV SYSTEMS MEDICINE TO ADDRESS UNMET MEDICAL NEEDS

13 Systems Biology to Address Unmet Medical Needs in Neurological Disorders
Masha G. Savelieff, Mohamed H. Noureldein, and Eva L. Feldman
14 Informatics in Medical Product Regulation: The Right Drug at the Right Dose for the Right Patient
Eileen Navarro Almario, Anna Kettermann, and Vaishali Popat
15 Personal Dense Dynamic Data Clouds Connect Systems Biomedicine to Scientific Wellness
Gilbert S. Omenn, Andrew T. Magis, Nathan D. Price, and Leroy Hood
16 Educational Needs for Quantitative Systems Pharmacology Scientists
James M. Gallo

Index
Contributors

SERGIO ALCALÁ-CORONA • Center for Complexity Sciences, Universidad Nacional Autonoma de México, Ciudad de México, México; Instituto de Fisiología Celular, Universidad Nacional Autonoma de México, Ciudad de México, México
EILEEN NAVARRO ALMARIO • Paraclete Professionals, LLC, Clarksville, MD, USA
ABED E. ALNAIF • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA; EMD Serono, Billerica, MA, USA
GUILLERMO DE ANDA-JÁUREGUI • Computational Genomics, National Institute for Genomic Medicine, Ciudad de México, México; Cátedras CONACYT Program for Young Researchers, Ciudad de México, México; Center for Complexity Sciences, Universidad Nacional Autonoma de México, Ciudad de México, México
JANE P. F. BAI • Office of Clinical Pharmacology, Office of Translational Sciences, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
ZHIJUN CAO • Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
CHUN-BING CHEN • Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou, Taipei, and Keelung, Taiwan; Cancer Vaccine and Immune Cell Therapy Core Laboratory, Chang Gung Memorial Hospital, Linkou, Taiwan; Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan
YOUGAN CHENG • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA; Daiichi Sankyo, Inc., Pennington, NJ, USA
WEN-HUNG CHUNG • Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou, Taipei, and Keelung, Taiwan; Cancer Vaccine and Immune Cell Therapy Core Laboratory, Chang Gung Memorial Hospital, Linkou, Taiwan; Graduate Institute of Clinical Medical Sciences, College of Medicine, Chang Gung University, Taoyuan, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan
EVA L. FELDMAN • NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA; Department of Neurology, University of Michigan Medical School, Ann Arbor, MI, USA
QIPING FENG • Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
JAMES M. GALLO • Department of Pharmaceutical Sciences, School of Pharmacy and Pharmaceutical Sciences, University at Buffalo, Buffalo, NY, USA
SAMIK GHOSH • Systems Biology Institute, Tokyo, Japan
ELLEN Y. GUO • University of Illinois at Chicago College of Pharmacy, Chicago, IL, USA
TAKESHI HASE • Systems Biology Institute, Tokyo, Japan; Tokyo Medical and Dental University, Tokyo, Japan; Faculty of Pharmacy, Keio University, Tokyo, Japan
YONGQUN HE • Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, Center of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
ENRIQUE HERNÁNDEZ-LEMUS • Computational Genomics, National Institute for Genomic Medicine, Ciudad de México, México; Center for Complexity Sciences, Universidad Nacional Autonoma de México, Ciudad de México, México
LEROY HOOD • Institute for Systems Biology, Seattle, WA, USA; Providence Saint Joseph Healthcare System, Seattle, WA, USA
LU HUANG • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA
ANNA KETTERMANN • Office of Biostatistics, Office of Translational Sciences, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
HIROAKI KITANO • Sony Computer Science Laboratories Inc., Tokyo, Japan; Systems Biology Institute, Tokyo, Japan; Faculty of Pharmacy, Keio University, Tokyo, Japan; Okinawa Institute for Science and Technology Graduate School, Okinawa, Japan
TAREK A. LEIL • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA; Daiichi Sankyo, Inc., Pennington, NJ, USA
ANDREW T. MAGIS • Institute for Systems Biology, Seattle, WA, USA
MOHAMED H. NOURELDEIN • NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA; Department of Neurology, University of Michigan Medical School, Ann Arbor, MI, USA
GILBERT S. OMENN • Departments of Computational Medicine & Bioinformatics, Internal Medicine, Human Genetics, and School of Public Health, University of Michigan, Ann Arbor, MI, USA; Institute for Systems Biology, Seattle, WA, USA
NATALIA POLOULIAKH • Sony Computer Science Laboratories Inc., Tokyo, Japan; Department of Ophthalmology and Visual Science, Yokohama City University, Yokohama, Japan; Systems Biology Institute, Tokyo, Japan
VAISHALI POPAT • Biomedical Informatics and Regulatory Review Science, Office of New Drugs, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
NATHAN D. PRICE • Institute for Systems Biology, Seattle, WA, USA; Onegevity, New York, NY, USA
MARK ROGGE • Center for Pharmacometrics and Systems Pharmacology, University of Florida, Lake Nona, FL, USA
MASSIMILIANO RUSSO • Harvard-MIT Center for Regulatory Science, Harvard Medical School & Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
MASHA G. SAVELIEFF • NeuroNetwork for Emerging Therapies, University of Michigan, Ann Arbor, MI, USA
BRUNO SCARPA • Department of Statistical Sciences, University of Padova, Padova, Italy
BRIAN J. SCHMIDT • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA
RONNY STRAUBE • QSP and PBPK, Bristol Myers Squibb, Princeton, NJ, USA
HUGO TOVAR • Computational Genomics, National Institute for Genomic Medicine, Ciudad de México, México
CHUANG-WEI WANG • Department of Dermatology, Drug Hypersensitivity Clinical and Research Center, Chang Gung Memorial Hospital, Linkou, Taipei, and Keelung, Taiwan; College of Medicine, Chang Gung University, Taoyuan, Taiwan; Cancer Vaccine and Immune Cell Therapy Core Laboratory, Chang Gung Memorial Hospital, Linkou, Taiwan
PAUL B. WATKINS • Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, Institute for Drug Safety Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
WEI-QI WEI • Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
YUCHING YANG • Division of Pharmacometrics, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
LI-RONG YU • Division of Systems Biology, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR, USA
XINYUAN ZHANG • Division of Pharmacometrics, Office of Clinical Pharmacology, Center for Drug Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD, USA
JUAN ZHAO • Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
ANDY Z. X. ZHU • Preclinical and Translational Sciences, Takeda Pharmaceuticals International Co, Cambridge, MA, USA
Part I Scientific and Medical Advances
Chapter 1
Mass Spectrometry–Based Proteomics for Biomarker Discovery
Zhijun Cao and Li-Rong Yu

Abstract
Proteomics plays a pivotal role in systems medicine, in which pharmacoproteomics and toxicoproteomics have been developed to address questions related to efficacy and toxicity of drugs. Mass spectrometry is the core technology for quantitative proteomics, providing the capabilities of identification and quantitation of thousands of proteins. The technology has been applied to biomarker discovery and understanding the mechanisms of drug action. Both stable isotope labeling of proteins or peptides and label-free approaches have been incorporated with multidimensional LC separation and tandem mass spectrometry (LC-MS/MS) to increase the coverage and depth of proteome analysis. A protocol of such an approach exemplified by dimethyl labeling in combination with 2D-LC-MS/MS is described. With further development of novel proteomic tools and increase in sample throughput, the full spectrum of mass spectrometry-based proteomic research will greatly advance systems medicine.

Key words Proteomics, Mass spectrometry, Tandem MS, Protein quantitation, Stable isotope labeling, Biomarker, Liquid chromatography
1 Introduction

The development and advancement of mass spectrometry (MS) has significantly changed the field of protein analysis in the past few decades. One application is the identification, quantitation, or characterization of the total proteins expressed in a cell, tissue, or organism at a specific time and physiological state (i.e., proteome) [1, 2]. State-of-the-art MS-based proteomic approaches can analyze and quantify thousands and even over ten thousand proteins in a single experiment as well as their post-translational modifications (e.g., phosphorylation and acetylation). MS-based proteomic technology can also analyze proteins in a targeted fashion (e.g., proteins involved in a specific pathway). Applications of these proteomics technologies have tremendously advanced both hypothesis- and discovery-driven biomedical research.
Jane P.F. Bai and Junguk Hur (eds.), Systems Medicine, Methods in Molecular Biology, vol. 2486, https://doi.org/10.1007/978-1-0716-2265-0_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
MS-based proteomic approaches have been applied to addressing a variety of biomedical questions in normal or diseased states in nonclinical or clinical settings. One such application is in the field of drug development, where the disciplines termed pharmacoproteomics and toxicoproteomics have been developed [3–5]. These disciplines encompass the areas of drug target identification, mechanisms of efficacy and toxicity, signaling pathways of drug action, and development of protein biomarkers and assays for biomarker-oriented therapy and monitoring therapeutic efficacy and toxicity. In the past two decades, tremendous efforts have been made to identify biomarkers using global or open-discovery proteomic approaches and to develop targeted MS-based approaches such as multiple reaction monitoring (MRM) assays for analytical validation of specific protein biomarkers. For proteomic biomarker discovery, samples are traditionally separated using high-resolution two-dimensional electrophoresis (2-DE) or two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) [6] followed by MS identification of resolved proteins. 2-DE is a technique for separating intact proteins. Prior to MS analysis, a gel spot is usually enzymatically digested and either matrix-assisted laser desorption/ionization (MALDI) MS [7, 8] or electrospray ionization (ESI) MS [9] is applied to the analysis of the resulting peptides. Since 2-DE-based approaches are labor-intensive and less amenable to automation, higher throughput proteomic approaches with automated protein/peptide separation and online MS analysis have been developed over the past three decades. In particular, capillary high-performance liquid chromatography with nanoflow separation (nano-HPLC), nano-ESI, and online tandem MS (LC-MS/MS) have significantly driven the discovery of protein biomarkers.
With increased performance of LC separation and increased MS sensitivity, cycling speed, and resolution, thousands of proteins can be analyzed and quantitated in a single analysis. The approaches are compatible with proteome samples prepared in solution with stable isotopic labeling or without labeling (i.e., label-free). It should be noted that some affinity-based proteomic approaches (e.g., protein microarrays, SOMAscan and Olink multiplex assays) have been employed for biomarker discovery [10–12]. This chapter mainly focuses on MS-based proteomic methods in biomarker discovery for systems medicine [13].
2 Mass Spectrometry

Mass spectrometry is a premier analytical technique and has driven the revolution in proteomics. Current MS enables peptide/protein measurement at the sensitivity of low femtomole (fmol, 10⁻¹⁵ mol) to attomole (amol, 10⁻¹⁸ mol) levels [14], and at the zeptomole levels (zmol, 10⁻²¹ mol) in particular cases [15, 16]. High-resolution MS can routinely achieve mass accuracy of low ppm and high ppb (parts-per-billion) [17, 18]. A typical mass spectrometer is composed of an ionization source to generate ions of analytes and desorb them into the gas phase, ion optics to focus and transmit ions, and an analyzer to detect and measure the masses of ions. MALDI [7, 8, 19] and ESI [9, 20] are the two most common ionization approaches for peptide and protein analysis. In MALDI, the peptide/protein sample is mixed with a saturated solution of matrix, for example, α-cyano-4-hydroxycinnamic acid (CHCA), 2,5-dihydroxybenzoic acid (DHB), or 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid) [21]. Samples spotted on the MALDI plate form crystals after they are dried. Peptide/protein ions are ejected from the plate surface into the gas phase when the sample is irradiated by a laser pulse in the MALDI source region. In ESI, a high voltage is applied to the emission tip (spray tip) to generate small charged liquid droplets, which undergo further desolvation in the ion source to generate gas phase ions. ESI is readily interfaced with liquid separation techniques such as LC and capillary zone electrophoresis (CZE) for online and real-time MS detection and data acquisition. While MALDI typically generates singly charged peptide/protein ions, ESI readily generates multiply charged peptide/protein ions. The gas phase peptide/protein ions created in the source region are transmitted through the ion optics to the mass analyzer where the m/z ratios of ions are measured. There are several different types of mass analyzers. Each of them has unique features in terms of speed, sensitivity, resolution, mass accuracy, and capabilities of tandem MS [22].
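The relationship between a peptide's neutral mass, its charge state, and the measured m/z, as well as the meaning of a mass accuracy quoted in ppm, can be sketched as follows. This is a minimal illustration, assuming protonated ions of the form [M + zH]^z+ and a rounded proton mass; the function names and the example masses are hypothetical and do not come from the chapter.

```python
PROTON_MASS = 1.007276  # Da, rounded mass of a proton (assumed constant)


def mz_from_mass(neutral_mass, charge):
    """Return m/z of an [M + zH]^z+ ion for a neutral monoisotopic mass in Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_error(measured_mz, theoretical_mz):
    """Return the mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical peptide with a neutral mass of 1000 Da: MALDI would mostly
# give the 1+ ion, whereas ESI commonly also gives the 2+ and 3+ ions.
peptide_mass = 1000.0
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mz_from_mass(peptide_mass, z):.4f}")

# A 0.001 m/z deviation at m/z 500 corresponds to an error of about 2 ppm.
print(f"{ppm_error(500.001, 500.0):.2f} ppm")
```

Higher charge states shift the same peptide to lower m/z, which is one reason multiply charged ESI ions fit within the m/z range of most analyzers.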
A single quadrupole mass analyzer [23] has limited utility for proteomics; however, the triple quadrupole instrument, which has tandem MS capability, has been widely used for targeted biomarker measurement through selected ion monitoring (SIM) due to high measurement speed and sensitivity. Quadrupole ion trap mass analyzers [24], either the classic three-dimensional (3D) ion traps or the newer 2D ion traps (i.e., linear ion trap) [25–27], have the capability of performing highly efficient multiple stage tandem MS (i.e., MSn). However, both quadrupole mass analyzers and ion traps have low mass resolution and accuracy in normal scan modes. The time-of-flight (TOF) mass analyzer [28] measures the m/z values of ions based on the time they take to fly in the field-free drift tube. Combination of TOF with another TOF tube (i.e., TOF/TOF) [29, 30] or a quadrupole (i.e., Q-TOF) [31] makes the instrument capable of performing both MS and MS/MS analyses with high resolution and mass accuracy. An Orbitrap is a new type of ion trapping mass analyzer, which measures the frequencies of harmonic oscillations of trapped ions along the axis of the electric field [32, 33]. The ion frequencies are
then converted to corresponding m/z values by a fast Fourier transform. The Orbitrap mass analyzer achieves high resolving power (FWHM 350,000 at m/z 524) [34] and high mass measurement accuracy (typically 0.95 Area Under the Receiver Operating Characteristics). Su et al. from the Institute for Systems Biology in Seattle sought to investigate the systemic immune response. They analyzed serial blood draws from 139 COVID-19 patients collected in the first week of infection and representing the full range of disease severity. They performed an integrative analysis of clinical measurements, immune cells, and plasma multiomics. Their analysis revealed significant differences in immune cell populations between mild and moderate infection, as well as unique immune cell states. This study suggests that therapeutic intervention at the stage of moderate disease could be most effective [59].
5 Advanced Methods on Integrating Omics and Phenotypic Data

The previous section introduced existing studies that integrate omics and phenotypic data for the discovery of biological processes or etiologies, and for applications that improve diagnosis, risk assessment, and outcome, prognosis, and treatment prediction. Commonly used approaches to address these applications include correlation-based, network, Bayesian, and multivariate methods. In recent years, machine learning models have been increasingly used to harmonize the various variables and learn from the vast amount of data. Machine learning can be broadly categorized into the following three types.

1. Unsupervised learning (without ground-truth labels) aims at discovering hidden patterns, for example, disease prognostic patterns [8]. Clustering and matrix factorization are examples of unsupervised methods.
2. Supervised learning trains a model with labeled data to predict the labels in unseen data, for example, for prediction or diagnosis. Supervised models that have been widely used in biomarker studies include logistic regression, random forests, and ensemble methods.
3. Semisupervised/self-learning. These methods train a model on a small set of labeled data to predict unseen data and use the predictions as “pseudo-labels” to predict future test data.

The high dimensionality and heterogeneity of omics/phenotypic data pose challenges for both conventional analysis and machine learning-based methods. First, omics data usually have high-dimensional features (e.g., millions of SNPs). With many more features than labeled samples, statistical power is limited, and supervised learning models often overfit. Thus, feature reduction is critical. Second, the formats of omics data are diverse, such as sequences (e.g., RNA-Seq), graphs (e.g., metabolic pathways, regulatory networks), geometric information (e.g., binding site, protein folding), and spatial components (e.g., cell compartment). Phenotypic data include structured data, unstructured data (e.g., clinical notes), and images. Even the same type of data from different sources may come in various structures, and the data are therefore treated as multimodal (multiple-view) data.
To integrate these multiple-view data, fusion is typically required [60]. Fusion is the process of integrating multiple data sources to offer more consistent, accurate, and useful information than that provided by any single data source. Fusion methods can be categorized into two main types: early fusion and late fusion. Early fusion integrates different sources of data at the input space or the learned feature space. Examples are matrix factorization methods. The optimization goal is to learn a unified and reduced representation from all views that can simultaneously capture the relationship between different features. Late fusion focuses on training models separately on each type of data and combines the results from multiple models, as in ensemble methods. This section first introduces several commonly used methods and then the most recent approaches based on deep learning.
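The difference between the two fusion types can be sketched on toy data. This is only an illustration under stated assumptions: early fusion is shown in its simplest form (concatenating the views at the input space, rather than the matrix factorization mentioned above), the model is a nearest-centroid classifier chosen for brevity, and the two views, their values, and all function names are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one centroid per class label."""
    return {lab: centroid([x for x, l in zip(X, y) if l == lab]) for lab in set(y)}


def score(centroids, x):
    """Score in [0, 1]; values above 0.5 mean x is closer to class 1."""
    d0, d1 = sq_dist(x, centroids[0]), sq_dist(x, centroids[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


# Two hypothetical views of the same four patients.
omics = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]  # e.g., expression
pheno = [[1.0], [1.2], [3.0], [3.1]]                      # e.g., a lab value
labels = [0, 0, 1, 1]

# Early fusion: concatenate the views, then train a single model.
early_model = fit_centroids([o + p for o, p in zip(omics, pheno)], labels)

# Late fusion: train one model per view, then combine their scores.
omics_model = fit_centroids(omics, labels)
pheno_model = fit_centroids(pheno, labels)


def predict_early(o, p):
    return int(score(early_model, o + p) > 0.5)


def predict_late(o, p):
    combined = (score(omics_model, o) + score(pheno_model, p)) / 2
    return int(combined > 0.5)


print(predict_early([0.85, 0.75], [2.9]), predict_late([0.85, 0.75], [2.9]))
```

In the early-fusion model, features with larger numeric ranges dominate the distance unless the views are scaled first; late fusion avoids this because each view is scored on its own.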
5.1 Matrix Factorization Methods
Matrix factorization, also called matrix decomposition, factors a matrix into a product of matrices. The objective is to infer latent factors that explain interpatient variance within and across omics modalities [61], and to project the data into a dimension-reduced space. Matrix factorization has been used as a feature reduction step for downstream supervised classification. Among the matrix factorization methods, nonnegative matrix factorization (NMF) is well known for the direct interpretability of its latent factors and is therefore widely used for disease subtyping [52]. In [62], Zhang et al. applied NMF to multiplatform genomic profiles across 385 samples (patients) and identified 200 multidimensional modules (md-modules). They found that several md-modules differed significantly in median survival time for ovarian cancer. However, when applying NMF to multiview data, a major challenge is to limit the search of factorizations to those that give meaningful clustering results across multiple views simultaneously. To address the problem, researchers extended the traditional NMF to joint NMF to ensure the uniqueness and correctness of factorizations for multiview data [63].
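A single-view NMF can be sketched in a few lines. The example below is a minimal illustration, assuming the classic multiplicative update rules for the squared (Frobenius) reconstruction error; it is not the joint NMF of [63] nor the pipeline used in [62], and the toy matrix, function names, and parameters are hypothetical.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh))


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


# Hypothetical 4 patients x 3 features matrix with two visible blocks.
V = [[5, 4, 0], [4, 5, 0], [0, 1, 5], [0, 0, 4]]
W, H = nmf(V, rank=2)
print("reconstruction error:", round(frob_error(V, W, H), 4))
```

Each row of W gives a patient's loading on the latent factors, so assigning each patient to the factor with the largest loading yields a simple subtype label.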
5.2 Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a traditional method to investigate the relationship between two multivariate sets of variables (vectors), for example, a set of SNPs and a combination of phenotypic variables. CCA can be defined mathematically as follows: given two sets of variables x and y, the problem is to find two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. CCA can best explain the variability both within and between variable sets, which is appropriate in integrative multiomics data analysis, for example, to discover the coexpressed and coregulated genes and their associated SNPs [64]. However, the high dimensionality of multiomics data may result in a multicollinearity problem and thus computational difficulty. To tackle the issue, researchers commonly first apply dimension reduction using principal component analysis (PCA), or improve the conventional CCA model by introducing a sparsity penalty or by considering groups of features instead of individual features. Qiu et al. applied sparse multiple discriminative CCA to a large multiomics dataset to identify osteoporosis biomarkers as well as their biological interactions and causal mechanisms [65]. They first performed individual transcriptomic, methylomic, and metabolomic analyses in 119 Caucasian female subjects with high (n = 61) and low (n = 58) bone mineral density (BMD) to identify potential differentially expressed genes (DEGs), differentially methylated CpG sites (DMCs), and differential metabolic products (DMPs) for osteoporosis risk. Then they integrated the identified DEGs, DMCs, and DMPs via a sparse multiple discriminative canonical correlation analysis to retrieve prominent osteoporosis biomarkers that not only distinguish the high-BMD and low-BMD groups but also are highly correlated across different biological layers.

5.3 High-Order Tensor Decomposition
Tensor decomposition (or factorization) has recently gained interest in bioinformatics and biomedical informatics. A tensor is a multidimensional array in which each modality spans one dimension (a mode of the tensor). Mathematically, a first-order tensor is a vector, a second-order tensor is a matrix, and tensors of order three or higher are called higher-order tensors (e.g., a third-order tensor is like a cube). Tensor decomposition decomposes a tensor into factor matrices, which not only reduces dimensionality but also helps discover latent groups in each modality and possibly identify group-wise associations. Tensor decomposition has been applied to integrative DNA microarray data from different studies [66]; an individual–gene–tissue tensor to identify gene networks from multitissue gene expression experiments [67]; a subject–pathway–variant tensor for grouping pathways in a way that reflects molecular mechanism interactions [12]; and longitudinal EHR data for finding precise disease subtypes [68].
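As a small illustration of what "decomposing a tensor into factors" means, the sketch below fits a single rank-1 component to a third-order tensor by alternating least squares, so that each entry is approximated by the product of one loading per mode. The subject, gene, and tissue loadings are made-up numbers, and the function names are hypothetical. The cited studies use richer models with several components and additional constraints.

```python
def rank1_cp(x, n_sweeps=10):
    """Fit x[i][j][k] ~ a[i] * b[j] * c[k] by alternating least squares.

    Assumes x is a nonzero third-order tensor stored as nested lists.
    """
    n_i, n_j, n_k = len(x), len(x[0]), len(x[0][0])
    a, b, c = [1.0] * n_i, [1.0] * n_j, [1.0] * n_k
    for _ in range(n_sweeps):
        # Update each factor in turn while holding the other two fixed.
        nb, nc = sum(v * v for v in b), sum(v * v for v in c)
        a = [sum(x[i][j][k] * b[j] * c[k] for j in range(n_j) for k in range(n_k)) / (nb * nc)
             for i in range(n_i)]
        na = sum(v * v for v in a)
        b = [sum(x[i][j][k] * a[i] * c[k] for i in range(n_i) for k in range(n_k)) / (na * nc)
             for j in range(n_j)]
        nb = sum(v * v for v in b)
        c = [sum(x[i][j][k] * a[i] * b[j] for i in range(n_i) for j in range(n_j)) / (na * nb)
             for k in range(n_k)]
    return a, b, c

def reconstruct(a, b, c):
    """Rebuild the rank-1 tensor from its three factor vectors."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]

# Hypothetical loadings for 3 subjects, 2 genes, and 2 tissues.
subject = [1.0, 2.0, 0.5]
gene = [2.0, 1.0]
tissue = [1.0, 3.0]
X = reconstruct(subject, gene, tissue)  # an exactly rank-1 subject x gene x tissue tensor

a, b, c = rank1_cp(X)
X_hat = reconstruct(a, b, c)
max_abs_error = max(abs(X[i][j][k] - X_hat[i][j][k])
                    for i in range(3) for j in range(2) for k in range(2))
print(max_abs_error)
```

The recovered vectors are only defined up to scaling between modes, so it is their product, not the individual loadings, that should be compared with the original tensor.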
5.4 Network-Based Methods
The network-based approach is a promising strategy for integrating and interpreting multiomics datasets. Biological systems can be abstracted as a series of networks composed of interconnected molecular entities [69] and can thus be visualized and analyzed through network-based approaches. Typically, a node in the network represents a molecular entity (e.g., a protein or gene), and an edge links two nodes if an interaction exists between them. Examples include protein–protein interaction (PPI) networks and metabolic networks. In some cases, nodes can represent different data types (e.g., genes and phenotypes) to form a heterogeneous network. Having constructed the network, researchers can use network- or graph-based analyses and algorithms to detect significant genes within pathways, discover subclusters, or find coexpression network modules [70].
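The node-and-edge abstraction described above can be sketched in a few lines. The example below builds an undirected interaction network from an edge list, ranks nodes by degree, and extracts connected components as a crude stand-in for subclusters. The node labels `G1` to `G6` and their interactions are invented for illustration and do not correspond to real genes or proteins. Dedicated module-detection algorithms are considerably more elaborate.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) interaction pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def connected_components(adj):
    """Return the connected components of the graph, found by breadth-first search."""
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Hypothetical interaction list.
edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3"), ("G3", "G6"), ("G4", "G5")]
graph = build_graph(edges)
degree = {node: len(neighbors) for node, neighbors in graph.items()}
hub = max(degree, key=degree.get)          # most connected node
components = connected_components(graph)   # candidate subclusters
print(hub, components)
```

A heterogeneous network could be represented the same way by prefixing node labels with their type (for example, a gene node and a phenotype node), at the cost of losing type-specific edge semantics.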
5.5 Deep Learning Approach
Deep learning (DL), enabled by advances in computational power, has demonstrated the capacity to learn feature representations from complex and diverse datasets such as images and text, and can be used in both unsupervised and supervised settings. Researchers have started to use DL approaches in areas of biology and biomedicine where data can be represented analogously to images and text.
Omics and Phenotypic Data Integration for Precision Medicine
5.5.1 Deep Autoencoder
A deep autoencoder is a type of artificial neural network that learns efficient data representations in an unsupervised manner and has proven effective in improving downstream tasks such as classification [71]. Compared with methods such as PCA or matrix factorization, the deep autoencoder has the advantage of capturing complex, nonlinear relationships among features. Recent studies used variational autoencoders (VAEs) for nonlinear latent factor inference from gene expression and mutation data. Using the latent factors, they identified colorectal cancer subtypes and predicted patient survival [72]. In another work called PathME, researchers developed an unsupervised multimodal neural network architecture to learn a low-dimensional embedding of omics features from multiple sources that can be mapped to the same biological pathway. They used a deep denoising autoencoder to combine and compress the high-dimensional omics features into a single lower-dimensional representation matrix, which provides a score for each patient in each pathway. They subsequently applied NMF to this lower-dimensional representation matrix to identify subgroups of patients [3]. Zhang et al. [73] first used an autoencoder to integrate multiomics data into a unified lower-dimensional representation. They then applied K-means clustering to the representation to identify two subtypes with significant survival differences. Their results indicated that the autoencoder outperformed alternative feature reduction approaches in classification tasks.
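The encode-then-reconstruct principle behind these models can be illustrated with a deliberately small sketch: a single hidden layer with a tanh nonlinearity, trained by full-batch gradient descent to reconstruct its own input. This is a simplification and not the denoising or variational architectures of the cited studies. The toy data, layer size, learning rate, and function names are all assumptions for illustration.

```python
import math
import random

def train_autoencoder(data, n_hidden=2, lr=0.1, n_epochs=500, seed=0):
    """Train x -> tanh(W1 x + b1) -> W2 z + b2 to minimize mean squared reconstruction error."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(d)]
    b2 = [0.0] * d

    def encode(x):
        return [math.tanh(sum(w1[j][i] * x[i] for i in range(d)) + b1[j])
                for j in range(n_hidden)]

    losses = []
    for _ in range(n_epochs):
        gw1 = [[0.0] * d for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gw2 = [[0.0] * n_hidden for _ in range(d)]
        gb2 = [0.0] * d
        loss = 0.0
        for x in data:
            z = encode(x)                                   # low-dimensional embedding
            x_hat = [sum(w2[i][j] * z[j] for j in range(n_hidden)) + b2[i]
                     for i in range(d)]                     # reconstruction
            err = [x_hat[i] - x[i] for i in range(d)]
            loss += sum(e * e for e in err) / (d * n)
            # Backpropagate the mean squared error through decoder and encoder.
            dx_hat = [2.0 * e / (d * n) for e in err]
            dz = [sum(dx_hat[i] * w2[i][j] for i in range(d)) for j in range(n_hidden)]
            dh = [dz[j] * (1.0 - z[j] ** 2) for j in range(n_hidden)]
            for i in range(d):
                gb2[i] += dx_hat[i]
                for j in range(n_hidden):
                    gw2[i][j] += dx_hat[i] * z[j]
            for j in range(n_hidden):
                gb1[j] += dh[j]
                for i in range(d):
                    gw1[j][i] += dh[j] * x[i]
        losses.append(loss)
        # Gradient descent step on all parameters.
        for i in range(d):
            b2[i] -= lr * gb2[i]
            for j in range(n_hidden):
                w2[i][j] -= lr * gw2[i][j]
        for j in range(n_hidden):
            b1[j] -= lr * gb1[j]
            for i in range(d):
                w1[j][i] -= lr * gw1[j][i]
    return encode, losses

# Hypothetical scaled omics features for 6 patients (rows) and 4 features (columns).
DATA = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1], [1.0, 1.0, 0.1, 0.1],
        [0.0, 0.1, 1.0, 0.9], [0.1, 0.0, 0.9, 1.0], [0.1, 0.1, 1.0, 1.0]]

encode, losses = train_autoencoder(DATA)
embeddings = [encode(x) for x in DATA]   # inputs for downstream clustering or classification
print(losses[0], losses[-1])
```

In a workflow like those described above, the `embeddings` would then be passed to a clustering method such as K-means or NMF. In practice one would use a deep learning framework with several layers rather than hand-written gradients.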
5.5.2 Graph Neural Networks
Graph neural networks (GNNs) are deep learning models designed for graph data and used to address graph-related tasks in an end-to-end manner [74]. For graph-structured data, a GNN accepts the element features X (of nodes or edges) and the graph structure A as input and learns an embedding of the graph. Compared with other deep learning approaches, a GNN leverages both the features of the nodes and the relationships between them. GNNs are also flexible and can be used for different tasks, such as link prediction, node classification, and graph classification. Researchers have also applied GNNs to biological data. For example, studies have applied GNNs to the graph structure of molecules or compounds to learn molecular fingerprints and predict molecular properties [75]. Recently, Wang et al. introduced an integrative method that applies multi-omics graph convolutional networks to mRNA expression, DNA methylation, and miRNA expression data for predicting phenotypes [11].
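How a GNN combines node features X with graph structure A can be seen in a single forward pass. The sketch below uses one widely used propagation rule, the graph convolutional layer ReLU(D^-1/2 (A + I) D^-1/2 X W), where D holds the row sums of A + I. Whether the cited works use exactly this rule is an assumption here. The three-node graph, feature values, and weight matrix are made up, and training is omitted.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degree including the added self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def gcn_layer(adj, x, w):
    """One graph convolution: each node averages its neighborhood, then applies W and ReLU."""
    a_norm = normalize_adjacency(adj)
    n, d_in, d_out = len(x), len(w), len(w[0])
    ax = [[sum(a_norm[i][k] * x[k][f] for k in range(n)) for f in range(d_in)]
          for i in range(n)]
    return [[max(0.0, sum(ax[i][f] * w[f][o] for f in range(d_in))) for o in range(d_out)]
            for i in range(n)]

# Hypothetical graph of 3 nodes (e.g., samples) connected in a path 0 - 1 - 2.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
X = [[1.0], [2.0], [3.0]]   # one feature per node
W = [[1.0]]                 # untrained weight, fixed for illustration

H = gcn_layer(A, X, W)
print(H)
```

Stacking such layers lets information travel across several edges, and a final classification layer on the node embeddings `H` gives node-level predictions such as phenotype labels.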
6 Challenges and Prospects
Multiomics data and phenotypic data from diverse sources are snowballing, and how to link different types of data remains a challenge. Data standards and quality controls are available for single-omic data but absent for multiomics experiments. Although the OMOP Common Data Model has been developed to enhance sharing of EHR data, there is still a lack of standards for linking EHR to omics data. Meanwhile, as more individualized data are connected from multiple sources, ranging from molecular data and clinical conditions to data surrounding people's daily lives such as wearable watches and social media, people may raise serious concerns about data privacy and the leakage of personal information. Improving data linkage while providing a decentralized, traceable, undeniable, and privacy-controlled system requires efforts on multiple fronts: adoption of new technologies (e.g., blockchain [76, 77]), collaborations, and the creation of new regulations and business models [78]. Another challenge is the interpretability of models. Deep learning models are capable of learning intricate patterns but suffer from the black-box issue. Identifying actionable biomarkers from predictive models is essential for translating the knowledge to clinical practice. There are many efforts to decode the black box and improve the interpretability of models, such as SHAP (SHapley Additive exPlanations) [79]. Other challenges, such as missing data and uncertainty or noise in omics or phenotypic data, could bias the models and thus affect health equity [80, 81]. Therefore, it is essential to evaluate bias, improve the diversity of cohorts, and adopt techniques to mitigate the risk.
7 Conclusion
The quantity of digitized health and biological information has increased exponentially over the past decade, creating opportunities and challenges for precision medicine. Active studies have leveraged advanced computing methods to consolidate these heterogeneous, disconnected data from multiple sources to facilitate precision medicine. We introduced studies and methods that focus on identifying the connections between genomics and phenotypes, characterizing the subtypes of complex diseases for treatment recommendation, and building robust prediction models to improve diagnosis and therapeutic interventions. Still, data aggregation and technical challenges, including interpretation, visualization, and evaluating and mitigating bias, need to be resolved.
References
1. Delude CM (2015) Deep phenotyping: the details of disease. Nature 527:S14–S15
2. Subramanian I, Verma S, Kumar S et al (2020) Multi-omics data integration, interpretation, and its application. Bioinform Biol Insights 14:1177932219899051
3. Domingo-Fernández D, Mubeen S, Marín-Llaó J et al (2019) PathMe: merging and exploring mechanistic pathway knowledge. BMC Bioinformatics 20:243
4. Liu S-H, Shen P-C, Chen C-Y et al (2020) DriverDBv3: a multi-omics database for cancer driver gene research. Nucleic Acids Res 48:D863–D870
5. Cancer Genome Atlas Network (2012) Comprehensive molecular portraits of human breast tumours. Nature 490:61–70
6. Drew L (2016) Pharmacogenetics: the right drug for you. Nature 537:S60–S62
7. Bibault J-E, Giraud P, Housset M et al (2018) Deep learning and radiomics predict complete response after neo-adjuvant chemoradiation for locally advanced rectal cancer. Sci Rep 8:12611
8. Shen R, Olshen AB, Ladanyi M (2009) Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics 25:2906–2912
9. Zhu Z, Albadawy E, Saha A et al (2019) Deep learning for identifying radiogenomic associations in breast cancer. Comput Biol Med 109:85–90
10. Lin Y, Zhang W, Cao H et al (2020) Classifying breast cancer subtypes using deep neural networks based on multi-omics data. Genes 11:888
11. Wang T, Shao W, Huang Z et al (2020) MORONET: multi-omics integration via graph convolutional networks for biomedical data classification. Nat Commun 12:3445
12. Luo Y, Mao C (2020) PANTHER: Pathway Augmented Nonnegative Tensor factorization for HighER-order feature learning. arXiv preprint arXiv:2012.08580
13. Zhao J, Feng Q, Wu P et al (2019) Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction. Sci Rep 9:717
14. Conesa A, Beck S (2019) Making multi-omics data accessible to researchers. Sci Data 6:251
15. Subramanian A, Narayan R, Corsello SM et al (2017) A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171:1437–1452.e17
16. Bycroft C, Freeman C, Petkova D et al (2018) The UK Biobank resource with deep phenotyping and genomic data. Nature 562:203–209
17. Nagai A, Hirata M, Kamatani Y et al (2017) Overview of the BioBank Japan Project: study design and profile. J Epidemiol 27:S2–S8
18. Pendergrass SA, Crawford DC (2019) Using electronic health records to generate phenotypes for research. Curr Protoc Hum Genet 100:e80
19. Wei W-Q, Denny JC (2015) Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome Med 7:41
20. Wei W-Q, Teixeira PL, Mo H et al (2016) Combining billing codes, clinical notes, and medications from electronic health records provides superior phenotyping performance. J Am Med Inform Assoc 23:e20–e27
21. Wei W-Q, Bastarache LA, Carroll RJ et al (2017) Evaluating phecodes, clinical classification software, and ICD-9-CM codes for phenome-wide association studies in the electronic health record. PLoS One 12:1–16
22. Wu P, Gifford A, Meng X et al (2019) Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med Inform 7:e14325
23. Zheng NS, Feng Q, Kerchberger VE et al (2020) PheMap: a multi-resource knowledge base for high-throughput phenotyping within electronic health records. J Am Med Inform Assoc 27:1675–1687
24. Estiri H, Strasser ZH, Murphy SN (2020) High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 28:772–781
25. Burn E, You SC, Sena AG et al (2020) Deep phenotyping of 34,128 adult patients hospitalised with COVID-19 in an international network study. Nat Commun 11:5009
26. Pardamean B, Soeparno H, Budiarto A et al (2020) Quantified self-using consumer wearable device: predicting physical and mental health. Healthc Inform Res 26:83–92
27. COVID-19 CVD registry. https://www.heart.org/en/professional/quality-improvement/covid-19-cvd-registry
28. Welter D, MacArthur J, Morales J et al (2014) The NHGRI GWAS catalog, a curated resource of SNP-trait associations. Nucleic Acids Res 42:D1001–D1006
29. Denny JC, Ritchie MD, Basford MA et al (2010) PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene–disease associations. Bioinformatics 26:1205–1210
30. Verma A, Verma SS, Pendergrass SA et al (2016) eMERGE Phenome-Wide Association Study (PheWAS) identifies clinical associations and pleiotropy for stop-gain variants. BMC Med Genet 9:32
31. Gaulton KJ, Ferreira T, Lee Y et al (2015) Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci. Nat Genet 47:1415–1425
32. Khera AV, Emdin CA, Drake I et al (2016) Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med 375:2349–2358
33. Denny JC, Crawford DC, Ritchie MD et al (2011) Variants near FOXE1 are associated with hypothyroidism and other thyroid conditions: using electronic medical records for genome- and phenome-wide studies. Am J Hum Genet 89:529–542
34. Müller B, Wilcke A, Boulesteix A-L et al (2016) Improved prediction of complex diseases by common genetic markers: state of the art and further perspectives. Hum Genet 135:259–272
35. Qin H, Niu T, Zhao J (2019) Identifying multi-omics causers and causal pathways for complex traits. Front Genet 10:110
36. Daly AK, Donaldson PT, Bhatnagar P et al (2009) HLA-B*5701 genotype is a major determinant of drug-induced liver injury due to flucloxacillin. Nat Genet 41:816–819
37. Hindorf U, Lindqvist M, Peterson C et al (2006) Pharmacogenetics during standardised initiation of thiopurine treatment in inflammatory bowel disease. Gut 55:1423–1431
38. Relling MV, Gardner EE, Sandborn WJ et al (2011) Clinical Pharmacogenetics Implementation Consortium guidelines for thiopurine methyltransferase genotype and thiopurine dosing. Clin Pharmacol Ther 89:387–391
39. Yang S-K, Hong M, Baek J et al (2014) A common missense variant in NUDT15 confers susceptibility to thiopurine-induced leukopenia. Nat Genet 46:1017–1020
40. Caudle KE, Klein TE, Hoffman JM et al (2014) Incorporation of pharmacogenomics into routine clinical practice: the Clinical Pharmacogenetics Implementation Consortium (CPIC) guideline development process. Curr Drug Metab 15:209–217
41. Wilke RA, Ramsey LB, Johnson SG et al (2012) The clinical pharmacogenomics implementation consortium: CPIC guideline for SLCO1B1 and simvastatin-induced myopathy. Clin Pharmacol Ther 92:112–117
42. Iorio F, Knijnenburg TA, Vis DJ et al (2016) A landscape of pharmacogenomic interactions in cancer. Cell 166:740–754
43. Yuan H, Paskov I, Paskov H et al (2016) Multitask learning improves prediction of cancer drug sensitivity. Sci Rep 6:31619
44. Aben N, Vis DJ, Michaut M et al (2016) TANDEM: a two-stage approach to maximize interpretability of drug response models based on multiple molecular data types. Bioinformatics 32:i413–i420
45. Ali M, Aittokallio T (2019) Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys Rev 11:31–39
46. Gurwitz D, Pirmohamed M (2010) Pharmacogenomics: the importance of accurate phenotypes. Pharmacogenomics 11:469–470
47. Namerow LB, Walker SA, Loftus M et al (2020) Pharmacogenomics: an update for child and adolescent psychiatry. Curr Psychiatry Rep 22:26
48. Bishop JR (2018) Chapter 6: pharmacogenetics. In: Geschwind DH, Paulson HL, Klein C (eds) Handbook of clinical neurology. Elsevier, pp 59–73
49. Trivizakis E, Papadakis GZ, Souglakos I et al (2020) Artificial intelligence radiogenomics for advancing precision and effectiveness in oncologic care (Review). Int J Oncol 57:43–53
50. Trivizakis E, Manikis GC, Nikiforaki K et al (2019) Extending 2-D convolutional neural networks to 3-D for advancing deep learning cancer classification with application to MRI liver tumor differentiation. IEEE J Biomed Health Inform 23:923–930
51. Zhang H, Liu T, Zhang Z et al (2016) Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell 166:755–765
52. Zhao J, Feng Q, Wu P et al (2019) Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: a case study of Lipoprotein(a) (LPA). PLoS One 14:e0212112
53. Li L, Cheng W-Y, Glicksberg BS et al (2015) Identification of type 2 diabetes subgroups through topological analysis of patient similarity. Sci Transl Med 7:311ra174
54. Mankoo PK, Shen R, Schultz N et al (2011) Time to recurrence and survival in serous ovarian tumors predicted from integrated genomic profiles. PLoS One 6:e24709
55. Zhu B, Song N, Shen R et al (2017) Integrating clinical and multiple omics data for prognostic assessment across human cancers. Sci Rep 7:16954
56. Murillo J, Villegas LM, Ulloa-Murillo LM et al (2021) Recent trends on omics and bioinformatics approaches to study SARS-CoV-2: a bibliometric analysis and mini-review. Comput Biol Med 128:104162
57. Wang H, Li X, Li T et al (2020) The genetic sequence, origin, and diagnosis of SARS-CoV-2. Eur J Clin Microbiol Infect Dis:1–7
58. Overmyer KA, Shishkova E, Miller IJ et al (2021) Large-scale multi-omic analysis of COVID-19 severity. Cell Syst 12:23
59. Su Y, Chen D, Yuan D et al (2020) Multiomics resolves a sharp disease-state shift between mild and moderate COVID-19. Cell 183:1479–1495.e20
60. Wang F, Preininger A (2019) AI in health: state of the art, challenges, and future directions. Yearb Med Inform 28:16–26
61. Ulfenborg B (2019) Vertical and horizontal integration of multi-omics data with miodin. BMC Bioinformatics 20:649
62. Zhang S, Zhou XJ (2014) Matrix factorization methods for integrative cancer genomics. Methods Mol Biol 1176:229–242
63. Liu J, Wang C, Gao J et al (2013) Multi-view clustering via joint nonnegative matrix factorization. In: Proceedings of the 2013 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, pp 252–260
64. Lin D, Zhang J, Li J et al (2013) Group sparse canonical correlation analysis for genomic data integration. BMC Bioinformatics 14:245
65. Qiu C, Yu F, Su K et al (2020) Multi-omics data integration for identifying osteoporosis biomarkers and their biological interaction and causal mechanisms. iScience 23:100847
66. Omberg L, Golub GH, Alter O (2007) A tensor higher-order singular value decomposition for integrative analysis of DNA microarray data from different studies. PNAS 104:18371–18376
67. Hore V, Viñuela A, Buil A et al (2016) Tensor decomposition for multiple-tissue gene expression experiments. Nat Genet 48:1094–1100
68. Zhao J, Zhang Y, Schlueter DJ et al (2019) Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: cardiovascular disease case study. J Biomed Inform 98:103270
69. Zhou G, Li S, Xia J (2020) Network-based approaches for multi-omics integration. In: Li S (ed) Computational methods and data analysis for metabolomics. Springer, New York, NY, pp 469–487
70. Huang S, Chaudhary K, Garmire LX (2017) More is better: recent progress in multi-omics data integration methods. Front Genet 8:84
71. Vincent P, Larochelle H, Lajoie I et al (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. JMLR 11:3371–3408
72. Ronen J, Hayat S, Akalin A (2019) Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Sci Alliance 2:e201900517
73. Zhang L, Lv C, Jin Y et al (2018) Deep learning-based multi-omics data integration reveals two prognostic subtypes in high-risk neuroblastoma. Front Genet 9:477
74. Wu Z, Pan S, Chen F et al (2021) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32:4–24
75. Gilmer J, Schoenholz SS, Riley PF et al (2017) Neural message passing for quantum chemistry. In: International conference on machine learning. PMLR, pp 1263–1272
76. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data access and permission management. In: 2016 2nd international conference on open and big data (OBD), pp 25–30. https://doi.org/10.1109/OBD.2016.11
77. Kuo T-T, Ohno-Machado L (2018) ModelChain: decentralized privacy-preserving healthcare predictive modeling framework on private blockchain networks. arXiv:1802.01746 [cs]
78. Johnson KB, Wei W-Q, Weeraratne D et al (2021) Precision medicine, AI, and the future of personalized health care. Clin Transl Sci 14:86–93
79. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems. Curran Associates Inc., Red Hook, NY, pp 4768–4777
80. Ferryman K, Pitcan M (2018) Fairness in precision medicine. Data & Society, 1
81. Rajkomar A, Hardt M, Howell MD et al (2018) Ensuring fairness in machine learning to advance health equity. Ann Intern Med 169:866–872
Chapter 3
Stevens–Johnson Syndrome and Toxic Epidermal Necrolysis in the Era of Systems Medicine
Chun-Bing Chen, Chuang-Wei Wang, and Wen-Hung Chung
Abstract
Stevens–Johnson syndrome and toxic epidermal necrolysis (SJS/TEN) are severe mucocutaneous bullous disorders characterized by widespread skin and mucosal necrosis and detachment, which are most commonly triggered by medications. Despite their rarity, these severe cutaneous adverse drug reactions result in high mortality and morbidity as well as long-term sequelae. The immunopathologic mechanism is mainly a cell-mediated cytotoxic reaction against keratinocytes leading to massive skin necrolysis. Subsequent studies have demonstrated that the immune synapse composed of cytotoxic T cells with drug-specific human leukocyte antigen (HLA) class I restriction and the T cell receptor (TCR) repertoire is key to the pathogenesis of SJS/TEN. Various cytotoxic proteins and cytokines, such as soluble granulysin, perforin, granzyme B, interleukin-15, Fas ligand, interferon-γ, and tumor necrosis factor-α, have been identified as mediators involved in the pathogenesis of SJS/TEN. Early recognition, immediate withdrawal of causative agents, and critical multidisciplinary supportive care are key to the management of SJS/TEN. To date, there is no sufficient consensus or recommendation on immunomodulatory treatment for SJS/TEN. Systemic corticosteroids remain one of the most common treatment options for SJS/TEN, though their efficacy remains uncertain. Currently, there is increasing evidence that cyclosporine and TNF-α inhibitors decrease the mortality of SJS/TEN. Further multicenter, double-blind, randomized, placebo-controlled trials are required to confirm their efficacy and safety.
Key words Cytotoxic T lymphocyte, Granulysin, Human leukocyte antigen, Severe cutaneous adverse reaction, Stevens–Johnson syndrome, T cell receptors, Toxic epidermal necrolysis
Abbreviations
APC      Antigen-presenting cells
CTL      Cytotoxic T lymphocyte
CTLA-4   Cytotoxic T-lymphocyte-associated protein 4
EMM      Erythema multiforme major
HLA      Human leukocyte antigen
PD1      Programmed death-1
IFN-γ    Interferon-γ
SCAR     Severe cutaneous adverse reaction
SJS      Stevens–Johnson syndrome
TEN      Toxic epidermal necrolysis
TCR      T cell receptors
TNF-α    Tumor necrosis factor-α

Jane P.F. Bai and Junguk Hur (eds.), Systems Medicine, Methods in Molecular Biology, vol. 2486, https://doi.org/10.1007/978-1-0716-2265-0_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction and Terminology
Stevens–Johnson syndrome (SJS) and toxic epidermal necrolysis (TEN) are life-threatening immune-mediated mucocutaneous bullous diseases that are considered to be primarily T-cell mediated. SJS and TEN are thought to be the same disease across a spectrum of severity defined by the percentage of body surface area (BSA) involved: SJS (<10%), SJS/TEN overlap (10–30%), and TEN (>30%) (Table 1) [1]. The term “SJS/TEN” is used to refer to SJS, TEN, and SJS/TEN overlap syndrome as a whole spectrum of this entity. SJS/TEN often presents with widespread erythematous or violaceous macules (spots), atypical targetoid flat macules and patches, and blisters, primarily on the trunk (Fig. 1) [2]. Patients with TEN (TEN with maculae) usually have confluent macules, atypical targets, and blisters that progress to generalized skin sloughing [2]. TEN patients can sometimes present with large areas of erythema without confluent macules, which has been described as “TEN without spots” or “TEN on large erythema” [2]. In addition, mucocutaneous eruption is the characteristic feature of SJS/TEN and the oral mucosa is more
Table 1 Classification of the definition in the spectrum of erythema multiforme, Stevens–Johnson syndrome, and toxic epidermal necrolysis

                      EM major   SJS   SJS/TEN overlap   TEN with maculae   TEN on large erythema
Mucosal involvement   +          +     +                 +                  +
Skin

75 years of age and older, members of racial and ethnic minorities, and pregnant patients. The inclusion of such broader patient populations has already identified differential benefits of SGLT2 inhibitor therapy in the treatment of patients with chronic kidney disease in diabetes [77] and ASCVD in the elderly [78].
Eileen Navarro Almario et al.
2.2 Analyses of Clinical Data
2.2.1 Analysis of Outcomes in the Study Population
Efficacy
The intent of the evidence of effectiveness in a regulatory submission is to test a hypothesis based on a prespecified analysis plan. In general, consistent findings from two adequate and well-controlled studies, “each convincing on its own,” are needed to substantiate a conclusion that the drug has the effectiveness it is purported to have for the indicated use [35]. The guidance notes that two studies in distinct populations using the same endpoint, or two identical studies using different but related clinical endpoints, illustrate settings where the two studies could provide complementary information that constitutes substantial evidence. The guidance also describes the settings where one large multicenter trial, with confirmation, can establish effectiveness. Neither of these circumstances is likely to provide sufficient clinical trial information to yield predictive estimates of the outcome in the individuals likely to receive treatment unless the large multicenter trial has sufficient representation of relevant disease phenotypes. This challenge is even more pronounced in trials for diseases of unmet need, where mining data from other sources can provide greater opportunity if those data are accessible for methods of clinical investigation. Data from complementary studies (i.e., mechanistic studies, studies in a closely related indication [e.g., the approval for new CF variants that relied on proven activity in the predominant CF phenotype], natural history, and studies of other agents in the same pharmacologic class) are likely sources of data that, if validated and compatible with regulatory systems, can potentially contribute to a model for outcome prediction. Nonetheless, an understanding of appropriate patient representation and the generalizability of the findings from individual trials also necessitates an assessment of the decrement between the randomized and analyzed populations.
Study withdrawals can have multiple causes, including drug tolerability (toxicity) and futility of treatment (e.g., in an open-label setting). A comparison between the characteristics of all randomized subjects and those of subjects who completed the trial can provide meaningful insights into specific patient characteristics that are crucial for developing personalized treatment strategies. Likewise, a comparison of the proportion and causes of withdrawals and treatment discontinuations between treatment arms helps in the causality assessment of observed outcomes. Adequate description of withdrawals from study or treatment (treatment discontinuation) is expected in trial reports to develop adequate instructions for use. Therefore, attention needs to be focused on mitigation and prevention of missing data. A large amount of missing data raises concern for bias and diminishes confidence in the results. Missing data in clinical trials usually are not missing at random. There can be many reasons for discontinuation of therapy and study participation, such as lack of efficacy, adverse events, or toxicity [79]. For example, in trials of an inhaled antibiotic delivered through a patient-activated device, frail subjects with diminished manual dexterity withdrew from therapy; subjects who dropped out of the trial early might have a different prognosis from subjects who completed the trial. Likewise, early discontinuation of treatment-experienced patients from trials of protein-based therapies could result from secondary failures (from preexisting anti-drug antibodies [80]) and safety concerns (prior sensitization). All these reasons for withdrawal from the study could lead to missing data and are tied to estimates of different outcomes. To address this issue, the National Research Council (NRC) Panel on Handling Missing Data in Clinical Trials recommended examining sensitivity to the assumptions about the missing data and making that examination a mandatory component of reporting (recommendation 15) [81]. To further address missing data, the addendum to the statistical principles for clinical trials, ICH E9 (R1), emphasizes five attributes that together comprise a precise definition of the estimand [82]:
1. The treatment condition of interest and, as appropriate, the alternative treatment condition to which comparison will be made.
2. The population, that is, the patients targeted by the scientific question.
3. The variable (or endpoint) to be measured for each patient to address the scientific question.
4. The specification of how to account for intercurrent events to reflect the scientific question of interest.
5. The population-level summary for the variable that provides a basis for a comparison between treatment conditions.
The estimand should be outlined early in the trial design process: after identifying the objectives and before estimating the required sample size, planning the assessment schedule, and choosing analysis methods. Once the estimand is defined, the estimators (statistical analysis methods) should be chosen to align with the estimand.
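The recommendation to examine sensitivity to missing-data assumptions can be illustrated with a small worked example. The sketch below computes the difference in response rate between two arms under several simple imputation scenarios for a binary outcome. The outcome vectors and scenario names are hypothetical, and real sensitivity analyses typically rely on model-based approaches (for example, multiple imputation or tipping-point analyses) rather than these bounding cases.

```python
def response_rate(outcomes, fill):
    """Proportion of responders after replacing missing outcomes (None) with `fill`.

    With fill=None, missing outcomes are dropped (complete-case analysis).
    """
    if fill is None:
        values = [o for o in outcomes if o is not None]
    else:
        values = [fill if o is None else o for o in outcomes]
    return sum(values) / len(values)

def sensitivity_table(treated, control):
    """Treatment-minus-control difference in response rate under each assumption."""
    scenarios = {
        "complete case": (None, None),
        "missing = non-responder": (0, 0),
        "worst case for treatment": (0, 1),   # treated missing fail, control missing respond
        "best case for treatment": (1, 0),    # treated missing respond, control missing fail
    }
    return {name: response_rate(treated, fill_t) - response_rate(control, fill_c)
            for name, (fill_t, fill_c) in scenarios.items()}

# Hypothetical binary outcomes (1 = responder, 0 = non-responder, None = missing).
treated = [1, 1, 1, 0, None, None, 1, 0]
control = [1, 0, 0, 0, None, 1, 0, 0]

table = sensitivity_table(treated, control)
for name, difference in table.items():
    print(f"{name}: {difference:.3f}")
```

If the estimated effect keeps its sign and approximate size across plausible scenarios, the conclusion is less dependent on how the missing outcomes are handled. A wide spread signals that the handling of intercurrent events in the estimand needs closer attention.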
It is therefore crucial that missing data be addressed according to the chosen estimand.
Safety
The statutory basis for the demonstration of safety of approved products is the Food, Drug and Cosmetics Act [83], which states that “an application can be refused if it does not contain all tests reasonably applicable to show that it is safe” or “if the information” (submitted or otherwise “available”) is insufficient to determine “whether it is safe” when “used under proposed labeling.” Efficacy assessments are based on quantitative outcome estimates from the tabulated patient-level data, whereas safety assessments rely on exploratory analysis beyond the tabulated clinical trial data, as referenced in the FDC Act. Phase III trials are generally powered for efficacy but cannot be powered for all potential safety endpoints, as adverse reactions may not be apparent at the time of protocol development. Identification of safety issues relies on a universe of “reasonably applicable” information relevant to multiple safety endpoints to exclude potential risk [84]. In addition to an assessment of the adequacy of the submitted data, conduct of the safety review includes identification and assessment of adverse events through exploratory analyses, an assessment of the required special safety studies, and an assessment of the plan to manage the identified risks [85]. The MedDRA coding dictionary is generally used in clinical trials to report adverse events in a standard format, with specific advice on the coding of adverse events by sponsors, who “should ensure that adverse events were coded with minimal variability across studies,” which is most relevant for personalized medicine assessments. However, MedDRA includes >24,000 adverse event terms, which may include various terms (abdominal pain, upper abdominal pain, abdominal discomfort, etc.) for the same medical concept. Use of queries to group similar terms ensures that important medical concepts are not missed. As such, a safety review consists of comparative quantitative assessments and a qualitative assessment of all data in a regulatory submission as well as relevant published and unpublished information. At inception, review of emerging safety issues is exploratory, drawing on knowledge of disease mechanism and natural history, drug pharmacology, and background rates of events in the study population, and most closely reflects the abductive process of disease diagnosis in clinical practice. Analytic standards for visualization of safety data are as important as data standards to enable comparison of safety issues across marketing applications for drugs and biologics in the same class.
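The effect of grouping similar adverse event terms, as described above, can be sketched with a toy data set. The example counts distinct subjects per treatment arm who reported any term in a group, and contrasts this with counting a single term. The grouping shown is a hypothetical illustration built from the terms mentioned in the text, not an official MedDRA query, and the subject records are invented.

```python
from collections import defaultdict

# Hypothetical grouping of similar terms; not an official standardized query.
TERM_GROUPS = {
    "abdominal pain (grouped)": {
        "abdominal pain",
        "upper abdominal pain",
        "abdominal discomfort",
    },
}

def subjects_with_event(records, terms):
    """Count distinct subjects per arm who reported at least one of `terms`."""
    subjects = defaultdict(set)
    for subject_id, arm, term in records:
        if term.lower() in terms:
            subjects[arm].add(subject_id)
    return {arm: len(ids) for arm, ids in subjects.items()}

# Hypothetical adverse event records: (subject, arm, reported term).
records = [
    ("S01", "drug", "abdominal pain"),
    ("S01", "drug", "upper abdominal pain"),   # same subject, counted once in the group
    ("S02", "drug", "abdominal discomfort"),
    ("S03", "drug", "headache"),
    ("S04", "placebo", "upper abdominal pain"),
    ("S05", "placebo", "nausea"),
]

single_term = subjects_with_event(records, {"abdominal pain"})
grouped = subjects_with_event(records, TERM_GROUPS["abdominal pain (grouped)"])
print(single_term, grouped)
```

Counting by the single term would suggest one affected subject on drug and none on placebo, whereas the grouped count reflects every subject with the underlying medical concept. This is the kind of difference that term grouping is intended to surface.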
Development of therapeutic-area-specific standards for common safety issues such as drug-induced liver injury (DILI) or kidney injury will ensure a consistent and efficient way to evaluate specific safety issues across all drugs. Review of the integrated summary of safety (ISS) links the observed events to a potential mechanism by synthesis of the entire range of in vitro, nonclinical, and special safety studies, clinical trial and postmarket experience. The application of prediction capabilities developed in other scientific disciplines to the biologic model of disease and pharmacology is an emerging discipline that has been explored in safety prediction [86] and facilitates the safety review goal of “predict(ion) of adverse reactions, by subject-related factors (e.g., age, sex, ethnicity, race, target illness, abnormalities of renal or hepatic function, comorbid illnesses, genetic characteristics (e.g., metabolic status), environment) and drug-related factors (e.g., dose, plasma level, duration of exposure, concomitant medication)” [84]. Analytic platforms that integrate knowledge from
existing biological systems networks to guide treatment precision are likely to emerge given advances in computer performance, simulation, and artificial intelligence algorithms. Ascertaining the utility of these prediction tools in regulatory review and their eventual contribution to precision medicine is an area of increased activity at FDA. Models adopted in risk assessment of candidate drugs in development include tools such as Drug Induced Liver Injury (DILI) Sim [87]; iterative model refinement could lead to clinical utility in DILI risk assessment and minimization for the individual patient. A gap that will need to be addressed is adequate documentation of patient data (including genomic information) alongside the phenotypic characterization of adverse events that occur in the context of the “real world” of care. Application of artificial intelligence to large data sets with varied information is expected to enable safety inference from observational data [88]. This capability will, however, present challenges in the review of regulatory submissions without medical informatics support and a staff trained in new methodologic analyses and their technical and clinical validation.
2.2.2 Outcomes in Subgroups
FDA implemented the Drug Trials Snapshots [89] to inform the public on the extent to which demographic subgroups are represented in clinical trials for medical products. Although the Snapshots information describes the safety and efficacy outcomes by baseline demographic, this information is not intended to represent directions for drug use by subgroup, unless addressed in the product label. The assessment of differential response of, and modifications of labeling instructions for, relevant patient subgroups has long been described in FDA requirements for the format and content of the Clinical and Statistical sections of an application, and in its guidance on clinical effectiveness and integrated summaries of safety and efficacy. While the integrated summary could refer to a description of “the nature of the drug’s effectiveness in demographic (e.g., age, sex, race, and ethnicity) and other subpopulations, into dose-response, and into onset and duration of effect” from data pooled from studies, the ISE and ISS also refer to a comparison of findings across the various studies in an NDA, “to better understand the overall results” with the intent to gain insight on how the drug should be used and draw from all available data supporting the outcomes assessed [85]. The ISE assesses the impact of prognostic baseline covariates (e.g., age, baseline severity, comorbidities, baseline medications and COVID-19 vaccination status) in the primary efficacy analysis of the relevant pivotal trials (for which the analysis plan should prespecify methods of covariate adjustment). To understand how benefit may vary in individuals, the ISE is intended to “examine study-to-study differences in results” (in the instance that studied populations or disease severity may differ) and “effects in subsets of the treated population, dose-response information from all sources, any available comparisons
with alternative drugs, and any other information, so that the nature of the drug’s effectiveness can be as fully defined as possible, and the user of the drug can be given the best possible information on how to use the drug and what results to expect” [33, 90]. The ISS is intended to elucidate the mechanism of the drug’s adverse action on biological processes in human disease and draws from all data sources available in the regulatory submission, including results from postmarket reports (for previously marketed products) and unpublished research. The availability of analysis results metadata [91] for the data sources in an aggregated analysis facilitates accounting for all data sources and traceability of conclusions to the collected data. Guidance is provided for planning the ISE and ISS submissions within an NDA submission, depending on the scope of the development program [92]. Abbreviated programs that rely on a small study database could include their integrated analyses with the summaries of safety and efficacy in module 2. Multicenter trials that contain sufficient numbers of diverse trial participants in the subgroups described in 21 CFR 314.50(d)(5)(v) may enable reliable estimates of effect in these subgroups. Because even large simple trials often may not contain enough patients to provide subgroup estimates of efficacy or safety, FDA provides this guidance on integrated analyses to assess whether consistency is seen across subgroups, trials, and related outcomes, or to consider further hypothesis-driven study when important differential activity is observed. An assessment of causality is a necessary part of the ISS and is facilitated by examining dose- and duration-response, as well as response to withdrawal or reexposure in a safety database, in addition to a comparison of event rates across active treatment and placebo (or comparator) arms.
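The comparison of event rates across treatment and comparator arms can be sketched as a risk ratio with an approximate confidence interval. This is a minimal illustration, assuming the standard large-sample interval on the log scale; the function name and the counts are hypothetical and do not come from any submission.

```python
import math


def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio (treatment vs. comparator) with an approximate 95% CI.

    Uses the large-sample standard error of log(RR); it is undefined when
    either arm has zero events, so that case is rejected explicitly.
    """
    if events_trt == 0 or events_ctl == 0:
        raise ValueError("log-scale interval needs at least one event per arm")
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 12 of 200 subjects with the event on drug, 6 of 200 on placebo.
rr, lower, upper = risk_ratio(12, 200, 6, 200)
```

With so few events the interval is wide and includes 1, which illustrates why quantitative rate comparisons alone rarely settle causality and are read alongside dose-response, dechallenge, and rechallenge information.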
An observed difference in outcome estimates of risk could support attribution to the new drug product when coupled with a biologically plausible mechanism supported by in vitro, animal or PK/PD studies. Differences in effect estimates between subgroups, the nature of the outcomes (i.e., a serious or potentially fatal adverse event), and biologic plausibility from other data may be noted in the product label. Consistent differences in subgroups segmented by a biologically plausible modifying factor (pharmacogenomic groups, for example) could inform modeling of risk for the individual using informatics tools.
Meta-Analyses and Pooling
The totality of evidence plays a key role in the regulatory approval process. It incorporates a systematic analysis of subject level data from clinical and nonclinical studies submitted to the regulator in support of a marketing application. The results based on some of the studies included in the regulatory review could also be summarized in meta-analyses. The regulatory review process differs from
systematic reviews that synthesize only published estimates of effect, although both processes summarize the empirical evidence based on prespecified eligibility criteria to answer a specific research or regulatory question and can employ similar tools. Characterization of medical product impact with respect to key patient attributes is important for regulatory review. Thus, results obtained from FDA review of individual trial patient-level data are used in meta-analysis. An integrated assessment of trials is also utilized in regulatory decision-making under the condition that the integration of these trials is clinically and statistically meaningful and when required for a specific analytic purpose. An example of a regulatory review that relied on previously published outcomes without individual patient-level data is the 2005 approval of quinine sulfate, a drug that had been in human use for over 200 years but had been marketed without a product label and therefore could not be introduced into U.S. interstate commerce. This 505(b)(2) application (published literature only, with no patient-level clinical trial data) consisted of 22 selected published randomized clinical trials from the worldwide literature [93]. The FDA meta-analysis utilized only 10 of these studies, which it deemed adequate for review based on the need to define a treatment regimen and verify outcomes. While individual patient characteristics, parasite burden, or drug resistance may account for observed differences in outcome by trial site, this subject-level information could not be derived from the publications, and the only subgroup analysis that could be described was that of outcomes by geographic region (Fig. 6), illustrating the limitations of study-level meta-analysis in providing personalized directions for use. The regulatory review highlights the need for meaningful selection of studies to be included in the meta-analyses.
Selection of statistically and clinically incompatible studies could lead to misleading data interpretations and conclusions. Potential issues that could influence the results and proper interpretation of meta-analyses include the following.
1. Study selection. The key attributes of the studies should be well-defined study endpoints and populations, and use of standardized diagnostic and evaluation criteria.
2. Study quality. Some of the key elements of rigor in data collection include participant attrition, treatment adherence, and adjudication of endpoints.
3. Study size and enough events in the included studies to make a meaningful contribution to the analysis.
4. Data analysis. Degree of heterogeneity in baseline characteristics, statistical methodology, and ability to explain variability in outcomes.
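The degree of heterogeneity named in item 4 is commonly summarized by the I2 statistic, which can be derived from a heterogeneity chi-square (Q) and its degrees of freedom. The following minimal Python sketch uses the standard definition I2 = max(0, (Q - df)/Q); the function name is an illustrative choice, and the inputs are the chi-square values reported in Fig. 6.

```python
def i_squared(q, df):
    """I2 (percent) from a heterogeneity chi-square Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Chi-square values and degrees of freedom as reported in Fig. 6.
asian = i_squared(5.67, 3)
african = i_squared(1.30, 3)
overall = i_squared(11.08, 7)
```

Applied to these inputs, the function returns the I2 values printed in the figure (47.1%, 0%, and 36.8% after rounding to one decimal place).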
Review: Parenteral Quinine for Severe Malaria (Version 01)
Comparison: 02 Mortality by Geographical Region
Outcome: 01 Mortality
Each row lists study (year): quinine n/N; artemisinin n/N; OR (fixed) [95% CI]

01 Asian Studies
Karbwang1 (1992): 19/50; 5/46; 5.03 [1.69, 14.95]
Karbwang (1995): 5/12; 1/14; 9.29 [0.90, 95.95]
Singh (2000): 4/26; 2/26; 2.18 [0.36, 13.11]
Falz (2001): 10/54; 9/15 as printed (the subtotal N of 137 and the OR of 1.06 correspond to 9/51); 1.06 [0.39, 2.87]
Subtotal (95% CI): N = 142; N = 137; 2.53 [1.36, 4.72]
Total events: 38 (Quinine), 17 (Artemisinin)
Test for heterogeneity: Chi2 = 5.67, df = 3 (P = 0.13), I2 = 47.1%
Test for overall effect: Z = 2.92 (P = 0.003)

02 African Studies
Van Hensbroek (1996): 62/288; 59/288; 1.06 [0.71, 1.59]
Olumese (1999): 14/49; 11/54; 1.56 [0.63, 3.87]
Adam (2002): 1/21; 0/20; 3.00 [0.12, 78.04]
Satti (2002): 2/39; 3/38; 0.63 [0.10, 4.00]
Subtotal (95% CI): N = 397; N = 400; 1.12 [0.79, 1.61]
Total events: 79 (Quinine), 73 (Artemisinin)
Test for heterogeneity: Chi2 = 1.30, df = 3 (P = 0.7), I2 = 0%
Test for overall effect: Z = 0.64 (P = 0.52)

Total (95% CI): N = 539; N = 537; 1.39 [1.02, 1.88]
Total events: 117 (Quinine), 90 (Artemisinin)
Test for heterogeneity: Chi2 = 11.08, df = 7 (P = 0.14), I2 = 36.8%
Test for overall effect: Z = 2.09 (P = 0.04)

Forest plot axis: OR (fixed) with 95% CI, log scale from 0.1 to 10; values below 1 favor quinine and values above 1 favor artemisinin.
Fig. 6 Study level meta-analysis reveals site-specific outcome variation. Study level meta-analysis of a 505(b)(2) (publication supported) NDA for a 7-day quinine sulfate treatment regimen for uncomplicated P. falciparum malaria, indicating region-specific differences in response to quinine, possibly due to site differences in timeliness of access to care or resistance of the malarial parasite to existing therapy. Subgroup analysis of patient level characteristics could not be performed as individual patient data were unavailable. (Reproduced from the NDA statistical review at https://www.accessdata.fda.gov/drugsatfda_docs/nda/2005/021799s000TOC.cfm)
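The study-level pooling shown in Fig. 6 can be sketched in Python. The example assumes that the “OR (fixed)” column denotes Mantel-Haenszel pooling, which the figure does not state; the function name is illustrative, and the counts are those of the four African studies in the figure.

```python
def mantel_haenszel_or(studies):
    """Mantel-Haenszel pooled odds ratio.

    studies: iterable of (events_a, n_a, events_b, n_b), where arm A is
    quinine and arm B is artemisinin in this illustration.
    """
    numerator = 0.0
    denominator = 0.0
    for events_a, n_a, events_b, n_b in studies:
        a = events_a            # events, arm A
        b = n_a - events_a      # non-events, arm A
        c = events_b            # events, arm B
        d = n_b - events_b      # non-events, arm B
        n = n_a + n_b
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Mortality counts (n/N) for the African studies as shown in Fig. 6.
african = [
    (62, 288, 59, 288),  # Van Hensbroek
    (14, 49, 11, 54),    # Olumese
    (1, 21, 0, 20),      # Adam
    (2, 39, 3, 38),      # Satti
]
pooled = mantel_haenszel_or(african)
```

Under this assumption the pooled estimate is approximately 1.13, close to the subtotal of 1.12 shown in the figure. The inputs are study-level counts only, so no patient-level subgroup can be formed from them, which is the limitation the figure illustrates.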
The conduct of randomized controlled trials has improved over the years and contemporary trials are more likely to address the above issues. In 2008, the US Food and Drug Administration issued a guidance setting new requirements for the development of antidiabetic drugs, mandating the conduct of long-term Cardiovascular Outcomes Trials (CVOTs) to rule out an unacceptable increase in cardiovascular (CV) risk for new antihyperglycemic drugs indicated for treatment of Type 2 diabetes. The goal of CVOTs is to evaluate CV safety of a new drug based on the three-component major adverse cardiac events (MACE) composite endpoint of CV death, nonfatal myocardial infarction (MI) and nonfatal stroke (3, 4) against the prevailing standard of care. The consistent adoption of consensus MACE outcome definitions in diabetes clinical trials facilitates meta-analyses that support personalized treatment choices. For example, a meta-analysis of 30 large randomized controlled trials conducted in people with or at risk of Type 2 diabetes found reductions in fatal and nonfatal atherosclerotic cardiovascular events and all-cause mortality that varied with the drugs’ effect on weight reduction (Fig. 7) [94]. Prior to inclusion in meta-analysis, the studies were examined based on
Fig. 7 Patient level meta-analysis of glucose lowering treatment on cardiovascular outcomes in diabetes. Meta-analyses for major cardiovascular event by therapy subgroup comparing new drug treatment vs. standard of care by weight loss outcome achieved, indicating interaction by drug class. The ability to discern differences in cardiovascular outcomes across drug classes relied on the broad application of consensus outcome definitions and data standards developed by the cardiovascular trials research community, adopted in the investigation of therapies for diabetes. CV cardiovascular, MI myocardial infarction, RR risk ratio, SGLT sodium glucose co-transporter, GLP glucagon-like peptide. (Reprinted from The Lancet Diabetes & Endocrinology, Volume 8, Issue 5, May 2020. Ghosh-Swaby OR, Goodman SG, Leiter LA, et al. Glucose-lowering drugs or strategies, atherosclerotic cardiovascular events, and heart failure in people with or at risk of type 2 diabetes: an updated systematic review and meta-analysis of randomised cardiovascular outcome trials. Pages 418–35. Copyright © 2019 Elsevier Ltd. with permission from Elsevier.)
adherence to quality metrics (e.g., rigor of masking, reporting on participant attrition, therapeutic adherence, and adjudication of endpoints) to merit study inclusion. These results suggest a potential broad cardiovascular benefit of using diabetes therapies that reduce bodyweight in routine clinical practice and can aid in the development of decision support tools that could be used in individualized treatment strategies for patients with elevated BMI. Moreover, data from an aggregated trial database of CVOTs could be mined to better identify additional important risk factors (patient characteristics, demographics, preexisting medical conditions/comorbidities, concomitant medications, exploratory biomarkers, etc.), confirm validity of comparator efficacy estimates and inform subgroup analyses for heterogeneous populations with diabetes.
2.2.3 Outcomes in Individual Subjects
While individual patient data listings are available for all screened and randomized subjects in an NDA, the clinical review also assesses subjects who discontinued treatment or discontinued participation in the trial, subjects with outlier responses, and those with significant amounts of missing data, to understand whether any relationship exists between treatment and subjects unable to complete participation. There are few tools available to address whether outcomes in the individual are due to natural variation of disease or are events attributable to the new drug. For example, traditional quantitative rate comparison across study arms requires that events occur in sufficient numbers for stable risk estimates. For trials in limited populations (i.e., rare or orphan disease), detailed descriptions of individual subject efficacy and safety outcomes merit close scrutiny, as in the development of serious or fatal adverse events or with adverse events of special interest described in case report forms. Visual representations of subject data over the course of observation (Fig. 8) are useful in understanding the eventual outcome of study participants, such as the example provided, which shows the interaction between safety (adverse event occurrence) and efficacy (observed early failures) [95]. Graphical patient profiles using tabulated data from clinical trials can represent the interaction between different outcomes and, along with the release of the analytic code, enable transparency and consistency in application [96]. The occurrence of a serious and unexpected suspected adverse reaction (SAR) needs to be reported to the IND (21 CFR 312.32(c)(4)), and the report needs to include “all available information, including a brief narrative describing the suspected adverse reaction and any other relevant information” [97].
Adoption of the technical specifications for the preparation and submission of these individual case safety reports (ICSR) [98] with the requisite minimum data elements and MedDRA terminology raises the possibility of using similar tools in pharmacovigilance and, with accumulating
Fig. 8 Interaction of efficacy and safety outcomes in patient-level analysis of risk–benefit. Outcomes of individual subjects (identified by subject number on the y axis) in multidrug-resistant pulmonary tuberculosis, shown over time in weeks from study start (x axis) among treatment failures, with treatment group indicated by color. The time to culture conversion (treatment success) is shown as a broken line and the time to relapse (treatment failure) as a solid line, which appears to indicate differential time to relapse. Safety outcomes that limited treatment are shown. The small number of failures limits quantitative comparisons, and the figure is presented to illustrate interaction between safety/tolerability and efficacy in the qualitative analysis of overall individual participant outcomes. (Reproduced from the FDA medical officer review of the bedaquiline NDA, available at 204384Orig1s000 MedR_pdf (fda.gov))
data (from pre- to postmarket), the potential of using automated tools for pattern recognition. In their review of individual patient outcomes (from tabulated data and serious adverse event narratives), the medical reviewer conducts a qualitative analysis that incorporates prior knowledge (disease pathophysiology, pharmacology, mechanisms of the class of drugs, patient factors that lead to treatment success or risk, in vitro and in vivo evidence, access to care, social factors that impact adherence, etc.) to reach conclusions about event causality and relationship to the new drug. Detailed descriptions of individual outcomes are feasible in small population studies, or when reviewing details of rare events. Sharing fully annotated case descriptions of outcomes in individuals with rare disease phenotypes or fully documented cases of rare serious events, with adequate documentation of baseline and ad hoc genotypic information, could contribute to better understanding of the pathways involved in the backward loop of prediction refinement (Fig. 2) needed for personalized medicine. Tools that assess the growing body of knowledge regarding drug targets and disease mechanisms are
undergoing development using systems medicine approaches [99], but few have made it to the bedside, in part due to the lack of detailed information on these patients with unmet needs. Targeted cellular therapies for specific molecular receptors directed at an individual patient’s cellular profile represent the ultimate in personalized immunotherapy. Recognizing the challenge that targeted therapies present in a priori outcome assessment and the role that the established biological pathways play in establishing the likelihood that a drug will have its intended effect, FDA issued a guidance for individualized antisense oligonucleotide drug products that addresses the administrative and procedural recommendations for IND submissions for these new medical product classes [100].
3 Postmarket Surveillance and Other Data Sources
3.1 Postmarket Surveillance
New safety-related changes to the product label can be required based on new safety information following product approval [101] from a range of sources, including postmarket surveillance conducted at specified intervals or in response to emerging safety concerns [102]. These analyses include data mining of AE patterns identified from review of individual case safety reports (ICSRs) submitted to the FDA Adverse Event Reporting System (or the analogous Vaccine Adverse Event Reporting System) in required postmarketing safety reports (PSR) [103]. These assessments are complemented by literature scanning, epidemiologic investigations (including from the Sentinel Initiative) [104], and additional sources (trial repositories, product or disease registries, computer queries of electronic health care data, and claims databases) [104–106]. To this end, FDA has long invested in developing a system that enables access to electronic health data, in which organizations provide consistent summarized results while maintaining possession of their files. The use of claims-based algorithms to assess risk predictors in individuals using distributed data networks similarly depends on well-tested algorithms for the health outcome of interest [106]. Real world data sources can contribute to these labeling changes [107, 108]. For example, postmarket surveillance for patient safety issues for medical device products has been conducted under the authority of section 522 of the FD&C Act. FDA collaborated with payers, professional societies, and industry to develop a patient registry that contains ~300 data elements for each registry participant and, importantly, also allows the development of a body of outcome data for a common comparator used by new device development programs, confirming the utility of real world data to evaluate “whether devices approvals can be expanded to other patient groups” and “address postmarket questions” [109].
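The distributed approach described above, in which organizations return summarized results while keeping possession of their files, can be sketched as follows. This is a minimal illustration and not a description of any actual network's software; the class, the function names, the record layout, and the outcome codes ("CODE_A", "CODE_B") are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate counts returned by one data partner; no patient rows leave the site."""
    site: str
    exposed_events: int
    exposed_n: int
    comparator_events: int
    comparator_n: int


def summarize_locally(site, records, outcome_codes):
    """Run the shared outcome algorithm inside the data partner's environment."""
    counts = {True: [0, 0], False: [0, 0]}  # exposed flag -> [events, subjects]
    for record in records:
        tally = counts[record["exposed"]]
        tally[1] += 1
        if outcome_codes & set(record["codes"]):
            tally[0] += 1
    return SiteSummary(site, counts[True][0], counts[True][1],
                       counts[False][0], counts[False][1])


def pool(summaries):
    """Combine site-level summaries at the coordinating center."""
    return {
        "exposed_events": sum(s.exposed_events for s in summaries),
        "exposed_n": sum(s.exposed_n for s in summaries),
        "comparator_events": sum(s.comparator_events for s in summaries),
        "comparator_n": sum(s.comparator_n for s in summaries),
    }


# Hypothetical outcome algorithm (a set of claim codes) and illustrative records.
OUTCOME_CODES = {"CODE_A", "CODE_B"}
site_a = [
    {"exposed": True, "codes": ["CODE_A"]},
    {"exposed": True, "codes": ["X1"]},
    {"exposed": False, "codes": []},
]
site_b = [
    {"exposed": True, "codes": ["CODE_B", "X2"]},
    {"exposed": False, "codes": ["CODE_A"]},
    {"exposed": False, "codes": ["X3"]},
]
pooled = pool([
    summarize_locally("A", site_a, OUTCOME_CODES),
    summarize_locally("B", site_b, OUTCOME_CODES),
])
```

Because every site applies the same outcome algorithm, the pooled counts are only as reliable as that algorithm, which is why well-tested definitions of the health outcome of interest matter in this design.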
3.2 Registry Data
The adoption of common outcome definitions and terminology standards for the data that constitute the defined outcome would facilitate the use of real-world data (i.e., data collected in disease registries) not only in postmarket safety surveillance but, as with the approach taken with medical devices or in assessing outcomes in cystic fibrosis, could also be useful in expanding efficacy into other populations, such as in variants of already rare disease that are challenging to enroll prospectively. A practical challenge to implementing such harmonized data collection is that real world data sources (e.g., electronic health records, other observational data sources) are not structured to collect “signs and symptoms” data (including patient responses) and the treating physician’s assessment, when these aspects of care may be of greatest relevance to outcome. Consensus outcome definitions are needed to fully utilize the data in a way that would support regulatory estimates of drug effect. For example, a systematic review assessing the capability of retinal prosthesis systems found that even simple outcomes such as acuity tests were highly variable and that benefit from these devices could not be ascertained in retinitis pigmentosa patients. Even in the highly specialized care systems where retinitis pigmentosa is treated, there was great variability in the choice of outcomes to assess, in how the data are reported (even when representing the same concept) and in the analyses performed to show a treatment difference [109]. The need to conduct efficient, timely trials and postvaccination surveillance in the COVID-19 pandemic has accelerated implementation of digital tools. Tools configured to acquire patient responses to COVID-19 vaccines have been deployed by the CDC [110]. These tools could contribute to patient reported outcome assessments but need to be coupled with trial monitoring procedures for validation of vaccine failure outcomes to demonstrate ultimate utility of the data obtained in this setting for personalized medicine.
3.3 Reusing Clinical Trial Data
Public investment in clinical health research has already been shown to provide meaningful, cost-effective evidence to develop treatment guidelines and support availability of new treatments [111]. Reuse of publicly available data for personalized medicine can further extend the utility of existing clinical research data and maximize benefit from publicly funded research. Reuse of publicly available data from the BioLINCC repository has identified differences in cardiovascular outcome by baseline characteristics [112] and differential antihypertensive response across treatment subgroups [113]. The secondary analyses of existing data collected in disparate formats required conversion of the individual study data to a common model to enable reuse of analytic code to define the outcomes of interest consistently. Differences in outcome definition, data element availability, annotation of crucial data elements, and deficient metadata limited the ability to conduct broader meta-analysis
to better define personalized risk. This use case highlights the need for collection of general and specialty-specific data elements and for broader outcome definitions beyond MACE and MACE plus, even for trials not covered by the regulatory data standard mandates.
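The conversion of disparately formatted study data to a common model, so that one outcome definition can be reused across studies, can be sketched as follows. The source field names, value encodings, and mappings are hypothetical and do not describe any actual repository data set; the outcome function encodes the three-component MACE definition given earlier in the text.

```python
# Each mapping entry: common field -> (source field, converter to the common type).
# Source field names and encodings are hypothetical.
STUDY_A_MAP = {
    "subject_id": ("ID", str),
    "cv_death": ("CVDTH", lambda v: v == 1),
    "nonfatal_mi": ("MI", lambda v: v == 1),
    "nonfatal_stroke": ("STROKE", lambda v: v == 1),
}
STUDY_B_MAP = {
    "subject_id": ("patno", str),
    "cv_death": ("death_cv", lambda v: v == "Y"),
    "nonfatal_mi": ("mi_nonfatal", lambda v: v == "Y"),
    "nonfatal_stroke": ("stroke_nonfatal", lambda v: v == "Y"),
}


def to_common(record, mapping):
    """Convert one study-specific record to the common model."""
    return {field: convert(record[source])
            for field, (source, convert) in mapping.items()}


def had_mace(record):
    """Three-component MACE: CV death, nonfatal MI, or nonfatal stroke."""
    return record["cv_death"] or record["nonfatal_mi"] or record["nonfatal_stroke"]


# Illustrative rows only.
study_a_rows = [
    {"ID": 101, "CVDTH": 0, "MI": 1, "STROKE": 0},
    {"ID": 102, "CVDTH": 0, "MI": 0, "STROKE": 0},
]
study_b_rows = [
    {"patno": "B-7", "death_cv": "Y", "mi_nonfatal": "N", "stroke_nonfatal": "N"},
]

pooled = ([to_common(r, STUDY_A_MAP) for r in study_a_rows]
          + [to_common(r, STUDY_B_MAP) for r in study_b_rows])
mace_ids = [r["subject_id"] for r in pooled if had_mace(r)]
```

Once both studies share one model, the outcome definition is written once. A study that lacks a source field for a component would fail at the mapping step, making the data element gap described in the text explicit rather than silent.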
4 Conclusion
The regulatory review process involves assessment of large amounts of data from various sources: medical informatics tools and technologies are needed to synthesize these data for optimized outcomes for the individual patient. The regulatory requirement for substantial evidence of effectiveness is the basis for requiring submission of study data and their prespecified analyses in a new drug application. Regulatory review of these data plays an integral role in advancing precision medicine. Following market availability, mandates for continued postmarket assessments can refine early outcome estimates. Consistent application of algorithms to define treatment outcomes can facilitate prediction refinement. Essential data elements needed to support outcome assessment, adherence to consensus data standards to enable data aggregation, common analysis platforms, and the infrastructure for secure data archival and aggregation are made possible in the regulatory environment. Medical informatics can play a transformative role by enabling predictive correlation analyses between the outcomes of interest and the in vitro, nonclinical, and clinical data generated in discovery research. A challenge for the regulatory review workforce is to learn the skills and help develop the tools “to authenticate the reliability and accuracy of digital evidence and to meet scientific and judicial standards” [114]. The creation of the Biomarkers, Endpoints, and other Tools (BEST) resource in response to the recognized need to standardize terms used in translational science and medical product development [55] and the publication of the list of surrogate endpoints that were the basis of medical product marketing approval [47] can encourage adoption of standards in primary research. To enable iterative learning about a drug’s effect and identify discriminatory clinical endpoints for late phase trials, these standards need adoption in early phase studies and across all data sources.
The development of consensus outcome definitions and common data elements is ultimately a community commitment with the goal of improving outcomes in individual patients [115]. Postmarketing surveillance, writ large, with interoperable standards, enabled by informatics, and supported by robust analytics and shared data, can be an enabler of this transformation. While informatics has the potential of linking prior knowledge generated in basic science to
translational discovery, from academia to the physician at patient bedside, documenting real-world experience in drug use, in a backward loop of interoperable health care data, is crucial for refining precision medicine.
Acknowledgments
The authors acknowledge Jane Bai, Yun Wang, and Tejas Patel for their insightful review of the manuscript.
References
1. Adler-Milstein J (2021) From digitization to digital transformation: policy priorities for closing the gap. JAMA 325(8):717–718. https://www.ncbi.nlm.nih.gov/pubmed/33620414
2. Maddali MV, Mehtani MV, Converse C et al (2019) Development and validation of HIV-ASSIST, an online, educational, clinical decision support tool to guide patient-centered ARV regimen selection. J Acquir Immune Defic Syndr 82(2):188–194. https://www.ncbi.nlm.nih.gov/pubmed/31513553
3. U.S. Food and Drug Administration (2018) Guidance for industry. Technical specifications document. Submitting select clinical trial data sets for drugs intended to treat human immunodeficiency virus-1 infection. https://www.fda.gov/media/112667/download
4. Hamburg MA (2013) Paving the way for personalized medicine: FDA’s role in a new era of medical product development. https://www.fdanews.com/ext/resources/files/10/10-28-13-Personalized-Medicine.pdf
5. 21 U.S. Code § 352 (f) (1) (n.d.) Misbranded drugs and devices. https://uscode.house.gov/view.xhtml?req=(title:21%20section:352%20edition:prelim)
6. U.S. Food and Drug Administration (2021) Prescription drug labeling resources. https://www.fda.gov/drugs/laws-acts-and-rules/prescription-drug-labeling-resources#Overview%20of%20Website
7. U.S. Food and Drug Administration (2021) Structured product labeling resources. https://www.fda.gov/industry/fda-resources-data-standards/structured-product-labeling-resources
8. U.S. Food and Drug Administration (2021) Drug trials snapshots. https://www.fda.gov/drugs/drug-approvals-and-databases/drug-trials-snapshots
9. Eaneff S, Obermeyer Z, Butte A (2020) The case for algorithmic stewardship for artificial intelligence and machine learning technologies. JAMA Netw 324(14):1397–1398. https://www.ncbi.nlm.nih.gov/pubmed/32926087
10. U.S. Food and Drug Administration (2019) New Drug Application (NDA). https://www.fda.gov/drugs/types-applications/new-drug-application-nda
11. U.S. Food and Drug Administration (2020) Study data standards resources. https://www.fda.gov/industry/fda-resources-data-standards/study-data-standards-resources
12. Sorolla A, Wang E, Golden E et al (2019) Precision medicine by designer interference peptides: applications in oncology and molecular therapeutics. Oncogene 39:1167–1184. https://www.nature.com/articles/s41388-019-1056-3.pdf
13. U.S. Food and Drug Administration (2019) FDA’s global substance registration system. https://www.fda.gov/industry/fda-resources-data-standards/fdas-global-substance-registration-system
14. Uppsala Monitoring Center (n.d.) WHODrug portfolio. https://www.who-umc.org/whodrug/whodrug-portfolio/
15. U.S. Food and Drug Administration (2009) Good Review Practice. Guidance for industry and review staff: labeling for human prescription drug and biologic products - determining established pharmacologic class for use in the Highlights of Prescribing Information. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/labeling-human-prescription-drug-and-biological-products-determining-established-pharmacologic-class
16. U.S. Food and Drug Administration (1996) Demonstration of comparability of human biological products, including therapeutic biotechnology-derived products. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/demonstration-comparability-human-biological-products-including-therapeutic-biotechnology-derived
17. 21 CFR 312.23.(a) (8) (2020) IND content and format. Pharmacology and toxicology information. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/cfrsearch.cfm?fr=312.23
18. U.S. Food and Drug Administration (1987) Guidance for industry. Format and content of the nonclinical and pharmacology/toxicology section of an application. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/format-and-content-nonclinical-pharmacologytoxicology-section-application
19. 21 CFR Part 58 (2020) Good laboratory practice for nonclinical studies. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/CFRSearch.cfm?CFRPart=58
20. Regenstrief Institute (n.d.) LOINC® from Regenstrief. https://loinc.org/
21. Clinical Data Interchange Standards Consortium (n.d.) CDISC foundational standard: SEND. https://www.cdisc.org/standards/foundational/send
22. U.S. Food and Drug Administration (2011) Guidance for industry. Potency tests for cellular and gene therapy products. https://www.fda.gov/media/79856/download
23. International Conference for Harmonization (2008) ICH M2 EWG electronic common technical document specification. https://www.fda.gov/media/76783/download
24. Olbei M, Hautefort I, Modos D et al (2021) SARS-CoV-2 causes a different cytokine response compared to other cytokine storm-causing respiratory viruses in severely ill patients. Front Immunol 12:1–11. https://www.frontiersin.org/article/10.3389/fimmu.2021.629193
25. U.S. Food and Drug Administration (1999) Guidance for industry. Q6B specifications: test procedures and acceptance criteria for biotechnological/biological products. http://www.fda.
gov/cder/guidance/ Q6Bfnl.PDF 26. U.S. Food and Drug Administration (2021) COVID-19: potency assay considerations for monoclonal antibodies and other therapeutic proteins targeting SARS-CoV-2 Infectivity.
Guidance for industry. https://www.fda. gov/media/145128/download 27. Granat L, Kambhampati O, Klosek S et al (2019) The promises and challenges of patient-derived tumor organoids in drug development and precision oncology. Anim Model Exp Med 2:150–161. https://www. ncbi.nlm.nih.gov/pmc/articles/ PMC6762043/ 28. Durmowicz AG, Lim R, Rogers H et al (2018) The U.S. Food and Drug Administration’s experience with ivacaftor in cystic fibrosis. Establishing efficacy using in vitro data in lieu of a clinical trial. Ann Am Thorac Soc 15(1):1–2. https://www.atsjournals.org/ doi/10.1513/AnnalsATS.201708-668PS? url_ver¼Z39.88-2003&rfr_id¼ori%3Arid%3 Acrossref.org&rfr_dat¼cr_pub++0pubmed& 29. U.S. Food and Drug Administration (1987) Guidance for industry. Guideline for the format and content of the nonclinical pharmacology/toxicology section of an application. h t t p s : //w w w.fd a . g o v / m e di a / 72 2 2 3/ download 30. U.S. Food and Drug Administration (2013) Guidance for industry. Preclinical assessment of investigational cellular and gene therapy products. https://www.fda.gov/regulatoryinformation/search-fda-guidancedocuments/preclinical-assessment-investiga tional-cellular-and-gene-therapy-products 31. U.S. Food and Drug Administration (2015) Guidance for industry. Product development under the animal rule. https://www.fda.gov/ media/88625/download 32. 21 CFR 314.126 (a) (2020) Applications for FDA approval to market a new drug. Adequate and Well controlled studies. https:// www.accessdata.fda.gov/scripts/cdrh/ cfdocs/cfcfr/cfrsearch.cfm?fr¼314.126 33. U.S. Food and Drug Administration (1988) Guidance for industry. Guideline for the format and content of the clinical and statistical sections of an application. https://www.fda. gov/media/71436/download 34. 21 CFR 314.50 (d)(5)(iv) (2020) Content and format of an NDA. Technical sections clinical data section. https://www.accessdata. fda.gov/scripts/cdrh/cfdocs/cfcfr/ CFRSearch.cfm?fr¼314.50 35. U.S. 
Food and Drug Administration (2019) Guidance for industry. Demonstrating substantial evidence of effectiveness for human drug and biological products. https://www. fda.gov/media/133660/download 36. Clinical Data Interchange Standards Consortium (n.d.) CDISC foundational standards
Personalized Medicine Through Informatics in Drug Regulation SDTM. https://www.cdisc.org/standards/ foundational/sdtm 37. International Conference for Harmonisation (n.d.) MedDRA Medical dictionary for regulatory activities. https://www.meddra.org/ 38. U.S. Food and Drug Administration (2018) Structured product labeling resources: medical condition. https://www.fda.gov/indus try/structured-product-labeling-resources/ medical-condition 39. Szarfman A, Bereket T, Patel T et al (2016) Screen failure data and subgroup representation in diabetes clincial trials. https://www. fda.gov/industry/structured-product-label ing-resources/medical-condition 40. Patel T, Tesfaldet B, Sviglin H, et al (2016) Standard endpoints, standardized data and subgroup outcomes in diabetes: a patientlevel meta-analysis of cardiovascular outcomes. https://www.lexjansen.com/cssus/2016/PP19_Final.pdf 41. Butte AJ (2008) The ultimate model organism. Science 320(5874):325–327. https:// science.sciencemag.org/content/sci/320/ 5874/325.full.pdf 42. Borrel LN, Elhawary JR, Fuentes-Afflick E et al (2021) Race and genetic ancestry in medicine: a time for reckoning with racism. N Engl J Med 384:474–480. https://www. nejm.org/doi/full/10.1056/NEJMms202 9562 43. U.S. Food and Drug Administration (2018) Guidance for industry. Clinical trial imaging endpoint process standards. https://www.fda. gov/media/81172/download 44. U.S. Food and Drug Administration (2018) Biomarker qualification program. https:// www.fda.gov/drugs/drug-developmenttool-ddt-qualification-programs/biomarkerqualification-program 45. U.S. Food and Drug Administration (2016) Draft guidance for industry and FDA staff: principles for codevelopment of an invitro companion diagnostic device with a therapeutic product. https://www.fda.gov/media/ 99030/downloadv 46. Jørgensen JT (2021) The current landscape of the FDA approved companion diagnostics. Transl Oncol 14(6):101063. https://www. 
sciencedirect.com/science/article/pii/S193 6523321000553 47. U.S. Food and Drug Administration (2020) Table of surrogate endpoints that were the basis of drug approval or licensure. https:// www.fda.gov/d r ugs/development-res ources/table-surrogate-endpoints-werebasis-drug-approval-or-licensure
311
48. U.S. Food and Drug Administration (n.d.) Decision summary. Denovo classification request for FerriScan R2-MRI analysis system. https://www.accessdata.fda.gov/cdrh_docs/ reviews/K124065.pdf 49. U.S. Food and Drug Administration (2018) Guidance for stakeholders and FDA staff: considerations for design, development, and analytical validation of next generation sequencing - based in vitro devices intended to aid in the diagnosis of suspected germline diseases. https://www.fda.gov/regulatoryinformation/search-fda-guidancedocuments/considerations-design-develop ment-and-analytical-validation-next-genera tion-sequencing-ngs-based 50. U.S. Food and Drug Administration (2018) Guidance for stakeholders and Food and Drug Administration staff: use of public human genetic variant databases to support clinical validity for genetic and genomic -based in vitro diagnostics. https://www.fda. gov/regulatory-information/search-fda-guid ance-documents/use-public-human-geneticvariant-databases-support-clinical-validitygenetic-and-genomic-based-vitro 51. U.S. Food and Drug Administration (2020) Table of pharmacogenomic biomarkers in drug labeling. https://www.fda.gov/drugs/ science-and-research-drugs/tablepharmacogenomic-biomarkers-drug-labeling 52. U.S. Food and Drug Administration (2020) Table of pharmacogenetic associations. https://www.fda.gov/medical-devices/preci sion-medicine/table-pharmacogeneticassociations 53. U.S. Food and Drug Administration (2005) Guidance for industry. Pharmacogenomic data submissions. https://www.fda.gov/ media/122944/download 54. U.S. Food and Drug Administration (2018) Precision medicine. https://www.fda.gov/ medical-devices/vitro-diagnostics/precisionmedicine 55. FDA-NIH Biomarker Working Group (2016) BEST (Biomarkers, EndpointS, and other Tools) resource. Glossary. https://www.ncbi. nlm.nih.gov/books/NBK326791/ 56. U.S. Food and Drug Administration (2019) Guidance for industry: technical specifications. 
Submitting clinical trial datasets for evaluation of QT/QTc interval prolongation and proarrhythmic po. https://www.fda.gov/ media/128187/download 57. U.S. Food and Drug Administration (2014) Guidance for industry: expedited programs for serious conditions - drugs and biologics.
312
Eileen Navarro Almario et al.
h t t p s : //w w w.fd a . g o v / m e di a / 86 3 7 7/ download 58. Amur S, Lavange L, Zinneh I et al (2015) Biomarker qualification: toward a multiple stakeholder framework for biomarker development, regulatory acceptance and utilization. Clin Pharmacol Therapeut 98(1): 34–46. https://ascpt.onlinelibrary.wiley. com/doi/epdf/10.1002/cpt.136 59. 21 CFR 314.125 (b)(6) (2020) Application for FDA approval to market a new drug. Adequate and well-controlled studies. https:// www.accessdata.fda.gov/scripts/cdrh/ cfdocs/cfcfr/cfrsearch.cfm?fr¼314.126 60. U.S. Food and Drug Administration (2021) Office of New Drugs 2020 Annual report. https://www.fda.gov/media/146691/ download 61. Gliklich RE, Castro M, Leavy MB et al (2019) Harmonized outcome measures for use in asthma patient registries and clinical practice. J Allergy Clin Immunol 144(3):671–681. https://pubmed.ncbi.nlm.nih.gov/30 857981/ 62. Agency for Healthcare Research and Quality (AHRQ) (2019) Registries for evauating patient outcomes: a user’s guide. https:// effectivehealthcare.ahrq.gov/sites/default/ files/pdf/ehc-registries-users-guide-third-edi tion-second-addendum.pdf 63. Clinical Data Interchange Standards Consortium (n.d.) Death aCRF, in standards: SDTM implementation guide. https://www.cdisc. org/kb/examples/death-acrf-75271806 64. U.S. Food and Drug Administration (2015) Guidance for Industry. Integrated summary of effectiveness. https://www.fda.gov/ media/72335/download 65. Zhou J, Rakesh K, Wade D et al (2017) Developing ADaM dataset for cardiovascular outcome studies. https://www.pharmasug. org/proceedings/2017/PO/PharmaSUG2017-PO13.pdf 66. U.S. Food and Drug Administration (2014) Study data technical conformance guide -technical specifications document. https:// www.fda.gov/regulator y-information/ search-fda-guidance-documents/study-datatechnical-conformance-guide-technicalspecifications-document 67. U.S. Food and Drug Administration (2015) Guidance for industry. 
Human immunodeficiency virus-1 infection: developing antiretroviral drugs for treatment. https://www.fda. gov/media/86284/download 68. U.S. Food and Drug Administration (2017) Guidance for industry. Chronic hepatitis C
virus infection: developing direct acting antivirals for treatment. https://www.fda.gov/ media/79486/download 69. U.S. Food and Drug Administration (2018) Guidance for industry. Chronic hepatitis B virus infections:developing drugs for treatment. https://www.fda.gov/media/11 7977/download 70. U.S. Food and Drug Administration (2013) Guidance for industry. Clinical pharmacogenomics: premarket evaluation in early-phase clinical studies and recommendations for labeling. https://www.fda.gov/media/84 923/download 71. Emens LA, Cruz C, Eder JP et al (2019) Long term clinical outcomes and biomarker analyses of atezolizumab therapy for patients with metastatic triple negative breast cancer. A phase 1 study. JAMA Oncol 5(1):74–82. h t t p s : // j a m a n e t w o r k . c o m / j o u r n a l s / jamaoncology/fullarticle/2701722 72. Kimmel SE, Califf R, Dean NE et al (2020) COVID-19 trials: a teachable moment for improving our research infrastructure and relevance. Ann Intern Med 173(8):652–653. https://www.ncbi.nlm.nih.gov/pmc/arti cles/PMC7322771/ 73. Bugin K, Woodcock J (2021) Trends in COVID-19 therapeutic clinical trials. Nat Rev Drug Discov 20(4):254–255. https:// www.nature.com/articles/d41573-02100037-3 74. U.S. Food and Drug Administration (2020) Oncology Center for Excellence Guidance documents. https://www.fda.gov/aboutfda/oncology-center-excellence/oncologycenter-excellence-guidance-documents 75. Hirsch BR, Califf R, Cheng SK et al (2013) Characteristics of oncology clinical trials: insights from a systematic analysis of ClinicalTrials.gov. JAMA Intern Med 173(11): 972–979. https://jamanetwork.com/ journals/jamainternalmedicine/fullarticle/1 682358 76. U.S. Food and Drug Administration (2021) Source data capture from electronic health records. https://www.fda.gov/scienceresearch/advancing-regulator y-science/ source-data-capture-electronic-healthrecords-ehrs-using-standardized-clinicalresearch-data 77. 
Kelly MS, Lewis J, Huntsberry AM et al (2019) Efficacy and renal outcomes of SGLT2 inhibitors in patients with type 2 diabetes and chronic kidney disease. Postgrad Med 131(1):31–42. https://pubmed.ncbi. nlm.nih.gov/30449220/
Personalized Medicine Through Informatics in Drug Regulation 78. Orkaby AR, Driver JA, Ho YL et al (2020) Association of statin use with all-cause and cardiovascular mortality in US veterans 75 years and older. JAMA 324(1):68–78. https://doi.org/10.1001/jama.2020.7848 79. Permutt T (2016) Sensitivity analysis for missing data in regulatory submissions. Stat Med 35(17):2876–2879. https://onlinelibrary. wiley.com/doi/abs/10.1002/sim.6753 80. Atiqi S, Hoojberg F, Loeff FC et al (2020) Immunogenicity of TNF-inhibitors. Front Immunol 11:312. https://www.ncbi.nlm. nih.gov/pubmed/32174918 81. National Research Council (US) (2010) Panel on handling missing data in clinical trials. the prevention and treatment of missing data in clinical trials. National Academies Press, Washington, DC. https://www.ncbi.nlm.nih. gov/books/NBK209904/ 82. International Council for Harmonization (2020) ICH E9(R1) Harmonised guideline statistical principles for clinical trials: addendum: estimands and sensitivity analysis in clinical trials. Step 5. https://www.ema.europa. eu/en/documents/scientific-guideline/iche9-r1-addendum-estimands-sensitivity-analy sis-clinical-trials-guideline-statisticalprinciples_en.pdf 83. Section 505 (Federal Food, Drug, and Cosmetic Act) (1938) Public Law 75-717, 52 STAT 1040 84. U.S. Food and Drug Administration (2010) MAPP 6010.3 Rev 1. Attachment B: clinical safety review of an NDA or BLA. http://www. farmakovijilansdernegi.org/UserFiles/File/ SafetyNDA.pdf 85. U.S. Food and Drug Administration (2005) Guidance for Industry: premarketing risk assessment. https://www.fda.gov/media/71 650/download 86. Bai JPF, Fontana RJ, Price ND et al (2014) Systems pharmacology modeling: an approach to improving drug safety. Biopharm Drug Dispos 35(1):1–14. https://pubmed. ncbi.nlm.nih.gov/24136298/ 87. Watkins PB (2019) The DILI-sim initiative: insights into hepatotoxicity mechanisms and biomarker interpretation. Clin Transl Sci 12(2):122–129. https://www.ncbi.nlm.nih. 
gov/pmc/articles/PMC6440570/ 88. Pingault J, O’Reilly PF, Schoeler T et al (2018) Using genetic data to strengthen causal inference in observational research. Nat Rev Genet 19:566–580. https://www. nature.com/articles/s41576-018-0020-3 89. U.S. Food and Drug Administration (2021) Drug trials snapshots. https://www.fda.gov/
313
drugs/drug-approvals-and-databases/drugtrials-snapshots#:~:text¼Drug%20Trials%20 Snapshots%20are%20part%20of%20an%20 overall,benefits%20and%20side%20effects% 20among%20different%20demographic%20 groups 90. U.S. Food and Drug Administration (2021) Guidance for Industry. Integrated summary of effectiveness. 2015. https://www.fda. gov/media/72335/download 91. Low J, Pranab KM (2019) Phuse connect 2019. Striking a balance: adoption of analysis results metadata in early stage development studies. https://www.lexjansen.com/phuseus/2019/ds/DS11.pdf 92. U.S. Food and Drug Administration (2009) Guidance for industry. Integrated summaries of effectiveness and safety: location within the common technical document. https://www. fda.gov/regulatory-information/search-fdaguidance-documents/integrated-summarieseffectiveness-and-safety-location-within-com mon-technical-document 93. U.S. Food and Drug Administration (2005) Drug approval package: quinine sulfate capsules. https://www.accessdata.fda.gov/ drugsatfda_docs/nda/2005/021799s000 TOC.cfm 94. Ghoshs-Swaby O, Goodman SG, Leiter LA et al (2020) Glucose-lowering drugs or strategies, atherosclerotic cardiovascular events, and heart failure in people with or at risk of type 2 diabetes: an updated systematic review and meta-analysis of randomised cardiovascular outcome trials. Lancet Diabetes Endocrinol 8(5):418–435. https://www.thelancet. com/journals/landia/article/PIIS22138587(20)30038-3/fulltext 95. U.S. Food and Drug Administration (2012) Medical review. Application number: NDA 204384Orig1s000. https://www.accessdata. fda.gov/drugsatfda_docs/nda/2012/2043 84Orig1s000MedR_pdf 96. Garner W (2013) Creating graphical patient profiles using SAS®. https://lexjansen.com/ wuss/2013/96_Paper.pdf 97. U.S. Food and Drug Administration (2012) Guidance for industry and investigators: safety reporting requirements for INDs and BA/BE studies. https://www.fda.gov/ media/79394/download 98. 
International Council for Harmonization (2020) ICH e2B(R2) Technical specifications document for electonic ICSRs: specifications for preparing and submitting electronic ICSRs and ICSR Attachments. https://www. fda.gov/media/132096/download
314
Eileen Navarro Almario et al.
99. Bai JPF, Abernethy D (2013) Systems pharmacology to predict drug toxicity: integration across levels of biological organization. Annu Rev Pharmacol Toxicol 53:451–473. https:// www.annualreviews.org/doi/abs/10.1146/ annurev-pharmtox-011112-140248 100. U.S. Food and Drug Administration (2021) Guidance for sponsor-investigators: IND submissions for individualized antisense oligonucleotide drug products: administrative and procedural ecommendations. https:// www.fda.gov/media/144872/download 101. U.S. Food and Drug Administration (2013) Guidance for Industry: safety labeling changes - implementation of Section 505(o) (4) of the FD&C Act. https://www.fda.gov/ media/116594/download 102. FDA (2019) Best practices in drug and biological product postmarket safety surveillance for FDA staff. https://www.fda.gov/ media/130216/download 103. International Council for Harmonization (2014) E2B(R3) electronic transmission of individual case safety reports implementation guide — data elements and message specification; and appendix to the implementation guide- backwards and forwards compatibility. https://www.fda.gov/regulatory-informa tion/search-fda-guidance-documents/e2 br3-electronic-transmission-individual-casesafety-reports-implementation-guide-dataelements-and 104. U.S. Food and Drug Administration (2020) FDA’s sentinel initiative – background. https://www.fda.gov/safety/fdas-sentinelinitiative/fdas-sentinel-initiative-background 105. The Sentinel Initiative (n.d.) FDA Sentinel drug assessments: from ARIA and other sent i n e l d a t a s o u r c e s . h t t p s : // w w w. sentinelinitiative.org/assessments/drugs 106. Gibson TB, Nguyen M, Burrell T et al (2021) Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: findings from a machine learning pilot project. JAMIA 28:1507. https://academic.oup.com/jamia/articleabstract/28/7/1507/6169465? redirectedFrom¼fulltext 107. U.S. 
Food and Drug Administration (2020) Real world evidence. Publications and guida n c e . h t t p s : // w w w. f d a . g o v / s c i e n c e -
research/science-and-research-specialtopics/real-world-evidence 108. U.S. Food and Drug Administration (2018) Framework for FDA’s real world evidence program. https://www.fda.gov/media/1200 60/download 109. U.S. Food and Drug Administration (2017) FDA facts: postmarket patient registry ensures access to safe and effective devices. h t t p s : // w w w. f d a . g o v / a b o u t - f d a / innovation-fda/fda-facts-postmarket-patientregistry-ensures-access-safe-and-effectivedevices 110. U.S. Centers for Disease Control and Prevention (2021) V-safe after Vaccination health checker. https://www.cdc.gov/coronavi rus/2019-ncov/vaccines/safety/vsafe.html 111. Unger JM, Nghiem V, Hershman DL et al (2019) Association of National Cancer Institute–Sponsored Clinical Trial Network Group studies with guideline care and new drug indications. JAMA Netw Open 2(9): e1910593. https://jamanetwork.com/ journals/jamanetworkopen/fullarticle/274 9235 112. Patel T, Shamsuzzaman M, Wu C et al (2017) Abstract 18061: Predictors of hospitalization or death due to heart failure in diabetic patients by gender in the ACCORD trial using random survival forests. Circulation 136:A18061. https://www.ahajournals.org/ doi/10.1161/circ.136.suppl_1.18061 113. Xin V, Dey A, Wang R et al (2019) Abstract 626: Predictors of hard outcomes in the ALLHAT trial identified with machine learning. Arterioscler Thromb Vasc Biol 39:A626. h t t p s : // w w w. a h a j o u r n a l s . o r g / d o i / abs/10.1161/atvb.39.suppl_1.626 114. Arshad H, Abiodun OI, Jantan A (2018) Digital forensics: review of issues in scientific validation of digital evidence. J Inf Proc Syst 14(2):346–376. https://www.researchgate. net/publication/327644306_Digital_Fore nsics_Review_of_Issues_in_Scientific_Valida tion_of_Digital_Evidence 115. Imperial MZ, Nahid P, Phllips PPJ et al (2018) A patient-level pooled analysis of treatment-shortening regimens for drugsusceptible pulmonary tuberculosis. Nat Med 24:1708. 
https://www.nature.com/arti cles/s41591-018-0224-2?theme¼acento201 8
Chapter 15

Personal Dense Dynamic Data Clouds Connect Systems Biomedicine to Scientific Wellness

Gilbert S. Omenn, Andrew T. Magis, Nathan D. Price, and Leroy Hood

Abstract

The dramatic convergence of molecular biology, genomics, proteomics, metabolomics, bioinformatics, and artificial intelligence has provided a substrate for deep understanding of the biological basis of health and disease. Systems biology is a holistic, dynamic, integrative, cross-disciplinary approach to biological complexity that embraces experimentation, technology, computation, and clinical translation. Systems Medicine integrates genome analyses and longitudinal deep phenotyping with biological pathways and networks to understand mechanisms of disease, identify relevant blood biomarkers, define druggable molecular targets, and enhance the maintenance or restoration of wellness. Two programs initiated our understanding of data-driven population-based wellness. The Pioneer 100 Study of Scientific Wellness and the much larger Arivale commercial program that followed had two spectacular results: demonstrating the feasibility and utility of collecting longitudinal multiomic data, and then generating dense, dynamic data clouds for each individual to utilize actionable metrics for promoting health and preventing disease when combined with personalized coaching. Future developments in these domains will enable better population health and personal, preventive, predictive, participatory (P4) health care.

Key words Scientific wellness, Systems biology, Dense dynamic data clouds, P4 health care, Correlation networks, Polygenic risk scores, Metabolome, Microbiome, Biological aging, Personalized coaching
1 Introduction

Upon its creation following World War II, the World Health Organization defined “health” as a “state of physical and mental wellbeing, not just the absence of disease or infirmity” [1, 2]. A quarter century later at the Alma-Ata Conference on Primary Health Care in Kazakhstan, WHO director-general Dr. Halfdan Mahler proclaimed “Health for All” as a global goal [3]. For more than four decades, the United States Surgeon General has led a mission called “Healthy People: Objectives for the Nation” with decadal reports, from Healthy People 1990 to current work toward Healthy People 2030 [https://health.gov/healthypeople], and broad engagement
Jane P.F. Bai and Junguk Hur (eds.), Systems Medicine, Methods in Molecular Biology, vol. 2486, https://doi.org/10.1007/978-1-0716-2265-0_15, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2022
Gilbert S. Omenn et al.
to identify quantitative goals and 355 objectives for improvement in health and reduction in diseases. We now have the concepts, tools, and strategies to accelerate progress on the long journey toward these goals. Scientific Wellness leverages personal, dense, dynamic data clouds to quantify and define wellness and identify deviations from well states toward disease. Scientific Wellness is manifested through healthful behaviors, community actions, and predictive, preventive, personalized, participatory (P4) health care for clinical translation of scientific wellness concepts. Each human is unique due to the interactions between her/his inherited genome and her/his environment, including lifestyle choices and socioeconomic determinants of health and health disparities over the life span. These interactions are manifested through the dynamics of molecular, biochemical, and physiological networks and contribute to homeostatic maintenance of health or divergence toward disease. Conceptually, longitudinal studies using high dimensional data from genomics, transcriptomics, proteomics, metabolomics, and the microbiome can reveal deep phenotypes for individual health and identify prodromal deviations from health well in advance of clinical symptoms (Fig. 1). This work is fostering the invention of new analytical approaches, new insights into human biology, and new concepts in health and data science.
2 The Pioneer 100 Wellness Project

Scientific Wellness is a major focus of the Institute for Systems Biology in Seattle, which began the Pioneer 100 Wellness Project (P100) in 2014. The P100 was the first example of generating personal, dense, dynamic data clouds across a population, sampled on a quarterly basis over 9 months [4]. The 108 participants were 59% male/41% female, ages 21–89 years, and 89% Caucasian. The P100 was inspired by three ground-breaking longitudinal studies on single individuals (N = 1), focused on type 2 diabetes [5], inflammatory bowel disease [6], and gut microbiome composition [7]. Each participant in the P100 was assayed every 3 months using 218 clinical laboratory tests, 643 metabolites, and 262 proteins, plus 4616 operational taxonomic units (OTUs) in the gut microbiome defined by 16S rRNA sequencing (Fig. 2). Leveraging whole genome sequencing for each participant, we calculated 127 polygenic scores for disease risks and quantitative traits based on studies in the NHGRI GWAS catalog [8]. Participants recorded their weight, blood pressure, and heart rate weekly and tracked physical activity and sleep with a Fitbit wearable device.
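The text does not spell out how the polygenic scores were computed. A minimal sketch, assuming the common additive model in which a score is the sum of effect-allele dosages weighted by published effect sizes, might look as follows. The variant identifiers, effect sizes, genotypes, and function names are hypothetical and chosen only for illustration; the cohort standardization step is likewise an assumption.

```python
from statistics import mean, pstdev


def polygenic_score(dosages, weights):
    """Additive score: sum of effect size x effect-allele dosage (0, 1, or 2).

    Assumes every variant in `weights` was genotyped; a missing variant
    raises KeyError rather than being silently imputed.
    """
    return sum(beta * dosages[variant] for variant, beta in weights.items())


def standardize(scores):
    """Express raw scores as z-scores relative to the cohort (assumed step)."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical effect sizes for three made-up variants (not from any catalog).
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}

# Hypothetical genotypes for three participants, coded as effect-allele counts.
cohort = [
    {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    {"variant_a": 0, "variant_b": 0, "variant_c": 1},
    {"variant_a": 1, "variant_b": 2, "variant_c": 2},
]

raw_scores = [polygenic_score(person, weights) for person in cohort]
z_scores = standardize(raw_scores)
```

Standardizing within the cohort makes scores for different traits comparable on one scale, which is convenient when many scores are correlated against other analytes.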
From Systems Medicine to Scientific Wellness
Fig. 1 Connecting high-throughput, multiomic measurements with phenotypes: building personal dense dynamic data clouds. (Omenn & Athey, University of Michigan National Center for Integrative Bioinformatics)

2.1 Community Structure in Correlation Networks
Two age- and sex-adjusted Spearman correlation networks were built using the personal, dense, dynamic data clouds collected from members of the P100 cohort: cross-sectional correlations from mean measurements across the three rounds and delta correlations calculated on the change in analyte values between rounds for individuals [4]. The correlation networks were visualized using chord diagrams; vertices correspond to analytes and each edge between two vertices reflects a statistically significant correlation (p_adj < 0.05, adjusted for multiple hypothesis testing). Of particular interest were correlations between distinct omics domains (e.g., metabolomics and proteomics), comprising the interomic cross-sectional correlation network (Fig. 3). The interomic network contained 766 nodes and 3470 edges; 3309 of the edges included a metabolite, 3366 a clinical laboratory test, 207 a protein, 130 a genetic trait, and 46 a microbiome taxon. The delta correlation network contained 822 nodes and 2406 edges, of which 375 were also found in the cross-sectional network [see ref. 4 for detailed tables]. We used the Girvan-Newman algorithm [9] to iteratively prune edges and reveal substructures, or communities of densely connected nodes, within the correlation networks. All told there were 70 of these communities.
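The community-detection step can be sketched on a toy graph. The code below is a simplified, standard-library illustration of the Girvan-Newman idea (repeatedly remove the edge with the highest betweenness until the graph splits); it is not the authors' pipeline. It assumes the significant correlations have already been reduced to an unweighted edge list without self-loops, and the vertex labels and edges are invented for the example.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {}
    for s in adj:
        # Breadth-first search from s: count shortest paths and predecessors.
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + c
                delta[v] += c
    # Every vertex pair is counted from both ends; the ranking is unaffected.
    return score


def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the component count increases."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        score = edge_betweenness(adj)
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Toy graph (invented): two triangles joined by a single bridge edge a3-b1.
toy_edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
             ("a3", "b1"),
             ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
communities = girvan_newman_split(toy_edges)
```

In the toy graph the bridge edge carries every shortest path between the two triangles, so it is removed first and the two triangles emerge as separate communities. Applying the split repeatedly to each resulting component would give a finer partition.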
Fig. 2 Timeline and types of longitudinal data collected in the 100 Pioneers Study. (From Price et al. Nature Biotechnology 2017, Figure 1)
The largest community in the interomic cross-sectional correlation network (246 vertices; 1645 edges) comprised measurements most closely associated with cardiometabolic health: C-peptide, insulin, HOMA-IR, fasting glucose, HDL-cholesterol, triglycerides, and small LDL particle number. The four most connected proteins measured by targeted SRM mass spectrometry or Olink proximity extension assays were leptin, C-reactive protein, fibroblast growth factor 21, and inhibin-beta C chain. Total cholesterol and LDL-cholesterol segregated into a separate community with lipids and L-thyroxine; hypothyroidism is a classic cause of hypercholesterolemia. One highly interconnected metabolite in the cardiometabolic community, gamma-glutamyltyrosine, was significantly correlated with several major CVD biomarkers; it warrants investigation as a candidate biomarker for diabetes risk independent of body mass index. Twelve proteins were correlated with plasma serotonin (among 18 vertices and 25 edges). There were several communities containing microbiome taxa and specific
Fig. 3 This circos plot shows numerous interomic correlations among proteome, metabolome, clinical lab values, microbiome, and genome (clockwise). (From Price et al. Nature Biotechnology 2017, Figure 2)
human and/or microbial metabolites. Microbiome diversity was negatively correlated with inflammatory and immune-related proteins. A similar analysis in the interomic delta correlation network representing change over time revealed 33 communities that were distinct from the cross-sectional network. Examples involved changes in galanin (a neuropeptide implicated in Alzheimer disease and diabetes), omega-3 fatty acids, and a furan fatty acid metabolite implicated in pancreatic beta-cell dysfunction and diabetes. We identified several statistically significant associations with genetic predispositions, estimated using polygenic risk scores (PRS). For example, the genetic risk for inflammatory bowel disease, computed from 110 single nucleotide polymorphisms (SNPs) identified by an independent genome-wide association study (GWAS), correlated with plasma cystine, the disulfide form of the amino acid cysteine. Genetic risk for bladder cancer was associated with AFMU, the acetylated metabolite of caffeine. Measured LDL-cholesterol levels could be compared with the
Fig. 4 This plot of quintiles of predicted serum LDL-cholesterol concentrations, compared with the Q3 mean value, is based on polygenic risk scores. Dietary intake and LDL-lowering medications act against this backdrop of individual variation. (From Zubair et al. Sci Rep. 2019)
genetically predisposed LDL cholesterol levels for clinical guidance: if a patient with elevated LDL-C is not genetically predisposed to high LDL-C, we expect a greater contribution from environmental factors, and thus a lifestyle intervention approach (or low-dose statin therapy) may be quite effective at reducing LDL-C to healthy levels. Conversely, if a patient with elevated LDL-C is also genetically predisposed to high LDL-C, we predict lifestyle intervention to be less effective, and high-dose LDL-C-lowering statin therapy may be indicated; patients with very high genetic predisposition to high LDL-C (Fig. 4) may require a much more potent drug, like a PCSK9 inhibitor. It is likely for many diseases that high-risk patients must be treated differently from those with low polygenic risk scores. Stratifying patients in this manner could yield better outcomes while enabling greater targeting of the intervention and reducing negative side effects of statin therapy. Torkamani et al. [10] have assessed the strategy of generating and utilizing polygenic risk scores and raised concerns about their usefulness. They concluded that the pathways to polygenic risk scores are complex, as is illustrated by our communities of correlation networks. Polygenic risk scores are more complex than is generally appreciated, but this does not negate the point that high- and low-risk patients could be treated differently.
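The stratification idea can be illustrated with a small sketch that assigns participants to quintiles of a polygenic score and reports the mean measured LDL-C in each quintile relative to the middle quintile (Q3), in the spirit of Fig. 4. All values below are invented for illustration, the function names are hypothetical, and nothing here encodes a treatment rule.

```python
from statistics import mean


def quintile_labels(scores):
    """Assign each score a quintile 1..5 by rank (ties broken by input order).

    Assumes at least five scores so that no quintile is empty.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = rank * 5 // len(scores) + 1
    return labels


def quintile_means_vs_q3(scores, measured):
    """Mean measured value per score quintile, minus the Q3 mean."""
    labels = quintile_labels(scores)
    means = {
        q: mean(m for m, label in zip(measured, labels) if label == q)
        for q in range(1, 6)
    }
    return {q: means[q] - means[3] for q in means}


# Invented polygenic scores and LDL-C values (mg/dL) for ten participants.
prs = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.1, 0.3, 0.6, 0.9, 1.4]
ldl = [90, 95, 100, 104, 110, 112, 118, 121, 130, 140]

offsets = quintile_means_vs_q3(prs, ldl)
```

Comparing an individual's measured LDL-C with the mean for his or her score quintile gives a rough sense of how much of an elevated value is consistent with genetic predisposition; a real analysis would use a fitted model rather than raw quintile means.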
From Systems Medicine to Scientific Wellness
2.2 Identifying and Utilizing Actionable Findings in Behavioral Coaching
Each participant worked with a nutritionist as a personal behavioral coach to periodically discuss the findings, reinforce healthful actions across diet, exercise, stress management, and dietary supplements, and encourage specific responses to personal study data focused on four health areas: cardiovascular, diabetes, inflammation, and nutrition. A very common out-of-range test result, which also showed remarkable improvement over the study period, was HbA1c, a diagnostic marker for type 2 diabetes and risk factor for prediabetes. There was a high prevalence of vitamin D serum values indicative of deficiency of intake and activation by sunlight (91/108), with an additive predisposition related to the number of variant alleles at three different gene loci that reduce absorption of vitamin D. Individuals with two or more of these variants required much larger doses of vitamin D to return to normal blood levels. A surprising finding in a few individuals was high levels of mercury, arising either from eating lots of tuna sushi or from old dental fillings. Reducing tuna consumption or removing amalgam fillings reduced mercury to low levels.
2.3 Reversing Prediabetes/Early Type 2 Diabetes
At baseline, 48% of the Pioneer 100 participants had HbA1c levels at or above 5.7%, the upper limit of the American Diabetes Association normal reference range, a prevalence similar to that of the overall U.S. population. We observed on average a 0.085% improvement (reduction) in HbA1c between consecutive blood draws; over three blood draws, spanning two intervals, this amounted to an improvement of 0.17% in 6 months. An independent meta-analysis associated a 0.16% improvement in HbA1c with a 1% reduction in the annualized incidence of new diagnoses of diabetes. Extrapolated to the U.S. adult population, such a response might result in 880,000 fewer cases of diabetes per year based on changes in diet, exercise, and weight, stimulated by the actionable report of elevated HbA1c levels [11]. Preventing or delaying onset of type 2 diabetes would reduce the incidence of atherosclerotic cardiovascular disease, kidney disease, and Alzheimer disease.
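The arithmetic behind these figures can be retraced in a few lines. This is only a sketch: the helper names are hypothetical, and the base population is not stated in the text. Reading the 1% as a fraction of an at-risk population is an assumption made here purely to show what the 880,000 figure would imply.

```python
def cumulative_change(per_interval_change, n_draws):
    """Total change across consecutive blood draws; n draws span n - 1 intervals."""
    return per_interval_change * (n_draws - 1)


def implied_base_population(fewer_cases, reduction_fraction):
    """Back-calculate the population that a given reduction would apply to."""
    return fewer_cases / reduction_fraction


# 0.085 percentage points of HbA1c per interval, three draws (from the text).
total = cumulative_change(0.085, 3)

# Assumption: the 1% reduction applies to a single at-risk base population.
base = implied_base_population(880_000, 0.01)

print(f"HbA1c change over two intervals: {total:.2f} percentage points")
print(f"Implied base population: {base:,.0f}")
```

The observed 0.17-point change is close to the 0.16-point change that the meta-analysis linked to a 1% reduction in annualized incidence, which is why the text applies that reduction directly.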
3 Expansion of the Pioneer 100 Protocol to 5000 Participants by Arivale, Inc.
Beginning in 2015, the P100 protocol was expanded from the 108 original P100 ‘pioneers’ to several thousand participants who enrolled in the Arivale commercial program. On average, participants in the Arivale program achieved sustained significant improvements in clinical markers related to cardiometabolic risk, inflammation, nutrition, and body mass index (BMI). Notably, improvements in HbA1c were akin to those observed in landmark clinical trials. Furthermore, genetic markers were associated with longitudinal changes in clinical markers. Clearly, genetic predisposition impacts clinical responses to lifestyle change [12].
Over the 5+ years that Arivale enrolled participants, the program generated a unique and valuable longitudinal multiomic dataset collected from a generally healthy population. As presented in the following sections, the many subsequent studies on the resulting datasets confirmed or established predisease multiomic footprints of polygenic risk across diseases [13], detected protein markers of cancers and cancer metastasis years prior to diagnosis [14], developed multiomic models of biological age as an overall measure of wellness [15], predicted gut microbiome diversity from plasma metabolomics [16], identified microbiome-derived biomarkers of cardiovascular disease [17], and established the microbiome’s critical role in healthy aging [18].
3.1 Predicting Blood Analytes and Disease Risks Based on Genetic Variants and Polygenic Risk Scores
Transitions from health to disease are characterized by dysregulation of biological networks under the influence of genetic and environmental factors. These transitions may progress over decades before manifestation of diagnosable clinical abnormalities. We [13] generated polygenic risk scores for 54 diseases and complex traits based on genome-wide association studies [8] coupled with the multiomic profiling established in the Pioneer 100 program. These 54 PRSs were associated with 766 detectable alterations in proteomic (274), metabolomic (713), and standard clinical laboratory (47) assays in blood collected from 4905 participants in the Arivale Wellness Program (78% Caucasian). The BMI PRS was correlated with by far the most analytes (84 proteins, 114 metabolites, 32 clinical labs); the next two most associated PRSs were for educational attainment and for depression. Overall, plasma metabolites and proteins were associated with far fewer PRSs than plasma clinical labs, which had been developed over decades for use in diagnosing or monitoring diseases. The most associated protein, with six associations, was cytokine IL12B; the most associated metabolite, with five, was diglyceride oleoyl-linoleoyl-glycerol (18:1/18:2). Meanwhile, omega-3 fatty acids were associated with 20 of the 54 diseases and traits. We confirmed previously reported associations of glutamatergic neurotransmission and inflammation with depression and of IL-33 with asthma. We discovered associations that could be targeted with diet (omega-6 supplements) or drugs (IL-13 inhibitors) for amyotrophic lateral sclerosis (ALS). We noted associations with longevity for leukemia inhibitory factor and ceramides. PRS-associated analytes may be considered alterations in a presymptomatic stage on a trajectory to particular disease states.
3.2 Detecting Markers of Cancers in the Presymptomatic Phase
One of the primary goals of a scientific wellness program is the identification of biomarkers for early detection of cancers and other major diseases. [Always think and speak of “cancers” in the plural, to recognize the enormous heterogeneity of cancers.] With our longitudinal multiomics dataset generated by the Arivale program and tracking of subsequent disease diagnoses, we were able to apply retrospective analyses to identify potential biomarkers for major
Fig. 5 Rising concentrations of protein biomarkers in plasma may precede the diagnosis of various specific cancers by years. Here individual patients’ trajectories for carcinoembryonic antigen-related cell adhesion molecule 5 (CEACAM5) show that striking increases occurred in three individual patients with breast, lung, or pancreatic cancers which eventually claimed their lives due to metastases. The rest of the points and lines reflect the rest of the study population. Other analytes may similarly enable early detection of specific diseases through longitudinal tracking. Plus (+) signs indicate date of diagnosis. CEACAM5 was a persistent outlier in prediagnosis samples for two metastatic cancer cases and exhibited rapid change to extreme values for the third (breast cancer). (From Magis et al. Scientific Reports 2020, Figure 1a)
diseases (Fig. 5). Specifically, we examined the trajectories of 1196 proteins in plasma collected months to years prior to ten independent cancer diagnoses [14]. For three individuals ultimately diagnosed with metastatic breast, lung, or pancreatic cancers, CEACAM5 was a persistent and rising longitudinal outlier as early as 26 months before diagnosis. CALCA, a known biomarker for medullary thyroid cancer, was hypersecreted in a patient with metastatic pancreatic cancer at least 16 months before diagnosis. ERBB2 levels spiked upward between 10 and 4 months before a breast cancer diagnosis. All these individuals were phenotypically healthy at the time when such biomarker signals were present. Arivale had 167 individuals with recognized wellness-to-disease transitions, including 35 with cancers diagnosed while in the study [14]. The ongoing analysis of the dense dynamic data clouds that precede these additional disease transitions may yield early-stage disease-perturbed biological networks that can be targeted with diet or medications at a reversible stage. Looking ahead, longitudinal multiomic phenotyping for participants in randomized clinical trials may help explain or even predict why some patients respond and others do not, one of the most important challenges in drug discovery and in clinical medicine.
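One simple way to operationalize a "persistent and rising longitudinal outlier" is to score each of a person's measurements against the cohort distribution and require both repeated extreme values and a monotonic increase. The sketch below uses a robust z-score based on the median and the median absolute deviation. It is an illustrative assumption, not the statistical procedure of the cited study [14]; the threshold of 3.5, the minimum of two flagged visits, and all data values are hypothetical.

```python
from statistics import median


def robust_z(value, reference):
    """Robust z-score of `value` against a reference set (median and MAD)."""
    med = median(reference)
    mad = median([abs(x - med) for x in reference])
    if mad == 0:
        return 0.0  # no spread in the reference; treat as not an outlier
    return 0.6745 * (value - med) / mad


def persistent_rising_outlier(trajectory, reference, z_cut=3.5, min_visits=2):
    """True if at least `min_visits` values exceed `z_cut` and values rise throughout."""
    flagged = [robust_z(v, reference) > z_cut for v in trajectory]
    persistent = sum(flagged) >= min_visits
    rising = all(later > earlier for earlier, later in zip(trajectory, trajectory[1:]))
    return persistent and rising


# Hypothetical cohort values for one plasma protein (arbitrary units).
cohort_values = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0]

# Hypothetical trajectories over three consecutive blood draws.
print(persistent_rising_outlier([1.1, 2.5, 4.0], cohort_values))  # rising, extreme
print(persistent_rising_outlier([1.0, 1.1, 0.9], cohort_values))  # within range
```

A real analysis would also need to account for assay batch effects, age and sex differences, and multiple testing across the roughly 1200 proteins screened.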
3.3 Characterizing the Intestinal Microbiome and Its Metabolites
Stool samples to assay the “gut” microbiome based on sequencing the 16S ribosomal RNA of the bacteria were part of the quarterly sampling protocol for the Pioneer 100 and (less frequently) Arivale study populations [19]. The microbiome has become a fertile area for research, with an enormous literature providing evidence for influence on many organ functions and disease risks, including obesity, inflammatory bowel disease, diabetes, and depression. Two widely deployed variables capture much of the information about the diversity of microbial genera and the composition of the microbiome. The alpha-diversity metric captures diversity within a single sample, comprising both the number of taxa (richness) and the evenness of their abundances; it is commonly quantified as Shannon entropy. The beta-diversity metric captures dissimilarity between communities, individual samples, or two environmental conditions (often using the Bray-Curtis method) [19, 20]. With multiple methods, Levy et al. [19] correlated the relative abundances of Bacteroides and Prevotella in the Arivale population with certain plasma metabolites and lab values, such as omega-6 fatty acids, carnitine, and thyroid hormone. Primary dimensions in distance-based redundancy analysis of clinical chemistries explained 18% of the variance in bacterial community composition (beta-diversity). The Bacteroides/Prevotella dichotomy was associated with inflammation and with dietary markers for high-fat, high-sodium, “westernized” diets. The stable high-Bacteroides and high-Prevotella states can be treated as basins or attractors in an energy landscape representing microbiota composition and producing sustained robustness [21]. Microbe–microbe and microbe–host interactions can prevent transition to an alternate state, unless one genus is depleted so that the other can invade and establish itself [19].
Such drastic changes occur in the gut microbiome due to antimicrobial therapies and/or aggressive infections, such as with Clostridium difficile. In patients with severely abnormal gut microbiome, a dramatic therapy is administration of a “microbial fecal transplant” with probiotic organisms [22]. Generally, dietary changes are insufficient to initiate such transitions, unless preceded by antimicrobial treatment. Long-term high-fiber diets seem to support Prevotella dominance. Unlike the ecological barrier with low permissivity between Bacteroides-rich and Prevotella-rich areas, there was no such barrier between Bacteroides and Firmicutes. Levy et al. did identify a subpopulation of individuals with reduced gut microbial diversity, increased relative abundance of the genus Prevotella, and reduced levels of genus Bacteroides.
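The two diversity metrics discussed in this section have compact standard definitions: Shannon entropy is H = -Σ p_i ln p_i over the relative abundances p_i of the taxa in one sample, and Bray-Curtis dissimilarity between two samples is Σ|a_i - b_i| / Σ(a_i + b_i). The sketch below implements both for raw taxon counts; the function names and the example counts are hypothetical, and preprocessing steps such as rarefaction are deliberately left out.

```python
from math import log


def shannon_entropy(counts):
    """Shannon alpha-diversity (natural log) of one sample's taxon counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples over the same taxa (0 to 1)."""
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must cover the same taxa")
    total = sum(sample_a) + sum(sample_b)
    if total <= 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(sample_a, sample_b)) / total


# Hypothetical genus-level counts for two stool samples.
even_sample = [25, 25, 25, 25]     # four genera, perfectly even
dominated_sample = [97, 1, 1, 1]   # one dominant genus

print(shannon_entropy(even_sample), shannon_entropy(dominated_sample))
print(bray_curtis(even_sample, dominated_sample))
```

The evenly distributed sample attains the maximum entropy for four taxa, ln 4, whereas the sample dominated by one genus scores much lower even though its richness is the same.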
3.4 Linking the Blood Metabolome to the Gut Microbiome
The microbiome component of our bodies is active in digestion and absorption of dietary constituents; it contributes its own metabolites to those measured in the circulation from endogenous biochemical pathways and secretion into the plasma. Wilmanski et al. [16] used data for >1000 blood analytes from clinical labs,
proteomics, and metabolomics in 399 Arivale participants with matching microbiome samples taken within a specified time window near the blood draw. With LASSO (least absolute shrinkage and selection operator) models, about 45% of the variance in Shannon alpha-diversity in the gut was explained by a set of 40 plasma metabolites, of which 13 were of microbial origin. The strongest predictors were cometabolites, either synthesized by the host and then metabolized by the microbiome (bile acids) or vice versa (hippurate). Several biomarkers were linked with cardiovascular disease, kidney function, or diabetes mellitus type 2. Those with BMI 35 showed perturbations of the host metabolite/gut alpha-diversity associations and had increases in perfluoroalkyl substances (PFAS) from environmental exposures or in C-reactive protein and IL-6, markers of inflammation. The predictive findings were largely replicated in a separate Arivale validation subcohort of 540 participants. Clinical labs and proteins were not predictive. There is likely an optimal range for alpha-diversity, around which the normal intestine shows resistance to change from medicines or mild infections and hence resilience in sustaining the diversity. All of these findings could be useful in developing tests of wellness in the gut microbiome itself and in the human host. Clearly, blood metabolites are much easier to sample and analyze than the microbiome. Indeed, the metabolites in blood may be more informative about health than the microbial species themselves in many cases. The metabolite trimethylamine-N-oxide (TMAO), a biomarker for atherosclerosis, connects many of these compartments, including kidney markers, CVD-related proteins, carnitine and choline metabolites from the gut microbiota, and dietary intake of seafood, red meats, eggs, and dairy, leading to promotion of atherosclerosis [17].
3.5 Comparing Biological Age with Chronological Age
Age is the most significant risk factor for most common chronic diseases. Aging-related disease risks can be mitigated through lifestyle, environmental, and pharmacological interventions. We would like to capture in wellness not only absence of diseases but also resilience to future diseases, satisfaction with one’s well-being, and energy for activities that enhance health and enrich a person’s life, informed by the N = 1 personal dense, dynamic data (PD3) clouds and actionable metrics [15]. The Klemera-Doubal algorithm [23] was applied to longitudinal data from our whole genome sequencing, single-nucleotide polymorphisms, proteomic, metabolomic, and clinical laboratory assays (900 total biomarkers) from each of 3558 participants (mean age 47.5 years, 59% female, mean BMI 27.7, current smokers 5.0%) in the Arivale Wellness Program. From these molecular and physiological measurements, biological age (BA) was calculated [15]. BA was elevated compared with chronological age (CA) in the presence of about 40 chronic diseases. There was a significantly lower rate of increase in BA than the expected linear 1.0 year per
Fig. 6 There is a high correlation between multiomics biomarkers of biological age and chronological age (years since birth), with Pearson correlation coefficient r = 0.78. Deviations of the biological age from the chronological age reflect healthier than average (lower biological age) or less healthy (higher biological age) status of the individuals phenotyped. (Based upon analyses and Fig. 5 reported by Earls et al. 2019 and analyses by Wilmanski et al. Nature Biotechnology 2019)
calendar year for individuals participating in the wellness program (Fig. 6). BA is modifiable; a lower BA relative to CA is proposed as a sign of healthy aging, effective coaching, and/or healthful behaviors. Remarkably, BA decreased by 0.16 year for each year of participation in the wellness program in what was already a rather healthy population, with women doing even better than men. Participants who entered with BA 5 or more years higher than CA averaged a 1.0-year decline in BA for each year in the program, while those whose BA at baseline was at least 5 years lower than CA maintained their youthful BA over time. Earls et al. proposed BA as an overall metric for scientific wellness and noted that previous publications had associated higher BA with poor balance, physical weakness, declining cognitive performance, cardiovascular risk, frailty indices, and extrinsic epigenetic age [24]. ΔAge (BA − CA; mean 0.7 years, SD 9.3 years) makes a negative number a healthy sign (younger than
chronological age), whereas, as noted above, prevalent diseases increased BA compared with CA. Estimates varied with the type of biomarkers, making an integrated metric desirable, unless the focus is on specific organ systems. After correcting for multiple comparisons, obesity, high blood pressure, lung infection, type 2 diabetes, and breast cancer were associated with increased ΔAge. No diseases were associated with a decrease in ΔAge. Measures of metabolic health, inflammation, and toxin bioaccumulation (lead, mercury, PFOS) were strong predictors of higher BA. A 1 SD increase in HbA1c corresponded to a 4-year increase in BA. Type 2 diabetes had the highest ΔAge (+6 years), consistent with reports of 5–9 years shortened life expectancy. From further microbiome studies of Arivale, MrOS (Osteoporotic Fractures in Men), and American Gut Project cohorts, Wilmanski et al. [18] presented findings related to healthy aging and survival. With 9000 monitored individuals, they identified microbially produced amino acids in the circulation; tyrosine, phenylalanine, and tryptophan metabolites were particularly informative. They correlated depletion of core genera, especially Bacteroides, and broad beta-diversity dissimilarity or Bray-Curtis uniqueness scores with better health and longer survival. This pattern matches findings in centenarians [25]. All seven metabolites in the circulation associated with genus-level uniqueness were of microbial origin. Thus, healthy aging leads to two important changes in the gut microbiome: individually unique diversification of the species and loss of species dominant in younger individuals. Unhealthy aging leads to neither of these changes. This raises interesting possibilities for facilitating healthy aging. Among a subgroup of 80-year-olds followed for 4 years, healthy individuals continued to show drift toward unique composition, while in less healthy individuals this drift was absent, and retained Bacteroides dominance was associated with lower survival [18].
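To make the biological-age calculation discussed earlier in this section more concrete, the sketch below implements the simpler of the two Klemera-Doubal estimators. Each biomarker j is regressed on chronological age in a reference cohort, giving slope k_j, intercept q_j, and residual standard deviation s_j, and BA = Σ (x_j − q_j)(k_j / s_j²) / Σ (k_j / s_j)². This is a minimal illustration under stated assumptions: it omits the variant that adds chronological age as an extra term, it is not the multiomic pipeline of Earls et al. [15], and the two biomarkers and all values are hypothetical.

```python
def fit_line(ages, values):
    """Least-squares fit of one biomarker on age: (slope, intercept, residual SD)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (v - mean_val) for a, v in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    residuals = [v - (slope * a + intercept) for a, v in zip(ages, values)]
    resid_sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return slope, intercept, resid_sd


def kdm_biological_age(person_values, fits):
    """Klemera-Doubal estimate (simple form, without the chronological-age term)."""
    num = sum((x - q) * k / s ** 2 for x, (k, q, s) in zip(person_values, fits))
    den = sum((k / s) ** 2 for k, _, s in fits)
    return num / den


# Hypothetical reference cohort: ages and two biomarkers that drift with age.
ages = [30, 40, 50, 60, 70]
marker_up = [61.0, 79.0, 101.0, 119.0, 140.0]    # rises with age
marker_down = [85.5, 79.5, 75.5, 69.5, 65.0]     # falls with age
fits = [fit_line(ages, marker_up), fit_line(ages, marker_down)]

# Hypothetical person aged 50 whose biomarkers look like a typical 60-year-old.
person = [k * 60 + q for k, q, _ in fits]
ba = kdm_biological_age(person, fits)
delta_age = ba - 50  # positive: biologically older than chronological age
print(round(ba, 1), round(delta_age, 1))
```

Biomarkers with a steep age slope and little residual scatter receive the most weight, which is one reason estimates differ with the type of biomarkers used.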
Another approach to biological aging is to study the shortening of chromosomal telomeres with aging and in cancer progression. The Telomeres Mendelian Randomization Collaboration reported from a meta-analysis of GWAS datasets [26] that SNPs associated with longer telomere length were associated with higher risk of many cancers (glioma, serous low-malignant ovarian, lung adeno, neuroblastoma, bladder, melanoma, testicular, kidney, and endometrial) and lower risks of noncancer diagnoses (coronary heart disease, abdominal aortic aneurysm, celiac disease, and interstitial lung disease), with no associations with psychiatric, autoimmune, or inflammatory diseases or diabetes.
4 Parallel Cohort Studies in Sweden and in California
The Swedish SciLifeLab/Swedish CArdioPulmonary bioImage Study (SCAPIS) Wellness Profiling (S3WP) program [27] conducted a 2-year study of 101 healthy adults (age 50–65) with six time points for blood molecular profiles of Olink-based proteomics for plasma proteins (n = 794), transcriptomics (n = 11,976), lipidomics (n = 169), metabolomics (n = 413), autoantibodies (n = 318), and immune cell profiling, plus gut microbiota (1465 operational taxonomic units), medical imaging, and routine clinical chemistries. The intraindividual baseline variation over time was low, but there was high variation between individuals across different molecular readouts using a Uniform Manifold Approximation and Projection (UMAP) technique. Each individual has a unique, stable plasma protein profile and strong connections between the blood proteome and the clinical chemistry parameters, supporting an individual-based definition of health, as well as sex-related differences in endocrine, metabolic, and muscle analytes. As in the Arivale cohort, the liver marker GGT was associated with IL-6, C-Reactive Protein, and inflammation. Schüssler-Fiorenza Rose et al. [28] at Stanford combined omics and wearable monitors for deep longitudinal profiling in 109 participants over a median of 2.8 years (up to 8 years). They identified 67 clinically actionable health-related findings and physiological pathways associated with metabolic, cardiovascular, and oncologic pathophysiology. They focused especially on diabetes and prediabetes; besides metabolic and inflammatory analytes as drivers, they noted the roles of weight gain and decreased Shannon diversity in the gut microbiome. They developed models for predicting insulin resistance from omics measurements and recommended dietary changes and exercise in response.
One person developed a large retroperitoneal B-cell lymphoma; longitudinal data revealed a precipitous drop in Shannon diversity and rise in CXCL9 during the 6–12 months preceding the diagnosis. This finding may be similar to the detection of markers of cancers years prior to diagnosis that emerged from the Arivale population.
5 Translating Scientific Wellness to Much Larger Populations
In a commentary about the original Price et al. (2017) paper [4], Butte highlighted six challenges for widespread incorporation into health care systems and medical practice [29]. Specifically, these include (1) laboratory variation in the many assays; (2) the likelihood that well-informed clinicians are already considering individual variation in risk and response; (3) the potentially low incremental benefit beyond major genetic variants or blood analytes and broad attention to exercise, HbA1c, sleep, and optimal weight;
(4) variable timing (within a year versus rest of one’s lifetime) for appearance of predicted disease and therefore benefits from reducing risks; (5) universal vs. subpopulation-related guidance; and (6) price and out-of-pocket costs of testing and coaching related to cost-effectiveness for health care systems, employers, or population groups. Additional comments have been published by Diamandis [30] and by Vogt et al. [31]. We addressed various comments in our published response [11]. Some of these considerations might also be informed by tracking uses of the UK Biobank (500,000 individuals [32]) or the U.S. Precision Medicine All of Us (one million persons being recruited) cohorts.
5.1 Proposal for a Human Genome/Phenome Project
The Institute for Systems Biology, informed by the Pioneer 100 and Arivale experiences, is proposing a project to carry out a genome/phenome analysis of one million patients over the next 10 years in conjunction with a very large community hospital system. This project is termed “Beyond the Human Genome.” We aim to provide the infrastructure to pursue a data-driven approach to identify the health trajectory of each individual to optimize her/his wellness and avoid disease. This Human Genome/Phenome Project differs from the original Human Genome Project in two major features: (1) moving from the analysis of one genome to one million, and (2) identifying the health and wellness trajectories of each individual with longitudinal phenome analyses integrated with the whole genome analysis. This Project seeks both federal and private sector funding. Our clinical partner, the Guardian Network, includes seven major health care systems, 120 hospitals, and 30 million unique patients in 13 states in the South and Southeast. The idea is to deliver to patients through their physicians the actionable results emerging from their individual data clouds for enhancing wellness and averting disease. For each patient we will combine genome/phenome data, socioenvironmental data on determinants of health, and clinical electronic health records in a cloud to analyze, integrate, visualize, and model health status. Over time this study will transform clinical care through the generation of thousands of new actionable possibilities for increasing wellness and reversing disease. We expect that such a powerful data ecosystem will attract partners from across the entire health care spectrum to analyze these data in the context of their own interests. Two partners have made major commitments to this program, Deloitte and Google.
We have initiated clinical trials with high-dimensional data on Alzheimer disease, type 2 diabetes, healthy aging, acute myelogenous leukemia, several cancers, COVID-19, and long COVID-19 (Fig. 7). The total effort will examine the impact of deep genotyping and phenotyping on individuals and on patient populations over a wide range of interventions and interrelationships.
Fig. 7 Observational studies and clinical trials using deep phenotyping for genome/phenome analyses launched within the Providence St. Joseph Healthcare System (PSJHS). (From Hood presentation at Precision Medicine World Conference, January 2021, Santa Clara, California)
5.2 Brain Health
The largest ISB undertaking to date is for Brain Health and Mild Cognitive Decline, seeking to sustain cognitive performance and defer or prevent progression to the many forms of Alzheimer disease (AD), which has become a global scourge within the larger category of dementias. The model proposed by Bredesen [33] presumes that AD is multifactorial and highly heterogeneous in causation, including potential subtypes tied to inflammation, diabetes, heavy metal toxicity, black mold exposures, and other determinants or influences that may themselves be reversible. AD may be viewed as a systemic disorder mediated by innate immune reactivity and inflammation. The two most established facts about AD are that (1) carrying the ApoE E4 allele, present in about 10–15% of the population, strongly predisposes individuals to AD (prevalence of AD diagnosis at age 65 rises from about 4–5% with no E4 allele to 9% with one E4 allele and to 65% if both alleles are E4), and that (2) >400 clinical trials of therapeutic drug candidates, mostly tied to the beta-amyloid causal hypothesis, have all failed. The latest is the controversial clinical and regulatory assessment of the drug aducanumab (see Hastings Center Report: https://www.thehastingscenter.org/would-i-giveaducanumab-to-my-mother/). ApoE4 directly interacts with amyloid and tau proteins and modifies sirtuin ratios [34]. Published studies [35, 36] report positive effects of healthy lifestyles on mild cognitive decline. Brain exercises (e.g., Posit Health Brain HiQ), more sleep, more social interactions, and dual tasking may all be helpful, especially if cognitive decline is still