Clinical Research Informatics [3 ed.] 3031271726, 9783031271724

This extensively revised new edition comprehensively reviews the rise of clinical research informatics (CRI). It enables

English Pages 518 [519] Year 2023

Table of contents :
Contents
1: Introduction to Clinical Research Informatics
Overview
Contexts and Attempts to Define Clinical Research Informatics
Perspective, Objectives, and Scope
Organization of the Book
Conclusion
References
Part I: Foundations of Clinical Research Informatics
2: From Notations to Data: The Digital Transformation of Clinical Research
Historical Perspective
Analog Signal Processing
Digital Signal Processing
The Digitalization of Biomedical Data
Dimensions of Complexity
Computing Capacity and Information Processing
Computational Power
Network Capacity
Local Storage
Data Storage
Data Density
Design Complexity
Analytic Sophistication
The Emergence of Big Science
Evolution of Astronomy and Physics
Biology and Medicine as a Socially Interdependent Process
The Social Transformation of Clinical Research
Standards
Comparable and Consistent Information
Interoperable Systems and Constructs
3: Methodological Foundations of Clinical Research
The Development of Pharmaceuticals: An Overview
Conceptual Framework and Classification of Biomedical Studies
Variability of Biological Phenomena
Biomedical Studies: Definitions and Classification
Observational or Epidemiological Studies
Experimental or Interventional Studies
Minimal Intervention Studies
The Logical Approach to Defining the Outcome of a Clinical Trial
Defining the Treatment Effect: From Measurement to Signal
Defining the Study Sample
Defining the Study Treatments
Superiority Versus Non-Inferiority
Experimental Designs
Definitions and Basic Concepts
Before-After Comparisons in a Single Treatment Group
Antidotes Against Bias: Randomization, Blinding, and a Priori Definition of Analysis
Parallel Group and Crossover Designs
Parallel Group Designs
Crossover Designs
Variants of Parallel Group and Crossover Designs
Innovative Approaches to Drug Development
References
4: The Clinical Research Environment
Overview
Clinical Research Processes, Actors, and Goals
Common Clinical Research Processes
Identifying Potential Study Participants
Screening and Enrolling Participants in a Clinical Study
Scheduling and Tracking Study-Related Participant Events
Executing Study Encounters and Associated Data Collection Tasks
Ensuring the Quality of Study Data
Regulatory and Sponsor Reporting and Administrative Tracking/Compliance
Budgeting and Fiscal Reconciliation
Human Subjects Protection Reporting and Monitoring
Common Tasks and Barriers to Successful Study Completion
Clinical Research Stakeholders
Patients and Advocacy Organizations
Academic Health Centers
Clinical or Contract Research Organizations
Sponsoring Organization
Federal Regulatory Agencies
Healthcare and Clinical Research Information Systems Vendors
Other Clinical Research Actors
Common Clinical Research Settings
Common Clinical Research Goals
A Framework for Data and Information Management Requirements in Clinical Research
Clinical Research Workflow and Communications
Workflow Challenges
Paper-Based Information Management Practices
Complex Technical and Communications Processes
Interruptions
Single Point of Information Exchange
Cognitive Complexity
Emergent Trends in Clinical Research
Precision or Personalized Medicine
Learning Healthcare Systems and Evidence Generating Medicine
Real-World Data and Real-World Evidence (RWD and RWE)
Bridging Public Health, Epidemiology, and Clinical Research
Conclusion
References
5: Next Generation Biorepository Informatics: Supporting Genomics, Imaging, and Innovations in Spatial Biology
Introduction
Informatics Considerations for a Next-Generation Biorepository
Federation and Support for Federated Queries and Shared Ontology
Biorepositories Best Practices Guidelines
Biorepository Informatics Landscape
Standards Considerations for a Next-Generation Biorepository
Biospecimen Preservation Standards: ISO/TC 212
Biorepository Testing and Calibration Standard—ISO/IEC 17025
NAACCR Cancer Patient Standard Annotations
Social Determinants of Health
Significance, Relevance, and Challenges of Next-Gen Biorepositories
The Human Cell Atlas (HCA)
The Human BioMolecular Atlas (HuBMAP)
The Human Tumor Atlas Network (HTAN)
The Cellular Senescence Network (SenNet)
Multiplex Technology for Biorepositories and Cell Atlases and “Spatial Biology”
Conclusion
References
6: Study Protocol Representation
Overview
The Study Protocol: Core Essence of a Clinical Research Study
The Study Protocol Enabled by Clinical Research Informatics
Current Inefficiencies in Research Protocol Informatics
Benefits of the Computable Study Protocol
Capturing the Complete Study Plan in Computable Form
Providing Decision Support During Study Conduct
Facilitating Timely and Accurate Data Capture and Storage
Supporting Appropriate Statistical Analysis and Reporting
Facilitating Appropriate Interpretation and Application of Results
Promoting Reuse of Study Data and Artifacts
Computability and Standardization Requirements
Protocol Representation Standards
Standards for Model Representation
HL7 Reference Information Model (RIM) and Regulated Clinical Research Information Model (RCRIM)
The Clinical Data Interchange Standards Consortium (CDISC) Protocol Representation Model (PRM)
Standard Protocol Reporting Initiatives
Biomedical Research Integrated Domain Group (BRIDG)
Ontology of Clinical Research
Other Protocol Modeling Approaches
Eligibility Criteria Representation Standards
Examples of Computable Protocol-Driven Research Across the Study Life Cycle
Improving Study Design
Improving Clinical Trial Efficiencies
Improving Applications to Care and Research
COVID-19 and the Computable Protocol
The Protocol Model-Driven Future
References
7: Clinical Research Information Systems
CRIS Vendor Models
Why Have Clinical Research Information Systems Evolved?
The Concept of a Protocol Is Fundamental to CRISs
CRISs Implement User Roles That Are Specific to Research Designs
Supporting Differential Access to Individual Studies
Representing Experimental Designs
The Scope of a CRIS May Cross Institutional or National Boundaries
Certain Low-Risk Clinical Studies May Not Store Personal Health Information
Workflow in Clinical Research Settings Is Mostly Driven by the Study Calendar
Time Windows Associated with Events
The Event-CRF Cross-Table
Clinical Research Subjects Are Not Typical “Patients”
CRISs Often Need to Support Real-Time Self-Reporting of Subject Data
Clinical Research Data Capture Is More Structured Than in Patient Care
CRIS Electronic Data Capture Needs to Be Robust and Flexible and Efficient to Setup
Use of Data Libraries
Data Entry in Clinical Research May Not Always Be Performed in Real Time: Quality Control Is Critical
CRIS-Related Processes During Different Stages of a Study
Study Planning and Protocol Authoring
Recruitment and Eligibility Determination
Protocol Management and Study Conduct
Patient Monitoring and Safety
Analysis and Reporting
Miscellaneous Issues
Validation and Certification
Standards
Pragmatic Clinical Trials: Use of EHRs Instead of CRISs
Interoperation Between CRISs and Non-EHR Software
Concluding Remarks
References
8: Public Policy Issues in Clinical Research Informatics
Introduction and the Role of Public Policy in Clinical Research Informatics (CRI)
Foundations of Clinical Research Policy
Foundational Federal Legislation
Food, Drug, and Cosmetic Act of 1938
Public Health Services Act of 1944
Core Regulations and Guidance for CRI
Common Rule
Common Rule Revisions
Food and Drugs Regulation and Guidance
HIPAA Privacy Rule and Research
Regulatory Science and the Role of Informatics
Regulatory Science as a Driver of Informatics at the FDA
Real-World Evidence
NIH as a Driver of Informatics Through Public Policy
Twenty-First Century Cures Act
Data Sharing Policies
Emerging Policy Trends in CRI
References
Part II: Enabling Frameworks and Processes and Tools
9: Data Sharing and Reuse of Health Data for Research
Introduction
Relevant Concepts and Terms
eSource Data (Electronic Source Data)
Traceability
Interoperability and Semantic Interoperability
Data Standards
FHIR
Common Data Model
Electronic Case Report Form (eCRF)
Secondary Use of Data
Benefits of Data Sharing and Reusing Health Data for Research
Requirements for the Use of eSource for Regulated Research
Technical Considerations for Reuse of Health Data for Research
Retrieve Form for Data Capture (RFD)
Common Data Model Harmonization (CDMH)
HL7 FHIR Accelerators
HL7 FHIR Accelerators Focused on the Use of Health Data for Research
Standards-Based Healthcare Research Networks and Collaborative Projects
IMI EHR4CR/I~HD/TriNetX
PCORI and PCORNet
N3C
Elligo ResearchConnect
General Considerations for Implementing eSource for Reuse of Health Data
Best Practices and Methods of Data Sharing for Research
Planning
Adoption and Implementation of Data Standards from the Start
Streamlining Processes
Role of Research in Learning Health Systems and LHS Core Values
Conclusion
Appendix
Examples of Collaborations, Initiatives, Models, and Tools Related to Data Sharing in Clinical Research
References
10: Data Quality in Clinical Research
Clinical Research Data Processes and Relationship to Data Quality
Example 1
Example 2
Example 3
Example 4
Errors Exist
Defining Data Quality
Systematic Data Quality Planning
Identifying and Defining Data to Be Collected
Defining Data Collection Specifications
Observing and Measuring Data
Recording Data
Processing Data
Analyzing Data, Reporting Status, and Reporting Results
Planning for Data Quality
Assessing the Quality of Secondary Use Data
Identification of Required Clinical Concepts
Definition of Data Elements
Exploration and Availability Assessment of Clinical Data Source
Extraction of Relevant Data Elements
Transformation and Curation of Extracted Clinical Data
Fitness-for-Use Assessment and Data Analysis
A Note on Data Bias
Infrastructure for Assuring Data Quality
Data Governance
Impact of Data Quality on Research Results
Summary
References
11: Research Data Governance, Roles, and Infrastructure
Introduction: A Conceptual Model
Research Data Governance
What Does Data Governance Govern?
Why Data Governance?: The Value of Data
Accuracy
Validity
Reliability
Timeliness
Relevance
Completeness
Ethical
Fairness and Bias
The Life Cycle of Data
Why Data Governance? From Data Protection to Research Ethics
Organizational Structure
The Rules in Action
Theories of Information Governance
Data Governance Organization and Roles
Implementation: An Effective Data Governance Structure
The Building Blocks of an Effective Strategy: Case Study
References
12: Informatics Approaches to Participant Recruitment
Typical Clinical Research Recruitment Workflows
Informatics Interventions in Clinical Research Recruitment
Computerized Clinical Trial Decision Support
Internet-Based Patient Matching Systems
Informatics Intervention in Clinical Research Recruitment Support
Data Repository-Based Clinical Trial Recruitment Support
Sociotechnical Challenges
Conclusion and Future Work
References
13: Patient Registries for Clinical Research
Definitions and Types of Registries
The Role of Registries in Evolving Research Contexts
The Role of Registries in Quality Improvement and Learning Health Systems
Using Clinical Data for Patient Registries
Interoperability and Data Standards
Data Exchange Standards
Content Standards
Coding Systems and Controlled Terminologies
Content Standards: Common Clinical Models and Data Elements
Entity Identifiers Including the Unique Device Identifier (UDI)
Clinical Phenotype Definitions
Outcome Measures
The Common Clinical Registry Framework Model and Other HL7 Standards
Limitations of Registries
Informatics Approaches for Building Registries
Registry Functions
The Future: Enabling the Creation and Use of Patient Registries for Biomedical and Health Services Research
References
Part III: Managing Different Types of Data Across Clinical and Translational Research
14: Best Practices for Research Data Management
Introduction
Purpose and Scope
Metadata and Provenance
Documentation
Training
Quality Control Checks
Issues and Corrective Action
Noncompliance, Protocol Violations, Unanticipated Events/Problems
Database Access
Version Control
Roles of Data Management
Data Management Plans
Definition
Purpose
Data Management Tools
Types of Tools
Electronic Data Capture (EDC)
Clinical Data Management Systems (CDMS)
Clinical Trials Management System (CTMS)
eConsent
Dashboard and Analytics Tools
Metadata Management and Dictionaries
Selection and Implementation
Data Acquisition
Data Flows
Definition
Purpose
When to Start Creating Data Flow Diagrams
Example Diagramming Tools
Important Considerations
Case Report Forms
Definition
Purpose
Developing Case Report Forms
Important Considerations
Self-Reported Patient Information
Definition
Purpose
Procedures
Usability Evaluation
Data Transfer Plan
Definition
Purpose
What?
Where?
How?
When?
Testing the Data Transfer Plan
Contingency and Mitigation Planning
Electronic Health Record Data
Definition
Purpose
Acquiring EHR Data
Computable Phenotypes
Variation in EHR Data
EHR Quality Checks
Important Considerations
Regulatory Considerations
Definitions
Purpose
Common Data Management Regulations
Implications for Clinical Data Managers
Data Processing
Definition
Purpose
Statistical Analysis Plan
Definition
Purpose
Important Considerations
Data Quality
Definition
Purpose
Programmed Edit Checks
Statistical Checks
Manual/Visual Checks
Data Integration
Data Reconciliation
Important Considerations
Reports
Definition
Best Practices
Development Process
Important Considerations
Vendor Management
Types of Vendors
Vendor Examples
Selecting a Vendor
Intellectual Property
Chain of Custody
Conclusion
References
15: Patient-Reported Outcome Data
Characteristics of Patient-Reported Outcomes
Measurement Issues
Comparability of PROs Across Studies and Time
Reliability
Validity
Modes of Administration
Personal (Face-to-Face) Administration
Telephone Administration
Mailed Surveys
Web Surveys and Email Communication
Electronic Data Collection Devices/Systems (ePRO)
Voice Auditory Systems
Screen Text Devices
Desktop, Laptop, and Touch-Screen Tablet Computers
Audiovisual Computer-Assisted Self-Interviewing (A-CASI) Systems
Mobile Devices
Item and Scale Development
Modification of Existing PROs
Instrument Repositories
Item Banks
Patient-Centered Drug Development
Standardization and Integration into Clinical Information Systems
Conclusion
References
16: Molecular, Genetic, and Other Omics Data
The Molecular Basis of Life
Molecular Biology and Genomics Data
Sequence Analysis Data
Structure Analysis Data
Functional Analysis Data
Human Variation
Microbiome Data
Translating From the Molecular World to the Clinical World
Clinical Application of Omics Data
Integration of Molecular and Clinical Research Data
Molecular Data to Support Clinical Research
Application of Molecular Data to Disease
Mechanisms of Disease
Diagnostic Methods and Therapeutic Application Studies
Molecular Epidemiological Data
The Role of Microbiome in Disease
The Future of Molecular Data in Clinical Research
References
17: Clinical Trial Registries, Results Databases, and Research Data Repositories
Introduction
Rationale for Registration and Reporting
Trial Registration
Development of Trial Registration
Clinical Trial Registries
Standards, Policies, and Principles of Trial Registration
Timing
Quality of Clinical Trial Registries
Evolution and Spin-Off
Creation and Management of a Trial Registry: The User Perspective
Design of Trial Registries
International Standards
Data Fields
First-Level Fields
Second-Level Fields
Third-Level Fields
Trial Registry Features and Data Quality
Maintenance of Trial Registries
Clinical Trial Results Databases/Results Databases
Standards
Sharing of Clinical Trial Data, Research Data Repositories and Platforms
Anonymization Methods of Clinical Research Data
Managing Identity Disclosure Risk in Microdata
Other Risks in Microdata
The User Perspective of Registration-Results-Data Sharing Process
Evolution and Future Directions of Sharing of Trials Results
Conclusion
References
Part IV: Knowledge Representation and Discovery
18: Knowledge Representation and Ontologies
Ontology Development
Important Ontological Distinctions
Building Blocks: Top-Level Ontologies and Relation Ontology
Formalisms and Tools for Knowledge Representation
OBO Foundry and Other Harmonization Efforts
Ontologies of Particular Relevance to Clinical Research
Research Metadata Ontology
Ontology of Clinical Research
Ontology for Biomedical Investigations
Biomedical Research Integrated Domain Group (BRIDG) Model Ontology
Data Content Ontology
National Cancer Institute Thesaurus (NCIT)
SNOMED Clinical Terms (SNOMED CT)
Logical Observation Identifiers, Names, and Codes (LOINC)
RxNorm
International Classification of Disease (ICD)
Current Procedural Terminology (CPT)
Human Phenotype Ontology
Ontology Repositories
Unified Medical Language System (UMLS)
UMLS Knowledge Sources
UMLS Tooling
UMLS Applications
BioPortal
BioPortal Ontologies
BioPortal Tooling
BioPortal Applications
Approaches to Ontology Alignment in Ontology Repositories
Ontology in Action: Uses of Ontologies in Clinical Research
Research Workflow Management
Data Integration
Electronic Phenotyping
The Way Forward
References
19: Developing and Promoting Data Standards for Clinical Research
Clinical Research: Escalating Efficiencies with Data Standards
Clinical Research Standards Developers and Drivers and Stakeholders
Advancing Research by Fully Integrating with Health Systems: Relevance of Health Data Standards to Clinical Research
Types of Healthcare Standards
Data Exchange Standards: The Evolution of FHIR
FHIR
Data Preparation and Transformation: Terminology Binding
International Landscape and Coordination
Standards Influencers: Collaborative Initiatives Driving Efficiencies in Clinical Research
Standards Maintenance and Access
Conclusion
Appendix: Standards Developing Organizations and Standards
Organizations and Initiatives
US Government Organizations Developing and Naming Standards
Controlled Terminologies (Standards)
Resources
References
20: Nonhypothesis-Driven Research: Data Mining and Knowledge Discovery
Introduction
The Knowledge Discovery in Databases Process
KDD Pipelines
Data Selection
Preprocessing
Transformation
Data Mining
Artificial Neural Networks
Decision Trees
Support Vector Machines
k-Nearest Neighbor
Association Rules
Bayesian Methods
Unsupervised Machine Learning
Interpretation, Evaluation, and Generalizability
FAIRness in KDD
Applications of Knowledge Discovery and Data Mining in Clinical Research
Using Claims and Clinical Data for Temporal Predictions of Clinical Outcomes
Using Clinical Data for Drug Repurposing
Commonly Encountered Challenges in Data Mining
Rare Instances
Sources of Bias
Other Limitations
Infrastructure for Knowledge Discovery
Future Directions
Uncertainty Quantification
Federated Learning
Conclusion
References
21: Clinical Natural Language Processing in Secondary Use of EHR for Research
The Role of Clinical Natural Language Processing in the Secondary Use of EHR
Use Case 1: Information Retrieval for Eligibility Screening or Cohort Identification
Use Case 2: Information Extraction for Assembling Clinical Research Data Sets
Foundations of Clinical Natural Language Processing
Task Formulation
Corpus Annotation
Model Development
Symbolic Approach
Traditional Machine Learning
Deep Learning
Hybrid
Model Evaluation
Model Application
A Step-by-Step Case Demonstration
Task Formulation
Corpus Annotation
Model Development
Symbolic Approach
Deep Learning Approach
Model Evaluation
Clinical NLP Resources
An Overview of Clinical NLP Community Challenges
An Overview of Clinical NLP Systems and Toolkits
An Overview of Clinical NLP Systems
An Overview of Clinical NLP Toolkits and Packages
Challenges, Opportunities, and Future Directions
Reproducibility and Scientific Rigor
Multisite NLP Collaboration
Federated Learning and Evaluation
Conclusion
References
Part V: Evolving Models and New Opportunities for the Transformation of Clinical Research
22: Back to the Future: The Evolution of Pharmacovigilance in the Age of Digital Healthcare
Introduction
Background
Coasian Transactions: The Development and Evolution of PV/Drug Safety
Research Agenda for Modern Pharmacovigilance
Topic 1: The Operational Definition of an Adverse Event
Topic 2: Expanding and Formalizing the Data Model
Topic 3: Terminologies
Topic 4: Discovery/Curation of AEs
Topic 5: Delayed Toxicity and Complex Causal Assessments
Topic 6: Risk Profiling of the Individual
Topic 7: Emerging Data and Technologies for Pharmacovigilance
Interoperability of Healthcare Data
Alphafold 2
Transformer-Based Language Models and GPT3
The Future of Pharmacovigilance
Conclusion
References
23: Evolving Opportunities and Challenges for Patients in Clinical Research
Introduction
Patient Engagement
Efforts Supporting Patient Engagement
Strategies and Best Practices
Citizen Scientist
Information Behavior and Health Literacy
Information Fields
Health Literacy
Social Environment for Patients
The Role of Third Parties in Information Seeking
Self-Help and Advocacy
Evolving Medicine and Immersive Technologies in Clinical Research
Direct-to-Consumer Testing and Data Collection for Research
Crowdsourcing
Mobile Health (Digital Health)
Clinical Trial Involvement
Augmented and Virtual Reality
Consumers’ Relationships with Their Own Data
Conclusions
References
24: Apps in Clinical Research
Introduction
Operational Support
Recruitment and Participant Management
Data Capture
Participant Generated Data
Clinician Generated Data
Apps as the Intervention
Building Apps
Standards for App Development
SMART on FHIR
Content Standards
Development Tools and Hosting Platforms
Security, Sharing, and Privacy Considerations
Security Guidance
Open and Closed Data
The Ecosystem of Apps and Electronic Health Records
Shareability and Data Ownership
Future Directions
References
25: Future Directions in Clinical Research Informatics
Emergence of CRI Discipline Supporting Clinical and Translational Research
Initiatives, Policy, and Regulatory Trends in CRI
Role of CRI in Learning Health Systems: Data and Knowledge Management, Evidence Generation, and Quality Improvement
Multidisciplinary Collaboration is an Essential Feature of CRI
Challenges and Opportunities for CRI
Training and Workforce Needs
Conclusion
References
Index

Clinical Research Informatics [3 ed.]
3031271726, 9783031271724

Author / Uploaded
Rachel L. Richesson
James E. Andrews
Kate Fultz Hollis

Health Informatics

Rachel L. Richesson James E. Andrews Kate Fultz Hollis Editors

Clinical Research Informatics Third Edition

Health Informatics

This series is directed to healthcare professionals leading the transformation of healthcare by using information and knowledge. For over 20 years, Health Informatics has offered a broad range of titles: some address specific professions such as nursing, medicine, and health administration; others cover special areas of practice such as trauma and radiology; still other books in the series focus on interdisciplinary issues, such as the computer based patient record, electronic health records, and networked healthcare systems. Editors and authors, eminent experts in their fields, offer their accounts of innovations in health informatics. Increasingly, these accounts go beyond hardware and software to address the role of information in influencing the transformation of healthcare delivery systems around the world. The series also increasingly focuses on the users of the information and systems: the organizational, behavioral, and societal changes that accompany the diffusion of information technology in health services environments. Developments in healthcare delivery are constant; in recent years, bioinformatics has emerged as a new field in health informatics to support emerging and ongoing developments in molecular biology. At the same time, further evolution of the field of health informatics is reflected in the introduction of concepts at the macro or health systems delivery level with major national initiatives related to electronic health records (EHR), data standards, and public health informatics. These changes will continue to shape health services in the twenty-first century. By making full and creative use of the technology to tame data and to transform information, Health Informatics will foster the development and use of new knowledge in healthcare.

Editors

Rachel L. Richesson
Learning Health Sciences
University of Michigan School of Medicine
Ann Arbor, MI, USA

James E. Andrews
School of Information
University of South Florida
Tampa, FL, USA

Kate Fultz Hollis
Medical Informatics and Clinical Epidemiology
Oregon Health and Science University
Portland, OR, USA

ISSN 1431-1917
ISSN 2197-3741 (electronic)
Health Informatics
ISBN 978-3-031-27172-4
ISBN 978-3-031-27173-1 (eBook)
https://doi.org/10.1007/978-3-031-27173-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2012, 2019, 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Contents

1 Introduction to Clinical Research Informatics
Kate Fultz Hollis, Rachel L. Richesson, and James E. Andrews

Part I Foundations of Clinical Research Informatics

2 From Notations to Data: The Digital Transformation of Clinical Research
Christopher G. Chute

3 Methodological Foundations of Clinical Research
Antonella Bacchieri and Giovanni Della Cioppa

4 The Clinical Research Environment
Philip R. O. Payne

5 Next Generation Biorepository Informatics: Supporting Genomics, Imaging, and Innovations in Spatial Biology
Chenyu Li, Rumana Rashid, Eugene M. Sadhu, Sandro Santagata, and Michael J. Becich

6 Study Protocol Representation
Joyce C. Niland, Julie Hom, and Susan Hmwe

7 Clinical Research Information Systems
Prakash M. Nadkarni

8 Public Policy Issues in Clinical Research Informatics
Jeffery R. L. Smith

Part II Enabling Frameworks and Processes and Tools

9 Data Sharing and Reuse of Health Data for Research
Rebecca Daniels Kush and Amy Cramer

10 Data Quality in Clinical Research
Meredith Nahm Zozus, Michael G. Kahn, and Nicole G. Weiskopf

11 Research Data Governance, Roles, and Infrastructure
Anthony Solomonides

12 Informatics Approaches to Participant Recruitment
Chunhua Weng and Peter J. Embi

13 Patient Registries for Clinical Research
Rachel L. Richesson, Leon Rozenblit, Kendra Vehik, and James E. Tcheng

Part III Managing Different Types of Data Across Clinical and Translational Research

14 Best Practices for Research Data Management
Anita Walden, Maryam Garza, and Luke Rasmussen

15 Patient-Reported Outcome Data
Robert O. Morgan, Kavita R. Sail, and Laura E. Witte

16 Molecular, Genetic, and Other Omics Data
Stephane M. Meystre, Ramkiran Gouripeddi, and Alexander V. Alekseyenko

17 Clinical Trial Registries, Results Databases, and Research Data Repositories
Karmela Krleža-Jerić, Mersiha Mahmić-Kaknjo, and Khaled El Emam

Part IV Knowledge Representation and Discovery

18 Knowledge Representation and Ontologies
Kin Wah Fung and Olivier Bodenreider

19 Developing and Promoting Data Standards for Clinical Research
Rachel L. Richesson, Cecil O. Lynch, and W. Ed Hammond

20 Nonhypothesis-Driven Research: Data Mining and Knowledge Discovery
Mollie R. Cummins, Senthil K. Nachimuthu, Samir E. Abdelrahman, Julio C. Facelli, and Ramkiran Gouripeddi

21 Clinical Natural Language Processing in Secondary Use of EHR for Research
Sunyang Fu, Andrew Wen, and Hongfang Liu

Part V Evolving Models and New Opportunities for the Transformation of Clinical Research

22 Back to the Future: The Evolution of Pharmacovigilance in the Age of Digital Healthcare
Michael A. Ibara and Rachel L. Richesson

23 Evolving Opportunities and Challenges for Patients in Clinical Research
James E. Andrews, Christina Eldredge, Janelle Applequist, and J. David Johnson

24 Apps in Clinical Research
Brian Douthit and Rachel L. Richesson

25 Future Directions in Clinical Research Informatics
Peter J. Embi and Rachel L. Richesson

Index

1

Introduction to Clinical Research Informatics Kate Fultz Hollis, Rachel L. Richesson, and James E. Andrews

Abstract

This chapter introduces the third edition of Clinical Research Informatics with an overview of important constructs and methods within the subdomain of clinical research informatics. It sets the tone and scope for the text, highlights important topics and themes, and describes the content and organization of the chapters.

Keywords

Clinical research informatics definition · CRI · Theorem of informatics · American Medical Informatics Association · Biomedical informatics

K. F. Hollis (*)
Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA

R. L. Richesson
Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA

J. E. Andrews
School of Information, College of Arts and Sciences, University of South Florida, Tampa, FL, USA

Overview

Welcome to the third edition of Clinical Research Informatics. It has been 10 years since co-editors Rachel Richesson and James Andrews published the first edition of this book, and while the informatics foundations of clinical research are roughly the same, tremendous changes in technology have brought us more data to use, more advanced methods of data analysis, and increased public awareness and appreciation of the growing roles of technology in medicine and research. We have also seen increased public awareness of the importance of biomedical research and growing appreciation of the challenges and limitations of our current national research infrastructure.

As we have done in past editions, we begin with definitions. Clinical research is the branch of medical science that investigates the safety and effectiveness of medications, devices, diagnostic products, and treatment regimens intended for human use in the prevention, diagnosis, treatment, or management of a disease. The documentation, representation, and exchange of information in clinical research are inherent to the very notion of research as a controlled and reproducible set of methods for scientific inquiry. Contemporary clinical research represents a relatively new application of statistics to medicine: randomized controlled clinical trials have been accepted as the gold standard [1] only within the last 70 years. Clinical research has been characterized as a discipline

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_1

resting on three pillars of principle and practice related to control, mensuration, and analysis [2], though these can be more modernly interpreted as a triad of expertise in medicine, statistics, and logistics [3]. Clinical research informatics (CRI) is the application of informatics principles and technologies to support the spectrum of activities and business processes that represent clinical research. Informatics, broadly defined as the intersection of information and computer science with a health-related discipline, has a foundation that has drawn from many well-established, theory-based disciplines, including computer science, library and information science, cognitive science, psychology, and sociology. The fundamental theorem of informatics [4] states that humans plus information technology should function and perform better together than humans alone, and so informatics is a source for supportive technologies and tools that enhance—but not replace—unreservedly human processes. The US National Institutes of Health (NIH) offers a comprehensive and widely accepted definition for clinical research that includes a spectrum of populations, objectives, methods, and activities. Specifically, this broad definition states that “clinical research is… patient-oriented research conducted with human subjects (or on material of human origin that can be linked to an individual)” [5]. Under this definition, clinical research includes investigation of the mechanisms of human disease, therapeutic interventions, clinical trials, development of new technologies, epidemiology, behavioral studies, and outcomes and health services research. 
The challenges in clinical research, and thus the opportunities for informatics support, arise from many different objectives and requirements, including the need for optimal protocol design; regulatory compliance; improved patient recruitment and engagement; efficient protocol management; data collection and acquisition; data storage, transfer, processing, and analysis; and impeccable patient safety throughout. Regardless of clinical domain or study design, high-quality data collection and standard, formalized data representation are critical to the fundamental notion of research reproducibility.

The need for clinical research informatics is stronger and more visible than ever before as we emerge from a global pandemic caused by a novel and lethal virus. As the New England Journal of Medicine reported, in December 2019, a cluster of patients with pneumonia of unknown cause was linked to a seafood wholesale market in Wuhan, China. A previously unknown betacoronavirus was discovered through the use of unbiased sequencing in samples from patients with pneumonia [6]. Since that time, the disease caused by this virus, COVID-19, has claimed millions of lives, and the pandemic has raised awareness of research, as there was a global push to develop and evaluate diagnostic tests, vaccines, and ultimately treatments. The gravity of the pandemic not only made progress faster than usual (the expeditious testing of vaccine efficacy was unprecedented [7]) but also cast a bright light on the many challenges of clinical research and the significant limitations of our infrastructure. The experience has demonstrated the essential role of clinical research informatics in supporting faster research in real-world settings, as the focus on pandemic vaccines and treatments was intense. In parallel, the FDA, recognizing that the current clinical research infrastructure and approach are not sufficient to address and inform the growing number of clinical questions and emerging conditions and therapeutics (and also recognizing the need for real-world evidence (RWE) to inform real-world practice), has called for the development of new methods for using real-world data (RWD) [8].

Clinical research investigations of other conditions were also impacted by the COVID-19 pandemic. Many clinical trials had to be modified and delayed according to COVID-19 protocols (following institutional guidelines and the Food and Drug Administration’s “Conduct of Clinical Trials of Medical Products During the COVID-19 Public Health Emergency” [9]), but critical work did continue and has informed new methods for adapting trials and analysis to cope with disruption and competing resources [10].

Contexts and Attempts to Define Clinical Research Informatics

Original research is needed to create an evidence base for information and communications technologies that meaningfully address the business needs of research and also streamline, change, and improve the business of research itself. CRI has now become established: several books, including this one, are now available, CRI-focused working groups in the American Medical Informatics Association (AMIA) and HL7 are quite active, and CRI is now a regular section in the International Medical Informatics Association (IMIA) Yearbook each year. Standards and best practices for research have started to emerge, as have standards for education and training in the field [11].

The comprehensive and definitive definition of CRI is presented by Embi and Payne (2009), who characterize CRI as “the sub-domain of biomedical informatics concerned with the development, application, and evaluation of theories, methods, and systems to optimize the design and conduct of clinical research and the analysis, interpretation, and dissemination of the information generated” [12]. A more descriptive definition is offered by AMIA, where CRI includes evaluation and modeling of clinical and translational research workflow; social and behavioral studies involving clinical research; designing optimal human–computer interaction models for clinical research applications; improving and evaluating information capture and data flow in clinical research; optimizing research site selection, investigator, and subject recruitment; knowledge engineering and standards development as applied to clinical research; facilitating and improving research reporting to regulatory agencies; and enhancing clinical and research data mining, integration, and analysis. (See AMIA CRI Working Group: https://amia.org/community/working-groups/clinical-research-informatics.) The definition and illustrative activities of CRI both emerged from in-person and virtual meetings and interviews with self-identified CRI practitioners within the AMIA organization. Since CRI was formally defined in 2009, the number of self-identified CRI practitioners has continued to grow in academic medical centers, pharmaceutical companies, and health IT and innovation start-ups. The growing demand to use EHR data to improve the efficiency, generalizability, and relevance of clinical research is driving new models of patient engagement (both in research and with their health data), resulting in tremendous growth in mobile applications and further broadening the landscape of CRI innovation and practice.

The scope and number of clinical and research questions to be addressed by CRI have evolved over time and since our first two editions of this book. A single professional or educational home for CRI, and as such a source to develop a single consensus and more precise definition, is still lacking at present and likely unachievable given the multidisciplinary, multinational, and multicultural scope of CRI activities. However, the AMIA CRI Working Group remains a crucial and dominant forum for CRI, providing a more detailed articulation of the role of the chief research information officer (CRIO) [13]. What is important to note is that this is all reflective of the bottom-up development of our field. For the annual survey of CRI for the IMIA Yearbook in 2020, Anthony Solomonides wrote that many topics of interest today may also have been featured 10 years ago, but some topics did not figure at all in 2009. Among those new topics noted are artificial intelligence (AI), especially in the form of machine/deep learning, our understanding of causal inference, and the shifting trend in the use of “real-world evidence” (RWE) often gathered by networks using one or other of the several “common data models” [14].

The first references to what is now known as clinical research informatics go back to the 1960s and predict the inevitable use of computers to support data collection and analysis in research [15]. Christopher Chute (Chap. 2) goes into more detail on historical changes in research in terms of volume and types of data in the physical sciences. The use of clinical databases for research inquiry was first established in the late 1960s, and by the next decade, there were at least a handful of clinical information systems being used for research. This history is well described by Morris F. Collen in a 1990 historical review [16]. In short order, it was clear that structured data entry and data standards would be a critical component of any computerized support or analysis system in research [17]. M. Scott Blois was the first to describe the complexities and dimensions of representing medical data in his seminal book [18]. Blois recognized early that systems could and should support more than queries about a single patient’s data and should be searchable to retrieve many patient records to support research and quality monitoring. The first applications focused on retrieval of clinical information to identify and understand patient subpopulations [19]. Others saw the potential for tapping these clinical databases in observational research and knowledge discovery; by the 1970s, cancer and tumor registries were well established, and cardiovascular disease registries emerged. For the first few decades, computers in clinical research were indeed centered around maintaining a database focused on collecting and querying clinical data. The advent of patient eligibility screening and trial recruitment systems in the 1990s represents the introduction of computers to support clinical research processes [20–22]. The regulated nature of human trials, especially since the formal inquiry and establishment of standards for the field in the 1970s, created a critical need for documentation of methods and processes, as well as analyses and findings, and we saw systems emerge in the late 1980s that began to address the conduct of studies. The capabilities of these systems have improved, and their use has proliferated. Now, clinical research management systems of various types support the collection of data and the coordination of research tasks.
The primary functionality of commercial applications today is essentially concerned with the delivery of valid and accurate data in conformity with the Good Clinical Practice (GCP) guidelines [23], and in most cases, these systems are still not well integrated with patient care systems. The enormity of data generated from new diagnostic and measurement technologies, the increasing ability to collect data rapidly from

patients or external data sources, and the scope and scale of today’s research enterprises have led to a bewildering array and amount of data and information. Information technology has contributed to the information management problems by generating more data and information, but the techniques and principles derived from informatics promise to purposively utilize IT to address the issues of data collection, information management, process and protocol management, communication, and knowledge discovery, as well as show promise to improve research efficiencies, increase our knowledge of therapeutic evaluation, and impact human health and the global economy. Still, in time, these tools will need to be evaluated via more formal means and evolve or be replaced by the next generation of tools and methods. As original informatics research and proper system evaluations—including randomized trials of various systems with outcome measures related to research efficiency, quality, and patient safety—are conducted, published, and scrutinized, evidence-based research systems, practices, and tools can be deployed and subsequently increase our ability to generate the knowledge and clinical evidence needed to address pressing and emergent public health problems.

Perspective, Objectives, and Scope

This book comes during a very challenging time for CRI and biomedical informatics. Since the first and second editions of this text, we have seen new legislation (the 21st Century Cures Act) and new programs, including the NIH’s All of Us Research Program, that show promise to leverage CRI to impact human health in unprecedented ways. We also have the National COVID Cohort Collaborative (N3C), a consortium whose collaborators contribute and use COVID-19 clinical data to answer critical research questions to address the pandemic (National Center for Advancing Translational Sciences, https://ncats.nih.gov/n3c/about/program-faq). There is a growing interest around RWE for treatments and implementing (and generating) evidence in

Learning Health Systems (such as those defined by the Agency for Healthcare Research and Quality, https://www.ahrq.gov/professionals/systems/learning-health-systems/index.html). In addition, the rapid development and adoption of the HL7 FHIR standard brings unprecedented potential and opportunity for sharing and use of clinical data in research.

This collection of chapters is meant to represent the current knowledge in the field with an eye toward the future, and we have several new chapters including Biorepositories, Best Practices for Research Data Management, and Apps in Clinical Research. In this book, we offer foundational coverage of key areas, concepts, constructs, and approaches of medical informatics as applied to clinical research activities, in both current settings and in light of emerging policies, so as to serve as one contribution to the discourse going on within the field now. We do not presume to capture the entirety of the field (can any single text truly articulate the full spectrum of a discipline?), but rather an array of both foundational and emerging areas that will impact clinical research and, so, CRI. Our aim is not to provide an introductory book on informatics, as is best done by Shortliffe, Cimino, and Chiang in their foundational biomedical informatics text [24] or by Hersh [25]. Rather, this text is targeted toward those who possess a basic understanding of the health informatics field and who would like to apply informatics principles to clinical research problems and processes. Many of the theories and principles presented in this text are, naturally, common across biomedical informatics and not unique to CRI; however, the authors have put these firmly in the context of how they apply to clinical research.

The excitement of such a dynamic area is fueled by the significant challenges the field must face.
At this stage, there is still no consistent or formal reference model (e.g., curriculum models supporting graduate programs or professional certification) that represents the core knowledge and guides inquiry. Several informatics graduate programs across the country offer courses in clinical research informatics (Oregon Health and

Science University, Columbia University, and Duke University, to name a few), and educators have told us anecdotally that this is the text they use. In fact, the impetus for creating Clinical Research Informatics came, in part, from requests from instructors for a text that offers students and others a range of knowledge and best practices from some of the top CRI scholars and practitioners. As the discipline of CRI grows, this book continues to stand out as the primary, authoritative text on the field.

In this text, we attempt to cover the range of knowledge topics and practices relevant to CRI as well as to identify several broad themes that undoubtedly will influence the future evolution of CRI. In compiling works for this book, we were well aware that our selection of topics and placement of authors, while not arbitrary, was inevitably subjective. Others in CRI might or might not agree with our conceptualization of the discipline. Our goal is not to restrict CRI to the framework presented here; rather, we hope that this book will stir a discourse as this subdiscipline continues to evolve. In a very loose sense, this text represents a bottom-up approach to organizing this field. There is not one exclusive professional venue for clinical research informatics and, therefore, no single place to scan for relevant topics. Numerous audiences, researchers, and stakeholders have emerged from the clinical research side (professional practice organizations, academic medical centers, the FDA and NIH sponsors, research societies like the Society for Clinical Trials, and various clinical research professional and accrediting organizations such as the Association of Clinical Research Professionals) and also from the informatics side (AMIA). Every year since 2011, Dr. Peter Embi has conducted a systematic review of innovation and science in CRI and presented it to AMIA [26].

The authors for each of the chapters were selected for their demonstrated expertise in the field.
We asked authors to attempt to address multiple perspectives, to paint major issues, and, when possible, to include international perspectives. Each of the outstanding authors succeeded, in our opinion, in presenting an overview of principles, objectives, methods, challenges, and

issues that currently define the topic area and that are expected to persist over the next decade. The individual voice of each author distinguishes one chapter from the other. Although some topics can be quite discrete, others overlap significantly at certain levels, and therefore, some topics are included across multiple chapters.

Organization of the Book

This new edition has evolved from the previous editions and has been significantly reorganized. New sections have been created (increased from three to five) to offer a more logical organization and expansion of related topics and chapters. We organize the chapters under unifying themes at a high level using five broad sections: (1) the foundations of clinical research informatics; (2) enabling frameworks, processes, and tools; (3) managing different types of data across clinical and translational research; (4) knowledge representation and data-driven discovery in CRI, a section that represents the future of clinical research, health, and clinical research informatics; and (5) evolving models and new opportunities for the transformation of clinical research. A new feature of this edition is the addition of “Learning Objectives” at the beginning of each chapter that highlight key concepts to help guide readers.

Part I: Foundations of Clinical Research Informatics
Chapter 2: From Notations to Data: The Digital Transformation of Clinical Research
Chapter 3: Methodological Foundations of Clinical Research
Chapter 4: The Clinical Research Environment
Chapter 5: Next Generation Biorepository Informatics: Supporting Genomics, Imaging, and Innovations in Spatial Biology
Chapter 6: Study Protocol Representation
Chapter 7: Clinical Research Information Systems
Chapter 8: Public Policy Issues in Clinical Research Informatics

The first section addresses the historical context, settings, wide-ranging objectives, and basic

definitions and topics for clinical research informatics. In this section, we sought to introduce the context of clinical research and the relevant pieces of informatics that together constitute the space for applications, processes, problems, issues, etc. that collectively comprise CRI activities. We start with a historical perspective from Christopher Chute, whose years of experience in this domain and informatics, generally, allow for an overview of the evolution from notation to digitization. His chapter brings in historical perspectives to the evolution and changing paradigms of scientific research in general and specifically on the ongoing development of clinical research informatics. Also, the business aspects of clinical research are described and juxtaposed with the evolution of other scientific disciplines, as new technological advances greatly expanded the availability of data in those areas. Chute also illustrates the changing sociopolitical and funding atmospheres and highlights the dynamic issues that will impact the definition and scope of CRI moving forward. Extending the workflow and information needs is an overview of study designs in the chapter on methodological foundations of clinical research presented by Antonella Bacchieri and Giovanni Della Cioppa. They provide a broad survey of various research study designs (which was described in much more detail in a separate Springer text [27] authored by them) and highlight the data capture and informatics implications of each. Philip Payne follows the method chapter with a chapter focused on the complex nature of clinical research workflows included in the clinical research environment—including a discussion on stakeholder roles and business activities that make up the field. This is a foundational chapter as it describes the people and tasks that information and communication technologies (informatics) are intended to support. 
At the crux of clinical research informatics is a variety of information management systems, which are characterized and described by Prakash Nadkarni. His chapter also gives a broad overview of system selection and evaluation issues. In addition, this chapter provides brief descriptions of each group of activities, system requirements

for each area, and the type and status of systems for each. We are very pleased to have Michael Becich and colleagues, Chenyu Li, Rumana Rashid, Eugene Sadhu, and Sandro Santagata, contribute a new chapter on biorepositories. As Dr. Becich et al. maintain, the importance of biospecimens and their derivatives, particularly genomic sequencing and expression data coupled with deep clinical annotation from electronic health records, is fueling a new era of deep biologic interrogation of both the cell biology of human tissues and their diseased counterparts. The importance of computerized representation of both data and processes, including the formalization of roles and tasks, is underscored by Joyce Niland, Julie Hom, and Susan Hmwe in their chapter on Study Protocol Representation. The essence of any clinical study is the study protocol, an abstract concept that comprises a study’s investigational plan and also a textual narrative documentation of a research study. The section ends with an up-to-date look by Jeff Smith on the important policy issues concerning CRI. Smith provides deep detail on the history and implications of the 21st Century Cures Act, and he emphasizes that capitalizing on the numerous and extraordinary opportunities to improve development and delivery of new interventions will depend heavily on the application of CRI theory and methods.

Part II: Enabling Frameworks and Processes and Tools
Chapter 9: Data Sharing and Reuse of Health Data for Research
Chapter 10: Data Quality in Clinical Research
Chapter 11: Research Data Governance, Roles, and Infrastructure
Chapter 12: Informatics Approaches to Participant Recruitment
Chapter 13: Patient Registries for Clinical Research

Several chapters in this section cover the process of handling data in CRI and how to appropriately manage and use these data. The use of clinical data for research is a tremendous challenge with perhaps the greatest potential for impact in all areas of clinical

research. Standards specifications for the use of clinical data to populate research forms have evolved to support a number of very promising demonstrations of the “collect once, use many” paradigm. In her chapter on clinical data sharing and reuse of health data for research, Rebecca Kush covers various scenarios for data sharing, including who needs to share data and why. More importantly, she describes the history and future strategy of cooperation between major standards development organizations in health care and clinical research. The quality of the data ultimately determines the usefulness of the study and applicability of the results. In “Data Quality in Clinical Research,” Meredith Zozus, Michael Kahn, and Nicole Weiskopf address the idea that data collection, quality, and management are central to clinical research. They focus on various types of data collected (e.g., clinical observations, diagnoses) and the methods and tools for collecting these. Special attention is given to the development and use of case report forms (CRFs), historically the primary mechanism for data collection in clinical research, but also to the growing use of EHR data in clinical research and to data bias. The chapter provides a theoretical framework for data quality in clinical research and also will serve as practical guidance. Moreover, Zozus et al. draw on the themes of workflows presented by Payne in the chapter on clinical research environment and advocate explicit processes dedicated to quality for all types of data collection and acquisition. In the chapter on data governance, Anthony Solomonides provides many ways to look at how clinical research data is governed and the necessary steps to ensure data is managed and protected for research. In addition to providing key activities for proper data governance, he also explains various motivations and reasons supporting “why” effective data governance is important and ardently pursued. Dr.
Solomonides describes organizational structures and processes that can be used to ensure data quality and patient and institutional protections. His chapter includes a case study. Chunhua Weng and Peter Embi address informatics approaches to patient recruitment by discussing practical and theoretical issues related to patient recruitment for clinical trials, focusing on possible informatics applications to enhance recruitment. Their chapter highlights evolving methods for computer-based recruitment and eligibility determination, sociotechnical challenges in using new technologies and electronic data sources, and standardization efforts for knowledge representation. Finally, and also related to patients, is a chapter on patient registries, provided by Rachel Richesson, Leon Rozenblit, Kendra Vehik, and Jimmy Tcheng. They highlight the growing importance of patient registries as curated data resources that can be enriched with multiple data types to support research, discovery, and healthcare quality improvement, thereby providing data infrastructure for learning health systems. Their discussion includes the scientific and technical issues for registries and reviews challenges and approaches for standardizing the data collected.

Part III: Managing Different Types of Data Across Clinical and Translational Research
Chapter 14: Best Practices for Research Data Management
Chapter 15: Patient-Reported Outcome Data
Chapter 16: Molecular, Genetic, and Other Omics Data
Chapter 17: Clinical Trial Registries, Results Databases, and Research Data Repositories

The premise of clinical research informatics is that the collection of data, and techniques for aggregating and sharing data with existing knowledge, can support discovery of new knowledge leading to scientific breakthroughs. The chapters that comprise this section are focused on state-of-the-art approaches to organizing or representing knowledge for retrieval purposes or use of advanced technologies to discover new knowledge and information where structured representation is not present or possible. A new chapter for this edition is on research data management by Anita Walden, Maryam Garza, and Luke Rasmussen.
This chapter provides general guidance around the traditional and several forward-thinking aspects of clinical data management (CDM). It offers a foundational understanding of CDM and the scope of its activities. The chapter

also describes the significance and relevance of CDM to researchers and data managers and summarizes best practices and resources for training and certification. An important source of data, data reported by patients, is described thoroughly by Robert Morgan, Kavita Sail, and Laura Witte in the next chapter on “Patient-Reported Outcomes.” The chapter describes the important role patient outcomes play in clinical research and the fundamentals of measurement theory and well-established techniques for valid and reliable collection of data regarding patient experiences. This section also has a chapter on molecular, genetic, and other omic data that includes the increasing availability of genetic data that is becoming vital to clinical research and personalized medicine. The discussion provided by Stephane Meystre, Ramkiran Gouripeddi, and Alexander Alekseyenko primarily focuses on the relationship and interactions of voluminous molecular data with clinical research informatics, particularly in the context of the new (post) genomic era. The translational challenges in biological and genetic research, genotype–phenotype relations, and their impact on clinical trials are addressed in this chapter as well. The full transparency of clinical research is a powerful strategy to diminish publication bias, increase accountability, avoid unnecessary duplication of research, advance research more efficiently, provide more reliable evidence (information) for diagnostic and therapeutic prescriptions, and regain public trust. Trial registration and results disclosure are considered powerful tools for achieving higher levels of transparency and accountability for clinical trials. New emphasis on knowledge sharing and growing demands for transparency in clinical research is contributing to a major paradigm shift in health research that is well underway. 
This section’s final chapter by Karmela Krleža-Jerić, Mersiha Mahmić-Kaknjo, and Khaled El Emam discusses the use of trial registries and results databases in clinical research and decision-making. International standards of trial registration and their impact are discussed, as is the contribution of informatics experts to these efforts.

1 Introduction to Clinical Research Informatics

Part IV: Managing Different Types of Data Across Clinical and Translational Research
Chapter 18: Knowledge Representation and Ontologies
Chapter 19: Developing and Promoting Data Standards for Clinical Research
Chapter 20: Non-Hypothesis Driven Research: Data Mining and Knowledge Discovery
Chapter 21: Clinical Natural Language Processing in Secondary Use of EHRs for Research

There is a natural appeal to ideas for transforming and exchanging heterogeneous data, which can be advanced using ontologies (or formal conceptual semantic representations of a domain). Kin Wah Fung and Olivier Bodenreider give us an overview of basic principles and challenges, all tied to examples of use of ontology in the clinical research space. This chapter covers the challenges related to knowledge representation in clinical research and how trends and issues in ontology design, use, and testing can support interoperability. Essential definitions are covered, as well as applications and other resources for development such as the semantic web. Additionally, major relevant efforts toward knowledge representation are reviewed. Specific ontologies relevant to clinical research are discussed, including the ontology for clinical trials and the ontology of biomedical investigation. Organizations, such as the National Center for Biomedical Ontology, that coordinate development, access, and organization of ontologies are discussed. Rachel Richesson, Cecil Lynch, and W. Ed Hammond cover the topic of standards, a central topic and persistent challenge for informatics efforts. Their focus is on the standards development process and relevant standards developing organizations, including the Clinical Data Interchange Standards Consortium (CDISC) and Health Level Seven (HL7). They address the collaboration and harmonization between research data standards and clinical care data standards and discuss new standards initiatives, such as the HL7 Vulcan FHIR Accelerator initiative.
A chapter on nonhypothesis-driven research, authored by Mollie Cummins, Senthil

Nachimuthu, Samir Abdelrahman, Julio Facelli, and Ramkiran Gouripeddi, offers an overview of state-of-the-art data mining and knowledge discovery methods and tools as they apply to clinical research data. The vast amount of data that is warehoused across various clinical research enterprises, and the increasing desire to explore these data to identify unforeseen patterns, require very advanced techniques. Examples of nonhypothesis-driven research supported by advanced data mining, knowledge discovery algorithms, and statistical methods help elucidate the need for these tools to support clinical and translational research. Much of the information in EHR systems is in the form of unstructured or free-text data, as found in clinical notes and narratives, and the ability to access these data has the potential to transform clinical research. In the final chapter in this section, Sunyang Fu, Andrew Wen, and Hongfang Liu describe new methods and tools and discuss how clinical natural language processing (NLP) has been adopted to computationally facilitate clinical research. Their chapter describes the foundation of clinical NLP and explains different NLP techniques that can be employed in the context of extracting and transforming narrative information in EHRs to support clinical research.

Part V: Evolving Models and New Opportunities for the Transformation of Clinical Research
Chapter 22: Back to the Future: The Evolution of Pharmacovigilance in the Age of Digital Healthcare
Chapter 23: Evolving Opportunities and Challenges for Patients in Clinical Research
Chapter 24: Apps in Clinical Research
Chapter 25: Future Directions in Clinical Research Informatics

In this final section of the text, we also include topics that will continue to impact CRI into the future and that build upon the contexts, data sources, and information and knowledge management issues discussed in previous sections.
Many of the topics included here are truly multidisciplinary and stand to potentially impact all clinical research studies.

Pharmacovigilance is an important and evolving discipline that is highly relevant to the future evolution of CRI application domains, particularly given the relevance of pharmacovigilance to patient safety and potential to impact population health. Informatics methods and applications are needed to ensure drug safety for patients and the ability to access, analyze, and interpret distributed clinical data across the globe to identify adverse drug events. Michael Ibara and Rachel Richesson provide a historical account of its evolution, as well as the increasing need for informatics methods and applications that can be employed to ensure greater patient safety. Various issues are explored in this context, including drug and device safety monitoring and changing paradigms and emerging infrastructures for detecting adverse drug events. Two chapters in this final section tackle different perspectives on patients or consumers. Given the rapid advances in technology and parallel continued emphasis on patient empowerment and participation in decision making, Jim Andrews, Christina Eldredge, Janelle Applequist, and David Johnson consider the changing role of consumers in health care generally and in clinical research particularly. Traditional treatments of information behaviors and health communication are discussed, building to more current approaches and emerging models. Central to understanding the implications for clinical research are the evolving roles of consumers who are more engaged in their own decision-making and care and who help drive research agendas. The tools and processes that support patient decision-making, engagement, and leadership in research are also briefly described here. A new chapter, “Apps in Clinical Research,” is also included in this edition. 
In this chapter, Brian Douthit describes how the introduction of “apps” (software applications that can be installed and run on computers, tablets, or smartphones) is changing the landscape of clinical research by opening new avenues for administering and evaluating interventions. The almost ubiquitous use of smartphones and apps has transformed many industries, and a similar transformation of health care and clinical research appears inevitable. In this chapter, Dr. Douthit outlines the major events leading to today’s app infrastructure, current uses of apps in clinical research, design considerations, and future directions. The book concludes with a brief chapter by Peter Embi summarizing the challenges CRI researchers and practitioners will continue to face as the field evolves and new challenges arise. This concluding chapter helps in envisioning the future of the domain of clinical research informatics. In addition to outlining likely new settings and trends in research conduct and funding, the author reflects on the future of the informatics infrastructure and the professional workforce training and education needs. A focus of this chapter is the description of how clinical research (and supporting informatics) fits into a bigger vision of a learning health system and of the relationship between clinical research, evidence-based medicine, evidence-generating medicine, and quality of care.

Conclusion The overall goal of this book is to contribute to the ongoing discourse among researchers and practitioners in CRI as they continue to rise to the challenges of a dynamic and evolving clinical research environment. CRI is an exciting and broad domain, leaving ample room for future additions or other texts exploring these topics more deeply or comprehensively. Most certainly, the development of CRI as a subdiscipline of informatics and a professional practice area will drive a growing pool of scientific literature based on original CRI research, and high-impact tools and systems will be developed. It is also certain that CRI groups will continue to support and create communities of discourse that will address much needed practice standards in CRI, data standards in clinical research, policy issues, educational standards, and instructional resources. The scholars that have contributed to this book are among the most active and engaged in the CRI domain, and we feel they have provided an excellent starting point for deeper explorations into this emerging discipline. While we

have by no means exhausted the range of topics in this new edition of CRI, we hope that readers will see several themes stand out throughout this text. The authors illustrate well that the CRI domain can keep up with the pace of innovation even in hard times (like pandemics). It is likely that moving forward, the use of informatics and computing will continue to evolve to address emergent research needs and technical capabilities and accelerate the generation of new knowledge and insight to guide the course of human and global evolution in ways we cannot even predict.

Part I Foundations of Clinical Research Informatics

2

From Notations to Data: The Digital Transformation of Clinical Research

Christopher G. Chute

Abstract

The history of clinical research precedes the advent of computing, though informatics concepts have long played important roles. The advent of digital signal processing in physiologic measurements tightened the coupling to computation for clinical research. The astronomical growth of computational capacity over the past 60 years has contributed to the scope and intensity of clinical analytics, for research and practice. Correspondingly, this rise in computational power has made possible clinical protocol designs and analytic strategies that were previously infeasible. These factors have driven biological science and clinical research into the big science era, replete with a corresponding increase of intertwined data resources, knowledge, and reasoning capacity. These changes usher in a social transformation of clinical research and highlight the importance of comparable and consistent data enabled by modern health information data standards and ontologies.

Keywords

History of clinical research · Digitalization of biomedical data · Information-intensive domain · Complexity of clinical research informatics · Computing capacity and information processing · Interoperability and standards · Complexity of research design and protocols

C. G. Chute (*) Schools of Medicine, Public Health, and Nursing, Johns Hopkins University, Baltimore, MD, USA

Learning Objectives
1. Describe key points in the evolution of clinical research driven by the advent of computing advances and related technologies
2. Outline key concepts related to the digitization of biomedical and research data
3. Discuss the social transformations and other impacts resulting from the digital evolution of clinical research

Historical Perspective The history of clinical research, in the broadest sense of the term, is long and distinguished. From the pioneering work of William Harvey to the modern modalities of translational research, a common thread has been the collection and interpretation of information. Thus, informatics has played a prominent role, if not always recognized as such. Accepting that an allowable definition of informatics is the processing and interpretation

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_2

of information that permits analyses or inferencing, the science of informatics can and does predate the advent of modern computing. Informatics has always been a multidisciplinary science, blending information science with biology and medicine. Reasonable people may inquire whether distinguishing such a hybrid as a science is needed, though this is reminiscent of parallel debates about epidemiology, which to some had merely coordinated clinical medicine with biostatistics; few question the legitimacy of epidemiology as a distinct discipline today (nor biostatistics if I were to nest this discussion yet further). Similarly, in the past two decades, informatics, including clinical research informatics as a recognized subfield, has come into its own. Nevertheless, common understanding and this present text align informatics, applied to clinical research or otherwise, with the use of digital computers. So when did the application of digital computers overlap clinical research? This question centers on one’s notion about the boundaries of clinical research, perhaps more a cultural issue than amenable to rational debate. For the purposes of this discussion, I will embrace the spectrum from physiological measurements to observational data on populations within the sphere of clinical research.

Analog Signal Processing In its simplest form, the use of an analog measurement can be seen in the measurement of distance with a ruler. While not striking most as a predecessor of clinical informatics, it does illustrate the generation of quantitative data. It is the emphasis on the quantification of data that distinguishes ancient from modern perspectives on biomedical research. The introduction of signal transducers, which enabled the transformation of a myriad of observations, such as light, pressure, velocity, temperature, or motion, into electronic signals, such as voltage strength, demarcated the transition from ancient to modern science. This represents yet another social transformation attributable to the harnessing of electricity. Those

of us old enough to remember the ubiquitous analog chart recorder, which enabled any arbitrary voltage input to be continuously graphed over time, recognize the significant power that signal transduction engendered. The ability to have quantified units of physiologic signals, replete with their time-dependent transformations as represented on a paper graph, enabled the computation, albeit by analog methods, of many complex parameters now taken for granted. These include acceleration constants, maximum or minimum measures, inflection points, a host of continuous data properties, and most importantly an ability to observe and quantitate covariance between and among such measures. These in turn enabled the creation of mathematical models that could be inferred, tested, validated, and disseminated on the basis of continuous quantitative data. Departments of physiology and biomedical research saw huge progress in the evolution and sophistication of physiologic models arising from increasing quantities of continuous quantitative data over time. Early work invoking signal transduction and quantified analog signals could be found in the 1920s but became much more common in the 1930s and was a standard method in the 1940s and 1950s. This introduced unprecedented precision, accuracy, and reproducibility in biomedical research. The novel capability of complex quantitative data capture, analysis, and utilization presaged the next great leap in clinical informatics: the digitalization of data.

Digital Signal Processing The advent of digital signal processing (DSP), first manifested in analog to digital converters (ADCs), has fundamentally transformed clinical research. In effect, it is the marrying of quantitative data to computing capability. ADCs take analog input, most typically a continuous voltage signal, and transform it into a digital number. Typically, the continuous signal is transformed into a series of numbers, with a specific time interval between successive digital “snapshots.” The opposite twin of ADCs is digital to analog converters (DACs), which can make digital data “move the needle” proverbially. DSP was first put to practical use during the Second World War, when it was used experimentally to carry telephonic signals over long distances without degradation by putting ADCs and DACs in series. The telephony industry brought this capability into the civilian world, and commercial DSP began to appear in the 1950s. At that time, the numerical precision was crude, ranging from 4 to 8 bits. Similarly, the frequency of digital number generation was relatively slow, on the order of one number per second. The appearance of transistors in the 1960s, and integrated circuits in the 1970s, ushered in a period of cheap, reliable, and relatively fast DSP. While case reports exist of physiologic researchers using DACs in the 1950s, this did not become a common practice until the cost and performance characteristics of this technology became practical in the early 1970s. Today, virtually all modern smartphones have highly sophisticated DSP capabilities, some of which are starting to be used for remote physiological monitoring of clinical research participants and the general public through fitness apps.
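The conversion described above, periodic snapshots of a continuous voltage stored at limited bit precision, can be sketched in a few lines of Python. This is an idealized illustration, not a model of any particular converter: the function names, the voltage range of -1 to 1, and the 1 Hz test signal are all assumptions chosen for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits,
                        v_min=-1.0, v_max=1.0):
    """Idealized ADC: sample `signal` at fixed intervals and quantize.

    `signal` is a hypothetical function mapping time in seconds to volts.
    Returns integer codes in the range 0 .. 2**bits - 1.
    """
    step = (v_max - v_min) / (2 ** bits - 1)    # voltage width of one code
    n_samples = int(duration_s * sample_rate_hz)
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                  # time of this snapshot
        v = min(max(signal(t), v_min), v_max)   # clip to the input range
        codes.append(round((v - v_min) / step))
    return codes

def reconstruct(codes, bits, v_min=-1.0, v_max=1.0):
    """Idealized DAC: map integer codes back to voltages."""
    step = (v_max - v_min) / (2 ** bits - 1)
    return [v_min + c * step for c in codes]

def sine(t):
    """A 1 Hz test signal with an amplitude of 1 volt."""
    return math.sin(2 * math.pi * t)

# Eight snapshots per second at 4-bit precision (16 possible codes).
codes = sample_and_quantize(sine, duration_s=1.0, sample_rate_hz=8, bits=4)
voltages = reconstruct(codes, bits=4)
```

With only 4 bits, each reconstructed voltage can differ from the true signal by up to half of one quantization step, which gives a sense of how crude early numerical precision was.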

The Digitalization of Biomedical Data The early 1970s also coincided with the availability of affordable computing machinery for routine analysis to the same biomedical research community. Because DSP is the perfect partner for modern digital computing, supporting moderately high-bandwidth data collection from a myriad of information sources and signals, it enabled a practical linkage of midscale experimental data to computing storage and analysis in an unprecedented way. Prior to that time, any analysis of biomedical data would require key entry, typically by hand. Again, many of us can recall rooms of punch card data sets, generated by tedious keypunch machinery.

While it is obviously true that not all biomedical data or clinical informatics arose from transducer-driven DSP signals, the critical mass of biomedical data generated through digitalization of transducer-generated data culturally transformed the expectation for data analysis. Prior to that time, small data tables and hand computations would be publishable information. The advent of moderate-volume data sets, coupled with sophisticated analytics, raised the bar for all modalities of biomedical research. With moderate-volume data sets, sophisticated computing analytics, and model-driven theories about biomedical phenomena, clinical research informatics was truly born.

Dimensions of Complexity Informatics, by its nature, implies the role of computing. Clinical research informatics simply implies the application of computational methods to the broad domain of clinical research. With the advent of modern digital computing, and the powerful data collection, storage, and analysis that this makes possible, inevitably comes complexity. In the domain of clinical research, I assert that this complexity has axes, or dimensions, that we can consider independently. Regardless, the existence and extent of these complexities have made inextricable the relationship between modern clinical research, computing, and the requirement for sophisticated and domain-appropriate informatics.

Computing Capacity and Information Processing Biomedical research and, as a consequence, clinical research informatics are by their nature within a profoundly information-intensive domain. Thus, any ability to substantially increase our capacity to process or manage information will significantly impact that domain. The key enabling technology of all that has been described in clinical research informatics is the

advent of ever-increasing computational capabilities. This has been widely written about, but I submit its review is germane to this introduction. I will frame these advances in four dimensions: computational power, network capacity, local memory, and data storage.

Computational Power The prediction of Gordon Moore in 1965 that integrated circuit density would double every 2 years is well known. Given increasing transistor capabilities, a corollary of this is that computing performance would double every 18 months. Regardless of the variation, the law has proved uncannily accurate. As a consequence, there has been roughly a ten trillion-fold increase in computing power over the last 60 years. The applications are striking; the supercomputing resources that national spies would kill each other to secure 20 years ago now end up under Christmas trees as game platforms for children. The advent of highly scalable graphics processing units (GPUs) has correspondingly transformed our capacity to feasibly address many problems previously beyond practical limits.
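The doubling arithmetic behind estimates of this kind can be sketched briefly. This is only an illustration under stated assumptions: the function name is hypothetical, and the two doubling periods are the round figures quoted in the text. The result is quite sensitive to the period assumed, so any single multiplier is best read as an order-of-magnitude estimate.

```python
def growth_factor(years, doubling_period_years):
    """Return the multiplicative growth after `years` of steady doubling."""
    doublings = years / doubling_period_years
    return 2 ** doublings

# Doubling every 2 years versus every 18 months, sustained for 60 years.
density_growth = growth_factor(60, 2.0)       # 30 doublings, 2**30
performance_growth = growth_factor(60, 1.5)   # 40 doublings, 2**40
```

Thirty doublings give roughly a billion-fold increase and forty doublings roughly a trillion-fold increase, so shortening the assumed period by six months changes the 60-year total by about three orders of magnitude.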

Network Capacity Early computing devices were reliant on locally connected devices for input and output. The most primitive interface devices were plugboards and toggle switches that required human configuration; the baud rates of such devices were almost unimaginably slow. Today, Tb network backbones are not uncommon, yielding nearly another trillion-fold increase in computational capabilities.

Local Storage Early computers used electromechanical relays later replaced by speedy vacuum tubes. The advent of the transistor, and subsequently the integrated circuit, enabled the dramatic reduction in space with an increase in density for

local storage. It is clear that at least a trillionfold increase in common local storage capability in terms of speed and size has been achieved.

Data Storage The advent of high-density, high-performance disk drives, compared to early paper tape or punch card, yields perhaps the most dramatic increase in data processing capability and capacity. Petabyte drive complexes are not uncommon, and with the advent of cloud storage, there is no practical upper limit. For the purposes of this exercise, and to make a relatively round number, we can assert a 10^15 increase in data storage capacity. Taken together, these advances total an approximate 10^60 increase in computational power (albeit we are cheating somewhat by adding exponents, which is really multiplying in non-logarithmic space) over the past 60 years. Regardless, there has been an astronomical increase in our ability and capacity to manage, process, and draw inferences about data and information. In an information-intensive industry such as clinical research, the consequences cannot be other than profound.
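The parenthetical remark about adding exponents can be made concrete with a short sketch. The gains below are illustrative values only, not the chapter's estimates, and the function name is hypothetical; the point is simply that summing base-10 exponents is the same operation as multiplying the underlying factors.

```python
import math

def combined_exponent(factors):
    """Sum the base-10 exponents of independent multiplicative gains."""
    return sum(math.log10(f) for f in factors)

# Illustrative gains only: a thousand-fold and a million-fold improvement.
gains = [1e3, 1e6]
total_exponent = combined_exponent(gains)   # exponents 3 and 6 are added
combined_gain = math.prod(gains)            # the factors are multiplied
```

Treating the dimensions as multiplicative assumes that they compound independently, which is itself a simplification.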

Data Density The most obvious dimension of data complexity is its sheer volume. Historically, researchers would content themselves with a data collection sheet that might have been an enumeration of subjects or objects of study and at most a handful of variables. The advent of repeated measures, metadata, or complex data objects was far in the future, as were data sets that evolved from the scores to the thousands. Today, it is not uncommon in any domain of biomedical research to find vast, rich, and complex data structures. In the domain of genomics, this is most obvious with not only sequencing data for the genome but also the associated annotations, haplotype, pathway data, and sundry variants with clinical or physiological import, as important attributes. The advent of whole genome

sequences (WGS) increases volume and complexity, while the application of WGS to discrete cells within tumors further raises the bar. This complexity is not unique to genomic data. Previously humble clinical trial data sets now have highly complex structures and can involve vectors of laboratory data objects each with associated normal ranges, testing conditions, and important modes of conclusion-changing metadata. Similarly, population-based observational studies may now have large volumes of detailed clinical information derived from electronic health records. The historical model of relying on human-extracted or entered data is long past for most biomedical investigators. High data volumes and the asserted relationships among data elements comprise information artifacts that can only be managed by modern computing and informatics methods.

Design Complexity Commensurate with the complexity of data structure and high volume is the nature of experimental design and methodology. Today, ten-fold cross-validation, bootstrapping techniques for various estimates, exhaustive Monte Carlo simulation, and sophisticated experimental nesting, blocking, and within-group randomization afford unprecedented complexity in the design, specification, and execution of modern-day protocols. Thus, protocol design options have become inextricably intertwined with analytic capabilities. What was previously inconceivable from a computational perspective is now routine. Examples of this include dynamic censoring, multiphase crossover interventions, or imputed values.
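Two of the resampling techniques named above, cross-validation and bootstrapping, can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions rather than a recommended analysis: the function names, the fixed random seeds, the percentile method for the interval, and the ten measurements are all hypothetical choices made for illustration.

```python
import random
import statistics

def k_fold_indices(n_items, k, seed=0):
    """Split indices 0 .. n_items - 1 into k shuffled, near-equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_items))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def bootstrap_mean_interval(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical measurements, for illustration only.
measurements = [4.1, 5.0, 4.7, 5.3, 4.9, 5.8, 4.4, 5.1, 4.6, 5.2]
folds = k_fold_indices(len(measurements), k=5)
low, high = bootstrap_mean_interval(measurements)
```

Each fold would serve once as the held-out set while a model is fit on the remaining folds. The bootstrap alone recomputes the statistic thousands of times, which hints at why such designs were impractical before inexpensive computing.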

Analytic Sophistication Paralleling the complexity of design is the sophistication of analysis. As implied in the previous section, it is difficult to say which is causal; no doubt analytic capabilities push design, as design innovations require novel analytic modalities.

The elegant progression from simple parameter estimation, such as mean and variance, to linear regressions, to complex parametric models, such as multifactorial Poisson regression, to sophisticated and nearly inscrutable machine learning techniques such as multimodal neural networks or deep learning, demonstrates exponentially more intensive numerical methods demanding corresponding computational capacity. Orthogonal to such computational virtuosity is the iterative learning process now routinely employed in complex data analysis. It is rare that a complete analytic plan will be anticipated and executed unchanged for a complex protocol. Now, preliminary analysis, model refinement, parameter fitting, and discovery of confounding or effect modification are routinely part of the full analysis process. The computational implications of such repeated, iterative, and computationally complex activities are entirely enabled by the availability of modern computing. In the absence of this transformative resource, and the commensurate informatics skills, modern data analysis and design would not be possible.
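The first steps of the progression described above, from mean and variance to a linear regression, can be sketched as follows. The dose and response values are hypothetical, the function name is an assumption, and ordinary least squares is used as the simplest representative of the regression family; the more elaborate models mentioned in the text require far more computation than this.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical dose and response values, for illustration only.
dose = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_response = statistics.fmean(response)     # simple parameter estimate
var_response = statistics.variance(response)   # sample variance
intercept, slope = fit_line(dose, response)    # one-predictor regression
```

In an iterative analysis of the kind described, a fit like this would typically be one of many, repeated as covariates are added, models are refined, and confounding is explored.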

The Emergence of Big Science What then are the consequences of unprecedented computational capabilities in an information-intensive enterprise such as clinical research? It is useful to examine where this or similar activities have occurred previously. An evolutionary change for many disciplines is a transition from an exclusively independent-investigator-driven suite of agendas across a discipline (small-science or bottom-up foci) to a maturation where interdependency of data and methods, multidisciplinary teams of talent and interest, and large-scale, cross-discipline shared resources, such as massive machines or databases, predominate (big-science or top-down coordination).

Evolution of Astronomy and Physics The practice of modern astronomy relies upon large groups, large data sets, and strong collaboration between and among investigators. The detection of a supernova in a distant galaxy effectively requires comparing current images against historical images and excluding any likely wandering objects, such as comets. Similarly, the detection of a pulsar requires exhaustive computational analysis of very large radio telescope data sets. In either case, the world has come a long way from the time when a single man with a handheld telescope, in the style of Galileo, could make seminal astronomical discoveries. In parallel, the world of high-energy particle physics has become a big science given its requirements for large particle accelerators, massive data-collection instrumentation, and vast computational power to interpret arcane data. Such projects and initiatives demand large teams, interoperable data, and collaborative protocols. The era of tabletop experiments, in the style of Rutherford, has long been left behind. What is common about astronomy and physics is their widely recognized status as big-science enterprises. A young investigator in those communities would not even imagine or attempt to make a significant contribution outside the community and infrastructure that these fields have established, in part due to the resource requirements, but equally because of the now-obvious multidisciplinary nature of the field.

Biology and Medicine as a Socially Interdependent Process

I return to the assertion that biology and medicine have become information-intensive domains. Progress and new discovery are integrally dependent on high-volume and complex data. Modern biology is replete with the creation of and dependency on large annotated data sets, such as the fundamental GenBank and its derivatives, or the richly curated animal model databases. Similarly, the annotations within and among these data sets constitute a primary knowledge source, transcending in detail and substance the historically quaint model of textbooks or even the prose content in peer-reviewed journals. The execution of modern studies, relying as it does on multidisciplinary talent, specialized skills, and cross-integration of resources, has become a complex social process. The nature of the social process at present is still a hybrid across bottom-up, investigator-initiated research and team-based, program project-oriented collaborations.

The Social Transformation of Clinical Research

The conclusion that biology and medicine, and as a consequence clinical research informatics, are evolving into a big-science paradigm is unavoidable. While this may engender an emotional response, the more rational approach is to understand how we as a clinical research informatics community can succeed in this socially transformed enterprise. Given the multidisciplinary nature of informatics, the clinical research informatics community is well poised to contribute importantly to the success of this transformed domain.

A consequence of such a social transformation is the role of government or large foundations in shaping the agenda of the cross-disciplinary field. One role of government, in science or any other domain, is to foster the long-term strategic view and investments that cannot be sustained in the private marketplace or the agendas of independent investigators. Further, it can encourage and support the coordination of multidisciplinary participation that might not otherwise emerge. In the clinical trials world, the emergence of modest but influential forces such as ClinicalTrials.gov illustrates this role.

Standards

If biology and medicine, and by association clinical research informatics, are entering a big-science paradigm, what does this demand as an informatics infrastructure?

Comparable and Consistent Information

Given the information-intensive nature of clinical research informatics, the underlying principle for big science is the comparability and consistency of data. Inferencing across noncomparable information, by definition, cannot be done. Anticipating or accounting for inconsistent data representations is inefficient and non-scalable. The obvious conclusion is that within biology and medicine, a tangible contribution of clinical informatics is to ensure that genomic, clinical, and experimental data conform to frameworks, vocabularies, and specifications that can sustain interoperability. This also raises the profile and critical requirement for robust ontologies to mediate data and knowledge integration. Emergent projects that demonstrate large-scale inferencing and reasoning, using ontological annotations of basic science and clinical data, are beginning to bridge the historical “chasm of semantic despair” that inhibited rapid translation of basic science knowledge and information into clinical care.

Interoperable Systems and Constructs

The hallmark of big science, then, is interoperable information. The core of interoperable information is the availability and adoption of standards. Such standards can and must specify data relationships, content, vocabulary, and context. As we move into this next century, the great challenge for biology and medicine is the definition and adoption of coherent information standards for the substrate of our research practice.

The present volume outlines many issues that relate to data representation, inferencing, and standards, issues that are crucial for the emergence of large-scale science in clinical research. Readers must recognize that they can contribute importantly through the clinical research informatics community to what remains an underspecified and as yet immature discipline. Yet there is already tremendous excitement and interest at the intersection between basic science and clinical practice, manifested by translational research, that has well-recognized dependencies on clinical research informatics. I trust that the present work will inspire and guide readers to consider and hopefully undertake intellectual contributions toward this great challenge.
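As a small illustration of why comparable and consistent data matter, the sketch below brings one laboratory value, recorded by two sites under different local codes and units, into a single representation before any pooled analysis. The site names, local codes, concept name, and mapping table are all invented for this example; a real project would rely on an established vocabulary rather than an ad hoc dictionary.

```python
# Hypothetical mapping: (site, local code) -> (shared concept, factor to mg/dL).
# Every key and name here is an illustrative assumption, not a real standard.
LOCAL_TO_STANDARD = {
    ("site_a", "GLU"): ("glucose_serum", 1.0),            # already in mg/dL
    ("site_b", "gluc_mmol"): ("glucose_serum", 18.016),   # mmol/L -> mg/dL
}


def harmonize(record):
    """Map one site-specific record onto the shared concept and unit."""
    key = (record["site"], record["code"])
    if key not in LOCAL_TO_STANDARD:
        # Refuse to guess: unmapped data cannot be compared safely.
        raise KeyError(f"no mapping defined for {key}")
    concept, factor = LOCAL_TO_STANDARD[key]
    return {
        "concept": concept,
        "value": round(record["value"] * factor, 1),
        "unit": "mg/dL",
    }


records = [
    {"site": "site_a", "code": "GLU", "value": 99.0},
    {"site": "site_b", "code": "gluc_mmol", "value": 5.5},
]
harmonized = [harmonize(r) for r in records]
for row in harmonized:
    print(row)
```

Only after this step do the two values share a concept and a unit, so that comparing or averaging them is meaningful.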

3 Methodological Foundations of Clinical Research

Antonella Bacchieri and Giovanni Della Cioppa

Abstract

This chapter focuses on clinical experiments and discusses the phases of the pharmaceutical development process. We review the conceptual framework and classification of biomedical studies and look at their distinctive characteristics. Biomedical studies are classified into two main categories, observational and experimental, which are then further classified into subcategories of prospective and retrospective and community and clinical, respectively. We review the basic concepts of experimental design, including defining study samples and calculating sample size, where the sample is the group of subjects on which the study is performed. Choosing a sample involves both qualitative and quantitative considerations, and the sample must be representative of the population under study. We then discuss treatments, including those that are the object of the experiment (study treatments) and those that are not (concomitant treatments). Minimizing bias through the use of randomization, blinding, and a priori definition of the statistical analysis is also discussed. Finally, we briefly look at innovative approaches, for example, how adaptive clinical trials can shorten the time and reduce the cost of classical research programs or how targeted designs can allow a more efficient use of patients in rare conditions.

Keywords

Phase I, II, III, and IV trials · Classification of biomedical studies · Observational study · Experimental study · Equivalence/non-inferiority studies · Superiority versus non-inferiority studies · Crossover designs · Parallel group designs · Adaptive clinical trials · Targeted designs

A. Bacchieri (*) CROS NT srl and Clinical R&D Consultants srls, Verona, Rome, Italy
G. D. Cioppa Clinical R&D Consultants srls, Rome, Italy

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_3

The Development of Pharmaceuticals: An Overview

The development of a pharmacological agent (preventive, diagnostic, or therapeutic) from start to first launch on the market typically lasts in excess of 10 years, at times considerably longer, and thereafter continues throughout its life cycle, often for decades postmarketing [1] (see Chap. 22). Clinical experiments, the focus of this chapter, are preceded by many years of preclinical development. In very broad terms, the preclinical development process can be summarized in a sequence of seven phases [2]:

1. Screening of hundreds of candidates by means of biological assays. Candidates may be produced chemically or through biological systems.
2. Selection of the lead compound.
3. Physicochemical characterization of the lead compound.
4. Formulation of the drug product, consisting of drug substance, excipients, and delivery system.
5. Scale-up of production and quality control.
6. Toxicology experiments.
7. Preclinical pharmacology, which includes pharmacokinetics (what the body does to the drug: absorption, distribution, metabolism, and excretion, or ADME) and pharmacodynamics (what the drug does to the different organs and body systems).

There is a considerable chronological overlap between phases with multiple iterations and parallel activities, many of which continue well into the clinical stages. As the clinical experiments proceed and the level of confidence in the potential of a new compound grows, experiments also proceed in nonclinical areas, from toxicology to production, in preparation for the more advanced clinical phases and finally for commercialization.

Conventionally, the clinical development process is divided into four phases, referred to as Phases I, II, III, and IV. Phase I begins with the first administration of the compound to humans. The main objectives of Phase I investigation are as follows:

1. Obtain indications on the safety and tolerability of the compound.
2. Study its pharmacokinetics in humans, when appropriate.
3. Obtain preliminary indications on pharmacodynamics.

Typically, Phase I trials are conducted over a large range of doses. Whereas traditionally Phase I is conducted in healthy volunteers, increasingly Phase I studies are carried out directly in patients.

Phase II studies are carried out on selected groups of patients suffering from the disease of interest, although patients with atypical forms and concomitant diseases are excluded. Objectives of Phase II are as follows:

1. Demonstrate that the compound is active on relevant pharmacodynamic end-points (proof of concept).
2. Select the dose (or doses) and dosing schedule(s) for Phase III (dose-finding).
3. Obtain safety and tolerability data.

Sometimes Phase II is divided further into two subphases: IIa, for proof of concept, and IIb, for dose-finding.

The aim of Phase III is to demonstrate the clinical effect (therapeutic or preventive or diagnostic), safety, and tolerability of the drug in a representative sample of the target population, with studies of sufficiently long duration relative to the treatment in clinical practice. The large Phase III studies, often referred to as pivotal or confirmatory, are designed to provide decisive proof in the registration dossier.

All data generated on the experimental compound, from the preclinical stage to Phase III, and even Phase IV (see below), when it has already been approved in other countries, must be summarized and discussed in a logical and comprehensive manner in the registration dossier, which is submitted to health authorities as the basis for the request of approval. In the last 30 years, a large international effort took place to harmonize the requirements and standards of many aspects of the registration documents. Such efforts became tangible with the guidelines of the International Conference on Harmonisation (ICH) (www.ich.org). These are consolidated guidelines that must be followed in the clinical development process and the preparation of the registration dossiers in all three regions contributing to ICH: Europe, the United States, and Japan.
An increasing number of regulatory authorities, including those of China, Canada, and Australia, have adopted guidelines similar to ICH. With regard to the registration dossier, the ICH process culminated with the approval of the Common Technical Document (CTD). The CTD is the common format of the registration dossier recommended by the European Medicines Agency (EMA), the US Food and Drug Administration (FDA), and the Japanese Ministry of Health, Labour and Welfare (MHLW). The CTD is organized in five modules, each composed of several sections. Critical for the clinical documentation are the Efficacy Overview, the Safety Overview, and the Conclusions on Benefits and Risks.

The overviews require pooling of data from multiple studies into one or more integrated databases, from which analyses on the entire population and/or on selected subgroups are carried out. In the assessment of efficacy, pooling may be necessary for special groups such as the elderly or subjects with renal or hepatic impairment. In the assessment of safety and tolerability, large integrated databases are critical for the evaluation of infrequent adverse events and for subgroup analyses by age, sex, race, dose, etc. The merger of databases coming from different studies requires detailed planning at the beginning of the project. The more complete the harmonization of procedures and programming conventions of the individual studies, the easier the final pooling. Conversely, the lack of such harmonization will cause an exhausting ad hoc programming effort at the end of the development process, which will inevitably require a number of arbitrary assumptions and coding decisions. In some cases, this can reduce the reliability of the integrated database.

Clinical experimentation of a new treatment continues after its approval by health authorities and launch onto the market. Despite the approval, there are always many questions awaiting answers. Phase IV studies provide some of the answers.
The expression Phase IV is used to indicate clinical studies performed after the approval of a new drug and within the approved indications and restrictions imposed by the Summary of Product Characteristics and the Package Insert.

The sequence of clinical development phases briefly described above is an oversimplification, and many departures occur in real life. For example, Phases I and II are frequently combined. Phases II and III may also be merged in an adaptive design trial (described later). Further, compounds in oncology have many peculiarities in their clinical development, mainly concerning Phases I and II. These differences are determined mostly by the toxicity of many compounds, even at therapeutic or subtherapeutic doses, combined with the life-threatening nature of the diseases in question. Rare diseases are another broad field where the above sequence of phases is not followed, due to the limited number of patients.

As mentioned above, the clinical development process for a new diagnostic, preventive, or therapeutic agent is extremely long and the costs correspondingly high, often exceeding 10 years and 800 million USD, respectively [3]. Therefore, faster and cheaper development has always been a key objective for pharmaceutical companies, academic institutions, and regulatory agencies alike. Clearly, there is no magic solution, and no method is universally applicable. However, new methodological and operational solutions have been introduced, which contribute in selected situations to reducing the overall time of clinical development and/or lowering costs. Among the most efficient tools are the following:

• Modeling and simulation statistical techniques aimed at evaluating the consequences of a variety of assumptions, that is, answering “what happens if…” questions. Simulations are used for many purposes, including detection of bias, comparison of different study designs, contribution to dose and schedule selection, and evaluation of the consequences of different decision-making rules in determining the success or failure of a study or an entire study program.
• Strategies that combine different phases of development, mainly Phases II and III, such as adaptive designs (described later).
• Technological innovations such as electronic data capture (EDC), which allows data entry directly by the study staff at the site into a central database without the intermediate step of traditional paper case report forms (CRFs) or direct download from measurement instruments into the central database without any manual intervention.
• Special regulatory options made available for the very purpose of accelerating clinical development of lifesaving and essential treatments. Prominent among these are the Treatment IND (FDA) and the mock application (EMA) for the approval of vaccines in outbreak situations, such as the H1N1 swine flu pandemic.
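The “what happens if…” use of simulation named in the first bullet above can be sketched as follows. The example estimates, by repeated simulation, how often a two-arm trial would detect a difference under assumed values for the true effect, the standard deviation, and the sample size. All numbers, the normal outcome model, and the function names are assumptions chosen for illustration; this is not a method prescribed by the chapter.

```python
import random
import statistics
from math import sqrt


def simulate_trial(n_per_arm, true_diff, sd, rng):
    """Simulate one two-arm trial; return True if |z| exceeds 1.96."""
    control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_diff, sd) for _ in range(n_per_arm)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = sqrt(
        statistics.variance(treated) / n_per_arm
        + statistics.variance(control) / n_per_arm
    )
    return abs(diff / se) > 1.96


def estimated_power(n_per_arm, true_diff, sd, n_sims=500, seed=1):
    """Fraction of simulated trials that reach the decision threshold."""
    rng = random.Random(seed)
    hits = sum(simulate_trial(n_per_arm, true_diff, sd, rng) for _ in range(n_sims))
    return hits / n_sims


# "What happens if" the true difference is 5 units with SD 10?
for n in (20, 50, 100):
    print(n, estimated_power(n, true_diff=5.0, sd=10.0))
```

Changing `true_diff`, `sd`, or the decision rule inside `simulate_trial` and rerunning is the simulation counterpart of asking a new “what happens if…” question.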

Conceptual Framework and Classification of Biomedical Studies

Variability of Biological Phenomena

All biological phenomena as we perceive them are affected by variability. The overall goal of any biomedical study is to separate the effect related to an intervention (the signal) from the background of variability of biological phenomena unrelated to the intervention ([1], Chap. 1). Variability of biological phenomena can be divided into three main components:

1. Phenotypic variability, that is, differences between individuals at a given point in time.
2. Temporal variability, that is, changes in a given individual over time. Temporal variability can be predictable and cyclical (e.g., hormonal changes during the menstrual cycle), predictable and noncyclical (e.g., age-related changes of height), or erratic and unpredictable. An element of unpredictability is always superimposed on any biological phenomenon undergoing predictable temporal changes; for example, the hormonal changes during the menstrual cycle, although predictable quantitatively and chronologically, can still be very different from month to month.
3. Measurement-related variability, due to the use of measurement instruments.

External phenomena exist for us only to the extent they are detected by our senses and understood by our intellect. To understand an external phenomenon, we first have to recognize it and then measure it. Measurement is the process of assigning a quantity and/or symbol to a variable according to a predefined set of rules. The set of rules is often implicit: for example, the statement “my friend Ann died young at age 40” implies the assignment of a quantity (young) to Ann’s age at the time of death, based on the implicit rule that the normal time of death is much later than age 40, say, 85 or more. In scientific measurements, the set of rules is explicit and defined by the measurement scale used.

Variability related to the measuring process becomes an integral part of the variability of biological phenomena as we perceive them. Errors made in the process of measuring can be of two types: random and systematic.

• A random error generates measurements that oscillate unpredictably about the true value. Example: rounding off decimals from two digits to one.
• A systematic error, also referred to as bias or distortion, generates measurements that differ from the true value always in the same direction. Example: measuring weight with a scale that is not correctly calibrated and, therefore, always underestimates or overestimates weight.

Both random error and bias have an impact on the reliability of results of biomedical studies. Random error causes greater variability. This can be remedied to some extent by increasing the sample size of a study. Bias may simulate or obscure the treatment effect. This cannot be remedied after the fact: bias can only be prevented by a proper design of the study (see below).
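The contrast between random error and bias can be made concrete with a small simulation. In the sketch below, a true weight is measured repeatedly, once with purely random noise and once with a miscalibrated scale that also reads 1.5 kg too low. The true value, noise level, and size of the miscalibration are invented for illustration.

```python
import random
import statistics


def mean_error(n, bias, noise_sd=2.0, true_value=70.0, seed=0):
    """Average deviation from the true value over n simulated measurements."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.fmean(readings) - true_value


for n in (10, 1000, 100000):
    random_only = mean_error(n, bias=0.0)    # random error only
    miscalibrated = mean_error(n, bias=-1.5)  # random error plus systematic error
    print(f"n={n:>6}  random only: {random_only:+.3f}  with bias: {miscalibrated:+.3f}")
```

As the number of measurements grows, the average error of the unbiased scale shrinks toward zero, whereas the average error of the miscalibrated scale settles near -1.5: a larger sample reduces the effect of random error but leaves the bias untouched.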

Biomedical Studies: Definitions and Classification

Biomedical studies are experiments with the objective of establishing a relationship between a characteristic or intervention and a disease or condition. The relationship of interest is one of cause and effect. The element that makes biomedical studies different from deterministic experiments is the variability of the phenomenon under study. As mentioned above, all methods and techniques used in biomedical studies have the overall goal of differentiating a true cause–effect relationship from a spurious one, due to the background noise of biological variability and/or to bias.

Biomedical studies must have four critical distinctive characteristics:

1. Rationale, methods, and conclusions must be based on comparisons between groups of subjects.
2. The groups of subjects being compared must be similar, that is, must have similar distribution of important demographic and clinical characteristics.
3. An adequate probabilistic model “tailored” exactly to the problem under study must allow the conclusions from the specific study to be applied to the underlying population (inference).
4. All aspects of the study must be planned in advance, in most cases before the study starts and in all cases before the data are analyzed.

Biomedical studies can be classified as shown in Fig. 3.1 [1]. Medical studies are the subset of biomedical studies that involve human subjects. These studies are classified in two main categories: observational and experimental.

Observational or Epidemiological Studies

In observational studies, also referred to as epidemiological studies, the association between a characteristic and an event is investigated without any type of intervention. When the magnitude of the association is relevant, a causal relationship is assumed. The characteristic being studied can be a pharmacological treatment or a demographic, behavioral, or environmental factor. The event can be the occurrence or recrudescence of a disease, hospitalization, death, etc. If the characteristic modifies the event in a favorable way, it is called a protective factor; if it modifies the event in a negative way, it is called a risk factor [4].

Fig. 3.1 Classification of biomedical studies. (Adapted from Bacchieri and Della Cioppa [1])

There are two main types of design for observational studies: prospective (or cohort) and retrospective (or case control) ([1], Chap. 4).

In prospective studies, subjects are selected on the basis of the presence or absence of the characteristic. Prospective studies are also referred to as cohort studies. In a prospective study, the researcher selects two groups of subjects, one with the characteristic under study (exposed) and the other without (nonexposed). For example, exposed could be subjects who are current cigarette smokers and nonexposed those who never smoked cigarettes or have quit smoking. With the exception of the characteristic under study, the two groups should be as similar as possible with respect to the distribution of key demographic features (e.g., age, sex, socioeconomic status, health status). Each enrolled subject is then observed for a predefined period to assess if, when, and how the event occurs. In our example, the event could be a diagnosis of lung cancer. Prospective studies can be classified based on time in three types: concurrent (the researcher selects exposed and nonexposed subjects in the present and prospectively follows them into the future), non-concurrent (the researcher goes back in time, selects exposed and nonexposed subjects based on exposure in the past, and then traces all the information relative to the event of interest up to the present), and cross-sectional (the researcher selects subjects based on the presence/absence of the characteristic of interest in the present and searches for the event in the present).

In retrospective studies, subjects are selected on the basis of the presence or absence of the event. Retrospective studies are often referred to as case-control studies. In a retrospective study, the researcher selects two groups of subjects, one group with the event of interest (cases) and the other without (controls). In order to increase comparability between cases and controls, each case is often matched to one or more controls for a few key demographic features (e.g., sex, age, ethnicity). In our example, cases are subjects with a diagnosis of lung cancer; each case could be matched with one or more controls, similar for important characteristics, for example, sex, age, work exposure to toxic air pollutants, and socioeconomic status. The medical history of each enrolled subject is then investigated to see whether, during a predefined period of time in the past, he/she was exposed (and when and how much) to the characteristic under study, in our example cigarette smoking. Retrospective studies can be classified based on time in two types: true retrospective (the researcher selects the subjects with and without the event and goes back in time to search for exposure) and cross-sectional (the researcher selects subjects based on the presence/absence of the event but limits the investigation about the exposure to the present).
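The chapter does not name a specific measure of association for the two designs. As a hedged illustration, the sketch below uses two measures commonly reported in epidemiology: the relative risk for a cohort (prospective) layout, and the odds ratio for a case-control (retrospective) layout. The counts for the smoking and lung cancer example are invented.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk of the event among exposed divided by risk among nonexposed (cohort design)."""
    # Written as a cross-product so that integer inputs give an exact ratio.
    return (exposed_events * unexposed_total) / (unexposed_events * exposed_total)


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds among controls (case-control design)."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


# Hypothetical cohort: 1000 smokers and 1000 never-smokers followed over time.
rr = relative_risk(exposed_events=30, exposed_total=1000,
                   unexposed_events=10, unexposed_total=1000)

# Hypothetical case-control study: 100 cases and 100 matched controls.
or_ = odds_ratio(cases_exposed=80, cases_unexposed=20,
                 controls_exposed=40, controls_unexposed=60)

print("relative risk:", rr)
print("odds ratio:", or_)
```

A value above 1 points toward a risk factor and a value below 1 toward a protective factor, in the sense used above. A case-control study fixes the numbers of cases and controls by design, so the risk of the event cannot be estimated directly from it, which is why the odds ratio is used there.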

Experimental or Interventional Studies

In experimental studies, also referred to as interventional, the researcher has the control of the conditions under which the study is conducted. The intervention, typically a therapeutic or preventive treatment, also referred to as an experimental factor, is not simply observed; the subjects are assigned to the intervention by the researcher, generally by means of a procedure called randomization (see below). The assignment of the study subjects to the intervention can be done by groups of subjects (community trial) or, more frequently, by individual subject (clinical trial).

Many other factors besides the experimental factor can influence the study results. These are referred to as sub-experimental factors. Some are known (e.g., age, sex, previous or concomitant treatments, study site, degree of severity of the disease), but most are unknown. In experimental studies, the investigator not only controls the assignment of the experimental factor but also attempts to control as much as possible the distribution of sub-experimental factors, by means of (a) randomization; (b) predefined criteria for the selection of study subjects (inclusion/exclusion criteria); (c) precise description, in the study protocol, of the procedures to which study subjects and investigators must strictly adhere; and (d) specific study designs (see below). Nevertheless, sub-experimental factors, known and unknown, cannot be fully controlled by the abovementioned techniques. The influences that these uncontrollable factors exercise on the study results are collectively grouped in a global factor referred to as chance.

There are two main types of design for experimental studies: between-group and within-group.

1. In between-group studies, different subjects are assigned to different treatments. The conclusions are drawn by comparing independent groups of subjects. The most important design of this class is the randomized parallel group design.
2. In within-group studies, different subjects are assigned to different sequences of treatments, that is, each subject receives more than one treatment. The conclusions are drawn by comparing subjects with themselves. The most important design of this class is the randomized crossover design.
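The two designs above differ in what is randomized: a treatment in the parallel group design, a sequence of treatments in the crossover design. The sketch below illustrates this with permuted-block randomization, which is one common scheme and is used here as an assumption; the chapter does not say which randomization procedure it has in mind. The function name, block size, and seed are illustrative.

```python
import random


def permuted_block_allocation(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Allocate subjects to arms in shuffled blocks so that group sizes stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


# Parallel group design: each subject is assigned one treatment.
parallel = permuted_block_allocation(12, arms=("A", "B"), seed=42)

# Crossover design: each subject is assigned a sequence of treatments.
crossover = permuted_block_allocation(12, arms=("AB", "BA"), seed=42)

print(parallel)
print(crossover)
```

In the crossover case, a subject allocated to "AB" receives treatment A in the first period and B in the second, so each subject serves as his or her own comparison.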

Minimal Intervention Studies

This common type of study falls somewhere between the observational and the interventional approach. The overall framework is that of an observational study. However, the investigator is not completely hands off: a small degree of intervention is imposed by the study design, such as a blood draw or collection of other biological fluid, a noninvasive diagnostic procedure, or a questionnaire, hence the definitions “minimal intervention studies” or “low intervention clinical trials” [5]. These studies are often assimilated to observational studies, but individual informed consent is necessary, outlining the risks and benefits of the additional procedure.

In the rest of this chapter, we will focus on clinical trials, which are the most commonly used type of experimental studies.

The Logical Approach to Defining the Outcome of a Clinical Trial

Let us assume we are the principal investigator of a clinical trial evaluating two treatments against obesity: A (experimental treatment) versus B (control treatment). The sample size of the trial is 600 subjects (300 per treatment group). The primary outcome variable (or end-point; see below), as defined in the protocol, is the weight expressed in kilograms after 1 month of treatment and is summarized at the group level in terms of mean. After over 1 year of hard work to set up the trial, recruit the patients, and follow them up, the results finally arrive. These are as follows:

• Experimental treatment (A), mean weight: 104 kg.
• Control treatment (B), mean weight: 114 kg.

To simplify matters, we assume no imbalance of the average weight of the subjects at baseline and ignore the variability of the measurements, expressed by the standard deviation (clearly, in real life, both aspects are considered in the analysis and interpretation of results). After only 1 month of treatment, the group receiving the new treatment lost on average 10 kg, compared to the group receiving the traditional treatment. Most likely, investigators would be inclined to rejoice at this finding. We want to believe that the observed difference is attributable to the new treatment and that we are on the verge of an important advancement in the management of obesity.
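The example deliberately sets the standard deviation aside. To show how much it matters once it is brought back in, the sketch below takes the same means (104 kg versus 114 kg, 300 subjects per group) and computes the standard error of the difference under three invented standard deviations. It also uses a large-sample normal approximation to compute the probability of a difference at least this large arising when there is no true difference. The standard deviations and the choice of a z statistic are assumptions made for illustration, not values or methods taken from the chapter.

```python
from math import erfc, sqrt


def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Difference in means, its standard error, z statistic, and two-sided probability."""
    diff = mean_a - mean_b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = diff / se
    # Two-sided tail probability of a standard normal variable beyond |z|.
    p_two_sided = erfc(abs(z) / sqrt(2))
    return diff, se, z, p_two_sided


for sd in (15.0, 60.0, 150.0):  # hypothetical standard deviations in kg
    diff, se, z, p = two_sample_z(104.0, 114.0, sd, sd, 300, 300)
    print(f"SD={sd:>5}: difference={diff:+.1f} kg, SE={se:.2f}, z={z:.2f}, p={p:.4f}")
```

The observed difference is the same in all three rows; only the assumed variability changes. With a small standard deviation the difference is many standard errors away from zero, whereas with a very large one the same 10 kg is well within what chance alone could produce.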

Unfortunately, this is not necessarily the case. In fact, three factors may contribute to different degrees to the observed difference: chance, bias, and treatment. The first two must be ruled out with a reasonable degree of certainty before attributing the outcome to the treatment.

The first question when confronting any observed difference between treatment groups must always be: can chance be the main reason for the observed difference? In clinical trials, the answer is given by a properly conducted statistical analysis. The famous p value expresses the probability of obtaining a difference as large as the one observed, or even larger, simply by chance, that is, under the hypothesis of no true difference between groups (null hypothesis). If this probability is lower than a predefined (and totally arbitrary) threshold, traditionally fixed at 5% (p 0.95).

Textbox 21.2 Example Keywords and Regex Patterns for Fall Identification

a fall; recurrent fall; time of fall; falls?; fell; fallen; collapsed; slipped; tripped; syncope; falling; syncopal (events?|episodes?|spells?); found (\S+\s+){0,3}on the ground; on (\S+\s+){0,3}way down
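The keywords and patterns in the textbox can be applied to sentences as in the minimal sketch below, which uses Python's standard `re` module. Combining the patterns into a single alternation, adding word boundaries, and ignoring case are assumptions of this sketch, and the function name and example sentences are invented; the chapter's own sample code may be organized differently.

```python
import re

# Patterns copied from the textbox; each entry is treated as a regular expression.
FALL_PATTERNS = [
    r"a fall", r"recurrent fall", r"time of fall", r"falls?", r"fell", r"fallen",
    r"collapsed", r"slipped", r"tripped", r"syncope", r"falling",
    r"syncopal (events?|episodes?|spells?)",
    r"found (\S+\s+){0,3}on the ground",
    r"on (\S+\s+){0,3}way down",
]

# Assumption: match whole words only, case-insensitively.
FALL_REGEX = re.compile(r"\b(?:" + "|".join(FALL_PATTERNS) + r")\b", re.IGNORECASE)


def find_fall_mentions(sentence):
    """Return the text of every fall-related mention found in a sentence."""
    return [match.group(0) for match in FALL_REGEX.finditer(sentence)]


examples = [
    "Patient was found lying face down on the ground this morning.",
    "No falls reported.",
    "She felt dizzy.",
]
for sentence in examples:
    print(sentence, "->", find_fall_mentions(sentence))
```

The second example shows a limit of keyword matching: the mention is found even though the sentence negates it, so a rule-based approach of this kind would normally be paired with some handling of negation and other context.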

21 Clinical Natural Language Processing in Secondary Use of EHR for Research

Deep Learning Approach

We use BERTbase, a model pre-trained on text from unpublished books and Wikipedia, to perform the sequential sentence classification task. The pre-trained BERT model is adopted from the original Google BERT GitHub repository (https://github.com/google-research/bert). The model has 12 transformer layers, a hidden size of 768, and 12 self-attention heads. For the model fine-tuning, the maximum sequence length (e.g., 512) and batch size (e.g., 32) need to be configured. The early stopping technique is applied to identify the epoch number and prevent overfitting. Sample codes for both approaches can be found at https://github.com/OHNLP/CRI_Chapter22.
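Early stopping, as mentioned above, can be sketched independently of any deep learning framework. The patience-based rule below is one common variant and is an assumption here, since the chapter does not say which stopping criterion was used. The validation losses are invented, and the configuration dictionary simply repeats the example values from the text.

```python
# Example fine-tuning settings taken from the text; the key names are illustrative.
FINE_TUNE_CONFIG = {"max_seq_length": 512, "batch_size": 32}


def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)


# Hypothetical validation loss after each fine-tuning epoch.
history = [0.62, 0.48, 0.41, 0.43, 0.44, 0.40]
best, stopped = early_stopping_epoch(history, patience=2)
print("best epoch:", best, "stopped at epoch:", stopped)
```

In this invented history, the loss at the sixth epoch would have been lower still, which illustrates the trade-off in choosing the patience: a small value saves training time but can stop before a later improvement.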

Model Evaluation

The models are evaluated on an independent test set at the mention or sentence level. The evaluation results presented in Fig. 21.6 indicate that the model achieved 0.895, 0.9912, 0.770, 0.997, and 0.828 in sensitivity, specificity, PPV, NPV, and F1-score, respectively. Error analysis can be performed by manually reviewing incorrect cases. Through the error analysis, we are able to identify false-negative and false-positive samples for future improvement.
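The five measures reported above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions and checks that the reported F1-score is consistent with the reported sensitivity and PPV. The confusion-matrix counts in the example are invented, since the actual counts appear only in the figure.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also called precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def f1_from(ppv, sensitivity):
    """F1-score as the harmonic mean of PPV and sensitivity."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=88, fn=2))

# Consistency check against the values reported in the text.
print(round(f1_from(ppv=0.770, sensitivity=0.895), 3))
```

The last line reproduces the reported F1-score of 0.828 from the reported PPV and sensitivity.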

Fig. 21.6 Example of confusion matrix and error cases

Clinical NLP Resources

An Overview of Clinical NLP Community Challenges

Clinical NLP-related challenges or shared tasks are community activities or competitions with the objective of developing task-specific NLP algorithms within a certain timeline. Solutions are evaluated using standardized criteria across all participating teams. The top winning team will be awarded small prizes or be invited to disseminate their methods through conference or journal submissions. The challenge starts by calling for participation and releasing the task details. For example, in the 2019 National NLP Clinical Challenge (n2c2) Family History Extraction challenge, the task was to extract mentions of family members in clinical notes and observations (diseases) in the family history. A common timeline for a challenge includes participant registration (e.g., team formulation, data usage agreement), training data release, test data release, submission due, results release, and abstract or manuscript submission.

Community challenges have played a vital role in advancing NLP methodologies, disseminating NLP knowledge resources (e.g., annotation guidelines and corpora), engaging informatics researchers, and promoting interdisciplinary collaboration. Furthermore, since the tasks in each challenge are well-defined and standardized by the organizers, and are coupled with de-identified, publicly accessible corpora, they are usually regarded as standard benchmarks for state-of-the-art NLP performance evaluation. Well-known clinical NLP tasks include the Semantic Evaluation (SemEval) challenges [110–112], BioCreative/OHNLP [113–116], the Informatics for Integrating Biology and the Bedside (i2b2) challenges [117–121], the National NLP Clinical Challenge (n2c2) [122], and the Conference and Labs of the Evaluation Forum (CLEF) eHealth challenges [110, 111].


An Overview of Clinical NLP Systems and Toolkits

An Overview of Clinical NLP Systems

NLP systems (frameworks) are important resources for the development, standardization, and streamlined execution of symbolic methods. The key advantage of NLP systems is the built-in and modularized text (pre-)processing pipeline, with components such as a sentence detector, tokenizer, part-of-speech tagger, chunking annotator, section detector, information extractor, and context annotator [123, 124]. Different NLP systems have been developed at different institutions, including MedLEE [125], MetaMap [126], KnowledgeMap [127], cTAKES [123], HiTEX [128], CLAMP [129], and MedTagger [124]. MedLEE is one of the earliest clinical NLP systems and was originally developed to provide clinical decision support for radiographs. The system has subsequently been expanded to process different clinical documents such as discharge summaries, pathology reports, and radiology reports [125, 130]. MetaMap, developed by the National Library of Medicine (NLM), is a highly configurable system that maps clinical text to the Unified Medical Language System (UMLS) Metathesaurus [126]. cTAKES is one of the most commonly used tools, developed using the Unstructured Information Management Architecture (UIMA) framework [131] and the OpenNLP natural language processing toolkit under the Apache project. MedTagger is a resource-driven, open-source, UIMA-based IE framework developed under the Open Health Natural Language Processing (OHNLP) Consortium, which aims to create an interoperable, scalable, and usable NLP ecosystem [124]. Meanwhile, major technology companies have embraced clinical NLP, with commercial solutions available on the market (e.g., IBM Watson [132], Google Healthcare Natural Language API [133], or Amazon Comprehend Medical [134]).

An Overview of Clinical NLP Toolkits and Packages

NLP packages and toolkits are useful resources for developing clinical NLP solutions, especially for text preprocessing and machine learning approaches. Well-known toolkits include WEKA [135], MALLET [136], OpenNLP [137], SPLAT [138], NLTK [139], and SpaCy [140]. Recently, there has been rapid growth in the number of open-source deep learning packages (frameworks). Common examples of these packages are Torch [141], Theano [142], MxNet [143], TensorFlow [144], PyTorch [145], Keras [146], and CNTK [147]. Although studies have found variations in GPU performance and memory management among these libraries [148, 149], most of the packages share similar core competencies, and the selection of an appropriate package can be based on the research environment and user preference.

Challenges, Opportunities, and Future Directions

Despite the notable benefits of leveraging NLP to facilitate clinical research, there remain several open challenges. In this section, we discuss three challenges that need to be investigated in the future: reproducibility and scientific rigor, multisite NLP collaboration, and federated learning and evaluation.


Reproducibility and Scientific Rigor

Considering that many NLP solutions serve as middleware applications (i.e., supplying research data) for clinical research, the validity of research outcomes in such studies depends on the robustness and trustworthiness of the NLP models used as well as the quality of the data fed into these models [150–152]. Existing clinical NLP applications face various data quality issues caused by the heterogeneity of the EHR environment. Since EHR systems are primarily designed for patient care and billing, routinely generated and documented clinical information may suffer from data quality issues when used for clinical research. Furthermore, the EHR system itself may have a strong impact on the syntactic and semantic characteristics of patient narratives due to built-in documentation functionality such as smart forms, templates, and macros. Therefore, it is important to have a good understanding of EHR data before model development and deployment. In addition to data quality, reproducibility, which measures the ability to obtain the same (or a sufficiently similar) result by following the same (or sufficiently detailed) computational steps, is another important criterion for trusted NLP solutions. In the context of clinical NLP, this criterion emphasizes the need for provenance of information resources (e.g., corpus, system, and associated research metadata such as the inclusion and exclusion criteria used) and for process transparency to ensure scientific rigor. Another quality dimension that is commonly cited as a factor in “user trust” and safety is interpretability [153]. In clinical research, explanations of NLP results may serve as important criteria for evaluating a model’s capability to explain why a certain decision was made.
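One lightweight way to support provenance is to store a fingerprint of the corpus alongside the study metadata, so that a later run can verify it used the same inputs. The sketch below is only an illustration of that idea: the record fields, document identifiers, note text, and criteria are hypothetical, and no established provenance standard is implied.

```python
import hashlib
import json


def provenance_record(documents: dict, metadata: dict) -> dict:
    """Build a record that fingerprints a corpus and its research metadata.

    `documents` maps a document identifier to its text. Identifiers are
    sorted first so that iteration order cannot change the fingerprint.
    """
    digest = hashlib.sha256()
    for doc_id in sorted(documents):
        digest.update(doc_id.encode("utf-8"))
        digest.update(b"\x00")  # separator between identifier and text
        digest.update(documents[doc_id].encode("utf-8"))
        digest.update(b"\x00")  # separator between documents
    return {
        "corpus_sha256": digest.hexdigest(),
        "n_documents": len(documents),
        "metadata": metadata,
    }


# Hypothetical, fully synthetic example inputs.
corpus = {
    "note-001": "Patient denies any recent falls.",
    "note-002": "Fell at home last week; no injury reported.",
}
study_metadata = {
    "inclusion_criteria": ["age >= 65"],
    "exclusion_criteria": ["no clinical notes in study period"],
    "nlp_system_version": "example-0.1",
}

record = provenance_record(corpus, study_metadata)
print(json.dumps(record, indent=2, sort_keys=True))
```

Any change to a document's text or identifier changes `corpus_sha256`, which makes silent corpus drift between experiments detectable without sharing the notes themselves.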

Multisite NLP Collaboration

Compared with manual chart review, NLP solutions are distinctive in their ability to systematically extract clinical concepts from clinical text, offering high-throughput solutions for automated data abstraction across multiple institutions. Therefore, NLP has strong potential to facilitate multisite clinical research collaborations and nationwide research registry development. However, successfully deploying an existing NLP solution to a different EHR environment is nontrivial. We highlight three important NLP dimensions to consider: implementability, portability, and customizability. Implementability evaluates the feasibility of deploying NLP solutions to the clinical environment. The NLP implementation process is highly dependent on institutional infrastructure, system requirements, data usage agreements, and research and practice objectives. In addition, how NLP models are packaged can affect the complexity of implementation. For example, whether an NLP solution can be packaged as a standalone tool or needs to be integrated into existing infrastructure demands different implementation processes [100]. After deployment, the performance of the NLP solution needs to be re-evaluated in each local environment. Many studies have found that NLP algorithms developed in one institution for a study may not perform well when reused in the same institution or deployed to a different institution or for different studies [154]. The degradation of NLP performance at a different site is often referred to as an NLP portability issue. Differences in EHR systems, care practice, and data documentation standards across institutions may contribute to variability in clinical documentation and suboptimal performance of NLP systems. To address this, a local evaluation and refinement process can potentially improve the system. The feasibility of system refinement depends on the customizability of each system, which measures how easily a model can be adapted, modified, and refined from an existing implementation when a concept definition changes or clinical guidelines are updated. This quality dimension can affect the choice between different NLP approaches (e.g., symbolic vs. machine learning) for multisite studies.
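The local re-evaluation step described above can be sketched as a comparison of each site's F1-score against the development site's. The function names, the `tolerance` threshold, the site labels, and all counts below are assumptions for illustration; the chapter does not prescribe a specific threshold for when refinement is needed.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-score from true-positive, false-positive, and false-negative counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def portability_report(dev_counts: dict, site_counts: dict,
                       tolerance: float = 0.05) -> dict:
    """Flag sites whose F1 falls more than `tolerance` below the development site."""
    dev_f1 = f1_score(**dev_counts)
    report = {}
    for site, counts in site_counts.items():
        site_f1 = f1_score(**counts)
        drop = dev_f1 - site_f1
        report[site] = {
            "f1": site_f1,
            "drop": drop,
            "needs_refinement": drop > tolerance,  # hypothetical decision rule
        }
    return report


# Hypothetical counts from a development site and two deployment sites.
development = {"tp": 90, "fp": 10, "fn": 10}
deployments = {
    "site_a": {"tp": 88, "fp": 12, "fn": 12},
    "site_b": {"tp": 60, "fp": 20, "fn": 40},
}

for site, row in portability_report(development, deployments).items():
    print(site, f"F1={row['f1']:.3f}", f"drop={row['drop']:.3f}",
          "refine" if row["needs_refinement"] else "ok")
```

In practice, the error analysis behind a flagged site matters more than the number itself, because the cause (documentation templates, local abbreviations, different note types) determines whether rule changes or retraining is the appropriate refinement.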


Federated Learning and Evaluation

Another barrier to developing robust and portable NLP solutions is the lack of multisite data, owing to the regulations, privacy, and security requirements surrounding protected health information (PHI) and the high cost of creating well-annotated and curated clinical corpora [34, 155]. Federated learning, a machine learning approach that trains statistical models on remote devices, can potentially be leveraged to address data sharing challenges [156, 157]. Learning is achieved by allowing individual sites to collaboratively train a model and send incremental updates for immediate aggregation, achieving the shared learning objective without the need to distribute data [156, 157]. Traditional federated learning is, however, limited to machine learning approaches. To further enhance process transparency and model interpretability, the OHNLP Consortium [158] adapted the federated learning approach and proposed a collaborative NLP development framework [159]. The framework contains a user-centric crowdsourcing interface for collaborative ruleset development and a transparent multisite participation workflow for corpus development and evaluation [159]. Site-specific knowledge and findings can therefore be effectively aggregated and synthesized. A similar concept is federated evaluation, a process of deploying NLP solutions to local institutions, running models on local data, and sharing performance results with a centralized location (e.g., a cloud server). For example, the NLP Sandbox, developed by the National Center for Data to Health (CD2H), is a federated evaluation platform that enables the continuous benchmarking of NLP models on data hosted at different sites through Docker containers. Through this approach, institution-specific findings and knowledge can be learned and shared without transferring PHI.
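The aggregation idea can be sketched in a few lines. The first function uses sample-size-weighted averaging of site parameters (commonly called federated averaging), which is one possible aggregation rule chosen here for illustration; the text does not specify which rule is used. The second function pools per-site confusion-matrix counts, as a federated evaluation might, so that only counts leave each site. All names and numbers are hypothetical, and this is not the NLP Sandbox API.

```python
from typing import Dict, List, Tuple


def federated_average(site_updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average per-site parameter vectors, weighted by local sample counts.

    Each element of `site_updates` is (parameters, n_local_samples).
    Only parameters and counts are shared; raw notes never leave a site.
    """
    total = sum(n for _, n in site_updates)
    if total <= 0:
        raise ValueError("at least one site must contribute samples")
    dim = len(site_updates[0][0])
    averaged = [0.0] * dim
    for params, n in site_updates:
        if len(params) != dim:
            raise ValueError("all sites must share the same model shape")
        for i, value in enumerate(params):
            averaged[i] += value * n / total
    return averaged


def pool_site_counts(site_counts: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum confusion-matrix counts reported by each site (federated evaluation)."""
    pooled = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for counts in site_counts.values():
        for key in pooled:
            pooled[key] += counts[key]
    return pooled


# Hypothetical two-parameter models from two sites with 100 and 300 notes.
print(federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)]))

# Hypothetical per-site evaluation counts.
print(pool_site_counts({
    "site_a": {"tp": 40, "fp": 5, "tn": 150, "fn": 5},
    "site_b": {"tp": 30, "fp": 10, "tn": 200, "fn": 10},
}))
```

Weighting by sample count lets larger sites influence the shared model more; an unweighted mean is an alternative when equal site representation is preferred.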

Conclusion

In conclusion, this chapter provided an overview of clinical NLP in the context of the secondary use of EHR for clinical research. A case study of aging was conducted to demonstrate an end-to-end process of NLP development and evaluation. We further discussed three open challenges and highlighted the importance of translational science and community engagement efforts for leveraging clinical NLP applications to support research.

References

1. Jha AK. Meaningful use of electronic health records: the road ahead. JAMA. 2010;304(15):1709–10.
2. McCoy TH Jr, Han L, Pellegrini AM, Tanzi RE, Berretta S, Perlis RH. Stratifying risk for dementia onset using large-scale electronic health record data: a retrospective cohort study. Alzheimers Dement. 2020;16:531.
3. Reis BY, Kohane IS, Mandl KD. Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ. 2009;339:339.
4. Qeadan F, VanSant-Webb E, Tingey B, Rogers TN, Brooks E, Mensah NA, et al. Racial disparities in COVID-19 outcomes exist despite comparable Elixhauser comorbidity indices between blacks, Hispanics, native Americans, and whites. Sci Rep. 2021;11(1):1–11.
5. Zhou M, Zheng C, Xu R. Combining phenome-driven drug-target interaction prediction with patients’ electronic health records-based clinical corroboration toward drug discovery. Bioinformatics. 2020;36(Suppl_1):i436–44.
6. Garets D, Davis M. Electronic medical records vs. electronic health records: yes, there is a difference. Policy white paper Chicago, HIMSS Analytics. 2006:1–14.
7. Gilbert EH, Lowenstein SR, Koziol-McLain J, Barta DC, Steiner J. Chart reviews in emergency medicine research: where are the methods? Ann Emerg Med. 1996;27(3):305–8.
8. Kaur H, Sohn S, Wi CI, Ryu E, Park MA, Bachman K, et al. Automated chart review utilizing natural language processing algorithm for asthma predictive index. BMC Pulm Med. 2018;18(1):34.
9. Wang Y, Wang L, Rastegar-Mojarad M, Moon S, Shen F, Afzal N, et al. Clinical information extraction applications: a literature review. J Biomed Inform. 2018;77:34–49.
10. Fu S, Carlson LA, Peterson KJ, Wang N, Zhou X, Peng S, Jiang J, Wang Y, St Sauver J, Liu H. Natural language processing for the evaluation of methodological standards and best practices of EHR-based clinical research. AMIA Summits Transl Sci Proc. 2020;2020:171–80.
11. Manning C, Raghavan P, Schütze H. Introduction to information retrieval. Nat Lang Eng. 2010;16(1):100–3.
12. Manning CD, Schütze H. Foundations of statistical natural language processing. MIT Press; 1999.

13. Chute CG. The horizontal and vertical nature of patient phenotype retrieval: new directions for clinical text processing. In: Proceedings of the AMIA symposium. American Medical Informatics Association; 2002.
14. Weng C, Tu SW, Sim I, Richesson R. Formal representation of eligibility criteria: a literature review. J Biomed Inform. 2010;43(3):451–67.
15. Van Spall HG, Toren A, Kiss A, Fowler RA. Eligibility criteria of randomized controlled trials published in high-impact general medical journals: a systematic sampling review. JAMA. 2007;297(11):1233–40.
16. Kaggal VC, Elayavilli RK, Mehrabi S, Pankratz JJ, Sohn S, Wang Y, et al. Toward a learning health-care system–knowledge delivery at the point of care empowered by big data and NLP. Biomed Inform Insights. 2016;8:BII.S37977.
17. Hanauer DA, Mei Q, Law J, Khanna R, Zheng K. Supporting information retrieval from electronic health records: a report of University of Michigan’s nine-year experience in developing and using the Electronic Medical Record Search Engine (EMERSE). J Biomed Inform. 2015;55:290–300.
18. Cowie J, Wilks Y. Information extraction. In: Handbook of natural language processing, vol. 56; 2000. p. 57.
19. Nadeau D, Sekine S. A survey of named entity recognition and classification. Lingvisticae Investigationes. 2007;30(1):3–26.
20. Marsh E, Perzanowski D. MUC-7 evaluation of IE technology: overview of results. In: Seventh message understanding conference (MUC-7): proceedings of a conference held in Fairfax, Virginia, Apr 29–May 1, 1998.
21. Torii M, Wagholikar K, Liu H. Using machine learning for concept extraction on clinical documents from multiple data sources. J Am Med Inform Assoc. 2011;18(5):580–7.
22. Si Y, Wang J, Xu H, Roberts K. Enhancing clinical concept extraction with contextual embeddings. J Am Med Inform Assoc. 2019;26(11):1297–304.
23. Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, et al. Clinical concept extraction: a methodology review. J Biomed Inform. 2020;109:103526.
24. Kent DM, Leung LY, Zhou Y, Luetmer PH, Kallmes DF, Nelson J, et al. Association of silent cerebrovascular disease identified using natural language processing and future ischemic stroke. Neurology. 2021;97(13):e1313–21.
25. Wyles CC, Tibbo ME, Fu S, Wang Y, Sohn S, Kremers WK, et al. Use of natural language processing algorithms to identify common data elements in operative notes for total hip arthroplasty. J Bone Joint Surg Am. 2019;101(21):1931.
26. Fu S, Wyles CC, Osmon DR, Carvour ML, Sagheb E, Ramazanian T, et al. Automated detection of periprosthetic joint infections and data elements using natural language processing. J Arthroplast. 2021;36(2):688–92.
27. Lott JP, Boudreau DM, Barnhill RL, Weinstock MA, Knopp E, Piepkorn MW, et al. Population-based analysis of histologically confirmed melanocytic proliferations using natural language processing. JAMA Dermatol. 2018;154(1):24–9.
28. Hylan TR, Von Korff M, Saunders K, Masters E, Palmer RE, Carrell D, et al. Automated prediction of risk for problem opioid use in a primary care setting. J Pain. 2015;16(4):380–7.
29. Fu S, Lopes GS, Pagali SR, Thorsteinsdottir B, LeBrasseur NK, Wen A, et al. Ascertainment of delirium status using natural language processing from electronic health records. J Gerontol A. 2022;77(3):524–30.
30. Developing a framework for detecting asthma endotypes from electronic health records. Am J Respir Crit Care Med. In: 2014 Conference American Thoracic Society International Conference, ATS 2014, San Diego, CA, p 189.
31. Fu S, Leung LY, Wang Y, Raulli A-O, Kallmes DF, Kinsman KA, et al. Natural language processing for the identification of silent brain infarcts from neuroimaging reports. JMIR Med Inform. 2019;7(2):e12109.
32. Chase HS, Mitrani LR, Lu GG, Fulgieri DJ. Early recognition of multiple sclerosis using natural language processing of the electronic health record. BMC Med Inform Decis Mak. 2017;17(1):24.
33. Wu ST, Wi CI, Sohn S, Liu H, Juhn YJ. Staggered NLP-assisted refinement for clinical annotations of chronic disease events. In: 10th International conference on language resources and evaluation, LREC 2016. European Language Resources Association (ELRA); 2016.
34. Fu S, Leung LY, Raulli A-O, Kallmes DF, Kinsman KA, Nelson KB, et al. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inform Decis Mak. 2020;20:1–12.
35. Leech G. Corpus annotation schemes. Literary Linguist Comput. 1993;8(4):275–81.
36. Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960;20(1):37–46.
37. Van Rijsbergen CJ. The geometry of information retrieval. Cambridge University Press; 2004.
38. Sager N. Natural language information processing: a computer grammar of English and its applications. Addison-Wesley Longman Publishing Co., Inc.; 1981.
39. Sager N, Friedman C, Lyman MS. Medical language processing: computer management of narrative data. Addison-Wesley Longman Publishing Co., Inc.; 1987.
40. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. 2018. arXiv preprint arXiv:1810.04805.
41. Xiao C, Choi E, Sun J. Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. J Am Med Inform Assoc. 2018;25(10):1419–28.
42. Wu S, Roberts K, Datta S, Du J, Ji Z, Si Y, et al. Deep learning in clinical natural language processing: a methodical review. J Am Med Inform Assoc. 2019;27:457.
43. Childs LC, Enelow R, Simonsen L, Heintzelman NH, Kowalski KM, Taylor RJ. Description of a rule-based system for the i2b2 challenge in natural language processing for clinical data. J Am Med Inform Assoc. 2009;16(4):571–5.
44. Clancey WJ. The epistemology of a rule-based expert system - a framework for explanation. Artif Intell. 1983;20(3):215–51.
45. Cimino JJ. Desiderata for controlled medical vocabularies in the twenty-first century. Methods Inf Med. 1998;37(4/5):394–403.
46. Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 2004;32(Suppl_1):D267–70.
47. Lipscomb CE. Medical subject headings (MeSH). Bull Med Libr Assoc. 2000;88(3):265.
48. Carrell DS, Schoen RE, Leffler DA, Morris M, Rose S, Baer A, et al. Challenges in adapting existing clinical natural language processing systems to multiple, diverse health care settings. J Am Med Inform Assoc. 2017;24(5):986–91.
49. Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseases in discharge summaries. J Biomed Inform. 2001;34(5):301–10.
50. Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv (CSUR). 2002;34(1):1–47.
51. Freitag D. Machine learning for information extraction in informal domains. Mach Learn. 2000;39(2–3):169–202.
52. Alpaydin E. Introduction to machine learning. MIT Press; 2009.
53. Hastie T, Tibshirani R, Friedman J, Franklin J. The elements of statistical learning: data mining, inference and prediction. Math Intell. 2005;27(2):83–5.
54. Doan S, Xu H. Recognizing medication related entities in hospital discharge summaries using support vector machine. Proc Int Conf Comput Ling. 2010;2010:259–66.
55. Hoogendoorn M, Szolovits P, Moons LMG, Numans ME. Utilizing uncoded consultation notes from electronic medical records for predictive modeling of colorectal cancer. Artif Intell Med. 2016;69:53–61.
56. Sarker A, Gonzalez G. Portable automatic text classification for adverse drug reaction detection via multi-corpus training. J Biomed Inform. 2015;53:196–207.
57. Sohn S, Larson DW, Habermann EB, Naessens JM, Alabbad JY, Liu H. Detection of clinically important colorectal surgical site infection using Bayesian network. J Surg Res. 2017;209:168–73.
58. Rochefort CM, Verma AD, Eguale T, Lee TC, Buckeridge DL. A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data. J Am Med Inform Assoc. 2014;22(1):155–65.

59. Gaebel J, Kolter T, Arlt F, Denecke K. Extraction of adverse events from clinical documents to support decision making using semantic preprocessing. Stud Health Technol Inform. 2015;216:1030.
60. Pandey C, Ibrahim Z, Wu H, Iqbal E, Dobson R. Improving RNN with attention and embedding for adverse drug reactions. In: 7th International conference on digital health, DH 2017. Association for Computing Machinery; 2017.
61. Liu Z, Yang M, Wang X, Chen Q, Tang B, Wang Z, et al. Entity recognition from clinical texts via recurrent neural network. BMC Med Inform Decis Mak. 2017;17(Suppl 2):67.
62. Liu Z, Tang B, Wang X, Chen Q. De-identification of clinical notes via recurrent neural network and conditional random field. J Biomed Inform. 2017;75S:S34–42.
63. Luu TM, Phan R, Davey R, Chetty G. A multilevel NER framework for automatic clinical name entity recognition. In: 17th IEEE international conference on data mining workshops, ICDMW 2017. IEEE Computer Society; 2017.
64. Tran T, Kavuluru R. Predicting mental conditions based on “history of present illness” in psychiatric notes with deep neural networks. J Biomed Inform. 2017;75S:S138–S48.
65. Gehrmann S, Dernoncourt F, Li Y, Carlson ET, Wu JT, Welt J, et al. Comparing deep learning and concept extraction based methods for patient phenotyping from clinical narratives. PLoS One. 2018;13(2):e0192360.
66. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J. Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems; 2013.
67. Quinlan JR. Induction of decision trees. Mach Learn. 1986;1(1):81–106.
68. Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M. Logistic regression. Springer; 2002.
69. Pearl J. Bayesian networks: a model of self-activated memory for evidential reasoning. In: Proceedings of the 7th conference of the Cognitive Science Society. Irvine, CA: University of California; 1985.
70. Fix E, Hodges JL. Discriminatory analysis. Nonparametric discrimination: consistency properties. Int Stat Rev/Revue Internationale de Statistique. 1989;57(3):238–47.
71. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
72. Baum LE, Petrie T. Statistical inference for probabilistic functions of finite state Markov chains. Ann Math Stat. 1966;37(6):1554–63.
73. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.
74. Tsochantaridis I, Joachims T, Hofmann T, Altun Y. Large margin methods for structured and interdependent output variables. J Mach Learn Res. 2005;6:1453–84.
75. Lafferty J, McCallum A, Pereira FC. Conditional random fields: probabilistic models for segmenting and labeling sequence data. 2001.

76. Tang B, Cao H, Wu Y, Jiang M, Xu H. Recognizing clinical entities in hospital discharge summaries using Structural Support Vector Machines with word representation features. BMC Med Inform Decis Mak. 2013;13:S1.
77. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436.
78. Chen D, Liu S, Kingsbury P, Sohn S, Storlie CB, Habermann EB, et al. Deep learning and alternative learning strategies for retrospective real-world clinical data. NPJ Digit Med. 2019;2(1):43.
79. LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
80. Chen H, Lin Z, Ding G, Lou J, Zhang Y, Karlsson B. GRN: gated relation network to enhance convolutional neural network for named entity recognition. Proc AAAI. 2019;33:6236.
81. Tan LK, Liew YM, Lim E, McLaughlin RA. Convolutional neural network regression for short-axis left ventricle segmentation in cardiac cine MR sequences. Med Image Anal. 2017;39:78–86.
82. Rios A, Kavuluru R. Convolutional neural networks for biomedical text classification: application in indexing biomedical articles. In: Proceedings of the 6th ACM conference on bioinformatics, computational biology and health informatics. Atlanta, GA: ACM; 2015.
83. Rumelhart DE, Hinton GE, Williams R. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–6.
84. Cocos A, Fiks AG, Masino AJ. Deep learning for pharmacovigilance: recurrent neural network architectures for labeling adverse drug reactions in twitter posts. J Am Med Inform Assoc. 2017;24(4):813–21.
85. Jauregi Unanue I, Zare Borzeshi E, Piccardi M. Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition. J Biomed Inform. 2017;76:102–9.
86. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation. 2014.
87. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
88. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. In: Advances in neural information processing systems; 2017.
89. Zhang D, Wang D. Relation classification via recurrent neural network. 2015.
90. Hochreiter S. The vanishing gradient problem during learning recurrent neural nets and problem solutions. Int J Uncertainty Fuzziness Knowl Based Syst. 1998;6(2):107–16.
91. Chung J, Gulcehre C, Cho K, Bengio Y. Gated feedback recurrent neural networks. International conference on machine learning. 2015.
92. Radford A, Narasimhan K, Salimans T, Sutskever I. Improving language understanding by generative pre-training. 2018. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
93. Peng Y, Yan S, Lu Z. Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. 2019. arXiv preprint arXiv:1906.05474.
94. Szarvas G, Farkas R, Busa-Fekete R. State-of-the-art anonymization of medical records using an iterative machine learning framework. J Am Med Inform Assoc. 2007;14(5):574–80.
95. Fu S, Thorsteinsdottir B, Zhang X, Lopes GS, Pagali SR, LeBrasseur NK, et al. A hybrid model to identify fall occurrence from electronic health records. Int J Med Inform. 2022;162:104736.
96. Zheng S, Lu JJ, Ghasemzadeh N, Hayek SS, Quyyumi AA, Wang F. Effective information extraction framework for heterogeneous clinical reports using online machine learning and controlled vocabularies. JMIR Med Inform. 2017;5(2):e7235.
97. Meystre SM, Kim Y, Gobbel GT, Matheny ME, Redd A, Bray BE, et al. Congestive heart failure information extraction framework for automated treatment performance measures assessment. J Am Med Inform Assoc. 2017;24(e1):e40–e6.
98. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003;95(1):14–8.
99. Varma S, Simon R. Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics. 2006;7(1):91.
100. Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, et al. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ Digit Med. 2019;2(1):130.
101. Luther SL, McCart JA, Berndt DJ, Hahm B, Finch D, Jarman J, et al. Improving identification of fall-related injuries in ambulatory care using statistical text mining. Am J Public Health. 2015;105(6):1168–73.
102. Tremblay MC, Berndt DJ, Luther SL, Foulis PR, French DD. Identifying fall-related injuries: text mining the electronic medical record. Inf Technol Manag. 2009;10(4):253.
103. Zhu VJ, Walker TD, Warren RW, Jenny PB, Meystre S, Lenert LA. Identifying falls risk screenings not documented with administrative codes using natural language processing. In: AMIA annual symposium proceedings. American Medical Informatics Association; 2017.
104. Patterson BW, Jacobsohn GC, Shah MN, Song Y, Maru A, Venkatesh AK, et al. Development and validation of a pragmatic natural language processing approach to identifying falls in older adults in the emergency department. BMC Med Inform Decis Mak. 2019;19(1):138.
105. McCart JA, Berndt DJ, Jarman J, Finch DK, Luther SL. Finding falls in ambulatory care clinical documents using statistical text mining. J Am Med Inform Assoc. 2013;20(5):906–14.
106. Toyabe S. Detecting inpatient falls by using natural language processing of electronic medical records. BMC Health Serv Res. 2012;12(448):448.
107. dos Santos HDP, Silva AP, Maciel MCO, Burin HMV, Urbanetto JS, Vieira R. Fall detection in EHR using word embeddings and deep learning. In: 2019 IEEE 19th international conference on bioinformatics and bioengineering (BIBE). IEEE; 2019.
108. Murphy SN, Weber G, Mendis M, Gainer V, Chueh HC, Churchill S, et al. Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2). J Am Med Inform Assoc. 2010;17(2):124–30.
109. He H, Fu S, Wang L, Liu S, Wen A, Liu H. MedTator: a serverless annotation tool for corpus development. Bioinformatics. 2022;38:1776.
110. Pradhan S, Elhadad N, Chapman W, Manandhar S, Savova G. Semeval-2014 task 7: analysis of clinical text. In: Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014); 2014.
111. Elhadad N, Pradhan S, Gorman S, Manandhar S, Chapman W, Savova G. SemEval-2015 task 14: analysis of clinical text. In: Proceedings of the 9th international workshop on semantic evaluation (SemEval 2015); 2015.
112. Bethard S, Savova G, Chen W-T, Derczynski L, Pustejovsky J, Verhagen M. Semeval-2016 task 12: clinical tempeval. In: Proceedings of the 10th international workshop on semantic evaluation (SemEval-2016); 2016.
113. Liu S, Wang Y, Liu H. Selected articles from the BioCreative/OHNLP challenge 2018. Springer; 2019.
114. Rastegar-Mojarad M, Liu S, Wang Y, Afzal N, Wang L, Shen F, et al. BioCreative/OHNLP challenge 2018. In: Proceedings of the 2018 ACM international conference on bioinformatics, computational biology, and health informatics. ACM; 2018.
115. Wang Y, Afzal N, Liu S, Rastegar-Mojarad M, Wang L, Shen F, et al. Overview of the BioCreative/OHNLP challenge 2018 task 2: clinical semantic textual similarity. 2018.
116. Liu S, Mojarad MR, Wang Y, Wang L, Shen F, Fu S, et al. Overview of the BioCreative/OHNLP 2018 family history extraction task. 2018.
117. Uzuner Ö, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. J Am Med Inform Assoc. 2007;14(5):550–63.
118. Uzuner Ö, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. J Am Med Inform Assoc. 2008;15(1):14–24.
119. Uzuner Ö. Recognizing obesity and comorbidities in sparse data. J Am Med Inform Assoc. 2009;16(4):561–70.
120. Uzuner Ö, Solti I, Cadag E. Extracting medication information from clinical text. J Am Med Inform Assoc. 2010;17(5):514–8.

121. Uzuner Ö, Bodnari A, Shen S, Forbush T, Pestian J, South BR. Evaluating the state of the art in coreference resolution for electronic medical records. J Am Med Inform Assoc. 2012;19(5):786–91.
122. Stubbs A, Filannino M, Soysal E, Henry S, Uzuner Ö. Cohort selection for clinical trials: n2c2 2018 shared task track 1. J Am Med Inform Assoc. 2019;26(11):1163–71.
123. Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical text analysis and knowledge extraction system (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17(5):507–13.
124. Liu H, Bielinski SJ, Sohn S, Murphy S, Wagholikar KB, Jonnalagadda SR, et al. An information extraction framework for cohort identification using electronic health records. AMIA Summits Transl Sci Proc. 2013;2013:149.
125. Friedman C, Alderson PO, Austin JH, Cimino JJ, Johnson SB. A general natural-language text processor for clinical radiology. J Am Med Inform Assoc. 1994;1(2):161–74.
126. Aronson AR, Lang F-M. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17(3):229–36.
127. Denny JC, Irani PR, Wehbe FH, Smithers JD, Spickard III A. The KnowledgeMap project: development of a concept-based medical school curriculum database. In: AMIA annual symposium proceedings. American Medical Informatics Association; 2003.
128. Goryachev S, Sordo M, Zeng QT. A suite of natural language processing tools developed for the I2B2 project. In: AMIA annual symposium proceedings. American Medical Informatics Association; 2006.
129. Soysal E, Wang J, Jiang M, Wu Y, Pakhomov S, Liu H, et al. CLAMP–a toolkit for efficiently building customized clinical natural language processing pipelines. J Am Med Inform Assoc. 2018;25(3):331–6.
130. Bakken S, Hyun S, Friedman C, Johnson S. A comparison of semantic categories of the ISO reference terminology models for nursing and the MedLEE natural language processing system. In: MEDINFO 2004. IOS Press; 2004.
131. Ferrucci D, Lally A. UIMA: an architectural approach to unstructured information processing in the corporate research environment. Nat Lang Eng. 2004;10(3–4):327–48.
132. High R. The era of cognitive systems: an inside look at IBM Watson and how it works, vol. 1. IBM Corporation, Redbooks; 2012. p. 16.
133. Cloud G. Using the healthcare natural language API. 2022. Available from: https://cloud.google.com/healthcare-api/docs/how-tos/nlp.
134. Medical AC. Amazon Comprehend Medical - extract information from unstructured medical text accurately and quickly. 2022. Available from: https://aws.amazon.com/comprehend/medical/.

21 Clinical Natural Language Processing in Secondary Use of EHR for Research 135. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explor Newsl. 2009;11(1):10–8. 136. Mimno D. Machine learning with MALLET. 2004. 137. OpenNLP. Welcome to Apache OpenNLP. 2022. Available from: https://opennlp.apache.org/. 138. Quirk C, Choudhury P, Gao J, Suzuki H, Toutanova K, Gamon M, et al.. MSR SPLAT, a language analysis toolkit. In: Proceedings of NAACL-HLT 2012; 2012. 139. Loper E, Bird S. Nltk: the natural language toolkit. 2002. arXiv preprint cs/0205028. 140. Honnibal M, Montani I. spaCy 2: natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. Sentometr Res. 2017;7(1):411–20. 141. Collobert R, Kavukcuoglu K, Farabet C. Torch7: a matlab-like environment for machine learning. In: BigLearn, NIPS workshop; 2011. 142. Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow I, Bergeron A, et al. Theano: new features and speed improvements. 2012. arXiv preprint arXiv:12115590. 143. Chen T, Li M, Li Y, Lin M, Wang N, Wang M, et al. Mxnet: a flexible and efficient machine learning library for heterogeneous distributed systems. 2015. arXiv preprint arXiv:151201274. 144. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, et al. TensorFlow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI 16); 2016. 145. Paszke A, Gross S, Chintala S, Chanan G. PyTorch: tensors and dynamic neural networks in Python with strong GPU acceleration. 2017;6(3). 146. Chollet F. Keras: the python deep learning library. Astrophysics source code library. 2018:ascl:1806.022. 147. Seide F, Agarwal A. CNTK: Microsoft’s open- source deep-learning toolkit. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining; 2016.

451

148. Yapici MM, Topaloğlu N. Performance comparison of deep learning frameworks. Comput Inform. 2021;1(1):1–11. 149. Elshawi R, Wahab A, Barnawi A, Sakr S. DLBench: a comprehensive experimental evaluation of deep learning frameworks. Clust Comput. 2021;24(3):2017–38. 150. Fu S. TRUST: clinical text retrieval and use towards scientific rigor and transparent process. University of Minnesota; 2021. 151. Fu S, Wen A, Pagali S, Zong N. The implication of latent information quality to the reproducibility of secondary use of electronic health records. Stud Health Technol Inform. 2022;290:173. 152. Fu S, Wen A, Schaeferle GM, Wilson PM. Assessment of data quality variability across two ehr systems through a case study of post-surgical complications. AMIA Annu Symp Proc. 2022;2022:196. 153. Du M, Liu N, Hu XJ. Techniques for interpretable machine learning. Commun ACM. 2019;63(1):68–77. 154. Wagholikar K, Torii M, Jonnalagadda S, Liu H. Feasibility of pooling annotated corpora for clinical concept extraction. AMIA Summits Transl Sci Proc. 2012;2012:38. 155. Chapman WW, Nadkarni PM, Hirschman L, D’avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011;18:540. 156. Li T, Sahu AK, Talwalkar A, Smith V. Federated learning: challenges, methods, and future directions. IEEE Signal Process Mag. 2020;37(3):50–60. 157. Li L, Fan Y, Tse M, Lin K-Y. A review of applications in federated learning. Comput Ind Eng. 2020;149:106854. 158. Consortium O. OHNLP Consortium 2022. Available from: http://ohnlp.org/. 159. Liu S, Wen A, Wang L, He H, Fu S, Miller R, et al. An open natural language processing development framework for ehr-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C). 2021. arXiv preprint arXiv:211010780.

Part V Evolving Models and New Opportunities for the Transformation of Clinical Research

Back to the Future: The Evolution of Pharmacovigilance in the Age of Digital Healthcare

22

Michael A. Ibara and Rachel L. Richesson

Abstract

Pharmacovigilance is the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects from medicines or vaccines. Pharmacovigilance originated in an attempt to better understand the safety of drugs in order to ensure and protect the safety of individual patients and consumers. Over time, the development of the field has been heavily influenced by the need for the pharmaceutical industry to fulfill regulatory requirements, with the unintended result of losing track of the individual patient. With the onset of digitized healthcare data, we have an opportunity to reunite the industrial and personal in pharmacovigilance to increase the scope and efficiency of monitoring and the speed of response. Informatics supports this transformation by advancing a pharmacovigilance research agenda that should include defining conceptual (ontological) and operational definitions for adverse events that can address different product types and regulatory contexts, developing standards and systems to detect and report adverse events at scale and from different data sources, and developing methods (including artificial intelligence and machine learning) to predict risks of adverse events in various populations.

M. A. Ibara (*) Elligo Health Research, Princeton, NJ, USA
R. L. Richesson Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA

Keywords

Pharmacovigilance · Informatics · Adverse drug events · Postmarketing surveillance · Pharmacoepidemiology · Quantitative signal detection

Learning Objectives
1. Define the term pharmacovigilance and describe how pharmacovigilance relates to assuring the safety of medications and vaccines.
2. Define the terms “adverse event,” “adverse drug event,” and “adverse drug reaction,” list the four required elements from a regulatory perspective, and discuss the relationship of an adverse event to the notion of causality to a specified medication or vaccine product.
3. Discuss the relationship between adverse events of regulatory interest and adverse events of medical interest and the emerging role of electronic health record data for each.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_22


4. Discuss the need for a modernized operational definition for adverse events in the current era, and describe how healthcare and pharmacovigilance workflows and systems can change to accommodate use of electronic health records in AE detection, investigation, and reporting.
5. List and describe four specific informatics topics that require research and development to support a modern pharmacovigilance infrastructure.

The ideas which are here expressed so laboriously are extremely simple and should be obvious. The difficulty lies, not in the new ideas, but in escaping from the old ones, which ramify, for those brought up as most of us have been, into every corner of our minds. (John Maynard Keynes; from the preface to The General Theory of Employment, Interest, and Money, 1936)

Introduction

This chapter seeks to provide a foundation for future work in pharmacovigilance for the informatician involved in clinical research. It will not attempt to provide an overview of the field of pharmacovigilance, as this has been covered extensively elsewhere [1], including in a previous version of this chapter [2]. Important definitions and resources are presented in Table 22.1. The focus here will be on key developments in pharmacovigilance and related areas as a result of the growing digitization of healthcare data. We will propose an informatics research agenda meant to move the field forward and provide for a more holistic consideration of patient safety.

Pharmacovigilance is defined by the World Health Organization (WHO) [5] as “the science and activities relating to the detection, assessment, understanding and prevention of adverse effects or any other drug-related problem.” Pharmacovigilance is a central practice for understanding and assuring drug safety. For an excellent history of the development of pharmacovigilance as a discipline and the general applicability of informatics, see the previous edition of this chapter [2], in which the authors provide a superb primer for those wishing to gain a better understanding of the topic. A full treatment of the historical, regulatory, industrial, statistical, and medical aspects of the field can be found in several excellent reference works on the topic, especially Stephens’ Detection and Evaluation of Adverse Drug Reactions: Principles and Practice [6] and Mann’s Pharmacovigilance [4].

Table 22.1 Definitions [2–4] and relevant organizations

An adverse event (AE) is broadly defined as any clinical event, sign, or symptom that goes in an unwanted direction. Adverse events also include worsening of preexisting conditions, per FDA and ICH definitions. No assertion of causality is implied with adverse events.

An adverse drug event (ADE) is harm caused by appropriate or inappropriate use of a drug.

An adverse drug reaction (ADR) includes the suggestion of a causal relationship (e.g., probable, possible) between the event and a therapeutic agent or device. Adverse drug reactions are a subset of ADEs, where harm is directly caused by a drug under appropriate use (i.e., at normal doses). After an ADR is suspected (i.e., adverse consequences are speculated to be caused by a drug), careful and systematic data collection is required to evaluate that suspicion for further action.

The International Conference on Harmonization (ICH E2B) issues international safety reporting guidance.

The US Food and Drug Administration (FDA) supports the FDA Adverse Event Reporting System (FAERS) database of AE and medication error reports and product quality complaints (resulting in AEs) submitted to FDA as part of FDA’s postmarketing safety surveillance program for drug and therapeutic biologic products. Reporting requirements are summarized at: https://open.fda.gov/data/faers/

The European Medicines Agency (EMA) is a decentralized agency of the European Union (EU) responsible for the scientific evaluation, supervision, and safety monitoring of medicines in the EU.

The Council for International Organizations of Medical Sciences (CIOMS) has been instrumental in developing pharmacovigilance standards and practice. Both CIOMS and ICH operate as forums for discussion and standardization of drug safety methods and requirements.

The WHO international drug monitoring program, supported and coordinated by the WHO Collaborating Centre for International Drug Monitoring (“the Uppsala Monitoring Centre”), serves as a globally integrated and deliberate pharmacovigilance system and maintains the international database of adverse drug events.

Adverse events and medication errors are coded using terms in the Medical Dictionary for Regulatory Activities (MedDRA).

Background

Pharmacovigilance originated as an attempt to better understand the safety of drugs in order to protect individual patients and improve medicine. But while today pharmacovigilance plays a vital role in the research and public health arena, to the uninitiated it can seem bureaucratic, arcane, and arbitrary. This is due mainly to the myriad influences on the field from medicine, public health, industry, and regulation, as well as from broad interest in the topic by patients and practitioners, academic and industry researchers, and regulatory and legal bodies, all groups who have a stake in the endeavor. Over time, pharmacovigilance has taken on the shape of these combined influences, and their often disparate demands have led to a balkanization of the original pharmacovigilance landscape. Today, what a biopharmaceutical industry professional would describe as the daily work of pharmacovigilance would be unrecognizable to the layman or even to healthcare researchers in safety not otherwise engaged with industry.

For a number of years, the most significant forces of differentiation in pharmacovigilance were (and remain) the regulatory and legal requirements with which drug and device manufacturers must comply (hence the often-quoted statement by industry professionals that “compliance” is their first priority). And while healthcare practitioners are subject to significant regulations and laws as well, a difference in focus and content means that “drug safety” in a healthcare or academic research setting has come to mean something quite different from the industrial use of the term. As the field developed over the last 50 years, the patient was seen as the recipient of any learning and good practices in research on safety of drugs and medical devices but was only taken seriously as a participant at the level of their individual healthcare provider. Both industrial and academic researchers saw the patient more as a source than as a collaborator in their own health and well-being.

The result of these trends is that, today, pharmacovigilance looks very much like the rest of healthcare: siloed and having difficulty interoperating with other healthcare components. With separate standards, processes, systems, and data stores, various practitioners of “drug safety” work on their individual agendas, not noticing or acknowledging that they share (or could share) the same data with researchers in other fields of pharmacovigilance. But today, the increasing digitization of healthcare data is challenging this compartmentalization as it becomes possible to have a single data source serve a host of downstream practitioners and researchers, as well as the empowered patient.

What is less obvious, but we argue even more significant, is that the digitization of healthcare data creates the possibility of a return to the original aspirations of the field. We can reunite the individual, population, academic, and industrial pursuits to an extent that benefits all stakeholders and, most especially, allows us to realize one of the original goals of pharmacovigilance: to protect the individual while contributing to greater understanding at a population level. Practitioners in academic, medical, and industrial settings are finding themselves more often than not pursuing and working with the same data from the same sources. It is encouraging to imagine that they will also work on research topics that will help to reunify the field of pharmacovigilance and move it forward.

The previous edition of this chapter [7] provided a full exposition of the Coasian economic approach [8, 9] to pharmacovigilance to support the thesis that the digitization of healthcare data creates opportunities to unify the field of pharmacovigilance. As this has been amply demonstrated over the last several years, we provide only a summary of the Coasian approach as it applies to pharmacovigilance below.

Coasian Transactions: The Development and Evolution of PV/Drug Safety

The Coasian development of pharmacovigilance can be outlined as follows:
1. Historically, pharmacovigilance was largely developed by vertical organizations having the resources to find, collect, and process safety information: drug and device manufacturers.
2. These organizations were the de facto owners of safety information and responsible for it (the focus of regulations) because they were the only organizations able to afford the transaction costs.
3. As healthcare data has become digitized, there has been a dramatic lowering of the “transaction cost” of finding, collecting, and reporting safety information.
4. The movement of AE transaction costs toward zero means that the economic incentives to maintain vertical organizations for pharmacovigilance will no longer be present.
5. With AE transactions able to be carried out horizontally (across different organizations), an environment is created in which new business models and opportunities are encouraged.

If we view pharmacovigilance through a Coasian lens, we see that not only what we call adverse events but also related healthcare data, which may impact our assessments or which can be used in novel ways to improve our ability to practice pharmacovigilance, will continue to increase in number and at an increasing rate for the foreseeable future. The challenge for us is to unify (or reunify) the very different professional guilds that have developed as previously described. While it is tempting to imagine that new techniques or methods will simply wipe away traditional practices, this is rarely the case for scientific revolutions [10], let alone for a field with the complexities of healthcare entwined with the economics of industry and regulatory concerns. Although a full examination is beyond the scope of this chapter, the potential gains in health and economic terms from a unification of the field across these areas are motivation enough to hold this out as a goal.

What follows is a proposed research agenda which concentrates on a few areas (1) that provide common ground among researchers, industry professionals, and regulators, (2) in which technological developments are beginning to provide significant advances, and (3) in which research informaticists can provide major contributions and guidance.

Research Agenda for Modern Pharmacovigilance

A Note on Machine Learning

Over the last few years, as computing power has reached sufficient levels and research has matured, there has been an explosion in the application of machine learning techniques to many areas in healthcare and pharmaceutical research [11–13]. Such is the meteoric rise in the use of machine learning and algorithmic computation across healthcare and research that research topics 3, 4, and 5 in the sections below are largely concerned with the impact in these areas, whereas just a few years ago they would have been mentioned only in passing. It is no longer possible to approach a research agenda for pharmacovigilance without careful consideration of how these techniques and technologies are changing what is possible. But while the impact of these techniques on the field is considered here, this chapter makes no attempt to evaluate specific techniques in machine learning or artificial intelligence, except as they apply to the specific research topics described below.

Topic 1: The Operational Definition of an Adverse Event

The regulatory definition of an adverse event (AE)¹ is well established, with the term coming into common use in the 1930s and being refined in the 1960s and 1970s, at the same time that formal pharmacovigilance systems began to be established [6]. There has been a refinement of the term since then, but the general definition has remained fairly stable. For our purposes, what is important to note is that the definition of an AE was conceived at a time when the Internet, social media, big data, and the promise of large amounts of digital healthcare data were nascent or nonexistent. The most important effect this has had on the definition of an AE is to cast it in terms of a paper metaphor: we picture in our minds collecting AEs onto forms, and we think of the various elements of the form, the amount of information to be collected, and which types of information should be placed together, all in terms of a piece of paper. The insidious use of this metaphor encourages a habitual mode of thought which, having been ossified in regulatory definitions, is hard to escape. And while the metaphor has been extended significantly, initially to cover copies and facsimiles and later to include the concept of electronic data stores, the impact of the Internet and the wholesale digitization of healthcare data have stretched the paper metaphor to its limit. It is past time for a re-examination of the fundamental definitions of the field.

¹ Those familiar with the use of the term “ADR” (adverse drug reaction) vs “AE” (adverse event) should note that this discussion does not attempt to differentiate between those stricter definitions. Here the term “AE” is meant to be used in a general sense of a reported or noticed problem or concern.

The need to update our concepts in regard to how we define AEs becomes evident when we seek to operationalize the definition of an AE in order to implement it into systems and use it for research. The classic operational definition derived originally from regulatory use is that a valid adverse event report has “four elements”: an identifiable patient, an identifiable reporter, a suspect drug, and a serious adverse event or fatal outcome [14]. Over time the requirements for a regulatory report (which were created to help busy doctors understand what to report on a piece of paper) have become conflated with the definition of an AE, to the point where we might define a report that is missing these elements as irrelevant. But when we understand that the “four elements” are simply an operational definition meant to assist doctors in reporting, we can see that, given the digitization of healthcare data today, there is a need for a new operational definition.

An example illustrates the difficulties that arise from the mismatch between our concepts and the digital reality in healthcare today. In 2010, a pilot study demonstrated for the first time that it was possible to collect AEs at the point of care directly from an electronic health record, with minimal impact on clinicians, and to have those events sent electronically to FDA within minutes of the initial recognition of the event [15]. At the time this study was performed, one of the authors engaged in fierce debate with industry colleagues over the fact that the individual physician’s name was masked on the report (although the medical institution was known) and therefore the report was not a “qualified” AE (personal communication). This arcane argument took place as a result of an outdated operational definition for an AE: even though we could infer the existence of an individual physician given the design and operation of the electronic health record, the exact requirement of an “identifiable reporter” could be interpreted to mean the report was disqualified.
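The “four elements” test described above can be written down as a simple check. The following is a minimal sketch in Python, not a regulatory implementation; the `AEReport` class, its field names, and the helper functions are hypothetical names introduced only for illustration. It shows how a report whose reporter is masked fails a strict reading of the definition even though the event itself is fully described.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AEReport:
    """Hypothetical, simplified adverse event report (illustrative field names)."""
    patient_id: Optional[str]     # identifiable patient
    reporter_name: Optional[str]  # identifiable reporter
    suspect_drug: Optional[str]   # suspect drug
    event: Optional[str]          # adverse event (or fatal outcome)
    institution: Optional[str] = None  # known, but not one of the four elements


def missing_elements(report: AEReport) -> List[str]:
    """Return the names of the four elements that are absent from the report."""
    required = {
        "identifiable patient": report.patient_id,
        "identifiable reporter": report.reporter_name,
        "suspect drug": report.suspect_drug,
        "adverse event": report.event,
    }
    return [name for name, value in required.items() if not value]


def is_valid_under_four_elements(report: AEReport) -> bool:
    """Strict reading: a report is valid only if no element is missing."""
    return not missing_elements(report)


# A report resembling the pilot-study case: reporter masked, institution known.
masked = AEReport(
    patient_id="P-001",
    reporter_name=None,
    suspect_drug="drug X",
    event="anaphylaxis",
    institution="Example Medical Center",
)
print(missing_elements(masked))              # ['identifiable reporter']
print(is_valid_under_four_elements(masked))  # False
```

Whether a known institution should count as evidence of a reporter is a policy question that this sketch does not settle; it only makes the consequence of the strict reading visible.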
Healthcare research has no such operational definition for what constitutes an AE, and while this allows for a more rational approach to collecting medically relevant information, it means that there can be no direct sharing of approaches or interpretation of findings between the different sectors. And the reason such operational definitions are required by regulators and industry is that there are massive efforts which span companies and continents and which require some semblance of uniformity if the attempts to perform pharmacovigilance are to yield useful results. Given that both sectors have an interest in AEs, it would be of great benefit if a more inclusive, subtle, and encompassing operational definition of an AE could be developed.

Informaticians seeking to make progress here could begin with sound medical concepts to define the broadest category of adverse events. Clearly this work should be built on existing useful clinical models and ontologies (a topic discussed later), but an understanding of the regulatory definitions will be important as well. The goal would be to create a continuum of definitions based on informatics rather than the incongruous set of definitions that exist today. In this way we can imagine that AEs of “regulatory interest” would be a subset of a larger group of medical interest. It could be argued that this distinction exists today: AEs collected as a matter of course in healthcare are examined to see if they meet regulatory criteria, and if so, they are classed as such. The problem with this approach is that using the outdated “four elements” to define AEs of regulatory interest ignores a significant number of medically interesting events. The time has come to rework the operational definition to better align with what qualifies today as an AE in the work being done by healthcare researchers.

This topic takes on greater urgency today as the use of “real-world data” becomes more commonplace in regulated clinical research, which opens up the consideration of the high-dimensional, longitudinal data found in electronic health records and sourced from wearable sensors. The richness of the available data demands a more nuanced and expanded definition for adverse events.

Topic 2: Expanding and Formalizing the Data Model

Similar to the operational definition of an AE, the data model used to report AEs was developed from a need by regulators to have industry report AEs in a consistent manner. The original 1996 document from the International Council for Harmonisation (ICH) that addressed the “Data Elements for Transmission of Individual Case Safety Reports” (ICSR) was designated “E2” (the ICH designation for pharmacovigilance documents), and “B” referred to the particular document that defined data elements [16]. Hence, when referring to “E2B,” we are referring to the underlying data model for an AE.

The E2B data model is well developed and used internationally, which is an advantage. But as is the case with the operational definition of an AE, E2B had its origins long before big data, the Internet, and the dramatic increase in digitized healthcare data. With the most recent version, E2B(R3), the overall standard is based upon an HL7 ICSR model that is capable of supporting the exchange of messages for a wide range of product types (e.g., human medicinal products, veterinary products, medical devices). This is an excellent move toward more functionality within the regulatory reporting realm. But whereas it works well for submitting AEs to regulators, from an informatics perspective that looks to the future support of research across healthcare, it is lacking.

Contrast this with the type of large-scale research done today using very large and disparate datasets. This work has driven the creation of common data models which often include adverse events. A good example of this is the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) [17] produced by OHDSI (Observational Health Data Sciences and Informatics). The OMOP CDM was created for use in the systematic analysis of disparate observational databases, and to this end it has a common format and common terminologies, vocabularies, and coding schemes.
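To make the idea of a common format and a common vocabulary concrete, here is a minimal Python sketch. It is not the OMOP CDM itself: the two source layouts, the local codes, the `LOCAL_TO_STANDARD` mapping, and the concept identifiers are all invented for illustration. It shows how records from two differently structured sources might be mapped to one shape and one code system so that a single query runs across both.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical mapping from (source vocabulary, local code) to a shared concept ID.
LOCAL_TO_STANDARD: Dict[Tuple[str, str], int] = {
    ("site_a", "RASH"): 1001,
    ("site_b", "D-17"): 1001,  # same clinical concept, different local code
    ("site_a", "NAUS"): 1002,
    ("site_b", "D-42"): 1002,
}

# Two sources with different layouts and local codes (invented data).
site_a_rows = [
    {"pid": "a1", "dx": "RASH", "date": "2021-03-01"},
    {"pid": "a2", "dx": "NAUS", "date": "2021-03-04"},
]
site_b_rows = [
    ("b7", "D-17", "2021-05-11"),
    ("b9", "D-17", "2021-06-02"),
]


def to_common(person_id: str, vocab: str, code: str, date: str) -> dict:
    """Convert one source record to the shared format and vocabulary."""
    return {
        "person_id": person_id,
        "concept_id": LOCAL_TO_STANDARD[(vocab, code)],
        "start_date": date,
    }


def harmonize() -> List[dict]:
    """Map both sources into one list of records with identical structure."""
    rows = [to_common(r["pid"], "site_a", r["dx"], r["date"]) for r in site_a_rows]
    rows += [to_common(pid, "site_b", code, date) for pid, code, date in site_b_rows]
    return rows


common_rows = harmonize()
# One query now works across both sources.
counts = Counter(row["concept_id"] for row in common_rows)
print(counts[1001], counts[1002])  # 3 1
```

The design point is that the analysis code at the bottom never needs to know which site a record came from; all source-specific knowledge lives in the mapping step.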
Use of this approach in pharmacovigilance is what Koutkias and Jaulent have called the “computational approach” [18], in this case specifically for signal detection. The authors argue that pharmacovigilance should exploit all possible sources of information that may impact drug and device safety, and they do an excellent job of reviewing the sources, tools, and approaches. Most importantly, they suggest that semantic technologies are the right approach to this new pursuit of using diverse data sources in a unified fashion.

One semantic technology increasingly popular in clinical informatics is the ontology: an explicit, formal specification of terms or concepts in a domain and the relationships among them [19]. An early introduction of ontologies to the field of pharmacovigilance came in 2006 when Henegar et al. looked at formalizing MedDRA, the standardized medical terminology used for international regulatory purposes, one of which is to report AEs [20]. What Henegar discovered with MedDRA is illustrative of many models and terminologies in use with pharmacovigilance: there were no formal definitions of terms in MedDRA, and this meant that no formal description logic could be applied to reason against data described with this terminology. The lack of formal logic and rigorous concept representation meant that inference was not possible based on semantic content.

For many years, those engaged in pharmacovigilance research in industry were well aware of the lack of a semantic layer, but it was considered simply an artifact of the way in which data was collected. Groupings and counts of terms in MedDRA were gathered, and what then followed was a long and arduous process of, in effect, manually applying the semantic layer back to the data. Ontologies have been demonstrated to significantly improve this situation and allow us to imagine the ability to combine large and disparate sources of data and properly infer from them [18, 20–22]. The challenge today is that there is still relatively sparse communication between the regulatory-facing tools used in pharmacovigilance and those being borrowed from computational biology and other disciplines, which allow us to expand the data sources and techniques used in researching the safety of medical products.
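The difference between counting terms and reasoning over them can be shown with a toy example. The Python sketch below uses a hand-made `IS_A` hierarchy whose terms and relationships are invented for illustration; it is not MedDRA content and not a description-logic reasoner. It shows only the simplest kind of inference that formal parent-child relationships allow: events coded with specific terms are found automatically when querying a broader concept.

```python
from typing import Dict, List, Set

# Hypothetical is-a relationships (child -> parents); not taken from any real terminology.
IS_A: Dict[str, List[str]] = {
    "drug-induced hepatitis": ["hepatic disorder", "drug reaction"],
    "hepatic failure": ["hepatic disorder"],
    "hepatic disorder": ["disorder"],
    "drug reaction": ["disorder"],
    "rash": ["skin disorder"],
    "skin disorder": ["disorder"],
}


def ancestors(term: str) -> Set[str]:
    """Return every concept that `term` is transitively a kind of."""
    found: Set[str] = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(IS_A.get(parent, []))
    return found


def is_kind_of(term: str, concept: str) -> bool:
    """True if `term` is the concept itself or one of its descendants."""
    return term == concept or concept in ancestors(term)


# Coded events as they might appear in a set of reports (invented data).
coded_events = ["hepatic failure", "rash", "drug-induced hepatitis", "rash"]

hepatic = [e for e in coded_events if is_kind_of(e, "hepatic disorder")]
print(hepatic)  # ['hepatic failure', 'drug-induced hepatitis']
```

Without the explicit relationships, the same grouping would have to be rebuilt by hand for every analysis, which is the manual re-application of the semantic layer described above.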
The SALUS study [23] took on the challenge of harmonizing data models and terminologies in an effort not typical in signal verification studies. This approach holds great promise and engenders a significant amount of research, but SALUS was unusual in that the authors sought to harmonize the work with regulatory requirements. To achieve this, in addition to creating a rich ontology to work with the EHR, they mapped certain elements onto the previously described reporting standard, E2B (R2). And while this was an effective demonstration that it is possible to unify the healthcare, industry, and regulatory needs in pharmacovigilance (by seeking a logical lower-level ontological representation), the fact that a major revision to E2B (R3) has since come into effect demonstrates the continued balkanized nature of the field.

There is no lack of definitions for adverse events. The ongoing development of the FHIR standard [24] has generated renewed interest in this area, as has ongoing work on the OMOP standard [25]. Work by informaticists is needed to unify and maintain the representations needed in pharmacovigilance, and settling on a set of key ontologies would be a dramatic step forward and would enable better utilization of diverse sources of data, more economical translation of data for industrial research, and more accurate, better-quality communication of this information for regulatory purposes.

The field of oncology research may be a useful model for informaticists looking to improve the definitions in pharmacovigilance. As a result of the dramatic increase in genomic data and other real-world data, oncology has been learning to manage massive amounts of detailed data with precision medical concepts, and so is often at the forefront of informatics work that impacts clinical research. Becoming familiar with the unique approach to toxicities, adverse event definitions, and attempts to reconcile healthcare and research concepts in oncology is an excellent introduction to possible solutions [26, 27].

Topic 3: Terminologies

Since the beginning of medical and industrial research, terminologies have been developed in an attempt to categorize and standardize work.


It has long been recognized that the problem of semantics, or the meaning of terms in medicine and healthcare research, cannot be fully divorced from the terminologies used to describe things [28, 29]. Along with heterogeneous data models, lack of consistency in various terminologies and how they are applied has been a challenge since even before it was described succinctly by Cimino, and it is understood to be a linchpin issue in using EHRs for big data research [30]. Recently, the work being done in machine learning, ontologies, and computational methods is shedding new light on ways to tame the terminology issues, such that it is now imaginable that the problem of inconsistency could be solved by a logically rigorous ontology which binds terminologies to data models [31]. As a discussion of ontologies preceded this section (see Chap. 19), here we highlight work being done in machine learning which impacts challenges with terminologies. For the last several years, researchers have looked at computer-assisted ways to extract AEs from text (specifically from narratives in AE reports) [32], but more recently new levels of sophistication in handling terminology as part of the process have been demonstrated. Jiang et al. evaluated using machine learning-based approaches to extract clinical entities from hospital discharge summaries written in free text [33]. Clinical entities included medical problems, tests, and treatments. While this work did not specifically address identification of AEs, the clinical and conceptual challenges are the same, and indeed in some cases, medical problems are adverse events. Of interest was their finding that traditional mapping of text to controlled vocabularies (time-consuming work that often reflects individual preference) could be helped by accurate boundary detection by machine learning systems that perform named entity recognition (NER) tasks (finding and classifying words and phrases into semantic classes).
They hypothesize this system could help recognize unknown words based on context and so could supplement traditional dictionary-based NLP systems. The implication here is that the task of finding and accurately coding adverse events (among other medical concepts) could be significantly standardized and automated via the methods described. For pharmacovigilance, this would have a direct application not only in finding AEs in discharge summaries but also in recognizing AEs from patient diaries and notes, where an expression that refers to an AE may have no recognition in a dictionary-based system (e.g., “this stuff split my head into,” where the vernacular refers to a drug-induced headache, but the terms and the misuse of “into” vs “in two” make machine recognition challenging). The development of a machine learning approach demands better-defined, more logically consistent datasets, and this has spurred work which will change the traditional challenges associated with terminologies. Borrowing from a bioinformatics and systems biology approach, Cai et al. created ADReCS, the Adverse Drug Reaction Classification System [34]. ADReCS is an ontology of AE terms built with MedDRA and UMLS with hierarchical classification and digital identifiers. This means that direct computation on ADR terms can be achieved using the system, a significant step for the efficient use of machine learning technologies. We can imagine a future where this system or similar ones are expanded and mapped to other ontologies built in a similar manner, allowing for an approach to pharmacovigilance that is unlike anything in the past. As we reach this stage of computational maturity in pharmacovigilance, it will create a very significant driver for the biopharmaceutical industry, which spends a great deal on gathering data from disparate sources to test drug safety hypotheses and on standardizing and recoding that data into common formats that can be submitted to regulators. As systems like ADReCS become the norm, many of the inefficiencies the industry now faces will begin to disappear.
As with ontologies, work is needed to expand the most promising systems and to find the most universal and effective representations of terminologies that can migrate successfully from healthcare to industry to regulators with no loss of meaning and with decreased manual effort.
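To make the idea of "direct computation on ADR terms" concrete, the following is a minimal Python sketch of what hierarchical identifiers make possible. It is only an illustration: the dotted codes and labels are invented for this example and are not taken from ADReCS, and the helper names (`ancestors`, `is_a`, `shared_depth`) are hypothetical.

```python
from typing import List

# Hypothetical identifiers in a dotted, hierarchical style. The codes and
# labels are made up for illustration and are NOT real ADReCS entries.
TERMS = {
    "10": "Nervous system disorders",
    "10.02": "Headaches",
    "10.02.01": "Headache, unspecified",
    "10.02.02": "Migraine",
    "07": "Hepatobiliary disorders",
    "07.01": "Hepatic injury",
}


def ancestors(term_id: str) -> List[str]:
    """Return all ancestor identifiers, from the top level down."""
    parts = term_id.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def is_a(term_id: str, candidate: str) -> bool:
    """True if candidate is the term itself or one of its ancestors."""
    return term_id == candidate or candidate in ancestors(term_id)


def shared_depth(a: str, b: str) -> int:
    """Number of leading hierarchy levels two identifiers have in common."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(ancestors("10.02.01"))
    print(is_a("10.02.02", "10"))
    print(shared_depth("10.02.01", "10.02.02"))
```

With identifiers of this kind, grouping reports under a broader term or estimating how closely two terms are related becomes a string operation rather than a manual lookup, which is the property that makes such systems attractive as inputs to machine learning.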


Topic 4: Discovery/Curation of AEs

Research on the discovery of AEs is being done in every possible source: electronic health records, social media, registries, large databases, real-world data from insurance claims, and other sources [35]. In 2012, Harpaz et al. set the stage for the use of novel methodologies using large datasets with their review of current work [36]. The authors made several salient points regarding the new research methods, including the fact that (1) combining data from heterogeneous sources requires the development of new and reproducible methods, (2) standardized (and simulated) datasets will grow in importance to allow rapid testing of new methods, and (3) standards in PV must be developed to evaluate algorithmic approaches applied to the data. In 2013 Jiang et al. began work on ADEpedia 2.0, which built on their previous AE knowledge base derived from drug product labels; in keeping with the direction laid out by Harpaz, in 2.0 the authors began to enrich the database with data from UMLS (Unified Medical Language System) and EHR data, with a goal to create a standardized source of AE knowledge [37]. Banda et al. continued this approach, standardizing the FDA’s FAERS (FDA Adverse Event Reporting System) database [38]. They provided a curated database by removing duplicate records and mapping the data to standardized vocabularies, with drug names mapped to RxNorm concepts and outcomes mapped to SNOMED-CT concepts, and they created a set of summary statistics about drug-outcome relationships for general consumption. While not involved directly with machine learning, this approach pointed the way toward further machine-based approaches by providing all source code for the work, so that it could be used and updated as needed, and by mapping outcomes and indications to SNOMED-CT, which allows for direct linkage to other ontologies.
Since that time, an explosion of work has taken place in all three areas identified by Harpaz, emphasizing the discovery of AEs using machine learning combined with statistical techniques [39–44].
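The curation steps described for FAERS (removing duplicates, mapping names to standard concepts, and summarizing drug-outcome relationships) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the published pipeline: the lookup tables and concept codes are placeholders rather than real RxNorm or SNOMED-CT identifiers, and the proportional reporting ratio (PRR) is used here only as one common disproportionality statistic, since the text above does not say which statistics were computed.

```python
from typing import Dict, List, Optional, Tuple

# Toy lookup tables. The codes are placeholders, NOT real RxNorm or
# SNOMED-CT identifiers.
DRUG_MAP: Dict[str, str] = {
    "tylenol": "DRUG:0001",
    "acetaminophen": "DRUG:0001",
    "advil": "DRUG:0002",
}
OUTCOME_MAP: Dict[str, str] = {
    "headache": "OUT:0001",
    "cephalgia": "OUT:0001",
    "nausea": "OUT:0002",
}

Report = Tuple[str, str, str]  # (case_id, raw drug name, raw outcome term)


def curate(reports: List[Report]) -> List[Report]:
    """Map raw names to concept codes and drop duplicate rows."""
    seen = set()
    curated = []
    for case_id, drug, outcome in reports:
        d = DRUG_MAP.get(drug.strip().lower())
        o = OUTCOME_MAP.get(outcome.strip().lower())
        if d is None or o is None:
            continue  # unmapped terms would need manual review
        key = (case_id, d, o)
        if key not in seen:
            seen.add(key)
            curated.append(key)
    return curated


def prr(curated: List[Report], drug: str, outcome: str) -> Optional[float]:
    """Proportional reporting ratio, [a/(a+b)] / [c/(c+d)], counted over rows."""
    a = b = c = d = 0
    for _, dr, out in curated:
        if dr == drug and out == outcome:
            a += 1
        elif dr == drug:
            b += 1
        elif out == outcome:
            c += 1
        else:
            d += 1
    if a + b == 0 or c == 0:
        return None  # ratio is undefined for this table
    return (a * (c + d)) / (c * (a + b))


if __name__ == "__main__":
    raw = [
        ("c1", "Tylenol", "Headache"),
        ("c1", "acetaminophen", "cephalgia"),  # duplicate once normalized
        ("c2", "Tylenol", "Nausea"),
        ("c3", "Advil", "Headache"),
        ("c4", "Advil", "Nausea"),
        ("c5", "Advil", "Nausea"),
        ("c6", "Unknown", "Headache"),  # unmapped, dropped
    ]
    clean = curate(raw)
    print(len(clean), prr(clean, "DRUG:0001", "OUT:0001"))
```

The point of the sketch is that the second report for case `c1` only becomes recognizable as a duplicate after both the drug and the outcome have been mapped to shared concepts, which is why vocabulary mapping and deduplication are usually treated as one curation step.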


The study by Bean et al. [39] serves to illustrate a new way of approaching discovery of AEs in the postmarketing phase, one that does not wait for a series of reports to emerge; rather, it takes advantage of what until recently were infrequently connected sources of data to discover previously unknown AEs due to specific drugs and to validate this via EHRs. The authors constructed a knowledge graph from four primary types of public data (drugs, protein targets, indications, and adverse reactions) in order to predict AEs. They then used this to develop a machine learning algorithm and deployed that algorithm on an EHR. The algorithm was fed by an NLP pipeline developed to parse free text in the EHR. This work is similar to work on prediction of AEs using structure-activity relationships [45], gene expression [46], and protein drug targets [47]. In this work we can see a computational biological approach which can vie with the current biology-based approach that has paid dividends and has dominated PV for decades. In 2017 Voss et al. moved the field forward significantly with their work to automatically aggregate disparate sources of data into a single repository [48] that allows a machine learning approach to selecting positive and negative controls for pharmacovigilance research design testing. As previous work demonstrated, creating a reference database for pharmacovigilance using manual or even semi-manual methods is extremely time- and resource-intensive. The authors built on previous work (described in Banda) and added the relationship between a drug and a health outcome of interest (HOI).
They performed a quantitative assessment of how well the evidence base could discriminate between known positive drug-condition causal relationships and drugs known to be not associated with a condition, thus allowing the automated creation of an assessment for pharmacovigilance research study designs that allows comparisons across designs with a significant savings in time and increase in standardization. The authors worked through methods for accepting data from various sources at various granular levels, for example, mapping the source at either an ingredient level of a drug or at a clinical drug level and subsequently aggregating evidence to individual ingredients to allow analysis across the dataset. While work in AE discovery occurs at every level and is often the primary topic of research covered under other research topics such as terminologies, the topic of curation is not one typically addressed except in individual efforts that are not reproducible and rarely maintained due to the intense effort required. This shift toward automated curation across various data sources will prove to be an important stimulus to the computational approach in pharmacovigilance, allowing a much more rapid and standardized testing of research designs. In the near future, we can expect more reference sets which can be used to train machine learning algorithms and test large-scale analysis methods. Just as the creation of high-quality curated datasets in machine learning is driving forward progress across many fields [49], we can expect the same to occur in pharmacovigilance as work continues. It is instructive to review the dramatic effect that a massive, well-curated (automatically generated) dataset can have on accuracy of machine learning algorithms in the example of ImageNet [50, 51], a database of over 14 million image URLs that are labeled to provide a curated set. Prior to establishing this dataset, progress in visual object recognition was steady but slow. In 2012, using a deep convolutional neural network trained on ImageNet, researchers outperformed the next-best entry by over 40% [52]. This massive, curated dataset is widely credited as one of the primary drivers of the deep learning revolution. Other such databases (VigiBase, with 21 million case safety reports in 2020) are emerging as resources to advance AE detection and even scientific discovery [53].
The moral of this story for pharmacovigilance is that a focus on the creation of large, curated, automatically created test datasets has the potential to move a computational approach to pharmacovigilance forward just as quickly if not more quickly than the best analytical methods. This is certainly an area for future informatics research.
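Two of the steps described in this topic, aggregating evidence from the clinical drug level to the ingredient level and measuring how well an evidence score separates positive from negative controls, lend themselves to a short Python sketch. Everything here is assumed for illustration: the drug names, the `INGREDIENT_OF` mapping, and the choice of the area under the ROC curve (AUC) as the discrimination measure are not taken from the work cited above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mapping from clinical-drug-level names to ingredients.
INGREDIENT_OF: Dict[str, str] = {
    "drugA 10 mg tablet": "drugA",
    "drugA 20 mg tablet": "drugA",
    "drugB 5 mg capsule": "drugB",
}

Pair = Tuple[str, str]  # (drug or ingredient, outcome)


def aggregate_to_ingredient(evidence: Dict[Pair, int]) -> Dict[Pair, int]:
    """Sum evidence counts for (drug, outcome) pairs up to (ingredient, outcome)."""
    totals: Dict[Pair, int] = defaultdict(int)
    for (drug, outcome), count in evidence.items():
        # Names missing from the map are treated as already ingredient-level.
        ingredient = INGREDIENT_OF.get(drug, drug)
        totals[(ingredient, outcome)] += count
    return dict(totals)


def auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """Chance that a random positive control outscores a random negative one.

    Ties count as half a win. This pairwise form equals the area under the
    ROC curve.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    evidence = {
        ("drugA 10 mg tablet", "liver injury"): 3,
        ("drugA 20 mg tablet", "liver injury"): 2,
        ("drugB", "liver injury"): 1,
    }
    print(aggregate_to_ingredient(evidence))
    print(auc([5, 3], [1, 3]))
```

An AUC near 1.0 would mean the evidence score ranks nearly every known positive control above every negative control, while a value near 0.5 would mean it does no better than chance; a reference set built this way can then be reused to compare study designs on the same footing.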


Topic 5: Delayed Toxicity and Complex Causal Assessments

The tragic discovery of delayed hepatotoxicity caused by fialuridine is required reading in any pharmacovigilance or clinical research education. It is important to understand just how difficult it was at that time to attribute observed toxicities to the drug, given how they initially presented in patients and the presence of similar symptoms due to underlying disease or caused by an initial therapeutic response. These challenges, coupled with the piecemeal accumulation of information over a period of time, made it difficult to form a conviction that fialuridine caused a fatal toxicity, although, as some argue, evidence was clearly present [54, 55]. The 1995 Institute of Medicine report on the review of events leading up to the tragic deaths of five patients in a 1993 clinical trial of fialuridine for hepatitis B concluded that overall, clinical researchers involved in various trials acted correctly and made the best decisions possible given the available information. Looking at the set of trials that were done over a period of several years, however, one cannot help but be struck by the series of “clues” pointing to fialuridine and how, when taken together, they provide a strong signal that the drug was implicated [56]. In our current post-behavioral economics atmosphere, it may be easier for us to appreciate how we could fail to recognize a problem of delayed toxicity in a drug: humans are superb at pattern recognition over a relatively short timeframe, but our skill degrades rapidly as cause is separated in time from effect and obscured by other possible causes. In pharmacovigilance, one has a feeling of inadequacy when it comes to sorting out the possible links between drugs and toxicities, except in the most obvious and common cases.
The investigations into the delayed toxicity of fialuridine produced better regulation and reasonable research recommendations [56], but beyond these improvements, not much has been gained in our ability to recognize delayed toxicity in drugs from complex situations.


A less dramatic but conceptually similar challenge faces anyone seeking to sort out what drugs may be contributing to a patient’s clinical signs and symptoms when they have underlying disease and are on a multiple drug regimen. The classic questions regarding “dechallenge/rechallenge” (whether a sign or symptom stopped once the drug was stopped, and returned after the drug was restarted) and the time course of drug dose vs appearance of symptoms are well designed but often unanswerable in a real-world situation. Oncology trials come to mind as a particularly challenging environment in which to attribute cause to individual drugs. These scenarios are not unique to pharmacovigilance. They share the same basic external challenges (incomplete information and competing causes, extended over time) and internal challenges (idiosyncratic human perception and bias) with pursuits as diverse as cognitive psychology and behavioral economics [57] or the study of policy impacts [58]. Computational approaches to these questions hold out promise to provide the most significant advancement in years for pharmacovigilance, by transferring the burden of recognition to computers working with large datasets using sound methods. Most of the work reviewed earlier in the recognition of AEs applies here as well. The systems pharmacology approach of Huang et al., which combines clinical observation with molecular biology [42], can be seen as a template for research in predicting toxicities in drugs and arming researchers with information that will enhance the design as well as the monitoring of trials using drugs with increasingly complex mechanisms of action. Recent similar work indicates that a systems pharmacology or computable biology approach holds out great promise in predicting toxicities at an earlier stage than previously imagined [59–62]. Combining data across disciplines in a computable framework is a fertile area of research, especially as it applies to predicting toxicities in a real-world setting.
The contribution of informatics to this work can have a tangible and concrete impact in improving safety for patients.


Arming clinical researchers and pharmacovigilance professionals with these methods holds out hope that another fialuridine tragedy would be avoided today.
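The dechallenge/rechallenge questions discussed in this topic can be written down as a small data structure, which also shows why they are so often unanswerable in practice. The Python sketch below is an assumption-laden illustration: the class name, field names, and summary labels are hypothetical and do not correspond to any regulatory form or coding standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChallengeRecord:
    """What a case record says about stopping and restarting a drug.

    None means the question could not be answered from the record, for
    example because the drug was never stopped or never restarted.
    """
    abated_after_stop: Optional[bool]
    recurred_after_restart: Optional[bool]


def summarize(rec: ChallengeRecord) -> str:
    """Turn the two answers into a short, human-readable summary."""
    if rec.abated_after_stop is None:
        return "dechallenge unknown"
    if not rec.abated_after_stop:
        return "negative dechallenge"
    if rec.recurred_after_restart is None:
        return "positive dechallenge, rechallenge unknown"
    if rec.recurred_after_restart:
        return "positive dechallenge, positive rechallenge"
    return "positive dechallenge, negative rechallenge"


if __name__ == "__main__":
    print(summarize(ChallengeRecord(True, True)))
    print(summarize(ChallengeRecord(True, None)))
    print(summarize(ChallengeRecord(None, None)))
```

In a multi-drug regimen each suspect drug would need its own record, and most of them would carry `None` in both fields, which is one way to see why the burden of recognition is a natural candidate for computational methods working across many patients.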

Topic 6: Risk Profiling of the Individual

The concept of precision medicine, that medical care can be tailored (especially in a genomic and molecular sense) to select groups of patients, is now commonplace and being realized in the design of clinical trials and healthcare policy in addition to medical practice. In pharmacovigilance, however, there is a need for better understanding of how the concepts of precision medicine can be incorporated into goals and practice. This section simply poses some basic open questions that informaticians can help to address in order to improve the theoretical basis of pharmacovigilance. But while the ideas here are to some degree speculative, the authors believe they should be taken seriously, as they are at the heart of pharmacovigilance itself. A simple coined term, “precision pharmacovigilance,” is enough to raise questions and spark ideas about how the discoveries in medicine and biology can be more directly taken up in the study of drug and device safety [63]. But a broader (and more provocative) research question to ask is: Is it possible to provide to an individual a “risk profile” as it relates to their particular drug and/or device regimen? One aspect of the question relates to the degree to which we can simply follow the discoveries in precision medicine and practice pharmacovigilance along the way, e.g., looking at AEs in certain genetic subgroups while undergoing treatment with immune modulators. It could be said that in this respect, there is nothing new here; pharmacovigilance has looked at subgroups of patients for some time [64], and this continues to bear fruit [65]. But recent research in methods dealing with large-scale longitudinal observational databases [66, 67] allows us to imagine a scenario different from that of looking at the AEs related to certain subtypes of patients: what if we could predict the risk for a certain person (age, race, genetic makeup), taking a certain set of drugs (let’s say a regimen of five separate drugs), living in a certain area of the world, and having a particular occupation? Can we reach the point where we can tell you that for you as an individual, you have a 60% chance of a significant toxicity if you fit the above profile? The question serves less to examine how much data it would take to provide an exact answer and more to challenge us to decide how feasible it is to pursue this goal. Can pharmacovigilance aspire to studying and predicting risk not only for patient subtypes but for situational circumstances? At this early stage of discovery and application in big data, machine learning, and improving methods, it is important to keep an open mind about what pharmacovigilance can become. Being able to speak directly to select groups of patients who are living in specific circumstances as regards their drug therapy was an original motivation for pharmacovigilance and, we believe, should continue to inspire research.

Topic 7: Emerging Data and Technologies for Pharmacovigilance

Since the first edition of this chapter, the most significant change to impact the clinical research informatics of pharmacovigilance is the continuation of the trends in all of the related research in AI and the continued increase in the amount and availability of data. Such is the pace of change that the authors expect any future discussion of pharmacovigilance will need to include these topics as part of the primary discussion. Several of the more important developments are highlighted here.

Interoperability of Healthcare Data

At the time of this writing, there is a major shift toward healthcare data becoming “interoperable” [68]. This is the long-sought-after goal of making healthcare data available and portable to improve healthcare delivery, patient care, and research. The confluence of more modern, web-based standards such as FHIR and the regulatory push from the Office of the National Coordinator in implementing certain requirements of the 21st Century Cures Act (most specifically the “Interoperability Rule” [69]) has given rise to a new breed of “interoperability vendors” who are connecting data across healthcare systems and in the process making more standardized healthcare data available for clinical and regulated research. This trend promises to help solve one of the more intractable problems in scaling clinical research using real-world data: the heterogeneous technical and data environment that exists across the United States and globally.
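As a rough illustration of what standardized, web-based healthcare data looks like to a researcher, the Python sketch below parses a small JSON document shaped like a FHIR AdverseEvent resource and pulls out the coded event and the suspected product. The field names follow our reading of the R4-era AdverseEvent resource, and other FHIR releases structure this resource differently, so the layout should be checked against the version actually in use. The resource content, the references, and the helper names `event_codes` and `suspect_references` are invented for this example.

```python
import json
from typing import List, Tuple

# Illustrative resource only. Field names are assumed to follow the R4-era
# AdverseEvent layout; the patient, medication, and date are made up.
ADVERSE_EVENT_JSON = """
{
  "resourceType": "AdverseEvent",
  "actuality": "actual",
  "event": {
    "coding": [
      {"system": "http://snomed.info/sct", "code": "25064002", "display": "Headache"}
    ]
  },
  "subject": {"reference": "Patient/example"},
  "date": "2022-09-11",
  "suspectEntity": [
    {"instance": {"reference": "Medication/example"}}
  ]
}
"""


def event_codes(resource: dict) -> List[Tuple[str, str]]:
    """Return (system, code) pairs describing the event, if any are present."""
    codings = resource.get("event", {}).get("coding", [])
    return [(c.get("system", ""), c.get("code", "")) for c in codings]


def suspect_references(resource: dict) -> List[str]:
    """Return references to the products suspected of causing the event."""
    refs = []
    for entity in resource.get("suspectEntity", []):
        ref = entity.get("instance", {}).get("reference")
        if ref is not None:
            refs.append(ref)
    return refs


resource = json.loads(ADVERSE_EVENT_JSON)

if __name__ == "__main__":
    print(resource["resourceType"], event_codes(resource), suspect_references(resource))
```

Because the same few lines would work against any system that emits the resource in this shape, the heterogeneity problem shifts from writing one extractor per site to agreeing on the resource version and the terminologies used inside it.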

AlphaFold 2

Any researchers interested in novel methods to improve pharmacovigilance should gain a deep understanding of the seminal and dramatic achievement of AlphaFold 2, the deep learning AI system developed by Alphabet’s/Google’s DeepMind which performs predictions of protein structure and which basically solved the protein folding problem that was a major goal of researchers and drug developers [70–72]. The impact of AlphaFold 2 for the drug development industry is clear, as it dramatically reduces the time needed to explore structure-function relationships of drug molecules or their molecular targets. This has sparked a dramatic increase in the pursuit of practical applications across the industrial and academic worlds. But the larger implications of AlphaFold 2 for pharmacovigilance are apparent when we examine how the approach to specific knowledge domains coupled with recent deep learning designs can produce results that were previously thought impossible. This not only will have direct application to the molecular basis of adverse events (especially where molecular biomarkers are associated) but also portends a new, data-focused approach to research in pharmacovigilance that is entirely foreign to the past event-based approaches. This is a nascent area of research to consider for pharmacovigilance specifically, but the authors suspect that direct application of the system designs used for AlphaFold 2 will become the next horizon of research.

Transformer-Based Language Models and GPT-3

If AlphaFold 2’s applicability to pharmacovigilance seems to be in the misty distance, Generative Pre-trained Transformer 3 (GPT-3) and its impact in just the last few years provide a clear demonstration of the direction of new research. Since their introduction, transformer-based large language models have shown an unusual and unpredicted capability to solve problems across various domains [73]. Clearly the most dramatic discovery was that such “language” models can actually solve task-agnostic problems, meaning that they can be used to learn images and associate text with pictures, actions, and more [74]. Research is underway to apply language models to the challenges of pharmacovigilance. Guan and Devarakonda used BERT (Bidirectional Encoder Representations from Transformers) to find adverse events and the drugs that caused them in the literature, improving on previous NLP approaches [75]. Wang et al. go further by combining transformer-based language models with Judea Pearl’s “do-calculus” method of causality assessment [76, 77]. As a result they begin to tackle the age-old problem of causality in adverse event assessment [78]. As these examples illustrate, the general trends in AI, machine learning, and natural language processing have reached the stage where their power and applicability as research designs cannot be denied. We expect this research to proliferate and begin to address fundamental issues in pharmacovigilance. As patients are empowered with tools, knowledge, and strong patient advocacy networks, their role in identifying and reporting potential adverse events will inevitably grow and, with it, continue to change the paradigm of AE detection, investigation, and action. This, of course, goes against the very ingrained mindset of regulators, who see population approaches as the only PV worthwhile, relegating personal issues to physicians. This current anachronistic attitude toward PV is ironic, given the parallel trumpeting and funding of personalized medicine and research. As access to and usability of technology continue to evolve and support increased patient engagement in both healthcare and research, attitudes and paradigms around PV will likely evolve, as will approaches and regulations to understand and improve the safety of medications, vaccines, and medical devices.

The Future of Pharmacovigilance

The future of pharmacovigilance lies, as with many fields, in the application of AI methods to increasingly large datasets. There are now two frontiers addressing this work in the larger context of drug safety. The first frontier has been described above, and it is applying new AI methods and models directly to fundamental challenges in pharmacovigilance. The second frontier represents the industry response to this undeniable trend. Taking a more conservative approach (as can be expected given the regulatory requirements of the field), there is now great interest in how moderate machine learning and NLP approaches can be applied to the administrative and labor-intensive processes in industrial pharmacovigilance work [79–81]. It is likely that machine learning and NLP will provide valuable time and cost savings to current industry processes, and this can only be positive for pharmacovigilance as a whole. It does not, however, address the more fundamental questions that AI has raised for industrial pharmacovigilance just as it has for most other industries. Ball and Dal Pan provide the quintessential example of this approach in their excellent review article “‘Artificial Intelligence’ for Pharmacovigilance: Ready for Prime Time?” [82]. They review the current regulated process for pharmacovigilance and discuss points at which AI (more specifically a truncated definition of “algorithms”) could be useful and where there are still challenges. The very approach to “algorithms” in the article betrays the fact that the authors limit the discussion by employing an outdated definition of what AI is, given current work as described above. To call GPT-3 an “algorithm” loses all meaning when you consider the full version has over 170 billion learned parameters. Just as the initial digitization of adverse event data suffered from an outdated paper metaphor, which limited its usefulness and slowed meaningful change, so the metaphor of an “algorithm” representing AI is causing the same issues in the professional field of pharmacovigilance. We hope that the half-life of this misrepresentation will be much shorter than was the paper metaphor for data. Finding accurate definitions, ontologies, terminologies, and approaches to pharmacovigilance based on the methods and proven results being discovered today in domain-specific datasets combined with large language models is work in tremendous need of the contributions of clinical research informaticists.

Conclusion

The above models of observational research and signal detection in the real world will necessitate standard data representations, including controlled terminology and shared data, information, and (formal) knowledge models. These challenges are nontrivial and are common obstacles for other major informatics activities, such as improving data exchange, collection of longitudinal data, and real-time clinical decision support, which is the holy grail of informatics and electronic health. Because standard data models and terminology are central to so many EHR goals and stakeholders, considerable energy and resources are directed here, and hence there is good reason to be optimistic that standardized data from EHR systems will one day be available to support pharmacovigilance. These same stakeholders can and will also make the case (to the public and healthcare consumer) for the vital role that standardized and quality clinical data will play in public health. These stakeholders (and drivers, from industry, patient advocates, and the public) can highlight pharmacovigilance as a public health issue, and one that is relevant to clinical care as well as research.

In the end, the divergence of patient-facing drug safety and industrial pharmacovigilance continues today as it was first described in the original publication of this chapter. While the regulated industry provides a certain stability, we must look to innovations in healthcare approaches to risk assessment and personalized medicine for our future models of pharmacovigilance to bring significant improvements to patients’ lives and safety. As Sir Tim Berners-Lee, who invented the World Wide Web in 1989, noted: “Data is a precious thing and will last longer than the systems themselves.”

References 1. Härmark L, van Grootheest AC. Pharmacovigilance: methods, recent developments and future perspectives. Eur J Clin Pharmacol. 2008;64(8):743–52. https://doi.org/10.1007/s00228-0 08-0 475-9 . Epub 2008 Jun 4. https://www.ncbi.nlm.nih.gov/ pubmed/18523760. 2. van Grootheest AC, Richcsson RL. Pharmacovigilance. In: Richesson R, Andrews J, editors. Clinical research informatics. Health informatics. London: Springer; 2012. https://doi. org/10.1007/978-1-84882-448-5_19. 3. Friedman LM, Furberg CD, DeMets DL. Assessing and reporting adverse events. In: Fundamentals of clinical trials. New York: Springer; 1998. p. 170–84. 4. Andrews EB, Moore N. Mann’s pharmacovigilance. 3rd ed. Chichester: Wiley-Blackwell; 2014. 5. World Health Organization. What is pharmacovigilance? http://www.who.int/medicines/areas/quality_safety/safety_efficacy/pharmvigi/en/. Accessed 4 Feb 2022. 6. Talbot J, Aronson JK, editors. Stephens’ detection and evaluation of adverse drug reactions: principles and practice. 6th ed. Chichester: Wiley; 2011. 7. Ibara MA, Richesson RL. Back to the future: the evolution of pharmacovigilance in the age of digital healthcare. In: Richesson R, Andrews J, editors. Clinical research informatics. Health informatics. Cham: Springer; 2019. https://doi. org/10.1007/978-3-319-98779-8_20. 8. Coase RH. The nature of the firm. Economica. 1937;4(16):386. https://onlinelibrary.wiley.com/doi/ full/10.1111/j.1468-0335.1937.tb00002.x. Accessed 4 Feb 2022. 9. Naughton J. How a 1930s theory explains the economics of the internet. The Guardian. 7 Sept 2013. http://www.theguardian.com/technology/2013/ sep/08/1930s-theory-explains-economics-internet. Accessed 4 Feb 2022.


Evolving Opportunities and Challenges for Patients in Clinical Research

23

James E. Andrews, Christina Eldredge, Janelle Applequist, and J. David Johnson

J. E. Andrews (*) · C. Eldredge
School of Information, College of Arts and Sciences, University of South Florida, Tampa, FL, USA

J. Applequist
Zimmerman School of Advertising and Mass Communications, University of South Florida, Tampa, FL, USA

J. D. Johnson
Department of Communication, University of Kentucky, Lexington, KY, USA

Abstract

Motivated in many ways by the rapid evolution of information and communication technologies along with the shift toward increased patient decision-making and empowerment, changes in healthcare have had critical implications for clinical research. This chapter explores the developments impacting health consumers from various perspectives, with some focus on foundational issues in health information and communication as related to health consumerism and engagement. An overarching challenge for health consumers is the information environment, which is increasingly social, and underlying issues from emerging technologies and applications contribute to the changing nature of patients’ participation in research. Not surprisingly, core findings from communication and information behavior research have relevance for our current understanding and future models of the evolving role of the health consumer.

Keywords

Health consumerism · Consumer health information · Consumer health movement · Patient empowerment · Patient engagement · Public access technologies · Personalization of medicine · Clinical research

Learning Objectives

1. Describe the importance of patients and consumers to clinical research and the evolving trends in patient empowerment and engagement related to the research enterprise.
2. Describe concepts related to health information literacy and information seeking and their roles in patients’ participation in research.
3. Outline the various social environments and other networks that consumers and patients may operate within.
4. Describe existing and emerging supporting applications and technologies designed to engage patients in research and define challenges and opportunities these present.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_23


Introduction

The largest and least used resource in medicine is the patient… The idea is to make patients more involved in their care. After all, the patient is the sole subject matter expert on themselves. (Dr. Warner Slack [1])

A primary focus of clinical research is the discovery of treatments and processes to improve or save lives. Patients are necessarily the critical focus of clinical research and must be part of a partnership with researchers that is based on trust and understanding [2]. The roles of patients are complex and integrated across all stages of research, from design to recruitment and consent, retention, communication, data sharing, privacy and confidentiality, and safety. Across the spectrum of activities, we have seen a proliferation of informatics tools, methods, and processes introduced to better engage and empower patients and enable safer and more efficacious approaches to research. Clinical research informatics, as framed throughout this text, has emerged as an important discipline supporting research endeavors and has impacted the concomitant evolution of the role of patients.

Since at least the 1970s, patients have become more active participants in decisions affecting their healthcare [3] both within and outside of clinical and research settings, transforming from passive to active participants toward a model of health consumerism [4]. Information and communication technologies have fueled this movement, revealing both challenges and incredible opportunities as a result [5, 6]. An obvious implication is that the responsibility for health-related matters is increasingly being passed on to the individual, partly because of policy and legal challenges that have entitled patients to fuller information access. In many ways the goal has been greater patient empowerment, defined by the World Health Organization (WHO) as “a process by which people, organizations and communities gain mastery over their affairs” [7] or, in a more pragmatic sense, as “self-reliance through individual choice (consumer perspective)” [8].


Given the centrality of patients to clinical research, the evolution toward greater involvement and empowerment poses challenges and issues that stand to impact how clinical research might be conducted in the future, as well as its ultimate success in achieving desired outcomes and discovery. Trends in health consumerism have fueled this movement, as have new tools (such as wearable technology supporting patient-generated data) and increasingly effective access to authoritative information resources, social networking capabilities, and personal decision aids. Yet consumerism and empowerment assume, or even require, some level of health and information literacy on the part of consumers. Variations in consumer abilities to access, collect, transmit, and use information or to navigate health organizations are among the factors that developers, researchers, and healthcare providers consider. This culmination of changes in healthcare, motivated in many ways by the rapid evolution of information technologies and in parallel with a shift toward increased patient decision-making and empowerment, needs to be considered across clinical research contexts, from recruitment and participation to, ultimately, successful outcomes, precision medicine, and more rapid discovery. As noted, there is a growing onus on individuals to develop literacy skills (health, information, and numeracy) so that all can fully realize this potential. For those who are less able, or unwilling, to move in that direction, new structures or pathways may need to be created by informaticians.

This chapter explores these and other developments first via a broad look at some foundational issues in health communication as related to patient engagement and research. We also discuss the information environment within which health consumers are immersed and the changing nature of patients’ information world. We see that core findings from communication research have relevance for our current understanding and future studies of the evolving role of the patient or consumer in clinical research. We also describe some emerging models and tools that seem to hold promise for helping usher in the next generation of clinical research consumers.


Patient Engagement

National research goals have shifted over the past decade, resulting in an increase in initiatives to promote patient engagement in clinical research. Such efforts generally seek to address the historical disconnect between patient and investigator research priorities and perspectives, which has contributed to problems such as very low clinical trial accrual rates. We are learning more about approaches to patient-engaged research models, in which patients and/or patient communities are actively involved in the design, recruitment, data collection, and dissemination of clinical trial results [9, 10].

Efforts Supporting Patient Engagement

The emphasis on patient-centered care and research, and specifically on improving the engagement of patients in research, has been visible in the USA through national-level efforts. The Patient-Centered Outcomes Research Trust Fund (PCORTF) was established by the Patient Protection and Affordable Care Act of 2010. A premise of PCORTF is that the outcomes of research and evaluation of new interventions must be seen as meaningful not only to clinical investigators but also to the patients themselves. The use of measures such as quality of life indicators in addition to traditional laboratory results can enable the patient voice to be incorporated into research results [11]. Thus, PCORTF provides federal funding for the Patient-Centered Outcomes Research Institute (PCORI) [12, 13]. According to PCORI, its mission is to aid patients, families and their caregivers to “make informed healthcare decisions […] by producing and promoting high-integrity, evidence-based information that comes from research guided by patients, caregivers, and the broader healthcare community” [14]. A primary initiative of PCORI has been the establishment of the National Patient-Centered Clinical Research Network (PCORnet), a distributed research network linking health information from over 130 patient groups and health systems (and millions of patients across the USA) as of 2015 [15]. PCORnet aims to leverage patient health data from the outset of each project by partnering with stakeholders across the USA, including patients and patient advocacy groups in addition to research and clinical stakeholders, to improve health research efficiency, safety, and access to difficult-to-reach patient populations such as patients with rare diseases. This extensive health research information network also has the potential to improve clinical research participant diversity [14]. Furthermore, patients are empowered with significant influence on the selection and oversight of research projects [15].

Another major federal effort emerged out of the National Cancer Institute’s (NCI) Cancer Moonshot Blue Ribbon Panel, which promoted initiatives to develop a network involving patient engagement in research [16]. These include such groups as the Rare Tumor Patient Engagement Network, a Partners in Clinical Research portal, the Patient Engagement and Cancer Genome Sequencing Research Network, and others. NCI’s efforts, though focused on cancer, share the same broader goals of diversity of participants, sharing and communication of results, and helping to speed research discovery and translation into care.

Strategies and Best Practices

Results are emerging from both broad-scale and more focused efforts on building patient engagement and involvement across the clinical research spectrum. We are now seeing some of the more effective strategies and best practices, although in practice the prevalence of patient engagement across clinical trials remains low [17]. PCORI developed an Engagement Rubric to help guide how input from patients and stakeholders can be built into the research process throughout the study, from design to conduct to dissemination, based on principles encouraging equitable partnerships [18]. The rubric is useful for investigators planning to apply for PCORI funding but can also be applied to other engagement activities. Along with these efforts, more is being understood about the strategies needed to overcome barriers to adoption of eConsent to facilitate and authenticate participation in newer models for clinical research recruitment and participation [19].

The Europe-based Patient Focused Medicines Development [20] provides other types of tools for a range of groups that support or conduct drug and treatment clinical research. A central tenet of the organization is to create medicines “not just for patients, but with patients” [21]. In support of this, the initiative has developed resources and guides for their implementation that have been reported in the literature and are publicly available [21]. Underlying all of these tools and resources is the belief that the experience-based expertise unique to patients (as well as certain non-patients in underrepresented demographics) is important to patient-centered, equitable approaches [22–24]. Considerations for improving research studies consistently include the need to improve patient engagement in the research process, recruitment, and retention in projects and to produce research that is more relevant and accessible to consumers [25]. A systematic review by Harrison et al. [26] of dozens of studies on patient engagement found that the most common elements of stakeholder engagement were “respect, equitable power, and trust” [p. 312]. Similarly, a group of European researchers conducted a scoping review of various research and related materials, noted a number of benefits as well as costs and challenges, and suggested approaches to monitor and evaluate engagement across the research continuum [27].
Their explorations reveal a number of measures, rather than any single one, of “patient involvement” in research and the “return on engagement,” as well as numerous challenges for evaluating these concepts at various points in the research process, ranging from priority setting to study design to dialogues with regulators.

Patient involvement requires some level of mutual understanding and shared expectations, both ethical and practical, across groups that may have very different goals in a clinical research endeavor. Researchers may be seeking a better understanding of how to achieve successful completion of a study and generation of new knowledge related to the management of disease and so propose strategies and measures designed to speed up and improve patient involvement in the research process, eventually developing patient experts [9]. One qualitative study of PCORI study investigators reported that challenges and opportunities for supporting patient and stakeholder engagement and expectations fall under three themes: infrastructure support, relationship building, and continual maintenance of the relationships. Practical suggestions for improving these might come from organizational policies, exploring innovative models, and viewing engagement as an investment [28]. None of these is without practical challenges. For instance, a group focused on the engagement of citizens, particularly elderly citizens, found an ample sense of the benefits of engaging patients in the research process but noted that practical barriers complicated integrating them into the project [29]. Although there is an increased expectation to engage citizens without clinical research experience in research teams, there is a concomitant need for practical support in evaluating their competencies and readiness for engagement [30]. Even seemingly similar groups perceive the obstacles they encounter differently and show varying needs when engaging patients in research [31]. Researchers often need some guidance as to how or when to involve patients and how best to communicate with them throughout the project [29].

Citizen Scientist

In what might be seen as a natural culmination of the increasingly engaged world of health consumers, fueled by an increasingly sophisticated information environment, the notion of the “Citizen Scientist” is now emerging. In the USA, the NIH National Center for Advancing Translational Sciences (NCATS) and the National Cancer Institute (NCI) have supported the concept of the citizen scientist to reflect health consumers collaborating with researchers to advance biomedical science and create new biomedical knowledge. Potential ways health consumers may collaborate in translational or clinical research include (but are not limited to): aiding in data collection (e.g., patient-generated data), joining advisory boards, contributing to user-centered design of health information technology (HIT), aiding in subject recruitment, and disseminating clinical trial results. One example of public citizens aiding in clinical research through crowdsourcing is the Mark2Cure project, supported by NCATS, which recruited “citizen scientists to help mine biomedical literature” (https://ncats.nih.gov).

Information Behavior and Health Literacy

This overload of information forces decentralization of effort, with increasing responsibility passing to individuals, with their effectiveness determined by their ability to gather, then intelligently act on, health information. (J. David Johnson and Donald O. Case [32])

Quotes attributed to Dr. Warner Slack, such as the one cited at the beginning of this chapter, relate broadly to the participation of patients in their care, motivated in many ways by the advent of the electronic medical record and other enabling technologies. Dr. Slack was prescient in seeing the vast potential in such technological empowerment of patients but must also have known the challenges they would face in navigating a bewildering information and technological landscape. To operate effectively in today’s technologically advanced health information environment implies a certain level of information seeking and health literacy skills. This assumption poses a major challenge not only to patients but also to health-related researchers and professionals working to positively affect the role of patients in clinical research. There is an array of readily accessible information resources that can overwhelm even the savviest of professionals or self-directed information seekers. Moreover, the biomedical and health information available to the general public is increasingly complex (e.g., consider the number of advanced biological, pharmaceutical, or genetic information sources). In this rapidly evolving information age, we are “still struggling with how to cope with turning this information into action” [32, p. 6], and so our traditional approaches to health communication and information, such as public health campaigns, are evolving along with the technologies and channels used in their exchange. Recognition of the variety of actions that individuals take to learn more about their health, as well as to ultimately become engaged in research or other health behaviors, is among the core challenges to clinical research informatics.

Traditionally, health agencies and organizations have conducted campaigns to inform the public and change individual behaviors [33], including encouraging participation in research. Public health communication campaigns represent “purposive attempts to inform, persuade, or motivate behavior changes in a relatively well-defined and large audience, generally for non-commercial benefits to the individuals and/or society, typically within a given time period, by means of organized communication activities involving the mass media and often complemented by interpersonal support” [33]. More recently, however, it is individual actions, particularly health-related information seeking, that determine what messages individuals will be exposed to and how they will behave. Several motivators might prompt information behaviors in which individuals explore a host of sources of information of varying authoritativeness, completeness, or accuracy. This contrasts directly with more traditional health information campaigns, which tend to view the world as rational and known and which concentrate on controlling individuals in pursuit of efficiency and effectiveness [33].
A focus on information seeking develops a true receiver’s perspective and forces us to examine how an individual acts within the new information environment created by the previously mentioned array of information carriers. Some of these carriers may be actively trying to reach individuals, as in health campaigns, but many contain passive information awaiting access and use. Traditional health communicators learned that classic approaches are not very effective unless the needs of the audience and their reaction to messages are considered [34, 35]. It became apparent that while there were some notable successes, audiences could be remarkably resistant to campaigns, especially when the campaigns did not correspond to the views of their immediate social network [36–39]. Indeed, campaigns have tended to reach those who are already interested and typically bypass those who are most in need of their messages [38]. In effect, these earlier classic studies reveal that campaigns ironically tend to reach those who are already converted. While this might have a beneficial effect of further reinforcing beliefs, the audience members who are most in need of being reached are precisely those who are least likely to attend to health professionals’ messages [36]. It is now commonly recognized that mass media alone is unlikely to have the desired impacts and must be supplemented with interpersonal communication, including within social networks [40], thus giving rise to the near ubiquity of information technologies supporting social interactions and sharing. Campaigns may result in felt needs on the part of the individual, but the individual and his or her placement in a particular social context will determine how needs are acted upon. This focus on receivers dovetails nicely with the renewed focus on the patient as consumer, as expert, and as one seeking empowerment.

Information Fields The area of health information behaviors has generated countless studies from a range of disciplines. A deep dive into these would be outside the scope and purpose of this chapter; however, some brief, focused discussion can help frame how we might view and support patients’ roles in clinical research. One such conception is that of the information field within which a person is

J. E. Andrews et al.

embedded [41]. The concept of field has a long tradition in the social sciences tracing back to the seminal work of Lewin [42] with interesting recent variants such as the information horizons approach [43]. Potential fields for patients have become far richer over the last decade or so, providing them resources that can dramatically change their relationships with clinicians and researchers, as well as with patient advocacy groups and other health-related agencies and organizations. An individual’s information field provides the more static context for their information seeking, containing resources (e.g., health-related websites and datasets), constraints (such as numeracy skills or access), and carriers of information (the increasing array of media available) [3, 44]. It provides the starting point for information seeking [45] representing the typical arrangement of information stimuli to which an individual is regularly exposed [46] and the information resources they routinely use [43]. Individuals are embedded in physical and virtual worlds that involve recurring contacts with an interpersonal network of friends and/or family and, increasingly, strangers. They are also regularly exposed to the same mediated communication channels. Thus, an individual’s particular information field that they operate in across information behaviors necessarily constrains the very possibility of selecting particular sources of information, since these vary from one person to another. People can, if they so desire, arrange the elements of their information field to maximize their surveillance of health information, providing an initial contextualizing of their environment. As individuals become more focused in their information seeking, they change the nature of their information field to support the acquisition of information related to particular purposes [47]. 
In this sense, individuals act strategically to achieve their ends and in doing so construct local communication structures in a field that mirrors their interests [48]. How they shape this field over time determines not only their knowledge of general health issues but also their incidental exposure to information that may stimulate them to more purposive information seeking. The nature of an

23 Evolving Opportunities and Challenges for Patients in Clinical Research

individual’s interpersonal environment, or social fields, has important consequences for information seeking and for health practices [4]. Its importance is increasing with rising consumerism, a focus on prevention, and a greater focus on individual responsibility. In a sense, individuals are embedded in a field that acts on them, the more traditional view of health campaigns. However, and relevant to our discussions here, they also make choices about the nature of their fields, the types of media they attend to, and the social media they participate in, often based on their information needs and preferences, which is greatly facilitated by the Internet and explosion of choices among even traditional media such as cable television and online media.

Health Literacy Health literacy interacts with, or mediates, health behaviors including information seeking. We generally refer to health literacy as the skills that enable one to navigate the healthcare system, to engage in information seeking to learn more, and to pursue other health-oriented behaviors. When confronting a health problem, one’s level of health literacy determines their need for information and what should be sought. [32]

The Healthy People 2030 initiative outlined by the US Department of Health and Human Services offers two concepts that contribute to their broad definition of health literacy. These are: Personal health literacy is the degree to which individuals have the ability to find, understand, and use information and services to inform health-related decisions and actions for themselves and others. Organizational health literacy is the degree to which organizations equitably enable individuals to find, understand, and use information and services to inform health-related decisions and actions for themselves and others [49]. Of note is that the above definition recognizes the “contextual” nature of health literacy and the public health perspective, involving not only an


individual’s ability to find and use relevant information for a given context but also the ways that organizations and other producers of health information and services affect health literacy [49]. Studies of both US and European populations reveal very low levels of health literacy. In 2019, the National Academy of Medicine reported that one tenth to one third of people have only basic or low health literacy [50]. In the USA, the CDC states that as many as nine out of ten adults can struggle with health information if it includes complex medical terms [51]. It follows that these numbers are likely to be more pronounced in the context of clinical research given the complexity of trials, experimental drug names, and nonconventional treatments. Thus, understanding health information literacy issues will be important to endeavors geared toward increasing patient participation and engagement in clinical research, or to related efforts in the design and implementation of informatics technologies. A commentary by the National Academy of Medicine (NAM) in the USA outlines important communication-based actions that can support patient engagement in trials and other clinical research [50]. These actions span each aspect of the patient’s “journey” throughout the research lifecycle, from discovery of a trial or treatment to a study’s end. Similarly, the CDC has emphasized the importance of communicating clearly with patients and of testing materials with intended audiences for feedback on comprehensibility, clarity, and appropriateness [51]. A recent roundtable on health literacy in clinical research was sponsored by the National Academies of Sciences, Engineering, and Medicine (NASEM) [52]. The proceedings cover a range of topics on the importance of literacy in the context of clinical research and the state of the art of best practices for incorporating these into trials.
The importance of health literacy to successful recruitment and retention and ultimate success of clinical trials is an emerging theme. An overarching issue and impetus for this is that patients need to feel that they have some level of understanding of what they are signing up for and are consenting to, based on an appropriate level of understanding not clouded by incomprehensible or otherwise


obfuscating communication practices, and that a great onus lies on investigators and organizations to address the needs of patients engaged in decisions this serious. Increasingly, this means greater attention to and understanding of literacy as an important factor in addressing some of the historic disparities in representation of underserved populations in trials. Information seeking and health literacy related to clinical trials are emerging as key factors to be addressed in any effort to help create more equitable and accessible clinical research. Motivations for seeking information about trials are becoming the focus of theory-based research in information seeking and the information needs of those in this sub-domain [53–55]. While it is useful to try to understand the antecedents and other factors that might impede or help initiate people’s information seeking and efficacious use in the context of seeking relevant clinical trials, often in highly stressed and emotional conditions, this understanding is difficult to translate into practice. As individuals, patients have different motivations, belief structures, and abilities, to name a few, and at best we can only hope to facilitate their needs openly and with clarity. This can be done, in part, with effective information and communication strategies and through engagement and equitable outreach.


Social Environment for Patients A compelling development in consumer health over the past 15 or more years has been the rise of a dynamic social world fueled further by Internet-based social media applications. The frequently cited Pew Internet report on the social life of health information showed that large percentages of adults seek health information online [56]. While most adults continue to seek information from traditional sources (e.g., health professionals), the social world is robust, with more than half of online health information seekers doing so for someone else and discussing such information with others [56]. Online support groups also show signs of fostering patient empowerment or management [57], and participation tools may lead to more positive outcomes, especially for rare diseases [58]. The notion of health information seeking and participation in healthcare research in a social world is one that may manifest in various ways across one’s information environment. We take a look at a few of these to show different perspectives beyond the individual when exploring more social aspects of clinical research involvement and patient engagement and empowerment.

The Role of Third Parties in Information Seeking There are several ways that the use of third parties can complement clinical practice and, by extension, participation in research. First, individuals who want to be fully prepared before they visit the doctor often consult the Internet [59, 60]. Lowery and Anderson [61] suggest that prior information use may impact respondents’ perception of physicians. Second, there appears to be an interesting split among Internet users, with as many as 60% of users reporting that while they look for information, they only rely on it if their doctors tell them to [59, 60]. While the Internet makes a wealth of information available for particular purposes, it is often difficult for the novice to weigh the credibility of the information, a critical service that a knowledge broker, such as a clinical professional or consumer health librarian, can provide. This suggests that a precursor to a better patient-doctor dialogue would be to increase the public’s knowledge base and to provide alternative, but also complementary, information sources by shaping clients’ information fields. To achieve behavioral change regarding health promotion, a message must be repeated over a long period via multiple sources [62]. By shaping and influencing the external sources a patient will consult both before and after visits, clinical practices can simultaneously reduce their own burden for explaining (or defending) their approach and increase the likelihood of patient compliance. It is not a stretch to see the implications these ideas have for clinical research accrual, retention, and overall satisfaction. Although intermediaries (e.g., navigators) play an important role despite an increase of


more Web-based consumer health information, increasing health literacy by encouraging autonomous information seekers also should be a goal of our healthcare system [49, 63]. While it is well-known that individuals often consult a variety of others before presenting themselves in clinical or research settings outside of HMO and organizational contexts, there have been few systematic attempts to shape the nature of these prior consultations. If these prior information searches happen in a relatively uncontrolled, random, parallel manner, expectations (e.g., treatment options, expected outcomes, diagnosis, trial retention and completion) may be established that will be unfulfilled. The emergence of the Internet as an omnibus source of information also has apparently changed the nature of opinion leadership; both more authoritative (e.g., medical journals and literature) and more interpersonal (e.g., support or advocacy groups) sources are readily available and accessible online [64]. This is part of a broader trend that Shapiro [65] referred to as “disintermediation,” or the capability of the Web to allow the general public to bypass experts in their quest for information, products, and services. A risk here, however, is that individuals can quickly become overloaded or confused in an undirected environment. In other words, while the goal may be to reduce uncertainty or help bridge a knowledge gap, the effect can be increased uncertainty and, ultimately, decreased sense of efficacy for future searching. Going back to our earlier point, a focus on promoting health information literacy would mean helping people gain the skills to access, to judge the credibility of, and to effectively utilize a wide range of health information. Increasing use of secondary information disseminators, or brokers, is really a variant on classic notions of opinion leadership [66] and gatekeepers [64]. 
Opinion leadership suggests ideas flow from the media to opinion leaders to those less active segments of the population serving a relay function, as well as providing social support information to individuals [67], reinforcing messages by their social influence over them [68], and validating the authoritativeness of the


information [66]. So, not only do opinion leaders serve to disseminate ideas, but they also, because of the interpersonal nature of their ties, provide additional pressure to conform as well [67]. Another trend in this area is the recognition of human gatekeepers, community-based individuals who can provide information to at-risk individuals and refer them to more authoritative sources for treatments [3]. Recognizing the powers of peer opinion leaders, many health institutions are establishing patient advocacy programs, for example, where cancer survivors can serve to guide new patients through their treatments. However, these highly intelligent seekers also may create unexpected problems for agencies since they may create different paths and approaches to dealing with treating a disease or motivating clinical research studies.

Self-Help and Advocacy For a number of years, formal groups have continued to serve as opinion leaders and information seekers for individuals and have supported their everyday health information needs. Self-help groups are estimated to be in the hundreds if not thousands across a wide variety of diseases with members numbering in the millions [57]. They also can provide critical information on the personal side of disease: How will my spouse react? Am I in danger of losing my job? Will I get proper treatment in a clinical study? Self-help groups might also prepare someone psychologically for a more active or directed search for information once his or her immediate personal reactions are dealt with, or as more knowledge is gained on a particular disease, clinical trial options, and so on. Driving this movement has been the belief that self-help groups have the potential to affect outcomes by supporting patients’ general well-being and sense of personal empowerment [57], and the diversity of tools now available has the potential to realize this. The Internet has increased the impact of these groups and the functionality and tools available to individuals, with the additional twist that formal institutions or private companies often support these groups. One prominent and relatively


recent example of a robust and multifaceted online support system (or health social network) is PatientsLikeMe (PLM). PLM is essentially an online support group that uses patient-reported outcomes, symptoms, and various treatment data to help individuals find and communicate with others with similar health issues [69]. Its developers have noted that the essential question asked by patients participating in one of the several disease communities is “Given my current situation, what is the best outcome I can expect to achieve and how do I get there?” [70]. Personal health records, graphical profiles, and various communication and networking tools help patients in their quest to answer this. Enhanced access to others willing to share experiences is obviously critical and would certainly have been nearly impossible prior to the information and communication technologies available today. Another prominent and long-lasting self-help intervention is the Comprehensive Health Enhancement Support System (CHESS) which has focused on a variety of diseases with educational and group components, closed membership, fixed duration, and decision support [71]. Computer-mediated support group interventions such as CHESS have been shown, according to a meta-analysis, to increase social support, to decrease depression, and to increase quality of life and self-efficacy, with their effects moderated by group size, the type of communication channel, and the duration of the intervention [71].

Although motives will vary from one group to the next, commonalities across these include diverse approaches for social support, information exchange, patient data tracking, and also finding and connecting patients to clinical trials. A few examples of these other sites with varied tools for patients are shown in Table 23.1 below.

The emergence of patient advocacy groups (PAGs) over (at least) the last half century comes from people with the same disease or afflictions who need to share efforts in facing similar challenges, to exchange knowledge that is recognized as different from that of health professionals, and to speak with a more unified voice to impact policy and promote research [72]. Advocacy groups have interests beyond serving and supporting the needs of their individual members, however; they may seek to change societal reactions to their members or ensure that sufficient resources are devoted to the needs of their groups [73]. At times, these groups will have agendas that do not necessarily coincide with an individual’s needs. Advocacy groups need members to advance the group’s agendas. For example, they often are especially interested in ensuring that the latest information on treatment is made available to patients, sometimes pressing for the release of information on experimental treatments before they would traditionally be available. A recent Department of Health and Human Services rule requires drug development transparency through increased patient access to information on

Table 23.1 Examples of patient-directed resources for clinical research studies

Site | Description | URL
ClinicalTrials.gov | Publicly available clinical trial database | https://www.clinicaltrials.gov/
Center Watch | Clinical trial information for patients & professionals | http://www.centerwatch.com/
Pfizer Link | Online community for their clinical trial participants | https://www.pfizerlink.com/
Antidote (formerly TrialReach) | Digital health company which matches patients with pertinent research studies | https://www.antidote.me/
Fox Trial Finder | Aids in recruiting clinical trial participants for Parkinson’s studies | https://foxtrialfinder.michaeljfox.org/
ISRCTN Registry | Accepts all proposed, ongoing, and completed clinical research studies and works to “support researchers in providing lay summaries and research feedback” | https://www.isrctn.com/


experimental therapies and expanded access to these treatments. The rule requires investigators to submit information on expanded access to experimental therapies to ClinicalTrials.gov and is administered by the FDA and NIH [74]. Advocacy groups have been particularly active when there are no clear treatment options or when existing options are perceived to be ineffective [73]. Thus, at times individual and group interests coincide, and at times, of course, they do not.


Evolving Medicine and Immersive Technologies in Clinical Research

Empowering patients with accessibility to and ownership of their own medical data reverses the predominantly one-way dynamic of today’s health care system. [75]

In the previous sections, we used broad strokes to lay a foundation based on traditional communication and health information research that may be useful for framing an understanding of the evolving role of patients and consumers. This builds on our premise that the natural goal in the current state of the consumer health movement is the fostering of patient or consumer empowerment and engagement. In part, this means a continuing shift from traditional models of medicine and clinical research to ones where patients have a greater role in their own care and decision-making, from treatment options to involvement in clinical research to initiating and conducting research themselves. The core issues relate to more than choice in and of itself but rather choice for achieving more personalized medicine, for increasing safety in research and care, and for accomplishing other altruistic aims that may be supported by social networks that enable knowledge transfer, greater voice, and concerted action evoking the wisdom of crowds.

In this section, we offer a discussion of newer or emerging models and enabling technologies that we believe will help in the movement toward greater emphasis on consumer empowerment, patient engagement, and evolving consumer/patient relationships with information and technology.

Direct-to-Consumer Testing and Data Collection for Research

In contrast to the primarily investigator-driven data collection of clinical trials, there are several business models emerging that empower patients to directly engage in clinical testing, data collection, and research. These models promote consumer engagement in research through patient consent for de-identified data sharing. One example is direct-to-consumer (DTC) marketing of genetic testing by companies such as 23andMe and AncestryDNA. This type of genetic testing may also be referred to as “at-home genetic testing” [76]. The consumer pays a fee for their genetic profile, incentivized by the ability to learn more about their genetic makeup and possible links to health disorders. Consumers do not need to have a healthcare provider or insurance company involved in the ordering process; however, some companies may have a healthcare provider or counselor available to discuss the results. For example, in 2018, the Food and Drug Administration (FDA) authorized 23andMe to test for BRCA1 and BRCA2 genetic mutations to identify women’s increased lifetime breast cancer risks.

With patient consent, health data collected can also be used by researchers to study the genetics of health conditions on a population level. According to the American Society of Human Genetics (ASHG), the benefits of this model include increased consumer access and consumer empowerment. Furthermore, increasing consumer access to genetic testing may result in more genetic data availability for health researchers and increased consumer awareness of genetic disorders [77]. The risks of this direct-to-consumer model include consumer misinterpretation of results, lack of access to genetic counseling, questions about the accuracy/validity of laboratory results, and privacy concerns [76, 77]. In the example of breast cancer noted above, 23andMe’s ability to test for BRCA mutations has been met with concerns about overtesting, the spread of misinformation, and a misallocation of resources [78].


Crowdsourcing

In addition to the aforementioned PLM, there are other, large-scale initiatives that are built upon consumer engagement in research from both the private and government sectors. Specifically, the All of Us Research Program, which is part of the National Institutes of Health (NIH) Precision Medicine Initiative, has been soliciting patient health data contributions from 1 million participants to support clinical research to improve the ability to deliver more personalized and diverse healthcare (All of Us Research Program, https://allofus.nih.gov) [79]. The initiative is unique in that it will collect lifestyle and environment measurements in addition to biological and genetic data, which are often not entirely captured in traditional clinical datasets such as electronic health records [79]. The NIH also plans to leverage mobile health technology (see “Mobile Health” section below) to aid in the collection of activity and environmental measurements to study the effects of activity and environmental exposure on personal health. However, with this new role of the patient as a partner in research comes the ethical/moral obligation of the patient to provide accurate and quality data. The efforts of institutes such as PCORI and the NIH (All of Us) to engage health consumers in clinical research, especially through health data contributions, may improve clinical investigator access to health data, especially data on rare diseases and data not easily accessible from electronic health records, to support precision medicine research efforts. Precision medicine is defined by the Precision Medicine Initiative as “an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person” [76, 80]. Crowdsourcing efforts, such as the 1 million patient data collection goal of the NIH All of Us Research Program, are required to collect the necessary diverse dataset to support this type of research.

Mobile Health (Digital Health)

Mobile health, or mHealth, has grown significantly over the past few decades as healthcare consumers have near ubiquitous access to cellphones, with nearly 85% of Americans owning smartphone technology [81]. Some now refer to this area as “digital health” [82]. These smartphones and other mobile health technology (e.g., tablets and wearable sensors) can be leveraged by clinical researchers to collect patient-generated data such as lifestyle data (e.g., diet, sleep logs, exercise data), social data, and environmental data. The utilization of real-time data may offer more insights as to how these factors affect clinical trial outcomes. Furthermore, eConsent can be utilized by researchers to facilitate clinical trial involvement. Wearable devices that are often connected digitally to smartphone technology can be used to record physiological measures in clinical trials such as pulse oximetry, blood glucose levels, vital signs, and heart rhythm [81, 83, 84]. As noted previously, in this model of data collection, the consumer has a role in assuring the quality and quantity of data provided to the clinical researcher. However, with improving technology, some devices can record and/or transfer data automatically via wireless mobile connections which reduces “the burden” on consumers to participate in these types of clinical trials. A search of ClinicalTrials.gov for the keywords “continuous glucose monitoring” returned 1244 studies. Devices such as the Dexcom connect with smartphones to record and share glucose data (https://www.dexcom.com/). In addition to wearable sensors, health consumers can participate in research by providing “long-term” self-reported data through mobile apps to monitor mental health, substance abuse, and stress levels. The use of mobile apps may reduce reporting burden [85] and provide more accurate real-time data recording which could potentially reduce recall bias.
Simplicity and usability of the mobile apps should be considered prior to use in clinical research [86]. Often these


apps are assessed for usability and patient acceptance prior to their use in clinical practice and/or research [87]. Other clinical trials which could potentially benefit from patient and caregiver at-home monitoring and data collection via health information technology devices include geriatric studies on falls in the elderly, dementia studies, and Parkinson’s disease symptom monitoring studies [88–90]. With the aging population, health information technology can support the use of a “smart home” to promote aging in place and generate data for further study [91]. In this way, the patient can support a health system which “learns” how to provide safer and higher quality care through evidence-based medicine [92]. However, research has found that studies relying on mHealth apps suffer from substantial participant dropout or attrition, leading to concerns surrounding sample representativeness and study effectiveness [93]. Research participants utilizing mHealth apps are more likely to drop out than be retained, making strategies for engagement crucial. A research study’s methods to increase retention could include setting participant reminders, implementing in-app support from peers and coaches, and pre-study usability assessments [93]. Furthermore, according to Dr. Istepanian of the Institute for Global Health Innovation, there are several “unknowns” related to the use of mobile technology in health which include security concerns, ethics, and effects on human behaviors [82].
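
The ClinicalTrials.gov keyword search mentioned earlier in this section can also be run programmatically, which may be useful for anyone who wants to track how the number of registered studies on a topic changes over time. The following is a minimal sketch, not an official client: it assumes the public ClinicalTrials.gov version 2 REST interface, and the endpoint URL, the `query.term`, `countTotal`, and `pageSize` parameters, and the `totalCount` response field should be checked against the current API documentation. The function names are hypothetical and chosen for illustration only.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL of the public ClinicalTrials.gov v2 REST API.
API_BASE = "https://clinicaltrials.gov/api/v2/studies"


def build_search_url(term: str) -> str:
    """Build a search URL that asks for the total number of matching studies."""
    params = {
        "query.term": term,    # free-text search term (assumed parameter name)
        "countTotal": "true",  # ask the service to include a total count
        "pageSize": 1,         # keep the response small; only the count is needed
    }
    return f"{API_BASE}?{urllib.parse.urlencode(params)}"


def count_studies(term: str, timeout: float = 30.0) -> int:
    """Return the number of registered studies matching `term`.

    Requires network access; the response field name is an assumption.
    """
    with urllib.request.urlopen(build_search_url(term), timeout=timeout) as response:
        payload = json.load(response)
    return int(payload["totalCount"])
```

Calling `count_studies("continuous glucose monitoring")` would be expected to return a number that differs from the figure quoted in the text, since the registry grows continuously and the exact matching rules of the website search and the API may not be identical.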

Clinical Trial Involvement One area where the limitations of traditional health campaigns are most clearly revealed is in the difficulty and considerable expense involved in recruiting people into clinical research studies. According to Allison [94], fewer than 3% of eligible cancer patients enroll in trials, and roughly one in five NCI-sponsored trials fails to meet its necessary enrollment [68, 95]. Trial recruitment is particularly challenging in the context of


rare diseases, where there are relatively low numbers of affected individuals who are usually geographically dispersed. Even with new technologies to better match patients with trials or other health information, privacy and credibility underlie and potentially impede these efforts [96], and researchers must consider whether they are getting representative samples given that those seeking trials might disproportionately represent certain demographics [97]. The extremely low accrual rates in clinical research show that even within subsets of the population who might be eligible to participate in particular trials, the traditional “one-size-fits-all” approach to health campaigns is insufficient. Expectations have understandably risen on the part of consumers, who have access to more targeted or even personalized information to assist them with such decisions and whose support groups may reinforce their natural predispositions. Health-related social networks and crowdsourcing do offer the potential to overcome some of the discouraging barriers to patient recruitment into clinical trials [98] and other research projects. Projects like All of Us can greatly facilitate researcher access to willing populations for those who go through elaborate approval processes. Of course, there are a number of affective and practical reasons individuals are not, or cannot be, part of a clinical trial. Certainly, in traditional clinical research, access to the study site is an issue that is not easily overcome by many, particularly those in rural, underserved areas. Moreover, many patients understandably question how involvement in a study might impact their quality of life, even if they have strong feelings of altruism. Human nature suggests there might also be concerns of bias by physicians seeking to enroll patients into a trial, and knowing which trials are available has been a challenge even with such national efforts as ClinicalTrials.gov [94].
Social networking platforms present the potential for studying existing data as well as for mining these sites for likely study populations based on eligibility criteria or other factors [98],
while the nature of the participants in many of these sites may be that they are already willing partners seeking to find a path to a positive outcome for themselves and others like them. With reportedly 1/3 of traditional trial recruitment sites failing to recruit a single patient [94], online patient communities offer a far more promising outlook. For instance, Facebook advertising of trials has been shown to improve recruitment [99, 100], including hard-to-reach populations [101]. Critical to this potential revolution, however, is an understanding that such communities are not merely a gathering ground for X number of people with disease Y looking for a cure. Rather, these are increasingly savvy consumers who have empowered themselves with personal and collective knowledge and expertise, who are not likely to respond to every call for participants, and who have been known to share information on ongoing trials in ways that can be very disruptive of traditional research. In other words, a shift in the research model will certainly need to be advanced but only with the consent of a more influential group. For example, physicians and nurses have been shown to act as key influencers for patients in their clinical trial consideration process yet refer a small number due to their inability to access information on available trials [102]. Thus, potential collaborations among site developers, healthcare providers, researchers, and patients could expedite research and advance the needs of all groups [102, 103]. There are several factors that impact accrual to trials, retention, and satisfaction. Many of these can be enhanced or supported via consumer informatics tools or well-designed information systems and research studies. For instance, many patients place a high value on the information they receive during research trials and cite it as a key reason for choosing to participate [104–106]. 
Thus, bearing in mind health literacy issues, consumer involvement in research demands study designs that can be clearly, simply explained to participants without using jargon, with similarly accessible information updates throughout the research process [23, 107, 108]. Consumers also report that communication from researchers may be frequent at the start of studies but drops off

J. E. Andrews et al.

dramatically, with many patients never even knowing the final study outcomes. Patients repeatedly cite the lack of follow-up information (including results) from researchers [99–111], confirming related findings that patient engagement is concentrated at the beginning of a study and tapers off as the study progresses. Lastly, while altruism has been found to be a major motivator for participation in clinical research, people want to see how they might have made a difference [106, 109]. This can be achieved with even simple steps using email, texts, call-ins, annual “thank you” breakfasts, or social media updates, in order to help keep study participants in the loop, so to speak [109, 112]. However, it is important to note that racial and ethnic minority groups face significant health disparities (e.g., sociocultural barriers, lack of trust), making trial recruitment and accrual even more challenging. Additionally, this documented underrepresentation leaves the external validity of trial findings open to challenge [113]. African Americans make up 12% of the population yet participate in less than 5% of clinical trials, and Hispanics represent 16% of the population and participate in less than 1% of clinical trials [114, 115]. Research has shown that educational interventions may help to improve willingness to participate but that White women and those with more education are most positively impacted by the use of educational materials [116]. There is also growing awareness that AI algorithms used for certain types of trial accrual are applied to datasets that focus on richer, data-rich regions and less on underserved, data-poor areas, potentially creating bias and related inequities when AI is used in global health contexts [117].
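The size of the representation gap quoted above can be made concrete with a little arithmetic. The following is a minimal sketch, not an established metric from the cited studies: it divides each group's trial share by its population share, using only the percentages given in the text. Because the text reports trial shares as "less than" a value, the results are upper bounds. The names `GroupShare` and `representation_ratio` are hypothetical, introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupShare:
    """Population and trial shares for one group (hypothetical container)."""
    name: str
    population_pct: float   # share of the general population, from the text
    trial_pct_upper: float  # upper bound on trial share ("less than X%")


def representation_ratio(group: GroupShare) -> float:
    """Upper bound on trial share divided by population share (1.0 = parity)."""
    return group.trial_pct_upper / group.population_pct


# Figures quoted in the text [114, 115]; nothing else is assumed.
groups = [
    GroupShare("African American", 12.0, 5.0),
    GroupShare("Hispanic", 16.0, 1.0),
]

ratios = {g.name: representation_ratio(g) for g in groups}
for name, ratio in ratios.items():
    print(f"{name}: at most {ratio:.2f} of parity")
```

Read this way, a value of 1.0 would mean a group takes part in trials in proportion to its share of the population; both groups fall well below that level.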

23 Evolving Opportunities and Challenges for Patients in Clinical Research

Augmented and Virtual Reality

Future technologies offer exciting new directions for consumers and patients and their future involvement with clinical research. In particular, immersive technologies are those which create unique user experiences by merging the real world with digital or simulated realities. Often associated with gaming, the immersive technology landscape has quickly carved its own space in the field of healthcare. Augmented reality and virtual reality are the two primary forms of immersive technology but are not quite the same. Augmented reality (AR) is technology that superimposes digital content (e.g., images, sounds, or text) over a real-world environment [118]. The most commonly known examples of augmented reality are Pokémon GO, broadcasters drawing lines on the field during football games to analyze plays, and Snapchat filters. In the context of clinical research, AR is being used to help explain complex medical situations and processes during research studies to patients and their families [119]. It is also being used to address participant screening and eligibility, as well as study management upon enrollment. AR allows for sophisticated, comprehensive patient identification programs based on facial recognition software that can be connected to a smartphone. For clinical trial management, facial scanning can store appointments, medical records, and all trial processes, eliminating the need for any trial files and linking all information directly to the patient [120]. Further, patient engagement can be increased through the use of AR by tying clinical trial processes to a participant’s smartphone. For example, a participant in a clinical trial could hover their smartphone over their leg and tap on the areas of a graphic that allow them to enter a pain score for specific areas, which is then automatically linked to their medical chart. Virtual reality (VR) is a completely immersive experience where the physical world is not present. VR includes the use of special devices, such as a headset, that allow the user to experience a three-dimensional environment. Because clinical trial recruitment is a significant challenge, as this chapter has already discussed, VR may offer a novel and innovative opportunity for engaging participants and potentially increasing their numbers [121].
One study tested the effect of VR on medication adherence and found that patients’ medication compliance increased after they took part in a 7-minute interactive experience showing how antiretroviral therapy would work in their body, consistent with previous research on the ability of VR to increase individual levels of learning and engagement [122, 123]. In the study, 94% of participants stated that the VR component had made them more likely to take their medication. Additionally, the COVID-19 pandemic showcased the significant challenges associated with clinical research continuation, prompting the need for innovative, remote models that can transform how research is conducted. VR can help in these instances, allowing each of the following to occur remotely: study staff and participant interaction and consent, data collection, and intervention management [124–128].

Consumers’ Relationships with Their Own Data

The evolving role of consumers has also meant a more dynamic relationship with their own health data. As we have seen, PCORI, All of Us, and other innovative models reflect national efforts toward more consumer engagement in health data collection, use, and management. Still, there is a noticeable lack of a complete health information network in the USA, which may be driving the need for patients to manage their own data using patient-oriented applications such as Apple Health [110]. Blue Button is another example; it emerged as an initiative to address Veterans’ lack of access to their own medical records [111]. The 21st Century Cures Act has undoubtedly propelled change in this area. In 2016, Congress prohibited “information blocking” under the act, significantly altering the ways in which providers share data with their patients [129]. As of October 6, 2022, health organizations are required to make all data (including medical, billing, and other providers’ records) available to patients upon request in electronic format, enabling patient-mediated data exchange [130]. In time and in theory, patient-mediated data sharing can support aggregation of records longitudinally and from multiple providers, making it easier for clinical researchers to get complete data on medical history and procedures and enabling more complete data collection and assessment of patient risk and mediating factors. Additionally, empowered patients can leverage information technologies to improve access to clinical trial results. ClinicalTrials.gov serves not only to increase patient awareness of available trials but also to fulfill investigator obligations to share clinical results with participants, researchers, and communities. With the national push for more health consumer access to and control of their data, new technology will emerge to fill this need. It is therefore to be expected that large private technology companies would be interested in expanding their business models to include growth in the health informatics area [131]. Recently, Apple (e.g., Apple’s ResearchKit) and Amazon have announced their entry into the field. Google and Microsoft made similar attempts in the past; however, these efforts were short lived due to lack of user adoption [132]. Ultimately, consumers will decide whether to change their current methods of using and storing their medical data and whether to share their data with researchers. The factors involved in this decision will include privacy, trust, cost, and willingness to share this information. As these enabling, participatory technologies evolve, standards and policies need to be delineated to ensure an ethical balance between a desire to share one’s personal health information and the goals of using this for research and discovery [133]. A systematic review conducted in 2017 examined consumer health informatics research on data sharing and related technologies [134]. Across the papers reviewed, the authors stressed a need for additional emphasis on issues such as privacy as a feature in the design of tools, special consent and data sharing procedures in trial recruitment, and other privacy and confidentiality focus areas [134].
Informatics researchers and developers’ attention to these areas and the usability of future enabling technologies will be central to their adoption and efficacy for expanding consumer participation in clinical research in novel ways.
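To illustrate what aggregating records "longitudinally and from multiple providers" could involve, the following is a minimal sketch under stated assumptions rather than a description of any real system. The record fields (`patient_id`, `date`, `event`, `source`), the sample data, and the function `build_timelines` are all hypothetical. A production exchange would rely on a health data standard and proper identity matching, neither of which is modeled here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exports a patient might obtain from two providers.
provider_a = [
    {"patient_id": "p1", "date": date(2021, 3, 2), "event": "diagnosis", "source": "clinic_a"},
    {"patient_id": "p1", "date": date(2022, 1, 15), "event": "lab_panel", "source": "clinic_a"},
]
provider_b = [
    {"patient_id": "p1", "date": date(2021, 9, 30), "event": "procedure", "source": "hospital_b"},
    # The same lab panel, forwarded to the second provider (a duplicate).
    {"patient_id": "p1", "date": date(2022, 1, 15), "event": "lab_panel", "source": "clinic_a"},
]


def build_timelines(*exports):
    """Merge exports, drop exact duplicates, and sort each patient's events by date."""
    seen = set()
    timelines = defaultdict(list)
    for export in exports:
        for rec in export:
            key = (rec["patient_id"], rec["date"], rec["event"], rec["source"])
            if key in seen:
                continue  # skip a record already contributed by another export
            seen.add(key)
            timelines[rec["patient_id"]].append(rec)
    for events in timelines.values():
        events.sort(key=lambda r: r["date"])
    return dict(timelines)


timelines = build_timelines(provider_a, provider_b)
for rec in timelines["p1"]:
    print(rec["date"].isoformat(), rec["event"], rec["source"])
```

Even in this toy form, the sketch shows why patient-mediated exchange is attractive to researchers: one ordered history per participant replaces several partial ones, with duplicates collapsed.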

Conclusions

To support patient empowerment and engagement in clinical research, even in the broadest sense, now means understanding the interactions among patients or consumers themselves and between consumers and the fragmented and increasingly complex health information environment they must navigate. We have long known that information alone, whether provided by an intermediary or accessed directly, does not necessarily lead to rational choice or informed decision-making [135]. For instance, the traditional “one-size-fits-all” approach to public health campaigns is limited at best. Research in information behaviors continues to reveal that individuals facing serious health issues will seek out others with similar problems and that the notion of opinion leaders is evolving in the new social networking environments emerging online. These patient-empowered research networks allow participants in clinical trials to “… unblind themselves, pool their data, parse literature, conduct statistical analysis, and post their findings online” [136]. New technologies are enabling a personalization of medicine that facilitates more quantitative assessment of one’s own progress toward some possible positive outcome and of one’s state measured against others. While there are concerns over an increasing influence of the private sector, direct-to-consumer marketing, and related social and ethical considerations, there is plenty of promising evidence suggesting a new model of clinical research, supported by technology, is now possible: one that will help speed discovery and encourage consumer participation in a smarter and more equitable health system. In many ways, patients are becoming savvier and can make better decisions as to which trials might be a good fit for them; consequently, adverse events could be identified more quickly, thus helping to make clinical trials safer.
The underlying issues are not resolved but are becoming clearer, and this clarity will help guide future research. Information fields are becoming even more fluid as choices of sources and changing technologies become available and more ubiquitous. Patient engagement and collaboration with researchers and patient advocacy groups mean enhanced knowledge sharing, and the citizen researcher can leverage this to help drive research, relying on the wisdom of crowds to quickly correct erroneous information [75]. These issues are part of the evolving role of consumers and the technologies and systems that support them and have risen to be particularly salient in the context of clinical research efforts.

References

1. Goetz T. The decision tree: taking control of your health in the new era of personalized medicine. New York: Rodale; 2010.
2. Butler SL. Clinical research: a patient’s perspective. In: Principles and practice of clinical research. 2nd ed. Elsevier; 2007. p. 143–53. https://doi.org/10.1016/B978-012369440-9/50017-7.
3. Johnson JD. Cancer-related information seeking. Cresskill: Hampton Press; 1997.
4. Carman K, Lawrence W, Sigel J. The ‘new’ health consumerism. Health Affairs Blog. 5 Mar 2019. https://www.pcpcc.org/2019/03/05/%E2%80%98new%E2%80%99-health-care-consumerism. Accessed 12 May 2022.
5. Zeckhauser R, Sommers B. Consumerism in health care: challenges and opportunities. AMA J Ethics. 2013;15(11):988–92. https://doi.org/10.1001/virtualmentor.2013.15.11.oped1-1311.
6. Pagaria S. Consumerism in healthcare: current status, benefits and challenges. Health Manag. 2020;20(9):662–4.
7. Wallerstein N. What is the evidence on effectiveness of empowerment to improve health? World Health Organization Regional Office for Europe. 2022. https://www.who.int/europe/initiatives/empowerment-through-digital-health. Accessed 11 Aug 2022.
8. Lemire M, Sicotte C, Paré G. Internet use and the logics of personal empowerment in health. Health Policy. 2008;88:130–40. https://doi.org/10.1016/j.healthpol.2008.03.006.
9. Sacristán JA, Aguarón A, Avendaño-Solá C, Garrido P, Carrión J, Gutiérrez A, Kroes R, Flores A. Patient involvement in clinical research: why, when, and how. Patient Prefer Adherence. 2016;10:631–40. https://doi.org/10.2147/PPA.S104259.
10. Frank L, Forsythe L, Ellis L, et al. Conceptual and practical foundations of patient engagement in research at the patient-centered outcomes research institute. Qual Life Res. 2015;24(5):1033–41. https://doi.org/10.1007/s11136-014-0893-3.


11. Epstein RM, Street RL. The values and value of patient-centered care. Ann Fam Med. 2011;9(2):100–3. https://doi.org/10.1370/afm.1239.
12. PCORI. About us, Our Mission. 2022. https://www.pcori.org/about-us. Accessed 29 Jun 2022.
13. DHHS. Patient Outcomes Research Trust Fund. 2022. https://aspe.hhs.gov/patient-centered-outcomes-research-trust-fund. Last accessed 29 Jun 2022.
14. PCORI. Fact Sheet. Oct 2020. https://www.pcori.org/sites/default/files/PCORI-PCORnet-Fact-Sheet.pdf. Accessed 29 Jun 2022.
15. PCORI. 2022. https://www.pcori.org/research-results/pcornet-national-patient-centered-clinical-research-network. Accessed 29 Jun 2022.
16. NCI. 2021. https://www.cancer.gov/research/key-initiatives/moonshot-cancer-initiative/implementation/patient-engagement.
17. Michaud S, Needham J, Sundquist S, Johnson D, Hanna S, Hosseinzadeh S, Bartekian V, Steele P, Benchimol S, Ross N, Stein BD. Patient and patient group engagement in cancer clinical trials: a stakeholder charter. Curr Oncol. 2021;28(2):1447–58. https://doi.org/10.3390/curroncol28020137.
18. PCORI Engagement Rubric (Patient-Centered Outcomes Research Institute) website. Published 4 Feb 2014. Updated 6 Jun 2016. https://www.pcori.org/sites/default/files/Engagement-Rubric.pdf. Accessed 29 Jun 2022.
19. Chen C, Lee PI, Pain KJ, Delgado D, Cole CL, Campion TR Jr. Replacing paper informed consent with electronic informed consent for research in academic medical centers: a scoping review. AMIA Joint Summits Transl Sci Proc. 2020;2020:80–8.
20. Patient Focused Medicines Development. https://patientfocusedmedicine.org/patient-engagement-open-forum/. Accessed 1 Jul 2022.
21. Feldman D, Kruger P, Delbecque L, Duenas A, Bernard-Poenaru O, Wollenschneider S, Hicks N, Reed JA, Sargeant I, Pakarinen C, Hamoir AM, Patient Focused Medicines Development Working Groups 1, Patient Focused Medicines Development Working Groups 2A, Patient Focused Medicines Development Working Groups 2B. Co-creation of practical “how-to guides” for patient engagement in key phases of medicines development-from theory to implementation. Res Involv Engagem. 2021;7(1):57. https://doi.org/10.1186/s40900-021-00294-x.
22. Crocker JC, Boylan AM, Bostock J, Locock L. Is it worth it? Patient and public views on the impact of their involvement in health research and its assessment: a UK-based qualitative interview study. Health Expect. 2017;20(3):519–28.
23. Demian MN, Lam NN, Mac-Way F, Sapir-Pichhadze R, Fernandez N. Opportunities for engaging patients in kidney research. Can J Kidney Health Dis. 2017;4:2054358117703070.
24. Dudley L, Gamble C, Allam A, Bell P, Buck D, Goodare H, Hanley B, Preston J, Walker A, Williamson P, Young B. A little more conversation please? Qualitative study of researchers’ and patients’ interview accounts of training for patient and public involvement in clinical trials. Trials. 2015;16(1):190.
25. Domecq JP, Prutsky G, Elraiyah T, Wang Z, Nabhan M, Shippee N, Brito JP, Boehmer K, Hasan R, Firwana B, Erwin P. Patient engagement in research: a systematic review. BMC Health Serv Res. 2014;14(1):89.
26. Harrison JD, Auerbach AD, Anderson W, Fagan M, Carnie M, Hanson C, Banta J, Symczak G, Robinson E, Schnipper J, Wong C, Weiss R. Patient stakeholder engagement in research: a narrative review to describe foundational principles and best practice activities. Health Expect. 2019;22(3):307–16. https://doi.org/10.1111/hex.12873.
27. Vat LE, Finlay T, Jan Schuitmaker-Warnaar T, Fahy N, Robinson P, Boudes M, Diaz A, Ferrer E, Hivert V, Purman G, Kürzinger ML, Kroes RA, Hey C, Broerse JEW. Evaluating the “return on patient engagement initiatives” in medicines research and development: a literature review. Health Expect. 2020;23(1):5–18. https://doi.org/10.1111/hex.12951.
28. Heckert A, Forsythe LP, Carman KL, Frank L, Hemphill R, Elstad EA, Esmail L, Lesch JK. Researchers, patients, and other stakeholders’ perspectives on challenges to and strategies for engagement. Res Involv Engagem. 2020;6:60. https://doi.org/10.1186/s40900-020-00227-0.
29. Chamberlain SA, Gruneir A, Keefe JM, Berendonk C, Corbett K, Bishop R, Bond G, Forbes F, Kieloch B, Mann J, Thelker C, Estabrooks CA. Evolving partnerships: engagement methods in an established health services research team. Res Involv Engagem. 2021;7(1):71. https://doi.org/10.1186/s40900-021-00314-w.
30. Mallidou AA, Frisch N, Doyle-Waters MM, MacLeod MLP, Ward J, Atherton P. Patient-oriented research competencies in health (PORCH) for patients, healthcare providers, decision-makers and researchers: protocol of a scoping review. Syst Rev. 2018;7(1):101. https://doi.org/10.1186/s13643-018-0762-1.
31. Smith SK, Selig W, Harker M, Roberts JN, Hesterlee S, Leventhal D, Klein R, Patrick-Lake B, Abernethy AP. Patient engagement practices in clinical research among patient groups, industry, and academia in the United States: a survey. PLoS One. 2015;10(10):e0140232. https://doi.org/10.1371/journal.pone.0140232.
32. Case DO, Johnson JD. Health information seeking. 2012.
33. Rice RE, Atkin CK. Preface: trends in communication campaign research. In: Rice RE, Atkin CK, editors. Public communication campaigns. Newbury Park: Sage; 1989. p. 7–11.
34. Freimuth VS. Improve the cancer knowledge gap between whites and African Americans. J Natl Cancer Inst. 1993;14:81–92.

35. Freimuth VS, Stein JA, Kean TJ. Searching for health information: the cancer information service model. Philadelphia: University of Pennsylvania Press; 1989.
36. Alcalay R. The impact of mass communication campaigns in the health field. Soc Sci Med. 1983;17:87–94. https://doi.org/10.1016/0277-9536(83)90359-3.
37. Katz E, Lazarsfeld PF. Personal influence: the part played by people in the flow of mass communications. New York: Free Press; 1955.
38. Lichter I. Communication in cancer care. New York: Churchill Livingstone; 1987.
39. Rogers EM, Storey JD. Communication campaigns. In: Berger CR, Chaffee SH, editors. Handbook of communication science. Newbury Park: Sage; 1987. p. 817–46.
40. Noar SM. Challenges in evaluating health communication campaigns: defining the issues. Commun Methods Meas. 2009;3:1–11. https://doi.org/10.1080/19312450902809367.
41. Cool C. The concept of situation in information science. Annu Rev Inf Sci Technol. 2001;35:5–42.
42. Scott J. Social network analysis: a handbook. 2nd ed. Thousand Oaks: Sage; 2000.
43. Sonnenwald DH, Wildemuth BM, Harmon GL. A research method to investigate information seeking using the concept of information horizons: an example from a study of lower socio-economic students’ information seeking behavior. New Rev Inf Behav Res. 2001;2:65–85.
44. Johnson JD. Information seeking: an organizational dilemma. Westport: Quorum Books; 1996.
45. Rice RE, McCreadie M, Chang SL. Accessing and browsing information and communication. Cambridge: MIT Press; 2001.
46. Johnson JD, Andrews JE, Case DO, Allard SL, Johnson NE. Fields and/or pathways: contrasting and/or complementary views of information seeking. Inf Process Manag. 2006;42:569–82. https://doi.org/10.1016/j.ipm.2004.12.001.
47. Kuhlthau CC. Inside the search process: information seeking from the user’s perspective. J Am Soc Inf Sci Technol. 1991;42:361–71. https://doi.org/10.1002/(SICI)1097-4571(199106)42:5<361::AID-ASI6>3.0.CO;2-#.
48. Williamson K. Discovered by chance: the role of incidental information acquisition in an ecological model of information use. Libr Inf Sci Res. 1998;20:23–40. https://doi.org/10.1016/S0740-8188(98)90004-4.
49. DHHS. Healthy People 2030. https://health.gov/healthypeople/priority-areas/health-literacy-healthy-people-2030. Accessed 12 May 2022.
50. National Academy of Medicine. Advancing health literacy in clinical research: clear communications for every participant. 28 Oct 2019. https://nam.edu/advancing-health-literacy-in-clinical-research-clear-communications-for-every-participant/. Accessed 12 May 2022.

51. CDC. Patient Engagement, Health Literacy. https://www.cdc.gov/healthliteracy/researchevaluate/patient-engage.html. Accessed 12 May 2022.
52. National Academies of Sciences, Engineering, and Medicine. Health literacy in clinical research: practice and impact: proceedings of a workshop, Washington, DC; 2020. https://nap.nationalacademies.org/catalog/25616/health-literacy-in-clinical-research-practice-and-impact-proceedings-of. Accessed 12 May 2022.
53. Patel CO, Garg V, Khan SA. What do patients search for when seeking clinical trial information online? AMIA Annu Symp Proc. 2010:597–601.
54. Yang ZJ, McComas K, Gay G, Leonard JP, Dannenberg AJ, Dillon H. Motivation for health information seeking and processing about clinical trial enrollment. Health Commun. 2010;25(5):423–36. https://doi.org/10.1080/10410236.2010.483338.
55. Janet Yang Z, McComas K, Gay G, Leonard JP, Dannenberg AJ, Dillon H. From information processing to behavioral intentions: exploring cancer patients’ motivations for clinical trial enrollment. Patient Educ Couns. 2010;79(2):231–8. https://doi.org/10.1016/j.pec.2009.08.010. Epub 2009 Sep 11.
56. Fox S, Jones S. The social life of health information: Americans’ pursuit of health takes place within a widening network of both online and offline sources. Pew Internet & American Life Project. 2009. http://www.pewinternet.org/Reports/2009/8-The-Social-Life-of-Health-Information.aspx. Accessed Aug 2011.
57. Barak A, Boniel-Nissim M, Suler J. Fostering empowerment in online support groups. Comput Hum Behav. 2008;24:1867–83. https://doi.org/10.1016/j.chb.2008.02.004.
58. Wicks P, Massagli M, Frost J, Brownstein C, Okun S, Vaughan T, Bradley R, Heywood J. Sharing health data for better outcomes on PatientsLikeMe. J Med Internet Res. 2010;12:e19. https://doi.org/10.2196/jmir.1549.
59. Fox S, Raine L. How internet users decide what information to trust when they or their loved ones are sick. Pew Internet & American Life Project. 2002. http://www.pewinternet.org/Reports/2002/Vital-Decisions-A-Pew-Internet-Health-Report/Summary-of-Findings.aspx. Accessed Aug 2011.
60. Taylor H, Leitman R. Four-nation survey shows widespread but different levels of internet use for health purposes. Harris Interactive Healthcare Care News. 2002. http://www.harrisinteractive.com/news/newsletters/healthnews/HI_HealthCareNews2002Vol2_iss11.pdf. Accessed Aug 2011.
61. Lowery W, Anderson WB. The impact of web use on the public perception of physicians. Paper presented to the annual convention of the Association for Education in Journalism and Mass Communication, Miami Beach; 2002.
62. Johnson JD. Dosage: a bridging metaphor for theory and practice. Int J Strateg Commun. 2008;2:137–53. https://doi.org/10.1080/15531180801958204.


63. Parrott R, Steiner C. Lessons learned about academic and public health collaborations in the conduct of community-based research. In: Thompson TL, Dorsey AM, Miller K, Parrott RL, editors. Handbook of health communication. Mahwah: Lawrence Erlbaum Associates, Inc.; 2003. p. 637–50.
64. Case D, Johnson JD, Andrews JE, Allard S, Kelly KM. From two-step flow to the internet: the changing array of sources for genetics information seeking. J Am Soc Inf Sci Technol. 2004;55:660–9. https://doi.org/10.1002/asi.20000.
65. Shapiro AL. The control revolution: how the internet is putting individuals in charge and changing the world we know. New York: Public Affairs; 1999.
66. Katz E. The two step flow of communication: an up to date report on an hypothesis. Public Opin Q. 1957;21:61–78.
67. Burt RS. Structural holes: the social structure of competition. Cambridge: Harvard University Press; 1992.
68. Mills EJ, Seely D, Rachlis B, et al. Barriers to participation in clinical trials of cancer: a meta-analysis and systematic review of patient-reported factors. Lancet Oncol. 2006;7(2):141–8.
69. Frost J, Massagli M. PatientsLikeMe: the case for a data-centered patient community and how ALS patients use the community to inform treatment decisions and manage pulmonary health. Chron Respir Dis. 2009;6:225–9. https://doi.org/10.1177/1479972309348655.
70. Brownstein CA, Brownstein JS, Williams DS III, Wicks P, Heywood JA. The power of social networking in medicine. Nat Biotechnol. 2009;27:888–90. https://doi.org/10.1038/nbt1009-888.
71. Rains SA, Young V. A meta-analysis of research on formal computer-mediated support groups: examining group characteristics and health outcomes. Hum Commun Res. 2009;35:309–36.
72. Aymé S, Kole A, Groft S. Empowerment of patients: lessons from the rare diseases community. Lancet. 2008;371(9629):2048–51.
73. Weijer C. Our bodies, our science: challenging the breast cancer establishment, victims now ask for a voice in the war against disease. Sciences. 1995;35:41–4.
74. Statement of Scott Gottlieb. Commissioner of Food and Drugs before the Subcommittee on Health, Committee on Energy and Commerce, US House of Representatives, 3 Oct 2017. https://www.fda.gov/newsevents/testimony/ucm578634.htm. Accessed 29 Jun 2018.
75. Steinhubl SR, Muse ED, Topol EJ. The emerging field of mobile health. Sci Transl Med. 2015;7(283):283rv3. https://doi.org/10.1126/scitranslmed.aaa3487.
76. National Library of Medicine. What is direct-to-consumer genetic testing? https://medlineplus.gov/genetics/understanding/dtcgenetictesting/directtoconsumer/. Accessed 9 Aug 2022.

77. ASHG statement on direct-to-consumer genetic testing in the United States. Society news. Am J Hum Genet. 2007;81:637. http://www.ashg.org/pdf/dtc_statement.pdf. Accessed 29 Jun 2018.
78. Gil J, Obley AJ, Prasad V. Direct-to-consumer genetic testing: the implications of the US FDA’s first marketing authorization for BRCA mutation testing. JAMA. 2018;319(23):2377–88. https://doi.org/10.1001/jama.2018.5330.
79. All of Us Research Program. https://www.joinallofus.org/en/program-overview. Accessed 9 Aug 2022.
80. White House Archives. https://obamawhitehouse.archives.gov/node/333101. Accessed 29 Jun 2018.
81. Pew Research Center. 5 Feb 2018. http://www.pewinternet.org/fact-sheet/mobile/. Accessed 29 Jun 2018.
82. Istepanian R. Mobile health (m-Health) in retrospect: the known unknowns. Int J Environ Res Public Health. 2022;19(7):3747. https://doi.org/10.3390/ijerph19073747.
83. Wenzel. 2017. Accessed http://www.clinicalinformaticsnews.com/2017/04/26/wearables-shaping-the-future-of-clinical-trials.aspx.
84. Li X, Dunn J, Salins D, et al. Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol. 2017;15(1):e2001402. https://doi.org/10.1371/journal.pbio.2001402.
85. Cummins KM, Brumback T, Chung T, Moore RC, Henthorn T, Eberson S, Lopez A, Sarkissyan T, Nooner KB, Brown SA, Tapert SF. Acceptability, validity, and engagement with a mobile app for frequent, continuous multiyear assessment of youth health behaviors (mNCANDA): mixed methods study. JMIR Mhealth Uhealth. 2021;9(2):e24472. https://doi.org/10.2196/24472.
86. Zhou L, Bao J, Setiawan IMA, Saptono A, Parmanto B. The mHealth app usability questionnaire (MAUQ): development and validation study. JMIR Mhealth Uhealth. 2019;7(4):e11500. https://doi.org/10.2196/11500.
87. Gallagher JL, Rivera RD, Van Shepard K, Roushan T, Ahsan G, Ahamed SI, Chiu A, Jurken M, Simpson PM, Nugent M, Gobin KS, Wen C, Eldredge CE. Life-threatening allergies: using a patient-engaged approach. Telemed J e-health. 2019;25(4):319–25. https://doi.org/10.1089/tmj.2018.0046.
88. APDA. https://www.apdaparkinson.org/article/wearable-technology-in-parkinsons/.
89. Zhong R, Rau PP. A mobile phone-based gait assessment app for the elderly: development and evaluation. JMIR Mhealth Uhealth. 2020;8(2):e14453. https://doi.org/10.2196/14453.
90. Yousaf K, Mehmood Z, Awan IA, Saba T, Alharbey R, Qadah T, Alrige MA. A comprehensive study of mobile-health based assistive technology for the healthcare of dementia and Alzheimer’s disease (AD). Health Care Manag Sci. 2020;23(2):287–309. https://doi.org/10.1007/s10729-019-09486-0.

91. Majumder S, Mondal T, Deen MJ. Wearable sensors for remote health monitoring. Sensors (Basel, Switzerland). 2017;17(1):130. https://doi.org/10.3390/s17010130.
92. AHRQ. https://www.ahrq.gov/learning-health-systems/about.html.
93. Amagai S, Pila S, Kaat A, Nowinski C, Gershon R. Challenges in participant engagement and retention using mobile health apps: literature review. J Med Internet Res. 2022;24(4):e35120. https://doi.org/10.2196/35120.
94. Allison M. Can web 2.0 reboot clinical trials? Nat Biotechnol. 2009;27:895–902. https://doi.org/10.1038/nbt1009-895.
95. Cartmell KB, Bonilha HS, Simpson KN, Ford ME, Bryant DC, Alberg AJ. Patient barriers to cancer clinical trial participation and navigator activities to assist. Adv Cancer Res. 2020;146:139–66. https://doi.org/10.1016/bs.acr.2020.01.008.
96. Atkinson NL, Massett HA, Mylks C, Hanna B, Deering MJ, Hesse BW. User-centered research on breast cancer patient needs and preferences of an internet-based clinical trial matching system. J Med Internet Res. 2007;9:e13. https://doi.org/10.2196/jmir.9.2.e13.
97. Marks L, Power E. Using technology to address recruitment issues in the clinical trial process. Trends Biotechnol. 2002;20:105–9. https://doi.org/10.1016/S0167-7799(02)01881-4.
98. Brubaker JR, Lustig C, Hayes GR. PatientsLikeMe: empowerment and representation in a patient-centered social network. Presented at the CSCW 2010 workshop on CSCW research in healthcare: past, present, and future, Savannah; 2007.
99. Nash EL, Gilroy D, Srikusalanukul W, Abhayaratna WP, Stanton T, Mitchell G, Stowasser M, Sharman JE. Facebook advertising for participant recruitment into a blood pressure clinical trial. J Hypertens. 2017;35(12):2527–31.
100. Carter-Harris L, Bartlett Ellis R, Warrick A, Rawl S. Beyond traditional newspaper advertisement: leveraging facebook-targeted advertisement to recruit long-term smokers for research. J Med Internet Res. 2016;18(6):e117. https://doi.org/10.2196/jmir.5502.
101. Kayrouz R, Dear BF, Karin E, Titov N. Facebook as an effective recruitment strategy for mental health research of hard to reach populations. Internet Interv. 2016;4:1.
102. Getz KA. Examining and enabling the role of health care providers as patient engagement facilitators in clinical trials. Clin Ther. 2017;39(11):2203–13. https://doi.org/10.1016/j.clinthera.2017.09.014.
103. Applequist J, Burroughs C, Ramirez A, Merkel PA, Rothenberg ME, Trapnell B, Desnick RJ, Sahin M, Krischer JP. A novel approach to conducting clinical trials in the community setting: utilizing patient-driven platforms and social media to drive web-based patient recruitment. BMC Med Res Methodol. 2020;20(58):1–14. https://doi.org/10.1186/s12874-020-00926-y.
104. Moorcraft SY, Marriott C, Peckitt C, Cunningham D, Chau I, Starling N, Watkins D, Rao S. Patients’ willingness to participate in clinical trials and their views on aspects of cancer research: results of a prospective patient survey. Trials. 2016;17(1):17.
105. Ryan A. Engaging consumers with musculoskeletal conditions in health research: a user-centred perspective. In: Integrating and connecting care: selected papers from the 25th Australian National Health Informatics Conference (HIC 2017), 10 Aug 2017, vol. 239. IOS Press; 2017. p. 104.
106. Zanni MV, Fitch K, Rivard C, Sanchez L, Douglas PS, Grinspoon S, Smeaton L, Currier JS, Looby SE. Follow YOUR heart: development of an evidence-based campaign empowering older women with HIV to participate in a large-scale cardiovascular disease prevention trial. HIV Clin Trials. 2017;18(2):83–91.
107. Boote J, Baird W, Beecroft C. Public involvement at the design stage of primary health research: a narrative review of case examples. Health Policy. 2010;95(1):10–23.
108. Collins K, Boote J, Ardron D, Gath J, Green T, Ahmedzai SH. Making patient and public involvement in cancer and palliative research a reality: academic support is vital for success. BMJ Support Palliat Care. 2014; https://doi.org/10.1136/bmjspcare-2014-000750.
109. Buckley JM, Irving AD, Goodacre S. How do patients feel about taking part in clinical trials in emergency care? Emerg Med J. 2016;33:376.
110. Healthit.gov. https://www.apple.com/ios/health/. Accessed 29 Jun 2022.
111. U.S. Department of Veterans Affairs. Blue Button. Accessed https://www.va.gov/bluebutton/.
112. Chakradhar S. Many returns: call-ins and breakfasts hand back results to study volunteers.
113. Hussain-Gambles M, Atkin K, Leese B. Why ethnic minority groups are under-represented in clinical trials: a review of the literature.
Health Soc Care Community. 2004;12(5):382–8. 114. Alegria M, Sud S, Steinberg BE, Gai N, Siddiqui A. Reporting of participant race, sex, and socioeconomic status in randomized clinical trials in general medical journals, 2015 vs 2019. JAMA Netw Open. 2021;4(5):e2111516. https://doi.org/10.1001/ jamanetworkopen.2021.11516. 115. Coakley M, Fadiran EO, Parrish J, Griffith RA, Weiss E, Carter C. Dialogues on diversifying clinical trials: successful strategies for engaging women and minorities in clinical trials. J Womens Health (Larchmt). 2012;21(7):713–6. https://doi. org/10.1089/jwh.2012.3733. 116. Patel SN, Staples JN, Garcia C, Chatfield L, Ferriss JS, Duska L. Are ethnic and minority women less likely to participate in clinical trials? Gynecol Oncol. 2020;157(2):323–8. https://doi.org/10.1016/j. ygyno.2020.01.040.

493

117. Celi L, Cellini J, Charpignon ML, Dee E, Dernoncourt F, Eber R, Mitchell WG, Moukheiber L, Schrimer J, Situ J, Pagulo J, Park J, Wawira JG, Yao S. Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLoS Digital Health. 2022;1:e0000022. https://doi. org/10.1371/journal.pdig.0000022. 118. Houston B. What is augmented reality? A practical overview. ThreeKit. 28 May 2020. https:// www.threekit.com/blog/what-is-augmented-reality. Accessed 7 May 2022. 119. Eckert M, Volmerg JS, Friedrich CM. Augmented reality in medicine: systematic and bibliographic review. JMIR Mhealth Uhealth. 2019;7(4):e10967. https://doi.org/10.2196/10967. 120. DDA Health, Clinical Trial Management. 2022. https://augmentedreality.health/clinical-t rial- management.cfm. Accessed 8 May 2022. 121. Setzer J. 3 ways VR can transform contract research. Lucid Dream. https://www.luciddreamvr.com/ blog/3-ways-virtual-reality-can-transform-cros. 4 May 2021. Accessed 7 May 2022. 122. Riva G, Baños RM, Botella C, Mantovani F, Gaggioli A. Transforming experience: the potential of augmented reality and virtual reality for enhancing personal and clinical change. Front Psych. 2016;7:164. https://doi.org/10.3389/fpsyt.2016.00164. 123. Allcoat D, von Muhlenen A. Learning in virtual reality: effects on performance, emotion, and engagement. Res Learn Technol. 2018;26:1–13. https://doi. org/10.25304/rlt.v26.2140. 124. Chiamulera C, Mantovani E, Tamburin S. Remote clinical trials: a timely opportunity for a virtual reality approach and its potential application in neurology. Br J Clin Pharmacol. 2021;87(10):3639–42. https://doi.org/10.1111/bcp.14922. 125. Hirsch IB, Martinez J, Dorsey ER, Finken G, Fleming A, Gropp C, Home P, Kaufer D, Papapetropoulos S. Incorporating site-less clinical trials into drug development: a framework for action. Clin Ther. 2017;39(5):1064–76. https://doi.org/10.1016/j. clinthera.2017.03.018. 126. 
Rief W, Barsky AJ, Bingel U, Doering BK, Wöhr M, Schweiger U. Rethinking psychopharmacotherapy: the role of treatment context and brain plasticity in antidepressant and antipsychotic interventions. Neurosci Biobehav Rev. 2016;60:51–64. https://doi. org/10.1016/j.neubiorev.2015.11.008. 127. Summers JK, Vivian DN. Ecotherapy—a forgotten ecosystem service: a review. Front Psychol. 2018;9:1389. https://doi.org/10.3389/ fpsyg.2018.01389. 128. Ballesteros S, Kraft E, Santana S, Tziraki C. Maintaining older brain functionality: a targeted review. Neurosci Biobehav Rev. 2015;55:453–77. https://doi.org/10.1016/j. neubiorev.2015.06.008. 129. Bracha Y, Bagwell J, Furberg R, Wald J. Consumer- mediated data exchange for research: current state of US law, technology, and trust. JMIR Med

494 Inform. 2019;7(2):e12348. https://medinform.jmir. org/2019/2/e12348. https://doi.org/10.2196/12348. 130. Provider obligations for patient portals under the 21st century cures act, Health Affairs Forefront. 16 May 2022. 131. Monegain B. Amazon, Apple only part of ‘seismic change’ coming to healthcare. Healthcare IT News. 1 May 2018. http://www.healthcareitnews. com/news/amazon-apple-only-part-seismic-change- coming-healthcare. Accessed 29 Jun 2018. 132. The Economist. https://www.economist.com/news/ business/21736193-w orlds-b iggest-t ech-f irms- see-o pportunity-h ealth-c are-w hich-c ould-m ean- empowered. 3 Feb 2018. Accessed 29 Jun 2018. 133. Househ M, Grainger R, Petersen C, Bamidis P, Merolli M. Balancing between privacy and patient needs for health information in the age of partici-

J. E. Andrews et al. patory health and social media: a scoping review. Yearb Med Inform. 2018;27(1):29–36. https://doi. org/10.1055/s-0038-1641197. Epub 2018 Apr 22. 134. Staccini P, Lau AYS, Section Editors for the IMIA Yearbook Section on Consumer Health Informatics and Education. Findings from 2017 on consumer health informatics and education: health data access and sharing. Yearb Med Inform. 2018;27(1):163–9. https://doi.org/10.1055/s-0038-1641218. Epub 2018 Aug 29. 135. Johnson JD. Health-related information seeking: is it worth it? Inf Process Manag. 2014;50(5):708–17. 136. Wicks P, Vaughan T, Heywood J. Subjects no more: what happens when trial participants realize they hold the power? BMJ. 2014;348:g368. https://doi. org/10.1136/bmj.g368.

24 Apps in Clinical Research

Brian Douthit and Rachel L. Richesson

Abstract

Apps (software applications that can be installed and run on a computer, tablet, smartphone, or other electronic device) are changing the landscape of clinical research by opening new avenues for administering and evaluating interventions. In addition to supporting research operations, apps can increase patient engagement, make research participation and data collection more efficient, and accelerate the generation of new evidence and its application in clinical practice and in the lives of patients, through management of their health and disease and through decision-making. The use of apps is also changing the paradigm of health data ownership, raising new opportunities for participant empowerment and involvement in clinical research. This chapter outlines the major events leading to today’s app infrastructure, current uses of apps in clinical research, design considerations, and future directions.

Keywords

Apps · Mobile health · Clinical research · Patient engagement · Data access · FHIR

B. Douthit (*) United States Department of Veterans Affairs, Tennessee Valley Healthcare System, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
R. L. Richesson Department of Learning Health Sciences, University of Michigan, Ann Arbor, MI, USA

Learning Objectives

1. Formulate a definition for “apps” and describe three ways that they can be used to support clinical research.
2. Distinguish between app-driven operational support and support of the research intervention.
3. Verbalize basic security considerations when using apps in clinical research.
4. Define interoperability and identify several supporting messaging and content standards.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_24

Introduction

The exact origin of the word “app” is difficult to trace, but it is widely understood to be short for “application,” which generally refers to a software application that can be installed and run on a computer, tablet, smartphone, or other electronic device. This terminology evolved to differentiate apps from integrated programs and software, in that an app is in some way portable, sharable, and easily accessible. The initial appeal of the smartphone was driven by the use of apps. First developed for productivity and leisure, apps now cover nearly every facet of daily life, with health being no exception. Although health and fitness apps make up only about 3% of consumer mobile apps, this still amounts to nearly 100,000 different apps available in app stores [1]. These programs can use phone hardware, which can now monitor heartbeat, steps, and other physiological data that would otherwise be difficult to collect from the average person.

As apps have become ubiquitous in almost all facets of life, the proliferation of new app-based tools in clinical research was inevitable. An app-integrated research ecosystem, however, is still not fully realized. There are several reasons for this, including uncertainty about how to govern new technologies such as artificial intelligence and a continuing debate over how to appropriately store, manipulate, and access patient data. To begin to understand this shifting paradigm, it is helpful to examine the origin of apps and the growth of the surrounding infrastructure.

Researchers have studied and used “mHealth” for years, utilizing smartphones and wireless technologies to collect patient data and deliver interventions. Common examples include home hypertension management [2], diabetic glucose control [3], and weight loss interventions [4]. As such interventions were shown to be effective, the use of apps in different facets of health and healthcare expanded quickly. Enacted in 2016, the 21st Century Cures Act authorized and funded numerous programs to improve treatment delivery in the United States [5].
Notably, the Office of the National Coordinator for Health Information Technology (ONC) published a rule in 2020 stating several requirements and goals for interoperability in the United States, reinforcing that interoperability is defined as “complete access, exchange, and use of all electronically accessible health information” [6]. With tangible, government-led requirements for data access and exchange being enacted and enforced, a new age of “open data” is upon us. Paving the way to digital interoperability is the growing adoption of mobile devices (e.g., smartphones and tablets), cloud platforms, and both clinician- and patient-facing application programming interfaces (APIs).

In clinical research informatics, the focus of apps is to support research processes and workflows and to deliver interventions. Apps may also augment and define clinical or security roles for case report forms (CRFs) or other tools where patient data may be accessed. In other words, apps may be used in many different ways, with many variations to suit the needs of a clinical trial or research study. In the next few sections, we will describe use cases and considerations for apps used in clinical research.

Operational Support

One of the most obvious use cases for apps is to improve common research activities, including but not limited to determining patient eligibility, capturing informed consent and patient-reported data, enabling patients to contribute clinical records (such as from EHR or PHR systems), sharing reminders and information about study events, facilitating incentive payments to study participants, and sharing data across the research team. Apps may contain a number of these features or act independently, and their formats may be web-based, EHR-based, mobile, or a combination. As authoring tools for apps have become more accessible and easier to use, the development and use of apps for operational functions in clinical research have increased.

Recruitment and Participant Management

Identifying participants, reviewing their eligibility, and contacting and recruiting them for clinical research can be among the most time-consuming processes for investigators and their teams. To maximize efficiency, apps have become common tools to both facilitate and automate several research activities. The first, and perhaps the most common, use case is the determination of patient eligibility. There are several ways to approach this problem, with a mix of manual and automatic processes to compare patient data to the study requirements. This is often made possible through the EHR, where most patient data reside. It is also often the case that patients in clinical trials are recruited from inpatient or outpatient settings, where they are likely to already have a record with robust data.

Most recruitment apps rely on querying structured data to compare against a set of rules. Structured data are data that have a specified format and organization. Examples include height, blood pressure, and diagnosis. Unstructured data, the counterpart of structured data, are often referred to as “free text,” in which blocks of narrative text are documented. Unstructured data can be systematically analyzed using text extraction and natural language processing (NLP) methods and tools [7], but it is much more feasible to construct an eligibility tool using structured data. This is important to consider, as the underlying recruitment criteria specified in a narrative research protocol might reflect data that are not routinely captured in a structured format in EHRs or other sources. If this is the case, the usefulness of an eligibility app may be limited.

Apps that help determine eligibility for one or more research studies may also be developed in different formats, such as web apps, mobile apps, or EHR-integrated apps. An example of a mobile app comes from work done at the University of Kansas, where researchers developed an app that pooled all studies across their Cancer Center, allowing for easy referrals and screening for a large number of studies at once [7]. For instances where patients may be eligible for many trials, researchers from Sydney, Australia, developed a mobile app that facilitates cross-reference referrals in melanoma clinical trials [8].
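The rule-based comparison of structured data against study requirements described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the field names, and the three criteria are hypothetical and are not drawn from any protocol or from the apps cited in this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical structured record, as it might be extracted from an EHR.
Patient = Dict[str, object]


@dataclass(frozen=True)
class Criterion:
    name: str
    test: Callable[[Patient], bool]  # returns True when the patient satisfies the rule


def screen(patient: Patient, criteria: List[Criterion]) -> Tuple[bool, List[str]]:
    """Return (eligible, names of failed criteria) for one patient."""
    failed = []
    for criterion in criteria:
        try:
            ok = criterion.test(patient)
        except (KeyError, TypeError):
            # Missing or malformed structured data cannot confirm eligibility.
            ok = False
        if not ok:
            failed.append(criterion.name)
    return (not failed, failed)


# Illustrative criteria only; not taken from a real study.
CRITERIA = [
    Criterion("age 18-75", lambda p: 18 <= p["age"] <= 75),
    Criterion("systolic BP >= 140", lambda p: p["systolic_bp"] >= 140),
    Criterion("hypertension diagnosis (ICD-10 I10)", lambda p: "I10" in p["diagnoses"]),
]

if __name__ == "__main__":
    patient = {"age": 62, "systolic_bp": 151, "diagnoses": ["I10", "E11.9"]}
    print(screen(patient, CRITERIA))
```

Treating a missing field as a failed criterion reflects the limitation noted above: when a protocol's criteria depend on data that are not captured in structured form, a tool of this kind can only flag the patient for manual review.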


Data Capture

A common use for apps in clinical trials is to capture data essential to the analysis of the intervention or medical product. Classically, paper case report forms (CRFs) were used to capture the data generated during the study. Today, many paper forms have transitioned to electronic CRFs (eCRFs) to expedite and standardize data extraction and transmission to study sponsors. These eCRFs often allow for automatic data extraction from the EHR, eliminating the need for manual extraction and reducing the possibility of transcription errors [9]. Additionally, Fast Healthcare Interoperability Resources (FHIR)-based apps have been developed that allow for this automated extraction [10] (see section “Building Apps” for more details). However, apps have expanded the possibilities of data acquisition and offer more functionality than eCRFs. Most notably, apps allow for the capture of data, including patient-generated data, that are not widely available in the EHR.
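As a rough illustration of automated extraction into an eCRF, the sketch below converts one FHIR Observation, already parsed from JSON, into a single eCRF entry. The eCRF field names and the decision to accept only observations with status "final" are assumptions made for this example; the LOINC codes shown are the standard codes for the named measurements. Real blood pressure data are often delivered as a panel with components, which this simplified sketch does not handle.

```python
from typing import Dict, Optional

# LOINC code -> hypothetical eCRF field name (the field names are assumptions).
LOINC_TO_ECRF = {
    "8480-6": "systolic_bp_mmhg",   # Systolic blood pressure
    "8462-4": "diastolic_bp_mmhg",  # Diastolic blood pressure
    "8867-4": "heart_rate_bpm",     # Heart rate
}


def observation_to_ecrf(obs: Dict) -> Optional[Dict]:
    """Convert one FHIR Observation (parsed JSON) into an eCRF entry.

    Returns None unless the resource is a final Observation that carries
    a mapped LOINC code and a numeric valueQuantity.
    """
    if obs.get("resourceType") != "Observation" or obs.get("status") != "final":
        return None
    quantity = obs.get("valueQuantity")
    if not quantity or "value" not in quantity:
        return None
    for coding in obs.get("code", {}).get("coding", []):
        if coding.get("system") == "http://loinc.org" and coding.get("code") in LOINC_TO_ECRF:
            return {
                "field": LOINC_TO_ECRF[coding["code"]],
                "value": quantity["value"],
                "unit": quantity.get("unit"),
                "recorded": obs.get("effectiveDateTime"),
            }
    return None


if __name__ == "__main__":
    sample = {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
        "valueQuantity": {"value": 151, "unit": "mmHg"},
        "effectiveDateTime": "2023-01-15",
    }
    print(observation_to_ecrf(sample))
```

Because the value is copied directly from the source record, no manual transcription step is involved, which is the source of the error reduction described above.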

Participant Generated Data

The ONC defines patient-generated health data as “health-related data created, recorded, or gathered by or from patients (or family members or other caregivers) to help address a health concern” [11]. Table 24.1 outlines several of these data types and common examples [11, 12]. The ONC also distinguishes patient-generated health data from data captured in the formal care system: capturing them is the responsibility of the patient, and patients have autonomy over how they share these data with their providers [11]. When developing a trial that uses these data, the team should be sensitive to patients’ ownership of their data and clearly communicate the team’s intentions, expectations, and proposed uses for the data.

Table 24.1 Patient-generated health data types and examples

| Patient-generated health data type | Examples |
| --- | --- |
| Health history | Family history, personal conditions |
| Treatment history | Surgical history, counseling |
| Biometric data | Voice recordings, gait, hand or face geometry |
| Symptoms | Pain, nausea, mental health status |
| Lifestyle choices | Food journaling, exercise logs, smoking patterns |
| Self-reported outcomes | Satisfaction, intervention effectiveness |
| Care preferences | Communication style, notifications, health decision-maker |
| Personal profile | Values, goals, expectations |
| Multimedia observations | Video, voice, or picture recordings |
| Social tracking | Time spent on app, texting, visitors |
| Environmental tracking | Room temperature, humidity, light, noise, pollution |
| Activity | Movement, calories burned, sleep patterns, hygiene, steps |
| Physiological data | Weight, blood pressure, heart rate |

Apps make the collection of patient-generated data feasible. For example, the hardware found in most smartphones allows for the capture of data that otherwise required additional devices, using technologies such as GPS, pedometers, and sensors that can capture environmental and physiological data. An app can allow simple capture and export of data when they are needed, requiring minimal participant input and increasing data availability, quality, and efficiency of reporting [13].

Apps also make questionnaires, self-reported outcomes, and other data requiring intentional user input easier and more feasible to collect. An added function of apps over paper CRFs or eCRFs is the ability to remind patients to enter data. For example, if the app is built on a mobile platform, the user may be sent reminders to enter data at the needed intervals. Research on patient adherence has shown that such reminders are effective for both behavior change and data quality [14, 15].

The ability to structure questionnaires also gives several advantages to the research team. First, it facilitates simple standardization of forms across multiple sites, allowing for global changes and updates to the questionnaire if needed, as well as validation checks on data entry, which increase data quality. Second, the acquisition and collation of data from patients is easier, as patients and research sites do not need to independently transmit data. Third, it allows for conditional logic, giving researchers more control over data collection and minimizing patients’ data-entry fatigue.

In addition to patients, caregivers are also empowered by the use of apps in clinical research. There are many scenarios in which caregivers are responsible for the research activities of their loved ones (either in part or in whole), and mobile access to research data entry tools facilitates higher-quality data capture and less burden on patients and their caregivers. Caregivers may also be part of the study [16, 17] and require their own data collection tools. Apps allow for multiple, targeted data collection tools that can be modified based on the role of the caregiver and the data needed for the study. Involving caregivers may also improve patients’ compliance with, and continuation of, the study protocol.
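The validation checks and conditional logic mentioned above can be sketched as follows. The questions, keys, and rules are hypothetical and chosen only for illustration; a real study app would more likely use an established questionnaire framework than hand-written classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

Answers = Dict[str, Any]


@dataclass
class Question:
    key: str
    prompt: str
    validate: Callable[[Any], bool]                      # data-entry validation check
    show_if: Optional[Callable[[Answers], bool]] = None  # None means always shown


def visible_questions(questions: List[Question], answers: Answers) -> List[Question]:
    """Apply conditional logic: keep only questions relevant to the answers so far."""
    return [q for q in questions if q.show_if is None or q.show_if(answers)]


def validation_errors(questions: List[Question], answers: Answers) -> List[str]:
    """List missing or invalid answers among the questions that are shown."""
    errors = []
    for q in visible_questions(questions, answers):
        if q.key not in answers:
            errors.append(f"{q.key}: missing")
        elif not q.validate(answers[q.key]):
            errors.append(f"{q.key}: invalid")
    return errors


# Hypothetical two-item daily pain diary.
DIARY = [
    Question("pain_today", "Did you have pain today?", lambda v: isinstance(v, bool)),
    Question(
        "pain_score",
        "Rate your worst pain today (0-10).",
        # bool is a subclass of int in Python, so exclude it explicitly.
        lambda v: isinstance(v, int) and not isinstance(v, bool) and 0 <= v <= 10,
        show_if=lambda a: a.get("pain_today") is True,
    ),
]

if __name__ == "__main__":
    print(validation_errors(DIARY, {"pain_today": False}))
    print(validation_errors(DIARY, {"pain_today": True, "pain_score": 11}))
```

Because the follow-up item is shown only when it is relevant, a participant without pain answers one question instead of two, which is how conditional logic reduces data-entry burden.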

Clinician Generated Data

While we often quantify clinicians’ work based on what we are able to parse in the EHR, an increasing number of clinician-focused studies require data similar to what we have been collecting in patient-centered trials. The use of apps and associated devices has augmented our ability to study clinicians, both in relation to patient care and in relation to their well-being, as supported by the “quadruple aim” [18]. As smartphones are increasingly permitted during routine clinical care [19] (in years past, cell phones were prohibited or highly discouraged in healthcare settings), there are new opportunities to study clinicians’ workflows and collect data that have otherwise been unobtainable.

As with any data collection design or modality, the use of apps in a research study depends on the study’s goals and the data that are needed. In a study in which clinicians are the subjects, apps may be used to collect data in much the same way as they do for patients in a nonclinical setting: activity tracking, environment, and biometric data. If a study seeks feedback from a clinician, the app may be used to collect surveys and narrative information to assist in feasibility and usability studies.

Apps as the Intervention

When apps are used in clinical research, one of their roles is often to deliver or support the delivery of the primary intervention. Examples include apps that increase medication adherence, collect and monitor data from motion sensors, or implement a guided order set. Apps that are considered interventions or that deliver interventions may also simultaneously support administrative research functions, as outlined above. Table 24.2 illustrates some clinical research examples in which apps are used to deliver the primary intervention.

Table 24.2 Examples of clinical research protocols using apps as the intervention

| Study | Purpose | Intervention |
| --- | --- | --- |
| The NUDGE Pragmatic Trial [20] | To improve cardiovascular medication adherence | Mobile phone text messages and an artificial intelligence chatbot |
| Pain Monitor, a Smartphone App for Chronic Pain [21] | To improve chronic pain | Mobile app with self-reported pain monitoring and symptoms, telemonitoring services, and alerting of clinicians to pain events |
| DISCO App [22] | To reduce the financial burden of cancer | Mobile app with tailored education and assessments to assist in navigating financial burden related to cancer treatment |
| ICUConnect [23] | To reduce unmet palliative care needs, psychological distress, and healthcare resource utilization | Web- and mobile-based application with patient- and clinician-facing components to assess for caregiver needs related to palliative care and distress while guiding clinician action based on responses |
| selfBACK vs e-Help: Low Back Pain Management [24] | Testing two apps for effectiveness in reducing lower back and neck pain | Web-based app vs a mobile app; individualized and nonindividualized interventions for self-care management |
| eMums Plus [25] | To assist new mothers with postnatal depression and other parenting difficulties | Mobile application used for nurse-led online program accompanied by assessment of patient-reported outcomes |

The considerations for development of interventional apps are similar to those for apps designed for operational support. Data may be collected as part of the intervention, requiring thought about how the data will be stored and transferred. When an app is the primary intervention, however, the technical work needed to build it may differ significantly. Researchers should also be aware of the additional security and privacy risks and engage their development team early and often to mitigate potential issues (see section “Security, Sharing, and Privacy Considerations”).

A significant advantage of using apps in clinical trials to deliver the intervention is the ability to standardize and track the intervention across several sites. When an app is built on a standard such as FHIR, the eventual dissemination of the intervention also becomes much more feasible. However, even if the app is built as a web portal or standalone app, the research team can have more control over the standardization of the intervention, making change management simple and bolstering internal validity. Regardless, there are many options for developing an app to suit the intervention of the study and the resources of the research team.

Building Apps

When deciding to build an app to support clinical research, one of the first decisions is choosing the platform on which the app will be built. This depends largely on the format of the app: whether it is web-based, a mobile app, or hosted through an EHR. Web-based apps vary widely in how they may be developed, depending largely on the level of functionality that is needed. Third-party development services are available, or the research team may have internal talent able to build the app from the ground up. At times, researchers opt to work with external companies to relieve the effort of ensuring technical conformity to HIPAA protections and other security considerations, including user access permissions and database access and management. For mobile apps and EHR-based apps, there are several solutions available that may be more accessible for most medical system-based researchers.

Standards for App Development

To develop an interoperable, scalable app, standards should be central to the research team’s conversations about the app’s development from early on. Standards for apps can be categorized as either messaging standards (communicating and receiving data) or content standards. Both are equally important in ensuring that the app can be used at multiple sites and disseminated easily if found to be effective.

SMART on FHIR

Substitutable Medical Applications, Reusable Technologies (SMART) on Fast Healthcare Interoperability Resources (FHIR) is a Health Level Seven (HL7) standard that supports several different APIs and data exchange between apps and the EHR. Simply put, SMART is the infrastructure and FHIR facilitates the data exchange. The 21st Century Cures Act [6] endorses SMART on FHIR as a standard to promote app interoperability, cementing its place in US-based app development. Using SMART on FHIR to develop an app to support clinical research will help to meet HIPAA and security requirements and allow for easier access to data. As EHR vendors and health systems are adopting FHIR at a rapid pace, using SMART on FHIR will also promote easier dissemination post-trial.

FHIR is a messaging standard that operates on the principle of “resources,” which are categories of clinical and operational data, such as patient, observation, or allergy (over 100 are currently in use) [26]. The FHIR standard specification defines the format and structure of “messages” between EHRs, apps, and other data source APIs, providing constraints and data definitions for acceptable responses. It includes required and optional data elements for each FHIR resource, hence providing a clear, standardized format for app developers to follow. The standard also outlines profiles and extensions, which further constrain and define data requirements, promoting a design closer to “plug-and-play.”

One limitation is that to use FHIR apps, the local EHR must be FHIR-enabled, which is not the case in all organizations because supporting the standard costs money and resources. However, EHR vendors now offer this functionality regularly, so this issue is becoming less common. There are also some issues with versioning: because FHIR is continually updated, health systems will most likely run different versions, which may limit FHIR resource availability or cause instability when an app built for a different version is implemented.

FHIR also offers functionality such as CDS Hooks, which can trigger third-party clinical decision support (CDS) services based on common clinical actions such as “open patient record” or “order medication.” Also mentioned in the 21st Century Cures Act, CDS Hooks can be a powerful tool to leverage in apps designed for clinical research, as a CDS tool developed independently can be operationalized as an interoperable health app. This can be especially helpful if a CDS tool was developed for a feasibility study and the research team wishes to later leverage the tool at scale across several different sites. Using CDS Hooks can save time and resources, leveraging existing CDS within a FHIR app that can be integrated into an EHR [27]. At present, CDS Hooks is being used for provider-facing apps, but the need for patient involvement may catalyze expanded functionality.
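To make the trigger-and-respond pattern concrete, the following sketch shows how a decision support service might answer a request fired when a clinician opens a patient record. The request and card field names (`hook`, `context.patientId`, `cards`, `summary`, `indicator`, `source.label`) follow our reading of the CDS Hooks specification and should be checked against the current version; the service name, the set of candidate patients, and the function itself are hypothetical.

```python
from typing import Dict, Set


def handle_hook(request: Dict, candidate_patient_ids: Set[str]) -> Dict:
    """Build a CDS Hooks-style response for a `patient-view` request.

    `candidate_patient_ids` is a hypothetical, precomputed set of patients
    who passed study prescreening; a real service would query its own store.
    """
    cards = []
    if request.get("hook") == "patient-view":
        patient_id = request.get("context", {}).get("patientId")
        if patient_id in candidate_patient_ids:
            cards.append({
                "summary": "Patient may be eligible for a research study",
                "indicator": "info",  # informational, not a warning
                "source": {"label": "Hypothetical Trial Screening Service"},
            })
    # An empty card list tells the EHR there is nothing to display.
    return {"cards": cards}


if __name__ == "__main__":
    request = {
        "hook": "patient-view",
        "hookInstance": "example-instance-id",
        "context": {"userId": "Practitioner/example", "patientId": "patient-123"},
    }
    print(handle_hook(request, {"patient-123"}))
```

Because the EHR only needs to know where to send the request and how to render the returned cards, the same service could in principle be reused across sites, which is the scaling advantage described above.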


Content Standards

Content standards include terminologies such as SNOMED CT, LOINC, and ICD-10, as well as definitions and constraints of messaging standards, such as United States Core Data for Interoperability (USCDI) profiles for FHIR. Regardless of the tool, including such standards is extremely important, as it allows for data mapping, enables pooled analysis by cross-referencing observations, and makes conforming to messaging standards easier.

Terminology standards may be used to define a set of responses and standardize the data required by the app. If data requirements are defined using terminology standards, converting data from disparate EHR systems becomes more feasible because different terminologies can be mapped to one another. If standards are not used, manual mapping will be required, which costs time and resources. LOINC also contains functionality to standardize question and response data [28], which is a common feature in most clinical research apps.

The USCDI from the ONC [29] defines health data elements for nationwide interoperable health information exchange. These requirements are leveraged by FHIR US Core profiles, which operationalize them into a FHIR message. Using these profiles not only provides the benefits of FHIR but also conforms to national data representation requirements, which makes it more feasible to scale the app after the trial concludes. The USCDI and associated FHIR profiles are always expanding, so if data required by an app under development are not yet defined, it is important to check back frequently.
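The benefit of mapping to a shared terminology can be shown with a small sketch in which two sites record the same measurements under different local codes. The site dictionaries and local codes are invented for illustration; the LOINC codes are the standard codes for systolic blood pressure (8480-6), heart rate (8867-4), and body weight (29463-7).

```python
from typing import Dict, List, Tuple

Row = Tuple[str, float]  # (code, value)

# Hypothetical site-specific codes mapped to LOINC.
SITE_A = {"BP_SYS": "8480-6", "HR": "8867-4"}
SITE_B = {"sbp": "8480-6", "pulse": "8867-4", "wt": "29463-7"}


def to_standard(rows: List[Row], local_to_loinc: Dict[str, str]) -> Tuple[List[Row], List[str]]:
    """Translate (local_code, value) rows to (loinc_code, value) rows.

    Codes without a mapping are returned separately so that they can be
    reviewed and mapped manually.
    """
    mapped, unmapped = [], []
    for code, value in rows:
        if code in local_to_loinc:
            mapped.append((local_to_loinc[code], value))
        else:
            unmapped.append(code)
    return mapped, unmapped


def pool(sites: List[Tuple[List[Row], Dict[str, str]]]) -> Dict[str, List[float]]:
    """Group values from all sites under their shared LOINC code."""
    pooled: Dict[str, List[float]] = {}
    for rows, mapping in sites:
        mapped, _ = to_standard(rows, mapping)
        for loinc, value in mapped:
            pooled.setdefault(loinc, []).append(value)
    return pooled


if __name__ == "__main__":
    site_a_rows = [("BP_SYS", 150), ("HR", 72)]
    site_b_rows = [("sbp", 138), ("temp", 37.0)]
    print(pool([(site_a_rows, SITE_A), (site_b_rows, SITE_B)]))
    print(to_standard(site_b_rows, SITE_B)[1])
```

The unmapped list makes the cost of nonstandard data visible: every code that lands there needs the manual mapping effort described above.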

Development Tools and Hosting Platforms While it is possible to build an app from the ground up, there are several educational resources and services to help researchers and developers create an app for use in clinical research. If data from a mobile phone is needed, one option may be to use Apple HealthKit [30]. This, of course, is limited to the iOS operating system, but other options for such toolkits are becoming available, such as Google Fit [31]. Such options are tied to certain software and hardware, so they can be limiting if other functionality is needed. Third-party open-access platforms are also available, with Open mHealth being a popular option with a growing community [32]. Open mHealth provides documentation and tools to develop app-based tools for use in clinical research, facilitating access to EHR data and other sources, such as through Apple's toolkit. Documentation is also available to help with data standardization, storage, sharing, and visualization. Such open-access all-in-one solutions are appealing as they minimize costs when conducting research. There is also a robust community available for support during development. If an EHR-based or EHR-integrated app is needed, the research team may rely solely on the functionality of the EHR, or they may also leverage standards such as FHIR. Regardless, major EHR vendors have app stores which allow their communities to disseminate and implement apps more easily. One example is the Epic App Market [33]; Epic and its customers may host and acquire apps which take advantage of several different Epic APIs. Sandboxes are available, as well as support for developers from both Epic staff and community experts. Cerner also has a similar service, the app gallery, which supports FHIR apps and offers tools to developers [34]. The EHR-based galleries are convenient and easier to integrate, but they are limited to particular EHR vendor systems and may or may not use common standards for context or information exchange outside of that vendor ecosystem. The SMART App Gallery is a vendor-agnostic site hosted by the Boston Children’s Hospital Computational Health Informatics Program, which hosts SMART on FHIR apps and links to creator websites and demos [35].
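Before an app is pointed at a vendor sandbox or a production server, one practical check is whether that server advertises the FHIR version and resource types the app depends on. FHIR servers publish this in a CapabilityStatement. The sketch below inspects an already-retrieved statement, so no network call is made; the sample statement, the function name `check_server`, and the choice to compare only the major version number are assumptions made for illustration.

```python
def check_server(capability: dict, expected_version: str, needed: set) -> list:
    """Return a list of human-readable problems; an empty list means compatible."""
    problems = []
    version = capability.get("fhirVersion", "")
    # Simplification: compare only the major version, e.g. "4" in "4.0.1".
    if version.split(".")[0] != expected_version.split(".")[0]:
        problems.append(
            f"server FHIR version {version!r} differs from expected {expected_version!r}"
        )
    # Collect every resource type the server says it supports.
    supported = {
        res["type"]
        for rest in capability.get("rest", [])
        for res in rest.get("resource", [])
    }
    for resource_type in sorted(needed - supported):
        problems.append(f"resource type {resource_type} not advertised by server")
    return problems


# Hypothetical, heavily trimmed CapabilityStatement for demonstration.
sample_statement = {
    "resourceType": "CapabilityStatement",
    "fhirVersion": "4.0.1",
    "rest": [{"mode": "server", "resource": [{"type": "Patient"}, {"type": "Observation"}]}],
}

for problem in check_server(sample_statement, "4.0.1", {"Patient", "Observation", "Questionnaire"}):
    print(problem)
```

Running a check like this at each participating site would surface the versioning and resource availability differences described earlier before they appear as runtime failures.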

Security, Sharing, and Privacy Considerations Unfortunately, as with all technology, app designers, builders, and hosts must be aware of the vulnerabilities of apps and understand the potential risks for data breaches and other negative outcomes. While clinical research teams do not often include experts in digital security, it is strongly encouraged that the team consult with app developers and security experts to minimize the chance of data breaches, injection attacks, and other compromising activities committed by bad actors. Considerations in the design of the app and credentialing processes must focus on protecting sensitive health data and complying with local and federal law.
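As one small example of the kind of defensive practice a security consultation might cover, the sketch below shows a parameterized database query, a common guard against SQL injection. It uses Python's built-in `sqlite3` module with an in-memory database; the table, column names, and participant identifiers are hypothetical, and a real research app would need far more than this single measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (participant_id TEXT, answer TEXT)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [("p001", "yes"), ("p002", "no")],
)


def answers_for(participant_id: str) -> list:
    """Return the stored answers for exactly one participant."""
    # The ? placeholder passes the value separately from the SQL text,
    # so the input is treated as data and is never parsed as SQL.
    rows = conn.execute(
        "SELECT answer FROM responses WHERE participant_id = ?",
        (participant_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(answers_for("p001"))             # ['yes']
print(answers_for("p001' OR '1'='1"))  # []
```

The second call passes a classic injection string. Because it is bound as a value, it matches no participant and returns nothing instead of exposing every row.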

Security Guidance While HIPAA guides many of the requirements around what constitutes protected data and reporting requirements in the United States, the requirements regarding app security design are less clear. This problem is compounded by rapid changes in technology and security protocols, with which government regulation cannot keep pace. It often falls upon the app developer to follow best practices that keep patient data secure. Despite these challenges, some guidance does exist from the ONC, whose guidelines on privacy and security considerations for healthcare APIs were last updated in 2020 [36]. The document outlines several topics, including patient right of access, guidance for organizational privacy policies, encryption, access controls, data integrity, and patient portal security. This document may be helpful in guiding security and privacy decisions, but its technical specifications are again at risk of becoming quickly outdated. Even when following the most up-to-date guidelines and using vetted infrastructure, vulnerabilities still exist. In 2021, Alissa Knight, a white-hat hacker (an individual who probes security vulnerabilities in good faith to alert developers to weaknesses), published a white paper outlining vulnerabilities in certain FHIR implementations [37]. While HL7 (the standards organization that develops FHIR) noted that such vulnerabilities were due to third-party implementations [38], it is reasonable to continually assess for security vulnerabilities, even when using trustworthy products.

Open and Closed Data Security needs differ depending on the app being designed. “Open data” refers to less restrictive modes of data access, such as caretakers and family members having access to patient data. “Closed data” refers to data with tight restrictions, often only able to be accessed from behind a health system’s firewall. The precedent of open data, however, has been reinforced by the 21st Century Cures Act [6], which has called for health systems to give patients immediate access to their health data. Beyond governmental intervention, patients desire increased autonomy in their medical decision-making and want more access to their health data [39]. When designing an app that collects patient data, the research team should be aware that open data is becoming more favorable in both legislation and public opinion, but such models are more vulnerable to security threats. Aside from patients accessing their own data, research databanks, repositories, and registries (see Chaps. 13 and 17) are increasingly interested in the rich data resulting from the use of apps in clinical research. Patient perspectives on data sharing are changing, and many patients are now willing to share health data and repurpose biospecimens for research [40]. However, debates are still ongoing on whether opt-in or opt-out consent for use of EHR data should be used, and patient views are mixed [41]. Actively gaining permission to use data for research is not ideal, but apps may make this process more scalable. For instance, if the mechanism to share data is included in the functionality of the app, this can save time for the research team (or health system if put into large-scale production) and gives patients a greater sense of autonomy over their health data. If patients have a greater sense of control over their data, they are often more willing to share data for research [41, 42].
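The practical difference between opt-in and opt-out consent is how a missing answer is treated. The sketch below, using hypothetical class, field, and function names, shows how an in-app sharing preference could be applied under either policy. It is an illustration of the logic only and is not a statement of what any regulation requires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Participant:
    participant_id: str
    # True = agreed to share, False = declined, None = never answered.
    research_sharing: Optional[bool] = None


def shareable(participants: List[Participant], policy: str) -> List[str]:
    """Return IDs whose data may be shared for research under the given policy."""
    if policy not in ("opt-in", "opt-out"):
        raise ValueError(f"unknown policy: {policy}")
    if policy == "opt-in":
        # Only an explicit yes counts; silence means no.
        return [p.participant_id for p in participants if p.research_sharing is True]
    # Opt-out: everyone is included unless they explicitly declined.
    return [p.participant_id for p in participants if p.research_sharing is not False]


cohort = [
    Participant("a", True),
    Participant("b", False),
    Participant("c"),  # never answered
]
print(shareable(cohort, "opt-in"))   # ['a']
print(shareable(cohort, "opt-out"))  # ['a', 'c']
```

Participant `c`, who never answered, is the case the two policies disagree on. Capturing an explicit answer inside the app reduces the number of such undetermined participants.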


The Ecosystem of Apps and Electronic Health Records

The introduction of apps in conjunction with the electronic health record has created a rapidly changing paradigm. At the inception of EHRs, the entire infrastructure was built from the ground up by the health system. Such systems included Vanderbilt’s StarPanel, Regenstrief’s first-of-its-kind RMRS (Regenstrief Medical Record System), and the national US Veterans Affairs’ system, VistA (Veterans Health Information System and Technology Architecture). From database management to order entry, almost all the effort to develop these systems fell on the shoulders of each health system. Over time, third-party solutions became available in the form of knowledge management, such as drug-drug interactions that could be utilized in clinical decision support. Today, most major health systems have transitioned to a major vendor, such as Epic, Cerner, or Allscripts. Many of these knowledge management systems are integrated and accessible and often come as out-of-the-box tools that are usable immediately after implementation. The ability to author new CDS, documentation templates, order sets, and other tools has also been enhanced and streamlined by modern EHRs, but much of this effort remains the responsibility of each health system. Unfortunately, the products generated by such work often remain siloed and not distributed.

Shareability and Data Ownership Apps have provided a unique opportunity to expand the shareability and scalability of knowledge management and clinical tools by changing the focus of where such tools are developed. Instead of being EHR specific, apps are often created using EHR-agnostic standards (such as FHIR) and provide more opportunity for adoption by other health systems and sites with dissimilar EHR infrastructure. Apps also exist outside of the EHR and are built to both transfer and receive data, allowing for nonclinical users such as patients and caregivers to generate data and interact with the formal healthcare system like never before. This, in part, empowers patients and is beginning to rewrite the role that patients play in research and how we perceive ownership of data. We may soon see patient-owned apps that access their health data via the EHR, opening a new frontier of citizen science and engaged health consumers. Apps may also drive a decentralized health record, where health data ownership shifts toward patients and caregivers, resembling more of a “personal health record” [43].

The prospect of an increased use of apps in clinical research also calls into question who owns the data and who is responsible for maintaining it over time. This idea of “data provenance” potentially impacts data availability and quality [44] but also may support a decentralized EHR through the use of standards and possibly blockchain technology [45]. If patients gain more access to their health data through apps, it may be assumed that it is their responsibility to keep data current and accurate, especially if the data is intended to be shared with the health system or a research team. As we develop apps for clinical research, we must consider this increased patient autonomy and build in educational and supportive tools to assure patients are able to maintain proper provenance, quality, literacy, and security of their data.

Future Directions Apps are helping to change the landscape in how we conduct and design research. They help to expand our reach to clinicians, patients, and caregivers, providing a unique platform to collect data that we have otherwise missed through traditional EHR documentation methods. They also give us the ability to implement clinical research interventions, recruit patients, and support operational activities more easily. Although there are many benefits, we must be cognizant of security and privacy issues when designing and implementing them.


As apps become more commonplace in clinical research, we must be familiar with the technical and terminology standards behind them. By promoting open, interoperable design (such as through FHIR), the clinical research community may more easily share tools, improve upon existing apps, and disseminate effective interventions with plug-and-play operational support. Perhaps even more paramount to the research process is managing tensions around data access, where we must simultaneously protect the security of patient data while also promoting expanded access to apps for use in clinical research. Future app development should promote all the principles outlined in this chapter and continue to promote the acquisition of important data types and support the implementation of the most effective, equitable interventions.

References

1. Tangari G, Ikram M, Ijaz K, Kaafar MA, Berkovsky S. Mobile health and privacy: cross sectional study. BMJ. 2021;373:n1248. https://doi.org/10.1136/bmj.n1248.
2. McManus RJ, Little P, Stuart B, et al. Home and online management and evaluation of blood pressure (HOME BP) using a digital intervention in poorly controlled hypertension: randomised controlled trial. BMJ. 2021;372:m4858. https://doi.org/10.1136/bmj.m4858.
3. Eberle C, Löhnert M, Stichling S. Effectiveness of disease-specific mHealth apps in patients with diabetes mellitus: scoping review. JMIR Mhealth Uhealth. 2021;9(2):e23477. https://doi.org/10.2196/23477.
4. Duncan MJ, Fenton S, Brown WJ, et al. Efficacy of a multi-component m-health weight-loss intervention in overweight and obese adults: a randomised controlled trial. Int J Environ Res Public Health. 2020;17(17). https://doi.org/10.3390/ijerph17176200.
5. U.S. Food and Drug Administration. 21st Century Cures Act. 31 Jan 2020. https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act.
6. Office of the National Coordinator for Health Information Technology. 21st Century Cures Act: Interoperability, Information Blocking, and the ONC Health IT Certification Program. https://www.federalregister.gov/documents/2020/05/01/2020-07419/21st-century-cures-act-interoperability-information-blocking-and-the-onc-health-it-certification.
7. Tu SW, Peleg M, Carini S, et al. A practical method for transforming free-text eligibility criteria into computable criteria. J Biomed Inform. 2011;44(2):239–50. https://doi.org/10.1016/j.jbi.2010.09.007.
8. Gonzalez M, Carlino MS, Zielinski RR, et al. An app to increase cross-referral and recruitment to melanoma clinical trials. J Clin Oncol. 2016;34(15_Suppl):9590. https://doi.org/10.1200/JCO.2016.34.15_suppl.9590.
9. Fleischmann R, Decker AM, Kraft A, Mai K, Schmidt S. Mobile electronic versus paper case report forms in clinical trials: a randomized controlled trial. BMC Med Res Methodol. 2017;17(1):153. https://doi.org/10.1186/s12874-017-0429-y.
10. Zong N, Wen A, Stone DJ, et al. Developing an FHIR-based computational pipeline for automatic population of case report forms for colorectal cancer clinical trials using electronic health records. JCO Clin Cancer Inform. 2020;4:201–209. https://doi.org/10.1200/cci.19.00116.
11. Office of the National Coordinator. What are patient-generated health data? https://www.healthit.gov/topic/otherhot-topics/what-are-patient-generated-health-data.
12. Demiris G, Iribarren SJ, Sward K, Lee S, Yang R. Patient generated health data use in clinical practice: a systematic review. Nurs Outlook. 2019;67(4):311–330. https://doi.org/10.1016/j.outlook.2019.04.005.
13. van Dam J, Omondi Onyango K, Midamba B, et al. Open-source mobile digital platform for clinical trial data collection in low-resource settings. BMJ Innov. 2017;3(1):26. https://doi.org/10.1136/bmjinnov-2016-000164.
14. Gurol-Urganci I, de Jongh T, Vodopivec-Jamsek V, Atun R, Car J. Mobile phone messaging reminders for attendance at healthcare appointments. Cochrane Database Syst Rev. 2013;2013(12):CD007458. https://doi.org/10.1002/14651858.CD007458.pub3.
15. Lam TYT, Wu PI, Tang RSY, Luk AKC, Ng S, Sung JJY. Mobile messenger-initiated reminders improve longitudinal adherence in a community-based, opportunistic colorectal cancer screening program: a single-blind, crossover randomized controlled study. Cancer. 2021;127(6):914–921. https://doi.org/10.1002/cncr.33336.
16. Graff MJ, Vernooij-Dassen MJ, Thijssen M, Dekker J, Hoefnagels WH, Rikkert MG. Community based occupational therapy for patients with dementia and their care givers: randomised controlled trial. BMJ (Clin Res Ed). 2006;333(7580):1196. https://doi.org/10.1136/bmj.39001.688843.BE.
17. Alam S, Hannon B, Zimmermann C. Palliative care for family caregivers. J Clin Oncol. 2020;38(9):926–936. https://doi.org/10.1200/jco.19.00018.
18. Arnetz BB, Goetz CM, Arnetz JE, et al. Enhancing healthcare efficiency to achieve the quadruple aim: an exploratory study. BMC Res Notes. 2020;13(1):362. https://doi.org/10.1186/s13104-020-05199-8.
19. de Jong A, Donelle L, Kerr M. Nurses’ use of personal smartphone technology in the workplace: scoping review. JMIR Mhealth Uhealth. 2020;8(11):e18774. https://doi.org/10.2196/18774.
20. Glasgow RE, Knoepke CE, Magid D, et al. The NUDGE trial pragmatic trial to enhance cardiovascular medication adherence: study protocol for a randomized controlled trial. Trials. 2021;22(1):528. https://doi.org/10.1186/s13063-021-05453-9.
21. Suso-Ribera C, Mesas Á, Medel J, et al. Improving pain treatment with a smartphone app: study protocol for a randomized controlled trial. Trials. 2018;19(1):145. https://doi.org/10.1186/s13063-018-2539-1.
22. Hamel LM, Dougherty DW, Kim S, et al. DISCO app: study protocol for a randomized controlled trial to test the effectiveness of a patient intervention to reduce the financial burden of cancer in a diverse patient population. Trials. 2021;22(1):636. https://doi.org/10.1186/s13063-021-05593-y.
23. Cox CE, Riley IL, Ashana DC, et al. Improving racial disparities in unmet palliative care needs among intensive care unit family members with a needs-targeted app intervention: the ICUconnect randomized clinical trial. Contemp Clin Trials. 2021;103:106319. https://doi.org/10.1016/j.cct.2021.106319.
24. Marcuzzi A, Bach K, Nordstoga AL, et al. Individually tailored self-management app-based intervention (selfBACK) versus a self-management web-based intervention (e-Help) or usual care in people with low back and neck pain referred to secondary care: protocol for a multiarm randomised clinical trial. BMJ Open. 2021;11(9):e047921. https://doi.org/10.1136/bmjopen-2020-047921.
25. Sawyer A, Kaim A, Le HN, et al. The effectiveness of an app-based nurse-moderated program for new mothers with depression and parenting problems (eMums plus): pragmatic randomized controlled trial. J Med Internet Res. 2019;21(6):e13689. https://doi.org/10.2196/13689.
26. Health Level 7. FHIR overview. https://www.hl7.org/fhir/overview.html.
27. Health Level 7. Clinical decision support. https://build.fhir.org/clinicalreasoning-cds-on-fhir.html.
28. LOINC. Answer File. https://loinc.org/answer-file/.
29. Office of the National Coordinator for Health Information Technology. United States Core Data for Interoperability (USCDI). https://www.healthit.gov/isa/united-states-core-data-interoperability-uscdi.
30. Apple. HealthKit. https://developer.apple.com/documentation/healthkit.
31. Google. Google Fit. https://developers.google.com/fit.
32. Open mHealth. Open mHealth. https://www.openmhealth.org/.
33. Epic. Epic App Market. https://appmarket.epic.com/.
34. Cerner. App gallery. https://code.cerner.com/apps.
35. Boston Children’s Hospital Computational Health Informatics Program. SMART App Gallery. https://apps.smarthealthit.org/apps/featured.
36. The Office of the National Coordinator. Key privacy and security considerations for healthcare application programming interfaces (APIs). https://www.hhs.gov/guidance/document/key-privacy-and-security-considerations-healthcare-application-programming-interfaces-apis.
37. Jercich K. Playing with FHIR? Don’t get burned, white-hat hacker cautions. https://www.healthcareitnews.com/news/playing-fhir-dont-get-burned-white-hat-hacker-cautions.
38. Health Level 7. Statement to the global community from HL7 International on the paper “Playing with FHIR: Hacking and Securing FHIR APIs”. https://blog.hl7.org/statement-to-the-global-community-from-hl7-international-on-the-paper-playing-with-fhir-hacking-and-securing-fhir-apis.
39. The Pew Charitable Trusts. Most Americans want to share and access more digital health data. https://www.pewtrusts.org/en/research-and-analysis/issue-briefs/2021/07/most-americans-want-to-share-and-access-more-digital-health-data.
40. Kim J, Kim H, Bell E, et al. Patient perspectives about decisions to share medical data and biospecimens for research. JAMA Netw Open. 2019;2(8):e199550. https://doi.org/10.1001/jamanetworkopen.2019.9550.
41. Hammack-Aviran CM, Brelsford KM, McKenna KC, Graham RD, Lampron ZM, Beskow LM. Research use of electronic health records: patients’ views on alternative approaches to permission. AJOB Empirical Bioethics. 2020;11(3):172–86. https://doi.org/10.1080/23294515.2020.1755383.
42. Naeem I, Quan H, Singh S, et al. Factors associated with willingness to share health information: rapid review. JMIR Hum Factors. 2022;9(1):e20702. https://doi.org/10.2196/20702.
43. Roehrs A, da Costa CA, Righi RD, de Oliveira KS. Personal health records: a systematic literature review. J Med Internet Res. 2017;19(1):e13. https://doi.org/10.2196/jmir.5876.
44. Douthit BJ, Del Fiol G, Staes CJ, Docherty SL, Richesson RL. A conceptual framework of data readiness: the contextual intersection of quality, availability, interoperability, and provenance. Appl Clin Inform. 2021;12(3):675–685. https://doi.org/10.1055/s-0041-1732423.
45. Margheri A, Masi M, Miladi A, Sassone V, Rosenzweig J. Decentralised provenance for healthcare data. Int J Med Inform. 2020;141:104197. https://doi.org/10.1016/j.ijmedinf.2020.104197.

Future Directions in Clinical Research Informatics

25

Peter J. Embi and Rachel L. Richesson

Abstract

As with the rest of healthcare and biomedicine, the COVID-19 pandemic has had profound impacts on biomedical informatics and CRI. In some ways, advances in research informatics capabilities over the recent past enabled responses including data sharing across sites for tracking and studying the effects of the pandemic as well as helping speed the development of vaccines. However, the pandemic also illuminated in stark ways the deep inequities in healthcare and research present in our systems. Despite global urgency, international will, and unprecedented motivation and collaboration toward developing tests, vaccines, and therapies, and with some important exceptions, clinical research was still relatively slow and costly. Moreover, even when vaccines and treatments were demonstrated safe and effective enough for emergency use authorization, ineffective communication and lack of public understanding and trust hampered the widespread translation of discoveries into practice, particularly among vulnerable and historically underserved populations. While there are myriad reasons for this that will likely become better understood over time, lessons learned from this experience will undoubtedly inform the future agenda for CRI, including a need for clarity and transparency of CRI methods to gain public trust in the integrity and value of research. If there was any doubt, it is now quite evident that efficient and effective approaches for clinical research are needed to enable the development, evaluation, deployment, and ongoing monitoring of cost-effective therapies. Indeed, this need is more important now than ever before and will continue to grow. Rapid advances in biomedical science, the growth of the human population, and escalating costs of healthcare are all fueling the need to accelerate the pace of biomedical discoveries and their translation into healthcare practice. Furthermore, the fundamentally information-intensive nature of clinical research endeavors and the growth in both health technology adoption and health-related data available for analytics and tailored interventions beg for the solutions offered by CRI. As a result, the demand for informatics professionals who focus on the increasingly important field of clinical and translational research has increased. Despite the tremendous progress made to date, new models, tools, and approaches will be needed to fully leverage and mine these digital assets and improve CRI practice, and this innovation will continue to drive the field forward in the coming years.

Keywords

Clinical research informatics · Biomedical informatics · Translational research · Electronic health records · Future trends · US policy initiatives · Health IT infrastructure · Data analytics · Data science · Learning health systems · Evidence-generating medicine

Learning Objectives

1. Describe the value of Clinical Research Informatics (CRI) to different types of research activities across the translational science spectrum and describe the relationship of CRI to other subdomains of biomedical informatics.
2. Discuss the role of CRI in learning health systems, including activities related to evidence generation and evidence-based care, as well as the potential impact that CRI science, infrastructure, and professionals can have on population health and health equity.
3. Discuss the evolution of CRI as a scientific discipline and the emerging challenges for CRI and describe the current and future training and workforce needs for CRI.

P. J. Embi (*), Vanderbilt University Medical Center, Nashville, TN, USA
R. L. Richesson, Department of Learning Health Sciences, University of Michigan Medical School, Ann Arbor, MI, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1_25

Emergence of CRI Discipline Supporting Clinical and Translational Research As evidenced by the third edition of this book, in the past decade, clinical research informatics (CRI) has clearly become established as a distinct and important biomedical informatics subdiscipline [1]. Given that clinical research is a complex, information- and resource-intensive endeavor, one composed of a multitude of actors, workflows, processes, and information resources, this is not a surprise. As described throughout the text, the myriad stakeholders in CRI, as well as their roles in the healthcare, research, and informatics enterprises, are continually evolving, fueled by technological, scientific, and socioeconomic changes. The changing roles in healthcare and biomedical research bring new challenges for research conduct and coordination but also bring potential for new research efficiencies, more rapid translation of results to practice, and enhanced patient benefits as a result of increased transparency, more meaningful participation, and increased safety. As Fig. 25.1 depicts, the pathway from biological discovery to public health impact (the phases of translational research) clearly is served by informatics applications and professionals working in the different subdomains of biomedical informatics. While descriptions of the translational spectrum have evolved since we produced this figure for the manuscript that defined the field, it still depicts the positioning of CRI amidst related informatics subdomains. Given that all of these endeavors rely on data, information, and knowledge for their success, informatics approaches, theories, and resources have been and will continue to be essential to driving advances from discovery to global health. Indeed, informatics issues are at the heart of realizing many of the goals for the research enterprise.

Fig. 25.1 Clinical and translational science spectrum research and informatics. This figure illustrates examples of research across the translational science spectrum and the relationships between CRI and the other subdomains of translational bioinformatics, clinical informatics, and public health informatics as applied to those efforts. (From Embi and Payne [1], with permission)

Initiatives, Policy, and Regulatory Trends in CRI

As has been the case for the past two decades, recent years have seen the emergence and expansion of national and international research initiatives, as well as policy and regulatory efforts focused on accelerating and improving clinical research capacity and capabilities. Indeed, a range of initiatives funded by US health and human service agencies are helping to advance the field. These include initiatives by the US National Institutes of Health (NIH), including important efforts related to the NIH Clinical and Translational Science Award (CTSA) [2, 3] programs and the establishment of visible and well-funded data science initiatives at NLM and across the NIH. Increased funding as a result of the 21st Century Cures Act toward specific initiatives like the Cancer Moonshot and the evolution of the All of Us Research Program for advancing precision and personalized medicine are other examples of this continued progress.

As part of such initiatives, a focus on innovations for creating data resources and promoting common data models that will accelerate research on persistent and pervasive public health concerns such as cancer and chronic diseases continues to grow. Additionally, the Cures Act and its programs are empowering and enabling patients to access and share their own health data, including for research, promising to radically change the ways research studies are designed and conducted and further increasing opportunities for patients to participate in research and to influence what research is studied. As research volume and activities progress and evolve, critical CRI innovations do as well and with them the capabilities, professional development, education, and training around the practices and science of research informatics.

The CTSA program in particular has fostered perhaps the most significant growth in both the science and practice of CRI and the closely related domains of translational research informatics, translational bioinformatics, and biomedical data science. In the past few years, this progress has continued with such efforts as data-sharing networks (e.g., ACTS) and work on interorganizational coordination of CRI activities (e.g., CD2H, Recruitment Innovation Center, Trial Innovation Centers), all of which foster informatics innovations to support pragmatic and multisite clinical research as well as recruitment innovations [4]. CTSA efforts have also continued to support CRI-related education and training (e.g., CLIC informatics videos) [5, 6]. Other NIH activities advancing efforts related to “big data” and “data science” also have direct relevance to CRI [7, 8]. The growth of data science is illustrated by the completion of the Big Data to Knowledge (BD2K) awards: a first phase designed to stimulate data-driven discovery via innovative methods, software, and training and, more recently, a second phase of awards designed to make the aforementioned products of research usable, discoverable, and broadly disseminated, embracing approaches that make biomedical data findable, accessible, interoperable, and reusable, or “FAIR” [9]. Additionally, other CRI-related efforts led by institutes like the National Cancer Institute (NCI) [10–13] and National Library of Medicine [14, 15] will continue to advance work in the field. Beyond NIH, funders like the Agency for Healthcare Research and Quality (AHRQ) and the Patient-Centered Outcomes Research Institute (PCORI) are also driving advances in research data methods and techniques for CRI-related efforts, including comparative effectiveness and health services research [16–18].
In addition to such initiatives focused on advancing the science and practice of CRI, investments by institutions and by the government through the US Department of Health and Human Services (DHHS), the US Office of the National Coordinator for Health Information Technology (ONC), and the US Centers for Medicare and Medicaid Services (CMS) have incentivized the adoption and “meaningful use” of electronic health records (EHRs). The Medicare Access and CHIP Reauthorization Act of 2015 (MACRA) emphasizes the use of patient registries for quality measurement and reporting. The resultant widespread health IT infrastructure now in place, while initially focused primarily on improving patient care, is starting to enable interoperable infrastructure that allows for data reuse across research networks [19–21]. Although these began as separate efforts, recent work to translate between prevailing data models and adopt common interchange standards, as well as updates to antiquated regulatory structures, should increase interactions and enable more robust reuse of data and information from clinical care for public health and research improvements. The rapid development and adoption of the HL7 FHIR standard offers potential to transform clinical research and is garnering broad participation from the pharmaceutical industry, research sponsors and regulators, health system leaders, and consumer and patient advocates in a number of projects through the Vulcan FHIR Accelerator initiative [22]. Vulcan projects include the development of FHIR resources and guidance to address the collection of RWD for research, the exchange of phenotype definitions, reporting of adverse events, and patient access to electronic information on medicinal products. The importance of the FHIR standard to clinical research is also emphasized in the 2020 report from the ONC on National Health IT Priorities for Research, which calls for improvements in data quality, harmonization, access to interoperable data, services for efficient storage and data, integration with other health data sources, and data aggregation [23].
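To illustrate what an exchangeable phenotype definition can look like once clinical data are coded with shared terminologies, the sketch below expresses a deliberately simplified type 2 diabetes phenotype as a rule over coded records. It is not a Vulcan artifact or a validated phenotype. The record structure, function names, and patient identifiers are hypothetical, and the rule (an ICD-10 code beginning with E11, or a hemoglobin A1c result under LOINC 4548-4 of at least 6.5%) is an assumption chosen only to show the mechanics.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    patient_id: str
    system: str  # "ICD-10" or "LOINC"
    code: str
    value: Optional[float] = None  # numeric result, if the record is a lab test


def has_phenotype(records: List[Record]) -> bool:
    """Apply the illustrative rule to one patient's records."""
    for r in records:
        # Diagnosis criterion: any code in the E11 family.
        if r.system == "ICD-10" and r.code.startswith("E11"):
            return True
        # Laboratory criterion: HbA1c at or above the illustrative threshold.
        if r.system == "LOINC" and r.code == "4548-4" and r.value is not None and r.value >= 6.5:
            return True
    return False


def cohort(records: List[Record]) -> List[str]:
    """Group records by patient and return sorted IDs that meet the rule."""
    by_patient: Dict[str, List[Record]] = {}
    for r in records:
        by_patient.setdefault(r.patient_id, []).append(r)
    return sorted(pid for pid, recs in by_patient.items() if has_phenotype(recs))


data = [
    Record("p1", "ICD-10", "E11.9"),
    Record("p2", "LOINC", "4548-4", 7.1),
    Record("p3", "LOINC", "4548-4", 5.4),
    Record("p4", "ICD-10", "I10"),
]
print(cohort(data))  # ['p1', 'p2']
```

Because the rule refers only to standard codes, the same definition could be run at any site whose data have been mapped to those terminologies, which is the motivation for exchanging phenotype definitions rather than site-specific queries.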

25 Future Directions in Clinical Research Informatics

Role of CRI in Learning Health Systems: Data and Knowledge Management, Evidence Generation, and Quality Improvement

A driving goal, to create and enable the learning health system, is now within reach; early examples are coming online, and more are likely to follow [24, 25]. Just as biomedical informatics approaches and resources are essential to realizing the potential of such systems for enhancing clinical care, so too are CRI methods, theories, and tools critical to realizing the vision of a learning health system that enables systematic evidence generation and application via clinical practice [26]. Indeed, fully leveraging our healthcare and research investments to advance human health will require even more emphasis on making sense of the ever-increasing amounts of data generated through healthcare and research endeavors. It is work in the field of CRI that will enable and improve such research activities, from the translation of basic science discoveries to clinical trials to the leveraging of healthcare data for population-level science and health services research that enables its impact on care. Importantly, these advances will continue to require increased attention not just to the development and management of technologies and platforms but also to the foundational science of CRI in an increasingly electronic world [27]. By facilitating all of the information-dense aspects of clinical research, population management, and quality improvement, CRI methods and resources will enable the conduct of increasingly pragmatic and rigorous research programs to generate new and impactful knowledge [28]. In fact, the now ubiquitous presence of EHRs will allow the systematic collection of essential data that will drive quality improvement research, outcomes research, clinical trials, comparative effectiveness research, and population-level studies to a degree not heretofore feasible [29]. In addition to the technological and informatics underpinnings already mentioned, realizing this promise will require increased attention and efforts by experts focused on advancing the domain of CRI. As depicted (Fig. 25.2), an informatics-enabled learning health system will enable the virtuous cycle of evidence generation and application, leveraging both real-world experiences and data and applying increasingly computable knowledge artifacts in order to drive evidence-directed care and population management. Such a system will (a) enable the study of linkages from molecules to populations, (b) support the development of tools and methods for evidence generation from real-world practice experience, (c) build bridges between health systems and research enterprises, and (d) enable the implementation and study of solutions to systematically improve healthcare delivery.

Fig. 25.2 Enabling a virtuous cycle of evidence-based medicine (EBM) and evidence-generating medicine (EGM) is critical to realizing a learning health system. Numerous enabling factors and key stakeholders must be addressed and aligned to move from (a) the current one-way flow of evidence to (b) an ideal cycle where evidence informs practice and practice informs evidence. (From Embi and Payne [26], reproduced with permission)

While the potential value of a learning health system (LHS) is clear, the business case for individual organizations to transform in ways that enable systematic learning from every patient encounter remains elusive. Although the interest in value-based care is strong and investments in quality improvement efforts abound, there are still relatively few demonstrations of system-wide learning health system efforts that fully align and leverage informatics and related activities and capabilities across the academic and operational aspects of the enterprise. Only after the needed incentives and regulations are in place will we see organizations invest in the infrastructure, data exchange standards, and data quality programs that can enhance embedded pragmatic research and provide evidence for EBM. Evidence generation via routine practice remains a national priority and requires a multidisciplinary, team science approach that spans the academic-operational divide that persists today. Rethinking and realigning approaches to developing LHS-relevant research questions, approaches to evidence generation and discovery, and incentive structures for those working on these challenges across academia and operations will have to be part of the solution if we are to realize an LHS ecosystem nationally and locally [30, 31].
Indeed, advances in CRI have already begun to enable significant improvements in the quality and efficiency of clinical research [32–34]. These have occurred through improvements in processes at the individual investigator level,


through approaches and resources developed and implemented at the institutional level, and through mechanisms that have enabled and facilitated the endeavors of multicenter research consortia to drive team science. As research becomes increasingly global, initiatives like those mentioned above provide opportunities for collaboration and cooperation among CRI professionals across geographical, institutional, and virtual borders to identify common problems, solutions, and education and training needs. Increasingly, investigators and professionals engaged in these groups explicitly self-identify as CRI experts or practitioners, further evidence for the establishment of CRI as an important, respected, and distinct informatics subdiscipline.

Multidisciplinary Collaboration is an Essential Feature of CRI

CRI professionals come to the field from many disciplines and professional communities. In addition to the collaborations and professional development fostered by such initiatives as the CTSA program mentioned above, there is also a growing role for professional associations that can provide a professional home for those working in the maturing discipline. The American Medical Informatics Association (AMIA) is the most widely recognized of such organizations. Working groups focused on CRI within organizations like AMIA have seen considerable growth in interest and attendance over the past decade. There has also been the emergence of operational professionals, often referred to as chief research information officers (CRIOs), who are akin to chief medical information officers (CMIOs) but focused on the research IT portfolios of academic health centers [35]. The past several years have also seen a growth in scientific conferences dedicated to CRI and the closely related informatics subdiscipline of translational bioinformatics (TBI). The main meeting hosted by AMIA has seen growing attendance and productivity among the informatics and clinical/translational research communities. The AMIA Informatics Summits (formerly


the CRI/TBI summits) have continued to provide a forum for highlighting and disseminating the state of the art in research information science, innovation, and practice. In addition, journals like AMIA’s JAMIA, Applied Clinical Informatics, and JAMIA Open; the Learning Health Systems journal, with its new Learning from Data feature [36]; as well as other leading journals in the field have also seen growth in CRI-focused publications. The importance of CRI has led to the inclusion of editorial board members with CRI expertise, and even to journal special issues dedicated to important topics in CRI [37]. The International Medical Informatics Association’s annual Yearbook of Medical Informatics now contains a dedicated section on CRI. Given the growth and maturation of CRI, it is likely that journals specifically focused on this domain will emerge in the years to come. In addition, other non-informatics associations and journals (e.g., DIA, the Society for Clinical Trials, Clinical Research Forum, and many professional medical societies) also increasingly provide coverage and opportunities for professional collaboration among those working to advance CRI. Efforts like these continue to foster the maturity and growth so critical to advancing the field.

Challenges and Opportunities for CRI

One of the keys to enabling a learning health system is the ability to generate evidence systematically through practice. While progress is evident, here again the pace has been far too slow considering the need and theoretical capabilities. For instance, the innumerable clinical questions that cannot feasibly be answered today because of siloed, disconnected, and inaccessible systems demand a solution that leverages data generated through practice. One factor that remains a key challenge today is the incomplete but persistent paradigm that treats clinical care and research as distinct activities that are related only in the application of


research evidence to practice, via evidence-based medicine (EBM) [26]. Instead, CRI activities can increasingly create environments that enable a virtuous cycle of evidence generation and application, where the “evidence-generating medicine” (EGM) paradigm is realized. As defined, EGM involves “the systematic incorporation of research and quality improvement considerations into the organization and practice of healthcare to advance biomedical science and thereby improve the health of individuals and populations” [26]. An EGM-enabled environment recognizes and supports the fact that (a) clinical care activities are not entirely distinct from research activities, (b) EGM must be enabled during practice to advance both research and care, (c) EGM activities are in fact ongoing, (d) advancing EGM is key to the desired EBM lifecycle, and (e) multiple enabling factors and stakeholders are essential to making this a reality (Fig. 25.3) [26]. The generation of evidence for EBM requires robust research embedded in real-world settings, and our national capacity for pragmatic clinical trials to support this need is growing. The NIH Pragmatic Trials Collaboratory now has 10 years of experience, having supported dozens of multisite pragmatic trials and spawned similar programs to support pragmatic clinical trials in a number of domains, including pain management, rehabilitation, and dementia [39–41]. The escalating costs of research and healthcare delivery, without corresponding value in knowledge generation or patient outcomes, continue to drive the need for efficient research that can be embedded within healthcare systems and rapidly applied to real-world settings. Such “embedded research” has the potential both to advance science and to address specific health system needs, but to be successful, pragmatic approaches to nine elements of the research design are required [42, 43].
In general, pragmatic or embedded trials are conducted in healthcare settings, have broad eligibility criteria, and employ study recruitment procedures that augment existing clinical workflows (rather than use dedicated research staff), all of which contrast with traditional clinical



Fig. 25.3 Creating an informatics-enabled evidence-generating medicine (EGM) system: the virtuous cycle of evidence generation and application that fuels a learning health system. (From Payne and Embi [38], reproduced with permission)

trials. In addition, pragmatic (embedded) research includes the design of interventions that are easily integrated into clinical workflows (and often into electronic health record (EHR) systems), flexibility in the delivery of interventions that can be customized to different patient or clinician preferences, measures of adherence to capture expected variation in the delivery of the intervention, the ability to capture follow-up as part of usual care with minimal data collection, and the selection of outcome measures that are meaningful to patients and other stakeholders and that can be easily collected as part of usual care. The use of EHR systems and data is foundational to all of the aforementioned design elements that characterize pragmatic trials, and facilitating the enhancement (including implementation of data standards and patient-generated data) and use of EHR data and systems for research will continue to be an active area for CRI [44, 45]. Specifically, the use of data captured in EHR systems can support feasibility assessment, screening, and recruitment, as well as measurement of adherence and outcomes. EHR system functions can be used to help integrate new practice interventions into clinical workflows, enabling flexible delivery and customization of interventions and potentially increasing adherence and fidelity to the interventions and


follow-up assessments. As discussed in the chapters throughout this text, the practical use of EHR and other health system-generated data for various research questions will require continued focus on clinical data management and data governance, deliberate and routine data quality assessment, and the curation and enrichment of data for patient registries.

There are social and technical challenges regarding EGM. While many of these fall into the “socio” part of the socio-technical equation, technical informatics issues remain, largely due to the immaturity of research-enabled EHR systems and of best practices for research data warehouses and repositories. While we are seeing an increase in the use of EHR data for research (largely due to the growth in EHR adoption and the subsequent increase in electronic clinical data), the use of EHR data for observational research is severely challenged by data flow and bias. Despite promises and advances to date, data in EHRs are inherently incomplete and too often inadequate to tell the full story of a patient’s health. Transforming real-world data into valid and reliable real-world evidence requires not only appropriate analytical methods but also an understanding of data quality, provenance, and variability across sites and over time. To realize the full potential of real-world evidence, further study and advances in the socio-technical issues and informatics methods that mitigate or control for the limitations of EHR-derived data will become increasingly important [46]. While reuse of local data is critical for research and EGM, there is also a great and growing demand and capability for enhancing and combining data across sites. CRI approaches that facilitate such activities range from promoting the development and adoption of data standards (such as the United States Core Data for Interoperability (USCDI) [47]) that can leverage data widely captured at the point of care, to centralized and federated data-sharing capabilities, to NLP and text extraction approaches that enable free-text data to be more easily leveraged for research. As evidenced by previous chapters in this book, these are all critical and dynamic areas of CRI and will continue to be so in the future.
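The point about understanding data quality and variability across sites and over time can be illustrated with a small completeness check. The sketch below is only an illustration under stated assumptions: the records, site labels, and field names (`smoking_status`, `bmi`) are invented, and the helper `completeness` is hypothetical rather than drawn from any specific data quality framework.

```python
from collections import defaultdict

# Hypothetical extract: one dict per patient record; None marks a missing value.
records = [
    {"site": "A", "year": 2020, "smoking_status": "never", "bmi": 27.1},
    {"site": "A", "year": 2021, "smoking_status": None, "bmi": 31.4},
    {"site": "B", "year": 2020, "smoking_status": None, "bmi": None},
    {"site": "B", "year": 2021, "smoking_status": "current", "bmi": None},
]

def completeness(rows, fields, by):
    """Return {group: {field: fraction of rows with a non-missing value}}."""
    totals = defaultdict(int)
    filled = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[by]
        totals[group] += 1
        for field in fields:
            if row.get(field) is not None:
                filled[group][field] += 1
    return {
        group: {field: filled[group][field] / totals[group] for field in fields}
        for group in totals
    }

# Compare completeness across sites and across calendar years.
by_site = completeness(records, ["smoking_status", "bmi"], by="site")
by_year = completeness(records, ["smoking_status", "bmi"], by="year")
print(by_site)
print(by_year)
```

In this toy extract, one site never records BMI, which is the kind of site-level difference that would bias a pooled analysis if it went unnoticed. Completeness is only one dimension of data quality; plausibility and conformance checks would follow the same grouping pattern.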


In addition to pervasive and well-known issues of interoperability and quality of EHR data, social, political, and economic issues further limit the ability to develop and leverage EHR systems to support clinical research. While distributed research networks have been evolving and participation is growing, federated data-sharing models have their limitations. The COVID-19 pandemic brought many urgent and disparate questions (e.g., infection risk, complication/death risk, validation of studies for diagnostic data, vaccine effectiveness, treatment effectiveness) that required high-quality and complete data to answer. With no systems in place, the NIH/NCATS quickly mobilized the National COVID Cohort Collaborative (N3C) through the CTSA/CD2H program. The research and healthcare community in turn mobilized and cooperated. The N3C effort was a remarkable feat (72 unique data transfer agreements as of 2020, producing 1.4 billion rows of data from more than 200,000 COVID-positive patients that have been used by researchers from over 120 institutions within the N3C enclave) [48]. Multiple distributed research networks put aside historical competition and pitched in to support national data collection to meet the urgent need for research and discovery. The multiple and competing data models still necessitated a lot of mapping, but in the end four data models contributed to N3C (and each data model made it easy for dozens of organizations to submit their data seamlessly). The partnership and collaboration achieved by the N3C were remarkable, but the cost and resources expended were significant, and it is not clear that this model is sustainable for other diseases. However, despite the resources and cooperation required for a centralized data repository, this approach has been shown to provide more quality assurance and arguably more analysis-ready datasets than federated models [49].
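The mapping work that multiple source data models impose on a centralized repository can be sketched in a few lines. The example below is a simplified illustration, not the N3C pipeline: the model names (`model_x`, `model_y`), field names, and code values are all hypothetical, and real harmonization involves vocabularies and many more tables.

```python
# Hypothetical field maps: source column name -> common target column name.
FIELD_MAPS = {
    "model_x": {"patient_key": "person_id", "dob_year": "birth_year", "sex_code": "sex"},
    "model_y": {"PATID": "person_id", "BIRTH_YR": "birth_year", "SEX": "sex"},
}

# Hypothetical value maps for harmonizing coded values after renaming.
VALUE_MAPS = {
    "model_x": {"sex": {"1": "F", "2": "M"}},
    "model_y": {"sex": {"F": "F", "M": "M"}},
}

def to_common(row, model):
    """Rename fields and recode values; unmapped codes become None so they can be reviewed."""
    out = {"source_model": model}
    for src, target in FIELD_MAPS[model].items():
        value = row.get(src)
        recode = VALUE_MAPS[model].get(target)
        if recode is not None:
            value = recode.get(str(value))
        out[target] = value
    return out

# Three submissions arriving in two different source models.
submissions = [
    ("model_x", {"patient_key": 101, "dob_year": 1960, "sex_code": "1"}),
    ("model_y", {"PATID": "p-7", "BIRTH_YR": 1985, "SEX": "M"}),
    ("model_x", {"patient_key": 102, "dob_year": 1972, "sex_code": "9"}),
]
harmonized = [to_common(row, model) for model, row in submissions]
for record in harmonized:
    print(record)
```

Keeping the maps as data rather than code means each contributing model needs only one map, which is one plausible reading of why a per-model mapping lets many organizations submit without site-specific work. Setting unmapped codes to None, as in the third record, surfaces them for central quality review instead of silently passing them through.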
As the COVID-19 pandemic made clear, there is a great need for robust, ethical, reliable, research-derived evidence, and our current system suffers from grave disparities, inequities, and trust gaps that limit the impacts of research. Public health crises create urgency for answers, and desperation for evidence can sometimes


stress our dissemination mechanisms that are largely based on honor systems for biomedical science and research integrity. While good science can be performed quickly and accurately when required, and while disseminating such findings with haste is critical during times of public health crises, errors and missteps can also erode faith in our largely honor-based scientific systems. We have for some time recognized that our scientific systems are poorly equipped to detect fraud, and this was again laid bare during the pandemic. One infamous example occurred early in the pandemic when, in May 2020, The Lancet published “Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis” [50]. On May 28, 2020, dozens of signatories (including clinical researchers, epidemiologists, health services researchers, statisticians, and informaticists) signed an open letter [51] to the Lancet editors, challenging the integrity of the study and demanding specifics on the data. In this case, questions about the plausibility and veracity of the underlying data sources and of the data themselves arose upon publication. As this example demonstrates, CRI efforts to address and improve systems for enhancing research integrity must be created and maintained. Standard approaches and best practices for data that underlie research findings, sharing of data sources, sharing of underlying code, and details that can enable in-depth reviews as well as reproducibility will be key to the future. Informatics experts and capabilities can and must play a central role in this effort.

Another vital area for CRI is the expansion and formalization of the role that patient and community engagement must play to ensure that research is relevant, representative, applicable, and impactful for the people who are the focus of such efforts. As efforts such as those by PCORI, the CTSA program, and the All of Us effort have shown, there is great value in patients being a part of the research enterprise. Building upon guidance and early measures for patient engagement from such efforts, additional work in CRI will form an important effort moving forward.


Other challenges and opportunities for CRI are emerging from the need for AI-driven solutions to benefit all in society, and here again CRI has a role to play. The growing use of AI/ML solutions in practice requires study of, and attention to, the unintended consequences of building and deploying the resulting algorithms on populations that are subject to systemic racism and the consequent bias in the underlying data upon which such systems are built and used. Approaches like algorithm vigilance, akin to the similarly CRI-supported practice of pharmacovigilance and focused on monitoring for intended and unintended effects of AI-derived algorithms, must be advanced. This will require both upstream CRI efforts to address data quality and equity concerns and implementation science elements that ensure equitable application and outcomes, lest harm come to those who have been systematically underrepresented and undervalued in our society and healthcare system.
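One way to picture the monitoring that algorithm vigilance implies is a routine comparison of an algorithm's performance across patient subgroups. The following is a minimal sketch under stated assumptions: the monitoring log, the subgroup labels, the choice of sensitivity as the metric, and the 0.2 gap threshold are all hypothetical choices for illustration, not an established vigilance protocol.

```python
from collections import defaultdict

# Hypothetical monitoring log: (subgroup, true outcome, algorithm prediction).
log = [
    ("group_1", 1, 1), ("group_1", 1, 1), ("group_1", 1, 0), ("group_1", 0, 0),
    ("group_2", 1, 0), ("group_2", 1, 0), ("group_2", 1, 1), ("group_2", 0, 0),
]

def sensitivity_by_group(entries):
    """Fraction of true positive cases the algorithm caught, per subgroup."""
    positives = defaultdict(int)
    caught = defaultdict(int)
    for group, truth, predicted in entries:
        if truth == 1:
            positives[group] += 1
            if predicted == 1:
                caught[group] += 1
    return {group: caught[group] / positives[group] for group in positives}

def flag_gaps(rates, max_gap):
    """Return subgroups whose rate trails the best-performing subgroup by more than max_gap."""
    best = max(rates.values())
    return sorted(group for group, rate in rates.items() if best - rate > max_gap)

rates = sensitivity_by_group(log)
flagged = flag_gaps(rates, max_gap=0.2)  # 0.2 is an arbitrary illustrative threshold
print(rates, flagged)
```

In practice such checks would run on accumulating data, use confidence intervals rather than point estimates, and cover several metrics, but the grouping logic stays the same.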

Training and Workforce Needs

Realizing the full promise of CRI will require strategies to address the severe shortage of professionals currently working in the CRI domain. As with many biomedical informatics subdisciplines, training in CRI is and will remain interdisciplinary by nature, requiring the study of topics ranging from research methods and biostatistics, to regulatory and ethical issues in CRI, to the fundamental informatics and IT topics essential to data management in biomedical science. As the content of this very book illustrates, the training needed to adequately equip trainees and professionals to address the complex and interdisciplinary nature of CRI demands the growth of programs focused specifically on this area. Furthermore, while there is certainly a clear need for more technicians conversant in both clinical research and biomedical informatics to work in the CRI space, there remains a great need for scientific experts working to innovate and advance the methods and theories of the CRI domain. In recent years, the National Library of Medicine (NLM), which has long supported


training and infrastructure development in health and biomedical informatics, recognized this need by clearly calling out clinical research informatics as a domain of interest for the fellowship training programs it supports. While the NLM support is most welcome, the availability of training and education remains extremely limited. Significantly more capacity in training and education programs focused on CRI will be needed to establish and grow the cadre of professionals focused on this critical area if the goals set forth for the biomedical science and healthcare enterprise are to be realized. This will require increased attention by sponsors and educational institutions. In addition to training the professionals who will focus primarily on CRI to advance the domain, there is a major need to educate current informaticians, clinical research investigators and staff, and institutional leaders concerning the theory and practice of CRI. Programs like AMIA’s 10 × 10 initiative and tutorials at professional meetings, which include offerings such as a course focused on CRI, help to meet this need [52]. Such offerings help to ensure that those called upon to satisfy the CRI needs of our research enterprise are able to provide appropriate support for the use of CRI-related methods or tools, including the allocation of appropriate resources to accomplish organizational aims. As the workforce of CRI professionals grows, the field can be expected to mature further. While so much of the current effort of CRI is quite appropriately focused on the proverbial “low-hanging fruit” of overcoming the significant day-to-day IT challenges that plague our traditionally low-tech research enterprise, significant advances will ultimately come about through a recognition that biomedical informatics approaches are crucial centerpieces of the clinical research enterprise.
Indeed, just as the distinction between clinical care and clinical research is increasingly being blurred as we move toward realizing a “learning health system,” so too are there parallels to be drawn between the current formative state of CRI and the experiences of the early decades of work in clinical informatics. Those working to lead advances in CRI would do well to heed the lessons learned from the clinical informatics experiences of years past. In future years, CRI can be expected not only to instrument, facilitate, and improve current clinical research processes but also to fundamentally change the pace, direction, and effectiveness of the clinical research enterprise and discovery. Toward that end, groups are already working to develop maturity models and deployment indices that can be used to measure and compare CRI infrastructures as to their level of maturity and ability to support the research enterprise [53]. Such measures of CRI maturity will only grow and become more useful for informing progress in the years to come. Guided by such measures, we should expect to see CRI efforts continue to improve, with consequent improvements to scientific discovery, healthcare quality, and real-world evidence generation as learning health systems continue to evolve and mature.

Conclusion

Given the rapid advances in biomedical discoveries, the lessons learned from experiences like the recent pandemic, the ongoing concerns for future health consequences of global population growth and climate change, and the escalating costs and inequities of healthcare, there is an ever-increasing need for clinical research that will enable the testing and implementation of cost-effective therapies to the exclusion of those that are not. The fundamentally information-intensive nature of such clinical research endeavors begs for the solutions offered by CRI. As a result, the demand for informatics professionals who focus on the increasingly important field of clinical and translational research will only grow. New models, tools, and approaches must continue to be developed to achieve this, and the resulting innovations are what will continue to drive the field forward in the coming years. It remains an exciting time to be working in this critically important area of informatics study and practice.


References 1. Embi PJ, Payne PR. Clinical research informatics: challenges, opportunities and definition for an emerging domain. J Am Med Inform Assoc. 2009;16(3):316–27. 2. Zerhouni EA. Translational and clinical science—time for a new vision. N Engl J Med. 2005;353(15):1621–3. 3. Zerhouni EA. Clinical research at a crossroads: the NIH roadmap. J Investig Med. 2006;54(4):171–3. 4. NCATS. CTSA Trial Innovation Network. https:// ncats.nih.gov/ctsa/projects/network. 5. Mendonca EA, Richesson RL, Hochheiser H, Cooper DM, Bruck MN, Berner ES. Informatics education for translational research teams: an unrealized opportunity to strengthen national research infrastructure. CTSA Synergy paper. J Clin Transl Res. Sept 2022. Accepted for Publication. 6. CTSA. Center for Leading Innovation and Collaboration (CLIC). Insights to inspire. https://clic- ctsa.org/taxonomy/term/5691. Accessed 15 Oct 2022. 7. Bourne PE, Bonazzi V, Dunn M, Green ED, Guyer M, Komatsoulis G, Larkin J, Russell B. The NIH big data to knowledge (BD2K) initiative. J Am Med Inform Assoc. 2015;22(6):1114. https://doi.org/10.1093/ jamia/ocv136. 8. NIH Strategic Plan. https://www.nih.gov/sites/default/ files/about-nih/strategic-plan-fy2016-2020-508.pdf. 9. Wilkinson M, Dumontier M, Aalbersberg I, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. https://doi.org/10.1038/sdata.2016.18. 10. Oster S, Langella S, Hastings S, et al. caGrid 1.0: an enterprise grid infrastructure for biomedical research. J Am Med Inform Assoc. 2008;15(2):138–49. 11. Saltz J, Oster S, Hastings S, et al. caGrid: design and implementation of the core architecture of the cancer biomedical informatics grid. Bioinformatics. 2006;22(15):1910–6. 12. Niland JC, Townsend RM, Annechiarico R, Johnson K, Beck JR, Manion FJ, Hutchinson F, Robbins RJ, Chute CG, Vogel LH, Saltz JH, Watson MA, Casavant TL, Soong SJ, Bondy J, Fenstermacher DA, Becich MJ, Casagrande JT, Tuck DP. 
The cancer biomedical informatics grid (caBIG): infrastructure and applications for a worldwide research community. Fortschr Med. 2007;12(Pt 1):330–4. 13. Kakazu KK, Cheung LW, Lynne W. The cancer biomedical informatics grid (caBIG): pioneering an expansive network of information and tools for collaborative cancer research. Hawaii Med J. 2004;63(9):273–5. 14. Citation to clinicaltrials.gov final rule. https://prsinfo. clinicaltrials.gov. 15. Citation to common rule change. https://www. hhs.gov/ohrp/regulations-a nd-p olicy/regulations/ finalized-revisions-common-rule/index.html. 16. Holve E, Segal C, Lopez MH, Rein A, Johnson BH. The electronic data methods (EDM) forum for

P. J. Embi and R. L. Richesson comparative effectiveness research (CER). Med Care. 2012;50(Suppl):S7–10. https://doi.org/10.1097/ MLR.0b013e318257a66b. 17. Fleurence RL, Curtis LH, Califf RM, Platt R, Selby JV, Brown JS. Launching PCORnet, a national patient- centered clinical research network. J Am Med Inform Assoc. 2014;21(4):578–82. https://doi.org/10.1136/ amiajnl-2014-002747. Epub 2014 May 12. 18. PCORnet PPRN Consortium, Daugherty SE, Wahba S, Fleurence R. Patient-powered research networks: building capacity for conducting patient-centered clinical outcomes research. J Am Med Inform Assoc. 2014;21(4):583–6. https://doi.org/10.1136/amiajnl- 2014-002758. Epub 2014 May 12. 19. Califf RM. The patient-centered outcomes research network: a national infrastructure for comparative effectiveness research. N C Med J. 2014;75(3):204– 10. https://www.ncbi.nlm.nih.gov/pubmed/24830497. 20. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, Suchard MA, Park RW, Wong IC, Rijnbeek PR, van der Lei J, Pratt N, Norén GN, Li YC, Stang PE, Madigan D, Ryan PB. Observational health data sciences and informatics (OHDSI): opportunities for observational researchers. Stud Health Technol Inform. 2015;216:574–8. 21. Klann JG, Abend A, Raghavan VA, Mandl KD, Murphy SN. Data interchange using i2b2. J Am Med Inform Assoc. 2016;23(5):909–15. https://doi. org/10.1093/jamia/ocv188. Epub 2016 Feb 5. 22. h t t p s : / / c o n f l u e n c e . h l 7 . o r g / d i s p l a y / VA / Vulcan+Accelerator+Home. 23. The Office of the National Coordinator for Health Information Technology. National health IT priorities for research. 2020. https://www. healthit.gov/sites/default/files/page/2020-0 1/ PolicyandDevelopmentAgenda.pdf. Accessed 16 Oct 2020. 24. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med. 2010;2(57):57cm29. 25. Platt JE, Raj M, Wienroth M. An analysis of the learning health system in its first decade in practice: scoping review. 
J Med Internet Res. 2020;22(3):e17026. https://doi.org/10.2196/17026. 26. Embi PJ, Payne PR. Evidence generating medicine: redefining the research-practice relationship to complete the evidence cycle. Med Care. 2013;51(8 Suppl 3):S87–91. https://doi.org/10.1097/ MLR.0b013e31829b1d66. 27. Payne PR, Embi PJ, Niland J. Foundational biomedical informatics research in the clinical and translational science era: a call to action. J Am Med Inform Assoc. 2010;17(6):615–6. 28. Richesson RL, Green BB, Laws R, Puro J, Kahn MG, Bauck A, Smerek M, Van Eaton EG, Zozus M, Hammond WE, Stephens KA, Simon GE. Pragmatic (trial) informatics: a perspective from the NIH health care systems research collaboratory. J Am Med Inform Assoc. 2017;24(5):996–1001. https://doi. org/10.1093/jamia/ocx016.

29. Embi PJ, Kaufman SE, Payne PRO. Biomedical informatics and outcomes research. Circulation. 2009;120:2393–9. https://doi.org/10.1161/CIRCULATIONAHA.108.795526.
30. Bierer BE, Crosas M, Pierce HH. Data authorship as an incentive to data sharing. N Engl J Med. 2017;376:1684–7. https://doi.org/10.1056/NEJMsb1616595.
31. Smoyer WE, Embi PJ, Moffatt-Bruce S. Creating local learning health systems: think globally, act locally. JAMA. 2016;316(23):2481–2. https://doi.org/10.1001/jama.2016.16459.
32. Payne PR, Johnson SB, Starren JB, Tilson HH, Dowdy D. Breaking the translational barriers: the value of integrating biomedical informatics and translational research. J Investig Med. 2005;53(4):192–200.
33. Sung NS, Crowley WF Jr, Genel M, et al. Central challenges facing the national clinical research enterprise. JAMA. 2003;289(10):1278–87.
34. Chung TK, Kukafka R, Johnson SB. Reengineering clinical research with informatics. J Investig Med. 2006;54(6):327–33.
35. Sanchez-Pinto LN, Mosa ASM, Fultz-Hollis K, Tachinardi U, Barnett WK, Embi PJ. The emerging role of the chief research informatics officer in academic health centers. Appl Clin Inform. 2017;8(3):845–53. https://doi.org/10.4338/ACI-2017-04-RA-0062.
36. Embi PJ, Payne PRO, Friedman CP. Learning from data: a recurring feature on the science and practice of data-driven learning health systems. Learn Health Syst. 2022;6:e10302. https://onlinelibrary.wiley.com/doi/full/10.1002/lrh2.10302.
37. Embi PJ, Payne PR. Advancing methodologies in clinical research informatics (CRI): foundational work for a maturing field. J Biomed Inform. 2014;52:1–3. https://doi.org/10.1016/j.jbi.2014.10.007.
38. Payne PRO, Embi PJ, editors. Translational informatics: realizing the promise of knowledge-driven healthcare. London: Springer; 2014.
39. https://heal.nih.gov/research.
40. https://impactcollaboratory.org/.
41. https://sites.brown.edu/learrn/.
42. Loudon K, Treweek S, Sullivan F, Donnan P, Thorpe KE, Zwarenstein M, et al. The PRECIS-2 tool: designing trials that are fit for purpose. BMJ. 2015;350:h2147. https://doi.org/10.1136/bmj.h2147.
43. Richesson RL, Green BB, Laws R, Puro J, Kahn MG, Bauck A, et al. Pragmatic (trial) informatics: a perspective from the NIH Health Care Systems Research Collaboratory. J Am Med Inform Assoc. 2017;24(5):996–1001.
44. Weinfurt K. What is a pragmatic clinical trial: pragmatic elements: an introduction to PRECIS-2. In: Rethinking clinical trials: a living textbook of pragmatic clinical trials. Bethesda, MD: NIH Health Care Systems Research Collaboratory; 2021. Updated 23 Sept 2021. Available at: https://rethinkingclinicaltrials.org/chapters/pragmatic-clinical-trial/pragmatic-elements-an-introduction-to-precis-2/. https://doi.org/10.28929/092.

45. Marsolo K. Informatics and operations—let’s get integrated. J Am Med Inform Assoc. 2013;20(1):122–4.
46. Bastarache L, Brown JS, Cimino JJ, Dorr DA, Embi PJ, Payne PRO, Wilcox AB, Weiner MG. Developing real-world evidence from real-world data: transforming raw data into analytical datasets. Learn Health Syst. 2021;6(1):e10293. https://doi.org/10.1002/lrh2.10293.
47. ONC. https://www.healthit.gov/isa/united-states-core-data-interoperability-uscdi. Accessed 19 Oct 2022.
48. Haendel MA, Chute CG, Bennett TD, et al., N3C Consortium. The National COVID Cohort Collaborative (N3C): rationale, design, infrastructure, and deployment. J Am Med Inform Assoc. 2020;28(3):427–43.
49. Pfaff ER, Girvin AT, Gabriel DL, Kostka K, Morris M, Palchuk MB, Lehmann HP, Amor B, Bissell M, Bradwell KR, Gold S, Hong SS, Loomba J, Manna A, McMurry JA, Niehaus E, Qureshi N, Walden A, Zhang XT, Zhu RL, Moffitt RA, Haendel MA, Chute CG, N3C Consortium, Adams WG, Al-Shukri S, Anzalone A, Baghal A, Bennett TD, Bernstam EV, Bernstam EV, Bissell MM, Bush B, Campion TR, Castro V, Chang J, Chaudhari DD, Chen W, Chu S, Cimino JJ, Crandall KA, Crooks M, Davies SJD, DiPalazzo J, Dorr D, Eckrich D, Eltinge SE, Fort DG, Golovko G, Gupta S, Haendel MA, Hajagos JG, Hanauer DA, Harnett BM, Horswell R, Huang N, Johnson SG, Kahn M, Khanipov K, Kieler C, Luzuriaga KR, Maidlow S, Martinez A, Mathew J, McClay JC, McMahan G, Melancon B, Meystre S, Miele L, Morizono H, Pablo R, Patel L, Phuong J, Popham DJ, Pulgarin C, Santos C, Sarkar IN, Sazo N, Setoguchi S, Soby S, Surampalli S, Suver C, Vangala UMR, Visweswaran S, Oehsen JV, Walters KM, Wiley L, Williams DA, Zai A. Synergies between centralized and federated approaches to data quality: a report from the national COVID cohort collaborative. J Am Med Inform Assoc. 2022;29(4):609–18. https://doi.org/10.1093/jamia/ocab217.
50. Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet. 2020; https://doi.org/10.1016/S0140-6736(20)31180-6.
51. https://zenodo.org/record/3862789#.Y1CzgnbMJPa.
52. The Ohio State University-AMIA 10x10 program in Clinical Research Informatics. http://www.amia.org/education/academic-and-training-programs/10x10-ohio-state-university. Accessed 14 Jul 2011.
53. Knosp BM, Barnett W, Embi PJ, Anderson N. Maturity models for research IT and Informatics reports from the field. In: Proceedings of the AMIA summit on clinical research informatics; 2017. p. 18–20. https://knowledge.amia.org/amia-64484-cri2017-1.3520710/t001-1.3521784/t001-1.3521785/a011-1.3521792/ap011-1.3521793#pdf-container.

Index

A Academic health centers (AHCs), 57, 58 Adaptive clinical trials, 25, 45–47 Adaptive randomization methods, 39 Adoption and Implementation of Data Standards, 159–160 Advancing applied clinical research data standards, 399 Adverse drug event (ADE), 456 Adverse drug reaction classification system (ADReCS), 462 Adverse event (AE), 459, 460 Adverse event reporting, 391 Agency for Healthcare Research and Quality (AHRQ), 233, 235 American Health Information Management Association (AHIMA), 202 American Medical Informatics Association (AMIA), 3, 133, 512 Analog signal processing, 16 Analog to digital converters (ADC), 16 Analysable, 347, 349, 357 Analyzable data, 331–333, 335, 348, 349, 354, 356 Anonymization, 150, 332 Apps, 495–504 Argonaut, 403 Artificial neural networks, 418–420 Assessment methods, 292, 294–296, 302, 303, 305 B Best practices, 70–73 Big data, discovery and application, 466 Biobank, 71, 72, 74, 75 BioHub, 72 Bioinformatics, 311, 314, 317, 462 Bioinformatics sequence markup language, 312 Biomedical informatics, 3–5, 508, 516, 517 approaches and resources, 510 Biomedical ontologies, 368, 369, 378, 379, 384 Biomedical Research Integrated Domain Group (BRIDG), 102, 150, 370–372, 399, 400 Biomedical study classification, 27

definitions, 26 distinctive characteristics, 27 epidemiological studies, 27 experimental studies, 28 minimal intervention studies, 28–30 phase I, 24 phase II, 24 phase III, 24 phase IV, 24 Biorepository, 70–77, 84 Biospecimen, 70–75, 77 Business associate agreement (BAA), 207 C Cancer data standards repository (caDSR), 392 Cancer Genome Project, 321 CarePlan, 397 Case report forms (CRFs), 93, 98, 103, 105–107 CDISC created the Retrieve Form Data Capture (RFD), 163 CDISC data collection standard, 153 CDISC operational data model (ODM), 392 Centers for Medicaid and Medicare Services (CMS), 233, 395 Chief Clinical Information Officer (CCIO), 211 Chief Health Information Officer (CHIO), 211 Chief Information Officer (CIO), 211 Chief Medical Information Officer (CMIO), 211 China Food and Drug Administration (CFDA), 157 Christmas trees, 18 Classification of biomedical studies, 26–28 Clinical and translational science spectrum research, and informatics, 509 Clinical data acquisition standards harmonization (CDASH) standards, 391 Clinical data interchange standards consortium (CDISC), 9, 101, 149, 390 Clinical data management (CDM), 256–258, 260, 286 example 1, 172 example 2, 172 example 3, 172 example 4, 173

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 R. L. Richesson et al. (eds.), Clinical Research Informatics, Health Informatics, https://doi.org/10.1007/978-3-031-27173-1

Clinical data repositories, 414, 417, 428 Clinical decision support (CDS) logic, 395 Clinical entities, 462 Clinical genetic IT support, 400 Clinical information modeling initiative (CIMI), 399 Clinical natural language processing, 434, 435, 437–440, 442–446 Clinical or Contract Research Organizations (CROs), 58 Clinical registry, 232 Clinical research, 148–150, 152, 154–162, 242, 392, 434–437, 444–446, 474–489, 496, 498–504 data management, 51, 52, 54, 61 design, 52, 54, 57, 59–61, 63–66 funding, 57–59, 62 history of, 15 workflow, 51, 52, 56, 61–64, 66 Clinical research data analyzing and reporting data, 184 assessment and data analysis, 188 clinical concepts identification, 186 collection and management, 189 compatibility-proximity principle, 181 data elements, 186 data governance, 191–193 data quality, 170, 171, 174–177 design and assessment, 190 errors exist, 173, 174 exploration and availability assessment, 186, 187 extraction process, 187 FDA regulation, 190 identification and definition, 178, 179 measurement and observations, 181, 182 planning, 184, 185 processing data, 183 recording data, 183 relevant data fields, 187 results, 193 specification definition, 179–181 systematic planning, 177, 178 transformation and curation, 187, 188 types, 170 Clinical research data standards in drug development, 390 drug safety assurance activities, 390 eligibility criteria for trials, 391 with Health Information Systems, 392, 393 maintenance process, 403, 404 Clinical research environment academic health centers, 57, 58 administrative compliance, 55 administrative managers/coordinators, 60 administrative tracking, 55 advocacy organizations, 56, 57 associated data collection tasks, 54 budgeting and fiscal reconciliation, 55 clinical or contract research organizations, 58 cognitive complexity, 63 common goals, 60–61 common processes, 51–53 common settings, 60

communications processes, 62 complex technical processes, 62 computerized physician order entry systems, 61 data mining tools, 61 data safety and monitoring boards, 60 decision-support systems, 61 electronic data collection or capture tools, 61 electronic health records, 61 emergent trends, 63 evidence generating medicine, 64 federal regulatory agencies, 59 goals, 52 healthcare and clinical research information systems vendors, 59, 60 interruptions, 62 Learning Healthcare Systems, 64 literature search tools, 61 overview, 52 participant events enrolling, 54 identification, 53, 54 scheduling, 54 screening, 54 study protocol, 54 tracking, 54 participant screening tools, 61 patients organizations, 56, 57 protocol authoring tools, 61 quality assurance, 55 regulatory, 55 research-specific decision-support systems, 61 research-specific web portals, 61 screening and enrolling participants, 54 simulation and visualization tools, 61 software developers, 59, 60 sponsor reporting, 55 sponsoring organizations, 58, 59 study encounters, 54 successful study completion, 56 workflow and communications, 61, 62 Clinical research information systems (CRISs), 111–126 big science emergence modern astronomy, 19 particle physics, 20 socially interdependent process, 20 social transformation, 20 biomedical data, 17 clinical research subjects, 116 complexity computing capacity, 17 information processing, 17 computational power, 18 concepts, 113 contexts and attempts, 3, 4 current inefficiencies, 94, 95 data and information systems, 7, 8 data-driven discovery, 8–10 definition, 2, 3, 6, 9 EHR-related systems, 112, 113

certain experimental designs, 118 certain research designs, 118 cross-field validation, 117 data library, 118, 119 dynamic lists, 118 Skip logic, 118 specific privileges, 118 validation, 117 emerging policy trends, 141, 142 essential functions, 113 events, 115, 116 foundations of, 6, 8 fundamental theorem, 2 history, 15, 16 implementation representing experimental designs, 113 supports multiple studies, 113 infrastructures, 517 Initiatives, Policy and Regulatory Trends, 509, 510, 512 knowledge representation, 8, 9 local storage, 18 network capacity, 18 objectives, 4, 5 overview, 1, 2 perspective, 4, 5 pragmatic clinical trials analysis, 124 comparison therapy choice flexibility, 124 follow-up intensity, 124 outcome, 124 patient selection criteria, 124 personnel, 124 practitioner adherence, 124 subject compliance, 124 therapeutic flexibility, 124 quality control, 119 real-time self-reporting, 116 scope, 4, 5, 114 standards, 123, 124 comparable information, 21 consistent information, 21 constructs, 21 interoperable systems, 21 structured data, 117 study protocol, 93, 94 study stages patient-monitoring and safety, 122, 123 planning and protocol authoring, 119–121 protocol management, 121, 122 recruitment and eligibility determination, 121 telephonic signals, 17 time windows, 115 validation and certification, 123 vendor models, 112 workflow, 115 Clinical research policy core regulations common rule, 131–133 common rule revisions, 133, 134

food and drugs regulation & guidance, 134, 135 HIPAA privacy rule and research, 135 data sharing policies, 140, 141 foundational federal legislation Food, Drug and Cosmetic Act of 1938, 129, 130 Public Health Services Act of 1944, 130 regulatory science, real-world evidence, 138 21st Century Cures Act, 139 Clinical research system, 390, 391 Clinical research transparency, 332, 333 Clinical study data management, 121, 122 Clinical trial management systems, 391 Clinical trials transformation initiative (CTTI), 234, 402 Coalition for accelerating standards and therapies (CFAST), 162 collaborations, initiatives and tools, 162, 163 Coasian transactions, 458 Code of Federal Regulations (CFR), 131 Common clinical registry framework (CCRF) model, 242 Common data model (CDM), 246, 460 Common product models, 400 Common protocol template (CPT), 163, 238 Common rule, 131–134, 136, 206, 208, 215 Complete crossover design, 43 Complexity of clinical research informatics, 17 computing capacity and information processing, 17 of design protocol, 19 Compliance, 134 Computable phenotyping, 376 Computable study protocol, 93, 95, 96, 100 accurate data capture, 98 complete study plan, 96 computability and standardization, 100 decision support, 97, 98 facilitating timely, 98 interpretation and application of results, 99 statistical analysis and reporting, 98, 99 study data and artifacts, 99 Computational approaches, 465 signal detection, 460 Computational biology, 71, 79–83 Computational methods, 462 Computerized clinical trial, 222, 223 Computerized physician order entry (CPOE) systems, 61 Computing capacity and information processing, 17 Confidentiality, 209, 215 Consent, 131–135, 140, 141 Consolidated clinical document architecture (C-CDA), 238 Consumer health information, 481 Consumer health movement, 483 Content standards, 238 Continuity of Care Document (CCD), 401 Coordinated Research Infrastructure Building Enduring Life-science Services (CORBEL), 159, 163 Core Protocol version 1, 140 Covered entity, 207 COVID-19, 65

Critical Path Institute (C-Path), 162, 402 Crossover designs, 28, 33, 40, 42–45 D Data access, 496, 502, 504 Data accuracy, 172, 179, 181, 183, 184 Databases, 256, 258, 260–262, 264, 266, 269, 270, 275, 280, 282, 283 Data collection, 170–173, 177–181, 183–185, 189–192 Data content ontology, 372, 373 Data entry, 256, 263, 280, 282 Data exchange, 390, 392, 395, 396, 402, 403 Data exchange standards, 237, 238, 395, 396, 402 Data governance coherent data governance program, 201 data and information governance program, 214 data-information-knowledge, 202 data lifecycle, 204, 205 data manifold, 202 data protection to research ethics, 206–209 decision matrix, 214 definition, 200 electronic patient data, 201 implementation, 213 information governance, 202, 209–211 master data, 215 organization and roles, 211, 212 outcomes, 201 structures and processes, 201 value of data, 203, 204 Data integration, 371, 379–382 Data integrity, 157, 201, 203, 204, 206, 209 Data intensive project, data exchange and preparation and exchange activities, 395 Data management, 256–261, 263–287 Data management plans (DMPs), 260, 261, 265, 267, 277, 279, 283 Data mining, 415, 417, 421–425, 428 Data mining tools, 61 Data model, 460 Data quality, 170–194 Data safety and monitoring boards (DSMBs), 60 Data science, 509, 510 Data sharing and reuse, 128, 139–141, 148, 150 benefits, 150, 151 Data standards, 149, 156, 159, 160, 162, 237 Data storage analytic sophistication, 19 data density, 18, 19 design complexity, 19 Decision-support systems, 61 Decision trees, 418–420 Deep learning, 437–440, 443, 444 De-identification, 207 Delayed hepatotoxicity, 464, 465 Digital Imaging and Communications in Medicine (DICOM), 395 Digitalization of biomedical data, 17 Digitalization of healthcare data, 16, 17, 19, 457

Digital signal processing (DSP), 16 Digital to analog converters (DACs), 17 Digitize action collaborative, 400 Discovery science, 52 Dose-titration design, 44 Drug therapy, 466 E E2B data model, 460 Economical translation of data, 461 eDiaries, 151 EHR-based recruitment, 223, 224 EHR data and data sharing, 158 Electronic case report form (eCRF), 150 Electronic clinical research study, 160 Electronic data collection devices, 299, 300 Electronic health data, 148 Electronic health record (EHR), 61, 70, 76, 77, 111, 152, 156, 174, 257, 260, 263, 273–277, 282, 434, 435, 437–440, 442–446, 463, 510 low-risk clinical studies, 114, 115 patient-encounter data, 119 Electronic medical record (EMR), 100 Electronic Health Records for Clinical Research (EHR4CR), 163, 164 Electronic Health Record systems for Clinical Research, 163 Electronic phenotyping, 379, 382, 383 Eligibility Rule Grammar and Ontology (ERGO) project, 104 Employer identification number (EIN), 240 Enterprise data warehouses (EDW) data marts or registries, 395 Epidemiological studies, 27 prospective studies, 27 retrospective studies, 28 E-protocol, 93–101, 104–108 Equivalence/non-inferiority studies, 33, 35–37 eSource data, 149 eSource data interchange, 152 eSource implementations, 151, 157, 158 eSource methodology assessments, 158 eSource process, 154 eSource solution, 154 European Clinical Research Infrastructure Network (ECRIN), 163 European Medicines Agency (EMA), 390 Evidence-generating medicine (EGM), 513, 514 Executed study protocol, 92 Experimental designs, 34, 37, 40 antidotes against bias blinding, 39 cluster randomization, 39 double-blind clinical trial, 40 simple randomization, 39 stratified randomization, 39 basic concepts, 37 crossover designs, 28, 33, 40, 42–44 variants of, 44

definitions, 37 innovative approaches, 45, 49 parallel group designs, 40–42 variants of, 44 single treatment group, 37, 38 Experimental studies, 28, 29 clinical trial, 29, 30 study treatments concomitant treatments, 34 control treatment, 34 experimental treatment, 34 superiority vs. non-inferiority, equivalence/non-inferiority studies, 35–37 treatment effect definition end-point to group indicator, 31 group indicator to signal, 31 measurement to end-point, 30 measurements, 30 Expert determination, 207 Exposome, 311, 318, 321 Exposomics, 309, 311 Exposure health, 324 F Factorial designs, 44 Fast healthcare interoperability resources (FHIR), 154, 163, 238, 396, 397, 497, 500 Federal research policy, 142 Federal-wide assurance (FWA), 208 Food and Drug Administration (FDA), 112 Food and Drug Administration Amendments Act (2007), 233 Food and Drug Administration’s (FDA) Sentinel Initiative, 164 Food, Drug & Cosmetic (FD&C) Act, 129 Function and outcomes research for comparative effectiveness in total joint replacement (FORCE-TJR), 235 Functional analysis data, 313–315 Future developments, 355 G GenBank, 20, 313 General health data exchange standard, 396 Genetic data, 314, 322 Genomes project, 315 Genomics, 70, 72, 73, 79, 81 Genomics data, 311–312, 315 Genomics metadata, 397 Good clinical practice (GCP), 4 Graphical processing units (GPU), 18 H Hallmark, 21 Health care and patient health outcomes, 161 Healthcare delivery systems, 390 Healthcare informatics, 397, 409

Healthcare Information and Management Systems Society (HIMSS), 402 Healthcare information systems, 392, 400 Healthcare Information Technology IT Standards Panel (HITSP), 152 Healthcare standards, types, 394, 395 Health consumerism, 474 Health data standards to clinical research, 392 Health information seeking behaviors, 480 Health Insurance Portability and Accountability Act (HIPAA), 130, 134–136, 206, 207, 215 Health IT infrastructure, 510 Health Level Seven (HL7), 100, 163, 237, 396 Health outcome of interest (HOI), 463 Health plan identifier (HPID), 240 History of clinical research, 15 HL7 Clinical Document Architecture (CDA) standard, 401 HL7 Patient Care workgroup, 242 Honest broker, 209 Human Biomolecular Atlas Program (HuBMAP), 71, 72, 77–81 Human genome project, 310 Human subjects protection monitoring, 55 Human subjects protection reporting, 55, 56 Human Tumor Atlas Network (HTAN), 72, 74, 77, 78, 80–82 I ICD-10-CM, 238 Imaging, 70–72, 74, 78, 80–83 Immunotherapy, 83 Incomplete crossover designs, 44 Individual participant data (IPD), 330, 331, 339, 348, 349 Infectious Diseases Data Observatory (IDDO), 163 Informatics-enabled learning health system, 511 Informatics, 70–75, 77, 79, 84, 171, 179, 189, 456, 460, 461, 464–466, 468 applications, 508 general applicability, 456 interventions in clinical research recruitment, 222, 223 ontologies, 461 Information architecture, 207 Information-based recruitment workflows, 222 Information governance, 209–211, 214 Information-intensive domain, 17, 20 Innovative Medicine Initiative, 164 Institutional Review Boards (IRBs), 132, 208 Integrated and interoperable health information systems, 18, 21, 393 Integrating the Healthcare Enterprise (IHE), 163, 402 Internal and external threats, 201, 214 Internal Review Board (IRB), 212 International Conference on Harmonization (ICH), 171 International Patient Summary Implementation Guide (IG), 403

International standards, 333–336, 338, 340–346, 349, 355–358 International Standards Landscape, 400 Internet-based patient matching systems, 223 Interoperability, 149, 237, 242, 243, 248 Interoperable information, 21 J Joint Initiative Council (JIC), 400 K k-nearest neighbor classification, 420 Knowledge discovery in databases (KDD), 414–422 L Latin square design, 44 Learning Health Community (LHC), 164 Learning health system (LHS), 160, 161, 210, 235, 236, 242, 393, 511–513, 517 LOINC, 238 M Machine learning (ML), 414, 422, 437–440, 444–446, 462 and algorithmic computation, 458 approaches, 462 combined with statistical techniques, 463 high quality curated datasets, 464 Management of clinical data, 172, 173 Mapping process, 398 MedDRA model, 461 Medical and industrial research, 461 Medical device epidemiology network (MDEpiNet), 240 Medical language system, 318 Messaging standard, 396 Metabolomics, 309, 311 Microbiome, 312, 316, 317, 322, 323 Microbiomics, 309, 311 Mobile technologies, 151 Molecular biology, 310–312, 315, 317 diagnostic methods, 321, 322 functional analysis data, 313–315 future, 323, 324 genomic data, 311, 312 human variation, 315, 316 integration platforms, 317–320 mechanisms of disease, 321 molecular data to support clinical research, 320 molecular epidemiological data, 323 omics data clinical application, 317 sequence analysis data, 312, 313 structure analysis data, 313 therapeutic applications studies, 321, 322 Molecular Epidemiological Data, 323 Multiplexed imaging, 70, 78, 80, 83 Multistage designs, 44

N Named Entity Recognition (NER) tasks, 462 National Academy of Medicine (NAM), 174 National Council for Pharmacy Drug Program (NCPDP), 395 National Institute for Standards and Technology (NIST), 401 National Institutes of Health (NIH), 170 National Library of Medicine (NLM), 130 National Mesothelioma Virtual Bank (NMVB), 71, 72, 80 National provider identifier (NPI), 240 Natural language processing (NLP), 434, 435, 437–440, 442–446 NIH Clinical and Translational Science Award (CTSA), 509 NLM Value Set Authority Center (VSAC), 395 Non-informatics associations and journals, 513 O Observational Health Data Sciences and Informatics (OHDSI), 164, 398 Observational Medical Outcomes Partnership (OMOP) CDM, 398 Observational Medical Outcomes Pilot (OMOP), 460 Observational research methods, 243 Observational study, 27–29, 232 Occam’s razor, 179 OneMind, 164 Ontologies, 462 Ontology Web Language (OWL), 100, 102, 103 Outcome data by patient report, 292–305 Outcomes measurement, 241 P Parallel group designs, 28, 33, 40–43, 48 Participant screening tools, 61 Patient Centered Outcomes Research Institute, 164 Patient empowerment, 474, 480, 488 Patient registries biomarker discovery, 233 biomedical and health services research, 248, 249 CMS centralized repository of registries, 233 comparative effectiveness research, 233 definitions, 232 EHR systems, 237 envisions registries, 234 error and bias types, 243 FORCE-TJR registry, 235 inclusion criteria types, 232 informatics approaches clinical system, 245 cost and commitment, 245 critical functions of, 231, 247 efficiency calculus, 245 federated form, 246 FHIR API, 246 minimal mapping work, 245

NQRN clinical registry maturational framework model, 246 PCORnet, 246 structured reporting, 246 tidy data, 246 interoperability and data standards CCRF model, 242 clinical models and data elements, 239 clinical phenotype, 240, 241 coding systems and controlled terminologies, 238 content standards, 238 data exchange standards, 237, 238 outcome measures, 241 UDI, 239, 240 LHS, 235, 236 limitations of, 243 NIH inventory, 233 patient safety, 233 post-market surveillance, 233 qualified registries, 235 risk mitigation and evaluation systems, 233 RoPR, 233 support clinical trials planning and recruitment, 233 Patient-reported outcome measurement information system (PROMIS), 303–305 Patient-reported outcomes (PRO), 292–305 PCORnet common data model (CDM), 212, 246, 398 Personal identifying information (PII), 208 Personalization of medicine, 488 Pharmaceuticals and Medical Device Agency (PMDA), 157 Pharmacovigilance, 456–468 in clinical research, 456 definition, 456 development of, 456 drug safety, 457 regulatory and legal requirements, 457 Phase I, II, III, and IV trials, 24, 25, 32–35, 41, 44–47 Phenotypes, 274–277 Planning, quality improvement, 158, 159 Pluripotent storage model, 247 Population health problems, 392 Post marketing surveillance, 456 Postgenomic era, 311 genomics data, 311–312 molecular biology, 311 Postmarketing phase, AEs, 463 Precision medicine, 52, 63, 65 Precision pharmacovigilance, 465 Privacy, 129, 130, 135, 136, 139, 140, 204, 207–211 Process analysis and design, 160 Processing methods, 171, 183–185 Prospective studies, 27 Protected health information (PHI), 206 Protective factor, 27 Protocol authoring tools, 61 Protocol models care and research, 106 eligibility criteria representation standards, 103, 104 EMR data, 105, 106

future, 107, 108 improving study design, 105 Protocol representation standards, Health Level 7, 100 Protocol-results-data, 354 Pseudonymization, 150 Public disclosure of results and analysable, 347 Public health reporting, 393 Public Health Services Act (PHSA), 129, 130 Public policy, 128–142 PV/drug safety, development and evolution, 458 Q Quasi-registries, 244 Queries, 264, 275–277, 280–283 R Random error, 26 Randomization, 28 Real-time electronic data validation, 117 Real World Data (RWD), 52, 60, 64, 65, 158, 261 Real world evidence (RWE), 64, 65, 151 Reengineering, 161 Reference information model (RIM), 100, 396 Registries, 232–249 Registry of patient registries (RoPR), 233 Regulated clinical research information management (RCRIM), 399 Regulatory frameworks, 206 Regulatory science, 136–138 Regulatory support systems, 112, 114, 125 Reliability, 293–297, 302, 305 Representational state transfer (RESTful) APIs, 238 Research data collection, 194 Research data governance, 200–215 Research data management, 117 Research data repositories, 332, 333, 349–351, 355–358 Research electronic data capture (REDCap), 72, 74, 112 Research integrity, 330, 332, 349, 356, 357 Research logistics support, 111 Research metadata ontology, 370, 380 Research methods and biostatistics, 516 Research recruitment workflows, 220–222 Research-specific decision-support systems, 61 Research transparency, 335, 340, 349 Resources for health (RFH) project, 396 Results databases, 333, 339, 347–349, 354, 355, 357, 358 Retrieve protocol for execution (RPE) standards, 163 Retrospective studies, 28 Reusing clinical data, 148 Risk factor, 27 RxNorm, 238, 240 S Safe Harbor method, 207 Scales, 292–295, 300–302 SDO Charter Organization (SCO), 401 Secondary data use, 150, 237, 248

Secondary use, 172, 174, 175, 185–189 Security, 204, 206, 207, 209, 212, 213 Semantic interoperability, 149 Senescence Network (SenNet), 71, 72, 77, 78, 80, 82 Sequence analysis data, 312, 313 Sequence ontology, 312 Simultaneous treatment design, 44 Single cell genomics, 70, 77 Skip logic, 118 SNOMED CT, 238, 239, 463 Social determinants of health (SDoH), 70, 76, 77, 84 Spatial biology, 70–72, 77, 82–84 Standard operating procedures (SOPs), 190 Standard Protocols Items for Randomized Trials (SPIRIT), 102 Standards, 256, 260, 262–264, 266, 267, 270, 276, 278, 282, 284, 286 Standards and Whole Slide Imaging, 71 Standards development organizations (SDO), 392, 401, 406 STARBRITE project, 152 Storage Standard for Medical Information Exchange (SS-MIX), 153 Structure analysis data, 313 Structured product labeling, 400 Study data tabulation model (SDTM), 391 Study design classification, 26–28 phase I, II, III, and IV, 24, 25, 32–35, 41, 44–47 Study protocol, 92–97, 99, 101, 105, 107 Study protocol representation clinical research informatics, 93, 94 clinical research study, 92, 93 Substitutable Medical Applications and Reusable Technologies (SMART), 164 Superiority versus non-inferiority studies, 35–37 Support vector machines, 418, 420 Surrogate end-points, 32 Symbolic approach, 437–440, 442, 445 Synthetic data, 352, 353 Systematic error, 26

T Targeted designs, 45, 47, 49 Theorem of informatics, 2 Tidy data, 246 Traceability, 149 Transcelerate, 402 Transcription errors, 160 Transcriptome, 310 Transcriptomics, 311 TRANSFoRm project, 153 Translational bioinformatics (TBI), 512 Translational Research and Patient Safety in Europe project (TransFoRm), 164 Translational science, 160 Trial registration, 332–337, 340–343, 348–350, 354, 356, 357 Trial registries, 331–333, 335–338, 340–343, 345–349, 354, 355, 357, 358 Tuskegee Syphilis Study, 131 21st Century Cures Act, 139, 240 U Unanticipated risks, 208 Unified medical language system (UMLS), 98 Unified modeling language (UML), 100, 102, 103 Unique device identification (UDI), 240 User perspectives, 343, 345, 346, 353 V Validity, 294, 296, 301, 302, 305 Valuable learning health systems, 148 W Web ontology language, 100 Whole genome sequences (WGS), 18–19