English Pages 439 [429] Year 2021
Intelligent Systems Reference Library 212
Chee-Peng Lim · Yen-Wei Chen · Ashlesha Vaidya · Charu Mahorkar · Lakhmi C. Jain Editors
Handbook of Artificial Intelligence in Healthcare Vol 2: Practicalities and Prospects
Intelligent Systems Reference Library Volume 212
Series Editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The aim of this series is to publish a Reference Library, including novel advances and developments in all aspects of Intelligent Systems in an easily accessible and well structured form. The series includes reference works, handbooks, compendia, textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains well integrated knowledge and current information in the field of Intelligent Systems. The series covers the theory, applications, and design methods of Intelligent Systems. Virtually all disciplines such as engineering, computer science, avionics, business, e-commerce, environment, healthcare, physics and life science are included. The list of topics spans all the areas of modern intelligent systems such as: Ambient intelligence, Computational intelligence, Social intelligence, Computational neuroscience, Artificial life, Virtual society, Cognitive systems, DNA and immunity-based systems, e-Learning and teaching, Human-centred computing and Machine ethics, Intelligent control, Intelligent data analysis, Knowledge-based paradigms, Knowledge management, Intelligent agents, Intelligent decision making, Intelligent network security, Interactive entertainment, Learning paradigms, Recommender systems, Robotics and Mechatronics including human-machine teaming, Self-organizing and adaptive systems, Soft computing including Neural systems, Fuzzy systems, Evolutionary computing and the Fusion of these paradigms, Perception and Vision, Web intelligence and Multimedia. Indexed by SCOPUS, DBLP, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
More information about this series at https://link.springer.com/bookseries/8578
Editors Chee-Peng Lim Institute for Intelligent Systems Research and Innovation Deakin University Waurn Ponds, VIC, Australia
Yen-Wei Chen College of Information Science and Engineering Ritsumeikan University Shiga, Japan
Ashlesha Vaidya Royal Adelaide Hospital Adelaide, SA, Australia
Charu Mahorkar Avanti Institute of Cardiology Nagpur, India
Lakhmi C. Jain KES International Shoreham-by-Sea, UK
ISSN 1868-4394 ISSN 1868-4408 (electronic) Intelligent Systems Reference Library ISBN 978-3-030-83619-1 ISBN 978-3-030-83620-7 (eBook) https://doi.org/10.1007/978-3-030-83620-7 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume is a sequel to the Handbook of Artificial Intelligence in Healthcare. The first volume focuses on advances and applications of artificial intelligence (AI) methodologies in several specific areas, i.e. signal, image and video processing as well as information and data analytics. In this volume, several general practicality challenges and future prospects of AI methodologies pertaining to the healthcare and related domains are presented in Part I and Part II, respectively. A total of 17 chapters are included in this volume. A description of each contribution is as follows.
Decision-making and control in healthcare environments are essential activities. AI-based tools are useful for informed decision-making by both physicians and patients. Albu et al. present several intelligent paradigms, particularly artificial neural networks and fuzzy logic, for modelling, prediction, diagnosis and control in healthcare applications. These intelligent tools are able to assist in decision-making and control processes for prevention, early detection and personalized healthcare.
Triberti et al. aim to tackle the “human” challenge pertaining to AI in healthcare practice, focusing on the potential risk in the doctor–patient relationship. Noting that there is still limited knowledge on the usage of AI in health and medicine, they study the guidelines for identifying people who work with AI in the healthcare context. They argue that it is important to form an interdisciplinary team with members who are able to value both rigorous practice and the health and well-being of patients.
Belciug acknowledges the cross-fertilization of statistical analysis and AI for devising new and impactful methods to assist in medical practice and discovery. It is necessary to exploit statistical analysis for validating AI-based methodologies in healthcare, in order to improve the reliability and credibility of the findings. In addition, the planning, design and implementation of statistical analysis with respect to AI in healthcare research are discussed.
Pedell et al. examine the benefits of introducing humanoid robots into different active ageing and aged care settings. They find that implementing and interacting with robots requires a well-designed plan, in order to build trust and interest and to shift feelings of control among older adults as well as staff. In a group setting, older adults can engage with and enjoy the interaction with both the robot and the wider group, with positive effects. Successful interactions between older adults and humanoid robots also need to be supported by motivational goal modelling and technology probe techniques.
Cancer is a leading cause of mortality worldwide, and physical activity (PA) plays a significant role in reducing the risk of developing it. Dadhania and Williams investigate digital wearable tools, which offer advantages in scale, cost and data capture. Specifically, current methods of evaluating PA in cancer patients and how wearable accelerometers are used in cancer clinical trials are studied. The successes and challenges associated with collecting PA data with wearable accelerometers in digital healthcare trials are discussed.
Stankova et al. develop an online application of a home-administered parent-mediated program for children with autism spectrum disorder for enhancement of their communication skills. The program is organized in modules, each with different text and visual cards, targeting impressive/expressive language, discourse abilities and other functions. The instructional component for parents involves activities within the Moodle e-educational platform. The administration of the program follows a strict schedule, which is also available in Moodle.
To overcome the “black-box” issue, Gerlings et al. focus their research on explainable AI models. Different explanation needs with respect to stakeholders in the case of classifying COVID-19 patients are studied. The need for a constellation of stakeholders involved in human-AI collaborative decision-making is highlighted. The study provides insights into how AI-based systems can be adjusted to support different needs from stakeholders, in order to facilitate better implementation in the healthcare context.
Resta uses a neural network model, i.e. the self-organizing map (SOM), to identify the emergence of COVID-19 clusters among different regions in Italy, in an attempt to explain different characteristics of the pandemic within the same country. Demographic, healthcare and political data at the regional level are considered, and the interactions among them are examined. By leveraging capabilities of the SOM model, the relations among variables can be visualized, and an early warning system can be developed to address further intervention in the battle against the COVID-19 pandemic.
Casacuberta and Vallverdú indicate that the notion of universal emotion leads to a conceptual bias in the use of AI in medical scenarios. Indeed, emotional responses in medical practices are mediated culturally. As a result, a multicultural approach is required in the medical context, taking special consideration of emotional variations with respect to the different cultural backgrounds of patients. From the computational perspective, the most common biases that can originate from data treatment utilizing machine learning algorithms are discussed.
The Russian Hoc Group on Application of AI Technologies in Health Informatics (AHG2 TC215 ISO) highlights the importance of designing and deploying AI-based systems in accordance with established guidelines and legislation for medical applications. In this respect, the formation of unified approaches, definitions and requirements for AI in medicine can significantly increase the efficiency of the associated development and application. A consistent approach through global standardization can reduce the burden on stakeholders when establishing regulatory frameworks.
Initiatives to define goals and directions for standardization pertaining to AI in healthcare are discussed.
Gusev et al. discuss AI research and development in Russia, where government and expert community are working together to develop legal and technical regulations. AI-based software products for diagnostic and treatment processes, including clinical trials, are regulated comprehensively. A balance between accelerating time to market of AI products and ensuring their safety and efficacy is required, along with appropriate consideration of the potential risks and problems. The first series of Russian national technical standards to accelerate AI product development and instil trust in medical practitioners is being established.
Kolpashchikov et al. address issues and challenges in the use of robotic technologies in healthcare. In addition to surgical and rehabilitation robots, non-medical robots that are useful for healthcare organizations to reduce costs, prevent disease transmission and mitigate the lack of workforce are reviewed. One critical issue that prevents future development of robotics in healthcare is lack of autonomy, which is most challenging in minimally invasive surgery where flexible robots are used in confined spaces. Innovative solutions for producing flexible robots as well as new robotic designs with appropriate actuators and sensors are required.
Belandi et al. review the development of Internet of Things (IoT) and machine learning for smart healthcare systems. Utilizing smart healthcare technologies encompassing IoT and machine learning devices for monitoring home environments is becoming popular, particularly for elderly patients with long-term non-acute diseases who do not require hospitalization. The survey focuses on two aspects of the available technologies, namely architectures and algorithms. A taxonomy for classification of the reviewed models and systems is provided.
Hoppe et al. highlight the lack of studies on the potential of digital business models in the healthcare sector. Key performance indicators (KPIs), individualization, efficiency and communication channels are identified as the main factors. An evaluation with a structural equation modelling process indicates that KPIs and communication channels have a significant influence on the potential of digital business models and their processes in healthcare. An outlook on the benefits and challenges pertaining to the rapid development of AI in the healthcare sector is presented.
Manresa-Yee et al. explore the transparency and interpretability issues of AI, particularly deep neural network models. Through explainable AI, users are able to understand the predictions and decisions from AI-based systems, increasing the trustworthiness and reliability of the systems. An overview of explanation interfaces in the healthcare context is given. A survey of healthcare-related studies on explanations in the form of natural text, parameter influence, visualization of data graphs or saliency maps is presented.
Giarelis et al. introduce a graph-based text representation method for discovery of future research collaboration in the medical field. The method combines graph-based feature selection and text categorization for formulation of a novel representation of multiple scientific documents. The proposed method is able to provide useful predictions on future research collaborations, as demonstrated through the use of the COVID-19 Open Research Data Set.
Shopon et al. investigate information security by combining privacy concepts and biometric technologies. An analysis of the protection of physiological and social behavioural biometric data through a variety of authentication applications is given. Current and emerging research studies in the multi-modal biometric domain, including the use of deep learning-based methods, are explained. Open questions and future directions in this research field are discussed, offering new methods in biometric security and privacy investigation and providing insights into the emerging topics of big data analytics and social network research.
The editors are grateful to all authors and reviewers for their contributions. We would also like to thank the editorial team of Springer for their support throughout the compilation of both volumes of this handbook. We sincerely hope that the research and practical studies covered in both volumes can help instil new ideas and plans for researchers and practitioners to work together, as well as to further advance research and application of AI and related methodologies for the benefit of human health and well-being.
May 2021
Chee-Peng Lim, Waurn Ponds, Australia
Yen-Wei Chen, Shiga, Japan
Ashlesha Vaidya, Adelaide, Australia
Charu Mahorkar, Nagpur, India
Lakhmi C. Jain, Shoreham-by-Sea, UK
Contents
Part I Practicalities of AI Methodologies in Healthcare
1 Intelligent Paradigms for Diagnosis, Prediction and Control in Healthcare Applications
Adriana Albu, Radu-Emil Precup, and Teodor-Adrian Teban
1.1 Introduction
1.2 Relevant References
1.3 Medical Decision-Making Based on Artificial Neural Networks
1.3.1 Skin Diseases Diagnosis
1.3.2 Hepatitis C Predictions
1.3.3 Coronary Heart Disease Prediction
1.4 Medical Image Analysis Using Artificial Neural Networks
1.5 Artificial Neural Networks Versus Naïve Bayesian Classifier
1.5.1 Hepatitis B Predictions
1.5.2 Stroke Risk Prediction
1.6 Prosthetic Hand Myoelectric-Based Modeling and Control Using Evolving Fuzzy Models and Fuzzy Control
1.6.1 Evolving Fuzzy Modeling Results
1.6.2 Fuzzy Control Results
1.7 Conclusions
References
2 Artificial Intelligence in Healthcare Practice: How to Tackle the “Human” Challenge
Stefano Triberti, Ilaria Durosini, Davide La Torre, Valeria Sebri, Lucrezia Savioni, and Gabriella Pravettoni
2.1 Introduction
2.2 AI in Healthcare
2.3 A “Third Wheel” Effect
2.3.1 “Confusion of the Tongues”
2.3.2 Decision Paralysis and Risk of Delay
2.3.3 Role Ambiguity
2.4 An Interface for AI
2.5 Identifying Personnel to Work with AI
2.6 Recommendations
2.7 Conclusion
References
3 A Statistical Analysis Handbook for Validating Artificial Intelligence Techniques Applied in Healthcare
Smaranda Belciug
3.1 Introduction
3.2 Hypothesis Testing
3.2.1 Contingency Tables or Cross-Tabulation
3.2.2 Odds Ratio
3.2.3 Pearson’s χ² Test
3.3 Normality Tests
3.3.1 Kolmogorov–Smirnov Goodness of Fit (K-S) Test
3.3.2 Lilliefors Test
3.3.3 Shapiro–Wilk W Test
3.4 Statistical Benchmarking Tests
3.4.1 T-test or Student’s T-test
3.4.2 T-test for Two Independent Groups of Observations
3.4.3 Equality of Variances: Levene’s Test
3.4.4 Equality of Variances: Bartlett’s Test
3.4.5 Mann–Whitney Test or Mann–Whitney Wilcoxon Test
3.4.6 One Way ANOVA
3.4.7 Tukey’s Honest Significant Difference Test
3.5 Conclusions
References
4 Designing Meaningful, Beneficial and Positive Human Robot Interactions with Older Adults for Increased Wellbeing During Care Activities
Sonja Pedell, Kathy Constantin, Diego Muñoz, and Leon Sterling
4.1 Introduction
4.2 Social Robotics
4.2.1 The Nao Robot
4.2.2 The Need for Meaningful Activities and a Holistic Approach
4.3 Method: Learning from HCI Approaches for Exploring Social HRI
4.3.1 Situated Action
4.3.2 Participatory Design and Mutual Learning
4.3.3 Technology Probes
4.3.4 Motivational Goal Models and Technology Probes
4.3.5 Understanding Emotions
4.3.6 Iterative Visits in the Field and Data Collection
4.4 Four Case Studies Using the Nao in the Field
4.4.1 Preparing Considerations
4.4.2 Interaction Stages
4.4.3 Overview
4.4.4 Case Study 1: Active Ageing Knitting Group
4.4.5 Case Study 2: Dementia Respite Care as Part of the Active Ageing Program
4.4.6 Case Study 3: Men’s Shed
4.4.7 Case Study 4: Residential Care
4.5 Discussion
4.5.1 Creating a Basis Through Humor and Turning Initial Negative Emotions into Positive
4.5.2 Increasing Wellbeing Through Activity and Application of Skills
4.5.3 Situated AI for Human Robot Interactions
4.5.4 Designing Social Interactions
4.6 Conclusions
References
5 Wearable Accelerometers in Cancer Patients
Seema Dadhania and Matthew Williams
5.1 Introduction
5.2 The Cancer Patient and Outcome Measures
5.2.1 Measuring Physical Activity
5.2.2 Measuring Physical Activity in the Cancer Patient
5.3 Harnessing Wearable Technology in Oncology
5.3.1 What Can Wearable Technology Be Used to Measure in Oncology, and Why Are These Parameters Relevant?
5.4 Accelerometers
5.4.1 Challenges with Wearable Accelerometer Data
5.5 Real-World Experience of Running a Digital Health Study
5.5.1 Device Considerations
5.5.2 Successes and Challenges of Running a Real-World Wearable Accelerometer Study
5.6 Clinical Studies in Cancer Patients Using Wearable Accelerometers
5.7 Ethical Issues with Wearable Accelerometer Data
5.7.1 Data Privacy and Security
5.7.2 Data Ownership
5.7.3 Insurance Premiums
5.8 Conclusion
References
6 Online Application of a Home-Administered Parent-Mediated Program for Children with ASD
Margarita Stankova, Tsveta Kamenski, Polina Mihova, and Todor Datchev
6.1 Introduction
6.2 Conceptual Framework and Aims of the Program
6.2.1 Behavioral Model of Communicative Failure
6.2.2 Structure of the Program
6.2.3 Technical Description and Parameters of the Program
6.2.4 Technical Specifications of the System
6.3 Pilot Testing of the Program: Qualitative Analysis
6.4 Conclusions
References
7 Explainable AI, But Explainable to Whom? An Exploratory Case Study of xAI in Healthcare
Julie Gerlings, Millie Søndergaard Jensen, and Arisa Shollo
7.1 Introduction
7.2 Related Work
7.2.1 Adoption and Use of AI in Healthcare
7.2.2 Drivers for xAI
7.2.3 Emergence of xAI
7.2.4 AI and xAI in the Fight Against the COVID-19 Pandemic
7.3 Method
7.3.1 Data Collection
7.3.2 Data Analysis
7.4 Case Setting
7.5 Findings
7.5.1 Development Team
7.5.2 Subject Matter Expert
7.5.3 Decision-Makers
7.5.4 Audience
7.6 Discussion and Concluding Remarks
Appendix 1: Technical Aspects of LungX
References
8 Pandemic Spreading in Italy and Regional Policies: An Approach with Self-organizing Maps
Marina Resta
8.1 Introduction
8.2 Related Literature
8.3 Data and Research Questions
8.4 Methodology
8.5 Analysis
8.6 Analysis
References
9 Biases in Assigning Emotions in Patients Due to Multicultural Issues
David Casacuberta and Jordi Vallverdú
9.1 Introduction
9.2 The Non-Universality of Emotions
9.3 Emotions in Medical Contexts
9.4 Machine Learning, Data, Emotions, and Diagnosis
9.4.1 What is Affective Computing?
9.4.2 Data for Automatic Emotion Detection
9.4.3 Developing the Algorithm
9.5 Correcting Data Biases in Medical Diagnosis
9.6 Conclusions
References
Part II Prospects of AI Methodologies in Healthcare
10 Artificial Intelligence in Healthcare: Directions of Standardization
Hoc Group on Application of AI Technologies in Health Informatics (AHG2 TC215 ISO)
10.1 Introduction
10.2 Definition of Artificial Intelligence (AI)
10.3 History
10.4 AI Features and Development
10.5 Problems and Challenges
10.6 AI Systems in Healthcare
10.7 Quality and Safety of AI
10.8 Standardization of AI in Healthcare
10.9 Conclusion
References
11 Development of Artificial Intelligence in Healthcare in Russia
A. Gusev, S. Morozov, G. Lebedev, A. Vladzymyrskyy, V. Zinchenko, D. Sharova, E. Akhmad, D. Shutov, R. Reshetnikov, K. Sergunova, S. Izraylit, E. Meshkova, M. Natenzon, and A. Ignatev
11.1 Introduction
11.1.1 National Strategy for AI in Healthcare of the Russian Federation
11.1.2 The Work of Government Agencies and the Expert Community on the Development of AI in Healthcare
11.2 AI Regulations in Healthcare of the Russian Federation
11.2.1 Basic Principles of Regulations in Healthcare
11.2.2 Technical and Clinical Trials of Software as a Medical Device Created with the Application of AI Technologies
11.2.3 State Registration of Software as a Medical Device Created with the Application of AI Technologies
11.2.4 Post-registration Monitoring of Software as a Medical Device
11.3 Technical Regulations of Artificial Intelligence in the Russian Federation
11.4 Practical Experience of Artificial Intelligence in Healthcare of the Russian Federation
11.5 Chapter Summary
References
12 Robotics in Healthcare
Dmitrii Kolpashchikov, Olga Gerget, and Roman Meshcheryakov
12.1 Introduction
12.2 Surgical Robots
12.2.1 Computer-Assisted Surgery
12.2.2 Mechanical Design and Control
12.2.3 Application
12.3 Rehabilitation Robots
12.3.1 Contact Therapy Robots
12.3.2 Assistive Robotics
12.3.3 Non-Contact Therapy Robots and Socially Assistive Robotics
12.4 Non-Medical Robots
12.5 Challenges
12.6 Conclusion
References
13 Smart Healthcare, IoT and Machine Learning: A Complete Survey
Valerio Bellandi, Paolo Ceravolo, Ernesto Damiani, and Stefano Siccardi
13.1 Introduction
13.2 Architecture and Pipeline
13.2.1 Research Questions and Methodology Adopted
13.3 The General Picture of Levels
13.3.1 Architectures for the Local Integration Level—The Edge Level
13.3.2 Task Allocation and Resource Management—The Fog Level
13.3.3 Global Integration of Tasks and Resources—The Cloud Level
13.3.4 Algorithms and Data Analytics
13.3.5 Architectural Configurations
13.4 Data Pipeline and Data Storage
13.5 Conclusion
References
14 Digital Business Models in the Healthcare Industry
Nathalie Hoppe, Felix Häfner, and Ralf Härting
14.1 Introduction
14.2 Role of the Healthcare Sector
14.3 Current Trends of Digitalization in Healthcare
14.4 Potential Benefits of Digital Business Models in the Healthcare Industry
14.4.1 Research Method
14.4.2 Industry-Dependent Determinants of Digitalization
14.4.3 Digital Technologies Along the Care Pathway
14.4.4 Challenges of Digitalization in Healthcare
14.4.5 Study Results
14.4.6 Interpretation
14.5 Conclusion
14.6 Outlook: The Role of AI in Healthcare
References
15 Advances in XAI: Explanation Interfaces in Healthcare
Cristina Manresa-Yee, Maria Francesca Roig-Maimó, Silvia Ramis, and Ramon Mas-Sansó
15.1 Introduction
15.2 Related Work
15.3 Method
15.4 Findings
15.4.1 Prediction Tasks
15.4.2 Diagnosis Tasks
15.4.3 Automated Tasks
15.5 Conclusions
References
16 Medical Knowledge Graphs in the Discovery of Future Research Collaborations
Nikolaos Giarelis, Nikos Kanakaris, and Nikos Karacapilidis
16.1 Introduction
16.2 Background Issues
16.2.1 Graph Measures and Indices
16.2.2 Graph-Based Text Representations
16.2.3 Graph-Based Feature Selection
16.2.4 Graph-Based Text Categorization
16.2.5 Graph-Based Link Prediction
16.3 The Proposed Framework
16.3.1 Graph-Based Text Representation
16.3.2 Graph-Based Feature Selection
16.3.3 Graph-Based Text Categorization
16.3.4 Graph-Based Link Prediction
16.4 Experiments
16.4.1 Cord-19
16.4.2 Experimental Setup
16.4.3 Evaluation
16.5 Conclusions
References
17 Biometric System De-identification: Concepts, Applications, and Open Problems
Md. Shopon, A. S. M. Hossain Bari, Yajurv Bhatia, Pavan Karkekoppa Narayanaswamy, Sanjida Nasreen Tumpa, Brandon Sieu, and Marina Gavrilova
17.1 Introduction
17.2 Literature Review and Classification of Biometric De-identification
17.3 New Types of Biometric De-identification
17.3.1 Sensor-Based Biometric De-identification
17.3.2 Emotion-Based De-identification
17.3.3 Social Behavioral Biometrics-Based De-identification
17.3.4 Psychological Traits-Based De-identification
17.3.5 Aesthetic-Based Biometric De-identification
17.4 Multi-Modal De-identification System
17.4.1 Definition and Motivation
17.4.2 Deep Learning Architecture
17.4.3 Multi-Modal De-identification Methodology
17.4.4 Potential Applications of Multi-Modal Biometric De-identification
17.5 Potential Applications in Risk Assessment and Public Health
17.6 Open Problems
17.6.1 Open Problems of Sensor-Based Biometric De-identification
17.6.2 Open Problems of Gait and Gesture De-identification
17.6.3 Open Problems of Emotion-Based De-identification
17.6.4 Open Problems of Social Behavioral De-identification
17.6.5 Open Problems of Psychological Traits-Based De-identification
17.6.6 Open Problems of Aesthetic-Based Biometric De-identification
17.6.7 Open Problems of Multi-Modal De-identification
17.7 Conclusion
References
Part I
Practicalities of AI Methodologies in Healthcare
Chapter 1
Intelligent Paradigms for Diagnosis, Prediction and Control in Healthcare Applications
Adriana Albu, Radu-Emil Precup, and Teodor-Adrian Teban
Abstract Decision-making and control in healthcare applications are essential activities that involve a large number of medical and technical aspects, and the nonlinearity of the systems specific to these applications makes them challenging. These activities also involve humans: on the one hand the patient, who has a medical problem and requires the best solution; on the other hand the physician, who should be able to provide, in any circumstances, a decision or a prediction regarding the current and future medical status of a patient. Technology in general, and artificial intelligence tools in particular, can help both of them, supported by appropriate theory on modeling tools. Diagnosis, modeling, prediction and control are the mechanisms that theoretically support healthcare applications as far as decision-making is involved. Two of the most powerful intelligent paradigms successfully used in this field are artificial neural networks and fuzzy logic, with their corresponding models. This chapter presents several applications developed by the Process Control Group of the Politehnica University Timisoara, Romania, and emphasizes that these techniques, which produce intelligent models, are able, even though they are artificial, to make decisions and to control various processes, which makes them useful tools for prevention, early detection and personalized healthcare.
Keywords Artificial neural networks · Fuzzy control · Fuzzy models · Medical diagnosis · Medical prediction · Prosthetic hands · Recurrent neural networks
A. Albu · R.-E. Precup (B) · T.-A. Teban
Department of Automation and Applied Informatics, Politehnica University Timisoara, Bd. V. Parvan 2, 300223 Timisoara, Romania
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_1
1.1 Introduction
Decision-making and control in the medical domain aim to improve the healthcare system and to help physicians by assisting them and offering suggestions or a second opinion [1]. Medical processes can be modeled by the intelligent paradigms available in this field, bringing relevant benefits to both patients and physicians. In addition, some medical conditions are hard for humans to detect; a suggestion or an alert provided by an automated system can therefore support physicians. Not least, a large number of simple, routine activities are time-consuming and overload medical staff; these could easily be performed by a machine.
The Artificial Intelligence (AI) domain provides a series of techniques, approaches and algorithms that can be used to solve problems requiring intelligent behavior. To achieve this, it was necessary to understand the way humans think and act, and then to build intelligent tools able to reproduce human functionalities. A key property of human brain activity is learning. For this reason, one of the most popular topics in the AI domain is machine learning. Although different kinds of systems with this property have been developed in recent years, Artificial Neural Networks (ANNs) are still one of the most common and efficient forms of learning systems [2]. An essential element in this context is vague and imprecise information, which can be represented, recognized or interpreted using fuzzy models.
ANNs are able to acquire, store and use experiential knowledge [2], practically learning from ordinary experience, as humans do [3]. These features make them suitable for medical decision-making. They follow the structure of the human brain, using a simplified architecture made of basic processing units (artificial neurons) that are interconnected and work in parallel [1].
According to [2, 3], ANNs are able to execute parallel distributed computation, to tolerate noisy inputs, and to learn new associations, new patterns and new functional dependencies. An ANN is therefore a collection of neurons connected by weighted links. Figure 1.1 describes a simplified mathematical model of a neuron, as presented in [2]. It has a number of weighted inputs $a_i$, which are used by an activation function $f$. If a linear combination of the inputs exceeds an established threshold, the neuron $j$ is activated or, in other words, it fires. The weighted link $w_{ij}$ from neuron $i$ to neuron $j$ propagates the input $a_i$ from $i$ to $j$.
Fig. 1.1 The simplified mathematical model of a neuron j
The output of a neuron $j$ is determined in two steps [2]. First, the sum of its weighted inputs is computed as the linear combination of inputs
\[ in_j = \sum_{i=0}^{n} w_{ij} a_i. \tag{1.1} \]
Second, the activation function $f$ is applied to this sum in order to calculate the output $a_j$
\[ a_j = f(in_j) = f\left( \sum_{i=0}^{n} w_{ij} a_i \right). \tag{1.2} \]
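The two steps in (1.1) and (1.2) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `neuron_output`, `step` and `sigmoid`, the threshold at zero and the numerical weights are assumptions chosen for the example and do not come from [2].

```python
import math


def neuron_output(weights, inputs, activation):
    """Return a_j = f(sum_i w_ij * a_i), with a_0 = 1 carrying the bias w_0j."""
    # Prepend the constant input a_0 = 1 so that weights[0] plays the role of w_0j.
    extended = [1.0] + list(inputs)
    if len(weights) != len(extended):
        raise ValueError("expected one weight per input plus one bias weight")
    in_j = sum(w * a for w, a in zip(weights, extended))  # Eq. (1.1)
    return activation(in_j)                               # Eq. (1.2)


def step(x):
    """Hard threshold at zero: the neuron fires (1.0) or stays silent (0.0)."""
    return 1.0 if x > 0.0 else 0.0


def sigmoid(x):
    """Smooth activation, shown as an alternative to the hard threshold."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical weights: w_0j = -1.5 (bias), w_1j = w_2j = 1.0.
# With the step activation this neuron behaves like a logical AND.
and_weights = [-1.5, 1.0, 1.0]
fired = neuron_output(and_weights, [1.0, 1.0], step)      # in_j = 0.5
silent = neuron_output(and_weights, [1.0, 0.0], step)     # in_j = -0.5
```

Here the bias weight shifts the point at which the neuron fires, which is the role attributed to $w_{0j}$ in the text.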
Equations (1.1) and (1.2) and Fig. 1.1 highlight the presence of the bias $w_{0j}$, applied to the neuron as an additional constant input $a_0$ with the value $a_0 = 1$. This can be useful in various applications, as it affects the nonlinear input–output map of the neuron given in (1.2). The weights and the bias are the parameters of the ANN, and they are trained (learned) by learning algorithms, which iteratively solve optimization problems that minimize appropriately parameterized cost functions depending on the modeling errors.
Given the model of a neuron, an ANN can be created by connecting together as many neurons as necessary. The behavior and the properties of the network are determined by its architecture and by the properties of its neurons. One of the fundamental ways of connecting neurons is the feed-forward network, with connections in one direction only (from input to output), forming a directed acyclic graph [2]. The neurons of such a network are organized in layers. The network has at least one layer, the output layer, which in that case receives the network's inputs directly and produces the output. Depending on the problem to be solved, the network can have a single output unit or multiple outputs (for instance, if the network is used for classification into more than two classes). Additionally, a feed-forward neural network usually has one or more layers of hidden neurons, placed between the inputs and the output layer.
Establishing the architecture of the ANN, particularly the number of hidden layers, requires special attention. For simple problems, it was empirically determined that a single hidden layer of neurons is enough. For more complex problems, the structure of the ANN can be defined using clustering techniques able to avoid structures that are too complicated (regarding the number of hidden layers) [4].
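The statement that training iteratively solves an optimization problem can be illustrated on a single neuron. The sketch below assumes a sigmoid activation, a squared-error cost and plain stochastic gradient descent; the learning rate, the number of epochs, the toy data (logical OR) and the names `train_neuron` and `predict` are hypothetical choices made for illustration, not the algorithms used in the cited works.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, inputs):
    """Neuron output per Eqs. (1.1)-(1.2); weights[0] is the bias w_0j."""
    extended = [1.0] + list(inputs)
    return sigmoid(sum(w * a for w, a in zip(weights, extended)))


def train_neuron(samples, n_inputs, learning_rate=0.5, epochs=5000):
    """Fit the weights by stochastic gradient descent on 0.5 * (output - target)^2."""
    weights = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for inputs, target in samples:
            extended = [1.0] + list(inputs)
            output = predict(weights, inputs)
            # Derivative of the cost with respect to in_j (chain rule through the sigmoid).
            delta = (output - target) * output * (1.0 - output)
            # Move each weight against its gradient, delta * a_i.
            weights = [w - learning_rate * delta * a
                       for w, a in zip(weights, extended)]
    return weights


# Toy data set (assumption): the logical OR of two binary inputs.
or_samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights = train_neuron(or_samples, n_inputs=2)
```

After training, `predict(weights, [0, 0])` falls below 0.5 and the other three inputs give values above 0.5, so the modeling error on this toy set has been driven down by the iterative updates.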
Figure 1.2 illustrates a general example of a feed-forward neural network with a single hidden layer. Most of the applications presented in this chapter make use of ANNs that follow this structure.
Fig. 1.2 A feed-forward neural network with one hidden layer
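A feed-forward network with one hidden layer, as in Fig. 1.2, evaluates its layers one after the other, each neuron applying (1.1) and (1.2) to the outputs of the previous layer. The following sketch shows such a forward pass; the hand-picked weights (which make the network compute the exclusive OR of two binary inputs), the step activation and all function names are assumptions made for illustration only.

```python
def step(x):
    """Hard threshold activation: 1.0 if the weighted sum is positive, else 0.0."""
    return 1.0 if x > 0.0 else 0.0


def layer_forward(weight_rows, inputs, activation):
    """Evaluate one layer; each row holds [bias, w_1, ..., w_n] for one neuron."""
    extended = [1.0] + list(inputs)  # constant input a_0 = 1 for the bias
    return [activation(sum(w * a for w, a in zip(row, extended)))
            for row in weight_rows]


def network_forward(hidden_weights, output_weights, inputs, activation=step):
    """Propagate the inputs through the hidden layer, then the output layer."""
    hidden = layer_forward(hidden_weights, inputs, activation)
    return layer_forward(output_weights, hidden, activation)


# Hypothetical weights: hidden neuron 1 acts as OR, hidden neuron 2 as AND,
# and the output neuron fires when OR is active but AND is not (i.e. XOR).
HIDDEN = [[-0.5, 1.0, 1.0],
          [-1.5, 1.0, 1.0]]
OUTPUT = [[-0.5, 1.0, -1.0]]

xor_table = {(x1, x2): network_forward(HIDDEN, OUTPUT, [x1, x2])[0]
             for x1 in (0, 1) for x2 in (0, 1)}
```

The exclusive OR cannot be produced by a single threshold neuron, so this small case also shows why hidden layers are placed between the inputs and the output layer.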
Fuzzy logic and control can be considered as built upon human experience. They also represent a tool to express nonlinear input–output maps and to ensure nonlinear decision-making and control. Fuzzy logic and control operate by expressing the model and control requirements, and by elaborating the decision and the control signal, in terms of IF–THEN rules that belong to the set of rules
\[ \dots \quad \text{IF (antecedent) THEN (consequent)}, \quad \dots \tag{1.3} \]
where the antecedent (the premise) refers to the situation concerning the system or process dynamics (usually compared to the desired/imposed dynamics), and the consequent (the conclusion) refers to the decisions that should be made, often organized as the control signal $u$, in order to fulfil the desired dynamics. The set of rules of the form given in (1.3) is the rule base of the fuzzy model or the fuzzy controller. Using the fuzzy control system structure given in [5] and extracting the fuzzy controller, the principle block diagram of a fuzzy model is presented in Fig. 1.3, focusing on fuzzy control. Figure 1.3 also highlights the operation principle of a fuzzy model in its classical version, characterizing Mamdani fuzzy models, with the following variables and modules: (1) the crisp inputs, (2) the fuzzification module, (3) the fuzzified inputs, (4) the inference module, (5) the fuzzy conclusions, (6) the defuzzification module, (7) the crisp output. An essential feature of fuzzy control systems pointed out in [5] concerns the multiple interactions from the process to the controller, expressed by auxiliary variables of interest that belong to the vector $\mathbf{y}_a$. The input vector of the fuzzy model or fuzzy controller is $\mathbf{e}$
\[ \mathbf{e} = [\, e \;\; \mathbf{y}_a^T \,]^T = [\, e_1 \;\; e_2 \;\; \dots \;\; e_n \,]^T, \tag{1.4} \]
Fig. 1.3 Basic fuzzy model structure also expressed as fuzzy controller structure in terms of [5]
where the superscript $T$ indicates matrix transposition, $e$ is the control error and also the first input $e_1$, i.e. the first element in $\mathbf{e}$
\[ e_1 = e = r - y, \tag{1.5} \]
$r$ is the reference input (the set-point) and $y$ is the controlled output. According to Fig. 1.3, the operation principle of a Mamdani fuzzy model or fuzzy controller involves the following sequence of operations [5]:
• The crisp input information is converted into a fuzzy representation. This operation is called fuzzification of crisp information.
• The fuzzified information is processed using the rule base, composed of the fuzzy IF–THEN rules, referred to as fuzzy rules or fuzzy control rules of type (1.3), which must be well defined in order to model or control the given system (process). The principles used to evaluate and process the rule base represent the inference mechanism (engine), and the result is the fuzzy expression of $u$, namely the fuzzy decision or the fuzzy control signal.
• The fuzzy decision or the fuzzy control signal must be converted into a crisp formulation with a well-specified physical nature, directly understandable and usable by the user or the actuator, so that the fuzzy model information can be used or the process can be controlled. This operation is known as defuzzification.
These three operations characterize three modules in the structure of a fuzzy model or a fuzzy controller (Fig. 1.3): the fuzzification module (2), the inference module (4) and the defuzzification module (6). All three modules are assisted by adequate databases.
Type-2 fuzzy logic benefits from additional parameterization in handling the uncertainties specific to modeling and control. It requires an additional module in the structure given in Fig. 1.3, referred to as the type reducer, which transforms type-2 fuzzy sets into type-1 ones. Type reducer algorithms and applications of type-2 fuzzy logic and control are discussed in [6–14].
Two types of fuzzy models and controllers are widely used in practice:
• Mamdani fuzzy models or controllers, also called linguistic fuzzy models and controllers, with either fuzzy consequents, which are type-I fuzzy systems according to the classifications conducted in [15, 16], or singleton consequents, which are type-II fuzzy systems. As shown in [5], Mamdani fuzzy controllers are usually used as direct closed-loop controllers.
• Takagi–Sugeno–Kang (or Takagi–Sugeno or Sugeno) fuzzy models or controllers, which are also known as type-III fuzzy systems in terms of [15, 16], especially if affine consequents are employed. These fuzzy controllers are typically used as supervisory controllers.
The expression of rule $i$ of Takagi–Sugeno–Kang fuzzy models that make use of affine consequents is [17]
\[ \text{Rule } i:\ \text{IF } z_1 \text{ IS } LT_{i1} \text{ AND } \dots \text{ AND } z_n \text{ IS } LT_{in} \text{ THEN } y^i = a_{i0} + a_{i1} z_1 + \dots + a_{in} z_n, \quad i = 1 \dots n_R, \tag{1.6} \]
where $z_j$, $j = 1 \dots n$, are the input and also scheduling variables, grouped in the input vector $\mathbf{z}$
\[ \mathbf{z} = [\, z_1 \;\; z_2 \;\; \dots \;\; z_n \,]^T \in \mathbb{R}^n, \tag{1.7} \]
$n$ is the number of input variables, $LT_{ij}$, $i = 1 \dots n_R$, $j = 1 \dots n$, are the input linguistic terms, $y^i$ is the output of the $i$th local model in the rule consequent of rule $i$, $i = 1 \dots n_R$, and $a_{i\chi}$, $i = 1 \dots n_R$, $\chi = 0 \dots n$, are the parameters specific to the rule consequents [17, 18]. Accepting that the fuzzy model structure makes use of the algebraic product t-norm as the AND operator in the inference engine and of the weighted average defuzzification method, the expression of the fuzzy model output $y$ is [17–19]
\[ y = \frac{\sum_{i=1}^{n_R} \tau_i \, y^i}{\sum_{i=1}^{n_R} \tau_i} = \sum_{i=1}^{n_R} \lambda_i \, y^i, \quad y^i = [\, 1 \;\; \mathbf{z}^T \,] \, \boldsymbol{\pi}_i, \quad \lambda_i = \frac{\tau_i}{\sum_{i=1}^{n_R} \tau_i}, \quad i = 1 \dots n_R, \tag{1.8} \]
where all variables depend on $\mathbf{z}$, which is omitted for the sake of simplicity, $\tau_i(\mathbf{z})$ is the firing degree of rule $i$ [17–19]
\[ \tau_i(\mathbf{z}) = \text{AND}(\mu_{i1}(z_1), \mu_{i2}(z_2), \dots, \mu_{in}(z_n)) = \mu_{i1}(z_1) \cdot \mu_{i2}(z_2) \cdot \ldots \cdot \mu_{in}(z_n), \quad i = 1 \dots n_R, \tag{1.9} \]
and $\lambda_i(\mathbf{z})$ is the normalized firing degree. The expression of rule $i$ in Mamdani fuzzy models is similar to that given in (1.6) for Takagi–Sugeno fuzzy models
\[ \text{Rule } i:\ \text{IF } z_1 \text{ IS } LT_{i1} \text{ AND } \dots \text{ AND } z_n \text{ IS } LT_{in} \text{ THEN } y^i \text{ IS } LT_i^{o}, \quad i = 1 \dots n_R, \tag{1.10} \]
with the same antecedent but a different consequent, where the output linguistic terms $LT_i^{o}$, $i = 1 \dots n_R$, are involved.
Even if the literature contains an impressive number of papers and books that describe different ways of using ANNs and fuzzy models, there is enough room for further developments. Technology becomes faster and faster. Biological neurons switch at speeds that are a million times slower than a computer gate [3]. Nevertheless, humans (and even animals) are more efficient in speech recognition and visual information processing than the fastest computer. Therefore, research in this field remains a challenge as long as the understanding of the human neural system is not yet complete [1].
This chapter describes, over several sections, the research results obtained by the Process Control Group of the Politehnica University Timisoara, Romania, in the field of intelligent paradigms developed for practical problems specific to the medical domain. It continues the work outlined in [1], adding some new results.
This chapter is structured as follows: the relevant references are discussed in the next section. Medical decision-making based on ANNs is treated in Sect. 1.3, and several applications are included. Some results on medical image analysis using ANNs are outlined in Sect. 1.4. A discussion on ANNs versus the Naïve Bayesian classifier is given in Sect. 1.5 along with two representative applications. Aspects concerning myoelectric-based modeling and control of prosthetic hands using evolving fuzzy models and fuzzy control are presented in Sect. 1.6, and a sample of the authors' recent results is illustrated. The conclusions are highlighted in Sect. 1.7.
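The Takagi–Sugeno–Kang output computation in (1.6)–(1.9) can be sketched as follows. The sketch assumes that each $\mu_{ij}$ is the membership function of the linguistic term $LT_{ij}$ and that the consequent parameters of rule $i$ are stored as the list $[a_{i0}, a_{i1}, \dots, a_{in}]$; the triangular membership functions, the two-rule model and all names (`triangular`, `tsk_output`, `RULES`) are hypothetical and serve only as an illustration.

```python
def triangular(a, b, c):
    """Return a triangular membership function with feet at a and c and peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def tsk_output(rules, z):
    """Weighted-average output of a Takagi-Sugeno-Kang model, Eq. (1.8).

    rules: list of (membership_functions, consequent_params) pairs, where
    consequent_params = [a_i0, a_i1, ..., a_in] as in Eq. (1.6).
    """
    firing, local = [], []
    for memberships, params in rules:
        tau = 1.0
        for mu, z_j in zip(memberships, z):
            tau *= mu(z_j)  # algebraic product t-norm, Eq. (1.9)
        firing.append(tau)
        # Affine local model y^i = a_i0 + a_i1 z_1 + ... + a_in z_n, Eq. (1.6).
        local.append(params[0] + sum(a * z_j for a, z_j in zip(params[1:], z)))
    total = sum(firing)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(t * y for t, y in zip(firing, local)) / total


# Hypothetical single-input model on the interval [0, 10]:
# Rule 1: IF z_1 IS LOW  THEN y^1 = 1
# Rule 2: IF z_1 IS HIGH THEN y^2 = 2 * z_1
LOW = triangular(-10.0, 0.0, 10.0)
HIGH = triangular(0.0, 10.0, 20.0)
RULES = [([LOW], [1.0, 0.0]),
         ([HIGH], [0.0, 2.0])]

y_mid = tsk_output(RULES, [5.0])  # both rules fire with degree 0.5
```

At $z_1 = 5$ both rules fire equally, so the model output is the average of the two local outputs 1 and 10; at the ends of the interval only one rule fires and the output equals that rule's local model. This blending of local affine models by normalized firing degrees is what (1.8) expresses.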
1.2 Relevant References
As stated in [1], a quick look at recent research papers shows that the influence of AI on human life is continuously increasing. Medicine is part of this trend, as illustrated in [20–25], and it could incorporate many more applications based on intelligent paradigms in the future. ANNs, which are machine learning models inspired by the architecture of the human brain, have been extensively used in various applications developed for the medical field. Several examples are presented here.
In [20], the authors use ANNs to predict acute rejection in liver transplant patients. The novelty and the advantage of their method is that it uses routine laboratory data only and is therefore non-invasive (whereas conventional tests need a biopsy, which is an invasive procedure). The efficiency of ANNs is also shown in [21], where the authors present an overview of the use of ANNs in lung cancer research. They also underline that, even if the literature shows that these tools are suitable for clinical decision support, close cooperation between physician and biostatistician is mandatory in order to avoid inaccurate use of ANNs. Another recent example is provided in [22], where ANNs are used to analyze stomach images, classifying them as normal, benign or malignant. The initial images are processed in order to reduce their dimension. In the authors' opinion, the accuracy of the classification results places this method above others.
Medical images are intensively used together with neural networks in the decision-making process. For instance, an automated classification of skin lesions is performed in [23] by deep convolutional neural networks (CNNs) that analyze images. The results demonstrate that such tools are able to classify skin cancer with an accuracy comparable to that of dermatologists. The use of machine learning in stroke imaging is emphasized in [24]. The authors focus on technical principles, applications and perspectives, stating that these techniques may play an important role in setting the adequate therapeutic method or in predicting the prognosis for stroke patients. A relevant study is provided in [25], where the authors forecast how medicine (particularly cardiovascular medicine) will incorporate AI in the future. They consider the most common machine learning techniques, including CNNs and recurrent neural networks (RNNs).
When it comes to classification, several algorithms are available, as the ANN is just one of many intelligent paradigms that are part of the AI domain. The most suitable one has to be chosen according to the problem that must be solved. Numerous aspects should be considered when an algorithm is selected for the implementation of such a system [26]. Some of them refer to: the input data (its initial form, what type of data is required by each algorithm, how difficult it is to process the data, whether the values are discrete or continuous, what constraints must be satisfied, etc.), the expected output (its desired aspect, its meaning, etc.), and the resource consumption (execution time, memory, etc.).
Even if the ANN has been the most frequently used method for classification problems in recent years, the Naïve Bayesian Classifier (NBC) should not be ignored. As stated in [26, 27], the comparison between them has been debated by researchers for more than two decades [28–35]. One of the sections of this chapter describes two medical applications that provide predictions based on both ANNs and NBC; therefore, a literature overview dedicated to this subject is required.
Some papers analyze the performance of ANNs and NBC used in parallel, for the same purpose. Others combine them into a single entity, which benefits from the advantages provided by both methods. It is worth mentioning several results presented in the literature, even if some of these papers were written more than twenty years ago. This demonstrates a long-standing concern for the subject [27].
As presented in [26], ANNs have been a target of researchers for many years. In 1996, a group of technical and medical researchers from Israel showed that the two classifiers (ANN and NBC) have similar performance [28] and that each one can be improved by different strategies. About ten years later, in 2007, a study carried out in China on a database of 1069 cases indicated that the ANN is the best identifier [29], although its results also showed that the Bayesian model was not far behind. The research continued, and in 2009 an experiment performed in Bangladesh showed that NBC mostly outperforms ANN learning algorithms [30]. In the same year, a team from the United Kingdom compared naïve Bayes, decision trees and neural networks and demonstrated that NBC is the best choice for their purpose [31]. This issue is still under discussion. In 2016, for instance, two researchers from Brazil concluded that neural networks had excellent results and that Bayes requires further investigation [32].
As shown in [27], some researchers have used these two classifiers together, making use of the benefits provided by each of them. For instance, in [33], the posterior probabilities provided by the NBC model are used to train an ANN for the final prediction. Breast cancer is detected in [34] using the Bayes algorithm for the probability of identification and neural networks for classification. Another example is provided by [35], where an ANN is used for predictions and a discrete Bayes network for interpretation.
As discussed in [17, 18], artificial intelligence techniques including neural networks and fuzzy logic are employed to model the representative medical system specific to the nonlinear finger dynamics of the human hand, aiming at the myoelectric-based control of prosthetic hands.
Such recent applications include the ANN-based control of robotic hands [36], the prediction of muscle force using wavelet neural networks [37], the prediction of handgrip force from myoelectric signals by extreme learning machines [38], the classification of ME signals by a neural network tree combined with the maximal Lyapunov exponent [39], the combination of visual servoing control and ANN learning [40], nonlinear autoregressive with exogenous inputs and variable structure recurrent dynamic ANNs using information from five or eight myoelectric sensors placed on the human subject's arm [1, 41–43], regression CNNs [44] and their inclusion in an adaptive auto-regressive filter algorithm based on adaptive infinite impulse response filtering theory to learn proportional velocity control [45], evolving fuzzy models [17–19, 43, 46, 47], Mamdani-type fuzzy controllers for myoelectric-based control [48, 49], and Takagi–Sugeno–Kang fuzzy controllers for myoelectric-based control [17].
Even though intelligent paradigms from the AI domain are already used in medicine, physicians still need to analyze a multitude of variables and the relationships between them in order to identify the medical status of a patient. Meanwhile, the Machine Learning (ML) domain provides a series of algorithms to represent data structures and to make predictions or classifications. Therefore, medicine is still a proper field for automated tools that may support the decision-making process [50]. One of the representative results in this field is a relatively easy-to-understand Iterative Learning Control-based approach to batch training of feed-forward ANN architectures, which is briefly described in [1] in the context of its potential to be implemented in healthcare applications. The successful applications of Model-Free Control (MFC) [51, 52], with appropriate performance, justify considering MFC a new tool for ML, as stated in [53]. The potential of MFC as an efficient tool for ML is demonstrated in an illustrative application that deals with the control of finger dynamics in prosthetic hand myoelectric-based control systems [54], and in other mechatronics applications of fuzzy control in combination with MFC [55], Model-Free Adaptive Control (MFAC) [56], Active Disturbance Rejection Control (ADRC) [57], and Virtual Reference Feedback Tuning (VRFT) [58–60].
1.3 Medical Decision-Making Based on Artificial Neural Networks
ANNs support the decision-making process, so that physicians can benefit from a faster diagnosis or from specific predictions for diseases with varied and confusing symptoms. The current section describes several examples which show that these intelligent paradigms are suitable for a wide range of medical applications.
1.3.1 Skin Diseases Diagnosis
This example, also described in [1, 50, 61], is an ANN created and trained to suggest a diagnosis for skin diseases from the erythemato-squamous class. It distinguishes between six such diseases: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, and pityriasis rubra pilaris. Setting a diagnosis for these diseases is difficult and sometimes inaccurate because the patients have multiple and vague symptoms. A patient suspected of having an erythemato-squamous disease is usually described by eleven clinical features that are evaluated when the diagnosis is set [61]. These features are enumerated in the left column of the graphical user interface provided in Fig. 1.4. Sometimes these features are enough to detect the erythemato-squamous disease, but in most cases a biopsy is also required for a correct diagnosis. This study evaluated 22 histopathological features that can be extracted from skin samples [61]. They are listed in the last two columns in Fig. 1.4. From a medical point of view, these six diseases have similar features, so the diagnosis is difficult for a dermatologist. ANNs are able to overcome these problems because they learn from examples and can then generalize and identify patterns. Therefore, an automated diagnosis system can be a real help for the human expert.
Fig. 1.4 The graphical user interface for the skin diseases diagnosis system [1]
A feed-forward ANN with a back-propagation learning algorithm has been created. Since an ANN learns from examples, a database [62] of 366 patients with erythemato-squamous diseases has been used in [61]. Each patient has 33 symptoms and an already established diagnosis. Therefore, the input matrix of the ANN has 366 × 33 elements, representing the symptoms of all the patients, and the target matrix has 366 × 6 elements, representing the diagnoses of all the patients. The ANN's architecture also contains a hidden layer of 10 neurons. Once the ANN is trained on a representative set of data, it can be used to diagnose a new patient. The physician uses the interface provided in Fig. 1.4 [1] to enter the clinical and histopathological features of the patient. These features could also be imported from the patient's record. Some of the features are binary values (1: present, 0: not present); others have an intensity degree between 0 (not present) and 3 (acute), with discrete intermediate values 1 and 2. Based on these features, the ANN generates the diagnosis. Table 1.1 contains several examples of inputs and outputs taken from the dataset [62] that has been used in the ANN's training. The superscript T in this table indicates matrix transposition. These examples emphasize that apparently similar features define different diseases, which makes the diagnosis process more difficult [1]. Regarding the performance of this system, its precision (93.7%) reported in [1, 50, 61] is comparable to that of a human expert, while the system is far faster. However, there are two features of this system that could be improved. One is connected to uncommon diseases, which are difficult for the system to identify because of the small number of examples in the training set.
The other one refers to the fact that when two diagnoses are plausible, the system identifies just one of them, because the outputs are binary values rather than probability scores. For instance, the results 1 and 0 associated with a pair of plausible diseases say nothing about the reasons that led to this conclusion. But if the user knows that behind the 1 and the 0 there is a 0.55 and a 0.45, then the physician will pay closer attention to that patient. Therefore, research should continue in this field in order to find better solutions. One direction of future research is a hybrid expert system that combines the power of ANNs with the sensitivity of other intelligent paradigms, e.g. the probabilistic reasoning applied in Sect. 1.5.
Table 1.1 Examples of records from the skin diseases database
Input vector | Diagnosis
[2 2 2 0 0 0 0 0 3 2 0 0 0 3 0 0 2 0 3 2 2 2 2 0 0 3 0 0 0 0 0 3 0]^T | Psoriasis
[2 2 0 3 0 0 0 0 1 0 0 0 0 0 0 3 2 0 0 0 0 0 0 0 0 0 0 3 0 0 0 1 0]^T | Seborrheic dermatitis
[2 2 2 3 2 2 0 2 0 0 0 3 2 0 0 0 2 1 1 0 0 0 0 0 3 0 3 0 2 0 0 2 3]^T | Lichen planus
[2 2 1 0 1 0 0 0 0 0 0 0 0 0 0 3 2 0 2 0 0 0 0 0 0 0 0 2 0 0 0 2 0]^T | Pityriasis rosea
[2 2 0 2 0 0 0 0 0 0 0 0 0 0 1 1 3 1 2 0 2 1 0 0 0 0 0 1 0 1 0 2 0]^T | Chronic dermatitis
[2 2 1 0 0 0 2 0 2 0 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0]^T | Pityriasis rubra pilaris
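The second limitation noted in this section, binary outputs that hide how close two diagnoses are, can be illustrated with a minimal sketch. It assumes a softmax output layer, which the source does not specify, and the raw scores are hypothetical values chosen only to produce two plausible diseases; they are not taken from the dataset [62].

```python
import math

DISEASES = ["psoriasis", "seborrheic dermatitis", "lichen planus",
            "pityriasis rosea", "chronic dermatitis", "pityriasis rubra pilaris"]

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(probabilities):
    """Binary output: 1 for the most probable class, 0 for all the others."""
    winner = probabilities.index(max(probabilities))
    return [1 if i == winner else 0 for i in range(len(probabilities))]

# Hypothetical raw scores for one patient (an assumption for illustration).
scores = [2.0, 1.8, -3.0, -3.0, -3.0, -3.0]
probs = softmax(scores)
binary = one_hot(probs)

for name, p, b in zip(DISEASES, probs, binary):
    print(f"{name:26s} probability={p:.2f} binary={b}")
```

The binary column reports a single disease, while the probability column shows that the second candidate is almost as likely, which is the information a physician would need in order to look more closely at the case.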
1.3.2 Hepatitis C Predictions
This is an example in the area of liver diseases, also described in [1, 50, 63]. The aim of the system is to predict the evolution of a patient infected with the hepatitis C virus under each treatment that could be administered, in order to decide which treatment is the most suitable. Hepatitis C is a serious and frequent disease. There is no vaccine against its virus, the treatment is very expensive and, more importantly, the treatment is not always effective and sometimes causes severe adverse effects. Three treatments are currently available for hepatitis C [63]: Simple InterFeron (IFN), Peg interferon α-2a, and Peg interferon α-2b. At the beginning of the treatment, it would be useful for both physician and patient to know which of these treatments, if any, will be beneficial, in order to avoid undesired side effects. The treatments for hepatitis C influence four biological indicators (TGP, TGO, GGT, and ARN VHC). For each of these four biological indicators, the system described here predicts the evolution during the next 12 months, indicating whether the indicator tends to grow, to stabilize, or to decrease. The physician can use these predictions to estimate the evolution of the patient under each possible treatment and to decide which treatment is the most suitable for a specific patient.
Fig. 1.5 The network of ANNs used for hepatitis C predictions [1]
This system is in fact a network of strongly interconnected ANNs (Fig. 1.5). There are four sections that provide predictions regarding the evolution of the biological indicators after 3, 6, 9, and 12 months of treatment. Inside each section there are four ANNs, one for each biological indicator. All 16 ANNs are feed-forward neural networks trained using the back-propagation algorithm. Each ANN has 10 hidden neurons, one output neuron (which predicts the evolution tendency of that biological indicator after 3, 6, 9, or 12 months), and a variable number of inputs. The networks belonging to the first section (the ones that predict the evolution of each biological indicator after the first 3 months of treatment) receive seven inputs: the patient's age, the gender, the location where the patient lives (rural/urban), the treatment that is evaluated, the Knodell score, the hepatic fibrosis score, and the value of the biological indicator for which the prediction is made, measured at the initial moment (before the treatment starts). The output of these networks is the value of that biological indicator after 3 months of treatment. The networks in the following sections have a similar structure, but they receive as additional inputs the outputs of the networks in the previous sections (referring to the same biological indicator); therefore, the networks in the last section have 10 inputs (the initial inputs and the values of the biological indicator after 3, 6, and 9 months of treatment). Through a graphical user interface (Fig. 1.6), the human expert enters the five features of the patient and the initial values of the biological indicators, and chooses one of the three treatments. Then the system provides the predictions regarding the evolution of the biological indicators during that treatment. The 193 patients considered in this study were observed over 12 months to establish the treatment's influence on the evolution of the biological indicators.
The anonymized information about these 193 patients (120 women and 73 men, aged between 14 and 67 years) was collected from the Gastroenterology Department of the Emergency Clinical Hospital, Timisoara, Romania [1].
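The chaining of the sections can be sketched as follows for a single biological indicator. This is only an illustration of the data flow described above, not the authors' implementation: `predict_chain` and `make_stand_in` are hypothetical names, the stand-in models replace the trained ANNs, and the numeric patient values are invented.

```python
MONTHS = [3, 6, 9, 12]

def predict_chain(base_features, initial_value, models):
    """Run the four chained models of one biological indicator.

    base_features: the six shared inputs (age, gender, location, treatment,
                   Knodell score, hepatic fibrosis score).
    initial_value: the indicator's value before the treatment starts.
    models: one callable per horizon, each mapping an input list to a value.
    """
    inputs = list(base_features) + [initial_value]  # 7 inputs for month 3
    predictions = {}
    for month, model in zip(MONTHS, models):
        value = model(inputs)
        predictions[month] = value
        inputs = inputs + [value]  # the next section also receives this output
    return predictions

seen_lengths = []

def make_stand_in(factor):
    """Hypothetical stand-in for a trained ANN: scales the initial value."""
    def model(inputs):
        seen_lengths.append(len(inputs))  # record how many inputs arrive
        return inputs[6] * factor
    return model

models = [make_stand_in(f) for f in (0.9, 0.8, 0.7, 0.6)]
result = predict_chain([45, 1, 0, 2, 8, 2], 120.0, models)
print(seen_lengths)  # number of inputs seen by each section
print(result)
```

Repeating this chain for each of the four indicators gives the 16 networks of Fig. 1.5. Because every section consumes the outputs of the earlier ones, an error made at month 3 is passed on to all later predictions.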
Fig. 1.6 The graphical user interface for hepatitis C predictions system [1]
Regarding the accuracy of this system, each of the 16 ANNs has its own value. In fact, for each of them, 500 neural networks with the same architecture but different initial parameters have been created and trained. The network with the best accuracy was chosen to be used for predictions. The results are encouraging: most of the ANNs have accuracies around 85% according to [1, 50, 63], as the "acc" parameter of each ANN in Fig. 1.5 shows. It can be observed that the accuracy of all the networks from the first section is greater than 90% and that in most cases the accuracy decreases along the chain of networks. One reason could be that not only the prediction propagates through the network of ANNs, but also the error. Hepatitis C virus infection is still a major concern in the medical domain. Even if the treatments for hepatitis C are continuously improved, becoming more and more effective, the patient's evolution during the treatment has to be carefully observed in order to react if something goes wrong. The prediction system presented here provides an overview of 12 months of treatment. This information can be used by the physician in the decisional process of choosing the best treatment for a patient.
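The selection among 500 networks with different initial parameters can be sketched as a random-restart loop. The sketch below is an illustration under stated assumptions: a small perceptron stands in for the feed-forward ANN, the data points are invented, and the held-out set used for scoring is an assumption, since the source does not say on which data the best network was chosen.

```python
import random

def predict(w, x1, x2):
    """Threshold unit: returns 1 when the weighted sum is positive."""
    return 1 if w[0] * x1 + w[1] * x2 + w[2] > 0 else 0

def train_perceptron(data, rng, epochs=5, lr=0.1):
    """Stand-in for ANN training, started from random initial weights."""
    w = [rng.uniform(-1, 1) for _ in range(3)]  # two weights and a bias
    for _ in range(epochs):
        for (x1, x2), y in data:
            err = y - predict(w, x1, x2)
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            w[2] += lr * err
    return w

def accuracy(w, data):
    """Fraction of records classified correctly."""
    hits = sum(predict(w, x1, x2) == y for (x1, x2), y in data)
    return hits / len(data)

def best_of_restarts(train_data, held_out, n_restarts=500, seed=0):
    """Train n_restarts models and keep the most accurate one."""
    rng = random.Random(seed)
    best_w, best_acc = None, -1.0
    for _ in range(n_restarts):
        w = train_perceptron(train_data, rng)
        acc = accuracy(w, held_out)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc

# Invented toy data: class 1 when both coordinates are large.
train_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
              ((1, 1), 1), ((2, 2), 1), ((2, 1), 1)]
held_out = [((0.5, 0.5), 0), ((1.5, 1.5), 1), ((0, 0.5), 0), ((2, 1.5), 1)]

best_w, best_acc = best_of_restarts(train_data, held_out)
print(f"best accuracy over 500 restarts: {best_acc:.2f}")
```

One design point worth noting: if the same records are used both to pick the best restart and to report its accuracy, the reported figure tends to be optimistic, so a separate validation set for the selection step is usually preferable.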
1.3.3 Coronary Heart Disease Prediction
Even though scientific knowledge in medicine is growing exponentially, there are still many cases when a disease is discovered too late. This section describes an artificial neural network that can help doctors with predictions regarding coronary heart disease [64]. The accuracy of the neural network obtained in this study (84.73%) indicates that it can be a reliable tool, able to support the physician's decisions. Coronary heart disease is a medical condition of interest because cardiovascular diseases are the main cause of mortality in Europe, according to the European Heart Network report from 2017. Of all cardiovascular diseases, coronary heart disease, which includes heart attack, causes the highest number of deaths [65]. The risk prediction of coronary heart disease is made using an ANN with the following features: it is a feed-forward, multilayer neural network that has 15 inputs (symptomatic information and laboratory analyses), one hidden layer with 50 neurons that use the sigmoid activation function, and an output represented by a single neuron (1: the patient is at risk, 0: no risk). A supervised learning algorithm, Resilient Backpropagation, suitable for classifying data into categories, was used in the training process. The dataset [66] used to train the network contains 3640 records from the Framingham Heart Study [67]. Each patient in this dataset is described by 15 features specific to coronary heart disease, e.g. gender, age, hypertension, cholesterol, and body mass index. The dataset was randomly divided into training (70% of the data), validation (15%), and testing (15%) parts. During the training process of an ANN, the weights are adjusted to minimize the error, but the initial values of these weights are randomly set. In order to optimize the performance of the network, 500 networks with different initial weights have been trained and tested. The one with the best accuracy has been saved and used further in the application. The user can interact with the application through the graphical user interface shown in Fig. 1.7 [64].
The application provides three main actions:
• finding out the risk of coronary heart disease for a new patient (based on the features of the patient entered directly through the interface or imported from an external file);
• checking the accuracy of the network (how reliable the neural network is in the predictions it makes);
• re-training the network (not available for regular users).
The performance of such a model can be analyzed by several metrics: accuracy, precision, recall, and F1 score. For this problem, accuracy is the most suitable one, and it is 84.7328%. Before this accuracy was obtained, other algorithms were used to train the neural network, such as the Levenberg–Marquardt Backpropagation algorithm (which obtained an accuracy of 72.8%) and Variable Learning Rate Backpropagation, but the most suitable for these records was the Resilient Backpropagation algorithm [64]. Therefore, this application is capable of predicting, with an accuracy of 84.7%, whether or not a person will have heart disease in the next 10 years. If the prediction indicates such a risk, the physician may advise the patient to give up various addictions or may recommend certain medications in order to significantly reduce the risk of developing coronary heart disease.
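The evaluation protocol mentioned in this section (a random 70%/15%/15% split and the four metrics for a binary output) can be sketched as follows. The function names and the short label lists are hypothetical and serve only as an illustration; they do not come from the Framingham data or from the application in [64].

```python
import random

def split_indices(n_records, seed=0):
    """Randomly split record indices into 70% training, 15% validation, 15% testing."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = n_records * 70 // 100
    n_val = n_records * 15 // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for labels 1 (at risk) and 0 (no risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

train, val, test = split_indices(3640)
print(len(train), len(val), len(test))

# Hypothetical ground-truth labels and predictions for eight test patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Integer arithmetic is used for the split sizes so that the three parts always add up to the number of records.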
Fig. 1.7 The graphical user interface for the coronary heart disease [64]
1.4 Medical Image Analysis Using Artificial Neural Networks
Sometimes the symptomatic and analytic information presented by the patient is not relevant or is not enough to set a diagnosis. For specific diseases, the solution can be provided by medical images, which reveal internal problems of the human body that cannot be found by analyzing symptoms or laboratory test results [1]. The liver represents a great challenge to radiologists because of the difficulty of assessing the morphological changes induced by illness [63]. For this reason, an ANN was developed to suggest a diagnosis of the liver diseases that can be detected by analyzing information extracted from medical images obtained by Computed Tomography (CT). The system is able to identify three hepatic diseases: hepatomegaly, steatosis, and tumors. Figure 1.8 [1] illustrates three sequential slices from the abdominal tomography of patients with these diseases, together with a healthy liver [63]. This presentation makes the features of each disease relatively easy to observe. The ANN is trained to distinguish between these four types of images. But, as can be seen in Fig. 1.8, a slice of an abdominal tomography contains, besides the liver, some other elements (parts of the vertebral column and ribs, a portion of lung, etc.) that are not relevant for the diagnosis. For this reason, a medical image is processed before it is used by such a system. The first step is segmentation, necessary to locate and extract the liver from the initial image. Figure 1.9 [1] shows an image before and after the segmentation process. Then the processing continues by extracting features that are relevant for liver diseases. One of these is texture, and Fig. 1.8 emphasizes the differences in texture between the four medical conditions of the liver. The texture features can be extracted using grey level spatial dependence matrices, also called co-occurrence matrices [63], which define the distribution of co-occurring pixel values for a specified offset. The offset is determined by an angle and a distance between pixels. The most frequently used angles are 0°, 45°, 90°, and 135°.
Fig. 1.8 Slices of abdominal tomography: a healthy liver, b hepatomegaly, c steatosis, d tumors [1]
Fig. 1.9 Segmentation: a an entire slice, b the extracted liver [1]
The co-occurrence matrices consist of the following elements, which are expressed in a computer programming-like notation (with arguments instead of subscripts) [1]:
$$C_{0^\circ,d}(i,j) = \left|\{((k,l),(m,n)) \in I : k-m=0,\ |l-n|=d,\ I(k,l)=i,\ I(m,n)=j\}\right|, \tag{1.11}$$
$$C_{45^\circ,d}(i,j) = \left|\{((k,l),(m,n)) \in I : (k-m=d,\ l-n=-d) \text{ OR } (k-m=-d,\ l-n=d),\ I(k,l)=i,\ I(m,n)=j\}\right|, \tag{1.12}$$
$$C_{90^\circ,d}(i,j) = \left|\{((k,l),(m,n)) \in I : |k-m|=d,\ l-n=0,\ I(k,l)=i,\ I(m,n)=j\}\right|, \tag{1.13}$$
$$C_{135^\circ,d}(i,j) = \left|\{((k,l),(m,n)) \in I : (k-m=d,\ l-n=d) \text{ OR } (k-m=-d,\ l-n=-d),\ I(k,l)=i,\ I(m,n)=j\}\right|, \tag{1.14}$$
where I is the image (in fact the matrix of gray level elements) and d is the distance between pixels. Each co-occurrence matrix has 256 × 256 elements. The process of analyzing the image continues by extracting from these matrices a set of six texture features: energy, entropy, contrast, maximum element, inverse difference moment, and correlation; the equations that define them are given in [63]. These spatial grey tone co-occurrence texture features were introduced by R. Haralick in the 1970s, and they are still successfully used in image texture analysis and classification. An image is now described by 24 elements (six texture features for each of the four matrices). These are the inputs of the ANN. It is a feed-forward ANN with a back-propagation training algorithm. It has one hidden layer with 10 neurons and one output layer that suggests the diagnosis. As in the previous examples, a single layer of hidden neurons was considered enough. This ANN has a fixed architecture; therefore, only the weights of the connections between neurons are modified during the training process. The training process has two phases [63]: a preliminary phase, where the parameters receive their initial values, and a main phase, which is iterative, where the parameters are adjusted. The performance of an ANN depends not only on the way its weights are modified, but also on their initial values. For this reason, in order to choose the best neural network, 500 ANNs with different initial values of the weights have been created and trained. Then the ANN with the best accuracy has been used to provide the prediction regarding the medical condition of the liver. The training of this system was performed using anonymized abdominal CT images obtained from "The Modeling Centre for Prosthesis and Surgical Interventions on the Human Skeleton" Multiple Users Research Base of the Politehnica University Timisoara. Liver diseases can be a real danger to patients' lives because they frequently show perceptible symptoms only in advanced stages. Early detection using an automated diagnosis system that combines the ability of ANNs with the accuracy of medical images might save lives [1].
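The co-occurrence matrices of (1.11)–(1.14) and some of the texture features derived from them can be sketched as follows. This is a minimal illustration, not the implementation used in [63]: the function names are hypothetical, the feature formulas are the commonly used Haralick-style definitions (assumed here, since the source defers the equations to [63]), correlation is omitted for brevity, and the tiny three-level image is invented.

```python
import math

# One (row step, column step) direction per angle; glcm() also counts the
# opposite direction, so both orderings of each pixel pair are included,
# as in (1.11)-(1.14).
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (1, 0), 135: (1, 1)}

def glcm(image, angle, d=1, levels=256):
    """Count pixel pairs at distance d along the given angle."""
    dr, dc = OFFSETS[angle][0] * d, OFFSETS[angle][1] * d
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    for k in range(rows):
        for l in range(cols):
            for sr, sc in ((dr, dc), (-dr, -dc)):
                m, n = k + sr, l + sc
                if 0 <= m < rows and 0 <= n < cols:
                    counts[image[k][l]][image[m][n]] += 1
    return counts

def texture_features(counts):
    """Energy, entropy, contrast, maximum element, inverse difference moment."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs for this offset")
    feats = {"energy": 0.0, "entropy": 0.0, "contrast": 0.0,
             "maximum": 0.0, "idm": 0.0}
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c == 0:
                continue
            p = c / total  # normalized co-occurrence frequency
            feats["energy"] += p * p
            feats["entropy"] -= p * math.log(p)
            feats["contrast"] += (i - j) ** 2 * p
            feats["maximum"] = max(feats["maximum"], p)
            feats["idm"] += p / (1 + (i - j) ** 2)
    return feats

# Invented 2 x 3 image with three grey levels (a real CT slice uses 256).
image = [[0, 0, 1],
         [1, 2, 2]]
matrix_0 = glcm(image, 0, d=1, levels=3)
features_0 = texture_features(matrix_0)
print(matrix_0)
print(features_0)
```

Computing the features for the four angles and concatenating them yields the input vector of the ANN; with all six features of the text this gives the 24 elements mentioned above, whereas this sketch would give 20.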
1.5 Artificial Neural Networks Versus Naïve Bayesian Classifier
The AI domain contains a multitude of intelligent paradigms. The results described in this section aim to provide arguments for choosing the most appropriate one for specific medical predictions. For this reason, artificial neural networks (ANNs) and the Naïve Bayesian Classifier (NBC) are compared: they are used to predict the evolution of a patient infected with the hepatitis B virus in the first part of this section [1, 26, 50, 63], and to predict the risk of stroke in its second part [1, 27, 50, 68]. Both approaches developed on the basis of these paradigms provide reliable results and are suitable to assist physicians in their decisions. The background of ANNs has already been introduced in one of the previous sections of this chapter. In the following, the mathematical support for NBC is provided. NBC applies Bayes' theorem, a formula with conditional probabilities that implements probabilistic reasoning, as illustrated in [69]. Its general form applied to medical decision-making is [2]
$$p(D_k \mid S) = \frac{p(S \mid D_k) \cdot p(D_k)}{p(S)}, \quad k = 1 \ldots m, \tag{1.15}$$
where m is the number of possible medical conditions evaluated in a given context. It should be mentioned that k in this section is not connected to k that will be used in Sect. 1.6. Substituting the elements of (1.15) according to the demonstration provided by [2], the Bayes’ rule is expressed as
$$p(D_k \mid S) = \frac{p(D_k) \cdot \prod_{i=1}^{n} p(\sigma_i \mid D_k)}{\sum_{j=1}^{m} p(D_j) \cdot \prod_{i=1}^{n} p(\sigma_i \mid D_j)}, \tag{1.16}$$
and it calculates, for a patient characterized by a set of symptoms $S$, the probability of having or evolving to a specific medical condition $D_k$ [26]. The value $m$ in (1.16) represents the number of diseases, $n$ is the number of symptoms, and $\sigma_i$ is a symptom. The elements in (1.16) are calculated in terms of
$$p(D_k) = \frac{\operatorname{card} D_k}{\operatorname{card} \Omega}, \tag{1.17}$$
$$p(\sigma_i \mid D_k) = \frac{p(\sigma_i, D_k)}{p(D_k)}, \tag{1.18}$$
$$p(\sigma_i, D_k) = \frac{\operatorname{card} \{D_k \cap \sigma_i\}}{\operatorname{card} \Omega}, \tag{1.19}$$
where $\operatorname{card} \Omega$ represents the number of records in the dataset. Therefore, $p(D_k)$ is the ratio between the number of patients that have the disease $D_k$ and the total number of patients, and $p(\sigma_i, D_k)$ is the number of patients that simultaneously have the disease $D_k$ and the feature $\sigma_i$, divided by the total number of patients. When using Bayes' theorem, some constraints should be applied to the input and output data [26, 27]. A probability is calculated for each medical condition, which implies that the possible outputs should be mutually exclusive (only one for a patient) and that each patient should certainly have one of these possible outputs. For the applications described here, this constraint is satisfied: given a set of values that describe the current medical status of a patient, the patient can evolve in exactly one direction. Another constraint imposed by Bayes' theorem assumes that all symptoms or features that describe a patient are conditionally independent, given a disease. Most of the time this condition is not satisfied, but the theorem is still used under this simplifying assumption. For this reason, it is also called the naïve Bayes model. According to [2], naïve Bayes systems work properly and have surprisingly good results, even with non-independent inputs.
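A direct computation of (1.16)–(1.19) from record counts can be sketched as follows. The sketch makes two assumptions that are not stated in the source: each feature is binary, and an absent feature (value 0) is treated as an observed event with its own conditional probability. The function name, the two disease labels, and the eight records are invented for illustration, and no smoothing is applied.

```python
from collections import Counter

def naive_bayes_posteriors(records, patient):
    """Posterior p(D_k | S) for every disease, following (1.16)-(1.19).

    records: list of (features, disease); features is a tuple of 0/1 values.
    patient: tuple of 0/1 values in the same feature order.
    """
    total = len(records)                              # number of records
    disease_counts = Counter(d for _, d in records)   # card D_k
    scores = {}
    for disease, count in disease_counts.items():
        score = count / total                         # p(D_k), Eq. (1.17)
        for i, value in enumerate(patient):
            joint = sum(1 for f, d in records
                        if d == disease and f[i] == value)
            # p(sigma_i | D_k) = (joint / total) / (count / total),
            # Eqs. (1.18) and (1.19)
            score *= joint / count
        scores[disease] = score                       # numerator of Eq. (1.16)
    evidence = sum(scores.values())                   # denominator of Eq. (1.16)
    if evidence == 0:
        raise ValueError("no training record is compatible with this patient")
    return {d: s / evidence for d, s in scores.items()}

# Invented dataset: two binary features, two mutually exclusive conditions.
records = [((1, 0), "A"), ((1, 1), "A"), ((1, 0), "A"), ((0, 0), "A"),
           ((0, 1), "B"), ((1, 1), "B"), ((0, 1), "B"), ((0, 0), "B")]

posteriors = naive_bayes_posteriors(records, (1, 0))
print(posteriors)
```

Because the scores are divided by their sum, the posteriors over all conditions add up to 1, which matches the requirement above that the outputs be mutually exclusive and exhaustive.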
1.5.1 Hepatitis B Predictions
As mentioned in [26], when speaking about hepatitis B, the current clinical status of the patient is clearly determined: an infection with the hepatitis B virus (HBsAg, anti-HBc, and anti-HBs are some of the markers that are analyzed and that indicate this). But each patient has a specific evolution of this disease, according to some particular features. There are six severity levels of hepatitis B: easy, medium, serious, prolonged, cholestatic, and comatose. The clinical evolution of a patient with respect to these stages is hard to determine. Nevertheless, the prognosis could be improved if the physician has, from the beginning, some predictions in this regard [63]. The application described here uses both NBC and ANN to make predictions regarding the evolution of patients infected with the hepatitis B virus. The graphical user interface [26] for the entire system is provided in Fig. 1.10. Both mechanisms are included in the same window, in order to compare them more easily. The left side panel allows the user to enter the features of a new patient or to upload them from an external file, if available. The results provided by the Bayesian classifier are presented in the upper right panel of the interface. There is a field for each of the six severity levels. The accuracy of this mechanism, 73.33%, is also displayed. The lower right panel displays, in a similar manner, the results obtained using the ANN. The accuracy for this method is 80%. This part of the interface has an additional button, which trains 1000 ANNs and chooses the most accurate one. The Bayesian classifier requires binary values as inputs; therefore, some of the features that describe a patient are transformed. The values that have to be processed are split into several intervals that are established together with a physician. Then the theorem is applied and the probabilities are calculated.
Fig. 1.10 The graphical user interface for hepatitis B predictions [26]
The ANN that was created is a function fitting neural network (a specialized version of the feed-forward network). The input data do not need to be preprocessed in order to be used by this mechanism. The network receives as inputs directly the vector of features that define the patient's medical status, which contains 22 symptoms. It also has a hidden layer of 10 neurons and an output layer that produces the result (a code defining one of the six severity levels of the disease). The Levenberg–Marquardt backpropagation algorithm has been used for training. For the neurons belonging to the hidden layer of this ANN, the activation function is the hyperbolic tangent sigmoid. The neuron in the output layer has a linear activation function. There are several metrics that analyze the performance of such a model: accuracy, precision, recall, and F1 score. For this system, accuracy is the most suitable one, showing the proportion of correct predictions among all tested records. But the obtained accuracies (73.33% and 80%, respectively) are not sufficient to decide which mechanism is the most suitable for this specific problem. A further discussion is necessary [26]. Both classifiers that have been implemented use the same database. It contains anonymized information about 165 patients from the Clinical Hospital of Infectious Diseases No. 4 "Victor BABES", Timisoara. Of these, 150 are used for training and 15 for testing. For better results, the database could be improved in two directions: more records and a higher number of fields. Even if ANNs outperform the Bayesian classifier in this example, the idea of using both of them should not be abandoned [26]. The advantage of Bayes' theorem is that it provides probabilities associated with each severity level, and the physician who uses the system can analyze the results before making a final decision.
For instance, the output 0, 0.45, 0.55, 0, 0, 0 offered by the Bayesian classifier shows an uncertain opinion between a medium and a serious evolution of hepatitis B, whereas the ANN returns 0, 0, 1, 0, 0, 0, a result that is too categorical. This system, with both classifiers used simultaneously, could be useful if it generates consistent solutions. There are input values where both mechanisms offer the same prediction (as illustrated in the example given in Fig. 1.10), and this provides confidence. However, there are also features that produce different (even opposite) results. For instance, easy and medium levels of severity could be treated as similar, but not easy and prolonged; such results are contradictory and cannot be accepted in medical predictions. In this case, the physician should use other instruments to estimate the future medical status of the patient. The aim of this research was to use two of the most relevant mechanisms of the AI domain to solve the same problem, in order to compare their performance. However, it is difficult to decide which one is better. Both methods are good classifiers, but their performance is strongly connected to the quality and quantity of the processed data. A larger database would be necessary, but this could lead to an additional inconvenience: missing data. Fortunately, ANNs are able to manage this aspect. On the other hand, the naïve Bayes classifier is preferable because it provides probabilities as its result, and this makes it a reliable tool [26]. For now, a definite opinion regarding the most appropriate mechanism cannot be formulated. While the results showed almost no distinction between the two methods, the analysis needs to be continued. Further tests on different datasets are required in order to provide stronger arguments.
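The consistency check between the two classifiers discussed in this section can be sketched as follows. The rule used here is an assumption made for illustration: the six severity levels are treated as an ordered scale in the order listed in the text, and two predictions are considered compatible when they differ by at most one level (so that easy and medium agree, but easy and prolonged do not). The function name and the `tolerance` parameter are hypothetical.

```python
LEVELS = ["easy", "medium", "serious", "prolonged", "cholestatic", "comatose"]

def compare_predictions(nbc_probs, ann_one_hot, tolerance=1):
    """Compare the NBC probabilities with the categorical ANN output."""
    nbc_level = max(range(len(LEVELS)), key=lambda i: nbc_probs[i])
    ann_level = ann_one_hot.index(1)
    ranked = sorted(nbc_probs, reverse=True)
    return {
        "nbc": LEVELS[nbc_level],
        "ann": LEVELS[ann_level],
        # Small margin: the NBC hesitates between its two best levels.
        "nbc_margin": ranked[0] - ranked[1],
        # Assumed rule: levels at most `tolerance` apart are compatible.
        "consistent": abs(nbc_level - ann_level) <= tolerance,
    }

# The example from the text: the NBC hesitates, the ANN is categorical.
same = compare_predictions([0, 0.45, 0.55, 0, 0, 0], [0, 0, 1, 0, 0, 0])
print(same)

# A contradictory case (invented values): easy versus prolonged.
clash = compare_predictions([0.1, 0.1, 0.1, 0.7, 0, 0], [1, 0, 0, 0, 0, 0])
print(clash)
```

In the first case both classifiers point to the serious level, but the small margin signals that the medium level is almost as probable; in the second case the predictions are flagged as inconsistent, which is the situation where the physician should rely on other instruments.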
1.5.2 Stroke Risk Prediction
Stroke is a serious cause of long-term disability or death. As shown in [27], there are several risk factors connected to stroke, but it is difficult to determine what influence they have on vascular diseases. For this reason, the development of automated tools that use AI techniques to provide predictions regarding vascular events and to identify high-risk patients is a necessity. The application described here makes predictions (based on the features of a patient) regarding the risk of a stroke, placing the patient in one of four risk categories. Like the previous one, it is implemented using both NBC and ANN, and it tries to draw some conclusions regarding the most suitable method for this type of application. The features of the application are similar to those of the previously described system (the one for hepatitis B). Bayes' theorem is applied in similar conditions, which require binary values as inputs. The ANN is a function fitting neural network, with one hidden layer of 10 neurons and one neuron in the output layer, which provides the risk category: 1 for the lowest risk of stroke and 4 for the highest one. The hyperbolic tangent sigmoid activation function has been used for the neurons that belong to the hidden layer, while the output neuron has a linear activation function. The Levenberg–Marquardt backpropagation algorithm was used in the training process. As mentioned before, the initial values of the input weights of each neuron influence the performance of the ANN, but there is no rule that specifies the proper values. For this reason, the current system created and trained 1000 ANNs. The most accurate of them has been saved and further used to make predictions [27]. The quality of the results provided by a classification model is strongly connected to the database that is analyzed. Both algorithms implemented within this system use the same data.
There are 108 records containing anonymized information about patients with vascular problems. The data were collected at the Municipal Clinical Emergency Hospital of Timisoara during 2015–2016 [68]. The dataset was divided into the two parts required for such a system: of the 108 available records, 81 were used for training and 27 were reserved for testing [27]. Figure 1.11 illustrates the graphical user interface of this application [27]. On its left side, the user enters the features of a new patient (or uploads them from an external file). The other two panels of the interface are dedicated to NBC and ANN, respectively. As can be observed, for the considered dataset both methods have the same accuracy: 88.89%. Bayesian inference produces probabilities as results; therefore, each risk category is assigned a number in the range [0, 1]. The ANN provides a definite classification: a single risk category has the value 1, and the rest have the value 0. A probability is valuable information because, this way, the human expert who is
Fig. 1.11 The graphical user interface for stroke risk prediction [27]
using the system can evaluate the medical status of the patient. For instance, there could be a relevant difference, from a medical point of view, between a risk prediction of 0.95 and one of 0.55 provided by the NBC. The ANN will indicate, in both cases, that the prediction is 1 [27].

As already mentioned, several metrics can be used to evaluate the performance of this system. Although accuracy is the most popular evaluation metric, it is not always the best one. A model can be highly accurate in the testing phase, especially if the dataset has a dominant category, and yet provide incorrect results for particular cases when it is used for real predictions. This is the so-called accuracy paradox, and for this reason other metrics should be used in parallel to evaluate the performance of a predictive system [27]. In order to calculate these classification metrics, the confusion matrix was created. It contains information about the predicted risk with respect to the ground truth value. The definition of the confusion matrix and the values obtained in the testing phase for each classifier are presented in Table 1.2. This system provides as its result the risk category, which belongs to one of four possible classes (unlike a regular binary classification, which has two classes: positive and negative). Therefore, the meaning of the elements that compose the confusion matrix, for this particular case, is the following:

• TP (true positive): the prediction is correct;
• TN (true negative): does not exist; this system will never say that there is no risk (risk = 0 is an impossible output);
• FP (false positive): the predicted risk is greater than the real value;
• FN (false negative): the predicted risk is less than the real value.

Table 1.2 The confusion matrix [27] (rows: real (actual) class; columns: method (predicted) class)

Definition      NBC         ANN
TP    FN        24    3     24    0
FP    TN        0     0     3     0

Table 1.3 The evaluation metrics [27]

Name                                              Formula         Value for NBC    Value for ANN
Sensitivity (recall)                              TP/(TP + FN)    0.89             1
Precision (correctness in positive prediction)    TP/(TP + FP)    1                0.89

Using these elements, the classification metrics sensitivity and precision can be computed. Their definitions and the obtained values are provided in Table 1.3. There are two other metrics, specificity and negative predictive value, but these cannot be evaluated because the system does not provide true negative results. Instead, the F1 score (the harmonic mean of precision and recall, 2 × Precision × Recall / (Precision + Recall)) can be calculated. The resulting value is 0.94 for each method, which indicates a balance between precision and sensitivity [27].

A definite conclusion regarding the most suitable AI model for stroke risk prediction cannot be drawn, as the two models have similar performance. NBC does not produce any FP value, while ANN does not produce any FN value. ANN is better than NBC regarding the sensitivity of the model, but NBC outperforms ANN in precision [27].

Several ideas can be implemented in order to improve the performance of a system like this. The modest number of available records is an impediment: both statistics and machine learning need a relevant dataset in order to model the desired behavior. If many more records were available, more intensive training could be performed and the performance metrics could be evaluated with respect to each category rather than for the entire process. This would increase the quality of the predictions for real cases. Another promising idea [70] is to offer predictions not only for the near term, but also for the medical status of the patient after a longer interval of time.
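The values in Tables 1.2 and 1.3 can be cross-checked with a few lines of code. The sketch below takes the confusion matrix entries reported for each classifier and recomputes accuracy, sensitivity, precision and the F1 score; the function and variable names are chosen for this example only.

```python
# Recompute the evaluation metrics from the confusion matrix entries
# reported in Table 1.2 (27 test records per classifier).

def sensitivity(tp, fn):
    return tp / (tp + fn)


def precision(tp, fp):
    return tp / (tp + fp)


def f1_score(prec, rec):
    # Harmonic mean of precision and recall.
    return 2 * prec * rec / (prec + rec)


CONFUSION = {
    "NBC": {"TP": 24, "FN": 3, "FP": 0, "TN": 0},
    "ANN": {"TP": 24, "FN": 0, "FP": 3, "TN": 0},
}


def evaluate(cm):
    total = sum(cm.values())
    rec = sensitivity(cm["TP"], cm["FN"])
    prec = precision(cm["TP"], cm["FP"])
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "sensitivity": rec,
        "precision": prec,
        "f1": f1_score(prec, rec),
    }


RESULTS = {name: evaluate(cm) for name, cm in CONFUSION.items()}
for name, metrics in RESULTS.items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Both classifiers reach the same accuracy and the same F1 score, while their sensitivity and precision values are mirrored, which matches the observation that neither model clearly dominates on this test set.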
1.6 Prosthetic Hand Myoelectric-Based Modeling and Control Using Evolving Fuzzy Models and Fuzzy Control

As specified in [17], evolving fuzzy models and evolving fuzzy controllers, also referred to as rule-based ones, represent a concept created by P. Angelov in 2001 and further developed in his and his co-authors' papers, for example the representative ones [71–73]. These essentially nonlinear models and controllers are developed around online rule base learning, and some recent results are reported in [74–77]. Building upon our recent results presented in [17], the next sub-section gives results
on the development of evolving fuzzy models for finger dynamics in prosthetic hand myoelectric-based control. The second sub-section is also built upon [17] and gives some fuzzy control results for the challenging process related to finger dynamics in the framework of prosthetic hand myoelectric-based control. The first solution to control this process, exemplified in [17], is to develop one-degree-of-freedom (1-DOF) fuzzy controllers by transferring the experience from linear control to fuzzy control, that is, by inserting the nonlinear features specific to fuzzy control in order to enhance the performance of linear control systems. The second solution, mentioned in [54] as a direction of future research, is to employ two-degree-of-freedom (2-DOF) fuzzy controllers. The concept of 2-DOF fuzzy control was created by R.-E. Precup and S. Preitl in 1999, expressed in [78, 79] as fuzzy control with non-homogenous dynamics with respect to the input channels, and further developed in subsequent papers that applied 2-DOF fuzzy control to servo systems and electrical drives [80, 81]. The third solution, also mentioned in the same context in [54], is to ensure the optimal tuning of fuzzy controllers by means of appropriate optimization problems. Useful examples of optimization algorithms that solve such problems are included in the following applications, which are not related to healthcare but have already proved their value and potential: milling process optimization and control [82], fuzzy classification systems [83], the asymmetric traveling salesman problem [84], mobile robot navigation and path planning [85, 86], collective construction of 3D structures [87], performance improvement of silicon-on-insulator FinFETs [88], inserting information feedback models in optimization algorithms [89], and dealing with several continuous optimization problems [90].
The optimal tuning of fuzzy controllers is not simple because of the models of the process and of the fuzzy controllers, and several constraints appear in this regard. These constraints can be related to the technical operation and also to the economic conditions of the control systems; challenging healthcare applications are presented in [91–93], and non-healthcare applications are given in [94–97]. Nevertheless, special attention should be paid to the constraints that result from the stability of the fuzzy control systems, which should be guaranteed; a few recent approaches are discussed in [98–102].
1.6.1 Evolving Fuzzy Modeling Results

The evolving fuzzy models share the inputs that are elements of the following input vector [17, 18]:

$$[z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}]^T, \tag{1.20}$$

where $z_{j,k}$ (b), $0\ \text{b} \le z_{j,k} \le 255\ \text{b}$, is the myoelectric signal obtained as the output of the myoelectric sensor $j$, $j = 1 \ldots 8$, and the measuring unit b indicates bit. The placement of the eight sensors is [17, 18, 42]: sensors 1 to 4 on the flexor digitorum
superficialis, sensors 5 and 6 on the extensor digitorum, sensor 7 on the extensor digiti minimi, and sensor 8 on the abductor pollicis longus. The placement of the myoelectric sensors on the human hand and hardware details are illustrated and described in [17, 19, 42]. As considered in [17–19, 41–43, 47], the outputs of all fuzzy models are the angles $y_{l,k}$ (%) of fingers $l$, $l = 1 \ldots 5$, expressed as flexion percentages of finger closing between fully relaxed (0, i.e., minimum contraction) and fully contracted (100, i.e., maximum contraction), subjected to the constraints

$$0\,\% \le y_{l,k} \le 100\,\%, \quad l = 1 \ldots 5. \tag{1.21}$$
The finger indices $l$ are $l = 1$ (the thumb), $l = 2$ (the index finger), $l = 3$ (the middle finger), $l = 4$ (the ring finger), and $l = 5$ (the pinky). As shown in [17], the finger angles (namely, the finger flexion angles) are measured using five flex sensors, and the sensor output is converted to a percentage from 0 to 100%, where 0% corresponds to the finger being straight and 100% to the finger being fully bent. The minimum and maximum bend of each finger are calibrated at startup. The eight system inputs are generated in order to cover different ranges of magnitudes and frequencies and to capture various hand movements, as detailed in [17–19, 42, 43, 47]. A set of $D = 25000$ data points of training data and a set of $D = 18491$ data points of validation data were used in [17–19, 42, 43, 47]. The sampling period was set to $T_s = 0.01$ s. The evolving fuzzy models are developed using the incremental online identification algorithm that is implemented using Angelov and Filev's theory [103] and Ramos and Dourado's eFS Lab software [104]. The algorithm is described in [17, 18] and its flowchart is given in Fig. 1.12, where $k$ indicates the discrete time step, which is also the notation for the current data sample, and FM indicates fuzzy model. As pointed out in [17], the input vector in (1.20) includes information from all myoelectric sensors in order to model the effects of cross-couplings in the Multi Input-Multi Output (MIMO) nonlinear dynamical system specific to finger dynamics. This nonlinear system will be viewed as a controlled process in the next section. Dynamics are inserted in the fuzzy models by adding several past values of the outputs $y_{l,k}$, $l = 1 \ldots 5$, and/or inputs through appropriate shifting. Once the number of inputs is fixed, the incremental online identification algorithm calculates both the rest of the structure of the models and all parameters. The general expressions of the fuzzy models are [17]

$$y_{l,k} = f_l(\mathbf{z}_k), \quad l = 1 \ldots 5, \tag{1.22}$$
where $f_l$ are nonlinear input–output maps. The values of the root mean square error (RMSE) between the model outputs $y_{l,k}$ (%) and the real-world system outputs (i.e., the human hand finger angles or the expected outputs) $y_{dl,k}$ (%) were analyzed in [17–19, 42, 43, 47] and integrated in RMSE, measured in % and viewed as a global performance index [17–19, 42, 43, 47]
Fig. 1.12 The flowchart of the incremental online identification algorithm [17, 18]
$$\text{RMSE} = \sqrt{\frac{1}{D} \sum_{k=1}^{D} (y_{l,k} - y_{dl,k})^2}, \quad l = 1 \ldots 5, \tag{1.23}$$
where the real-world system outputs $y_{dl,k}$ are obtained by real-time measurements conducted on the human hand. Readers are invited to consult [17–19, 42, 43, 47] and to contact the authors for details on the evolving fuzzy model development. Minimal details on the best evolved fuzzy models are summarized as follows. The best evolved fuzzy models exhibit the smallest RMSE on the validation data set, denoted by $\text{RMSE}_v$. The structures and parameter values of the five evolved fuzzy models, including membership functions, rule bases and rule consequents, are specified in the files that are freely available at [105]. The Recursive Least Squares algorithm was applied in step S5 in Fig. 1.12 to calculate the parameters in the rule consequents. The fuzzy model with the number 17 according to [47] is the best one for the first finger ($l = 1$). The input vector of this model is [17, 47]

$$\mathbf{z}_k = [z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}\ z_{1,k-1}\ z_{2,k-1}\ z_{3,k-1}\ z_{4,k-1}\ z_{5,k-1}\ z_{6,k-1}\ z_{7,k-1}\ z_{8,k-1}\ y_{1,k-1}\ y_{2,k-1}\ y_{3,k-1}\ y_{4,k-1}\ y_{5,k-1}\ y_{1,k-2}]^T, \tag{1.24}$$
it evolved to $n_R = 1$ rules, the number of evolved parameters is 1273, and its performance is $\text{RMSE}_v = 1.0583\%$. The fuzzy model with the number 13 according to [18] is the best one for the second finger ($l = 2$). The input vector of this model is [17, 18]

$$\mathbf{z}_k = [z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}\ y_{1,k-1}\ y_{2,k-1}\ y_{3,k-1}\ y_{4,k-1}\ y_{5,k-1}\ y_{2,k-2}]^T, \tag{1.25}$$

it evolved to $n_R = 2$ rules, the number of evolved parameters is 1032, and its performance is $\text{RMSE}_v = 1.161\%$. The fuzzy model with the number 17 according to [47] is the best one for the third finger ($l = 3$). The input vector of this model is [17, 47]

$$\mathbf{z}_k = [z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}\ z_{1,k-1}\ z_{2,k-1}\ z_{3,k-1}\ z_{4,k-1}\ z_{5,k-1}\ z_{6,k-1}\ z_{7,k-1}\ z_{8,k-1}\ y_{1,k-1}\ y_{2,k-1}\ y_{3,k-1}\ y_{4,k-1}\ y_{5,k-1}\ y_{3,k-2}]^T, \tag{1.26}$$
it evolved to $n_R = 2$ rules, the number of evolved parameters is 1809, and its performance is $\text{RMSE}_v = 1.1181\%$. The fuzzy model with the number 13 according to [18] is the best one for the fourth finger ($l = 4$). The input vector of this model is [17, 18]

$$\mathbf{z}_k = [z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}\ y_{1,k-1}\ y_{2,k-1}\ y_{3,k-1}\ y_{4,k-1}\ y_{5,k-1}\ y_{4,k-2}]^T, \tag{1.27}$$

it evolved to $n_R = 3$ rules, the number of evolved parameters is 1505, and its performance is $\text{RMSE}_v = 1.157\%$. The fuzzy model with the number 17 according to [47] is the best one for the fifth finger ($l = 5$). The input vector of this model is [17, 47]

$$\mathbf{z}_k = [z_{1,k}\ z_{2,k}\ z_{3,k}\ z_{4,k}\ z_{5,k}\ z_{6,k}\ z_{7,k}\ z_{8,k}\ z_{1,k-1}\ z_{2,k-1}\ z_{3,k-1}\ z_{4,k-1}\ z_{5,k-1}\ z_{6,k-1}\ z_{7,k-1}\ z_{8,k-1}\ y_{1,k-1}\ y_{2,k-1}\ y_{3,k-1}\ y_{4,k-1}\ y_{5,k-1}\ y_{5,k-2}]^T, \tag{1.28}$$
it evolved to $n_R = 2$ rules, the number of evolved parameters is 1541, and its performance is $\text{RMSE}_v = 1.1168\%$. A sample of experimental results, expressed as the real-world system output and the simulated model output, is illustrated in Fig. 1.13 for the fourth finger. Figure 1.13 illustrates the performance of the fuzzy model on the validation data.
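To make the structure of the input vectors in (1.24)–(1.28) and the index in (1.23) more concrete, the sketch below assembles the regressor for one finger from stored histories and evaluates the RMSE between two sequences. The function names, the data layout (lists indexed by time step) and the toy numbers are assumptions of this example; the evolving fuzzy model itself is not implemented here.

```python
import math

# Sketch only: builds the lagged input vector used by the finger models
# and evaluates the RMSE index. Data layout and names are hypothetical.

def regressor(z_hist, y_hist, k, finger, use_past_inputs=False):
    """Assemble the input vector z_k for the model of one finger.

    z_hist[k] holds the 8 sensor readings at step k,
    y_hist[k] holds the 5 finger angles (%) at step k,
    finger is the 1-based finger index l.
    use_past_inputs=True gives the 22-element form of (1.24), (1.26), (1.28);
    use_past_inputs=False gives the 14-element form of (1.25), (1.27).
    """
    z_k = list(z_hist[k])                      # z_{1,k} ... z_{8,k}
    if use_past_inputs:
        z_k += list(z_hist[k - 1])             # z_{1,k-1} ... z_{8,k-1}
    z_k += list(y_hist[k - 1])                 # y_{1,k-1} ... y_{5,k-1}
    z_k.append(y_hist[k - 2][finger - 1])      # y_{l,k-2}
    return z_k


def rmse(y_model, y_measured):
    """Root mean square error over D samples, as in (1.23)."""
    d = len(y_model)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_model, y_measured)) / d)


# Toy histories with three time steps (values are placeholders).
z_hist = [[k + j for j in range(8)] for k in range(3)]
y_hist = [[10 * k + l for l in range(5)] for k in range(3)]

vec14 = regressor(z_hist, y_hist, k=2, finger=2)
vec22 = regressor(z_hist, y_hist, k=2, finger=1, use_past_inputs=True)
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(len(vec14), len(vec22), round(toy_rmse, 4))
```

The two vector lengths, 14 and 22, correspond to the two model structures listed above; only the last element changes from one finger to another.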
Fig. 1.13 Finger angle y1 versus time of one evolved fuzzy model (red) and real-world system (blue) on validation data set [47]
1.6.2 Fuzzy Control Results

The MIMO control system structure that controls the five finger angles $y_l$, $l = 1 \ldots 5$, is presented in Fig. 1.14. It consists of five separate Single Input–Single Output (SISO) control loops (one for each finger) with the controllers $C_l$, $l = 1 \ldots 5$, which generate the control signals $u_l$, $l = 1 \ldots 5$. The reference inputs (the set-points) are
Fig. 1.14 MIMO control system structure to control the five finger angles [17]
Fig. 1.15 Structure and input membership functions of Takagi–Sugeno PI-fuzzy controllers, l = 1…5 [17]
$r_l$, $l = 1 \ldots 5$, in Fig. 1.14, and $e_l$, $l = 1 \ldots 5$, are the control errors. Disturbances are not shown in Fig. 1.14 because there are cross-couplings between the five SISO control loops; however, they are considered in [54]. The structure and the input membership functions of one of the five SISO Takagi–Sugeno-Kang Proportional-Integral (PI)-fuzzy controllers are presented in Fig. 1.15, where $q^{-1}$ is the backward shift operator, FC is the fuzzy controller in the strict sense, namely a block that ensures an essentially nonlinear input–output static map, $\Delta e_{k,l}$ is the control error increment, and $\Delta u_{k,l}$ is the control signal increment. Readers are invited to consult [17] for details on the fuzzy controller design and tuning. The main steps of the fuzzy controller design and tuning are described as follows.

Step 1. The five continuous-time PI controllers are designed and tuned. Their transfer functions are

$$C_l(s) = k_{C,l} \left( 1 + \frac{1}{s\,T_{i,l}} \right), \quad l = 1 \ldots 5, \tag{1.29}$$

where $k_{C,l}$ are the controller gains and $T_{i,l}$ are the integral time constants, $l = 1 \ldots 5$. A frequency domain approach is applied in [17] to tune the parameters of the five PI controllers, making use of linear models that approximate the process models.

Step 2. The sampling period is set to $T_s = 0.01$ s, which meets the requirements of quasi-continuous digital control, and the five continuous-time PI controllers are discretized, leading to the unified recurrent equation of the incremental discrete-time PI controllers [17]

$$\Delta u_{k,l} = K_{P,l} (\Delta e_{k,l} + \gamma_l\, e_{k,l}), \quad l = 1 \ldots 5, \tag{1.30}$$

with the parameters [17]

$$K_{P,l} = k_{C,l} \left( 1 - \frac{T_s}{2\,T_{i,l}} \right), \quad \gamma_l = \frac{2\,T_s}{2\,T_{i,l} - T_s}, \quad l = 1 \ldots 5. \tag{1.31}$$
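Steps 1 and 2 can be sketched numerically as follows. The code maps a continuous-time PI controller to the parameters in (1.31) and then runs the incremental law (1.30) on a short error sequence. Only the sampling period of 0.01 s comes from the text; the gain `k_c`, the integral time constant `t_i`, the error sequence and the zero initial conditions are hypothetical values chosen for the illustration.

```python
# Sketch of the PI discretization (1.31) and the incremental law (1.30).
# k_c and t_i below are placeholder values, not the tuned ones from [17].

def discretize_pi(k_c, t_i, t_s):
    """Return (K_P, gamma) of the incremental discrete-time PI controller."""
    k_p = k_c * (1.0 - t_s / (2.0 * t_i))
    gamma = 2.0 * t_s / (2.0 * t_i - t_s)
    return k_p, gamma


def incremental_pi(errors, k_p, gamma, u0=0.0):
    """Apply delta_u_k = K_P * (delta_e_k + gamma * e_k) step by step.

    Zero initial conditions are assumed for the previous error.
    """
    u, e_prev, outputs = u0, 0.0, []
    for e in errors:
        delta_e = e - e_prev                 # control error increment
        delta_u = k_p * (delta_e + gamma * e)
        u += delta_u                         # integrate the increment
        outputs.append(u)
        e_prev = e
    return outputs


T_S = 0.01                                   # sampling period from the text (s)
K_C, T_I = 2.0, 0.5                          # hypothetical PI parameters

k_p, gamma = discretize_pi(K_C, T_I, T_S)
u_seq = incremental_pi([1.0, 1.0, 1.0], k_p, gamma)
print(round(k_p, 4), round(gamma, 6), [round(u, 4) for u in u_seq])
```

A useful sanity check is that the product of `k_p` and `gamma` equals `k_c * t_s / t_i`, the integral gain of the discretized controller, so a constant error makes the control signal grow by that amount at every step.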
As specified in [17], the FC block employs the weighted average method for defuzzification, and the SUM and PROD operators in the inference engine. The rule base of the FC block is [17]

IF ($e_{k,l}$ IS N AND $\Delta e_{k,l}$ IS N) OR ($e_{k,l}$ IS P AND $\Delta e_{k,l}$ IS P)
THEN $\Delta u_{k,l} = \eta_l K_{P,l} (\Delta e_{k,l} + \gamma_l\, e_{k,l})$,
IF ($e_{k,l}$ IS ZE) OR ($e_{k,l}$ IS N AND $\Delta e_{k,l}$ IS ZE) OR ($e_{k,l}$ IS N AND $\Delta e_{k,l}$ IS P) OR ($e_{k,l}$ IS P AND $\Delta e_{k,l}$ IS ZE) OR ($e_{k,l}$ IS P AND $\Delta e_{k,l}$ IS N)
THEN $\Delta u_{k,l} = K_{P,l} (\Delta e_{k,l} + \gamma_l\, e_{k,l})$, $l = 1 \ldots 5$, (1.32)

where the parameters $\eta_l$, $0 < \eta_l$, are meant for the alleviation of the overshoot of the control system.

Step 4. The values of the parameters $B_{e,l}$ and $\eta_l$ are set in accordance with the experience of the control system designer or by optimal parameter tuning after the definition of an appropriate optimization problem, which is solved by the classical or modern optimization algorithms mentioned in the first part of this sub-chapter. The values of the parameters $B_{\Delta e,l}$ are obtained using [17]

$$B_{\Delta e,l} = \gamma_l\, B_{e,l}, \quad l = 1 \ldots 5. \tag{1.33}$$
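A minimal sketch of how one such PI-fuzzy controller could compute its output increment is given below. Several points are assumptions of the sketch and are not confirmed by the text: the membership functions N, ZE and P are taken as a piecewise-linear partition that sums to one on [-B, B], the SUM operator is interpreted as a plain sum of the PROD terms joined by OR, the two rule antecedents are taken to cover all nine combinations of N, ZE and P between them, and the numerical values of `k_p`, `gamma`, `eta` and `b_e` are placeholders.

```python
# Sketch of a two-rule Takagi-Sugeno PI-fuzzy controller increment.
# Membership shapes and all numbers are assumptions of this example.

def memberships(x, b):
    """Degrees of x in N, ZE, P on the universe [-b, b] (assumed shapes)."""
    x = max(-b, min(b, x))           # saturate outside the universe
    neg = max(0.0, -x / b)
    pos = max(0.0, x / b)
    zero = 1.0 - neg - pos
    return {"N": neg, "ZE": zero, "P": pos}


def pi_fuzzy_increment(e, delta_e, k_p, gamma, eta, b_e):
    """Return the control signal increment for one control loop."""
    b_de = gamma * b_e               # parameter relation of the form B_de = gamma * B_e
    mu_e = memberships(e, b_e)
    mu_de = memberships(delta_e, b_de)
    # Rule 1: error and error increment have the same sign.
    alpha1 = mu_e["N"] * mu_de["N"] + mu_e["P"] * mu_de["P"]
    # Rule 2: all remaining combinations.
    alpha2 = (mu_e["ZE"]
              + mu_e["N"] * mu_de["ZE"] + mu_e["N"] * mu_de["P"]
              + mu_e["P"] * mu_de["ZE"] + mu_e["P"] * mu_de["N"])
    linear = k_p * (delta_e + gamma * e)     # incremental PI consequent
    # Weighted average defuzzification over the two rule consequents.
    return (alpha1 * eta * linear + alpha2 * linear) / (alpha1 + alpha2)


# Hypothetical parameters for one finger loop.
K_P, GAMMA, ETA, B_E = 1.98, 0.02, 0.5, 40.0

du_mixed = pi_fuzzy_increment(10.0, 0.1, K_P, GAMMA, ETA, B_E)
du_linear = pi_fuzzy_increment(10.0, 0.1, K_P, GAMMA, 1.0, B_E)
print(round(du_mixed, 4), round(du_linear, 4))
```

Under these assumptions, setting `eta` to 1 reduces the fuzzy controller to the linear incremental PI law, and values of `eta` below 1 shrink the increment only in the region where the error and its increment share the same sign, which is consistent with the stated purpose of reducing overshoot.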
This approach is convenient because it offers low-cost fuzzy controller design and tuning. A sample of experimental results is presented in Fig. 1.16 for the first
Fig. 1.16 Flexion percentage (finger angle) y1 versus time [17]
finger as the output versus time considering zero initial conditions, zero disturbance inputs and the 40% step modification of the reference input of the fuzzy control system. Figure 1.16 illustrates encouraging results, but there is still room for further performance improvement; one solution is data-driven model-free control, with results reported in [54]. In addition, different approaches to the design and tuning of fuzzy controllers can be applied, for example those given in [106–108].
1.7 Conclusions

This chapter presented the authors' concerns and results in the field of intelligent paradigms applied in healthcare. It is becoming obvious that medicine needs to incorporate automated tools able to provide valuable support in the decision-making process. Even though medical experts have become aware that AI is a necessity (certainly not meant to replace human experts, but to assist them with a second opinion), and even though they are already using such tools, computer-aided decision-making systems still leave considerable room for improvement. All the applications developed in this field lead to a favored and popular topic nowadays: personalized medicine. The general purpose of this topic is to provide tools, models and control solutions that are a valuable aid for physicians, who will be able to analyze a much larger quantity of data in greater depth. This way, patients will have the chance to benefit from a higher quality healthcare system. The results presented in this chapter are encouraging, but there is still much relevant work to do in this field. The authors intend to continue the research in this direction in order to develop new approaches and tools that can improve the healthcare system in general and the patients' lives in particular. As highlighted in [1], it is still challenging to create a reliable prognostic model able to achieve the confidence required for use in real clinical circumstances. Data-driven model-free control in combination with AI techniques is another promising direction of future research in healthcare systems.

Acknowledgements The research reported in this paper was supported by the grant of the Romanian Ministry of Education and Research, CNCS - UEFISCDI, project number PN-III-P4-ID-PCE2020-0269.
References

1. A. Albu, R.E. Precup, T.A. Teban, Results and challenges of artificial neural networks used for decision-making in medical applications. Facta. Univ. Ser. Mech. Eng. 17(4), 285–308 (2019)
2. S.J. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, 3rd edn. (Pearson, Upper Saddle River, NJ, USA, 2010)
3. J.M. Zurada, Introduction to Artificial Neural Systems (Jaico Publishing House, Mumbai, India, 2012)
4. M. Lafif Tej, S. Holban, Determining multi-layer perceptron structure using clustering techniques. Int. J. Artif. Intell. 17(1), 139–166 (2019)
5. R.E. Precup, H. Hellendoorn, A survey on industrial applications of fuzzy control. Comput. Ind. 62(3), 213–226 (2011)
6. J.M. Mendel, Uncertain Rule-Based Fuzzy Logic Systems: Introduction and New Directions (Prentice-Hall, Upper Saddle River, NJ, 2001)
7. H.A. Hagras, A hierarchical type-2 fuzzy logic control architecture for autonomous mobile robots. IEEE Trans. Fuzzy. Syst. 12(1), 524–539 (2004)
8. O. Castillo, P. Melin, Type-2 Fuzzy Logic Theory and Applications (Springer-Verlag, Berlin, Heidelberg, New York, 2008)
9. O. Castillo, P. Melin, A.A. Garza, O. Montiel, R. Sepúlveda, Optimization of interval type-2 fuzzy logic controllers using evolutionary algorithms. Soft. Comput. 15(6), 1145–1160 (2011)
10. O. Castillo, P. Melin, A review on the design and optimization of interval type-2 fuzzy controllers. Appl. Soft. Comput. 12(4), 1267–1278 (2012)
11. O. Castillo, P. Melin, Optimization of type-2 fuzzy systems based on bio-inspired methods: A concise review. Inf. Sci. 205, 1–19 (2012)
12. O. Castillo, P. Melin, A review on interval type-2 fuzzy logic applications in intelligent control. Inf. Sci. 279, 615–631 (2014)
13. R.E. Precup, P. Angelov, B.S.J. Costa, M. Sayed-Mouchaweh, An overview on fault diagnosis and nature-inspired optimal control of industrial process applications. Comput. Ind. 74, 75–94 (2015)
14. F. Valdez, O. Castillo, P. Cortes-Antonio, P. Melin, A survey of type-2 fuzzy logic controller design using nature inspired optimization. J. Intell. Fuzzy. Syst. 39(5), 6169–6179 (2020)
15. M. Sugeno, On stability of fuzzy systems expressed by fuzzy rules with singleton consequents. IEEE Trans. Fuzzy Syst. 7(2), 201–224 (1999)
16. L.T. Kóczy, Fuzzy If-Then rule models and their transformation into one another. IEEE Trans. Syst. Man. Cybern Part A 26(5), 621–637 (1996)
17. R.E. Precup, T.A. Teban, A. Albu, A.B. Borlea, I.A. Zamfirache, E.M. Petriu, Evolving fuzzy models for prosthetic hand myoelectric-based control. IEEE Trans. Instrum. Meas. 69(7), 4625–4636 (2020)
18. R.E. Precup, T.A. Teban, A. Albu, A.B. Borlea, I.A. Zamfirache, E.M. Petriu, Evolving fuzzy models for prosthetic hand myoelectric-based control using weighted recursive least squares algorithm for identification, in Proceedings of IEEE International Symposium on Robotic and Sensors Environments (Ottawa, ON, Canada, 2019), pp. 164–169
19. R.E. Precup, T.A. Teban, E.M. Petriu, A. Albu, I.C. Mituletu, Structure and evolving fuzzy models for prosthetic hand myoelectric-based control systems, in Proceedings of 26th Mediterranean Conference on Control and Automation (Zadar, Croatia, 2018), pp. 625–630
20. A. Zare, M.A. Zare, N. Zarei, R. Yaghoobi, M.A. Zare, S. Salehi, B. Geramizadeh, S.A. Malekhosseini, N. Azarpira, A neural network approach to predict acute allograft rejection in liver transplant recipients using routine laboratory data. Hepatitis Monthly 17(12), e55092 (2017)
21. L. Bertolaccini, P. Solli, A. Pardolesi, A. Pasini, An overview of the use of artificial neural networks in lung cancer research. J. Thor. Dis. 9(4), 924–931 (2017)
22. S.A. Korkmaz, H. Binol, A. Akcicek, M.F. Korkmaz, An expert system for stomach cancer images with Artificial Neural Network by using HOG Features and Linear Discriminant Analysis: HOG_LDA_ANN, in Proceedings of IEEE 15th International Symposium on Intelligent Systems and Informatics (Subotica, Serbia, 2017), pp. 327–332
23. A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks. Nature 542(7639), 115–118 (2017)
24. E.J. Lee, Y.H. Kim, N. Kim, D.W. Kang, Deep into the brain: artificial intelligence in stroke imaging. J. Stroke 19(3), 277–285 (2017)
25. K.W. Johnson, J.T. Soto, B.S. Glicksberg, K. Shameer, R. Miotto, M. Ali, E. Ashley, J.T. Dudley, Artificial intelligence in cardiology. J. Am. Coll. Cardiol. 71(23), 2668–2679 (2018)
26. A. Albu, M.S. Pasca, C.G. Zimbru, Medical predictions: naive Bayes classifier vs artificial neural networks, in Proceedings of 13th IEEE International Symposium on Applied Computational Intelligence and Informatics (Timisoara, Romania, 2019), pp. 1–6
27. A. Albu, L. Stanciu, M.S. Pasca, C.G. Zimbru, Choosing between artificial neural networks and Bayesian inference in stroke risk prediction, in Proceedings of 7th International Conference on e-Health and Bioengineering (Iasi, Romania, 2019), pp. 1–6
28. H. Guterman, Y. Nehmadi, A. Chistyakov, J.F. Soustiel, M. Feinsod, A comparison of neural network and Bayes recognition approaches in the evaluation of the brainstem trigeminal evoked potentials in multiple sclerosis. Int. J. Bio-Med. Comput. 43(3), 203–213 (1996)
29. J.X. Chen, Y.W. Xing, G.C. Xi, J. Chen, J.Q. Yi, D.B. Zhao, J. Wang, A comparison of four data mining models: Bayes, neural network, SVM and decision trees in identifying syndromes in coronary heart disease, in Proceedings of 4th International Symposium on Neural Networks (Nanjing, China, 2007), pp. 1274–1279
30. M.S. Islam, S.M. Khaled, K. Farhan, M.A. Rahman, J. Rahman, Modeling spammer behavior: naive Bayes vs. artificial neural networks, in Proceedings of 2009 International Conference on Information and Multimedia Technology (Jeju Island, South Korea, 2009), pp. 52–55
31. D. Xhemali, C.J. Hinde, R.G. Stone, Naïve Bayes vs. decision trees vs. neural networks in the classification of training web pages. Int. J. Comput. Sci. Iss. 4(1), 16–23 (2009)
32. L.M. Rodrigues, M. Mestria, Classification methods based on Bayes and Neural Networks for human activity recognition, in Proceedings of 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge (Guilin, China, 2016), pp. 1141–1146
33. B.J. He, S.M. Mortuza, Y.T. Wang, H.B. Shen, Y. Zhang, NeBcon: protein contact map prediction using neural network training coupled with naive Bayes classifiers. Bioinformat 33(15), 2296–2306 (2017)
34. E. Udayakumar, S. Santhi, P. Vetrivelan, An investigation of Bayes algorithm and neural networks for identifying the breast cancer. Ind. J. Med. Paed. Oncol. 38(3), 340–344 (2017)
35. T.P. Burghardt, K. Ajtai, Neural/Bayes network predictor for inheritable cardiac disease pathogenicity and phenotype. J. Molec. Cell Cardiol. 119, 19–27 (2018)
36. J.H. Wang, H.C. Ren, W.H. Chen, P. Zhang, A portable artificial robotic hand controlled by EMG signal using ANN classifier, in Proceedings of 2015 IEEE International Conference on Informatics and Automation (Lijiang, China, 2015), pp. 2709–2714
37. Z.J. Xu, Y.T. Tian, L. Yang, sEMG pattern recognition of muscle force of upper arm for intelligent bionic limb control. J. Bionic. Eng. 12(2), 316–323 (2015)
38. H.X. Cao, S.Q. Sun, K.J. Zhang, Modified EMG-based handgrip force prediction using extreme learning machine. Soft. Comput. 21(2), 491–500 (2017)
39. Y. Guo, G.R. Naik, S. Huang, A. Abraham, H.T. Nguyen, Nonlinear multiscale maximal Lyapunov exponent for accurate myoelectric signal classification. Appl. Soft. Comput. 36, 633–640 (2015)
40. C.G. Yang, J.S. Chen, Z.J. Ju, A.S.K. Annamalai, Visual servoing of humanoid dual-arm robot with neural learning enhanced skill transferring control. Int. J. Humanoid. Robot 15(2), 1–23 (2018)
41. T.A. Teban, R.E. Precup, T.E. Alves de Oliveira, E.M. Petriu, Recurrent dynamic neural network model for myoelectric-based control of a prosthetic hand, in Proceedings of 2016 IEEE International Systems Conference (Orlando, FL, USA, 2016), pp. 1–6
42. T.A. Teban, R.E. Precup, E.C. Lunca, A. Albu, C.A. Bojan-Dragos, E.M. Petriu, Recurrent neural network models for myoelectric-based control of a prosthetic hand, in Proceedings of 22nd International Conference on Systems Theory, Control and Computing (Sinaia, Romania, 2018), pp. 603–608
43. R.E. Precup, T.A. Teban, A. Albu, Evolving fuzzy and neural network models of finger dynamics for prosthetic hand myoelectric-based control, in Proceedings of 11th International Conference on Electronics, Computers and Artificial Intelligence (Pitesti, Romania, 2019), pp. 1–8
44. A. Ameri, M.A. Akhaee, E. Scheme, K. Englehart, Regression convolutional neural network for improved simultaneous EMG control. J. Neural. Eng. 16(3), 036015 (2019)
45. C. Igual, J. Igual, J.M. Hahne, L.C. Parra, Adaptive auto-regressive proportional myoelectric control. IEEE Trans. Neural. Syst. Rehab. Eng. 27(2), 314–322 (2019)
46. R.E. Precup, T.A. Teban, T.E. Alves de Oliveira, E.M. Petriu, Evolving fuzzy models for myoelectric-based control of a prosthetic hand, in Proceedings of 2016 IEEE International Conference on Fuzzy Systems (Vancouver, BC, Canada, 2016), pp. 72–77
47. R.E. Precup, T.A. Teban, A. Albu, A.I. Szedlak-Stinean, C.A. Bojan-Dragos, Experiments in incremental online identification of fuzzy models of finger dynamics. Rom. J. Inform. Sci. Tech. 21(4), 358–376 (2018)
48. M. Tabakov, K. Fonal, R.A. Abd-Alhameed, R. Qahwaji, Fuzzy bionic hand control in real-time based on electromyography signal analysis, in N.T. Nguyen, L. Iliadis, Y. Manolopoulos, B. Trawiński (eds.), Computational Collective Intelligence ICCCI 2016 (Springer, Cham, 2016), pp. 292–302
49. M. Tabakov, K. Fonal, R.A. Abd-Alhameed, R. Qahwaji, Bionic hand control in real-time based on electromyography signal analysis, in N.T. Nguyen, R. Kowalczyk (eds.), Transactions on Computational Collective Intelligence XXIX (Springer, Cham, 2018), pp. 21–38
50. A. Albu, R.E. Precup, T.A. Teban, Medical applications of artificial neural networks, in Proceedings of 14th International SAUM Conference on Systems, Automatic Control and Measurements (Nis, Serbia, 2018), pp. 1–11
51. M. Fliess, C. Join, Model-free control and intelligent PID controllers: towards a possible trivialization of nonlinear control? IFAC Proc. 42(10), 1531–1550 (2009)
52. M. Fliess, C. Join, Model-free control. Int. J. Control 86(12), 2228–2252 (2013)
53. M. Fliess, C. Join, Machine learning and control engineering: the model-free case, in Proceedings of Future Technologies Conference 2020 (Vancouver, BC, Canada, 2020), pp. 1–20
54. R.E. Precup, R.C. Roman, T.A. Teban, A. Albu, E.M. Petriu, C. Pozna, Model-free control of finger dynamics in prosthetic hand myoelectric-based control systems. Stud. Informat. Control 29(4), 399–410 (2020)
55. R.C. Roman, R.E. Precup, R.C. David, Second order intelligent proportional-integral fuzzy control of twin rotor aerodynamic systems. Proc. Comput. Sci. 139, 372–380 (2018)
56. R.C. Roman, R.E. Precup, C.A. Bojan-Dragos, A.I. Szedlak-Stinean, Combined model-free adaptive control with fuzzy component by virtual reference feedback tuning for tower crane systems. Proc. Comput. Sci. 162, 267–274 (2019)
57. R.C. Roman, R.E. Precup, E.M. Petriu, Hybrid data-driven fuzzy active disturbance rejection control for tower crane systems. Eur. J. Control 58, 373–387 (2021)
58. R.E. Precup, S. Preitl, K.J. Burnham, B. Vinsonneau, Virtual reference feedback tuning approach to fuzzy control systems development. IFAC Proc. 40(8), 123–128 (2007)
59. M.C. Campi, A. Lecchini, S.M. Savaresi, Virtual reference feedback tuning: a direct method for the design of feedback controllers. Automatica 38(8), 1337–1346 (2002)
60. S. Formentin, M.C. Campi, A. Caré, S.M. Savaresi, Deterministic continuous-time virtual reference feedback tuning (VRFT) with application to PID design. Syst. Control Lett. 127, 25–34 (2019)
61. D.M. Filimon, A. Albu, Skin diseases diagnosis using artificial neural networks, in Proceedings of 9th IEEE International Symposium on Applied Computational Intelligence and Informatics (Timisoara, Romania, 2014), pp. 189–194
62. UCI Machine Learning Repository - Dermatology Data Set, Available from: http://archive.ics.uci.edu/ml/datasets/Dermatology. Last Accessed Dec 2020
63. A. Albu, Decisional methods applied in medical domain, in Proceedings of 5th International Symposium on Applied Computational Intelligence and Informatics (Timisoara, Romania, 2009), pp. 123–128
64. N. Mischie, A. Albu, Artificial neural networks for diagnosis of coronary heart disease, in Proceedings of 8th International Conference on e-Health and Bioengineering (Iasi, Romania, 2020), pp. 1–6
1 Intelligent Paradigms for Diagnosis, Prediction …
39
65. European Cardiovascular Disease Statistics 2017, Available from: http://www.ehnheart.org/ cvd-statistics.html. Last Accessed April 2020 66. Framingham Heart Study Dataset, Available from: https://www.kaggle.com/amanajmera1/fra mingham-heart-study-dataset/activity. Last Accessed: April 2020 67. About the Framingham Heart Study, Available from: https://www.framinghamheartstudy.org/ fhs-about/. Last Accessed: April 2020 68. I. Tanasoiu, A. Albu, A connectionist model for cerebrovascular accident risk prediction, in Proceedings of 6th IEEE International Conference on E-Health and Bioengineering (Sinaia, Romania, 2017), pp. 45–48 69. A. Albu, L. Stanciu, Benefits of using artificial intelligence in medical predictions, in Proceedings of 5th IEEE International Conference on E-Health and Bioengineering (Iasi, Romania, 2015), pp. 1–6 70. S. Lukic, Z. Cojbasic, P. Peric, Z. Milosevic, M. Spasic, V. Pavlovic, A. Milojevic, Artificial neural networks based early clinical prediction of mortality after spontaneous intracerebral hemorrhage. Acta. Neurol. Belg. 112(4), 375–382 (2012) 71. P. Angelov, D. Filev, On-line design of Takagi-Sugeno models. Fuzzy Sets and Systems— IFSA 2003, in T. Bilgiç, De B. Baets, O. Kaynak, (eds.) Lecture Notes in Artificial Intelligence, vol. 2715, (Springer, Berlin, Heidelberg, 2003), pp. 92–165 72. P. Angelov, J. Victor, A. Dourado, D. Filev, On-line evolution of Takagi-Sugeno fuzzy models. IFAC Proc. 37(16), 67–72 (2004) 73. P. Angelov, N. Kasabov, Evolving computational intelligence systems, in Proceedings of 1st International Workshop on Genetic Fuzzy Systems (Granada, Spain, 2005), pp. 76–82 74. P. Angelov, I. Škrjanc, S. Blažiˇc, Robust evolving cloud-based controller for a hydraulic plant, in Proceedings of 2013 IEEE Conference on Evolving and Adaptive Intelligent Systems (Singapore, 2013), pp. 1–8 75. S. Blažiˇc, I. Škrjanc, D. Matko, A robust fuzzy adaptive law for evolving control systems. Evolv. Syst. 5(1), 3–10 (2014) 76. 
Oliveira L, Bento A, Leite VJS, Gomide FAC (2020) Evolving granular feedback linearization: design, analysis, and applications. Appl. Soft. Comput.86, 105927 77. M.M. Ferdaus, M. Pratama, S.G. Anavatti, M.A. Garratt, E. Lughofer, PAC: a novel selfadaptive neuro-fuzzy controller for micro aerial vehicles. Inf. Sci. 512, 481–505 (2020) 78. R.E. Precup, S. Preitl, (1999) Development of some fuzzy controllers with non-homogenous dynamics with respect to the input channels meant for a class of systems, in Proceedings of European Control Conference (Karlsruhe, Germany, 1999), pp. 61–66 79. R.E. Precup, S. Preitl, Development of fuzzy controllers with non-homogeneous dynamics for integral-type plants. Electr. Eng. 85(3), 155–168 (2003) 80. R.E. Precup, S. Preitl, E.M. Petriu, J.K. Tar, M.L. Tomescu, C. Pozna, generic two-degree-offreedom linear and fuzzy controllers for integral processes. J. Franklin Inst. 346(10), 980–1003 (2009) 81. S. Preitl, A.I. Stinean, R.E. Precup, Z. Preitl, E.M. Petriu, C.A. Dragos, M.B. Radac, Controller design methods for driving systems based on extensions of symmetrical optimum method with DC and BLDC motor applications. IFAC Proc. 45(3), 264–269 (2012) 82. R.E. Haber, J.R. Alique, Fuzzy logic-based torque control system for milling process optimization. IEEE Trans. Syst. Man. Cybern Part C Appl. Rev. 37(5), 941–950 (2007) 83. Z.C. Johanyák, A modified particle swarm optimization algorithm for the optimization of a fuzzy classification subsystem in a series hybrid electric vehicle. Tehn. Vjesn Tech. Gaz 24(2), 295–301 (2017) 84. E. Osaba, J. Del Ser, A. Sadollah, M.N. Bilbao, D. Camacho, A discrete water cycle algorithm for solving the symmetric and asymmetric traveling salesman problem. Appl. Soft. Comput. 71, 277–290 (2018) 85. J. Vašˇcák, I. Zolotová, E. Kajáti, Navigation fuzzy cognitive maps adjusted by PSO, in Proceedings of 2019 23rd International Conference on System Theory, Control and Computing (Sinaia, Romania, 2019), pp. 107–112
40
A. Albu et al.
86. R.E. Precup, E.I. Voisan, E.M. Petriu, M.L. Tomescu, R.C. David, A.I. Szedlak-Stinean, R.C. Roman, Grey wolf optimizer-based approaches to path planning and fuzzy logic-based tracking control for mobile robots. Int. J. Comput. Commun. Control 15(3), 3844 (2020) 87. H. Zapata, N. Perozo, W. Angulo, J. Contreras, A hybrid swarm algorithm for collective construction of 3D structures. Int. J. Artif. Intell. 18(1), 1–18 (2020) 88. G. Kaur, S.S. Gill, M. Rattan, Whale optimization algorithm for performance improvement of silicon-on-insulator FinFETs. Int. J. Artif. Intell. 18(1), 63–81 (2020) 89. G.G. Wang, Y. Tan, Improving metaheuristic algorithms with information feedback models. IEEE Trans. Cybern. 49(2), 542–555 (2019) 90. L.M. Li, K.D. Lu, G.Q. Zeng, L. Wu, M.R. Chen, A novel real-coded population-based extremal optimization algorithm with polynomial mutation: A non-parametric statistical study on continuous optimization problems. Neurocomput 174, 577–587 (2016) 91. H. Costin, C. Rotariu, I. Alexa, G. Constantinescu, V. Cehan, B. Dionisie, G. Andruseac, V. Felea, E. Crauciuc, M. Scutariu, TELEMON—a complex system for real time medical telemonitoring, in Proceedings of 11th International Congress of the IUPESM/World Congress on Medical Physics and Biomedical Engineering (Munich, Germany, 2009), pp. 92–95 92. C. Rotariu, A. Pasarica, G. Andruseac, H. Costin, D. Nemescu, Automatic analysis of the fetal heart rate variability and uterine contractions, in Proceedings of 8th International Conference and Exposition on Electrical and Power Engineering (Iasi, Romania, 2014), pp. 1–6 93. S.I. Bejinariu, R. Luca, H. Costin, Nature-inspired algorithms based multispectral image fusion, in Proceedings of 9th International Conference and Exposition on Electrical and Power Engineering (Iasi, Romania, 2016), pp. 10–15 94. P. Baranyi, P. Korondi, R.J. Patton, H. Hashimoto, Trade-off between approximation accuracy and complexity for TS fuzzy models. Asian J. Control. 
6(1), 21–33 (2004) 95. I. Dzitac, F.G. Filip, M.J. Manolescu, Fuzzy logic is not fuzzy: World-renowned computer scientist Lotfi A. Zadeh. Int. J. Comput. Commun. Control 12(6), 748–789 (2017) 96. R. Andoga, L. F˝oz˝o, J. Judiˇcák, R. Bréda, S. Szabo, R. Rozenber, M. Džunda, Intelligent situational control of small turbojet engines. Int. J. Aerosp. Eng. 2018, 8328792 (2018) 97. M. Evagoras, K.M. Deliparaschos, E. Kalyvianaki, A.C. Zolotas, T. Charalambous, Robust dynamic CPU resource provisioning in virtualized servers. IEEE Trans. Serv. Comput. (2020). https://doi.org/10.1109/TSC.2020.2966972 98. D. Liu, G.H. Yang, M.J. Er, Event-triggered control for T-S fuzzy systems under asynchronous network communications. IEEE Trans. Fuzzy Syst. 28(2), 390–399 (2020) 99. B. Xiao, H.K. Lam, Y. Yu, Y.D. Li, Sampled-data output-feedback tracking control for interval type-2 polynomial fuzzy systems. IEEE Trans. Fuzzy Syst. 28(3), 424–433 (2020) 100. B.P. Jiang, H.R. Karimi, Y.G. Kao, C.C. Gao, Takagi-Sugeno model based event-triggered fuzzy sliding-mode control of networked control systems with semi-Markovian switchings. IEEE Trans. Fuzzy Syst. 28(4), 673–683 (2020) 101. Y. Xia, J. Wang, B. Meng, X.Y. Chen, Further results on fuzzy sampled-data stabilization of chaotic nonlinear systems. Appl. Math. Comput. 379, 125225 (2020) 102. R.E. Precup, S. Preitl, E.M. Petriu, R.C. Roman, C.A. Bojan-Dragos, E.L. Hedrea, A.I. Szedlak-Stinean, A center manifold theory-based approach to the stability analysis of state feedback Takagi-Sugeno-Kang fuzzy control systems. Facta. Univ. Ser. Mech. Eng. 18(2), 189–204 (2020) 103. P. Angelov, D. Filev, An approach to online identification of Takagi-Sugeno fuzzy models. IEEE Trans. Syst. Man Cybern Part B Cybern 34(1), 484–498 (2004) 104. J.V. Ramos, A. 
Dourado, On line interpretability by rule base simplification and reduction, in Proceedings of European Symposium on Intelligent Technologies, Hybrid Systems and their Implementation on Smart Adaptive Systems (Aachen, Germany, 2004), pp. 1–6 105. http://www.aut.upt.ro/~rprecup/Fuzzy-models.zip. Last Accessed: April 2020 106. S. Preitl, Z. Preitl, R.E. Precup, Low cost fuzzy controllers for classes of second-order systems. IFAC Proc. 35(1), 397–402 (2002)
1 Intelligent Paradigms for Diagnosis, Prediction …
41
107. S. Yordanova, D. Merazchiev, L.C. Jain, A two-variable fuzzy control design with application to an air-conditioning system. IEEE Trans. Fuzzy Syst. 23(2), 474–481 (2015) 108. Y.J. Liang, Y.X. Li, W.W. Che, Z.S. Hou, Adaptive fuzzy asymptotic tracking for nonlinear systems with nonstrict-feedback structure. IEEE Trans. Cybern 51(2), 853–861 (2021)
Chapter 2
Artificial Intelligence in Healthcare Practice: How to Tackle the “Human” Challenge
Stefano Triberti, Ilaria Durosini, Davide La Torre, Valeria Sebri, Lucrezia Savioni, and Gabriella Pravettoni
Abstract Artificial Intelligence (AI) plays a crucial role in health and medicine, especially in diagnostic support, identification of treatment, patient health management, and improvement of health organization infrastructures. However, knowledge of the issues related to the implementation and usage of this disruptive technology is still limited. In the present contribution, after briefly reviewing the main trends in AI in medicine, we first anticipate the possible risky effects of AI implementation on the doctor-patient relationship (the “third wheel” effect); we then examine the issue of interface, that is, the necessity of developing shared guidelines for the interaction properties of AI devices. Finally, we include some reflections and guidelines on the identification of personnel who will work with AI within healthcare contexts, highlighting the need for interdisciplinary teams able to value both rigorous practice and patients’ health and well-being.
S. Triberti (corresponding author) · D. La Torre · V. Sebri · L. Savioni · G. Pravettoni
Department of Oncology and Hemato-Oncology, University of Milan, Milan, Italy
S. Triberti · I. Durosini · V. Sebri · L. Savioni · G. Pravettoni
Applied Research Division for Cognitive and Psychological Science, IEO European Institute of Oncology IRCCS, Milan, Italy
D. La Torre
Artificial Intelligence Institute, SKEMA Business School and Université Côte d’Azur, Sophia Antipolis, France
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_2
S. Triberti et al.
Keywords Artificial intelligence · Medicine · Healthcare process · Interface · Human-computer interaction · User experience · Third wheel effect
2.1 Introduction
It is now broadly accepted that Artificial Intelligence (AI) is an interdisciplinary area that includes philosophy, mathematics, computer science, engineering and robotics, and cognitive science. AI is a discipline based on the ability of a machine to learn from experience, to simulate human intelligence, to adapt to new scenarios, and to engage in humanlike activities. Roughly speaking, AI is the simulation of human intelligence by computers. On the other hand, recent conceptions challenge the idea that human cognitive processes can be effectively simulated or reproduced: AI may be considered similar to the human mind in terms of results (e.g., finding solutions to problems), but there will always be an irreducible difference between human cognition and artificial computation. In any case, AI as a field has made rapid progress in recent years, and it has played, and will continue to play, a crucial role in the economic, social, and scientific areas of society.
Most researchers and scientists agree that AI will transform business models and every segment of all industries through the adoption of innovative technologies and creative intelligence [1]. AI will also outperform humans on most cognitive tasks in this century, disrupt businesses and societies at large to a greater extent than any previous technological revolution, and affect labor productivity, as it will make the production and delivery of new goods and services much easier and simpler than before. Importantly, the massive deployment of AI-based algorithms also raises major societal and ethical questions.
AI is expected to become more and more widespread in medical practice and, according to some experts, to deeply change the way we practice medicine and pursue healthcare. AI has the potential to reduce operational costs, increase efficiency, and improve customer experience. The term AI is often associated with Machine Learning (ML).
ML is a branch of AI in which algorithms learn from data in order to make future decisions or predictions. Tasks that can be solved by means of ML algorithms include [2–4] recognizing patterns (facial expressions, handwritten or spoken words, medical images), generating patterns (images or motion sequences), recognizing anomalies (unusual credit card transactions, unusual patterns of sensor readings), and prediction (future stock prices, currency exchange rates). ML algorithms include different forms of learning:
• Supervised learning, which consists in learning a function that maps an input to an output based on example input–output pairs. It infers a function from labeled training data consisting of a set of training examples.
• Unsupervised learning, whose task is to identify previously undetected patterns and information in a data set with no pre-existing labels.
• Reinforcement learning, the third paradigm in machine learning. A reinforcement learning algorithm typically tries to balance exploration and exploitation. Data are often presented sequentially rather than all at once, and specific labels are provided sequentially to the learner during its interaction with the task environment in the form of reward or penalization (hence the “reinforcement” mechanism). The paradigm is usually situated between supervised and unsupervised learning, and it deals with learning in sequential decision-making problems. It is mainly based on the concepts of Markov decision processes and dynamic programming, and on various definitions of optimality with respect to the goal of learning sequential decisions. Reinforcement learning algorithms are based on estimating value functions, which are functions of states (or of state–action pairs) that estimate how good it is for the agent to be in a given state or to perform a given action. The notion of goodness is measured in terms of the future rewards that the agent can expect.
It is well recognized that the current trends in AI can be summarized as follows:
• Learning algorithms, with a particular focus on unsupervised and reinforcement learning;
• Ethics in AI, a subarea concerned with the ethical quality of AI algorithm-based predictions, of the outcomes drawn from them, and of their impact on humans;
• Quantum computing, that is, the large-scale adoption of new quantum-based supercomputers. Quantum computers will play a critical role in the creation of artificial intelligence;
• Convergence of AI and other emerging technologies;
• Facial recognition and image analysis, and their applications to emotion detection, cybersecurity, and personal identification;
• Biased data, usually coming from big data and from different sources, including structured, semi-structured, and unstructured data;
• Neural networks, deep neural networks, and multilayered neural networks;
• Socio-economic models and their use to model social and economic phenomena with data from different sources, including social media, IoT, sensors, and GPS mobility tracking;
• Deep learning, a specific subset of machine learning using artificial neural networks (ANNs), which are layered structures inspired by the human brain;
• Privacy, policy, and security in data management and storage.
Another fundamental field is the study of human behavior towards AI and, consequently, the implementation of AI technologies within real-world application fields. This is the main objective of the present contribution, with a focus on healthcare and medicine.
The contribution is organized as follows. Section 2.2 presents the main applications of AI techniques in healthcare. Section 2.3 presents the concept of the “third wheel effect”, that is, the risks AI implementation may pose to the quality of the doctor-patient relationship. Section 2.4 deals with the issue of interface in AI, a cornerstone for effective implementation. Section 2.5 reports suggestions for the selection and monitoring of personnel who should work with AI, along with tools and variables of interest. The subsequent sections conclude the contribution with a summary of implications and recommendations for the field.
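To make the notion of a value function introduced earlier in this section more concrete, the following minimal Python sketch runs value iteration (a dynamic programming technique) on a tiny Markov decision process. The three states, the actions, the transition table, the rewards, and the discount factor are all hypothetical and chosen only for illustration; they do not come from the chapter or from any clinical setting.

```python
# Minimal value-iteration sketch on a hypothetical 3-state MDP.
# Every state, action, probability, reward and the discount factor
# below is an illustrative assumption.

GAMMA = 0.9  # discount factor (assumed)

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "s0": {
        "stay": [(1.0, "s0", 0.0)],
        "move": [(1.0, "s1", 0.0)],
    },
    "s1": {
        "stay": [(1.0, "s1", 0.0)],
        "act": [(0.8, "goal", 1.0), (0.2, "s1", 0.0)],
    },
    "goal": {
        "stay": [(1.0, "goal", 0.0)],  # absorbing state, no further reward
    },
}


def q_value(values, state, action):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_iter):
        new_values = {
            state: max(q_value(values, state, a) for a in TRANSITIONS[state])
            for state in TRANSITIONS
        }
        delta = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if delta < tol:
            break
    return values


def greedy_policy(values):
    """Pick, in each state, the action with the highest expected return."""
    return {
        state: max(TRANSITIONS[state], key=lambda a: q_value(values, state, a))
        for state in TRANSITIONS
    }


if __name__ == "__main__":
    state_values = value_iteration()
    print(state_values)
    print(greedy_policy(state_values))
```

In this toy setting the value of each state is the expected discounted reward the agent can still collect from it, which is the sense of "goodness" used in the text. Note that value iteration assumes the transition model is known; reinforcement learning methods estimate such values from interaction instead.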
2.2 AI in Healthcare
According to the literature, AI could be employed in the healthcare field in four main ways: as a diagnosis support tool, as an aid for the identification of personalized therapy regimens, as a tool to promote patient engagement, and as a means to model and empower medical institutions’ organizational infrastructures. At least three of these modalities entail the possibility that AI will increasingly interact not only with health professionals but also with patients. Yet there is still scarce information about how the inclusion of AI in medical practice will affect the patient-doctor relationship, which is a paramount factor influencing treatment effectiveness.
Generally, symptoms are collected, analyzed, and compared with the previous scientific literature to recognize diseases and apply available treatments. According to an evidence-based medicine approach, physicians have to know the current scientific literature in order to propose the best possible intervention [5]. However, theoretical and methodological studies show significant limitations because (1) samples are not always representative of the overall population, (2) experimental settings tend to be more artificial than natural ones, and (3) experimental studies are too expensive. Thus, although randomized studies are relevant for scientific progress, medical practice is turning to AI as a new way to study illness and related medications [6].
As previously said, diagnosis is recognized as the main area of AI application in medicine and healthcare. Specifically, patients’ clinical data are analyzed from real-world evidence through the use of AI devices [7]. The data collected, from genetic testing, diagnostic imaging, electrodiagnosis, physical examination notes, clinical results, and other patient clinical information, are thus compared with a substantial proportion of the literature.
AI is used as a diagnostic support tool to identify the disease and to support physicians in their everyday duties that rely on the manipulation of data and knowledge [8]. For example, artificial neural networks (ANNs) can recognize and classify patterns accurately by analyzing complex interactions among many biological, clinical, and pathological variables.
Second, AI can identify available treatments tailored to the specific patient, in accordance with the concept of precision medicine (e.g., Watson for Oncology) [9]. This is valuable for two reasons: first, it helps identify the best possible intervention; second, physicians can provide early interventions to delay the onset of chronic pathological conditions that could worsen in the future (pre-emptive medicine) [10, 11].
Third, AI can promote patients’ engagement in their process of care. Digital tools such as eHealth, Digital Therapeutics, and Ambient Intelligence are technological solutions aimed at collecting data, monitoring health status, and supporting patients’ health management by providing useful recommendations for facing health-related issues every day [12]. This allows doctors to treat patients outside of the medical consultation, making home-based care possible. Individuals can be assisted at any moment, becoming more active and responsible for their care [13].
Fourth, AI can empower medical institutions’ organizational infrastructure by providing insights and suggestions. Healthcare systems are not stationary but in continuous development, learning from experience; AI can therefore provide suggestions for dealing with challenges and for organizational improvements in terms of care coordination capabilities [10, 11]. For example, reducing the number of hospital visits, thanks to the possibility of acquiring many measurements by oneself, can lead to (i) lower resource consumption costs (e.g., emergency room use and the number of patients recovering in hospital) and (ii) the promotion of illness prevention by reducing the need to reach the hospital for every healthcare need. In terms of prevention, laboratories and clinics need to accelerate the implementation of AI to capture data in real time, and institutions need to sustain their transformation into intelligible processes [14]. This would help to avoid infections and other medical issues associated with recovery in hospital. Indeed, AI would help make healthcare systems dynamic and continuously progressing, learning from experience and striving for permanent process improvement. By analyzing databases characterized by a flow of data, it takes into account larger-scale organizations and cycles to treat diseases, in line with an idea of disease based on a multi-agent system (MAS), that is, multiple agents situated in a common environment that interact with each other [15]. Specifically, the MAS approach captures the dynamics of individual patients (e.g., responses to received medications and behavioral interactions) within a larger societal ecosystem.
A global care coordination technology process is implemented to map, control, and better support changes in the system, with a demonstrated improvement in responses to medication and more efficacious interventions. Further, the growth of research on “Explainable Artificial Intelligence” (XAI) improves AI’s transparency and its ability to explain its elaboration processes, which can sustain its implementation in any professional practice [16]. In this way, new scientific findings could be shared through open sources, and data could be aggregated and displayed for open access by physicians as point-of-care information, thanks to the simplification, readability, and clinical utility of datasets [17].
However, the role of physicians remains crucial, and it is still an understudied topic. The introduction of AI within healthcare contexts will influence the doctor-patient relationship. In other words, AI may change the way diagnoses, interventions, and treatments are provided, but it should not replace the contribution of medical practice as performed by human doctors. The challenge of applying AI in medicine supports the aims of improving clinical decision making and delivering more personalized treatments, moving away from the concept of a “therapy for a disease” [18]. Instead, medical interventions have to take into account individuals and their needs by engaging patients as decision-makers in every phase of their process of care. This is in line with a patient-centered perspective in which patients and physicians collaborate systematically to make decisions on treatments and interventions (shared decision making) [19]. In line with this, online interactive platforms can involve individuals as actively responsible for their treatments by sustaining patients’ acceptance of disease in complicated day-to-day routines. For example, Patients Like Me is a platform that allows patients to communicate and share opinions with others about their disease; at the same time, doctors can collect symptoms and physical or psychological difficulties directly from patients [20]. This helps doctors stay in contact with patients’ opinions and promotes the relationship between individuals and the healthcare system. Doctor-patient interactions are relevant in the process of care, and professionals have to create a personalized environment to promote the flow of information and patients’ autonomy and responsibility in disease management [21]. In this way, AI helps patients become more able to manage their symptoms and to be engaged in their process of care, so as to take the “optimal decision in real-time” together with physicians. To sum up, the many clinical data collected from patients’ medical records and wearable health sensors could be used not only to provide suggestions about lifestyle change and maintenance, but also to inform healthcare designs based on patients’ habits and needs.
Challenges still exist in terms of the acceptance of technologies and AI by healthcare professionals as well. Previous studies [22, 23] claimed that the predictors of intention to use are perceived usefulness (the conviction that technologies can help to reach objectives more safely), perceived ease of use, and user acceptance of information technology. Longoni and colleagues [24] claimed that doctors prefer to rely on their medical intuitions rather than on a statistical model. This could be explained by the perception that they may seem less competent and less professional if they rely on computerized decision aids [25].
In general, two consequences are possible: on the one hand, physicians’ perception of not being able to deal with AI may discourage its application; on the other hand, professionals’ overconfidence in AI can lead to a decrease in the involvement of human competencies in the process of care [26]. Physicians have to mediate the relationship between AI and patients, becoming ever more competent in managing data as well as communication with patients.
In conclusion, most of the novel applications of AI in medicine need further research, especially pertaining to the Human–Computer Interaction area. In any case, scientific research still has to deal with the actual implications of AI implementation within medical practice, especially regarding patients’ reactions and the effects on the quality of the doctor-patient relationship. Recent research has proposed the concept of the “third wheel effect” to summarize the risks that can be foreseen in this area.
2.3 A “Third Wheel” Effect
Despite the significant advantages offered by AI to the healthcare context, its effects on the doctor-patient relationship are still unknown. According to recent literature, the presence of AI could influence the clarity of communication as well as the mutual trust between health providers and patients. Is it possible that AI will act in the medical consultation like a “third wheel”? Technologies are not “just tools” used in healthcare; they increasingly become interlocutors within doctor-patient relationships, influencing the whole process. Indeed, research on health professionals’ attitudes and behaviors towards the implementation of AI in everyday medical practice has led to mixed results. While many doctors are not keen to trust recommendations coming “from a machine”, others show profound confidence in this technology, even if they do not really know how it works. For example, [27] found that radiologists were generally positive towards AI, but some raised doubts about trust (e.g., “validation of algorithms should be mandatory”). Analogous results were found by [28], who also reported a lack of knowledge about the utility of AI among pathologists and the need for specialized training to encourage adoption. Regarding mental health professionals [29], psychotherapists were more positive towards a hypothetical AI with a limited role (i.e., analyzing data to aid diagnosis and patient monitoring) than towards one with a more prominent role (e.g., acting as a therapist). The methodological approach to psychotherapy also influenced attitudes towards adoption, with cognitive-behavioral psychotherapists being more positive than psychodynamic and systemic ones.
Although there is still no scientific evidence on the impact of AI on the medical context, we have previously illustrated the possible effects of the presence of AI in healthcare, identifying the “Confusion of the Tongues”, decision paralysis or risk of delay, and role ambiguity as the main issues that still need to be addressed (Fig. 2.1) [10]. The three facets of the effect are reported and explained below, along with research tools that could be used to identify them in context and guidelines for tackling them in advance.
Fig. 2.1 A summary of the third wheel effect
2.3.1 “Confusion of the Tongues”
What is the role of the clinician in a medical consultation aided by AI? How should information be encoded in it? All the patient’s information (e.g., symptoms, previous treatments, allergies) is encoded into the AI according to specific languages and formats (i.e., data). Only in this way can data be analyzed and processed by algorithms. However, some patient information may be difficult to translate into computable data. This is not due to some sort of malfunction, but to AI’s difficulty in understanding unclear, vague, and undefined information and symptoms. For example, how can physicians code a “general malaise” reported by the patient? Is it a new symptom? Is it the expression of an emotion? In this scenario, the active role of the physician is crucial. The clinician is not merely a “translator of information” from the AI to patients and back; he or she is required to use human skills during the process, such as emotional intelligence. Emotional intelligence is the ability to recognize and regulate emotions in ourselves and others [30], and it can be used in a medical consultation with a patient [31]. Emotional intelligence helps to understand the origin of undefined symptoms or vague aspects reported by patients, paying attention also to psychological aspects. AI does not have access to the subtle capacities to understand, recognize, and manage emotions, and this aspect must always remain present in the doctor-patient relationship when using AI. Only in this way will it be possible to avoid the risk of forcing patients’ information into predefined categories and of adapting symptoms to the capabilities and language of the AI. If this happens (for example, due to an overestimation of the capabilities of technologies [32]), patients could be scarcely motivated to talk with their doctors about personal feelings and health concerns. Indeed, patients can feel when physicians are really listening to them [33, 34].
Lack of perceived authentic listening could lead patients to experience negative emotions such as demoralization, a sense of abandonment, and anger [35–37] and, over time, it could have a detrimental effect on patients’ adherence to the treatment process, the effectiveness of therapy, and the success of shared decision making [38, 39].
Tools that could be employed to detect this issue within the healthcare process include the possibility to “member check” the health information with the patient after it has been transformed into data that can be used by the AI, to make sure it adequately represents symptoms and health issues as they are experienced. It could also be useful to profile patients in terms of actual health literacy, to understand their representation and understanding of the illness before translating them into information for the diagnosis. Finally, doctors themselves may report their user experience on structured questionnaires or evaluation sheets immediately after using AI tools and report on possible uncertainties. Future development may focus on AI for the medical context that employs natural language processing and, based on research evidence, is able to understand patient testimony (or to identify ambiguity in it).
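One way to picture the “member check” described in this section is as a small record that keeps the patient's own words next to the coded entry prepared for the AI, and that holds the entry back until the patient has confirmed it. The Python sketch below is only an illustration under stated assumptions: the class `CodedSymptom`, its fields, the helper functions, and the code labels such as `KNEE_PAIN` are hypothetical and do not correspond to any real coding system or to a tool described by the authors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodedSymptom:
    """A patient statement together with the code proposed for the AI system.

    All field names and code labels are hypothetical illustrations.
    """
    patient_words: str              # verbatim report, always preserved
    proposed_code: Optional[str]    # None when the report cannot be coded yet
    confirmed_by_patient: bool = False

    def needs_member_check(self) -> bool:
        # Uncoded or unconfirmed entries must be discussed with the patient.
        return self.proposed_code is None or not self.confirmed_by_patient


def ready_for_ai(entries: List[CodedSymptom]) -> List[CodedSymptom]:
    """Entries that are both coded and confirmed by the patient."""
    return [e for e in entries if not e.needs_member_check()]


def pending_member_check(entries: List[CodedSymptom]) -> List[CodedSymptom]:
    """Entries that still have to be reviewed together with the patient."""
    return [e for e in entries if e.needs_member_check()]


if __name__ == "__main__":
    report = [
        CodedSymptom("sharp pain in my left knee", "KNEE_PAIN", True),
        CodedSymptom("I just feel generally unwell", None),
        CodedSymptom("I cannot sleep", "INSOMNIA"),
    ]
    print([e.patient_words for e in pending_member_check(report)])
```

The design choice worth noting is that a vague report such as a “general malaise” is never forced into a predefined category: it stays in the record as free text and is flagged for a conversation between doctor and patient.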
2 Artificial Intelligence in Healthcare Practice …
2.3.2 Decision Paralysis and Risk of Delay
Once the data has been entered into the AI, an elaboration process begins. The current AI process is neither transparent nor well understood by human users, who may have an unclear representation of how exactly the AI reached its conclusions. This could generate “trust issues” in doctors, especially when important medical decisions must be made on the basis of AI conclusions. This happens because these decisions cannot be made on the basis of medical data alone. As mentioned previously, doctors need to use their skills in the relationship with the patient and contextualize the role of AI within this same relationship. The doctor must act as a mediator between the AI and the patient, translating the patient’s report for the AI and introducing AI into the medical consultation. AI conclusions should be approved, refined, and explained to the patient by the doctor, who should also answer the patient’s possible questions. However, these are additional tasks for the doctor to take on, and it is still unclear whether activities associated with such a mediation role could generate decision paralysis or delay across the health organization’s processes. Also, specific training for healthcare professionals to manage AI within the consultation is still not available. On the other hand, a positive consequence of implementing AI in medical practice may be that it relieves doctors of time-consuming activities, so that they can invest their energy and time in a patient-centered approach and empathic consultation [40]. Yet this would be possible only when AI is implemented in healthcare to carry out administrative or technical tasks, avoiding confusion with the “human” aspects of medicine, which should still be embodied by the human professional.
Health institutions that employ AI for diagnosis should use procedure analyses to identify whether AI-enhanced tasks and practices are delayed compared with usual practice, and should address implementation through systematic organizational plans.
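One way such a procedure analysis could be approached is to compare how long a task takes with and without AI support. The sketch below is an assumption-laden illustration, not a method proposed in this chapter: the durations are made up, the function name is hypothetical, and a one-sided permutation test on the difference in medians is just one of several reasonable choices.

```python
import random
from statistics import median


def permutation_test_median_diff(usual, ai_assisted, n_permutations=5000, seed=0):
    """One-sided permutation test: are AI-assisted durations longer than usual ones?

    Returns the observed difference in medians (ai_assisted - usual) and the
    proportion of random relabelings with a difference at least as large.
    """
    rng = random.Random(seed)
    observed = median(ai_assisted) - median(usual)
    pooled = list(usual) + list(ai_assisted)
    n_usual = len(usual)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = median(pooled[n_usual:]) - median(pooled[:n_usual])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up consultation durations in minutes, for illustration only.
usual_minutes = [18, 20, 22, 19, 21, 23, 20, 18]
ai_minutes = [24, 27, 22, 26, 29, 25, 28, 23]

difference, p = permutation_test_median_diff(usual_minutes, ai_minutes)
print(f"Median delay: {difference:.1f} min, permutation p-value: {p:.4f}")
```

A real analysis would also need to account for case mix, learning effects, and how durations are recorded; the sketch only shows the basic comparison.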
2.3.3 Role Ambiguity
Disagreements between AI and physicians may arise regarding diagnosis or identification of treatment. What happens if doctors and AI reach different conclusions on diagnosis and treatment? Who really has authority? Such confusion may seem trivial to AI developers or health professionals, but it is a serious matter when it comes to patients’ representation of the healthcare process. The doctor must explain to the patient the role of AI in the medical consultation. Physicians will have to reassure the patient that recourse to such a technology is a strategy for providing the best possible diagnosis and treatment. In the perception of the patient, this communication may contain the implicit message that someone else is doing the doctor’s work. This is especially relevant when people lose trust in medical experts and public health officials. For example, when patients receive multidisciplinary care they may
S. Triberti et al.
experience confusion or anxiety related to (possibly) different opinions about their health [41]. This is amplified especially when the diagnosis is bad or unexpected [42]: patients may contact multiple doctors, looking for other diagnoses and generating a phenomenon known as “doctor shopping” [43]. In this scenario, what happens if technologies are used to reach a diagnosis? And what is their effect on the healthcare process and the doctor-patient relationship? An example is the case of Ernest Quintana. Ernest was an elderly man hospitalized in Fremont, California with a severe respiratory crisis. The clinician announced the diagnosis to Ernest and his family through a robot, a remote-controlled device that allows video communication. Family members were shocked by the use of this device and perceived the experience as a devaluation of their loved one [44]. This is an extreme example in which technology was introduced in the healthcare context as a mediator during the communication of a diagnosis, and it highlights the importance of professionals in the context of care, especially in the case of traumatic diagnosis or grief [45, 46]. These issues may be identified in advance if research methods focused on patients’ perception of AI and of the healthcare process as a whole are employed. Qualitative field research (e.g., ethnography, observation, contextual inquiry) is probably particularly advisable because it can detect the issues without influencing patient experience (e.g., asking direct questions about the role of AI compared with the doctor may raise a doubt in the patient that was not there before). Solutions could entail specific training for both doctors and patients that helps them understand how AI actually works and what it can or cannot do, so that the patient’s perception of a strange “third figure” in the therapeutic relationship does not put trust in the health provider at risk.
This section underlines the important role of AI within the medical consultation as well as its possible effect on the decision-making process of care. AI becomes a real interlocutor within the healthcare context, and it can affect the doctor-patient relationship through (1) misunderstanding or simplification of patients’ symptoms when data are entered into the classifications of the AI; (2) decision paralysis when the recommendations of the AI are difficult to understand or to implement at the organizational level; and (3) ambiguity about roles and responsibilities within the hospital, especially in the patient’s perception. The use of technologies in the healthcare context still needs to be studied from a psychological point of view [10, 11, 47], exploring patients’ attitudes, perceptions, and feelings about these technologies.
2.4 An Interface for AI
Looking towards a future of desirable, effective usage of AI in healthcare requires taking the issue of interface into consideration. An interface can be defined as “what stands between” the user and the functions of a given technology. An interface serves three main aims [48]:
• To represent a software’s functions through a model;
• To make virtual contents perceivable by the user;
• To facilitate (possibly to promote, AN) the usage of the technology.
Research in Human–Computer Interaction (HCI) has determined that the most effective interfaces are “transparent”; in other words, they allow users to perform the intended actions without recruiting higher-order cognitive processes [49]. When using an interface, one should be able to focus on the action he or she would like to perform, without “wasting time” trying to understand how the machine actually works, or how the sub-steps of the course of action should be translated into commands computable by it. Indeed, interfaces from the first phases in the history of computers required users to have advanced technical knowledge (e.g., they had to displace or move cables, to write strings of code, etc.). Later, the field evolved with the development of interfaces based on users’ natural behavior, featuring for example graphical interfaces with icons and folders, peripheral devices such as the mouse and keyboard, and touch-based commands. While some may consider the development of interfaces an achievement of secondary importance compared with the evolution of hardware, software and algorithms, it should be noted that interfaces played a fundamental role in making computers a technology usable by everyone, so that today it is possible to envisage a digital revolution in society. This considered, AI has still to undergo such a fundamental step in its evolution. While many argue that AI technologies will transform human practices, AI interfaces are largely underdeveloped: there are no clear guidelines on how to develop the appearance and interaction aspects of these technologies so that users who are not technologically savvy can employ them as effectively, correctly and effortlessly as possible. In the specific case of AI, interest is growing globally in the issue of explanation.
Indeed, elaboration processes by AI come out of a “black box” and human users are often not sure about how results were reached. Research is showing that in the healthcare field many professionals harbor doubtful or negative attitudes towards AI, mostly because they are worried about (1) the lack of “human” abilities in technologies, which could negatively impact the fragile equilibrium of the patient-doctor relationship, and (2) the effects that unforeseen errors in diagnosis or identification of treatment may have on their careers. Indeed, doctors do not only have to solve problems, but also to make decisions that will affect patients’ lives. The sub-discipline of XAI (eXplainable Artificial Intelligence) is increasingly turning to the social sciences in order to understand what an effective explanation is, how we could “teach” it to artificial entities involved in important decisions, and how this could be represented within a usable interface [7, 50]. Important resources to this aim are User Experience (UX) and User Centered Design (UCD). UX is a collection of methods for the study of how people use products and tools, with the goal of improving the overall experience. It goes beyond ergonomics because of its attention to the emotional and contextual aspects of the usage experience. In order to design effective interfaces for AI, future developers should enter the contexts of use in order to understand:
• The actual needs of expected users, as well as their pre-existing skills and knowledge;
• The psychosocial factors affecting usage and the final utilization of AI outcomes, such as emotions and individual differences;
• The opportunities and constraints represented by the implementation contexts, such as the organizational practices within a given hospital or clinic.
It is advisable that future AI developers consider adopting a UCD approach. Designing a technology centered on its users means that research activities are not carried out after the technology is already in use in order to improve its effectiveness; rather, the expected users play a fundamental role in the design itself. In other words, all algorithms and computational models, hardware and interface are designed on the basis of a preliminary analysis of the expected users and contexts of implementation. For example, when developing an AI that is expected to provide patients with guidelines and indications for health management, one should take into account how patients experience their illness in the everyday contexts of life. The interface aspect is not of secondary importance. How do patients expect to interact with the technology? Possibly they expect a user interface similar to devices they already use, which they can navigate to find personalized information (e.g., similar to mobile applications); or they may expect a device with natural language processing that they could interact with in a dialogical way, asking questions in the here-and-now of their evolving health needs, similar to home assistants. An effective interface is a prerequisite for the technology being actively used and not abandoned or ignored, regardless of the sophistication of its algorithms or programming. One possible resource is the merging of usability tests with contextual inquiry [51].
Inspired by ethnographic approaches, conducting a contextual inquiry means interviewing the user about needs, obstacles and expectations regarding a given anticipated technology use. However, the interview (and possibly, later, the testing) does not take place in a laboratory, but within the context of use (e.g., the hospital; the patient’s home), grounding questions, answers and observation in physical spaces and objects as well as in everyday activities. Contextual inquiry is regarded as a highly pragmatic method [52]. Although it tends to yield unstructured data, it can be used to identify constraints and necessities that should feature among the guidelines to be strictly followed when implementing the design. As a general rule, it is possible to state that, when expected to deal with patients, an AI interface should be able to:
• Fit within technologies/devices that are already available to and usable by the patient, as well as within normal everyday activities and life spaces;
• Visualize (or make immediately accessible) questions patients may have regarding health activities and treatment adherence;
• Support immediate contact with the human health provider;
• Display contents and functions that are already designed based on patients’ (not doctors’) experience, knowledge and understanding of their own health conditions.
Understanding and managing AI interfaces and tools is, however, a complicated task, especially for health professionals who are expected to use them for medical diagnosis and/or identification of treatment. The skills that you need to
acquire to understand how and why a “black box” works and operates require a great deal of education, training and focus. For this reason, another important approach to AI implementation in healthcare regards the analysis of skills and competences needed for professionals that are expected to use AI within their work practices.
2.5 Identifying Personnel to Work with AI
Another area of development worth mentioning is that of personnel selection. Indeed, as seen in the previous section, AI still has a long way to go in terms of interface development and design based on actual users’ and professionals’ needs. At the same time, it is important to consider what could be done to improve AI implementation by working on the human side of the equation. Recent research is focusing on analyzing users’ characteristics to strengthen the effectiveness of working groups that are supposed to work with AI. First, healthcare managers should take into account that AI is not like any other medical tool one could buy and give to health professionals to use within their everyday practice. On the contrary, it is likely that future work mediated by AI, both within the healthcare context and elsewhere, will be performed by teams whose composition values interdisciplinarity and openness to complexity. For example, [7] argued that healthcare teamwork involving AI should be centered on the patient and, for that reason, should feature multiple professionals able to anticipate and reduce the risks emerging from the use of the technology. Specifically, a desirable team configuration (Fig. 2.2) to guarantee effective implementation of AI within a given healthcare context may include:
• A psychologist expert in Human–Computer Interaction who performs UCD research and informs the developer on how to design AI tools for the specific contexts (a); supports the doctor in preserving relationship quality and shared decision making when using artificial tools (b); supports the patient in the healthcare journey and receives feedback from him/her on possible “third wheel effect” issues (c);
• A technology developer/designer who receives information from the doctor (d) on how to design and modify the technology based on actual medical needs (f); informs communication and management of the AI based on its technical features (h);
• An AI communicator or “manager” who orients the implementation of the technology within the organizational context as well as the promotion of its use; this professional has access to the experience and skills of the others, and of the patient as well (h, i, j);
• The patient, who maintains contact with the doctor and the AI, as well as with the communication and HCI experts, to report experiences important for design and implementation (c, e, g, j).
Fig. 2.2 Graphical representation of a work team to support effective AI implementation in the healthcare context (interdisciplinary links and collaborations are defined by letters)
This abstract representation helps to take into account the many professional skills that should be integrated in order to promote effective usage of a technological resource. Indeed, in this field the pre-existing attitudes towards AI deserve to be considered as well, because there is still limited information on users’ behavior towards the organizational inclusion of technologies that could deeply affect the ways people perform their normal activities. Recent approaches are developing team formation resources that include attitude towards complex technologies such as AI as a criterion for creating effective groups [53]. Future studies may include advanced goal-programming models to help healthcare organizations optimize AI implementation in medical and assistance practice.
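To give a feel for what a goal-programming view of team formation might look like, the toy sketch below scores every possible team by its weighted shortfall from two goals: covering a set of roles and reaching a target mean attitude towards AI. It is not the model of [53]; the candidates, roles, attitude scores, target, and weights are all invented for illustration, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import combinations

# Hypothetical candidates: (name, role, attitude towards AI on a 1-5 scale).
candidates = [
    ("Ana", "doctor", 4),
    ("Ben", "doctor", 2),
    ("Chloe", "hci_psychologist", 5),
    ("Dev", "developer", 4),
    ("Eli", "developer", 3),
    ("Fay", "ai_manager", 3),
]

required_roles = {"doctor", "hci_psychologist", "developer", "ai_manager"}
target_mean_attitude = 4.0  # assumed goal, not taken from the chapter
weights = {"missing_role": 10.0, "attitude_shortfall": 1.0}  # assumed priorities


def weighted_deviation(team):
    """Sum of weighted shortfalls from the two goals (lower is better)."""
    roles = {role for _, role, _ in team}
    missing_roles = len(required_roles - roles)
    mean_attitude = sum(att for _, _, att in team) / len(team)
    attitude_shortfall = max(0.0, target_mean_attitude - mean_attitude)
    return (weights["missing_role"] * missing_roles
            + weights["attitude_shortfall"] * attitude_shortfall)


def best_team(size=4):
    """Enumerate every team of the given size and keep the smallest deviation."""
    return min(combinations(candidates, size), key=weighted_deviation)


team = best_team()
print([name for name, _, _ in team], round(weighted_deviation(team), 2))
```

In a realistic setting the enumeration would be replaced by a proper optimization solver, and the goals and weights would have to come from the organization itself.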
2.6 Recommendations
In this chapter, we tried to highlight the relevance of AI in the healthcare context. AI could be employed as a diagnosis support tool, as an aid for the identification of personalized therapy regimens, as a tool to promote patient engagement, or to model and empower medical institutions’ organizational infrastructures. As is easy to see, AI will increasingly interact not only with healthcare professionals but also with patients. This inclusion may have an impact on the doctor-patient relationship, which
requires considering not only technical aspects but also human and psychological ones. In this contribution, we described three possible dysfunctional effects of including AI in the healthcare setting, conceptualizing them as a “third wheel” effect. Decisions may be delayed or paralyzed when AI recommendations are difficult to understand or to explain to patients. This can affect the organizational aspects of healthcare settings. It concerns how AI will be implemented within healthcare systems, which may experience difficulty adapting their timing, procedures, and organizational boundaries to innovation. Addressing the problem of how AI could be implemented in health systems could help to improve the process. This requires the creation of AI implementation management plans that take into account not only the benefits of AI for the “technical” aspects of medical operations, but also the behavior of organizational units towards AI outcomes and how these findings fit into care practice processes. The attention to organizational aspects of AI implementation is also central to the idea that the personnel expected to work with such technology should be carefully identified and selected. Recent studies point towards interdisciplinary teams that involve the patient as the main source of information for design and implementation, and that also recognize the role of HCI experts and management, each with specific activities and interactions. Health organizations should take these guidelines into account to design effective human-AI systems that could really improve patients’ health and everyday life. Finally, in order to improve AI implementation and usage, future research should pay attention to the issue of interface, which should not be considered of secondary importance compared with the technical and algorithmic properties of the technologies.
End users (e.g., doctors and other health professionals, and patients in the case of AI embedded within Digital Therapeutics resources) could be involved in preliminary UCD research in order to define in advance the interaction properties of innovative devices. This could appreciably reduce issues in AI use and consequently promote effective technology-mediated healthcare processes.
2.7 Conclusion
The inclusion of AI in the healthcare context could bring advantages for clinical practice. AI can be a tool for diagnosis support, can promote patient engagement, can model and empower medical institutions’ organizational infrastructures, and could constitute an aid for the identification of personalized therapy regimens. However, this growing inclusion of AI in healthcare practice is raising some challenges. These challenges could be linked to technical problems, but also to psychological and social aspects related to its implementation. Today, AI cannot explain its processes and outputs, and this could make it less reliable for users, especially when they are required to make important decisions. The acceptance of AI by users can also lead to problems, such as overconfidence and deskilling (the loss or reduction of human skills). Lastly, AI presents itself
as a real interlocutor within the doctor-patient relationship. It is difficult to predict the effects of AI in this context, but AI may be perceived as a “third wheel” in the relationship. Future research on AI in medicine should not focus only on its effectiveness in terms of diagnosis and identification of treatment, but should also examine in depth the social and organizational processes that could support or hinder its use and the achievement of desirable improvements for health and care at a global scale.
References
1. A. Riccoboni, The A.I. age (Critical Future Publisher, 2020)
2. E. Alpaydin, Introduction to Machine Learning, 4th edn. (MIT Press, Cambridge, 2020)
3. C.M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2006)
4. M. van Otterlo, M. Wiering, Reinforcement learning and Markov decision processes. Reinforcement Learning. Adapt. Learn. Optim. 12, 3–42 (2012). https://doi.org/10.1007/978-3-642-27645-3_1
5. J.A. Knottnerus, P. Tugwell, Evidence-based medicine: achievements and prospects. J. Clin. Epidemiol. 84, 1–2 (2017). https://doi.org/10.1016/j.jclinepi.2017.02.006
6. C. Castaneda, K. Nalley, C. Mannion, P. Bhattacharyya, P. Blake, A. Pecora, A. Goy, K.S. Suh, Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine. J. Clin. Bioinf. 5(1), 4 (2015)
7. G. Pravettoni, S. Triberti, Il medico 4.0: Come cambia la relazione medico-paziente nell’era delle nuove tecnologie (Edra, Milan, 2019)
8. S. Sarwar, A. Dent, K. Faust, M. Richer, U. Djuric, R. Van Ommeren, P. Diamandis, Physician perspectives on integration of artificial intelligence into diagnostic pathology. NPJ Digital Med. 2(1), 1–7 (2019)
9. C. Liu, X. Liu, F. Wu, M. Xie, Y. Feng, C. Hu, Using artificial intelligence (Watson for Oncology) for treatment recommendations amongst Chinese patients with lung cancer: feasibility study. J. Med. Internet Res. 20(9), e11087 (2018)
10. S. Triberti, I. Durosini, G. Pravettoni, A “third wheel” effect in health decision making involving artificial entities: a psychological perspective, in Frontiers in Public Health, vol. 8 (2020)
11. S. Triberti, I. Durosini, G. Curigliano, G. Pravettoni, Is explanation a marketing problem? The quest for trust in artificial intelligence and two conflicting solutions. Public Health Genomics 23(1–2), 1–4 (2020)
12. D. Cirillo, S. Catuara-Solarz, C. Morey, E. Guney, L. Subirats, S. Mellino, A. Gigante, A. Valencia, M.J. Rementeria, A.S. Chadha, N. Mavridis, Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digital Med. 3(1), 1–11 (2020)
13. R. Zhuo, X. Sun, Design of personalized service system for home-based elderly care based on data fusion, in International Conference on Big Data Analytics for Cyber-Physical-Systems (Springer, Singapore, 2019), pp. 412–419
14. S.O. Danso, G. Muniz-Terrera, S. Luz, C. Ritchie, Application of big data and artificial intelligence technologies to dementia prevention research: an opportunity for low-and-middle-income countries. J. Glob. Health 9(2) (2019). https://doi.org/10.7189/jogh.09.020322
15. D. Grzonka, A. Jakobik, J. Kołodziej, S. Pllana, Using a multi-agent system and artificial intelligence for monitoring and improving the cloud performance and security. Futur. Gener. Comput. Syst. 86, 1106–1117 (2018). https://doi.org/10.1016/j.future.2017.05.046
16. D. Gunning, Explainable artificial intelligence (XAI). Defense Adv. Res. Projects Agency (DARPA) nd Web. 2(2) (2017)
17. E. Tjoa, C. Guan, A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE T. Neur. Net. Lear. (2020). https://doi.org/10.1109/TNNLS.2020.3027314
18. V. Sebri, L. Savioni, An introduction to personalized eHealth, in P5 eHealth: An Agenda for the Health Technologies of the Future, ed. by G. Pravettoni, S. Triberti (Springer, Cham, 2020), pp. 53–70
19. C. Renzi, S. Riva, M. Masiero, G. Pravettoni, The choice dilemma in chronic hematological conditions: why choosing is not only a medical issue? A psycho-cognitive perspective. Crit. Rev. Oncol. Hematol. 99, 134–140 (2016). https://doi.org/10.1016/j.critrevonc.2015.12.010
20. J. Hendler, A.M. Mulvehill, Who will be your next doctor? in Social Machines (Apress, Berkeley, CA, 2016), pp. 14–28
21. A. Gorini, K. Mazzocco, S. Triberti, V. Sebri, L. Savioni, G. Pravettoni, A P5 Approach to m-Health: design suggestions for advanced mobile health technology. Front. Psychol. 9, 2066 (2018). https://doi.org/10.3389/fpsyg.2018.02066
22. F.D. Davis, Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Q. 319–340 (1989)
23. S.P. Morozov, A.V. Vladzymyrskyy, V.G. Klyashtornyy, A.E. Andreychenko, N.S. Kulberg, V.A. Gombolevsky, K.A. Sergunova, Clinical acceptance of software based on artificial intelligence technologies (radiology) (2019). arXiv preprint arXiv:1908.00381
24. C. Longoni, A. Bonezzi, C.K. Morewedge, Resistance to medical artificial intelligence. J. Consum. Res. 46(4), 629–650 (2019). https://doi.org/10.1093/jcr/ucz013
25. M. Palmeira, G. Spassova, Consumer reactions to professionals who use decision aids. Euro. J. Mark. (2015)
26. D.R. Lewis, The perils of overconfidence: Why many consumers fail to seek advice when they really should. J. Finan. Serv. Mark. 23(2), 104–111 (2018)
27. F. Jungmann, T. Jorg, F. Hahn, D.P. Dos Santos, S.M. Jungmann, C. Düber, P. Mildenberger, R. Kloeckner, Attitudes toward artificial intelligence among radiologists, IT specialists, and industry. Acad. Radiol. 28(6), 834–840 (2020)
28. R. Abdullah, B. Fakieh, Health care employees’ perceptions of the use of artificial intelligence applications: survey study. J. Med. Internet Res. 22(5), e17620 (2020)
29. V. Sebri, S.F.M. Pizzoli, L. Savioni, S. Triberti, Artificial intelligence in mental health: professionals’ attitudes towards AI as a psychotherapist, in Annual Review of Cybertherapy and Telemedicine (in press)
30. D.D. Goleman, Emotional Intelligence: Why It Can Matter More than IQ for Character, Health and Lifelong Achievement (Bantam Books, 1995)
31. I. Durosini, S. Triberti, G. Ongaro, G. Pravettoni, Validation of the Italian version of the brief emotional intelligence scale (BEIS-10). Psychol. Rep. 0033294120959776 (2020)
32. H. Jochemsen, Medical practice as the primary context for medical ethics, in Autonomy and Human Rights in Health Care. An International Perspective, ed. by D. Weisstub, G. Diaz Pintos (Springer, Dordrecht, 2008)
33. R. Charon, Narrative medicine as witness for the self-telling body. J. Appl. Commun. Res. 37(2), 118–131 (2009). https://doi.org/10.1080/00909880902792248
34. S.K. Smith, A. Dixon, L. Trevena, D. Nutbeam, K.J. McCaffery, Exploring patient involvement in healthcare decision making across different education and functional health literacy groups. Soc. Sci. Med. 69(12), 1805–1812 (2009). https://doi.org/10.1016/j.socscimed.2009.09.056
35. C.R. Harris, R.S. Darby, Shame in physician–patient interactions: patient perspectives. Basic Appl. Soc. Psychol. 31(4), 325–334 (2009). https://doi.org/10.1080/01973530903316922
36. J.W. Kee, H.S. Khoo, I. Lim, M.Y. Koh, Communication skills in patient-doctor interactions: learning from patient complaints. Health Professions Edu. 4(2), 97–106 (2018). https://doi.org/10.1016/j.hpe.2017.03.006
37. G. Miaoulis Jr., J. Gutman, M.M. Snow, Closing the gap: the patient-physician disconnect. Health Mark. Q. 26(1), 56–68 (2009). https://doi.org/10.1080/07359680802473547
38. S. Ozawa, P. Sripad, How do you measure trust in the health system? A systematic review of the literature. Soc. Sci. Med. 91, 10–14 (2013). https://doi.org/10.1016/j.socscimed.2013.05.005
39. J.M. Taber, B. Leyva, A. Persoskie, Why do people avoid medical care? A qualitative study using national data. J. Gen. Intern. Med. 30(3), 290–297 (2015). https://doi.org/10.1007/s11606-014-3089-1
40. E. Topol, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (Hachette UK, 2019)
41. S.K. Kedia, K.D. Ward, S.A. Digney, B.M. Jackson, A.L. Nellum, L. McHugh, K.S. Roark, O.T. Osborne, F.J. Crossley, N. Faris, R.U. Osarogiagbon, ‘One-stop shop’: lung cancer patients’ and caregivers’ perceptions of multidisciplinary care in a community healthcare setting. Transl. Lung Cancer Res. 4(4), 456 (2015). https://doi.org/10.3978/j.issn.2218-6751.2015.07.10
42. J.P. Briet, M.G. Hageman, R. Blok, D. Ring, When do patients with hand illness seek online health consultations and what do they ask? Clin. Orthop. Relat. Res. 472(4), 1246–1250 (2014). https://doi.org/10.1007/s11999-014-3461-9
43. R.Y. Yeung, G.M. Leung, S.M. McGhee, J.M. Johnston, Waiting time and doctor shopping in a mixed medical economy. Health Econ. 13(11), 1137–1144 (2004). https://doi.org/10.1002/hec.871
44. G. Nichols, Terminal patient learns he’s going to die from a robot doctor (2019). Retrieved at: https://www.zdnet.com/article/terminal-patient-learns-hes-going-to-die-from-a-robot-doctor/. Accessed 26 Mar 2019
45. I. Durosini, A. Tarocchi, F. Aschieri, Therapeutic assessment with a client with persistent complex bereavement disorder: a single-case time-series design. Clin. Case Stud. 16(4), 295–312 (2017). https://doi.org/10.1177/1534650117693942
46. R. Rosner, G. Pfoh, M. Kotoučová, Treatment of complicated grief. Eur. J. Psychotraumatol. 2(1), 7995 (2011). https://doi.org/10.3402/ejpt.v2i0.7995
47. B.K. Wiederhold, Can artificial intelligence predict the end of life… and do we really want to know? in Cyberpsychology, Behavior, and Social Networking, vol. 22 (2019), p. 297. https://doi.org/10.1089/cyber.2019.29149.bkw
48. G. Riva, Psicologia Dei Nuovi Media (Il Mulino, Bologna, 2012)
49. T. Winograd, F. Flores, Understanding Computers and Cognition: A New Foundation for Design (Intellect Books, 1986)
50. T. Miller, Explanation in artificial intelligence: Insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
51. G. Getto, The story/test/story method: a combined approach to usability testing and contextual inquiry. Comput. Compos. 55, 102548 (2020)
52. S.W. Dekker, J.M. Nyce, R.R. Hoffman, From contextual inquiry to designable futures: what do we need to get there? IEEE Intell. Syst. 18(2), 74–77 (2003)
53. D. La Torre, C. Colapinto, I. Durosini, S. Triberti, Team formation for human-artificial intelligence collaboration in the workplace: A goal programming model to foster organizational change. IEEE T. Eng. Manage. 1–11 (2021). https://doi.org/10.1109/TEM.2021.3077195
Chapter 3
A Statistical Analysis Handbook for Validating Artificial Intelligence Techniques Applied in Healthcare
Smaranda Belciug
Abstract Day by day, little by little, healthcare as we know it changes. We are now part of a new age, where artificial intelligence systems and medicine walk hand in hand. Conventional statistical analysis and artificial intelligence have many things in common, but usually follow different paths. Statistical analysis starts with a theory and a model and tries to fit the parameters of the model to the data, whereas artificial intelligence uses a more pragmatic approach, empowering the data to determine the model. As regards healthcare, statistical analysis formulates a hypothesis and tests the likelihood of that hypothesis against the data; artificial intelligence lets the data formulate the hypothesis. The multitude of synergies between statistical analysis and artificial intelligence leads to new impactful methods that assist medical practice and discovery. This chapter aims to present a cross-fertilization of statistical analysis and artificial intelligence and to provide a handbook for validating the results obtained by this merger. Many researchers, from both the medical and the artificial intelligence fields, aim to collect, analyze and publish their results in a reliable and robust manner, but still a lot of mistakes can occur at any stage of one’s study. Mistakes in medical studies are detrimental to the whole scientific community, medical industry, pharma industry, economy, and ultimately to each individual who will have to deal with the medical system at a certain point in her/his life. Unreliable medical results play a crucial part in the “reproducibility crisis”, which means creating published information that cannot be confirmed by peers. Incorrect results of artificial intelligence in healthcare (arising from measurement errors, design failures, or outright fraud) slow down scientific, medical and artificial intelligence progress, and waste research funding.
Thus, using statistical analysis in validating artificial intelligence applied in healthcare is of crucial importance. The scope of this chapter is to provide the statistical handbook in order to improve the reliability and credibility of the findings of artificial intelligence in healthcare research. Errors in the statistical analysis are a matter for concern. If there are errors, then the conclusions of the study might be incorrect. Readers might not detect the error and be misled with respect to clinical practice or further research. Readers of medical journals accept uncritically the printed word. Hence, this chapter will focus on the plan, design and implementation of statistical analysis in artificial intelligence in healthcare research.
Keywords Statistical analysis · Normal distribution · p-level · Kolmogorov–Smirnov test · Lilliefors test · Shapiro Wilk W test · Odds ratio · Pearson’s χ² test · t-test · Mann–Whitney test · Levene test · Bartlett test · One-way ANOVA · Tukey’s post hoc test
S. Belciug (B) Department of Computer Science, Faculty of Sciences, University of Craiova, Craiova, Romania e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_3
3.1 Introduction
Statistical analysis is an extremely important component of machine learning (ML) applied in medicine. The most popular ML books cover statistical analysis, yet its use in ML practice remains unclear and minimal. This chapter’s scope is to help illuminate the use of statistical analysis and, consequently, the training needs of professionals who work in the field of ML applied in medicine. Medical discoveries begin with high-quality scientific papers that should encompass accurate and rigorous statistical analysis, which allows others to reproduce the findings. Many researchers from the medical or ML fields aim to collect, analyze and publish their results in a reliable and robust manner, but mistakes can still occur at any stage of a study. Mistakes in medical studies are detrimental to the whole scientific community, the medical industry, the economy, and ultimately to everyone who will have to deal with the medical system at some point in her/his life. Why is that? Because unreliable medical results play a crucial part in the “reproducibility crisis” [1], that is, the publication of findings that cannot be confirmed by peers. Incorrect results of ML applied in medicine slow down scientific, medical, and ML progress, and waste research funding (measurement errors, design failures, outright fraud). Many journals have implemented an exhaustive statistical review process. Our aim through this chapter is to improve the reliability and credibility of the findings of ML medical research. Many controversies in medicine are traceable to the varying quality of the statistical analysis and research design. Eager researchers attend courses or read books on medical statistics and find dozens of ways to compute ‘p’ values, for instance, but rarely know how to design the statistical analysis of a medical study.
There are only a few ways to do statistical analysis in medicine properly, but a thousand ways to do it wrong. Young researchers from the medical and computer science fields, such as medical residents and M.Sc. or Ph.D. students, do not master the skill of understanding common statistics [2]. Science and medical discoveries cannot move forward without a proper understanding of the statistical interpretation of the results published in the literature. This has repercussions on patients’ treatment. The importance of statistical analysis is promoted by the International Medical Informatics Association and by the National Library of Medicine. Both organizations recommend courses in biostatistics and medical informatics [3, 4].
A literature review of the statistics used in biomedical informatics studies revealed unpleasant numbers [5]: in 71% of the studies published in JAMIA (Journal of the American Medical Informatics Association) and 62% of the studies published in IJMI (International Journal of Medical Informatics), the authors used only basic descriptive statistics; elementary statistics was used in only 42% of JAMIA articles and 22% of IJMI articles; multivariable statistics was used in 12% and 6% of the articles published by JAMIA and IJMI, respectively; whereas machine learning techniques applied in medicine were used in 9% and 6%. A disturbing fact is that 18% of the research papers published in JAMIA and 36% in IJMI between 2000 and 2007 have no statistics. Statisticians urge medical and ML researchers to consult them at the planning stage of their study, rather than just at the analysis stage. Therefore, thorough research should start at the planning stage of the study, which will increase the impact of the medical findings (e.g., using power analysis one can determine how many samples are needed to achieve adequate power), followed by a statistical benchmark for validating the results of ML applied in medical research [6]. In this chapter we shall describe the most used statistical tests and provide some examples of how to use them, together with the conclusions that can be drawn from the results.
3.2 Hypothesis Testing
In statistics, we deal with two hypotheses: the null hypothesis, denoted H0, which is the generally accepted one, and its opposite, the alternative hypothesis, denoted H1. In an ideal world, in order to determine the correct hypothesis, one should test the entire population. This is hardly the case in everyday practice; thus, we can only test a random sample of that population. One thing we must keep in mind: when choosing the random sample, we need to make sure that it matches the features of the entire population, otherwise the conclusion drawn will most likely be wrong. After determining the correct sample, we can formulate our hypotheses. The null hypothesis is the generally accepted one, whereas the “research” hypothesis is H1. Data scientists try to reject the null hypothesis, thus producing new advances in research. For instance, let us presume that a pharmaceutical company produces a new vaccine and states that it is more effective than another one that is in use today. A possible hypothesis in this case would be: “a new vaccine produces better immunity than the one that is in use today”. The two hypotheses in this case are:
• H0: there are no significant differences between the two vaccines. The difference reported by the pharmaceutical company was caused by chance.
• H1: there are significant differences between the new vaccine and the one that is in use.
If we refer to the ML field, let us presume that we have designed and developed a new algorithm that detects brain tumors from CT scans. Our test results show that the new algorithm is better than another state-of-the-art algorithm. Therefore, we can have the following hypothesis: “the new machine learning technique has a better accuracy in detecting brain tumors from scans than another state-of-the-art method”.
• H0: there are no significant differences between the two ML methods.
• H1: there are significant differences between the two ML methods.
To determine the statistical significance of an event we must understand three concepts: hypothesis testing, the Normal distribution, and the p-level. To find out whether we need to accept or reject the null hypothesis based on the evidence at hand we must perform statistical tests. There are two types of tests: parametric and non-parametric. As the name states, the parametric tests use the statistical parameters of the sample data: the mean, the standard deviation (or dispersion), and the distribution that governs the data. The most used parametric tests are the t-test and ANOVA. Before applying these tests, we need to verify whether the data is governed by the Normal distribution or not. Contrary to the first category, the non-parametric tests make no assumptions regarding the statistical parameters or the distribution of the data. They are the alternative when the data fails to have a Normal distribution. The most used non-parametric tests are the chi-square test and the Mann–Whitney U test. The p-level is one of the most powerful statistical tools that we can use. It is a number between 0 and 1, interpreted as follows: if the p-level is less than or equal to 0.05, then we can reject the null hypothesis, because there is enough evidence to support the significance of the results; if the p-level is greater than 0.05, then we accept (more precisely, fail to reject) the null hypothesis, because there is not enough evidence to reject it.
Another important statistical tool is power analysis. Recall that we cannot perform a test on a whole population, hence we need to use a smaller population sample. The power of a test is the probability that the test, applied to the sample, detects an effect that truly exists in the whole population, that is, the probability of rejecting the null hypothesis when the alternative hypothesis is true. Let us presume that a pharmaceutical company has developed a new vaccine for a certain disease. If the vaccine produces immunity in a small given number of subjects and the study has good statistical power, we can presume that the vaccine will be as effective when it is given to a larger cohort. In practice, we split the subjects into two categories: the control group, which receives the placebo, and the group that receives the vaccine. If, after performing power analysis, we find for instance that the power is 0.90, it means that, when the effect is real, the test yields a statistically significant result 90% of the time and misses the effect the remaining 10% of the time. As a rule, the greater the size of the sample, the higher the statistical power [7]. The same principle applies to the ML algorithm example. If a method obtains a good accuracy when run a given number of times on a dataset, and the statistical power is good, we can assume that it will produce the same accuracy over a larger number of runs.
Power analysis needs to be performed in any of the following circumstances: when we want to validate the results of our research, when we want to compute the power for a specific sample size, and when we want to find the sample size needed to achieve a certain statistical power [8–10]. One way to compute the sample size is by using Cochran’s formula, which establishes the sample size for a desired confidence level and margin of error:

$$n = \frac{Z^2 p q}{e^2},$$

where $Z$ is the standard Normal quantile for the chosen confidence level (e.g., 1.96 for 95%), $e$ is the margin of error (usually 0.05), $p$ the proportion of the sample which has the respective attribute, and $q = 1 - p$ [11].
Last, but not least, in hypothesis testing we need to discuss the Normal (Gaussian) distribution. A random variable $X$ has a Normal distribution with mean $\mu$ and dispersion $\sigma^2$ if its density and distribution function are given by:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$

$$F_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \cdot \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt, \quad -\infty < x < \infty.$$
The Normal distribution graph is the well-known Gaussian Bell, depicted in Fig. 3.1. The plot provides helpful information: the area under the graph that lies between x = μ − σ and x = μ + σ contains 68% of all the observations from the statistical series. If we widen the area to lie between x = μ − 2σ and x = μ + 2σ, 95% of all the observations will fall in this region. This interval is also known as the 95% confidence interval. Widening the area all the way from x = μ − 3σ to x = μ + 3σ, 99.7% of the data will be found in that interval.
Fig. 3.1 Normal distribution plot with mean μ = 8 and dispersion σ = 3
Plotting the distribution graph can help us determine whether the sample data is governed by the Normal distribution or not. Another way is to apply statistical tests such as the Kolmogorov–Smirnov Goodness of Fit test, the Lilliefors test, and the Shapiro Wilk W test. The next section will provide the theory behind each one of the tests, together with a detailed example of their application.
Thus, to conclude, in order to determine whether we accept or reject the null hypothesis, we need to perform specific tests that will give us a level of significance (p-level). Choosing the correct test implies knowing the distribution of the data (normal or not). The data distribution must be determined on a sample of proper size, which is established using power analysis. After this short recap, we shall continue with discussing normality tests, so that we can establish the correct statistical tools that can be applied to our research topic. All these tests apply to numerical data, so before we continue discussing them, we shall make a little detour and briefly present some statistical tools that can be applied when dealing with nominal data.
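The two quantitative ingredients of this section, Cochran’s sample size and the coverage of the Normal distribution, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the confidence level of 95% is an assumed default, and p = 0.5 is used as an assumed, most conservative proportion.

```python
import math
from statistics import NormalDist


def cochran_sample_size(p, margin, confidence=0.95):
    """Cochran's n = Z^2 * p * q / e^2, rounded up to a whole subject."""
    # Z is the two-sided standard Normal quantile for the confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return math.ceil(z ** 2 * p * q / margin ** 2)


def normal_coverage(k):
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# Assumed inputs: p = 0.5 (hypothetical proportion), margin of error e = 0.05.
print(cochran_sample_size(p=0.5, margin=0.05))

# Coverage of the intervals mu +/- sigma, mu +/- 2 sigma, mu +/- 3 sigma.
for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 4))
```

The coverage values confirm the 68%, 95% and 99.7% figures quoted above, independently of the particular μ and σ.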
3.2.1 Contingency Tables or Cross-Tabulation
When dealing with medical datasets, we do not always encounter numerical data. Sometimes we have to deal with nominal data. In this case, statistics offers the contingency table or cross-tabulation, a table in matrix form that shows the variables’ frequency distribution. A contingency table helps us answer the following question: is there a correlation between two variables? The contingency table is created by counting the number of occurrences of the observations that belong to each pair of categories. Let us presume that we want to study the situation of a gastroenterological hospital ward, where we take into account two variables: sex (female or male) and the presence of hepatic cancer (yes or no). We shall presume further that we have 70 patients on the ward, and we want to study sex differences in hepatic cancer. The contingency table below displays the number of female patients who have or do not have hepatic cancer, and the number of male patients who have or do not have hepatic cancer. Table 3.1 presents the simplest kind of contingency table, the 2 × 2 table.

Table 3.1 Contingency table for gastroenterological ward

| Sex | Health status: hepatic cancer | Health status: non-hepatic cancer | Total |
|---|---|---|---|
| Female | 14 | 24 | 38 |
| Male | 10 | 22 | 32 |
| Total | 24 | 46 | 70 |
Table 3.1 reports the number of women and the number of men who suffer from hepatic cancer, from which the corresponding proportions (14/38 for women and 10/32 for men) follow. We can see that the proportions are close, but not equal. We can measure the strength of the correlation through the odds ratio, and its significance through independence tests, such as Pearson’s χ² test. The reader should note that, even though computing the χ² independence test is similar to computing the goodness-of-fit test, the two must not be confused.
3.2.2 Odds Ratio
In what follows we shall present how the odds ratio for Table 3.1 is computed. Let us denote by A the event “a person is female” and by B the event “a person has hepatic cancer”, and by !A and !B their complements. Using the row totals of Table 3.1, we have:
• the probability that a female patient has hepatic cancer (events A and B) is $p_{11} = 14/38$;
• the probability that a female patient does not have hepatic cancer (events A and !B) is $p_{10} = 24/38$;
• the probability that a male patient has hepatic cancer (events !A and B) is $p_{01} = 10/32$;
• the probability that a male patient does not have hepatic cancer (events !A and !B) is $p_{00} = 22/32$.
The odds ratio of A and B is:

$$OR = \frac{p_{11}\, p_{00}}{p_{10}\, p_{01}}.$$
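As a minimal sketch, the odds ratio for the counts of Table 3.1 can be evaluated as follows. The function name and argument order are hypothetical and chosen only for this illustration.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]] from row-wise proportions."""
    p11, p10 = a / (a + b), b / (a + b)  # first row: female patients
    p01, p00 = c / (c + d), d / (c + d)  # second row: male patients
    return (p11 * p00) / (p10 * p01)


# Counts from Table 3.1: rows = female, male; columns = cancer, no cancer.
or_value = odds_ratio(14, 24, 10, 22)
print(round(or_value, 2))  # 1.28
```

Because the row totals cancel, the same value is obtained from the raw counts as (14 · 22) / (24 · 10).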
Taking into account the OR value, we say that:
• there is a positive correlation between the two events if OR’s value is greater than 1;
• there is a negative correlation between the two events if OR’s value is less than 1;
• there is no correlation between the two events (i.e. the events are independent) if OR’s value equals 1.
In our case OR is 1.28, that is, the two events are positively correlated [12]. Thus, we have established that there is indeed a correlation between the two events. Now, we are interested in finding out the statistical significance of that association. Next, we are going to briefly present Pearson’s χ² test.

Table 3.2 Accuracies (%) of AI algorithm over 20 computer runs

| No. of run | Accuracy | No. of run | Accuracy | No. of run | Accuracy |
|---|---|---|---|---|---|
| 1 | 89.34 | 8 | 95.20 | 15 | 90.09 |
| 2 | 95.20 | 9 | 88.89 | 16 | 88.89 |
| 3 | 87.50 | 10 | 89.34 | 17 | 95.20 |
| 4 | 93.40 | 11 | 95.20 | 18 | 87.50 |
| 5 | 92.35 | 12 | 87.50 | 19 | 93.40 |
| 6 | 90.09 | 13 | 93.40 | 20 | 92.35 |
| 7 | 88.89 | 14 | 92.35 | | |
| Mean | 91.30 | SD | 2.79 | 95% CI | (89.99, 92.61) |
3.2.3 Pearson’s χ² Test
This test is applied when we are dealing with categorical data. The hypotheses in this case are:
• H0: the frequency distribution of the sample’s events fits a certain theoretical distribution.
• H1: the frequency distribution of the sample’s events does not fit a certain theoretical distribution.
There are a few assumptions that need to be verified before applying Pearson’s χ² test: the events are mutually exclusive, and the total probability equals 1. Depending on the test’s value we can draw the following conclusions:
• If we obtain a small χ² statistic, then the observed data (the sample values) fits the expected data well, and there is no evidence against H0.
• Otherwise, if the χ² statistic is large, then the observed data does not fit the expected data, and H0 is rejected.
Let us denote the observed data by $O_i$, $i = 1, 2, \ldots, n$, and the expected data by $E_i$, $i = 1, 2, \ldots, n$. To verify whether the observed and the expected data have the same distribution, we compute:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}.$$

The p-value for the χ² statistic with n − 1 degrees of freedom is found in the corresponding critical value table [13].
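The statistic can be illustrated on the counts of Table 3.1. The sketch below is an assumption-laden illustration rather than the chapter’s own computation: it uses the independence-test variant, in which the expected counts are derived from the row and column totals and a 2 × 2 table has (2 − 1)(2 − 1) = 1 degree of freedom. The helper names are hypothetical, and the closed-form p-value via `math.erfc` is valid for 1 degree of freedom only.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if sex and health status were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_one_df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


observed = [[14, 24], [10, 22]]  # Table 3.1 without the totals
stat = chi_square_2x2(observed)
p_level = p_value_one_df(stat)
print(round(stat, 3), round(p_level, 3))
```

Under these assumptions the p-level is well above 0.05, so the association measured by the odds ratio would not be judged statistically significant for this ward.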
3.3 Normality Tests
The last section left us with the task of determining the data distribution. Recall that there are two ways to do this: either by plotting the data distribution and comparing it to the Gaussian bell, or by applying statistical tests. Even though in practice data scientists use software packages to perform these tests, we believe that it is very important to understand the theory behind them. Hence, we shall present the theory together with examples of how the tests work.
3.3.1 Kolmogorov–Smirnov Goodness of Fit (K-S) Test
Being a statistical test, the K-S test works with hypotheses [14–17]. Being a normality test, it is only natural that the hypotheses refer to the normality of the data. Therefore, the working hypotheses are:
• H0: the data is governed by the Normal distribution.
• H1: the data is not governed by the Normal distribution.
To find out which hypothesis is true, one must undertake the following steps:
1. Build the sample’s empirical distribution function.
2. Plot the sample’s distribution function graph together with the Normal distribution’s graph.
3. Determine the largest vertical difference between the two graphs.
4. Calculate the K-S statistic, where $F_0$ is the Normal distribution function and $F_{data}$ the empirical distribution function of the sample:

$$D = \sup_x \left| F_0(x) - F_{data}(x) \right|.$$

5. Use the appropriate K-S table, which contains the critical values of the test, and compare the critical value to the test statistic value.
The Lilliefors test is an improvement of the K-S test: it can be used even if the mean and dispersion of the population are unknown and have to be estimated from the sample.
3.3.2 Lilliefors Test
In the case of the Lilliefors test we use the same hypotheses as for the K-S test [18, 19]. In this case we need to compute the z-score for every observation. The Lilliefors test has the following steps:
1. For each observation we compute the z-score, with the following notations:
• $z_i$ is the z-score of the i-th observation in the sample
• $s$ is the standard deviation of the sample
• $X_i$ is the i-th sample observation
• $\bar{X}$ is the mean of the sample

$$z_i = \frac{X_i - \bar{X}}{s}, \quad i = 1, 2, \ldots, n.$$

2. With the z-scores sorted in ascending order and $\Phi$ denoting the standard Normal distribution function, compute:

$$D^+ = \max\left\{ \frac{i}{n} - \Phi(z_i),\ 1 \le i \le n \right\}, \qquad D^- = \max\left\{ \Phi(z_i) - \frac{i-1}{n},\ 1 \le i \le n \right\}.$$

3. Determine $D = \max\{D^+, D^-\}$.
4. Use the appropriate critical value in the Lilliefors table and compare it to the test value, to conclude whether we need to accept or reject the null hypothesis.
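Steps 1 to 3 can be sketched as follows, using the 20 accuracies of Table 3.2 as input. The function name is hypothetical, and the sketch assumes that the z-scores are mapped through the standard Normal distribution function before being compared with the empirical distribution function; the comparison against the critical value (step 4) still requires the Lilliefors table.

```python
from statistics import NormalDist, mean, stdev


def lilliefors_d(sample):
    """Largest gap between the empirical CDF and the fitted Normal CDF."""
    n = len(sample)
    x_bar, s = mean(sample), stdev(sample)  # stdev uses the n - 1 denominator
    z = sorted((x - x_bar) / s for x in sample)
    phi = [NormalDist().cdf(v) for v in z]
    # enumerate() is 0-based, so (i + 1) / n is i/n and i / n is (i - 1)/n in the text.
    d_plus = max((i + 1) / n - p for i, p in enumerate(phi))
    d_minus = max(p - i / n for i, p in enumerate(phi))
    return max(d_plus, d_minus)


# The 20 accuracies of Table 3.2, grouped by value for brevity.
accuracies = ([87.50] * 3 + [88.89] * 3 + [89.34] * 2 + [90.09] * 2
              + [92.35] * 3 + [93.40] * 3 + [95.20] * 4)
print(round(lilliefors_d(accuracies), 4))
```

The same D value serves the K-S test and the Lilliefors test; the two differ only in the table of critical values used to judge it.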
3.3.3 Shapiro Wilk W Test
The third test that we are going to present is the Shapiro Wilk W test [20]. This test computes the W statistic. The steps for computing the W statistic are:
1. The n observations $\{x_1, x_2, \ldots, x_n\}$ are arranged in ascending order: $x_1 \le x_2 \le \ldots \le x_n$.
2. The following statistic must be computed:

$$Z^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

3. Compute the differences:

$$d_1 = x_n - x_1, \quad d_2 = x_{n-1} - x_2, \quad \ldots, \quad d_i = x_{n-i+1} - x_i,$$

for $i = 1, 2, \ldots, k$, where $k = n/2$ if n is even and $k = (n-1)/2$ if n is odd.
4. Using the Shapiro Wilk coefficients $a_i$, we need to compute:

$$b = \sum_{i=1}^{k} a_i d_i.$$

5. The W statistic is computed:

$$W = \frac{b^2}{Z^2}.$$

6. Using the Shapiro Wilk W table, decide whether to accept or reject the null hypothesis.
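Steps 1 to 5 can be sketched as below for the 20 accuracies of Table 3.2. The coefficients $a_i$ are not derived here: the list `A_N20` holds the tabulated values for n = 20 as they are commonly published, and it is an assumption of this sketch that should be checked against the coefficient table at hand. The function name is hypothetical. Statistical packages typically rely on a numerical approximation of the coefficients, so their W value may differ slightly in the last decimals.

```python
from statistics import mean

# Tabulated Shapiro Wilk coefficients a_1..a_10 for n = 20
# (assumed values; verify against the table you use).
A_N20 = [0.4734, 0.3211, 0.2565, 0.2085, 0.1686,
         0.1334, 0.1013, 0.0711, 0.0422, 0.0140]


def shapiro_wilk_w(sample, coefficients):
    """W = b^2 / Z^2, following steps 1 to 5 of the procedure."""
    x = sorted(sample)                                  # step 1
    n = len(x)
    x_bar = mean(x)
    z2 = sum((v - x_bar) ** 2 for v in x)               # step 2
    k = n // 2                                          # n/2 or (n - 1)/2
    d = [x[n - 1 - i] - x[i] for i in range(k)]         # step 3
    b = sum(a * di for a, di in zip(coefficients, d))   # step 4
    return b ** 2 / z2                                  # step 5


# The 20 accuracies of Table 3.2, grouped by value for brevity.
accuracies = ([87.50] * 3 + [88.89] * 3 + [89.34] * 2 + [90.09] * 2
              + [92.35] * 3 + [93.40] * 3 + [95.20] * 4)
w_value = shapiro_wilk_w(accuracies, A_N20)
print(round(w_value, 4))
```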
Let us present a practical example of how to apply the tests. We shall presume that we have developed an ML algorithm that detects lung tumors from CT scans. We have run the algorithm 20 times, and we want to see whether the sample that contains the 20 obtained accuracies is governed by the Normal distribution or not. Table 3.2 presents the accuracies obtained during the 20 computer runs, the mean and standard deviation (SD), together with the 95% confidence interval. The hypotheses are:
• H0: the sample of accuracies has a Normal distribution.
• H1: the sample of accuracies does not have a Normal distribution.
Figure 3.2 shows the distribution graph plotted together with the Gaussian Bell. The three normality tests are also performed. We can see that the K-S D statistic is 0.16782, with p-level > 0.20, whereas the Lilliefors p-level < 0.15. The Shapiro Wilk W statistic is 0.89263, with p-level equaling 0.03006.
Fig. 3.2 Distribution function plot together with Normal distribution plot
It appears that we find ourselves in an awkward situation: the K-S test and the Lilliefors test state that the null hypothesis should be accepted, because both p-levels are above 0.05, but the Shapiro Wilk W statistic’s p-level is below 0.05, hence it states that the null hypothesis should be rejected. What is there to be done in this case? Unfortunately, we have only 20 computer runs, a number that is not sufficient for the Central Limit Theorem to be applied. The Central Limit Theorem states that if the sample size is large enough (above 30), then the distribution of the sample mean becomes approximately Gaussian [7]. Because we have a tie in this situation, not knowing exactly whether to reject or accept the null hypothesis, we have two alternatives: the first is to proceed with the hypothesis testing using both statistical tests that presume the Normal distribution for the sample and statistical tests that do not; the second is to complete the sample with 10 more computer runs. We shall use the first approach, and thus also present tests that do not need the sample data to be governed by the Normal distribution. In what follows, we are going to continue to present different statistical benchmarking tests.
3.4 Statistical Benchmarking Tests
In general, when developing a new ML algorithm, or when we want to compare two different medical treatments, drugs, vaccines or procedures, we perform comparisons between different samples. Depending on the case we formulate the null and the alternative hypotheses. For instance, if we want to compare the performances of two or more ML algorithms, we are interested in finding out whether there are significant differences between them or not. In this case the two hypotheses would be:
• H0: there are no significant differences between the performances of the algorithms.
• H1: there are significant differences between the performances of the algorithms.
One major concern when performing statistical analysis is choosing the correct statistical test. In choosing the test, we need to keep in mind the nature of the data and also the scope of the analysis. The following questions need to be asked and answered:
1. Should the observations from the sample be split into one, two or multiple groups?
2. Are the observations from the samples dependent or independent?
3. Is the data type continuous or categorical?
4. Are the samples governed by the Normal distribution?
With these questions in mind, let us browse through some of the most frequently used statistical tests.
3.4.1 T-test or Student’s T-test
The best-known statistical test is the t-test. It was developed by William Sealy Gosset, the Head Brewer of Guinness. The t-test is used for determining how significant the differences between two samples are, based on their means. If we developed a new ML algorithm and we want to see whether it performs differently than the top state-of-the-art algorithm, we could use this test. The t distribution uses just one parameter, the degrees of freedom. To compute the degrees of freedom for a single sample we must deduct 1 from the sample size. In this type of test the degrees of freedom relate to the estimated standard deviation. Besides testing the difference in means, we can also use the t distribution for determining the confidence interval. Depending on the problem at hand we can use one of the following versions of the test:
• One sample t-test: tests the sample’s mean against a previously known mean.
• Paired sample t-test: compares the means of the same group at different times.
• Independent sample t-test: compares the means of two different groups.
In case we have multiple groups, we use the analysis of variance (ANOVA). The t-test can be performed using different statistical software, such as Statistica (StatSoft), SPSS, Python or R. Because we are discussing how we should validate the results of ML algorithms, we are interested in the independent sample t-test, also known as the t-test for two independent groups of observations.
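As an illustration of the confidence interval mentioned above, the sketch below recomputes the 95% CI reported in Table 3.2 from the 20 accuracies. It assumes the usual form mean ± t · SD/√n; the critical value 2.093 (two-sided 95%, 19 degrees of freedom) is taken from a standard t table, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% critical value of the t distribution with 19 degrees of
# freedom, read from a standard t table (n - 1 = 19 for 20 runs).
T_CRIT_19_DF = 2.093


def t_confidence_interval(sample, t_crit):
    """Confidence interval mean +/- t_crit * SD / sqrt(n)."""
    n = len(sample)
    margin = t_crit * stdev(sample) / math.sqrt(n)
    centre = mean(sample)
    return centre - margin, centre + margin


# The 20 accuracies of Table 3.2, grouped by value for brevity.
accuracies = ([87.50] * 3 + [88.89] * 3 + [89.34] * 2 + [90.09] * 2
              + [92.35] * 3 + [93.40] * 3 + [95.20] * 4)
low, high = t_confidence_interval(accuracies, T_CRIT_19_DF)
print(round(low, 2), round(high, 2))
```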
3.4.2 T-test for Two Independent Groups of Observations
As stated above, the t-test for independent samples compares the means of two groups, to determine whether there are significant differences between them or not. Before performing the test two assumptions must be verified: the first is that the samples are governed by the Gaussian distribution; the second is that the two groups have approximately the same variance. Here our interest revolves around the variance of the difference between the two means. In what follows we present the theory behind this type of t-test. Let us denote by $s_1$ and $s_2$ the standard deviations of the two groups, and by $n_1$ and $n_2$ the sample sizes. We are interested in computing the pooled variance $s^2$:

$$s^2 = \frac{(n_1 - 1) \cdot s_1^2 + (n_2 - 1) \cdot s_2^2}{n_1 + n_2 - 2}.$$

The standard error of the difference between means is computed as follows:

$$se(\bar{x}_1 - \bar{x}_2) = s \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}},$$

where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups. In the case of two groups the number of degrees of freedom is equal to $n_1 + n_2 - 2$. Finally, we compute the t-statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{se(\bar{x}_1 - \bar{x}_2)}.$$
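The three formulas can be chained in a short sketch. The inputs below are the summary statistics (mean, SD, number of runs) reported in Tables 3.2 and 3.3 for the two ML algorithms; the function name is hypothetical, and the p-level is left to a t table or a statistical package, since it depends on the t distribution with $n_1 + n_2 - 2$ degrees of freedom.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic and degrees of freedom for two independent groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, n1 + n2 - 2


# Summary statistics from Table 3.2 (new ML) and Table 3.3 (second ML).
t_stat, dof = pooled_t(91.30, 2.79, 20, 89.56, 4.71, 30)
print(round(t_stat, 3), dof)
```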
Returning to our example involving the ML algorithm, let us see how we can test its performance against the performance of another ML algorithm. In Table 3.3, we have the accuracies obtained by running the second ML algorithm 30 times, together with the mean, SD, and 95% CI. We can see from Fig. 3.3 that the distribution plot does not seem Gaussian. The two tests confirm this, since both p-levels, from the Lilliefors and the Shapiro Wilk W tests, are below 0.05. Even so, because the sample size is 30, we treat the sample as approximately normal. We mentioned above that in order to perform the t-test, besides verifying the data’s distribution, we should also check the equality of variances. Two tests can be performed for this task: Levene’s and Bartlett’s. Levene’s test is used when the data set does not have a normal distribution; otherwise, Bartlett’s test should be used. We shall make a little detour from the t-test, so that we can explain these two tests.

Table 3.3 Second ML accuracies (%) obtained after 30 computer runs, mean, SD, and 95% CI

| No. of run | Accuracy | No. of run | Accuracy | No. of run | Accuracy |
|---|---|---|---|---|---|
| 1 | 87.8 | 11 | 91.3 | 21 | 91.3 |
| 2 | 93.4 | 12 | 87.8 | 22 | 93.4 |
| 3 | 88.5 | 13 | 93.4 | 23 | 78.9 |
| 4 | 84.3 | 14 | 88.5 | 24 | 95.4 |
| 5 | 92.6 | 15 | 84.3 | 25 | 91.3 |
| 6 | 87.8 | 16 | 92.6 | 26 | 87.8 |
| 7 | 93.4 | 17 | 87.8 | 27 | 93.4 |
| 8 | 88.9 | 18 | 93.4 | 28 | 90.8 |
| 9 | 78.9 | 19 | 78.9 | 29 | 87.8 |
| 10 | 95.4 | 20 | 95.4 | 30 | 92.3 |
| Mean | 89.56 | SD | 4.71 | 95% CI | (87.79, 91.32) |

Fig. 3.3 Second ML distribution function plot together with Normal distribution plot
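3.4.3 Equality of Variances: Levene’s Test
Levene’s test was developed by Levene in 1960 [21]. The null and alternative hypotheses are:
• H0: $\sigma_1^2 = \sigma_2^2 = \ldots = \sigma_k^2$
• H1: $\sigma_i^2 \ne \sigma_j^2$ for at least one pair $(i, j)$.
Levene’s test uses the W statistic:

$$W = \frac{(N - k)}{(k - 1)} \cdot \frac{\sum_{i=1}^{k} N_i (Z_{i\cdot} - Z_{\cdot\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{N_i} (Z_{ij} - Z_{i\cdot})^2},$$

where we have denoted:
• $k$ is the number of groups (data samples).
• $N_i$ is the number of observations in the i-th group.
• $N$ is the total number of observations, from all the groups.
• $Z_{ij} = |Y_{ij} - \bar{Y}_{i\cdot}|$, with $Y_{ij}$ being the value of the j-th observation from the i-th group, and $\bar{Y}_{i\cdot}$ the mean of the i-th group.
• $Z_{i\cdot} = \frac{1}{N_i} \sum_{j=1}^{N_i} Z_{ij}$ is the mean of the $Z_{ij}$ in the i-th group.
• $Z_{\cdot\cdot} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{N_i} Z_{ij}$ is the mean of all the $Z_{ij}$.
The W statistic is approximately F-distributed with $k - 1$ and $N - k$ degrees of freedom.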
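A minimal sketch of the W statistic is given below, applied to the accuracies of Tables 3.2 and 3.3. It assumes the mean-centred variant of the test (absolute deviations from the group mean); the function name is hypothetical, and the p-level has to be read from the F distribution with $k - 1$ and $N - k$ degrees of freedom.

```python
from statistics import mean


def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        g_mean = mean(g)
        z.append([abs(y - g_mean) for y in g])      # Z_ij
    z_group = [mean(zi) for zi in z]                # Z_i.
    z_all = sum(sum(zi) for zi in z) / n_total      # Z_..
    between = sum(len(zi) * (zg - z_all) ** 2 for zi, zg in zip(z, z_group))
    within = sum((v - zg) ** 2 for zi, zg in zip(z, z_group) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Accuracies of Table 3.2 (20 runs) and Table 3.3 (30 runs), grouped by value.
ml1 = ([87.50] * 3 + [88.89] * 3 + [89.34] * 2 + [90.09] * 2
       + [92.35] * 3 + [93.40] * 3 + [95.20] * 4)
ml2 = ([78.9] * 3 + [84.3] * 2 + [87.8] * 6 + [88.5] * 2 + [88.9] + [90.8]
       + [91.3] * 3 + [92.3] + [92.6] * 2 + [93.4] * 6 + [95.4] * 3)
w_value = levene_w([ml1, ml2])
print(round(w_value, 3))
```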
3.4.4 Equality of Variances: Bartlett’s Test
Bartlett’s test was developed by Bartlett in 1937 [22]. For this test we keep the two hypotheses from above. We need to compute the following statistic:

$$\chi^2 = \frac{(N - k) \ln S_p^2 - \sum_{i=1}^{k} (n_i - 1) \ln S_i^2}{1 + \frac{1}{3(k-1)} \left( \sum_{i=1}^{k} \frac{1}{n_i - 1} - \frac{1}{N - k} \right)},$$

where we denote:
• $N = \sum_{i=1}^{k} n_i$, the total number of observations, with $n_i$ the size of the i-th sample;
• $S_i^2$, the variance of the i-th sample;
• $S_p^2 = \frac{1}{N - k} \sum_{i} (n_i - 1) S_i^2$, the pooled estimate of the variance.
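The statistic depends only on the sample sizes and the sample variances, so it can be sketched directly from the formula. The inputs below are hypothetical summary statistics chosen for illustration only, and the function name is likewise an assumption of this sketch; the p-level would be read from the χ² distribution with $k - 1$ degrees of freedom.

```python
import math


def bartlett_statistic(sizes, variances):
    """Bartlett's chi-square statistic from group sizes and sample variances."""
    k = len(sizes)
    n_total = sum(sizes)
    # Pooled estimate of the variance, S_p^2.
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes)
                      - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical inputs: two groups of 10 observations each.
same = bartlett_statistic([10, 10], [4.0, 4.0])       # equal variances: zero up to rounding
different = bartlett_statistic([10, 10], [4.0, 9.0])  # unequal variances: positive
print(same, round(different, 3))
```

The statistic grows as the sample variances drift apart, which is why large values lead to rejecting the hypothesis of equal variances.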
A short recap is in order: both samples are treated as approximately normal, either because the tests revealed this or because the sample is sufficiently large. Thus, we present the results of both tests in Table 3.4. According to both Levene’s and Bartlett’s tests, the samples have equal variances (p-level > 0.05). Returning to the t-test, we now have both assumptions (normality and equality of variances) verified. The results of the test are given in Table 3.5. The t-value is 1.485 and the p-value is 0.143, implying that there are no significant differences between the performances of the two ML models.

Table 3.4 Levene’s and Bartlett’s tests

| Test | Test value | p-level |
|---|---|---|
| Levene’s test | 3.400 | 0.071 |
| Bartlett’s test | 2.212 | 0.143 |

Table 3.5 t-test for independent values

| | t-value | p-level |
|---|---|---|
| New ML versus Second ML | 1.485 | 0.143 |
Recall that the two data samples are only approximately normal. Thus, we also present the alternative to the t-test, the non-parametric Mann–Whitney U test.
3.4.5 Mann–Whitney Test or Mann–Whitney Wilcoxon Test

The Mann–Whitney U test is the non-parametric version of the t-test [23–25]. We use this test when the samples are not governed by the Gaussian distribution. The first step is to merge the two samples and then rank the observations. Let us go through the process step by step with an example. Table 3.6 contains the merged sample and the ranking of the observations. After the observations have been ranked, we sum the ranks per sample. From Table 3.6 we can see that the rank sum for sample 1 is 559, and for sample 2 it is 716. We can verify our computation by adding up the two rank sums and comparing the result with the sum of all ranks: 559 + 716 = 1275 and N(N + 1)/2 = 50 · 51/2 = 1275.

The Mann–Whitney test can be used with either the T or the U statistic. The T statistic is equal to the sum of ranks in the smaller group (559). The U statistic, on the other hand, is computed using the following formula:

$$U = n_1 n_2 + \frac{1}{2} n_1 (n_1 + 1) - T.$$

Practically, the U statistic estimates the probability that an observation from the first sample has a smaller value than an observation from the second sample: U is the number of all possible pairs that verify this condition. If the sample sizes are large enough, we can presume that the T statistic is approximately normal, with mean and standard deviation

$$\mu_T = \frac{n_s (n_s + n_L + 1)}{2}, \qquad \sigma_T = \sqrt{\frac{n_L \mu_T}{6}},$$

where $n_s$ denotes the number of observations in the smaller group and $n_L$ the number of observations in the larger group. Finally, we compute the z-statistic with the formula $z = (T - \mu_T)/\sigma_T$:

$$\mu_T = \frac{20(20 + 30 + 1)}{2} = 510, \qquad \sigma_T = \sqrt{\frac{30 \cdot 510}{6}} = \sqrt{2550} \approx 50.50, \qquad z = \frac{559 - 510}{50.50} \approx 0.970.$$
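The ranking and the normal approximation can be traced in a few lines of code. The sketch below is illustrative only: `average_ranks` and `normal_approximation` are hypothetical helper names, the two-sided p-value from the standard normal distribution (via `math.erfc`) is an added convenience, and no correction for ties is applied to the standard deviation. The inputs n_s = 20, n_L = 30 and T = 559 are the values quoted in the text.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2  # mean of the 1-based ranks pos+1 .. end+1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared
        pos = end + 1
    return ranks


def normal_approximation(t_small, n_small, n_large):
    """Mean, standard deviation, z and two-sided p-value for the rank sum T."""
    mu = n_small * (n_small + n_large + 1) / 2
    sigma = math.sqrt(n_large * mu / 6)
    z = (t_small - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mu, sigma, z, p_two_sided


# Tie handling on a toy sample: three tied values share rank 2, two share rank 4.5
print(average_ranks([78.9, 78.9, 78.9, 84.3, 84.3]))

# Values quoted in the text
n_s, n_l, t_stat = 20, 30, 559
u_stat = n_s * n_l + n_s * (n_s + 1) / 2 - t_stat
mu_t, sigma_t, z, p = normal_approximation(t_stat, n_s, n_l)
print(f"U = {u_stat}, mu_T = {mu_t}, sigma_T = {sigma_t:.2f}, z = {z:.3f}, p = {p:.3f}")
```

If SciPy is available, `scipy.stats.mannwhitneyu` should perform the complete test on the raw samples, including the handling of ties.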
The corresponding p-value is greater than 0.05, thus there are no significant differences between the two ML algorithms. If we want to verify whether the performances of two ML algorithms are significantly different, we can use the t-test or the Mann–Whitney U test. But what happens if we need to verify the differences between multiple ML algorithms?
Table 3.6 Computations for the Mann–Whitney U test (ranks and accuracies (ACA))
ML 1 (20 runs): Rank, ACA | ML 2 (30 runs): Rank, ACA
2 2 2
78.9
31
92.35
4.5
84.3
31
92.35
4.5
84.3
31
92.35
ACA
7
87.5
7
87.5
7
87.5
78.9
27
91.3
78.9
29
92.3
33.5
92.6
33.5
92.6
ACA
39
93.4
11.5
87.8
39
93.4
11.5
87.8
39
93.4
11.5
87.8
39
93.4
11.5
87.8
39
93.4
11.5
87.8
39
93.4
11.5
87.8
39
93.4
15.5
88.5
39
93.4
15.5
88.5
39
93.4
49
95.4
49
95.4
49
95.4
18
88.89
45.5
95.2
18
88.89
45.5
95.2
18
88.89
45.5
95.2
45.5
95.2
20 24.5
90.09
24.5
90.09
90.8
27
91.3
27
91.3
Sum = 559 (ML 1)
Sum = 716 (ML 2)
We can still apply the t-test, but it would imply a lot of work, because it would have to be performed repeatedly for every possible pair of ML algorithms. A better solution is to use the one-way analysis of variance (one-way ANOVA).
3.4.6 One Way ANOVA

As the name states, one-way ANOVA uses the variance to compare the groups [26]. If we used two-way ANOVA, we would use two parameters to compare the groups. The null hypothesis states that there are no significant differences between the groups. To compare the variances we use the F distribution. Take a closer look at the null hypothesis. After computing the F statistic, we determine whether we accept or reject the null hypothesis, but this only tells us whether or not there are differences among the groups; it does not determine exactly which groups are different. This is ANOVA's limitation. Fortunately, we can mend this issue by performing an ad-hoc or a post-hoc analysis. The ad-hoc analysis is performed during ANOVA, whereas the post-hoc analysis is performed after the process is over. For example, the least significant difference is an ad-hoc test, and Tukey's honest significant difference is a post-hoc analysis. The assumptions for ANOVA are the same as for the t-test: the samples must have nearly normal distributions and approximately equal variances. ANOVA studies the variances of the residuals. The residuals are computed as the difference between each object in a group and the mean of that group. The following steps are employed:

1. The mean of each group is computed.
2. The overall mean is computed as the mean of all observations.
3. The within-group variation is computed as the total deviation of each object from the mean of its sample.
4. The deviation of each group from the overall mean is computed.
5. The F statistic is computed as the ratio between the variation between the groups and the variation within the groups.
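The five steps above can be sketched as follows. This is a minimal illustration with the standard library only; the function name `one_way_anova`, the returned dictionary keys and the three samples are hypothetical. The variations are expressed as sums of squares and divided by their degrees of freedom (k − 1 between groups, N − k within groups) before the ratio is taken.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, mean squares and the F statistic for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    group_means = [mean(g) for g in groups]                 # step 1
    grand_mean = mean([x for g in groups for x in g])       # step 2

    # Step 3: within-group variation (squared deviations from each group's own mean)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Step 4: deviation of each group mean from the overall mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": k - 1,
        "df_within": n_total - k,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "F": ms_between / ms_within,                         # step 5
    }


# Hypothetical accuracies (%), for illustration only
ml_1 = [90.1, 91.4, 89.8, 92.0, 90.7]
ml_2 = [89.5, 90.9, 91.2, 88.7, 90.3]
ml_3 = [82.4, 85.1, 80.9, 84.3, 83.0]

result = one_way_anova([ml_1, ml_2, ml_3])
print(f"F({result['df_between']}, {result['df_within']}) = {result['F']:.2f}")
```

The F value is compared with the F distribution with (k − 1, N − k) degrees of freedom. If SciPy is installed, `scipy.stats.f_oneway` should return the same statistic with its p-value.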
We mentioned earlier that ANOVA does not tell us which pairs of groups are significantly different, and that we should use either an ad-hoc or a post-hoc analysis for this purpose. In what follows, we present the post-hoc Tukey test.
3.4.7 Tukey's Honest Significant Difference Test

Tukey's test is applied after the ANOVA test has been performed and has found significant differences between the groups [27, 28]. Tukey's test compares the means of the groups pair by pair. The honest significant difference is computed using the following formula:

$$HSD = \frac{\mu_i - \mu_j}{\sqrt{\mu_{sw} / n_h}},$$

where:
• $\mu_i - \mu_j$ is the difference between the mean of group i and the mean of group j. Please keep in mind that the value of $\mu_i$ should be greater than that of $\mu_j$.
• $\mu_{sw}$ is the mean square within.
• $n_h$ is the number of observations in a group.
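As an illustration of the HSD formula, the sketch below computes the statistic for every pair of groups. It assumes equally sized groups, takes the absolute difference of the means so that the larger mean always comes first, and uses hypothetical names (`mean_square_within`, `hsd_statistics`) and hypothetical data.

```python
import math
from itertools import combinations
from statistics import mean


def mean_square_within(groups):
    """Within-group sum of squares divided by N - k."""
    n_total = sum(len(g) for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_within / (n_total - len(groups))


def hsd_statistics(groups):
    """HSD value for each pair of groups, keyed by the pair of group indices."""
    n_h = len(groups[0])  # assumes all groups have the same size
    scale = math.sqrt(mean_square_within(groups) / n_h)
    return {
        (i, j): abs(mean(groups[i]) - mean(groups[j])) / scale
        for i, j in combinations(range(len(groups)), 2)
    }


# Hypothetical accuracies (%), for illustration only
ml_a = [90.1, 91.4, 89.8, 92.0, 90.7]
ml_b = [89.5, 90.9, 91.2, 88.7, 90.3]
ml_c = [82.4, 85.1, 80.9, 84.3, 83.0]

for pair, value in hsd_statistics([ml_a, ml_b, ml_c]).items():
    print(pair, round(value, 3))
```

Each value is compared with a critical value of the studentized range distribution for k groups and N − k degrees of freedom. With SciPy 1.8 or later, `scipy.stats.tukey_hsd` should carry out the full pairwise test, including p-values.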
After computing the HSD, we can see which groups are significantly different. To see how one-way ANOVA works, we are going to add another ML classifier. Its performances are recorded in Table 3.7, whereas the results of the normality tests can be seen in Fig. 3.4. From Fig. 3.4, we can see that the Kolmogorov–Smirnov and Lilliefors tests state that the third ML's distribution is normal, whereas the Shapiro–Wilk test states that it is not normal. Since the sample contains 30 runs, we can say that the distribution is approximately normal. The third ML sample also has approximately the same size as the other two, so in practice we can presume that their variances are equal. Hence, with these assumptions tested, we can proceed with the one-way ANOVA analysis.

Table 3.7 Third ML performances (%)
No. of run   Accuracy   No. of run   Accuracy   No. of run   Accuracy
1            80.50      11           60.30      21           72.40
2            89.30      12           88.40      22           75.30
3            85.40      13           80.20      23           80.30
4            90.20      14           77.50      24           60.30
5            95.12      15           89.30      25           88.40
6            87.62      16           85.40      26           80.20
7            83.40      17           90.20      27           77.50
8            72.40      18           95.12      28           95.12
9            75.30      19           87.62      29           87.62
10           80.30      20           83.40      30           83.40
Mean: 82.54
SD: 8.75
95% CI: (79.31, 85.85)

Fig. 3.4 Normality plot together with 3rd ML distribution plot

Fig. 3.5 Visual schema of the statistical benchmark process

Table 3.8 presents the one-way ANOVA results in terms of SS (sum of squares), degrees of freedom (df), MS (mean squares), F-value, and p-level (contrast: quadratic polynomial).

Table 3.8 ANOVA
      SS    df   MS      F       p
ML    933   2    466.5   11.14   0.000

We can see that ANOVA points out that there are significant differences between the three ML models. We know from the previous statistical analysis that there are no statistically significant differences between the first two models, which implies that the differences are between the first two models and the third. Still, we apply Tukey's post-hoc analysis. It revealed that there are significant differences between the first two ML models and the third one, and no difference between the first two models:
• The p-level between the first ML and the third ML is 0.0001;
• The p-level between the second ML and the third ML is 0.002;
• The p-level between the first ML and the second ML is 0.1849.
3.5 Conclusions

The chapter provides a minimal statistical analysis handbook for validating ML techniques. Unfortunately, statistical analysis is often missing in medical journals, and thus the reported results might be questioned. ML algorithms are meta-heuristics, stochastic rather than deterministic in nature; therefore the only way we can determine their robustness and correctness is by benchmarking them statistically. In this chapter we have discussed hypothesis testing, the p-level and the normal distribution. For each of these concepts, the theory (Cochran's formula, the Kolmogorov–Smirnov, Lilliefors, Shapiro–Wilk, Levene, Bartlett and Mann–Whitney U tests, the t-test, one-way ANOVA, Tukey's post-hoc test, etc.) as well as practical examples have been presented. We hope that this chapter has shed light on how a statistical validation analysis should be performed. Please keep in mind that we have presented only one way of performing a statistical benchmark; other analyses are also possible. Last, but not least, we leave the reader with a visual schema or roadmap of the statistical benchmark process presented in this chapter.
References

1. J.A. Hill, How to review a manuscript. J. Electrocardiol. 49(2), 109–111 (2017). https://doi.org/10.1016/j.electrocard.2016.01.001
2. D.M. Windish, S.J. Huot, M.L. Green, Medicine residents' understanding of the biostatistics and results in medical literature. JAMA 298, 1010–1022 (2007)
3. S.B. Johnson, A framework for the biomedical informatics curriculum. AMIA Annu. Symp. Proc. 331–335 (2003)
4. IMIA, Recommendations of the international medical informatics association (IMIA) on education in health and medical informatics. Methods Inf. Med. 39, 267–277 (2000)
5. M. Scotch, M. Duggal, C. Brandt, Z. Lin, R. Shiffman, Use of statistical analysis in biomedical informatics literature. J. Am. Med. Inform. Assoc. 17(1), 3–5 (2010)
6. S. Belciug, Artificial Intelligence in Cancer: Diagnostic to Tailored Treatment, 1st edn. (Elsevier, Academic Press, 2020)
7. D.G. Altman, Practical Statistics for Medical Research (Chapman and Hall, New York, 1991)
8. L. Kish, Survey Sampling, Wiley Inter Science (1995)
9. S.L. Lohr, Sampling: Design and Analysis, Duxbury Press (1999)
10. W. McLennan, An Introduction to Sample Surveys. A.B.S. Publications, Canberra (1999)
11. W.G. Cochran, Sampling Techniques (Wiley, New York, 1963)
12. A.W.F. Edwards, The measure of association in a 2 x 2 table. J. Roy. Stat. Soc. 126(1), 109–114 (1963)
13. R.L. Plackett, Karl Pearson and the chi-squared test. Int. Stat. Rev Revue Internationale de Statistique 51(1), 59–72 (1983)
14. A.N. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Instituto Italiano degli Attuari 4, 83–91 (1933)
15. F.J. Massey, Distribution table for the deviation between two sample cumulatives. Ann Math Stat 23, 435–441 (1952)
16. N.V. Smirnov, Estimate of deviation between empirical distribution function in two independent samples. Bull Moscow Univ 2(2), 3–16 (1939)
17. N.V. Smirnov, Table for estimating the goodness of fit of empirical distributions. Ann Math Stat 19, 279–281 (1948)
18. H.W. Lilliefors, On the Kolmogorov–Smirnov test for normality with mean and variance unknown. J Am. Stat Assoc 62(318), 399–402 (1967)
19. S.S. Shapiro, M.B. Wilk, An analysis of variance test for normality (complete samples). Biometrika 52(3–4), 591–611 (1965)
20. H.W. Lilliefors, On the Kolmogorov–Smirnov test for the exponential distribution with mean unknown. J. Am. Stat. Assoc. 64(325), 387–389 (1969)
21. H. Levene, Robust tests for equality of variances, in Contribution to Probability and Statistics: Essays in Honor of Harold Hotelling, ed. by I. Olkin, H. Hotelling, et al. (Stanford University Press, 1960), pp. 278–292
22. M.S. Bartlett, Properties of sufficiency and statistical tests. Proc. R. Stat Soc. A 160, 268–282 (1937)
23. W.H. Kruskal, Historical notes on the Wilcoxon unpaired two-sample test. J. Am. Stat. Assoc. 52, 356–360 (1957)
24. H.B. Mann, D.R. Whitney, On a test whether one of two random variables is stochastically larger than the other. Ann. Math. Stat. 18, 50–60 (1947)
25. F. Wilcoxon, Individual comparisons by ranking methods. Biometrics 1, 80–83 (1945)
26. D. Howell, Statistical methods for psychology. Duxbury, pp. 324–325
27. D. Brillinger, The collected works of John W Tukey (1984)
28. J. Tukey, Comparing individual means in the analysis of variance. Biometrics 5(2), 99–114 (1949)
Chapter 4
Designing Meaningful, Beneficial and Positive Human Robot Interactions with Older Adults for Increased Wellbeing During Care Activities

Sonja Pedell, Kathy Constantin, Diego Muñoz, and Leon Sterling

Abstract This research explores the benefits of introducing humanoid robots into different active ageing and aged care settings. We visited active ageing groups with a focus on dementia, knitting and a men's shed. We also took the robot to a residential care home to set up engaging activities. Exploring assumptions of older adults and staff about the capabilities, purpose and intelligence of the robot played a large role in understanding how robots should be introduced. We found that implementation and interactions need to be carefully crafted in advance for developing trust and interest, and for creating a shift in feelings of control in older adults as well as staff. Benefits, meaning, and comfortable interactions are created through building on existing skills, familiarity, and past experiences. When this was done successfully, older adults were seen to engage in playful and empowering ways, enjoying the interactions with both the robot and the wider group, with positive effects beyond the time the actual interactions took place. The article summarizes the findings across the different settings. It presents recommendations for introducing older adults to interactions with humanoid robots, supported by motivational goal modelling and technology probe techniques. We consider our research in group settings to be relevant for the wider acceptance of the use of robots. We argue that researchers should set clear goals for the interactions between the robot and older adults and gradually introduce the technology to older adults in a participatory way in group settings before attempting one-on-one scenarios with them.
S. Pedell (B) · K. Constantin · D. Muñoz · L. Sterling
Centre for Design Innovation, Future Self Living Lab, Swinburne University of Technology, 1 John Street, Hawthorn 3122, Victoria, Australia
e-mail: [email protected]
K. Constantin e-mail: [email protected]
D. Muñoz e-mail: [email protected]
L. Sterling e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_4
Keywords Human–robot interaction · Humanoid robots · Older adults · Technology engagement · Dementia · Designing interactions · Motivational models · Technology probes
4.1 Introduction

There are often fears and negative assumptions around robot use. Despite this, research articles in social robotics argue that older adults, particularly when socially isolated, can benefit from interactions with humanoid robots [1]. However, guidance on how to introduce the robot, tailor its functions, and establish the benefits of robot use over time is scarce. Through our study of humanoid robots in groups of older adults, we found that discovering beneficial and engaging use scenarios takes time. Implementation needs to be conducted with care, which challenges the cost-benefit of using robots in one-to-one settings. We suggest that the benefits, emotions, and user goals involved in interacting with humanoid robots need to be better understood before robots are introduced widely to older adults without expectations on how they should be used. In particular, we discuss emotions arising from and goals for human-robot interactions. We base our approach on Human-Computer Interaction (HCI) literature on tools and techniques for technology design to make recommendations for designing human-robot interaction in care settings for older adults to increase their wellbeing.

The Nao robot has been used successfully with young children in rehabilitation [2] as well as in physical exercise at schools [3], showing positive effects on children's emotional wellbeing [4]. We investigate the benefits for care groups of older people, some of them living with dementia. Part of our rationale for the investigation is the high demand on carers and the need for new and different forms of stimulation [5]. Research is needed to bridge part of the credibility gap between the extremely positive expectations and claimed potential for the role of robots portrayed as social companions [6] and negative attitudes [7]. Here, we aim to critically explore and discuss the benefits of human-robot interaction for older adults in care settings.
Researchers face many challenges when investigating the use of robots in social situations. Some of these are of an ethical nature, concerning people's acceptance of the robot and understanding how the relationship that older adults establish with the robot can benefit their social interactions (e.g. [8]). Humanoid robots are not off-the-shelf products that can be turned on and seamlessly used. Introducing a robot in a social context makes the robot part of the social situation, where people have different capabilities and where there are contextual circumstances that may be specific to the user group [9]. Hence, there is a need to consider the personal preferences, capabilities and circumstances of the user groups when designing interactions with robots. To explore the benefits of the robots, researchers must prepare interactions carefully, combining the expertise of different disciplines including human-robot interaction (HRI), human-computer interaction (HCI), software engineering (SE), design and psychology. We look at these interactions from a social relationship point of view.
4.2 Social Robotics

Social robotics is an emerging field (e.g. [3, 10, 11]). Socially assistive robots and their acceptability and success have been investigated in residential care settings [12], and their ability to support the process of care giving to increase the well-being of older adults is promising [13]. In particular, companion robots have been promoted to address social isolation; for a comprehensive overview on companion robots refer to Robinson et al. [14]. Beneficial scenarios and goals around socializing are subtle and ill-defined, and expectations differ from user to user. This is where our research is located. One of the best-known robots is the PARO seal companion; it has been increasingly successful among the elderly living in nursing homes [1]. While this robotic technology has been praised, there has also been criticism claiming that the PARO is patronizing and childish [5, 15]. Sharkey [16] discusses the pros and cons of HRI in elderly care and its impact on human dignity. She points to the risk of "developing robotic 'solutions' to the problems of aging that result in a reduced rather than improved quality of life for older people." (p. 63). One of the important factors she points out is the possibility for the older person of having a choice of interaction. Similarly, Sharkey and Sharkey [17] discuss ethical considerations of robot care for the elderly. The main ethical concerns they raise include reduction of human contact (also discussed by Sparrow and Sparrow [18]), loss of control, loss of privacy, loss of personal liberty, deception and infantilization. Some of these concerns are summarized by Turkle [19] with the term 'inauthentic relationships'. We consider the introduction of a robot into a group setting the ideal environment to avoid these pitfalls and investigate the benefits of HRI in an ethical manner.
In a group the robot is not taking away any opportunities for human interaction and leaves the control to the individuals whether they want to interact or not. Previous works have found that group settings have a significant impact on the engagement with the robot [20] and that robots encourage social interaction in a group [21].
4.2.1 The Nao Robot

This research looks specifically into the role of humanoid robots and their benefits for group activities of older people, some of them living with dementia. We used NAO robots (referred to hereafter as Nao), which are autonomous and programmable humanoid robots that were created and developed by Aldebaran Robotics (rebranded as SoftBank). They are 58 cm tall and weigh roughly 4.3 kg. The name Nao is derived from the Romanisation of the Mandarin Chinese word for 'brain' ('nǎo'), and the English word 'now'. Development of these robots first began in 2004; however, the first public release of the robots was in 2008. Since then, there have been five release versions of Nao, with the one used for this research being released in 2014. Development began under the name Project Nao; the project was started to produce an intelligent humanoid robot for the consumer market. Currently Nao is largely used in education, in research, or by developers. While this robot is no longer the current model, and many new robots (such as Pepper and Valkyrie) have emerged on the market and created interest in the robot (research) community, we have continued using the same Nao. We consider consistency of use important to investigate key benefits for users beyond new technical features and technical specifications. To do this and to push towards human-robot interactions that increase wellbeing, we think it is important to investigate the same robot and build on growing knowledge and programs. It is noteworthy that this has caused some challenges in updating software and maintaining backwards compatibility once the interest of the wider research community in the Nao robot decreased. This is also a consideration in terms of how technology development is planned if the aim is to render robots a true consumer product offered to aged care providers.
4.2.2 The Need for Meaningful Activities and a Holistic Approach

There have been many robots in the past that have failed for social purposes because people simply did not feel comfortable in their presence. The small Nao robot, being humanoid but not too similar to a human being and having a wide range of functions, is a promising candidate for use as a social robot. Here we are particularly interested in social interaction in four groups of older adults in care environments. We took the robot to a residential care home and visited three active ageing groups: a knitting group, a men's shed, and one group specifically focusing on dementia. It can be a challenge to keep people living with dementia occupied with engaging activities due to their often short attention span and declining mental capacity. Humanoid robots provide a more sophisticated presence (an important aspect in social interaction [22]), which we expect to be helpful to increase engagement. Another advantage of the robot was that we did not have to consider input devices, which can be problematic due to limitations of older users' abilities such as sensory and motor impairments or simply a hesitation to touch new technology. However, older people do have diverse needs and interests depending on their life experiences and circumstances. Meaningful activities that enhance those people's lives and in addition support the care provided by activity groups and carers are crucial [23]. To be successful in providing stimulating activities including a robot, we need to understand the situation of these specific user groups to set realistic goals. We need to design for the whole socio-technical system and not merely rely on the functionality of the robot. This is in accordance with Young et al. [24], who call for a holistic approach to evaluating interactions with robots, which also includes perspectives on social mechanics and social structures.
4.3 Method: Learning from HCI Approaches for Exploring Social HRI

Here we describe the approaches we applied to better understand the potential of humanoid robots to contribute to older adults' wellbeing in care settings. Breazeal et al. [10] advocate for understanding multiple dimensions (cognitive, affective, physical, social) to design robots that can be beneficial in the daily lives of people. We concur and suggest in addition that it is not only about the design of the robot but also about its application and the design of beneficial use scenarios. Breazeal et al. [10] further suggest that successful design of robots requires a multidisciplinary approach. The team that conducted the present research consisted of a digital media designer, a robot engineer, a user-centered design specialist, a psychologist, and a software engineer. The shared interest of all team members is in the design of meaningful technologies for older adults. Their common transdisciplinary field is Human-Computer Interaction (HCI). HCI has traditionally been multidisciplinary, and its research and design approaches are strongly user-focused, aiming to achieve benefits in technology use for quite specific user groups. In our applied approach we build on the HCI knowledge of participatory design [25, 26] and mutual learning [27], technology probes [28], goal modelling with a focus on emotions [29], and the overarching concept of situated action [9].
4.3.1 Situated Action

From a participatory perspective, we recognize that the older adults and prospective users of the Nao robot in our research can share their experiential knowledge about what they want for their life [30], which is grounded in their knowledge about the context they are immersed in [25]. However, they do not have any expertise or experience in robots, and many of them have little if any experience with the use of any modern technology. As they do not know anything about the robot and the robot does not know anything about them, we need an approach to bridge these gaps over time. Hence at the beginning we were in a situation where we would bring along the robot and observe what happened between the group members and the robot immersed in the different care contexts. This concept of situated activities has been described in detail by Lucy Suchman [9]. Suchman's concept of situated action [9], and thinking about technologies not as 'smart' but as located in a social and material context, are helpful when thinking about interactive technologies, particularly social robots, to create beneficial use scenarios. To be true to Suchman's notion of situated action [9], and acknowledging that the older adults and staff are the experts of their life context, we decided to follow an approach informed by the principles of participatory design that provides them with the tools to share their knowledge with the research team [31, 32].
4.3.2 Participatory Design and Mutual Learning

At the core of participatory design is the "democratic participation and empowerment" of the user [33]. Ertner et al. [34] conclude from a substantial review of participatory design practices that empowering users is not only a moral, but also a complex and challenging undertaking. In our case, the challenge was that we needed to give user groups of older people who had no experience with robots a strong voice and find out how robot use would fit into their life and routines. This meant that participatory design was a mechanism for gaining knowledge from participants while they benefited from interactions with researchers and with the technology. This approach is closely related to the concept of 'mutual learning': "That is, designers learn from the participants about their experiences, practices and situations, and participants learn from the designers about potential technological options and how these can be provided. Everyone involved learns more about technology design." [27, p. 25]. However, this is not a conventional application of participatory design to develop a product, but an application to an existing product in order to explore its use. Sanders et al. [35] suggest that, due to the growing field of participatory design and every project being unique, "it is necessary to decide which design approach(es), methods, tools and techniques to use in a specific project" (p. 195). We combined participatory design with technology probes and motivational goal modelling, for the reasons described below.
4.3.3 Technology Probes

Technology probes are well suited to participatory approaches to design as they are able to explore imaginative, investigative, emotional, discursive, reactive, disruptive, reflective, and playful participation [36]. Technology probes [28] support the design of technology by helping to understand its use in the everyday context. Technology probes are prototype-like devices that are specifically designed to collect participant data and motivate redesign [37]. They are particularly suitable for vulnerable users [38]. Through their ability to capture the nuanced aspects of everyday life in a care setting, the results of technology probe analysis offer a useful starting point for programming meaningful interaction scenarios for older adults. Information and story generation are two important benefits that we see in the use of probes as participatory artefacts. That way the probe technologies become bridging elements or 'information vessels' [39] that allow the social activities in the home to permeate discussions of field researchers and engineers responsible for programming the robot. In our project the direct participation of the older adults occurred via the Nao robot. Using the robot as a technology probe meant learning how interactions should be designed by carefully introducing it to the groups in different settings and iteratively adding new use scenarios based on the data of previous visits and the feedback received. The technology probe results can be re-expressed in terms of motivational models, which are well understood by the engineer programming the robot.
4.3.4 Motivational Goal Models and Technology Probes

Motivational goal models are particularly suitable to be combined with technology probes in field studies [40]. Firstly, we see motivational goal models as a suitable way to express field data between visits, with a focus on what people want and what their motivations are for interacting with technology (here, the robot). Because data gathered using probes are fragmentary and unstructured, the process of translating field data into the abstract generalization required to program the robot is challenging. Combining technology probe data collection with motivational goal models allows us to talk about intangible outcomes, such as those arising from fieldwork, which can be surprising, complex, and subtle. The models provide a place where abstract interaction data and data concerned with emotions can be represented and discussed among researchers [40, 41]. There are three goal categories: what a technology should do, how it should be, and how it should feel (the 'Do/Be/Feel method' according to Sterling et al. [29]). Secondly, motivational models are part of a development methodology and can be combined with motivational scenarios, roles, and domain models [42], each of them describing and providing context of the domain. This is important because the contextual information offered by technology probes is often lost after data analysis. User emotions constitute a key element of the goal models, and when discussing goals from visit to visit we paid increasing attention to the emotions group members felt and to what made them comfortable in terms of the emotions the Nao robot seemed to express.
4.3.5 Understanding Emotions

Interpreting the emotional body language of robots is increasingly important and the subject of recent human–robot research (e.g. [43, 44]). Breazeal et al. [10] suggest that, for interactions to be beneficial, users of robots need to engage not only on a cognitive level but also on an emotional level. This follows a more general trend in the relevance of emotions to technology design over the last two decades. For example, Boehner et al. [45, 46] introduce a model for emotions in affective computing. They model emotions as interactions: they are dynamic, culturally mediated and socially constructed. This means that emotions are not discrete, and they are experienced by people depending on their own situation. Therefore, they propose that technologies interpret emotions through interactions instead of detecting and categorizing them. This is similar to the construction of emotional goals, one of the three goal categories in the above-mentioned motivational goal models [29]. In this method, it is the emotional goals that guide the development and evaluation of use scenarios.
S. Pedell et al.
4.3.6 Iterative Visits in the Field and Data Collection
Our multidisciplinary research investigated the integration of the humanoid Nao robot into care group settings of older adults through iterative visits to the locations where the groups met (situated research). During every visit the values of participatory design were applied, and attention was paid to the context of each group and its individuals. A range of activities or interaction modes were investigated, including demos, exercise, and dancing. A mixed-method approach consisting of interviews, observation, researcher notes, video analysis, and interaction studies was applied to evaluate the level of engagement and how the groups reacted to and interacted with the robot. The project has ethics approval from the Swinburne University Human Research Ethics Committee (SUHREC #2012/305). After the visits, the team members discussed their observations and notes in debriefing sessions and defined goals for the next visit based on the strong themes that had come up. This is comparable to the procedure of reflexive practice described by Ertner et al. [34]. The identified goals can be systematically organized in motivational goal models (a practice that has been developed and formalized by some of the authors). We suggest motivational goal modelling as an approach for future implementations and as a framework for planning the introduction of robots for social purposes in care settings. In the findings we discuss not only what it means to implement a robot as a technology in a care setting, but also the benefits created by the robot in the groups with regard to wellbeing, social relationships and individual benefits over time.
4.4 Four Case Studies Using the Nao in the Field
4.4.1 Preparing Considerations
Major consideration is needed when the Nao robot is applied in the field. This is especially true when the Nao robot is interacting with people unfamiliar with robots. These kinds of interactions include many unpredictable factors. Although the engineer had worked with robots for years, this was the first time he used the Nao robot in the field. Hence, the overarching objective of this project, conducting research on designing and preparing interactions that provide a real benefit for the human side of the interaction (in this case four groups of older adults in varying care situations), was new terrain for the robot engineer. Our iterative research revealed that the standard settings of the Nao robot were not suitable for engaging the intended audiences in their context. Everything the robot did was discussed ahead of the onsite visits, and the interactions were carefully planned and designed; preparing them took the whole team.
4 Designing Meaningful, Beneficial and Positive Human …
4.4.2 Interaction Stages
Across all case studies we used the Nao robot in different ways, adapting interactions to the specific groups according to what we had learnt about their needs and interests. However, there were similar stages of interaction, analogous to a play with several acts. There was always an introductory part, a demo, and a meet and greet as part of the first encounter with the robot. These stages were tailored to each group over time. They included (i) introduction of the robot, (ii) demos, (iii) meet and greets (Wizard of Oz), and (iv) joint activities between robot and group (physical exercise, dancing, fashion show). For a more detailed summary refer to Pedell et al. [47].
4.4.3 Overview
When we brought the robot for the first time to the active ageing group for people living with dementia at a local council, “it” was met with a lot of skepticism. Group members had no idea what a robot could do, and many (including the staff) were uncomfortable and had a negative perception of robots. In this first encounter we very much showed off what the robot could do and answered questions following a Wizard of Oz approach. We learnt that in order to be beneficial (and, as a higher aim, to be good for wellbeing) we had to better integrate the robot into the lives of older people and create some purpose. Hence, the second time we went to the council’s knitting group and asked whether they could help create some clothes for the robot, perhaps for a theatre show, so that there was a concrete purpose. We also named the robot Kira, as we wanted to give her more of an identity for the introduction and make it easier to refer to “her”. It is noteworthy that staff from the first group then informed us that the group had kept talking about the robot all afternoon and that we should go back there, too.
4.4.4 Case Study 1: Active Ageing Knitting Group
The knitting group created a boost of understanding for the project and approach. While the group was doubtful why a robot would need clothes, they were ready to help, as is their disposition when asked to apply their knitting and crocheting skills. The act of measuring was a key part of the process. The robot, sitting in the middle of the table, was first looked at, then waved at with a cautious, cute half-hand wave as if she were a puppy or a little child, and then became subject to detailed measuring. It was obvious that the group members started to get comfortable seeing the robot as just another “creature” in need of some tailored clothing (they had done this many times for humans of every age and size, for penguins rescued from an oil spill, and for general purposeful household crafting) and settled into a routine for a
Fig. 4.1 Measuring (left) and fashion show (right)
job to be done. At this moment, the older adults truly started to interact with the robot, moving from the role of spectator watching a robot demo to that of actor. Kira was pushed, prodded, and professionally measured (Fig. 4.1 left). Suggestions to send them a printed version of the robot specification with measurements detailed to the millimetre were just waved away. The researchers could also see that the specification document would not have given the right measurements, as it contained no circumference measures of the chest, hip, head, and upper versus lower arm. Once the group members were happy that they had acquired the necessary information in swift motions, had noted it all down, and had enquired what exact clothes we had in mind (which we just left to them), we were sent on our way. Only a few weeks later we were told that a set of clothes had been prepared and that the crafting members were keen to see the clothes on the robot. Due to technical issues with Kira, an alternate Nao robot was taken back to the knitting group for the fitting. When this new Nao was introduced to the knitting group, there was a round of laughter that set the tone for the rest of the visit. The new robot was called Max. The clothes crafted by the knitting group were for Kira “herself” and were mainly female oriented: baby pinks, pastel blues, floral headbands, and frilly skirts. The older adults found it amusing that a “boy” was trying on Kira’s clothing. The only visible difference between Kira and Max was their color: one was blue and the other was orange. This, and the introduction of Max as helping with the fitting for the day, allowed the older adults to meet “someone” new. They talked to Max and explained things like “now this was for Kira but let’s try it on!”. Everyone was eager to see their handcrafted clothing on Max. There was no competition among the group, just excitement and a great deal of lively conversation. 
The Nao programmer and robot researcher faded into the background, and the older adults seamlessly became the facilitators of the activity. Max was their dress-up doll, and they took it upon themselves to drive the fitting, taking turns and stepping back to assess their craft, making comments on fit, style, sizing, and overall look. We decided to organize a fashion show in the dementia respite group (first group). Kira paraded up and down the floor in the different sets of clothes, to everyone’s amusement (Fig. 4.1 right). The engineer had programmed a catwalk movement into the robot. The invited creators of the clothes were standing on the side commenting on fit, realizing that some clothes would slip more on the smooth
Fig. 4.2 Kira’s favorite outfit for exercise (left) and with her woolen hat (right)
surface than on other “creatures”, or be in the way or get jammed in the joints when the robot was walking or waving its arms. One outfit with a little skirt, tied top and headband became the favorite and most used outfit, as it enabled free movement and had a perfect fit (see Fig. 4.2 left). Woolen caps were used in moderation, as we had to learn that robots lose 100% of their heat through the head: the caps blocked ventilation, resulting in a shutdown caused by overheating (see Fig. 4.2 right). This became a story we often told when taking off the hat at the beginning of a visit. The creators of the crafted clothes received a lot of praise, to their great pleasure, and enjoyed the applause for every new outfit. The wardrobe of clothes has since become a fixed element of all of Kira’s outings, reinforcing her persona. With a dressed Kira we never again encountered the same level of hesitation and fear at first contact with groups as we had before.
4.4.5 Case Study 2: Dementia Respite Care as Part of the Active Ageing Program
Having learnt the importance of tying robot interaction into enjoyable, familiar, and skillful activities, we went back to the first group. Within dementia care, music therapy is used to control mood and problem behaviors, and even to reduce the need for some pharmacological and physical treatments [48]. Music has been demonstrated to provide meaningful engagement for people living with dementia [49]. There is growing evidence of the robustness of musical memory, which is often spared by the disease [50]. Hence, music-related activities are promising, as music can be played by the Nao robot. Mobility is also important in dementia care. Robots have been suggested as a means of delivering interventions that prevent physical decline in older adults, as such activities can be programmed and help people remain independent [14]. In this
Fig. 4.3 Exercise with Kira who is sitting down
regard we saw opportunities for the robot to contribute to wellbeing and health in a preventative and enjoyable way. First, we needed to understand the situation of the group and their preferences better in order to leverage opportunities for physical activities and music. Physical exercise and dancing were activities that all group members would participate in at the same time, and were also something they clearly enjoyed. Kira was programmed to complete a set of 16 different movements from the regularly conducted physical exercises, and one song was chosen for dancing. “Give Me a Home Among the Gumtrees” by John Williamson was recommended by the carers, as less mobile group members would sing along. The engineer programmed the robot based on a manual of the exercises created by the council, with photos and descriptions of the exercises. He had to work from the carers’ exercise schedule and pre-plan the structure, order, and timing of the interactions. The dance was created from a video of one of the staff members singing and dancing along to the song. At the next visit Kira introduced herself as someone who also needed exercise and wanted to join in, as she felt a bit stiff. As the exercises were done sitting in a circle, it was decided that the robot should also sit on a little stool, which was put on a table for better visibility (see Fig. 4.3). Maintaining stability in the sitting position was a concern in and of itself: the Nao robot balances better when it is standing, squatting or sitting on the floor, but in those positions it would have been harder for the older adults, who were sitting themselves, to identify with. Successful interactions during the physical exercise regime included looking at Kira and copying her movements. When the group realized Kira was also doing the exercises, they would shift their attention from Sally (staff member) to Kira. 
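The pre-planning of structure, order, and timing described above can be sketched as a simple timeline computation. This is a hedged illustration in plain Python, not the engineer's actual program and not a robot API: the names `Step` and `build_timeline`, the pause between movements, and the example exercises and durations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str             # label of the exercise (hypothetical)
    duration_s: float     # time one repetition takes, in seconds
    repetitions: int = 1  # how often the movement is repeated


def build_timeline(steps, pause_s=2.0):
    """Return (timeline, total_s) for a fixed sequence of exercises.

    timeline is a list of (start_time_s, name) pairs in playback order;
    a pause of pause_s seconds separates consecutive repetitions.
    """
    timeline = []
    t = 0.0
    for step in steps:
        for _ in range(step.repetitions):
            timeline.append((t, step.name))
            t += step.duration_s + pause_s
    # No pause is needed after the final repetition.
    total_s = t - pause_s if timeline else 0.0
    return timeline, total_s


# Example with made-up exercises and timings.
session = [Step("raise arms", 4.0, repetitions=2), Step("knee up", 3.0)]
timeline, total_s = build_timeline(session, pause_s=1.0)
```

A fixed timeline of this kind keeps the robot in step with a staff-led session, at the cost of not reacting when the instructor departs from the planned order.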
Overall, the staff supported the interaction by now naturally referring to the robot as Kira and addressing her directly with questions and comments such as: “How are you going, Kira?” and “Knee up, here we go Kira, you are not very flexible, are you?”. Sally also reinforced the role of Kira as an instructor, saying: “Now we do one round on one foot only, like Kira…”. There were certainly limits to the movements Kira could be programmed for. When the instructing staff member did an exercise that the engineer had not programmed, she said for example: “Kira can’t do this one as she does not have five fingers, but we are doing it.” The incapability or limits of
movements did not seem to bother the participants at all. On the contrary, there was a sense of pride when they were doing things that Kira was not able to do, or did them in a more flexible manner. But there was also sympathy with Kira when she wobbled on her stool trying to keep her balance or ‘keep up to speed’ with the movements, as all participants were aware of their own physical limitations. This made her more relatable and made the participants more interested in what she was and was not able to do. Movements Kira did particularly well were often met with positive responses from the group. In sum, the group showed not only high emotional engagement with Kira but also a positive response to her incapability during some of the exercises. The dancing along with music worked particularly well. The song “Give Me a Home Among the Gumtrees” was presented by Kira with the correct timing and in the exact sequence, benefitting from the video in which the staff member demonstrated the movements that go with the song. Every group member was involved in some way according to their capabilities, either dancing along and/or singing along. If not able to dance, they would get up and clap or tap their feet along with the song, watching Kira, who seemingly looked at them with her big round eyes. The staff were again dancing side by side with the robot on the table. In this activity participants showed sustained engagement, which was attributed to Kira’s presence. During one visit, while dancing on the table, Kira stumbled and fell all the way to the ground. Attention immediately turned into concern. As a result of the fall Kira had a twisted foot, and when the engineer tried to bend it back, she started screaming “ouch”. Concern then turned into shocked faces and inquiries. The staff tried to lighten the situation by asking if we needed to write an incident report. 
Sympathizing positively with the robot and showing this high level of emotion prompted us to look more closely into the range of emotions and reactions Kira was able to express. During one of the visits to this group, the older adults were all sitting in a large semi-circle facing a table with Kira sitting on top. While Kira was received well by most, there were a few older adults who did not take kindly to Kira at the beginning. They showed signs of disapproval, including body language such as crossed arms and legs, shifted body weight and refusal to “meet”. Towards the end of the visit, when the question was raised whether Kira has emotions or not, one male group member walked up and said: “Excuse me, but I do not think that you feel anything!” Kira responded by breaking out in tears (Staff: “Oh, you are breaking her heart”), followed by a burst of angry fist shaking. While this was quite unplanned, the reaction turned out to be the perfect fit for the whole situation, and the whole group burst into laughter, including the old man, who slapped his thigh and laughed his way back to his seat. Kira then bent forward and started laughing, which made everyone laugh again. While this was an unplanned, unpredicted scenario, it showed that the older adults had become comfortable with engaging with Kira of their own accord. Exercise from there on was conducted on the floor to prevent further falls. During one visit, when trying to get on her feet from lying on the ground, Kira lost her balance and fell over; during the second attempt the group cheered her on, engrossed in this sequence. When she fell over again, we needed to pick her
Fig. 4.4 Kira with walker (left) and Kira on the chair (right)
up to make sure the foot would not twist again, which resulted in Kira kicking her feet, looking like a toddler having a tantrum. This reminded group members of their own children when they were younger, or of their grandchildren, as they told us, which led to a wave of sympathy and laughter. The group shared some interactions that can be classified as social with each other (such as laughing together) as well as with the robot (being confrontational, being empathetic and cheering the robot on). Having realized how common walking frames were (they filled up most of a room next to the group room, like a car park), we programmed a sequence with Kira staggering around with a walking frame (Fig. 4.4 left). This frame was designed and 3D printed with great effort by the designer on the team to further relate Kira to something older people are familiar with. When Kira started using the walking frame, participants responded with interest and surprise. The staff also seemed puzzled and asked: “Why does she have a walker?”. After this initial surprise, which created some discussion around walkers and a high level of interest, the ice seemed to be broken. Participants were interested in Kira, and the meet and greet was much livelier than in the previous week. There were discussions around the capabilities and role of technologies and the stigma around assistive devices such as walkers, wheelchairs and walking sticks. According to staff members (this was not accessible to our own observation), some group members were more engaged in exercise and showed increased overall interaction and liveliness in exercise sessions between our visits: “when Kira is there, it seems to enthrall them and they’d copy her moves, which they weren’t doing before. 
And we’ve noticed in the weeks after that, that they were becoming more active.” A long-time staff member of the group claimed that she had never seen “the whole group concentrate so long on one thing.” Kira was also seen as supporting the active ageing program by making the staff’s life easier, as a source of novelty that met the need for ever-changing stimulation: “I think [Kira’s possibilities] are endless, I really do. […] I think you could utilize her endlessly throughout the day. I would love to have her sit down with some clients and give a history run…or even an opportunity for clients to sit down and listen to poetry.” For things like that staff would often “revert to iPads, but to have
[Kira] would be even more beneficial.” One particular success was pointed out by an observing staff member who commented about one of the participants. According to staff, one of the elderly members would often tend to be extremely agitated, jumping up and walking around. However, during the exercise session it was pointed out that “There is no music and Victor is sitting still—this is unheard of.”
4.4.6 Case Study 3: Men’s Shed
The third group that we visited was a locally organized men’s shed. It might not be obvious how this group fits into the care agenda with the other groups. In Australia, men’s sheds are highly successful groups of older men who meet in a space with many tools, often as part of a community center, where they do woodwork and other crafts while socializing and talking. It has been shown that the talking has highly positive effects on overall wellbeing through improved mental health [51]. Hence men’s sheds are seen as therapeutic, especially for a generation in which sharing problems and admitting to loneliness is not a common part of everyday life, in particular for men. We were invited by the local organizer of the men’s shed to demonstrate the Nao and initiate a discussion on robots. As for other visits, we prepared a show and tell. The presentation setting was broken up when one of the men suggested putting Kira on one of the rocking horses they had built. As with the knitted clothes, the association of the robot with a created artefact initiated laughter, but also a connection. People were joking, yelling, and losing any reservation towards the robot. One man offered to build a chair for Kira and got his folding rule out to measure her upper-body-to-leg ratio. Again, shortly afterwards the first author received a message that the chair was ready for collection. The only request was to get some photos of Kira using the chair (Fig. 4.4 right).
4.4.7 Case Study 4: Residential Care
The residential care setting proved to be difficult despite our experience with the other groups. The first visit was organized so that all residents were in the main community room, to give them the experience and make them part of this “special event”. Due to the size of the group (about 60 people) the robot visit became a big show and interaction was impossible. We were surprised by the number of people who had gathered for the weekly mass service and stayed on for our visit. After mass, the first author was introduced by microphone as the MC of the event. As we were usually just dropping in on the activity groups, observing and integrating ourselves into whatever activity was taking place, the mutual expectations for this visit had not been discussed in detail. The set-up arranged for us threw the team a bit off balance. With difficulty we could at least prevent the robot from being set up for dancing on the altar left from the mass service, as we were worried this would offend some older
Fig. 4.5 Meet and greet in residential care
adults. While enjoyable for the older adults, who supported the demos with clapping and attention, this visit was mostly reduced to a show with some demos, storytelling and the robot being carried around for everyone to touch in a brief meet and greet (see Fig. 4.5). One astounding small interaction that kept repeating while we walked around with the robot was how several older adults cautiously lifted the little skirt with thumb and index finger, apparently curious what they would see underneath. After clarifying that this was too big a group, another visit was organized. However, this visit did not seem to work for our aim of researching beneficial interactions either. Our visit coincided with that of a group of kindergarten children. While the children enthusiastically gathered around Kira, the older adults wanted to interact with the children. Hence this set-up of competing interests and stimulation did not work well in terms of an enjoyable interaction. The children were distracted by and obviously attracted to the robot, while the older adults got nothing in terms of positive interaction out of either the children or the robot. While some older adults smiled at the joy of the young children, others displayed disappointment at not being the focus of attention. It would be interesting to explore in more detail how intergenerational interaction could be mediated by humanoid robots.
4.5 Discussion
Despite initial skepticism, this research turned into a project with several groups of older adults benefitting from interacting with the robot. From a spark of curiosity we were able to create programs for highly successful and engaging interaction scenarios. At every visit we learnt more about beneficial interactions of older adults with the humanoid Nao robot Kira in different care settings of active ageing groups. Most groups benefitted from the contributions and learnings of the previous groups. Each group added to the persona of Kira through accessories and stories. Rich stories were created about her acquaintances, her “travels” to other groups and what happened there. The clothes became a constant accessory and part of the identity of Kira, and we took the chair and the 3D printed walker to several visits. Overall, Kira was taking something of the previous encounters in the form of accessories, stories
and our increased knowledge to the next group over time. Our iterative participatory approach, combined with several tools and techniques from HCI, helped us to understand which interaction scenarios were useful (and which were not so useful), but also left the power of guiding to the older adults. The multidisciplinary team was key in developing the approach itself. In addition, the support of a range of stakeholders such as staff was key to understanding some of the interactions, sourcing materials and getting ideas for engaging activities. There are distinctive roles in the collaboration of creating successful human–robot interactions in a group care setting. It was not the intention to create a more efficient care setting (e.g. by replacing staff members). On the contrary, the staff had a facilitating role between the group and the robot during the activities. This has also been suggested by Carros et al. [20]. The designer of the interactions needs the older adults and staff in order to figure out what is important for designing adequate interactions. This means understanding what exactly needs to be programmed into the robot before a visit. The robot, or more generally the artificial intelligence (AI), needs a designer to work out where the significance lies, and the older adults need to see what the robot can do. Hence a spiral of carefully informing each other needed to be set in motion. Below we discuss key insights from our research, which we put forward as recommendations. We also emphasize the importance of the procedure of designing interactions around meaningful and familiar activities. This is important as we expect that these insights will remain relevant even with more advanced robots. When it comes to designing engaging social interactions with older adults, the implementation is key.
4.5.1 Creating a Basis Through Humor and Turning Initial Negative Emotions into Positive
There is a lot of fear of robots “taking over the world”, largely caused by how cinema has portrayed robots, especially AI. At first the older adults seemed guarded, almost put off by the robot (as did a carer who was scared that their job was at risk). Our data suggest that this initial hesitation can be overcome. When older adults and carers experienced how reliant on humans the Nao robots are (set-up, programming, placement), how fragile and ‘wobbly’ they are (balancing), and how innocent the Nao looks, they realized it was not something to be scared of; rather, they empathized with the robot. The persona of Kira as a dressed-up girl helped to create interest and laughter. Humor, as one important aspect of social interactions, has been investigated before with regard to whether it improves human–robot interactions [52, 53]. While some of this research has not been conclusive, we found that lightness and humor in HRI benefit groups of older adults when interacting with robots. Humor is a part of showing the robot’s vulnerabilities and helps make the experience comfortable, which is a basic requirement for positive interactions. It needs to be emphasized, though, that some of the humorous situations were facilitated either by
one of the team members or transpired from the coincidental clash of robot and human behavior. Based on the effect we observed on our group members, we recommend that robots and people should not be formal and serious all the time. If in the long term people want humans and robots to socialize together, lightness and humor may help to strengthen this relationship. Adding lightness to the interactions would lead to a more relaxed, enriching and memorable experience that encourages conversation and socialization, increasing the wellbeing of older adults over time.
4.5.2 Increasing Wellbeing Through Activity and Application of Skills
Interactions were most powerful and impactful when group members were able to associate the robot with their own skills and capabilities (e.g., knitting and woodwork) and engage with the robot on a level that gave them a feeling of familiarity. They were able to relate to the connection with some of the props or activities from their own life (Kira is sitting on the rocking horse I made). Real ownership in the shaping of the interactions was created when the group members were involved in some actual crafting of the robot’s display (Kira is wearing the clothes I knit for her, or she is sitting on the chair I made for her). This resulted in feelings of pride and happiness, which other, less creative group members were able to join in and share as an experience. Where the robot was involved in physical activities (dancing and exercises), it can be argued that the increased engagement by group members (and in one case lowered agitation) reported by the staff members is also directly connected to increased wellbeing. The more active and stimulated older adults are, the better they can maintain wellbeing and avoid mental health issues. Our use of the classic ‘Home Among the Gumtrees’ song for the dance activity was important. Using robots to facilitate meaningful interactions by selecting specific activities and music from the “good old days” that encourage people to get up and move (again) was key in the group of people living with dementia. The best-known set of laws for robots is Isaac Asimov’s “Three Laws of Robotics”. The Three Laws are: (i) A robot may not injure a human being or, through inaction, allow a human being to come to harm. (ii) A robot must obey the orders given it by human beings except where such orders would conflict with the First Law. (iii) A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. We suggest that this is not sufficient anymore for social robotics. 
When we talk about increasing wellbeing, we need to strive for robots that benefit human beings through positive and meaningful interactions.
4.5.3 Situated AI for Human Robot Interactions
In terms of AI, most interactions needed to be pre-programmed for the robot to do anything. However, having observed the strong reservation and fear at the beginning, we do not see this as a disadvantage. We do not recommend intelligent robots per se, as what older adults want are technologies that are immersed in the situation [9]. Understanding and designing for the situation can lead to better engagement with the robot and helps older adults to feel in control and at the centre of attention. Hence, we need to know about the skills, experiences and goals of older adults before introducing robots. We applied a participatory approach to understand the extent to which AI should be used in the context of our participants and to maintain the focus on people instead of AI, as suggested by Gyldenkaerne et al. [54] for research on AI in healthcare. Based on the results of the four case studies, we additionally propose treating the robot as a technology probe that informs designers and engineers about the goals in different situations. Treating the robot as a technology probe enables flexibility and openness, as practiced by Maldonado et al. [55] in a codesign project involving people with dementia. Insights gained through such a research approach also hold opportunities for situated AI. The example of Kira prompting a discussion on walkers shows that AI in a robot need not be meant to solve a problem, but may simply address a topic of concern in an empathetic manner in social situations.
4.5.4 Designing Social Interactions
Intensive research of the care and group setting is needed before the robot is deployed. This includes consulting domain experts and users to explore and evaluate use scenarios before going into the field. Activities and length of deployment need to be chosen carefully to set realistic expectations for the target audience and to engage group members in interactions that are based on their interests. The more the group members are able to relate to the robot and are involved in the design of the activities, the more likely it is that they engage, which has also been shown by other research in group settings with older adults [20]. Providing common ground and a familiar setting is crucial for HRI in a group. Overdoing “novelty” and trying to revolutionize the setting can lead to fear and rejection. We recommend that a careful balance between stimulation and familiarity be maintained, in particular when introducing the robot to older adults living with dementia. This concurs with Sharkey [16], who suggests with respect to the introduction of robots in dementia care that “many of the benefits that can be obtained are likely to be the result of the skilled and careful deployment of the robot.” (p.72). There are different levels of social interactions that can be considered: (i) Between the robot and the group: Humanoid robots are beneficial for increasing social interaction and mobility in groups of older people when addressing the group’s interests and skills, becoming part of the activities. (ii) Between humans: Robots can
104
S. Pedell et al.
mediate the interests of several older people resulting in enjoyable shared experiences between humans. (iii) The wider community: Robots can be helpful to talk about sensitive topics in the wider community such as ageing and stigma and challenge our perception on older people and technology and self-perception.
4.6 Conclusions
We aimed to understand what constitutes positive interactions with humanoid Nao robots in care settings with older adults to increase wellbeing. Although there is much debate on the use of social robotics to address social isolation and technical inclusion of older adults, we propose that there is not enough knowledge on how social interactions should happen and how to introduce robots into older people’s lives. It is necessary to better understand how people engage with robots, which in our case took time and repeated visits in the field. We agree that there is potential for technologies to benefit older adults, but often interaction scenarios are not based on in-depth research with this demographic in their context. Hence, we reported on four case studies demonstrating the introduction of robots with a focus on the social aspects, context and goals of the older adults involved. The team explored the beneficial interaction scenarios and let them evolve in a participatory process. We oppose the introduction of robots with expectations already set, and recommend and discuss tools and techniques that enable the discovery of meaningful HRI scenarios in the field.
References
1. H. Robinson, B. MacDonald, N. Kerse, E. Broadbent, The psychosocial effects of a companion robot: a randomized controlled trial. J. Am. Med. Dir. Assoc. 14(9), 661–667 (2013). https://doi.org/10.1016/j.jamda.2013.02.007
2. F.M. Carrillo, J. Butchart, S. Knight, A. Scheinberg, L. Wise, L. Sterling, C. McCarthy, Adapting a general-purpose social robot for paediatric rehabilitation through in situ design. ACM Trans. Human-Robot Interact. 7(1), 30 (2018). Article 12. https://doi.org/10.1145/3203304
3. M. Vircikova, P. Sincak, Experience with the children-humanoid interaction in rehabilitation therapy for spinal disorders. Robot Intell. Technol. Appl. 208, 347–357 (2012)
4. C.L. van Straten, J. Peter, R. Kühne, Child-Robot relationship formation: a narrative review of empirical research. Int. J. Soc. Robot. 12, 325–344 (2020). https://doi.org/10.1007/s12369-019-00569-0
5. K. Roger, L. Guse, A. Osterreicher, Social commitment robots and dementia. Can. J. Ageing. 31(1), 87–94 (2012). https://doi.org/10.1017/S0714980811000663
6. M. Anderson, S.L. Anderson, Robot be good. Sci. Am. 303(4), 72–77 (2010)
7. T. Metzler, S. Barnes, Three dialogues concerning robots in elder care. Nurs. Philos. 15, 4–13 (2014). https://doi.org/10.1111/nup.12027
8. A. Sharkey, N. Sharkey, Children, the elderly, and interactive robots. IEEE Robot. Autom. Mag. 18(1), 32–38 (2011). https://doi.org/10.1109/MRA.2010.940151
9. L. Suchman, Human-Machine Reconfigurations: Plans and Situated Actions (Cambridge University Press, 2006)
10. C. Breazeal, K. Dautenhahn, T. Kanda, Social robotics, in Springer Handbook of Robotics (Springer International Publishing, 2016), pp. 1935–1971. https://doi.org/10.1007/978-3-319-32552-1_72
11. T. Pachidis, E. Vrochidou, V.G. Kaburlasos, S. Kostova, M. Bonković, V. Papić, Social robotics in education: state-of-the-art and directions, in Advances in Service and Industrial Robotics (Springer International Publishing, Cham, 2018), pp. 689–700. https://doi.org/10.1007/978-3-030-00232-9_72
12. R. Khosla, K. Nguyen, M.-T. Chu, Human robot engagement and acceptability in residential aged care. Int. J. Human-Comput. Interact. 33(6), 510–522 (2017). https://doi.org/10.1080/10447318.2016.1275435
13. R. Kachouie, S. Sedighadeli, R. Khosla, M.-T. Chu, Socially assistive robots in elderly care: a mixed-method systematic literature review. Int. J. Human-Comput. Interact. 30(5), 369–393 (2014). https://doi.org/10.1080/10447318.2013.873278
14. H. Robinson, B. MacDonald, E. Broadbent, The role of healthcare robots for older people at home: a review. Int. J. Soc. Robot. 6, 575–591 (2014). https://doi.org/10.1007/s12369-014-0242-2
15. R. Bogue, Advances in robot interfacing technologies. Industrial Robot: An Int. J. 40(4), 299–304 (2013)
16. A. Sharkey, Robots and human dignity: a consideration of the effects of robot care on the dignity of older people. Ethics Inf. Technol. 16, 63–75 (2014). https://doi.org/10.1007/s10676-014-9338-5
17. N.E. Sharkey, A.J.C. Sharkey, Living with robots: Ethical considerations for eldercare, in Artificial companions in society: Scientific, economic, psychological and philosophical perspectives, ed. by Y. Wilks (John Benjamins, Amsterdam, 2010), pp. 245–256
18. R. Sparrow, L. Sparrow, In the hands of machines? The future of aged care. Mind. Mach. 16(2), 141–161 (2006). https://doi.org/10.1007/s11023-006-9030-6
19. S. Turkle, Alone Together: Why We Expect More from Technology and Less from Each Other (Basic Books, New York, 2011)
20. F. Carros, J. Meurer, D. Löffler, D. Unbehaun, S. Matthies, I. Koch, R. Wieching, D. Randall, M. Hassenzahl, V. Wulf, Exploring human-robot interaction with the elderly: results from a ten-week case study in a care home, in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1–12 (2020). https://doi.org/10.1145/3313831.3376402
21. E. Mordoch, A. Osterreicher, L. Guse, K. Roger, G. Thompson, Use of social commitment robots in the care of elderly people with dementia: a literature review. Maturitas 74, 14–20 (2013)
22. T. Sorell, H. Draper, Robot Carers, Ethics, and Older People (Springer, 2014)
23. D. Muñoz, S. Favilla, S. Pedell, A. Murphy, J. Beh, T. Petrovich, Evaluating an app to promote a better visit through shared activities for people living with dementia and their families (number 1874), in Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2021). https://doi.org/10.1145/3411764.3445764
24. J.E. Young, J. Sung, A. Voida et al., Evaluating human-robot interaction. Int. J. Soc. Robot. 3, 53–67 (2011). https://doi.org/10.1007/s12369-010-0081-8
25. T. Robertson, J. Simonsen, Participatory design: an introduction, in Routledge international handbook of participatory design, ed. by J. Simonsen, T. Robertson (Routledge, New York, NY, 2012), pp. 1–18
26. E.B.-N. Sanders, Generative tools for co-designing, in Proceedings of Conference on CoDesigning, eds. by S.A.R. Scrivener, L.J. Ball, A. Woodcock (Springer, Dordrecht, The Netherlands), pp. 3–12
27. T. Robertson, T.W. Leong, J. Durick, T. Koreshoff, Mutual learning as a resource for research design, in Proceedings of the 13th Participatory Design Conference: Short Papers, Industry Cases, Workshop Descriptions, Doctoral Consortium Papers, and Keynote Abstracts, vol. 2 (PDC ’14), pp. 25–28 (2014). https://doi.org/10.1145/2662155.2662181
28. H. Hutchinson, et al., Technology probes: inspiring design for and with families, in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’03), pp. 17–24 (2003). https://doi.org/10.1145/642611.642616
29. L. Sterling, S. Pedell, G. Oates, Using motivational modelling with an app designed to increase student performance and retention, in Early Warning Systems and Targeted Interventions for Student Success in Online Courses, eds. by D. Glick, A. Cohen, C. Chang, pp. 161–176 (2020). https://doi.org/10.4018/978-1-7998-5074-8.ch008
30. F. Visser, P. Stappers, R. van der Lugt, E.B.-N. Sanders, Contextmapping: experiences from practice. CoDesign 1(2), 119–149 (2005). https://doi.org/10.1080/15710880500135987
31. E.B.-N. Sanders, P.J. Stappers, Co-creation and the new landscapes of design. CoDesign 4(1), 5–18 (2008). https://doi.org/10.1080/15710880701875068
32. E. Brandt, T. Binder, E.B.-N. Sanders, Ways to Engage Telling, Making and Enacting (Routledge International Handbook of Participatory Design. Routledge, New York, 2012), pp. 145–181
33. A.P. Correia, F.D. Yusop, I don’t want to be empowered: the challenge of involving real-world clients in instructional design experiences, in Proceedings of the 10th Anniversary Conference on Participatory Design (2018)
34. M. Ertner, A.M. Kragelund, L. Malmborg, Five enunciations of empowerment in participatory design, in Proceedings of the 11th Biennial Participatory Design Conference (PDC ’10) (Association for Computing Machinery, New York, NY, USA, 2010), pp. 191–194. https://doi.org/10.1145/1900441.1900475
35. E.B.-N. Sanders, E. Brandt, T. Binder, A framework for organizing the tools and techniques of participatory design, in Proceedings of the 11th Biennial Participatory Design Conference (PDC ’10) (Association for Computing Machinery, New York, NY, USA, 2010), pp. 195–198. https://doi.org/10.1145/1900441.1900476
36. C. Graham, M. Rouncefield, Probes and participation, in Proceedings of the 10th Conference on Participatory Design: Experiences and Challenges, eds. by D. Hakken, J. Simonsen, T. Roberston (Indiana University, Indianapolis, IN, 2008), pp. 194–197
37. M. Arnold, The connected home: probing the effects and affects of domesticated ICTs, in Proceedings of the 8th Conference on Participatory Design: Artful Integration: Interweaving Media, Materials and Practices, eds. by A. Clement, P. Van den Besselaar, vol. 2 (ACM Press, New York, NY, 2004), pp. 183–186
38. M. Rouncefield, A. Crabtree, T. Hemmings, T. Rodden, K. Cheverst, K. Clarke, G. Dewsbury, J. Hughes, Adapting cultural probes to inform design in sensitive settings, in Proceedings of the 15th Australasian Conference on Computer-Human Interaction, eds. by S. Viller, P. Wyeth (University of Queensland, Queensland, Australia, 2003), pp. 4–13
39. J. Paay, L. Sterling, F. Vetere, S. Howard, A. Boettcher, Engineering the social: the role of shared artifacts. Int. J. Hum. Comput. Stud. 67(5), 437–454 (2009)
40. S. Pedell, F. Vetere, T. Miller, S. Howard, L. Sterling, Tools for participation: intergenerational technology design for the home. Int. J. Des. 8(2), 1–14 (2014)
41. S. Pedell, T. Miller, F. Vetere, L. Sterling, S. Howard, J. Paay, Having fun at home: Interleaving fieldwork and goal models, in Proceedings of the 21st Australasian Conference on Computer-Human Interaction, eds. by M. Foth, J. Kjeldskov, J. Paay (ACM Press, New York, NY, 2009)
42. L. Sterling, K. Taveter, The Art of Agent-Oriented Modelling (MIT Press, Cambridge, MA, 2009)
43. A. Beck, L. Cañamero, A. Hiolle et al., Interpretation of emotional body language displayed by a humanoid robot: a case study with children. Int. J. Soc. Robot. 5, 325–334 (2013). https://doi.org/10.1007/s12369-013-0193-z
44. H.R.M. Pelikan, M. Broth, L. Keevallik, Are You Sad, Cozmo?: How Humans Make Sense of a Home Robot’s Emotion Displays, in Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction (HRI ’20) (Association for Computing Machinery, New York, NY, USA, 2020), pp. 461–470. https://doi.org/10.1145/3319502.3374814
45. K. Boehner, R. DePaula, P. Dourish, P. Sengers, How emotion is made and measured. Int. J. Hum. Comput. Stud. 65(4), 275–291 (2007). https://doi.org/10.1016/j.ijhcs.2006.11.016
46. K. Boehner, R. DePaula, P. Dourish, P. Sengers, Affect: from information to interaction, in Proceedings of the 4th Decennial Aarhus Conference, Critical Computing: Between Sense and Sensibility, pp. 59–68. https://doi.org/10.1145/1094562.1094570
47. S. Pedell, J. Constantin, K. D’Rosario, S. Favilla, Humanoid robots and older people with dementia: designing interactions for engagement in a group setting, in Interplay 2015 Congress (IASDR 2015), Brisbane, Australia, 2–5 November, 2015, eds. by V. Popovic, A.L. Blackler, D.-B. Luh, N. Nimkulrat, B. Kraal, N. Yukari, pp. 1639–1655
48. P. Riley, N. Alm, A. Newell, An interactive tool to promote musical creativity in people with dementia. Comput. Hum. Behav. 25(3), 599–608 (2009)
49. K. Sherratt, A. Thornton, C. Hatton, Music interventions for people with dementia: a review of the literature. Aging Ment. Health 8(1), 3–12 (2004)
50. L. Cuddy, J. Duffin, Music, memory, and Alzheimer’s disease: is music recognition spared in dementia, and how can it be assessed? Med. Hypotheses 64, 229–235 (2005). https://doi.org/10.1016/j.mehy.2004.09.005
51. J.S. Culph, N.J. Wilson, R. Cordier, R.J. Stancliffe, Men’s Sheds and the experience of depression in older Australian men. Aust. Occup. Ther. J. 62(5), 306–315 (2015). https://doi.org/10.1111/1440-1630.12190
52. R. Oliveira, P. Arriaga, M. Axelsson, A. Paiva, Humor–Robot interaction: a scoping review of the literature and future directions. Int. J. Soc. Robot. Springer Science and Business Media B.V. https://doi.org/10.1007/s12369-020-00727-9
53. L. Bechade, G.D. Duplessis, L. Devillers, Empirical study of humor support in social human-robot interaction, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9749) (Springer, 2016), pp. 305–316. https://doi.org/10.1007/978-3-319-39862-4_28
54. C.H. Gyldenkaerne, G. From, T. Mønsted, J. Simonsen, PD and the challenge of AI in healthcare, in Proceedings of the 16th Participatory Design Conference 2020 - Participation(s) Otherwise - Volume 2 (PDC ’20) (Association for Computing Machinery, New York, NY, USA, 2020), pp. 26–29. https://doi.org/10.1145/3384772.3385138
55. R.M. Branco, Q. Joana, Ó. Ribeiro, Playing with personalisation and openness in a codesign project involving people with dementia, in Proceedings of the 14th Participatory Design Conference: Full papers - Volume 1 (PDC ’16) (Association for Computing Machinery, New York, NY, USA, 2016), pp. 61–70. https://doi.org/10.1145/2940299.2940309
Sonja Pedell Associate Professor Sonja Pedell is Director of Swinburne University’s Future Self and Design Living Lab. The FSD Living Lab has core development capabilities in the area of innovative socio-technical systems and design solutions for health and wellbeing with a focus on the ageing population and dementia. Prior to taking up this role at Swinburne, Dr. Pedell completed a Master of Psychology from the Technical University of Berlin and was employed as an Interaction Designer, Usability Consultant and Product Manager in industry for several years.
Kathy Constantin After three years of working in the high-end feature film industry, Kathy returned to her alma mater to do a Ph.D. in design, motivating movement identity formation in older adults. She graduated with honors first class at Swinburne University, and made the Dean’s List while on exchange at the University of Cincinnati, Ohio, USA. Kathy has worked with humanoid robots, virtual reality, digital media, animation, and 3D. She currently teaches 3D modelling and is a VFX industry guest lecturer.
Diego Muñoz Dr. Diego Muñoz is Research Fellow in the Centre for Design Innovation in the School of Design at Swinburne University of Technology. He has completed a Ph.D. in Human-Computer Interaction at the Queensland University of Technology. Diego’s research focuses on intergenerational communication, participatory design with families, and technology design for older adults.
Leon Sterling Professor Leon Sterling is Emeritus Professor based in the Future Self Living Lab in the Centre for Design Innovation in the School of Design at Swinburne University of Technology. After completing a Ph.D. at the Australian National University, he worked for 15 years at universities in the UK, Israel and the United States. He returned to Australia as Professor of Computer Science at the University of Melbourne in 1995, serving as Head of the Department of Computer Science and Engineering for six years. In 2010, he moved to Swinburne, where he served as Dean of the Faculty of Information and Communication Technologies for four years and Pro Vice-Chancellor (Digital Frontiers) for two years. His current research is in incorporating emotions in technology development, where motivational models are an essential element.
Chapter 5
Wearable Accelerometers in Cancer Patients
Seema Dadhania and Matthew Williams
Abstract Cancer is a leading cause of death worldwide, accounting for nearly 10 million deaths in 2020 alone. The complexity of cancer treatment, recovery and survivorship in a co-morbid, aging population creates a need for novel monitoring and surveillance approaches. Research over the past few decades has linked higher physical activity (PA) with a reduced risk of developing cancer, and with lower mortality in those with a cancer diagnosis. Accurate assessment of physical function is also a vital aspect of the clinical evaluation of the cancer patient, required to inform treatment decisions and eligibility for cancer clinical trials. Existing methods of evaluating PA are subject to recall bias and other limitations. Given the growing body of evidence highlighting the limitations of self-reported PA measures, and as cancer care becomes increasingly patient centered, digital wearable tools could help us address unmet needs by offering unique advantages such as scale, cost, data volume, and continuous real-world objective data capture. In this chapter we discuss current methods of evaluating PA in the cancer patient and how wearable accelerometers have been used in cancer clinical trials, and give details of a wearable accelerometer digital health trial collecting longitudinal PA data, including successes and challenges.
Keywords Wearables · Cancer · Near-patient sensing · Clinical decision support
S. Dadhania (B) · M. Williams, Computational Oncology Laboratory, Institute of Global Health Innovation, Imperial College London, London, UK. e-mail: [email protected]. M. Williams e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_5
5.1 Introduction
Cancer is a leading cause of death worldwide, accounting for nearly 10 million deaths in 2020 alone [1]. Significant improvements in cancer detection and treatment options, together with an aging population, have led to a higher prevalence of both cancer patients and survivors. The complexity of cancer treatment, recovery and survivorship in
a co-morbid, aging population creates an ongoing need for novel monitoring and surveillance approaches. Since the early 1990s, evidence has accumulated linking higher physical activity to reduced cancer risk and lower mortality in those with a diagnosis of cancer [2–4]. Given the increasing global incidence and prevalence of cancer, there is rising interest in physical activity as a non-pharmacological method of intervention and cancer prevention. In patients diagnosed with cancer, accurate assessment of physical function is required to inform treatment decisions and eligibility for cancer clinical trials. Physical function also forms part of the wider assessment of Health-Related Quality of Life (HRQoL), which represents the patient’s general perception of the effect of illness and treatment on physical, psychological and social aspects of life [5] and is widely assessed in oncology practice and cancer clinical trials. Systematic reviews provide evidence that increasing levels of physical activity (PA) in patients with advanced cancer are associated with greater HRQoL and overall improvement in physical status [6, 7]. Given the importance of PA to overall health and recovery, a large proportion of oncology clinical trials utilize tools to capture PA-associated measures including sleep, exercise, energy expenditure and functional performance. However, objective assessment of physical functioning is challenging, as patients spend most of their time away from the clinical environment, self-report to providers and experience dynamic changes in physical functioning through their cancer pathway (Fig. 5.1). The treating team are thus exposed to an episodic snapshot rather than disease in motion. Recent advances in wearable activity monitors and sensing technology have made it possible to gather real-time, objective activity data in a non-obtrusive manner. A ‘wearable’ is a device with an inbuilt sensor capable of collecting health-related
data remotely [8]. It is the objective and passive nature of the data, i.e., its collection without interference from either patient or clinician, that lends itself to a potential role in cancer patient monitoring and will be the focus of this chapter. Given the increasing number of research and consumer-grade wearable devices available, we will use the term ‘wearable accelerometer’ to encompass all wearable devices which capture activity measures.
Fig. 5.1 The cancer patient pathway [9]. The cancer pathway is the patient’s journey from the initial suspicion of cancer through clinical investigations, patient diagnosis, treatment and follow-up [10]
5.2 The Cancer Patient and Outcome Measures
The quality of medical care delivered has traditionally been measured in terms of patient outcomes such as morbidity (the state of having a specific illness or condition) and mortality (the number of deaths caused by the specific illness). Survival and recurrence rates similarly describe whether a specific treatment is achieving its intended goal. Overall survival (OS) is the length of time, from either the date of diagnosis or the start of treatment for a disease such as cancer, that patients are still alive [11]. Progression free survival (PFS) is the length of time during and after the treatment of cancer that a patient lives with the disease without it getting worse. These remain the most used outcomes in cancer clinical trials and, given they are both definitive and objective in nature, can directly influence clinical management decisions. Whilst objectively relevant to clinicians and patients, these outcomes do not reflect the fine balance between treatment-associated benefit and the risk associated with toxicity and quality of life factors. Clinicians in practice are aware that high quality, patient-centered care goes well beyond whether a patient lives or dies, or whether the cancer recurs. The Institute of Medicine has defined the six aims of delivering quality healthcare as safe, effective, patient-centered, timely, efficient and equitable [12]. This definition highlights the paradigm shift towards including the patient and their individual beliefs in the decision-making process, and paves the way for developing patient-centered measurement tools that more broadly and reliably capture factors that define health and guide outcomes from the perspective of the patient. Patient-centered outcomes research thus evaluates the relationship between an observational event (the outcome) and the treatment provided, in the context of the patient [13, 14]. As oncology care shifts towards a more patient-centered approach, increasing emphasis is being placed on outcomes which address HRQoL as well as mental and physical wellbeing.
5.2.1 Measuring Physical Activity
Physical activity has been defined as ‘all bodily actions produced by the contraction of skeletal muscle that increase energy expenditure above basal level’ [15]. It is a complex behavior and thus challenging to measure. Multiple methods exist to measure PA, including direct/indirect calorimetry, questionnaires, diaries, behavioral observations and motion sensors such as accelerometers. A measure called the metabolic equivalent of task, or MET, is used to characterize the intensity of PA. One MET is equivalent to the rate of energy expended by a person sitting at rest and is defined as 1 kilocalorie per kilogram per hour. Sedentary behavior is any waking behavior characterized by an energy expenditure of 1.5 or fewer METs while sitting, reclining or lying down. Light-intensity activities expend less than 3 METs, moderate-intensity activities expend 3 to less than 6 METs, and vigorous activities expend 6 or more METs.
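The MET thresholds and the 1 kcal/kg/h definition given above can be written down as a small worked example. The following is a minimal sketch rather than a validated clinical tool: the function names are hypothetical, the sedentary category assumes a waking, seated, reclining or lying posture, and a value of exactly 6 METs is assigned to the vigorous category.

```python
# Illustrative sketch only: names are hypothetical and the thresholds are
# those quoted in the text (1.5, 3 and 6 METs).

SEDENTARY_MAX_METS = 1.5   # assumes waking behavior while sitting, reclining or lying
LIGHT_MAX_METS = 3.0       # light intensity: below 3 METs
MODERATE_MAX_METS = 6.0    # moderate intensity: 3 to less than 6 METs


def classify_intensity(mets: float) -> str:
    """Map a MET value to the intensity category described in the text."""
    if mets < 0:
        raise ValueError("MET value cannot be negative")
    if mets <= SEDENTARY_MAX_METS:
        return "sedentary"
    if mets < LIGHT_MAX_METS:
        return "light"
    if mets < MODERATE_MAX_METS:
        return "moderate"
    return "vigorous"


def energy_expenditure_kcal(mets: float, body_mass_kg: float, duration_hours: float) -> float:
    """Energy expended, using 1 MET = 1 kcal per kg of body mass per hour."""
    return mets * body_mass_kg * duration_hours


if __name__ == "__main__":
    # A hypothetical 70 kg person walking at 3.5 METs for 30 minutes.
    print(classify_intensity(3.5), energy_expenditure_kcal(3.5, 70.0, 0.5))
```

Under these assumptions, a 70 kg person expends 70 kcal per hour at rest (1 MET), which illustrates why the same activity costs different absolute amounts of energy in patients of different body mass.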
5.2.2 Measuring Physical Activity in the Cancer Patient
5.2.2.1 Performance Status
To conduct clinical trials in a consistent manner across multiple centers requires the use of standard criteria to measure how the disease impacts daily life. Performance status (PS) is a measure widely used by Oncologists to determine a patient’s general wellbeing and fitness and is utilized to guide management decisions. It describes a patient’s level of functioning in terms of their ability to care for themselves, daily activity, and physical ability. There are 2 commonly used scales for PS, the Eastern Cooperative Oncology Group (ECOG) scale and the Karnofsky scale (KPS) (Fig. 5.2).
ECOG Scale
0 Normal activity
1 Symptomatic and ambulatory; cares for self
2 Ambulatory >50% of the time; occasional assistance
3 Ambulatory ≤50% of the time; nursing care needed
4 Bedridden
Karnofsky Scale
100 Normal; no evidence of disease
90 Able to perform normal activities with only minor symptoms
80 Normal activity with effort; some symptoms
70 Able to care for self but unable to do normal activities
60 Requires occasional assistance; cares for most needs
50 Requires considerable assistance
40 Disabled; requires special assistance
30 Severely disabled
20 Very sick; requires active supportive treatment
10 Moribund
Fig. 5.2 ECOG and KPS performance status scales [16, 17]
5.2.2.2 Patient Reported Outcomes
Cancer patients experience significant physical and psychosocial consequences of their underlying disease and treatment which impact quality of life (QoL) [18]. These consequences can be under-recognized and under-treated in oncology practice, leading to greater morbidity which is costly to both the patient and the treating health system. Patient reported outcomes (PROs) are used in practice for early detection of distress, and as a performance metric for evaluating the effect of quality of care on health outcomes. PROs are tools used to capture aspects of a patient’s health status, self-reported by the patient and without influence from others. They are typically standardized instruments or questionnaires collected at varying time points in the pathway. PROs focus on treatment toxicity, physical symptoms, psychosocial problems and/or the global impact of disease on HRQoL. Validated PRO tools include but are not limited to:
• National Institutes of Health (NIH) Patient-Reported Outcomes Measurement Information System (PROMIS) tool, which can be used across various diseases [19],
• European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30), which is cancer specific [20],
• Functional Assessment of Cancer Therapy–General (FACT-G) [21],
• 36-item short-form health survey (SF-36) [22],
• Global Physical Activity Questionnaire (GPAQ).
A study evaluating wearable activity monitors to assess PS and predict clinical outcomes in 37 advanced cancer patients showed a correlation between the number of steps per day and rating on the ECOG PS scale. An increase of 1000 steps/day was associated with reduced odds of adverse events and hospitalizations and a reduced hazard of death. In addition, it demonstrated significant correlation between activity measures and the PRO tool evaluating physical functioning and fatigue [23].
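To make concrete what a correlation between daily step count and ECOG rating means, the sketch below computes a rank correlation on a handful of invented values. It is an illustration under stated assumptions only: the text does not say which correlation statistic the study used (Spearman's rho is chosen here because ECOG is an ordinal scale), the function names are hypothetical, and the numbers in the usage example are made up for demonstration and are not study data.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example values, NOT data from the cited study.
    steps_per_day = [9000, 6500, 4000, 2500, 800]
    ecog_rating = [0, 1, 2, 3, 4]
    print(spearman_rho(steps_per_day, ecog_rating))
```

A negative coefficient in such an analysis would indicate that patients with more daily steps tend to have a lower (better) ECOG rating.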
5.2.2.3 Limitations Associated with PS Scales and Self-Reported Measures
The PS scales were developed over 50 years ago and remain subject to bias and limitations. Despite these limitations, important clinical decisions are based on PS, including fitness for treatment and eligibility for clinical trials. Several studies have shown that providers routinely under- or over-estimate PS [24, 25], and assessment of PS is subject to patient recall bias [26]. Challenges associated with data generated from PROs include how to handle missing data, often seen in clinical trial data collection where up to half of PROs are missing [27]. Furthermore, it is challenging for clinical teams to reach robust conclusions when evaluating PROs, as they vary in scale, measurement and interpretation. Finally, there remain no standardized time-points at which to evaluate HRQoL.
Several studies, however, have shown that incorporating PROs into the routine care of the cancer patient can improve outcomes, QoL and patient satisfaction [28–31]. Questionnaires which assess patient PA are equally subject to recall bias, and patients are more likely to recall high-intensity bouts of activity than movement related to daily activities, e.g. climbing stairs or walking [32], so the latter are more likely to go unreported. Patients receiving anti-cancer therapies or recovering from treatment are less likely to perform high-intensity or moderate-to-vigorous PA (MVPA) regularly [33–37]. Additionally, low-to-moderate sensitivity and lack of agreement between questionnaires make it challenging to draw conclusions when comparing activity levels across study populations. Interpreting questionnaires can be problematic as activity levels are classified into broad categories, e.g. time spent in light, moderate or vigorous activity, or converted into METs to standardize results to a given PA intensity threshold [38, 39]. Given that PA intensity in older or co-morbid populations is more likely to fall into the low-intensity category, i.e. <3 METs, the subtle nuances of activity within this population are diluted [39–41]. Categorization of PA intensity into METs also does not consider the age- and comorbidity-related variability in metabolic function and/or speed of movement.
5.3 Harnessing Wearable Technology in Oncology
Given the growing body of evidence highlighting the limitations of self-reported measures, and as cancer care becomes increasingly patient centered, digital wearable tools could help us address unmet needs by offering unique advantages such as scale, cost, data volume, and continuous real-world objective data capture. Consumer-grade monitors are commonly activity tracking wristbands paired with a smartphone app, which syncs and summarizes a user’s data into number of steps taken, calories burned and activity classification. Brand examples include Fitbit, Apple Watch, Samsung and Garmin. Consumer wearable devices are often released with limited evidence of validation, and the algorithms used to calculate activity metrics are proprietary, with users having no access to raw acceleration data. Research-grade devices tend to be larger, more durable and more expensive than their consumer-grade counterparts. Brand examples include ActiGraph, GENEActiv, ActivPal and Axivity; such devices have been used in population-level activity studies including the National Health and Nutrition Examination Survey (NHANES) and the UK Biobank 7-day accelerometer study. In contrast to consumer devices, research-grade accelerometers are validated in the scientific literature; many have FDA approval, and all provide access to raw activity data, either as counts or in gravitational acceleration units for a given unit of time [38].
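As an illustration of what raw activity data in gravitational acceleration units for a given unit of time can look like once summarized, the sketch below reduces tri-axial samples (in g) to one value per epoch. The chapter does not name a specific summary metric; the Euclidean norm minus one (ENMO), a metric commonly used in accelerometer research, is swapped in here purely as an example. The function names, the 5-second epoch length and the sample values are assumptions for illustration, and real pipelines typically add calibration and non-wear detection, which are omitted.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in units of g


def enmo(sample: Sample) -> float:
    """Euclidean norm minus one: vector magnitude minus 1 g, floored at zero."""
    x, y, z = sample
    return max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)


def epoch_means(samples: Sequence[Sample], sample_rate_hz: int,
                epoch_seconds: int = 5) -> List[float]:
    """Average ENMO over consecutive, non-overlapping epochs.

    A trailing partial epoch is dropped rather than averaged.
    """
    per_epoch = sample_rate_hz * epoch_seconds
    if per_epoch <= 0:
        raise ValueError("sample rate and epoch length must be positive")
    means = []
    for start in range(0, len(samples) - per_epoch + 1, per_epoch):
        window = samples[start:start + per_epoch]
        means.append(sum(enmo(s) for s in window) / per_epoch)
    return means


if __name__ == "__main__":
    # Hypothetical 2 Hz recording: 5 s at rest (gravity only), then 5 s of movement.
    at_rest = [(0.0, 0.0, 1.0)] * 10
    moving = [(0.0, 0.0, 2.0)] * 10
    print(epoch_means(at_rest + moving, sample_rate_hz=2))
```

A device lying still measures only gravity (about 1 g), so subtracting one leaves a value near zero at rest and a larger value during movement; having access to the raw signal is what allows researchers to choose and document such a metric themselves instead of relying on a proprietary one.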
5.3.1 What Can Wearable Technology Be Used to Measure in Oncology, and Why Are These Parameters Relevant?
5.3.1.1 Physical Activity
Higher PA is associated with improved outcomes [42, 43], including QoL, disease recurrence, and overall and disease-free survival, in patients with primary colorectal, breast, prostate, pancreatic, lung and gynecological tumors [44–49]. PA measurement can be a useful proxy measure of overall health [50] and, as PA levels reflect the physical deconditioning associated with the underlying disease or treatment, PA is a feasible end-point for cancer clinical trials, particularly those where QoL and toxicity are important. Accurately measuring PA is also important to identify correlations with other measurable clinical outcomes and to evaluate the efficacy of interventions to increase PA levels within cancer clinical research.
5.3.1.2 Sleep
Cancer patients are disproportionately affected by sleep–wake disturbance and insomnia relative to the general population [51]. Sleep disturbance is widely reported in cancer patients and survivors and is associated with worse health outcomes, poorer treatment adherence and worse disease outcomes [52–54]. In brain tumour patients, sleep–wake disturbance is among the most commonly described symptoms of the disease and/or treatment-related toxicity, and evidence suggests a unique pathophysiology [55, 56]. Developing an understanding of, and a management strategy for, sleep–wake disturbance is thus an important area of research in neuro-oncology. Objective measurement of sleep–wake cycles using accelerometry, in addition to more traditional sleep diary and polysomnography methods, may provide a useful way of capturing sleep data in a more longitudinal and passive manner.
5.3.1.3 Gait
The characterization of gait is emerging as a powerful tool in neurodegenerative disease for identifying surrogate markers of incipient disease manifestation or disease progression, and a significant amount of work has been done in Parkinson’s disease to characterize gait along the disease trajectory [57]. Gait analysis typically requires large, expensive equipment and is performed through optical analysis with reflective markers, paired with force plates along walkways [57]. Gait analysis with wearable sensors also frequently depends on 6- or 9-axis inertial measurement units (IMUs), which add gyroscopes and magnetometers for positional sensor data. While these systems are essential for developing protocols, their size and cost make them inaccessible for quantifying gait in many healthcare settings. This has driven the demand for cheaper, lighter and more portable options which can be easily deployed in the
S. Dadhania and M. Williams
clinic. Wearable accelerometers and their application in instrumented testing have risen in popularity [58]. Whilst many commercial wearable brands have developed proprietary algorithms, the majority show limited ability to identify and quantify gait in a free-living environment [59, 60]. To develop more clinically useful algorithms and outcomes, research groups are now utilizing raw accelerometer data. Specifically, these outcomes are spatial–temporal gait characteristics, or ‘micro’ gait parameters, which incorporate the timing and length of steps and the variation associated with ageing and disease [61]. Examining micro as well as ‘macro’ gait characteristics, e.g. the volume, patterns and variability of walking bouts [62], is a clinically relevant model of gait assessment, and algorithms have been developed to capture steps and walking bouts from free-living accelerometer data [63].
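To make the idea of ‘macro’ gait characteristics concrete, the following is a minimal sketch of how walking bouts might be derived once each epoch has already been labeled as walking or not walking. It is not the algorithm of [63]: the function names, the 5 s epoch length and the 10 s minimum bout duration are illustrative assumptions only.

```python
def walking_bouts(is_walking, epoch_seconds=5, min_bout_seconds=10):
    """Group consecutive walking epochs into bouts.

    is_walking: sequence of booleans, one per epoch (assumed to come from an
    upstream classifier). Returns a list of (start_index, duration_seconds).
    """
    bouts = []
    start = None
    # A trailing False acts as a sentinel so a bout ending at the last epoch is closed.
    for i, walking in enumerate(list(is_walking) + [False]):
        if walking and start is None:
            start = i
        elif not walking and start is not None:
            duration = (i - start) * epoch_seconds
            if duration >= min_bout_seconds:   # hypothetical minimum bout length
                bouts.append((start, duration))
            start = None
    return bouts


def summarize_bouts(bouts):
    """Volume (count, total time) and mean duration of the detected bouts."""
    durations = [duration for _, duration in bouts]
    if not durations:
        return {"n_bouts": 0, "total_seconds": 0, "mean_seconds": 0.0}
    return {
        "n_bouts": len(durations),
        "total_seconds": sum(durations),
        "mean_seconds": sum(durations) / len(durations),
    }


labels = [False, True, True, True, False, True, False, True, True]
bouts = walking_bouts(labels)
summary = summarize_bouts(bouts)
```

Measures of variability, such as the spread of bout durations across days, could be added to the summary in the same way.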
5.3.1.4 Quality of Life
Increasingly, cancer clinical trials are measuring quality of life (QoL) as a study-specific outcome, utilizing questionnaires such as the EORTC QLQ-C30. Measuring QoL reflects patient satisfaction and the perceived benefits of a treatment or intervention, and is of particular importance in oncology practice, where the intervention may be given for extension of life rather than with curative intent [64]. A Cochrane review which evaluated the association between QoL in cancer patients and physical activity demonstrated a positive effect of physical activity on QoL, encompassing fatigue, social and physical functioning and role function [65]. Given the passive and objective nature of accelerometer data, capturing physical activity as a surrogate for QoL is an attractive prospect. Haslam et al. described how often patients’ QoL is assessed over the course of their disease, reviewing all studies which reported QoL as a study outcome in 3 high-impact oncology journals between July 2015 and July 2018. Most studies reported QoL during active treatment only, and QoL was rarely measured after disease progression or until the end of the patient’s life [66]. Wearable accelerometer data collection has the advantage of being continuous, and there may be an opportunity to measure QoL longitudinally, providing a picture of QoL over a patient’s entire cancer journey rather than the episodic snapshots we currently see.
5.4 Accelerometers
All modern wearable PA devices contain an accelerometer, a microelectromechanical system (MEMS). Accelerometers consist of small sensors which register acceleration along three axes and are worn at varying locations on the body, commonly the hip, wrist and thigh. The term ‘wearable accelerometer’ is therefore often used to describe a wearable device which captures physical activity measures remotely. We will use the term ‘wearable accelerometer’ to encompass all wearable devices which capture activity measures.
Triaxial accelerometers measure acceleration along three axes, X, Y and Z. The raw acceleration data collected from triaxial accelerometry represent the direction and magnitude of acceleration along each axis in the unit g, where 1 g is equivalent to the Earth’s gravity [67]. Sampling frequency, range and resolution settings determine the particulars of these data. The raw data are then processed by various algorithms into PA summaries such as step count, calories, activity count and activity classification, and can be represented at varying temporal resolutions, e.g. per second, minute, hour or day. Common open-source methods of processing accelerometer data, which attempt to remove the gravitational and noise components incorporated within the signals, are the Euclidean Norm Minus One (ENMO) [68] and the Mean Amplitude Deviation (MAD). Neither metric requires the data to be filtered in order to correct for gravity, which is taken into consideration in their algorithms, making them more attractive. MAD is the mean value of the dynamic acceleration component and is calculated from the resultant vector value of the measured orthogonal acceleration, which incorporates two components, dynamic and static. The former is due to changes in velocity and the latter is due to gravity. During analysis of each epoch, the static component is removed and the remaining dynamic component is rectified (taken as absolute values). The MAD value is therefore the mean of the rectified acceleration signal, independent of the static component, within each epoch. The ENMO metric, which is calculated in a similar way, adjusts for gravity by subtracting a fixed offset of one gravitational unit from the Euclidean norm of the X, Y and Z acceleration signals [69].
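The two metrics described above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the reference implementation of either metric: the function names are hypothetical, the input is assumed to be calibrated acceleration in g with one row per sample, and clipping negative ENMO values to zero is a commonly used convention that is assumed here.

```python
import numpy as np


def vector_magnitude(xyz):
    """Euclidean norm of each (x, y, z) sample, in g."""
    return np.linalg.norm(np.asarray(xyz, dtype=float), axis=1)


def enmo(xyz):
    """Euclidean Norm Minus One per sample.

    Subtracts a fixed 1 g offset for gravity; negative values are clipped
    to zero (an assumed convention, see the lead-in).
    """
    return np.maximum(vector_magnitude(xyz) - 1.0, 0.0)


def mad(xyz, samples_per_epoch):
    """Mean Amplitude Deviation for each complete epoch.

    Within an epoch, the epoch mean of the vector magnitude (the static
    component) is removed and the remainder is rectified and averaged.
    """
    r = vector_magnitude(xyz)
    n_epochs = len(r) // samples_per_epoch           # drop a trailing partial epoch
    r = r[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)
    return np.mean(np.abs(r - r.mean(axis=1, keepdims=True)), axis=1)


# A device lying still reads about 1 g on one axis: both metrics are then zero.
still = np.tile([0.0, 0.0, 1.0], (500, 1))
enmo_still = enmo(still)
mad_still = mad(still, samples_per_epoch=500)
```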
Processed data are represented in epochs. An epoch is the length of time over which the measure of activity magnitude is summed before being stored in the accelerometer, or a user-defined time-sampling interval when manipulating raw data. The epoch length used in health research often ranges from less than one second to 60 s, and makes the data more manageable for analysis and interpretation. Shorter epoch lengths can identify a greater range of PA variability, which is useful at the extreme ends of sedentary and more vigorous activity, whilst longer epoch lengths reduce and smooth the PA data [70]. The hip has in the past been the preferred placement for accelerometers; however, wrist-worn accelerometry has been used more recently in large national health surveys such as UK Biobank [71] and NHANES [72]. Figure 5.3 shows the processing steps required to extract physical activity information from raw accelerometer data using the UK Biobank accelerometry processing tool [73].
[Figure 5.3 flow chart, steps in order: resample x/y/z to sampling frequency (e.g. 100 Hz); calibration to local gravity; vector magnitude; remove noise and gravity; 5-s epoch analysis; non-wear detection; activity volume and intensity summary]
Fig. 5.3 Processing steps used by the UK Biobank for triaxial accelerometer raw data processing [73]
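The effect of epoch length can be illustrated with a short sketch. It is not the UK Biobank tool itself: the function name is hypothetical, and averaging (rather than summing) within each epoch is an assumption made for this example.

```python
import numpy as np


def to_epochs(signal, sampling_hz, epoch_seconds=5):
    """Average a per-sample signal over consecutive, non-overlapping epochs."""
    signal = np.asarray(signal, dtype=float)
    per_epoch = int(sampling_hz * epoch_seconds)
    n_epochs = len(signal) // per_epoch              # drop a trailing partial epoch
    trimmed = signal[: n_epochs * per_epoch]
    return trimmed.reshape(n_epochs, per_epoch).mean(axis=1)


# Ten seconds at 100 Hz: five seconds at rest, then five seconds of movement.
signal = [0.0] * 500 + [0.2] * 500
five_second = to_epochs(signal, sampling_hz=100, epoch_seconds=5)
ten_second = to_epochs(signal, sampling_hz=100, epoch_seconds=10)
```

With 5 s epochs the rest and movement periods remain distinct, whereas a 10 s epoch merges them into a single average, which is the smoothing effect of longer epochs described above.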
5.4.1 Challenges with Wearable Accelerometer Data
5.4.1.1 Data Volume
Due to the high granularity of accelerometry data, it rapidly expands in size and requires powerful data analysis and manipulation tools. For example, 24 h of data recorded at a sampling frequency of 100 Hz (100 readings per second) produces 8,640,000 observations, so adequate computing power and storage capabilities are required when designing clinical studies utilizing wearable accelerometers. Figure 5.4 displays 1 week of accelerometer data taken at a sampling frequency of 100 Hz, and although diurnal variations in acceleration are clearly visible, it is challenging to infer much else from this visual representation. To reduce size and complexity, data are represented as epochs, which are prebuilt into consumer wearable algorithms or can be user-defined when analyzing raw accelerometer data. Compressing data into more digestible ‘average’ summaries does, however, run the risk of data loss and over-simplification. Several studies have established that variable epoch lengths during data processing result in different moderate-to-vigorous PA (MVPA) scores, and the lack of a measurement criterion is a well-known limitation of accelerometer PA studies [74–77]. Once data are processed, there are challenges related to the sheer volume of data and whether it is of good enough quality to aid clinical decision making. Wearable data may overburden clinical teams, and there is a risk of liability from delayed review. There remains a lack of electronic health record functionality to store processed accelerometer data, and the additional resource and time required to manage these data can potentially outweigh the benefits.
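The arithmetic behind these figures is simple to reproduce. In the sketch below, the storage estimate assumes three axes stored as 2-byte values with no timestamps or compression; that is an illustrative assumption, not a property of any particular device.

```python
def samples_per_day(sampling_hz):
    """Number of observations in 24 h at a given sampling frequency."""
    return sampling_hz * 60 * 60 * 24


def raw_size_megabytes(sampling_hz, days, axes=3, bytes_per_value=2):
    """Rough uncompressed size, assuming `bytes_per_value` bytes per axis reading."""
    total_values = samples_per_day(sampling_hz) * days * axes
    return total_values * bytes_per_value / 1e6


per_day = samples_per_day(100)           # 8,640,000 observations, as quoted above
one_week = raw_size_megabytes(100, 7)    # about 363 MB under the stated assumptions
```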
Fig. 5.4 7-day collection at a sampling frequency of 100, 25 and 25 Hz down-sampled wearing the AX3 triaxial accelerometer
5.4.1.2 Body Placement
A 2020 systematic review identified 25 clinical wearable studies of patients during cancer treatment. Although the device itself varied across studies, the majority were placed on the wrist for data collection [78]. At our institution, we have carried out a significant amount of patient and public involvement (PPI) work with cancer patients and survivors in developing near-patient sensing study protocols. Patients almost unanimously feel the wrist is the most acceptable sensor location in terms of appearance and comfort for longitudinal data collection lasting more than 7 days. This is consistent with large population-level accelerometer studies such as the UK Biobank and NHANES, which opted for wrist placement of the accelerometer device [73, 79] on the assumption that wrist placement would increase compliance, and studies in children and adolescents have corroborated this assumption [80–82]. Although not in the cancer arena, PPI work in dementia shows how patient involvement can be utilized in designing wearable accelerometer studies [83]. The location of the accelerometer device can alter activity classification and physical activity measures. One study assessed hip and wrist accelerometer machine learning (ML) algorithms for free-living behavior classification, using a random forest and hidden Markov model, against ground truth captured by wearable cameras in 40 overweight or obese women. The ML algorithm obtained on average 5% higher accuracy in classifying walking or running with the hip accelerometer than with the wrist accelerometer. Standing was most commonly labeled as sitting by the hip classifier and as walking by the wrist classifier [84], with the classifier placing emphasis on different features at the two body locations. Del Din et al. showed that gait characteristics can vary depending on sensor location and concluded that the chest may be more accurate than the waist in measuring gait characteristics such as step length and time [85].
This is a clear example of an accelerometer being a surrogate for the acceleration of the body location on which it is placed. Figure 5.5 shows the variation in acceleration amplitude when carrying out three separate activities which place different emphasis on lower or upper limb movement. Despite more clinical studies opting to place accelerometers on the wrist, there remain several issues in the analytical pipeline, with most methods developed from small, homogeneous samples and/or in a controlled environment. As such, when deployed in the free-living environment, these methods fall short in accuracy when characterizing human movement behavior or PA measures [86, 87]. Accelerometer data captured from different locations are likely to lead to different results and, as demonstrated in Fig. 5.5, will favor movements which engage that particular part of the body. Interpreting physical activity summaries as defined activities thus requires device- and location-specific calibration.
Fig. 5.5 Karas et al. showed the difference between raw accelerometer amplitude collected from two locations, the hip (left) and left wrist (right) whilst carrying out three activities; dealing cards, dressing up and walking, with each accelerometer axis represented in a different colour. The amplitude and frequency between the two sensor locations whilst walking appears similar and is likely because both the lower limbs and the arms (arm swinging) are required for walking, with slightly more emphasis placed on the lower limbs. Getting dressed is a much more complex process and higher amplitudes are seen in both the hip and left wrist when compared to dealing cards which is an activity isolated to the upper limbs alone [88]
5.4.1.3 Sampling Frequency, Range and Resolution
Sampling frequency is the number of observations per second, in Hertz (Hz), and correlates with the energy consumption of the wearable accelerometer battery. Most sensors used for physical activity capture data at 30–100 Hz [67]. In accordance with the Nyquist theorem, characterizing a PA signal accurately requires the sampling frequency to be at least double the highest frequency of the signal of interest. As shown in [89], human activity frequencies lie between 0 and 20 Hz and 98% of the Fast Fourier Transform (FFT) amplitude is contained below 10 Hz, so most studies utilizing wearable accelerometers can capture the signal of interest. A study assessing the effect of lowering the sampling rate from 100 to 25 Hz, to reduce device energy consumption, on accelerometer-based PA monitoring and machine learning activity classification in healthy adults showed that data from the 2 sampling frequencies are highly correlated in overall activity measurement. However, consistently lower overall acceleration was observed in the data collected at 25 Hz, and a transformation equation was suggested to equate vector magnitude between the two sampling frequencies. When comparing machine learning activity classification, there was excellent agreement between the two sampling frequencies [90]. Khan et al. performed a detailed study on optimizing the sampling frequency using publicly available accelerometer datasets in the context of human activity recognition. They concluded that the sampling frequencies used in the literature are up to 57% greater than required, leading to unnecessary device energy consumption [91]. It should also be noted that differences in measured acceleration based on sampling frequency can be device-specific, and researchers should not assume that changing the sampling frequency will have no impact on their data collection. Acceleration ranges of ±2 to ±15 g and a resolution of 12 bits are often used [92–94]. In most cases ±2 g is acceptable; however, during high-intensity PA, acceleration can reach 6 g at a sensor placed on the waist, or even higher depending on the proximity of the sensor to the floor [95]. One of the challenges in using wearable devices from differing manufacturers, and where raw data are pre-processed, remains the concealment of proprietary algorithms and device settings, which can be changed at the manufacturer’s discretion at the time of software updates. Interpretation and analysis of patient healthcare data, particularly from longitudinal data collection, are therefore exposed to inconsistencies. For example, ActiGraph LLC, the manufacturer of ActiGraph PA monitors, states that the observed activity counts depend on the sampling frequency [96]. In Sect. 5.5, we discuss real-world experience of a PA monitoring cancer clinical study and the choice of device based on the ability to access raw, unprocessed acceleration data.
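The relationships between signal frequency, sampling frequency, range and resolution can be checked numerically. The sketch below uses hypothetical function names and a synthetic 2 Hz sine wave as a stand-in for an activity signal. The resolution calculation assumes the available bits are spread evenly across the full ± range.

```python
import numpy as np


def min_sampling_hz(max_signal_hz):
    """Nyquist criterion: sample at least twice the highest frequency of interest."""
    return 2 * max_signal_hz


def resolution_mg(range_g, bits):
    """Smallest step (in milli-g) for a +/- range_g sensor with `bits` of resolution."""
    return (2 * range_g) / (2 ** bits) * 1000


def amplitude_fraction_below(signal, sampling_hz, cutoff_hz):
    """Fraction of FFT amplitude at or below cutoff_hz (mean removed first)."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sampling_hz)
    return spectrum[freqs <= cutoff_hz].sum() / spectrum.sum()


needed_hz = min_sampling_hz(20)        # 40 Hz for activity content up to 20 Hz
step_8g = resolution_mg(8, 12)         # 3.90625 mg per step at +/- 8 g, 12 bits
step_2g = resolution_mg(2, 12)         # 0.9765625 mg per step at +/- 2 g, 12 bits

t = np.arange(1000) / 100.0            # 10 s sampled at 100 Hz
fraction = amplitude_fraction_below(np.sin(2 * np.pi * 2.0 * t), 100, 10)
```

The trade-off is visible in the two resolution values: a wider range captures high-intensity movement without clipping, but each step is coarser for the same number of bits.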
5.4.1.4 Expertise
In planning a clinical study incorporating wearable accelerometers, it is essential to draw upon researchers with expertise in PA assessment. Both in the study design and analysis, specialist knowledge is required to meet the needs of the study in terms of choice of device, accelerometer settings and data handling including programming, statistical and/or ML analysis. This runs alongside the more clinical aspects of the study protocol and ensures that there is a firm understanding of what can be achieved from the data. For example, clinicians may have an expectation that they will be able to accurately measure step count from raw accelerometer data, which is a more digestible way of understanding accelerometer data for clinical decision making and communicating PA data to patients. However, there remain no validated algorithms to accurately count steps in the free-living environment in a non-healthy population. The popularity of commercial wearables has blurred the lines in healthcare studies evaluating PA outcomes, and whilst there is a steady rise in use of commercial devices in clinical studies, they remain unvalidated with concerns about the black box nature of their algorithms.
Cut-point methods for determining physical activity levels (e.g. sedentary, MVPA) continue to be popular, and remain attractive in clinical research owing to their analytical simplicity. A systematic review reporting accelerometer methods in PA intervention studies highlighted the popularity of the cut-point data analysis method, whilst more advanced analytical methods such as machine learning or other pattern recognition techniques were rarely employed [97]. Certain accelerometer brands have developed specific cut-points for labeling activity intensity, and a recent study has published cut-points for determining activity intensity from an ActiGraph accelerometer worn on the wrist in free-living adults, with an accuracy of 70.8% [98]. However, these are specific to this brand, whose monitors use ‘counts’ as the unit of measure [96]. Applying Hildebrand Euclidean Norm Minus One (ENMO) cut-points for the wrist accelerometer gave a higher accuracy of 75.2%, but across all methods the limits of agreement were wide for sedentary and light intensities. Algorithms to generate comparable data by converting raw accelerometer data to ActiGraph counts have been published [94]. In clinical studies recruiting patients with cancer or chronic disease, who are more likely to carry out activities in the sedentary or low-intensity range, subtle changes in activity may be attributable to an underlying disease and/or treatment effect, and the wide variation in correctly labelling low-intensity activity remains a challenge when interpreting accelerometer data for clinical use. Although ML methods are more complex than the methods currently employed, and a ‘plug-and-play’ approach is yet to be successfully developed, many studies have now shown improvements in the prediction of energy expenditure, PA intensities and activity recognition for both activity count and raw accelerometer data [99–103].
Barriers to adoption nevertheless remain prevalent as clinical teams implementing the PA monitors are often not the same as those developing the analytical models. This highlights a significant skills gap in clinical training at a time where electronic health datasets are both rising in popularity and growing in complexity.
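The simplicity that makes cut-point methods attractive is apparent in a minimal sketch. The thresholds below are hypothetical placeholders chosen only for illustration; they are not the published cut-points of [98] or of Hildebrand et al., and real studies need device- and location-specific values.

```python
import numpy as np

# Hypothetical lower bounds in milli-g for each intensity band (illustration only).
CUT_POINTS_MG = {"light": 30.0, "moderate": 100.0, "vigorous": 400.0}
LABELS = np.array(["sedentary", "light", "moderate", "vigorous"])


def classify_epochs(enmo_mg, cut_points=CUT_POINTS_MG):
    """Label each epoch-level ENMO value (milli-g) with an intensity band."""
    bounds = [cut_points["light"], cut_points["moderate"], cut_points["vigorous"]]
    # np.digitize returns 0 below the first bound, 1 between first and second, etc.
    return LABELS[np.digitize(enmo_mg, bounds)]


def mvpa_minutes(labels, epoch_seconds=5):
    """Minutes spent in moderate-to-vigorous PA."""
    n_mvpa = np.isin(labels, ["moderate", "vigorous"]).sum()
    return n_mvpa * epoch_seconds / 60.0


labels = classify_epochs([10.0, 30.0, 99.9, 100.0, 450.0])
minutes = mvpa_minutes(labels, epoch_seconds=60)
```

Small shifts in a threshold move every borderline epoch from one band to another, which is one way to see why agreement is weakest in the sedentary and light-intensity range.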
5.5 Real-World Experience of Running a Digital Health Study
5.5.1 Device Considerations
The use of wearables in clinical trials during cancer treatment is steadily increasing [78], and here we report real-world experience of running a longitudinal digital health trial, including successes and challenges faced. There are now several digital endpoint frameworks published [104] which provide guidance on incorporating wearable devices and sensors into clinical studies, and this section therefore serves as an insight into the more practical aspects of a digital health study in a cancer sub-population. BrainWear (ISRCTN34351424) is a phase 2, non-randomized feasibility study, collecting wearable data via an Axivity AX3 wrist-worn triaxial accelerometer [105].
[Figure 5.6 diagram labels: Diagnosis, Treatment, Follow-up; cohorts: HGG, Metastatic, Surgery Only, Healthy Volunteers; Wrist-worn AX3 accelerometer (continuous); MRI imaging and patient-reported outcomes (intermittent); Off study: PS 3 for ≥ 2 weeks, patient preference, stable disease for ≥ 6 months]
Fig. 5.6 BrainWear trial design. Patients with a radiological diagnosis of high-grade glioma (HGG), a low-grade glioma (LGG) having surgery only, or metastatic disease from a non-brain tumour primary, and healthy volunteers, were screened via the Neuro-Oncology multidisciplinary team meeting and approached at early clinical visits. At baseline assessment, all participants were provided with an Axivity AX3 wrist-worn accelerometer to wear on their non-dominant hand and requested to complete four patient-reported outcome measures (PROMs): the EORTC QLQ C30 and brain tumour specific module BN20 [106], the Montreal Cognitive Assessment (MoCA) and the Multidimensional Fatigue Inventory (MFI) scale. At the time of recruitment and at each successive trial timepoint, patients were requested to complete the PROMs and the treating clinician to grade PS according to the ECOG PS scale. The Axivity AX3 triaxial accelerometer recorded data at 100 Hz with a dynamic range of 8 g. Patients were asked to wear the device continuously. Medical history and relevant clinical information, including imaging, histopathology results and treatment plans, were extracted from patient electronic health records. Patients were withdrawn from the study if they no longer wished to take part, if they experienced an adverse reaction to the device (e.g., skin irritation), if they reached ECOG PS 3 for more than 2 weeks, or if they had radiological evidence of stable disease for more than 6 months
Figure 5.6 shows the trial design. The study seeks to collect multi-modal longitudinal data from patients with primary or secondary brain tumors, from the point of diagnosis or recurrence, for a minimum of 6 months or until patient withdrawal. Standard clinical measures collected include patient and disease demographics, MRI imaging, quality of life and fatigue questionnaires, and treatment-related toxicity. Alongside this, patients wear an AX3 accelerometer on their non-dominant hand for as many hours of the day as they feel comfortable, in accordance with the manufacturer’s guidance. In essence it is a feasibility trial and seeks to establish whether longitudinal wearable data collection is feasible and acceptable in this patient population, i.e. in patients receiving multiple treatment modalities such as chemotherapy, radiotherapy and surgery in a short period of time, and with a potentially limited life span. We hypothesize that changes in physical activity over time are a potentially sensitive marker for progressive disease and events requiring hospitalization, and seek to evaluate whether these data can be used to explain, influence and/or predict health-related outcomes and in turn be translated into a digital biomarker to guide clinical decision making. In designing the study, we set out clear aims and objectives, with an appropriate selection of clinical outcome measures as described above. Table 5.1 shows the path and considerations taken in device selection, which reflect the study objectives, and Fig. 5.7 shows the Axivity AX3 device.
Table 5.1 Considerations taken in device selection, in keeping with the aims and objectives of the study

Considerations: Device and data
Parameters:
• Can measure physical activity, energy expenditure, sleep, temperature
• Battery
• Internal storage
• Connectivity
Explanation for device selection:
• To facilitate reproducibility and transparency in clinical wearable research, and given the sheer volume of configurations possible (sampling rate, sampling resolution and accelerometer range), a device was chosen which can collect raw rather than pre-processed accelerometer data and which, when analyzed, can generate these endpoints
• Temperature monitoring allows wear-time analysis and adherence, and thus meets the requirements of our primary objective of feasibility and acceptability
• Given the longitudinal nature of data collection, adequate battery life and storage was a key consideration. A maximum of one device change every 2 weeks at 100 Hz sampling frequency was required to encourage patient adherence and reduce burden on the research team
• Comfortable and lightweight wristband, compatible with isopropyl alcohol (IPA) wipes
• Prospective data collection design with retrospective data analysis, thus Bluetooth connectivity to enable real-time analysis was not required and met the objective of a feasibility study
• A device which has been used in other wearable clinical research studies and validated for research use with a CE mark as per the Medical Devices Directive 93/42/EEC (MDD) [107]

Considerations: Patient
Parameters:
• Technical requirements by patient
• Device adherence
• Burden assessment
Explanation for device selection:
• We conducted multiple patient and public engagement events to showcase the study design and device. We gathered information on patient preference of body location for longitudinal data collection, burden of the device, physical appearance and technical requirements
• Patients felt the device was acceptable in appearance and comfort and identified the wrist as the ideal location for adherence
• A device without a digital display was preferred by the research team to avoid feedback bias and prolong battery life
• Minimal device handling (e.g., charging, uploading data) by the patient, in view of disease burden and to promote adherence. A device change every two weeks acts as a prompt

Considerations: Site and research management
Parameters:
• Training
• Technical support
• Device management
Explanation for device selection:
• A device was selected which had a simple process for set-up and data download, with a user-friendly GUI appropriate for training multiple members of the research team, and which requires minimal technical support

Considerations: Regulatory requirements
Parameters:
• Ethics and local sponsor approvals

Considerations: Security and privacy
Parameters:
• Passiveness of the device
• Storage
Explanation for device selection:
• A device was selected which did not track and collect location (GPS) data, to protect patient privacy
• Storage of the device data which meets ethics board information governance requirements for secure data storage
Fig. 5.7 Axivity AX3 triaxial accelerometer
5.5.2 Successes and Challenges of Running a Real-World Wearable Accelerometer Study
5.5.2.1 Recruitment and Adherence
Our study opened for recruitment in October 2018. Within our institution, it is the first cancer clinical study incorporating a wearable device, and the novelty factor has brought interest from both staff and patients. In line with our patient and public engagement and the results of published surveys, patients and carers have responded positively to the study concept and feel motivated that wearable and sensing technology may form part of cancer patient monitoring in the future. Recruitment to the study has therefore not been a challenge, and we have observed 70–80% uptake amongst participants approached to take part. There remains concern that older adults are less willing to accept wearable devices as a method of health monitoring; however, a review of 31 studies highlighted that more than 60% of older people were interested in the future utility of wearable technology for improving physical and mental health, but that limited awareness of these products hindered use in this population [108]. Upon recruitment of the first 20 participants, we noted a 35% withdrawal rate within the first 30 days, despite all patients meeting the study inclusion criteria. Reasons cited included not understanding that the device needed to be worn for most of the day or for more than 2 weeks, and finding the device uncomfortable. We sought to improve retention by improving patient education when introducing the study, explaining more clearly the commitment required for longitudinal data collection, and asking patients to trial the device for a few hours whilst attending outpatient clinics. In our high-grade glioma cohort, which makes up the largest subset of BrainWear participants, 83% of accelerometer data recorded has been categorized as high quality, suggesting acceptance in this patient population. Each accelerometer file is categorized as high quality if it contains ≥ 72 h of data in a 7-day data collection and has data in each 1-h period of the 24-h cycle over multiple days; otherwise it is categorized as low quality. Missing data was imputed using the averaged data from similar times of the day. Of the patients who did not withdraw within the first 30 days of consent, 40% have provided more than 6 months of accelerometer data.
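The quality rule and the imputation step described above can be sketched as follows. This is an illustration under stated assumptions rather than the BrainWear analysis code: data are assumed to be summarized on an hourly grid of shape (days, 24) with NaN marking hours without valid wear data, the function names are hypothetical, and ‘data in each 1-h period’ is simplified to at least one valid value per hour of the day.

```python
import numpy as np


def is_high_quality(hourly, min_hours=72):
    """Apply the two criteria to an (n_days, 24) grid with NaN for missing hours."""
    valid = ~np.isnan(np.asarray(hourly, dtype=float))
    enough_data = valid.sum() >= min_hours           # at least 72 h of data
    every_hour_covered = valid.any(axis=0).all()     # each hour of day seen at least once
    return bool(enough_data and every_hour_covered)


def impute_by_time_of_day(hourly):
    """Fill missing hours with the mean of the same hour on the other days."""
    hourly = np.asarray(hourly, dtype=float)
    valid = ~np.isnan(hourly)
    counts = valid.sum(axis=0)
    sums = np.where(valid, hourly, 0.0).sum(axis=0)
    # Hours never observed stay NaN instead of triggering a division by zero.
    hour_means = np.divide(sums, counts, out=np.full(hourly.shape[1], np.nan),
                           where=counts > 0)
    return np.where(valid, hourly, hour_means)


week = np.full((7, 24), 2.0)
week[0, :] = np.nan                                  # device not worn on day 1
week[1, 5], week[2, 5] = 4.0, 6.0
good = is_high_quality(week)
filled = impute_by_time_of_day(week)
```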
5.5.2.2 Device Selection
Of the patients who withdrew from the study, 5% did so because of discomfort or a skin reaction, but most patients feel satisfied with the device and wristband. The research support team have found device set-up and data extraction simple, and minimal training is required. In longitudinal data collection in the clinical setting, it is essential that the expertise of the team is evaluated to ensure that computational skills and the technological requirements of the device are matched. In terms of challenges, the 2-weekly device change and manual data upload is a potential limitation to expanding studies such as this, where data analysis is retrospective. Both the number of devices required to allow for adequate rotation amongst study participants and the time commitment for manual data upload and device set-up at each 2-weekly interval can be cumbersome. In a well-supported research setting this is more feasible, but expansion into larger clinical trials, or use as a more general patient monitoring device, is likely to need a device which can provide near real-time automatic feedback, analysis and incorporation into electronic health records. A recent systematic review of wearables in clinical trials during cancer treatment identified only 2 studies where the planned wear time was > 91 days [78].
5.5.2.3 Data Analysis
There are several open-source packages available for raw triaxial accelerometer data processing, summarized in Table 5.2. The term raw refers to the data being expressed in m/s² or gravitational acceleration units, rather than in the brand-specific units in which earlier-generation accelerometers stored data directly on the device. There remains no widespread agreement on the window length, calibration or data processing method to be used. Some guidance is available on optimizing methodology [74, 109, 110], but each individual study will have specific requirements based on the health parameters being measured. The advantage of using raw acceleration data is that device settings
Table 5.2 Open-source packages available for processing raw triaxial accelerometer data

Package: GGIR
Link: https://cran.r-project.org/web/packages/GGIR/index.html [122–124]
Features:
• R ≥ 3.2.0
• Estimates of physical activity, inactivity and sleep
• Visualized time series data
• Algorithm to detect sleep period time window without sleep diary [116]

Package: UK Biobank
Link: https://github.com/activityMonitoring/biobankAccelerometerAnalysi [73, 125, 126]
Features:
• Python 3.7
• Estimates of physical activity, inactivity and sleep
• Summary metrics
• Statistical machine learning of sleep and physical activity phenotypes

Package: pampro
Link: https://github.com/Thomite/pampro [127]
Features:
• Python 3.7
• Physical activity estimates, summary statistics
• Visualized time series data

Package: MATLAB
Link: https://raw.githubusercontent.com/digitalinteraction/openmovement/master/Software/Analysis/Matlab/CWA_readFile.m

Package: OpenMovement
Link: https://github.com/digitalinteraction/openmovement/tree/master/Software/Analysis/Matlab
Features:
• Windows Command Line or Matlab
• Physical activity estimates, summary statistics
• Visualized time series data
are transparent, and analysis is reproducible even when different data processing methods are used. All of the packages described in Table 5.2 provide estimates of physical activity, whilst the UK Biobank package additionally incorporates activity classification using a two-stage ML model consisting of balanced random forests and hidden Markov models. Using this model on 87,509 UK Biobank participants, the authors found that reallocating time from any behavior to MVPA, or from sedentary behavior to any other behavior, was associated with a lower risk of incident cardiovascular disease (CVD), a finding which can be used to guide public health interventions and guidelines [111]. For short data collection periods, such as the UK Biobank 7-day accelerometer sampling in 100,000 participants [73], energy consumption and battery life are not a priority. When data is collected over a longer period, as in our study (protocol for a minimum of 6 months), the study duration far exceeds the battery lifetime of the sensor. Furthermore, a major challenge in accelerometer-based sleep measurement is to derive sleep parameters without the additional information of patient diaries, which are often seen as cumbersome and prone to reporting bias [112–115]. Van Hees et al. have examined whether sleep parameters can be estimated from accelerometer data in the absence of sleep diaries. They developed a heuristic algorithm which
S. Dadhania and M. Williams
uses the variance in the estimated z-axis angle and incorporates a set of basic assumptions about sleep interruptions. In 3752 participants, the estimated sleep period time (SPT) window was 10.9 and 2.9 min longer than that recorded in the sleep diary in men and women, respectively. The mean concordance statistic for detecting the SPT window against gold-standard polysomnography was 0.86 in 28 clinic-based sleepers and 0.83 in healthy sleepers [116]. There is an expansive body of research on activity classification which is beyond the scope of this chapter (see [86] for a comprehensive review of the application of raw accelerometer data and ML techniques to characterize human movement behavior). Clinical research, however, demands an explainable approach to ML in order to justify its use in clinical decision making for patients. A recent systematic review showed no performance benefit of machine learning over logistic regression for clinical prediction models [117]. More specific to wearable accelerometers, a study comparing linear and non-linear models for predicting energy expenditure (EE) from raw data from accelerometers placed on the hip, thigh and wrists showed that ML models offered a significant improvement in EE prediction accuracy over linear models for wrist-worn accelerometers. Conversely, linear models showed similar EE prediction accuracy to ML models for accelerometers placed on the hip or thigh [118]. There remains no consensus in wearable clinical research as to the optimal data processing methods, calibration, window lengths or sampling frequency, and the choice is often left to the individual requirements of the study. Although some guidance does exist, a recent systematic review assessing the calibration and validation of accelerometry to measure physical activity thresholds in adult clinical groups highlighted significant heterogeneity in the clinical protocols, which hinders the applicability and comparison of the developed cut-points.
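As a rough illustration of the kind of diary-free sleep heuristic described above, the sketch below estimates the z-axis angle from raw triaxial samples (in g) and finds the longest block of epochs in which that angle barely changes. It is a simplified stand-in and not the published algorithm: the function names, the 5° change threshold and the minimum block length are assumptions chosen for illustration only.

```python
import math
from statistics import median


def z_angle_deg(ax, ay, az):
    """Angle of the z-axis relative to the horizontal plane, in degrees."""
    return math.degrees(math.atan2(az, math.hypot(ax, ay)))


def epoch_angles(samples, samples_per_epoch):
    """Median z-angle per epoch from (ax, ay, az) samples expressed in g.

    Trailing samples that do not fill a complete epoch are ignored.
    """
    angles = []
    last_start = len(samples) - samples_per_epoch
    for start in range(0, last_start + 1, samples_per_epoch):
        window = samples[start:start + samples_per_epoch]
        angles.append(median(z_angle_deg(*s) for s in window))
    return angles


def longest_still_block(angles, max_change_deg=5.0, min_epochs=3):
    """Longest run of epochs with little change in z-angle.

    Returns (start, end) epoch indices, end exclusive, of the longest run
    in which consecutive epochs differ by less than max_change_deg, or
    None when no run reaches min_epochs. Both thresholds are illustrative
    assumptions, not validated settings.
    """
    best = None
    run_start = 0
    for i in range(1, len(angles) + 1):
        run_ends = (
            i == len(angles)
            or abs(angles[i] - angles[i - 1]) >= max_change_deg
        )
        if run_ends:
            length = i - run_start
            if length >= min_epochs and (
                best is None or length > best[1] - best[0]
            ):
                best = (run_start, i)
            run_start = i
    return best
```

In practice the epoch length, the change threshold and the rules for bridging short interruptions would all need to be chosen and validated for the study population; the sketch only shows why an orientation-based signal can stand in for a diary.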
The review suggests a mixed protocol containing controlled laboratory exercise tests and activities of daily living, and recommends a statistical approach that allows for adjustments for disease-specific populations, or the use of ML models. It also highlights the need to develop cut-points in clinical populations, since these are likely to differ significantly from the cut-points developed in healthy populations that are currently used in clinical wearable research [109, 119]. Given the effect of disease and aging on body composition and functionality, healthy-population cut-point algorithms are unlikely to be suitable for a cancer population, and use of these estimates may lead to bias in reporting [40, 120, 121]. As the field of wearables in clinical research evolves, it is vital that data is stored in its raw format to facilitate the future application of ML algorithms currently being developed.
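To make the cut-point issue concrete, the following minimal sketch summarizes raw acceleration (in g) into epoch-level values and then applies two different sets of intensity cut-points to the same stored data. The summary metric (Euclidean norm minus one, ENMO), the function names and both sets of thresholds are assumptions for illustration only; they are not validated cut-points for any population.

```python
import math


def enmo_per_epoch(samples, samples_per_epoch):
    """Mean ENMO per epoch, in g, from (x, y, z) samples expressed in g.

    ENMO is the Euclidean norm of the three axes minus 1 g, with negative
    values clipped to zero. Incomplete trailing epochs are ignored.
    """
    epochs = []
    last_start = len(samples) - samples_per_epoch
    for start in range(0, last_start + 1, samples_per_epoch):
        window = samples[start:start + samples_per_epoch]
        values = [
            max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0)
            for x, y, z in window
        ]
        epochs.append(sum(values) / len(values))
    return epochs


def classify_epoch(enmo_g, light_mg, mvpa_mg):
    """Assign one epoch to an intensity band using cut-points in milli-g."""
    mg = enmo_g * 1000.0
    if mg >= mvpa_mg:
        return "MVPA"
    if mg >= light_mg:
        return "light"
    return "sedentary"


def minutes_by_band(epochs_g, epoch_seconds, light_mg, mvpa_mg):
    """Total minutes spent in each intensity band."""
    totals = {"sedentary": 0.0, "light": 0.0, "MVPA": 0.0}
    for value in epochs_g:
        band = classify_epoch(value, light_mg, mvpa_mg)
        totals[band] += epoch_seconds / 60.0
    return totals


# Hypothetical cut-points (milli-g); placeholders, not published values.
GENERAL_POPULATION = {"light_mg": 30, "mvpa_mg": 100}
CLINICAL_POPULATION = {"light_mg": 30, "mvpa_mg": 70}
```

Calling `minutes_by_band` on the same epochs with `**GENERAL_POPULATION` and then with `**CLINICAL_POPULATION` shows how the reported MVPA minutes shift with the threshold alone. Because the raw samples are retained, the summary can be recomputed whenever population-specific cut-points become available.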
5.6 Clinical Studies in Cancer Patients Using Wearable Accelerometers
Cancer remains the leading cause of morbidity and mortality worldwide [128] and there is a growing interest in the role of physical activity as a non-pharmacological intervention and as a cancer prevention strategy [129]. The number of clinical studies
utilizing wearable accelerometers has steadily risen over the past decade. In 2010 a report identified just 10 clinical trials incorporating accelerometers or pedometers into their trial design, with only 2 of the 10 recruiting patients receiving active treatment [130]. A decade later, a 2020 systematic review identified 25 studies using wearables in clinical trials during cancer treatment. Fifteen of the 25 studies were in the three largest tumour groups (breast, lung and gastrointestinal); the oncological treatment was primarily chemotherapy, and only 3 studies were randomized controlled trials. Whilst a large proportion of these studies collected PRO data (17/25), only half of these correlated wearable data with the PRO. The heterogeneity of study outcomes, wearable settings and adherence presented in this review and others [78, 131, 132] highlights the need for more established standards to determine whether wearable data can be utilized, and for more randomized controlled trials to create consensus on how to integrate wearables into oncology studies. Going forward, research in this field requires well-defined outcomes and, where applicable, should be based on the available guidance within the field [133, 134]. Table 5.3 describes in more detail those studies which have compared wearable data to measurable clinical cancer outcomes in patients receiving treatment. In 7 of the 9 studies, step count was collected as a measure of physical activity, despite only one device having evidence of a validation study performed in a cancer population [135]. When this same device was used to assess total daily energy expenditure in patients with COPD, 2 different software versions reported significantly different outputs, underestimating energy expenditure by 12 and 35%.
Even for the widely used ActiGraph devices, where many validation studies have legitimized the use of actigraphy in clinical trials, few studies have performed a direct comparison of the different commercially available ActiGraphs with gold-standard polysomnography (PSG). ActiGraph devices vary in both hardware and software, yet they are assumed to produce similar data and have been used interchangeably, and no standard has been defined. One of the main issues remains that researchers and clinicians who have an interest in incorporating wearable accelerometers as a method of remote monitoring are inhibited by the lack of validated activity monitors in cancer populations, where sedentary behavior is more likely [136]. Although many of these devices are marketed as being clinically validated, there are currently no standards to ensure that the data from digital medicine tools is evaluated and appropriate for clinical use. Goldsack et al. have developed a three-step framework which evaluates and documents the clinical usefulness of digital medicine tools and standardizes the assessment of smartwatches and other devices that collect biometric data. They argue that the current terminology in the digital wearable space is not aligned and that a consensus approach is needed to evaluate the quality of digital medicine products. Furthermore, the rapid rise in the development of, and demand for, medical digital devices to support clinical practice has left a knowledge gap in developing and evaluating the body of evidence systematically, with the potential for misinterpretation of data, misleading clinical trials and potentially patient harm if not addressed. The
framework includes (1) verification, (2) analytical validation, and (3) clinical validation and attempts to determine whether a digital medicine tool is fit for purpose [134].

Table 5.3 Studies which have compared patient wearable data to measurable clinical cancer outcomes whilst receiving treatment. The table shows the devices selected at each study, and further evidence for validation studies performed on the specific device

Author: Nyrop et al. [138]
Device/software: FitBit Zip / not given
Device outcome: Step count (steps/day)
Patient tumour type: Breast
Sample size, n: 42
Age group (range or mean [SD]): 24–64
Length of wearable capture / valid wear time: 6–12 weeks / ≥ 3 weeks
Literature supporting device validation / participant group: Tully et al. [139] / HVa; Schneider et al. [140] / HVa; St-Laurent et al. [141] / HVb; Sharp et al. [142] / HVc
Treatment: Chemotherapy

Author: Vassbakk-Brovold et al. [143]
Device/software: SenseWear Armband Pro3 or SenseWear Armband Mini / SenseWear version 6.1 for Pro3 and version 7.0 for Mini
Device outcome: Physical activity in minutes/week in 1-min epochs
Patient tumour type: Mixed
Sample size, n: 66
Age group (range or mean [SD]): 59 (11)
Length of wearable capture / valid wear time: 5 days / ≥ 19.2 h, for ≥ 1 day
Literature supporting device validation / participant group: Lopez et al. [144] / HVac; Bhammar et al. [145] / HVa; Cereda et al. [135] / DPa – cancer; Farooqi et al. [146] / DPa – COPD; Hermann et al. [147] / DPa – OA hip
Treatment: Chemotherapy

Author: Lowe et al. [148]
Device/software: ActivPAL / not given
Device outcome: Energy expenditure (MET) h/day; Step count; Position time (hours/day)
Patient tumour type: Mixed
Sample size, n: 31
Age group (range or mean [SD]): 63.5 (10.4)
Length of wearable capture / valid wear time: 7 days / not stated
Literature supporting device validation / participant group: Skipworth et al. [149] / DPa – cancer; Bourke et al. [150] / HVa – older adults; Kozey-Keadle [151] / HVa – inactive; Sellars et al. [152] / HVa; Grant et al. [153] / HVa
Treatment: Whole brain radiotherapy

Author: Gupta et al. [154]
Device/software: FitBit Flex / not given
Device outcome: Step count (steps/day); Physical activity (sedentary minutes/day); Sleep (minutes)
Patient tumour type: Mixed
Sample size, n: 24
Age group (range or mean [SD]): 54 (12.5)
Length of wearable capture / valid wear time: 12 weeks / ≥ 1 steps/day recorded
Literature supporting device validation / participant group: St-Laurent et al. [141] / HVa; Alharbi et al. [155] / DPa – CVD; Burton et al. [156] / HVa – older adults
Treatment: Systemic therapy

Author: Edbrooke et al. [157]
Device/software: SenseWear accelerometer / not reported
Device outcome: Step count (steps/day); Number of 10+ minutes step bouts/day; Duration of 10+ minutes bouts; Cadence of 10+ minutes bouts (steps/min)
Patient tumour type: Lung
Sample size, n: 92
Age group (range or mean [SD]): 63 (12.3)
Length of wearable capture / valid wear time: 7 days at 3 timepoints / 8 h per day for ≥ 4 days
Literature supporting device validation / participant group: As above
Treatment: Mixed

Author: Broderick et al. [158]
Device/software: Microsoft Band 2 / not reported
Device outcome: Step count (steps/day); Heart rate; Calories (calories/hour)
Patient tumour type: Mixed
Sample size, n: 42
Age group (range or mean [SD]): 24–72
Length of wearable capture / valid wear time: 60 days / ≥ 6 h/day
Literature supporting device validation / participant group: Pope et al. [159] / HVa
Treatment: Chemotherapy

Author: Low et al. [160]
Device/software: FitBit Charge HR / not given
Device outcome: Step count (steps/day); Floors climbed (n); Sleep (minutes); Awakenings (n); Time in bed (minutes)
Patient tumour type: Gastrointestinal
Sample size, n: 14
Age group (range or mean [SD]): 40–74
Length of wearable capture / valid wear time: 4 weeks / not stated
Literature supporting device validation / participant group: Burton et al. [156] / HVa – older adults; Gorny et al. [161] / HVa; Jo et al. [162] / HVa
Treatment: Chemotherapy

Author: Roscoe [163]
Device/software: Mini-Motionlogger ActiGraph / Action 3
Device outcome: Circadian consistency; Daytime activity level (minutes); Sleep (%)
Patient tumour type: Breast
Sample size, n: 102
Age group (range or mean [SD]): 34–79
Length of wearable capture / valid wear time: 72 h at 2 timepoints / not stated
Literature supporting device validation / participant group: Cole et al. [164] / HV & DP – sleep disorders; Ancoli-Israel et al. [165] – review paper, sleep and circadian rhythm; Van de Water et al. [166] – systematic review, sleep
Treatment: Chemotherapy ± radiotherapy

Author: Wright [167]
Device/software: FitBit Zip and FitBit Charge 2 / Fitabase
Device outcome: Step count (steps/day); Heart rate
Patient tumour type: Gynaecological
Sample size, n: 10
Age group (range or mean [SD]): 60 (11)
Length of wearable capture / valid wear time: 30 days / ≥ 4 days/week
Literature supporting device validation / participant group: As above
Treatment: Chemotherapy

Wearable data outcome (top row: PA, circadian rhythm, skin temp, sleep) reporting a relationship to clinical outcomes collected by questionnaire (symptoms, QOL, mental health, fatigue, PA, sleep): [per-study entries for this part of the table are not recoverable from the extracted layout]

a = adults; b = pregnant women; c = children. HV = healthy volunteers; DP = diseased patient; COPD = chronic obstructive pulmonary disease; OA = osteoarthritis; CVD = cardiovascular disease
5.7 Ethical Issues with Wearable Accelerometer Data
5.7.1 Data Privacy and Security
Data privacy and security remain important considerations when incorporating wearable accelerometers into clinical studies and the clinic environment. All stakeholders involved in the implementation of these devices require assurance that patient data is linked to the correct user of the device in electronic health records and is provided by the patients themselves. A report analysing fitness tracker privacy and security found that several fitness applications connected to wearable devices are exposed to security vulnerabilities that enable unauthorized third parties to read, write and delete user data, which can leave users' personal and geolocation data exposed [167]. The FDA and other agencies have produced guidance on the need for robust security procedures and policies and on the requirements for device manufacturers, including end-to-end encryption and auditing/logging of wearable accelerometer data [168, 169]. Implementing wearables in clinical studies will require adherence to these and other national cybersecurity policies.
5.7.2 Data Ownership
Whilst users expect that their wearable data will remain both private and anonymous, there are concerns about the ownership of wearable data and about data being sold onward to health analytics companies. Users of commercial wearable devices often do not own their data and instead see only a fraction of it, presented as activity summaries or counts. Clinical research studies which utilize commercial wearable devices are thus exposed to the same data ownership challenges, and research teams have access only to physical activity summaries produced by proprietary algorithms. A move towards raw data collection in clinical studies provides the opportunity to build large accelerometer datasets in specific patient groups and increases transparency in clinical research, whilst future-proofing the data for use with models that are not yet fully developed. Raw data can additionally be held securely in the research institution, which is likely to have a high level of security and privacy governance in place. Patient and public involvement sessions carried out with participants with chronic disease showed that people were generally supportive of wearable research, and trusted university researchers to handle and protect device data and treat it confidentially [170].
5.7.3 Insurance Premiums
In the United States, many insurance companies are encouraging policy holders to use wearable devices, which has generated some concern about personal health data being used to manipulate insurance premiums. There remain concerns about unvalidated algorithms producing inaccurate activity counts, and about the high cost of devices, which can worsen health care disparities [171].
5.8 Conclusion
The digital revolution refers to the opportunity of collecting, transmitting and analyzing large quantities of digital data related to health and medical conditions with the aid of monitoring devices [172]. The intermittent monitoring which currently takes place during clinical visits provides only a snapshot of a patient's health. Data collected from wearable accelerometers, forming part of patient-generated health data, provides an unparalleled opportunity to monitor and track an individual's cancer experience, whilst engaging patients in their care and moving towards a true learning healthcare system for cancer care. Several challenges remain in unlocking the potential of wearable healthcare data, including concerns from providers, workflow issues, standardization of patient-generated health data, security and privacy concerns, interoperability of sensor-containing devices and limited electronic health record functionality for incorporating large volumes of data. Translating large volumes of wearable accelerometer PA data into clinically useful summaries requires a standardized database and data-processing algorithms. Existing commercial products have brand-specific web-based applications, but for the purpose of health research a device- or brand-agnostic solution is required for data standardization. There are several areas where standardization is required, including establishing valid daily wear-time protocols, the handling of missing or spurious data, and the data-handling process, for which many studies have not reported their decision rules [173]. A thorough assessment of use patterns is required before any of the above can be standardized, as they have a direct impact on validation of the data. Furthermore, the endpoints of each study will need to be taken into consideration, as studies focusing on population-level versus individual-level data will need different outcomes to be established.
The former may choose to incorporate less restrictive criteria to facilitate recruitment and retention, whilst the latter may take a stricter approach which considers intra-individual variation. Establishing which patient groups will take part in wearable accelerometer research will also give us the opportunity to gather information on those patients who do not wish to take part, e.g., the elderly, those who are less technologically skilled or patients requiring carer input. In our experience of running a wearable accelerometer clinical study, age does not appear to be a limiting factor in participation, with 50% of recruits above the age of 60. What is less clear is what technological expectations clinical and research teams will have of their study population
e.g., access to a smartphone, device charging, data uploading and enabling Bluetooth connectivity, although this may become less of an issue as increasing adoption of smartphones across all demographics makes mobile phone platforms more accessible. It is well documented that cancer survivors engage in only a few minutes of MVPA and spend a large proportion of their day sedentary [174]. Few studies have examined the reliability and validity of accelerometers for measuring sedentary behavior in older adults, and studies to date suggest that the criteria researchers currently use for classifying an epoch as sedentary instead of as non-wear time may need to be different for older versus younger adults. Cancer remains a disease of the elderly, and thus incorrectly identifying and classifying behaviors in those who have cautious movement patterns may lead to concerns about the accuracy, and thus the utility, of wearable accelerometer data [175]. Commercial physical activity monitors are being rapidly adopted by patients, clinicians and researchers and certainly offer great potential for quantifying physical activity in cancer patients more accurately. However, previous work on quantified-self data has been based upon older-generation research-grade accelerometers, which have paved the way for consumer-based wearable devices to be validated as feasible in patients with cancer. Further work is required to understand what parameters these devices are best suited to measuring in a non-healthy population before broader implementation can be supported. Although the increased granularity of raw accelerometer data makes it more challenging to use than the summarized data provided by commercial devices, it has the potential to identify additional information which can easily be overlooked when taking minute-, hour- or day-level summaries.
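The distinction drawn above between sedentary and non-wear epochs can be sketched as a simple rule: a sufficiently long run of near-motionless minutes is treated as non-wear, while shorter still periods count as sedentary wear time. The sketch below is a minimal illustration under stated assumptions; the per-minute variability input, the 0.013 g threshold, the run lengths and the function names are all hypothetical choices rather than validated settings.

```python
def flag_non_wear(minute_sd_g, sd_threshold_g=0.013, min_run_minutes=60):
    """Flag minutes that belong to a long run of near-zero movement.

    minute_sd_g: per-minute standard deviation of acceleration, in g.
    A run of consecutive minutes below sd_threshold_g is flagged as
    non-wear only if it lasts at least min_run_minutes; shorter still
    periods are left unflagged and so count as (sedentary) wear time.
    """
    flags = [False] * len(minute_sd_g)
    run_start = None
    # A sentinel value closes any run that reaches the end of the record.
    for i, sd in enumerate(list(minute_sd_g) + [float("inf")]):
        if sd < sd_threshold_g:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run_minutes:
                for j in range(run_start, i):
                    flags[j] = True
            run_start = None
    return flags


def is_valid_day(non_wear_flags, min_wear_hours=8.0):
    """Apply a valid wear-time rule to one day of per-minute flags."""
    wear_minutes = sum(1 for flagged in non_wear_flags if not flagged)
    return wear_minutes / 60.0 >= min_wear_hours
```

With a 90-minute motionless block, a 60-minute rule labels the block as non-wear whereas a 120-minute rule keeps it as sedentary wear time, which in turn changes whether the day meets a wear-time criterion. This is the kind of decision rule that may need to differ between older and younger adults and that studies should report explicitly.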
A shortage of skilled healthcare professionals with the required information and communication technology skills remains a challenge due to the fast pace of technological innovation. The handling of raw accelerometer data requires specialized visualization and analytical methods, and there is an ever-increasing role for healthcare data analysts competent in handling large amounts of patient-generated health data whilst being able to grasp the clinical aspects of the studies incorporating these devices. Karas et al. summarized a number of key points when handling raw accelerometer data: (1) devices which produce activity counts, which are summaries of the raw data, will differ according to the device manufacturer, software versioning and body location; (2) open-source packages which produce summary statistics are more widely available and summarized in Table 5.2; (3) both raw and summarized PA data can vary with the location of the device, along with intra- and inter-individual variability; (4) correct choice of device location may identify data signatures tailored to a particular study purpose or outcome; (5) device orientation can change during the study period and ideally needs to be standardized both within and between individuals and devices; (6) altering the sampling frequency can affect both raw data and PA summaries; (7) correctly labelling data at the sub-second level is crucial for training human activity movement classifiers [67]. Further advancements in PA measurement methodology are needed at both the group and individual level and will aid the implementation of PA as a treatment modality in future health to counteract the expected rise in lifestyle-related diseases
including cancer. When planning new studies incorporating wearable accelerometers, it is recommended to plan a methodology which demonstrates high precision and accuracy at the individual and study-group level for the chosen body placement, to identify analytical methods which embody the complete behavioral pattern, to follow a strict measurement protocol and to establish collaboration across disciplines to harness expertise in data processing, statistical and ML methods and clinical and epidemiological PA research. Given the dynamic changes which occur in physical functioning with aging and disease, application of general population algorithms may not be appropriate for cancer populations. It is recommended that studies incorporating wearable accelerometers store and analyze the data in its raw format, as counts or raw g's, to preserve data integrity, as the rapid evolution of research in this field warrants availability of raw data for the future application of ML algorithms currently under development.
Acknowledgements Many thanks to David Taylor and Scott Small for their comments and time when reviewing this chapter.
References
1. F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA. Cancer J. Clin. 68(6), 394–424 (2018). https://doi.org/10.3322/caac.21492
2. A. Mctiernan et al., Physical activity in cancer prevention and survival: a systematic review. Med. Sci. Sports Exerc. 51(6), 1252–1261 (2019). https://doi.org/10.1249/MSS.0000000000001937
3. A.V. Patel et al., American college of sports medicine roundtable report on physical activity, sedentary behavior, and cancer prevention and control. Med. Sci. Sports Exerc. 51(11), 2391–2402 (2019). https://doi.org/10.1249/MSS.0000000000002117
4. L.F.M. de Rezende et al., Physical activity and cancer: an umbrella review of the literature including 22 major anatomical sites and 770 000 cancer cases. Br. J. Sports Med. 52(13), 826–833 (2018). https://doi.org/10.1136/bjsports-2017-098391
5. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims: draft guidance. Health Qual. Life Outcomes 4 (2006). https://doi.org/10.1186/1477-7525-4-79
6. T.A. Albrecht, A.G. Taylor, Physical activity in patients with advanced-stage cancer: a systematic review of the literature. Clin. Oncol. Nurs. 16(3), 293–300 (2012). https://doi.org/10.1188/12.CJON.293-300
7. R. Beaton, W. Pagdin-Friesen, C. Robertson, C. Vigar, H. Watson, S.R. Harris, Effects of exercise intervention on persons with metastatic cancer: a systematic review. Physiotherapy Can. 61(3), 141–153 (2009). https://doi.org/10.3138/physio.61.3.141
8. A. Ravizza, C. De Maria, L. Di Pietro, F. Sternini, A.L. Audenino, C. Bignardi, Comprehensive review on current and future regulatory requirements on wearable sensors in preclinical and clinical testing. Frontiers in Bioengineering and Biotechnology 7, 313 (2019). https://doi.org/10.3389/fbioe.2019.00313
9. TVSCN commissioning guidance. Available https://commissioninguidance.tvscn.nhs.uk/wpcontent/uploads/2016/03/Cancer.pdf. Accessed 12 Jan 2021
10. Cancer Pathway. Available https://datadictionary.nhs.uk/nhs_business_definitions/cancer_pathway.html. Accessed 12 Jan 2021
11. Definition of overall survival, NCI Dictionary of Cancer Terms, National Cancer Institute. Available https://www.cancer.gov/publications/dictionaries/cancer-terms/def/overallsurvival. Accessed 17 Feb 2021
12. A. Baker, Book: crossing the quality chasm: a new health system for the 21st century. BMJ 323(7322), 1192–1192 (2001). https://doi.org/10.1136/bmj.323.7322.1192
13. A. Oliver, C.C. Greenberg, Measuring outcomes in oncology treatment: the importance of patient-centered outcomes. Surg. Clin. North Am. 89(1), 17–25 (2009). https://doi.org/10.1016/j.suc.2008.09.015
14. Understanding Health Care Outcomes Research, Google Books. Available https://books.google.co.uk/books?hl=en&lr=&id=E1QIafhAlj8C&oi=fnd&pg=PP1&ots=JnNxl2vJ-8&sig=a9rLyyK4VBOUP5afOvkPU0du5gQ&redir_esc=y#v=onepage&q&f=false. Accessed 06 Apr 2021
15. N.F. Butte, U. Ekelund, K.R. Westerterp, Assessing physical activity using wearable monitors: measures of physical activity. Med. Sci. Sports Exerc. 44(SUPPL), 1 (2012). https://doi.org/10.1249/MSS.0b013e3182399c0e
16. M.M. Oken et al., Toxicity and response criteria of the Eastern Cooperative Oncology Group. Am. J. Clin. Oncol. (1982). https://doi.org/10.1097/00000421-198212000-00014
17. D.A. Karnofsky, J.H. Burchenal, The clinical evaluation of chemotherapeutic agents in cancer. Evaluation of chemotherapeutic agents, ed. by C.M. MacLeod (Columbia University Press, New York, 1949), pp. 191–205
18. D.H. Henry, H.N. Viswanathan, E.P. Elkin, S. Traina, S. Wade, D. Cella, Symptoms and treatment burden associated with cancer treatment: results from a cross-sectional national survey in the U.S. Support. Care Cancer 16(7), 791–801 (2008). https://doi.org/10.1007/s00520-007-0380-2
19. W.T. Riley et al., Patient-reported outcomes measurement information system (PROMIS) domain names and definitions revisions: further evaluation of content validity in IRT-derived item banks. Qual. Life Res. 19(9), 1311–1321 (2010). https://doi.org/10.1007/s11136-0109694-5
20. N.K. Aaronson et al., The European organization for research and treatment of cancer QLQ-C30: a quality-of-life instrument for use in international clinical trials in oncology. JNCI J. Natl. Cancer Inst. 85(5), 365–376 (1993). https://doi.org/10.1093/jnci/85.5.365
21. D.F. Cella et al., The functional assessment of cancer therapy scale: development and validation of the general measure. J. Clin. Oncol. 11(3), 570–579 (1993). https://doi.org/10.1200/JCO.1993.11.3.570
22. The MOS 36-item short-form health survey (SF-36). I. Conceptual framework and item selection. PubMed. Available https://pubmed.ncbi.nlm.nih.gov/1593914/. Accessed 13 Jan 2021
23. G. Gresham et al., Wearable activity monitors to assess performance status and predict clinical outcomes in advanced cancer patients. npj Digit. Med. 1(1), 27 (2018). https://doi.org/10.1038/s41746-018-0032-6
24. A.E. Taylor, I.N. Olver, T. Sivanthan, M. Chi, C. Purnell, Observer error in grading performance status in cancer patients. Support. Care Cancer 7(5), 332–335 (1999). https://doi.org/10.1007/s005200050271
25. N.A. Christakis, E.B. Lamont, Extent and determinants of error in doctors' prognoses in terminally ill patients: prospective cohort study. Br. Med. J. 320(7233), 469–472 (2000). https://doi.org/10.1136/bmj.320.7233.469
26. J.A. Schrack, G. Gresham, A.A. Wanigatunga, Understanding physical activity in cancer patients and survivors: new methodology, new challenges, and new opportunities (2017). https://doi.org/10.1101/mcs.a001933
27. D. Kyte, J. Ives, H. Draper, M. Calvert, Current practices in patient-reported outcome (PRO) data collection in clinical trials: a cross-sectional survey of UK trial staff and management. BMJ Open 6(10), e012281 (2016). https://doi.org/10.1136/bmjopen-2016-012281
28. J. Chen, L. Ou, S.J. Hollis, A systematic review of the impact of routine collection of patient reported outcome measures on patients, providers and health organisations in an oncologic setting. BMC Health Ser. Res. 13(1), 1–24 (2013). https://doi.org/10.1186/1472-6963-13-211
29. C. Quinten et al., Baseline quality of life as a prognostic indicator of survival: a meta-analysis of individual patient data from EORTC clinical trials. Lancet Oncol. 10(9), 865–871 (2009). https://doi.org/10.1016/S1470-2045(09)70200-1
30. C.C. Gotay, C.T. Kawamoto, A. Bottomley, F. Efficace, The prognostic significance of patient-reported outcomes in cancer clinical trials. J. Clin. Oncol. 26(8), 1355–1363 (2008). https://doi.org/10.1200/JCO.2007.13.3439
31. E. Basch et al., Symptom monitoring with patient-reported outcomes during routine cancer treatment: a randomized controlled trial. J. Clin. Oncol. 34(6), 557–565 (2016). https://doi.org/10.1200/JCO.2015.63.0830
32. C.E. Matthews, S.C. Moore, S.M. George, J. Sampson, H.R. Bowles, Improving self-reports of active and sedentary behaviors in large epidemiologic studies. Exerc. Sport Sci. Rev. 40(3), 118–126 (2012). https://doi.org/10.1097/JES.0b013e31825b34a0
33. L. Thorsen, E. Skovlund, S.B. Strømme, K. Hornslien, A.A. Dahl, S.D. Fosså, Effectiveness of physical activity on cardiorespiratory fitness and health-related quality of life in young and middle-aged cancer patients shortly after chemotherapy. J. Clin. Oncol. 23(10), 2378–2388 (2005). https://doi.org/10.1200/JCO.2005.04.106
34. S.S. Lowe, Physical activity and palliative cancer care. Recent Results Cancer Res. 186, 349–365 (2011). https://doi.org/10.1007/978-3-642-04231-7_15
35. N.Y. Arnardottir et al., Objective measurements of daily physical activity patterns and sedentary behaviour in older adults: Age, Gene/Environment Susceptibility-Reykjavik Study. Age Ageing 42(2), 222–229 (2013). https://doi.org/10.1093/ageing/afs160
36. K.R. Martin et al., Changes in daily activity patterns with age in U.S. men and women: national health and nutrition examination survey 2003–04 and 2005–06. J. Am. Geriatr. Soc. 62(7), 1263–1271 (2014). https://doi.org/10.1111/jgs.12893
37. J. Barker et al., Physical activity of UK adults with chronic disease: cross-sectional analysis of accelerometer-measured physical activity in 96 706 UK Biobank participants. Int. J. Epidemiol. 48(4), 1167–1174 (2019). https://doi.org/10.1093/ije/dyy294
38. J.A. Schrack, G. Gresham, A.A. Wanigatunga, Understanding physical activity in cancer patients and survivors: new methodology, new challenges, and new opportunities. Cold Spring Harb. Mol. Case Stud. 3(4), a00193 (2017). https://doi.org/10.1101/mcs.a001933
39. B.E. Ainsworth et al., Compendium of physical activities: classification of energy costs of human physical activities. Med. Sci. Sports Exerc. 25(1), 71–74 (1993). https://doi.org/10.1249/00005768-199301000-00011
40. J.A. Schrack et al., Assessing the physical cliff: detailed quantification of age-related differences in daily patterns of physical activity. J. Gerontol. Ser. A Biol. Sci. Med. Sci. 69(8), 973–979 (2014). https://doi.org/10.1093/gerona/glt199
41. R.P. Troiano, D. Berrigan, K.W. Dodd, L.C. Mâsse, T. Tilert, M. Mcdowell, Physical activity in the United States measured by accelerometer. Med. Sci. Sports Exerc. 40(1), 181–188 (2008). https://doi.org/10.1249/mss.0b013e31815a51b3
42. D.E.R. Warburton, C.W. Nicol, S.S.D. Bredin, Health benefits of physical activity: the evidence. CMAJ 174(6), 801–809 (2006). https://doi.org/10.1503/cmaj.051351
43. J.A. Schrack, G. Gresham, A.A. Wanigatunga, Understanding physical activity in cancer patients and survivors: new methodology, new challenges, and new opportunities. Cold Spring Harb. Mol. Case Stud. 3(4) (2017). https://doi.org/10.1101/mcs.a001933
44. C.M. Friedenreich, Q. Wang, H.K. Neilson, K.A. Kopciuk, S.E. McGregor, K.S. Courneya, Physical activity and survival after prostate cancer. Eur. Urol. 70(4), 576–585 (2016). https://doi.org/10.1016/j.eururo.2015.12.032
45. C.M. Dieli-Conwright, K. Lee, J.L. Kiwata, Reducing the risk of breast cancer recurrence: an evaluation of the effects and mechanisms of diet and exercise. Curr. Breast Cancer Rep. 8(3), 139–150 (2016). https://doi.org/10.1007/s12609-016-0218-3
46. J.C. Brown, K. Winters-Stone, A. Lee, K.H. Schmitz, Cancer, physical activity, and exercise. Compr. Phys. 2(4), 2775–2809 (2012). https://doi.org/10.1002/cphy.c120005
5 Wearable Accelerometers in Cancer Patients
141
47. A.L. Hawkes, K.I. Pakenham, S.K. Chambers, T.A. Patrao, K.S. Courneya, Effects of a multiple health behavior change intervention for colorectal cancer survivors on psychosocial outcomes and quality of life: a randomized controlled trial. Ann. Behav. Med. 48(3), 359–370 (2014). https://doi.org/10.1007/s12160-014-9610-2 48. J. Hamer, E. Warner, Lifestyle modifications for patients with breast cancer to improve prognosis and optimize overall health. CMAJ 189(7), E268–E274 (2017). https://doi.org/10.1503/ cmaj.160464 49. J.A. Meyerhardt et al., Impact of physical activity on cancer recurrence and survival in patients with stage III colon cancer: findings from CALGB 89803. J. Clin. Oncol. 24(22), 3535–3541 (2006). https://doi.org/10.1200/JCO.2006.06.0863 50. M. Maddocks, A. Byrne, C.D. Johnson, R.H. Wilson, K.C.H. Fearon, A. Wilcock, Physical activity level as an outcome measure for use in cancer cachexia trials: a feasibility study. Support. Care Cancer 18(12), 1539–1544 (2010). https://doi.org/10.1007/s00520-009-0776-2 51. S.N. Garland et al., Sleeping well with cancer: a systematic review of cognitive behavioral therapy for insomnia in cancer patients. Neuropsychiatric Dis. Treat. 10, 1113–1123 (2014). https://doi.org/10.2147/NDT.S47790 52. M.S. Jeon, H.M. Dhillon, M.R. Agar, Sleep disturbance of adults with a brain tumor and their family caregivers: a systematic review. Neuro. Oncol. 19(8), 1035–1046 (2017). https://doi. org/10.1093/neuonc/nox019 53. C.A. Engstrom, R.A. Strohl, L. Rose, L. Lewandowski, M.E. Stefanek, Sleep alterations in cancer patients. Cancer Nurs. 22(2), 143–148 (1999). https://doi.org/10.1097/00002820-199 904000-00006 54. D. Howell et al., Sleep disturbance in adults with cancer: a systematic review of evidence for best practices in assessment and management for clinical practice. Ann. Oncol. 25(4), 791–800 (2014). https://doi.org/10.1093/annonc/mdt506 55. S. Faithfull, M. 
Brada, Somnolence syndrome in adults following cranial irradiation for primary brain tumours. Clin. Oncol. 10(4), 250–254 (1998). https://doi.org/10.1016/S09366555(98)80011-3 56. Z. Chen et al., Deregulated expression of the clock genes in gliomas. Technol. Cancer Res. Treat. 12(1), 91–97 (2013). https://doi.org/10.7785/tcrt.2012.500250 57. Validation of an accelerometer to quantify a comprehensive battery of gait characteristics in healthy older adults and Parkinson’s disease: toward clinical and at home use | enhanced reader. Available chrome-extension://dagcmkpagjlhakfdhnbomgmjdpkdklff/enhancedreader.html?pdf=https%3A%2F%2Fbrxt.mendeley.com%2Fdocument%2Fcontent%2Fec4e4ba1– 22d8–3de0-bed0-d3363013d97e. Accessed 01 Feb 2021 58. A. Godfrey, S. Del Din, G. Barry, J.C. Mathers, L. Rochester, Instrumenting gait with an accelerometer: a system and algorithm examination. Med. Eng. Phys. 37(4), 400–407 (2015). https://doi.org/10.1016/j.medengphy.2015.02.003 59. F.A. Storm, B.W. Heller, C. Mazzà, Step detection and activity recognition accuracy of seven physical activity monitors. PLoS ONE 10(3), e0118723 (2015). https://doi.org/10.1371/jou rnal.pone.0118723 60. K.R. Evenson, M.M. Goto, R.D. Furberg, Systematic review of the validity and reliability of consumer-wearable activity trackers. Int. J. Behav. Nutr. Phys. Act. 12(1), 159 (2015). https:// doi.org/10.1186/s12966-015-0314-1 61. J. Verghese, C. Wang, R.B. Lipton, R. Holtzer, X. Xue, Quantitative gait dysfunction and risk of cognitive decline and dementia. J. Neurol. Neurosurg. Psychiatry 78(9), 929–935 (2007). https://doi.org/10.1136/jnnp.2006.106914 62. S. Del Din et al., Analysis of free-living gait in older adults with and without Parkinson’s disease and with and without a history of falls: identifying generic and disease-specific characteristics. J. Gerontol.—Ser. A Biol. Sci. Med. Sci. 74(4), 500–506 (2019). https://doi.org/ 10.1093/gerona/glx254 63. A. Hickey, S. Del Din, L. Rochester, A. 
Godfrey, Detecting free-living steps and walking bouts: validating an algorithm for macro gait analysis. Physiol. Meas. 38(1), N1–N15 (2017). https://doi.org/10.1088/1361-6579/38/1/N1
142
S. Dadhania and M. Williams
64. T. Fojo, S. Mailankody, A. Lo, Unintended consequences of expensive cancer therapeutics— the pursuit of marginal indications and a me-too mentality that stifles innovation and creativity: The John Conley lecture. JAMA Otolaryngol.—Head Neck Surg. 140(12), 1225–1236 (2014). https://doi.org/10.1001/jamaoto.2014.1570 65. S.I. Mishra, R.W. Scherer, C. Snyder, P.M. Geigle, D.R. Berlanstein, O. Topaloglu, Exercise interventions on health-related quality of life for people with cancer during active treatment, 2012(8), CD008465 66. A. Haslam, D. Herrera-Perez, J. Gill, V. Prasad, Patient experience captured by quality-of-life measurement in oncology clinical trials. JAMA Netw. Open 3(3), e200363 (2020). https:// doi.org/10.1001/jamanetworkopen.2020.0363 67. M. Karas et al., Accelerometry data in health research: challenges and opportunities Rev. Examples. https://doi.org/10.1101/276154 68. V.T. van Hees et al., Separating movement and gravity components in an acceleration signal and implications for the assessment of human daily physical activity. PLoS One 8(4) (2013). https://doi.org/10.1371/journal.pone.0061691 69. K. Bakrania et al., Intensity thresholds on raw acceleration data: euclidean norm minus one (ENMO) and mean amplitude deviation (MAD) approaches. PLoS ONE 11(10), e0164045 (2016). https://doi.org/10.1371/journal.pone.0164045 70. E. Aadland, L.B. Andersen, S.A. Anderssen, G.K. Resaland, O.M. Kvalheim, Associations of volumes and patterns of physical activity with metabolic health in children: a multivariate pattern analysis approach. Prev. Med. (Baltim) 115, 12–18 (2018). https://doi.org/10.1016/j. ypmed.2018.08.001 71. UK Biobank—UK Biobank. Available https://www.ukbiobank.ac.uk/. Accessed 06 Apr 2021 72. NHANES—National Health and Nutrition Examination Survey Homepage. Available https://www.cdc.gov/nchs/nhanes/index.htm?CDC_AA_refVal=https%3A%2F%2F www.cdc.gov%2Fnchs%2Fnhanes.htm. Accessed 06 Apr 2021 73. A. 
Doherty et al., Large scale population assessment of physical activity using wrist worn accelerometers: The UK Biobank Study. https://doi.org/10.1371/journal.pone.0169649 74. J.H. Migueles et al., Accelerometer data collection and processing criteria to assess physical activity and other outcomes: a systematic review and practical considerations. Sports Med. 47(9), 1821–1845 (2017). https://doi.org/10.1007/s40279-017-0716-0 75. J.J. Reilly, V. Penpraze, J. Hislop, G. Davies, S. Grant, J.Y. Paton, Objective measurement of physical activity and sedentary behaviour: review with new data. Arch Dis Child 93(7), 614–619 (2008). https://doi.org/10.1136/adc.2007.133272 76. S. Vale, R. Santos, P. Silva, L. Soares-Miranda, J. Mota, Preschool children physical activity measurement: importance of epoch length choice. Pediatr. Exerc. Sci. 21(4), 413–420 (2009). https://doi.org/10.1123/pes.21.4.413 77. T. Sanders, D.P. Cliff, C. Lonsdale, Measuring adolescent boys’ physical activity: bout length and the influence of accelerometer epoch length. PLoS ONE 9(3), e92040 (2014). https://doi. org/10.1371/journal.pone.0092040 78. U. L. Beauchamp, H. Pappot, C. Holländer-Mieritz, The use of wearables in clinical trials during cancer treatment: systematic review. JMIR mHealth uHealth, 8(1) (2020). https://doi. org/10.2196/22006 79. R.P. Troiano, J.J. McClain, R.J. Brychta, K.Y. Chen, Evolution of accelerometer methods for physical activity research. Br. J. Sports Med. 48(13), 1019–1023 (2014). https://doi.org/10. 1136/bjsports-2014-093546 80. G. McLellan, R. Arthur, D.S. Buchan, Wear compliance, sedentary behaviour and activity in free-living children from hip-and wrist-mounted ActiGraph GT3X+ accelerometers. J. Sports Sci. 36(21), 2424–2430 (2018). https://doi.org/10.1080/02640414.2018.1461322 81. S.J. Fairclough, R. Noonan, A.V. Rowlands, V. Van Hees, Z. Knowles, L.M. Boddy, Wear compliance and activity in children wearing wrist- and hip-mounted accelerometers. Med. Sci. Sports Exerc. 
48(2), 245–253 (2016). https://doi.org/10.1249/MSS.0000000000000771 82. J.J. Scott, A.V. Rowlands, D.P. Cliff, P.J. Morgan, R.C. Plotnikoff, D.R. Lubans, Comparability and feasibility of wrist- and hip-worn accelerometers in free-living adolescents. J. Sci. Med. Sport 20(12), 1101–1106 (2017). https://doi.org/10.1016/j.jsams.2017.04.017
5 Wearable Accelerometers in Cancer Patients
143
83. L. Hassan et al., Tea, talk and technology: patient and public involvement to improve connected health ‘wearables’ research in dementia. Res. Involv. Engagem. 3(1) (2017). https://doi.org/ 10.1186/s40900-017-0063-1 84. K. Ellis, J. Kerr, S. Godbole, J. Staudenmayer, G. Lanckriet, Hip and wrist accelerometer algorithms for free-living behavior classification. Med. Sci. Sports Exerc. 48(5), 933–940 (2016). https://doi.org/10.1249/MSS.0000000000000840 85. S. Del Din, A. Hickey, N. Hurwitz, J.C. Mathers, L. Rochester, A. Godfrey, Measuring gait with an accelerometer-based wearable: influence of device location, testing protocol and age. Physiol. Meas. 37(10), 1785–1797 (2016). https://doi.org/10.1088/0967-3334/37/10/1785 86. A. Narayanan, F. Desai, T. Stewart, S. Duncan, L. MacKay, Application of raw accelerometer data and machine-learning techniques to characterize human movement behavior: a systematic scoping review. J. Phys. Act. Health 17(3), 360–383 (2020). https://doi.org/10.1123/jpah. 2019-0088 87. V. Farrahi, M. Niemelä, M. Kangas, R. Korpelainen, T. Jämsä, Calibration and validation of accelerometer-based activity monitors: A systematic review of machine-learning approaches. Gait Posture 68, 285–299 (2019). https://doi.org/10.1016/j.gaitpost.2018.12.003 88. M. Karas, Accelerometry data in health research: challenges and opportunities review and examples. Stat. Biosci. 11, 210–237 (2019). https://doi.org/10.1007/s12561-018-9227-2 89. L. Bao, S.S. Intille, Activity recognition from user-annotated acceleration data 90. S.R. Small, S. Khalid, P. Dhiman, S. Chan, D. Jackson, A.R. Doherty, Impact of reduced sampling rate on accelerometer-based physical activity monitoring and machine learning activity classification. medRxiv, p. 2020.10.22.20217927, 2020, https://doi.org/10.1101/2020. 10.22.20217927. 91. A. Khan, N. Hammerla, S. Mellor, T. Plötz, Optimising sampling rates for accelerometerbased human activity recognition ✩. Pattern Recognit. Lett. 
73, 33–40 (2016). https://doi.org/ 10.1016/j.patrec.2016.01.001 92. M. Hildebrand, V.T. Van Hees, B.H. Hansen, U. Ekelund, Age group comparability of raw accelerometer output from wrist-and hip-worn monitors. Med. Sci. Sports Exerc. 46(9), 1816– 1824 (2014). https://doi.org/10.1249/MSS.0000000000000289 93. H. Vähä-Ypyä, T. Vasankari, P. Husu, J. Suni, H. Sievänen, A universal, accurate intensitybased classification of different physical activities using raw data of accelerometer. Clin. Physiol. Funct. Imaging 35(1), 64–70 (2015). https://doi.org/10.1111/cpf.12127 94. J.C. Br Ønd, L.B. Andersen, D. Arvidsson, Generating actigraph counts from raw acceleration recorded by an alternative monitor. Med. Sci. Sports Exerc. 49(11), 2351–2360 (2017). https:// doi.org/10.1249/MSS.0000000000001344 95. A. Bhattacharya, E.P. McCutcheon, E. Shvartz, J.E. Greenleaf, Body acceleration distribution and O2 uptake in humans during running and jumping 49(5), 881–887 (1980). https://doi.org/ 10.1152/jappl.1980.49.5.881 96. (No Title), (2020). https://doi.org/10.1177/1534735416684016 97. A.H.K. Montoye, R.W. Moore, H.R. Bowles, R. Korycinski, K.A. Pfeiffer, Reporting accelerometer methods in physical activity intervention studies: a systematic review and recommendations for authors. https://doi.org/10.1136/bjsports-2015-095947 98. A.H.K. Montoye et al., Development of cut-points for determining activity intensity from a wrist-worn ActiGraph accelerometer in free-living adults. J. Sports Sci. 1–10 (2020). https:// doi.org/10.1080/02640414.2020.1794244 99. K. Lyden, S.K. Keadle, J. Staudenmayer, P.S. Freedson, A method to estimate free-living active and sedentary behavior from an accelerometer. Med. Sci. Sports Exerc. 46(2), 386–397 (2014). https://doi.org/10.1249/MSS.0b013e3182a42a2d 100. I.C. Gyllensten, A.G. Bonomi, Identifying types of physical activity with a single accelerometer: evaluating laboratory-trained algorithms in daily life. IEEE Trans. Biomed. Eng. 
58(9), 2656–2663 (2011). https://doi.org/10.1109/TBME.2011.2160723 101. S.G. Trost, W.K. Wong, K.A. Pfeiffer, Y. Zheng, Artificial neural networks to predict activity type and energy expenditure in youth. Med. Sci. Sports Exerc. 44(9), 1801–1809 (2012). https://doi.org/10.1249/MSS.0b013e318258ac11
144
S. Dadhania and M. Williams
102. J. Staudenmayer, S. He, A. Hickey, J. Sasaki, P. Freedson, Methods to estimate aspects of physical activity and sedentary behavior from high-frequency wrist accelerometer measurements. J. Appl. Physiol. 119(4), 396–403 (2015). https://doi.org/10.1152/japplphysiol.00026. 2015 103. A. Mannini, S.S. Intille, M. Rosenberger, A.M. Sabatini, W. Haskell, Activity recognition using a single accelerometer placed at the wrist or ankle. Med. Sci. Sports Exerc. 45(11), 2193–2203 (2013). https://doi.org/10.1249/MSS.0b013e31829736d6 104. Digital Endpoints in clinical trials | ICON plc. Available https://www.iconplc.com/insights/ blog/2020/04/28/wearables-and-digital-end/. Accessed 06 Apr 2021 105. Axivity | Product. Available https://axivity.com/product/ax3. Accessed 06 Apr 2021 106. A. Leung et al., The EORTC QLQ-BN20 for assessment of quality of life in patients receiving treatment or prophylaxis for brain metastases: a literature review. Expert Rev. Pharmacoecon. Outcomes Res. 11(6), 693–700 (2011). https://doi.org/10.1586/erp.11.66 107. EUR-Lex—31993L0042—EN, Off. J. L 169 , 12/07/1993 P. 0001—0043; Finnish Spec. Ed. Chapter 13 vol. 24 P. 0085 ; Swedish Spec. Ed. Chapter 13 vol. 24 P. 0085 108. S. Kekade et al., The usefulness and actual use of wearable devices among the elderly population. Comput. Meth. Prog. Biomed. 153, 137–159 (2018). https://doi.org/10.1016/j.cmpb. 2017.10.008 109. M. de Almeida Mendes, I.C.M. da Silva, V.V. Ramires, F.F. Reichert, R.C. Martins, E. Tomasi, Calibration of raw accelerometer data to measure physical activity: a systematic review. Gait Posture 61, 98–110 (2018). https://doi.org/10.1016/j.gaitpost.2017.12.028 110. S. LB, J. PB, “Erratum: Usefulness of motion sensors to estimate energy expenditure in children and adults: a narrative review of studies using DLW (European Journal of Clinical Nutrition (2017) 71 (331–339) https://doi.org/10.1038/ejcn.2017.2),” European Journal of Clinical Nutrition, vol. 71, no. 8. 
Nature Publishing Group, p. 1026, 01-Aug-2017, https:// doi.org/10.1038/ejcn.2017.78 111. R. Walmsley et al., Reallocating time from device-measured sleep, sedentary behaviour or light physical activity to moderate-to-vigorous physical activity is associated with lower cardiovascular disease risk, medRxiv. medRxiv, p. 2020.11.10.20227769, 20-Nov-2020, https://doi. org/10.1101/2020.11.10.20227769 112. M.Z. Campanini, E. Lopez-Garcia, F. Rodríguez-Artalejo, A.D. González, S.M. Andrade, A.E. Mesas, Agreement between sleep diary and actigraphy in a highly educated Brazilian population. Sleep Med. 35, 27–34 (2017). https://doi.org/10.1016/j.sleep.2017.04.004 113. S. Mazza, H. Bastuji, A.E. Rey, Objective and subjective assessments of sleep in children: comparison of actigraphy, sleep diary completed by children and parents’ estimation. Front. Psychiatry 11, 1 (2020). https://doi.org/10.3389/fpsyt.2020.00495 114. I.C.M. Da Silva et al., Physical activity levels in three Brazilian birth cohorts as assessed with raw triaxial wrist accelerometry. Int. J. Epidemiol. 43(6), 1959–1968 (2014). https://doi.org/ 10.1093/ije/dyu203 115. K.N. Anderson et al., Assessment of sleep and circadian rhythm disorders in the very old: The newcastle 85+ cohort study. Age Ageing 43(1), 57–63 (2014). https://doi.org/10.1093/ ageing/aft153 116. V.T. van Hees et al., Estimating sleep parameters using an accelerometer without sleep diary. Sci. Rep. 8(1), 1–11 (2018). https://doi.org/10.1038/s41598-018-31266-z 117. E. Christodoulou, J. Ma, G.S. Collins, E.W. Steyerberg, J.Y. Verbakel, B. Van Calster, A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019). https://doi.org/10.1016/ j.jclinepi.2019.02.004 118. A.H.K. Montoye, M. Begum, Z. Henning, K.A. Pfeiffer, Comparison of linear and non-linear models for predicting energy expenditure from raw accelerometer data. Physiol. Meas. 
38(2), 343–357 (2017). https://doi.org/10.1088/1361-6579/38/2/343 119. M.S. Bianchim, M.A. McNarry, L. Larun, K.A. Mackintosh, Calibration and validation of accelerometry to measure physical activity in adult clinical groups: a systematic review. Prev. Med. Rep. 16 (2019). https://doi.org/10.1016/j.pmedr.2019.101001
5 Wearable Accelerometers in Cancer Patients
145
120. J.A. Schrack et al., Assessing daily physical activity in older adults: unraveling the complexity of monitors, measures, and methods. J. Gerontol.—Ser. A Biol. Sci. Med. Sci. 71(8), 1039– 1048 (2016). https://doi.org/10.1093/gerona/glw026 121. J.A. Schrack, V. Zipunnikov, J. Goldsmith, K. Bandeen-Roche, C.M. Crainiceanu, L. Ferrucci, Estimating energy expenditure from heart rate in older adults: a case for calibration. PLoS One 9(4), e93520 (2014). https://doi.org/10.1371/journal.pone.0093520 122. V.T. van Hees et al., A novel, open access method to assess sleep duration using a wrist-worn accelerometer. PLoS ONE 10(11), e0142533 (2015). https://doi.org/10.1371/journal.pone. 0142533 123. V.T. van Hees et al., Autocalibration of accelerometer data for free-living physical activity assessment using local gravity and temperature: an evaluation on four continents. J. Appl. Physiol. 117(7), 738–744 (2014). https://doi.org/10.1152/japplphysiol.00421.2014 124. V.T. van Hees, Raw accelerometer data analysis [R package GGIR version 2.2–0], (2020) 125. A. Doherty et al., GWAS identifies 14 loci for device-measured physical activity and sleep duration. Nat. Commun. 9(1), 5257 (2018). https://doi.org/10.1038/s41467-018-07743-4 126. M. Willetts, S. Hollowell, L. Aslett, C. Holmes, A. Doherty, Statistical machine learning of sleep and physical activity phenotypes from sensor data in 96,220 UK Biobank participants. https://doi.org/10.1038/s41598-018-26174-1 127. T. White, Thomite/pampro v0.4.0, (2018). https://doi.org/10.5281/ZENODO.1187043 128. R.L. Siegel, K.D. Miller, A. Jemal, Cancer statistics, 2016. CA. Cancer J. Clin. 66(1), 7–30 (2016). https://doi.org/10.3322/caac.21332 129. R.M. Speck, K.S. Courneya, L.C. Mâsse, S. Duval, K.H. Schmitz, An update of controlled physical activity trials in cancer survivors: a systematic review and meta-analysis. J. Cancer Surviv. 4(2), 87–100 (2010). https://doi.org/10.1007/s11764-009-0110-5 130. L.Q. 
Rogers, Objective monitoring of physical activity after a cancer diagnosis: challenges and opportunities for enhancing cancer control. Phys. Ther. Rev. 15(3), 224–237 (2010). https:// doi.org/10.1179/174328810X12814016178872 131. S.M. Cox, A. Lane, S.L. Volchenboum, C.S.M., L.A., Use of wearable, mobile, and sensor technology in cancer clinical trials.JCO Clin. Cancer Inform. 2(2), 1–11 (2018) https://doi. org/10.1200/cci.17.00147 132. G. Gresham et al., Wearable activity monitors in oncology trials: Current use of an emerging technology A R T I C L E I N F O, (2017). https://doi.org/10.1016/j.cct.2017.11.002 133. C. Holländer-Mieritz, C. Johansen, H. Pappot, eHealth-mind the gap. Acta Oncol. (Madr) 59(8), 877–878 (2020). https://doi.org/10.1080/0284186X.2020.1794037 134. J.C. Goldsack et al., Verification, analytical validation, and clinical validation (V3): the foundation of determining fit-for-purpose for Biometric Monitoring Technologies (BioMeTs). npj Digit. Med. 3(1), (2020). https://doi.org/10.1038/s41746-020-0260-4 135. E. Cereda, M. Turrini, D. Ciapanna, L. Marbello, A. Pietrobelli, E. Corradi, Assessing energy expenditure in cancer patients: a pilot validation of a new wearable device. J. Parenter. Enter. Nutr. 31(6), 502–507 (2007). https://doi.org/10.1177/0148607107031006502 136. P. Fassier et al., Variations of physical activity and sedentary behavior between before and after cancer diagnosis: Results from the prospective population-based NutriNet-Santé cohort. Med. (United States) 95(40) (2016). https://doi.org/10.1097/MD.0000000000004629 137. N. K.A. et al., Measuring and understanding adherence in a home-based exercise intervention during chemotherapy for early breast cancer. Breast Cancer Res. Treat. 168(1), 43–55 (2018). https://dx.doi.org/https://doi.org/10.1007/s10549-017-4565-1 138. M.A. Tully, C. McBride, L. Heron, R.F. Hunter, The validation of Fitbit ZipTM physical activity monitor as a measure of free-living physical activity. BMC Res. 
Notes 7(1), 1–5 (2014). https://doi.org/10.1186/1756-0500-7-952 139. M. Schneider, L. Chau, Validation of the Fitbit Zip for monitoring physical activity among free-living adolescents. BMC Res. Notes 9(1), 448 (2016). https://doi.org/10.1186/s13104016-2253-6 140. A. St-Laurent, M.M. Mony, M. Mathieu, S.M. Ruchat, Validation of the Fitbit Zip and Fitbit Flex with pregnant women in free-living conditions. J. Med. Eng. Technol. 42(4), 259–264 (2018). https://doi.org/10.1080/03091902.2018.1472822
146
S. Dadhania and M. Williams
141. C.A. Sharp, K.A. Mackintosh, M. Erjavec, D.M. Pascoe, P.J. Horne, Validity and reliability of the Fitbit Zip as a measure of preschool children’s step count. BMJ Open Sport Exerc. Med. 3(1), 272 (2017). https://doi.org/10.1136/bmjsem-2017-000272 142. V.-B. K., K. C., F. L., M. O., M. S., and S. S., Cancer patients participating in a lifestyle intervention during chemotherapy greatly over-report their physical activity level: a validation study. BMC Sports Sci. Med. Rehabil. 8(1), 10 (2016).http://dx.doi.org/https://doi.org/10. 1186/s13102-016-0035-z 143. G.A. Lopez, J.C. Brønd, L.B. Andersen, M. Dencker, D. Arvidsson, Validation of SenseWear Armband in children, adolescents, and adults. Scand. J. Med. Sci. Sport. 28(2), 487–495 (2018). https://doi.org/10.1111/sms.12920 144. D.M. Bhammar, B.J. Sawyer, W.J. Tucker, J.M. Lee, G.A. Gaesser, Validity of SenseWear® Armband v5.2 and v2.2 for estimating energy expenditure. J. Sports Sci. 34(19), 1830–1838 (2016). https://doi.org/10.1080/02640414.2016.1140220 145. N. Farooqi, F. Slinde, L. Håglin, T. Sandström, Validation of sensewear armband and actiheart monitors for assessments of daily energy expenditure in free-living women with chronic obstructive pulmonary disease. Physiol. Rep. 1(6), 1–12 (2013). https://doi.org/10.1002/phy 2.150 146. A. Hermann et al., Low validity of the Sensewear Pro3 activity monitor compared to indirect calorimetry during simulated free living in patients with osteoarthritis of the hip. BMC Musculoskelet. Disord. 15(1), 43 (2014). https://doi.org/10.1186/1471-2474-15-43 147. S.S. Lowe, B. Danielson, C. Beaumont, S.M. Watanabe, V.E. Baracos, K.S. Courneya, Associations between objectively measured physical activity and quality of life in cancer patients with brain metastases. J. Pain Symptom Manage. 48(3), 322–332 (2014). https://doi.org/10. 1016/j.jpainsymman.2013.10.012 148. R.J.E. 
Skipworth et al., Patient-focused endpoints in advanced cancer: criterion-based validation of accelerometer-based activity monitoring. Clin. Nutr. 30(6), 812–821 (2011). https:// doi.org/10.1016/j.clnu.2011.05.010 149. A.K. Bourke, E.A.F. Ihlen, J.L. Helbostad, Validation of the activPAL3 in free-living and laboratory scenarios for the measurement of physical activity, stepping, and transitions in older adults. J. Meas. Phys. Behav. 2(2), 58–65 (2019). https://doi.org/10.1123/jmpb.20180056 150. S. Kozey-Keadle, A. Libertine, K. Lyden, J. Staudenmayer, P.S. Freedson, Validation of wearable monitors for assessing sedentary behavior. Med Sci Sports Exerc 43(8), 1561–1567 (2011). https://doi.org/10.1249/MSS.0b013e31820ce174 151. C. Sellers, P. Dall, M. Grant, B. Stansfield, Validity and reliability of the activPAL3 for measuring posture and stepping in adults and young people. Gait Posture 43, 42–47 (2016). https://doi.org/10.1016/j.gaitpost.2015.10.020 152. P.M. Grant, C.G. Ryan, W.W. Tigbe, M.H. Granat, The validation of a novel activity monitor in the measurement of posture and motion during everyday activities. Br. J. Sports Med. 40(12), 992–997 (2006). https://doi.org/10.1136/bjsm.2006.030262 153. A. Gupta et al., Feasibility of wearable physical activity monitors in patients with cancer. JCO Clin. Cancer Inform. 2, 1–10 (2018). https://doi.org/10.1200/cci.17.00152 154. M. Alharbi, A. Bauman, L. Neubeck, R. Gallagher, Validation of Fitbit-Flex as a measure of free-living physical activity in a community-based phase III cardiac rehabilitation population. Eur. J. Prev. Cardiol. 23(14), 1476–1485 (2016). https://doi.org/10.1177/2047487316634883 155. E. Burton et al., Reliability and validity of two fitness tracker devices in the laboratory and home environment for older community-dwelling people. BMC Geriatr. 18(1), 103 (2018). https://doi.org/10.1186/s12877-018-0793-4 156. L. Edbrooke, C.L. Granger, R.A. Clark, L. 
Denehy, Clinical medicine physical activity levels are low in inoperable lung cancer: exploratory analyses from a randomised controlled trial (2019) https://doi.org/10.3390/jcm8091288 157. J.E. Broderick et al., Patient reported outcomes can improve performance status assessment: a pilot study. J. Patient-Reported Outcomes 3(1), 1–10 (2019). https://doi.org/10.1186/s41687019-0136-z
5 Wearable Accelerometers in Cancer Patients
147
158. Validation of Four Smartwatches in Energy Expenditure and Heart Rate Assessment During Exergaming | ActiGraph. Available https://actigraphcorp.com/research-database/validationof-four-smartwatches-in-energy-expenditure-and-heart-rate-assessment-during-exergamin g-2/. Accessed 17-Feb-2021 159. C. A. Low et al., Estimation of symptom severity during chemotherapy from passively sensed data: exploratory study. https://doi.org/10.2196/jmir.9046 160. A.W. Gorny, S.J. Liew, C.S. Tan, F. Müller-Riemenschneider, Fitbit charge HR wireless heart rate monitor: Validation study conducted under free-living conditions. JMIR mHealth uHealth 5(10), e157 (2017). https://doi.org/10.2196/mhealth.8233 161. E. Jo, K. Lewis, D. Directo, M.J. Kim, B.A. Dolezal, Validation of biofeedback wearables for photoplethysmographic heart rate tracking (2016) 162. J.A. Roscoe et al., Temporal interrelationships among fatigue, circadian rhythm and depression in breast cancer patients undergoing chemotherapy treatment. Support. Care Cancer 10(4), 329–336 (2002). https://doi.org/10.1007/s00520-001-0317-0 163. R.J. Cole, D.F. Kripke, W. Gruen, D.J. Mullaney, J.C. Gillin, Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992). https://doi.org/10.1093/sleep/15. 5.461 164. Sleep3.qxd | Enhanced Reader. Available chrome-extension://dagcmkpagjlhakfdhnbomgmj dpkdklff/enhanced-reader.html?pdf=https%3A%2F%2Fbrxt.mendeley.com%2Fdocument %2Fcontent%2Fc81cb77a-6744-3b63-add9-2fc5cc1d9138. Accessed 17 Feb 2021 165. Objective measurements of sleep for non-laboratory settings as alternatives to polysomnography-a systematic review. https://doi.org/10.1111/j.1365-2869.2009.00814.x 166. A.A. Wright et al., The hope pilot study: harnessing patient-reported outcomes and biometric data to enhance cancer care. JCO Clin. Cancer Inform. 2, 1–12 (2018). https://doi.org/10. 1200/cci.17.00149 167. Every step you fake: a comparative analysis of fitness tracker privacy and security | open effect. 
Available https://openeffect.ca/fitness-tracker-privacy-and-security/. Accessed 03 Feb 2021 168. Framework for Improving Critical Infrastructure Cybersecurity, Version 1.1, (Gaithersburg, MD, 2018) 169. Communicating Cybersecurity Vulnerabilities to Patients: Considerations for a Framework | FDA. Available https://www.fda.gov/about-fda/cdrh-patient-science-and-engagementprogram/communicating-cybersecurity-vulnerabilities-patients-considerations-framework. Accessed 03 Feb 2021 170. L. Hassan et al., Tea, talk and technology: patient and public involvement to improve connected health ‘wearables’ research in dementia. Res. Involv. Engagem. 3(1), 12 (2017). https://doi. org/10.1186/s40900-017-0063-1 171. I. Raber, C.P. McCarthy, R.W. Yeh, Health insurance and mobile health devices: opportunities and concerns. JAMA—J. Am. Med. Assoc. 321(18), 1767–1768 (2019). https://doi.org/10. 1001/jama.2019.3353 172. D.C. Klonoff, Twelve modern digital technologies that are transforming decision making for diabetes and all areas of health care. J. Diab. Sci. Technol. 7(2), 291–295 (2013). https://doi. org/10.1177/193229681300700201 173. M.S. Beg, A. Gupta, T. Stewart, C.D. Rethorst, Promise of wearable physical activity monitors in oncology practice. J. Oncol. Pract. 13(2), 82–89 (2017). https://doi.org/10.1200/JOP.2016. 016857 174. M.G. Sweegers et al., Which cancer survivors are at risk for a physically inactive and sedentary lifestyle? Results from pooled accelerometer data of 1447 cancer survivors. Int. J. Behav. Nutr. Phys. Act. 16(1), 66 (2019). https://doi.org/10.1186/s12966-019-0820-7 175. Cancer. Available https://www.who.int/news-room/fact-sheets/detail/cancer. Accessed 08Apr-2021
Chapter 6
Online Application of a Home-Administered Parent-Mediated Program for Children with ASD
Margarita Stankova, Tsveta Kamenski, Polina Mihova, and Todor Datchev
Abstract The aim of the chapter is to present an online application of a home-administered, parent-mediated program for children with autism spectrum disorder (ASD) that develops and enhances their communication and language skills. The program is organized in modules and is administered over 12 weeks. It includes the following components: a short text coupled with a picture that illustrates it (plot picture), and visual cards for the target words of each module, related to the text (nouns, verbs and adjectives). Every module has a different text and different visual cards, but the same tasks are performed on a daily basis, targeting impressive/expressive language, discourse abilities, wh-questions, nominative function, pragmatics and generalization. The instructional component for parents involves activities within the educational e-platform Moodle, where each component is introduced interactively and demo videos show how to perform the tasks, to ensure that tasks are performed correctly at home and that instructions are properly followed. The e-platform provides clear instructions for use, as well as methods for reinforcement, prompting and coping with undesired behaviors. The instructive module of the program is presented as written text and as video files. Parents can meet a specialist virtually and request feedback based on videos of task performance. Administration of the program follows a strict schedule, also available in Moodle. The program has been administered to 20 children with ASD and their families, and some of the results are presented here.
M. Stankova (B) · T. Kamenski · P. Mihova · T. Datchev
New Bulgarian University (NBU), Sofia, Bulgaria
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_6
Keywords Online application · Program for parents · Autism spectrum disorder (ASD)
6.1 Introduction
Autism spectrum disorder (ASD) refers to a group of neurobehavioral and neurodevelopmental conditions characterized by impairments in two domains: (1) social communication and social reciprocity; (2) restricted and repetitive patterns of behavior, interests, or activities [1]. Given the possibility of multifactorial inheritance and genetic heterogeneity, as well as the attempts to assess the role of the environment, autism is defined under the term ASD as an etiologically, neurobiologically and clinically heterogeneous disorder [2]. There have been numerous attempts to identify the causes and characteristics of the existing subtypes of ASD, especially in the context of comorbid problems and disorders. The general assumption is that the pathophysiological basis comprises genetic and epigenetic factors, as well as immunological and environmental factors, that trigger neuroanatomical and neurochemical dysfunctions relatively early in the development of the child [3]. Some children with autism have comorbid specific medical disorders, such as Fragile X syndrome, abnormalities of chromosome 15, other chromosomal abnormalities, single gene mutations and genetic syndromes, tuberous sclerosis [4], and rare genic CNVs, genome-wide and at specific loci [5–7]. Some children with ASD are diagnosed with macrocephaly [8], epilepsy and other medical conditions [9, 10], sleep disorders [11, 12], including insomnia [13], gastrointestinal disorders [14–16], respiratory disorders [17], auditory disorders and infections, allergies, and immune problems [18–21]. According to McDougle [22], it may be possible to define a specific subtype of children with autistic symptoms who display immunological problems. Some of the behavioral problems of children with ASD may originate from different medical conditions, sometimes gastrointestinal disorders [15].
Treatment of these medical conditions may affect positively the quality of life of the children and their families [23]. Children with ASD that display gastrointestinal symptoms are more prone to social problems, irritability and anxiety. These children are less responsive to therapeutic interventions [24]. In addition to medical problems, patients with ASD often have co-morbid and psychiatric conditions, related to distress, emotional and behavioral challenges. Several factors are known to contribute to these challenges, such as age, level of intellectual functioning, sex etc. [25, 26]. Core symptoms in ASD are impaired communication and social interaction deficiencies. The rest of the symptoms, specific to ASD might be considered secondary to the core ones: anxiety, agitation, impulsivity, aggression, attention problems, sleep disturbances, self-injuring behavior, oversensitivity. Unfortunately, they have a negative effect on the children and their caretakers [27]. Saunders discusses the placement/nature of the additional conditions in ASD diagnosis, namely whether they are
co-morbid disorders or they are part of the ASD diagnosis itself [28]. Such disorders are too heterogeneous and are not found in all patients with ASD. Several authors consider ADHD [29], anxiety, depression, and bipolar disorder [18], oppositional defiant disorder as co-morbid states in ASD, the most frequent one being ADHD [30]. Presuming that the biological and genetic base for ASD is the same, scientists seek to find the reason for the uneven distribution of ASD with regards to sex (male/female). An interesting theory is that girls are more susceptible to the influence of the environmental factors and thus develop social competence skills easier [31]. Delays in language development, late speech and specifics in language acquisition are noted by a number of authors [32, 33]. Although there are cases of later successful acquisition of spoken language, age of 5 is considered critical for speech production [34]. Many children with ASD display also a loss of already acquired language skills [35]. To differentiate ASD subtypes based on language skills and abilities has been a serious challenge [36]. A great number of children with ASD with severe speech disorders display also challenging behavior, which hinders their communication with others [37]. Low levels of receptive language abilities are related to higher levels of stereotypical and self-injurious behaviors [38]. Pre-verbal skills of children with ASD are closely related to joint attention, understanding the meaning of gestures, response to the speech of adults and imitation [39]. Children with ASD aged 18–24 months have difficulties in joint attention activities, considered a factor of major importance for the development of language and social skills, as well as communication levels and communicative gestures [40, 41]. The early appearance of difficulties introduces the need of early intervention programs with focus on the family, organizational and political context [42]. 
Parents’ active participation is considered of utmost importance in every therapeutic intervention [43]. Parents’ inclusion in the therapeutic programs decreases stress and increases self-confidence and quality of life of both children and parents [44]. Parents’ instruction and education in applying strategies in the everyday activities has a proven positive effect on the core symptoms of autism [45, 46]. Involvement of parents in therapeutic programs is considered as a prerequisite of a better result of the intervention [47]. Parents’ instruction is closely related to decreasing stress levels, improvement of life skills, more efficient communication between parent and child, better quality of life of parents [48].
6.2 Conceptual Framework and Aims of the Program
6.2.1 Behavioral Model of Communicative Failure
Regardless of the wide range of possible etiological causes, the symptoms of social disorders, impaired verbal communication, and even the stereotypical behaviors in ASD can be regarded as a result of communicative failure. In some cases, it may be
due to impaired language comprehension; in others, deficits are observed mainly in expressive language or in those aspects of verbal and non-verbal communication and behavior that relate to reacting to others’ behavior and expectations. Here are some examples of the behavioral model of communicative failure:
• Echolalia: the child does not understand its communication partner and the situation;
• Production of neologisms: attempts to overcome the difficulties in communication;
• Use of ready-made phrases in a native/foreign language: some children with ASD try to overcome their difficulties in comprehending the communicative situation by using ready-made word combinations in certain situations, without considering the communicative situation itself and without the skills necessary for turn-taking in discourse.
The failure in communicative situations is probably a lengthy and slow process. Parents notice motor delays at an earlier stage than deficiencies in the development of social communication skills. They may therefore overlook the development of intentional communication, receptive language skills and attention to language, which are of key importance for improving performance [35, 49]. Regardless of the primary source, many children with ASD experience withdrawal because of difficulties or failure to attain the expected levels of communicative performance. That profile includes children with additional specific temperamental traits, higher levels of anxiety and introvert characteristics that contribute to the autistic reaction to communicative failure. In addition, schizoid personality traits can be an extra burden for people with ASD [50]. An overall reduction in the manifestation of autistic traits and behavior is positively related to one’s performance in the area of social communication [51].
We assume that communicative failure in direct communication with one’s parents can lead to social withdrawal and to the building of stereotypical behaviors that decrease anxiety levels: the child adheres to familiar objects, games, situations and people in order to find safety and avoid failure. In this respect, communicative situations opened by the parents, together with their support and encouragement, would be an invaluable experience that increases the child’s willingness for social interaction in her/his immediate and wider environment. Verbal communication is a key component of the development of most children with ASD. Disturbances in social functioning lead to serious disorders in verbal abilities, especially those related to active verbal interaction with others. Lack of verbal communication, coupled with social development issues, may lead to restriction to a limited number of activities and to a discharge of the tension provoked by the behavioral “otherness”, expressed in stereotypical and repetitive movements. We believe that better language abilities in an interactive environment would improve social functioning and reduce the repetitive movements. Through the use of a structured program aimed at developing the child’s communication skills, the parents are motivated to work with their child at home and have a clear idea of what tasks to perform and how, when, in what order and with what timing.
Parents are often overly worried about their child’s development, verbal development included. In their search for research on the topic, suitable therapies, etc., parents can forget to draw on their own potential and put it to work in developing their child’s language and communication skills through interactive parent-mediated interventions that are applicable at home. Apart from laying down the required tasks and the expected outcomes, the structured program also teaches the parents how to apply various interventions and strategies that can be further deployed, developed and used in different communicative situations. In addition, it shows tangible results and boosts confidence in a favorable prognosis and in their child’s future development. Most importantly, the Program for Indirect Intervention gives positive feedback and opens opportunities for both parents and child to communicate without failure.
6.2.2 Structure of the Program
The Program for Indirect Intervention for the development of language and communication skills in children with ASD is organized in modules and is administered over 12 weeks. It includes the following components: a short text, coupled with a picture that illustrates it; and pictures for the target words per module (nouns, verbs and adjectives), related to the text. Every module has a different text and pictures, but the same exercises are performed on a daily basis, targeting impressive/expressive language, discourse abilities, wh-questions, nominative function, pragmatics and generalization. As support, there are 6 short videos on the platform, each 5–7 min long. The first one is introductory and contains general instructions and work principles for the parents. The remaining five demonstrate task execution with clear verbal instructions and visualization. The aim of the Program is, over the course of 12 weeks, to induce and improve children’s communication skills and at the same time to give parents opportunities to create communicative situations and learn how to model children’s behavior, thus avoiding communication failure.
The Program is based on 12 popular short animal stories for children, similar in complexity and length. Each story takes 7–9 min to read. Every week the parent reads one story following a strict schedule and performs the required tasks with the help of the visual cards. The visual photo cards represent the target words, which are different for every story and therefore for every week. Seven target words are involved in the tasks each week: 3 nouns, 2 verbs and 2 adjectives. Five different tasks, related to the acquisition of different language skills, are performed using the cards in different combinations. The target words are also part of the story and are used together with the plot picture for the week. Target words have been selected from the LDS (Language Development Survey), part of the Bulgarian adapted version of the CBCL (Child Behavior Checklist). The LDS contains 310 words considered to be the first acquired by children in their early vocabulary development [52]. The 5 short tasks (10–15 min) are performed on a daily basis and are accompanied by clearly laid out instructions and video demonstrations; participants are informed of the importance of persistence and of strict adherence to the instructions for administration. In addition, parents are instructed to have at hand a strong reinforcer, such as a favorite toy or edible, to be used to reward success. If a task proves difficult for the child, parents can use various prompts (holding the child’s hand and pointing to the picture, using a gesture, a vocalization, etc.) in order to reach a satisfactory level of independent performance. If the child makes a mistake, the parent encourages her/him to try again, saying “Let’s try again”, and they start the task all over again. The parent is aware that it is of utmost importance to be calm and encouraging when performing the tasks, and that the child expects her/his reward after every (other) trial.
Another important instruction for the parents is that the tasks have to be performed at a normal pace, but the instructions have to be well articulated and spoken at a moderate rhythm, so that they are intelligible to the child and adequately understood. The tasks and supporting materials are the same for all children, and a strict schedule of task performance is followed on a weekly basis. The tasks target various components of linguistic competence and performance, namely comprehension, impressive/expressive language, speech production, discourse/dialogue skills, use of wh-questions, nominative function, pragmatic skills and generalization of the acquired knowledge (Table 6.1).
Table 6.1 Tasks of the program

Task 1: Reading a story (expressive/impressive language, discourse)
Setting and materials: Plot picture and matching text
Frequency: The task is performed once at the beginning and once at the end of the week
Method: A short text (a short story of up to 1 page) is provided and is divided into paragraphs. After each paragraph there is a series of questions/comments to stimulate discourse/dialogic speech

Task 2: Development of impressive language
Setting and materials: Visual cards for the respective week
Frequency: The task is performed as indicated in the schedule (Table 6.2) and each word is practiced 5 times. In week 2 you can practice the cards from week 1, in week 3 those from the previous weeks, etc.
Method: The parent places three cards (one target and two distractors) in front of the child, names them, and then instructs the child to point to a specific card, for example “Show me shoes”. If the child has difficulty, the parent prompts by pointing to the card and/or guiding the child’s hand. If the child selects the wrong card, the parent says “Let’s try again”, shuffles the cards, and starts all over

Task 3: Development of impressive language, wh-questions, nominative function
Setting and materials: Plot picture for the story for the respective week
Frequency: The task is performed as indicated in the schedule (Table 6.2); 7 questions related to the text are asked
Method: The parent shows the picture to the child and asks her/him questions, which she/he can answer verbally or non-verbally, for example: “Where is the sun?”; “Who is walking?”; “What is hot?” For each week, there are questions for each plot picture

Task 4: Development of expressive language
Setting and materials: Visual cards for the respective week
Frequency: The task is performed as indicated in the schedule (Table 6.2) and each word is practiced up to 5 times
Method: The parent shows a target card and instructs the child to describe it (“What is this?”)

Task 5: Development of expressive language, pragmatics, generalization of what has been learned
Setting and materials: Visual cards for the respective week
Frequency: The task is performed according to the weekly schedule (Table 6.2), whenever a communicative opportunity opens and in the natural environment
Method: Each target word for the week must be used in at least 4 sentences of different types in the natural environment of the child (e.g., “I’m watching”; “Look, the bird is flying!”; “Let’s look at the bunnies!”; “Are you watching the movie?”). The corresponding visual card is displayed as a visual prompt
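The card-selection procedure of Task 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chapter does not say where the two distractor cards come from, so the sketch draws them from the same week's cards, and the function names `build_trial` and `build_session` are hypothetical.

```python
import random

# Target words for week 1, as listed in the chapter's sample text.
WEEK1_WORDS = ["sun", "shoes", "coat", "hot", "yellow", "blow", "walk"]


def build_trial(target, pool, rng):
    """Return three shuffled cards: the target plus two distractors.

    Assumption: distractors are drawn from the same week's cards.
    """
    distractors = rng.sample([w for w in pool if w != target], 2)
    cards = [target] + distractors
    rng.shuffle(cards)  # reshuffle so position does not give the answer away
    return cards


def build_session(pool, repetitions=5, seed=None):
    """Build `repetitions` trials per target word, in random order."""
    rng = random.Random(seed)
    trials = [(word, build_trial(word, pool, rng))
              for word in pool
              for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials


session = build_session(WEEK1_WORDS, seed=0)
for target, cards in session[:3]:
    print(f"Cards on the table: {cards} -> 'Show me {target}'")
```

With 7 target words practiced 5 times each, one session holds 35 trials; a failed trial would simply be rebuilt with `build_trial`, which mirrors the "shuffle and start over" instruction.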
Table 6.2 Week schedule
Day          Tasks
Monday       Task 1, Task 2, Task 5
Tuesday      Task 2, Task 3, Task 5
Wednesday    Task 2, Task 4, Task 5
Thursday     Task 1, Task 3, Task 5
Friday       Task 1, Task 4, Task 5
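To show how the weekly schedule unfolds over the 12 weeks, the following Python sketch expands it into dated entries. It is an illustration only: the day-to-task mapping is transcribed from the schedule as printed, while the start date and the name `program_calendar` are assumptions introduced here.

```python
from datetime import date, timedelta

# Day -> task numbers, transcribed from the weekly schedule as printed.
WEEK_SCHEDULE = {
    "Monday": [1, 2, 5],
    "Tuesday": [2, 3, 5],
    "Wednesday": [2, 4, 5],
    "Thursday": [1, 3, 5],
    "Friday": [1, 4, 5],
}
WEEKDAYS = list(WEEK_SCHEDULE)


def program_calendar(start_monday, weeks=12):
    """Yield (date, week_number, tasks) for every scheduled day."""
    if start_monday.weekday() != 0:
        raise ValueError("this sketch assumes the program starts on a Monday")
    for week in range(weeks):
        for offset, day in enumerate(WEEKDAYS):
            yield (start_monday + timedelta(weeks=week, days=offset),
                   week + 1,
                   WEEK_SCHEDULE[day])


# Hypothetical start date; 1 March 2021 was a Monday.
calendar = list(program_calendar(date(2021, 3, 1)))
for day, week, tasks in calendar[:5]:
    print(day.isoformat(), f"week {week}", "tasks", tasks)
```

Entries of this kind could be loaded into any calendar tool; how the authors actually populated the Moodle calendar is not described in the chapter.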
Sample Text Week 1
The Sun and the Wind
Nouns: sun, shoes, coat
Adjectives: hot, yellow
Verbs: blow, walk
The Sun and the Wind argued about who was stronger. “I am so strong, I can blow away all the clouds from the sky!” the Wind said. The Sun replied: “I am stronger than you, because I can evaporate the water from the sea and thus create many more clouds.” They argued for a long, long time.
Hey buddy, let us look at the picture! Can you figure out where the Sun is? Let’s look outside, do you see the sun? What does it look like? Where is it? (yellow, in the sky) What does the sun do? When the sun is shining bright, is it hot or cold outside? When the sun is shining, we feel hot! When does the sun shine bright? In the summer. What is the color of the sun? It’s yellow! Do you see anything else yellow in the picture? Let’s try to find something yellow in the picture. Yeees, that is the boy’s t-shirt, it is yellow!
At this moment the Sun and the Wind noticed a boy walking down the road below them. He was wearing a coat. “Let’s see who will manage to make the boy remove the coat,” proposed the Wind. The Sun accepted the challenge.
Wow… Let’s see what else the boy is wearing… (shoes, jeans, yellow t-shirt) What is the boy doing? He is enjoying his walk; he is walking down the road. Do you think he is hot or cold? Let’s look outside, is it hot or cold today? Did you put your coat on when you went for a walk today? What else were you wearing? Does your father/mother/sister wear a coat? When do we wear a coat? In the winter.
The Wind tried first. It started blowing mightily, in front of and behind the boy. The stronger it blew, the more tightly the boy held on to his coat. In the end the Wind gave up.
Hey, tell me, when there is strong wind outside, do we feel cold or warm? In the summer? In the winter?
What do we do in order to warm up? We wear warm clothes. Why did the Wind blow? In order to make the boy remove his coat. Hey, can you show me how the wind blows? Can you blow harder? Can you blow even harder? Well done!
It was the Sun’s turn. It appeared from behind the clouds and shone brightly. Its rays caressed the boy. It was getting so hot that the boy could not stand the heat, so he removed his coat. In this way, the Sun won the challenge.
In the summer, when the sun is shining bright, do we wear coats? Right, we don’t! What do we wear in the summertime? Where do we like to go? To the seaside, to the mountains. When is it hotter, in the summer or in the winter? Why is it warmer in the summer? Because the sun shines bright!
Questions for Task 3
1. Where is the Sun in the picture?
2. Where is the Wind in the picture?
3. Who is walking down the road?
4. What is the boy holding in his hand?
5. What in the picture is yellow?
6. What is the boy wearing on his feet?
7. What do you think, is the boy in the picture feeling hot or cold?
6.2.3 Technical Description and Parameters of the Program
The educational platform is an interactive tool dedicated to distance teaching and learning. Educational platforms are generally cloud-based software. A specialized web-based platform is used for the implementation of the online application; it makes it possible to implement specific interactive activities that are suitable both for desktop use and for mobile devices (cell phone and/or tablet). In line with the purpose of the therapeutic program, a schematic model for presenting the tasks and their complementary elements was developed first, in order to provide an optimal amount of feedback on the one hand, and to analyze and systematize the interactive activities in relation to each other on the other. The mobile app, presented as a block scheme in Fig. 6.1, supports the development and design of new courses or programs from scratch by stacking modular media, text, and interactive blocks. It allows the application outline, task titles, and section headers to be modified and designed, and the modular media, text, and interactive multimedia blocks to be elaborated and rendered in different technical ways. Access to the platform is secured by a pre-generated username and password, which makes it possible to control access, to monitor the progress of each participant, and to collect statistical data for further analysis, ongoing feedback and regular task coordination. The
Fig. 6.1 Modular block-scheme of the online application
program follows a strict schedule, which was drawn up in Excel and implemented in the online application. It is important for the program to track and record the frequency and number of task repetitions. The online application also provides the opportunity for virtual consultation with experts through an integrated video-conferencing module.
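As a rough illustration of what registering the frequency of task repetitions could involve, the Python sketch below compares a log of completed tasks with the weekly schedule. It does not reproduce the platform's own tracking, which the chapter does not describe; the log format and the names `expected_counts` and `missing_sessions` are assumptions made for this example.

```python
from collections import Counter

# Day -> task numbers, transcribed from the weekly schedule (Table 6.2).
WEEK_SCHEDULE = {
    "Monday": [1, 2, 5],
    "Tuesday": [2, 3, 5],
    "Wednesday": [2, 4, 5],
    "Thursday": [1, 3, 5],
    "Friday": [1, 4, 5],
}


def expected_counts(schedule):
    """How many times each task is scheduled in one week."""
    return dict(Counter(task for tasks in schedule.values() for task in tasks))


def missing_sessions(log, schedule):
    """Return {task: sessions still missing} for one week.

    `log` is an iterable of (day, task) pairs recorded as completed
    (a hypothetical format). Duplicate and unscheduled entries are ignored.
    """
    done = Counter(task for day, task in set(log)
                   if task in schedule.get(day, []))
    expected = expected_counts(schedule)
    return {task: expected[task] - done[task]
            for task in sorted(expected)
            if done[task] < expected[task]}


# Example: a family that has worked through Monday and part of Tuesday.
log = [("Monday", 1), ("Monday", 2), ("Monday", 5),
       ("Tuesday", 2), ("Tuesday", 5), ("Tuesday", 5)]
print(missing_sessions(log, WEEK_SCHEDULE))
```

A report like this could feed the ongoing feedback and task coordination mentioned above, for example by flagging families who fall behind the schedule.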
6.2.4 Technical Specifications of the System
XAMPP stands for Cross-Platform (X), Apache (A), MySQL (M), PHP (P) and Perl (P). It is a lightweight Apache distribution that makes it convenient to set up a local web server for development and testing. All applications required to run a local server, namely a web server (Apache), a database server (MySQL) and a scripting language (PHP), are included in a single installation file. XAMPP is also cross-platform, which means it works equally well on Linux, macOS and Windows. Most professionally used web servers rely on the same components as XAMPP, so moving from a local non-production server to a professional (production) server is straightforward. Using a non-production server to create a new site or a new version of an existing site is a good, time-tested practice with several obvious advantages:
• Does not overload the server that hosts your active sites
Fig. 6.2 Installation process
• Completely eliminates the risk of new projects being indexed by search engines, as there is no external access to the test server.
Navicat Premium was used to create the database for the Moodle website (Figs. 6.2 and 6.3).
The Program is developed and delivered through the Moodle mobile app. The first step in the installation process is to enable mobile access to the site so that users can use the mobile app for Moodle. For sites using https, mobile access is enabled by default in new installations of Moodle 3.0 onwards. The site has a trusted SSL certificate, which is required because, for security reasons, the app does not work with self-signed certificates. There are numerous reasons to use the mobile app version of Moodle, but one of the most important is offline availability. The app allows access to the learning resources while offline, with progress automatically syncing back to Moodle the next time the user is online. For our purposes, the Calendar (Fig. 6.4) is also very important, because the Program requires following a schedule. Access to the Calendar at any time allows the user to easily check the status of individual activities, events and tasks; it is complemented by push notifications sent directly to learners, as well as by private messages, forum posts and submissions that keep everyone up to date. The app’s customization features allow individual branding, imagery and color scheme to match the user’s preferences.
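The offline behavior described above (progress recorded while offline and synchronized once the device reconnects) follows a common queue-and-flush pattern. The Python sketch below illustrates that pattern in general terms; it is not the Moodle app's implementation, and the class `ProgressQueue`, the `send` callback and the record fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProgressQueue:
    """Holds progress records until they can be uploaded."""
    # Hypothetical uploader: receives one record, returns True on success.
    send: Callable[[dict], bool]
    pending: list = field(default_factory=list)

    def record(self, item, online):
        """Store a record locally and try to upload if a connection exists."""
        self.pending.append(item)
        if online:
            self.flush()

    def flush(self):
        """Upload pending records in order; keep those that fail."""
        self.pending = [item for item in self.pending if not self.send(item)]


# Usage: the first task is completed offline, the second one online.
uploaded = []
queue = ProgressQueue(send=lambda item: uploaded.append(item) is None)
queue.record({"week": 1, "day": "Monday", "task": 2}, online=False)
print(len(queue.pending), len(uploaded))
queue.record({"week": 1, "day": "Monday", "task": 5}, online=True)
print(len(queue.pending), len(uploaded))
```

Keeping failed uploads in the queue means a dropped connection during synchronization does not lose a family's recorded work; the records are simply retried at the next flush.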
Fig. 6.3 Installation process—database settings
Fig. 6.4 Moodle calendar
The organization’s custom app is available on the following platforms:
• platform-android: on a device running Android.
• platform-cordova: on a device running Cordova.
• platform-core: on a desktop device.
• platform-desktop: on a device running Electron.
• platform-ios: on a device running iOS.
• platform-ipad: on an iPad device.
• platform-iphone: on an iPhone device.
• platform-linux: on a device running Linux.
• platform-mac: on a device running macOS.
• platform-mobile: on a mobile device.
• platform-mobileweb: in a browser on a mobile device.
• platform-phablet: on a phablet device.
• platform-tablet: on a tablet device.
• platform-windows: on a device running Windows.
Managed hosting on enterprise-grade cloud infrastructure ensures that the app and any data stored and transferred are GDPR compliant, which is another factor in favor of the app version. Another reason for registering the Moodle site in the mobile Apps portal is that it can retrieve statistics such as the number of active user devices receiving push notifications. The 12-week program is developed in separate modules. It includes the following components: a short text, coupled with a picture that illustrates it; and pictures for the target words per module (nouns, verbs and adjectives), related to the text. Every module has a different text and pictures, but the same exercises are performed on a daily basis, targeting impressive/expressive language, discourse abilities, wh-questions, nominative function, pragmatics and generalization (Fig. 6.5). The instructional component for parents involves activities (Fig. 6.6) within the online application Moodle, where each component is introduced interactively and demo videos of the various exercises are available, so that the exercises are easier to perform correctly at home and the instructions are properly followed. Clear instructions and methods for reinforcement, such as visual cards and interactive images, are combined with recorded dialogues and a short video representing the story.
6.3 Pilot Testing of the Program: Qualitative Analysis
There were 27 participants in the pilot testing of the Program: 6 girls and 21 boys with ASD, aged 2–8 years. The results of the parents’ survey were as follows:
• 6 of the children make progress, quickly learn the words from the images, like the application, listen to the stories with pleasure, and perform the tasks required;
Fig. 6.5 Moodle versions of tasks
Fig. 6.6 Moodle screenshots from “The Wind and the Sun” practical tasks
• for 2 of the children, who are non-verbal, it is difficult to assess progress;
• 1 parent reports that the tasks are very difficult and impossible for her child to complete;
• 2 of the parents recommend that the text be enlarged in order to be legible;
• 7 parents report that their child has difficulty concentrating, i.e., staying still and listening to the text;
• 3 parents share that the text and content seem too complex for their child’s abilities;
• 2 parents report difficulty in following the exact schedule and sequence;
• 3 parents share that they have not encountered any difficulties in performing the tasks;
• 3 parents appreciate the opportunity to work at a time of their convenience;
• 14 of the parents rate the visual material as high quality, in terms of both the pictures and their relevance to the content of the texts;
• 3 parents consider the stories adequately selected, interesting, entertaining and instructive;
• 2 parents find the daily repetition of verbs useful;
• 4 parents report as positive the fact that the stories relate to real-life situations;
• 15 parents share that they like the Program as a whole and would work with new programs or extensions of the same one. Most of the parents share that children do not follow their instructions and fail to concentrate on task completion;
• 3 parents share that the Program is not entirely suitable for the skill level of their child;
• 26 out of 27 surveyed parents are satisfied with their participation, and report a positive influence of the Program on their children.
One parent shares the view that the Program is applicable to the further development of the skills of typically developing children, and not only to children with developmental delays. In the conditions of the COVID-19 pandemic, the possibility for parents to act as therapists themselves is also considered a positive effect, despite the fact that children do not follow their instructions at times.
Three parents consider the Program inappropriate for their child’s age, but have worked with it nonetheless. More than half of the parents say that the Program has given them opportunities and ideas to work with their children on their own.
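Because the survey results are reported as raw counts, it can help to express them as shares of the 27 surveyed parents. The short Python sketch below does this for a few of the counts listed above; the shortened labels and the helper name `share` are introduced here for illustration.

```python
TOTAL_PARENTS = 27

# Counts taken from the survey results listed above (labels shortened).
counts = {
    "satisfied with their participation": 26,
    "like the Program as a whole": 15,
    "rate the visual material as high quality": 14,
    "report concentration difficulties": 7,
}


def share(count, total=TOTAL_PARENTS):
    """Percentage of surveyed parents, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{count}/{TOTAL_PARENTS} ({share(count)}%) {label}")
```

With a sample of 27, each parent accounts for roughly 3.7 percentage points, so these shares are best read as descriptive figures for the pilot group.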
6.4 Conclusions
According to the behavioral model of communicative failure, regardless of the primary cause, the child with ASD undergoes a series of communication failures, in which her/his performance does not match what is expected in her/his environment. We assume that in the presence of certain personal characteristics, related to a temperament prone to anxiety, high sensitivity to failure, discomfort at being compared with others and inner dissatisfaction, the child chooses the model of social withdrawal and restriction of communication, possibly coupled with stereotypical repetitive movements and strictly routine activities, in order to avoid performance assessment. In this sense,
parents’ positive attitude towards structured communication opportunities and trials that follow a routine schedule, together with encouragement of turn-taking in real communicative interaction, would help the child to overcome fear and withdrawal from communication, would stimulate her/his eagerness to communicate with her/his parents and with others, and would develop sensitivity to language and its comprehension, which would in turn lead to a better understanding of social situations.
References
1. American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, 5th edn. (Arlington, 2013)
2. M.L. Bauman, Medical comorbidities in autism: challenges to diagnosis and treatment. Neurotherapeutics 7(3), 320–327 (2010)
3. R. Marotta, M.C. Risoleo, G. Messina, L. Parisi, M. Carotenuto, L. Vetri, M. Roccella, The neurochemistry of autism. Brain Sci. 10(3) (2020)
4. P.F. Bolton, Medical conditions in autism spectrum disorder. J. Neurodev. Disord. 1(2), 102–113 (2009)
5. D. Pinto, A.T. Pagnamenta, L. Klei, R. Anney et al., Functional impact of global rare copy number variation in autism spectrum disorders. Nature 466(7304), 368–372 (2010)
6. D.I. Zafeiriou, A. Ververi, E. Vargiami, Childhood autism and associated comorbidities. Brain Dev. 29(5), 257–272 (2007)
7. A. Lo-Castro, P. Curatolo, Epilepsy associated with autism and attention deficit hyperactivity disorder: is there a genetic link? Brain Dev. 36(3), 185–193 (2007)
8. J.E. Lainhart, E.D. Bigler, M. Bocian, H. Coon, E. Dinh et al., Head circumference and height in autism: a study by the collaborative program of excellence in autism. Am. J. Med. Genet. A 140(210), 2257–2274 (2006)
9. S.S. Jeste, R. Tuchman, Autism spectrum disorder and epilepsy: two sides of the same coin? J. Child Neurol. 30(14), 1963–1971 (2015)
10. L. Strasser et al., Prevalence and risk factors for autism spectrum disorder in epilepsy: a systematic review and meta-analysis. Dev. Med. Child Neurol. 60(1), 19–29 (2018)
11. L. Mazzone, V. Postorino, M. Siracusano, A. Riccioni, P. Curatolo, The relationship between sleep problems, neurobiological alterations, core symptoms of autism spectrum disorder, and psychiatric comorbidities. J. Clin. Med. 7(5), 102 (2018)
12. S. Cohen, R. Conduit, S.W. Lockley, S.M. Rajaratnam, K.M. Cornish, The relationship between sleep and behavior in autism spectrum disorder (ASD): a review. J. Neurodev. Disord. 6(1), 44 (2014)
13. B.A. Malow et al., A practice pathway for the identification, evaluation, and management of insomnia in children and adolescents with autism spectrum disorders. Pediatrics 130(Suppl. 2), 106–124 (2012)
14. C. Lajonchere, N. Jones, D.L. Coury, J.M. Perrin, Leadership in health care, research, and quality improvement for children and adolescents with autism spectrum disorders: autism treatment network and autism intervention research network on physical health. Pediatrics 30(Suppl. 2), 62–68 (2013)
15. T. Buie, D.B. Campbell, G.J. Fuchs 3rd, G.T. Furuta, J. Levy et al., Evaluation, diagnosis, and treatment of gastrointestinal disorders in individuals with ASDs: a consensus report. Pediatrics 125(Suppl. 1), 1–18 (2010)
16. C. Holingue, C. Newill, L.C. Lee, P.J. Pasricha, M.F. Daniele, Gastrointestinal symptoms in autism spectrum disorder: a review of the literature on ascertainment and prevalence. Autism Res. 11(1), 24–36 (2018)
166
M. Stankova et al.
17. S. Woolfenden, V. Sarkozy, G. Ridley, K. Williams, A systematic review of the diagnostic stability of autism spectrum disorder. Res. Autism Spectr. Disord. 6, 345–354 (2012) 18. N. Meghan, Y. Qian, M. Massolo, L.A. Croen, Psychiatric and medical conditions in transitionaged individuals with ASD. Pediatrics 141(Suppl. 4), 335–345 (2018) 19. C. Tye, A.K. Runicles, A.J.O. Whitehouse, G.A. Alvares, Characterizing the interplay between autism spectrum disorder and comorbid medical conditions: an integrative review. Front. Psychiatry 9, (2019) 20. E.C Kurtz-Nelson, et al., Co-Occurring medical conditions among individuals with ASD— Associated Disruptive Mutations. Children’s Health Care, 1–24, (2020) 21. O. Zerbo, A. Leong, L. Barcellos, P. Bernal, B. Fireman, L.A. Croen, Immune mediated conditions in autism spectrum disorders. Brain Behav. Immun. 46, 232–236 (2015) 22. C.J. McDougle, S.M. Landino, A. Vahabzadeh, J. O’Rourke, N.R. Zurcher, B.C. Finger, M.L. Palumbo, J. Helt, J.E. Mullett, J.M. Hooker, W.A. Carlezon Jr., Toward animmune-mediated subtype of autism spectrum disorder. Brain Res. 1617, 72–92 (2015) 23. J. Isaksen, V. Bryn, T.H. Diseth, A. Heiberg, S. Schjølberg, O.H. Skjeldal, Children with autism spectrum disorders—the importance of medical investigations. Eur. J. Paediatr. Neurol. 17(1), 68–76 (2013) 24. R.N. Nikolov, K.E. Bearss, J. Lettinga, C. Erickson, M. Rodowski, M.G. Aman, et al., Gastrointestinal symptoms in a sample of children with pervasive developmental disorders. J Autism Dev Disord 39(3), 405–413, (2009) 25. T.E. Rosen, C.A. Mazefsky, R.A. Vasa, M.D. Lerner, Co-occurring psychiatric conditions in autism spectrum disorder. Int. Rev. psychiatry (Abingdon, England) 30(1), 40–61 (2018) 26. F. Doshi-Velez, Y. Ge, I. Kohane, Comorbidity clusters in autism spectrum disorders: an electronic health record time-series analysis. Pediatrics 133(1), 54–63 (2014) 27. L.H. Quek, K. Sofronoff, J. Sheffield, A. White, A. 
Kelly, Co-occurring anger in young people with Asperger’s syndrome. J. Clin. Psychol. 68(10), 1142–1148 (2012) 28. A. Saunders, K.E. Waldie, Distinguishing autism from co-existing conditions: a behavioural profiling investigation. Adv. Autism 2(1), 41–54 (2016) 29. E. Gordon-Lipkin, R. Alison, J. Marvin, K. Law, P.H. Lipkin, Anxiety and mood disorder in children with autism spectrum disorder and ADHD, Pediatrics 141(4), (2018) 30. E. Simonoff, A. Pickles, T. Charmanand, et al., Psychiatric disorders in children with autism spectrum disorders: prevalence, comorbidity, and associated factors in a population-derived sample. J. Am. Acad. Child Adolesc. Psychiatry 47(8), 921–929, (2008) 31. J.N. Constantino, R.D. Todd, Autistic traits in the general population: a twin study. Arch. Gen. Psychiatry 60(5), 524–530 (2003) 32. M. Grandgeorge, M. Hausberger, S. Tordjman, M. Deleau, A. Lazartigues, E. Lemonnier, Environmental factors influence language development in children with autism spectrum disorders. PLoS ONE 4(4), 4683 (2009) 33. I.M. Eigsti, J.M. Schuh, in Language and the Human Lifespan Series, Language Acquisition in ASD: Beyond Standardized Language Measures, ed. by L.R. Naigles. Innovative Investigations of Language in Autism Spectrum Disorder, (American Psychological Association, 2017), pp. 183–200 34. E. Pickett, O. Pullara, J. O’Grady, B. Gordon, Speech acquisition in older nonverbal individuals with autism: a review of features, methods, and prognosis. Cogn. Behav. Neurol. 22(1), 1–21 (2009) 35. K. Chawarska, A. Klin, R. Paul, F. Volkmar, Autism spectrum disorder in the second year: stability and change in syndrome expression. J. Child. Psychol. Psychiatry 48(2), 128–138 (2007) 36. H. Tager-Flusberg, R.M. Joseph, Identifying neurocognitive phenotypes in autism. Philos. Trans. R. Soc. Lond. B Biol. Sci. 358(1430), 303–314, (2003) 37. H.M. Chiang, Expressive communication of children with autism: the use of challenging behaviour. J. Intellect Disabil. 
Res.: JIDR 52(11), 966–972 (2008) 38. J.L. Matson, T.T. Rivet, J.C. Fodstad, T. Dempsey, J.A. Boisjoli, Examination of adaptive behavior differences in adults with autism spectrum disorders and intellectual disability. Res. Dev. Disabil. 30(6), 1317–1325 (2009)
6 Online Application of a Home-Administered …
167
39. R. Paul, K. Chawarska, C. Fowler, D. Cicchetti, F. Volkmar, Listen my children and you shall hear: auditory preferences in toddlers with autism spectrum disorders. J. Speech Lang. Hear. Res. 50(5), 1350–1364 (2007) 40. S. Shumway, A.M. Wetherby, Communicative acts of children with autism spectrum disorders in the second year of life. J. Speech Lang. Hear. Res. 52(5), 1139–1156 (2009) 41. T. Charman, S. Baron-Cohen, J. Swettenham, G. Baird, A. Drew, A. Cox, Predicting language outcome in infants with autism and pervasive developmental disorder. Int. J. Lang. Commun. Disord. 38(3), 265–285 (2003) 42. K.A. Anderson, C. Sosnowy, A.A. Kuo, T.P. Shattuck, Transition of individuals with autism to adulthood: a review of qualitative studies. Pediatrics 141(Suppl. 4), 318–327 (2018) 43. J. Prata, W. Lawson, R. Coehllo, Parent training for parents of children on the autism spectrum: a review. Int. J. Clin. Neurosci. Ment. Health, (2018) 44. J.S. Karst, A.V. Van Hecke, Parent and family impact of autism spectrum disorders: a review and proposed model for intervention evaluation. Clin. Child. Fam. Psychol. Rev. 15(3), 247–277 (2012) 45. S.J. Rogers, L. Vismara, A.L. Wagner, C. McCormick, G. Young, S. Ozonoff, Autism treatment in the first year of life: a pilot study of infant start, a parent-implemented intervention for symptomatic infants. J. Autism Dev. Disord. 44(12), 2981–2995 (2014) 46. C. Kasari, K. Lawton, W. Shih, T.V. Barker, R. Landa, C. Lord, F. Orlich, B. King, A. Wetherby, D. Senturk, Caregiver-mediated intervention for low-resourced preschoolers with autism: an RCT. Pediatrics 134(1), 72–79 (2014) 47. J.L. Matson, S. Mahan, S.V. LoVullo, Parent training: are view of methods for children with developmental disabilities. Res. Dev. Disabil. 30(5), 961–968 (2009) 48. D. Preece, V. Trajkovski, Parent education in autism spectrum disorder—a review of the literature. Hrvatska Revija Za Rehabilitacijska Istraživanja 53(1), 128–138 (2017) 49. K. Chawarska, A. Klin, R. 
Paul, S. Macari, F.A. Volkmar, Prospective study of toddlers with ASD: short-term diagnostic and cognitive outcomes. J. Child Psychol. Psychiatry 50(10), 1235– 1245 (2009) 50. M.L. Cook, Y. Zhang, J.N. Constantino, On the continuity between autistic and schizoid personality disorder trait burden: a prospective study in adolescence. J. Nerv. Ment. Dis. 2(100), 94–100 (2020) 51. R. Pender, P. Fearon, J. Heron, W. Mandy, The longitudinal heterogeneity of autistic traits: a systematic review. Res. Autism Spectr. Disord. 79, (2020) 52. T. Achenbach, L. Rescorla, Manual for the ASEBA Preschool Forms & Profiles (University of Vermont, Research Center for Children, Youth & Families, Burlington, 2000)
Chapter 7
Explainable AI, But Explainable to Whom? An Exploratory Case Study of xAI in Healthcare
Julie Gerlings, Millie Søndergaard Jensen, and Arisa Shollo
Abstract

Advances in AI technologies have resulted in superior levels of AI-based model performance. However, this has also led to a greater degree of model complexity, resulting in “black box” models. In response to the AI black box problem, the field of explainable AI (xAI) has emerged with the aim of providing explanations catered to human understanding, trust, and transparency. Yet, we still have a limited understanding of how xAI addresses the need for explainable AI in the context of healthcare. Our research explores the differing explanation needs amongst stakeholders during the development of an AI system for classifying COVID-19 patients for the ICU. We demonstrate that there is a constellation of stakeholders who have different explanation needs, not just the “user”. Further, the findings demonstrate how the need for xAI emerges through concerns associated with specific stakeholder groups, i.e., the development team, subject matter experts, decision makers, and the audience. Our findings contribute to the expansion of xAI by highlighting that different stakeholders have different explanation needs. From a practical perspective, the study provides insights on how AI systems can be adjusted to support different stakeholders’ needs, ensuring better implementation and operation in a healthcare context.

Keywords Artificial Intelligence · X-ray · COVID-19 · xAI · Explainable AI · Decision making support · Stakeholder concerns
J. Gerlings (B) · A. Shollo
Department of Digitalization, Copenhagen Business School, Howitzvej 60, 2000 Frederiksberg, Denmark
e-mail: [email protected]

M. S. Jensen
Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, Langelandsgade 139, 8000 Aarhus C, Denmark

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_7
7.1 Introduction

The adoption, use and diffusion of Artificial Intelligence (AI) technologies in healthcare is an area that currently receives a great deal of attention from both researchers and practitioners. This increased attention is largely motivated by advances in machine learning (ML) technology, which have significantly improved the average performance of AI systems. ML-based systems are able to inductively learn rules from training data, typically consisting of input–output pairs (e.g., historical medical images and corresponding diagnoses). Once the rules are learned, they can be applied to make inferences about unseen data (e.g., medical images of a new patient). ML-based AI systems have enabled the automation and augmentation of sophisticated tasks previously performed exclusively by medical specialists. However, these systems are often built on non-transparent models where it is unclear how a model arrives at a given prediction. These systems are consequently often referred to as “black box” models [1, 2]. Explainable AI (xAI) has emerged as a response to this “black box” modelling and seeks to provide explanations that accommodate human understanding and transparency [3–5]. In a healthcare context, existing literature on the role of xAI suggests that xAI is believed to enhance the trust of medical professionals interacting with AI systems [2]. At the same time, emerging legal and privacy concerns [6], as well as issues of accountability, reliability, justification, and risk reduction [7, 8], are other driving factors in the development of xAI for the healthcare sector. These are all areas that are important to explore in order to secure successful AI implementation in healthcare. However, much of the research done so far is conceptual in nature, and there is currently limited knowledge on the need for xAI in applied healthcare contexts.
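To make the idea of inductively learning a rule from input–output pairs concrete, the following is a minimal sketch and not part of the case study. It assumes a hypothetical one-dimensional input (a made-up "image score") with binary labels, and the function names `learn_threshold` and `predict` are introduced here purely for illustration. The learned rule is a single threshold, which is then applied to an unseen input.

```python
# Toy illustration only: the scores and labels below are invented and do not
# come from the chapter's case study.

def learn_threshold(pairs):
    """Learn a rule 'predict 1 if x >= t' from (input, output) pairs."""
    xs = sorted(x for x, _ in pairs)
    # Candidate thresholds: midpoints between neighbouring training inputs.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        # Number of training pairs the rule 'x >= t' gets wrong.
        return sum((x >= t) != bool(y) for x, y in pairs)

    # Keep the candidate with the fewest training errors.
    return min(candidates, key=errors)


def predict(threshold, x):
    """Apply the learned rule to an unseen input."""
    return int(x >= threshold)


training_pairs = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
threshold = learn_threshold(training_pairs)
print("learned threshold:", threshold)
print("prediction for unseen input 0.7:", predict(threshold, 0.7))
```

A single threshold is fully transparent; the "black box" concern discussed in this chapter arises when the learned rules are distributed over the many parameters of a complex model instead.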
In this chapter, we aim to gain further insights into this concept by exploring how the need for xAI arises during the development of AI applications and which issues xAI can help alleviate in an empirical setting. To investigate this, we conducted a case study and followed an AI startup during their development of an AI-based product for the healthcare sector. We guided our case study with the following research question: “How does the need for xAI emerge during the development of an AI application?” The empirical setting for our case study is a Nordic healthtech company specializing in medical imaging; the company is widely recognized as a startup with strong professional competence in this area. Our study explores the development phase of an AI-based medical imaging product during the COVID-19 crisis; this product will hereafter be referred to as LungX. Based on a desire to utilize their competences to help alleviate some of the strain on health services, the company developed LungX to assist in automatic early assessment of COVID-19 patients. One of the main challenges facing health services in relation to the COVID-19 crisis is predicting how the disease develops for each patient and how it thereby affects the resources of a given hospital [9]. Our investigation follows the development of LungX with a particular focus on identifying the ability of xAI to accommodate the needs of different stakeholders during the product life cycle.
The remainder of the chapter is structured as follows: the next section provides an overview of related literature, followed by a detailed description of our methods and the empirical setting of our research. We then present the findings of the study before closing with a discussion on the theoretical and practical implications of our findings.
7.2 Related Work

As a starting point for answering our research question, we drew on prior work at the intersection of AI, xAI, and medical work with a primary focus on radiology. In the following section, we present relevant literature describing the drivers of AI adoption in healthcare, the emergence of xAI, and lastly, how both AI and xAI have been employed in the fight against the COVID-19 pandemic.
7.2.1 Adoption and Use of AI in Healthcare

AI-based algorithms are increasingly being developed for use in medical applications [2, 7, 10]. AI offers medical professionals the capacity to develop powerful, precise models capable of delivering individually tailored treatments backed up by aggregated and anonymized healthcare data [9, 11]. Hence, AI technologies create new opportunities for discovering new and improved treatment plans, facilitating early detection of diseases, and monitoring disease progression [12]. Further, these powerful technologies promise to solve problems of sparse resources by carrying out tasks currently performed by specialists such as radiologists, whose availability and mobility are limited [13, 14]. At the same time, they address the increasing need to keep up with the growing population and provide higher quality healthcare. However, the majority of these complex AI-based models remain in the development phase, never reaching production [9, 11, 12]. The increasing presence of complex AI models in healthcare has generated strong debates, as outcomes have varied [9, 15]. Reported models have failed to meet performance expectations, jeopardized critical decisions, and used discriminatory factors in determining outcomes [5, 13, 15–18]. Projects such as “Watson for Oncology” [19]; an unnamed, broadly used algorithm to manage the health of large populations [20]; Google Health [21], which can identify signs of diabetic retinopathy; and COMPAS [22], a decision support tool used in courthouses to assess risk of recidivism, have all failed in one way or another when put to the test or even into production. This has spawned a backlash against AI-based outcomes due to the potentially severe consequences they might have for humans when used in high-stakes medical decisions. Improving the performance of machine learning models often requires increased complexity of the underlying models, larger datasets and more computing power [4, 23, 24].
However, due to this heightened complexity, it becomes impossible to understand how these models work, how the data is processed and how outcomes are generated [7, 25–27]. Hence, scholars have characterized such models as opaque or black box models [28–30]. In black box models, metrics such as accuracy, precision and prediction speed take a front seat, hindering the ability of the general population to understand how relevant outcomes are made. Their use in high-stakes decision-making, such as medical decision-making [2, 31–34], has led to increasing demands for transparent and explainable AI. Recent studies have shown that model transparency, interpretable results and an understanding of clinical workflow are all necessary to ensure the correct use and adoption of these powerful models [2, 35, 36] in healthcare practices. Apart from generating transparent or explainable models, it has also proven difficult to organize cross-disciplinary work between healthcare workers and data scientists to enable essential knowledge-sharing. Recently, scholars have addressed the issue of an “AI Chasm” in the clinical research community [27, 36–38], identifying a gap in the literature between building a sound and scientifically correct model and using it in a real-world medical clinic. Though the EHR (electronic health record) has improved data volume in some parts of the world, both data quality and availability for training AI models are still prominent issues [11]. Lastly, regulatory obstacles like FDA approval and/or CE certification, which are typically necessary to conduct clinical trials, are slow processes that demand high performance results and documentation [36, 39]. These circumstances create a catch-22 situation where the production and implementation of higher-quality models are inhibited by restricted access to relevant data for training, poor performance scores and a limited amount of knowledge about how experts work.
7.2.2 Drivers for xAI

Many scientists have suggested that xAI could ease and promote the adoption of AI in the medical domain, as explainability could accommodate understanding and trust, making stakeholders more willing to adopt a given AI application [6–8, 11, 40]. Studies indicate that information provided by xAI frameworks may be of great relevance [2, 10, 35] in a decision-making context. For example, Cai et al. [2] found that clinicians are interested in the local, case-specific reasoning behind a model decision as well as the global properties of the model. These information needs are similar to the information clinicians need when interacting with medical colleagues to discuss a patient case. Both the local and global model information could potentially be provided by xAI frameworks such as LIME, SHAP or PDP [41, 42]. In connection with this, Lebovitz [11] studied how an AI application aimed at diagnostic support in radiology was used in practice; the study found that the introduction of AI in the decision-making context introduced additional ambiguity into medical decision-making and caused radiologists’ routine decision-making tasks to become nonroutine. The study emphasized that the lack of information about the model workings intensified the degree of ambiguity and that knowing more about how the algorithm was trained would have increased confidence in the results of the AI application. However, there remain very few studies investigating the actual information needed when introducing AI in algorithm-assisted decision-making in the medical sector. These studies motivated our research in xAI approaches and how they might be able to provide information to ease the understanding, adoption and implementation of complex AI-based models.
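The distinction between global model properties and local, case-specific reasoning can be illustrated with a small sketch. The code below does not use or reproduce LIME or SHAP; it assumes a hypothetical linear risk model with invented weights, feature names and data. The global view is a partial dependence value (the average prediction when one feature is forced to a fixed value for every case), and the local view is the additive decomposition that is available for a linear model, where each feature contributes its weight times its deviation from the dataset mean.

```python
# Hypothetical linear model and data; all names and numbers are illustrative.
WEIGHTS = {"age": 0.02, "opacity": 0.5}
BIAS = -1.0

DATA = [
    {"age": 40, "opacity": 0.2},
    {"age": 60, "opacity": 0.6},
    {"age": 80, "opacity": 1.0},
]


def model(row):
    """Toy risk score: bias plus weighted sum of the features."""
    return BIAS + sum(WEIGHTS[name] * row[name] for name in WEIGHTS)


def partial_dependence(feature, value, data=DATA):
    """Global view: mean prediction when `feature` is set to `value` in every row."""
    return sum(model({**row, feature: value}) for row in data) / len(data)


def local_contributions(row, data=DATA):
    """Local view: per-feature contribution for one case, relative to the
    average prediction. Valid as written only because the model is linear."""
    means = {name: sum(r[name] for r in data) / len(data) for name in WEIGHTS}
    return {name: WEIGHTS[name] * (row[name] - means[name]) for name in WEIGHTS}


patient = DATA[2]
print("global effect of opacity=1.0:", partial_dependence("opacity", 1.0))
print("local contributions for one patient:", local_contributions(patient))
```

For this linear toy model the local contributions sum exactly to the difference between the patient's score and the average score. For the non-linear models discussed in this chapter no such exact decomposition exists, which is why approximation frameworks are used.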
7.2.3 Emergence of xAI

The demand for transparency and explainable AI has emerged as a response to the increasing “black box” problem of AI. xAI refers to methods and techniques that seek to provide insights into the outcome of an ML model and present it in qualitative, understandable terms or visualizations to the stakeholders of the model [4, 6, 24, 43]. Moreover, the authors of [1] described xAI as follows: ‘Explainability is associated with the notion of explanation as an interface between humans and a decision maker that is, at the same time, both an accurate proxy of the decision maker and comprehensible to humans’. The terms explanation, transparency and interpretation all appear frequently in research dealing with the “black box” problem of AI, but the field has not yet reached a consensus on their meaning. This is partly due to the nascent state of the field and the fact that it spans multiple disciplines. xAI itself is the newest concept associated with human understanding of models and the efforts made in response to opaque deep neural networks [3, 44]. Explainability entails that a meaningful explanation can be formulated for a stakeholder [1]. Interpretability, on the other hand, entails the ability to assign subjective meaning to an object [44]. Interpretation occurs when a human uses their cognitive capabilities to form meaning from information. Transparency has a much more technical origin within the field of xAI. Here, “transparency” is mainly used in computer science literature to describe different frameworks or methods to simplify complex models into binary models, rule generators or the like [1, 5, 42, 45–50]. When defining xAI products in relation to their utility for humans/model stakeholders, one key dilemma is the trade-off between complexity and interpretability.
Oftentimes, scientists and engineers must choose between a more interpretable but less complex model and a less interpretable model that may offer a more accurate representation of reality. Another important trade-off arises in the form of accurate explanations versus comprehensible explanations; the more accurate the explanation, the more incomprehensible it will be for AI-illiterate stakeholders. Even though it may be possible to use xAI frameworks to accurately depict the model output or how the model produced the output, this description may still be incomprehensible to AI-illiterate stakeholders [24, 51, 52]. While these dilemmas have been acknowledged by scholars, there is a limited understanding of how they are dealt with in practice. Social scientists have begun to address the request for more knowledge on the constitution of explanations. Conceptual papers such as [46] have addressed the origins of explanations themselves, how we are biased in our interpretation of explanations, and how explanations are phenomena that occur in the context of human interaction. Moreover, researchers from the socio-technical HCI-related fields [53, 54] have problematized the way xAI research is approached by data scientists. Because xAI technology has its origins in data science, it was initially used as a tool for debugging and variable exploration. These studies have implicitly adopted a universal conceptualization of explanations as useful for either developers or users. Further research is required to understand the constellation of stakeholders for AI models (the receivers of explanations) and the need for explanations they deem satisfactory. Moreover, scholars have similarly called for a more nuanced understanding of xAI and how it can satisfy different stakeholder needs by building more targeted explanations [7, 32, 55–57].
7.2.4 AI and xAI in the Fight Against the COVID-19 Pandemic

Before the WHO (World Health Organization) had even announced COVID-19 as a potential pandemic, AI-assisted and autonomous systems had succeeded in detecting and predicting the spread and severity of the pandemic [58]. Systems like BlueDot and HealthMap, which previously helped identify and detect SARS and the Zika virus, were now the first to identify unusual viral activity in Wuhan, China [59–62]. Algorithms utilizing big data and people’s whereabouts both on- and offline helped warn the authorities about the impending crisis, helping public health officials respond fairly quickly [61]. However, as we have seen, pandemics are extremely hard to combat; therefore, methods for identifying infected patients and disease severity are vital steps in this fight [18, 63]. The most common way to diagnose COVID-19 is by using the Reverse Transcription Polymerase Chain Reaction (RT-PCR), which is an expensive, time-consuming, resource-heavy and complicated process. This has fueled the search for alternative ways of detecting the virus [18, 58, 63]. Recent research has shown promising results from methods using CT scans and lung X-rays to detect COVID-19 [64, 65]. However, the scarcity of radiologists, who need to describe the images and provide a diagnostic report, has spurred the use of AI-based models to assist in this process. Current literature has identified two main approaches that address COVID-19 in chest X-rays (CXR): a classification and detection approach and a severity measure approach. The first approach is concerned with identifying COVID-19 and other lung-related pathologies (or anomaly detection, which tries to identify only COVID-19 infections from healthy CXRs). The classification and detection approaches have the same goal of identifying COVID-19, whereas the other approach is concerned with a severity measure for disease progression itself [66].
As the pandemic has evolved and the infrastructure for administering RT-PCR tests has greatly improved, the scope of AI-supported image scans has leaned towards disease progression and severity measures instead of disease diagnosis. Data used for training these deep neural networks (DNN) primarily consists of open-source CXR image repositories, which often contain labeled images of viral and bacterial pneumonia, fractures and tuberculosis. However, image quality has proven suboptimal, as original X-ray images in the DICOM (Digital Imaging and Communications in Medicine) format offer much higher quality and contain more helpful metadata than the compressed images, such as JPEG files, that are often available [67–69]. A recent study [68] on how data processing and its variability in deep learning models affect explainability describes the fallacies of biased DNNs, which are not visible without some deeper insight into the models: ‘…it is unclear if the good results are due to the actual capability of the system to extract information related to the pathology or due to the capabilities of the system to learn other aspects biasing and compromising the results.’ [68]. Researchers have used xAI to gain insights into the inner workings of their models and how they detect pneumonia in CXRs. In particular, they have utilized the Gradient-weighted Class Activation Mapping (Grad-CAM) method [70], originally a debugging tool for DNNs, to visualize the output of the model with a heatmap overlay. This visualizes the pixels of interest in the image and colors them in different shades according to their importance in relation to the output. This way, the visualization can be more easily validated by developers or radiologists to see if the correct pixels (variables) are marked by the model. Others have used the LIME framework [6] for similar purposes, helping radiologists by visualizing the model and identifying which features play a crucial role in distinguishing between COVID-19 patients and other patients [71]. Hryniewska et al.
[15] pointed out that when a radiologist was shown the assumptions about data, models and explanations, many of the models created for COVID-19 were deemed incorrect. They therefore created a useful checklist for building and optimizing future model performance and reliability. The above studies show how xAI can be used to provide human-interpretable explanations for radiologists using the model for decision support or for developers either debugging or testing the model’s performance and diminishing biased data, for example. This nascent research on the fight against COVID-19 appears to follow the same steps as the more general literature on healthcare xAI. Overall, we observe that current studies view xAI as an advanced method that provides universal explanations without taking into consideration the specific needs of different stakeholders. In light of the above and the concept of explainability as socially constructed [46], we conducted a case study to examine how the need for xAI emerges and how the needs of different stakeholders are taken into account. Hence, this study aims to shed light on how xAI influences the development, adoption and use of AI-based products in healthcare settings.
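The Grad-CAM heatmap mentioned above can be sketched in a few lines. The sketch below is illustrative only and is not the implementation used in any of the cited studies. It assumes that the feature maps of a convolutional layer and the gradients of the class score with respect to those maps have already been obtained from a deep learning framework; here they are replaced by tiny hand-written lists. Each channel is weighted by the spatial average of its gradients, the weighted maps are summed, and negative values are clipped (ReLU). The final rescaling to the range 0 to 1 is a display convenience added here, not part of the method's definition.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM style heatmap.

    activations: list of K feature maps, each an H x W list of lists.
    gradients:   same shape; gradients[k][i][j] is the derivative of the
                 class score with respect to activations[k][i][j].
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weights: spatial average of the gradients of each feature map.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    # Weighted sum over channels followed by ReLU.
    heatmap = [
        [
            max(0.0, sum(w * a[i][j] for w, a in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    # Rescale to [0, 1] so the map can be drawn as a colour overlay.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Toy example: two 2 x 2 feature maps with invented values.
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
for row in grad_cam(acts, grads):
    print(row)
```

In the toy example the first channel supports the class and the second counts against it, so only the location activated by the first channel remains highlighted. Upsampled to the size of the input image, such a map is what a developer or radiologist would inspect to check whether the model attends to clinically plausible regions.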
7.3 Method

The empirical basis for this research is an illustrative interpretive case study that investigates the use of xAI in a healthcare setting. Case studies are particularly valuable for exploratory research where a thorough understanding of a phenomenon in a particular context is preferred [72, 73]. Further, the choice of the method depends on the nature of the question that is being investigated. Case studies are best suited for investigating “how” questions [74], in this instance how the need for xAI emerges during the development of an AI-based healthcare project. Against this background, we conducted an exploratory case study in a Nordic start-up developing AI-based tools for decision support in healthcare. The start-up is currently working on two different products; the one we investigated is aimed at detecting COVID-19 based on lung X-rays and providing an automated severity score. Two of the researchers collected empirical data over a period of six months, focusing on the development process of the product.
7.3.1 Data Collection

The primary data sources were semi-structured interviews with employees at the company, online workshops and a collection of written documents produced by the company. Due to the circumstances of COVID-19, real-time observations were very limited, and most interviews and workshops were conducted via Microsoft Teams. Moreover, follow-up questions and updates from both sides were exchanged through email. Interviews were conducted by two researchers, one leading the interview and the other following up on relevant points and remarks from the interviewee. Participants were informed of the purpose of the interview, the focus of our research, and the fact that the interviews would be anonymized. The interviews lasted an average of 50 min, and they were conducted in English. All the interviews were recorded and transcribed with the consent of the interviewees. We conducted nine interviews with seven employees involved in the project, two of which were follow-up interviews with the CTO and a medical annotator (see Fig. 7.1 for more details). We did not interview end-users such as radiologists, ICU clinicians or patients, since LungX was not tested or implemented in hospitals during our research period. The interviews were based on an interview guide. Hence, we started with demographic and open-ended questions, followed by questions focusing on the history of the project, the interviewees’ daily work with the project, data quality, use of xAI, model requirements and future expectations. The interviews provided us with a thorough understanding of the project, including insights into both business and technical aspects of the product. While limited, the interviews were sufficient for the purposes of proof of concept.
Fig. 7.1 Interviewee summary

Role in company: CEO
Main responsibilities: Developing the business case of the product; responsible for business-related communication with clinics
Interview themes: Business aspect of the product, understanding of clinical operations, decisions for pursuing the case

Role in company: CTO (2 interviews)
Main responsibilities: Designing the entire product (conceptual as well as technical parts); making research prototype
Interview themes: Technical perspective of the case, medical imaging and product details, reasoning behind decisions

Role in company: Developer
Main responsibilities: Converting machine learning research code into quality-assured code for the product
Interview themes: Deeper insights into the product, including how the model is constructed

Role in company: Developer
Main responsibilities: Fetching data from online sources and storing it for training; aligning technical aspects of LungX project with existing projects
Interview themes: Data quality and annotation process

Role in company: Product Owner
Main responsibilities: Medical annotation of data; developing new features the product should include (medical advisory role)
Interview themes: Data quality, annotation process, validation

Role in company: Medical annotator (2 interviews)
Main responsibilities: Medical annotation of data
Interview themes: Data quality, annotation process, validation

Role in company: Clinical Operations Officer
Main responsibilities: Will be involved in planning the clinical validation of the product and day-to-day interactions with clinics; has not been very involved in early phases of product development
Interview themes: Project testing procedures, obstacles in product testing at hospitals, deep understanding of clinical processes at hospitals
J. Gerlings et al.
Background information (including business proposal, PowerPoint presentations, project plans, grant application, a demonstration of the model and the user interface, user manual, requirement documentation and meeting minutes) was also collected as complementary material to the interview data.
7.3.2 Data Analysis
For the data analysis, we adopted an “insider–outsider” interpretive approach [75, 76] in which we first established the “insider” perspective of how the need for xAI emerged in the AI-based project. Once the “insider” understanding was formed, we engaged on a more abstract, theoretical level (the so-called “outsider” point of view), where we created a link between the four dimensions [77]. In other words, we associated the “insider” point of view with the “outsider” point of view to merge our understanding of practice with our understanding of the existing literature. The first and second author undertook the entirety of the field work, jointly developing an “insider” view of the process. The third author looked at the data after collection; having an “outsider” perspective on the phenomenon and the research site allowed the third author to provide new ways of theorizing and to identify new patterns in the data, which were discussed with the other authors. This method of data analysis proved very fruitful, since the first two authors could draw on their rich understanding of the data during discussions of the identified patterns by agreeing or disagreeing with the third author, thus linking the “insider” view with the existing literature.
Our qualitative data analysis followed the approach of Gioia et al. [76]. We employed constant comparative techniques and open coding [78] to analyze our data, and our data analysis was an iterative process during which we discussed codes until we reached an agreement. First, we moved all interview transcripts and background materials to NVivo to look for specific indicators of the practices where the need for explanations emerged in the project. These practices constitute our unit of analysis. Next, we applied open coding to the interview transcripts to identify first-order concepts, staying close to the informants’ original statements.
We linked these original concepts together and formulated second-order themes (axial coding). Finally, we further grouped the themes together to come up with four dimensions through which the need for xAI manifested itself in the project. We illustrate our data structure in Fig. 7.2, where we depict the concepts, themes and dimensions identified in the data analysis. As our coding of the data progressed, the first-order concepts emerged from quotes by interviewees. These were then grouped into second-order themes reflecting a higher-level concurrent theme structure in the coding. Lastly, these themes were aggregated into four dimensions of stakeholders’ concerns emerging in the development of LungX: Development Team concerns, Subject Matter Expert (SME) needs and concerns, Decision-Maker needs
and concerns, and Audience: (Patient) concerns. These concerns are constituted by the model’s different stakeholders articulating their own relationship to the model or their perception of other stakeholder groups in the development phase of LungX. In assessing the second-order themes, it is important to note that they are not necessarily mutually exclusive across the different dimensions but rather have the most influence on the depicted dimension. These stakeholder concerns are discussed in more detail in the findings section. In what follows, we present the case setting, followed by the findings from the data analysis, a discussion of the findings, the conclusion and directions for further research.
Fig. 7.2 Data structure
7.4 Case Setting
The empirical setting for our case study is a Nordic healthtech start-up that specializes in medical imaging. The company was founded within the last five years with the vision of simplifying radiology and improving the patient journey through diagnostics. The five founders come from a mix of business and university backgrounds: four of them have extensive experience within medical imaging, while one of them has many years of experience with the commercial execution of life science projects. The company has approximately 20 full-time employees and a handful of part-time associates organized within the areas of business, research, development, operations
and regulatory affairs. When our research project started, the startup had one Class I CE-marked product on the market: a medical imaging solution able to triage patients and provide diagnostic support based on brain MRI scans. While it was not the subject of our focus, the existing brain product is worth mentioning because it came up in several interviews, as some of the elements of LungX were inspired by the brain solution. Our research is centered around the development of LungX, a new solution based on lung X-rays. The idea to develop LungX emerged based on a desire to alleviate strain on health services providers during the COVID-19 crisis by putting the company’s medical imaging capabilities to use. One of the main challenges facing health services providers in relation to the COVID-19 crisis is diagnosing and predicting how the disease will develop [9]. To address this, the company partnered with relevant hospital stakeholders and collaborators from a computer science department at a top Nordic university to develop the idea of using imaging to diagnose and predict disease development, such as the need for intubation, admission to intensive care unit or prediction of death, and thereby assisting in resource planning at hospitals. During the initial phases of product development, it became clear to the company that LungX could be based on X-ray scans as they are generally faster, more mobile and more accessible than CT scans. The fact that many hospitals have mobile X-ray scanners was a big advantage; this meant that healthcare providers could bring the scanner to patients instead of vice versa, making it easier to avoid spreading infection. LungX was therefore built to take X-rays as input and use advanced AI and image analysis to provide real-time information that can assist in resource planning. 
The input LungX provides to the resource planning process comes in the form of a severity score for COVID-19 patients—a modified Brixia score [79] that quantifies how many of the patient’s 12 lung zones are affected by lung edema/consolidation. Moreover, LungX is able to detect six different lung abnormalities (atelectasis, fracture, lesion, pleural effusion, pneumothorax, edema/consolidation). The severity score is calculated when edema/consolidation is detected, and it indicates how widespread the finding is throughout the lungs. The detection of the six different lung abnormalities provides trained medical professionals with complementary information in the evaluation of chest X-rays and is assistive in nature. Besides the detection of lung abnormalities and the severity score, LungX also provides a planning element based on triaging patients. The triage assessment for each incoming patient measures the severity of the pathologies detected and is based on a ranking of the lung abnormalities and the severity score to help support planning of patient care. It is meant to give a quick status overview of current patients and prioritize those with the most severe symptoms. Thus, LungX has multiple value propositions that can be utilized based on the specific needs of a given hospital. For the interested reader, the technical details, data foundation and test performance metrics for LungX are described in Appendix 1. The information provided by LungX is visualized in two different interfaces: The Study List and the Findings Viewer. The Study List provides an overview of all patients scanned and their triage category. This means that the Study List can be used to prioritize patients based on their triage category so the patients with the
most critical triage category can be seen first. The Findings Viewer shows all the information extracted by LungX for a single patient, including their scan, their triage category and a findings overview showing whether each of the six abnormalities has been found in a binary fashion (present or not present). If a given abnormality has been found, it is possible to use a “Findings Selected” option to see a markup of where in the X-ray LungX has identified the finding. Figures 7.3 and 7.4 show a representation of the Study List and Findings Viewer, respectively. The two figures are not identical representations of the actual system, but they accurately portray the type and format of information presented in the GUI.
Fig. 7.3 Study list
Fig. 7.4 Findings viewer
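As a rough illustration of how a Study List ordered by triage category might be derived, the following minimal Python sketch combines a ranking of findings with the severity score. The chapter states only that triage is based on a ranking of the lung abnormalities and the severity score; the rank values, the score cut-offs and all names used here (FINDING_RANK, Study, triage_category, study_list) are hypothetical placeholders, not details of LungX.

```python
from dataclasses import dataclass, field

# Hypothetical ranking of findings (higher = more urgent). Placeholder values only;
# in the chapter the ranking is said to be decided by the individual clinic.
FINDING_RANK = {
    "pneumothorax": 3,
    "edema/consolidation": 2,
    "pleural effusion": 2,
    "lesion": 1,
    "atelectasis": 1,
    "fracture": 1,
}


@dataclass
class Study:
    patient_id: str
    findings: set = field(default_factory=set)
    severity_score: int = 0  # 0-12, computed only when edema/consolidation is found


def triage_category(study):
    """Map findings and severity score to low/medium/high (assumed cut-offs)."""
    top_rank = max((FINDING_RANK[f] for f in study.findings), default=0)
    if top_rank >= 3 or study.severity_score >= 8:
        return "high"
    if top_rank == 2 or study.severity_score >= 4:
        return "medium"
    return "low"


def study_list(studies):
    """Sort studies so that the most critical triage category comes first."""
    order = {"high": 0, "medium": 1, "low": 2}
    return sorted(
        studies,
        key=lambda s: (order[triage_category(s)], -s.severity_score),
    )


if __name__ == "__main__":
    worklist = study_list([
        Study("A", {"atelectasis"}),
        Study("B", {"edema/consolidation"}, severity_score=9),
        Study("C", {"pleural effusion"}),
    ])
    for s in worklist:
        print(s.patient_id, triage_category(s))
```

Keeping the ranking in a plain lookup table mirrors the idea that each clinic can adjust it without touching the sorting logic.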
7.5 Findings
While investigating how the need for xAI emerged during the development of the AI application, we observed a range of concerns expressed by the interviewees. In particular, we found clear patterns of variation in how the need for xAI emerged, and we identified four aggregated dimensions of concern, related to development, domain expertise, decision-making and patients. These concerns manifested with specific stakeholders in mind. In the next sections, we present the specific stakeholder concerns that lead to the need for xAI in AI-based healthcare applications.
7.5.1 Development Team
Explanation needs were expressed not only by the members writing the software, building infrastructure or training the model, but also by people working with and annotating the training data, developing the GUI and testing the model and product. In this particular setup (the partnership among the university, the company and the hospital), the company is responsible for researching and developing the best possible solution for identifying and classifying COVID-19 based on CXR (chest X-rays) and potentially other data sources. As a result, we observed that explanation needs emerged through the development team’s concerns, which primarily involved Model performance and Data foundation concerns.
Model Performance Concerns
Interviewees repeatedly stated that in order to achieve the aim of the project (improving the patient journey), the model had to fulfill certain performance requirements while still complying with GDPR policies and local health regulations. Obtaining satisfactory accuracy and precision scores constituted a major development obstacle.
The main obstacle would be to ensure enough, you know, accuracy and precision. (Medical annotator)
Developers used the explainable element in the GUI to test and validate the model’s performance, viewing the output in the GUI to ensure correct visualization of the pathologies in the CXR. Various tests on 20% of the test data yielded an AUC (area under the curve) of around 0.8–0.9 for detecting COVID-19, green-lighting the model for empirical testing to see whether those numbers would hold. However, two types of pathologies proved more difficult in terms of maintaining reliable performance. This resulted in the scope being reduced from six to four pathologies in the test version for the clinics and hospitals.
It’s not because we don’t have enough data, it’s just we don’t have enough phenotype of the dataset in the sense that you could have lots of pneumothorax pretty clear, you could have faint pneumothorax but there are some pneumothorax that are extremely hard to see. (CTO)
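For readers less familiar with the AUC metric mentioned above, the following minimal Python sketch shows one standard way to compute it: as the fraction of positive/negative case pairs in which the positive case receives the higher score, with ties counted as half. The function name roc_auc and the toy labels and scores are illustrative assumptions and are not taken from the LungX evaluation.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth values (1 = disease present).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (invented): two positive and two negative cases.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the reported range of 0.8–0.9.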
Moreover, the current model only examines images and does not take into account any additional information, such as gender, age or other patient information. On the one hand, this is good in terms of complexity, as fewer variables will often be easier to interpret; on the other hand, the model still needs to maintain a satisfactory performance. This is often a dilemma in developing models for critical decision-making, and one of the interviewees noted that more data would also improve the accuracy of the model:
Right now, you only have the images and do not take into account any other additional patient-related data like age, gender, blood pressure, or whatever it might be that could be relevant in terms of improving the accuracy of the model. (Developer)
The developers are aware that additional information can significantly influence the model’s performance and could be used to improve its accuracy.
We did see, well, an increased accuracy when they had age information as well, so I think age was definitely something to include. (Developer)
The current performance is nevertheless considered good enough for testing in empirical settings, although more vital patient information is expected to be included in later testing. This in turn increases the need for explainability, as the complexity of the model grows with more data and variables to comprehend. Although LungX is an aiding tool rather than a diagnostic tool, clinicians still need to have some understanding of the model output. However, the use of the model output can differ depending on the role of the clinician.
For example, a radiologist probably doesn’t need as much help finding a series of kind of common pathologies in a chest X-ray, I think they’re able to do that really really quickly, versus a junior clinician maybe working in the emergency department who might not have very much experience in looking at X-rays. And that actually might be the main benefit for them, whereas maybe in the radiology department it’s not so much about finding the pathology, it’s about having something that very quickly scores that and provides a risk assessment. (Product Owner)
Performance thus becomes a situation-specific measure that depends on the intended use case (e.g., diagnostic assistance or risk assessment). In this sense, performance is not necessarily best displayed as numeric values when dealing with quick risk assessments, as it is arguably harder to comprehend a numeric value than simple color-coded indications in stressful situations. The output shown should be developed to reflect the stakeholders’ needs in their specific situation, including those of the developers themselves.
Data Foundation Concerns
Data foundation concerns relate to the data collection approach, practices of storing and cleaning data, as well as how data is structured, annotated and evaluated. The ability to build any form of supervised DNN (deep neural network) demands a data foundation that is labelled correctly, with a variation that is evenly distributed. Just as critical is the size of the dataset. The variation, size and distribution of the
data foundation can be explored with different xAI frameworks to identify potential biases in the data prior to modelling. It is imperative to have a dataset large enough for both training and testing to provide a convincing consistency in the model. In this case, data annotation practices were standardized by an employee with a medical background, who created a protocol for how to interpret the CXR, segment them into different pathologies and annotate them. Three annotated datasets were used—one received from the hospital (containing COVID-19 CXR) and two open-source datasets containing different types of similar pathologies. These datasets established the data foundation of the model. The severity score the company developed builds upon the Brixia scoring system for COVID-19 [79, 80] to segment the lungs into 12 different zones, visualized and explained in the user manual. The zonal assessment severity score (ZASS) and the Brixia scores have been used at clinics as ICU doctors need something easily interpretable to measure patient progress. In this case, the company found out that radiologists were doing this by hand. They may, therefore, be familiar with interpreting this scoring, which is now automated for them. As illustrated in Fig. 7.5, the model measures the affected areas
of the lungs and gives a total score of 0–12.
And then we found out, at one hospital, the radiologists were scoring the X-rays just because the intensive care doctors wanted something quantifiable to base patient progress on. That helped us define if we should use a similar scoring system. We can automate that scoring. (CTO)
Fig. 7.5 Visualization of ZASS calculation
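The zone-based scoring described above can be sketched in a few lines of Python. The text specifies only that the lungs are divided into 12 zones and that the score counts the affected zones, giving a total of 0–12. How a zone is judged to be affected is not described, so the overlap-fraction rule, the 0.1 cut-off and the names zone_flags_from_overlap and zass below are assumptions made purely for illustration.

```python
N_ZONES = 12  # number of lung zones, as described in the text


def zone_flags_from_overlap(fractions, min_fraction=0.1):
    """Hypothetical rule: a zone counts as affected when the detected
    edema/consolidation region covers at least min_fraction of it."""
    return [f >= min_fraction for f in fractions]


def zass(zone_flags):
    """Count the affected zones; returns an integer between 0 and 12."""
    flags = list(zone_flags)
    if len(flags) != N_ZONES:
        raise ValueError(f"expected {N_ZONES} zone flags, got {len(flags)}")
    return sum(bool(f) for f in flags)


if __name__ == "__main__":
    # Invented per-zone coverage fractions for one chest X-ray.
    coverage = [0.0, 0.3, 0.6, 0.0, 0.0, 0.05, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0]
    print(zass(zone_flags_from_overlap(coverage)))
```

Because the score is a simple count, a clinician can check it against the image zone by zone, which fits the interviewees’ emphasis on something quantifiable yet easy to interpret.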
Moreover, the company has employed several doctors (one of whom is the product owner) to guide the team as to which information is important for ICU doctors and radiologists when they make their decisions. The team has also established an international network of specialists, including specialists from Italy, which was one of the first countries to shut down due to the pandemic. The insights and learnings gained from these partnership discussions with other clinicians were used to prioritize information and metadata related to the images. Furthermore, these discussions prompted the team to replace the previous heatmap overlays on the X-rays with the current bounding boxes to indicate findings in an image.
However, knowing what data and information you want is not always the same as having it. One of the developers also expressed concern about using open-source data, as information such as which scanner was used, the image quality and whether any preprocessing of the images had been done is often not described in these datasets. Besides these concerns, generating correctly classified COVID-19 instances was another hurdle, which the team managed to tackle with an accuracy of around 0.8 by training and educating their employees in reading and interpreting CXR.
7.5.2 Subject Matter Expert
Subject matter expert (SME) concerns relate to domain knowledge and validation practices. The role of the SME can shift depending on the context. In the development phase of LungX, when annotating the images with the different pathologies, the SMEs consist of the employed clinicians and one radiologist. The annotation team was creating the ground truth for the machine learning model to be able to recognize COVID-19. As such, the team has gone through extensive training in detecting abnormalities in CXR. Even though some are doctors themselves, the team members are young and have therefore chosen to partner up with a more senior radiologist with years of experience.
Validation Concerns
The team is very conscious of the life-or-death stakes their product is dealing with, which is reflected in their validation concerns and in how the company addresses the need for a sound model that supports the trust of all stakeholders. Operating in high-stakes environments has led the team to maintain a human-in-the-loop approach when inspecting the annotations of the data foundation, even though ML helped in identifying many of the pathologies by using NLP (natural language processing) tools to identify pathologies of interest. It is worth noting that the more uncertainties the model relies on, the more doubtful its output will be in the end, though this will not necessarily be visible in the performance metrics. To address this, the company provided manual assurance by having SMEs oversee the annotation process.
When we look at the images, we also had a report from the radiologists who had described it previously, and we use that report as a ground truth for confirming that findings were there. And the way that those images got found was an algorithm search in these reports. And they found keywords such as pneumothorax or pneumonia. And then they took that image and put it into the pneumonia or pneumothorax dataset. (Medical annotator)
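The keyword search over radiology reports described by the medical annotator can be illustrated with a minimal Python sketch. Only the two keywords named in the quote are used; the function names candidate_labels and build_candidate_sets, the report texts and the image identifiers are invented for illustration, and the actual search tooling used to build the datasets is not described in the chapter.

```python
import re

# The two keywords mentioned in the interview; any further terms would be assumptions.
KEYWORDS = ("pneumothorax", "pneumonia")


def candidate_labels(report_text, keywords=KEYWORDS):
    """Return the keywords that occur as whole words in a report (case-insensitive)."""
    text = report_text.lower()
    return {
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    }


def build_candidate_sets(reports, keywords=KEYWORDS):
    """Group image ids by matched keyword.

    reports: dict mapping image id -> free-text radiology report.
    The result is a list of candidates for human review, not a final label set.
    """
    sets = {kw: [] for kw in keywords}
    for image_id, text in reports.items():
        for kw in candidate_labels(text, keywords):
            sets[kw].append(image_id)
    return sets


if __name__ == "__main__":
    reports = {
        "img1": "Small left-sided pneumothorax.",
        "img2": "No pneumothorax. Findings consistent with pneumonia.",
        "img3": "Normal chest.",
    }
    print(build_candidate_sets(reports))
```

Note that plain keyword matching also selects img2 for pneumothorax even though the report negates the finding. This kind of error is one reason a human-in-the-loop review of the annotations, as described above, remains necessary.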
Besides the validation of training data, the company faces a clinical validation process based on images delivered by the hospital. In this setting, the SME would be one of the collaborators from the hospital reviewing the output of the model and evaluating if the results are substantial. If so, they will continue testing the product in production.
There will be a process of formal clinical validation happening in the next couple of weeks, and so that will be an analysis based on a data set and dedicated data set for validation. (Medical annotator)
The clinical validation part is together with the clinics finding out whether or not the LungX solution is relevant for [the clinics] and actually provides benefit for them … and obviously, Hospital X is more a partner in terms of providing medical insights into what is needed and necessary, but also for testing and validation. (CEO)
Validating the output of the model with experienced professionals from the hospital as well as medical annotators led the company to drop two of the original six pathologies in scope, as their performance was not living up to their standards.
7.5.3 Decision-Makers
Decision-making concerns were very prominent among the interviewees, as the value proposition of the product is directly related to the use of the system for improving decisions in the hospitals. However, LungX entails not one but multiple value propositions supporting different end users. These could be the ICU clinicians, the hospital manager, the radiologist or the nurse. For instance, the severity score (ZASS) would primarily be useful for ICU clinicians. We identified four themes that represent concerns about decision-makers (e.g., clinicians, radiologists) and their interaction with LungX. These concerns were inferred from the interviews with other stakeholders, both medical and development staff at the company.
Human/AI Discretionary Boundaries
Decision-making is a process where various information is collected or presented, interpreted and used by humans to draw a conclusion. In the case of using AI systems to assist this process, the interviewees expressed their concern regarding discretionary boundaries between the model output and the decision-maker’s judgment.
I’m always curious about what [LungX] says. I have my own findings, and I’m also very curious about what it says. If it’s the same, then I’m more confident in my findings. So as a young doctor, I would use it for that. Even though it can’t catch it all. (Medical annotator)
Trying to understand who’s liable for the decisions is extremely important. (CEO)
The user interface of LungX has limited information on how certain the model is in its classification, which could lead young doctors to rely too much on potentially low-accuracy output from the model. The complexity of the information displayed could therefore be re-evaluated: although there is a risk of cluttering the interface, an accuracy score could prompt more critical judgment in young doctors. Hence, there is a need to further understand the interplay between a human and an AI in order to achieve better diagnostic outcomes overall. This information was consistently requested by the employees and is on the roadmap for implementation.
I like to know the boundaries, the thresholds that determine whether the machine learning model classified it as normal or not normal. I’d like to know that threshold so I can use that in my own decision-making. And then I regard the software, the framework, as helping me with some inputs to my own decisions. That’s why I don’t want it to [automatically] make decisions for me, like diagnosing. (Medical annotator)
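The annotator’s wish to see the threshold alongside the classification can be made concrete with a small Python sketch. It assumes a model that outputs a probability of abnormality and a single decision threshold; the default of 0.5, the function name classify and the returned fields are hypothetical and do not describe the LungX interface.

```python
def classify(probability, threshold=0.5):
    """Turn a probability into a label while exposing the decision boundary.

    Returns the label together with the probability, the threshold used and
    the margin between them, so that a clinician can see how close the case
    was to the boundary rather than only the binary outcome.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    label = "not normal" if probability >= threshold else "normal"
    return {
        "label": label,
        "probability": probability,
        "threshold": threshold,
        "margin": round(probability - threshold, 3),
    }


if __name__ == "__main__":
    print(classify(0.95))
    print(classify(0.52))  # same label as above, but barely over the threshold
```

Two cases with the same label can have very different margins, which is the kind of information the interviewee wants to weigh in their own decision-making.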
Though LungX will influence the decision-making process, it was emphasized several times that none of the clinical staff at the company wish for the model to take over the decision; rather, they’d prefer to use it as an aiding tool in their own decision-making. Therefore, it is even more important to stress that the final decision belongs to the doctor and that the model cannot stand alone.
The primary goal is to assist with diagnostics or assist with planning but not kind of take over those processes because that’s not the goal, and I think there’s much more risk when software tries to automate processes in medicine, and that’s not what we’re trying to do. (CEO)
Moreover, the tool does not take into account the patient’s history or any information from the clinical investigation (blood samples, blood pressure, etc.), all of which are highly relevant to the final decision.
I deem that the clinical information, the patient history, and the clinical signs and the results of the clinical investigation of the patient is very important in telling me what it might be that I’m looking at, and so that’s important to include. It wasn’t included in this product and it isn’t included right now, but I think that’s actually really important in doing that. (Medical annotator)
For the decision-maker, the complexity lies in judging when and how they should factor the information presented in LungX into their own decision-making. Depending on where these boundaries are agreed upon, different xAI needs emerge. The more complex and autonomous the system, the greater the need for information that explains the why. Furthermore, a tradeoff between complexity and interpretability emerges, as the decision-making workflow should be smooth and undisturbed while still generating an informative, reliable explanation.
Output Interpretation Concerns
Whereas the theme of discretionary boundaries is concerned with when and how the decision-maker should use the aiding tool in their decision-making process, the theme of output interpretation is about how the information is presented to and interpreted by them. The information presented in the interface is intended to be easily interpreted, stripped of unnecessary noise to support a smooth and fast process. The most prominent feature in the software is the red “bounding boxes” that enclose abnormalities in the image. The findings are then listed in a table beside the image, as shown in the illustration of the Findings Viewer.
It’s basically some sort of explanation of what we think [bounding box]. If there’s a 95% probability of seeing a pleural effusion, the radiologist will think OK, why? Therefore, we have to be able to show exactly the reason and the region in which we think there is a high probability of a pleural effusion. So, this is where the bounding boxes come into the picture, to show the radiologist that these are the regions we think are the most suspicious of a particular finding. (CTO)
Though the probability score of the findings (i.e., how certain the model is of its classification) is not displayed in the interface, information on the severity of the finding is shown (i.e., how widespread the disease detected in the lungs is). The severity score is considered more difficult to comprehend, as it consists of several underlying factors, such as the severity ranking of the detectable diseases (as decided by the individual clinic) and the ZASS score, which is based on the Brixia scoring system for COVID-19 to measure the extent of the disease [79]. As the severity score is subject to adjustments from clinic to clinic depending on their practices, this could increase the complexity of interpreting the score.
I think that that it [bounding box] mitigates a lot of that misinterpretation risk. At the moment, I think it’s relatively easy to interpret. You know, if there’s a finding or if there’s not, but in terms of actually… the fact that it actually gives a score to quantify how widespread a finding is. I mean, that then can be misinterpreted, and so we have, for example, an explanation of what that score means in the user manual … I think it is hard to communicate risk, particularly when that would then involve some clinical information as well, and then that becomes less easy to explain to a user because you can’t just say, ‘Oh, the lung was divided into the zones’ and, therefore, we kind of get this score. (Product Owner)
The visual information displayed, namely the images and their bounding boxes, is considered relatively easy for clinicians to interpret. Moreover, the red/yellow/green triage field, which shows which patient should be prioritized first according to severity and type of disease, is deemed understandable for prioritizing patients but not for supporting diagnosis.
In our software, we also have a triage function, and that’s very much low, medium, high triage. And I think that that is a pretty simple way to communicate risk or urgency to a user. (Product Owner)
The quantified severity scores are deemed somewhat more complicated. Because of this, the company incorporated two measures to accommodate users: an explanation of the score in the user manual, and training sessions on using and interpreting information in LungX, to ensure that decision-makers feel comfortable using the values displayed.
I think also there are some risks related to more subtle things like interpretation of results […] and what do the results mean, and I think a lot of that has to then come down to making sure that everyone who uses the technology is trained in how to use it. (Product Owner)
Moreover, the current decision-making process involves interaction between colleagues if they doubt a claim made by another clinician. This is seen as difficult to maintain when operating with AI-based systems.
I could ask my colleague, ‘Why do you think that this person has pneumonia?’ And he will answer [X, Y, Z], and OK, then I’ll trust him if I agree, but I can’t really ask the network [LungX]. (Medical annotator)
Adoption Concerns
However, it is not enough to know how and what to use in a decision-making process if the product is not adopted in the decision-making process. The adoption concerns theme refers to how the employees experience resistance toward the product or express doubt about how it will be received and used.
Purely, it comes down to the lack of being able to get a good way of applying it in the clinic. (Clinical Operations Officer)
According to the interviewees, understanding the processes and workflow surrounding the work of an AI-based model is a key concern. The way LungX is incorporated into the overall process influences its adoption in hospitals. Interviewees also expressed a more general concern with using automation tools within medicine, as this field often deals with high-stakes decisions.
So, I think it’s twofold. I think it’s that you need to make sure that performance is very, very good. But I also think … it’s just very difficult for people to accept automation within medicine I think. (Clinical Operations Officer)
Though LungX is not intended for automation, it has a significant influence on the decision-making process. The development team seems to be aware of potential resistance from medical staff, especially when introduced to new and unknown tools. LungX is one of the first systems of its kind to be implemented in clinics, which may subject it to resistance in many forms. In addition, some clinics might operate with contingencies from other software products that inhibit them to change or alter their current processes. Moreover, the LungX product might not fit exactly what the clinics are looking for. Different pathologies can be of interest to different clinics, so the team risks having spent time on developing a product that is not relevant in real-life scenarios. It’s able to differentiate A from B. You go to the clinic and they say, ‘Well, we can’t implement that because, you know, we use this different product and we can’t change that because the whole region has bought this product. We don’t care about A or B. We care whether it’s A&B or C’. So, you solve the wrong problem or maybe it’s the wrong people that have to implement it. That is where you finally realize you’ve got a hammer and there’s no nails that match it. (Clinical Operations Officer)
This suggests that deep knowledge of how the product is intended to be used in a real-life scenario can be essential to ensure adoption by stakeholders in general. Moreover, the information required by the decision-makers needs to be present for them to benefit from the tool, which is why the company has initiated collaboration with hospitals and clinics worldwide. Through this collaboration, it has acquired knowledge of what kind of information is relevant to stakeholders. However, a wide range of factors come into play in the adoption of new tools, including trust in the product, culture, local work processes and general support from management.
Trust Concerns
The decision-maker has to make the final call about a patient’s health and further treatment; understandably, this leads them to be very cautious in their decision-making, including about the information they rely on to make their decision. After all, they are held liable and face the highest risks, both personally and professionally. Because using AI in decision-making within medical imaging is still a fairly new concept, the trust factor remains fragile, as with all new things.
To have other doctors who are less enthusiastic about new technology, to have them use it because if the software, for instance, makes one mistake, just one mistake, it could be fatal, then the trust in it will be almost lost. (Medical annotator)
To support public trust in the model among those who need more evidence of reliability, the company has scheduled clinical trials. Moreover, the implementation of xAI in the form of the bounding boxes and the color coding for the triage score makes the interface amenable to decision-makers such as radiologists or ICU doctors.
I think people are quite nervous around this technology, and I think you kind of need the experiences from those trials in hospitals to convince people to give things a go. (CEO)
And the clinical evaluation reports, the clinical trials that we are going to run with the hospitals that will establish the reliability of the system. (CTO)
Though LungX is not automating the entire diagnostic process, it provides information that must be trusted in order to be useful. Interviewees explained that they discuss different cases when in doubt, as they are not always right. The same goes for AI models, which should therefore never be trusted blindly. Some doctors may rely too much on the information provided and end up letting the model reinforce potentially flawed decisions.
Maybe the less experienced doctors would trust it so much that they wouldn’t have a second opinion on the image. (Medical annotator)
If the radiologist blindly trusts the algorithm without even looking at the images and just ships the image of… said OK, there’s a lesion in there, then the ‘bug’ is on the radiologist’s desk, and this is why understanding liability is so important: who’s liable for it? (CTO)
All interviewees stressed that LungX is an aiding tool and cannot be held liable in any way; the liable party is the radiologist who describes the images and writes the diagnostic report. For a software provider, being held liable for an undetected case of COVID-19 could have fatal consequences. Hence, accountability boundaries need to be made explicit, so that decision-makers can use the output of the ML model critically by applying their clinical judgment.
7.5.4 Audience
Since the case study investigates the development of the LungX product, the typical audience discussed in the interviews was the patients themselves. In the end, clinicians and radiologists are trying to treat patients and save their lives, and LungX is intended to improve the patient journey by assisting in the speedy identification of COVID-19-related pneumonia in the lungs and of its severity. The interviewees expressed concerns related to mitigating decision consequences for the patients.
Decision Consequences
The value proposition was to use LungX to quickly identify which patients need the highest priority and which patients need to be admitted to the hospital. LungX deals with high-stakes decisions that involve life or death for patients. In this sense, the interviewees are aware of the severity and how it can improve ‘…the patient journey in terms of disease progression and predict the need for Intensive Care Unit, hospitalization, ICU or even ventilation, respiratory support or death’ (CEO). All interviewees expressed a high level of concern for the patients and for managing the high stakes, both in terms of prioritizing the most severe cases and determining how individual cases should be dealt with according to patient history. The high stakes are reflected throughout the model development in the great attention to detail, the high level of performance and the meticulous process for validating the model’s training data.
At the end of the day, it is a tool that interferes with a patient’s life, so to speak… You can’t call back a patient without having consequences, so this is really important to understand. If you have a patient that’s diagnosed wrongly, and the patient gets some sort of an attack, then it’s a lifelong rehabilitation. (CTO)
Presenting LungX as a decision support tool in which the decision-maker is still the liable party, rather than as a completely automated diagnostic tool, ensures that the patient retains the possibility to object to a decision or ask for an explanation. In this case, the company guards this possibility because of the severity of the matter.
7.6 Discussion and Concluding Remarks
In this study, we investigated the development efforts of LungX, an AI-based model for diagnosing COVID-19 in patients. In this context, we identified a plethora of xAI needs that emerge during the idea formulation and development phases. Further, we found that these xAI needs emerged through four aggregated concerns, related to development, domain expertise, decision-making and audience. We also found that all these concerns were expressed either by a specific stakeholder or with a specific stakeholder group in mind (see Fig. 7.6). Explanation concerns expressed by and for the development team have been widely recognized in the literature [6, 47, 81], where developers use xAI frameworks to make sense of the datasets used as well as the inner workings of AI-based models in order to achieve performance results. Similarly, our findings in relation to xAI emerging through decision-making concerns, particularly manifesting in output interpretation as well as adoption and trust concerns, support earlier studies that have found ambiguity as an effect of using AI-based healthcare applications [11] or resistance to using these applications [2]. Further, it became apparent that the need for xAI also emerges from trying to establish discretionary boundaries, a point linked to the problem of accountability when using AI-based systems [82]. xAI needs driven by subject matter expert (SME) concerns refer to the necessity of domain knowledge during the development of AI applications [83]. Finally, audience-related concerns drive xAI needs in order to be able to communicate decisions to patients as well as manage and mitigate decision consequences that may severely impact patients [84].
Fig. 7.6 Concerns that xAI can alleviate depend on the stakeholder group
By highlighting the different concerns as represented by different stakeholder groups, our study provides empirical evidence of multiple stakeholder xAI needs and concerns in relation to AI, in contrast to the usual developer-user perspective [24]. Building more visual explanations may resolve some of these concerns; however, some stakeholder groups might not be satisfied with a visualization of a finding in a picture or red/yellow/green dots indicating the ideal prioritization of patients. Instead, they may require more scientific explanations entailing more complexity [85]. Our study shows that the tradeoff between explanation accuracy and comprehensibility varies among different stakeholders. It appears that the closer a stakeholder works with the AI model, the more accurate an explanation they need. This is not a problem for developers, since their knowledge of AI systems, statistics and computer science is extensive. Our findings indicate that during the development of LungX, the tradeoff point was carefully considered for SMEs and decision-makers (radiologists and ICU clinicians). SMEs and decision-makers need both accurate and comprehensible explanations to be able to validate an AI system; these stakeholders often lack basic knowledge of AI systems, yet they are expected to be able to validate and rely on them in order to do their work. On the one hand, they need accurate explanations to make confident decisions based on the AI output. On the other hand, they also need comprehensible explanations both to trust the system themselves and to be able to communicate their decision to the patient. Hence, they need models to offer comprehensible explanations that do not sacrifice accuracy.
Finally, for the audience, the balance seems to shift toward more comprehensible explanations that may sacrifice some degree of thorough accuracy. As such, our study responds to recent calls for a stakeholder perspective in xAI research [24, 46].
The plethora of concerns from each stakeholder group further reinforces the need for a multidisciplinary approach when developing xAI solutions, as recent conceptual works have called for [24, 26, 35, 36]. Moreover, the study shows a continued demand for understanding the workflow around the different stakeholders in order to address the specific explanation needs that arise from their output interpretation concerns. The model might be right, but results that are interpreted and used incorrectly can lead to biased decisions and unintended treatment, as in the case of COMPAS. Our findings, however, focus on the emergence of xAI needs during the development of AI-based systems. Future studies should also investigate xAI during the implementation and use of these AI-based applications to address emerging xAI needs, especially as existing and additional stakeholders become more active and have more in-situ insights. While our findings emphasize how the need for xAI emerged during development, future studies could advance our knowledge of how different technical frameworks can help address these stakeholder concerns.
Appendix 1: Technical Aspects of LungX
Turning to the more technical aspects of LungX, the solution is built on three different types of convolutional neural networks (CNNs). The first of the three models is a DenseNet-121 [86], which detects the presence of six different lung abnormalities as well as their location on a given X-ray. The two additional models are used to calculate the severity score for COVID-19 patients. Only one of the six findings (edema/consolidation) is related to COVID-19, and if this finding is detected, the two additional models calculate the severity score. A U-Net [87] is used to segment the lungs into 12 pre-defined anatomical zones, while a Mask R-CNN [88] is applied to segment the opacity in the lungs. When the outputs from the two models are mapped together, it is possible to calculate how many lung zones are affected by opacity. The number of the 12 lung zones that are affected is then used as the severity score, indicating how badly the lungs are affected (Fig. 7.7).
Fig. 7.7 Visualization of how the networks in LungX operate together
The triage category indicates whether triage is low, medium or high. It is based on the severity score and the presence of any lung abnormalities the system is able to detect. It is configurable, meaning that the clinics using the system decide how the six abnormalities and the severity score for COVID-19 patients should each rank in relation to one another, as well as the level of triage. The highest triage category detected takes precedence.
The data foundation for training the developed model consists of two open-access data sources that are carefully joined to train the full model. The combined sources comprise 112,120 frontal-view X-ray images of 30,805 unique patients from the Kaggle RSNA Pneumonia Detection Challenge in PNG format and 224,316 radiographs from the CheXpert dataset, covering 65,240 patients who underwent a radiographic examination at Stanford University Medical Center [89–91]. However, neither of these datasets included examples of COVID-19. COVID-19 examples were provided only by the hospital in collaboration with the university project. Two hundred patients who tested positive for COVID-19 were run only on the models for prediction of disease progression (Fig. 7.8).
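The severity score and the configurable triage category described in this appendix can be illustrated with a small sketch. It is not the LungX implementation: the function names, the toy masks, the rule that a single opaque pixel marks a zone as affected, the finding name `hypothetical_finding`, and the example thresholds are all assumptions made for illustration. The sketch only shows how a zone map (as a U-Net might produce) and an opacity mask (as a Mask R-CNN might produce) could be overlaid to count affected zones, and how the highest configured triage level could take precedence.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

# Triage levels ordered from least to most urgent.
LEVELS: Tuple[str, ...] = ("low", "medium", "high")


def severity_score(zone_map: Sequence[Sequence[int]],
                   opacity_mask: Sequence[Sequence[bool]],
                   n_zones: int = 12) -> int:
    """Count the lung zones that contain at least one opacity pixel.

    zone_map holds 0 for background and 1..n_zones for the lung zones;
    opacity_mask is True where opacity was segmented.
    Assumption: one opaque pixel is enough to mark a zone as affected.
    """
    affected = set()
    for zone_row, mask_row in zip(zone_map, opacity_mask):
        for zone, is_opaque in zip(zone_row, mask_row):
            if is_opaque and 1 <= zone <= n_zones:
                affected.add(zone)
    return len(affected)


def triage_category(findings: Iterable[str],
                    severity: Optional[int],
                    finding_levels: Dict[str, str],
                    severity_levels: Sequence[Tuple[int, str]]) -> str:
    """Return the highest triage level triggered by the clinic's configuration.

    finding_levels maps a finding name to a level; severity_levels is a list
    of (minimum severity score, level) pairs. Both are hypothetical settings.
    """
    candidates = [LEVELS[0]]  # default when nothing is triggered
    candidates.extend(finding_levels.get(name, LEVELS[0]) for name in findings)
    if severity is not None:
        candidates.extend(level for threshold, level in severity_levels
                          if severity >= threshold)
    # The highest triage category detected takes precedence.
    return max(candidates, key=LEVELS.index)


# Toy 3x5 example with four zones instead of twelve (0 = background).
ZONE_MAP = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
    [3, 3, 0, 4, 4],
]
OPACITY_MASK = [
    [False, True, False, False, False],
    [False, False, False, False, False],
    [False, False, True, True, False],
]

# Hypothetical clinic configuration.
FINDING_LEVELS = {"edema/consolidation": "medium", "hypothetical_finding": "high"}
SEVERITY_LEVELS = [(1, "medium"), (7, "high")]

score = severity_score(ZONE_MAP, OPACITY_MASK)
category = triage_category(["edema/consolidation"], score,
                           FINDING_LEVELS, SEVERITY_LEVELS)
print(score, category)  # prints: 2 medium
```

In the toy example the opacity touches zones 1 and 4 (the pixel on the background column is ignored), so the score is 2. Keeping the configuration in plain mappings mirrors the idea that each clinic decides how findings and severity rank against one another.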
Fig. 7.8 Performance metrics for LungX on 200 COVID-19 patients (box A) and on the CheXpert dataset (box B)
References
1. A.B. Arrieta et al., Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI, in Inf. Fusion, no. October, 2019
2. C.J. Cai, S. Winter, D. Steiner, L. Wilcox, M. Terry, ‘Hello AI’: Uncovering the onboarding needs of medical practitioners for human–AI collaborative decision-making, in Proceedings of the ACM on Human-Computer Interaction, vol. 3, no. CSCW. Association for Computing Machinery, pp. 1–24, 01 Nov 2019
3. A. Adadi, M. Berrada, Peeking inside the black-box: a survey on Explainable Artificial Intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
4. F. Doshi-Velez, B. Kim, Towards A Rigorous Science of Interpretable Machine Learning, no. Ml (2017), pp. 1–13
5. Z.C. Lipton, The Mythos of Model Interpretability, no. Whi, Jun. 2016
6. M.T. Ribeiro, S. Singh, C. Guestrin, ‘Why should I trust you?’ Explaining the predictions of any classifier, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, vol. 13–17, Aug 2016, pp. 1135–1144
7. A. Holzinger, C. Biemann, C.S. Pattichis, D.B. Kell, What Do We Need to Build Explainable AI Systems for the Medical Domain?, no. Ml (2017), pp. 1–28
8. U. Pawar, D. O’Shea, S. Rea, R. O’Reilly, Incorporating explainable artificial intelligence (XAI) to aid the understanding of machine learning in the healthcare domain. CEUR Workshop Proc. 2771, 169–180 (2020)
9. J. Phua et al., Intensive care management of coronavirus disease 2019 (COVID-19): challenges and recommendations, in The Lancet Respiratory Medicine, vol. 8, no. 5. Lancet Publishing Group, pp. 506–517, 01 May 2020
10. E. Tjoa, C. Guan, A Survey on Explainable Artificial Intelligence (XAI): Towards Medical XAI, vol. 1 (2019)
11. S. Lebovitz, Diagnostic doubt and artificial intelligence: An inductive field study of radiology work, in 40th Int. Conf. Inf. Syst. ICIS 2019 (2020)
12. T. Davenport, R. Kalakota, The potential for artificial intelligence in healthcare. Futur. Healthc. J. 6(2), 94–98 (2019)
13. T. Panch, H. Mattie, L.A. Celi, The ‘inconvenient truth’ about AI in healthcare. npj Digit. Med. 2(1), 4–6 (2019)
14. A.L. Fogel, J.C. Kvedar, Artificial intelligence powers digital medicine. npj Digit. Med. 1(1), 3–6 (2018)
15. W. Hryniewska, P. Bombiński, P. Szatkowski, P. Tomaszewska, A. Przelaskowski, P. Biecek, Do Not Repeat These Mistakes - A Critical Appraisal of Applications Of Explainable Artificial Intelligence For Image Based COVID-19 Detection, no. January, 2020
16. C. O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, vol. 272 (2016)
17. A. Shaban-Nejad, M. Michalowski, D.L. Buckeridge, Health intelligence: how artificial intelligence transforms population and personalized health. npj Digit. Med. 1(1) (2018)
18. A. Sharma, S. Rani, D. Gupta, Artificial intelligence-based classification of chest X-ray images into COVID-19 and other infectious diseases. Int. J. Biomed. Imaging (2020)
19. E. Strickland, How IBM Watson Overpromised and Underdelivered on AI Health Care - IEEE Spectrum, 2019 (Online). Available: https://spectrum.ieee.org/biomedical/diagnostics/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care. Accessed: 26 Jan 2021
20. Z. Obermeyer, B. Powers, C. Vogeli, S. Mullainathan, Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464), 447–453 (2019)
21. W.D. Heaven, Google’s Medical AI was Super Accurate in a Lab. Real Life was a Different Story. MIT Technology Review (Online). Available: https://www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/. Accessed: 16 Mar 2021
22. J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine Bias - ProPublica. ProPublica (Online). Available: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed: 03 Mar 2019
23. T.W. Kim, Explainable artificial intelligence (XAI), in The Goodness Criteria and the GraspAbility Test (2018), pp. 1–7
24. J. Gerlings, A. Shollo, I.D. Constantiou, Reviewing the need for Explainable Artificial Intelligence (xAI), in HICSS 54 (2021), pp. 1284–1293
25. J. Kemper, D. Kolkman, Transparent to whom? No algorithmic accountability without a critical audience. Inf. Commun. Soc. (2019)
26. D.D. Miller, The medical AI insurgency: what physicians must know about data to practice with intelligent machines. npj Digit. Med. 2(1) (2019)
27. J. Burrell, How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data Soc., vol. January (2016), pp. 1–12
28. A. Páez, The pragmatic turn in Explainable Artificial Intelligence (XAI). Minds Mach. (2019)
29. M. Bhandari, D. Jaswal, Decision making in medicine-an algorithmic approach. Med. J. Armed Forces India (2002)
30. J.-B. Lamy, B. Sekar, G. Guezennec, J. Bouaud, B. Séroussi, Explainable artificial intelligence for breast cancer: a visual case-based reasoning approach. Artif. Intell. Med. 94, 42–53 (2019)
31. A. Wodecki et al., Explainable Artificial Intelligence (XAI) the need for explainable AI. Philos. Trans. A. Math. Phys. Eng. Sci. (2017)
32. S.M. Lauritsen et al., Explainable Artificial Intelligence Model to Predict Acute Critical Illness from Electronic Health Records (2019)
33. N. Prentzas, A. Nicolaides, E. Kyriacou, A. Kakas, C. Pattichis, Integrating machine learning with symbolic reasoning to build an explainable AI model for stroke prediction, in Proceedings - 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019 (2019)
34. S. Lebovitz, H. Lifshitz-Assaf, N. Levina, To Incorporate or Not to Incorporate AI for Critical Judgments: The Importance of Ambiguity in Professionals’ Judgment Process, 15 Jan 2020
35. A. Rajkomar, J. Dean, I. Kohane, Machine learning in medicine. N. Engl. J. Med. 380(14), 1347–1358 (2019)
36. M.T. Keane, E.M. Kenny, How case-based reasoning explains neural networks: a theoretical analysis of XAI using post-hoc explanation-by-example from a survey of ANN-CBR twin-systems, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2019)
37. Stanford University, Artificial Intelligence and Life in 2030 (2016), pp. 52
38. L. Reis, C. Maier, J. Mattke, M. Creutzenberg, T. Weitzel, Addressing user resistance would have prevented a healthcare AI project failure. MIS Q. Exec. 19(4), 279–296 (2020)
39. A. Vellido, The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural Comput. Appl. 32(24), 18069–18083 (2020)
40. A. Holzinger, G. Langs, H. Denk, K. Zatloukal, H. Müller, Causability and explainability of artificial intelligence in medicine. Wiley Interdisc. Rev.: Data Mining Knowl. Discovery 9(4), 1–13 (2019)
41. M. Goldstein, S. Uchida, A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE 11(4), 1–31 (2016)
42. C. Molnar, Interpretable Machine Learning. A Guide for Making Black Box Models Explainable, Book (2019), p. 247
43. Z.C. Lipton, The mythos of model interpretability. Commun. ACM 61, 35–43 (2016)
44. G. Ciatto, M.I. Schumacher, A. Omicini, D. Calvaresi, Agent-based explanations in AI: towards an abstract framework, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12175 LNAI (2020), pp. 3–20
45. L.H. Gilpin, D. Bau, B.Z. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: an overview of interpretability of machine learning, in Proceedings - 2018 IEEE 5th Int. Conf. Data Sci. Adv. Anal. DSAA 2018 (2019), pp. 80–89
46. T. Miller, Explanation in Artificial Intelligence: Insights from the Social Sciences (2018)
47. A. Goldstein, A. Kapelner, J. Bleich, E. Pitkin, Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graph. Stat. 24(1), 44–65 (2015)
48. R. Brandão, J. Carbonera, C. de Souza, J. Ferreira, B.N. Gonçalves, C.F. Leitão, Mediation Challenges and Socio-Technical Gaps for Explainable Deep Learning Applications. arXiv Prepr. arXiv … (2019), pp. 1–39
49. O. Biran, C. Cotton, Explanation and justification in machine learning: a survey. IJCAI Work. Explain. AI 8–14 (2017)
50. S.T. Mueller, R.R. Hoffman, W. Clancey, A. Emrey, G. Klein, Explanation in human-AI systems: a literature meta-review. Def. Adv. Res. Proj. Agency 204 (2019)
51. R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, F. Giannotti, D. Pedreschi, A survey of methods for explaining black box models. ACM Comput. Surv. 51(5) (2018)
52. A. Asatiani, P. Malo, P.R. Nagbøl, E. Penttinen, T. Rinta-Kahila, A. Salovaara, Challenges of explaining the behavior of black-box AI systems. MIS Q. Exec. 19(4), 259–278 (2020)
53. T. Miller, P. Howe, L. Sonenberg, Explainable AI: Beware of Inmates Running the Asylum (1990)
54. P. Madumal, L. Sonenberg, T. Miller, F. Vetere, A grounded interaction protocol for explainable artificial intelligence. Proc. Int. Joint Conf. Autonomous Agents Multiagent Syst. AAMAS 2, 1033–1041 (2019)
55. W. Samek, T. Wiegand, K.-R. Müller, Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models, Aug. 2017
56. R.R. Hoffman, S.T. Mueller, G. Klein, J. Litman, Metrics for Explainable AI: Challenges and Prospects (2018), pp. 1–50
57. Z. Che, S. Purushotham, R. Khemani, Y. Liu, Interpretable deep models for ICU outcome prediction, in AMIA ... Annu. Symp. proceedings, vol. 2016, no. August (2016), pp. 371–380
58. M. Ilyas, H. Rehman, A. Nait-ali, Detection of Covid-19 From Chest X-ray Images Using Artificial Intelligence: An Early Review, arXiv, pp. 1–8 (2020)
59. C.H. Sudre et al., Anosmia and other SARS-CoV-2 positive test-associated symptoms, across three national, digital surveillance platforms as the COVID-19 pandemic and response unfolded: an observation study, in medRxiv (2020)
60. E. Dong, H. Du, L. Gardner, An interactive web-based dashboard to track COVID-19 in real time, in The Lancet Infectious Diseases, vol. 20, no. 5. Lancet Publishing Group, pp. 533–534, 01 May 2020
61. Y. Hswen, J.S. Brownstein, X. Xu, E. Yom-Tov, Early detection of COVID-19 in China and the USA: Summary of the implementation of a digital decision-support and disease surveillance tool. BMJ Open 10(12) (2020)
62. T. Macaulay, AI sent first coronavirus alert, but underestimated the danger, in The Next Web (2020) (Online). Available: https://thenextweb.com/neural/2020/02/21/ai-sent-first-coronavirus-alert-but-underestimated-the-danger/. Accessed: 09 Jan 2021
63. M.E.H. Chowdhury et al., Can AI help in screening viral and COVID-19 pneumonia? IEEE Access 8, 132665–132676 (2020)
64. J. Bullock, A. Luccioni, K. Hoffman Pham, C. Sin Nga Lam, M. Luengo-Oroz, Mapping the landscape of Artificial Intelligence applications against COVID-19. J. Artif. Intell. Res. 69, 807–845 (2020)
65. K. Murphy et al., COVID-19 on chest radiographs: a multireader evaluation of an artificial intelligence system. Radiology 296(3), E166–E172 (2020)
66. J. Zhang et al., Viral pneumonia screening on chest X-rays using confidence-aware anomaly detection, in IEEE Trans. Med. Imaging (2020), p. 1
67. X. Li, C. Li, D. Zhu, COVID-MobileXpert: On-Device COVID-19 Patient Triage and Follow-up using Chest X-rays (2020)
68. J.D. Arias-Londoño, J.A. Gomez-Garcia, L. Moro-Velazquez, J.I. Godino-Llorente, Artificial Intelligence Applied to Chest X-Ray Images for the Automatic Detection of COVID-19. A Thoughtful Evaluation Approach (2020), pp. 1–17
69. R.M. Wehbe et al., DeepCOVID-XR: an artificial intelligence algorithm to detect COVID-19 on chest radiographs trained and tested on a large US clinical dataset. Radiology, 203511 (2020)
70. R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128(2), 336–359 (2016)
71. M.M. Ahsan, K.D. Gupta, M.M. Islam, S. Sen, M.L. Rahman, M.S. Hossain, Study of Different Deep Learning Approach With Explainable AI for Screening Patients With Covid-19 Symptoms: Using CT Scan and Chest X-ray Image Dataset, arXiv (2020)
72. K. Conboy, G. Fitzgerald, L. Mathiassen, Qualitative methods research in information systems: motivations, themes, and contributions. Eur. J. Information Syst. 21(2), 113–118 (2012)
73. P. Powell, G. Walsham, Interpreting information systems in organizations. J. Oper. Res. Soc. (1993)
74. A. George, A. Bennett, Case Studies and Theory Development in the Social Science (MIT Press, Cambridge, MA, 2005)
75. J.M. Bartunek, P.G. Foster-Fishman, C.B. Keys, Using collaborative advocacy to foster intergroup cooperation: a joint insider-outsider investigation. Hum. Relations 49(6), 701–733 (1996)
76. D.A. Gioia, K.G. Corley, A.L. Hamilton, Seeking qualitative rigor in inductive research: notes on the Gioia methodology. Organ. Res. Methods 16(1), 15–31 (2013)
77. K.G. Corley, D.A. Gioia, Identity ambiguity and change in the wake of a corporate spin-off. Admin. Sci. Q. 49(2), 173–208 (2004)
78. J. Corbin, A. Strauss, Basics of Qualitative Research (3rd edn.): Techniques and Procedures for Developing Grounded Theory. SAGE Publications Inc (2012)
79. A. Borghesi, R. Maroldi, COVID-19 outbreak in Italy: experimental chest X-ray scoring system for quantifying and monitoring disease progression. Radiol. Medica 125(5), 509–513 (2020)
80. A. Borghesi et al., Radiographic severity index in COVID-19 pneumonia: relationship to age and sex in 783 Italian patients. Radiol. Medica 125(5), 461–464 (2020)
81. S.M. Lundberg, S.I. Lee, A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst., vol. 2017-Decem, no. Section 2 (2017), pp. 4766–4775
82. M. Veale, M. Van Kleek, R. Binns, Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making, in Conference on Human Factors in Computing Systems - Proceedings (2018)
83. M. Kuzba, P. Biecek, What Would You Ask the Machine Learning Model? Identification of User Needs for Model Explanations Based on Human-Model Conversations, arXiv (2020)
84. E. Rader, K. Cotter, J. Cho, Explanations as mechanisms for supporting algorithmic transparency, in Conference on Human Factors in Computing Systems - Proceedings (2018)
85. S. Chari, D.M. Gruen, O. Seneviratne, D.L. McGuinness, Directions for Explainable Knowledge-Enabled Systems, March, 2020
86. G. Huang, Z. Liu, L. van der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 2261–2269, Aug 2016
87. R. Selvan et al., Lung Segmentation from Chest X-rays using Variational Data Imputation, August, 2020
88. K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 386–397 (2020)
89. J. Irvin et al., CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison, in 33rd AAAI Conference on Artificial Intelligence, AAAI 2019, 31st Innovative Applications of Artificial Intelligence Conference, IAAI 2019 and the 9th AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019 (2019), pp. 590–597
90. Kaggle, RSNA Pneumonia Detection Challenge. Kaggle (2020) (Online). Available: https://www.kaggle.com/c/rsna-pneumonia-detection-challenge. Accessed: 23 Jan 2021
91. Z. Yue, L. Ma, R. Zhang, Comparison and validation of deep learning models for the diagnosis of pneumonia. Comput. Intell. Neurosci. (2020)
Chapter 8
Pandemic Spreading in Italy and Regional Policies: An Approach with Self-organizing Maps
Marina Resta
Abstract The purpose of the chapter is to use machine learning techniques (namely Self-Organizing Maps) to capture the emergence of clusters among Italian regions that can contribute to explaining the different behaviour of the pandemic within the same country. To do this, we considered demographic, healthcare, and political data at the regional level and tried to get to the root of the interactions among them. In this way, we obtained a model of the relations among variables with good explanatory capabilities, a kind of early-warning system that we hope can help to direct further intervention in the battle against the COVID-19 pandemic.
Keywords Machine Learning · Self-organizing Maps · COVID-19 outbreak
8.1 Introduction Since the end of February 2020, the ongoing COVID-19 outbreak has dramatically afflicted Italy with nearly 90,000 deaths. Basic facts are dramatically known to everyone. At the end of 2019 China reported some cases of pneumonia of unknown causes that rapidly became an outbreak [1]; on January 7th, 2020, the Chinese health authority identified the virus as part of the family of Coronavirus. The rest of January 2020 has been a whirlwind of dramatic events: the first case outside the Chinese border was isolated in Thailand, China entered lockdown, and, on January 30th, 2020, the World Health Organization (WHO) declared the outbreak as a Public Health Emergency of International Concern. In the meantime, on January 31st, 2020, the Italian Government activated the state of national emergency up the end of July, but the situation seemed relatively quiet. However, while on February 11th, 2020 the coronavirus disease was given the name as COVID-19, sudden the Italian situation turned for the worse. On February 21st, M. Resta (B) School of Social Sciences, Department of Economics, University of Genova, via Vivaldi 5, 16126 Genova, Italy e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_8
within a few hours Italy experienced the identification of Patient Zero in a small town in the Po valley, the spread of the virus to 14 people, the first COVID-related death, and the establishment of red zones, i.e., parts of Italy with strong restrictions on movement. On March 9th, 2020, the Italian government extended the restrictions to the whole country, making Italy the first Western state to adopt such restrictive measures on the movement of people, and on March 11th, 2020 the WHO declared COVID-19 a pandemic. Looking at the available data, what stands out is that both the spreading rate of the pandemic and the incidence of deaths in the overall population differed markedly across the regions of Italy. This was true during the first wave of the outbreak, with dramatic rates especially in some regions in the Northern and Central parts of the country (Lombardy, Emilia-Romagna and Marche) and more moderate increases in the Southern part of Italy. A superficial analysis, also suggested by the national press, supports the view that the difference was due to many different causes whose combination generated a kind of explosive mix. The complexity of the issue stems from the fact that the determinant variables were very heterogeneous: demographic variables interacted with political and economic factors. However, an interpretation of the events in the light of those interactions is certainly appealing, and it could help direct national public interventions. To this end, we aim to analyze a dataset with variables of various kinds (demographic, healthcare, and political) at the regional level by way of machine learning techniques, exploring how they interacted. Specifically, we employed Self-Organizing Maps to capture emerging clusters of regions that may help explain the different behavior of the pandemic within the same country.
In this way, we obtained a model of the relations among variables which might help direct further interventions in the battle against the COVID-19 pandemic. The structure of the chapter is as follows. Section 8.2 provides the literature review, focusing on the works that have already discussed the application of machine learning techniques to this area; Sect. 8.3 introduces the dataset and the related research questions; Sect. 8.4 gives basic details about the algorithm employed in the study and Sect. 8.5 discusses the results. Section 8.6 concludes.
8.2 Related Literature From the scholar's perspective, the COVID outbreak has posed multiple challenges, not only in the medical field but also across wide areas of expertise, such as economics, computer science and psychology, to cite just a few. One key point is that the outbreak of COVID-19 posed a high risk to countries with vulnerable health systems around the world [2]; this means that proper tools could help prevent rapidly deteriorating situations in terms of the number of available hospital beds, quality of life, and political and economic stability. In this respect, Machine Learning (ML) as well as statistical and computational methods
8 Pandemic Spreading in Italy and Regional Policies …
have been in the spotlight in a variety of applications. Although concentrated in a single year, the relevant literature has already reached a remarkable size: [3] gives an idea, updated to September 2020, of the number of papers already published, and the number is obviously set to rise further. As noted above, ML applications span a huge variety of fields: [4] combined epidemiological, statistical and neural network approaches to develop projections for the pandemic; [5] applied convolutional neural networks to computed tomography for COVID-19 diagnosis with very high performance, while [6] investigated cases of coronavirus disease with statistical techniques and [7] suggested a mathematical model based on nonlinear ordinary differential equations to forecast the spreading dynamics of the infection. Furthermore, [8] employed a battery of Artificial Neural Network (ANN) prediction models to estimate the confirmed cases of COVID-19 in various countries, [9] used spatiotemporal graph neural networks for COVID-19 forecasting and [10] combined nonlinear autoregressive neural networks into several modules designed to be efficient predictors for each country. Finally, [11] proposed recurrent neural networks (RNN) to predict daily infections in Brazil; [12] applied RNN to estimate the exposure risk to COVID and [13] used classical feed-forward neural networks to forecast the behavior of COVID infections in the US, UK, France, China, and Korea. Geographical tracking is also considered in a considerable number of papers; for instance, [14] analyzed an artificial intelligence framework to detect COVID patterns in populations under quarantine, and [15] discussed a model for mapping contagion by way of GIS technologies.
Deep learning (DL) models have been extensively employed, too: [16] examined a model to predict which commercially available antiviral drugs may be effective against the COVID disease, while [17, 18] applied DL to the detection and diagnosis of COVID-19 from chest X-ray images and [19] highlighted the sensitivity of DL models to perturbations in input images, which can cause failures in classification tasks. This chapter belongs to the stream already examined by contributions such as [20, 21], offering visual insights into the transmission dynamics of the COVID disease. In particular, we apply the Self-Organizing Map (SOM) algorithm as done in [21], but the similarities between the two works end there. In fact, the research efforts in [21] concentrate on analyzing COVID patterns among various countries of the world; we, on the other hand, are interested in analyzing whether, by combining a battery of demographic, political and healthcare indicators, it was possible to match ex ante the Italian regions most at risk according to our model with those actually hit hardest in the first wave of the pandemic (from the end of February 2020 to the end of June 2020). As a matter of fact, Italy was the first Western country to be badly affected by COVID-19, but the diffusion of the virus as well as its incidence differed markedly across regions.
8.3 Data and Research Questions The dataset in use reflects our goal of searching for similarity patterns among Italian regions to explain their different responses to the COVID-19 disease. We tried to address two separate research questions (RQ):
RQ1 Is an ex-ante evaluation of the impact of demographic, healthcare and political choices on the regions' capability to react to the first wave of the pandemic possible?
RQ2 Is it possible to employ the mapping of such variables to estimate the exposure risk of regions to other outbreak waves?
To this end, we examined 2017 data (the most recent available from the National Institute of Statistics, ISTAT, database) for each region, as we identified a critical point in the organization of Italy at the regional level. Going to the root of the issue, although Italy is a single state, since 1970 regions have been established as the first-level constituent entities of the Italian Republic. There are 20 regions, each with a statute serving as a regional constitution, which determines the form of government as well as the fundamental principles of the organization and functioning of the region, as prescribed by Article 123 of the Constitution of Italy. The regions have exclusive legislative power with respect to any matters not expressly reserved to state law (Article 117 of the Italian Constitution). Yet their financial autonomy is quite modest: they keep just 20% of all levied taxes, mostly used to finance the region-based healthcare system. Article 116 of the Italian Constitution granted home rule to five regions: Sardinia, Sicily, Trentino-Alto Adige/Südtirol, Aosta Valley and Friuli–Venezia Giulia, allowing them some legislative, administrative, and financial power to a varying extent, depending on their specific statute. These regions became autonomous in order to take into account cultural differences and protect linguistic minorities.
Table 8.1 lists the Italian regions, highlighting basic facts including the acronyms used throughout the chapter, the status and the political orientation. Talking about Italian politics is generally a delicate issue. However, for the purposes of this chapter, we can simplify the point by observing that, very roughly, regions ruled by Conservative-oriented parties are rooted in the principles of the neo-liberal market economy and unrestrained competition, with less attention to the public service mission. The opposite, at least in principle, happens in regions ruled by Democratic-oriented parties. Our dataset comes from data collected from the National Institute of Statistics (ISTAT) database for each region; the most recent records date back to 2017, except for the Political Party Orientation, hereafter indicated as Government Form (GOV), which is updated to the end of November 2020. For Trentino-Alto Adige, due to the specificity of its government not only at the regional but also at the provincial level, that is, an intermediate level between regions and municipalities, we split the data into three parts, referring to TAA as in Table 8.1, as well as to the two autonomous provinces separately (Bozen–BZ and Trento–TN). In addition to Italian regions, we have also
Table 8.1 List of Italian regions. For each region, we highlight: the acronym used throughout the chapter (ID), the status, i.e. Ordinary (Ord) or Autonomous (Aut), the population (Pop) and the percentage with respect to the overall population of Italy (% IT Pop), the Area, the population density (Pop1), the Capital, the President, the Political Party and the Political Party Orientation (PPO), distinguishing among Conservative (Cons), Democratic (Dem) and Independent (Indep)

| ID | Name | Status | Pop | Area (km²) | Pop1 (inhab./km²) | Capital | President | Political Party | Orientation |
|---|---|---|---|---|---|---|---|---|---|
| ABR | Abruzzo | Ord | 1,311,580 | 10,832 | 121 | L’Aquila | Marco Marsilio | Brothers of Italy | Cons |
| AV | Aosta Valley | Aut | 125,666 | 3,261 | 39 | Aosta | Erik Lavévaz | Valdostan Union | Indep |
| APU | Apulia | Ord | 4,029,053 | 19,541 | 206 | Bari | Michele Emiliano | Democratic Party | Dem |
| BAS | Basilicata | Ord | 562,869 | 10,073 | 56 | Potenza | Vito Bardi | Forza Italia | Cons |
| CAL | Calabria | Ord | 1,947,131 | 15,222 | 128 | Catanzaro | Antonino Spirlì (acting) | League | Cons |
| CAM | Campania | Ord | 5,801,692 | 13,671 | 424 | Naples | Vincenzo De Luca | Democratic Party | Dem |
| ER | Emilia–Romagna | Ord | 4,459,477 | 22,453 | 199 | Bologna | Stefano Bonaccini | Democratic Party | Dem |
| FVG | Friuli–Venezia Giulia | Aut | 1,215,220 | 7,924 | 153 | Trieste | Massimiliano Fedriga | League | Cons |
| LAZ | Lazio | Ord | 5,879,082 | 17,232 | 341 | Rome | Nicola Zingaretti | Democratic Party | Dem |
| LIG | Liguria | Ord | 1,550,640 | 5,416 | 286 | Genoa | Giovanni Toti | Cambiamo! | Cons |
| LOM | Lombardy | Ord | 10,060,574 | 23,864 | 422 | Milan | Attilio Fontana | League | Cons |
| MAR | Marche | Ord | 1,525,271 | 9,401 | 162 | Ancona | Francesco Acquaroli | Brothers of Italy | Cons |
| MOL | Molise | Ord | 305,617 | 4,461 | 69 | Campobasso | Donato Toma | Forza Italia | Cons |
| PIE | Piedmont | Ord | 4,356,406 | 25,387 | 172 | Turin | Alberto Cirio | Forza Italia | Cons |
| SAR | Sardinia | Aut | 1,639,591 | 24,100 | 68 | Cagliari | Christian Solinas | Sardinian Action Party | Indep |
| SIC | Sicily | Aut | 4,999,891 | 25,832 | 194 | Palermo | Nello Musumeci | Diventerà Bellissima | Indep |
| TAA | Trentino-Alto Adige | Aut | 1,072,276 | 13,606 | 79 | Trento | Arno Kompatscher | South Tyrolean People’s Party | Cons |
| TUS | Tuscany | Ord | 3,729,641 | 22,987 | 162 | Florence | Eugenio Giani | Democratic Party | Dem |
| UMB | Umbria | Ord | 882,015 | 8,464 | 104 | Perugia | Donatella Tesei | League | Cons |
| VEN | Veneto | Ord | 4,905,854 | 18,020 | 267 | Venice | Luca Zaia | League | Cons |
considered data for two macro-areas roughly matching the first-level NUTS 1 regions of the European Union, namely Center (C), including Lazio, Marche, Tuscany and Umbria, and North-Center (CN), which is like C but with the addition of Emilia–Romagna. The dataset considers the following variables for each region.
• Population (POP), as defined above.
• Government form (GOV), i.e. a number in the interval [−1, +1], where at the extremes we find ultra-conservative (−1) and democratic (+1) parties, while intermediate positions on both sides are modulated by way of a sigmoidal function.
• The Average Age at Child's Birth (AAB) is the mean age of women on giving birth to their first child. These data are relevant as Italy registers one of the highest values among EU countries.
• Old Age Dependency Ratio (OADR) is a demographic index computed as the number of individuals aged 65 and over per 100 people of working age, defined as those aged between 20 and 64. It measures the burden placed by non-working people on the nation's working-age population: the higher the dependency ratio, the greater the burden.
• Population density (POP1) is the number of individuals per square kilometre: its inclusion is motivated by [22], which highlights a significant positive linear correlation between population density and cases, deaths, and case-fatality rate.
• Life Expectancy, i.e., the average time an individual is expected to live. Records for this variable were split into two groups depending on gender: ELdF and ELdM.
• Rate of natural increase (NGR) is a statistic calculated by subtracting the crude death rate from the crude birth rate of a given region. This rate gives an idea of how the population is growing.
• Alcohol Consumers Over 14 (ACO14) and Smokers Over 14 (SO14) monitor both underage and adult drinking/smoking.
• Obesity Over 18 (OO18) was included in the database, as [23] pointed out that obesity increases the risk of severe illness from COVID-19.
• Hospitals Migration Index (HMI) and Hospitals Immigration Index (HII) are indexes of patients' migration out of and into a region to choose where they wish to be treated.
• Number of Hospitals (NH) is the number of hospitals per region.
• Number of Available Beds on Average (NAB) refers to the mean number of hospital beds available in each region.
• Ordinary Hospitalized Cardio-Vascular (OCVH) and Ordinary Hospitalized for Tumors (OHT) report yearly data on average for each region. They were included in the database as cardio-vascular diseases as well as tumors have often been indicated as comorbidities associated with COVID-19.
• Ordinary beds demand (OBD) is the usual (that is, the normal) number of occupied beds per region each year.
• Cardio-Vascular mortality rate (CVMR) and Tumors mortality rate (TMR) are demographic variables expressing the rates of mortality due to cardio-vascular diseases and tumors, respectively.
• Family Expenses for Healthcare (FHE) and Healthcare Expenses per Capita (HEPC) are the yearly family and per capita expenses for healthcare, respectively.
The final dataset is then a 24 × 22 matrix, made up of the 24 regional entities explained above and the 22 variables just described. All the variables except GOV have been scaled in the interval [−1, 1] according to the formula:
\[ \frac{x_i - \min x_i}{\max x_i} \tag{8.1} \]
where $x_i$ is the 24 × 1 column of the i-th variable (i = 1, …, 22) and $\min x_i$ ($\max x_i$) is the minimum (maximum) value in it.
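As a minimal sketch of this preprocessing step, the snippet below derives the population density (POP1) of three regions from the Pop and Area columns of Table 8.1 and then rescales it. It assumes that Eq. (8.1) reads (x_i − min x_i)/max x_i; for strictly positive data that expression yields values in [0, 1), so the snippet also shows the common min-max variant 2(x − min)/(max − min) − 1, which does span [−1, 1]. The chapter does not say which form was actually applied, and all function names are illustrative.

```python
# Population and area (km^2) of three regions, taken from Table 8.1.
regions = {
    "LOM": (10_060_574, 23_864),
    "CAM": (5_801_692, 13_671),
    "AV": (125_666, 3_261),
}

def density(pop, area_km2):
    """Inhabitants per square kilometre."""
    return pop / area_km2

def scale_as_printed(column):
    """Assumed reading of Eq. (8.1): (x - min x) / max x."""
    lo, hi = min(column), max(column)
    return [(x - lo) / hi for x in column]

def scale_symmetric(column):
    """Standard min-max rescaling to [-1, 1] (an alternative, not taken from the chapter)."""
    lo, hi = min(column), max(column)
    return [2 * (x - lo) / (hi - lo) - 1 for x in column]

pop1 = [density(pop, area) for pop, area in regions.values()]
printed = scale_as_printed(pop1)
symmetric = scale_symmetric(pop1)

for name, d, s1, s2 in zip(regions, pop1, printed, symmetric):
    print(f"{name}: density={d:.0f}  eq8.1={s1:.3f}  minmax={s2:.3f}")
```

The rounded densities reproduce the Pop1 column of Table 8.1 for these three regions, which is a quick consistency check on the table itself.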
8.4 Methodology The Self-Organizing Map or SOM [24, 25] is a computational model extending the intuition of Willshaw and von der Malsburg [26, 27], who discovered that some areas of the brain develop specialized structures with a high sensitivity to specific input patterns. The SOM algorithm is an ensemble of computational tasks aimed at mimicking the neuro-biological process which maps different sensory inputs onto corresponding areas of the cerebral cortex in an orderly fashion. The key elements in the biological process are competitive learning and the winner-takes-all principle: all the units are excited by the same signal, but only one produces the highest response, thus automatically becoming a candidate for the receptive basin of that specific pattern. The Self-Organizing algorithm goes one step further, generalizing the winner-takes-all idea into that of the winner taking the most. According to this principle, when a pattern is presented to the SOM, the related information is retrieved not only by the best neuron, but also by its closest neighbors, according to a proper (mathematical) similarity criterion. In this way, neurons in the map organize themselves, and connectivity structures are formed which are topology preserving with respect to the input data, that is: similar input items are located close to each other in the 2-D projection plane. The SOM training can be summarized in the following steps, performed sequentially. Let us denote by x an input pattern, then:
1. Evaluate the distance between x and each neuron of the SOM.
2. Select the neuron (node) with the smallest distance from x. This is the winner neuron or Best Matching Unit (BMU).
3. Correct the position of each node according to the results of Step 2, in order to preserve the network topology.
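The three steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used in the study: it uses a rectangular grid (the chapter trains a hexagonal one), a Gaussian neighborhood, and linearly decaying learning rate and radius; the parameter names (`lr0`, `sigma0`, `epochs`) and the toy data are hypothetical.

```python
import math
import random

def find_bmu(weights, x):
    """Steps 1-2: return the grid position of the unit closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = math.dist(w, x)
            if d < best_d:
                best, best_d = (r, c), d
    return best

def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM and return weights[r][c], one vector per unit."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[[rng.uniform(-1, 1) for _ in range(dim)]
                for _ in range(cols)] for _ in range(rows)]
    steps, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.3)  # neighborhood radius shrinks
            br, bc = find_bmu(weights, x)
            # Step 3: pull the BMU and its grid neighbors towards x.
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            t += 1
    return weights

# Toy data: two well-separated groups of 2-D patterns (hypothetical values).
group_a = [[-0.80, -0.75], [-0.85, -0.80], [-0.78, -0.82]]
group_b = [[0.80, 0.75], [0.85, 0.80], [0.78, 0.82]]
som = train_som(group_a + group_b)
print("BMUs, group A:", [find_bmu(som, x) for x in group_a])
print("BMUs, group B:", [find_bmu(som, x) for x in group_b])
```

Because neighbors of the BMU are updated together with it, patterns from the same group end up on the same or adjacent units, while the two groups are mapped to different units.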
Steps 1–3 can be repeated once or more than once for each input pattern; a good stopping criterion generally consists in monitoring the Quantization Error
(QE), i.e., a weighted average over the Euclidean norms of the differences between the input vectors and the corresponding BMUs. When QE goes below a proper threshold level, say for instance 10^{-2} or lower, it might be suitable to stop the procedure. In this way, once the learning procedure is concluded, the organization of the SOM is the projection of the input space onto a lower-dimensional space, with closer nodes representing neighboring input patterns. Moreover, the real added value of the SOM is that it offers a platform for further visual investigations in three directions. First, the direct output of the SOM procedure is the so-called U-Matrix, whose nodes (hexagons) are colored according to their distance from each other. In a graded scale, colors at the top correspond to large distances among vectors in the input space, while colors at the bottom indicate closeness among the vectors in the input space. In the case of well-defined input features, as in customer segmentation [28], areas colored in a similar way might be thought of as clusters; however, in the case under examination the original situation might be more blurred, so that the U-Matrix represents only an intermediate step, and it is necessary to perform an additional k-means [29] analysis to identify unambiguous clusters. To be consistent with this, in this study we show the final map resulting after the k-means treatment. Second, it is possible to visualize the contribution of each variable to the input space vectors by using Component Maps (CMs). Each CM can be interpreted as a sliced version of the SOM visualizing the distribution of the input data with respect to the examined component: in the examined case, we therefore have 22 CMs. By comparing different slices, it is possible to determine whether two components correlate or not, and hence to calculate the strength of their relations and their relative importance inside the input vectors.
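The quantization error and the U-Matrix can be sketched as follows. The code makes two simplifying assumptions that are not taken from the chapter: QE is computed as a plain (unweighted) mean, and the U-Matrix uses the four neighbors of a rectangular grid rather than the six neighbors of a hexagonal one. The 2 × 2 codebook and the two data points are hypothetical.

```python
import math

def quantization_error(weights, data):
    """Mean Euclidean distance between each input vector and its BMU."""
    units = [w for row in weights for w in row]
    return sum(min(math.dist(x, w) for w in units) for x in data) / len(data)

def u_matrix(weights):
    """Average distance from each unit to its grid neighbors (4-neighborhood)."""
    rows, cols = len(weights), len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [(r + dr, c + dc)
                     for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = sum(math.dist(weights[r][c], weights[i][j])
                            for i, j in neigh) / len(neigh)
    return out

# Hypothetical 2 x 2 codebook: three units close together, one far away.
codebook = [[[0.0, 0.0], [0.0, 1.0]],
            [[1.0, 0.0], [5.0, 5.0]]]
data = [[0.0, 0.1], [1.0, 0.1]]

qe = quantization_error(codebook, data)
umat = u_matrix(codebook)
print(f"QE = {qe:.3f}")
for row in umat:
    print([round(v, 2) for v in row])
```

In the printed U-Matrix the isolated unit gets the largest value, which is what a dark or light cell (depending on the color scale) signals in the graphical version: a boundary between groups of similar units.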
For a better understanding, CMs are often represented in full color, unlike the U-Matrix, which is commonly visualized in gray scale. The reading key for colors is twofold. With respect to input data, if nodes in the CM are colored either in the same way or with similar color shades, then the input vectors mapped in that area exhibit similar behavior. With respect to input variables, examining CMs in pairs helps in correlation hunting: areas colored in a similar fashion in two maps point to a positive correlation between the variables; on the contrary, when corresponding areas are associated with markedly different colors, the related input variables are negatively correlated. Finally, the results of the component maps may be more easily appreciated by building the 'DNA matrix', an ad-hoc tool consisting of a colored matrix whose rows (in our case: the regions) are colored according to the color associated with each variable in the component maps. The color conventions described earlier apply. In this way, it is possible to obtain an overall representation of the "DNA features" of each input pattern without losing the simplicity of the 2-D visualization offered by the SOM.
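Numerically, a component map is just one coordinate of every unit's weight vector, and a row of the DNA matrix is the weight vector of the unit a region is mapped to. The sketch below illustrates both ideas on a hypothetical four-unit codebook with three variables; the Pearson correlation between two component maps is used here as a numerical counterpart of the visual comparison of colors, which is our assumption rather than a step described in the chapter.

```python
import math

def component_plane(units, k):
    """Values of variable k across all units of the map."""
    return [w[k] for w in units]

def pearson(a, b):
    """Pearson correlation between two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def dna_matrix(units, samples):
    """One row per sample: the weight vector of its best matching unit."""
    return {name: min(units, key=lambda w: math.dist(w, x))
            for name, x in samples.items()}

# Hypothetical codebook: variables 0 and 1 rise together, variable 2 falls.
units = [[0.0, 0.1, 1.0], [0.3, 0.4, 0.6], [0.6, 0.7, 0.3], [1.0, 0.9, 0.0]]
samples = {"A": [0.05, 0.10, 0.90], "B": [0.90, 0.90, 0.10]}

r01 = pearson(component_plane(units, 0), component_plane(units, 1))
r02 = pearson(component_plane(units, 0), component_plane(units, 2))
dna = dna_matrix(units, samples)
print(f"corr(var0, var1) = {r01:.2f}, corr(var0, var2) = {r02:.2f}")
for name, row in dna.items():
    print(name, row)
```

Coloring each entry of `dna` with the color scale of the corresponding component map would give the graphical DNA matrix.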
8.5 Analysis We applied the SOM algorithm to our dataset, examining various grid dimensions and choosing the best one with respect to the QE index values. We therefore describe and discuss the results obtained by training a 4 × 4 map with a hexagonal topology, which reached a QE of 0.00752. To enhance the knowledge discovery process, at the end of the SOM learning procedure we ran an incremental k-means clustering procedure, which stopped once it reached the lowest average distance between clusters (i.e., the smallest within-group distance between the data points in a cluster), that is, for an overall number of four clusters, and we divided the SOM accordingly. Figure 8.1 shows the organization of the clusters. Clusters (CL) are obtained depending on the intra-group distance: nodes in CL01 are those with the lowest intra-group distance, i.e., the regions they represent have very similar patterns with respect to the variables examined in the study. Gray shades in turn aim at identifying whether nodes are similar to each other (and hence grouped together) or not (and hence put into different groups): black indicates the greatest closeness, while shading towards light gray is associated with lower levels of similarity within the same group. Table 8.2 shows the composition of the clusters in the map. The component maps for each variable are shown in Fig. 8.2; however, to emphasize the usefulness of the division into clusters, we discuss the features of the clusters by way of the DNA matrix, accompanied by a more traditional investigation based on the descriptive statistics of each group. Combining the DNA matrix with descriptive statistics offers complementary information, working back and forth between the input and the neural space. The DNA Matrix,
Fig. 8.1 The SOM clusters
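Table 8.2 Clusters composition

| Cl01 | Cl02 | Cl03 | Cl04 |
|---|---|---|---|
| MOL | LAZ | EMI | ABR |
| PIE | LOM | FVG | BAS |
| APU | LIG | CN | BOZ |
| SIC | MAR | | CAL |
| TUS | VEN | | CAM |
| SAR | | | CEN |
| TAA | | | |
| TN | | | |
| UMB | | | |
| AV | | | |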
Fig. 8.2 The component planes
in fact, works in the neural space and visualizes how close the representative neurons (i.e., the neurons mapping the regions) are to each other; this, in turn, may help in understanding the closeness of the regions mapped therein. Descriptive statistics, on the other hand, examine the concept of similarity in the input space: given the organization into clusters induced on the regions by the SOM, they explore how close the regions are and at what level within the cluster. The DNA matrix is given in Fig. 8.3. Multiple reading keys are suggested by the DNA matrix. Regions are ordered according to the cluster they belong to, that is, rows from 1 to 10 collect the regions in the first cluster, while rows from 11 to 15 gather the regions in the second cluster; finally, rows 16–18 and 19–22 group the regions in the third and fourth cluster, respectively. Next, we can analyze the matrix both by rows and by columns. By rows, we
Fig. 8.3 The DNA matrix: rows indicate regions, columns show the variables. For each region the variable color corresponds to the one in the related component map
can search for regions sharing similar DNAs, i.e., similar sequences of colors, and hence similar patterns of behavior. Consider for instance the regions in rows 16–18: this is a small cluster, but looking at how colors alternate by row, we can observe that while EMI and FVG seem homogeneous with respect to all the examined indicators, the same does not apply to CN, the North-Center macro-area. This makes sense as, while EMI and FVG are single regional entities, CN groups together spatially close regions that clearly are not so alike under the examined profiles. Studying the organization of the DNA matrix by columns, on the other hand, makes it possible to explore how regions correlate with respect to each model variable. A homogeneous color along a column means that neurons behave in the same way with respect to the examined variable, and hence that the mapped regions move in the same direction; on the contrary, different colors along a column are associated with non-homogeneous behavior (i.e., not in the same direction). Comparing columns, for example, we can observe that ELdF is positively correlated with ELdM and negatively with NGR. Finally, the descriptive statistics, represented with the aid of a boxplot in Fig. 8.4, suggest examining in greater detail the cluster-level behavior of the variables with higher dispersion, namely: AAB, OADR, POP1, ELdF, ELdM, NGR, NAB, OHT, OBD and CVMR.
Fig. 8.4 Boxplot for the model variables
By combining these elements, we are now ready to answer the first research question: RQ1 Is an ex-ante evaluation of the impact of demographic, healthcare and political choices on the regions' capability to react to the first wave of the pandemic possible? Looking at the results, the answer, in our opinion, is affirmative. Cross-referencing the outcomes of our study with the mortality statistics from the first COVID-19 wave, available from the Italian Civil Protection authority, we observe that the most affected regions have mostly been those in clusters CL01 and CL02, which are characterized by distinctive patterns with respect to demographic and medical variables. In particular, higher values of the age at childbirth (AAB), the old age dependency ratio (OADR), the natural growth rate (NGR), and the cardio-vascular mortality rate (CVMR), combined with a (relatively) low number of available beds (NAB) and expected life at birth (ELdF, ELdM), produced an explosive mix which led to the infamous situation of March–April 2020. On the other hand, looking at the SOM data, we cannot assert the existence of a direct dependence between the situation and the political orientation of the regions; nevertheless, we can claim that there are some indirect responsibilities, as healthcare decisions largely depend on regional politics and hence on the more or less economically liberal vocation of the regional government. What remains is to answer the second research question: RQ2 Is it possible to employ the mapping of such variables to estimate the exposure risk of regions to other outbreak waves? To do this, we ran the following procedure. We replaced in the input matrix the NGR variable with the Average Mortality Rate in 2020 at the end of June, assumed as
Fig. 8.5 Paths of spreading for the second wave as suggested by our model. Shading from red (highest value) to yellow (lower values) depends on the average increases in deaths in the period January–June 2020
the (unofficial) date for the end of the first wave, because the number of COVID-19 deaths was then very close to zero. We then computed the Euclidean distance between each BMU and the modified input matrix. Sorting the distances (and hence the BMUs) in descending order, we were then able to draw a trajectory that, starting from VEN (the node associated with the highest distance), touches all the BMUs. We use this line as a proxy for the timing of the second wave: the closer the BMU is to VEN, the sooner the region will be affected by a new surge of the COVID disease. Results are provided in Fig. 8.5. In this case too, looking at the still-evolving second COVID-19 outbreak, we can claim that at the present stage our methodology seems to show good predictive ability.
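One possible reading of this ranking step is sketched below: for each region, take the distance between its modified input vector and the weight vector of its BMU, then sort the regions by decreasing distance. Whether the distance was computed per region in exactly this way is our assumption; the region labels R1–R3 and all numbers are hypothetical placeholders, not results from the study.

```python
import math

def exposure_ranking(bmu_weights, modified_inputs):
    """Regions sorted by decreasing distance between input vector and BMU weights."""
    dist = {name: math.dist(bmu_weights[name], vec)
            for name, vec in modified_inputs.items()}
    return sorted(dist, key=dist.get, reverse=True), dist

# Hypothetical BMU weight vectors (from the trained map) for three regions.
bmu_weights = {"R1": [0.2, 0.1], "R2": [0.5, 0.5], "R3": [-0.3, 0.4]}
# Hypothetical input vectors after replacing one variable with the mortality rate.
modified_inputs = {"R1": [0.2, 0.9], "R2": [0.5, 0.6], "R3": [-0.3, 0.0]}

order, dist = exposure_ranking(bmu_weights, modified_inputs)
for name in order:
    print(f"{name}: distance = {dist[name]:.2f}")
```

The first label in `order` plays the role that VEN has in the text, and the position of every other region along the list stands in for its place on the trajectory.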
8.6 Conclusion In this chapter we have examined and discussed an application of the Self-Organizing Map (SOM) as a policy instrument, and we tested to what extent it was possible to use the SOM to answer two distinct research questions. The first research question concerned testing whether the SOM was able to analyse demographic and healthcare data of Italian regions, giving an ex-ante map of the vulnerability of the system to the pandemic. Enhancing the visual capability of the SOM, in particular by means of the DNA matrix, highlighted the emergence of significant patterns that can provide some plausible explanations for the infamous situation of some regions at the beginning of the first wave of the COVID outbreak. Second, we suggested a way to employ the SOM outcomes in a predictive way, working as a kind of early-warning system, to prevent or at least contain the most difficult situations under the pressure of the pandemic. To
do this we suggested a method to build a trajectory on the SOM model to evaluate the exposure risk to COVID-19 recurrence, and the results seem to confirm that Kohonen's maps are a very powerful tool for defining prevention strategies.
References
1. J.-H. Tian, Y.Y. Pei, M.-L. Yuan et al., A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020)
2. C. Sohrabi, Z. Alsafi, N. O’Neill, M. Khan, A. Kerwan, A. Al-Jabir, R. Agha, World Health Organization declares global emergency: a review of the 2019 novel coronavirus (COVID-19). Int. J. Surg. 76, 71–76 (2020)
3. B.S. Santos, I. Silva, M.D.C. Ribeiro-Dantas, G. Alves, P.T. Endo, L. Lima, COVID-19: a scholarly production dataset report for research analysis. Data Brief 32, 106178 (2020). 10.1016
4. S. Uhlig, K. Nichani, C. Uhlig, K. Simon, Modeling projections for COVID-19 pandemic by combining epidemiological, statistical and neural network approaches. medRxiv preprint (2020). https://doi.org/10.1101/2020.04.17.20059535
5. T.D. Pham, A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks. Sci. Rep. 10, 16942 (2020)
6. S.A. Sarkodie, P.A. Owusu, Investigating the cases of novel Coronavirus Disease (COVID-19) in China using dynamic statistical techniques. Heliyon 6(4), e03747 (2020)
7. L. Zhong, L. Mu, J. Li, J. Wang, Z. Yin, D. Liu, Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model. IEEE Access (2020). https://doi.org/10.1109/ACCESS.2020.2979599
8. H.R. Niazkar, M. Niazkar, Application of artificial neural networks to predict the COVID-19 outbreak. Glob. Health Res. Policy 5, 50 (2020). 10.1186
9. A. Kapoor, X. Ben, L. Liu, B. Perozzi, M. Barnes, M. Blais, S. O’Banion, Examining COVID-19 forecasting using spatio-temporal graph neural networks (2020). arXiv preprint arXiv:2007.03113
10. P. Melin, J.C. Monica, D. Sanchez et al., A new prediction approach of the COVID-19 virus pandemic behavior with a hybrid ensemble modular nonlinear autoregressive neural network. Soft Comput. (2020). 10.1007
11. M. Hawas, Generated time-series prediction data of COVID-19’s daily infections in Brazil by using recurrent neural networks. Data Brief 32, 106175 (2020)
12. R. Pal, A.A. Sekh, S. Kar, D.K. Prasad, Neural network based countrywise risk prediction of COVID-19. Appl. Sci. 10, 6448 (2020). 10.3390
13. S.K. Tamang, P.D. Singh, B. Datta, Forecasting of Covid-19 cases based on prediction using artificial neural network curve fitting technique. GJESM 6, 53–64 (2020)
14. A.S.R.S. Rao, J.A. Vazquez, Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone-based survey in the populations when cities/towns are under quarantine. Infect. Control Hosp. Epidemiol. (2020). https://doi.org/10.1017/ice.2020.61
15. M.N. Kamel Boulos, E.M. Geraghty, Geographical tracking and mapping of coronavirus disease COVID-19/severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) epidemic and associated events around the world: how 21st century GIS technologies are supporting the global fight against outbreaks and epidemics. Int. J. Health Geogr. 19, 8 (2020)
16. B.R. Beck, B. Shin, Y. Choi, S. Park, K. Kang, Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug–target interaction deep learning model. Comp. Struc. Biotech. J. 18, 784–790 (2020)
17. A. Khan, J.L. Shah, M.M. Bhat, CoroNet: a deep neural network for detection and diagnosis of COVID–19 from chest x–ray images. Comput. Methods Programs Biomed. 196, 105581 (2020) 18. H. Mukherjee, S. Ghosh, A. Dhar, S.M. Obaidullah, K.C. Santosh, K. Roy, Deep neural network to detect COVID–19: one architecture for both CT scans and chest X-rays. Appl. Intell. (2020). https://doi.org/10.1007/s10489-020-01943-6 19. H. Hirano, K. Koga, K. Takemoto, Vulnerability of deep neural networks for detecting COVID– 19 cases from chest X–ray images to universal adversarial attacks. PLoS One (2020). https:// doi.org/10.1371/journal.pone.0243963 20. Gao P, Zhang H, Wu Z, Wang J (2020) Visualising the expansion and spread of coronavirus disease 2019 by cartograms. Environ. Plann. A. https://doi.org/10.1177/0308518-20910162 21. P. Melin, J.C. Monica, D. Sanchez, O. Castillo, Analysis of Spatial Spread Relationships of Coronavirus (COVID–19) Pandemic in the World using Self Organizing Maps. Chaos, Solitons Fractals, vol. 138 (2020), p. 109917 22. A. Ilardi, S. Chieffi, A. Iavarone, C.R. Ilardi, SARS–CoV–2 in Italy: population density correlates with morbidity and mortality. Jpn. J. Infect. Dis. 22; 74(1), 61–64 (2021) 23. Obesity Worsens Outcomes from COVID-19 (2020) CDC report. https://www.cdc.gov/obe sity/data/obesity-and-covid-19.html 24. T. Kohonen, Self–organized formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982) 25. T. Kohonen, Self–Organized Maps (Springer, Berlin, 1997) 26. D.J. Willshaw, C. von der Malsburg, How patterned neural connections can be set up by self–organization. Proc. R. Soc. Lond. B 194, 431–445 (1976) 27. D.J. Willshaw, C. von der Malsburg, A marker induction mechanism for the establishmentof ordered neural mappings: its application to the retinotectal problem. Philos. Trans. R. Soc. Lond. B 287, 203–243 (1979) 28. P. Hanafizadeh, M. 
Mirzazadeh, Visualizing market segmentation using self–organizing maps and Fuzzy Delphi method—ADSL market of a telecommunication company. Expert Syst. Appl. 38(1), 198–205 (2011) 29. J.B. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of 5th Berkeley Symposium Math Statistics and Prob. University of California Press (1967), pp. 281—297
Chapter 9
Biases in Assigning Emotions in Patients Due to Multicultural Issues
David Casacuberta and Jordi Vallverdú
Abstract The use of AI in medical scenarios is currently distorted by a conceptual bias: the assumption of universal emotions. An analysis of emotions in real health procedures shows a completely different picture: emotional responses in medical practice are heavily culturally mediated. Therefore, situated and multicultural approaches must be implemented in medical practice, taking special care of emotional variations. Given the fundamental role of emotional wellbeing in the recovery and health progress of patients, such variability must be identified and taken into account. Here we suggest a way of applying this advice.
Keywords Universal emotions · Situated · Pain · Multicultural · Care
D. Casacuberta · J. Vallverdú (B) Philosophy Department, Universitat Autònoma de Barcelona, Plaça Civica, 08193 Bellaterra, Spain e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_9
9.1 Introduction
One of the most critical aspects of health care is the lack of universal response patterns of patients towards treatments. Because of the many variables involved, some individual (gender, age, occupation, …) and others social (sports trends, food habits and availability, …), medical success is complex to achieve, as it requires collaboration from patients (and their surrounding conditions). Medical anthropologists have gathered evidence through studies on emotions in healthcare procedures [73, 68], and thanks to them we know that patients react very differently according to their cultural and personal backgrounds. This challenge can be partially addressed using new data analysis technologies. With the support of rich, multimodal, and comprehensive data about patients, made available by AI data services, medical treatment could be adapted to the specificities of the target patient. Nevertheless, this approach raises several ethical, legal, and even epistemic challenges. Robotic facilities could also help to manage part of such care interactions [55], although, again, they introduce new complexities into the analysis of general health practices.
In this chapter we argue that an automatic approach to distinguishing among such different responses can indeed be helpful, but we also point out the limits of a purely automatic approach to gathering and processing such data, and the importance of having human experts supervise the process and the result, to ensure that the implemented system is reliable as well as just. We see two major difficulties when trying to build an AI system able to process different types of responses due to multicultural issues. First, to grasp the different types of responses and attitudes of multicultural patients, it is not enough to compile and process data about beliefs and attitudes; it is also important to consider nonconceptual responses and behaviors that patients have acquired through cultural exposure, in a non-declarative way. Such aspects may evade current AI techniques [8, 20]. Secondly, it is also important to consider the type of biases that automating such a complex task may generate. Therefore, in this chapter we discuss the most common biases that can originate from the treatment of data within the machine learning paradigm, and how they could distort the results and their reliability.
9.2 The Non-Universality of Emotions
Despite the great recent advances in understanding the crucial cognitive role of emotions, two related misunderstandings persist: that cognition and emotions are both universal human processes. This is not the case. Several studies from a broad range of fields (anthropology, linguistics, sociology, medicine) show that emotions are geographically and culturally situated. Although it would be helpful and easy to work with a clear set of universal basic emotions, like Ekman's famous model of six emotions, such a model is an oversimplification that produces several pitfalls and problems for serious research [26]. Detailed research shows that the proposed number of emotions varies widely: from Plutchik's 8 basic emotions [54], to 27 categories of emotion [16], to close to 100 core feelings [50]. Such divergences were soon noticed by authors like Ortony and Turner [51], who not only documented a wide range of opinions about basic emotions but also studied the complexity of defining sub-boundaries among them (as primary, secondary, and tertiary emotions, Parrott [52]). Beyond such taxonomic differences, deeper problems soon emerged: not only can we not define some emotions as basic, but we also cannot assume that specific emotional expressions share the same meaning across cultures. Take for example the fear face, which is interpreted as a threat face in a Melanesian society [17]. Smiling faces follow a similar pattern: they do not always convey happiness [39]. Therefore, as a consequence of these empirically supported statements, we can affirm that emotions are eminently a cultural phenomenon [44].
A very interesting exercise is to look for untranslatable emotional words in many languages [58]. Of course, once the core emotional meaning has been described, a non-native speaker can understand the feeling behind such an untranslatable word [43], although this does not imply having one's own way of expressing that bundle of ideas and feelings in a compressed lexical form. In any case, the most comprehensive linguistic study of emotional words, covering 2474 spoken languages and using colexification [31], shows a curious pattern: emotional words have similar origins (due to the embodied nature of experience), but at the same time the way in which different emotions are related to one another differs from language to language. The situated bodily nature of emotional language is thus clearly influenced by cultural forces. But even agreeing on the cultural weight of emotions, we also have to consider internal divergences between people living in the same society with different cultural backgrounds, as studied by Zborowski [73]. North American patients hosted in the same hospital but with different cultural backgrounds (Italian-American, Irish, Jewish, WASP) not only felt and expressed pain in different ways, but also showed different degrees of recovery. Other variables, like gender or age, must also be included as generators of variance among emotional experiences [6, 69]. Even religious practices can convey specific emotional responses, or even emotional alignment [25]. Finally, the social aspects of emotional contagion must be taken into account, especially those related to social networks [38]. All this evidence provides clear lessons: there is no universal deterministic bodily grammar of emotions, nor does their elicitation follow the same causes. Besides, the many variables involved in the elicitation of emotions and their social contagion follow a broad set of rules.
9.3 Emotions in Medical Contexts
"From 1 to 10, let me know how painful your pain is" is surely one of the most common and saddest phrases in practical medical contexts. Common, because we still do not have any objective way to quantitatively measure the human experience of pain, and sad because it offers only a blurred approach to emotional experiences across human beings. If pain is a clue to understanding the real cause of some malfunction, then it is crucial to know its exact nature. Unfortunately, humans differ greatly in the ways and degrees by which they experience pain, as we explored in the previous section. In some cultures, such fuzziness is captured in ingenious ways, like the onomatopoeic sets used by patients in Japanese to express pain [1]. When people attend a medical consultation, they are aware of the complexity of defining and localizing their painful feelings, and therefore some onomatopoeic resources have been created for conveying such qualia [66]:
• ムカムカ (muka-muka): used when you feel queasy or nauseous.
• ズキズキ (zuki-zuki): a throbbing pain, covering both sharp and dull aches, but most commonly used for toothaches.
• シクシク (shiku-shiku): usually, but not exclusively, used to describe stomach pains. It describes dull, prolonged pain.
• ガンガン (gan-gan): used almost exclusively for headaches, this word describes a "pounding" or "splitting" pain, as in "I have a splitting headache!"
• キリキリ (kiri-kiri): used for sharp pains like stomach cramps, or that weird feeling when your lung sticks to your rib cage.
• ピリピリ (piri-piri): similar to the word "sting" in English, this word describes the sensation of having been stuck with a needle.
• ゴロゴロ (goro-goro): used when you have something in your eye, or an upset stomach.
• チクチク (chiku-chiku): somewhat similar to "piri-piri", as it describes a stinging or burning feeling, but "piri-piri" is more for superficial injuries whereas "chiku-chiku" tends to be used for deeper sorts of pain. This word can also describe an itchy or uncomfortable feeling like chafing.
• ドーン (dōn): a general way to state that pain is dull.
• キー (kī): by contrast, used for sharp pains.
• キュー (kyū): describes a sort of wrenching or squeezing feeling, like menstrual cramps.
Despite the lack of specific words in this Japanese medical context, these expressions cover specific ranges of feelings, with clear meanings for both patients and physicians. So, at a communicative level, they fulfil the same functions as those expected from specific words. Again, some morphological structures justify and explain similar routes of emotional pain elicitation [71], although they are strongly mediated by cultural forces.
Think for example of birth pain: Western women often cry out intensely during labor and very commonly ask for pain relief through epidural anesthesia (and mass media have even created a memetic image of it), whereas in Japan laboring women are encouraged to endure in silence and medical staff withhold pain medication. The same universal process, giving birth, is culturally mediated so that its pain is managed in specific ways. Dealing with pain even becomes less difficult when pregnant women assume it as part of the process and learn how to manage it, turning it into a deep spiritual experience [27, 63].
Medical anthropology has provided us with thousands of other examples that illuminate the strong relationships between culture, emotions, and health. Think for example of the placebo effect [47], a clear medical context in which those values interfere directly with health. A fundamental lesson from such studies is that the type of relationship created between patients and medical staff (or health practitioners) is a determining factor in explaining health success or failure. Placebos and their cultural distribution are a clear example of survivorship bias: they show us the hidden mechanisms of health, eminently managed emotionally. Emotion is morphologically embedded in our bodies, for example in the visual system [37]. Such embodiment is not only based in the brain but is multi-located [14], and can even be mapped across the body [49].
As a conclusion of this section, we can affirm that emotions have a significant role in medical practices, and that their expression does not follow universal patterns. Even agreeing on the fact that there are root morphological mechanisms that unify sets of experiences, the cultural variations eliciting such emotions produce severe changes in their expression and in their impact on health processes.
9.4 Machine Learning, Data, Emotions, and Diagnosis
9.4.1 What is Affective Computing?
Affective computing is the discipline that studies the role that emotions can have while humans interact with machines. This field of research started with the seminal and revolutionary work of Rosalind Picard. According to Picard [53], affective computing is a multidisciplinary field, including computer science, psychology, neurology, and cognitive sciences, that tries to establish how computing may detect, activate, and transform emotions. Emotions play a very significant role when trying to make sense of human behavior. They have a powerful influence on the way we perceive and label our surroundings and on the decisions we make, so they are key to understanding and predicting how we act (Picard [53], Casacuberta and Vallverdú [9]). Therefore, computers able to detect emotions in humans have a broad spectrum of applications: from better recommendation systems to airport security, as well as marketing, sports, crime detection, entertainment, teaching, or healthcare. Applications in healthcare are extremely important, and include general wellbeing monitoring of patients, early detection of anxiety or depression, and establishing the causes of stress (Vallverdú and Casacuberta [67], Hossain [30]). Continuous analysis of the evolution of a patient's emotional state can be a meaningful addition to their health profile, helping to detect problems earlier.
To develop functional affective computing within a machine learning context, we may use two different sets of data [5]: external behavior, especially facial expression and speech, and physiological measures. To create such algorithms and ensure their accuracy, we need to defend some sort of transcultural reality of emotions. Although in previous sections we have criticized the full universality of a basic set of emotions, it is at the same time true and operationally valid to treat some basic emotions as valid in our societies (we mean Western cultures sharing a similar European cultural heritage). The main basis here is the groundbreaking work of Paul Ekman. Ekman argued for the existence of discrete, universal basic emotions that arise from common evolutionary processes in humans and can be detected based on the universality of facial expressions, that is, of how humans express sadness, surprise, or anger [23], as well as on common physiological features [24]. At least for our current populations, we can affirm the utility of this approach, all the more so once deep learning techniques are applied to it [10].
9.4.2 Data for Automatic Emotion Detection
To develop algorithms to detect and predict emotional states in humans, there are two main sets of data. First, we have observational data, which can be recorded by observing the human subjects using cameras and microphones. The two main sources of such data are facial expressions and speech tone [46]. Other sources can be body movement, word meaning, and salient surroundings [53]. Second, physiological data is obtained by connecting sensors and/or wearables to the users; this data is transferred to a computer where it is stored and analyzed later. The main physiological data used to establish the emotional state of a person are:
(i) Respiration. The rate and volume of the subject's breathing are usually measured with a chest belt worn around either the thorax or the abdominal area, which measures the expansion and contraction of the body to obtain how deeply and how fast the subject inhales and exhales [64].
(ii) Skin temperature. This can be measured using an infrared reader or a temperature sensor attached to the skin. Skin temperature is a good indicator of changes in the autonomic nervous system and therefore of a relevant change in the emotional state of the subject [33].
(iii) Electrocardiography. Through specific sensors on the skin, the electrical activity of the heart is recorded, giving information on how fast the heart muscle relaxes and contracts [41].
(iv) Electroencephalography. EEG measures the electrical activity of neurons in the cortex. EEG systems used to be very cumbersome, but recently wearables have been developed that allow the user to walk and move around instead of being forced to stay still [7].
(v) Blood volume pulse (BVP). The subject's blood volume pulse is usually measured indirectly, by shining light on the skin and measuring how much is absorbed and how much is backscattered. Depending on the total amount of returned light, the system infers the volume of blood moving in the vessels under the sensor [22].
(vi) Galvanic Skin Response (GSR). Depending on the level of humidity (which is an indication of sweat) and other emotion-related factors, the electrical resistance of our skin changes. By sending feeble electrical currents through the subject's skin, which are not noticed, it is possible to measure changes in resistance and infer changes in the emotional state of the subject.
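The GSR measurement in item (vi) rests on Ohm's law. The following minimal sketch assumes a simplified setup in which a known small current is applied and the voltage across two electrodes is read; the function names and the sample readings are hypothetical and are not taken from any real device or from the studies cited above.

```python
def skin_conductance_microsiemens(voltage_v: float, current_a: float) -> float:
    """Ohm's law: R = V / I, so conductance G = 1 / R = I / V (returned in microsiemens)."""
    if voltage_v <= 0 or current_a <= 0:
        raise ValueError("voltage and current must be positive")
    return (current_a / voltage_v) * 1e6


def relative_change(baseline: float, value: float) -> float:
    """Fractional change of a reading with respect to a resting baseline."""
    return (value - baseline) / baseline


# Hypothetical readings: same electrode voltage, more current flows when the skin is moist.
baseline_us = skin_conductance_microsiemens(voltage_v=0.5, current_a=2.0e-6)
aroused_us = skin_conductance_microsiemens(voltage_v=0.5, current_a=3.0e-6)
change = relative_change(baseline_us, aroused_us)
print(f"baseline {baseline_us:.1f} uS, aroused {aroused_us:.1f} uS, change {change:+.0%}")
```

A relative change against a per-subject baseline is used here because absolute skin conductance varies widely between individuals; whether a given change reflects an emotional state is exactly the kind of inference the rest of this chapter treats with caution.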
Once we decide which type of data we are going to use, we need to be sure to elicit an accurate emotional response in the users, in order to properly link the observational or physiological data with the corresponding emotion. To do so, several strategies can be used: pictures, videos, games, video games, sound, music, words, and recall (Liu and Sourina [42], El Ayadi et al. [21], Bota et al. [5], and Dhall et al. [19]).
Most of this data collection is done in a laboratory setting, in a tightly controlled environment, but recently we have seen more studies in real-world scenarios, with far less intervention, especially when collecting observational data (Barrett [3, 4]).
9.4.3 Developing the Algorithm
Once the data is obtained, it should be arranged in databases, dividing the collections into training and test sets to train and test the algorithms. Such databases can be fully based on physiological data, observational data, or a combination of the two. Which type of data we use will depend on several factors, including what data is currently available and the aims of the machine learning project. Observational data is a lot easier to obtain, but it is less reliable and accurate than physiological data, as emotions inferred from observing behavior depend heavily on the social and cultural context, and can easily be faked. Physiological data, in turn, requires expensive equipment and work in a lab environment, which will distort the emotions felt by the participants (Bota et al. [5], Heaven [29]).
Whether the data is observational or physiological, we need to associate each vector with a relevant emotional state. This can be done introspectively, with the subjects themselves tagging the emotional states they feel at a given moment, or by a third party. Neither solution is free of problems (Miranda-Correa et al. [15]). A first-person testimony will always be more reliable than an inference by a third party, but users may feel uneasy about being forced to disclose their feelings towards a certain stimulus, especially if those data are going to be stored in a public database that researchers will use and distribute to other colleagues [5]. Subjects may also find it difficult to express their emotions in words, so questionnaires are used to help them tag their emotional state [62].
Once datasets are obtained and divided into training and test sets, we can train and test our algorithm for emotion recognition. Myroniv et al. [48] describe encouraging results using different algorithms and models, such as random forests, decision trees, and support vector machines, with the best result coming from a k-NN classifier with an accuracy close to 98%. An older study by Kolodyazhniy et al. [36] also reports the best performance from a k-NN classifier. Also relevant is the work of Rubin et al. [57], who tested several models to assess panic and obtained 97% accuracy with a random forest (RF) classifier. There are also relevant studies in the field of deep learning, like Santamaria-Granados et al. [59], who use convolutional neural networks with the AMIGOS dataset, a database of physiological data on emotions, obtaining more accurate results than the original study based on classic neural network models. Also relevant are Dachapally [18], who uses an autoencoder model that outperforms older models working on facial expressions to establish the emotional state of a user, and Zhang et al. [70], who use probabilistic neural networks to estimate the emotional state from subjects' EEG data.
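To make the train/test workflow and the k-NN classifier mentioned above concrete, here is a minimal sketch in plain Python. It is not the pipeline of any of the cited studies: the two features (heart rate and skin conductance), the labels "calm" and "stressed", the class centers, and all function names are assumptions chosen only for illustration, and the data is synthetic.

```python
import math
import random
from collections import Counter


def make_synthetic_samples(n_per_class, seed=0):
    """Hypothetical (heart rate, skin conductance) vectors for two made-up labels."""
    rng = random.Random(seed)
    centers = {"calm": (65.0, 2.0), "stressed": (95.0, 8.0)}  # assumed class centers
    samples = []
    for label, (hr, sc) in centers.items():
        for _ in range(n_per_class):
            samples.append(((rng.gauss(hr, 5.0), rng.gauss(sc, 1.0)), label))
    rng.shuffle(samples)
    return samples


def train_test_split(samples, test_fraction=0.25):
    """Hold out the first part of the (already shuffled) data as the test set."""
    n_test = int(len(samples) * test_fraction)
    return samples[n_test:], samples[:n_test]


def knn_predict(train, x, k=5):
    """Label x by majority vote among its k nearest training vectors (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test vectors whose predicted label matches the recorded label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


samples = make_synthetic_samples(n_per_class=100)
train, test = train_test_split(samples)
acc = accuracy(train, test)
print(f"test accuracy on synthetic data: {acc:.2f}")
```

The accuracy here is high only because the synthetic classes are well separated by construction. With real physiological data the features would normally be standardized first, and the labels themselves carry the cultural and self-report problems discussed in this section.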
To validate the results, predictions are compared with the data in the test set, using different formulae and methods. One of the most widely used methods is cross-validation, which we can observe being used in algorithms fed with observational data [60, 72], with physiological data [32, 35], and in multimodal systems [2]. Despite their impressive overall accuracy, how effective are such systems really within a medical context? Can we really trust that they will deliver correct and fair predictions? Or are biases inevitable? We discuss this subject in the next section.
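As an illustration of the cross-validation procedure mentioned above, the following sketch implements k-fold cross-validation in plain Python. The nearest-centroid classifier, the synthetic two-feature data, and the labels "neutral" and "aroused" are assumptions made for the example; they stand in for whatever model and dataset a real study would use.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_centroid_fit(samples):
    """Return the mean feature vector of each label."""
    sums, counts = {}, {}
    for x, y in samples:
        total = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            total[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in total] for y, total in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Pick the label whose centroid is closest to x (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def cross_validate(samples, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train = [s for i, s in enumerate(samples) if i not in held]
        test = [samples[i] for i in held_out]
        centroids = nearest_centroid_fit(train)
        hits = sum(nearest_centroid_predict(centroids, x) == y for x, y in test)
        scores.append(hits / len(test))
    return scores


# Synthetic, well-separated data; real emotion data would be far noisier.
rng = random.Random(1)
data = [((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), "neutral") for _ in range(60)]
data += [((rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)), "aroused") for _ in range(60)]
fold_scores = cross_validate(data, k=5)
mean_score = sum(fold_scores) / len(fold_scores)
print("per-fold accuracy:", [round(s, 2) for s in fold_scores])
```

Note that cross-validation only estimates how well a model generalizes to data drawn from the same population as the dataset. If a cultural group is missing from the data, every fold inherits that gap, so a high cross-validated score says nothing about performance on that group.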
9.5 Correcting Data Biases in Medical Diagnosis
Following Chouldechova and Roth [11], we can identify the following sources that may cause distortion and lead to a biased machine learning algorithm:
(i) Bias encoded in data. This happens when the humans responsible for generating, or at least tagging, the dataset are biased themselves. Because machine learning algorithms are designed precisely to fit the data they are given, they will inevitably reproduce the bias already present in the data, and it would be wishful thinking to believe that the algorithm may somehow get rid of such biases. As we described in Sect. 9.4.3, humans have to label the dataset to assign every vector a relevant emotion. If the task is assigned to third parties, they may be biased by their own cultural background and make mistakes when labeling facial expressions belonging to another culture [12].
(ii) Minimizing average error fits majority populations. Because different communities will probably have different distributions of how emotions are expressed, if the system is trained mostly with data from one type of community (like the far too common presence of graduate psychology students in the US), and we design the algorithm to minimize overall error, then the system will make more errors when trying to label the emotional states of minorities.
(iii) The need to explore. For many key problems in biomedicine, datasets will evolve and change depending on the actions that the algorithm has taken in the past. To confirm that an emotional state is related to a disease to be treated, we need to interact with the patient: we need to explore his or her situation and take actions that may not be the optimal ones, in order to gather more data and see whether we can trust the algorithm or not. This generates tension between what is better for the patient and what is more useful to improve the dataset and the algorithm. If some minorities are underrepresented in the dataset, or the system is not accurate enough with people from a different cultural context, the distance between what the system needs and what is better for the patient will increase, and therefore generate an unfair situation for patients from minority contexts.
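The second source of bias above can be shown with a toy example. The sketch below builds a fully synthetic dataset in which a hypothetical "expression intensity" score signals pain, but a small minority group expresses pain at lower intensities than the majority. A single threshold is then fitted to minimize the overall error. The groups, the score, the ranges, and the function names are all assumptions made for illustration.

```python
def make_group(group, n_per_label, no_pain_range, pain_range):
    """Evenly spaced synthetic intensity scores for one group and both labels."""
    rows = []
    for label, (lo, hi) in (("no_pain", no_pain_range), ("pain", pain_range)):
        for i in range(n_per_label):
            intensity = lo + (hi - lo) * i / (n_per_label - 1)
            rows.append({"group": group, "intensity": intensity, "label": label})
    return rows


def total_errors(rows, threshold):
    """Count rows where 'intensity >= threshold' disagrees with the 'pain' label."""
    return sum((r["intensity"] >= threshold) != (r["label"] == "pain") for r in rows)


def fit_threshold(rows):
    """Pick the threshold (on a 0.01 grid) that minimizes the overall error count."""
    candidates = [i / 100 for i in range(101)]
    return min(candidates, key=lambda t: total_errors(rows, t))


def error_rate(rows, threshold):
    return total_errors(rows, threshold) / len(rows)


# 90 majority samples; 10 minority samples whose pain shows at lower intensity.
data = make_group("majority", 45, (0.05, 0.45), (0.55, 0.95))
data += make_group("minority", 5, (0.05, 0.20), (0.25, 0.50))

threshold = fit_threshold(data)
overall = error_rate(data, threshold)
by_group = {
    g: error_rate([r for r in data if r["group"] == g], threshold)
    for g in ("majority", "minority")
}
print(f"overall error {overall:.0%}, per group {by_group}")
```

In this construction the fitted threshold yields an overall error of 4%, which looks excellent, yet it is made up entirely of missed pain cases in the minority group: 0% error for the majority and 40% for the minority. Reporting error rates per group, rather than only the average, is one simple way to make this kind of disparity visible.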
How can we avoid, or at least minimize, such biases and inaccuracies in our algorithms? We need to refine our datasets according to the following steps. First of all, we need to ensure that all relevant minorities are present in our dataset in a similar number of cases, to avoid overrepresentation of majorities and the wrong labeling of members of minorities due to the lack of enough examples of their way of expressing emotions. We have to remember, as stated in Sect. 9.4.2, that data obtained within a lab context may not be enough for proper labeling of emotions expressed in a real-life context, so we should, whenever possible, obtain data in natural contexts, where subjects express their emotions as freely as possible [29]. We should also take into consideration that some cultures may find it more problematic than others to share their emotions in a public context, and therefore put extra protocols in place so that the recording system is not intrusive. Once we have a proper, non-biased database, we must also ensure that the labeling of individual cases is as accurate as possible. If the labeling is done by the subjects themselves, we must be aware of any cultural differences in the way of reporting, like some cultures being more reluctant to label negative emotions, or the lack of proper equivalence of linguistic emotional terms among cultures, like the fact that "disgust" in English is not equivalent to "asco" in Spanish [61, 65]. If labeling is done by third parties, we have to be sure that they represent the understanding of different cultures or, at least, are aware of the differences between cultures when labeling those emotional states. Once such datasets are established, we have to consider privacy issues if we plan for the dataset to be reused and distributed to third parties. The subject of anonymizing and reidentifying datasets is outside the scope of this chapter.
Here we would like to observe that, even if the dataset is perfectly anonymized, different cultures may have a different understanding of which data should be private and which can be public, and that for some cultures minimizing negative emotional states in public is very important, so subjects may give wrong labels in order to protect such culture-based privacy concerns.
As stated in all cognitive accounts of emotions, like Lazarus et al. [40], Roseman [56], and more recently Barrett [4], in order to fully assign an emotional state to a subject we need to understand the specific context the subject is in and what his or her plans and aims are. That means that the input collected just by physiological sensors or video recording may not be enough if we do not instruct the system to process the surroundings as well as the cognitive attitudes of the subject at the same time. We should go beyond the idea that raw data from EEG or pixels in a face are enough to capture emotion without some context of the environment. As stated in Barrett [3] and Heaven [29], automatic emotion recognition systems are still in their infancy, and more data is needed to get systems that can really be trusted.
Finally, as indicated in Casacuberta and Guersenzvaig [8], declarative knowledge about those surroundings alone may not be enough to assign a specific emotional state to a subject. To decide the emotional state of a subject we need to understand not just the causes, but also the reasons for such a person to act. To decide whether it was fear or anger that pushed a person to punch another, we may need to understand their reasons to act and not just a general assessment of their physiological state. Following Lyons [45], if we reduce emotions to their physiological basis we will never be able to understand the causal efficacy of emotional states in explaining human behavior. As stated in Dreyfus and Dreyfus [20], when we ascribe reasons to someone else's (or our own) behavior, such knowledge is not just declarative, but also includes tacit knowledge that is pre-reflective, that is, knowledge that is learned in direct contact with society and not studied in a reflective form. Let us consider interpersonal distance. We are not taught in school, or by our parents, what the correct interpersonal distance is when being with a family member, a friend, a colleague, or a stranger. However, we all learn it and keep such distances in a structured and consistent way. Such knowledge is culturally dependent and changes from culture to culture, and we rapidly recognize when someone from another culture steps beyond the limits of our interpersonal distance. Following studies by Solomon or Colombetti [13], we are aware of how basic such pre-reflective knowledge is for assigning emotions to people, and according to Casacuberta and Guersenzvaig [8] such knowledge cannot be replicated either with declarative Good Old-Fashioned AI or with current state-of-the-art machine learning, so we have to accept that no artificial system may yet be able to assign emotional states to people as efficiently and reliably as humans do now.
9.6 Conclusions
Emotions are not only a fundamental tool for providing phenomenological meaning about reality but also functional ways to process bodily information in order to react to dynamical systems. In this process, they are a complex combination of neuromodulators, hormones, and informational shortcuts (memories), all of them trained and executed in multilayered cultural domains. Health is a fundamental aspect of a living system that keeps its negentropy stable, using such emotional mechanisms as a self-evaluation system. As a consequence, medical contexts must consider the cultural background of patients and evaluate accordingly the value of the information obtained from them, as well as design the best possible strategy for conveying the correct information to patients as a way to reach the expected medical aims. Health processes are deeply connected to the experience of fragility and vulnerability, as well as to powerful moral evaluations about behavior. For all these reasons, they are not simple informational processes but are instead full of half-truths, hidden feelings, and even lies. A correct combination of physiological measurements (if culturally situated and validated) with other informational variables, like patients' verbal forms or bodily expression, can provide a more reliable way to create better medical procedures. On the quantitative side of such work, we have remarked on the possible pitfalls and problems of dealing with datasets that require algorithmic management. Computational epistemology is a powerful and reliable field that, merged with medical communication and diagnosis, can provide better results.
References

1. Y. Asano-Cavanagh, Japanese interpretations of “pain” and the use of psychomimes. Int. J. Lang. Cult. 1(2), 216–238 (2014)
2. K. Bahreini, R. Nadolski, W. Westera, Towards multimodal emotion recognition in e-learning environments. Interact. Learn. Environ. 24(3), 590–605 (2016)
3. L.F. Barrett, The theory of constructed emotion: an active inference account of interoception and categorization. Soc. Cogn. Affect. Neurosci. 12(1), 1–23 (2017a)
4. L.F. Barrett, How Emotions are Made: The Secret Life of the Brain (Houghton Mifflin Harcourt, 2017b)
5. P.J. Bota, C. Wang, A.L. Fred, H.P. Da Silva, A review, current challenges, and future possibilities on emotion recognition using machine learning and physiological signals. IEEE Access 7, 140990–141020 (2019)
6. L.R. Brody, J.A. Hall, Gender and emotion in context. Handbook Emot. 3, 395–408 (2008)
7. A.J. Casson, D.C. Yates, S.J. Smith, J.S. Duncan, E. Rodriguez-Villegas, Wearable electroencephalography. IEEE Eng. Med. Biol. Mag. 29(3), 44–56 (2010)
8. D. Casacuberta, A. Guersenzvaig, Using Dreyfus’ legacy to understand justice in algorithm-based processes. AI Soc. 34(2), 313–319 (2019)
9. D. Casacuberta, J. Vallverdú, Emociones sintéticas. Páginas de filosofía 11(13), 116–144 (2010)
10. A. Chatterjee, U. Gupta, M.K. Chinnakotla, R. Srikanth, M. Galley, P. Agrawal, Understanding emotions in text using deep learning and big data. Comput. Hum. Behav. 93, 309–317 (2019)
11. A. Chouldechova, A. Roth, The frontiers of fairness in machine learning. arXiv preprint arXiv:1810.08810 (2018)
12. M.N. Coleman, Critical incidents in multicultural training: an examination of student experiences. J. Multicult. Couns. Dev. 34(3), 168–182 (2006)
13. G. Colombetti, Varieties of pre-reflective self-awareness: foreground and background bodily feelings in emotion experience. Inquiry 54(3), 293–313 (2011)
14. G. Colombetti, E. Zavala, Are emotional states based in the brain? A critique of affective brainocentrism from a physiological perspective. Biol. Philos. 34(5), 1–20 (2019)
15. J.A.M. Correa, M.K. Abadi, N. Sebe, I. Patras, Amigos: a dataset for affect, personality and mood research on individuals and groups. IEEE Trans. Affect. Comput. (2018)
16. A.S. Cowen, D. Keltner, Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proc. Natl. Acad. Sci. 114(38), E7900–E7909 (2017)
17. C. Crivelli, J.A. Russell, S. Jarillo, J.M. Fernández-Dols, The fear gasping face as a threat display in a Melanesian society. Proc. Natl. Acad. Sci. 113(44), 12403–12407 (2016)
18. P.R. Dachapally, Facial emotion detection using convolutional neural networks and representational autoencoder units. arXiv preprint arXiv:1706.01509 (2017)
19. A. Dhall, A. Asthana, R. Goecke, T. Gedeon, Emotion recognition using PHOG and LPQ features. IEEE (2011), pp. 878–883
20. H.L. Dreyfus, S.E. Dreyfus, The ethical implications of the five-stage skill-acquisition model. Bull. Sci. Technol. Soc. 24(3), 251–264 (2004)
21. M. El Ayadi, M.S. Kamel, F. Karray, Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011)
22. M. Elgendi, On the analysis of fingertip photoplethysmogram signals. Curr. Cardiol. Rev. 8(1), 14–25 (2012)
23. P. Ekman, An argument for basic emotions. Cogn. Emot. 6(3–4), 169–200 (1992)
24. P. Ekman, R.W. Levenson, W.V. Friesen, Autonomic nervous system activity distinguishes among emotions. Science 221(4616), 1208–1210 (1983)
25. R.A. Emmons, Emotion and religion, in Handbook of the Psychology of Religion and Spirituality (2005), pp. 235–252
26. V. Franzoni, J. Vallverdú, A. Milani, Errors, biases and overconfidence in artificial emotional modeling, in IEEE/WIC/ACM International Conference on Web Intelligence, Companion Volume (2019), pp. 86–90
27. I.M. Gaskin, Spiritual Midwifery (Book Publishing Company, 2010)
28. C. He, Y.J. Yao, X.S. Ye, An emotion recognition system based on physiological signals obtained by wearable sensors. Wearable Sens. Rob. 15–25 (2017)
29. D. Heaven, Why faces don’t always tell the truth about feelings. Nature 578(7796), 502–504 (2020)
30. M.S. Hossain, Patient state recognition system for healthcare using speech and facial expressions. J. Med. Syst. 40(12), 272 (2016)
31. J.C. Jackson, J. Watts, T.R. Henry, J.M. List, R. Forkel, P.J. Mucha, K.A. Lindquist et al., Emotion semantics show both cultural variation and universal structure. Science 366(6472), 1517–1522 (2019)
32. E.H. Jang, B.J. Park, S.H. Kim, M.A. Chung, M.S. Park, J.H. Sohn, Emotion classification based on bio-signals emotion recognition using machine learning algorithms, in 2014 International Conference on Information Science, Electronics and Electrical Engineering, vol. 3 (IEEE, 2014), pp. 1373–1376
33. K.H. Kim, S.H. Bang, S.R. Kim, Emotion recognition system using short-term monitoring of physiological signals. Med. Biol. Eng. Comput. 42, 419–427 (2004)
34. S. Koelstra, C. Muhl, M. Soleymani, J.S. Lee, A. Yazdani, T. Ebrahimi, I. Patras et al., Deap: a database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 3(1), 18–31 (2011)
35. C.A. Kothe, S. Makeig, J.A. Onton, Emotion recognition from EEG during self-paced emotional imagery, in 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (IEEE, 2013), pp. 855–858
36. V. Kolodyazhniy, S.D. Kreibig, J.J. Gross, W.T. Roth, F.H. Wilhelm, An affective computing approach to physiological emotion specificity: toward subject-independent and stimulus-independent classification of film-induced emotions. Psychophysiology 48(7), 908–922 (2011)
37. P.A. Kragel, M.C. Reddan, K.S. LaBar, T.D. Wager, Emotion schemas are embedded in the human visual system. Sci. Adv. 5(7), eaaw4358 (2019)
38. A.D. Kramer, J.E. Guillory, J.T. Hancock, Experimental evidence of massive-scale emotional contagion through social networks. Proc. Natl. Acad. Sci. 111(24), 8788–8790 (2014)
39. K. Krys, C.M. Vauclair, C.A. Capaldi, V.M.C. Lun, M.H. Bond, A. Domínguez-Espinosa, A.A. Yu, Be careful where you smile: culture shapes judgments of intelligence and honesty of smiling individuals. J. Nonverbal. Behav. 40(2), 101–116 (2016)
40. R.S. Lazarus, A.D. Kanner, S. Folkman, Emotions: a cognitive–phenomenological analysis, in Theories of Emotion (Academic Press, 1980), pp. 189–217
41. R.W. Levenson, The autonomic nervous system and emotion. Emot. Rev. 6(2), 100–112 (2014)
42. Y. Liu, O. Sourina, EEG databases for emotion recognition, in 2013 International Conference on Cyberworlds (IEEE, 2013), pp. 302–309
43. T. Lomas, Towards a positive cross-cultural lexicography: enriching our emotional landscape through 216 ‘untranslatable’ words pertaining to well-being. J. Posit. Psychol. 11(5), 546–558 (2016)
44. C. Lutz, G.M. White, The anthropology of emotions. Annu. Rev. Anthropol. 15(1), 405–436 (1986)
45. W. Lyons, An introduction to the philosophy of the emotions. Int. Rev. Stud. Emot. 2, 295–314 (1992)
46. D. Mehta, M.F.H. Siddiqui, A.Y. Javaid, Recognition of emotion intensities using machine learning algorithms: a comparative study. Sensors 19(8), 1897 (2019)
47. D.E. Moerman, Meaning, Medicine, and the “Placebo Effect”, vol. 28 (Cambridge University Press, Cambridge, 2002)
48. B. Myroniv, C.W. Wu, Y. Ren, A. Christian, E. Bajo, Y.C. Tseng, Analyzing user emotions via physiology signals. Data Sci. Pattern Recogn. 1(2), 11–25 (2017)
49. L. Nummenmaa, E. Glerean, R. Hari, J.K. Hietanen, Bodily maps of emotions. Proc. Natl. Acad. Sci. 111(2), 646–651 (2014)
50. L. Nummenmaa et al., Maps of subjective feelings. Proc. Natl. Acad. Sci. 115(37), 9198–9203 (2018)
51. A. Ortony, T.J. Turner, What’s basic about basic emotions? Psychol. Rev. 97(3), 315 (1990)
52. W.G. Parrott (ed.), Emotions in Social Psychology: Essential Readings (Psychology Press, 2001)
53. R.W. Picard, Affective Computing (MIT Press, Cambridge, MA, USA, 1997)
54. R. Plutchik, Emotions: a general psychoevolutionary theory. Approaches Emot. 1984, 197–219 (1984)
55. B.N. Poudel, M.A. Ray, Consciousness: humanoid robots and caring in nursing from multicultural perspectives. Int. J. Human Caring 23(2), 185–195 (2019)
56. I.J. Roseman, Cognitive determinants of emotion: a structural theory. Rev. Pers. Soc. Psychol. (1984)
57. J. Rubin, R. Abreu, S. Ahern, H. Eldardiry, D.G. Bobrow, Time, frequency and complexity analysis for recognizing panic states from physiologic time-series, in PervasiveHealth (2016), pp. 81–88
58. E.F. Sanders, Lost in Translation: An Illustrated Compendium of Untranslatable Words from Around the World (Ten Speed Press, 2014)
59. L. Santamaria-Granados, M. Munoz-Organero, G. Ramirez-Gonzalez, E. Abdulhay, N.J.I.A. Arunkumar, Using deep convolutional neural network for emotion detection on a physiological signals dataset (AMIGOS). IEEE Access 7, 57–67 (2018)
60. S. Sahu, R. Gupta, C. Espy-Wilson, On enhancing speech emotion recognition using generative adversarial networks. arXiv preprint arXiv:1806.06626 (2018)
61. G.R. Semin, C.A. Görts, S. Nandram, A. Semin-Goossens, Cultural perspectives on the linguistic representation of emotion and emotion events. Cogn. Emot. 16(1), 11–28 (2002)
62. I. Siegert, R. Böck, B. Vlasenko, D. Philippou-Hübner, A. Wendemuth, Appropriate emotional labelling of non-acted speech using basic emotions, in 2011 IEEE International Conference on Multimedia and Expo (IEEE, 2011), pp. 1–6
63. P. Simkin, A. Bolding, Update on nonpharmacologic approaches to relieve labor pain and prevent suffering. J. Midwifery Womens Health 49(6), 489–504 (2004)
64. P. Schmidt, A. Reiss, R. Duerichen, K.V. Laerhoven, Wearable affect and stress recognition: a review. arXiv preprint arXiv:1811.08854 (2018)
65. H. Stadthagen-González, P. Ferré, M.A. Pérez-Sánchez, C. Imbault, J.A. Hinojosa, Norms for 10,491 Spanish words for five discrete emotions: happiness, disgust, anger, fear, and sadness. Behav. Res. Methods 50(5), 1943–1952 (2018)
66. J. Uki, T. Mendoza, C.S. Cleeland, Y. Nakamura, F. Takeda, A brief cancer pain assessment tool in Japanese: the utility of the Japanese Brief Pain Inventory (BPI-J). J. Pain Symp. Manag. 16(6), 364–373 (1998)
67. J. Vallverdú, D. Casacuberta, Ethical and technical aspects of emotions to create empathy in medical machines, in Machine Medical Ethics (Springer, Cham, 2015), pp. 341–362
68. P.J. Watson, R.K. Latif, D.J. Rowbotham, Ethnic differences in thermal pain responses: a comparison of South Asian and White British healthy males. Pain 118(1–2), 194–200 (2005)
69. S. Whittle, M. Yücel, M.B. Yap, N.B. Allen, Sex differences in the neural correlates of emotion: evidence from neuroimaging. Biol. Psychol. 87(3), 319–333 (2011)
70. J. Zhang, M. Chen, S. Hu, Y. Cao, R. Kozma, PNN for EEG-based emotion recognition, in 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE, 2016), pp. 002319–002323
71. A. Wierzbicka, “Pain” and “suffering” in cross-linguistic perspective. Int. J. Lang. Cult. 1(2), 149–173 (2014)
72. Y.D. Zhang, Z.J. Yang, H.M. Lu, X.X. Zhou, P. Phillips, Q.M. Liu, S.H. Wang, Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation. IEEE Access 4, 8375–8385 (2016)
73. M. Zborowski, People in Pain (Jossey-Bass, 1969)
David Casacuberta is a professor of philosophy of science at the Universidad Autonoma de Barcelona (Spain). He has a PhD in Philosophy and a master’s degree in “Cognitive Sciences and Language”. His current line of research is the cognitive, social, and cultural impact of digital technologies, with an emphasis on the fair uses of machine learning algorithms. He has published several books, book chapters, and papers on the subject in both electronic and printed formats. In 2003 he received the Eusebi Colomer Award from the Epson Foundation for the best essay on the social, anthropological, philosophical, or ethical aspects of the new technological society for his book “Creacion Colectiva”. He is also the co-director of the master’s degree in Web Projects of Elisava Design School, in direct contact with Ferran Adria and his BulliFoundation.

Jordi Vallverdú is a Catalan ICREA Acadèmia investigator who has devoted his research to the cognitive and epistemic aspects of Philosophy of Computing, Philosophy of Sciences, Cognition, and Philosophy of AI. After his Bachelor of Philosophy at U. Barcelona (1996) he obtained his M.Sci and Ph.D at UAB (2001 and 2002, respectively). He also obtained a Bachelor in Music (ESMUC, 2011). He has enjoyed research stays at the Glaxo-Wellcome Institute for the History of Medicine (1997), J.F.K. School - Harvard U. (2000), and Nishidalab - Kyoto U. (2011, JSPS Grant). In 2019 he won the Best Presentation Award of the HUAWEI Neuro-inspired, cognitive, and unconventional computing workshop, Kazan (Russia). His latest research has focused on the modeling of emotions, bioinspired computation, minimal cognition, multi-heuristic scientific practices, and causality in Deep Learning. His current research is carried out with the support of GEHUCT, Recercaixa “AppPhil”, Tecnocog (MICINN), ICREA Acadèmia, and EU H2020 project CSI-COP.
Part II
Prospects of AI Methodologies in Healthcare
Chapter 10
Artificial Intelligence in Healthcare: Directions of Standardization
Hoc Group on Application of AI Technologies
Abstract Artificial intelligence (AI) can have a significant positive impact on health and healthcare. AI can be used to improve the quality, efficiency and equity of health care. However, AI also has the potential to have significant negative impacts. Therefore, AI medical applications should be designed and deployed in accordance with established guidelines and legislation. There may be gaps in the current regulatory framework, or questions about how to interpret and apply it to healthcare applications that include artificial intelligence solutions. Global standardization supports a consistent approach and can reduce the burden on stakeholders when it comes to establishing regulatory frameworks and interpreting and complying with regulatory requirements. While AI is far from new, it has only recently become mainstream. This chapter outlines the research of the authors, who are members of the Hoc Group on Application of AI Technologies in Health Informatics (ISO AHG2 TC215), which was formed by ISO Technical Committee 215 to define goals and directions for standardization in the field of AI in health care.

Keywords Artificial intelligence · Deep learning · Electronic health records · Standardization of digital health
Hoc Group on Application of AI Technologies, Washington, USA
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_10

10.1 Introduction

Artificial Intelligence (AI) has the potential to have a significant positive impact on health and healthcare. AI can be used to improve the quality, efficiency, and equity of healthcare delivery. Nevertheless, AI also has the potential for significant negative impacts. Consequently, AI health applications should be developed and deployed according to established principles and in compliance with jurisdictional regulations. Current regulatory frameworks may have gaps, or there may be questions related to interpreting and applying existing regulatory frameworks to health applications that incorporate AI solutions. Global standardization supports a harmonized approach and can reduce the burden on stakeholders when it comes to establishing regulatory frameworks and interpreting and fulfilling regulatory requirements. While AI is far from being new, it has only recently become ‘mainstream’. Progress in computing and transmission hardware and software has paved the way for embedding AI components in many products and services for the general public. The following three stakeholder groups are impacted by AI in healthcare:

1. The healthcare, public health and research community at large, including physicians of various clinical sub-specialties, nurses, administrators, researchers, pharmacists, laboratory staff, executives and other healthcare professionals.
2. Health Information Technology (HIT) and AI technology solution developers.
3. End users: consumers of healthcare and public health services (patients including children, family members, caregivers, and the public at large).
Each group plays a critical role in ensuring productive, safe and ethical use of AI in health-related information sharing and use. Management and candidate ML approaches for combining the value of these complementary yet disparate data resources for patient-specific risk prediction modelling.
10.2 Definition of Artificial Intelligence (AI)

Numerous definitions of “artificial intelligence (AI)” exist. Some focus on philosophical aspects, others on mathematical issues or computer science. For example, the Academy of Medical Royal Colleges proposes the following definition: “The simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and rules for using the information), reasoning (using rules to reach approximate or definite conclusions) and self-correction.” This definition seems too broad and non-specific. In fact, there is no agreed definition, including in the field of healthcare. In general, the term “AI” broadly refers to systems and technologies that resemble processes associated with human intelligence: reasoning, learning and adaptation, sensory understanding, and interaction. A specific definition is given in ISO/IEC 22989 Artificial Intelligence Concepts and Terminology, N695 draft, August 18, 2020: artificial intelligence: capability to acquire, process, create and apply knowledge, held in the form of a model, to conduct one or more given tasks. This is an excellent definition from the technical point of view, and it is well suited to the standards development process. However, it neglects the specific aspects of medicine and healthcare. In 2018 an expert group of the European Commission proposed a comprehensive definition that also reflects the features of AI in medicine and healthcare well: «Artificial intelligence (AI) systems are software (and possibly also hardware) systems designed by humans that, given a complex goal, act in the physical or digital dimension by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data, reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal. AI systems can either use symbolic rules or learn a numeric model, and they can also adapt their behavior by analyzing how the environment is affected by their previous actions [1]».

A definition for medicine and healthcare can be proposed on the basis of the cited one (with respect to patient and health practitioner needs and rights, and the features of medical practice). Artificial intelligence (AI) systems in healthcare are software (alone or embedded into hardware) systems:

• designed by human teams led by a health practitioner,
• on standardized data prepared using an evidence-based approach,
• given a specific clinical or management goal,
• acting in the physical or digital dimension (with integration into hospital information systems, applications),
• by perceiving their environment through data acquisition, interpreting the collected structured or unstructured data from health records, medical devices, the patients themselves, follow-ups, and their own previous actions,
• reasoning on the knowledge, or processing the information, derived from this data and deciding the best action(s) to take to achieve the given goal, taking into account patient safety, evidence-based practice, and the precedence of human (doctor, nurse) decisions.

The components of the definition will be discussed in the following sections.
10.3 History

The study of mechanical or “formal” reasoning began with philosophers and mathematicians in antiquity. In 1832 Semen Korsakov (1787–1853) invented five mechanical devices, so-called “intelligent machines”, for the partial “mechanization of mental activity in the tasks of search, comparison and classification of information”. His concept of artificially amplifying natural intelligence echoes the modern concept proposed by the American Medical Association, which recently defined the role of AI in healthcare as “augmented intelligence,” stating that AI will be designed and used to enhance human intelligence rather than replace it [2]. In 1950 Alan Turing (1912–1954) proposed the imitation game, later known as “the Turing test”: a test of a machine’s ability to exhibit intelligent behaviour equivalent to that of a human. In 1956 the term “artificial intelligence” was coined by John McCarthy (1927–2011) at a Dartmouth conference, and AI was founded as an academic discipline.

The years 1956–1980 were the worldwide golden years of AI science, expert systems development, and funding. Computer-automated analysis of medical information (electrocardiograms, spirometry, other functional tests) was successfully implemented in the USSR and the USA in parallel, and several systems were used in clinical practice. Around 1980 came the first “AI winter”, with reduced funding and interest in AI research due to the limited capacities of the available computers. Up to 1987 interest rose again thanks to the concept of knowledge-based expert systems. The first clinical decision support systems were used in medicine. These were rule-based systems that supported diagnosis, especially in complex patient cases, helped choose appropriate treatments, and provided interpretations of clinical reasoning. However, rule-based systems are costly to build, and they were critically limited by the comprehensiveness of prior medical knowledge. It was hard to implement in clinical practice a system that integrates deterministic and probabilistic reasoning. Thus, the rule-based approach was unsuccessful. The second “AI winter” followed in 1987–1993, as even knowledge-based expert systems showed serious limitations and proved expensive to update and maintain.

The years 1993–2011 saw the return of some optimism. New successes were achieved with the help of increased computational power, and AI became data-driven. Some AI-based software beat human champions at chess and Jeopardy. From 2012 to today, incredible progress in computational power and data transmission speeds, as well as the availability of data, has allowed breakthroughs in machine learning, neural networks and deep learning. This period is marked by progress in AI for medical imaging and records analysis; hype from commercial developers based mainly on unpublished, untested and unverifiable results; the prevalence of mathematics over medicine in research; a restrained attitude among health practitioners due to the lack of evidence; and the rise of an evidence-based approach to AI in healthcare.

Thus, artificial intelligence includes a range of methods that allow computers to perform tasks typically thought to require human reasoning and skills. Worldwide, algorithms based on rules and logic specified by humans have been used in healthcare since the 1970s. During the last 20 years there have been huge technological developments, with two main components:

1. An incredible increase in hardware computing capabilities and data exchange rates.
2. Mathematical progress in artificial neural networks and machine learning methodologies.

Progress in hardware and mathematics now allows computers to learn from examples rather than from explicit programming.
10.4 AI Features and Development

Rule-based expert systems contain preset answer options or are based on statistical methods (logistic regression, etc.). Usually, they are only able to evaluate a question according to specified criteria and choose the most appropriate answer from a list. AI is something else. A mathematical model (a neural network) is trained on a prepared dataset. After that, the model is able to interpret new data based on internal algorithms and previous experience. The model has the ability to learn and to accumulate and analyze its own experience: this is “artificial intelligence”. As a scientific discipline in medicine, AI includes several approaches and techniques:

1. Machine learning (deep learning and reinforcement learning). For example, diagnostic imaging recognition and interpretation, electronic health records analysis.
2. Machine reasoning (planning, scheduling, knowledge representation and reasoning, search, and optimization). For example, decision-making support tools integrated into hospital information systems, data extraction systems for electronic libraries.
3. Robotics (control, perception, sensors and actuators, integration of all other techniques into cyber-physical systems). For example, automatic injections, equipment supervision, robot-assisted surgery.
Machine learning (ML) is a field of computer science that uses algorithms to identify patterns in data; it represents the dominant approach in AI [3]. Progress in ML is responsible for most of the recent advances in the field. Usually, ML refers to a system that trains a predictive model by identifying patterns in input data and then uses that model to make useful predictions from new, never-before-seen data. Such algorithms can automatically learn and improve from experience without being explicitly programmed, and this “learnability” represents a key feature of AI, as mentioned above [2]. The most common ML approaches are supervised learning, unsupervised learning, reinforcement learning, and deep learning. A very good tutorial on this topic has been published quite recently [4].

Supervised ML is a type of machine-learning task that aims at predicting the desired output (such as the presence or absence of a disease or symptom) based on the input data (such as diagnostic images, health records, laboratory tests). Supervised machine-learning methods work by identifying the input–output correlation in the ‘training’ phase and by using the identified correlation to predict the correct output for new cases [3]. Supervised ML is based on datasets as input and known, labelled outcomes (a tagged dataset) as output. This type of ML has been widely applied in healthcare, providing data-driven clinical decision support for mapping input variables into discrete categories (for example, structured data in radiology imaging to diagnose and stage a tumor) and predictive analytics with a continuous output (for example, personal risk assessment and prognosis based on unstructured data in electronic health records).

Unsupervised ML is a type of machine-learning task that aims at inferring underlying patterns in unlabeled data (untagged datasets). In short, it is used to discover the structure of data and make predictions based on input alone.
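The supervised and unsupervised settings described above can be contrasted in a short sketch. This is only an illustration in plain Python, not a method from the chapter: the two-feature toy data, the nearest-centroid classifier, the naive k-means routine, and the names `fit_supervised`, `predict` and `cluster_unsupervised` are all assumptions chosen for brevity.

```python
from statistics import mean

# Hypothetical toy data: two numeric features per patient (illustrative only).
features = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.9, 3.1)]
labels = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fit_supervised(xs, ys):
    """Training phase: learn one centroid per labelled class."""
    return {c: centroid([x for x, y in zip(xs, ys) if y == c]) for c in set(ys)}


def predict(model, x):
    """Prediction phase: return the class whose centroid is nearest to x."""
    return min(model, key=lambda c: sq_dist(model[c], x))


def cluster_unsupervised(xs, k=2, iterations=10):
    """Naive k-means: groups points using the inputs alone, without labels."""
    centers = list(xs[:k])  # simplistic initialisation: the first k points
    assignment = [0] * len(xs)
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda j: sq_dist(centers[j], x)) for x in xs]
        for j in range(k):
            members = [x for x, a in zip(xs, assignment) if a == j]
            if members:
                centers[j] = centroid(members)
    return assignment


model = fit_supervised(features, labels)      # uses the labels
new_case_label = predict(model, (2.8, 3.0))   # label for a never-before-seen case
clusters = cluster_unsupervised(features)     # never sees the labels
```

The supervised model returns a named outcome because it was trained on tagged cases, while the clustering routine only returns anonymous group indices that a clinician would still have to interpret.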
Unsupervised ML makes it possible to build an algorithm that can find sub-clusters of the original data, identify outliers in the data, or produce low-dimensional representations of the data [3]. This type of ML is not widely used in healthcare: it is more prone to errors because it may use trivial features of the data to make predictions. Among the few applications are predicting individual disease risks using genetic biomarkers and designing personalized treatments based on genomic variations. In some sense, learning without human-labelled data is closer to “true AI”; nevertheless, the risk of errors is too high for healthcare.

There is a combination of methods called semi-supervised learning. Supervised and unsupervised ML are applied together, using a large unlabeled dataset for training jointly with only a small proportion of tagged data. This approach is more applicable in healthcare when labeled data are insufficient, but it needs strict and thorough blind external validation.

Reinforcement learning is a more autonomous learning approach that allows a model to take actions and interact with the environment, using rewards and errors as feedback to guide training [2]. It is, in a sense, a real self-learning approach, because the model learns from its own experience rather than from a prepared or tagged dataset. In healthcare, reinforcement learning is applicable to situations in which AI needs to interact continuously with the environment and adjust its actions based on feedback from that environment. Such tasks include optimizing or designing treatment (medication therapy) and robot-assisted surgery, manipulation and diagnostics (such as intravenous injection or ultrasound examinations).

Deep learning (DL) is a subfield of the larger discipline of ML that employs artificial neural networks with many layers to identify patterns in data. It discovers the intricate structure in large datasets by using a backpropagation algorithm operating on multiple levels of abstraction.
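To make the idea of a layered network trained by backpropagation more concrete, the following sketch trains a very small network with a single hidden layer on the XOR problem. It is a toy under stated assumptions, not an implementation from the chapter: the function name `train_xor`, the layer size, the learning rate, the number of epochs and the squared-error loss are all illustrative choices, and real deep learning systems use many more layers and dedicated libraries.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=3, epochs=5000, lr=0.5, seed=0):
    """Train a 2-input, one-hidden-layer, 1-output network with backpropagation."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # w1[j]: weights from the two inputs, plus a bias, to hidden unit j
    w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # w2: weights from the hidden units, plus a bias, to the output unit
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
        y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + w2[hidden])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial_loss = loss()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Error signal at the output (gradient of 0.5 * (y - t)^2 through the sigmoid)
            delta_out = (y - t) * y * (1.0 - y)
            # Propagate the error back to each hidden unit, using w2 before it is updated
            delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):
                w2[j] -= lr * delta_out * h[j]
                w1[j][0] -= lr * delta_hid[j] * x[0]
                w1[j][1] -= lr * delta_hid[j] * x[1]
                w1[j][2] -= lr * delta_hid[j]
            w2[hidden] -= lr * delta_out
    return initial_loss, loss(), forward


initial_loss, final_loss, forward = train_xor()
```

The hidden layer is what lets the network represent XOR at all, since no single linear unit can separate its four cases; how well a given run fits the data depends on the random initialisation and the hyperparameters assumed here.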
The key ability of DL it is adding of “hidden layers” of artificial neural networks, which are increase the capacity of algorithms for solving complex real-world problems. DL is a perfect approach in cases that result rely heavily on feature detection and real big data like genomics, unstructured health records in hospital archive, drug and biomarkers discovery, speech and language recognition. Natural language processing (NLP) uses computational methods to automatically analyze and represent human languages. In healthcare it is applicable for: • Doctor’s speech recognition and automated documents filing, • Patient speech and write recognition for identification, some diseases or symptoms screening (so-called symptom-checkers), navigation and information, • Health practitioners’ and patients’ personal assistance, • Equipment control, • Quality control, • Health record analysis for personal risk assessment and other various tasks. There is a very large amount of unstructured textual data in healthcare (history, doctors’ notes, test results, lab and radiology reports, patients’ diary, medication orders, discharge instructions, etc.). NLP can extract critical information for various
10 Artificial Intelligence in Healthcare: Directions of Standardization
237
tasks. However, even more impressive are the capabilities of ML and NLP combined in medicine and healthcare. The combination will enable health practitioners to make timely diagnoses and treatment decisions, which can have a profound impact on health service delivery, particularly on the ways that patients are treated. It also opens the way for medical robots. Potentially, they can help with surgical operations, diagnostic and treatment manipulations, rehabilitation, social interaction, assisted living, quality control, and more. AI-assisted surgical robots already exist for neurological, orthopedic, and various laparoscopic procedures. They can analyze data from preoperative health records to physically guide a surgeon’s instrument in real time during a minimally invasive procedure. There is evidence that such robot-assisted surgery reduces hospital stay, complications, and errors. In the near future, AI-assisted robots will be used for rehabilitation (after serious trauma, stroke or other neurological disease). They would assist in the care of elderly individuals, monitor vital signs and take proper actions when needed.
The process of AI development with the methods mentioned above consists of “medical” and “technical” parts.
1. Medical part:
• goal setting,
• data selection,
• data tagging,
• dataset formation.
2. Technical part:
• creation of the mathematical model,
• training the model,
• calibrating the model,
• internal validation,
• external validation.
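The last two steps of the technical part differ only in where the test data come from, and that difference can matter a great deal. The following is a minimal sketch on invented data, not a real clinical model: the function names, the one-feature threshold "model", and the site offset `shift` are all assumptions made for illustration. It trains on one hospital, validates internally on a held-out part of the same hospital, and then validates externally on a second hospital whose measurements are systematically shifted.

```python
import random


def make_site(n, shift, seed):
    """Generate synthetic (measurement, has_disease) pairs for one hospital.

    `shift` is a hypothetical site-specific offset, e.g. a differently
    calibrated device. The data are invented for illustration only.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        has_disease = rng.random() < 0.3
        value = rng.gauss(2.0 if has_disease else 0.0, 1.0) + shift
        cases.append((value, has_disease))
    return cases


def accuracy(cases, threshold):
    """Share of cases where 'value >= threshold' matches the true label."""
    return sum((v >= threshold) == y for v, y in cases) / len(cases)


def fit_threshold(train):
    """'Training': choose the cut-off with the best accuracy on the training set."""
    return max((v for v, _ in train), key=lambda t: accuracy(train, t))


site_a = make_site(800, shift=0.0, seed=1)          # development hospital
train, internal_test = site_a[:600], site_a[600:]   # two parts of the SAME source
site_b = make_site(800, shift=2.5, seed=2)          # unseen external hospital

threshold = fit_threshold(train)
internal_acc = accuracy(internal_test, threshold)   # internal validation
external_acc = accuracy(site_b, threshold)          # external validation
print(f"internal: {internal_acc:.2f}  external: {external_acc:.2f}")
```

In this toy setting the internal estimate looks much better than the external one, because the held-out part shares the peculiarities of the training source. The size of the gap here is an artifact of the chosen offset and says nothing about any real system.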
An AI model can clearly continue learning after the official development process has ended, and the dynamics of such change vary. As a product or service on the healthcare market, an AI model can be:
1. Locked: the model may learn in the field, usually through analysis of feedback at the developer site, but it does not change during practical use.
2. Change by user: learning proceeds in the same way, but the health practitioner can select an appropriate working point during practical use.
3. Discrete change through learning: the model learns in the field itself and is updated through an explicit, distinct update by the developer or health practitioner.
4. Continuous change through learning: the model also learns in the field, but it is updated without explicit manufacturer or user interaction.
5. Hybrid form: the model continues to adapt until a human decides otherwise and returns it to a prior state.
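To make the "change by user" case more concrete, the sketch below assumes that the "working point" is a decision threshold applied to the scores of an otherwise unchanged model. That reading, the function name, and the six scored cases are assumptions for illustration. Moving the threshold trades sensitivity against specificity without retraining anything.

```python
# Hypothetical (model score, disease truly present) pairs; invented data.
SCORED_CASES = [
    (0.10, False), (0.35, False), (0.40, True),
    (0.60, False), (0.70, True), (0.90, True),
]


def sens_spec(cases, working_point):
    """Sensitivity and specificity when 'score >= working_point' means positive."""
    tp = sum(s >= working_point and y for s, y in cases)
    fn = sum(s < working_point and y for s, y in cases)
    tn = sum(s < working_point and not y for s, y in cases)
    fp = sum(s >= working_point and not y for s, y in cases)
    return tp / (tp + fn), tn / (tn + fp)


# The same locked model at two user-selected working points.
balanced = sens_spec(SCORED_CASES, 0.5)
screening = sens_spec(SCORED_CASES, 0.3)
print("balanced :", balanced)
print("screening:", screening)
```

A lower working point suits a screening setting, where missed cases are costly, while a higher one reduces false alarms. The model itself is identical in both cases.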
Hoc Group on Application of AI Technologies
Usually, medical device regulations impose strict limits on the significance of the changes an AI manufacturer may make before a new conformity assessment is required. Algorithms that change themselves during use can do so only within predefined boundaries that were taken into account during the conformity assessment.
Labeled (tagged) datasets are extremely important for AI development and validation in healthcare. A reference dataset is a structured set of biological, medical, health, social, demographic and other related data that has been prepared in advance (processed, tagged, labeled) for a specific clinical task while preserving data anonymity and patients’ rights. In medicine and healthcare, a reference dataset can include diagnostic images with information on the pathological changes they show (annotations); structured clinical cases and related documents from the EHR; and libraries of keywords, phrases and their critical combinations. If the dataset contains confirmed information on the final diagnosis for each case, it is called “verified”. According to Sergey Morozov et al. [4], the reference dataset should meet the following requirements [5]:
• the normal-to-abnormal ratio should reflect the prevalence of the target pathology in the population;
• several medical centers should contribute to the reference dataset to introduce data heterogeneity;
• demographic and socio-economic characteristics and basic health indicators in the reference dataset should correspond to the population’s average characteristics in the target region;
• the proposed size of the reference dataset should be justified by statistical considerations and by the desired diagnostic accuracy in terms of the main metrics indicated above;
• reference datasets used in clinical tests for registering the software as a medical device should not be publicly available (to exclude the possibility of training AI algorithms on reference datasets).
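The requirement that dataset size be statistically justified can be illustrated with a common textbook approach, which is not necessarily the method used in [4, 5]. It is a normal-approximation estimate of how many diseased cases are needed to measure sensitivity within a chosen margin, scaled by prevalence so that the normal-to-abnormal ratio reflects the population. The function name and all parameter values below are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def cases_needed(expected_sensitivity, margin, prevalence, confidence=0.95):
    """Return (diseased cases, total cases) for estimating sensitivity.

    Normal-approximation formula: n = z^2 * p * (1 - p) / d^2, where p is the
    expected sensitivity and d the acceptable half-width of the confidence
    interval. The total is then scaled by the prevalence of the pathology.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p, d = expected_sensitivity, margin
    n_diseased = math.ceil(z**2 * p * (1 - p) / d**2)
    n_total = math.ceil(n_diseased / prevalence)
    return n_diseased, n_total


# Illustrative inputs: sensitivity near 0.90, margin of 0.05, prevalence 25%.
print(cases_needed(expected_sensitivity=0.90, margin=0.05, prevalence=0.25))
```

The same calculation can be repeated for specificity (using the share of normal cases instead of prevalence), and the larger of the two totals taken. For rare pathologies the prevalence term dominates and the required dataset grows quickly.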
The methodology of reference dataset preparation is a specific topic discussed elsewhere.
There are three main types of companies that provide AI solutions for healthcare and related areas:
1. Vendors of EHR and PACS, which add AI capabilities to their products (for example, algorithms for image analysis, NLP to support clinical decision-making, etc.).
2. Big tech companies, which provide AI cloud platforms, services, and ML algorithms that health organizations use to build, manage, and deploy various AI applications with massive data.
3. Specialized healthcare AI companies and start-ups, which produce various kinds of AI healthcare applications, usually focused on small specific tasks.
In general, most applications of AI are narrow: current solutions can only carry out specific tasks or solve pre-defined problems. For healthcare, however, this approach seems to be effective and is widely adopted. Given the huge heterogeneity of data and the colossal risks, it is easier and more reliable to develop and train an AI-based system for a clear, very specific clinical task.
10.5 Problems and Challenges
Results of AI health applications depend on (1) data quality, (2) quality of algorithms (and hence of software programming generally), (3) limits of validity of their applicability, and (4) their proper implementation (e.g., in workflows) and other operational considerations. Beyond the availability of sufficient data, data quality is one of the primary constraints in implementing AI health applications (particularly machine learning, deep learning, and other applications dependent on big data). It is also the main factor determining the validity of results, captured in the well-worn phrase from the start of the computer age: garbage in, garbage out.
AI has serious potential to help solve important healthcare challenges, but it might be limited by two serious issues:
• quality and availability of standardized health data,
• inability to display some human characteristics such as clinical thinking and reasoning, compassion, emotional behavior, sharing of life experience, and intuition.
A key challenge is ensuring that AI is developed and used in a way that is transparent, explainable, safe and compatible with the interests of the public (including doctors, patients, society and industry). AI in healthcare promises great benefits to patients and health practitioners, but it equally presents risks to patient safety, health and data security. There is only one reasonable way to ensure that the benefits are maximized and the risks are minimized: health practitioners have to take an active role in the development of AI-based technologies. Their medical knowledge and clinical experience are vital for reasonable task definition, the creation of standards and methodologies, and overcoming limitations, as well as for dataset preparation, system validation and adherence to the evidence-based approach.
For example, according to the European Society of Radiology, almost 100% of doctors believe that radiologists will play a role in the development and validation of AI-based software. The majority think that they should supervise all development stages of an AI system (>64%) or help in task definition (>53%). By contrast, only about one third focus on just providing labelled images (>29%) or directly developing AI-based applications (>27%). More than 20% of radiologists are already involved in AI systems development and testing [6].
Thus, health practitioners can and have to be part of the development and use of AI. This will require rethinking and changes in behavior and in attitudes to education and careers. In the near future, the principles and basic methodologies of data science will be as much a part of a doctor’s competences as auscultation or injection. In any case, AI-based
software must be developed in a regulated way in partnership between clinicians and computer scientists. Patients’ rights and safety do not differ depending on whether AI is present in clinical activity. Safety has to remain paramount. It is important to understand safety as a need of both the patient and the health practitioner. Patients care about health and life. Doctors and nurses care about good practice and trust in technology.
A critical aspect is conformity with the national or regional data protection environment. On a global level, availability of data must be balanced against the individual’s right to personal data protection. AI can combine data coming from different information sources, each one containing anonymized data, but once that data is combined using AI, identification of individuals becomes possible. An ethical principle should be: do not use AI in the healthcare sector for unwanted or unintended re-identification of individuals by combining anonymized data from different sources.
Explainability
Regulations across the world require that medical treatments be explainable to and understood by patients, so that patients can give their informed consent to a planned medical intervention. However, AI systems may utilize advanced statistical and computational methods to determine a course of treatment that may not be easily understood by health practitioners, let alone patients. Modern machine learning algorithms are usually described as a “black box”, because it is too difficult for a human to understand how a conclusion based on the huge number of connections between artificial “neurons” was reached. Doctors, meanwhile, cannot trust a “black box”, because unclear decision-making creates serious risks to a patient. Doctors and nurses need to be able to justify their decisions, recommendations, diagnoses and predictions by reasoning, and to take responsibility for them.
There are two critical barriers for AI in healthcare, as mentioned by Heinrichs et al. [6]: “The first issue is that epistemic opacity is at odds with a common desire of understanding and potentially undermines information rights. The second (related) issue concerns the assignment of responsibility in cases of failure. Subsequently, we elaborate these issues in detail [7]”. The concept of explainable AI can potentially help to overcome these problems. The first step is making AI interpretable. Interpretability is an understanding of the links between cause and effect within an AI model: an observer can predict what will happen given a change in input or algorithmic parameters. Explainability is the next step: understanding and explaining, in human terms, the internal mechanics of a machine learning or deep learning system. Explainable AI means implementing transparency and traceability for statistical “black-box” machine learning and deep learning methods. Explainability of AI has become a mandatory requirement in healthcare. It should be described from the point of view of both the patient and the health practitioner.
Patient view
Explainability is best understood as effective contestability. This concept is explained in the patient-centric approach to AI usage in medicine
(particularly in diagnostics) proposed by Ploug and Holm [8]. According to this approach, patients should be able to contest the diagnoses of AI diagnostic systems. For this purpose, four types of information must be available:
1. The AI system’s use of data.
2. The system’s potential biases.
3. The system’s performance.
4. The division of labor between the system and healthcare professionals.
First, individuals have a right to privacy and to protect themselves against harm and risks. Exercising this right to contest the use of personal health and other data rests on two types of information. AI-based medical services therefore require that individuals have access to information about:
• the types of personal data used in AI diagnostics (e.g. clinical tests, images, biopsies, etc.),
• the sources of such data (e.g. the Electronic Patient Record), because the sensitivity and quality of data may depend critically on the source.
Second, individuals have a right to protect themselves against discrimination, including discrimination due to AI bias. Exercising this right requires that individuals have access to information about:
• characteristics of the dataset on which the model is built and validated,
• how the data for the dataset were selected and categorised by humans,
• the characteristics and level of testing of the AI model.
It is good practice for the developer to make an initial general statement of potentially relevant bias. Nevertheless, an individual has a right to an individual bias investigation.
Third, the right to contest the AI model’s performance is directly linked to the right to protect oneself against harm. Here individuals must have access to information about:
• performance of the AI model,
• trials and tests used to evaluate the performance,
• information about the key indicators of the diagnosis,
• alternatives to the suggested diagnosis,
• changes that will lead to a reconsideration of the diagnosis.
Fourth, the right to contest the division and organization of labor also protects individuals against harm and makes responsibility clear. To exercise this right, individuals must have access to information about:
• the role of AI in the clinical workflows,
• the role of health practitioners in the clinical workflows,
• legal responsibility for medical (diagnostic) procedures.
The AI developer and the health practitioner both have a duty to prepare and provide the information needed for effective contestation. In real life, most patients are unlikely
to contest the AI advice. They will be satisfied by the doctor’s explanation of the diagnosis and further tactics. Even so, the relevant information should exist and be updated regularly. One more duty exists: whenever a patient wishes, the patient has to be informed that an AI system has provided advice and that the health practitioner has used it.
Health practitioner view
AI interpretability refers to a health practitioner’s ability to understand the AI model itself, or at least a summary of it. In a clinical context, interpretability of AI provides knowledge for shared decision-making. Such AI allows humans to gain knowledge about the features considered, their integration and their weighting. This information is relevant for connecting the model’s output to data from health records, laboratory tests, imaging examinations, etc. The health practitioner can process and interpret the results of the AI model relative to information from various other sources and make an individual evaluation of the clinical case. Moreover, the final clinical decision can be clearly explained to the patient in an evidence-based manner, because interpretability makes it possible to estimate the probability of a result generated by AI.
In healthcare, explainable AI is needed for many purposes, including clinical practice and decision-making, professional education, and research. Medical professionals must be able to understand and retrace the machine decision process. AI explainability has to be realized on two levels [7]:
1. Model level: a human’s ability to understand the structure of the process, which provides a bridge to shared clinical decision-making.
2. Results level: an understanding of why this particular decision was made in this specific clinical case.
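The two levels can be illustrated with the simplest interpretable model, a logistic regression, in which the weights are visible (model level) and each feature’s contribution to one prediction can be listed (results level). The sketch below is hypothetical throughout: the feature names, weights and patient values are invented for illustration and have no clinical validity. Real deep learning systems need dedicated explanation methods that are not shown here.

```python
import math

# Model level: the full structure is a bias plus one weight per feature.
# All numbers are invented for illustration and are NOT clinically meaningful.
WEIGHTS = {"age_decades": 0.30, "systolic_bp_per_10mmHg": 0.20, "smoker": 0.80}
BIAS = -5.0


def predict_with_explanation(case):
    """Return the predicted probability and each feature's additive contribution."""
    contributions = {name: WEIGHTS[name] * case[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


# Results level: why did the model give this output for this particular case?
case = {"age_decades": 6.0, "systolic_bp_per_10mmHg": 15.0, "smoker": 1}
probability, contributions = predict_with_explanation(case)

print(f"predicted probability: {probability:.3f}")
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {value:+.2f}")
```

Because the contributions are additive on the logit scale, a practitioner can see which input dominated this case and predict how the output would move if an input changed, which is the sense of interpretability described above.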
Explainable results of AI can be directly integrated into clinical decisions and recommendations, as well as communicated to the patient. Given the current level of technology, developers have to make their algorithms interpretable and implement elements of explainable AI.
Transferability
AI can be well optimized for a specific task and still be incorrect, imprecise and ineffective on data it has not seen before. This is a very typical situation, which occurs due to:
• AI training on a limited dataset or on data from only one hospital,
• Lack of independent (external) blinded evaluation on real-world data.
A number of developers use a cross-validation (for example, leave-one-out) or hold-out approach: one dataset is divided into two parts, a training dataset and a testing dataset. The AI model is created and trained on the first part, and its accuracy is then validated internally on the testing dataset. This is not enough, since technically both “subsets” consist of data from the same source (one hospital or even a number of hospitals). The problem is that medical and health data are not standardized and always have some peculiar properties due to differences in clinical traditions and rules, population, medical device customization, protocols, etc. Thus, external validation on new (“previously not seen”) datasets is obligatory. Moreover, it should include two stages:
• Retrospective external validation on new reference datasets,
• Prospective external validation on real-world data.
This approach overcomes the problem of narrow applications that cannot generalize to clinical use.
Ethics
The use of AI in clinical medicine and health research raises many ethical issues [9]:
• potential to make erroneous decisions;
• responsibility when AI is used to support decision-making;
• difficulties in validating the outputs of AI systems;
• inherent biases in the data used to train AI systems;
• ensuring the protection of potentially sensitive data;
• securing public and professional trust in the development and use of AI technologies;
• effects on people’s sense of dignity and social isolation in care situations;
• effects on the roles and skill requirements of healthcare professionals;
• patients’ preparedness for and reactions to medical services (for example, image reporting) provided by an AI application alone, without supervision and approval by a physician;
• potential for AI to be used for malicious purposes.
Currently, more than 84 ethics initiatives have published reports describing high-level principles, tenets, and abstract requirements for the development and deployment of AI. An analysis of 36 prominent sets of such principles revealed the following 8 themes:
• privacy,
• accountability,
• safety and security,
• transparency and explainability,
• fairness and non-discrimination,
• human control of technology,
• professional responsibility, and
• promotion of human values.
These principles should be applied to every stage of the AI health application life-cycle, from needs determination and design through decommissioning. When pertinent, to promote trustworthiness, they can be applied in basic requirements for the development, deployment, use, and evaluation of AI health applications. Moreover, relevant principles and requirements can also be applied to the development and revision of health informatics standards pertaining to or encompassing AI health applications. In particular, whenever applicable there should be full transparency about an AI product throughout its life-cycle, e.g., how the algorithm works, what data were used to train it, what tests were conducted, how the trained product performed in such tests, experience with use in practice, etc.
Unsafe AI could harm patients across the national healthcare system. In reality, medical AI will help some patients but expose others to unforeseen risks. It seems that a doctor should have an automatic right to overrule an AI decision. One must always remember that AI tools can be confidently wrong, and a misleading algorithm is hard to identify.
AI will change or at least influence the doctor-patient relationship. Health practitioners will need to behave differently and learn to:
• interact with expert patients, who may have self-diagnosed with AI tools,
• preserve human contact to address patients’ loneliness, safeguarding and social needs as AI is introduced into healthcare.
Responsibility is still unclear. The question remains: who will be responsible for harm caused by AI mistakes, the developer, the IT company, the regulator or the health practitioner? National regulators and international authorities (the World Health Organization (WHO) and the International Telecommunication Union (ITU)) should resolve this question as soon as possible.
Currently, radiology is the medical specialty most involved with AI, so the opinion of radiologists can be extrapolated to other specialties. According to the European Society of Radiology survey, 41% of respondents believe that only the doctor will take responsibility for the AI outcome, while another 41% choose a scenario of shared responsibility between doctor, patient and AI developer. Exclusive responsibility of developers or of insurance companies is expected by 10% and 3.6% of doctors, respectively. Notably, more than half (55.4%) of doctors believe that patients will not accept a report made by an AI alone without supervision and approval by a health practitioner. Only 11.7% claim the opposite, and one third (32.9%) are still in doubt [6].
10.6 AI Systems in Healthcare
Currently, AI is used or trialled for a range of healthcare purposes, including detection, staging and monitoring of disease (in screening or in clinical environments), quality control, prognosis, quantitative measurement of biomarkers, support of structured reporting, management of chronic conditions, delivery of health services, productivity improvement, clinical decision support, and drug discovery. According to Chen et al. [2], as part of a hospital or healthcare digital environment, AI can accomplish the following:
• Unlock the power of big data and gain insight into patients;
• Support evidence-based decision-making, improve quality, safety, and efficiency, coordinate care and foster communication;
• Improve patient experience and outcomes;
• Deliver value and reduce costs;
• Optimize health system performance.
AI should standardize assessment and treatment according to up-to-date clinical guidelines and protocols, raising minimum standards and reducing unwarranted variation. Reasonable, well-trained and well-validated AI improves access to health services, providing advice in real time to health practitioners and patients, as well as identifying critical and dangerous situations (medical emergencies). There are a number of main AI applications in healthcare:
1. Clinical practice:
• Screening and detection of diseases at an early stage,
• Prognosis, risk stratification, prevention,
• Decision-making support on diagnosis,
• Decision-making support on treatment and clinical tactics,
• Management of medications (pharmacotherapy),
• Assistance during surgery/invasive manipulation, automated surgery,
• Patient monitoring, proactive screening via wearables and sensors embedded into the smart environment,
• Processing of medical images, test results and health records for various clinical tasks,
• Automated filling of medical documentation (sample generation, speech recognition),
• Personal support in self-control, self-diagnosis, healthy living.
2. Healthcare management:
• Healthcare system modeling,
• Epidemiology control,
• Predictive analytics,
• Quality control in healthcare,
• Medical education.
3. Research:
• Automated experiments,
• Automated data collection,
• Patient selection for clinical trials,
• Genome discovery,
• Biomarker discovery,
• Drug discovery and repurposing,
• Literature mining,
• “Omics” discovery.
Technically, AI in healthcare performs detection, classification, segmentation, processing (including natural language processing), comparison, prediction (prognosis), and recommendation generation with three types of data:
1. Imaging (still and moving, including radiology, endoscopy, dermoscopy, patient view, etc.).
2. Documents and speech (including various health records, patient or doctor speech, etc.).
3. Data streams (including statistics, epidemiology data, raw diagnostic data, etc.).
Thus, for proper, safe and effective use, AI has to be embedded into clinical workflows to solve specific tasks at the point of care. Electronic Health Records (EHR) are the backbone of modern digital healthcare systems. Therefore, the preferable approach is to integrate AI directly into EHR systems. In some countries, this approach is already recommended in clinical protocols and guidelines, at least for radiology. Integrating AI into the EHR opens up a number of capabilities. Based on Chen et al. [2], they can be systematized as follows:
1. Providing clinical decision support at the point of care to improve diagnostic accuracy and treatment recommendations:
• Diagnostic analytics using medical imaging or genomic, clinical, laboratory, behavioural and other data;
• Predictive analytics and personal risk assessment (selection of high-risk patients, outcome prognosis);
• Personalized treatment recommendations based on evidence-based practice (this is a unique ability of AI to join “narrow” individual data and “wide” clinical recommendations);
• Prediction and prevention of adverse events;
• Medication safety and reconciliation;
• Routine integration of various data for triage and critical care monitoring, diagnostic interpretation, and treatment modification.
2. Providing patient engagement technology to support self-care:
• Patient empowerment via access to personal health data and prognostic information;
• Patient engagement tools (chatbots, wearables, mobile devices) for supporting education of patients and family members, informed decision-making, self-monitoring, and self-management of chronic conditions;
• Pre-hospital support in emergency situations (including prediction of an acute situation, patient support, hospital information);
• Proactive screening via the smart environment;
• Channels for patients to interact with healthcare providers and on-line services;
• Extraction of crucial patient data from wearables, mobile devices, sensors in the smart environment and health apps, and integration of these data into the EHR.
3. Optimizing workflows and resource allocation, improving operational efficiency:
• Predictions of the number of patients during a specific period and of the resources needed (for example, during an epidemic);
• Integrated voice technologies in the EHR for clinical documentation;
• Integrated NLP for processing narrative health data and providing critical summaries of key patient information, as well as for quality control;
• Simplification of operational processes through AI automation.
4. Facilitating population health monitoring and management, improving wellness via data from the smart environment, social media, and various information systems (while preserving human rights and protecting personal data and confidentiality):
• Population health monitoring;
• Epidemic prediction;
• Predictive analytics for health service;
• Identification of high-risk population groups;
• Prioritization of at-risk patient populations and management of proactive interventions;
• Investigation of social determinants of health and management of population wellness.
5. Supporting real-world clinical research and evidence-based medicine:
• Collection and storage of real-world data for clinical research and care improvement;
• Precision medicine and clinical trial matching;
• Drug discovery;
• Patient selection for clinical trials;
• Biomarker discovery based on various data sources.
As mentioned above, the possibilities for using AI in healthcare are very wide. There is a recognized framework for any kind of AI use case in the field (Fig. 10.1). The framework defines four broad categories, with subcategories defined for the “Individual Health” category:
1. Population Health
2. Individual Health
a. Care Routing
b. Care Services
c. Prevention
d. Diagnosis
e. Acute Treatment
f. Follow-up and Chronic Treatment
3. Health Systems
4. Pharma & Medtech
Fig. 10.1 Framework of AI use cases in healthcare
Almost any AI use case can be categorized within this framework. Furthermore, specific attributes associated with each category will inform whether an AI use case would be subject, for example, to regulatory approval.
AI can enhance, extend, and expand human capabilities in medicine, delivering the types of care patients need, at the time and place they need them. In healthcare, AI cannot be used alone. A human–machine partnership is key to improving clinical effectiveness (quality, safety, and efficiency), access, and affordability of care. Nonetheless, complete automation is possible in some specific situations:
• tasks where AI has surpassed human performance (screening, peer review of health records for quality control, library search for data extraction, etc.),
• tasks where mistakes do not lead to serious consequences (primary prevention, flagging an at-risk population group for vaccination),
• situations where human health practitioners are unavailable but AI can help with information and support (for example, a chatbot for patient support and navigation during insulin self-injection).
In clinical practice, the key to the “doctor-AI” partnership is keeping the delicate balance between the types of care humans value and the levels of automation that technologies offer.
In an educational context, AI should be incorporated into simulations generating clinical scenarios across a range of specialities to enhance training and learning. The advancement of medical knowledge produces a sheer volume of new information that exceeds human abilities to keep pace in real time. AI can analyze large datasets and libraries across multiple sites to condense information for the health practitioner for clinical decision-making and lifelong learning. Moreover, AI, combined with other digital technologies, can personalize education by evaluating previous experiences, responses and outcomes to model the strengths and weaknesses of individual clinicians.
10.7 Quality and Safety of AI
Quality principles include health care that is safe, effective, patient-centered, timely, efficient, and equitable. Healthcare system goals include:
(1) enhancing patient experience,
(2) improving population health,
(3) reducing per capita health care costs,
(4) safeguarding/improving the work-life of health care providers,
(5) improving business processes,
(6) equity and inclusion.
Currently, more than 84 ethics initiatives have published reports describing high-level principles, tenets, and abstract requirements for the development and deployment of AI [10]. An analysis of 36 prominent sets of such principles revealed the following 8 themes:
(1) privacy,
(2) accountability,
(3) safety and security,
(4) transparency and explainability,
(5) fairness and non-discrimination,
(6) human control of technology,
(7) professional responsibility, and
(8) promotion of human values.
These principles apply to the development and deployment of AI health applications. They should be applied to every stage of the AI health application life-cycle, from needs determination and design through decommissioning. When pertinent, to promote trustworthiness, they can be applied in basic requirements for the development, deployment, use, and evaluation of AI health applications. Moreover, relevant principles and requirements can also be applied to the development and revision of health informatics standards pertaining to or encompassing AI health applications. In particular, whenever applicable there should be full transparency about an AI product throughout its life-cycle, e.g., how the algorithm works, what data were used to train it, what tests were conducted, how the trained product performed in such tests, experience with use in practice, etc.
The Nuffield Council on Bioethics declared: “The tech mantra of “move fast and break things” does not fit well when applied to patient care” [9]. Despite the hype, artificial intelligence in healthcare is still in its infancy and has hardly started. There are some positive forecasts about AI: it will deliver major improvements in healthcare quality and safety, reduce costs, and even bring about an imminent revolution in clinical practice. Nevertheless, there is no strong evidence for that. The medical community is very early in the evidence cycle, and it is unclear how well the predictions will match reality.
It may be difficult to apply current regulatory frameworks for health and medical technologies to applications utilizing artificial intelligence. For instance, medical device regulations across the world require validation that devices and processes produce reproducible and expected outputs or results. Many jurisdictions are modifying existing regulations and/or developing new regulatory frameworks to govern
the use of AI in healthcare. These relate, for instance, to the jurisdictions’ applicable regulatory frameworks for Software as a Medical Device (SaMD). Relevant initiatives are underway in the USA, the EU, the Russian Federation, Australia, Canada and elsewhere. Most of these jurisdictional approaches leverage a common risk categorization framework developed under the auspices of the International Medical Device Regulators Forum’s (IMDRF) SaMD working group.

AI applications can be classified by applying a risk-based approach. For example, the IMDRF categorizes SaMD along two dimensions and classifies it into one of four categories, ranging from low-risk to high-risk, taking account of:
• State of healthcare situation or condition,
• Significance of information provided by SaMD to healthcare decision.

Similarly, the EU Commission’s White Paper on Artificial Intelligence considers that a range of AI applications can exist in healthcare, and proposes an assessment of the level of risk of a given use based on the impact on the affected parties. For AI applications in the healthcare sector, additional factors might be taken into account when establishing a risk-classification framework, for example the degree of adaptivity (values could be “locked”, “discrete adaptive”, “continuously adaptive”) and the degree of autonomy. AI applications that have a high degree of adaptivity and/or a high degree of autonomy can be regarded as potentially of higher risk. It is not immediately clear how to regulate such applications under existing legislation, how to place such systems on the market, or how to operate them in a safe and effective manner and with certainty about potential liability. Currently, legislators and regulatory agencies across the world are actively working to create and harmonize rules for AI in healthcare.
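The two-dimensional categorization described above can be sketched as a simple lookup table. This is an illustrative sketch only, not a regulatory tool: the three levels on each axis and the category assignments reflect the IMDRF SaMD risk categorization matrix as it is commonly summarized and should be verified against the IMDRF document itself. The `needs_closer_review` helper is a hypothetical illustration of how the additional factors named above (adaptivity, autonomy) might be flagged; it is not part of any existing regulation.

```python
from enum import Enum


class Situation(Enum):
    """State of the healthcare situation or condition (assumed level names)."""
    NON_SERIOUS = "non-serious"
    SERIOUS = "serious"
    CRITICAL = "critical"


class Significance(Enum):
    """Significance of the information provided by the SaMD (assumed level names)."""
    INFORM = "inform clinical management"
    DRIVE = "drive clinical management"
    TREAT_OR_DIAGNOSE = "treat or diagnose"


class Adaptivity(Enum):
    """Degree of adaptivity, using the three values mentioned in the text."""
    LOCKED = "locked"
    DISCRETE = "discrete adaptive"
    CONTINUOUS = "continuously adaptive"


# Category "I" is the lowest risk, "IV" the highest.
# Assumption: values follow the commonly cited IMDRF matrix.
SAMD_CATEGORY = {
    (Situation.CRITICAL, Significance.TREAT_OR_DIAGNOSE): "IV",
    (Situation.CRITICAL, Significance.DRIVE): "III",
    (Situation.CRITICAL, Significance.INFORM): "II",
    (Situation.SERIOUS, Significance.TREAT_OR_DIAGNOSE): "III",
    (Situation.SERIOUS, Significance.DRIVE): "II",
    (Situation.SERIOUS, Significance.INFORM): "I",
    (Situation.NON_SERIOUS, Significance.TREAT_OR_DIAGNOSE): "II",
    (Situation.NON_SERIOUS, Significance.DRIVE): "I",
    (Situation.NON_SERIOUS, Significance.INFORM): "I",
}


def samd_category(situation: Situation, significance: Significance) -> str:
    """Look up the risk category for one combination of the two dimensions."""
    return SAMD_CATEGORY[(situation, significance)]


def needs_closer_review(adaptivity: Adaptivity, autonomous: bool) -> bool:
    """Hypothetical flag: adaptive or autonomous systems are potentially higher risk."""
    return adaptivity is not Adaptivity.LOCKED or autonomous


if __name__ == "__main__":
    print(samd_category(Situation.SERIOUS, Significance.DRIVE))
    print(needs_closer_review(Adaptivity.CONTINUOUS, autonomous=False))
```

Keeping the base category and the adaptivity/autonomy flag separate mirrors the text: the two IMDRF dimensions give the category, while adaptivity and autonomy are additional factors whose regulatory treatment is still open.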
Interpretation and guidance may be helpful for applying existing regulations to AI technology, and modifications, new regulations and new standards may be necessary to fill in possible gaps and to resolve ambiguities in existing regulations. Wherever possible, regulations and standards should be harmonized internationally to ensure that everyone has access to current state-of-the-art technologies and to prevent the development of technical barriers to trade that raise costs and limit access to healthcare. Based on experience with Management System Standards (MSS) such as ISO 9001:2015, certification of manufacturers of AI health applications can be expected:
(1) to improve product quality,
(2) to assist customers to select vendors.
Such certification may also be of value to regulators of medical products when using a risk-based approach. The term manufacturer encompasses both developers of commercial products (vendors) and healthcare and similar organizations that develop products for their own use. According to globally recognized practice, a manufacturer can place medical AI-based devices on the market for use on patients or their data only when these are safe and effective. Once the device is on the market, the manufacturer must perform clinical evaluations throughout the entire lifetime of the device, including post-market clinical follow-up, to prove that the assumptions remain valid and that no risks emerge that are
unacceptable. For that purpose, a clinical trial (test) should be performed. The objective of a clinical trial (test) is to confirm the effectiveness, safety of use, and compliance of the medical device characteristics with the intended use specified by the manufacturer. Usually, a clinical trial consists of two stages [5]:
1. Analytical validation.
2. Clinical acceptance.
Analytical validation refers to the evaluation of the correctness of input data processing by the software to create reliable output data, which is performed using reference datasets. Clinical acceptance (evaluation of the performance by using the software within a standard operating process) consists of two components:
• clinical correlation (evaluation of whether there is a reliable clinical relationship between the results and the target clinical condition),
• clinical validation (confirmation of achievement of the intended goal for the target population in the clinical workflow through the use of accurate and reliable output data).

Clinical tests are organized in accordance with national legislation and local or accepted international methodology for assessing the quality, effectiveness, and safety of medical devices. Various metrics can be used to assess AI. The standard set of diagnostic metrics includes:
1. Sensitivity, specificity, accuracy, the likelihood ratio of a positive or negative result, positive and negative predictive value.
2. Area under the receiver operating characteristic (ROC) curve, i.e., the area bounded by the ROC curve and the horizontal axis.
3. The agreement (concordance) of classification.
4. Degree of similarity.
5. Timing study.
6. Retrospective peer review (audit).
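As a minimal sketch of the first two items in this list, the metrics can be computed from a 2×2 confusion matrix and from paired labels and scores. The function and field names below are illustrative assumptions, and the example counts are invented for demonstration. The area under the ROC curve is computed here through its rank-based equivalent (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is one of several ways to obtain the same quantity.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts of index-test results against the reference test."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def diagnostic_metrics(cm: ConfusionMatrix) -> dict:
    """Standard diagnostic metrics; assumes no denominator is zero."""
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    total = cm.tp + cm.fp + cm.fn + cm.tn
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (cm.tp + cm.tn) / total,
        "ppv": cm.tp / (cm.tp + cm.fp),  # positive predictive value
        "npv": cm.tn / (cm.tn + cm.fn),  # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def roc_auc(labels: list, scores: list) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented counts for illustration only.
    example = ConfusionMatrix(tp=90, fp=20, fn=10, tn=180)
    for name, value in diagnostic_metrics(example).items():
        print(f"{name}: {value:.3f}")
    print("AUC:", roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a small reference dataset; a production evaluation would normally use a validated statistics library and report confidence intervals.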
Detailed information, definitions and formulas can be found elsewhere. Standard metrics are used to compare the diagnostic performance of the index test (AI-based software) relative to the reference test (another option for diagnostics, screening, decision-making, etc.). Thus, the results of AI healthcare applications depend on:
• data quality;
• algorithm quality (and hence the quality of software programming generally);
• limits of validity of their applicability;
• their proper implementation (e.g., in workflows) and other operational considerations.
Beyond the availability of sufficient data, data quality is one of the primary constraints in implementing AI health applications. It is also the main factor determining the validity of results, captured in the well-worn phrase from the start of the computer age: “garbage in, garbage out”. Assessing product/algorithm performance is different from assessing (and assuring) data quality. Certain aspects of algorithm quality can be assessed by using fit-for-purpose (FFP) standard data sets (1) to ensure that an algorithm performs reliably and (2) to compare the performance of different algorithms with the same purpose. Using multiple standard FFP data sets may provide insight into the validity of outputs with different data inputs. Further, using standard degraded data sets may help to gauge use-risk, i.e., to assess potential results produced by an AI health application that was trained using FFP data when it is used with the type of real-world data (RWD) expected to be encountered in practice. The validity of results in practice depends on the quality of RWD inputs, as well as the quality of algorithms or other machine-performed processes.
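The idea of gauging use-risk with a degraded data set can be illustrated with a toy example. Everything in the sketch below is an assumption made for illustration: the data are synthetic, the "algorithm" is a fixed threshold rule, and degradation is modeled as a constant calibration offset on the input. Real degraded data sets would capture the noise, missingness and bias expected in RWD.

```python
def accuracy(algorithm, dataset):
    """Share of records for which the algorithm output matches the reference label."""
    correct = sum(algorithm(value) == label for value, label in dataset)
    return correct / len(dataset)


def threshold_classifier(value, cutoff=50.0):
    """Toy locked algorithm: positive when the measurement reaches the cutoff."""
    return int(value >= cutoff)


def degrade_with_offset(dataset, offset):
    """Hypothetical degradation model: every measurement is shifted by a fixed offset."""
    return [(value + offset, label) for value, label in dataset]


# Synthetic fit-for-purpose reference set: measurements 0..99 with clean labels.
reference = [(float(v), int(v >= 50)) for v in range(100)]
# Degraded copy simulating a miscalibrated device that reads 8 units too high.
degraded = degrade_with_offset(reference, offset=8.0)

reference_accuracy = accuracy(threshold_classifier, reference)
degraded_accuracy = accuracy(threshold_classifier, degraded)
use_risk_gap = reference_accuracy - degraded_accuracy

if __name__ == "__main__":
    print(f"accuracy on reference data: {reference_accuracy:.2f}")
    print(f"accuracy on degraded data:  {degraded_accuracy:.2f}")
    print(f"performance gap:            {use_risk_gap:.2f}")
```

The same `accuracy` function can be applied to several algorithms on one standard data set, which corresponds to purpose (2) in the paragraph above; the gap between the two accuracies is only one possible summary of use-risk.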
10.8 Standardization of AI in Healthcare

AI has the potential to have a significant positive impact on healthcare and improve the quality, efficiency, and equity of healthcare delivery. However, as in any emerging field, there is a lack of regulatory guidance and standards regarding the use of AI in healthcare. Current regulatory frameworks may have gaps, or there may be questions related to interpreting and applying existing regulatory frameworks to health applications that incorporate AI solutions. Global standardization supports a harmonized approach and can reduce the burden on stakeholders when it comes to establishing regulatory frameworks and interpreting and fulfilling regulatory requirements. Standardization work should be focused on the following areas:
• Methods to measure and to reduce bias
• Methods to measure reliability
• Notions of reproducibility in non-deterministic systems
• Methods for explainability for various kinds of AI techniques (for example, for deep-learning neural networks).
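To make the third area more concrete, one possible notion of reproducibility for a non-deterministic system is decision stability: the share of cases that receive the same decision across repeated runs. The sketch below is a hypothetical illustration, not a standardized definition. The stochastic model, the jitter size and the decision threshold are all assumptions.

```python
import random


def stochastic_score(case_value, rng):
    """Hypothetical non-deterministic model: the input plus small random jitter."""
    return case_value + rng.gauss(0.0, 0.02)


def run_once(cases, seed, threshold=0.5):
    """One run of the system; fixing the seed makes the run repeatable."""
    rng = random.Random(seed)
    return [int(stochastic_score(c, rng) >= threshold) for c in cases]


def decision_stability(cases, seeds):
    """Fraction of cases that receive the same decision in every run."""
    runs = [run_once(cases, seed) for seed in seeds]
    stable = sum(len(set(decisions)) == 1 for decisions in zip(*runs))
    return stable / len(cases)


if __name__ == "__main__":
    # Cases close to the threshold are the ones most likely to flip between runs.
    cases = [0.10, 0.48, 0.50, 0.52, 0.90]
    print("stability over 20 runs:", decision_stability(cases, range(20)))
```

Two levels of reproducibility appear here: a run with a fixed seed is exactly repeatable, while across seeds only cases far from the threshold are guaranteed stable. A standard would need to state which of these notions it requires.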
Many different organizations are developing or have developed documents that are potentially relevant to the development of AI standards (for example, ISO/IEC JTC 1/SC 42 is developing ISO/IEC 22989, an AI glossary, and ISO/IEC 23894, titled “Risk Management”). The World Health Organization is working on a standardized assessment framework for the evaluation of AI-based methods for health, diagnosis, triage or treatment decisions. Other organizations (such as a number of ISO technical committees) are developing horizontal AI standards, many governments have published papers regarding the development and use of AI in multiple industries, regulatory agencies have published draft (or final) guidance documents specific to AI in healthcare, etc. There are many opportunities
to leverage their existing work, as well as to help in the development of future work products of those organizations. There are three non-exclusive categories of AI standards in healthcare:
1. AI Technologies and Applications.
2. AI in a Clinical Encounter.
3. AI in Clinical, Public Health and Research Sub-specialties.
The current standards landscape includes topics already addressed by a number of organizations:
• adaptive regulatory frameworks;
• definitions, vocabulary and general characteristics;
• recommended practice and basic principles of quality management;
• trustworthiness principles;
• ethical concerns;
• data privacy and safety;
• set of standards for human augmentation;
• set of standards for biotechnology;
• set of standards for AI in imaging.

Critically important for further progress are standards for:
• Clinical trials, evaluating the performance and validity of AI health applications, including both static (fixed until updated by the manufacturer) and dynamic (self-learning) algorithms used in AI/ML products;
• Dataset preparation and data labeling (tagging), including issues of describing, assessing, and communicating data quality (to assist manufacturers and users of AI health applications to decide if available data are fit-for-purpose and/or how they differ from data that are);
• Quality management systems for organizations that manufacture AI health applications or such intermediate products as standard data sets, and/or supply data for these purposes;
• Methods to measure and to reduce bias, and to measure reliability and performance;
• Notions of reproducibility in non-deterministic systems;
• Methods for explainability for various categories of “AI solutions” in healthcare.

Guidance and regulation (via national and international standards) are needed for manufacturers and users of AI in different sectors to increase the usability of, and confidence in, such systems. Manufacturers would benefit from the guidance produced by establishing standards for good manufacturing processes and practices, which may vary by type of AI health application. Required standards for manufacturing AI health applications based on machine learning, deep learning, and other data-dependent techniques should encompass inputs (e.g., data), processes (e.g., algorithms, interpretation, display/distribution), and results (including their implementation, limits on use, evaluation, etc.), as well as applicable management modules (whose settings may determine AI application processes and/or performance) and
environmental probes (which may set operational parameters). Resultant requirements should be expressed in the form of standardized checklists. While general principles may apply (so that some checklist items may be common to all such products), some checklist items may be specific to a type of AI health application. Checklist standards should specify personnel qualifications, experience, and/or training necessary to be able to use such checklists effectively (and, if applicable, associated certification requirements). Checklists could be the basis for developing a computerized decision support tool (CDST) to facilitate their use in practice and to document which requirements were considered when and with what results. Customers of AI health application manufacturers, and such individual end-users as clinicians, need information pertaining to the correct use of an AI product and the interpretation of resultant information. The scope of such required correct-use information to be provided by AI health application manufacturers should be standardized, in terms of contents and expression. Customers of AI health application manufacturers, such as health care organizations, can use such information to inform purchasing decisions, product installation, training, individual end-user guidance, etc. A corollary is that an organizational user has assessed the FFP of its available data. Further, organizational users should periodically repeat assessments of RWD quality so that they can gauge use-risks and can track the effectiveness of data quality improvement efforts. Regulators could require organizational health AI application users to submit results of their QMS assessments and/or product-produced process and outcome data to enable regulators to monitor product safety and performance across organizations and settings. 
Such requirements might extend to individual end-users so that organizational users of the AI health application can aggregate their experiences for reporting purposes. The information resulting from such reported data may support regulators in meeting their obligations to ensure that products on the market are safe and effective.

Various factors contribute to people having trust in AI systems; they can be grouped into aspects of efficacy, adoptability, and understandability:
• Knowing that the AI system has been developed according to the state of the art, and is operated by skilled/well-trained persons
• The system offers insights into its decision-making, by providing some form of transparency and by offering explanations that are understandable to the target audience
• The system operates reliably according to some measure of reliability
• The system is proven to make unbiased decisions, according to some measure of bias
• The system is verified and validated according to standardized, recognized software development methods that are well-suited to the system at hand, taking into account, for example, its degree of adaptability.

Many of these factors lend themselves well to standardization and are therefore seen as opportunities for standardization. This list is not exhaustive and just gives some examples:
• Standardized definitions (terms, concepts), once established, contribute to a common understanding among various stakeholders (among them legislators, operators, and manufacturers). Regulations can refer to and use these definitions, either directly in legal text or by making use of harmonized/recognized standards
• Verification and validation of software can be covered in software life-cycle standards
• Methods for explainability and transparency can be described in standards, based on the state of science, making such methods the state of the art.

To start addressing the need for regulatory guidance and standards, ISO/TC 215/AHG 2 was created in 2019 at the ISO/TC 215 meeting in Daegu, South Korea. A cross-functional team formed and divided the work into categories such as a landscape analysis, establishment of key principles, regulatory assessments, etc. Note that ISO/TC 215/AHG 2 does not create specific recommendations for specific updates to a specific standard; rather, it provides a series of resources (e.g., AI standards landscape, use case inventory, etc.) that can be used by teams performing an assessment of how AI might impact existing standards or require new standards. There is a roadmap to future directions in developing standards for AI health applications, a fast-evolving field. The ISO/TC 215 leadership should decide the direction of travel and the roads to be taken. Key recommendations include the following:
• Establish a mechanism to keep the landscape map up-to-date, fit for purpose, and accessible, in order to avoid duplication, overlaps, and conflicts in standards.
• Establish a mechanism to develop/maintain a dictionary of key terms, synonyms, abbreviations, etc. to be used in standards development, in order to standardize and to avoid confusion in the terminology used in standards.
• Issue guidance to TC conveners to review existing standards and establish priorities for revision, so as to include needed but missing provisions pertaining to AI health applications or to identify missing additional standards, in order to ensure that ISO/TC 215 and its standards remain relevant.
• Develop/maintain a checklist of AI health application considerations for use when revising/developing standards, in order to ensure that all relevant considerations are addressed.
• Develop/maintain standards for manufacturing, evaluating, and using AI health applications, including a management system standard for certifying organizations involved in the AI health application life-cycle/supply chain, in order to foster good practices and safe and effective products and to support regulators.
10.9 Conclusion

AI is playing an increasingly important role in the provision of medical care, in supporting medical decision-making, and in managing patient flows. In many countries, considerable attention is paid to the development and application of AI in medicine. In these countries, government programs are being developed and innovative solutions are being introduced. Therefore, the formation of unified approaches,
definitions, and requirements for AI in medicine will significantly increase the efficiency of its development and application. The tasks addressed by ISO/TC 215/AHG 2 are essential for the development of this direction of AI and will be extremely useful to the global community.

Acknowledgements The authors would like to thank the leadership of Technical Committee ISO/TC 215 “Health Informatics” and Subcommittee ISO/IEC JTC 1/SC 42 “Artificial Intelligence” of ISO Technical Committee ISO/IEC JTC 1 “Information Technology” for providing the opportunity to bring together such a wonderful international team and do useful and meaningful work.

*Hoc Group on Application of AI Technologies in Health Informatics (AHG2 TC215 ISO)
Paolo Alcini1, Pat Baird2, Peter Williams3, SB Bhattacharyya6, Todd Cooper7, Rich de la Cruz8, Chandan Kumar9, Gora Datta10, Dorotea Alessandra De Marco11, Andreas Franken12, Regina Geierhofer13, Peter G. Goldschmidt14, Ilkka Juuso15,16, Herman Klimenko4, Antonio Kung17, Frederic Laroche18, Joe Lewelling19, Martin Meyer20, Sergey Morozov21, Anna Orlova22, Telonis Panagiotis23, Thomas Penzel24, Derek Ritz25, Gaur Sander26, Soo-Yong Shin27, Alpo Värri28, Anton Vladzimerskiy4,21, Georgy Lebedev4,5 ([email protected]).

Affiliations
1 European Medicines Agency; 2 Royal Philips, Pleasant Prairie, USA; 3 Oracle Corporation; 4 I.M. Sechenov First Moscow State Medical University (Sechenov University); 5 Federal Research Institute for Health Organization and Informatics; 6 Bhattacharyyas Clinical Records Research & Informatics; 7 Trusted Solutions Foundry, Inc., San Diego, USA; 8 Silver Lake Group, Inc., Minnetonka, USA; 9 Indian Standards (BIS), New Delhi, India; 10 CAL2CAL Corporation, Irvine, USA; 11 Italian Data Protection Authority; 12 Bundesverband der Arzneimittel-Hersteller e.V. (BAH), Bonn, Germany; 13 Medizinische Universität Graz, Austria; 14 World Development Group, Inc.; 15 Cerenion Ltd.; 16 Center for Machine Vision and Signal Analysis at the University of Oulu; 17 Trialog, Paris, France; 18 LCI Consulting Inc, Montreal, Canada; 19 Association for the Advancement of Medical Instrumentation, Arlington, USA; 20 Siemens Healthineers; 21 Research and Practical Center of Medical Radiology, Department of Health Care of Moscow; 22 School of Medicine, Tufts University, Boston, USA; 23 European Medicines Agency, Amsterdam, The Netherlands; 24 Charite University Hospital, Interdisciplinary Center of Sleep Medicine, Berlin, Germany; 25 ecGroup Inc; 26 Centre for Development of Advanced Computing (C-DAC), India; 27 SungKyunKwan University, Seoul, Republic of Korea; 28 Tampere University, Faculty of Medicine and Health Technology, Tampere, Finland
References
1. A definition of AI: main capabilities and disciplines. European Commission High-Level Expert Group on Artificial Intelligence. 08.4.2018. https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=56341
2. M. Chen, M. Decary, Artificial intelligence in healthcare: an essential guide for health leaders. Healthc. Manag. Forum 33(1), 10–18 (2020). https://doi.org/10.1177/0840470419873123
3. K.H. Yu, A.L. Beam, I.S. Kohane, Artificial intelligence in healthcare. Nat. Biomed. Eng. 2(10), 719–731 (2018). https://doi.org/10.1038/s41551-018-0305-z
4. J. Tohka, M. van Gils, Evaluation of machine learning algorithms for health and wellness applications: a tutorial. Computers in Biology and Medicine 132 (May 2021). https://doi.org/10.1016/j.compbiomed.2021.104324
5. S.P. Morozov, A.V. Vladzymyrskyy, V.G. Klyashtornyy, A.E. Andreychenko et al., Clinical acceptance of software based on artificial intelligence technologies (radiology), in Best Practices in Medical Imaging, vol. 57 (2019), p. 45
6. European Society of Radiology (ESR), Impact of artificial intelligence on radiology: a EuroAIM survey among members of the European Society of Radiology. Insights Imaging 10(1), 105 (2019). https://doi.org/10.1186/s13244-019-0798-3
7. B. Heinrichs, S.B. Eickhoff, Your evidence? Machine learning algorithms for medical diagnosis and prediction. Hum. Brain Mapp. 41(6), 1435–1444 (2020). https://doi.org/10.1002/hbm.24886
8. T. Ploug, S. Holm, The four dimensions of contestable AI diagnostics - a patient-centric approach to explainable AI. Artif. Intell. Med. 107, 101901 (2020). https://doi.org/10.1016/j.artmed.2020.101901
9. Bioethics briefing note: artificial intelligence (AI) in healthcare and research. Nuffield Council on Bioethics (2018). https://www.nuffieldbioethics.org/assets/pdfs/Artificial-Intelligence-AI-in-healthcare-and-research.pdf
10. A. Jobin, M. Ienca, E. Vayena, The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019)
Chapter 11
Development of Artificial Intelligence in Healthcare in Russia A. Gusev, S. Morozov, G. Lebedev, A. Vladzymyrskyy, V. Zinchenko, D. Sharova, E. Akhmad, D. Shutov, R. Reshetnikov, K. Sergunova, S. Izraylit, E. Meshkova, M. Natenzon, and A. Ignatev
Abstract Research and development in the field of artificial intelligence in Russia has been conducted for several decades. Amid a global increase in attention to this area, the Russian Federation has also developed and systematically implemented its own national strategy, which includes healthcare as a priority sector for the introduction of AI products. Government agencies, in collaboration with the expert community and the market, are developing several key areas at once, including legal and technical regulation. Based on the IMDRF recommendations, software products created using AI technologies for application in the diagnostic and treatment process are regarded in Russia as Software as a Medical Device (SaMD). Over the last year, the Government of the Russian Federation, the Ministry of Health and Roszdravnadzor have made many targeted changes to the current legislation to enable state registration of software products based on AI technologies and their introduction to the market. More than 20 regions of the Russian Federation have launched various projects to implement AI technologies in real clinical practice. Work is underway to create the first series of national technical standards to accelerate product development and build greater trust among healthcare practitioners.

Keywords Artificial intelligence · Software as medical devices · Technical regulation · Legal regulation · Clinical practice

A. Gusev · S. Morozov · A. Vladzymyrskyy · V. Zinchenko · D. Sharova · E. Akhmad · D. Shutov · R. Reshetnikov
Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Moscow Health Care Department, Moscow, Russia

A. Gusev
K-Skai, LLC, Petrozavodsk, Republic of Karelia, Russia

G. Lebedev (corresponding author) · A. Vladzymyrskyy
I.M. Sechenov First Moscow State Medical University (Sechenov University), Moscow, Russia
e-mail: [email protected]

G. Lebedev
Federal Research Institute for Health Organization and Informatics, Moscow, Russia

R. Reshetnikov
Institute of Molecular Medicine, I.M. Sechenov First Moscow State Medical University, Moscow, Russia

K. Sergunova
National Research Center «Kurchatov Institute», Moscow, Russia

S. Izraylit · E. Meshkova
Non-Commercial Organization Foundation for Development of the Center for Elaboration and Commercialization of New Technologies (the Skolkovo Foundation), Moscow, Russia

M. Natenzon
National Telemedicine Agency, Research-and-Production Corporation, Moscow, Russia
Center for Big Data Storage and Analysis Technology, Center for Competence in Digital Economics, Moscow State University M.V. Lomonosov, Moscow, Russia

A. Ignatev
MGIMO University (Moscow State Institute of International Relations), Moscow, Russia

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_11
11.1 Introduction

In 2017–2019, virtually all of the leading states, including the United States, China, the United Kingdom and Russia, developed and published strategies or other program documents that define national goals, objectives, and plans for developing AI 0, 2, 3. As a rule, these documents set out important general principles, including approaches to AI regulation. Accordingly, each country began to create regulatory legal acts (RLA), advisory documents (soft law), various methodologies and guidelines, and technical standards in line with its general national strategy. Russia, participating in this process, has developed and adopted a comprehensive package of program documents that allowed implementation of the National Strategy to begin. In October 2019, the President of the Russian Federation V. Putin signed Decree No. 490 “On the Development of Artificial Intelligence in the Russian Federation”, which approved the “National Strategy for the Development of Artificial Intelligence over the Period up to the Year 2030”4.

It should be noted that the National Strategy was developed on the basis of extensive experience accumulated in Russia. Research on artificial intelligence began in the USSR in the 1960s. It built on a well-developed mathematical school, the emergence of computers and the beginning of space exploration. The natural development of these innovations led to the expansion of their applications, which are now described by the term “artificial intelligence”, itself not yet fully established. From the very beginning, this work was aimed, among other things, at applications in medicine. Examples of Russian innovations include: (i) mathematical analysis of electrocardiograms as time series for quick diagnosis in the ambulance and during population health screenings; (ii) remote monitoring and control of the condition of pilots and astronauts; (iii) a project to prevent sudden death associated with solar flares and their impact on cardiac patients, using data from high-apogee solar activity monitoring satellites; and many other projects.
The development of artificial intelligence in the Russian Federation aims to ensure the growth of the welfare and quality of life of its population, as well as national security and the rule of law, and to achieve sustainable competitiveness of the Russian economy, including leading positions in the world in artificial intelligence. The main directions of AI development in Russia are stimulating demand for and implementation of domestic products, AI safety, improving standards and technical regulations, and ensuring high-quality data sets. The National Strategy is the basis for the development and adjustment of AI state programs and federal projects in the Russian Federation.
11.1.1 National Strategy for AI in Healthcare of the Russian Federation

The prioritization of AI development in healthcare is a feature of the Russian National Strategy that distinguishes it from most national strategies of other countries. The introduction of AI in this area should contribute to the achievement of the strategic goals and objectives stipulated by the national project “Healthcare”, including reducing morbidity and mortality, increasing life expectancy, etc.5. For this purpose, based on the analysis of key Russian and global trends, six main directions of development have been identified: (1) transition from the model of “treating diseases” to the model of “saving and promoting health”; (2) transition in the provision of healthcare services to the implementation of the concept of P4 medicine (predictive, preventative, personalized, participatory); (3) transition from local systems to global, cross-border systems of medical care, and implementation of international projects to eliminate epidemics and pandemics of the most dangerous infectious and non-infectious diseases; (4) the use of information and telemedicine systems, artificial intelligence and big data analytics to ensure public availability and uniformly high quality of health care for the population; (5) development of new highly informative methods of diagnostics and treatment; (6) development of new forms of drugs.

Accordingly, the National Strategy for AI Development in Russia states that “the use of artificial intelligence technologies in healthcare contributes to the creation of conditions for improving the living standards of the population. It includes raising quality of the healthcare services, in particular, preventive examinations, image-based diagnostics, predicting the occurrence and development of diseases, selecting the optimal dosage of medications, reducing the threat of pandemics, automation and accuracy of surgical interventions”6.
To this end, a number of tasks are to be addressed at the level of federal and regional executive authorities: (i) Long-term support for research and innovations, (ii) Promoting the implementation of software based on AI technologies in public health authorities and medical organizations, (iii) Development of AI education, (iv) Raising public awareness about capabilities of AI technologies, (v) Development of legal and technical regulation, (vi) Export support
and promotion of Russian AI products in international markets, (vii) Creating incentives to attract investment in the development of science, research and AI products, (viii) Ensuring integrated safety in the process of creation and use of AI products.

The key strategic directions of AI development for healthcare in the Russian Federation are listed below.

Production of high-quality labeled datasets and provision of controlled access to them for scientific organizations and developers. This should improve scientific research and publications, increase competition between developers and, as a result, enhance the quality of software products and their ultimate effectiveness in practical health care.

Stimulation of the creation and development of a competitive market of software products in healthcare using AI technologies, and provision of financial support to cover the cost of using such products at all levels of healthcare. This will increase investment in the sector, the number of developments, and competition in this area, and will ultimately improve efficacy and practical use.

It is expected that the implementation of the assigned tasks for AI development in healthcare will allow Russia to achieve the following results: improved management efficiency, automation of routine operations, improved safety of medical activities and greater satisfaction of the population with medical care, and a single high standard of quality of medical care.
The main indicators of successful implementation of an AI strategy in the healthcare system are the following: (1) an increase in the number of companies developing AI products for healthcare; (2) an increase in the number of results of intellectual activity (patents, publications in Russian and international peer-reviewed scientific journals, etc.); (3) an increase in the number of products that have passed state registration, including software as a medical device; (4) an increase in the number of government agencies and organizations in the healthcare system using AI-based products to improve their productivity; (5) an increase in the number of scientific articles on AI in healthcare by Russian scientists and in their citation index in the world’s leading publications; (6) an increase in the number of accessible datasets, labeled and verified by qualified health professionals.
11.1.2 The Work of Government Agencies and the Expert Community on the Development of AI in Healthcare
In order to accomplish the national strategy for AI development, two federal projects, “Regulation of the digital environment” and “Artificial Intelligence”, are being implemented in the Russian Federation. The project “Artificial Intelligence” is currently at the final stage. By Decree of the Government of the Russian Federation No. 234 of March 2, 2019, a special management system was established for the project “Regulation of the digital environment”. The Autonomous Nonprofit Organization (ANO) “Digital Economy” and the Skolkovo Foundation participate in this management in
11 Development of Artificial Intelligence in Healthcare in Russia
addition to the Government of the RF and federal executive agencies. ANO “Digital Economy” represents large, medium, and small businesses as well as experts in AI development in the adoption of regulatory legal acts on AI in Russia. The Skolkovo Foundation, which is the center of competence for regulations, provides methodological support, drafts regulatory legal acts on AI and specific proposals for them, and participates in meetings at federal executive agencies and committees of the State Duma of the Russian Federation. Thus, this management system makes it possible to consider the opinions of business and experts and to reflect administrative, state, and business interests together in the adopted regulatory legal acts. It creates the basis for cumulative economic benefits, healthcare quality improvement, enhancement of citizens’ quality of life, and greater prosperity. A similar special management system operates for the federal project “Artificial Intelligence”. On December 22, 2020, a joint prospective standardization program for the priority area “Artificial Intelligence” was approved by the Ministry of Economic Development and the Federal Agency for Technical Regulation and Metrology of the Russian Federation for the period 2021–2024. The implementation of the program is planned in accordance with the action plans (roadmaps) of promising markets of the National Technology Initiative (NTI), as part of the programs of NTI Competence Centers and other interested organizations. The program is to be updated annually when the national standardization program for the following year is prepared.
The updated program will require approval by technical committees of adjacent areas, including, but not limited to: TC 022 “Information technologies”; TC 026 “Cryptographic information protection”; TC 045 “Railway transport”; TC 051 “System of design documentation”; TC 057 “Intelligent transport systems”; TC 098 “Biometrics and biomonitoring”; TC 194 “Cyber-physical systems”; TC 201 “Ergonomics, labor and engineering psychology”; TC 234 “Alarm and anti-crime protection systems”; TC 362 “Information security”; TC 461 “Information and communication technologies in education”; TC 468 “Health informatization”; TC 164 “Artificial intelligence”. This will ensure comprehensive work by the expert community, including in areas adjacent to technologies based on artificial intelligence. It is important to note that Russia has also prepared a draft order of the Ministry of Economic Development of the Russian Federation, which establishes rules for attributing projects supported by the Innovation Promotion Foundation and the Skolkovo Foundation to the field of artificial intelligence as part of the federal project “Artificial Intelligence” of the national program “Digital Economy of the Russian Federation”. The document will be of great importance because it formalizes the main criteria for attributing a project to artificial intelligence. In addition, the federal project “Digital Technologies” has been implemented in the country under the direction of the Ministry of Digital Development, Communications and Mass Media of the Russian Federation since November 2018. The key project goals are to ensure the technological independence of the state, the possibility of commercializing domestic research and innovations, as well as to accelerate the
technological development of Russian companies and the competitiveness of their AI-based products and solutions on the global market. The list of end-to-end digital technologies includes big data, new production technologies, the industrial internet, artificial intelligence, wireless communication technologies, robotics and sensor components, quantum technologies, distributed ledger systems, and virtual and augmented reality technologies. As part of the project, relevant roadmaps have been developed in the fields of artificial intelligence and neurotechnology. Besides the above-mentioned projects, the national project “Science” has been implemented in the country under the direction of the Ministry of Science and Higher Education of the Russian Federation since October 2018. As part of this project, world-class research centers in different areas, including the development of artificial intelligence, will be launched. The key goals of this national project are to make the Russian Federation one of the five leading countries in the world in research and development in the areas determined by the priorities of scientific and technological progress, to attract leading Russian and foreign scientists as well as promising young researchers to work in the Russian Federation, and to increase domestic expenditure on research and development. The main events and Russia’s important governance steps at the state level in the field of artificial intelligence can be found on the most authoritative international monitoring resource, the Artificial Intelligence Policy Observatory of the Organisation for Economic Co-operation and Development (AIPO-OECD), launched in February 2020. AIPO is a comprehensive analytical platform for reviewing policy measures and national program documents on artificial intelligence. One of the content blocks of the Observatory is an interactive database of countries’ initiatives, about 300 in total.
Currently, the Observatory lists 12 major Russian initiatives. In reality, the number of initiatives is significantly higher. At the moment, the country has a core group of actors shaping regulations and an appropriate legal framework for AI, as well as technological development and engineering practice (leaders in system design, implementation and operation). No less important, a foundation has been laid for a mature research community in the field of AI at the junction of law, philosophy, economics and statistics (across the entire value chain of AI systems), as well as in applied scientific disciplines. The leading Russian organizations most actively pursuing research and development in the field of AI are the Russian Academy of Sciences, the National Research University Higher School of Economics, Moscow State University, Sberbank of Russia, Southern Federal University, Novosibirsk State University and a number of others. On the scientific track, we can highlight the potential of the Russian mathematical school and the groundwork for developing competitive software, as well as promising research in neuroinformatics. We can also note the potential in the field of philosophical research, which can be applied successfully to humanitarian, social and cultural issues in the development of cybernetics, computer science, data science, and artificial intelligence. At the junction of the competencies of ethics specialists and practicing engineers, a balanced understanding of how ethical concepts and theories can be configured and
developed in the field of artificial intelligence is being formed in Russia. The emphasis is placed on verified scientific justifications, an evidence base, effective measurement tools, subject ontology, a comprehensive analysis of the potential consequences of engineering solutions, and thorough preliminary testing of hypotheses. The starting points of the main ethical and moral approaches to developing AI technologies are listed below.
• AI should be considered as part of the general paradigm of the development of the ecosystem of end-to-end digital technologies. In this regard, it is advisable to rely on and gradually develop the existing basis of ethical norms and standards applicable to the field of Information and Communication Technologies (ICT), computer science, and data. It is also advisable to refer to the ethical practices and tools that have already been developed in cybernetics, medicine, biotechnology and genetics, and to harmonize them in relation to interpenetrating convergent technologies and modern scientific, technical and humanitarian challenges.
• Follow the path of reasonable application of universal norms as well as subject- and industry-specific ethical approaches, considering the specifics of particular types of AI models/systems and the environment (domain) in which they are applied. In this context, a number of Russian authors have proposed approaches to developing an industry-specific codification of ethics in the field of AI. Such a model will allow a targeted and subject-oriented assessment of the potential impact of specific products and devices on a person and society under certain operating conditions and in relation to specific situations. In this regard, we can emphasize the importance of individual specialized studies in health care.
• Based on the diversity of humanity’s ethical concepts and theories, avoid attempts at hasty generalization and stereotypes, and consider the cultural, historical, spiritual, ideological and religious peculiarities of countries, nations and ethnic groups, as well as the complex, multi-faceted inner nature of humans and their ideas about welfare and justice.
In Russia, great attention is paid to the training of AI specialists. Leading Russian universities are training specialists in developing AI products, and the number of trainees is constantly increasing. On December 10, 2020, the first Russian Institute of Artificial Intelligence was opened in Innopolis, a specialized IT city built in the Republic of Tatarstan. According to TASS, the new division of the Russian IT University will be engaged in scientific and educational activities as well as in the development of AI projects in the oil and gas industry, medicine, geoinformation technologies, industry, new materials and microelectronics. The development of the Institute of Artificial Intelligence is expected to lead to the creation of an international consortium that will include large Russian and international companies. Innopolis University has been developing artificial intelligence technologies for the oil and gas industry, the energy sector, medicine, forestry and agriculture since 2014. In 2019, all AI projects were merged into a single center. From now on, the center and specialized laboratories will conduct research and applied development under the umbrella of the Institute of Artificial Intelligence.
The Russian National Strategy stipulates the development of international cooperation, including the exchange of specialists and the participation of domestic specialists in international conferences in the field of AI, as one of the measures for active promotion in the field of fundamental and applied research [1]. Representatives of government agencies and the expert community participate in the international dialogue on developing soft law instruments and creating a regulatory and legal framework at the venues of the Council of Europe (CAHAI Committee), UNESCO, ITU, UNCTAD, and OECD. Russian specialists are involved in ISO/IEC, IEEE, and ITU standardization projects. In addition to a large number of Global Digital Economy Forums organized by Russia, the country hosts annual international conferences on AI, including «AI Journey», «Neuroinformatics», «13th International Conference “Intelligent Data Processing”» (IDP-2020), «AI Global Dimension – Governance Challenges», «International Conference on Information Systems 2020», «Trustworthy AI». When the ITU Telecommunication Development Bureau established a Working Group on Telemedicine in 2002, a Russian representative became a co-chair of this group. Currently, the Working Group is focusing on expanding work on AI application in healthcare.
This was reflected in the “Tokyo Declaration 2018”, adopted at the Japan-Russia eHealth Workshop on November 25, 2018, to be sent as a joint Japanese-Russian contribution to the Development Bureau of the International Telecommunication Union, so that the Working Group could make recommendations on the practical use of telemedicine to other ITU member countries based on Japanese and Russian experience. In the declaration, seminar participants formulated the directions for joint projects: (1) research and development of medical artificial intelligence, including collective AI; (2) socio-economic studies on the efficacy of the implementation of the entire spectrum of healthcare technologies; (3) practical use of system solutions, technologies, stationary and mobile equipment for medical care of the population in remote areas; (4) personal Digital Home Doctor (PDHD) for health monitoring of the elderly, children, chronic patients and disabled people; (5) monitoring the health status of the working population; (6) a network that includes PDHD and telemedicine consulting centers in inpatient settings of all levels.
11.2 AI Regulations in Healthcare of the Russian Federation
Despite the explosive growth in investments and developments in AI for healthcare, ensuring safety and confidence when incorporating these technologies into routine clinical practice remains a challenge to this day [2]. The Russian authorities continue to improve regulations for AI-based software products for healthcare. These regulations focus on the following aspects: (1) ensuring safety, quality, and efficiency of AI products for healthcare; (2) setting the responsibility for the consequences of using AI; (3) establishing measures to protect data and confidentiality; (4) implementing cybersecurity measures; (5) protecting intellectual property. In his book “Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again” Eric Topol addresses the importance of the regulatory control over AI technologies in terms of the foreseeable risks: “Regulatory AI issues are especially pertinent in medicine. We are in the early days of regulatory oversight of medical algorithms, and only a limited number have been approved. However, the issue is not simply how many AI-based applications have been submitted to the FDA and how many more will be submitted for approval. These tools are and will be constantly evolving with larger datasets and autodidactic potential. This will require developing new ground rules for review and approval, conducting post-market control, and bringing on new personnel with AI expertise to regulatory agencies. Giving a green light to an algorithm that is not properly validated, or that is easily hacked, could have disastrous implications” [3].
11.2.1 Basic Principles of Regulations in Healthcare
The fundamental principles of legal and technical regulation apply to the interaction between humans and artificial intelligence in healthcare and to ethical standards. At the same time, it is recommended to avoid excessive regulation and instead to establish a legal framework and simplify administrative procedures to clear the way for testing and implementing AI products. In Russia, the legal and technical regulation of AI technologies, including in healthcare, is defined by the Government Decree No. 2129-p of August 19, 2020, which approved the “Concept for the regulation of Artificial Intelligence and robotics until 2024” [4, 5]. The concept establishes the following targets for the regulation of AI: securing simplified implementation of AI and robotics; providing the legal framework for the use of AI and robotics and facilitating the development of health insurance institutions; enhancing the data circulation mechanisms; establishing a national system for technical regulation and compliance; introducing measures to promote technological development.
Together with multiple targeted amendments to existing regulatory legal acts, the Federal Law No. 258-FZ of July 31, 2020, “On experimental legal regimes in the field of digital innovation in the Russian Federation” established the contingent framework for the implementation of digital products, including those based on AI technologies, and for access to depersonalized data. Such an approach is designed to pilot innovations. If successful, it would provide additional leverage and proof of concept for further development of the legislative and technical frameworks [6]. By the Federal Law No. 4-FZ of January 31, 2016, Russia ratified the “Agreement on common principles and rules for the circulation of medical devices (medical products and medical equipment) in the framework of the Eurasian Economic Union” [7]. The law is aimed at the development of a coordinated policy in the circulation of medical devices based on the rules and recommendations defined by the International Medical Device Regulators Forum (IMDRF). IMDRF is a voluntary group of medical device regulators from around the world who have come together to build on the strong foundational work of the Global Harmonization Task Force on Medical Devices (GHTF) and aim to accelerate international medical device regulatory harmonization and convergence. IMDRF was established in October 2011. The Russian Minister of Health is a member of the Management Committee. IMDRF has created a Working Group on Artificial Intelligence Medical Devices (AIMD). This Working Group aims to achieve a harmonized approach to managing artificial intelligence (AI) medical devices. It will cover machine learning-based medical devices, which represent AI technology applied to medical devices, and will further standardize terminology for machine learning-based medical devices among member jurisdictions. IMDRF treats software designed for clinical use as Software as a Medical Device (SaMD).
Since 2013, IMDRF has issued four documents concerning the Software as a Medical Device concept, listed below.
(i) “Software as a Medical Device (SaMD): Key Definitions”, issued in 2013. Its purpose is to help regulators establish a common approach to adopting comprehensive control measures concerning SaMD [8].
(ii) “Software as a Medical Device: Possible Framework for Risk Categorization and Corresponding Considerations”, issued in 2014. The paper addresses the identification of risks and offers internal control measures for SaMD. It also tackles general and particular issues essential to manufacturers, regulators, and users in order to streamline SaMD-related activities [9].
(iii) “Software as a Medical Device (SaMD): Application of Quality Management System”, issued in 2015. The document provides insights into quality management systems for better safety, efficacy, and functionality of SaMD [10].
(iv) “Software as a Medical Device (SaMD): Clinical Evaluation”, issued in 2017. The paper illustrates a general approach to planning the clinical evaluation of SaMD [11].
In Russia, AI-based software intended for use in healthcare is also considered SaMD. The Federal Law No. 323 “On the fundamentals of public health protection in the Russian Federation” includes article 38, under which special software designed
for clinical use is considered a part of the “medical device” umbrella term [12]. The circulation (i.e., production, advertising, sales, use, maintenance, etc.) of medical devices, including software, is allowed once the vendor obtains a marketing authorization through the established procedure. Article 6.28 of the Code of Administrative Offences [13] and article 238.1 of the Criminal Code [14] envisage administrative and criminal liability for offenders who allow the circulation of unregistered medical devices. To be categorized as a medical device, a software product has to meet the criteria outlined in the informational letter No. 02I-297/20 from the Federal Service for Supervision in Healthcare (hereinafter Roszdravnadzor) dated February 13, 2020 [15]. The letter defines the following four criteria: (1) the software must be a computer program that is not a part of another medical device; (2) it must be intended by the manufacturer for the provision of medical care; (3) it must interpret incoming data by appropriate means, including AI technologies; (4) its outputs must have an impact on clinical decision making. A software product is categorized as SaMD only if it meets all four criteria. Otherwise, the software cannot be considered SaMD; hence it cannot obtain state registration and be released on the market.
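The four criteria can be read as a simple conjunction: a product qualifies as SaMD only when every one of them holds. The following Python sketch illustrates that reading. The class, field and function names are hypothetical and chosen for illustration only; they are not taken from the Roszdravnadzor letter, and the sketch is not a substitute for the regulator's own assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareProduct:
    """Illustrative record; the field names are assumptions, not official terms."""
    name: str
    standalone_program: bool          # a computer program that is not part of another medical device
    intended_for_medical_care: bool   # intended by the manufacturer for the provision of medical care
    interprets_incoming_data: bool    # interprets incoming data, e.g. with AI technologies
    affects_clinical_decisions: bool  # outputs have an impact on clinical decision making


def unmet_criteria(product: SoftwareProduct) -> list[str]:
    """Return the labels of the criteria that the product does not meet."""
    checks = {
        "standalone computer program": product.standalone_program,
        "intended for medical care": product.intended_for_medical_care,
        "interprets incoming data": product.interprets_incoming_data,
        "affects clinical decision making": product.affects_clinical_decisions,
    }
    return [label for label, met in checks.items() if not met]


def is_samd(product: SoftwareProduct) -> bool:
    """All four criteria must hold; a single unmet criterion rules the product out."""
    return not unmet_criteria(product)
```

For example, a hypothetical appointment scheduler that neither interprets incoming data nor influences clinical decisions would fail two criteria, so `is_samd` would return `False` and `unmet_criteria` would name both gaps.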
11.2.2 Technical and Clinical Trials of Software as a Medical Device Created with the Application of AI Technologies
Current Russian legislation requires manufacturers to conduct independent technical and clinical trials that take potential risk factors into account, in accordance with the Nomenclature Classification of Medical Devices defined by the Order of the Ministry of Health No. 4n of June 6, 2012 [16]. This document defines four major risk groups: 1 (minimal), 2a, 2b, and 3 (maximal). Currently, all AI-based software is classified as a risk group 3 product. During the pre-registration phase, the manufacturer must develop technical and service manuals as per the Order of the Ministry of Health No. 11n dated January 19, 2017 [17]. The purpose of the technical trials is to assess whether a SaMD conforms to the submitted regulatory, guidance, technical, and service documentation. The purpose of the clinical trials is to evaluate the safety and efficacy of the SaMD using predesigned and mapped out systematic study procedures. The clinical trial phase begins once the technical trial is completed. The procedure for technical and clinical trials of medical devices is outlined by the Order of the Ministry of Health No. 2n of January 9, 2014 [18]. According to the Government Decree No. 1906 of November 24, 2020, clinical trials of SaMDs do not require a permit from Roszdravnadzor or approval from the Ethics Committee [19]. Plans are underway to issue a revised version of the Order of the Ministry of Health No. 2n. The updated document will tackle the specifics peculiar
to SaMDs, including those based on AI technologies. It will also outline a procedure for the approval of technical trial outcomes, including those of trials conducted under the supervision of the quality management system. The Quality Management System was built in line with the EEC requirements and harmonized by the law submitted to the State Duma on December 21, 2020. Unlike clinical trials of regular (hardware) medical devices, the clinical trial of a SaMD does not involve human subjects, but it requires the collection, analysis, and evaluation of clinical data obtained from the SaMD in routine clinical practice and/or from existing validated datasets. The procedure and other details of the clinical trials are outlined in the Clinical Trials Program. This document is drawn up by the SaMD’s manufacturer in collaboration with the medical facility responsible for the testing. The outlined specifications should be based on the provided technical and operational documentation. The complexity, duration, and other clinical trial parameters depend on the complexity of the SaMD. The main task of the specialists who conduct clinical trials is to collect and verify sufficient, informative and reliable evidence of the safety and efficacy of the SaMD. In some cases, the testing of the SaMD involves comparison with its alternatives to check for possible clinical correlation. The application for registration is rejected when sufficient evidence has not been gathered. Other grounds for rejection include misleading product claims related to clinical use and other representations in the submitted documentation, or an immediate threat to human life and health. Otherwise, the registration is approved. Should the clinical trials prove successful, the investigators sign a Clinical Evaluation Report that consists of the investigation certificates as outlined in the Order of the Ministry of Health No. 2n. The Order sets the criteria of sufficient clinical evidence for the safety and efficacy of the SaMD.
This is a core document that would establish the decision-making structure at the registration phase. The manufacturers have the right to choose the laboratory for technical trials and a contract research organization (CRO) responsible for clinical trials. The Order of the Ministry of Health No. 300n of May 16, 2013, defines the requirements for CROs, where the clinical trial may be conducted [20]. The list of CROs accredited by Roszdravnadzor is available at http://www.roszdravnadzor.ru/services/test_clinical. To initiate the clinical trial, the manufacturer concludes a contract with the selected medical facility. The Order of the Ministry of Health No. 1386n defines how to calculate the cost of the expert evaluation [21].
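The risk grouping and the grounds for rejection described in this subsection can be sketched as a small decision rule. The Python below is only an illustration under stated assumptions: the function and parameter names are hypothetical, the three boolean inputs stand in for expert judgments that are far richer in practice, and the rule that AI-based software falls into group 3 reflects the current situation described in the text rather than a permanent requirement.

```python
from enum import Enum


class RiskClass(Enum):
    """The four risk groups named in the text, from minimal to maximal."""
    CLASS_1 = "1"
    CLASS_2A = "2a"
    CLASS_2B = "2b"
    CLASS_3 = "3"


def assigned_risk_class(uses_ai: bool, declared: RiskClass) -> RiskClass:
    """Per the text, AI-based software is currently placed in risk group 3."""
    return RiskClass.CLASS_3 if uses_ai else declared


def rejection_reasons(sufficient_evidence: bool,
                      misleading_claims: bool,
                      immediate_threat: bool) -> list[str]:
    """Collect the grounds for rejection listed in the text (hypothetical labels)."""
    reasons = []
    if not sufficient_evidence:
        reasons.append("insufficient evidence of safety and efficacy")
    if misleading_claims:
        reasons.append("misleading product claims")
    if immediate_threat:
        reasons.append("immediate threat to human life and health")
    return reasons


def registration_outcome(sufficient_evidence: bool,
                         misleading_claims: bool,
                         immediate_threat: bool) -> str:
    """Rejected if any ground applies; otherwise approved."""
    reasons = rejection_reasons(sufficient_evidence, misleading_claims, immediate_threat)
    return "rejected" if reasons else "approved"
```

Keeping the reasons as a list rather than returning a bare boolean mirrors the fact that an application may be rejected on several grounds at once.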
11.2.3 State Registration of Software as a Medical Device Created with the Application of AI Technologies
State registration of medical devices to authorize circulation in the Russian Federation is the responsibility of the Federal Service for Supervision in Healthcare (Roszdravnadzor). The rules for the registration of medical devices are outlined in the Government Decree No. 1416 of December 27, 2012 [22], with changes and amendments introduced by the Government Decree No. 1906 of November 24, 2020, to extend the effect of the document to SaMDs, including those based on AI technologies [19]. The administrative regulation procedure is established by the Order of Roszdravnadzor No. 9260 of December 9, 2019 [23]. The primary goal of state registration is to conduct an independent expert assessment of the quality, efficacy, and safety of the SaMD based on the set of documents submitted by the manufacturer. The assessment is performed by experienced experts from the institutes subordinate to Roszdravnadzor: the Russian National Institute for Research, Development, and Testing of Medical Equipment and the National Institute of Quality (formerly the Center for Monitoring and Clinical and Economic Expertise). SaMDs, including those developed using AI technologies, are subject to the expedited registration procedure stipulated by the Government Decree No. 633 of May 31, 2018, which cannot exceed 20 business days. It consists of a single-phase expert assessment that does not require formal approval for a clinical trial [24]. The expert assessment procedure is set out by the Order of the Ministry of Health No. 1353n of December 21, 2012 [25], and the decision of the Council of the Eurasian Economic Commission No. 46 of February 21, 2016 [26]. The expert assessment is performed based on the general practices [27] and the guidelines reviewed by the expert organizations of Roszdravnadzor. The latest version was issued in February 2021 [28].
Upon completion of the expert assessment stage, Roszdravnadzor decides whether to authorize the registration of the SaMD. The corresponding notification is published in the dedicated public register, maintained under the Government Decree No. 615 of June 19, 2012 [29]. The online version of the register is available at http://www.roszdravnadzor.ru/services/misearch. A market authorization certificate for a SaMD is documentary proof of successful state registration. The market authorization template is defined by the Order of Roszdravnadzor No. 40-Pr/13 [30].
11.2.4 Post-registration Monitoring of Software as a Medical Device
The decision of the Eurasian Economic Commission No. 174 of December 22, 2015, mandates that after the medical device has been registered and launched to the market, it must be monitored for the safety of use in routine clinical practice [31]. Such
monitoring is governed by the Order of the Ministry of Health No. 980n of September 15, 2020 [32]. The monitoring aims to detect and prevent side effects and adverse events associated with the use of the medical device that may pose life-threatening risks to patients and health workers. The relevant data is collected and forwarded to the “Adverse Events from Medical Devices” database for further analysis and expert evaluation. Where necessary, the data is used to develop measures to prevent adverse scenarios in the future. The data comes from various sources, including public health information systems, user data, data provided by applicants, etc. According to the Order No. 1113n of October 19, 2020, the entities involved in the circulation of medical devices must notify Roszdravnadzor, within 20 business days, of facts and circumstances that endanger the life and health of patients and health workers during the use of medical devices. The manufacturer of such a medical device must submit a report on the adverse events and a plan for corrective actions [33]. Where an adverse event or harm to human health has been identified and confirmed, the manufacturer must take specific preventive measures according to the Order of the Ministry of Health No. 980n, which include awareness-raising campaigns among consumers [34]. Should the manufacturer fail to take the necessary steps, Roszdravnadzor has the right to withdraw the medical device from circulation.
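Because the notification period is expressed in business days, the calendar deadline depends on weekends and public holidays. The sketch below shows one way such a deadline might be computed in Python. It rests on assumptions that are not stated in the text: counting starts on the day after the event, weekends are Saturday and Sunday, and public holidays are supplied by the caller. The function names are hypothetical.

```python
from datetime import date, timedelta

NOTIFICATION_PERIOD_BUSINESS_DAYS = 20  # period named in the text


def add_business_days(start: date, days: int, holidays=frozenset()) -> date:
    """Return the date reached after counting `days` business days from `start`.

    Assumption: the start date itself is not counted, and Saturday and
    Sunday are non-working days. `holidays` is a caller-supplied set of dates.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


def notification_deadline(event_date: date, holidays=frozenset()) -> date:
    """Latest date for notifying the regulator under the assumptions above."""
    return add_business_days(event_date, NOTIFICATION_PERIOD_BUSINESS_DAYS, holidays)
```

Under these assumptions, an event identified on Monday, March 1, 2021 would give a deadline of March 29, 2021, or March 30, 2021 if March 8 is passed in as a public holiday.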
11.3 Technical Regulations of Artificial Intelligence in the Russian Federation
Along with legal regulation, Russia is seeing rapid growth in technical regulation for launching medical software developed using AI technologies. In July 2019, the Technical Committee for Standardization “Artificial Intelligence” (TC 164) was established by the Order No. 1732 of the Federal Agency for Technical Regulation and Metrology (Rosstandart) [35]. The committee deals with general AI issues and with improving the efficiency of standardization in this area. TC 164 is managed by Sergei Garbuk, Director for Research Projects of the Higher School of Economics, and Olga Mironova, Acting Executive Secretary of TC 164, is responsible for administrative and technical matters. A special Subcommittee SC01 “Artificial Intelligence in Healthcare” has been established as part of TC 164. It develops national and international standards for the development and conduct of testing procedures and for the use and operation of AI-based medical software. According to the Order of Rosstandart No. 3471 of December 31, 2019, SC01 was established to coordinate the unification and standardization of the development, testing, and operation of AI systems in healthcare, as well as to set certification requirements for medical devices that utilize AI technologies [36]. The primary agency of the Subcommittee is the Center for Diagnostics and Telemedicine of the Moscow Healthcare Department (DZM) under the leadership of
Prof. Sergei Morozov, MD, PhD, MPH, CIIP, Chief Regional Radiology and Instrumental Diagnostics Officer of DZM and of the Ministry of Health of the Russian Federation for the Central Federal District. SC 01 currently includes 35 agencies and 28 external experts. Following the plan devised by TC 164, the experts of the Center for Diagnostics and Telemedicine, together with the SC 01 members, are working on the first seven national standards of the series “Artificial Intelligence Systems in Clinical Medicine”: “Clinical Trial”, “Program and Methodology of Technical Trials”, “Management of Changes in Artificial Intelligence Systems with Adaptive Algorithms”, “Assessment and Control of Operational Parameters”, “Requirements for the Structure and Application of a Dataset for Algorithm Training and Testing”, “General Requirements for Operations”, and “Life Cycle Processes”. The new national standards will regulate critical aspects of the use of AI in healthcare and its role in medical decision-making.
11.4 Practical Experience of Artificial Intelligence in Healthcare of the Russian Federation
The first pilot projects using commercial-grade AI software were launched in Russia in 2018. One such project introduced AI systems into the healthcare system of the Yamal-Nenets Autonomous Okrug, using the Botkin.Ai medical image analysis system and the Webiomed predictive analytics system. It helped gather real-life data on how doctors perceive AI systems in clinical practice and on the potential effects of these products for healthcare officials and clinicians [37, 38]. Another example is a project by the company PhthisisBioMed, which uses multilevel neural networks for automated diagnosis and classification of tuberculosis from fluorograms obtained in mass screening of the Russian population. The software was developed as part of the project “Cloud Technologies for processing and interpreting medical diagnostic images using Big Data analysis tools” of the NTI Competence Center under the auspices of the Lomonosov Moscow State University in the direction of “Big Data storage and Analysis Technologies”. This software is currently used in 53 regions of the Russian Federation. Another interesting case is the AI-based “CT Calculator”, a neural network for assessing the degree of lung damage. It was developed by scientists from the Lomonosov Moscow State University together with the Moscow Healthcare Department. The results of blood tests, oxygen saturation, and the general clinical presentation of patients diagnosed with COVID pneumonia were compared with CT scans of the same patients. The trained neural network will assist physicians in predicting the likelihood of mild (CT 0–1), moderate (CT 2), or severe (CT 3–4) pneumonia and in deciding on further treatment tactics. In some cases, no computed tomography is required if the
calculator predicts a mild form of pneumonia. In other cases, immediate hospitalization, a CT scan or X-ray, and active treatment are necessary. “CT Calculator” is already integrated into the Moscow UMIAS system. The new service will become an additional tool for regions that, unlike Moscow, do not have broad access to computed tomography facilities.
The Moscow project for the introduction of AI technologies in healthcare is considered to be the largest in Russia. On April 24, 2020, the government adopted Federal Law No. 123-FZ “On conducting an experiment to establish special regulation for the purpose of creating the necessary conditions for the development and deployment of artificial intelligence technologies in a constituent entity of the Russian Federation, the federal city of Moscow”, together with amendments to Articles 6 and 10 of the Federal Law “On Personal Data.” This law establishes the experimental legislative framework for five years. It sets out the goals, objectives, and basic principles of the framework and regulates the relations arising under it. Federal Law No. 123-FZ empowered the Moscow Government to determine the conditions and procedures for developing and implementing artificial intelligence technologies and for using the results of their application. On February 18, 2020, the Moscow Healthcare Department started to accept applications for participation in the Experiment on using AI/computer vision technologies (hereinafter AI/CV) to analyze medical images. Developers of AI/CV software who were ready to present programs for seamless integration into radiology departments’ workflows were invited to participate in the tender. The Experiment is aimed at investigating applications of clinical decision support systems based on AI technologies in radiology departments of Moscow healthcare facilities.
The first phase ended in December 2020, yielding recommendations for radiologists on using new tools based on computer vision technologies. The Experiment was launched under the Moscow Government Decree No. 1543-PP of November 21, 2019 “On conducting an experiment on the use of innovative computer vision technologies to analyze medical images and further application in the Moscow health care system.” It is being conducted under the Order No. 43 of January 24, 2020 of the Moscow Healthcare Department “On approval of the procedure and conditions for conducting an experiment on the use of innovative computer vision technologies for the analysis of medical images and further application in the Moscow healthcare system.” The Experiment required the development and approval of regulatory documents for the evaluation and testing of AI software by the Moscow Healthcare Department. As a result, the Moscow experiment has become the most ambitious project to introduce AI into healthcare in Russia. Its first stage included the launch of computer vision technologies for analyzing medical images in the Moscow radiology services. This is the first Russian scientific study of such a scale, engaging almost every clinical institution in the city. As a result, the quality and productivity of the city’s diagnostic radiology services have improved. The Experiment is conducted by the Moscow Healthcare Department, hosted by the Center for Diagnostics and Telemedicine, together with specialists from the Information
Technology Department. The Experiment, carried out as a prospective study, was approved by the Independent Ethics Committee of the Moscow Regional Branch of the Russian Society of Radiology and registered on the clinicaltrials.gov platform. The platform for the Experiment is the Unified Radiological Information Service (URIS), which operates as part of the Unified Medical Information and Analytical System of Moscow (UMIAS). At the moment, 683 pieces of equipment are connected to URIS: computed tomography and magnetic resonance imaging scanners, as well as mammography, fluorography, angiography, and X-ray diagnostic devices, which together produce more than 400,000 radiology examinations per month. Russian and international IT companies, start-ups, and IT departments of large technology companies with a finished product tested in clinical practice, in particular a service based on computer vision technologies, were invited to participate in the Experiment. These services analyze three imaging modalities: digital mammography, computed and low-dose computed tomography (CT/LDCT), and digital chest X-ray images. There are plans to extend the Experiment to magnetic resonance imaging, ultrasound diagnostics, and electrocardiography. The main goal of the Experiment is to comprehensively investigate the capabilities of AI-based software (computer vision) operating in Moscow outpatient and inpatient healthcare facilities. Based on the outcomes, recommendations on the safest and most accurate application of AI technology in diagnostic radiology will be developed. Eventually, this will also form a basis for national and international standards.
By the end of 2020, various projects on the introduction of AI software had been launched in more than 20 Russian regions. At the same time, the COVID-19 pandemic became a critical driver of interest in AI in Russia.
According to some experts, the Russian market for AI in healthcare is at an early stage of development. Funding is provided by the Russian Academy of Sciences, the Ministry of Education and Science of the Russian Federation, National Research Centers (Moscow State University, the Moscow Institute of Physics and Technology), and Sberbank of Russia. Many start-ups are funded through venture capital and grants from development institutions such as the Skolkovo Foundation and the Russian Venture Company. Three AI-based healthcare software products were registered in Russia by the end of 2020. Webiomed, an AI-based predictive analytics and risk management platform in healthcare, was the first system to pass independent technical and clinical trials. In spring 2020, it was registered by Roszdravnadzor as Software as a Medical Device (https://webiomed.ai). Two medical image analysis products obtained a license in November and December 2020, respectively: Botkin.Ai, an AI-based platform for analyzing and processing medical images (https://botkin.ai), and Care Mentor Ai, an AI-based service platform for diagnostic radiology (https://carementor.ru). Medical software companies have great expectations for 2021. For example, clinical decision support systems are expected to be widely introduced in medical image analysis and predictive analytics, and AI technologies are expected to be implemented directly into clinical practice.
In 2021, Russia plans to integrate AI services into the workflow of medical organizations and make them a standard of clinical practice. Clinical validation of various automated services will play an essential role in this process. It is planned to upgrade the existing computer vision systems, to create new ones for analyzing three-dimensional medical images, and to develop the analysis of head MRI and CT scans. These modalities are vital for diagnosing tumors and vascular and degenerative-dystrophic diseases of the central nervous system.
11.5 Chapter Summary
In general, the Russian Federation has created the necessary legal conditions for the active development and implementation of AI technologies in healthcare practice. The process of independent technical and clinical trials of AI-based software products is regulated in detail. At the same time, on the basis of the national strategy for the development of AI and the “Concept for the regulation of artificial intelligence and robotics until 2024”, Russia continues to work on improving this regulation. Roszdravnadzor, together with the Ministry of Health and industry experts, is actively seeking a balance between creating conditions for the accelerated launch of products to the market and improving state control over the safety and efficacy of AI products, taking into account the potential risks and problems caused by the peculiarities of AI technologies. The state agencies of the Russian Federation and the expert community consider the following to be priority measures for improving legislative and technical regulation:
(1) Reducing the time of clinical trials and examination of AI products before their state registration by further improving the procedures for clinical assessment of the safety and efficacy of the SaMD;
(2) Creating conditions for conducting clinical trials on the basis of high-quality labeled datasets;
(3) Developing requirements for the quality management system, together with state control of its availability and level, in order to increase trust in developers and to provide successful developers with simplified or accelerated procedures for launching the SaMD to the market;
(4) Authorizing the release of new product versions by notifying the regulator, without the need for clinical retesting;
(5) Developing post-registration monitoring of the SaMD, including the creation of independent services for automated control of AI product operation and automatic registration of adverse events in the relevant information system of Roszdravnadzor, for example, when the accuracy metrics of AI models fall below the values indicated in the SaMD specifications.
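Measure (5) can be illustrated with a minimal sketch of such an automated check. It assumes a hypothetical SaMD specification that declares minimum sensitivity and specificity; the class and function names, the threshold values, and the record format are illustrative assumptions and do not describe any existing Roszdravnadzor system or interface.

```python
from dataclasses import dataclass


@dataclass
class DeclaredSpec:
    """Minimum value that a (hypothetical) SaMD specification declares for one metric."""
    metric: str
    minimum: float


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity and specificity from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def find_degradations(observed: dict, specs: list) -> list:
    """Return one record per metric that fell below its declared minimum."""
    return [
        {"metric": s.metric, "declared": s.minimum, "observed": observed[s.metric]}
        for s in specs
        if observed[s.metric] < s.minimum
    ]


# Assumed declared values and assumed counts from one monitoring period.
specs = [DeclaredSpec("sensitivity", 0.90), DeclaredSpec("specificity", 0.85)]
observed = confusion_metrics(tp=80, fp=10, tn=90, fn=20)
events = find_degradations(observed, specs)

for event in events:
    print(f"candidate adverse event: {event}")
```

In this sketch, each returned record is only a candidate for registration; how such a record would be reviewed and submitted is outside what the chapter describes.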
References
1. A. Kuleshov, A. Ignatiev, A. Abramova et al., Addressing AI ethics through codification, in International Conference Engineering Technologies and Computer Science (EnT) (2020), pp. 24–30
2. S. Gerke, T. Minssen, G. Cohen, Ethical and legal challenges of artificial intelligence-driven healthcare, in Artificial Intelligence in Healthcare (2020), pp. 295–336
3. E. Topol, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (Basic Books, New York, 2019)
4. Decree of the Government of the Russian Federation No. 2129-r, http://publication.pravo.gov.ru/Document/View/0001202008260005. Accessed 19 Aug 2020
5. A.V. Gusev, S.P. Morozov, V.A. Kutichev et al., Legal regulation of artificial intelligence software in healthcare in the Russian Federation. Med. Technol. Assess. Choice 1(43), 36–45 (2021)
6. Decree of the President of the Russian Federation No. 490 On the Development of Artificial Intelligence in the Russian Federation and adoption of National Strategy for the Development of Artificial Intelligence over the Period up to the Year 2030, http://www.kremlin.ru/acts/bank/44731. Accessed 10 Oct 2019
7. Decree of the President of the Russian Federation No. 204 On the National Goals and Strategic Tasks of the Development of the Russian Federation for the Period up to 2024, https://minenergo.gov.ru/view-pdf/11246/84473. Accessed 7 May 2018
8. IMDRF/SaMD WG/N10:2013 Software as a Medical Device (SaMD): Key Definitions, http://www.imdrf.org/docs/imdrf/final/technical/imdrf-tech-131209-samd-key-definitions-140901.pdf. Accessed 9 Dec 2013
9. IMDRF/SaMD WG/N12:2014 Software as a Medical Device: Possible Framework for Risk Categorization and Corresponding Considerations. International Medical Device Regulators Forum, http://www.imdrf.org/docs/imdrf/final/technical/imdrf-tech-140918-samdframework-risk-categorization-141013.pdf. Accessed 18 Sep 2014
10. IMDRF/SaMD WG/N23:2015 Software as a Medical Device (SaMD): Application of Quality Management System. International Medical Device Regulators Forum, http://www.imdrf.org/docs/imdrf/final/technical/imdrf-tech-151002-samd-qms.pdf. Accessed 2 Oct 2015
11. IMDRF/SaMD WG(PD1)/N41R3:2016 Software as a Medical Device (SaMD): Clinical Evaluation. International Medical Device Regulators Forum, http://www.imdrf.org/docs/imdrf/final/technical/imdrf-tech-170921-samd-n41-clinical-evaluation_1.pdf. Accessed 21 Sep 2017
12. Federal Law No. 323 On the fundamentals of public health protection in the Russian Federation, http://base.garant.ru/57499516. Accessed 21 Nov 2011
13. The Code of Administrative Offences of the Russian Federation, 195-FZ Article 6.28 Violation of the existing medical device rules, http://finansovyesovety.ru/statya6.28-koap-rf/. Accessed 30 Dec 2001
14. Criminal Code, 63-FZ Article 238.1. The production, import or sale of falsified, inferior, or unregistered medicines or medical devices, as well as the circulation of unregistered falsified active additives containing pharmaceutical substances, http://finansovyesovety.ru/statya238.1uk-rf/. Accessed 13 Jun 1996
15. Informational letter No. 02I-297/20 from Roszdravnadzor, https://roszdravnadzor.ru/medproducts/registration/documents/65752. Accessed 13 Feb 2020
16. Order of the Russian Ministry of Health No. 4n On approval of the nomenclature classification for medical devices, http://www.roszdravnadzor.ru/documents/121. Accessed 6 Jun 2012
17. Order of the Ministry of Health No. 11n About approval of requirements to the content of technical and operational documentation of medical devices manufacturer, http://base.garant.ru/71626748/. Accessed 19 Jan 2017
18. Order of the Ministry of Health No. 2n On approval of the procedure for the assessment of the compliance of medical devices in the form of technical trials, toxicological studies and clinical trials for medical device state registration purposes, http://base.garant.ru/70631448/. Accessed 9 Jan 2014
19. Government Decree No. 1906 On amendments to the rules for state registration of medical devices, https://www.garant.ru/products/ipo/prime/doc/74862496/. Accessed 24 Nov 2020
20. Order of the Ministry of Health No. 300n Approval of the requirements for medical institutions conducting clinical trials of medical devices and procedures of establishing compliance of medical institutions with these requirements, http://www.roszdravnadzor.ru/documents/106. Accessed 16 May 2013
21. Order of the Ministry of Health No. 1386n On approval of the methodology for determining the amount of payment for examination and testing of medical devices for state registration of medical devices and the maximum amount of fees for examination and testing of medical devices for state registration of medical devices, http://docs.cntd.ru/document/902314476. Accessed 22 Nov 2011
22. Government Decree No. 1416 On approval of the state registration of medical products, http://www.roszdravnadzor.ru/documents/121. Accessed 27 Dec 2012
23. Order of Roszdravnadzor No. 9260 On approval of the administrative regulations of the Federal Service for Supervision in Healthcare for the implementation of state control over the circulation of medical devices, http://publication.pravo.gov.ru/Document/View/0001202002200032. Accessed 9 Dec 2019
24. Government Decree No. 633 On amending the rules for state registration of medical devices, http://government.ru/docs/32763/. Accessed 31 May 2018
25. Order of the Ministry of Health No. 1353n On approval of the procedure for medical product quality, efficacy and safety review with changes and amendments approved by the Order of the Ministry of Health No. 303n, http://www.roszdravnadzor.ru/documents/115. Accessed 3 Jun 2015
26. Decision of the Council of the Eurasian Economic Commission No. 46 On the rules for registration and expert assessment of safety, quality and efficacy of medical devices, https://docs.eaeunion.org/docs/ru-ru/01510767/cncd_12072016_46. Accessed 12 Feb 2016
27. Guidelines on the procedure for conducting the examination of the quality, efficacy and safety of medical devices for the state registration, http://www.roszdravnadzor.ru/documents/34129
28. Guidelines on the procedure for conducting the examination of the quality, efficacy and safety of medical devices (in terms of software) for the state registration within the framework of the national system, https://roszdravnadzor.gov.ru/medproducts/documents/71502. Accessed 12 Feb 2021
29. Government Decree No. 615 On approval of the Rules for maintaining the state register of medical devices and organizations (individual entrepreneurs) engaged in the production and manufacture of medical devices, http://docs.cntd.ru/document/902353655. Accessed 19 Jun 2012
30. Order of Roszdravnadzor No. 40-Pr/13 On approval of the form of registration certificate for a medical device, http://base.garant.ru/70329006/. Accessed 16 Jan 2013
31. Decision of the Council of the Eurasian Economic Commission No. 174 On approval of the rules for monitoring of safety, quality and efficacy of medical devices, https://docs.eaeunion.org/docs/ru-ru/01510767/cncd_12072016_46. Accessed 22 Dec 2015
32. Order of the Russian Ministry of Health No. 980n On approval of the Procedure monitoring of safety, quality and efficacy of medical devices, http://www.roszdravnadzor.ru/documents/121. Accessed 15 Sep 2020
33. Order of the Ministry of Health No. 1113n On approval of the Procedure for reporting by the subjects of circulation of medical devices every detection of side effects not specified in the package leaflet or the user manual of a medical device; about adverse reactions during its use; about the peculiarities of the interaction of medical devices between themselves; about the facts and circumstances that pose a threat to the life and health of citizens and medical workers during the use and operation of a medical device, http://publication.pravo.gov.ru/Document/View/0001202012070057. Accessed 19 Oct 2020
34. Order of the Russian Ministry of Health No. 980n On approval of the Procedure monitoring of safety, quality and efficacy of medical devices, http://publication.pravo.gov.ru/Document/View/0001202011020039. Accessed 15 Sep 2020
35. Order No. 1732 On amendments to the program of national standardization for 2020, approved by Order No. 2612 of the Federal Agency for Technical Regulation and Metrology, http://docs.cntd.ru/document/566051442. Accessed 19 Oct 2020
36. On amending the Order of the Federal Agency for Technical Regulation and Metrology No. 1732 On establishment of the technical committee for standardization ‘Artificial Intelligence’, http://docs.cntd.ru/document/564243465. Accessed 25 Jul 2019
37. I. Korsakov, A. Gusev, T. Kuznetsova et al., Deep and machine learning models to improve risk prediction of cardiovascular disease using data extraction from electronic health records. Euro. Heart J. 40(1), (2019)
38. I. Korsakov, D. Gavrilov, L. Serova et al., Adapting neural network models to predict 10-year CVD development based on regional data calibration. Euro. Heart J. 41(2), (2020)
Chapter 12
Robotics in Healthcare
Dmitrii Kolpashchikov, Olga Gerget, and Roman Meshcheryakov
Abstract A robot is a programmed actuated mechanism with a degree of autonomy. Medical robots have come a long way since the first prototypes of the 1960s–70s, which were based on industrial robots, and have become complex modern systems that assist surgeons, patients, and nurses. Over time, robots have proved their usefulness and evolved to operate in confined spaces inside the human body, help people recover the functions of injured limbs, and provide support to physically and cognitively impaired persons. This chapter provides an overview of current robotics in healthcare along with its challenges.
Keywords Surgical robots · Rehabilitation robots · Assistive robotics · Socially assistive robotics · Robotics · Healthcare · Artificial Intelligence
12.1 Introduction
The term ‘robot’ was first introduced in the 1920s by Karel Čapek in Rossum’s Universal Robots, where robots were artificial creatures that served humans. According to the current working definition, a robot is a programmed actuated mechanism with a degree of autonomy that moves within its environment to perform intended tasks [1]. The first robots appeared in industry in the 1960s and quickly became essential components of automated systems. They augmented human capacities or replaced human workers in dull, dirty, and dangerous jobs, as they worked fast, accurately, and without complaint.
D. Kolpashchikov · O. Gerget, Tomsk Polytechnic University, Lenina, 30, Tomsk 634050, Russian Federation
R. Meshcheryakov (✉), Institute of Control Sciences, Russian Academy of Sciences, Lenina, 30, Tomsk 634050, Russian Federation
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_12
Further advances in robotics and related fields improved both the software and the hardware of robots. They became intelligent enough to interact with humans, operate in unknown environments, and make their own decisions. Various new designs, from humanoid robots to miniature snake-like robots suitable for human blood vessels, expanded the capabilities of robots, enabling them to work with high precision in new environments. As a result, robots are now used in many different fields, such as cleaning [2], search and rescue [3], underwater [4], space [5, 6], homes [7], services [8], entertainment [9], education [10, 11], manufacturing [12], healthcare [13, 14], and the military [15].
The history of robotics in healthcare started in the 1960s with concepts and prototypes of various medical robotic systems based on industrial robots [16, 17]. The first surgical robots appeared in the 1980s and were used for patient treatment assistance [17] and direct surgical support [18]. The field of surgical robotics has evolved actively since the 1990s, when many new robotic systems targeting specific areas were deployed [17]. Modern robotic systems can hold and manipulate dozens of different surgical instruments with high precision and dexterity, allowing surgeons to reach areas where operations were previously impossible. Robotic surgery improves intervention outcomes by reducing the length of stay, complications, and in-hospital mortality thanks to the increased safety and accuracy of surgical manipulations [19–21]. Its major drawbacks are increased operative time and high cost. Today, robotic surgery is a fast-growing industry with a compound annual growth rate of 12–25% [22, 23].
The earliest rehabilitation robots were introduced in the 1960s–70s. Like early surgical robots, they were adapted from industrial robots [24].
Some early prototypes of rehabilitation robots were telemanipulator systems that performed activities of daily living for disabled people and master/slave exoskeleton systems for lower limb therapy. In the 1980s, robotic devices were introduced into upper limb rehabilitation [25]. In the same decade, several projects were initiated that later led to wheelchair-mounted robotic manipulators and autonomous wheelchair navigation systems [26]. In the 2000s, socially assistive robots that support the mental health of both children and the elderly [27] and the world’s first commercially available bionic hand [28] appeared. Today, rehabilitation robotics is a large family of robots used to help recover injured limbs, assist disabled people, replace missing limbs, and provide social companionship. Rehabilitation robotics is a fast-growing industry with a compound annual growth rate of 12–25% [29, 30]. It is driven by demand from a large number of disabled persons: over 1 billion people, or 15% of the total population. This number is expected to double to 2 billion by 2050 [31].
The chapter is divided into the following sections. The second section outlines surgical robots, their design and control, and the role and application of computer-assisted surgery. The third section describes rehabilitation robotics: contact therapy robots, which help people relearn how to move; assistive robots, which are used to complete daily activities; and socially assistive robots, which are used for cognitive rehabilitation therapy or companionship. The fourth section gives a short description of
non-medical robots and their role in healthcare. Some challenges of robotics in healthcare are given and discussed in the fifth section.
12.2 Surgical Robots
Surgical robots are a class of medical robots used as smart tools that work alongside surgeons during interventions. They usually work under direct surgeon control and can perform some procedures autonomously. This way of working combines the advantages of humans and robots to improve treatment for both patients and surgeons. Patients benefit from a wider range of available surgeries with enhanced safety and efficiency, shorter hospital stays, lower mortality, and reduced invasiveness. Surgeons benefit from ergonomic workspaces outside hostile radiological environments, which reduce physical and mental stress [32], and gain a high-dexterity precision tool that can perform certain tasks autonomously. Medical robots can also be used as surgical simulators [33].
12.2.1 Computer-Assisted Surgery
Robotic surgery is inextricably linked with computer-assisted surgery (CAS). The role of CAS systems in medicine is similar to that of CAD/CAM/CAE systems in manufacturing. CAS is used for surgical planning, matching patients with their pre-operative models, developing accurate patient models, performing pre-programmed surgical actions, and collecting and processing pre-, intra-, and post-operative patient-specific data [34]. In CAS, robots are considered tools that extend the surgeon’s capabilities as needed to complete planned surgical interventions successfully.
During the pre-operative phase, CAS is used to produce a comprehensive computer model of the patient. To do this, data from different sources, such as medical imaging and lab tests, are collected and combined with available statistical information about human anatomy, physiology, and the disease. The model is used to predict the outcome of certain actions or of the whole intervention in order to optimize the intervention plan for the particular patient.
In the operating room, the computer model is matched with the actual patient. Typically, this is done by identifying corresponding landmarks or structures on the pre-operative model and on the patient by means of additional imaging, a pointing device, or the robot itself. During the intervention, intra-operative imaging and sensors are used to track changes in the patient’s anatomy and update the model. This allows the surgeon to estimate the success of the procedure and make adjustments if needed.
After the procedure is completed, data continue to be collected for patient follow-up. At a later stage, all patient-specific data can be stored. These data can be used in surgical simulators to train and assess surgeons. The analyzed data can also be used to update the statistical information for better pre-operative planning.
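The landmark-based matching of the pre-operative model to the patient described above can be illustrated with a small sketch. It is a deliberately simplified two-dimensional version that assumes the landmark correspondences are already known and that the transform is rigid (rotation plus translation); clinical systems work in three dimensions and must also handle measurement noise and outliers. The function names and sample coordinates are hypothetical.

```python
import math


def register_landmarks_2d(model_pts, patient_pts):
    """Least-squares rigid transform (rotation angle, translation) mapping model to patient."""
    n = len(model_pts)
    # Centroids of both landmark sets.
    cmx = sum(p[0] for p in model_pts) / n
    cmy = sum(p[1] for p in model_pts) / n
    cpx = sum(p[0] for p in patient_pts) / n
    cpy = sum(p[1] for p in patient_pts) / n
    # Accumulate dot and cross products of the centered point pairs.
    s_cos = s_sin = 0.0
    for (mx, my), (px, py) in zip(model_pts, patient_pts):
        ax, ay = mx - cmx, my - cmy
        bx, by = px - cpx, py - cpy
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated model centroid onto the patient centroid.
    tx = cpx - (c * cmx - s * cmy)
    ty = cpy - (s * cmx + c * cmy)
    return theta, (tx, ty)


def apply_transform(theta, t, pt):
    """Rotate a point by theta and then translate it by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * pt[0] - s * pt[1] + t[0], s * pt[0] + c * pt[1] + t[1])


def rms_error(theta, t, model_pts, patient_pts):
    """Root-mean-square distance between transformed model landmarks and patient landmarks."""
    total = 0.0
    for m, p in zip(model_pts, patient_pts):
        qx, qy = apply_transform(theta, t, m)
        total += (qx - p[0]) ** 2 + (qy - p[1]) ** 2
    return math.sqrt(total / len(model_pts))


# Hypothetical landmarks: the patient set is the model rotated by 90 degrees and shifted by (2, 3).
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
patient = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, t = register_landmarks_2d(model, patient)
error = rms_error(theta, t, model, patient)
```

The residual returned by `rms_error` gives a simple indication of how well the model fits the patient landmarks; what threshold would be acceptable in practice is not stated in the chapter.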
12.2.2 Mechanical Design and Control
The mechanical design of surgical robots is influenced by many parameters, such as safety (software and hardware redundancy, sterilizability, and biocompatibility), the mounting location (floor, ceiling, surgical table, or even the patient [35]), and compatibility with other devices in the workspace (for example, medical imaging devices). In this section, we discuss a robot’s end-effector, control approaches, and their effects on robot design.
The mechanical design of a surgical robot depends on the control approach. There are three ways a surgeon can control a surgical robot (see Fig. 12.1). The first is a pre-programmed, or active, robot (see Fig. 12.1, left). This approach is similar to CNC machining. At the pre-operative stage, the surgeon prepares a control program for the robot using medical data. During the operation, the robot matches the patient model with the real patient using either medical imaging or implanted fiducial pins and then autonomously executes the pre-programmed commands. To ensure safety and accuracy, the robot is supervised by the surgeon and monitored by various sensors and medical imaging. Pre-programmed control is applied only to tasks in a predictable environment, such as work on a fixed bone during orthopedic surgery. TSolution One by Think Surgical is an example of an active robot [36].
The second control approach is teleoperation. In this approach, the surgeon and the robot can be separated by a distance ranging from several meters to a transatlantic crossing [37]. The surgeon sees the operating field through a camera mounted on the robot’s end-effector and manipulates robotic tools from a remote workstation, while the robot repeats those manipulations on the patient side in real time. To increase the quality of the operation, the robot filters out the surgeon’s tremor and provides haptic feedback [38]. Active boundaries can also be applied to prevent the robot from entering any restricted area [39].
This approach provides a surgeon with a more ergonomic workspace than conventional surgery. A key limitation of teleoperation is time delay. Da Vinci by Intuitive Inc. is an example of such a system and is used in various fields
Fig. 12.1 Robots with different control approaches: the active robot TSolution One [36] (left), the teleoperated robot with a workstation Da Vinci [40] (center) and the collaborative robot Mako (right) [42]
12 Robotics in Healthcare
[40]. Teleoperated control provides the greatest versatility for interactive surgery applications, such as dexterous minimally invasive surgery or remote surgery [41]. The third approach is collaborative control. Collaborative robots are intended for direct human–robot interaction within a shared space. A surgeon grasps the robot’s end-effector and manipulates it like a regular tool but with the precision of a robot. The end-effector is equipped with force and torque sensors that sense the direction in which the surgeon wishes to move the tool, and the computer moves the robot to comply [43]. This approach exploits the surgeon’s natural eye-hand coordination. In addition, force sensors at the end-effector perceive forces that are imperceptible to humans, and the robot can scale such forces up enough for the surgeon to feel resistance, which is useful for microsurgery. Like telerobots, collaborative surgical robots can filter hand tremors and apply active boundaries. An example of such a robot is Mako by Stryker, designed for orthopedic surgery [44]. The mechanical design of a surgical robot’s end-effector depends on the available workspace. Generally, we can distinguish two main workspace types: wide and confined (see Fig. 12.2). A wide workspace has ample space for an end-effector and its maneuvers. Bone shaping or spine surgery is typically performed in such workspaces. An end-effector used in a wide workspace is often a rigid instrument with high-precision actuators and sensors. High rigidity ensures the precision of the operation by preventing the instrument from deflecting during contact. In some cases, rigid end-effectors are enhanced with dexterous wrists to accomplish complex manipulations [45]. The other type is the opposite: a confined workspace with little space for an end-effector and its motions.
Confined workspaces are typical for natural orifice transluminal endoscopic surgery, single-port access surgery, and intraluminal surgery, where end-effectors operate inside the human body surrounded by delicate soft tissues. An end-effector should be small enough to fit inside the human body and be able to navigate through it without injuring tissue. The best way to meet both requirements is to use a continuum robot with a snake-like flexible body [46]. Such robots come in a great variety of designs and actuators, with sizes down to 2 mm
Fig. 12.2 Robots in the wide workspace of an operation room [49] (left) and the confined workspace of a human lung [50] (right)
[47, 48]. Continuum robots are able to bend at any point of their body, which allows safe navigation inside the human body [46]. For diagnostics and interventions in confined spaces, there also exist unconventional robots such as robotic capsules [51] and microrobots [52]. Robotic capsules are swallowable robots for gastrointestinal tract inspection. They use mechanical (worm-type, legged, wheeled, or crawling systems) and magnetic methods for locomotion in the stomach. In the lower gastrointestinal tract, the robots move along with peristalsis [53]. Microrobots are proposed for minimally invasive treatment of the circulatory system, the urinary tract, the eye, or the nervous system. Therapy options for such robots include targeted brachytherapy, drug delivery, material removal, telemetry, and the introduction of controllable structures [54]. Locomotion can be achieved through helical flagella driven magnetically, traveling-wave flagella driven with piezoelectric motors, or external magnets. Electrical energy for microrobots is either obtained by converting mechanical or thermal energy or transmitted from outside.
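The collaborative control approach described in this section, in which force sensors at the end-effector detect the surgeon’s intended motion and active boundaries keep the tool out of restricted areas, can be sketched as a simple admittance law. The Python fragment below is a minimal illustration under stated assumptions: the function name `admittance_step`, the gain, the deadband, and the spherical safe region are all hypothetical values chosen for readability and do not describe Mako or any other commercial system.

```python
import math

def admittance_step(position, hand_force, dt,
                    gain=0.002, deadband=0.5,
                    safe_center=(0.0, 0.0, 0.0), safe_radius=0.05):
    """One control cycle of a hypothetical collaborative (hands-on) robot.

    position:   current tool-tip position (m)
    hand_force: force applied by the surgeon at the end-effector (N)
    gain:       commanded speed per newton above the deadband (m/s per N), assumed
    deadband:   forces below this magnitude (N) are ignored as noise, assumed
    The tool tip is kept inside a sphere that stands in for an active boundary.
    """
    magnitude = math.sqrt(sum(f * f for f in hand_force))
    if magnitude < deadband:
        return tuple(position)  # no intentional push detected
    # Speed grows with the force above the deadband, along the force direction
    scale = gain * (magnitude - deadband) / magnitude
    new_pos = [p + scale * f * dt for p, f in zip(position, hand_force)]
    # Active boundary: project the commanded position back onto the sphere
    offset = [n - c for n, c in zip(new_pos, safe_center)]
    dist = math.sqrt(sum(o * o for o in offset))
    if dist > safe_radius:
        new_pos = [c + o * safe_radius / dist
                   for c, o in zip(safe_center, offset)]
    return tuple(new_pos)
```

A tremor filter could be added in the same loop by low-pass filtering `hand_force` before it is used; that step is omitted here to keep the sketch short.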
12.2.3 Application Robotic surgery is used in many fields of medicine, and some systems can be used for different purposes. In this section, we list commercially available systems by their fields of application. Robotic radiosurgery is a treatment in which lethal radiation doses are focused on a tumor to destroy it. Robots for radiosurgery include CyberKnife (Accuray) [55], Novalis (Brainlab AG) [56], Gamma Knife (Elekta AB) [57], and Exacure (BEC) [58]. CyberKnife is an industrial robot that carries a linear accelerator as a radiation source. Markers are placed on the patient’s body to compensate for patient motion [59]. Novalis is an L-shaped robot that rotates around the patient’s longitudinal axis [60]. Gamma Knife is a half-sphere robot that houses a fixed radiation source and a movable focusing mechanism [59]. Exacure is a robot based on an industrial manipulator for boron neutron capture therapy; it moves the patient on a table or chair to the radiation source. In orthopedic surgery, robots are used as milling and sawing manipulators for bone shaping during total knee arthroplasty and total hip arthroplasty, much as in manufacturing. Examples of such robots are the pre-programmed robots TSolution One (Think Surgical) [36] and OMNIBotics (OMNI Life Science, acquired by Corin Group) [61] and the collaborative robots Mako (Stryker) [44], Navio (Smith & Nephew) [62], and ROSA Knee (Zimmer Biomet) [63]. Robotic guide systems for spine surgery and robotic systems for needle placement and biopsy have a similar mechanical design based on an industrial robot. Commercially available robotic guide systems for spine surgery are ExcelsiusGPS (Globus Medical) [64], Mazor X (Medtronic) [49], and TiRobot (TINAVI) [65]. Robots for needle placement and biopsy are the Neuromate system (Renishaw) [66], ROSA ONE Brain (Zimmer Biomet) [67], Micromate (Interventional Systems) [68], guidoo (BEC) [69], ROBIO (Perfint Healthcare) [70], and iSR’obot Mona Lisa (Biobot Surgical) [71].
Fields of application for minimally invasive surgical robots include but are not limited to otolaryngology, urology, thoracoscopy, gynecology, laparoscopy, colorectal therapy, hysterectomy, cardiac procedures, oncology, and neurosurgery. Some available minimally invasive surgical robots are the teleoperated multi-port access systems da Vinci (Intuitive) [40], the Senhance Surgical System (TransEnterix) [72], Versius (CMR Surgical) [73], BITRACK (Rob Surgical) [74], Revo (meerecompany) [75], Surgenius (Surgica Robotica) [76], and the single-port access system Enos (Titan Medical) [77]. Systems for robotic catheter procedures in heart and blood vessels are CorPath GRX (Corindus) [78] and Amigo (Catheter Robotics) [79]. Other continuum robots for medical applications are MONARCH (Auris Health) [50] and Ion (Intuitive) [80] for bronchoscopy and the Flex Robotic System (Medrobotics) [81] for head and neck surgery. All listed robotic catheters are steered by wires. Niobe (Stereotaxis) is a magnetically guided robotic catheter [82]. Among other available robots, there are the hair restoration robot ARTAS (Venus Concept) [83] and the robotic capsule PILLCAM (Medtronic) [84].
12.3 Rehabilitation Robots According to WHO, disability is any impairment of a body or mind that limits the impaired person’s activity and restricts interaction with the environment [85]. Disability can be caused by injuries, diseases, or aging. WHO data show that 15% of the world’s population live with some form of disability, of whom 2–4% experience significant difficulties in functioning. This number is expected to double to 2 billion people by 2050 [31]. Rehabilitation robotics is a large family of devices used to recover injured limbs, assist disabled people with daily life activities, and provide social companionship. Rehabilitation robots improve the quality of rehabilitation for all user groups: disabled persons, therapists, and caregivers. Disabled people receive faster and more efficient rehabilitation therapy and gain the ability to take care of themselves. Robots also ease the work of rehabilitation therapists, as automated exercise machines need to be set up only once and can then apply constant therapy over long periods without tiring [26]. However, rehabilitation robots are still not cost-efficient in most cases.
12.3.1 Contact Therapy Robots The process of neurorehabilitation exploits the use-dependent plasticity of the human neuromuscular system to help people re-learn how to move. Rehabilitation is a time-consuming and labor-intensive process for both therapists and patients, and more therapy means better recovery [86]. Because motion during rehabilitation is mechanical and repetitive, it can be automated by robots, leading to better
Fig. 12.3 Upper extremity [89] (left) and lower extremity [90] (right) rehabilitation robots
results. Robots should cooperate with the patient’s own movement attempts during rehabilitation in order to improve movement ability. Rehabilitation robotics has many advantages in comparison with conventional therapy. One is the ability of robots to assess patient progress better than a human supervisor can [87]. Another important advantage is that robots can involve patients in the rehabilitation process through games that increase their motivation. Besides, several robotic devices can be supervised by a single therapist, even remotely [88]. Another important feature of a robot is the ability to acquire and store data. Analysis of these data can be used to improve the rehabilitation process. Contact therapy robots can be divided into categories by the field of application (upper extremity or lower extremity, see Fig. 12.3), the way they contact patients (wearable or end-effector-based), and the number of engaged limbs (unilateral, i.e. a single limb, or bilateral, i.e. both limbs). When end-effector-based robots are used, the trajectories of the robot’s end-effector and the human limb are physically coupled in an operational space. Robots for the upper extremity are typically rigid planar or spatial robots with limited degrees of freedom. However, soft robot prototypes for contact rehabilitation also exist [91]. Some robots can involve both limbs for mirror therapy [92]. Robots for lower extremities are mostly treadmill systems with a harness to support patients; some are bicycle-based [93] or use a tilt table [94]. They are mainly used to train gross motor movement. All end-effector-based therapy robots are collaborative: they move in cooperation with patients. Most of them can also operate in a passive mode, in which the robot moves a passive patient limb. The limb that needs rehabilitation is strapped to the end-effector.
The robot senses the direction of motion with force sensors and either assists or applies resistance (to the degree set by the therapist) when the patient moves the limb. During exercise, patients often play simple games, sometimes in virtual reality, to stay involved in the treatment process. Motivation is important because it improves treatment efficiency: in some cases, robotic devices improved results simply because patients preferred exercising with them [95].
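The assist-or-resist behavior of an end-effector-based therapy robot can be sketched with a simple proportional law. The Python fragment below is only an illustration under stated assumptions: the function name `therapy_force`, the therapist setting `level` in the range [-1, 1], and the safety limit `max_force` are hypothetical and are not taken from any of the commercial systems cited in this section.

```python
def therapy_force(patient_force, level, max_force=20.0):
    """Robot force along the exercise axis for one control cycle.

    patient_force: signed force the patient applies along the axis (N),
                   as measured by the end-effector force sensor
    level:         hypothetical therapist setting in [-1, 1];
                   positive values assist, negative values resist
    max_force:     assumed safety limit on the robot force (N)
    """
    if not -1.0 <= level <= 1.0:
        raise ValueError("level must lie in [-1, 1]")
    # Push with the patient (assist) or against the patient (resist)
    command = level * patient_force
    # Saturate the command so the robot never exceeds the safety limit
    return max(-max_force, min(max_force, command))
```

With `level = 0` the robot adds no force and simply follows the patient. The passive mode mentioned above, in which the robot moves a passive limb along a prescribed trajectory, would require a position controller and is not covered by this sketch.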
Available end-effector-based systems are the unilateral InMotionARM/HAND (BIONIK) [96] and ReoGo (Motorika) [97] for hand and arm rehabilitation, and the bilateral Bi-Manu-Track (Reha-Stim Medtec) [98] and Amadeo (TYROMOTION) [99] for hand and finger rehabilitation. Typical examples of rehabilitation robots for the lower extremity are the bicycle-based OMEGO Plus [93], the LEXO (TYROMOTION) [100] and THERA-Trainer lyra (Medica Medizintechnik) [101] systems for gait training, and the unilateral System 4 (Biodex) [102]. Wearable therapy robots work in continuous physical contact with a large part of the human body. An exoskeleton is a typical example of a wearable robot. Exoskeleton links and human limbs are physically connected and move in parallel. Such a connection is well suited to training mobile or joint-specific movements because the exoskeleton drives not only the end of the human limb but the whole kinematic chain. Exoskeletons have complete control over a patient’s individual joint movement and applied torque and ensure improved motion guidance, a relatively greater range of motion, and better quantitative feedback [103]. Lower extremity exoskeletons are often coupled with a stationary workstation, which has a treadmill or gait trainer and a harness to support patients. Research shows that exoskeletons have a significant positive impact on therapy and some secondary benefits across multiple physiological systems [104, 105]. Several rehabilitation exoskeletons are commercially available. Upper extremity exoskeletons include the unilateral exoskeleton for arm therapy ArmeoPower (Hocoma) [89], the mobile bilateral exoskeleton for arm therapy EksoUE (Ekso Bionics) [106], the unilateral exoskeleton for hand therapy Hand of Hope (Gogoa) [107], and the exoskeleton for bilateral arm therapy ALEx (Wearable Robotics) [108].
Lower extremity exoskeletons are varied: Lokomat (Hocoma) [90], ReoAmbulator and Optimal-G Pro (Motorika) [109], and Walkbot (P&S Mechanics) [110] are exoskeletons coupled with a treadmill; EksoNR (Ekso Bionics) [111], HANK and BELK (Gogoa) [112], RoboGait (BAMA Teknoloji) [113], Atalante (Wandercraft) [114], ExoAtlet (ExoAtlet) [115], Indego (Parker Hannifin Corporation) [116], REX (Rex Bionics) [117], and ABLE (ABLE Human Motion) [118] are mobile exoskeletons; ATLAS 2030 (Marsi Bionics) is a mobile pediatric exoskeleton [119]. Besides, there are several robotic body-weight support systems. They support patients or their limbs to prevent falling, so that the risk of injury decreases and the fear of falling does not block physiological movements of patients. Two examples of robotic body-weight support systems are Andago (Hocoma) [120] for lower extremities and DIEGO (TYROMOTION) [121] for upper extremities.
12.3.2 Assistive Robotics Assistant robots (see Fig. 12.4) are used by disabled people to complete their daily tasks and achieve a quality of life on par with able-bodied individuals. Assistant robots can be used for object manipulation, patient mobility, or cognition aid. Manipulation
Fig. 12.4 Examples of assistant robots: exoskeleton [115] (left) and robotic prosthesis [122] (right)
robots help disabled people interact with the environment, e.g. open a door or manipulate household objects. Mobility assistance robots help patients with mobility impairment move again. Electric wheelchairs with navigation systems are one example of such robots. Assistant robots are similar to domestic devices in terms of the human–robot interface, which is designed to be easy to understand for any end-user. Moreover, end-users are often people with disabilities, which means that the interface should be operable with their residual capabilities. Therefore, assistant robots are usually highly personalized devices with interfaces that suit particular end-users. The interface can include pushbuttons, joysticks, head-position cursor controls, eye-trackers, speech recognition systems, etc. Safety is an essential feature of such robots. As with any medical robot, assistant robots are equipped with software and hardware redundancy, emergency systems, and limits on motion and dynamics. Assistant robotic manipulators are robots used to grasp and operate different objects. Manipulation robots can be fixed-base, portable, or mobile [26]. A fixed-base manipulation robot is usually similar to an industrial manipulator; it is mounted in a frequently used location, such as a kitchen, and operates only within it. However, due to their high cost, these robots exist only as prototypes or in limited series, e.g. the PUMA-based ProVAR [123], AfMASTER [124], Giving-A-Hand [125], or KARES II [126]. In contrast to a fixed-base device, portable assistant manipulators can be moved with a patient. They are either easy-to-carry devices, like Neater Eater (Neater) [127], iEAT Robot (Assistive Innovations) [128], and OBI (Kinova Robotics) [129], or a robotic arm mounted on an electric wheelchair. Such solutions allow robots to assist patients in various places such as a grocery store. Portable manipulators are usually directly
controlled by their end-users. Commercially available examples of this type of manipulator are iARM (Assistive Innovations) [130] and JACO (Kinova Robotics) [131]. A mobile system is an autonomous robot that has manipulators and can move by itself, e.g. a humanoid robot. The simplest approach to locomotion for such systems is to use wheels, since walking is a more complex task for robots. Such systems are autonomous and are not controlled by users. Therefore, they have stricter artificial intelligence requirements: they should be able to navigate safely and efficiently, avoid obstacles, and recognize, grasp, and manipulate objects. An early example of a mobile system is MovAid [132]; more recent projects are Robot-Era [133], Care-O-bot [134], and HOBBIT [135]. Mobility assistance robots are intended to help patients move or to support them while walking. A smart wheelchair is one example of such devices. Mobility assistance robots are able to navigate autonomously or support users during complex actions. They use sensors to recognize the environment for safe navigation in spaces with moving obstacles such as other humans. Currently, the only commercially available system is iBOT (Mobius Mobility) [136]. This robotic wheelchair helps users balance on rough terrain and go up and down stairs. Earlier systems and prototypes are NavChair [137], Hephaestus [138], and KARES [139]. The other type of mobility assistance robot is designed for walking assistance. These are external collaborative robots that help users walk by themselves. They respond to the force applied by users and move in a user-controlled direction. Simultaneously, the robots scan the surrounding space to prevent collisions. Pam-Aid is an example of this kind of robot [140]. The commercially available Andago system by Hocoma has the closest functionality [120]. This is a walking assistance robot used for rehabilitation and moved solely by the patient.
The robot supervises the patient in its harness to prevent the patient from falling. Exoskeletons (see Fig. 12.4 left) are one of the most advanced types of assistive robots. They are similar to rehabilitation exoskeletons without a treadmill but are intended not only for therapy but also for daily life scenarios. Exoskeletons have proved their usefulness in many tests with persons with complete or incomplete paralysis [141]. There are several available examples in addition to mobile rehabilitation exoskeletons: ReWalk (ReWalk Robotics) [142], HAL (Cyberdyne) [143], and FreeGait (BAMA Teknoloji) [144] for lower extremities, and MyoPro (Myomo) for upper extremities [145]. Robotic prostheses (see Fig. 12.4 right) are used to replace missing limbs, so their features should be as close as possible to those of a natural limb. They could be called wearable assistive manipulators for daily living. From the point of view of mechanical design, a prosthesis should be a device with enough links and joints to provide similar strength and dexterity. The device should also be capable of providing tactile feedback to its user. The most challenging feature of a robotic prosthesis is its control method, which should be close to the natural control of a human limb. There are several ways to enable natural control of a robotic prosthesis by the user’s own motion instead of switches. The simplest way is to use a Bowden cable attached
to residual muscles. Another way is to place electrodes on muscles in a residual limb. However, both ways can only be used in single-degree-of-freedom systems. A more advanced way to control a robotic prosthesis with multiple degrees of freedom is to reroute nerves of the lost limb to a spared muscle and then detect the user’s intent to move the limb using electromyography at the spared muscle [146]. This allows the user to control several joints simultaneously. The most advanced way of control is to use neural interfaces, i.e. systems capable of recording the electrical activity of peripheral nerves as well as of the brain cortex. Direct electrical stimulation of a residual peripheral nerve can provide feedback to the user [147]. Recently, there has also been progress in decoding movement-related signals directly from the brain in real time [148]. Recent robotics-related advances for prosthetic legs include embedding microprocessors and support software into artificial limbs [149]. Examples of robotic prostheses are POWER KNEE (Össur) [150], LUKE arm (Mobius Bionics) [151], and Hero Arm (Open Bionics) [122].
12.3.3 Non-Contact Therapy Robots and Socially Assistive Robotics Social robots (see Fig. 12.5) are used for cognitive rehabilitation therapy or companionship. This is a relatively new field targeted at human–robot social interaction. A social robot is the only type of medical robot that either has no physical contact with patients or for which contact is a secondary feature. Typically, it is a small toy-like robot that socially interacts with a patient to engage the patient in some helpful activities and provide
Fig. 12.5 Social assistance robot Paro [154]
motivation. Multiple studies have demonstrated that physically embodied and collocated robots generate the social presence needed to convince users to take challenging steps [152] and boost performance [153]. Social robots are designed to be likable in appearance and in their tactile properties. They vary greatly in morphology, from zoomorphic and humanoid forms to everyday objects like balls and drawers [155, 156]. Research shows that people prefer human-like and especially feminine-looking robots [157]. The way robots move [158], their facial expressions, and their gestures are also important [159]. As for tactile properties, people usually respond positively to touch initiated by a robot, preferably when the robot is soft and warm [160, 161]. A social robot should also adapt its personality and plan interactions that suit the particular patient to improve the effect. Social robots are proposed for training social skills in children with autism [162] and cognition disorders [163], for supporting older adults [164], for motivating and supporting patients during rehabilitation [165], and for reducing loneliness [166]. They can also be used for cognitive support (reminders about diets, exercise, medications, etc.) and to measure and store different vital parameters [167].
12.4 Non-Medical Robots Unlike the robots mentioned in the previous sections, non-medical robots are used in healthcare but are not directly involved in treatment. They are used by healthcare organizations to complete daily routines such as logistics, cleaning, guiding patients, measuring vital signs, etc. [168]. Some of these tasks, e.g. cleaning, can be accomplished by common domestic robots, service robots, or mobile assistive robots, or their modifications [169]. Other tasks, such as disinfection [170], patient transportation inside the hospital [171], patient transfer from a bed to a wheelchair and back [172, 173], pharmacy dispensing [174, 175], and assisting during interventions [176], require robots of specific design. This is part of hospital digitization. This field of healthcare robotics became more important during the COVID-19 outbreak, when it became necessary to minimize contact between people, especially in hospitals, where infection is more likely. The pandemic and the lack of medical personnel boosted the automation of hospitals, since robots can perform human tasks without getting infected or tired. Because of that, robots appeared in hospitals to help fight the coronavirus from the very beginning of the pandemic [177]. The increased demand has led to the emergence not only of new air and ground robots that disinfect buildings and streets but also of non-contact temperature measurement devices, robots that inform people about COVID-19 and prevention measures, and many others [178]. Examples of non-medical robots are shown in Fig. 12.6.
Fig. 12.6 Examples of non-medical robots in healthcare: service robot to measure vital signs [179] (left), disinfection robot [180] (right)
12.5 Challenges Healthcare robotics faces many hardware-related challenges. Many healthcare robots are based on devices that are used in other industries and evolve along with them. However, specific healthcare areas develop separately from the mainstream due to special requirements. Surgical robots with flexible end-effectors, which work inside the human body, are one example of such special devices. To increase their capabilities and the range of procedures, surgical robots should improve in precision, flexibility, and strength. Researchers look for new materials, actuators, and designs to achieve better performance [181–183]. Variable stiffness, where a catheter can become rigid or flexible at the operator’s command, is another recent research area [184]. To provide decent haptic and force feedback to a surgeon, it is also important to develop either systems able to do so from outside the human body or sensors that are small enough to be placed inside a catheter [185]. Another aspect of hardware design is ergonomics: medical robots should have comfortable workstations. For rehabilitation robots, the challenge is even more complex. Despite the advantages, there are major issues associated with the kinematic compatibility of rehabilitation robots with human limbs [186]. It is well established that the kinematics of biological joints is complex and difficult to model and replicate with mechanical systems [187]. Misalignment between the axes or centers of biological and robotic joints can result in large interaction forces at the human–robot interface, making the device uncomfortable and, in extreme cases, dangerous to patients [188]. In the case of robotic prostheses, it is essential to improve device control, haptic feedback, and similarity of function to the missing limb. To do this, it is important to develop new, accurate kinematic and dynamic models of human limbs. Appearance matters in social robot design.
Although appearance is a secondary feature for most robots, it is essential that patients find their assistant robots appealing. It is likely that groups differing in age, gender, and disease will have different preferences for the appearance
of social robots. Researchers look for features that receive a more positive response from target groups. Today, there exist many different types of appearance for social assistant robots [189]. Another issue is cost-effectiveness. A robot should be cheap to produce and easy to maintain so that hospitals can afford and use it. Emerging rapid prototyping technologies can help achieve this. Rapid prototyping will also contribute to the customization of devices, which will make them more user-friendly. The other type of challenge is related to software. Increasing autonomy and intelligence is a common challenge for all healthcare robots, especially surgical robots. High robot autonomy can reduce human errors, speed up operations, and allow complex, precise manipulations that are beyond human capability. At present, however, all surgical robots except orthopedic ones are directly controlled by human surgeons. The reason for this is the lack of models that describe tissues, end-effectors, and/or the interaction between them. Research on these topics is being actively pursued. The need for automation is especially evident in micro-invasive surgery performed in confined spaces, where continuum robots with flexible bodies are used as end-effectors. In contrast to conventional robots, which consist of links and joints and rely on well-known forward and inverse kinematics, continuum robots move through elastic bending of their own bodies. This increases the complexity of the forward and inverse kinematic models. To achieve autonomy, end-effector models should work in real time with high accuracy to ensure the safety and efficiency of robotic operations. The forward kinematic model describes a robot’s body position and orientation depending on the desired shape. There are two approaches to modeling the position of continuum robots: mechanical and constant curvature. The constant curvature approach is fast and quite accurate.
It uses geometric features of a continuum robot to describe the robot’s shape. The mechanical approach is more accurate and able to take into account friction inside the robot, but it is more time-consuming. The inverse kinematic model describes a robot’s shape depending on the desired position and orientation of the robot’s body. Algorithms that solve the inverse kinematics problem are needed for successful motion planning, and this is the more challenging task. There are different approaches to solving inverse kinematics, with different results. The most common approach is to use the Jacobian matrix to find a solution to the inverse kinematics problem [190]. Jacobian-based approaches are accurate and can be used in real time; however, they have a high computational cost, require complex matrix calculations, often run into singularities, and sometimes fail to find a solution that exists. Another approach to solving inverse kinematics is based on neural networks [191]. Trained networks achieve good results in terms of time and error. However, this data-driven approach scales poorly and requires training a new neural network for each new robot design. Recently, iterative inverse kinematics solvers for multi-section continuum robots were developed based on the Forward And Backward Reaching Inverse Kinematics (FABRIK) algorithm [192]. The algorithms for continuum robots simplify each arc to a single virtual rigid link (the arc’s chord), so that the FABRIK method can cope with the inverse kinematics problem [193]. Results
show that the proposed algorithms are faster than Jacobian-based methods and have higher solution rates while maintaining high accuracy. Path planning is another common software problem for all robots. Robotic manipulators and mobile platforms should plan an efficient and safe path to avoid collisions. The problem becomes harder if there are several agents in the same space, since they should work in concert and should not obstruct each other. Although there are many effective algorithms for conventional manipulators and mobile platforms, it is hard to reach the desired position in micro-invasive surgery due to the complex geometry of the human body. There are several path planning algorithms for continuum end-effectors that allow avoiding obstacles [194, 195]; however, they are still not fast or accurate enough for real-time applications. The only real-time path planning algorithm that takes into account collisions with obstacles exists for planar cases [196]. Another path planning issue, associated with contact rehabilitation robots, comes from outside the robotics domain: for these robots, neither the optimal movement tasks nor the optimal mechanical inputs are known [26]. To plan their paths, robots need to be aware of their surroundings. Mobile robots and robots that work in a wide space are equipped with various sensors, which give them full information about the environment. Robots in confined spaces have a limited set of sensors. To compensate, they can receive information from medical imaging. However, in contrast to surgeons, who are able to process this information themselves (identify anatomic structures, detect the robot, and make informed decisions), robots need these data to be processed by intelligent analysis algorithms, e.g. image segmentation or object detection methods, in order to create a map. There have been many advances in this field [197–199]. Body tissue is a dynamic structure, which moves both by itself and under contact with a robot.
A model of this behavior is needed to provide better path planning along with better control of the robot in contact with tissue. Such a model could also be useful in teleoperation, where it could reduce the effect of a time delay by predicting tissue and robot behavior while data are received in real time.
Another challenge is specific to social robotics. To better engage with a patient, the robot’s personality should be compatible with the patient’s. However, there is no single character that everyone would like, because people react differently to the same actions. To understand patients better, robots should recognize and take into account cultural differences, emotional states, context, and verbal and non-verbal language [189]. Socially assistive robots should then be able to adapt to a specific patient, which would improve the robot’s performance. Researchers have proposed using reinforcement learning algorithms for this purpose [200].
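The idea of a socially assistive robot adapting to one specific patient can be illustrated with a small sketch. The code below is not the method of [200]; it substitutes one of the simplest reinforcement learning schemes, an epsilon-greedy multi-armed bandit, and every name in it (the class, the interaction styles, the simulated patient preferences, the binary reward) is a hypothetical assumption made for illustration only. The robot repeatedly picks an interaction style, observes whether the patient responded positively, and updates a running estimate of how well each style works.

```python
import random


class EpsilonGreedyStyleSelector:
    """Epsilon-greedy bandit over a fixed set of interaction styles.

    Hypothetical illustration: styles and rewards are assumptions,
    not taken from the chapter or from reference [200].
    """

    def __init__(self, styles, epsilon=0.1, seed=None):
        self.styles = list(styles)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {s: 0 for s in self.styles}    # times each style was tried
        self.values = {s: 0.0 for s in self.styles}  # running mean reward per style

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.styles)
        return max(self.styles, key=lambda s: self.values[s])

    def update(self, style, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[style] += 1
        n = self.counts[style]
        self.values[style] += (reward - self.values[style]) / n


def simulate(preferences, sessions=500, epsilon=0.1, seed=0):
    """Simulate sessions with a patient whose (assumed) probability of a
    positive response to each style is given by `preferences`."""
    rng = random.Random(seed)
    selector = EpsilonGreedyStyleSelector(preferences, epsilon=epsilon, seed=seed)
    for _ in range(sessions):
        style = selector.choose()
        reward = 1.0 if rng.random() < preferences[style] else 0.0
        selector.update(style, reward)
    return selector


if __name__ == "__main__":
    learned = simulate({"calm": 0.2, "playful": 0.8, "formal": 0.4})
    for style in learned.styles:
        print(style, learned.counts[style], round(learned.values[style], 2))
```

In a real system the reward would have to come from measured patient engagement rather than a simulated coin flip, and the context (emotional state, culture, task) would enter the decision, which a plain bandit ignores.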
12.6 Conclusion
This chapter described robotics in healthcare, including a brief history of the field and an overview of surgical robots, rehabilitation robots, and non-medical robots. Surgical robots are smart tools that help surgeons during interventions.
12 Robotics in Healthcare
Variability of design and control schemes allows the use of surgical robots in many fields of medicine. Rehabilitation robots help people with disabilities recover function in injured limbs and accomplish daily tasks, and are also used for cognitive rehabilitation and companionship. Non-medical robots are used by healthcare organizations to reduce costs, prevent the spread of disease, and compensate for the lack of personnel. In hardware, the challenges are related to design restrictions, ergonomic issues, and high costs. To overcome them, new robotic designs, actuators, and sensors should be developed. In software, all types of robots share challenges such as path planning and obstacle avoidance. These challenges limit the degree of autonomy of medical robots, and this lack of autonomy hinders further development. The issue is most acute in minimally invasive surgery, where flexible robots are used in confined spaces. It could be resolved by developing new models of the flexible robot, the tissue, and the interaction between them.
References
1. ISO 8373:2012(en), Robots and robotic devices—Vocabulary (2020). https://www.iso.org/obp/ui/#iso:std:iso:8373:ed-2:v1:en Accessed 29 Aug 2020
2. Cleaning Robot Market Size, Share | Industry Report, 2019–2025. https://www.grandviewresearch.com/industry-analysis/cleaning-robot-market Accessed 07 Feb 2021
3. Search and Rescue Robots Market—Growth, Trends, and Forecasts (2020−2025). https://www.researchandmarkets.com/reports/5177587/search-and-rescue-robots-market-growthtrends Accessed 07 Feb 2021
4. Underwater Robotics Market Size | Industry Report, 2018–2025. https://www.grandviewresearch.com/industry-analysis/underwater-robotics-market Accessed 07 Feb 2021
5. Y. Gao, S. Chien, Review on space robotics: Toward top-level science through space exploration. Sci. Robot. 2(7), eaan5074 (2017). https://doi.org/10.1126/scirobotics.aan5074
6. Space Robotics Market Share & Analysis Report 2020–2027. https://www.grandviewresearch.com/industry-analysis/space-robotics-market Accessed 07 Feb 2021
7. Household Robots Market | Industry Analysis and Market Forecast to 2024 | MarketsandMarkets (2021). https://www.marketsandmarkets.com/Market-Reports/household-robot-market253781130.html Accessed 07 Feb 2021
8. S. Ivanov, U. Gretzel, K. Berezina, M. Sigala, C. Webster, Progress on robotics in hospitality and tourism: a review of the literature. J. Hosp. Tour. Technol. JHTT-08–2018–0087 (2019). https://doi.org/10.1108/JHTT-08-2018-0087
9. Entertainment Robots Market by Type, Size, Growth and Forecast–2023 | MRFR (2021). https://www.marketresearchfuture.com/reports/entertainment-robots-market-2925 Accessed 07 Feb 2021
10. S. Anwar, N.A. Bascou, M. Menekse, A. Kardgar, A systematic review of studies on educational robotics. J. Pre-College Eng. Educ. Res. 9(2) (2019). https://doi.org/10.7771/2157-9288.1223
11. Educational Robot Market by Type, Component, Education Level | COVID-19 Impact Analysis | MarketsandMarkets™ (2021). https://www.marketsandmarkets.com/Market-Reports/educational-robot-market-28174634.html Accessed 07 Feb 2021
12. Industrial Robotics Market | Growth, Trends, and Forecasts (2020–2025). https://www.mordorintelligence.com/industry-reports/industrial-robotics-market Accessed 07 Feb 2021
13. Medical Robotic System Market–Growth, Trends, COVID-19 Impact, and Forecasts (2021–2026). https://www.researchandmarkets.com/reports/4591245/medical-robotic-system-market-growth-trends Accessed 07 Feb 2021
14. J. Troccaz, G. Dagnino, G.-Z. Yang, Frontiers of medical robotics: from concept to systems to clinical translation. Annu. Rev. Biomed. Eng. 21(1), 193–218 (2019). https://doi.org/10.1146/annurev-bioeng-060418-052502
15. Military Robots Market Size, Statistics–Global Forecast 2027 (2021). https://www.gminsights.com/industry-analysis/military-robots-market Accessed 07 Feb 2021
16. C. Frumento, E. Messier, V. Montero, History and future of rehabilitation robotics (2010). http://digitalcommons.wpi.edu/atrc-projects/42 Accessed 12 Feb 2021
17. Á.R. Takács, D. Nagy, I.J. Rudas, T. Haidegger, Origins of surgical robotics: from space to the operating room. Acta Polytech. Hungarica 13(1), 13–30 (2016). https://doi.org/10.12700/aph.13.1.2016.1.3
18. Y.S. Kwoh, J. Hou, E.A. Jonckheere, S. Hayati, A robot with improved absolute positioning accuracy for CT guided stereotactic brain surgery. IEEE Trans. Biomed. Eng. 35(2), 153–160 (1988). https://doi.org/10.1109/10.1354
19. S. Sheng, T. Zhao, X. Wang, Comparison of robot-assisted surgery, laparoscopic-assisted surgery, and open surgery for the treatment of colorectal cancer: a network meta-analysis (2018). https://doi.org/10.1097/MD.0000000000011817
20. P.C. van der Sluis et al., Robot-assisted minimally invasive thoracolaparoscopic esophagectomy versus open transthoracic esophagectomy for resectable esophageal cancer. Ann. Surg. 269(4), 621–630 (2019). https://doi.org/10.1097/SLA.0000000000003031
21. M.J.G. Blyth, I. Anthony, P. Rowe, M.S. Banger, A. MacLean, B. Jones, Robotic arm-assisted versus conventional unicompartmental knee arthroplasty. Bone Joint Res. 6(11), 631–639 (2017). https://doi.org/10.1302/2046-3758.611.BJR-2017-0060.R1
22. Surgical Robots Market by Product & Service (Instruments and Accessories, Systems, Service), Application (Urological Surgery, Gynecological Surgery, Orthopedic Surgery), End User (Hospitals, Ambulatory Surgery Centers)—Global Forecasts to 2025. https://www.researchandmarkets.com/reports/5005621/surgical-robots-market-by-product-and-service#pos-0 Accessed 29 Aug 2020
23. Surgical Robots Market Size, Share | Global Industry Report, 2019–2025. https://www.grandviewresearch.com/industry-analysis/surgical-robot-market Accessed 29 Aug 2020
24. K. Corker, J.H. Lyman, S. Sheredos, A preliminary evaluation of remote medical manipulators. Bull. Prosthet. Res. 10(32), 107–134 (1979)
25. M.P. Dijkers, P.C. deBear, R.F. Erlandson, K. Kristy, D.M. Geer, A. Nichols, Patient and staff acceptance of robotic technology in occupational therapy: a pilot study. J. Rehabil. Res. Dev. 28(2), 33–44 (1991)
26. H.F.M. Van der Loos, D.J. Reinkensmeyer, E. Guglielmelli, Rehabilitation and health care robotics. in Springer Handbook of Robotics (Cham, Springer International Publishing, 2016), pp. 1685–1728
27. T. Shibata, K. Wada, T. Saito, K. Tanie, Human interactive robot for psychological enrichment and therapy. Proc. AISB 5, 98–109 (2005)
28. BBC—A History of the World—Object : Bionic Hand (2020). http://www.bbc.co.uk/ahistoryoftheworld/objects/rnjCtSFqRxekdECEgBSwRw Accessed 29 Aug 2020
29. Rehabilitation Robots Market | Growth, Trends, and Forecast (2020−2025). https://mordorintelligence.com/industry-reports/rehabilitation-robots-market Accessed 29 Aug 2020
30. Global Rehabilitation Robotics Market Share, Growth and Opportunities (2020). https://www.gmiresearch.com/report/global-rehabilitation-robotics-market/ Accessed 29 Aug 2020
31. Disability (2020). https://www.who.int/health-topics/disability#tab=tab_1 Accessed 29 Aug 2020
32. I.J.Y. Wee, L. Kuo, J.C. Ngu, A systematic review of the true benefit of robotic surgery: ergonomics. Int. J. Med. Robot. Comput. Assist. Surg. 16(4) (2020). https://doi.org/10.1002/rcs.2113
33. M.A. Liss, E.M. McDougall, Robotic surgical simulation. Cancer J. 19(2), 124–129 (2013). https://doi.org/10.1097/PPO.0b013e3182885d79
34. R.H. Taylor, A. Menciassi, G. Fichtinger, P. Fiorini, P. Dario, Medical robotics and computer-integrated surgery. in Springer Handbook of Robotics (Springer, 2016), pp. 1657–1684
35. W. Sukovich, S. Brink-Danan, M. Hardenbrook, Miniature robotic guidance for pedicle screw placement in posterior spinal fusion: early clinical experience with the SpineAssist®. Int. J. Med. Robot. Comput. Assist. Surg. 2(2), 114–122 (2006). https://doi.org/10.1002/rcs.86
36. Technology—THINK Surgical®, Inc (2020). https://thinksurgical.com/professionals/technology/ Accessed 29 Aug 2020
37. M. Ghodoussi, S.E. Butner, Y. Wang, Robotic surgery-the transatlantic case. in Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), vol. 2 (2002), pp. 1882–1888
38. C. Freschi, V. Ferrari, F. Melfi, M. Ferrari, F. Mosca, A. Cuschieri, Technical review of the da Vinci surgical telemanipulator. Int. J. Med. Robot. Comput. Assist. Surg. 9(4), 396–406 (2013). https://doi.org/10.1002/rcs.1468
39. P. Marayong, M. Li, A.M. Okamura, G.D. Hager, Spatial motion constraints: theory and demonstrations for robot guidance using virtual fixtures. in 2003 IEEE International Conference on Robotics and Automation (Cat. No. 03CH37422), vol. 2 (2003), pp. 1954–1959
40. Intuitive | da Vinci | Robotic Surgical Systems (2020). https://www.intuitive.com/en-us/products-and-services/da-vinci Accessed 29 Aug 2020
41. L. Morelli, S. Guadagni, G. Di Franco, M. Palmeri, G. Di Candio, F. Mosca, Da Vinci single site© surgical platform in clinical practice: a systematic review. Int. J. Med. Robot. Comput. Assist. Surg. 12(4), 724–734 (2016). https://doi.org/10.1002/rcs.1713
42. B. Hagag, R. Abovitz, H. Kang, B. Schmitz, M. Conditt, RIO: robotic-arm interactive orthopedic system MAKOplasty: user interactive haptic orthopedic robotics. in Surgical Robotics (Boston, MA, Springer US, 2011), pp. 219–246
43. T. Haidegger, B. Benyó, L. Kovács, Z. Benyó, Force sensing and force control for surgical robots. in IFAC Proceedings Volumes (IFAC-PapersOnline), vol. 7(PART 1) (2009), pp. 401–406. https://doi.org/10.3182/20090812-3-DK-2006.0035
44. Mako | Stryker (2020). https://www.stryker.com/us/en/portfolios/orthopaedics/joint-replacement/mako-robotic-arm-assisted-surgery.html Accessed 29 Aug 2020
45. J.H. Palep, Robotic assisted minimally invasive surgery. J. Minimal Access Surg. 5(1), 1–7 (2009). Wolters Kluwer-Medknow Publications. https://doi.org/10.4103/0972-9941.51313
46. J. Burgner-Kahrs, D.C. Rucker, H. Choset, Continuum robots for medical applications: a survey. IEEE Trans. Robot. 31(6), 1261–1280 (2015). https://doi.org/10.1109/TRO.2015.2489500
47. H.M. Le, T.N. Do, S.J. Phee, A survey on actuators-driven surgical robots. Sensors Actuators A Phys. 247, 323–354 (2016). https://doi.org/10.1016/j.sna.2016.06.010
48. M. Runciman, A. Darzi, G.P. Mylonas, Soft robotics in minimally invasive surgery. Soft Robot. 6(4), 423–443 (2019). https://doi.org/10.1089/soro.2018.0136
49. Mazor X Stealth Edition Spine Robotics | Medtronic (2020). https://www.medtronic.com/us-en/healthcare-professionals/therapies-procedures/spinal-orthopaedic/spine-robotics.html Accessed 30 Aug 2020
50. Monarch Platform—Endoscopy Transformed—Auris Health (2019). https://www.aurishealth.com/monarch-platform Accessed 09 Jan 2019
51. S.S. Mapara, V.B. Patravale, Medical capsule robots: a renaissance for diagnostics, drug delivery and surgical treatment. J. Control. Release 261, 337–351 (Elsevier B.V., 2017). https://doi.org/10.1016/j.jconrel.2017.07.005
52. S. Ornes, Medical microrobots have potential in surgery, therapy, imaging, and diagnostics. Proc. Natl. Acad. Sci. USA 114(47), 12356–12358 (2017). https://doi.org/10.1073/pnas.1716034114
53. N. Simaan, R.M. Yasin, L. Wang, Medical technologies and challenges of robot-assisted minimally invasive intervention and diagnostics. Annu. Rev. Control. Robot. Auton. Syst. 1(1), 465–490 (2018). https://doi.org/10.1146/annurev-control-060117-104956
54. B.J. Nelson, I.K. Kaliakatsos, J.J. Abbott, Microrobots for minimally invasive medicine. Annu. Rev. Biomed. Eng. 12(1), 55–85 (2010). https://doi.org/10.1146/annurev-bioeng-010510-103409
55. CyberKnife S7 Launch | Accuray (2020). https://www.accuray.com/cyberknifes7/ Accessed 30 Aug 2020
56. Novalis−Brainlab (2020). https://www.brainlab.com/ru/resheniya-dlya-radiohirurgii/novalis/ Accessed 30 Aug 2020
57. Leksell Gamma Knife® Icon™ | Radiation Icon | Icon Treatment (2020). https://www.elekta.com/radiosurgery/leksell-gamma-knife-icon/ Accessed 30 Aug 2020
58. Exacure−precision for life (2021). https://www.exacure.com/ Accessed 12 Feb 2021
59. È. Coste-Manière, D. Olender, W. Kilby, R.A. Schulz, Robotic whole body stereotactic radiosurgery: clinical advantages of the Cyberknife® integrated system. Int. J. Med. Robot. Comput. Assist. Surg. 1(2), 28–39 (2005). https://doi.org/10.1002/rcs.39
60. B.S. Teh et al., Versatility of the Novalis system to deliver image-guided stereotactic body radiation therapy (SBRT) for various anatomical sites. Technol. Cancer Res. Treat. 6(4), 347–354 (2007). https://doi.org/10.1177/153303460700600412
61. OMNIBotics® » Corin Group (2020). https://www.coringroup.com/uk/solutions/omnibotics/ Accessed 30 Aug 2020
62. NAVIO Surgical System–Technology behind the machine | Smith & Nephew (2020). https://www.smith-nephew.com/professional/microsites/navio/navio-technology/product-overview/ Accessed 30 Aug 2020
63. Rosa Knee System (2020). https://www.zimmerbiomet.com/medical-professionals/knee/product/rosa-knee-system.html Accessed 30 Aug 2020
64. ExcelsiusGPS® Robotic Navigation Platform | Globus Medical (2020). https://www.globusmedical.com/musculoskeletal-solutions/excelsiusgps/ Accessed 30 Aug 2020
65. TiRobot Introduction—TINAVI | Intelligent Medical Solutions (2020). https://www.tinavi.com/index.php?m=content&c=index&a=lists&catid=9 Accessed 30 Aug 2020
66. Neuromate® stereotactic robot (2020). https://www.renishaw.ru/ru/neuromate-robotic-system-for-stereotactic-neurosurgery--10712 Accessed 30 Aug 2020
67. ROSA ONE® Brain (2020). https://www.zimmerbiomet.com/medical-professionals/cmf/rosa-brain.html Accessed 30 Aug 2020
68. Interventional Systems | Micromate (2021). https://www.interventional-systems.com/micromate/ Accessed 12 Feb 2021
69. Medical—BEC GmbH—robotic solutions (2020). https://www.b-e-c.de/us/medical Accessed 30 Aug 2020
70. PERFINT HEALTHCARE (2020). http://www.perfinthealthcare.com/robioEX_Overview.php Accessed 30 Aug 2020
71. Robotic Prostate Biopsy with MRI-Ultrasound Fusion | iSR’obot Mona Lisa (2020). https://biobotsurgical.com/monalisa/ Accessed 30 Aug 2020
72. What’s New | Senhance Surgical System (2020). https://www.senhance.com/us/home Accessed 30 Aug 2020
73. Versius Surgical Robotic System—CMR Surgical (2020). https://cmrsurgical.com/versius Accessed 30 Aug 2020
74. Rob Surgical | Bitrack system for minimally invasive surgery (2020). https://www.robsurgical.com/bitrack/ Accessed 30 Aug 2020
75. Revo (2020). http://revosurgical.com/#/main.html Accessed 30 Aug 2020
76. Surgica Robotica−Medical Technology Solutions (2020). https://www.surgicarobotica.com/ Accessed 30 Aug 2020
77. SPORT Surgical System | Titan Medical Inc (2020). https://titanmedicalinc.com/technology/ Accessed 30 Aug 2020
78. Corindus Corpath GRX for Cath Lab Safety (2020). https://www.corindus.com/corpath-grx/benefits Accessed 30 Aug 2020
79. Catheter Robotics--Other Resources (2020). http://catheterrobotics.com/product-main.htm Accessed 30 Aug 2020
80. Intuitive | Robotic-Assisted Bronchoscopy | Ion Platform (2020). https://www.intuitive.com/en-us/products-and-services/ion Accessed 30 Aug 2020
81. Flex® Robotic System: Expanding the reach of surgery® | Medrobotics (2020). https://medrobotics.com/gateway/flex-system-int/ Accessed 30 Aug 2020
82. Stereotaxis Products (2020). http://www.stereotaxis.com/products/ Accessed 30 Aug 2020
83. Robotic Hair Restoration Machine (2020). ARTAS iX™—Venus Concept USA. https://www.venusconcept.com/en-us/artas-ix.htm Accessed 30 Aug 2020
84. PillCam™ SB 3 System | Medtronic (2020). https://www.medtronic.com/covidien/en-us/products/capsule-endoscopy/pillcam-sb-3-system.html#pillcam-sb-3-capsule Accessed 30 Aug 2020
85. Disability and health (2020). https://www.who.int/news-room/fact-sheets/detail/disability-and-health Accessed 30 Aug 2020
86. D.U. Jette, R.L. Warren, C. Wirtalla, The relation between therapy intensity and outcomes of rehabilitation in skilled nursing facilities. Arch. Phys. Med. Rehabil. 86(3), 373–379 (2005). https://doi.org/10.1016/j.apmr.2004.10.018
87. H.I. Krebs et al., Robotic measurement of arm movements after stroke establishes biomarkers of motor recovery. Stroke 45(1), 200–204 (2014). https://doi.org/10.1161/STROKEAHA.113.002296
88. A. Peretti, F. Amenta, S.K. Tayebati, G. Nittari, S.S. Mahdi, Telerehabilitation: review of the state-of-the-art and areas of application. JMIR Rehabil. Assist. Technol. 4(2), e7 (2017). https://doi.org/10.2196/rehab.7511
89. Armeo®Power—Hocoma (2021). https://www.hocoma.com/solutions/armeo-power/ Accessed 13 Feb 2021
90. Lokomat®—Hocoma (2021). https://www.hocoma.com/solutions/lokomat/ Accessed 08 Feb 2021
91. P. Polygerinos, Z. Wang, K.C. Galloway, R.J. Wood, C.J. Walsh, Soft robotic glove for combined assistance and at-home rehabilitation. Robot. Auton. Syst. 73, 135–143 (2015). https://doi.org/10.1016/j.robot.2014.08.014
92. S. Hesse, C. Werner, M. Pohl, S. Rueckriem, J. Mehrholz, M.L. Lingnau, Computerized arm training improves the motor control of the severely affected arm after stroke: a single-blinded randomized trial in two centers. Stroke 36(9), 1960–1966 (2005). https://doi.org/10.1161/01.STR.0000177865.37334.ce
93. OMEGO Plus—Tyromotion (2020). https://tyromotion.com/en/products/omegoplus/ Accessed 30 Aug 2020
94. Erigo®—Hocoma (2020). https://www.hocoma.com/solutions/erigo/ Accessed 30 Aug 2020
95. S.J. Housman, K.M. Scott, D.J. Reinkensmeyer, A randomized controlled trial of gravity-supported, computer-enhanced arm exercise for individuals with severe hemiparesis. Neurorehabil. Neural Repair 23(5), 505–514 (2009). https://doi.org/10.1177/1545968308331148
96. InMotion ARM/HAND™ :: Bionik Laboratories Corp. (BNKL) (2020). https://www.bioniklabs.com/products/inmotion-arm-hand Accessed 30 Aug 2020
97. ReoGo™—Motorika—Motorika (2020). http://motorika.com/reogo/ Accessed 30 Aug 2020
98. Bi-Manu-Track—Reha Stim Medtec (2020). https://reha-stim.com/bi-manu-track/ Accessed 30 Aug 2020
99. AMADEO—Tyromotion (2020). https://tyromotion.com/en/products/amadeo/ Accessed 30 Aug 2020
100. LEXO—Tyromotion (2020). https://tyromotion.com/en/products/lexo/ Accessed 30 Aug 2020
101. THERA-Trainer: THERA-Trainer lyra (2020). https://www.thera-trainer.de/en/thera-trainerproducts/gait/thera-trainer-lyra/ Accessed 30 Aug 2020
102. System 4 Pro™—Dynamometers—Physical Medicine | Biodex (2020). https://www.biodex.com/physical-medicine/products/dynamometers/system-4-pro Accessed 30 Aug 2020
103. R. Colombo, V. Sanguineti, in Rehabilitation robotics: Technology and Application (Academic Press, 2018)
104. L.E. Miller, A.K. Zimmermann, W.G. Herbert, Clinical effectiveness and safety of powered exoskeleton-assisted walking in patients with spinal cord injury: systematic review with meta-analysis. in Medical Devices: Evidence and Research, March 22, vol. 9. (Dove Medical Press Ltd, 2016), pp. 455–466. https://doi.org/10.2147/MDER.S103102
105. D.R. Louie, J.J. Eng, Powered robotic exoskeletons in post-stroke rehabilitation of gait: A scoping review. J. NeuroEng. Rehab. 13(1), 1–10 (2016), BioMed Central Ltd. https://doi.org/10.1186/s12984-016-0162-5
106. EksoUE–Ekso Bionics (2020). https://eksobionics.com/eksoue/ Accessed 30 Aug 2020
107. HAND OF HOPE | EXOESQUELETO DE MANO | ICTUS (2020). https://en.gogoa.eu/hand-of-hope Accessed 30 Aug 2020
108. ALEx—Wearable RoboticsWearable Robotics (2020). http://www.wearable-robotics.com/kinetek/products/alex/ Accessed 30 Aug 2020
109. Optimal-G™ Pro—World’s most advanced robotic gait rehabilitation platform—Motorika—Motorika (2020). http://motorika.com/optimal-g-pro/ Accessed 30 Aug 2020
110. Robot Assisted Gait Training Rehabilitation System—WALKBOT (2020). http://walkbot.co.kr/ Accessed 30 Aug 2020
111. EksoNR—Ekso Bionics (2020). https://eksobionics.com/eksonr/ Accessed 30 Aug 2020
112. GOGOA | EXOESQUELETOS | FABRICANTES (2020). https://en.gogoa.eu/ Accessed 30 Aug 2020
113. RoboGait—Bama Teknoloji (2021). http://www.bamateknoloji.com/product-category/robotic-rehabilitation-3-en/robogait-en/?lang=en Accessed 13 Feb 2021
114. HOME—Wandercraft (2020). https://www.wandercraft.eu/ Accessed 30 Aug 2020
115. ExoAtlet | ExoAtlet (2020). https://exoatlet.ru/ Accessed 30 Aug 2020
116. Indego | Powering People Forward (2020). http://www.indego.com/indego/us/en/home Accessed 30 Aug 2020
117. Rex Bionics—Reimagining Rehabilitation (2020). https://www.rexbionics.com/ Accessed 30 Aug 2020
118. ABLE Human Motion | Walk again with ABLE exoskeleton (2020). https://www.ablehumanmotion.com/ Accessed 30 Aug 2020
119. Atlas Pediatric Exo—Patients—Marsi Bionics (2020). https://www.marsibionics.com/en/atlas-pediatric-exo-pacientes/ Accessed 30 Aug 2020
120. Andago®—Hocoma (2020). https://www.hocoma.com/solutions/andago/ Accessed 30 Aug 2020
121. DIEGO—Tyromotion (2020). https://tyromotion.com/en/products/diego/ Accessed 30 Aug 2020
122. Hero Arm—an affordable, advanced and intuitive bionic arm (2020). https://openbionics.com/hero-arm/ Accessed 30 Aug 2020
123. H.F.M. Van der Loos et al., ProVAR assistive robot system architecture. in Proceedings IEEE International Conference on Robotics and Automation, vol. 1. (1999), pp. 741–746. https://doi.org/10.1109/robot.1999.770063
124. R. Gelin, B. Lesigne, M. Busnel, J.P. Michel, The first moves of the AFMASTER workstation. Adv. Robot. 14(7), 639–649 (2000). https://doi.org/10.1163/156855301742067
125. M.J. Johnson, E. Guglielmelli, G.A. Lauro, C. Laschi, M.C. Carrozza, P. Dario, GIVING-A-HAND system: the development of a task-specific robot appliance. in Advances in Rehabilitation Robotics (Springer Berlin, Heidelberg, 2006), pp. 127–141
126. Z. Bien et al., Integration of a rehabilitation robotic system (KARES II) with human-friendly man-machine interaction units. Auton. Robots 16(2), 165–191 (2004). https://doi.org/10.1023/B:AURO.0000016864.12513.77
127. Neater Eater Robotic—Neater Solutions (2020). https://neater.co.uk/neater-eater-robotic/ Accessed 30 Aug 2020
128. Assistive Innovations—iEAT Robot | Assistive feeding and eating robot for people (2020). https://www.assistive-innovations.com/eatingdevices/ieat-robot Accessed 30 Aug 2020
129. Assistive technology for Eating device | Assistive technology | Kinova (2020). https://www.kinovarobotics.com/en/products/assistive-technologies/eating-devices Accessed 30 Aug 2020
130. Assistive Innovations—iARM (2020). https://assistive-innovations.com/robotic-arms/iarm Accessed 30 Aug 2020
131. Jaco | Robotic arm | Kinova (2020). https://www.kinovarobotics.com/en/products/assistive-technologies/kinova-jaco-assistive-robotic-arm Accessed 30 Aug 2020
132. P. Dario, E. Guglielmelli, C. Laschi, G. Teti, MOVAID: a personal robot in everyday life of disabled and elderly people. Technol. Disabil. 10(2), 77–93 (1999). IOS Press. https://doi.org/10.3233/tad-1999-10202
133. R. Bevilacqua et al., Robot-era project: Preliminary results on the system usability. Lecture Notes in Comput. Sci. (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 9188, 553–561 (2015). https://doi.org/10.1007/978-3-319-20889-3_51
134. B. Graf, U. Reiser, M. Hägele, K. Mauz, P. Klein, Robotic home assistant care-O-bot® 3—product vision and innovation platform. in Proceedings of IEEE Workshop on Advanced Robotics and its Social Impacts, ARSO (2009), pp. 139–144. https://doi.org/10.1109/ARSO.2009.5587059
135. L. Lammer, A. Huber, A. Weiss, M. Vincze, Mutual care: how older adults react when they should help their care robot (2021). http://hobbitproject.eu Accessed 14 Feb 2021
136. Mobius Mobility—The next generation iBOT is here. Offering extraordinary levels of mobility and performance. With its rich history of innovation. Now fully updated with the latest technology. Ready to go where you want to go. https://mobiusmobility.com/ Accessed 09 Feb 2021
137. S.P. Levine, D.A. Bell, L.A. Jaros, R.C. Simpson, Y. Koren, J. Borenstein, The navchair assistive wheelchair navigation system. IEEE Trans. Rehabil. Eng. 7(4), 443–451 (1999). https://doi.org/10.1109/86.808948
138. R.C. Simpson, D. Poirot, F. Baxter, The hephaestus smart wheelchair system. IEEE Trans. Neural Syst. Rehabil. Eng. 10(2), 118–122 (2002). https://doi.org/10.1109/TNSRE.2002.1031980
139. W.K. Song, H. Lee, Z. Bien, KARES: intelligent wheelchair-mounted robotic arm system using vision and force sensor. Rob. Auton. Syst. 28(1), 83–94 (1999). https://doi.org/10.1016/S0921-8890(99)00031-7
140. G. Lacey, S. Macnamara, User involvement in the design and evaluation of a smart mobility aid. J. Rehabil. Res. Dev. 37(6) (2000)
141. T.A. Swift, K.A. Strausser, A.B. Zoss, H. Kazerooni, Control and experimental results for post stroke gait rehabilitation with a prototype mobile medical exoskeleton. in ASME 2010 Dynamic Systems and Control Conference, DSCC2010, January, vol. 1. (2010), pp. 405–411. https://doi.org/10.1115/DSCC2010-4204
142. ReWalk Robotics—More Than Walking (2020). https://rewalk.com/ Accessed 30 Aug 2020
143. Hal For Medical Use (Lower Limb Type) Cyberdyne (2020). https://www.cyberdyne.jp/english/products/LowerLimb_medical.html Accessed 30 Aug 2020
144. FreeGait—Bama Teknoloji (2020). http://www.bamateknoloji.com/products/robotic-rehabilitation-3/1971/?lang=en Accessed 30 Aug 2020
145. Myomo—Medical Robotics Solutions for Stroke, BPI, Upper Limb Paralysis (2020). https://myomo.com/ Accessed 30 Aug 2020
146. G.A. Bertos, E.G. Papadopoulos, Upper-limb prosthetic devices. in Handbook of Biomechatronics (Academic Press, 2018), pp. 177–240
147. S. Raspopovic et al., Bioengineering: restoring natural sensory feedback in real-time bidirectional hand prostheses. Sci. Transl. Med. 6(222), 222ra19–222ra19 (2014). https://doi.org/10.1126/scitranslmed.3006820
148. L.R. Hochberg et al., Neuronal ensemble control of prosthetic devices by a human with tetraplegia. Nature 442(7099), 164–171 (2006). https://doi.org/10.1038/nature04970
149. D. Berry, Microprocessor prosthetic knees. Phys. Med. Rehab. Clinics of North America 17(1), 91–113 (Elsevier, 2006). https://doi.org/10.1016/j.pmr.2005.10.006
150. POWER KNEE™ (2020). https://www.ossur.com/en-us/prosthetics/knees/power-knee Accessed 30 Aug 2020
151. LUKE Arm Detail Page—Mobius Bionics (2020). https://www.mobiusbionics.com/lukearm/ Accessed 30 Aug 2020
152. J. Wainer, D.J. Feil-Seifer, D.A. Shell, M.J. Matarić, Embodiment and human-robot interaction: a task-based perspective. in Proceedings—IEEE International Workshop on Robot and Human Interactive Communication (2007), pp. 872–877. https://doi.org/10.1109/ROMAN.2007.4415207
153. D. Leyzberg, S. Spaulding, M. Toneva, B. Scassellati, The physical presence of a robot tutor increases cognitive learning gains. in Proceedings of the Annual Meeting of the Cognitive Science Society, vol. 34. (2012), pp. 34
154. PARO Therapeutic Robot (2021). http://www.parorobots.com/ Accessed 09 Feb 2021
155. L.D. Riek, Robotics technology in mental health care. in Artificial Intelligence in Behavioral and Mental Health Care (Elsevier Inc., 2016), pp. 185–203
156. E. Martinez-Martin, A.P. del Pobil, in Personal Robot Assistants for Elderly Care: An Overview (2018), pp. 77–91
157. S.J. Stroessner, J. Benitez, The social perception of humanoid and non-humanoid robots: effects of gendered and machinelike features. Int. J. Soc. Robot. 11(2), 305–315 (2019). https://doi.org/10.1007/s12369-018-0502-7
158. M. Heimerdinger, A. LaViers, Modeling the interactions of context and style on affect in motion perception: stylized gaits across multiple environmental contexts. Int. J. Soc. Robot. 11(3), 495–513 (2019). https://doi.org/10.1007/s12369-019-00514-1
159. C. Moro, S. Lin, G. Nejat, A. Mihailidis, Social robots and seniors: a comparative study on the influence of dynamic social features on human-robot interaction. Int. J. Soc. Robot. 11(1), 5–24 (2019). https://doi.org/10.1007/s12369-018-0488-1
160. A.E. Block, K.J. Kuchenbecker, Softness, warmth, and responsiveness improve robot hugs. Int. J. Soc. Robot. 11(1), 49–64 (2019). https://doi.org/10.1007/s12369-018-0495-2
161. C.J.A.M. Willemse, J.B.F. van Erp, Social touch in human-robot interaction: robot-initiated touches can induce positive responses without extensive prior bonding. Int. J. Soc. Robot. 11(2), 285–304 (2019). https://doi.org/10.1007/s12369-018-0500-9
162. L.I. Ismail, T. Verhoeven, J. Dambre, F. Wyffels, Leveraging robotics research for children with autism: a review. Int. J. Soc. Robot. 11(3), 389–410 (Springer Netherlands, 2019). https://doi.org/10.1007/s12369-018-0508-1
163. E. Mordoch, A. Osterreicher, L. Guse, K. Roger, G. Thompson, Use of social commitment robots in the care of elderly people with dementia: a literature review. Maturitas 74(1), 14–20 (Elsevier, 2013). https://doi.org/10.1016/j.maturitas.2012.10.015
164. D. Karunarathne, Y. Morales, T. Nomura, T. Kanda, H. Ishiguro, Will older adults accept a humanoid robot as a walking partner? Int. J. Soc. Robot. 11(2), 343–358 (2019). https://doi.org/10.1007/s12369-018-0503-6
165. M.J. Matarić, J. Eriksson, D.J. Feil-Seifer, C.J. Winstein, Socially assistive robotics for post-stroke rehabilitation. J. Neuroeng. Rehabil. 4(1), 1–9 (2007). https://doi.org/10.1186/1743-0003-4-5
166. H. Robinson, B. MacDonald, N. Kerse, E. Broadbent, The psychosocial effects of a companion robot: a randomized controlled trial. J. Am. Med. Dir. Assoc. 14(9), 661–667 (2013). https://doi.org/10.1016/j.jamda.2013.02.007
167. H. Petrie, J. Darzentas, Older people and robotic technologies in the home: perspectives from recent research literature. in ACM International Conference Proceeding Series, June 2017, vol. Part F128530. (2017), pp. 29–36. https://doi.org/10.1145/3056540.3056553
168. Z.H. Khan, A. Siddique, C.W. Lee, Robotics utilization for healthcare digitization in global COVID-19 management. Int. J. Environ. Res. Public Health 17(11), 3819 (2020). https://doi.org/10.3390/ijerph17113819
169. XAG Robot Joins Drone Fleet to Initiate Ground Air Disinfection in Coronavirus Battle (2021). https://www.xa.com/en/news/official/xag/72 Accessed 14 Feb 2021
Chapter 13
Smart Healthcare, IoT and Machine Learning: A Complete Survey
Valerio Bellandi, Paolo Ceravolo, Ernesto Damiani, and Stefano Siccardi
Abstract In recent years, monitoring people’s health status has become one of the major application fields of IoT research. Many works and proposals have been presented in the literature, some with a specific focus and others with a general-purpose objective. Motivated by this, in this chapter we analyze the state of the art in depth, focusing on (i) the architectural aspects and (ii) the algorithm and system point of view.
Keywords Internet of Things (IoTs) · Smart healthcare · Machine learning · Analytics · Cloud computation
13.1 Introduction
Smart healthcare systems use Internet of Things (IoT) devices situated in living environments or worn by patients to monitor health status at all times. These solutions are becoming increasingly popular as the technology matures, which makes them attractive alternatives to the hospitalization of elderly people, especially for long-term non-acute diseases. Examples are balance disorders, hypertension, diabetes, and depression.
V. Bellandi (B) · P. Ceravolo · E. Damiani · S. Siccardi
Computer Science Department, Università Degli Studi di Milano, Via Celoria 18, Milano, MI, Italy
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_13
The review study in [1] reports that, from 2016 onward, there has been a significant increase in the development of experimental laboratory research methods, with a large number of medical IoT applications devoted to neurology (31%), cardiology (28%), and psychiatry or psychology (20%). In 67% of cases, the IoT infrastructure was located at the patient’s home. These systems are normally connected to some form of cloud system that collects and analyzes the data coming from a network of several patients. Feedback to patients can be provided through their physicians, who may receive reports from the system, or directly by the system. In the latter case, feedback may consist of periodic monitoring reports or real-time alerts. Data analysis is of course very important for advancing the knowledge of the diseases themselves and for the study of trends, symptoms, prognosis, etc. Moreover, the system can learn from the data it collects, using machine learning algorithms, in order to better assess the status of the patients and give them proper feedback.
The infrastructure that makes smart healthcare systems operative may appear unproblematic, since its elements are interconnected by straightforward services. However, its efficiency and effectiveness are affected by multiple design choices [2].
1. Large volumes of data are continuously produced and must be processed and stored. Unnecessary overheads must be avoided. Suitable filters should be applied to the data. The processing load should be appropriately balanced between nodes. The network architecture should support data replication.
2. The machine learning algorithms must be carefully selected to ensure that the mix of data available is suitable for the specific problems to be faced. Designers have to decide whether to use the measurements of each sensor independently or to correlate the outputs of different sensors with each other. Holistic algorithms, learning from a whole population of patients, may be used in some cases. In other situations, individual characteristics are important to provide case-specific recommendations.
3. All data is subject to privacy issues; therefore system security, authorization mechanisms, and intrusion detection must be carefully considered. Anonymization of data should be applied whenever possible.
4. The wide heterogeneity of sensors, wearable devices, clinical equipment, operating systems, etc. makes the health network difficult to manage.
To guide the design choices to be addressed in creating a smart healthcare system, we review the existing design patterns, considering the following main areas:
• the architecture, i.e., the fundamental structures of a software system, the elements composing the system and the relations among those elements;
• the pipeline, which describes the data flow through the system’s elements;
• the solutions and proposals for the privacy problem.
The chapter is organized as follows: in Sect. 13.2 we describe the research area and state the research questions. In the following sections, we review available papers describing projects and methods to answer the stated questions.
13.2 Architecture and Pipeline
The elements of an architecture can be grouped into levels of similar function. This section presents the functions commonly identified in the literature (see e.g. [3–6]).
The sensors level or IoT level is a prerequisite for any smart healthcare system. It is composed of the devices interacting with the user and the environment: edge devices, gateways, sensors, etc. Even if its main function is collecting data, intelligent actions can be enabled directly at this level. For example, specific events can be recorded or alerts can be created if some threshold values are exceeded; data can be collected and aggregated; objects can be classified and labeled based on pre-defined structured descriptions before being transferred to the next level of the architecture.
Above the sensor level, a local integration level or Edge level can be found. It collects data generated by the sensors, acting as a hub that conveys the local data transactions to a global level in a single step, but it may perform narrow analytical tasks to improve its effectiveness in aggregating or summarizing data. Its main goal, however, is to establish communication between the IoT devices and the Fog nodes or the Cloud data center. In most cases it must manage two different kinds of communication: a Local Area Network or Wide Area Network made of the end devices, based on 3G/4G/Wi-Fi/Bluetooth and other protocols, and a communication channel to reach the Cloud, based on the Internet infrastructure. The term “Local network management” or Edge level is often used for the first set of functions and “Cloud communication management” for the latter. When discussing the models from a data pipeline point of view, this level is sometimes named the “data collection layer”. In older systems, neither specific computational tasks nor large amounts of data storage were allocated at this level. However, later on, the concept of “local smart applications” emerged, and an increasing number of systems run analytical and even machine learning software on edge devices. In particular, real-time monitoring applications may be placed at this level. In the following sections, we will consider an edge level to be present only when some application is running on it, taking for granted that a gateway for networking is in general necessary.
An optional intermediate integration level or Fog level is increasingly common. Several tasks can be performed by fog devices, e.g. data integration and storage, and more demanding computations than those that can be afforded at the edge level. This level may be omitted in some implementations, especially in the older ones. While edge devices perform computation using data from local sensors only, and are therefore concerned with a single user or local area, fog devices are usually organized in a network whose nodes may share among themselves tasks coming from several IoT or Edge nodes. Some mechanisms for task allocation and resource management are often provided; this level can however delegate more demanding computations to the cloud.
The top level is the global integration level or Cloud and Big Data level. It processes the large amount of data coming from many IoT or Edge instances to perform classification, prediction, trend analysis, data visualization, reporting, etc. At this level, several services can be performed to enhance the performance of the whole system, for instance: monitoring the system status and performance, storing
historical information about resource demand, job/task allocation to the best available machines, big data analytics, security management, performance prediction.
At each level, the number of functions addressing intelligent and data storage tasks can vary significantly. By “intelligence” we mean the ability to filter, cluster, analyze and modify data; possibly to give feedback to the user or to the lower level; and to decide whether to pass all or only selected data to the upper level. In most cases the hardest computational tasks are performed at the cloud level using so-called back-end servers, which increases the latency of a system that should react quickly to critical situations.
Some slightly different descriptions of the architecture can be found in the literature, either for the number of levels or for their meaning; this latter point depends in general on the authors’ interest in highlighting one aspect or another of the systems. For instance, the IoT Big Data workflow is described in [7] as a four-step process: the first step is data collection at the sensor level. Data is then sent to Big Data frameworks like Hadoop, and machine learning algorithms are applied to extract meaningful information. The output is then sent to patients or physicians in a suitable format. In [8] six levels are considered, as the cloud level processing is detailed into data mining activities, knowledge derivation, and user services provision. The following processes are considered: data generation; data gathering and consumption; real-time data processing; data storing; data visualizing.
Privacy is an issue at all system levels. Depending on the architecture, however, the higher levels may contain generic or anonymized data only, so that the criticality of the privacy problem is mostly confined to the lower levels. In Sect. 13.6, when describing the data pipeline, we will consider the main risks at each system level and the mechanisms that have been proposed to overcome them.
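As a rough illustration of how the levels described in this section divide the work, the following Python sketch passes heart-rate readings through a sensor-level threshold check and an edge-level aggregation step whose result would be forwarded to the upper level. It is a minimal sketch under stated assumptions: the names `Reading`, `sensor_alert`, `edge_summarize` and `run_pipeline`, as well as the threshold values, are hypothetical and are not taken from any of the surveyed systems.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical alert limits, for illustration only (not clinical guidance).
HR_LOW, HR_HIGH = 40, 130


@dataclass
class Reading:
    patient_id: str
    heart_rate: int  # beats per minute


def sensor_alert(r: Reading) -> Optional[str]:
    """Sensor level: create an alert when a threshold value is exceeded."""
    if r.heart_rate < HR_LOW or r.heart_rate > HR_HIGH:
        return f"ALERT {r.patient_id}: heart rate {r.heart_rate}"
    return None


def edge_summarize(window: List[Reading]) -> Dict[str, float]:
    """Edge level: aggregate a window of readings before forwarding it."""
    rates = [r.heart_rate for r in window]
    return {"n": len(rates), "min": min(rates), "max": max(rates), "mean": mean(rates)}


def run_pipeline(window: List[Reading]):
    """Return (alerts produced locally, summary passed to the upper level)."""
    alerts = [a for a in map(sensor_alert, window) if a is not None]
    return alerts, edge_summarize(window)


window = [Reading("p1", 72), Reading("p1", 75), Reading("p1", 141)]
alerts, summary = run_pipeline(window)
print(alerts)
print(summary)
```

In this sketch only the summary, not the raw readings, would travel upward, which is one simple way to realize the idea that each level decides whether to pass all or only selected data to the next one.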
13.2.1 Research Questions and Methodology Adopted
In this chapter we adopt the following main research questions:
• What are the characteristics of the architectures used in terms of the number of levels, their processing capabilities, and interactions? How are resources managed? Can they produce real-time alerts? Can they be used by mobile users?
• How is the pipeline of data collection and usage organized, as a consequence of the architectural design? Which kinds of data are stored at each level and which ones are sent to the next level? What are the main privacy and security issues at each system level?
To carry out this review of the state of the art, we adopted a methodology that used the APIs provided by arxiv.org, ScienceDirect (by Elsevier), Springer Nature, and IEEE Xplore to perform some text mining, using search keys related to smart healthcare, artificial intelligence, internet of things and so on. We considered the most relevant and most recent papers and obtained a first set of 964 references. We performed a rough clustering of the abstracts, then refined the set of references, discarding papers that did not seem relevant. Moreover, we distinguished review papers from papers describing specific
projects, systems, or implementation proposals, which were the focus of our review. Then we defined some entities that we wanted to track, for instance “data preparation”, and some keywords to represent each of them, like “data validation”, “feature extraction”, “data aggregation” and so on. A search for these entities in the selected papers was run; its output is a list of entity occurrences and their connections, estimated quite simply in terms of closeness in the papers. In this way, we obtained basic representations of the structure or architecture of the proposed systems, which give a general picture of the situation. These representations were then checked against the papers in order to fix misrepresentations and to add some details.
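The entity search described above can be sketched roughly as follows. This is a minimal Python illustration and not the tool actually used for the survey: the keyword lists, the `WINDOW` size used to measure closeness, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical entities and the keywords that represent them.
ENTITIES = {
    "data preparation": ["data validation", "feature extraction", "data aggregation"],
    "edge": ["edge", "gateway"],
    "cloud": ["cloud"],
}
WINDOW = 12  # assumed closeness: entities within this many words are connected


def find_entities(text):
    """Return sorted (word_position, entity) pairs for every keyword hit."""
    lowered = text.lower()
    hits = []
    for entity, keywords in ENTITIES.items():
        for kw in keywords:
            for m in re.finditer(r"\b" + re.escape(kw) + r"\b", lowered):
                # Position = number of words that precede the match.
                position = len(lowered[: m.start()].split())
                hits.append((position, entity))
    return sorted(hits)


def cooccurrences(text):
    """Count pairs of distinct entities that occur close to each other."""
    pairs = Counter()
    for (p1, e1), (p2, e2) in combinations(find_entities(text), 2):
        if e1 != e2 and abs(p1 - p2) <= WINDOW:
            pairs[tuple(sorted((e1, e2)))] += 1
    return pairs


sample = ("Feature extraction runs on the gateway, "
          "while long-term storage is delegated to the cloud.")
print(cooccurrences(sample))
```

The resulting pair counts can be read as weighted connections between entities; a manual check against the papers, as described above, would still be needed to correct spurious matches.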
13.3 The General Picture of Levels
Figure 13.1 represents the general picture of the architectures. We consider the number of levels as identifying the architecture type (second column in the picture); the first column contains the main goals achieved by the system, the third column the network technologies, and the last column the main computation types; the thickness of the nodes is proportional to the number of mentions we found in the papers. In this general diagram we also included a few papers that are related to some specific topic and that are not found in the more detailed architectural pictures that follow in Sect. 13.3.1, for instance particular devices or algorithms, or systems devoted to a single pathology or to solving specific technical issues, like energy consumption. They are [9–32]. We did not represent relationships that occur with less than one half of the maximum frequency. The nodes that appear disconnected are actually of the second order in terms of mentions. On the side of the projects’ goals and functionalities, they include the support for users’ mobility, the customization of alerts and advice according to individual characteristics, and device interoperability. However important these goals are, they are addressed mainly by specific works. On the side of the tools, we found that the Internet is not mentioned very often, probably because it is so common that it is taken for granted. Among computations, ontologies and semantics are still used for specific tasks only, among which interoperability is one of the most frequently encountered. System management, that is, task allocation, scheduling, synchronization, etc., is considered in a few specific papers. On the other hand, we see that sensors are the most connected node, as is to be expected because they are the main source of information in an IoT environment. In Sect. 13.3.1 we will describe some functions of the levels, and in Figs. 13.2, 13.3, 13.4 and 13.5 we detail the characteristics of the main subtypes of architectures that we found.
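The pruning rule used for the general diagram, where relationships occurring with less than one half of the maximum frequency are not drawn, can be illustrated with a short Python sketch. The function name `prune_edges` is hypothetical, and the relationship counts below are invented placeholders, not figures from the survey.

```python
def prune_edges(edge_counts, fraction=0.5):
    """Keep only relationships whose count is at least `fraction` of the maximum."""
    if not edge_counts:
        return {}
    cutoff = fraction * max(edge_counts.values())
    return {edge: n for edge, n in edge_counts.items() if n >= cutoff}


# Placeholder counts (hypothetical, for illustration only).
counts = {
    ("sensors", "monitoring"): 40,
    ("sensors", "machine learning"): 22,
    ("ontologies", "interoperability"): 6,
}
kept = prune_edges(counts)
print(sorted(kept))
```

With such a rule, a node whose relationships all fall below the cutoff appears disconnected in the drawing even though it is mentioned in the papers.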
13.3.1 Architectures for the Local Integration Level: The Edge Level
Local integration networks differ in many ways from one implementation schema to another, both in the networking technology and topology used and in the processing capabilities at this level. The choice of the networking technology involves a trade-off between low energy consumption with limited throughput and area coverage, on the one hand, and higher energy demand with better performance, on the other. The concept of Personal Area Network (PAN) is sometimes used to denote a small network made of sensors and devices communicating the status of the patient (see [4]). Often there are power limitations, so that the devices’ functionalities and the network throughput may be low. The information needs to be secured. Some standards used for PANs are, for instance, Bluetooth, IEEE 802.15.4, IEEE 802.15.3, etc. Both master-slave and peer-to-peer architectures can be found. We note that these architectures are fundamentally different from WAN networks based e.g. on 2G/3G/4G technologies, which are capable of long-range communication. A third type of network [5] is the Low Power Wide Area Network, which can transmit small amounts of data over long distances. It is well suited for medical devices that may transmit data a few times per hour because the values do not change suddenly. Such networks are based on LoRa, LoRaWAN, SigFox, and similar technologies. A different approach [33] considers the use of 5G technology to enhance cellular coverage and network performance, addressing security issues at the same time. A review of the most common communication technologies can be found in [34]. The term “device teaming” [5] is used for architectures where, at level 2, many devices synchronize to send alarms. For instance, data relevant to blood glucose, heartbeat, blood pressure, and temperature can be gathered together.
An important distinction in monitoring systems is between static remote monitoring (patients at home or in hospitals) and dynamic monitoring of freely moving patients. The network architecture requirements are of course more demanding in the latter case.
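Device teaming, as mentioned in this section, can be pictured as several devices contributing their latest values to one shared alarm decision. The following Python sketch is only an assumption about how such a step might look: the signal names, the normal ranges, and the rule that at least two out-of-range signals trigger a joint alarm are hypothetical and do not come from [5].

```python
# Hypothetical normal ranges (illustrative values, not clinical guidance).
NORMAL_RANGES = {
    "glucose_mg_dl": (70, 180),
    "heart_rate_bpm": (50, 120),
    "systolic_mm_hg": (90, 140),
    "temperature_c": (35.5, 38.0),
}


def out_of_range(latest):
    """Return the signals whose latest value falls outside its assumed range."""
    flagged = []
    for signal, value in latest.items():
        low, high = NORMAL_RANGES[signal]
        if not (low <= value <= high):
            flagged.append(signal)
    return sorted(flagged)


def team_alarm(latest, min_signals=2):
    """Raise one joint alarm when at least `min_signals` signals are flagged."""
    flagged = out_of_range(latest)
    return flagged if len(flagged) >= min_signals else []


latest = {"glucose_mg_dl": 210, "heart_rate_bpm": 128,
          "systolic_mm_hg": 118, "temperature_c": 36.9}
print(team_alarm(latest))
```

Requiring agreement between devices is one possible design choice to reduce alarms caused by a single faulty sensor; a real system might instead weight signals differently or alarm on any single critical value.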
13.3.2 Task Allocation and Resource Management: The Fog Level
Another distinctive point is the choice of the level where processing tasks and resources are allocated and the degree to which allocation is dynamic. Even at the edge level, portable devices like smartphones, smartwatches, compact embedded systems, etc. can perform preprocessing and some low-level processing. For instance, [35] considers computing at the edge level and lists a number of edge components: edge devices, which produce data; edge gateways, where processing can be done and which work as windows on larger environments beyond the edge; fat clients, which are applications for data processing on edge devices; and edge computing equipment connected to Internet computing services. In [5], edge computing is considered for
some basic tasks, i.e., outlier detection, data validation, compression and encryption. At the Fog level, it is expected that PCs and servers gather data from sensor networks to perform local processing and storage. A review of IoT monitoring solutions for medical applications can be found in [36]; it shows a trend toward allowing ML computing even at the Edge level. Examples are described in the fields of physiological parameters analysis, rehabilitation systems, dietary assessment, epidemic diseases, and diabetes treatment.
Some architectures [3] consider the possibility that a Fog Server Manager manages several Fog Data Servers at level 3 of a local IoT network and can forward processing requests to the cloud when local resources are unavailable. A mechanism to optimize resource usage between several fog servers and cloud servers is described, using a particle swarm algorithm. It is reported that response time, latency, and energy consumption are reduced by more than 10% using this resource management approach. The goal is that high-speed computations are performed locally whenever possible and their results are communicated just-in-time [37]. Of course, it must be taken into account that local processing resources are limited, as they are based on low-end computers, personal devices, and mobile devices.
A fog computing scenario is also described in [8], where the fog nodes are Raspberry Pi and Arduino devices. A cloudlet is defined as a group of fog nodes that collaboratively divide the tasks and process them in parallel. AI and decision-making systems are deployed on those nodes. Alert notifications are sent through an Android application to the physician and caretaker.
Other proposals [38] consider Service Composition to give the users the required level of service within the required time limits. Service orchestration and service choreography methods are used, the former being a centralized process and the latter a collaborative one based on the sequence of messages issued by the involved processes. Specific languages for Service Composition in IoT have been described.
Some methods have been proposed [36] to distribute the inference tasks of AI and ML between the Cloud, the Fog, and the Edge levels. In [39] a hierarchical computing architecture, HiCH, is considered, where a Convolutional Neural Network is employed to perform real-time heart disease detection. HiCH is an extension of the MAPE-K model introduced by IBM.
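The role of a Fog Server Manager that forwards requests to the cloud when local resources are unavailable can be sketched as follows. Note that this sketch deliberately uses a simple greedy rule (the fog server with the most free capacity is chosen first) in place of the particle swarm optimization reported in [3]; the class names, capacities, and task costs are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FogServer:
    name: str
    capacity: int  # abstract units of work the server can hold
    load: int = 0

    def free(self) -> int:
        return self.capacity - self.load


class FogServerManager:
    """Greedy allocator: fog server with most free capacity first, cloud as fallback."""

    def __init__(self, servers: List[FogServer]):
        self.servers = servers

    def allocate(self, task_cost: int) -> str:
        candidates = [s for s in self.servers if s.free() >= task_cost]
        if not candidates:
            return "cloud"  # no local resources available: forward upward
        target = max(candidates, key=FogServer.free)
        target.load += task_cost
        return target.name


manager = FogServerManager([FogServer("fog-a", 4), FogServer("fog-b", 6)])
placements = [manager.allocate(cost) for cost in (3, 3, 3, 3)]
print(placements)
```

A greedy rule like this one ignores latency and energy, which are precisely the quantities that an optimization-based allocator would try to balance; it is shown only to make the fallback-to-cloud behavior concrete.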
13.3.3 Global Integration of Tasks and Resources: The Cloud Level
Cloud and IoT technologies are complementary paradigms [40]. For instance, IoT lacks security, while Cloud Computing can provide a high level of privacy and security. Thanks to the Cloud, personalized ubiquitous applications can be delivered through the IoT, with automatic data collection and distribution at a low cost. Large amounts of data can be stored in the cloud, which also offers virtually unlimited processing power. On the other hand, as already mentioned, critical tasks that need low-latency responses may benefit from local processing.
At the Cloud level, it is expected that high-performance Big Data processing takes place. Trends in cloud computing related to Big Data and IoT have been reviewed in [41]. We mention two proposals for Cloud-IoT integration in the health domain: OpenIoT and the SENSEI project. The paper [8] lists some available IoT platforms to perform processing in the cloud: Google’s Cloud Pub/Sub, BigQuery and DataLab; Amazon’s Kinesis, Redshift, and QuickSight; Microsoft Azure components. Other available Cloud-IoT platforms are listed in [6]: KAA project, Sensor cloud, Etherios, Exosite, Arkessa, Axeda, Nimbits, ThingWorx. The same paper mentions Xively and ThingSpeak as available services to facilitate data collection.
13.3.4 Algorithms and Data Analytics
Even if they are not properly at an architectural level, in this section we consider algorithms dedicated to extracting useful information from the collected data, as they describe the levels’ functionalities. Machine learning algorithms can use IoT data to classify symptoms and behaviors and to produce disease clustering, diagnosis, prognosis, risk assessments, etc. Depending on the system’s main purposes, an algorithm can use all the data from a healthcare network, the data of a subnetwork only (e.g. a local area), or single patients’ data to produce personalized analyses. Accordingly, it will be necessary to send more or less data to the cloud and to define strategies for data aggregation.
Many machine learning algorithms have been used, at several network levels and for a number of different goals. For instance, Convolutional Neural Networks have been used for several tasks, e.g. heart disease detection [39], and low-resource unsupervised machine learning has been used for the telemonitoring of Parkinson’s patients [42]. A review of the types of Neural Networks that have been used in IoT in general can be found in [43]: it is reported that CNNs have been used for voice diseases, fall detection, and ECG monitoring, and RNNs for activity recognition. Another review paper [44] takes into account more specifically deep learning applications for health monitoring and disease analysis, and reports several cases using Convolutional Neural Networks and LSTM. The survey [45] of Machine Learning in IoT for healthcare considers the C4.5, linear regression, K-Nearest Neighbor, Expectation-Maximization, CNN, and Deep Belief Network models. Other considerations about the algorithms can be found in [46].
Another important point is how the model will be updated, in particular whether it needs all historical data or can be retrained from scratch. The problem of retraining in the case of wearable devices is addressed, for instance, in [47].
Fig. 13.1 General model of the architectures
13.3.5 Architectural Configurations
Now we turn to the description of the main architectural configurations that can be found in the literature. We recall that in the pictures the thickness of nodes and edges is proportional to how frequently they are found in the projects.
The most classical architecture considers just 2 levels, Sensors and Cloud. It is described in many papers, some of which deal with specific aspects of the systems, such as: security, authentication methods, blockchain applications and interoperability [48–55]; particular applications of machine learning and neural networks to health problems [56–58]; optimization, data transfer, system organization [59–61]; cognitive systems and semantic technologies [62, 63]; and wearable device characteristics [64–67], including the description of an intelligent medicine box. Other works [33, 68–73] describe general frameworks or applications to monitor diseases or perform early warning of critical situations. It should be noted, however, that in some cases there may be some intermediate levels that were not deemed important from a computational or architectural point of view. For instance, local hubs or gateways are in general used to manage sensors, to collect data and to send it to the cloud. However, no significant data storage or computational tasks are performed at this level. Machine Learning and analytical computations are of course performed in the cloud, and results are used to monitor patients and to send alerts. Sensors themselves can generate alerts in many cases, e.g. to a mobile phone app. Alerts are intended both for the patients themselves and for the caregivers. Interoperability has been considered in some studies, notably in [52], where a semantic approach is described. Personalized healthcare through federated transfer learning has been described in [61]. Users’ mobility has been explicitly described in [64] and [70], but as several studies refer to wearable sensors, it is expected that some amount of mobility has
Fig. 13.2 Model of the 2 levels cloud—sensors architecture
been taken for granted in many cases. A number of physiological data have been collected with these systems, including heartbeat and other ECG-related measures, respiration parameters, blood pressure and glucose level, EEG, and data related to balance. Among pathologies and applications, we mention cardiac diseases, prevention and generic healthcare for elderly people and children, epilepsy, diabetes, and human activity recognition.
A couple of papers, [47, 74], describe systems where a lightweight computational network is deployed at the edge level, so that the system’s architecture reduces to the sensors and the edge level. Analogously, [75, 76] consider only the sensors and fog levels.
In Fig. 13.3 the model for a 3-level architecture, Cloud-Edge-Sensors, can be found. It is based on [10, 39, 77–96]. It can be seen that these works perform data analysis, Machine Learning and analytics at both the cloud and edge levels of the architecture. Results are mainly used to monitor the patients’ health and to send alerts; real-time alerts are considered, for instance, for ECG data; in [84] they are sent to physicians both directly and via the cloud. These systems typically consist of (1) a network of sensors collecting data and transmitting them to a gateway; (2) the gateway (edge level), which receives and locally stores the data and runs smart applications that preprocess the data and analyze values to generate timely alerts, which can be sent to the physician and to the patient through mobile phones or the edge device itself; (3) the cloud level, which archives the patients’ health records with suitable privacy and security standards and runs analytic algorithms on anonymized or aggregated data. The possibility of monitoring and sending alerts to mobile users is seldom considered and, when present, relies mainly on the sensor’s capabilities. The most often mentioned signals are those related to heart activity, cholesterol, glucose,
13 Smart Healthcare, IoT and Machine …
Fig. 13.3 Model of the 3 levels cloud—edge—sensors architecture
Fig. 13.4 Model of the 3 levels cloud—fog—sensors architecture
blood pressure, and balance. The pathologies and conditions addressed include fall detection, asthma, heart diseases, and voice disorders. Ontologies are used only in [77], for heart disease detection and monitoring. Clustering using k-neighbor algorithms is used in several papers, along with convolutional neural networks. A few papers [42, 97–99] use a 3-level architecture with Cloud, Fog, and Sensors levels. In this case, most of the analytical computations are performed at the Fog level. Real-time alerts are considered specifically in [99], which is devoted to a home hospitalization system.
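The role of the gateway in the Cloud-Edge-Sensors model described above (local storage, timely alerts, and only aggregated data passed upward) can be illustrated with a minimal sketch. The class names, the signal name, and the alert limits are hypothetical and chosen only for illustration; they are not taken from any of the cited systems and are not clinical guidance.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reading:
    patient_id: str
    signal: str   # e.g. "heart_rate" (hypothetical signal name)
    value: float


@dataclass
class EdgeGateway:
    # Hypothetical (low, high) alert limits per signal; illustrative values only.
    limits: dict
    local_store: list = field(default_factory=list)

    def handle(self, reading: Reading) -> Optional[str]:
        """Store the reading locally and return an alert message if it is out of range."""
        self.local_store.append(reading)
        low, high = self.limits.get(reading.signal, (float("-inf"), float("inf")))
        if reading.value < low or reading.value > high:
            return f"ALERT {reading.patient_id}: {reading.signal}={reading.value}"
        return None

    def summary_for_cloud(self) -> dict:
        """Return per-signal means, without patient identifiers, for upload to the cloud."""
        totals = {}
        for r in self.local_store:
            s, n = totals.get(r.signal, (0.0, 0))
            totals[r.signal] = (s + r.value, n + 1)
        return {sig: s / n for sig, (s, n) in totals.items()}


gateway = EdgeGateway(limits={"heart_rate": (40.0, 130.0)})
normal = gateway.handle(Reading("p1", "heart_rate", 72.0))
alert = gateway.handle(Reading("p1", "heart_rate", 150.0))
summary = gateway.summary_for_cloud()
```

In this sketch the alert is produced at the edge without a round trip to the cloud, while the cloud only receives an aggregate that carries no patient identifier.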
V. Bellandi et al.
Fig. 13.5 Model of the 4 levels architecture
Figure 13.5 shows the model for a 4-level architecture. It is based on [100–107]. These works favor data analysis and Machine Learning at the lower levels of the architecture (Fog and Edge); all these levels are strongly connected to the Cloud, whose importance, however, seems reduced compared to other architectures. Results are mainly used to monitor the patients’ health. Alerts and the related performance problems are also considered, and are managed at the Fog or Sensors level. When the mobility of users is considered [102, 105], it is most often managed at the Fog level, even if in some cases other levels are involved. A typical system of this kind is composed of (1) a body area sensor network consisting of medical, activity, and environment sensors; (2) a gateway (the edge level) that receives data from the sensor network, stores the data locally, and performs some analytical activities; (3) a fog network, whose nodes receive data and job requests from the edge, allocate the tasks among themselves, and manage privacy and security issues (fog nodes can host sophisticated deep learning models to process and analyze the input data and generate results); and (4) a cloud data center that is invoked when the fog infrastructure is overloaded, or when the size of the data exceeds a threshold and the task latency is not critical. As for the types of signals managed, ECG-related data is quoted in five out of the seven papers of the group. Other data include temperature, pressure, and balance. Data compression is discussed in [105]. Device interoperability is considered only in [105], in association with semantic methods and specifically in connection with Sensors. The system’s management tasks include both scheduling and synchronization. These last aspects, however, receive only marginal attention in this set of papers.
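The rule for invoking the cloud in the 4-level architecture described above can be written as a small decision function. This is only a sketch of that rule: the function name, the load measure, and the two threshold values are assumptions made for illustration and do not come from the cited papers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    size_bytes: int
    latency_critical: bool


def choose_level(task: Task, fog_load: float,
                 fog_capacity: float = 0.8,
                 size_threshold: int = 5_000_000) -> str:
    """Decide where a task coming from the edge should run.

    fog_load is the current fraction of fog resources in use (0.0 to 1.0).
    fog_capacity and size_threshold are hypothetical tuning parameters.
    """
    # The cloud is invoked when the fog infrastructure is overloaded ...
    if fog_load >= fog_capacity:
        return "cloud"
    # ... or when the data is large and the task is not latency critical.
    if task.size_bytes > size_threshold and not task.latency_critical:
        return "cloud"
    return "fog"


small_urgent = choose_level(Task(10_000, True), fog_load=0.3)
large_batch = choose_level(Task(50_000_000, False), fog_load=0.3)
large_urgent = choose_level(Task(50_000_000, True), fog_load=0.3)
overloaded = choose_level(Task(10_000, True), fog_load=0.95)
```

Note that a large but latency-critical task stays at the fog level unless the fog is overloaded, which mirrors the condition stated in the text.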
13.4 Data Pipeline and Data Storage
The general picture of data flow among all the architectural levels is drawn in Fig. 13.6, while Fig. 13.7 schematically represents the impact of privacy and security risks. At each level, in general, one can have data storage and alert generation. Data flows upstream from each level to all the levels above, and the other way around. Privacy, in general, is taken into account at all the architectural levels; of course, actual risks are related to the amount of non-anonymized data stored at each level. Although privacy and security issues have been discussed by almost all authors, some papers are devoted mainly to these topics and to specific means of preventing attacks, for instance [108, 109] or [110]. Some applications of Machine Learning algorithms to the security of health data have been reviewed in [46]. A more detailed review can be found in [111], where privacy and security issues are considered at the front-end, communication-layer, and back-end levels. That paper proposes a taxonomy of security and privacy issues in healthcare, distinguishing approaches that are processing-based (at Fog or Cloud level or using Blockchain), Machine Learning (ML)-based, wearable-device-based, IoT-based, telehealthcare-based, policy-based (using standards like HIPAA and HITECH), scheme-based (authentication through a password or biometric data), and network-traffic-based (layer-based or attack-based). Blockchain techniques have been proposed by several authors to solve privacy and security issues in healthcare systems, see e.g. [112, 113]. Many works describe solutions to specific privacy problems. For instance, a privacy-preserving scheme for cluster analysis is proposed in [114]. A security assessment framework based on an ontological, scenario-based approach is described in [28].
Fig. 13.6 Data flow among the system levels
Fig. 13.7 Graphical representation of privacy impact on the systems
Turning to data flows, if the goals of the project are just to collect data for subsequent analysis, trend detection, and study, or to monitor the patients’ situation by producing a periodic report, data will be transferred in the bottom-up direction only. On the other hand, if real-time interactions and feedback to the user must be provided and the local devices are “intelligent” enough, a transfer of data in the opposite direction is useful, so that the cloud system can feed local appliances with the results of global analyses (e.g. the learned weights of a model). Some of the edge computing approaches for ML in the health area are the following:
• Process data using conventional ML techniques and transfer the results or the necessary features extracted from raw sensor data.
• Deploy part of the layers of a deep learning network on the edge, preprocess the data, and transfer the extracted features, which are smaller than the original input.
• Deploy Neural Networks on the edge, with the minimal size necessary to maintain the required accuracy.
• Train the networks on the cloud and ship the trained models to the edge.
Moreover, if the real-time interactions and feedback are separate for each signal type, one can have feedback programs even at the sensor level; on the contrary, if they need to integrate data, one must have feedback programs at the local integration level or above. Apart from the above general considerations, some issues related to data collection must be addressed. The data volume of IoT is large and contains some redundant information, so that a bottleneck can occur. Compression schemes, congestion control, path optimization, and vertically partitioned, parallel, distributed algorithms have been proposed (see e.g. [115]).
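As an illustration of the first edge approach listed above (transferring extracted features instead of raw sensor data), the following minimal sketch summarizes a window of samples with a few statistics before transmission. The choice of features, the sample values, and the JSON payload format are assumptions made for the example, not a method taken from the cited works.

```python
import json
import statistics


def extract_features(window):
    """Summarize a window of raw samples with a few simple statistics."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "min": min(window),
        "max": max(window),
    }


# Hypothetical window of 250 heart-rate samples collected at the edge.
raw_window = [72.0, 75.0, 71.0, 74.0, 73.0] * 50
features = extract_features(raw_window)

# Compare the size of what would be sent upward in the two cases.
raw_payload = json.dumps(raw_window)
feature_payload = json.dumps(features)
```

The feature payload is much smaller than the raw one, which is one way of easing the data-volume bottleneck mentioned above, at the price of discarding detail that the upper levels can no longer recover.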
At the edge level or at the sensors, some potential security issues are found (see [6]): taking control of a node by changing its software; inserting malware into the software of a node; counterfeiting certification keys to access a sensor; gaining access to encryption keys; superimposing noise to corrupt wireless transmission; and keeping sensors running when not required, in order to consume all the available energy.
To solve privacy problems at this level, patient identification via biometric data has been considered in [51, 80]. Identification happens at the edge or phone level, and the resulting biometric is not sent to the server, in order to maintain users’ privacy. Some authors describe specific types of attacks, e.g. the so-called man-in-the-middle attack [30], where each sensor node is evaluated to check whether it can be an attacker, and the Sybil attack, where a malicious node uses morphed identities to generate Sybil nodes. Sybil nodes can acquire an authorized node identity and misbehave by affecting their routing information [108]. Both of the quoted attacks occur at the bottom level of the architecture. The data collection layer, or “local integration” level, is key to the performance of the whole system and to its capability of real-time responses. In [91] it is described how an edge node is responsible for processing, analyzing, filtering, storing, and sending the data to the cloud. Data compression is used to reduce the number of bits to be transmitted to the edge node. Some methods dynamically adjust the sampling frequency of devices, while others take advantage of the temporal correlation in the data; compression techniques usually used for images have also been proposed for multivariate data. A lossy compression method is described in [91]. Protocols to send data upward from the edge levels are listed in [8], including Message Queuing Telemetry Transport (MQTT), Constrained Application Protocol (CoAP), Advanced Message Queuing Protocol (AMQP), HTTP, and Data Distribution Service (DDS). At the fog level (or network level), typical security and privacy issues include sending massive traffic to a node to render it nonfunctional; spoofing the identity of a user to access the system; “man in the middle” attacks; and creating route loops or congestion in the network.
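To give a concrete idea of lossy compression that exploits the temporal correlation mentioned above, the sketch below uses a simple send-on-change (deadband) rule: a sample is transmitted only when it differs from the last transmitted value by more than a tolerance. This is a generic illustration and not the method of [91]; the function names, the tolerance, and the sample values are assumptions.

```python
def deadband_compress(samples, tolerance):
    """Keep (index, value) pairs only when the value moves more than
    `tolerance` away from the last value that was kept."""
    kept = []
    last = None
    for i, value in enumerate(samples):
        if last is None or abs(value - last) > tolerance:
            kept.append((i, value))
            last = value
    return kept


def reconstruct(kept, length):
    """Rebuild a signal of the given length by holding the last kept value."""
    out = []
    j = 0
    current = None
    for i in range(length):
        if j < len(kept) and kept[j][0] == i:
            current = kept[j][1]
            j += 1
        out.append(current)
    return out


# Hypothetical body-temperature samples from a sensor node.
samples = [36.6, 36.6, 36.7, 36.6, 37.4, 37.5, 37.4, 36.7]
tolerance = 0.3
kept = deadband_compress(samples, tolerance)
restored = reconstruct(kept, len(samples))
```

By construction, every reconstructed value lies within the tolerance of the original sample, so the tolerance directly controls the trade-off between the number of transmitted values and the reconstruction error.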
Blockchain techniques have been employed to deal with the identification of IoT objects and with the signing of automatic contracts between various devices. The use of Blockchain at the Fog level, with the generation of a hash of each data item so that any change or alteration in the data can be detected, has been reported in [50] and [100]; see also [54]. A design of a secure data sharing scheme based on blockchain can be found in [53], while an authentication mechanism based on blockchain and suitable for lightweight IoT devices can be found in [75]. The paper [34] is devoted to a mapping of the usage, communication, and analysis of physiological data in smart environments. The first examples considered date back to 2008. Many systems are concerned with data collection during the subject’s normal life, e.g. firemen or police officers during service. Pashazadeh and Navimipour [116] review Big Data handling in healthcare applications. The MapReduce paradigm is described, along with its Hadoop implementation. Five mechanisms are analyzed: machine learning-based, agent-based, cloud-based, heuristic-based, and hybrid. The analysis, however, is not specific to smart healthcare IoT systems. A MapReduce framework at the edge level is described in [35] to perform real-time analysis. The same paper describes a collaborative cloud-edge framework, where the role of the cloud is to monitor the edge nodes and give them instructions for operations. At the cloud level (or application level), the main security and privacy risks are related to obtaining credentials through phishing, through malware, or using network applications (cross-site scripting).
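The idea reported above of hashing each data item so that any alteration can be detected can be sketched with a simple hash chain. This is a deliberately simplified illustration: it has no consensus, distribution, or signatures, so it is not a full blockchain and not the scheme of [50] or [100]. The record fields and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first record


def record_hash(record, prev_hash):
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("utf-8") + payload).hexdigest()


def build_chain(records):
    """Link each record to its predecessor through the predecessor's hash."""
    chain = []
    prev = GENESIS
    for rec in records:
        h = record_hash(rec, prev)
        chain.append({"record": rec, "prev_hash": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    """Return True only if every stored hash still matches its record and link."""
    prev = GENESIS
    for block in chain:
        if block["prev_hash"] != prev:
            return False
        if record_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True


chain = build_chain([
    {"patient": "p1", "signal": "heart_rate", "value": 72},
    {"patient": "p1", "signal": "heart_rate", "value": 75},
])
valid_before = verify_chain(chain)
chain[0]["record"]["value"] = 999   # simulate tampering with a stored value
valid_after = verify_chain(chain)
```

Because each hash also covers the previous one, changing an early record invalidates the check from that point onward, which is what makes the alteration detectable.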
Data encryption has been extensively applied to local storage at the sensors, edge, and fog levels [100, 102, 105]. Of course, encryption is also used during data transmission and for data stored in the cloud (see e.g. [10, 53, 75, 108]), on its own or in conjunction with blockchain, especially when other means of protection, such as anonymization, are not provided. Anonymization of data stored in the cloud is widely considered a best practice. Support for data analytics algorithms on anonymized patient data has been reported in [84]. Anonymity preservation has been considered e.g. in [53, 55, 75]. One of the main issues for IoT in healthcare is due to the use of many different sensors and services from many vendors, each with its own features. This results in an interoperability problem, with impacts on the pipeline, as data must be gathered and analyzed as a whole. A semantic backbone for the Internet of Medical Things, based on ontologies, is described in [52]. Another use of semantic technologies for healthcare monitoring is described in [63]. Some considerations about interoperability and security issues can be found in [49]. Another important point is related to data aggregation, as data are naturally generated in a distributed fashion and must be combined to obtain results that are valid in general. A common strategy is to combine predictions from local models trained at each site into a global model. A comparative evaluation of aggregation methods can be found in [117], where both elementary and meta-learning-based aggregation methods have been considered. In [61] a federated transfer learning framework is described to aggregate data from wearable devices and build personalized models for each user. Data minimization and data aggregation are two main points considered in the privacy-by-design approach for IoT systems (see [118]).
Such general approaches and privacy-by-design methods do not seem to be very popular, even if, as noted above, data aggregation is often quoted in relation to various objectives. Accordingly, it is likely that privacy is attained as a kind of byproduct, even if not as a primary goal of aggregation. For instance, [9, 11, 28, 30, 39, 61, 64, 77, 83, 84, 91, 92, 96, 103, 105] and others report data aggregation for several reasons, such as sparing network overhead and storage, computing synthetic indicators, and addressing privacy issues too.
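The strategy mentioned above of combining predictions from local models trained at each site can be illustrated with two elementary aggregation rules, a (weighted) average of predicted probabilities and a majority vote over predicted labels. This is a generic sketch and not the procedure evaluated in [117]; the function names, the use of local sample counts as weights, and the example values are assumptions.

```python
from collections import Counter


def average_probabilities(site_probs, site_weights=None):
    """Weighted mean of the probabilities predicted by each local model.

    site_weights could be, for example, the number of training samples
    at each site (an assumption); equal weights are used if omitted.
    """
    if site_weights is None:
        site_weights = [1.0] * len(site_probs)
    total = sum(site_weights)
    return sum(p * w for p, w in zip(site_probs, site_weights)) / total


def majority_vote(site_labels):
    """Label predicted by the largest number of local models."""
    return Counter(site_labels).most_common(1)[0][0]


# Hypothetical outputs of three local models for the same input.
probs = [0.9, 0.6, 0.3]
plain = average_probabilities(probs)
weighted = average_probabilities(probs, site_weights=[100, 50, 50])
vote = majority_vote(["fall", "no_fall", "fall"])
```

Only predictions leave each site in this scheme, not the underlying patient data, which is one reason aggregation can yield privacy as a side effect.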
13.5 Conclusion
As described above, smart healthcare systems are based on IoT devices placed at the patients’ homes or on wearable devices. They are becoming more and more popular because the technology of such devices is mature enough and the costs are affordable. Moreover, such systems are an attractive alternative to the hospitalization of elderly people, especially those who need to be monitored for long-term, non-acute diseases. In this chapter we have provided a complete review of the state of the art relating to healthcare systems, taking into consideration the issues described above and therefore
defining a taxonomy and classification of the systems under analysis. Moreover, we focused on the design of architectures for smart healthcare and on the implications that they have for the machine learning algorithms applied in this area.
References
1. F. Sadoughi, A. Behmanesh, N. Sayfouri, Internet of things in medicine: a systematic mapping study. J. Biomed. Inf. 103 (2020). https://doi.org/10.1016/j.jbi.2020.103383
2. S. Vidya Priya Darcini, D.P. Isravel, S. Silas, A comprehensive review on the emerging IoT-cloud based technologies for smart healthcare, in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 9058619 (2020). https://doi.org/10.1109/ICACCS48705.2020.9074457
3. S.S. Gill, P. Garraghan, R. Buyya, Router: fog enabled cloud based intelligent resource management approach for smart home IoT devices. J. Syst. Softw. 154 (2019). https://doi.org/10.1016/j.jss.2019.04.058
4. A. Rajput, T. Brahimi, Chapter 15: Characterizing internet of medical things/personal area networks landscape. Innovat. Health Inf. (2020). https://doi.org/10.1016/B978-0-12-819043-2.00015-0
5. C. Ana Maria Drăgulinescu, A.F. Manea, O. Fratu, A. Drăgulinescu, LoRa-based medical IoT system architecture and testbed. Wireless Personal Commun. (2020). https://doi.org/10.1007/s11277-020-07235-z
6. J.L. Shah, H.F. Bhat, CloudIoT for Smart Healthcare: architecture, issues, and challenges. Internet of Things Use Cases for the Healthcare Industry (2020). https://doi.org/10.1007/978-3-030-37526-3_5
7. R. Jha, V. Bhattacharjee, A. Mustafi, IoT in Healthcare: a big data perspective. Smart Healthcare Anal. IoT Enabled Environ. (2020). https://doi.org/10.1007/978-3-030-37551-5_13
8. G. Jeya Shree, S. Padmavathi, A fog-based approach for real-time analytics of IoT-enabled healthcare. Internet of Things Use Cases for the Healthcare Industry (2020). https://doi.org/10.1007/978-3-030-37526-3_11
9. S.Md. Mahamud, Md.M. Islam, Md.S. Rahman, S.H. Suman, Custody: an IoT based patient surveillance device, in Proceedings of the Future Technologies Conference (FTC) 2018 (2019). https://doi.org/10.1007/978-3-030-02686-8_18
10. U. Syed Tauhid Shah, F. Badshah, F. Dad, N. Amin, M.A. Jan, Cloud-assisted IoT-based smart respiratory monitoring system for asthma patients. Appl. Intell. Technol. Healthcare (2019). https://doi.org/10.1007/978-3-319-96139-2_8
11. M. Hilal Özcanhan, U. Semih, M.S. Unluturk, Neural network-supported patient-adaptive fall prevention system. Neu. Comput. Appl. (2020). https://doi.org/10.1007/s00521-019-04451-y
12. O.M. Igwe, Y. Wang, G.C. Giakos, J. Fu, Human activity recognition in smart environments employing margin setting algorithm. J. Amb. Intell. Humanized Comput. (2020). https://doi.org/10.1007/s12652-020-02229-y
13. X. Zhou, W. Liang, K. I-Kai Wang, H. Wang, L.T. Yang, Q. Jin, Deep learning enhanced human activity recognition for internet of healthcare things. IEEE Int. Things J. 6488907 (2020). https://doi.org/10.1109/JIOT.2020.2985082
14. A. Almazroa, H. Sun, An internet of things (IoT) management system for improving homecare: a case study, in International Symposium on Networks, Computers and Communications (ISNCC), vol. 8894812 (2019). https://doi.org/10.1109/ISNCC.2019.8909186
15. T. Zhang, A. Hassan Sodhro, Z. Luo, N. Zahid, M.W. Nawaz, S. Pirbhulal, M. Muzammal, A joint deep learning and internet of medical things driven framework for elderly patients. IEEE Access 6287639 (2020). https://doi.org/10.1109/ACCESS.2020.2989143
16. Md.A. Sayeed, S.P. Mohanty, E. Kougianos, H.P. Zaveri, Neuro-detect: a machine learning-based fast and accurate seizure detection system in the IoMT. IEEE Trans. Cons. Electron. 30 (2019). https://doi.org/10.1109/TCE.2019.2917895
17. N. Wadhwani, N. Mehta, N. Ruban, IOT based biomedical wireless sensor networks and machine learning algorithms for detection of diseased conditions, in 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), vol. 8956176 (2019). https://doi.org/10.1109/i-PACT44901.2019.8960191
18. A. Athira, T.D. Devika, K.R. Varsha, S. Sanjanaa, S. Bose, Design and development of IOT based multi-parameter patient monitoring system, in 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 9058619 (2020). https://doi.org/10.1109/ICACCS48705.2020.9074293
19. H. Kordestani, R. Mojarad, A. Chibani, A. Osmani, Y. Amirat, K. Barkaoui, W. Zahran, Hapicare: a healthcare monitoring system with self-adaptive coaching using probabilistic reasoning, in 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA), vol. 9006726 (2019). https://doi.org/10.1109/AICCSA47632.2019.9035291
20. V. Bianchi, M. Bassoli, G. Lombardo, P. Fornacciari, M. Mordonini, I. De Munari, IoT wearable sensor and deep learning: an integrated approach for personalized human activity recognition in a smart home environment. IEEE Int. Things J. 6488907 (2019). https://doi.org/10.1109/JIOT.2019.2920283
21. Q. Zhang, D. Zhou, X. Zeng, Hear the heart: daily cardiac health monitoring using Ear-ECG and machine learning, in IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), vol. 8234833 (2017). https://doi.org/10.1109/UEMCON.2017.8249110
22. S. Nookhao, V. Thananant, T. Khunkhao, Development of IoT heartbeat and body temperature monitoring system for community health volunteer, in 2020 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON), vol. 9085868 (2020). https://doi.org/10.1109/ECTIDAMTNCON48261.2020.9090692
23. A.K.M. Iqtidar Newaz, A. Kumar Sikder, M. Ashiqur Rahman, A. Selcuk Uluagac, HealthGuard: a machine learning-based security framework for smart healthcare systems (2019)
24. M. Bhatia, S.K. Sood, A comprehensive health assessment framework to facilitate IoT-assisted smart workouts: a predictive healthcare perspective. Comput. Ind. 92–93 (2017). https://doi.org/10.1016/j.compind.2017.06.009
25. H. Qiu, M. Qiu, Z. Lu, Selective encryption on ECG data in body sensor network based on supervised machine learning. Inf. Fusion 55 (2020). https://doi.org/10.1016/j.inffus.2019.07.012
26. Md. Zia Uddin, M. Mehedi Hassan, A. Alsanad, C. Savaglio, A body sensor data fusion and deep recurrent neural network-based behavior recognition approach for robust healthcare. Inf. Fusion 55 (2020). https://doi.org/10.1016/j.inffus.2019.08.004
27. M. Amoon, T. Altameem, A. Altameem, Internet of things sensor assisted security and quality analysis for health care data sets using artificial intelligent based heuristic health management system. Measurement 161 (2020). https://doi.org/10.1016/j.measurement.2020.107861
28. F. Alsubaei, A. Abuhussein, V. Shandilya, S. Shiva, IoMT-SAF: internet of medical things security assessment framework. Internet of Things 8 (2019). https://doi.org/10.1016/j.iot.2019.100123
29. M. Mehedi Hassan, S. Ullah, M.S. Hossain, A. Alelaiwi, An end-to-end deep learning model for human activity recognition from highly sparse body sensor data in Internet of Medical Things environment. J. Supercomput. (2020). https://doi.org/10.1007/s11227-020-03361-4
30. A. Kore, S. Patil, IC-MADS: IoT enabled cross layer man-in-middle attack detection system for smart healthcare application. Wireless Personal Commun. (2020). https://doi.org/10.1007/s11277-020-07250-0
31. P. Gupta, A. Pandey, P. Akshita, A. Sharma, IoT based healthcare kit for diabetic foot ulcer. Proc. ICRIC 2019 (2020). https://doi.org/10.1007/978-3-030-29407-6_2
32. S. Ranjani, Rajendran, Machine learning applications for a real-time monitoring of arrhythmia patients using IoT. Internet Things Healthcare Technol. (2021). https://doi.org/10.1007/978-981-15-4112-4_5
33. B. Mohanta, P. Das, S. Patnaik, Healthcare 5.0: a paradigm shift in digital healthcare system using artificial intelligence, IOT and 5G communication, in 2019 International Conference on Applied Machine Learning (ICAML), vol. 8967488 (2019). https://doi.org/10.1109/ICAML48257.2019.00044
34. S.J.A. Aranda, L.P.S. Dias, J.L.V. Barbosa, J.V. Carvalho, J.E. da Rosa Tavares, M.C. Tavares, Collection and analysis of physiological data in smart environments: a systematic mapping. J. Amb. Intell. Humanized Comput. (2020). https://doi.org/10.1007/s12652-019-01409-9
35. P. Verma, S. Fatima, Smart healthcare applications and real-time analytics through edge computing. Internet of Things Use Cases for the Healthcare Industry (2020). https://doi.org/10.1007/978-3-030-37526-3_11
36. L. Greco, G. Percannella, P. Ritrovato, F. Tortorella, M. Vento, Trends in IoT based solutions for health care: moving AI to the edge. Pattern Recogn. Lett. 135 (2020)
37. N. Mani, A. Singh, S.L. Nimmagadda, An IoT guided healthcare monitoring system for managing real-time notifications by fog computing services. Proced. Comput. Sci. 167 (2020). https://doi.org/10.1016/j.procs.2020.03.424
38. I. Machorro-Cano, G. Alor-Hernández, J.O. Olmedo-Aguirre, L. Rodríguez-Mazahua, M.G. Segura-Ozuna, IoT services orchestration and choreography in the healthcare domain. Tech Tools Methodol Appl Glob Supply Chain Ecosyst. (2020). https://doi.org/10.1007/978-3-030-26488-8_19
39. I. Azimi, J. Takalo-Mattila, A. Anzanpour, A.M. Rahmani, J.-P. Soininen, P. Liljeberg, Empowering healthcare IoT systems with hierarchical edge-based deep learning, in 2018 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), vol. 8641765 (2018). https://doi.org/10.1145/3278576.3278597
40. A. Darwish, A.E. Hassanien, M. Elhoseny, A.K. Sangaiah, K. Muhammad, The impact of the hybrid platform of internet of things and cloud computing on healthcare systems: opportunities, challenges, and open problems. J. Amb. Intell. Humanized Comput. (2019). https://doi.org/10.1007/s12652-017-0659-1
41. A. Kobusińska, C. Leung, C.-H. Hsu, S. Raghavendra, V. Chang, Emerging trends, issues and challenges in internet of things, big data and cloud computing. Fut. Generat. Comput. Syst. 87 (2018). https://doi.org/10.1016/j.future.2018.05.021
42. D. Borthakur, H. Dubey, N. Constant, L. Mahler, K. Mankodiya, Smart fog: fog computing framework for unsupervised clustering analytics in wearable internet of things (2017). https://doi.org/10.1109/GlobalSIP.2017.8308687
43. T.J. Saleem, M.A. Chishti, Deep learning for internet of things data analytics. Proced. Comput. Sci. 163 (2019). https://doi.org/10.1016/j.procs.2019.12.120
44. X. Ma, T. Yao, H. Menglan, Y. Dong, W. Liu, F. Wang, J. Liu, A survey on deep learning empowered IoT applications. IEEE Access 6287639 (2019). https://doi.org/10.1109/ACCESS.2019.2958962
45. S. Durga, R. Nag, E. Daniel, Survey on machine learning and deep learning algorithms used in internet of things (IoT) healthcare, in 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), vol. 8811524 (2019). https://doi.org/10.1109/ICCMC.2019.8819806
46. P. Ghosal, D. Das, I. Das, Extensive survey on cloud-based IoT-healthcare and security using machine learning, in 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), vol. 8716487 (2018). https://doi.org/10.1109/ICRCICN.2018.8718717
47. S.A. Rokni, H. Ghasemzadeh, Plug-n-learn: automatic learning of computational algorithms in human-centered internet-of-things applications, in 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), vol. 7502236 (2016). https://doi.org/10.1145/2897937.2898066
48. S. Boudko, H. Abie, Adaptive cybersecurity framework for healthcare internet of things, in 2019 13th International Symposium on Medical Information and Communication Technology (ISMICT), vol. 8741513 (2019). https://doi.org/10.1109/ISMICT.2019.8743905
49. M.L. Challa, K.L.S. Soujanya, C.D. Amulya, Remote monitoring and maintenance of patients via IoT healthcare security and interoperability approach. Cybernet. Cogn. Mach. Lear. Appl. (2020). https://doi.org/10.1007/978-981-15-1632-0_22
50. G. Rathee, A. Sharma, H. Saini, R. Kumar, R. Iqbal, A hybrid framework for multimedia data processing in IoT-healthcare using blockchain technology. Mult. Tools Appl. (2020). https://doi.org/10.1007/s11042-019-07835-3
51. H. Hamidi, An approach to develop the smart health using Internet of things and authentication based on biometric technology. Fut. Generat. Comput. Syst. 91 (2019). https://doi.org/10.1016/j.future.2018.09.024
52. I. Villanueva-Miranda, H. Nazeran, R. Martinek, A semantic interoperability approach to heterogeneous internet of medical things (IoMT) platforms, in 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), vol. 8502682 (2018). https://doi.org/10.1109/HealthCom.2018.8531103
53. X. Cheng, F. Chen, D. Xie, H. Sun, C. Huang, Design of a secure medical data sharing scheme based on blockchain. J. Med. Syst. (2020). https://doi.org/10.1007/s10916-019-1468-1
54. G. Tripathi, M.A. Ahad, S. Paiva, S2HS- A blockchain based approach for smart healthcare system. Healthcare 8 (2020). https://doi.org/10.1016/j.hjdsi.2019.100391
55. F. Merabet, A. Cherif, M. Belkadi, O. Blazy, E. Conchon, D. Sauveron, New efficient M2C and M2M mutual authentication protocols for IoT-based healthcare applications. Peer-to-Peer Network. Appl. (2020). https://doi.org/10.1007/s12083-019-00782-8
56. W.N. Ismail, M. Mehedi Hassan, H.A. Alsalamah, G. Fortino, CNN-based health model for regular health factors analysis in internet-of-medical things environment. IEEE Access 6287639 (2020). https://doi.org/10.1109/ACCESS.2020.2980938
57. G. Mylavarapu, J.P. Thomas, A multi-task machine learning approach for comorbid patient prioritization, in 2017 IEEE International Conference on Big Data (Big Data), vol. 8241556 (2017). https://doi.org/10.1109/BigData.2017.8258392
58. P. Malarvizhi Kumar, U.D. Gandhi, A novel three-tier internet of things architecture with machine learning algorithm for early detection of heart diseases. Comput. Electr. Eng. 65 (2018). https://doi.org/10.1016/j.compeleceng.2017.09.001
59. R.P. França, Y. Iano, B. Ana Carolina Monteiro, R. Arthur, A methodology for improving efficiency in data transmission in healthcare systems. Internet Things Healthcare Technol. (2021). https://doi.org/10.1007/978-981-15-4112-4_3
60. N. Moraes, do Nascimento, C. José Pereira de Lucena, FIoT: an agent-based framework for self-adaptive and self-organizing applications based on the Internet of Things. Inf. Sci. 378 (2017). https://doi.org/10.1016/j.ins.2016.10.031
61. Y. Chen, J. Wang, C. Yu, W. Gao, X. Qin, FedHealth: a federated transfer learning framework for wearable healthcare (2019). arXiv:1907.09173
62. S.U. Amin, M. Shamim Hossain, G. Muhammad, M. Alhussein, Md.A. Rahman, Cognitive smart healthcare for pathology detection and monitoring. IEEE Access 6287639 (2019). https://doi.org/10.1109/ACCESS.2019.2891390
63. A. Dridi, S. Sassi, S. Faiz, A smart IoT platform for personalized healthcare monitoring using semantic technologies, in 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI), vol. 8344747 (2017). https://doi.org/10.1109/ICTAI.2017.00182
64. S. Din, A. Paul, Erratum to “Smart health monitoring and management system: Toward autonomous wearable sensing for Internet of Things using big data analytics [Future Gener. Comput. Syst. 91 (2020), 611–619]”. Fut. Generat. Comput. Syst. 108 (2019). https://doi.org/10.1016/j.future.2019.06.035
65. S.A. Khowaja, A.G. Prabono, F. Setiawan, B.N. Yahya, S.-L. Lee, Contextual activity based healthcare internet of things, services, and people (HIoTSP): an architectural framework for healthcare monitoring using wearable sensors. Comput. Netw. 145 (2018). https://doi.org/10.1016/j.comnet.2018.09.003
66. Y. Zhang, J. Cui, K. Ma, H. Chen, J. Zhang, A wristband device for detecting human pulse and motion based on the Internet of Things. Measurement 163 (2020). https://doi.org/10.1016/j.measurement.2020.108036
67. A. Jagtap, A. Chougule, S. Pujari, A. Khamkar, G. Machhale, Intelligent medicine box for medication management using internet-of-things. ICDSMLA 2019 (2020). https://doi.org/10.1007/978-981-15-1420-3_15
68. P. Kaur, N. Sharma, A. Singh, B. Gill, CI-DPF: a cloud IoT based framework for diabetes prediction, in IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), vol. 8584037 (2018). https://doi.org/10.1109/IEMCON.2018.8614775
69. A. AbdulGhaffar, S. Mohammad Mostafa, A. Alsaleh, T. Sheltami, E.M. Shakshuki, Internet of things based multiple disease monitoring and health improvement system. J. Amb. Intell. Humanized Comput. (2020). https://doi.org/10.1007/s12652-019-01204-6
70. V. Karmani, A.A. Chandio, P. Karmani, M. Chandio, I.A. Korejo, Towards self-aware heatstroke early-warning system based on healthcare IoT, in 2019 Third World Conference on Smart Trends in Systems Security and Sustainability (WorldS4), vol. 8892594 (2019). https://doi.org/10.1109/WorldS4.2019.8904006
71. N. Nigar, L. Chowdhury, An intelligent children healthcare system by using ensemble technique, in Proceedings of International Joint Conference on Computational Intelligence (2020). https://doi.org/10.1007/978-981-13-7564-4_12
72. S. Sendra, L. Parra, J. Lloret, J. Tomás, Smart system for children’s chronic illness monitoring. Inf. Fusion 40 (2018). https://doi.org/10.1016/j.inffus.2017.06.002
73. N.G.B. Pulgarín, L.D.C. Aljure, O.J.S. Parra, eHeart-BP, prototype of the internet of things to monitor blood pressure, in 2019 IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), vol. 8905987 (2019). https://doi.org/10.1109/CHASE48038.2019.00025
74. P. Agarwal, M. Alam, A lightweight deep learning model for human activity recognition on edge devices. Proced. Comput. Sci. 167 (2020). https://doi.org/10.1016/j.procs.2020.03.289
75. U. Khalid, M. Asim, T. Baker, P.C.K. Hung, M.A. Tariq, L. Rafferty, A decentralized lightweight blockchain-based authentication mechanism for IoT systems. Clust. Comput. (2020). https://doi.org/10.1007/s10586-020-03058-6
76. D. Ravì, C. Wong, B. Lo, G.-Z. Yang, A deep learning approach to on-node sensor data analytics for mobile or wearable devices. IEEE J. Biomed. Health Inf. 6221020 (2017). https://doi.org/10.1109/JBHI.2016.2633287
77. K.G. Rani Roopha Devi, R. Mahendra Chozhan, R. Murugesan, Cognitive IoT integration for smart healthcare: case study for heart disease detection and monitoring, in 2019 International Conference on Recent Advances in Energy-efficient Computing and Communication (ICRAECC), vol. 8975948 (2019). https://doi.org/10.1109/ICRAECC43874.2019.8995049
78. Md.M. Islam, A. Rahaman, Md.R. Islam, Development of smart healthcare monitoring system in IoT environment. SN Comput. Sci. (2020). https://doi.org/10.1007/s42979-020-00195-y
79. K. Kommuri, V.R. Kolluru, Prototype development of CAQSS health care system with MQTT protocol by using Atmega328, in 2020 International Conference on Artificial Intelligence and Signal Processing (AISP), vol. 9057353 (2020). https://doi.org/10.1109/AISP48273.2020.9073339
80. H.A. El Zouka, M.M. Hosni, Secure IoT communications for smart healthcare monitoring system. Internet of Things (2019). https://doi.org/10.1016/j.iot.2019.01.003
81. G. Muhammad, M.F. Alhamid, M. Alsulaiman, B. Gupta, Edge computing with cloud for voice disorder assessment and treatment. IEEE Commun. Magaz. 35 (2018). https://doi.org/10.1109/MCOM.2018.1700790
82. T. Muhammed, R. Mehmood, A. Albeshri, I. Katib, UbeHealth: a personalized ubiquitous cloud and edge-enabled networked healthcare system for smart cities. IEEE Access 6287639 (2018). https://doi.org/10.1109/ACCESS.2018.2846609
83. M. Hossain, S.M. Riazul Islam, F. Ali, K.-S. Kwak, R. Hasan, An internet of things-based health prescription assistant and its security system design. Fut. Generat. Comput. Syst. 82 (2018). https://doi.org/10.1016/j.future.2017.11.020
328
V. Bellandi et al.
84. R.K. Pathinarupothi, P. Durga, E.S. Rangan, IoT-based smart edge for global health: remote monitoring with severity detection and alerts transmission. IEEE Internet of Things J. 6488907 (2019). https://doi.org/10.1109/JIOT.2018.2870068 85. K.N. Qureshi, S. Din, G. Jeon, F. Piccialli, An accurate and dynamic predictive model for a smart M-Health system using machine learning. Inf. Sci. (2020). https://doi.org/10.1016/j. ins.2020.06.025 86. D. Mrozek, A. Koczur, B. Małysiak-Mrozek, Fall detection in older adults with mobile IoT devices and machine learning in the cloud and on the edge. Inf. Sci. 537(2020). https://doi. org/10.1016/j.ins.2020.05.070 87. D.F.S. Santos, H.O. Almeida, A. Perkusich, A personal connected health system for the Internet of Things based on the constrained application protocol. Comput. Electr. Eng. 44(2015). https://doi.org/10.1016/j.compeleceng.2015.02.020 88. X. Qian, H. Chen, H. Jiang, J. Green, H. Cheng, M.-C. Huang, Wearable computing architecture over distributed deep learning hierarchy: fall detection study. IEEE Sens. J. 7361(2020). https://doi.org/10.1109/JSEN.2020.2988667 89. Z.Md. Fadlullah, A.-S.K. Pathan, H. Gacanin, On Delay-sensitive healthcare data analytics at the network edge based on deep learning, in 2018 14th International Wireless Communications & Mobile Computing Conference (IWCMC), vol. 8410977 (2018). https://doi.org/10.1109/ IWCMC.2018.8450475 90. W.-J. Chang, L.-B. Chen, C.-H. Hsu, C.-P. Lin, T.-C. Yang, A deep learning-based intelligent medicine recognition system for chronic patients. IEEE Access 6287639 (2019). https://doi. org/10.1109/ACCESS.2019.2908843 91. J. Azar, A. Makhoul, M. Barhamgi, R. Couturier, An energy efficient IoT data compression approach for edge machine learning. Future Generat. Comput. Syst. 96(2019). https://doi. org/10.1016/j.future.2019.02.005 92. A. Vishwanatham, N. Ch, S.R. Abhishek, C.R. Ramakrishna, S. Sankara, S. Sanagapati, S. 
Mohanty, Smart and wearable ECG monitoring system as a point of care (POC) device, in 2018 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS) vol. 8703707 (2018)https://doi.org/10.1109/ANTS.2018.8710115 93. U. Satija, B. Ramkumar, M.S. Manikandan, Real-time signal quality-aware ECG telemetry system for IoT-based health care monitoring, in IEEE I. Things J. 6488907 (2017). https:// doi.org/10.1109/JIOT.2017.2670022 94. J. Boobalan, M. Malleswaran, A novel and customizable framework for IoT based smart home nursing for elderly care. Emerg. Trends Comput Expert Technol. (2020). https://doi.org/10. 1007/978-3-030-32150-5_3 95. K. Gnana Sheela, A.R. Varghese, Machine Learning based health monitoring system. Mater. Today: Proc. 24(2020). https://doi.org/10.1016/j.matpr.2020.03.603 96. S.R. Moosavi, T.N. Gia, A.-M. Rahmani, E. Nigussie, H. Tenhunen, SEA: a secure and efficient authentication and authorization architecture for IoT-based healthcare using smart gateways. Proc. Comput. Sci. 52(2015). https://doi.org/10.1016/j.procs.2015.05.013 97. R. Patan, G.S. Pradeep Ghantasala, R. Sekaran, D. Gupta, M. Ramachandran, Smart healthcare and quality of service in IoT using grey filter convolutional based cyber physical system. Sustain. Cities Soc. 59 (2020). https://doi.org/10.1016/j.scs.2020.102141 98. Bhatia, M., Kaur, S., S.K. Sood, V. Behal, Internet of things-inspired healthcare system for urine-based diabetes prediction. Artif. Intell. Med. 107(2020). https://doi.org/10.1016/ j.artmed.2020.101913 99. H.B., Hassen, N. Ayari, B. Hamdi, A home hospitalization system based on the internet of things, fog computing and cloud computing. Inf. Med. Unlocked 20(2020). https://doi.org/ 10.1016/j.imu.2020.100368 100. S. Tuli, N. Basumatary, S.S. Gill, M. Kahani, R.C. Arya, G.S. Wander, R. 
Buyya, HealthFog: an ensemble deep learning based smart healthcare system for automatic diagnosis of heart diseases in integrated IoT and fog computing environments. Fut. Gener.Computing Systems 2020(2019). https://doi.org/10.1016/j.future.2019.10.043
13 Smart Healthcare, IoT and Machine …
329
101. J. Yu, B. Fu, A. Cao, Z. He, D. Wu, EdgeCNN: a hybrid architecture for agile learning of healthcare data from IoT devices, in 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), vol. 8635632 (2018) . https://doi.org/10.1109/PADSW. 2018.8644604 102. A. Mukherjee, D. De, S.K. Ghosh, FogIoHT: a weighted majority game theory based energyefficient delay-sensitive fog network for internet of health things. Internet of Things 11 (2020). https://doi.org/10.1016/j.iot.2020.100181 103. P. Pratim Ray, D. Dash, D. De, Internet of things-based real-time model study on e-healthcare: device, message service and dew computing. Comput. Netw. 149 (2019). https://doi.org/10. 1016/j.comnet.2018.12.006 104. A. Anzanpour, H. Rashid, A.M. Rahmani, A. Jantsch, P. Liljeberg, Energy-efficient and reliable wearable internet-of-things through fog-assisted dynamic goal management. Proc. Comp. Sci. 151(2019). https://doi.org/10.1016/j.procs.2019.04.067 105. A.M. Rahmani, T.N. Gia, B. Negash, A. Anzanpour, P. Liljeberg, Exploiting smart e-Health gateways at the edge of healthcare internet-of-things: a fog computing approach. Fut. Gener. Comput. Syst. 78(2018). https://doi.org/10.1016/j.future.2017.02.014 106. G. Neagu, M. Ianculescu, A. Alexandru, V. Florian, C. Zoie R˘adulescu, Next generation IoT and its influence on decision-making. An Illustrat. Case Study. Proc. Comput. Sci. 162 (2019). https://doi.org/10.1016/j.procs.2019.12.023 107. H. Dubey, A. Monteiro, N. Constant, M. Abtahi, D. Borthakur, L. Mahler, Y. Sun, Q. Yang, U. Akbar, K. Mankodiya, Fog computing in medical internet-of-things: architecture. Implement. Appl. (2017). https://doi.org/10.1007/978-3-319-58280-1_11 108. S. Vaishnavi, T. Sethukarasi, SybilWatch: a novel approach to detect sybil attack in IoT based smart health care. J. Ambient Intell. Humanized Comput. (2020). https://doi.org/10.1007/ s12652-020-02189-3 109. R. Guo, X. Li, D. Zheng, Y. 
Zhang, An attribute-based encryption scheme with multiple authorities on hierarchical personal health record in cloud. J. Supercomput. (2020). https:// doi.org/10.1007/s11227-018-2644-7 110. Chatterjee, U., D. Sadhukhan, S. Ray, An improved authentication and key agreement protocol for smart healthcare system in the context of internet of things using elliptic curve cryptography, in Proceedings of International Conference on IoT Inclusive Life (ICIIL 2019), NITTTR Chandigarh, India (2020). https://doi.org/10.1007/978-981-15-3020-3_2 111. J.J. Hathaliya, S. Tanwar, An exhaustive survey on security and privacy issues in Healthcare 4.0. Comput. Commun. 153 (2020). https://doi.org/10.1016/j.comcom.2020.02.018 112. R.G. Shukla, A. Agarwal, S. Shukla, Chapter 10: blockchain-powered smart healthcare system. Handbook Res. Blockchain Tech. https://doi.org/10.1016/B978-0-12-819816-2.000101 113. H. Rathore, A. Mohamed, M. Guizani, Chapter 8: Blockchain Applications for Healthcare (Energ. Effic. Med. Dev, Healthcare Appl, 2020) 114. Z. Guan, Z. Lv, D. Xiaojiang, W. Longfei, M. Guizani, Achieving data utility-privacy tradeoff in internet of medical things: a machine learning approach. Fut. Generat. Comput. Syst. 98(2019). https://doi.org/10.1016/j.future.2019.01.058 115. J. Peng, K. Cai, X. Jin, High concurrency massive data collection algorithm for IoMT applications. Comput. Commun. 157(2020). https://doi.org/10.1016/j.comcom.2020.04.045 116. A. Pashazadeh, N.J. Navimipour, Big data handling mechanisms in the healthcare applications: a comprehensive and systematic literature review. J. Biomed. Inf. 82(2018). https://doi.org/ 10.1016/j.jbi.2018.03.014 117. B. Trevizan, J. Chamby-Diaz, A.L.C. Bazzan, M. Recamonde-Mendoza, A comparative evaluation of aggregation methods for machine learning over vertically partitioned data. Expert Syst. Appl. 152(2020). https://doi.org/10.1016/j.eswa.2020.113406 118. C. Perera, C. McCormick, A.K. Bandara, B.A. Price, B. 
Nuseibeh, Privacy-by-design framework for assessing internet of things applications and platforms, in 6th International Conference on the Internet of Things (IoT 16) (2016). https://doi.org/10.1145/2991561.2991566
330
V. Bellandi et al.
Valerio Bellandi is an Assistant Professor at the Computer Science Department, Università degli Studi di Milano. His research interests are in the Smart Data-Driven Systems research area. He works on Big Data Models and Platforms, Data Simulation, and Network Analysis. On these topics, he has published several scientific papers. As a data scientist, he has been involved in several international research projects and innovative startups. He was a recipient of the Chester-Sall Award from IEEE IES.
Paolo Ceravolo is an Associate Professor at the Computer Science Department, Università degli Studi di Milano. His research interests include Data Representation and Integration, Business Process Monitoring, and Empirical Software Engineering. On these topics, he has published several scientific papers. As a data scientist, he has been involved in several international research projects and innovative startups. The URL for his web page is http://www.di.unimi.it/ceravolo.
Ernesto Damiani (Senior Member, IEEE) is a Full Professor at Università degli Studi di Milano, where he leads the SESAR Lab, Senior Director of the Artificial Intelligence and Intelligent Systems Institute, and President of the Consortium of Italian Computer Science Universities (CINI). His work has more than 17,100 citations on Google Scholar and more than 6,700 citations on Scopus, with an H-index of 35. His areas of interest include artificial intelligence, machine learning, big data analytics, edge/cloud security and performance, and cyber-physical systems. He was a recipient of the Stephen Yau Award from the Service Society, the Outstanding Contributions Award from IFIP TC2, the Chester-Sall Award from IEEE IES, and a doctorate honoris causa from INSA Lyon, France, for his contribution to big data teaching and research.
Stefano Siccardi is an assistant researcher at the Computer Science Department, Università degli Studi di Milano. He completed his M.D. in Mathematics and Ph.D. in Computer Science at Università degli Studi di Milano. He has published 20 research papers in reputed journals. He worked for more than 30 years as an IT consultant in several business areas (healthcare, finance, manufacturing, etc.) and on several kinds of software applications (business intelligence, ERP systems, scientific computation, etc.). His areas of interest are artificial intelligence, data mining, unconventional computing, and quantum computing.
Chapter 14
Digital Business Models in the Healthcare Industry
Nathalie Hoppe, Felix Häfner, and Ralf Härting
Abstract Digital technologies are becoming increasingly indispensable on both a personal and a business level. Innovations accelerate processes and disrupt markets, even in the healthcare sector. A wide range of studies have demonstrated the effectiveness of digital technologies in numerous application areas such as diagnostics or treatment, but there is no research on the general potential that experts from the healthcare sector see in the implementation of digital business models. In addition to technological developments and the low research depth in this area, pandemics like Covid-19 demonstrate the importance of the healthcare industry. Motivated by this, a research project on the topic “Potential benefits of digital business models in the healthcare industry” was developed to address this gap. The authors identified key performance indicators (KPIs), individualization, efficiency and communication channels as central potentials. These determinants were evaluated by means of structural equation modelling, whereby KPIs and communication channels show a significant influence on the potential of digital business models and their processes in healthcare. In order to address the rapid developments in the field of Artificial Intelligence (AI), an outlook on its potential benefits and challenges in healthcare is given at the end.
Keywords Digital business models · Healthcare industry · eHealth · Telemedicine · Virtual reality · Artificial intelligence · Quantitative study · Structural equation modeling
N. Hoppe · F. Häfner · R. Härting (B)
Aalen University of Applied Sciences, Aalen, Germany
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_14
14.1 Introduction
For the majority of people, health is the most important asset. Free from health impairments, people are able to take care of their livelihood independently, including self-development and participation in social activities [1]. The German healthcare system strives to provide optimal treatment for its citizens [2] while facing numerous challenges: an aging population [3, 4], a growing and changing health awareness [5], a rise in mental illnesses [6], skills shortages [7, 8] and many more. Since 2020, people around the world have been confronted with an additional challenge: the Covid-19 pandemic. In order to cope with it, additional capacities within healthcare systems, such as the supply of personal protective equipment, disinfectants and other medical devices, were required [2]. The Covid-19 pandemic amplifies the importance of digital transformation within healthcare. In general, digital technologies have the potential to create efficient processes, to optimize treatment methods and to ensure more substantiated and faster decision-making [9].
Several studies examine the benefits of digital technologies for specific application areas in healthcare. Digital technologies and digital business models play an increasing role in chronic disease management [10, 11]. Moreover, researchers have examined the influence of digital technologies on patient empowerment [12, 13] and the benefits for hospitals [14–16]. However, the identification of relevant determinants and their influence on the potential of digital business models in the healthcare sector is still an open research question. To address this question, an empirical research project targeting experts in the healthcare industry was carried out by Aalen University.
The chapter is structured as follows. First, the role of the healthcare sector is briefly introduced together with current facts and figures. The following section presents and classifies current trends of digitalization in the healthcare sector. The next section forms the main part and introduces the study on the potential benefits of digital business models and processes in the healthcare industry. The research method is explained, followed by an introduction of the identified determinants: KPIs, individualization, efficiency and communication channels. In Sect. 14.4.3, the potential benefits of digital technologies are examined based on an exemplary patient’s care pathway. After examining the challenges of digital business models, the study results are presented, followed by an interpretation. After the conclusion, a brief outlook on the role of Artificial Intelligence (AI) is given.
14.2 Role of the Healthcare Sector
This section briefly illustrates the role of the healthcare sector with a focus on the German healthcare industry. According to the World Health Organization (WHO) in 2007, healthcare sectors should be able to fulfill numerous objectives. They should improve health and health equity while being responsive and economically fair. At the same time, the efficient use of resources should be taken into account. Intermediate goals are to improve and guarantee access to healthcare provision while ensuring safety and quality [17]. Thus, healthcare systems have a major responsibility.
The importance of the healthcare industry is also reflected in its global sales volume, which is estimated at over 4 trillion US dollars annually. Pharmaceuticals and biotechnology play the most important role with almost 850 billion US dollars, followed by diagnostics and medical technology with over 400 billion US dollars [18]. The healthcare industry has a decisive economic impact on the gross value added (GVA) in Germany. The estimated GVA of the healthcare industry was 372 billion euros in 2019, a share of around 12% of the total economy. With a growth rate of 4.1%, it exceeds the growth rate of the total economy (3.3%) [19].
The healthcare industry in Germany can be divided into different areas. The service-oriented healthcare industry comprises inpatient and outpatient healthcare. Outpatient care includes medical practices, pharmacies and outpatient care facilities. Hospitals, preventive care and rehabilitation facilities can be attributed to inpatient care. The service-oriented healthcare industry accounts for more than 53% (estimated for 2019) of the GVA of the German healthcare industry and thus has the largest share. The industrial healthcare industry includes medical technology manufacturers, pharmaceutical manufacturers, medical retail and wholesale, but also other goods such as construction investments and equipment for digital applications. It has a share of 22.8% of the overall GVA of the German healthcare industry. The third and smallest subgroup is biotechnology with a growth rate of 5.6% [20, 21].
In 2018, healthcare expenditure was approximately 390.6 billion euros, thereby exceeding the mark of 1 billion euros per day [4]. The number of people employed in the German healthcare industry is rising steadily and reached 7.5 million in 2019. This corresponds to a share of 16.6% of the workforce in Germany [19]. According to a study by PwC in 2021, 72% of the respondents consider the healthcare system of Germany to be one of the three best in the world. This represents a significant increase compared to the previous year. In contrast, the study shows that the existing dissatisfaction with doctors results primarily from insufficient time commitment and the increased need for flexibility [22]. The following section presents current trends of digitalization in healthcare, which offer new opportunities to meet needs such as flexibility.
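The figures quoted in this section can be cross-checked with a few lines of arithmetic. The following minimal sketch uses only the numbers stated above; the derived quantities (total GVA of the economy and the absolute size of each sub-sector) are implied values computed from the rounded shares and are not reported in the chapter, so they should be read as rough orders of magnitude.

```python
# Figures quoted in the text (billions of euros unless stated otherwise).
health_expenditure_2018 = 390.6   # healthcare expenditure in Germany, 2018
gva_health_2019 = 372.0           # estimated GVA of the healthcare industry, 2019
share_of_economy = 0.12           # "around 12%" of the total economy
service_share = 0.53              # "more than 53%" of healthcare GVA
industrial_share = 0.228          # 22.8% of healthcare GVA

# Expenditure per day: checks the "more than 1 billion euros per day" remark.
per_day = health_expenditure_2018 / 365

# Values implied by the rounded shares (derived here, not reported in the source).
implied_total_gva = gva_health_2019 / share_of_economy
service_gva = gva_health_2019 * service_share
industrial_gva = gva_health_2019 * industrial_share

print(f"expenditure per day:        {per_day:.2f} bn EUR")
print(f"implied total GVA:          {implied_total_gva:.0f} bn EUR")
print(f"service-oriented GVA:       {service_gva:.1f} bn EUR (lower bound)")
print(f"industrial healthcare GVA:  {industrial_gva:.1f} bn EUR")
```

Because the 12% and 53% shares are rounded or stated as lower bounds, the implied values are approximations only.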
14.3 Current Trends of Digitalization in Healthcare
Whereas the term digitization comprises the conversion of analog information into digital formats [23], digitalization can be considered as “the use of digital technologies to change a business model and provide new revenue and value-producing opportunities” [24]. It leads to the cross-linking of all areas of business and society. In addition, it offers the possibility of collecting and analyzing information and translating it into action [25]. The digital transformation of business models can affect the entire business model as well as individual elements, value chains and players within the value chain. As part of the digital transformation, technologies are used that enable new applications and services [26]. Those digital technologies are able to provide innovative health strategies and solutions. They offer the opportunity to gain access to major social determinants of health, including employment, social networks and many more factors [27]. Before the study on the potential benefits of digital business models and processes in the healthcare industry is presented, the main digital trends and technologies in healthcare are introduced.
Digital Health. As information technology evolves, healthcare derives benefits from combining IT applications and medical knowledge [28]. Digital health acts as a generic term for digital health technologies that take advantage of large amounts of health data, including important insights about transactions and user interactions. Both patients and healthcare professionals benefit from the variety of new possibilities offered by digital technologies in numerous fields of application, such as personal health promotion or treatment of diseases [29]. Hermes et al. (2020) examine the digital transformation of the healthcare industry by analyzing a large number of healthcare organizations. The findings of the study reveal the emergence of new market segments, value streams, generic roles and an increasing collaboration between healthcare sectors and the information technology industry [30].
eHealth. Electronic health (eHealth) can be defined as “the use of emerging information and communications technology, especially the Internet, to improve or enable health provision and to facilitate the improvement of individual health” [31]. It includes both collaborative and interactive instructional technologies for consumers and patients [32].
eHealth can be used for patient empowerment, leading to a change in the patient-professional relationship by eliminating temporal and spatial restrictions and by providing home-based solutions and functions like information and education [33]. Moreover, it serves as a generic term for telemedicine and telehealth [34]. eHealth can be divided into three categories: mobile health (mHealth), personalized health (pHealth) and connected health (cHealth). mHealth is the implementation of eHealth in wireless and mobile devices like smartphones, tablets or patient monitors. It can provide benefits for both health practitioners and patients by improving clinical care or patients’ self-management of diseases [16]. pHealth can be used especially by people with unstable health conditions in order to provide specific treatment options [35]. cHealth emphasizes an emerging network of devices, services and interventions to fulfill patients’ needs and to share health data between different actors [36].
Big Data. The term big data comprises large amounts of data that can be stored, processed and evaluated [37]. Big data is able to support various medical functions like clinical decision support, population management and disease surveillance. It thus creates new opportunities for handling large and complex electronic health data [38].
3D printing in healthcare. 3D printing is an additive manufacturing technology that is able to transform healthcare by producing individualized objects and even organs. In addition, 3D printers can be used for the production of pharmaceuticals, thereby enabling personalized medicine for patients [39].
Robotics in healthcare. Robots, especially service robots, are increasingly implemented in healthcare to improve operational efficiency. For example, they are able to deliver materials and medications. However, Gombolay et al. (2018) see potential for improvement in terms of error prevention and integration into the healthcare process [40]. Robots are also used as assistants in surgeries to ensure minimally invasive surgical care with high precision and even to improve clinical outcomes such as reduced blood loss or shorter surgical duration [41].
Artificial Intelligence (AI). The past decade in particular has been characterized by a tremendous boom in AI technologies and applications [42]. In general, AI can be considered as “intelligence demonstrated by machines” [43]. To date, there is no universal definition of AI. However, the central characteristics and abilities of AI are learning, imitating human cognitive functions and problem-solving [44]. AI technologies are considered important drivers of a new era of healthcare [45], with numerous application areas presented in Sect. 14.6.
14.4 Potential Benefits of Digital Business Models in the Healthcare Industry
The research project investigates the potential benefits of digital business models and their processes in the healthcare industry. In the following subsection, the research method is presented in order to gain a closer insight into the approach.
14.4.1 Research Method
As part of several research projects of Aalen University [46, 47], companies and other types of organizations (non-profits, freelancers) from various industries were surveyed on topics related to digitalization. Both industry-specific and industry-independent questions were addressed. Based on the results and an additional literature review, four determinants (see Sect. 14.4.2) were identified that may influence the potential benefits of digital business models and their processes. These four determinants are part of a structural equation model that forms the basis of the study (see Sect. 14.4.5). Structural equation modeling is a useful method to test hypotheses and derive statements about the relationships between variables. It is a multivariate data analysis method that is often used in research to test linear and additive causal models. To evaluate the data, the authors chose SmartPLS, a statistics software tool for Partial Least Squares Structural Equation Modeling (PLS-SEM) [48, 49].
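To give an intuition for what a structural model with four determinants and one target construct estimates, the following minimal sketch may help. It is deliberately simplified and rests on several assumptions: the data are synthetic, the path strengths in `assumed_paths` are arbitrary values chosen for the simulation (not the study's results), each construct is measured by three hypothetical survey items, and constructs are scored with equal indicator weights followed by ordinary least squares. It therefore illustrates a composite-based path model, not the iterative PLS algorithm implemented in SmartPLS.

```python
import numpy as np

rng = np.random.default_rng(0)

def standardize(x):
    """Scale each column (or a 1-D vector) to mean 0 and standard deviation 1."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def composite(indicators):
    """Equal-weight composite score for one construct from its indicator block."""
    return standardize(standardize(indicators).mean(axis=1))

def simulate_indicators(latent_scores, k=3, noise=0.5):
    """Simulate k noisy survey items that reflect one latent variable."""
    return latent_scores[:, None] + rng.normal(scale=noise, size=(latent_scores.size, k))

def path_coefficients(predictors, outcome):
    """Standardized OLS coefficients of the outcome composite on the predictor composites."""
    X = np.column_stack(predictors)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta

names = ["KPIs", "Individualization", "Efficiency", "Communication channels"]
assumed_paths = np.array([0.5, 0.0, 0.1, 0.3])  # hypothetical, for simulation only
n = 2000                                        # hypothetical number of respondents

# Synthetic latent scores for the four determinants and the target construct.
latent = rng.normal(size=(n, len(names)))
potential = latent @ assumed_paths + rng.normal(scale=0.5, size=n)

# Measurement step: build one composite per construct from simulated items.
predictors = [composite(simulate_indicators(latent[:, j])) for j in range(len(names))]
outcome = composite(simulate_indicators(potential))

# Structural step: estimate one path coefficient per determinant.
beta = path_coefficients(predictors, outcome)
for name, b in zip(names, beta):
    print(f"{name:>24s} -> potential: {b:+.2f}")
```

In a real PLS-SEM analysis the indicator weights are estimated iteratively and the significance of each path is usually assessed by bootstrapping; both steps are omitted here for brevity.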
14.4.2 Industry-Dependent Determinants of Digitalization
This subsection introduces the determinants identified as influences on the potential benefits of digital business models and processes in the healthcare industry (Fig. 14.1).
Key Performance Indicators (KPIs). KPIs reflect the economic situation of companies and other types of organizations. They focus especially on performance, success, organizational units and capacity utilization [50]. KPIs concretize objectives that serve to control and evaluate departments and processes [51]. Sample key indicators in healthcare are waiting times, resource utilization and length of stay [52]. Due to the increasing economic focus of health facilities, keeping health expenditure as low as possible is a core task of healthcare management [53].
Individualization. Patients want to be treated according to their individual needs and illnesses. Since a disease can manifest itself differently in each patient, specific treatment is necessary [54]. Personalized medicine, also known as precision medicine, is a major trend in healthcare. It is intended to prevent and treat diseases based on a patient’s lifestyle, genetic makeup and environment [55].
Efficiency. The overall goal in healthcare is to increase the quality of life and life expectancy of people [56]. Nevertheless, efficiency plays an increasing role, especially in hospitals, to ensure optimal patient care [14]. The challenge here is to maintain a balance between health outcomes and (national) economic benefits [57].
Communication channels. With the digital transformation of healthcare, a huge number of new communication channels arise, such as video consultations or health applications (in the following referred to as “health apps”), providing faster and more efficient exchange [10]. These new communication channels will especially change the communication between patients and healthcare professionals [30].
Fig. 14.1 Identified determinants to examine the potential benefits of digital business models and their processes: Key Performance Indicators (KPIs), Individualization, Efficiency, Communication channels
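The sample KPIs named above (waiting times, length of stay and resource utilization) can be made concrete with a small calculation. The sketch below is purely illustrative: the three patient records, the number of beds and the observation period are invented, and the definitions used (mean time from arrival to first contact, mean time from arrival to discharge, occupied bed-days divided by available bed-days) are common conventions assumed here rather than definitions taken from the chapter.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

# Hypothetical inpatient records: (arrival, first seen by a clinician, discharge).
records = [
    ("2021-03-01 08:00", "2021-03-01 08:30", "2021-03-03 08:00"),
    ("2021-03-02 10:00", "2021-03-02 11:00", "2021-03-06 10:00"),
    ("2021-03-05 14:00", "2021-03-05 14:15", "2021-03-08 14:00"),
]

def parse(timestamp):
    """Convert a timestamp string into a datetime object."""
    return datetime.strptime(timestamp, FMT)

def waiting_time_minutes(arrival, seen):
    """Minutes between arrival and first clinical contact."""
    return (parse(seen) - parse(arrival)).total_seconds() / 60

def length_of_stay_days(arrival, discharge):
    """Days between arrival and discharge."""
    return (parse(discharge) - parse(arrival)).total_seconds() / 86400

def bed_utilization(occupied_bed_days, available_beds, period_days):
    """Share of available bed-days that were actually occupied."""
    return occupied_bed_days / (available_beds * period_days)

stays = [length_of_stay_days(arrival, discharge) for arrival, _, discharge in records]
mean_wait = mean(waiting_time_minutes(arrival, seen) for arrival, seen, _ in records)
mean_stay = mean(stays)
utilization = bed_utilization(sum(stays), available_beds=2, period_days=10)

print(f"mean waiting time:   {mean_wait:.0f} min")
print(f"mean length of stay: {mean_stay:.1f} days")
print(f"bed utilization:     {utilization:.0%}")
```

In practice such indicators would be derived from hospital information systems, and the exact definitions vary between institutions.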
14.4.3 Digital Technologies Along the Care Pathway
According to the European Court of Justice and the Federal Fiscal Court of Germany, curative treatments are “activities undertaken for the purpose of preventing, diagnosing, treating and, as far as possible, curing diseases or health disorders in humans” [58]. Based on this definition, an exemplary patient’s care pathway was developed to examine the potential benefits of digital business models and their processes [59] in consideration of new digital technologies. It provides the basis for five sector-specific questions within the study (see Sect. 14.4.5).
Health apps for Prevention. Preventive healthcare is necessary in order to recognize and treat diseases as early as possible or to avoid them completely [60]. A distinction is made between primary, secondary, tertiary, and quaternary prevention. Primary prevention aims at avoiding the occurrence of diseases, while secondary prevention includes measures (like early detection checkups) for a sick patient whose illness has not yet been diagnosed. For sick patients who have already been diagnosed, the risk can be reduced by lifestyle changes and rehabilitation measures as part of tertiary prevention. Quaternary prevention, finally, aims at avoiding unnecessary examinations, which primarily means not performing all possible examinations for every symptom [61] (Fig. 14.2).
Health apps are increasingly being used for prevention [62]. A study by BIS Research (2018) estimates that the mobile health app market will generate 11.2 billion USD in global revenue by 2025. In 2017, sales amounted to 2.4 billion USD [63]. There is a huge variety of health apps, which can be distinguished by whether they target prevention for healthy or sick people and by the health facility in which they are applied. For example, fitness apps provide functions like a calorie counter or pedometer, workout exercises or tips for relaxed sleep [64]. Health apps can act as everyday companions by collecting and analyzing health-related data [65].
Telediagnostics. Diagnostics can be divided into different areas. First, the medical history of the patient is examined, followed by the examination of specific symptoms, laboratory tests like blood testing, and imaging techniques such as X-ray examinations [66]. Telemedicine includes virtual communication technologies between the patient and provider in order to overcome time, space or cultural barriers to healthcare [30]. It can be used in diagnostics, therapy, rehabilitation and medical consultation.
Fig. 14.2 Digital technologies along the care pathway based on [59]
Telemedicine can be divided either on the basis of the medical specialty (like teleradiology or telecardiology) or on the basis of the functions that are used. The most used functions are teleconsultation and telemonitoring [67].
Virtual Reality (VR) for Treatment. After the diagnosis, the treatment process follows. The German healthcare system regulates the entitlement to healthcare treatment in Social Code 5. According to Section 27, Paragraph 1, people with health insurance are entitled to healthcare treatment if it is necessary to recognize an illness, cure it, prevent it from worsening or alleviate discomfort [68]. VR-based applications enhance simulation by providing a realistic virtual 3D environment. Both patients and healthcare professionals are able to use them in combination with special devices [69, 70]. VR can enhance medical training, e.g., by teaching medical students about the prescription and use of antibiotics [71]. Moreover, medical students can use VR glasses to follow the operations of experienced physicians without being tied to a specific location [72]. Patients can also take advantage of VR, e.g., in the therapy of anxiety disorders or neurological diseases. VR is still rarely used in the German healthcare sector. However, due to decreasing hardware costs and increasing medical evidence for VR, the technology, which offers new treatment options, is predicted to become more widely used [73].
Mail-order pharmacies for pharmaceutical supply. Pharmaceutical supply is a central component of healthcare and includes pharmaceutical manufacturers, pharmaceutical wholesalers, public pharmacies as well as hospital pharmacies [21]. Prescribing the right medication is a complex task, with far-reaching consequences and side effects in case of incorrect medication or dosage [74]. Mail-order pharmacies play an increasing role in the supply of pharmaceuticals as they provide home delivery, price comparison and continuous access [75]. In particular, chronically ill people, the elderly and people with walking disabilities can benefit from online pharmacies, as they are able to order pharmaceuticals via phone, internet or mail. European online providers of pharmaceuticals are subject to certification with the European mail order logo [76]. Despite the advantages of online pharmacies, there is insufficient empirical evidence on whether online pharmacies are displacing retail pharmacies and could thus negatively impact the provision of comprehensive care, especially in rural regions [1].
Telerehabilitation. Rehabilitation includes measures to restore the physical, mental and social condition of a person or to reduce the limitations of a disability [60]. It takes place in special rehabilitation facilities, for example in the form of a cure or follow-up treatment after hospital treatment [21]. The overall goal for patients is to regain independence and the ability to lead an active life [77]. In order to carry out rehabilitation independently of location, telerehabilitation, as part of telemedicine, is increasingly used based on new technologies [78]. Marzano and Lubkina (2017) make a distinction between medical and social telerehabilitation. Medical telerehabilitation comprises curative medicine involving trained healthcare professionals like physiotherapists, whereas social telerehabilitation contains individual services like occupational therapy, language therapy or art therapy [77].
14 Digital Business Models in the Healthcare Industry
339
14.4.4 Challenges of Digitalization in Healthcare

In order to create an overall picture of the potential benefits and challenges of digital business models and technologies in healthcare, the main challenges are presented in the following section.

Scientific evidence. A key challenge is to evaluate whether digital health technologies provide benefits for users [79]. Health applications provide a wide range of opportunities for prevention and individual health promotion. However, it has not been scientifically proven whether the apps can positively influence users’ health behavior [65]. Despite the enormous number of health apps, relatively few are certified. One reason for this is the cost of such certification [80]. More evidence-based studies are needed to enable the widespread use of health technologies like telemedicine [67].

Legal regulations. Remote diagnostics and treatment via digital media are subject to legal regulations in Germany [81]. However, these regulations have been liberalized, allowing physicians to consult and treat their patients via digital communication channels subject to various conditions. For example, issuing sick notes by telephone or video conference to patients unknown to the physician is restricted [82], with exceptions during the COVID-19 pandemic [83].

Communication barriers and implementation effort. Miscommunication in the context of telemedicine can lead to false diagnoses [67]. Another difficulty is the technical infrastructure, since the nationwide expansion of broadband networks has not yet been completed [84]. In order to deal with telemedicine applications in the best possible way, training, special software and the adaptation of internal processes are necessary. However, this is associated with high implementation effort [67, 85]. To date, there is still no adequate funding for service providers, so the incentive to invest in telemedicine is rather low [67].

Ethical concerns and data privacy. As confidentiality is a key principle of medicine, it also has to be ensured when using telemedicine and other digital technologies [86]. Due to the complexity of digital technologies, there is a risk of data misuse when processing personal data [65]. Another ethical aspect is the changing relationship between patients and healthcare professionals through the use of digital technologies. Due to the physical distance, it is (almost) impossible to establish the emotional bond that is important for the treatment of patients. This creates a new patient-doctor relationship that has to take new data protection aspects into account [67].

Attitude towards technology adoption. Another barrier to the use of digital technologies is the attitude of healthcare actors towards technology adoption. Reasons for resistance to new technologies may include a desired focus solely on patient care, the influence of technology-averse colleagues or the privacy concerns mentioned above [30].

Barriers for new business models. Founding a company is associated with high costs and risks. The research and development costs are particularly high for innovations in the healthcare sector, since patient safety has to be ensured. During
the development of pharmaceuticals, testing quality and efficacy as well as safety can take decades. Many companies and research institutions fail due to these hurdles [87].
14.4.5 Study Results

In order to investigate the general potential of digital business models in the healthcare industry, a quantitative study was conducted in 2019 (before the outbreak of COVID-19). A total of 169 experts from different organizations (providers or companies) within the healthcare industry took part in the online survey to submit their assessment of the potential benefits of digital business models and their processes. The survey contains both cross-industry and industry-specific questions. The cross-industry questions comprise general aspects such as the number of employees and revenue levels as well as three questions on digital business models. In addition, five industry-specific questions were examined in order to gain more detailed insights into the impact of digitalization. 157 participants stated that they already have experience with digitalization, whereas only 12 participants do not have any experience with digital technologies. This result emphasizes the great role of digitalization in the healthcare industry.

Cross-industry questions. Regarding the size of the participating organizations (healthcare providers or companies), the majority have more than 250 employees. In contrast, only 26 participants have a headcount of fewer than 50 employees. Twelve organizations employ 51 to 250 people. The size of the organization can also be derived from the turnover. 46.2% of the participants reach an annual turnover of more than 50 million euros. 29.6% of the participants record a turnover of less than 50 million euros and consequently can be assigned to micro, small and medium-sized enterprises (SMEs) [88]. 41 organizations did not indicate their turnover. The European Commission defines the company size according to the number of employees, annual turnover and balance sheet total. Companies with a maximum of 250 employees, an annual turnover of at most 50 million euros and a balance sheet total of up to 50 million euros belong to the category of SMEs [88].
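The sample figures reported above can be cross-checked with a few lines of arithmetic. The following Python sketch converts the reported turnover percentages back into counts and derives two estimates of the SME share. It assumes that the share can be read either from headcount (at most 250 employees) or from turnover (below 50 million euros); this reading, like all variable names, is an illustration and is not stated in the study itself.

```python
# Figures reported in the survey description (169 participating organizations).
TOTAL = 169
FEWER_THAN_50_EMPLOYEES = 26
BETWEEN_51_AND_250_EMPLOYEES = 12
SHARE_TURNOVER_ABOVE_50M = 0.462
SHARE_TURNOVER_BELOW_50M = 0.296
TURNOVER_NOT_STATED = 41


def to_count(share: float, total: int = TOTAL) -> int:
    """Convert a reported percentage share back into a number of organizations."""
    return round(share * total)


above_50m = to_count(SHARE_TURNOVER_ABOVE_50M)
below_50m = to_count(SHARE_TURNOVER_BELOW_50M)

# The three turnover groups should add up to the full sample.
turnover_total = above_50m + below_50m + TURNOVER_NOT_STATED

# Assumed reading: headcount gives one estimate of the SME share, turnover another.
sme_share_by_headcount = (FEWER_THAN_50_EMPLOYEES + BETWEEN_51_AND_250_EMPLOYEES) / TOTAL
sme_share_by_turnover = below_50m / TOTAL

print(f"turnover groups cover {turnover_total} of {TOTAL} organizations")
print(f"SME share by headcount: {sme_share_by_headcount:.1%}")
print(f"SME share by turnover:  {sme_share_by_turnover:.1%}")
```

Under these assumptions the headcount criterion yields about 22.5% and the turnover criterion about 29.6%, so the two criteria bracket the SME share of the sample.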
According to this classification, about 22–30% of the surveyed organizations are SMEs. The participants were also asked how long they have been dealing with digitalization. 111 respondents have been addressing the topic for more than 3 years, which illustrates the important role of digitalization in healthcare. 41 participants have been dealing with the topic for 1–3 years, whereas only 5 participants have been addressing the topic for less than one year. The following three cross-industry questions refer to the assessment of the potential benefits of digital business models and their processes. In healthcare, medical errors can have serious effects on the health of patients and have to be avoided. 99 participants agree that the frequency of errors can be reduced by digital business models and their processes. 23 do not agree and 47 respondents did
not indicate an answer. The next question focuses on how to address customers and patients. The study indicates that 68.6% of the participants expect digital business models to improve the way customers are addressed. 11.2% do not expect any improvement in addressing customers by means of digital business models. 34 experts responded neutrally. Digital business models and processes can also have a positive impact on productivity, which in turn increases the efficiency of organizations. 113 experts agree with the statement, while 39 participants do not give a concrete answer and 17 participants do not expect digital business models and their processes to have a positive impact on productivity. The literature research confirms the positive results, as technologies like Big Data can increase productivity and profitability [89].

Industry-specific questions. The five sector-specific questions focus on the healthcare industry and are intended to provide a deeper insight into the effects of digitalization for organizations. The questions are based on the care pathway of patients [59]. In contrast to the cross-industry questions, only 151 participants responded to these questions. The first question asks whether telediagnostics improves patient care (see Fig. 14.3). A total of 41 experts fully agree with this statement, whereas 80 rather agree. 14 respondents do not expect an impact on patient care. The remaining 16 respondents disagree that telediagnostics improves patient care. The study also surveyed the impact of telerehabilitation on patient follow-up. 99 experts agree that telerehabilitation enables improved patient follow-up. Thus, the recovery process after treatment offers a wide range of opportunities for future applications. 28 respondents do not expect an impact and 24 participants disagree with the statement (Fig. 14.4).
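The answer distributions reported so far mix absolute counts and percentages. As a minimal sketch, the Python snippet below collapses each question into agree, neutral and disagree groups and converts them to shares, which makes it easy to verify that every question adds up to the stated number of respondents. The class and key names are hypothetical, and the counts for the customer question are recovered by rounding the reported percentages, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tally:
    """Answers to one survey statement, collapsed into three groups."""
    agree: int
    neutral: int   # neutral or no concrete answer
    disagree: int

    @property
    def total(self) -> int:
        return self.agree + self.neutral + self.disagree

    def shares(self) -> dict:
        """Percentage share of each group, rounded to one decimal place."""
        return {
            name: round(100 * count / self.total, 1)
            for name, count in (
                ("agree", self.agree),
                ("neutral", self.neutral),
                ("disagree", self.disagree),
            )
        }


# Cross-industry questions (169 respondents). For the customer question the text
# gives percentages, so the counts are recovered by rounding (an assumption).
N_CROSS = 169
tallies = {
    "fewer errors": Tally(agree=99, neutral=47, disagree=23),
    "addressing customers": Tally(
        agree=round(0.686 * N_CROSS), neutral=34, disagree=round(0.112 * N_CROSS)
    ),
    "productivity": Tally(agree=113, neutral=39, disagree=17),
    # Industry-specific questions (151 respondents).
    "telediagnostics": Tally(agree=41 + 80, neutral=14, disagree=16),
    "telerehabilitation": Tally(agree=99, neutral=28, disagree=24),
}

for question, tally in tallies.items():
    print(f"{question:22s} n={tally.total:3d} {tally.shares()}")
```

Each cross-industry tally sums to 169 and each industry-specific tally to 151, which matches the respondent numbers given in the text.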
Fig. 14.3 Telediagnostics to improve patient care. Bar chart of the number of companies (n = 151) per response category (completely agree, rather agree, neither nor, rather disagree, completely disagree), titled "The majority agrees that telediagnostics improves patient care."; bar labels: 80, 41, 14, 10, 6.

Fig. 14.4 Telerehabilitation for improved patient follow-up. Bar chart of the number of companies (n = 151) per response category, titled "The majority of experts confirm improved patient follow-up through telerehabilitation."; bar labels: 78, 28, 21, 15, 9.

70.9% of the respondents agree that new business models and their processes enable new digital products that can lead to improved patient care. Thus, new digital products such as health apps play an important role in managing (chronic) illness as well as in leading a healthy lifestyle (Fig. 14.5). The results of the following sector-specific questions are mixed. 73 participants agree that digital business models including VR can reduce medical errors and mistreatments. While 23.2% of the participants do not see any effect of VR on the reduction of mistreatments, 43 participants disagree with the statement. One reason may be the lack of widespread use of VR in healthcare (Fig. 14.6). In pharmaceutical supply, digital business models enable new forms of online pharmacies. The following question (see Fig. 14.7) examined whether new forms of mail-order pharmacy can improve pharmaceutical supply. 81 participants agree with the statement, while 23.2% each disagree with the hypothesis or do not see any
effects. The balanced responses may be due to the fact that online pharmacies are not yet well established. In addition, the participants may fear the replacement of retail pharmacies.

Fig. 14.5 New digital products for patient care. Bar chart of the number of companies (n = 151) per response category (completely agree, rather agree, neither nor, rather disagree, completely disagree), titled "New digital products (such as health apps) lead to improved care."; bar labels: 78, 29, 29, 10, 5.

Fig. 14.6 VR to reduce medical errors. Bar chart of the number of companies (n = 151) per response category, titled "Mixed opinions on business models that use VR to reduce medical errors"; bar labels: 63, 35, 35, 10, 8.

Fig. 14.7 Mail-order pharmacy for improved pharmaceutical supply. Bar chart of the number of companies (n = 151) per response category, titled "Majority agrees that new forms of mail-order pharmacy can improve pharmaceutical supply."; bar labels: 60, 35, 24, 21, 11.

The following figure presents the results of the structural equation model with the determinants KPIs, individualization, efficiency and communication channels (Fig. 14.8).

Structural equation model. The determinants that have an influence on the potential benefits of digital business models and processes were evaluated by means of the software tool SmartPLS [49]. Figure 14.8 depicts the research model with the influences of the determinants KPIs, individualization, efficiency and communication channels on the dependent variable Potential benefits of digital business models and processes in the healthcare industry.
Fig. 14.8 Evaluation of the research model. Path diagram linking the four determinants to the dependent variable "Potential benefits of digital business models and processes in the healthcare industry", with the t-values on the paths: KPIs 2.583**, Individualization 0.497, Efficiency 1.250, Communication Channels 2.579** (* p < 0.1, ** p < 0.05, *** p < 0.01).
When evaluating the data, it is necessary to consider the quality criteria according to Homburg: objectivity, reliability and validity [90]. Objectivity is given when different researchers obtain the same measurement results when conducting the evaluation and when the same results lead to the same conclusions. Moreover, execution objectivity is fulfilled if the participants are not influenced by the investigator [91]. The reliability of a measuring instrument describes how consistently results are measured and indicates the extent to which the results are reproducible when the measurement is repeated. The third criterion, validity, verifies whether the method measures what it is intended to measure [91, 92]. The present study examines the influences of the determinants on the potential benefits of digital business models and their processes in the healthcare industry. Before analyzing the structural model, reliability and validity are assessed by means of different quality criteria (see Table 14.1).

Table 14.1 Quality criteria of the study
Construct                Composite reliability (CR)   Cronbach's alpha (CA)   Average variance extracted (AVE)
KPIs                     0.834                        0.765                   0.504
Individualization        0.768                        0.629                   0.413
Efficiency               0.672                        0.479                   0.316
Communication Channels   0.825                        0.744                   0.489

The quality criterion Composite Reliability (CR) describes internal consistency reliability and takes into account different indicator loadings. Values of at least 0.6 are acceptable to confirm the reliability of the indicators [93]. As Table 14.1 shows, all CR values are above 0.6 and can consequently be accepted. KPIs has the highest CR of 0.834, which indicates a high internal consistency. Cronbach’s alpha (CA) also examines the reliability of the items of a latent (not directly measurable) variable. In contrast to CR, this criterion assumes equal indicator loadings [93]. The threshold value depends on the number of indicators included in a construct. If the CA value of a construct with at least four indicators is equal to or higher than 0.7, high quality can be assumed [92]. As with CR, KPIs has the highest α-value of 0.765. Efficiency has the lowest value of 0.479, so the values vary considerably. Average variance extracted (AVE) determines the extent to which a latent construct explains the variance of its indicators. The target value has to be higher than 0.5 [93]. KPIs again has the highest value with an AVE of 0.504 and thus explains more than half of the variance of its indicators [93]. Just below the threshold is the variable communication channels with a value of 0.489. It is followed by individualization with 0.413 and efficiency with a value of 0.316. Additionally, R² has been examined. It measures the explanatory power of a model and is the coefficient of determination known from linear regression. According to Chin (1998), values of 0.67 can be considered substantial whereas values up to 0.19 are considered weak [94]. The results show an R² value of 0.279 and indicate that the predictive power of the model is rather weak to average. In order to derive statements about the hypotheses, the results of the evaluation are presented in Table 14.2.
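The threshold checks described above can be written down compactly. The sketch below applies the quoted cut-offs (CR of at least 0.6, CA of at least 0.7, AVE above 0.5) to the values of Table 14.1 and bands the R² value. Two points are assumptions rather than statements from the study: the CA threshold is applied to every construct although the number of indicators per construct is not reported, and the intermediate R² band of 0.33 ("moderate") is the value commonly attributed to Chin (1998) and is not quoted in the text.

```python
# Values from Table 14.1: (composite reliability, Cronbach's alpha, AVE).
QUALITY = {
    "KPIs": (0.834, 0.765, 0.504),
    "Individualization": (0.768, 0.629, 0.413),
    "Efficiency": (0.672, 0.479, 0.316),
    "Communication Channels": (0.825, 0.744, 0.489),
}

CR_MIN = 0.6   # at least 0.6
CA_MIN = 0.7   # at least 0.7 (assumed to apply to all constructs)
AVE_MIN = 0.5  # strictly higher than 0.5


def check(cr: float, ca: float, ave: float) -> dict:
    """Return which of the three quality criteria a construct meets."""
    return {"CR": cr >= CR_MIN, "CA": ca >= CA_MIN, "AVE": ave > AVE_MIN}


def classify_r2(r2: float) -> str:
    """Band an R² value; the 0.33 cut-off is an assumption, see the lead-in."""
    if r2 >= 0.67:
        return "substantial"
    if r2 >= 0.33:
        return "moderate"
    if r2 >= 0.19:
        return "weak"
    return "below weak"


results = {name: check(*values) for name, values in QUALITY.items()}

for name, outcome in results.items():
    print(f"{name:24s} {outcome}")
print("R² = 0.279 is", classify_r2(0.279))
```

With these cut-offs only KPIs meets all three criteria, every construct passes the CR check, and the R² of 0.279 falls between the weak and moderate bands.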
Hypotheses

The results regarding the hypotheses are presented in the following section. The original sample and the sample mean should exceed a value of 0.2 to be considered a good result [94]. At a significance level of 10%, the critical value for the t-statistic is 1.65. The p-value is the probability of obtaining a test statistic at least as extreme as the observed one if the null hypothesis is true; it has to be below 0.1 at a significance level of 10% [93].

Table 14.2 Results of the hypothesis model
Determinant              Original sample (O)   Sample mean (M)   Standard deviation   T-Statistics   P-Value
KPIs                     0.224                 0.232             0.087                2.583          0.010
Individualization        −0.050                −0.040            0.101                0.497          0.620
Efficiency               0.131                 0.158             0.105                1.250          0.212
Communication channels   0.282                 0.254             0.109                2.579          0.010

H1 Digital business models and their processes lead to an improvement of KPIs in the company.
The latent variable KPIs has an original sample value of 0.224 and is significant at p = 0.01 (p < 0.05). Consequently, it can be assumed that KPIs have a high and positive influence on the potential benefits of digital business models and their processes in healthcare. Both the original sample and the sample mean are above the threshold of 0.2. Moreover, the t-value of 2.583 exceeds the test value. Consequently, hypothesis 1 can be accepted. Due to digital business models and processes, KPIs like transaction costs, productivity and growth rates can be improved.

H2 Digital business models and their processes lead to more individualized product and service offerings and more specific customer care.

The value of the original sample is −0.050 with p > 0.1. Additionally, the value of the sample mean is clearly below the target value. The t-value of 0.497 is below the test value. As a result, individualization has no influence on the potential benefits of digital business models and their processes. The hypothesis has to be rejected. One reason for the result may be insufficient experience with new digital technologies and individualized products in healthcare.

H3 Digital business models and their processes lead to more efficient corporate structure and processes.

With p > 0.1, the original sample value is 0.131 and does not reach the target value of 0.2. The mean value is 0.158 and the t-value of 1.250 is below the critical t-value. Consequently, the hypothesis has to be rejected. It can be assumed that the use of digital technologies in organizations and the benefits of telediagnostics or health apps are still not widespread.

H4 Digital business models and their processes lead to improved internal and external corporate communication and optimize opportunities for cooperation with other organizations.
The influence of communication channels on the potential benefits of digital business models and processes is represented by a value of 0.282 at a significance level p = 0.01 (p < 0.05). Thus, communication channels have a positive and significant influence. Moreover, the t-value exceeds the test value. The hypothesis can be accepted and leads to the conclusion that new communication channels facilitate the cooperation between different actors in healthcare by supporting real-time and location-independent communication.
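The decision rule applied to the four hypotheses can be sketched in a few lines of Python. The snippet takes the rows of Table 14.2 and applies the criteria named in the text (original sample and sample mean above 0.2, t-statistic above 1.65, p-value below 0.1). As a plausibility check it also recomputes the t-statistic as the ratio of the absolute path coefficient to its standard deviation and derives a two-sided p-value from a standard normal approximation. Both recomputations are assumptions about how the reported values were obtained, and the function names are hypothetical.

```python
import math

# Rows of Table 14.2: original sample (O), sample mean (M), standard deviation,
# reported t-statistic, reported p-value.
PATHS = {
    "H1 KPIs": (0.224, 0.232, 0.087, 2.583, 0.010),
    "H2 Individualization": (-0.050, -0.040, 0.101, 0.497, 0.620),
    "H3 Efficiency": (0.131, 0.158, 0.105, 1.250, 0.212),
    "H4 Communication channels": (0.282, 0.254, 0.109, 2.579, 0.010),
}

T_CRITICAL = 1.65      # test value at a 10% significance level
ALPHA = 0.10
MIN_COEFFICIENT = 0.2  # threshold for original sample and sample mean


def two_sided_p(t: float) -> float:
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


def evaluate(o: float, m: float, sd: float, t_reported: float, p_reported: float):
    """Return the recomputed t-ratio, the approximate p-value and the decision."""
    t_ratio = abs(o) / sd
    p_approx = two_sided_p(t_reported)
    supported = (
        o > MIN_COEFFICIENT
        and m > MIN_COEFFICIENT
        and t_reported > T_CRITICAL
        and p_reported < ALPHA
    )
    return t_ratio, p_approx, supported


results = {name: evaluate(*row) for name, row in PATHS.items()}

for name, (t_ratio, p_approx, supported) in results.items():
    verdict = "accepted" if supported else "rejected"
    print(f"{name:26s} t≈{t_ratio:.3f} p≈{p_approx:.3f} -> {verdict}")
```

The recomputed values differ from the reported ones only in the third decimal place, which is consistent with rounding of the table entries, and the rule reproduces the acceptance of H1 and H4 and the rejection of H2 and H3.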
14.4.6 Interpretation

This section aims to interpret the study results making use of the findings from the literature. The study participants emphasize the importance of digital business models and processes in healthcare. The literature research shows that the use of digital technologies and processes can enhance productivity and efficiency. Digital technologies can further improve diagnostics, treatment and rehabilitation [67, 69, 77].
The majority of participants agree that error rates can be reduced and productivity can be enhanced. The study also shows that digital business models and processes can have a positive impact on the way physicians and patients interact and communicate. These digital technologies in healthcare entail new challenges. The social component as well as the emotions that are essential for patients interacting with doctors and nursing staff are likely to fade into the background when using new digital communication channels [67]. Other challenges associated with new digital business models and processes in healthcare are ensuring confidentiality and data privacy [86], implementation effort [67, 85] and the creation of evidence-based health technologies that users can rely on [79]. On the other hand, new digital technologies have the potential to compensate for skills shortages and fulfill the need for cost-effective treatment [80], which is also confirmed by the study. The study shows that health apps have the potential to support the detection and treatment of diseases and can enhance health prevention. Health apps are able to support patients in the course of their disease or positively influence their lifestyle [64, 80]. Additionally, most of the participants agree with the statement that digital business models and their processes can improve the way customers are addressed. Telemedicine offers the possibility to create new ways of interaction, e.g., through video consultations, thereby enabling spatial and temporal flexibility [30, 67]. The majority of experts see potential in telediagnostics for improving patient care. While most participants agree with the preceding statements, the potential of telerehabilitation for patient follow-up is seen more critically. Nevertheless, the majority agrees that telerehabilitation can improve patient follow-up, which is emphasized by the literature research [77]. The individual treatment of patients plays a decisive role in healthcare. VR technologies enable patients to practice handling their (chronic) diseases in a personalized way. Consequently, mistreatments can be reduced and the treatment process can be optimized [70, 73]. The study participants are rather critical of the benefits of VR technologies, which may be attributed to the still infrequent use of VR technologies in healthcare. Mail-order pharmacies have the potential to change the branch structure of pharmacies and improve medical care. Especially elderly people and people in rural areas can benefit from online pharmacies [75, 76]. The mixed results may stem from the concern that retail pharmacies will have to close and that many people fear losing their jobs. The evaluation of the SEM shows mixed results. KPIs have the strongest influence on the potential benefits of digital business models and processes in healthcare. More efficient and cost-effective processes can be realized by new digital business models and technologies [30, 71]. These new digital technologies lead to a changing treatment process, which companies, other healthcare facilities and patients have to consider. Overall, KPIs can be improved through increased productivity, resource savings and lower costs. The SEM also shows significant values regarding the determinant communication channels. In contrast to the results of the literature research, individualization and efficiency do not show significant influences on the potential benefits of digital business models and their processes in healthcare. This may be
attributable to the still low level of digitalization in the healthcare industry, as new innovations and ideas are associated with high implementation effort [67, 85].
14.5 Conclusion

Germany has one of the best healthcare systems in the world [22], made even more evident by the management of the COVID-19 pandemic. However, it is not a pioneer in terms of digitalization, so there is still enormous potential for improvement. In order to cope with the tremendous changes driven by digitalization, companies and health facilities have to integrate new digital technologies, business models and processes and embed them into their existing processes and corporate structure. A careful and well-considered use of digital technologies is particularly important when dealing with patients in order to ensure optimal care. It is necessary to include patients in the treatment process as early as possible while carefully introducing them to these new technologies. There is an increasing trend away from the classic doctor-patient relationship towards a more self-determined patient. Digitalization offers a wide range of new opportunities and benefits for both patients and organizations in the health sector. It also contributes to increasing people’s health literacy and can have a positive impact on efficiency, error frequency and costs in healthcare organizations. Increased choice for people simultaneously enhances competition between healthcare organizations, especially between retail pharmacies and online pharmacies. The study illustrates the impact of digital transformation on the German healthcare industry, thereby providing a practice-oriented view on the potential benefits of digital business models and processes. The results show that KPIs and communication channels have a positive and significant influence, while efficiency and individualization do not have a significant influence. Thus, not all determinants derived from the literature research could be confirmed. The study is subject to some limitations. The focus was on German-speaking experts and the survey was conducted in 2019, before COVID-19 occurred. It is recommended to carry out the study again with an extended sample size by including international experts. This could provide a comparison of the potential of digital technologies before and after the COVID-19 pandemic and illustrate the progress made in each healthcare system. An enhanced research approach could focus on specific technologies or digital business models in order to examine more specific potential benefits. The research project shows that digitalization is a relevant topic in most of the organizations of the healthcare industry. In order to benefit from digitalization and remain competitive, healthcare facilities and industry have to integrate digital technologies and business models into their existing processes and structures.
14.6 Outlook: The Role of AI in Healthcare

The presented research project mainly focuses on digital business models like health apps, telemedicine, mail-order pharmacies and VR. The results show that KPIs and communication channels have a significant influence on the potential benefits of digital business models and processes in healthcare. In order to elaborate on the determinants efficiency and individualization, which could not be confirmed as significant influences, the research area “Management for Small and Medium-sized Enterprises” at Aalen University is currently examining the potential benefits of AI in healthcare. The aim is to identify new and more specific influencing factors related to AI. Both clinical and economic benefits will be considered. This subsection provides an introduction to the development, potential benefits and challenges of AI in healthcare.

In 1950, the polymath Alan Turing created a test to investigate whether a machine was capable of imitating human cognitive functions. He wanted to demonstrate the potential power and possibilities of computer technologies [95]. Six years later, John McCarthy and his research group assumed that each feature of human intelligence could be simulated by machines [44]. AI has already been implemented in numerous industries such as manufacturing, finance or logistics [96]. Both the availability of large data sets and the continuous development of big data analytics have empowered the application of AI in healthcare [97]. These AI technologies aim to create a new era of healthcare [45]. AI was applied in healthcare as early as the 1950s in order to improve diagnostics with computer-aided programs. One example is the computer-aided diagnosis of acute abdominal pain by Gunn [98]. Over the years, the number of applications in diagnostics increased enormously [99, 100]. AI is also being applied in medical treatment [101, 102], documentation tasks [103] and drug discovery [104, 105].

AI in healthcare is often applied by means of Machine Learning (ML) or Deep Learning (DL) techniques. By identifying patterns with ML algorithms, machines are able to provide solutions to problems and make decisions. DL is a subfield of ML that aims to improve the accuracy of AI [106]. The number of AI applications in healthcare is expected to increase. According to a study by MarketsandMarkets, the worldwide turnover of AI in healthcare was estimated at 4.9 billion US dollars for 2020, whereas the estimated turnover in 2026 will exceed the 45 billion US dollar mark [107]. Apart from the potential benefits, there are also various challenges in the implementation of AI in healthcare. One issue is confidentiality, which is especially important in the handling of sensitive medical data. Anonymizing data helps to ensure confidentiality, but the emerging possibilities of re-identifying data still represent a challenge. Thesmar et al. (2019) call for improved de-identification techniques, even in pattern detection techniques [108]. The study results of Laï et al. (2020) show that there is an urgent need for a balance between augmenting data access and ensuring confidentiality and privacy [109]. Furthermore, organizations with AI-based systems can also become victims of cyber-attacks. Past examples
show that healthcare lags behind other industries in terms of human-based countermeasures like employee training [110]. Apart from other challenges like ensuring transparency [108], trusting AI-based systems may be one of the most important challenges in the use and implementation of AI in healthcare [111]. The continuous and rapid development of AI business models and technologies in healthcare provides numerous opportunities for future research in order to discuss potential benefits and challenges. The promise of AI revolutionizing healthcare, empowering people’s health and curing diseases through a multitude of new possibilities is therefore a great motivator.

Acknowledgements This work is based on several research projects of Aalen University. In particular, we would like to thank Viola Krämer and María Leticia Aguilar Vázquez for their great support.
References

1. G. Bäcker, G. Naegele, R. Bispinck, Sozialpolitik und soziale Lage in Deutschland (Springer Fachmedien Wiesbaden, Wiesbaden, 2020)
2. Bundesministerium für Gesundheit (ed.), Das deutsche Gesundheitssystem. Leistungsstark. Sicher. Bewährt (2020)
3. G. Yang, Z. Pang, M. Jamal Deen, M. Dong, Y.-T. Zhang, N. Lovell, A.M. Rahmani, Homecare robotic systems for healthcare 4.0: visions and enabling technologies. IEEE J. Biomed. Health Inf. 24(9), 2535–2549 (2020). https://doi.org/10.1109/JBHI.2020.2990529
4. Bundesministerium für Gesundheit (ed.), Daten des Gesundheitswesens. 2020 November 2020 (2020)
5. S. Gschoßmann, A. Raab, Content-Marketing als Strategie der Zukunft im Krankenhaus, in Digitale Transformation von Dienstleistungen im Gesundheitswesen II, ed. by M. Pfannstiel, P. Da-Cruz, H. Mehlich (Springer Gabler, Wiesbaden), pp. 107–127. https://doi.org/10.1007/978-3-658-12393-2_8
6. DAK-Gesundheit (Andreas Storm) (ed.), Gesundheitsreport 2020. Beiträge zur Gesundheitsökonomie und Versorgungsforschung (33) (2020)
7. B. Meskó, G. Hetényi, Z. Győrffy, Will artificial intelligence solve the human resource crisis in healthcare? BMC Health Serv. Res. 18, 545 (2018). https://doi.org/10.1186/s12913-018-3359-4
8. R. Flake, S. Kochskämper, P. Risius, S. Seyda, Fachkräfteengpass in der Altenpflege. Vierteljahresschrift zur empirischen Wirtschaftsforschung 45, 20–39 (2018)
9. U. Sury, Digitalisierung im Gesundheitswesen. Informatik Spektrum 43, 442–443 (2020). https://doi.org/10.1007/s00287-020-01317-9
10. K. Yousaf, Z. Mehmood, I.A. Awan, T. Saba, R. Alharbey, T. Qadah, M.A. Alrige, A comprehensive study of mobile-health based assistive technology for the healthcare of dementia and Alzheimer’s disease (AD). Health Care Manag. Sci. 23(2), 287–309 (2020). https://doi.org/10.1007/s10729-019-09486-0
11. S. Agnihothri, L. Cui, M. Delasay, B. Rajan, The value of mHealth for managing chronic conditions. Health Care Manag. Sci. 23(2), 185–202 (2020). https://doi.org/10.1007/s10729-018-9458-2
12. S. Kraus, F. Schiavone, A. Pluzhnikova, A.C. Invernizzi, Digital transformation in healthcare: analyzing the current state-of-research. J. Bus. Res. 123, 557–567 (2021). https://doi.org/10.1016/j.jbusres.2020.10.030
13. C.J. Bermejo-Caja, D. Koatz, C. Orrego, L. Perestelo-Pérez, A.I. González-González, M. Ballester, V. Pacheco-Huergo, Y. Del Rey-Granado, M. Muñoz-Balsa, A.B. Ramírez-Puerta, Y. Canellas-Criado, F.J. Pérez-Rivas, A. Toledo-Chávarri, M. Martínez-Marcos, Acceptability and feasibility of a virtual community of practice to primary care professionals regarding patient empowerment: a qualitative pilot study. BMC Health Serv. Res. 19, 403 (2019). https://doi.org/10.1186/s12913-019-4185-z
14. E. Karahanna, A. Chen, Q.B. Liu, C. Serrano, Capitalizing on health information technology to enable advantage in U.S. hospitals. MISQ 43(1), 113–140 (2019). https://doi.org/10.25300/MISQ/2019/12743
15. C. Williams, Y. Asi, A. Raffenaud, M. Bagwell, I. Zeini, The effect of information technology on hospital performance. Health Care Manag. Sci. 19(4), 338–346 (2016). https://doi.org/10.1007/s10729-015-9329-z
16. Y. O’Connor, P. O’Reilly, Examining the infusion of mobile technology by healthcare practitioners in a hospital setting. Inf. Syst. Front. 20(6), 1297–1317 (2018). https://doi.org/10.1007/s10796-016-9728-9
17. World Health Organization (WHO), Everybody’s business: strengthening health systems to improve health outcomes: WHO’s framework for action (2007)
18. HBM Healthcare Investments, Der Gesundheitsmarkt – Ein attraktives Anlageuniversum. https://www.hbmhealthcare.com/de/sektor#:~:text=Das%20globale%20Umsatzvolumen%20der%20Gesundheitsindustrie,Diagnostik%20mit%20%C3%BCber%20400%20Milliarden. (2020). Accessed 14 March 2021
19. Bundesministerium für Wirtschaft und Energie (BMWi) (ed.), Gesundheitswirtschaft. Fakten & Zahlen. Ergebnisse der Gesundheitswirtschaftlichen Gesamtrechnung (2019)
20. Bundesministerium für Gesundheit, Gesundheitswirtschaft. Bedeutung der Gesundheitswirtschaft. https://www.bundesgesundheitsministerium.de/themen/gesundheitswesen/gesundheitswirtschaft/bedeutung-der-gesundheitswirtschaft.html (2019). Accessed 14 March 2021
21. M. Simon, Das Gesundheitssystem in Deutschland. Eine Einführung in Struktur und Funktionsweise, 5th edn. (Hogrefe, Bern, 2016)
22. PricewaterhouseCoopers GmbH (PwC), Healthcare Barometer 2021 (2021)
23. M. Rachinger, R. Rauter, C. Müller, W. Vorraber, E. Schirgi, Digitalization and its influence on business model innovation. JMTM (2019). https://doi.org/10.1108/JMTM-01-2018-0020
24. Gartner, Gartner Glossary. Digitalization. https://www.gartner.com/en/information-technology/glossary/digitalization. Accessed 14 March 2021
25. Bundesministerium für Wirtschaft und Energie (ed.), Industrie 4.0 und Digitale Wirtschaft. Impulse für Wachstum, Beschäftigung und Innovation (2015)
26. D.R.A. Schallmo, Jetzt Digital Transformieren (Springer Fachmedien Wiesbaden, Wiesbaden, 2019)
27. F. Baum, L. Newman, K. Biedrzycki, Vicious cycles: digital technologies and determinants of health in Australia. Health Promot. Int. 29(2), 349–360 (2014). https://doi.org/10.1093/heapro/das062
28. S.P. Bhavnani, J. Narula, P.P. Sengupta, Mobile technology and the digitization of healthcare. Eur. Heart J. 37(18), 1428–1438 (2016). https://doi.org/10.1093/eurheartj/ehv770
29. C. Nebeker, J. Torous, R.J. Bartlett Ellis, Building the case for actionable ethics in digital health research supported by artificial intelligence. BMC Med. 17(1), 137 (2019). https://doi.org/10.1186/s12916-019-1377-7
30. S. Hermes, T. Riasanow, E.K. Clemons, M. Böhm, H. Krcmar, The digital transformation of the healthcare industry: exploring the rise of emerging platform ecosystems and their influence on the role of patients. Bus. Res. 13(3), 1033–1069 (2020). https://doi.org/10.1007/s40685-020-00125-x
31. H. Kelley, M. Chiasson, A. Downey, D. Pacaud, The clinical impact of eHealth on the self-management of diabetes: a double adoption perspective. JAIS 12(3), 208–234 (2011). https://doi.org/10.17705/1jais.00263
32. S.R. Tamim, M.M. Grant, Exploring how health professionals create eHealth and mHealth education interventions. Educ. Tech. Res. Dev. 64(6), 1053–1081 (2016). https://doi.org/10.1007/s11423-016-9447-4
33. E. Lettieri, L.P. Fumagalli, G. Radaelli, P. Bertele, J. Vogt, R. Hammerschmidt, J.L. Lara, A. Carriazo, C. Masella, Empowering patients through eHealth: a case report of a pan-European project. BMC Health Serv. Res. 15, 309 (2015). https://doi.org/10.1186/s12913-015-0983-0
34. S. Burkhart, F. Hanser, Einfluss globaler Megatrends auf das digitale betriebliche Gesundheitsmanagement, in Digitales Betriebliches Gesundheitsmanagement. FOM-Edition (FOM Hochschule für Oekonomie & Management), ed. by D. Matusiewicz, L. Kaiser (Springer Gabler, Wiesbaden, 2018). https://doi.org/10.1007/978-3-658-14550-7_2
35. G.N. Athanasiou, D.K. Lymberopoulos, Deployment of pHealth services upon always best connected next generation network, in Artificial Intelligence Applications and Innovations. AIAI 2012. IFIP Advances in Information and Communication Technology, ed. by L. Iliadis, I. Maglogiannis, H. Papadopoulos, K. Karatzas, S. Sioutas, vol. 382 (Springer, Berlin, Heidelberg, 2012), pp. 86–94. https://doi.org/10.1007/978-3-642-33412-2_9
36. B.M. Caulfield, S.C. Donnelly, What is connected health and why will it change your practice? QJM: Monthly J. Ass. Phys. 106(8), 703–707 (2013). https://doi.org/10.1093/qjmed/hct114
37. Gabler Wirtschaftslexikon, Big Data. https://wirtschaftslexikon.gabler.de/definition/big-data-54101/version-277155 (2018). Accessed 14 March 2021
38. W. Raghupathi, V. Raghupathi, Big data analytics in healthcare: promise and potential. Health Inf. Sci. Syst. 2, 3 (2014). https://doi.org/10.1186/2047-2501-2-3
39. S.J. Trenfield, A. Awad, C.M. Madla, G.B. Hatton, J. Firth, A. Goyanes, S. Gaisford, A.W. Basit, Shaping the future: recent advances of 3D printing in drug delivery and healthcare. Expert Opin. Drug Deliv. 16(10), 1081–1094 (2019). https://doi.org/10.1080/17425247.2019.1660318
40. M. Gombolay, X.J. Yang, B. Hayes, N. Seo, Z. Liu, S. Wadhwania, T. Yu, N. Shah, T. Golen, J. Shah, Robotic assistance in the coordination of patient care. Int. J. Rob. Res. 37(10), 1300–1316 (2018). https://doi.org/10.1177/0278364918778344
41. U.K. Mukherjee, K.K. Sinha, Robot-assisted surgical care delivery at a hospital: policies for maximizing clinical outcome benefits and minimizing costs. J. Ops. Manage. 66(1–2), 227–256 (2020). https://doi.org/10.1002/joom.1058
42. A. Bohr, K. Memarzadeh, The rise of artificial intelligence in healthcare applications, in Artificial Intelligence in Healthcare (Elsevier, 2020), pp. 25–60. https://doi.org/10.1016/B978-0-12-818438-7.00002-2
43. M. Obschonka, D.B. Audretsch, Artificial intelligence and big data in entrepreneurship: a new era has begun. Small Bus. Econ. 55, 529–539 (2020). https://doi.org/10.1007/s11187-019-00202-4
44. J. Lee, T. Suh, D. Roy, M. Baucus, Emerging technology and business model innovation: the case of artificial intelligence. JOItmC 5(3), 44 (2019). https://doi.org/10.3390/joitmc5030044
45. I. Bardhan, H. Chen, E. Karahanna, Connecting systems, data, and people: a multidisciplinary research roadmap for chronic disease management. MIS Q. 44, 185–200 (2020)
46. R.-C. Härting, C. Reichstein, M. Schad, Potentials of digital business models – empirical investigation of data driven impacts in industry. Proc. Comput. Sci. 126, 1495–1506 (2018). https://doi.org/10.1016/j.procs.2018.08.121
47. R. Härting, C. Reichstein, P. Laemmle, A. Sprengel, Potentials of digital business models in the retail industry – empirical results from European experts. Proc. Comput. Sci. 159, 1053–1062 (2019). https://doi.org/10.1016/j.procs.2019.09.274
48. C.M. Ringle, S. Wende, J.-M. Becker, SmartPLS 3 (SmartPLS, Bönningstedt, 2015)
49. K.K.-K. Wong, Partial least squares structural equation modeling (PLS-SEM) techniques using SmartPLS. Market. Bull. 24 (2013)
50. Gabler Wirtschaftslexikon, Key Performance Indicator (KPI). https://wirtschaftslexikon.gabler.de/definition/key-performance-indicator-kpi-52670/version-275788 (2018). Accessed 14 March 2021
51. E.A. Elhadjamor, S.A. Ghannouchi, Analyze in depth health care business process and key performance indicators using process mining. Proc. Comput. Sci. 164, 610–617 (2019). https://doi.org/10.1016/j.procs.2019.12.227
52. Y.-H. Kuo, O. Rado, B. Lupia, J.M.Y. Leung, C.A. Graham, Improving the efficiency of a hospital emergency department: a simulation study with indirectly imputed service-time distributions. Flex. Serv. Manuf. J. 28, 120–147 (2016). https://doi.org/10.1007/s10696-014-9198-7
53. V. Vemulapalli, J. Qu, J.M. Garren, L.O. Rodrigues, M.A. Kiebish, R. Sarangarajan, N.R. Narain, V.R. Akmaev, Non-obvious correlations to disease management unraveled by Bayesian artificial intelligence analyses of CMS data. Artif. Intell. Med. 74, 1–8 (2016). https://doi.org/10.1016/j.artmed.2016.11.001
54. D. Bertsimas, A. Orfanoudaki, R.B. Weiner, Personalized treatment for coronary artery disease patients: a machine learning approach. Health Care Manag. Sci. 23(4), 482–506 (2020). https://doi.org/10.1007/s10729-020-09522-4
55. S. Denicolai, P. Previtali, Precision medicine: implications for value chains and business models in life sciences. Technol. Forecast Soc. Chang. 151, 119767 (2020). https://doi.org/10.1016/j.techfore.2019.119767
56. M. Mende, The innovation imperative in healthcare: an interview and commentary. AMS Rev. 9, 121–131 (2019). https://doi.org/10.1007/s13162-019-00140-0
57. SVR Gesundheit, Ebenen von Effizienz- und Effektivitätspotenzialen. https://www.svr-gesundheit.de/index.php?id=413. Accessed 14 March 2021
58. Bundesministerium der Finanzen (ed.), Umsatzsteuerbefreiung nach § 4 Nr. 14 Buchst. a UStG; Umsatzsteuerliche Behandlung der Leistungen von Heilpraktikern und Gesundheitsfachberufen (2012)
59. V. Krämer, R.-C. Härting, Digitale Geschäftsmodelle in der Gesundheitsbranche, in Potenziale digitaler Geschäftsmodelle und deren -prozesse: Ein Branchenvergleich, ed. by R.-C. Härting (2019), pp. 76–132
60. J. Trambacz, Lehrbegriffe und Grundlagen der Gesundheitsökonomie (2016). https://doi.org/10.1007/978-3-658-10571-6
61. B. Riedl, W. Peter, Prävention – Früherkennung, in Basiswissen Allgemeinmedizin (Springer, Berlin, Heidelberg, 2020), pp. 435–442. https://doi.org/10.1007/978-3-662-60324-6_10
62. EPatient RSD, Nutzung von Internetanwendungen oder Apps für Gesundheitsthemen in Deutschland im Jahr 2015. https://de.statista.com/statistik/daten/studie/462483/umfrage/nutzung-von-internetanwendungen-oder-apps-fuer-gesundheitsanwendungen/ (2015). Accessed 14 March 2021
63. BIS Research, Umsatz des globalen mobilen Gesundheit-App-Marktes im Jahr 2017 und 2025. https://de.statista.com/statistik/daten/studie/1184929/umfrage/umsatz-des-mobilen-gesundheit-apps-marktes-weltweit/#professional (2018). Accessed 14 March 2021
64. Airnow, Ranking der beliebtesten Gesundheits- und Fitness-Apps im Google Play Store nach der Anzahl der Downloads in Deutschland im November 2020. https://de.statista.com/statistik/daten/studie/688733/umfrage/beliebteste-gesundheits-und-fitness-apps-im-google-playstore-nach-downloads-in-deutschland/. Accessed 14 March 2021
65. T. Jahnel, B. Schüz, Partizipative Entwicklung von Digital-Public-Health-Anwendungen: Spannungsfeld zwischen Nutzer*innenperspektive und Evidenzbasierung (Participatory development of digital public health: tension between user perspectives and evidence). Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 63(2), 153–159 (2020). https://doi.org/10.1007/s00103-019-03082-x
66. S. Azzi, S. Gagnon, A. Ramirez, G. Richards, Healthcare applications of artificial intelligence and analytics: a review and proposed framework. Appl. Sci. 10(18), 6553 (2020). https://doi.org/10.3390/app10186553
67. F. Fischer, V. Aust, A. Krämer, eHealth: Hintergrund und Begriffsbestimmung, in eHealth in Deutschland, ed. by F. Fischer, A. Krämer (Springer Vieweg, Berlin, Heidelberg, 2016), pp. 3–23. https://doi.org/10.1007/978-3-662-49504-9_1
68. Bundesministerium der Justiz und für Verbraucherschutz / Bundesamt für Justiz, Sozialgesetzbuch (SGB) Fünftes Buch (V) - Gesetzliche Krankenversicherung - (Artikel 1 des Gesetzes v. 20. Dezember 1988, BGBl. I S. 2477) § 27 Krankenbehandlung (2021)
69. A.K. Srivastava, S. Kumar, M. Zareapoor, Self-organized design of virtual reality simulator for identification and optimization of healthcare software components. J. Ambient Intell. Human Comput. (2018). https://doi.org/10.1007/s12652-018-1100-0
70. A.A. Kononowicz, N. Zary, S. Edelbring, J. Corral, I. Hege, Virtual patients – what are we talking about? A framework to classify the meanings of the term in healthcare education. BMC Med. Educ. 15, 11 (2015). https://doi.org/10.1186/s12909-015-0296-3
71. R. Tsopra, M. Courtine, K. Sedki, D. Eap, M. Cabal, S. Cohen, O. Bouchaud, F. Mechaï, J.-B. Lamy, AntibioGame®: a serious game for teaching medical students about antibiotic use. Int. J. Med. Inf. 136, 104074 (2020). https://doi.org/10.1016/j.ijmedinf.2020.104074
72. T. Huber, M. Paschold, C. Hansen, T. Wunderling, H. Lang, W. Kneist, New dimensions in surgical training: immersive virtual reality laparoscopic simulation exhilarates surgical staff. Surg. Endosc. 31, 4472–4477 (2017). https://doi.org/10.1007/s00464-017-5500-6
73. M. Müschenich, L. Wamprecht, Gesundheit 4.0 – Wie gehts uns denn morgen? (Health 4.0 – how are we doing tomorrow?). Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 61(3), 334–339 (2018). https://doi.org/10.1007/s00103-018-2702-6
74. K. Miller, G. Mansingh, OptiPres: a distributed mobile agent decision support system for optimal patient drug prescription. Inf. Syst. Front. 19(1), 129–148 (2017). https://doi.org/10.1007/s10796-015-9595-9
75. B.M. Alwon, G. Solomon, F. Hussain, D.J. Wright, A detailed analysis of online pharmacy characteristics to inform safe usage by patients. Int. J. Clin. Pharm. 37(1), 148–158 (2015). https://doi.org/10.1007/s11096-014-0056-1
76. Bundesministerium für Gesundheit, Apotheken. https://www.bundesgesundheitsministerium.de/themen/krankenversicherung/online-ratgeber-krankenversicherung/arznei-heil-undhilfsmittel/apotheken.html#c1211 (2020). Accessed 14 March 2021
77. G. Marzano, V. Lubkina, A review of telerehabilitation solutions for balance disorders. Proc. Comput. Sci. 104, 250–257 (2017). https://doi.org/10.1016/j.procs.2017.01.132
78. T. Johansson, C. Wild, Telerehabilitation in stroke care – a systematic review. J. Telemed. Telecare 17(1), 1–6 (2011). https://doi.org/10.1258/jtt.2010.100105
79. C. Guo, H. Ashrafian, S. Ghafur, G. Fontana, C. Gardner, M. Prime, Challenges for the evaluation of digital health solutions – a call for innovative evidence generation approaches. NPJ Dig. Med. 3, 110 (2020). https://doi.org/10.1038/s41746-020-00314-2
80. J. Jörg, Digitalisierung in der Medizin. Wie Gesundheits-Apps, Telemedizin, künstliche Intelligenz und Robotik das Gesundheitswesen revolutionieren (Springer, Berlin, 2018)
81. J. Siglmüller, Rechtsfragen der Fernbehandlung (Springer Berlin Heidelberg, Berlin, Heidelberg, 2020). https://doi.org/10.1007/978-3-662-61808-0
82. Bundesärztekammer, Hinweise und Erläuterungen zu § 7 Abs. 4 MBO-Ä – Behandlung im persönlichen Kontakt und Fernbehandlung. Stand: 10.12.2020. Deutsches Ärzteblatt (2020)
83. Presse- und Informationsamt der Bundesregierung, Telefonische Krankschreibung wieder möglich. https://www.bundesregierung.de/breg-de/themen/coronavirus/telefonische-krankschreibung-1800026 (2020). Accessed 12 January 2021
84. M. Kremers, Teleradiologie und Telemedizin. MKG-Chirurg 13(4), 248–259 (2020). https://doi.org/10.1007/s12285-020-00270-6
85. A.-C.L. Leonardsen, C. Hardeland, A.K. Helgesen, V.A. Grøndahl, Patient experiences with technology enabled care across healthcare settings – a systematic review. BMC Health Serv. Res. 20(1), 779 (2020). https://doi.org/10.1186/s12913-020-05633-4
86. B. Stanberry, Legal and ethical aspects of telemedicine. J. Telemed. Telecare 12(4), 166–175 (2006). https://doi.org/10.1258/135763306777488825
87. F. Koerber, R.C. Dienst, J. John, W. Rogowski, Einführung, in Business Planning im Gesundheitswesen, ed. by W. Rogowski (Springer Gabler, Wiesbaden, 2016), pp. 1–24. https://doi.org/10.1007/978-3-658-08186-7_1
88. European Commission, Commission Recommendation of 6 May 2003 concerning the definition of micro, small and medium-sized enterprises. L 124/36 (2003)
89. R.-C. Härting, R. Schmidt, M. Möhring, Business intelligence & big data: eine strategische Waffe für KMU?, in Big Data – Daten strategisch nutzen!, Tagungsband, ed. by R. Härting, vol. 7 (Transfertag, Aalen 2014, BOD Norderstedt), pp. 11–25 (2014)
90. C. Homburg, H. Baumgartner, Beurteilung von Kausalmodellen. Bestandsaufnahme und Anwendungsempfehlungen. Marketing: ZFP – J. Res. Manage. 17, 162–176 (1995)
91. A. Himme, Gütekriterien der Messung: Reliabilität, Validität und Generalisierbarkeit, in Methodik der empirischen Forschung, ed. by S. Albers, D. Klapper, U. Konradt, A. Walter, J. Wolf (Gabler Verlag, Wiesbaden, 2009), pp. 485–500. https://doi.org/10.1007/978-3-322-96406-9_31
92. R. Weiber, D. Mühlhaus, Güteprüfung reflektiver Messmodelle, in Strukturgleichungsmodellierung. Springer-Lehrbuch, ed. by R. Weiber, D. Mühlhaus (Springer Berlin Heidelberg, Berlin, Heidelberg, 2014), pp. 127–172
93. J.F. Hair, G.T.M. Hult, C.M. Ringle, M. Sarstedt, N.F. Richter, S. Hauff, Partial Least Squares Strukturgleichungsmodellierung. Eine anwendungsorientierte Einführung (Verlag Franz Vahlen, München, 2017)
94. W.W. Chin, The partial least squares approach to structural equation modeling, in Modern Methods for Business Research, pp. 295–336
95. S. Akter, K. Michael, M.R. Uddin, G. McCarthy, M. Rahman, Transforming business using digital innovations: the application of AI, blockchain, cloud and data analytics. Ann. Oper. Res. (2020). https://doi.org/10.1007/s10479-020-03620-w
96. P. Esmaeilzadeh, Use of AI-based tools for healthcare purposes: a survey study from consumers’ perspectives. BMC Med. Inf. Dec. Mak. 20(1), 170 (2020). https://doi.org/10.1186/s12911-020-01191-1
97. F. Jiang, Y. Jiang, H. Zhi, Y. Dong, H. Li, S. Ma, Y. Wang, Q. Dong, H. Shen, Y. Wang, Artificial intelligence in healthcare: past, present and future. Stroke Vascular Neurol. 2(4), 230–243 (2017). https://doi.org/10.1136/svn-2017-000101
98. A.A. Gunn, The diagnosis of acute abdominal pain with computer analysis. J. R. Coll. Surg. Edinb. 21, 170–172 (1976)
99. K.-C. Yuan, L.-W. Tsai, K.-H. Lee, Y.-W. Cheng, S.-C. Hsu, Y.-S. Lo, R.-J. Chen, The development an artificial intelligence algorithm for early sepsis diagnosis in the intensive care unit. Int. J. Med. Inf. 141, 104176 (2020). https://doi.org/10.1016/j.ijmedinf.2020.104176
100. L. Strohm, C. Hehakaya, E.R. Ranschaert, W.P.C. Boon, E.H.M. Moors, Implementation of artificial intelligence (AI) applications in radiology: hindering and facilitating factors. Eur. Radiol. 30(10), 5525–5532 (2020). https://doi.org/10.1007/s00330-020-06946-y
101. T.H. Davenport, R. Ronanki, Artificial intelligence for the real world. Don’t start with moon shots. Harvard Bus. Rev. January-February 2018, 1–10 (2018)
102. J. Amann, A. Blasimme, E. Vayena, D. Frey, V.I. Madai, Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med. Inf. Dec. Mak. 20(1), 310 (2020). https://doi.org/10.1186/s12911-020-01332-6
103. A.B. Kocaballi, K. Ijaz, L. Laranjo, J.C. Quiroz, D. Rezazadegan, H.L. Tong, S. Willcock, S. Berkovsky, E. Coiera, Envisioning an artificial intelligence documentation assistant for future primary care consultations: a co-design study with general practitioners. J. Am. Med. Inf. Assoc.: JAMIA 27(11), 1695–1704 (2020). https://doi.org/10.1093/jamia/ocaa131
104. P.M. Doraiswamy, C. Blease, K. Bodner, Artificial intelligence and the future of psychiatry: insights from a global physician survey. Artif. Intell. Med. 102, 101753 (2020). https://doi.org/10.1016/j.artmed.2019.101753
105. A. Akay, H. Hess, Deep learning: current and emerging applications in medicine and technology. IEEE J. Biomed. Health Inf. 23(3), 906–920 (2019). https://doi.org/10.1109/JBHI.2019.2894713
106. Z. Dlamini, F.Z. Francies, R. Hull, R. Marima, Artificial intelligence (AI) and big data in cancer and precision oncology. Comput. Struct. Biotechnol. J. 18, 2300–2311 (2020). https://doi.org/10.1016/j.csbj.2020.08.019
107. MarketsandMarkets, Artificial Intelligence in Healthcare Market with Covid-19 Impact Analysis by Offering (Hardware, Software, Services), Technology (Machine Learning, NLP, Context-Aware Computing, Computer Vision), End-Use Application, End User and Region – Global Forecast to 2026. https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-healthcare-market-54679303.html. Accessed 14 March 2021
108. D. Thesmar, D. Sraer, L. Pinheiro, N. Dadson, R. Veliche, P. Greenberg, Combining the power of artificial intelligence with the richness of healthcare claims data: opportunities and challenges. Pharmacoeconomics 37(6), 745–752 (2019). https://doi.org/10.1007/s40273-019-00777-6
109. M.-C. Laï, M. Brian, M.-F. Mamzer, Perceptions of artificial intelligence in healthcare: findings from a qualitative survey study among actors in France. J. Transl. Med. 18(1), 14 (2020). https://doi.org/10.1186/s12967-019-02204-y
110. E. Meinert, A. Alturkistani, D. Brindley, P. Knight, G. Wells, N. de Pennington, Weighing benefits and risks in aspects of security, privacy and adoption of technology in a value-based healthcare system. BMC Med. Inf. Dec. Mak. 18(1), 100 (2018). https://doi.org/10.1186/s12911-018-0700-0
111. S. Thiebes, S. Lins, A. Sunyaev, Trustworthy artificial intelligence. Electron. Markets (2020). https://doi.org/10.1007/s12525-020-00441-4
Chapter 15
Advances in XAI: Explanation Interfaces in Healthcare
Cristina Manresa-Yee, Maria Francesca Roig-Maimó, Silvia Ramis, and Ramon Mas-Sansó
Abstract Algorithms based on Artificial Intelligence (AI) are playing an increasingly central role in healthcare. However, the black-box nature of models such as deep neural networks undermines users’ trust. Explainable Artificial Intelligence (XAI) strives for more transparent and interpretable AI: intelligent systems that help the user understand AI predictions and decisions, thereby increasing the trustworthiness and reliability of the systems. In this work, we present an overview of healthcare contexts in which explanation interfaces are used. We search the main research databases and compile healthcare-related works to show the widespread applicability of intelligent systems and how researchers offer explanations in the form of natural-language text, parameter influence, visualizations of data graphs or saliency maps.
Keywords Explainable artificial intelligence · XAI · Healthcare · Explanation interface
C. Manresa-Yee (✉) · M. F. Roig-Maimó · S. Ramis · R. Mas-Sansó
Computer Graphics, Vision and Artificial Intelligence Group, University of Balearic Islands, Palma, Spain
C. Manresa-Yee · M. F. Roig-Maimó · R. Mas-Sansó
Research Institute of Health Sciences, UIB, Palma, Spain
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_15
15.1 Introduction
Artificial Intelligence (AI) is achieving impressive results, but these results are frequently difficult for human users to understand, which causes mistrust and disbelief, especially in the health and well-being domains [1]. For example, the U.S. Food and Drug Administration (FDA) asks healthcare providers to independently review the recommendations presented by software and to rely on their own judgment when making clinical decisions [2]. To address this issue, Explainable Artificial Intelligence (XAI) proposes a shift towards more transparent AI. XAI can be defined as “AI systems that can explain their rationale to a human user, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future” [3]. The aim is therefore to develop more explainable models and explanation interfaces while maintaining high performance levels [4] (see Fig. 15.1). The growing interest in XAI is reflected in the special sessions and workshops on the topic held in recent years at the main conferences on Human–Computer Interaction (HCI) and AI (e.g. IJCAI’17, AAMAS’18, CHI’19, IUI’19) and in the growing number of surveys on XAI [1, 5–15], some of them specifically related to healthcare [16, 17]. AI experts are developing new methods (e.g. based on deep learning or fuzzy logic) to improve both the outcomes of an intelligent system and its explainability [7, 18, 19]. In this sense, Silva et al. [20] analyze interpretability from a broader point of view, going beyond the machine learning (ML) scope and covering different AI fields such as distributional semantics and fuzzy logic. In parallel, the HCI community is also contributing research towards explainable and comprehensible AI, that is, AI solutions that are easy to learn and use thanks to effective HCI design and that enable users to understand the algorithms and parameters involved [21].
These research lines advance the understanding of how people comprehend explanations and how the system should deliver them [22], by designing practices, guidelines and principles [23, 24] or by developing frameworks and innovative algorithm visualizations, interfaces or toolkits [25]. As the use of AI in healthcare contexts is growing [26], explainability is a fundamental consideration when designing the user interface of an AI system, due to the critical nature of the field [27–30]. Professionals demand reasoned information behind the model’s predictions in order to trust, understand and validate the system [31, 32]. Moreover, there is a trade-off between accuracy and explainability: frequently, the most accurate methods are the least transparent, and the more explainable methods are less accurate [33]. In this line, Gunning and Aha [4] place deep learning neural networks highest in accuracy but lowest in explainability. Finally, aspects such as transparency, accountability or fairness are also important and could be assessed through XAI [34].
Fig. 15.1 XAI concept (based on [4])
In this work, we present an overview of healthcare contexts in which explanation interfaces have been applied to offer information that helps the end-user understand the outcomes provided by the system and make decisions. We search the main research databases and select for review works that describe the system, including the interface for the end-user. Our ultimate goal is to provide newcomers to the field of XAI with an overview of the range of applications where XAI is being used, focusing especially on how explanations are delivered in the interface. This work can inspire future research advances in XAI that combine AI and HCI.
The chapter is organized as follows. Section 15.2 describes works addressing XAI for healthcare in general and design strategies for XAI interfaces in healthcare. Section 15.3 introduces the method applied to search and select the works included in the review. Section 15.4 summarizes the works classified by themes. Finally, conclusions and future work are presented in the last section.
15.2 Related Work
In this section we focus on research works addressing XAI for healthcare in general and design strategies for XAI interfaces in healthcare. The works included present reviews of XAI in healthcare, classify XAI methods used in medicine or report on key issues from multidisciplinary perspectives. Holzinger et al. [33] analyzed what is needed to build XAI systems for the medical domain. They suggested integrating neural and knowledge-based approaches to unite the high accuracy of the former with the explainability of the latter. They also emphasized the need for effective user interfaces, which include new strategies to present understandable explanations to the human. In a later work [35], they discussed the differences between explainability and causability and presented an example in the medical domain. They defined causability as a property of a person, while explainability is a property of a system. By analogy with the usability methods developed by the HCI community, they encouraged the development of causability methodologies and causability measures to ensure the “quality of explanations”. Focusing on patients, Ploug and Holm [36] advocated that patients should be able to contest the diagnoses of AI diagnostic systems. They focused on four aspects of AI involvement in medical diagnosis: (1) the AI system’s use of data, (2) the system’s potential biases, (3) the system performance, and (4) the division of labor between the system and health care professionals.
Cutillo et al. [37] presented a white paper that emerged from a workshop entitled “Machine Intelligence in Healthcare”, which brought together scientists, physicians, informaticians, and patients from academic, industry, regulatory, and patient advocacy organizations. They reported on key issues such as trustworthiness, explainability, usability, and transparency and discussed gaps, barriers, and methodologies for implementing machine intelligence in healthcare. Tjoa and Guan [17] presented an exhaustive survey analyzing different interpretability methods suggested by the research community and categorized them into perceptive interpretability (e.g. Local Interpretable Model-agnostic Explanations (LIME)) and interpretability via mathematical structure (e.g. Generalized Additive Model (GAM)). They included brief descriptions of the methods and classified works in the medical field using this categorization. Fan et al. [18] also presented a taxonomy for explainable AI, but focused specifically on deep neural networks and their interpretation methods. They classified the methods into two main groups: post-hoc interpretability analysis and ad-hoc interpretable modelling, which included subgroups such as explaining-by-base or saliency. Then, they described applications in medicine following their taxonomy, and discussed future directions of interpretability research, for instance in relation to fuzzy logic and brain science. Although not directly applied to medicine, Mencar [38] described how interpretable fuzzy systems are capable of explaining their inference process so that users can be confident about how they produce their outcomes. The author posed and answered questions related to interpretability, such as how to ensure it or how to assess it. Moreover, the author highlighted the importance of interpretability, “especially in some applicative domains (like Medicine) where fuzzy systems can be used to support critical decisions upon which users (e.g. physicians) must rely”. Tonekaboni et al. [39] presented a study which compiled information from interviews with ten clinicians to (1) identify specific aspects of explainability that build trust in machine learning models, (2) identify the most important classes of explanations for clinicians and (3) suggest several concrete metrics for evaluating clinical explainability methods (e.g. domain-appropriate representation or consistency). Clinicians viewed explainability as “a means of justifying their clinical decision-making (for instance, to patients and colleagues) in the context of the model’s prediction”. Among the classes of explanation, they discussed the importance of knowing the subset of features driving the model outcome and of knowing about the uncertainty of a model. Regarding the design of XAI, Wang et al. [22] proposed a conceptual framework for building human-centered, decision-theory-driven XAI by analyzing how people reason, make decisions and seek explanations. The framework relates XAI techniques to concepts in human reasoning processes and has four modules (considering aspects of the human and the machine): (1) how people should reason and explain, (2) how XAI generates explanations, (3) how people actually reason (with errors) and (4) how XAI supports reasoning (and mitigates errors). Then, they used it to co-design an explainable clinical diagnostic tool with 14 clinicians.
15 Advances in XAI: Explanation Interfaces in Healthcare
Another framework for designing user-centered explanation displays for machine learning models was presented by Barda et al. [40]. It builds on a model-agnostic, instance-level explanation approach based on feature influence, and considers the user of the system, the goals to be accomplished and the context of use (where or when users need an explanation). As an example of use, the work describes a display design for predictions from a Pediatric Intensive Care Unit mortality risk model. Finally, a different approach is taken by Amann et al. [41], who adopted a multidisciplinary approach to analyze the relevance of explainability for medical AI, specifically for clinical decision support systems, from the technological, legal, medical, and patient perspectives. They presented challenges in medical AI and highlighted the importance of developing XAI through multidisciplinary collaboration.
15.3 Method
To identify relevant literature for this study, we searched a broad collection of scientific databases covering different disciplines: Scopus, Web of Science, IEEE Xplore and PubMed. We used the terms “XAI”, “Explainable AI” or “Explainable Artificial Intelligence” together with “Health*” (to accommodate both health and healthcare). These keywords were adapted to each search engine. Due to the novelty of the XAI field, we restricted the search to the 5 years from 2016 to 2020, including some preprints to be published in 2021. Each work was analyzed to determine its relevance to our search, excluding works not related to healthcare or not addressing the explanation interface, duplicated works, and proposals of workshops and tutorials. Therefore, although the search found works and reviews on the use of XAI in areas such as hypertension [42], electronic medical records [43] or behavioral neurostimulation [44], we do not include them, as they do not report or show the interface for the end-user.
15.4 Findings
In this section we present the works organized by recurring central themes: prediction, diagnosis and automated tasks. The explanations offered in the system interfaces include XAI elements such as attribution or feature influence, visualizations such as tornado diagrams or saliency maps, and interactive dashboards.
C. Manresa-Yee et al.
15.4.1 Prediction Tasks
XAI systems for prediction tasks are used to predict the decline of cognitive functions, critical or chronic diseases, and disorders. Kwon et al. [45] provide a visual analytics solution based on a Recurrent Neural Network (RNN) to make risk predictions using Electronic Medical Records (EMR). The system presents a dashboard that allows medical experts to explore the EMR for prediction tasks: estimating the current and future states of patients, analyzing common patterns of patients with the same target outcome and testing hypothetical (what-if) scenarios on patients. The dashboard comprises rich visualizations including area charts, forecast charts and summary tables. Also analyzing EMR, Lauritsen et al. [46] present xAI-EWS, a system used to predict acute critical illness such as sepsis, acute kidney injury or acute lung injury. To explain the outcome of a temporal convolutional network, they use a deep Taylor decomposition explanation module. The explanations are given via visualizations showing the relevance of the ten highest-ranking parameters per patient. Further, they report the global parameter importance and display local explanations by showing all the individual data points. Vasquez-Morales et al. [55] propose a neural network-based classifier to predict the risk of developing chronic kidney disease. They used the Case-Based Reasoning (CBR) paradigm to explain the predictions of the system by example. The system highlights the features in common between a given query and the three most similar explanatory cases. To represent the explanatory cases, they select the most relevant variables, such as age or myalgia, and the actual classification value. In addition, the system can generate a natural-language description of the explanation using templates. Prentzas et al. [47] suggest a methodology that applies argumentation on top of an ML solution to explain stroke prediction. 
They apply Gorgias-based argumentation theory (a preference-based structured argumentation framework of logic programming with priorities) to the ML process. The explanation is given as text showing a set of arguments/decision rules. The system also allows the user to run queries for new patients by entering their input data. Workman et al. [48] developed an XAI system to predict opioid use disorder. The authors used Veterans Health Administration data to compute the most relevant features (e.g. gender, mental health issues) and their impact scores for predicting the disorder. Their findings suggest that impact scores can help in understanding the prediction by indicating which features have more influence. For dementia screening, Mengoudi et al. [38] propose a methodology to extract highlighted features from raw eye-tracking data. To explain the decisions of a deep neural network, they apply the Layer-wise Relevance Propagation (LRP) technique, using heatmaps that show the areas of the input that contribute most to the prediction. Finally, in a home context, Khodabandehloo et al. [49] present a system that uses data from sensorized smart-homes to detect a decline of the cognitive functions
of the elderly in order to promptly alert practitioners. Their method uses clinical indicators such as subtle disruptions when performing tasks or locomotion anomalies. They present a dashboard that explains the system’s prediction to clinicians. Together with the prediction (the anomaly level of the person), they offer various forms of data presentation such as tables, plots, natural language explanations or numerical summaries. Their initial evaluation with clinicians suggests that the explanation capabilities are useful for improving task performance and for increasing trust and reliance.
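The explain-by-example idea mentioned above for case-based reasoning (showing the most similar stored cases and the features they share with the query) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any cited system: the case base, the feature names and the overlap-based similarity are all hypothetical.

```python
def shared_features(query, case):
    """Return the names of the features on which the query and a stored case agree."""
    return sorted(
        name
        for name, value in query.items()
        if case["features"].get(name) == value
    )


def explain_by_example(query, case_base, k=3):
    """Rank stored cases by the number of features they share with the query
    and return the k most similar ones, each with its shared features."""
    ranked = sorted(
        case_base,
        key=lambda case: len(shared_features(query, case)),
        reverse=True,
    )
    return [
        {
            "id": case["id"],
            "label": case["label"],
            "shared": shared_features(query, case),
        }
        for case in ranked[:k]
    ]


# Hypothetical, hand-made case base (assumption: not real patient data).
CASE_BASE = [
    {"id": "c1", "label": "high risk",
     "features": {"age_band": "60+", "myalgia": True, "hypertension": True}},
    {"id": "c2", "label": "low risk",
     "features": {"age_band": "30-44", "myalgia": False, "hypertension": False}},
    {"id": "c3", "label": "high risk",
     "features": {"age_band": "60+", "myalgia": False, "hypertension": True}},
    {"id": "c4", "label": "low risk",
     "features": {"age_band": "45-59", "myalgia": False, "hypertension": False}},
]

QUERY = {"age_band": "60+", "myalgia": True, "hypertension": True}

for item in explain_by_example(QUERY, CASE_BASE):
    shared = ", ".join(item["shared"]) or "none"
    print(f"{item['id']} ({item['label']}): shared features = {shared}")
```

A real system would likely use a learned or domain-specific similarity rather than a simple count of matching features; the count is used here only to keep the sketch self-contained.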
15.4.2 Diagnosis Tasks
Using medical images or other personal information, AI can be used to diagnose conditions such as tissue tumors, diabetic retinopathy, degenerative diseases (e.g. Alzheimer’s or Parkinson’s disease), pathologic gait patterns or pain. Applying XAI methods helps practitioners to understand the results of the AI system. Palatnik de Sousa et al. [50] report on an XAI system built with LIME that generates visual explanations, overlaid on the image, of how a Convolutional Neural Network (CNN) detects tumor tissue in patches extracted from histology whole slide images. Future work includes collaboration with expert pathologists to evaluate the explanations. To assist eye specialists in detecting unhealthy retinas and identifying diabetic retinopathy, Kind and Azzopardi [51] present an interactive tool that draws bounding boxes around suspicious lesions and allows specialists to examine them in detail. The system generates a report stating whether the image is healthy or not; the report contains some descriptive text along with both the input image and the pre-processed image, in which the detected features are indicated by bounding boxes and confidence scores. Kashyap et al. [52] also develop an XAI approach, which automatically extracts the expected anomaly location in an X-ray image from textual reports. They use automatic text-to-bounding-box translation, and the obtained region is then used to bias a guided attention inference network (GAIN) to isolate the anomaly. They show an improvement in classification accuracy and a better justification for the learned model. A mark is generated on the X-ray image and linked to the textual description in the medical report. Such local annotations in the image make the radiology follow-up easier and faster. Magesh et al. [53] present an interpretable solution using LIME for early diagnosis of Parkinson’s disease. The model is based on a CNN, and LIME is used to explain its decisions on DaTSCAN images. Applying LIME allows users to visually trace the regions of interest in the brain and thus understand the diagnosis; however, the system has not yet undergone clinical validation. Achilleos et al. [54] study the use of rule extraction in the assessment of Alzheimer’s disease from Magnetic Resonance Imaging (MRI) images using decision tree and random forest algorithms. They integrate the extracted rules within
an XAI framework in order to improve the interpretability and explainability of the results. For this task, the authors used the Gorgias framework to show the final diagnosis, explaining the criteria in favor of diagnosing Alzheimer’s disease. To find individual pathologic gait patterns, Dindorf et al. [55] develop a system using Inertial Measurement Unit (IMU) data. They use LIME to study how different input representations influence the identification of individual pathologic gait patterns in both healthy subjects and subjects after total hip arthroplasty. The input representations are automatically extracted features, features based on descriptive statistics, and waveform data. They conclude that the chosen representation heavily influences the interpretation and clinical relevance of the pathology. Finally, to detect pain in facial expressions, Weitz et al. [56] propose an explainable system that distinguishes facial expressions of pain from other facial expressions such as happiness and disgust. They use two explanation methods, Layer-wise Relevance Propagation (LRP) and Local Interpretable Model-agnostic Explanations (LIME), to visualize on the image the areas and pixels relevant for the classification.
15.4.3 Automated Tasks
XAI systems for automated tasks process data to make decisions automatically and explain them. Such decisions can help in supporting clinical decisions or reducing workloads. Examples include systems for predicting the next visit of a patient, ordering MRI scans, providing alerts, recommending antibiotics, or evaluating public health interventions. Panigutti et al. [57] propose a multilabel classifier that takes the clinical history of a patient as input to predict the next visit, using a rule-based explanation approach. To help users understand the prediction, the system shows the patient’s visits and medical codes highlighted with colors, together with text explaining the applied decision rules and the definitions of the codes. Zhang et al. [32] integrate an RNN with LIME to provide explainable support to clinicians when ordering MRI scans. The system analyzes the adherence of the clinical case notes to the American College of Radiology (ACR) guidelines and predicts the need for an MRI. The LIME algorithm built into the user interface provides insights by assigning weights to each feature. The user interface presents the prediction as well as the corresponding LIME word feature weight diagrams. Clinicians perceived potential in the system to improve their workflow, despite the limitations of LIME in explaining multi-term phrases (e.g. “acute meningitis”). Among the contributions of Kobylarz et al. [58], we find an intelligent platform implemented with multiple ML models and XAI methods that provides the healthcare team with early warnings of clinical deterioration, using alerts delivered in a dashboard deployed with ELI5 to produce visual explanations. They test different ML models to find the most accurate one and present the alerts together with several vital signs, such as heart rate or systolic blood pressure, to improve the understanding of why the alert was generated.
Lamy and Tsopra [59] propose an interactive visual decision support application in the field of antibiotherapy for urinary infections in primary care, using irregular, deformable and interactive rainbow boxes. The system computes a score for each suitable antibiotic, and the one with the highest score is recommended as the first choice. The antibiotics and patient conditions are displayed so that the contribution of each patient condition to the result is shown visually. Finally, in order to evaluate public health interventions, policies, and programs, Brenas and Shaban-Nejad [60] propose a formal methodology consisting of generating causal diagrams and an ontology-based inference model for causal description, using the theory of change (TOC) with logic models that define the intervention under consideration. The authors apply this methodology to study smoking cessation interventions. They use semantic inference and causal reasoning to identify the elements that most affect a disease (e.g. behaviors, conditions) and that should therefore be changed.
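The score-based recommendation described above (one score per option, the highest score recommended, and the contribution of each patient condition made visible) can be illustrated with a small Python sketch. It is a hedged illustration only: the additive form of the score, the option names, the condition names and all weights are hypothetical and are not taken from the cited system or from any clinical guideline.

```python
def score_options(patient_conditions, weights):
    """Compute an additive score per option and keep the contribution of
    each patient condition, so that the recommendation can be explained."""
    results = {}
    for option, option_weights in weights.items():
        contributions = {
            condition: option_weights.get(condition, 0.0)
            for condition in patient_conditions
        }
        results[option] = {
            "score": sum(contributions.values()),
            "contributions": contributions,
        }
    return results


def recommend(results):
    """Return the option with the highest total score."""
    return max(results, key=lambda option: results[option]["score"])


# Hypothetical weights: how much each condition adds to or removes from
# the score of each option (assumption, for illustration only).
WEIGHTS = {
    "option_A": {"baseline": 2.0, "renal_impairment": -1.5},
    "option_B": {"baseline": 1.0, "pregnancy": 0.5},
}

PATIENT = ["baseline", "renal_impairment"]

results = score_options(PATIENT, WEIGHTS)
best = recommend(results)
print("Recommended first choice:", best)
for condition, value in results[best]["contributions"].items():
    print(f"  {condition}: {value:+.1f}")
```

Keeping the per-condition contributions next to the total is what makes the result explainable: an interface can render them as bars or colored boxes instead of printing them.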
15.5 Conclusions
AI in healthcare is a very active area of research. Machine learning techniques can analyze big datasets or automate tasks in order to provide clinical decision support, improve disease diagnosis or create tools to reduce workloads, among other benefits. However, the black-box nature of these systems raises trust, legal and liability issues. Therefore, XAI is gaining importance as a way to help users understand the decisions or predictions of the AI system. In this review, we have compiled works related to healthcare to show the widespread applicability of intelligent systems and how researchers offer explanations in the interfaces in the form of natural-language text, parameter influence, data graph visualizations, highlighted zones in medical images or interactive dashboards. The current research landscape in XAI still poses many challenges for both the AI and HCI communities: developing more explainable models with highly accurate results, and designing explanation interfaces. Research on the latter requires a deep comprehension of how people understand and perceive explanations, of the needs for different types of explainability, and the development of guidelines, frameworks, principles, toolkits and measurements.
Acknowledgements This work has been supported by the project PID2019-104829RA-I00 / AEI / https://doi.org/10.13039/501100011033, EXPLainable Artificial INtelligence systems for health and well-beING (EXPLAINING).
References
1. A. Adadi, M. Berrada, Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
2. U.S. Food and Drug Administration, Guidance Document. Clinical Decision Support Software. Draft Guidance for Industry and Food and Drug Administration Staff (2019)
3. Defense Advanced Research Projects Agency, Explainable Artificial Intelligence (XAI) (DARPA-BAA-16–53) (2016)
4. D. Gunning, D.W. Aha, DARPA’s explainable artificial intelligence (XAI) program. AI Mag 40, 44–58 (2019). https://doi.org/10.1609/aimag.v40i2.2850
5. F.K. Dosilovic, M. Brcic, N. Hlupic, Explainable artificial intelligence: a survey, in 2018 41st international convention on information and communication technology, electronics and microelectronics, MIPRO 2018 - proceedings. Institute of Electrical and Electronics Engineers Inc (2018), pp. 210–215
6. A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser et al., Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012
7. A. Fernandez, F. Herrera, O. Cordon et al., Evolutionary fuzzy systems for explainable artificial intelligence: why, when, what for, and where to? IEEE Comput. Intell. Mag. 14, 69–81 (2019). https://doi.org/10.1109/MCI.2018.2881645
8. A. Abdul, J. Vermeulen, D. Wang et al., Trends and trajectories for explainable, accountable and intelligible systems: an HCI research agenda, in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery (New York, NY, USA, 2018), pp. 1–18
9. O. Biran, C.V. Cotton, Explanation and justification in machine learning: a survey, in IJCAI 2017 Workshop on Explainable Artificial Intelligence (XAI) (2017)
10. D.V. Carvalho, E.M. Pereira, J.S. Cardoso, Machine learning interpretability: a survey on methods and metrics. Electronics 8 (2019)
11. T. Chakraborti, S. Sreedharan, Y. Zhang, S. Kambhampati, Plan explanations as model reconciliation: moving beyond explanation as soliloquy, in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17 (2017), pp. 156–163
12. J.J. Ferreira, M.S. Monteiro, What are people doing about XAI user experience? A survey on AI explainability research and practice, in Design, User Experience, and Usability. Design for Contemporary Interactive Environments, ed. by A. Marcus, E. Rosenzweig (Springer International Publishing, Cham, 2020), pp. 56–73
13. R.R. Hoffman, G. Klein, S.T. Mueller, Explaining explanation for “explainable AI”. Proc Hum Factors Ergon Soc Annu Meet 62, 197–201 (2018). https://doi.org/10.1177/1541931218621047
14. W.J. Murdoch, C. Singh, K. Kumbier et al., Interpretable Machine Learning: Definitions, Methods, and Applications (2019). arXiv Prepr arXiv190104592
15. A. Das, P. Rad, Opportunities and Challenges in Explainable Artificial Intelligence (XAI): A Survey (2020). ArXiv abs/2006.1
16. M.A. Ahmad, C. Eckert, A. Teredesai, Interpretable machine learning in healthcare, in Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery (New York, NY, USA, 2018), pp. 559–560
17. E. Tjoa, C. Guan, A survey on explainable artificial intelligence (XAI): towards medical XAI. IEEE Trans. Neural Networks Learn Syst. (2020)
18. F. Fan, J. Xiong, M. Li, G. Wang, On Interpretability of artificial neural networks: a survey (2020). arXiv e-prints arXiv:2001.02522
19. E. Dağlarli, Explainable artificial intelligence (xAI) approaches and deep meta-learning models, in Advances and Applications in Deep Learning, ed. by M.A. Aceves-Fernandez (IntechOpen, Rijeka, 2020)
20. V.S. Silva, A. Freitas, S. Handschuh, On the semantic interpretability of artificial intelligence models (2019)
21. W. Xu, Toward human-centered AI: a perspective from human-computer interaction. Interactions 26, 42–46 (2019). https://doi.org/10.1145/3328485
22. D. Wang, Q. Yang, A. Abdul, B.Y. Lim, Designing theory-driven user-centric explainable AI, in Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery (2019)
23. T. Kulesza, M. Burnett, W.-K. Wong, S. Stumpf, Principles of explanatory debugging to personalize interactive machine learning, in Proceedings of the 20th International Conference on Intelligent User Interfaces. Association for Computing Machinery (New York, NY, USA, 2015), pp. 126–137
24. Q.V. Liao, D. Gruen, S. Miller, Questioning the AI: informing design practices for explainable AI user experiences, in Conference on Human Factors in Computing Systems - Proceedings (Association for Computing Machinery, 2020)
25. V. Arya, R.K.E. Bellamy, P.Y. Chen et al., One explanation does not fit all: a toolkit and taxonomy of AI explainability techniques (2019). arXiv
26. B. Norgeot, B.S. Glicksberg, A.J. Butte, A call for deep-learning healthcare. Nat Med 25, 14–15 (2019). https://doi.org/10.1038/s41591-018-0320-3
27. F. Schwendicke, W. Samek, J. Krois, Artificial intelligence in dentistry: chances and challenges. J Dent Res 99, 769–774 (2020). https://doi.org/10.1177/0022034520915714
28. A.S. Mursch-Edlmayr, W.S. Ng, A. Diniz-Filho et al., Artificial intelligence algorithms to diagnose glaucoma and detect glaucoma progression: translation to clinical practice. Transl. Vis. Sci. Technol. 9, 55 (2020). https://doi.org/10.1167/tvst.9.2.55
29. W. Guo, Explainable artificial intelligence for 6G: improving trust between human and machine. IEEE Commun. Mag. 58, 39–45 (2020). https://doi.org/10.1109/MCOM.001.2000050
30. C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x
31. M.T. Ribeiro, S. Singh, C. Guestrin, “Why should i trust you?” explaining the predictions of any classifier, in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 13–17-Augu, 1135–1144 (2016). https://doi.org/10.1145/2939672.2939778
32. A.Y. Zhang, S.S.W. Lam, M.E.H. Ong et al., Explainable AI: classification of MRI brain scan orders for quality improvement, in BDCAT 2019 - Proceeding 6th IEEE/ACM International Conference Big Data Computer Applications and Technologies, pp. 95–102. https://doi.org/10.1145/3365109.3368791
33. A. Holzinger, C. Biemann, C.S. Pattichis, D.B. Kell, What do we need to build explainable AI systems for the medical domain? (2017). arXiv 1–28
34. B. Lepri, N. Oliver, E. Letouzé et al., Fair, transparent, and accountable algorithmic decision-making processes. Philos. Technol. 31, 611–627 (2018). https://doi.org/10.1007/s13347-017-0279-x
35. A. Holzinger, G. Langs, H. Denk et al., Causability and explainability of artificial intelligence in medicine. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 9 (2019). https://doi.org/10.1002/widm.1312
36. T. Ploug, S. Holm, The four dimensions of contestable AI diagnostics - a patient-centric approach to explainable AI. Artif. Intell. Med. 107 (2020). https://doi.org/10.1016/j.artmed.2020.101901
37. C.M. Cutillo, K.R. Sharma, L. Foschini et al., Machine intelligence in healthcare - perspectives on trustworthiness, explainability, usability, and transparency. NPJ Digit. Med. 3, 1–5 (2020). https://doi.org/10.1038/s41746-020-0254-2
38. C. Mencar, Interpretability of fuzzy systems, in Fuzzy logic and applications, ed. by F. Masulli, G. Pasi, R. Yager (Springer International Publishing, Cham, 2013), pp. 22–35
39. S. Tonekaboni, S. Joshi, M.D. McCradden, A. Goldenberg, What clinicians want: contextualizing explainable machine learning for clinical end use, in Proceedings of the 4th Machine Learning for Healthcare Conference, ed. by F. Doshi-Velez, J. Fackler, K. Jung et al. (PMLR, Ann Arbor, Michigan, 2019), pp. 359–380
40. A.J. Barda, C.M. Horvat, H. Hochheiser, A qualitative research framework for the design of user-centered displays of explanations for machine learning model predictions in healthcare. BMC Med. Inform. Decis. Mak. 20 (2020). https://doi.org/10.1186/s12911-020-01276-x
41. J. Amann, A. Blasimme, E. Vayena et al., Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med. Inform. Decis. Mak. 20, 1–9 (2020). https://doi.org/10.1186/s12911-020-01332-6
42. H. Koshimizu, R. Kojima, Y. Okuno, Future possibilities for artificial intelligence in the practical management of hypertension. Hypertens Res. 43, 1327–1337 (2020). https://doi.org/10.1038/s41440-020-0498-x
43. S.N. Payrovnaziri, Z. Chen, P. Rengifo-Moreno et al., Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review. J. Am. Med. Inform. Assoc. 27, 1173–1185 (2020). https://doi.org/10.1093/jamia/ocaa053
44. J.-M. Fellous, G. Sapiro, A. Rossi et al., Explainable artificial intelligence for neuroscience: behavioral neurostimulation. Front. Neurosci. 13 (2019). https://doi.org/10.3389/fnins.2019.01346
45. B.C. Kwon, M.J. Choi, J.T. Kim et al., RetainVis: visual analytics with interpretable and interactive recurrent neural networks on electronic medical records. IEEE Trans. Vis. Comput. Graph. 25, 299–309 (2019). https://doi.org/10.1109/TVCG.2018.2865027
46. S.M. Lauritsen, M. Kristensen, M.V. Olsen et al., Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat. Commun. 11. https://doi.org/10.1038/s41467-020-17431-x
47. N. Prentzas, A. Nicolaides, E. Kyriacou et al., Integrating machine learning with symbolic reasoning to build an explainable AI model for stroke prediction, in Proceedings - 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering, BIBE 2019. Institute of Electrical and Electronics Engineers Inc. (2019), pp. 817–821
48. T.E. Workman, Q. Zeng-Treitler, Y. Shao et al., Explainable deep learning applied to understanding opioid use disorder and its risk factors, in Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019. Institute of Electrical and Electronics Engineers Inc. (2019), pp. 4883–4888
49. E. Khodabandehloo, D. Riboni, A. Alimohammadi, HealthXAI: collaborative and explainable AI for supporting early diagnosis of cognitive decline. Futur. Gener. Comput. Syst. 116, 168–189 (2021). https://doi.org/10.1016/j.future.2020.10.030
50. I. Sousa, M.B.R. de Vellasco, M.E. da Silva, Local interpretable model-agnostic explanations for classification of lymph node metastases. Sensors (Basel) 19 (2019). https://doi.org/10.3390/s19132969
51. A. Kind, G. Azzopardi, An explainable AI-based computer aided detection system for diabetic retinopathy using retinal fundus images. Lecture Notes in Computer Science (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 11678 LNCS:457–468 (2019). https://doi.org/10.1007/978-3-030-29888-3_37
52. S. Kashyap, A. Karargyris, J. Wu et al., Looking in the right place for anomalies: explainable AI through automatic location learning (2020). arXiv 1125–1129
53. P.R. Magesh, R.D. Myloth, R.J. Tom, An explainable machine learning model for early detection of Parkinson’s disease using LIME on DaTSCAN imagery. Comput. Biol. Med. 126 (2020). https://doi.org/10.1016/j.compbiomed.2020.104041
54. K.G. Achilleos, S. Leandrou, N. Prentzas et al., Extracting explainable assessments of Alzheimer’s disease via machine learning on brain MRI imaging data, in 2020 IEEE 20th international conference on bioinformatics and bioengineering (BIBE) (IEEE, 2020), pp. 1036–1041
55. C. Dindorf, W. Teufl, B. Taetz et al., Interpretability of input representations for gait classification in patients after total hip arthroplasty. Sensors (Switzerland) 20, 1–14 (2020). https://doi.org/10.3390/s20164385
56. K. Weitz, T. Hassan, U. Schmid, J.-U. Garbas, Deep-learned faces of pain and emotions: elucidating the differences of facial expressions with the help of explainable AI methods. Tech. Mess. 86, 404–412 (2019). https://doi.org/10.1515/teme-2019-0024
57. C. Panigutti, A. Perotti, D. Pedreschi, Doctor XAI: an ontology-based approach to black-box sequential data classification explanations, in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, New York, NY, USA, 2020), pp. 629–639
58. J. Kobylarz Ribeiro, H.D.P. Dos Santos, F. Barletta et al., A machine learning early warning system: multicenter validation in Brazilian hospitals, in Proceeding of IEEE Symposium on Computer-Based Medical Systems (2020), pp. 321–326. https://doi.org/10.1109/CBMS49503.2020.00067
59. J.B. Lamy, R. Tsopra, Visual explanation of simple neural networks using interactive rainbow boxes, in Proceedings of the International Conference on Information Visualisation. Institute of Electrical and Electronics Engineers Inc. (2019), pp. 50–55
60. J.H. Brenas, A. Shaban-Nejad, Health intervention evaluation using semantic explainability and causal reasoning. IEEE Access 8, 9942–9952 (2020). https://doi.org/10.1109/ACCESS.2020.2964802
Chapter 16
Medical Knowledge Graphs in the Discovery of Future Research Collaborations
Nikolaos Giarelis, Nikos Kanakaris, and Nikos Karacapilidis
Abstract This chapter introduces a framework that is based on a novel graph-based text representation method and combines graph-based feature selection, text categorization and link prediction to advance the discovery of future research collaborations. Our approach integrates both structured and unstructured textual data into a single knowledge graph through a novel representation of multiple scientific documents. The Neo4j graph database is used for the representation of the proposed scientific knowledge graph. For the implementation of our approach, we use the Python programming language and the scikit-learn machine learning library. We assess our approach against classical link prediction algorithms using accuracy, recall and precision as our performance metrics. Our experiments achieve state-of-the-art accuracy in the task of predicting future research collaborations. The experiments reported in this chapter use the COVID-19 Open Research Dataset.
Keywords Link prediction · Text categorization · Feature selection · Knowledge graphs · Natural language processing · Document representation
N. Giarelis · N. Kanakaris · N. Karacapilidis (B)
Industrial Management and Information Systems Lab, MEAD, University of Patras, Rio, 26504 Patras, Greece
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_16
16.1 Introduction
In recent years, we have witnessed an increase in the adoption of graph-based approaches for predicting future research collaborations by utilizing tasks such as link prediction, feature selection and text categorization [1, 2]. In these approaches, graph-based text representations are used as a means to select important features from all documents and build communities or clusters of similar documents, whereas a collaboration between two researchers is generally denoted by a scientific article written by them [3]. Graph-based approaches (particularly those concerning knowledge graphs) build on concepts and methods from graph theory (e.g. node centrality, link prediction and node similarity measures) to discover hidden knowledge from the structural characteristics of the corresponding research graph [4]. However, despite their broad adoption, existing graph-based approaches aiming to discover future research collaborations utilize only the structural characteristics of a research graph [5]. In cases where unstructured textual data is available (e.g. graph nodes that correspond to scientific articles), existing approaches are incapable of simultaneously exploiting both the structural and the textual information of the graph. To remedy this weakness, this chapter proposes the construction and utilization of a scientific knowledge graph where structured and unstructured data co-exist (e.g. document, author and word nodes). Building on our previous work, we represent the documents of a scientific graph as a graph-of-docs [6–8]. This enables us to exploit both the structural and textual characteristics of a research graph, and accordingly build a framework incorporating algorithms for tasks such as link prediction to discover future collaborations, text categorization to pair similar documents in communities studying a certain topic, and feature selection to identify the key features of the documents under consideration. The proposed approach uses the Neo4j graph database (https://neo4j.com) for the representation of the knowledge graph. For the implementation of our experiments, we use the Python programming language and the scikit-learn machine learning (ML) library (https://scikit-learn.org). 
To evaluate the outcome of this chapter, we assess the proposed framework against different combinations of link prediction measures, which utilize only the structural information of a research graph. Our performance metrics include the accuracy, the precision, and the recall for each of the ML models considered. For our experiments, we use the COVID-19 Open Research Dataset (CORD-19). To examine whether our approach is affected by the size of the dataset (e.g. overfits or underfits), we extract and consider nine different well-balanced datasets. The experimental results demonstrate state-of-the-art accuracy in the link prediction problem. The remainder of the chapter is organized as follows: Sect. 16.2 introduces background issues and comments on related work; the proposed framework is thoroughly presented and evaluated in Sects. 16.3 and 16.4, respectively; finally, concluding remarks and future work directions are outlined in Sect. 16.5.
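As a brief illustration of the three performance metrics named above, the following Python sketch computes accuracy, precision and recall for a binary link prediction task. It is a minimal example under stated assumptions, not the chapter's evaluation code: the labels are invented (1 means that a pair of authors collaborates, 0 that it does not), and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = link)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # links correctly predicted
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-links correctly predicted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted links that do not exist
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # existing links that were missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented labels for eight candidate author pairs (assumption).
Y_TRUE = [1, 1, 1, 0, 0, 0, 1, 0]
Y_PRED = [1, 0, 1, 0, 1, 0, 1, 0]

print(classification_metrics(Y_TRUE, Y_PRED))
```

In this toy case there are three true positives, three true negatives, one false positive and one false negative, so all three metrics equal 0.75. On a well-balanced dataset, accuracy is informative on its own; precision and recall additionally show whether errors are mostly spurious or mostly missed collaborations.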
16.2 Background Issues
For the discovery of future research collaborations, the proposed approach exploits a combination of natural language processing (NLP), graph-based text representation, graph theory and knowledge graph techniques.
16 Medical Knowledge Graphs in the Discovery …
16.2.1 Graph Measures and Indices
Diverse graph measures and indices to capture knowledge related to the structural characteristics of a graph have been proposed in the literature [9]. Below, we mention a small subset of them, which is used in our approach. We define |S| as the number of elements found in a set S. The Common Neighbors measure, denoted by CN(a, b), calculates the number of nodes that are common neighbors for a pair of nodes a and b [10]. It is defined as:

$$CN(a, b) = |\Gamma(a) \cap \Gamma(b)| \tag{16.1}$$

where Γ(x) denotes the set of neighbors of a node x. The Total Neighbors measure, denoted by TN(a, b), takes into consideration all neighbors of a pair of nodes a and b (and not only the common ones as is the case in the previous measure). It is defined as:

$$TN(a, b) = |\Gamma(a) \cup \Gamma(b)| \tag{16.2}$$

The Preferential Attachment measure, denoted by PA(a, b), calculates the product of the in-degree values of a pair of nodes a and b [11]. This measure assumes that two highly connected nodes are far more likely to be connected in the future, in contrast to two loosely connected ones. This measure is defined as:

$$PA(a, b) = |\Gamma(a)| \cdot |\Gamma(b)| \tag{16.3}$$

The Adamic Adar measure, denoted by AA(a, b), calculates the sum of the inverse logarithm of the degree of the set of neighbors shared by a pair of nodes a and b [12]. This measure assumes that nodes of a low degree are more likely to be influential in the future. It is defined as:

$$AA(a, b) = \sum_{c \in \Gamma(a) \cap \Gamma(b)} \frac{1}{\log|\Gamma(c)|} \tag{16.4}$$

Finally, the Jaccard Coefficient index, denoted by J(a, b), resembles the CN measure mentioned above; however, it differs slightly in that, for a pair of nodes a and b, it considers the amount of the intersection of their neighbor nodes over the union of them [13]. It is defined as:

$$J(a, b) = \frac{|\Gamma(a) \cap \Gamma(b)|}{|\Gamma(a) \cup \Gamma(b)|} \tag{16.5}$$
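The five measures of Eqs. (16.1)–(16.5) can be illustrated with a minimal Python sketch. It assumes an undirected graph stored as a dictionary that maps each node to the set of its neighbors; the function names, the toy graph and the author names are hypothetical and are not taken from the chapter's code.

```python
import math

def common_neighbors(graph, a, b):
    """CN(a, b): number of neighbors shared by a and b (Eq. 16.1)."""
    return len(graph[a] & graph[b])

def total_neighbors(graph, a, b):
    """TN(a, b): number of distinct neighbors of a or b (Eq. 16.2)."""
    return len(graph[a] | graph[b])

def preferential_attachment(graph, a, b):
    """PA(a, b): product of the neighbor counts of a and b (Eq. 16.3)."""
    return len(graph[a]) * len(graph[b])

def adamic_adar(graph, a, b):
    """AA(a, b): sum of 1 / log(degree) over shared neighbors (Eq. 16.4)."""
    # Shared neighbors of degree 1 are skipped, since log(1) = 0.
    return sum(
        1.0 / math.log(len(graph[c]))
        for c in graph[a] & graph[b]
        if len(graph[c]) > 1
    )

def jaccard_coefficient(graph, a, b):
    """J(a, b): shared neighbors over all neighbors (Eq. 16.5)."""
    union = graph[a] | graph[b]
    return len(graph[a] & graph[b]) / len(union) if union else 0.0

# Hypothetical toy graph: each key maps to its set of neighbors.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat"},
}

scores = {
    "CN": common_neighbors(graph, "ann", "dan"),
    "TN": total_neighbors(graph, "ann", "dan"),
    "PA": preferential_attachment(graph, "ann", "dan"),
    "AA": adamic_adar(graph, "ann", "dan"),
    "J": jaccard_coefficient(graph, "ann", "dan"),
}
print(scores)
```

In this toy graph, "ann" and "dan" are not connected but share both of their neighbors, so all five scores are high for this pair.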
N. Giarelis et al.
16.2.2 Graph-Based Text Representations
The graph-of-words textual representation [14] represents each document of a corpus as a single graph. In particular, each graph node corresponds to a unique word of a document and each edge denotes the co-occurrence between two words within a sliding window of text. Rousseau et al. [15] suggest that a window size of four seems to be the most appropriate value, in that it does not sacrifice either the performance or the accuracy of the ML models. Compared to the bag-of-words representation, it enables a more sophisticated feature engineering process due to the fact that it takes into consideration the co-occurrence between the terms. In any case, the limitations of the graph-of-words text representation are that: (i) it is unable to assess the importance of a word for a whole set of documents; (ii) it does not allow for representing multiple documents in a single graph, and (iii) it is not easily expandable to support more complicated data architectures.
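To make the sliding-window idea concrete, the following minimal sketch builds a graph-of-words for a single, already tokenized document. It is an illustration under stated assumptions rather than the implementation of [14] or [15]: the function name `graph_of_words`, the undirected edge encoding as sorted word pairs, and the sample tokens are all hypothetical.

```python
from collections import Counter

def graph_of_words(tokens, window_size=4):
    """Build an undirected co-occurrence graph for one document.

    Returns a Counter that maps a sorted (word, word) pair to the number
    of times the two words co-occur within the sliding window.
    """
    edges = Counter()
    for i, word in enumerate(tokens):
        # Pair the current word with the next (window_size - 1) words.
        for other in tokens[i + 1 : i + window_size]:
            if other != word:  # no self-loops
                edges[tuple(sorted((word, other)))] += 1
    return edges

# Hypothetical tokenized document.
tokens = ["graph", "based", "text", "representation", "graph"]
edges = graph_of_words(tokens, window_size=4)
nodes = set(tokens)

for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The nodes are the unique words of the document, and the edge weights count how often two words fall inside the same window; a smaller window yields a sparser graph.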
16.2.3 Graph-Based Feature Selection
Various promising graph-based feature selection approaches have already been proposed in the literature. Rousseau et al. [15] proposed several combinations and arrangements of popular frequent subgraph mining techniques, such as gSpan [16], Gaston [17] and gBoost [18], to achieve unsupervised feature selection by utilizing the k-core subgraph. In particular, to get a performance boost, Rousseau and his colleagues build on the concept of a k-core subgraph to compress the densest parts of the graph representation. Their experimental results indicate a significant increase in accuracy compared to common classification approaches. Henni et al. [19] applied centrality algorithms, such as PageRank, to calculate a centrality measure of each graph feature and accordingly select the most important ones. Fakhraei et al. [20] build on combinations of graph algorithms that belong to different classes, aiming to track strongly connected graph features. Such algorithms include the Louvain community detection algorithm and the PageRank centrality algorithm to discover influential nodes and other user-defined graph measures. Other approaches rely on recursively filtering out features to reduce the existing feature space. For instance, one of them re-applies PageRank to find the most influential features in the feature space [21]. These approaches use graph-connected features to include contextual information, as modelled implicitly by a graph structure, using edges that describe connections among real data. They aim to reduce ambiguity in feature selection and improve accuracy in traditional ML methods.
16.2.4 Graph-Based Text Categorization
Many interesting approaches have also been proposed in the literature for the graph-based text categorization process. Depending on their underlying methods, these can be classified into two basic categories: (i) those that utilize frequent subgraph mining for feature extraction, and (ii) those that build on graph kernels. Well-known frequent subgraph mining techniques were mentioned in the previous subsection. Rousseau et al. [15] propose several combinations of these methods, ranging from unsupervised feature mining using gSpan to unsupervised feature selection by utilizing the k-core subgraph. Nikolentzos et al. [22] make a significant contribution to previous approaches with their work on algorithms based on graph kernels. A graph kernel is a measure that calculates the similarity between two graphs. For instance, a document similarity algorithm based on shortest path graph kernels has been proposed; common ML classifiers such as support vector machines (SVM) and k-nearest neighbors (kNN) can use the results of this algorithm as a distance measure. Their experimental results indicate that classifiers that utilize graph kernel algorithms outperform several classical approaches. Siglidis et al. [23] collect several popular graph kernel libraries into a single unified framework, namely the GraKeL Python library, and provide a user-friendly API (similar to that of scikit-learn) that enables one to augment the library with new and custom graph kernels.
16.2.5 Graph-Based Link Prediction
As far as the discovery of future research collaborations using link prediction techniques is concerned, the works closest to our approach are those reported in [24–29]. Specifically, Liben-Nowell and Kleinberg [24] rely only on network topology aspects of a co-authorship network, and the proximity of a pair of nodes, to calculate the probability of future research collaborations between them. Sun et al. [27] propose the use of structural properties to predict future research collaborations in heterogeneous bibliographic networks, where multiple types of nodes (e.g. venues, topics, papers, authors) and edges (e.g. publish, mention, write, cite, contain) co-exist. They exploit the relationships between the papers to improve the accuracy of their link prediction algorithm. Guns and Rousseau [25] recommend potential research collaborations using link prediction techniques and a random forest classifier. For each pair of nodes of a co-authorship network, they calculate a variety of topology-based measures such as Adamic Adar and Common Neighbors, and they combine them with location-based characteristics related to the authors. Hence, they propose future collaborations based on the location of the authors and their position on the co-authorship network. Huang et al. [26] construct a co-authorship network for the Computer Science field that represents research collaborations from 1980 to 2005. They rely on classical
statistical techniques and graph theory algorithms to describe the properties of the constructed co-authorship network. The dataset used contains 451,305 papers from 283,174 authors. Yu et al. [28] utilize link prediction algorithms to discover future research collaborations in medical co-authorship networks. For a given author, they attempt to identify potential collaborators who complement the author's skillset. They calculate common topological and structural measures for each pair of author nodes, including Adamic Adar, Common Neighbors and Preferential Attachment. ML models are used for the identification of possible future collaborations. Chuan et al. [29] propose a new content similarity algorithm for link prediction in co-authorship networks, namely LDAcosin. This algorithm initially performs topic modelling using the LDA model to produce a feature vector for each paper, and then calculates the similarity between authors by using cosine similarity between the produced vectors. For a broader link prediction perspective, we refer to Fire et al. [30], Julian and Lu [31] and Panagopoulos et al. [32]; these works describe approaches concerning the task of predicting possible relationship types between nodes (e.g. friendships in social networks).
16.3 The Proposed Framework
In this section, we propose a framework that builds on the concept of the graph-of-docs to support and eventually augment the quality of predicting future research collaborations.
16.3.1 Graph-Based Text Representation
As mentioned in the previous section, to remedy the shortcomings of the graph-of-words representation, Giarelis et al. [6–8] have proposed the graph-of-docs representation, which depicts and elaborates multiple textual documents as a single graph. This representation enables us to store different types of nodes and edges in a graph, ranging from node types such as ‘document’ and ‘word’ to edge types such as ‘is_similar’, ‘connects’ and ‘includes’. In addition, it allows us to explore the significance of a term not just in terms of a single document but rather across many documents. Moreover, the proposed representation permits us to abstract each graph of words by using a document node. Finally, it supports relationship edges between documents, thus enabling the calculation of important metrics as far as the documents are concerned (e.g., spotting communities of similar documents, recognizing important documents that are representative of the corpus, clustering documents that share the same topic in communities without any prior knowledge, etc.).
The graph-of-docs representation builds a directed dense graph which maintains all the connections between the documents and the words of a corpus. Each unique document node connects to all the unique word nodes that it includes, using the ‘includes’ edge type; the ‘connects’ edge types are applied to link two word nodes and designate their co-occurrence within a specific sliding text window. Finally, an ‘is_similar’ edge type is used to connect a pair of document nodes and indicate their contextual similarity; this is done by utilizing the Jaccard similarity index, since it deals only with the percentage of common words, ignoring their document frequency. The above transformation of a set of documents into a graph model enables the reduction of various NLP problems to well-studied graph problems, which can be tackled by employing techniques from graph theory [15]. These techniques investigate important graph properties, such as node centrality and frequent subgraphs, which are applied respectively to extract meaningful keywords and to discover similar documents. In this chapter, we utilize the graph-of-docs model to represent the textual data of a knowledge graph. We argue that the accuracy of common NLP and text mining tasks can be improved by adopting the proposed graph-of-docs representation. The proposed representation: (i) enables the investigation of the importance of a term within a whole corpus of documents, and (ii) allows multiple node types to co-exist in the same graph, thus being easily expandable and adaptable to more complex data.
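The three edge types described above can be sketched in plain Python as follows. This is a simplified illustration, not the Neo4j-based implementation of the chapter: the function name `build_graph_of_docs`, the in-memory dictionaries, the choice to ignore edge direction for ‘connects’ edges, and the toy documents are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def build_graph_of_docs(docs, window_size=4):
    """Merge several tokenized documents into one graph-of-docs.

    docs maps a document id to its list of tokens. Returns three edge
    collections: 'includes' (document -> words), 'connects' (word pair ->
    aggregated co-occurrence score) and 'is_similar' (document pair ->
    Jaccard similarity of their word sets).
    """
    includes = {}
    connects = Counter()
    for doc_id, tokens in docs.items():
        includes[doc_id] = set(tokens)
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + window_size]:
                if other != word:
                    # Word pairs shared by documents are aggregated,
                    # so no duplicate edges are created.
                    connects[tuple(sorted((word, other)))] += 1

    is_similar = {}
    for doc_a, doc_b in combinations(sorted(docs), 2):
        union = includes[doc_a] | includes[doc_b]
        common = includes[doc_a] & includes[doc_b]
        score = len(common) / len(union) if union else 0.0
        if score > 0:  # assumption: unrelated documents get no edge
            is_similar[(doc_a, doc_b)] = score
    return includes, connects, is_similar

# Hypothetical toy corpus.
docs = {
    "p1": ["covid", "vaccine", "trial"],
    "p2": ["covid", "vaccine", "antibody"],
    "p3": ["graph", "kernel"],
}
includes, connects, is_similar = build_graph_of_docs(docs)
print(is_similar)
```

Because word nodes are shared across documents, the weight of a ‘connects’ edge reflects the whole corpus, which is what allows the importance of a term to be assessed beyond a single document.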
16.3.2 Graph-Based Feature Selection
The proposed graph-based feature selection process follows four steps. Firstly, a document similarity subgraph is created, based on the assumption that subgraphs of the entire graph-of-docs graph describing similar documents have common word nodes and similar structural characteristics. This enables us to calculate the similarity between two documents by utilizing classical similarity measures. The similarity subgraph consists of document nodes and edges of the ‘is_similar’ type, which store the similarity score between two nodes. Secondly, by exploiting the document similarity subgraph, we identify communities (groups) of contextually similar documents using the ‘score’ property of the ‘is_similar’ type edges as a distance value. This is made possible by the use of the Louvain community detection algorithm [33]. Thirdly, given the fact that documents belonging to the same community are contextually similar, we presume that it is also very likely that they share common terms. Aiming to retrieve the top-N most important terms for all documents belonging to the same community, our algorithm ranks them firstly by their document frequency and secondly by their PageRank score, both in descending order. Finally, we perform feature selection for the whole document corpus by merging the top-N features of each community. This reduces the number of the candidate features, which results in accelerating the feature selection process, thus mitigating
the effects of the ‘curse-of-dimensionality’ phenomenon and enabling the training of more reliable ML models.
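The last two steps of the process described above (ranking terms per community and merging the top-N terms) can be sketched as follows. The sketch assumes that the communities and the PageRank scores have already been computed elsewhere (e.g. by the Louvain and PageRank algorithms); the function names, the toy corpus and the scores are hypothetical.

```python
from collections import Counter

def top_terms(community, doc_words, pagerank, n):
    """Rank the terms of one community and return the top n.

    Terms are ordered by document frequency inside the community and
    then by PageRank score, both descending (ties broken by name).
    """
    doc_frequency = Counter()
    for doc in community:
        doc_frequency.update(doc_words[doc])  # each word counted once per doc
    ranked = sorted(
        doc_frequency,
        key=lambda w: (-doc_frequency[w], -pagerank.get(w, 0.0), w),
    )
    return ranked[:n]

def select_features(communities, doc_words, pagerank, n):
    """Merge the top-n terms of every community into one feature set."""
    selected = set()
    for community in communities:
        selected.update(top_terms(community, doc_words, pagerank, n))
    return selected

# Hypothetical inputs: word sets per document, precomputed communities
# and precomputed PageRank scores of the word nodes.
doc_words = {
    "p1": {"covid", "vaccine", "trial"},
    "p2": {"covid", "vaccine", "antibody"},
    "p3": {"graph", "kernel", "covid"},
}
communities = [["p1", "p2"], ["p3"]]
pagerank = {"covid": 0.30, "vaccine": 0.20, "trial": 0.05,
            "antibody": 0.10, "graph": 0.25, "kernel": 0.10}

features = select_features(communities, doc_words, pagerank, n=2)
print(sorted(features))
```

Since each community contributes at most N terms, the size of the merged feature set is bounded by N times the number of communities, regardless of the vocabulary size.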
16.3.3 Graph-Based Text Categorization
Generally speaking, subgraphs extracted from similar documents share common word nodes as well as similar structural characteristics. This allows us to measure the similarity between two documents either by using classical data mining similarity measures, such as the Jaccard or cosine similarity, or by utilizing frequent subgraph mining techniques (see Sect. 16.2.3). In our current approach, we construct a similarity subgraph that contains document nodes and edges of type ‘is_similar’. It is evident that the creation of that subgraph is not practical in approaches that consider each document individually. In the aforementioned subgraph, we group documents in contextually similar communities, by considering as a distance value the ‘score’ property of the ‘is_similar’ edge types. A plethora of community detection algorithms can be found in the literature, including Louvain [33], Label Propagation [34] and Weakly Connected Components (Monge and Elka [35]); an in-depth review of these algorithms can be found in Fortunato [36] and Yang et al. [37]. Since each graph community contains contextually similar documents, there is an increased likelihood for each community to contain documents that belong to the same class, as identified by a text categorization task. Hence, we can easily deduce the document class either by utilizing the most frequent class in its community or by executing a nearest-neighbors algorithm (such as the k-nearest neighbors).
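The first of the two options mentioned above, assigning a document the most frequent class of its community, can be sketched in a few lines. The community assignment is assumed to be given (e.g. produced by a community detection algorithm); the function name `predict_class`, the document ids and the class names are hypothetical.

```python
from collections import Counter

def predict_class(doc, community_of, labels):
    """Return the most frequent known class in the community of doc.

    community_of maps every document to a community id; labels maps the
    already categorized documents to their class. Returns None when the
    community contains no labelled document.
    """
    community = community_of[doc]
    votes = Counter(
        labels[other]
        for other, other_community in community_of.items()
        if other_community == community and other != doc and other in labels
    )
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical community assignment and partially labelled corpus.
community_of = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1, "p6": 2}
labels = {"p1": "virology", "p2": "virology", "p4": "epidemiology"}

print(predict_class("p3", community_of, labels))
print(predict_class("p5", community_of, labels))
```

A document that ends up alone in its community (such as "p6" here) cannot be categorized this way, which is one reason a nearest-neighbors fallback may be useful.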
16.3.4 Graph-Based Link Prediction
The CORD-19 dataset used in our work consists of multiple textual documents (i.e. scientific papers) and a metadata file (in .csv format) that contains information about the papers themselves along with their authors and affiliations. The proposed ML pipeline for predicting future collaborations includes the following five steps (Fig. 16.1):
Data preprocessing. In this initial step, we preprocess the plain text of the abstract of each paper. We start by tokenizing the data into a list of terms. From this list, we first remove the English stop words; the remaining significant terms are then cleaned of unnecessary Unicode symbols, punctuation and leading whitespace.
The graph-of-docs representation. In this step, we use the list of significant terms obtained from the previous one. We start by creating each term in the graph database, without adding duplicate terms. Each term is connected to the next one in the list as long as it is part of the same sentence.
Fig. 16.1 The proposed ML pipeline for discovering future research collaborations
The sliding window size, by which we connect the terms, is in the range [2, 8]; however, as indicated by diverse experimental results in the literature, a window size of 4 seems ideal [15]. The connection between the terms is created in the database as an undirected edge connecting all terms in the specified window. This edge also contains a number, namely the co-occurrence score, which measures the number of co-appearances of a pair of terms in each iteration step of the text parsing process. Common edges between texts are aggregated in terms of their co-occurrence score. This implies that no duplicate edges are introduced in our graph, which reduces the memory footprint. Once the graph-of-words representation (for a single document) has been created, we create a node in the database representing the paper itself, which is directly connected to all of its terms. This is crucial, since it allows us to compare papers given their common words.
Knowledge graph. In this step, we utilize the metadata of each paper. We start by creating nodes for all authors and their affiliations. Then, we link authors with their affiliations and the papers they have authored by using different types of edges. Moreover, we aim to discover authors of similar papers who work in the same field but have not collaborated so far; this is accomplished by comparing the words of the papers of each pair of authors under consideration. The aforementioned connection also allows us to generate a new knowledge subgraph, called the co-authorship graph, which connects all authors who have already collaborated in the authorship of papers, with an edge indicating the year of their first collaboration.
Feature extraction. Our goal in this step is to extract features for classification purposes. For each pair of authors, we apply various link prediction measures, each of which results in a numerical score (i.e. a positive number that indicates the likelihood of a future collaboration, or zero if there is no such likelihood). These measures exploit the structural characteristics of the co-authorship graph. The final list of features contains a pair of authors’ ids and the value of each measure, which feed the final step of the pipeline.
Link prediction. In this step, we utilize the aforementioned features as input to a classification process. This process classifies each pair of authors by assigning a label ‘1’ if the authors may work together in the future, or a label ‘0’ in the opposite case. In other words, the link prediction problem is reduced to a binary classification problem, aiming to discover future research collaborations.
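As an illustration of the first step of the pipeline above (data preprocessing), the following sketch tokenizes an abstract, removes stop words and drops punctuation and non-ASCII symbols. It is only a rough approximation of that step: the tiny stop-word list, the lowercasing, the regular expression and the sample sentence are assumptions made for the example, and the chapter does not state which stop-word list or tokenizer it uses.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real pipeline would
# use a complete English stop-word list.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "by", "for", "in", "is",
    "of", "on", "the", "to", "with",
})

# Keeps ASCII letters and digits, allowing inner hyphens (e.g. "sars-cov-2");
# punctuation, whitespace and other Unicode symbols are discarded.
TOKEN_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def preprocess(text):
    """Return the list of significant terms of a text, in order."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

abstract = "The spike protein of SARS-CoV-2 binds to the ACE2 receptor."
terms = preprocess(abstract)
print(terms)
```

The resulting list preserves word order, which matters for the next step because co-occurrence edges are created between terms that are close to each other in the text.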
Fig. 16.2 The data schema of the scientific knowledge graph
Our knowledge graph allows diverse types of entities and relationships to co-exist in the same graph data schema, including entity nodes with types such as ‘Paper’, ‘Author’, ‘Laboratory’, ‘Location’, ‘Institution’ and ‘Word’, and relationship edges with types such as ‘is_similar’, ‘cites’, ‘writes’, ‘includes’, ‘connects’, ‘co_authors’ and ‘affiliates_with’ (see Fig. 16.2). A ‘Paper’ entity represents a scientific paper or document. An ‘Author’ entity represents an author of a scientific paper or document. The ‘Laboratory’ entity represents the laboratory of an author. The ‘Location’ entity represents the location of a laboratory. The ‘Institution’ entity represents the institution of an author. Each ‘Word’ entity corresponds to a unique word of a scientific paper or document. An ‘includes’ relationship connects a ‘Word’ with a ‘Paper’ entity. It marks the presence of a specific word in a certain paper. A ‘connects’ relationship is only applicable between two ‘Word’ entities and denotes their co-occurrence within a predefined sliding window of text. The subgraph constructed by the ‘Word’ and ‘Paper’ entities, as well as the ‘includes’, ‘connects’ and ‘is_similar’ relationships, corresponds to the graph-of-docs representation of the textual data of the available papers (see Fig. 16.3). An ‘is_similar’ relationship links either a pair of ‘Paper’ nodes or a pair of ‘Author’ nodes. In the former case, it denotes the graph similarity of the graph-of-words representation of each paper. In the latter, it denotes the graph similarity between the graph-of-docs representations associated with the two authors. The subgraph that consists of the ‘Author’ entities and the ‘is_similar’ relationships corresponds to the authors similarity subgraph. A ‘cites’ relationship links two ‘Paper’ nodes. A ‘writes’ relationship links an ‘Author’ with a ‘Paper’ entity. An ‘affiliates_with’ relationship connects an ‘Author’ entity with a ‘Laboratory’, ‘Location’
Fig. 16.3 Representing textual data of papers using the graph-of-docs model (relationships between papers are denoted with dotted lines). The graph-of-docs representation is associated to the ‘Paper’ and ‘Word’ entities, and the ‘includes’, ‘connects’ and ‘is_similar’ relationships of the scientific knowledge graph
or ‘Institution’ entity. A ‘co_authors’ relationship denotes a research collaboration between the connected ‘Author’ entities. The subgraph constructed of the available ‘Author’ entities and the ‘co_authors’ relationships corresponds to the co-authors’ subgraph. The produced knowledge graph enables the utilization of well-studied graph algorithms, which in turn assists in gaining insights about various tasks, such as finding experts nearby based on the ‘Location’ entities, recommending similar research work, and discovering future research collaborations; this chapter focuses on the last of these tasks. For the discovery of future research collaborations, we employ various link prediction and ML techniques. Particularly, we reduce the problem of predicting future research collaborations to the common binary classification problem. By using a binary classifier, we are able to predict the presence or the absence of a ‘co_authors’ relationship between two ‘Author’ entities, and thus build a
link prediction algorithm for the discovery of future research collaborations. Available binary classifiers include logistic regression, k-nearest neighbors, linear support vector machines, decision tree, and neural networks [38].
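The reduction to binary classification requires labelled pairs of authors. The following sketch shows one possible way to derive such samples from a co-authors subgraph: every existing ‘co_authors’ edge becomes a positive sample, and an equal number of unconnected pairs is drawn at random as negative samples, in line with the balanced datasets used in Sect. 16.4.2. The function name `build_samples`, the fixed random seed and the toy data are assumptions; the chapter does not describe its sampling procedure at this level of detail.

```python
import random
from itertools import combinations

def build_samples(authors, co_author_edges, seed=0):
    """Create a balanced list of (author_a, author_b, label) samples.

    Label 1 marks an existing 'co_authors' edge; label 0 marks a randomly
    chosen pair of authors without such an edge.
    """
    positives = {tuple(sorted(edge)) for edge in co_author_edges}
    candidates = [
        pair for pair in combinations(sorted(authors), 2)
        if pair not in positives
    ]
    rng = random.Random(seed)  # seeded for reproducibility
    negatives = rng.sample(candidates, min(len(positives), len(candidates)))
    samples = [(a, b, 1) for a, b in sorted(positives)]
    samples += [(a, b, 0) for a, b in negatives]
    return samples

# Hypothetical authors and collaborations.
authors = ["ann", "bob", "cat", "dan", "eve"]
co_author_edges = [("ann", "bob"), ("bob", "cat"), ("dan", "cat")]

samples = build_samples(authors, co_author_edges)
for sample in samples:
    print(sample)
```

Each sample would then be extended with the structural and textual features of the author pair before being passed to a binary classifier.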
16.4 Experiments
For the implementation and evaluation of our approach, we used the Python programming language and the scikit-learn ML library (https://scikit-learn.org). The Neo4j graph database (https://neo4j.com) has been utilized for the representation of the graph-of-docs and the corresponding knowledge graph. The full code, datasets, and evaluation results of our experiments are freely available at https://github.com/imis-lab/book-chapter.
16.4.1 CORD-19
The COVID-19 Open Research Dataset (CORD-19) [39, 40] contains information about 63,000 research articles related to COVID-19, SARS-CoV-2 and other similar coronaviruses. It is freely distributed by the Allen Institute for AI and Semantic Scholar (https://www.semanticscholar.org/cord19). The articles in CORD-19 have been collected from popular scientific repositories and publishing houses, including Elsevier, bioRxiv, medRxiv, World Health Organization (WHO) and PubMed Central (PMC). Each scientific article in CORD-19 has a list of specific attributes, namely ‘citations’, ‘publish time’, ‘title’, ‘abstract’ and ‘authors’, while the majority of the articles (51,000) also include a ‘full text’ attribute. Undoubtedly, the CORD-19 dataset is a valuable source of knowledge as far as COVID-19-related research is concerned; however, the fact that the majority of the data included is unstructured text imposes a set of limitations on its processing. As advocated in the literature, the exploitation of a graph-based text representation in combination with a knowledge graph seems to be a promising step towards structuring this data [4, 5, 41]. For the construction of our scientific knowledge graph, we utilize the ‘abstract’, ‘authors’ and ‘publish time’ attributes of each scientific article. We do not exploit the ‘full text’ attribute due to hardware limitations; however, we assume that the abstract of a paper constitutes a representative piece of its full text.
16.4.2 Experimental Setup
Selection of measures and metrics. To construct the authors similarity subgraph and to populate the edges of the ‘Author’.‘is_similar’ type, we use the
Jaccard similarity index, since it deals only with the percentage of common words versus all words, ignoring their document frequency.
Construction of datasets for the link prediction problem. To test whether our approach performs well and does not overfit, regardless of the sample size of the dataset, we extract nine different datasets from the original one, corresponding to different volumes of papers (ranging from 1536 to 63,023). For the sample creation, we utilize (i) the authors similarity subgraph, and (ii) the co-authors subgraph (i.e. the subgraph generated from the ‘co_authors’ edges; it is noted that these edges also store the year of the first collaboration between authors, as a property). The features of a sample encapsulate either structural or textual characteristics of the whole knowledge graph (e.g. the similarity between the papers of two authors). Furthermore, each sample describes the relationship between two ‘Author’ nodes of the knowledge graph. As the baseline methods to be compared against our approach, we consider the classical link prediction algorithms, which utilize only the structural characteristics of the graph. The features of a sample are described in detail in Table 16.1. Each of the nine datasets consists of a different number of randomly chosen samples. All datasets are balanced, in that the numbers of positive and negative samples are equal (see Table 16.2). To examine whether the features taken into account each time affect the efficiency of the ML models, we execute a set of experiments with different combinations of selected features (see Table 16.3).

Table 16.1 A detailed explanation of the features of a sample. Each feature is associated to either a structural or a textual relationship between two given ‘Author’ nodes

| Feature | Description | Type |
| adamic_adar | The sum of the inverse logarithm of the degree of the set of common neighbor ‘Author’ nodes shared by a pair of nodes | Structural |
| common_neighbors | The number of neighbor ‘Author’ nodes that are common for a pair of ‘Author’ nodes | Structural |
| preferential_attachment | The product of the in-degree values of a pair of ‘Author’ nodes | Structural |
| total_neighbors | The total number of neighbor ‘Author’ nodes of a pair of ‘Author’ nodes | Structural |
| similarity | The textual similarity of the graph-of-docs graphs of two ‘Author’ nodes. The Jaccard index is used to calculate the similarity | Textual |
| label | The existence or absence of a ‘co_authors’ edge between two ‘Author’ nodes. A positive label (1) denotes the existence, whereas the absence is denoted by a negative label (0) | Class |

Table 16.2 Number of samples (|samples|) for each dataset (the number of positive and negative samples of the training and testing subsets are fully balanced); a positive sample denotes the existence of a ‘co_authors’ edge between two ‘Author’ nodes, while a negative sample denotes the absence of such an edge

| Dataset | Training subset samples | Testing subset samples |
| Dataset 1 | 668 | 840 |
| Dataset 2 | 858 | 1566 |
| Dataset 3 | 1726 | 2636 |
| Dataset 4 | 3346 | 7798 |
| Dataset 5 | 5042 | 12,976 |
| Dataset 6 | 5296 | 16,276 |
| Dataset 7 | 6210 | 25,900 |
| Dataset 8 | 8578 | 34,586 |
| Dataset 9 | 13,034 | 49,236 |

Table 16.3 Combinations of features aiming to test how different sets of features affect the performance of an ML model; “top n” indicates the top number of features, extracted from each community of similar documents

| Combination name | Features included |
| Structural characteristics and authors similarity top 5 (STR-SIM_top5) | adamic_adar, common_neighbors, preferential_attachment, total_neighbors, similarity_top_5 |
| Structural characteristics and authors similarity top 100 (STR-SIM_top100) | adamic_adar, common_neighbors, preferential_attachment, total_neighbors, similarity_top_100 |
| Structural characteristics and authors similarity top 250 (STR-SIM_top250) | adamic_adar, common_neighbors, preferential_attachment, total_neighbors, similarity_top_250 |
| Structural characteristics (STR-baseline) | adamic_adar, common_neighbors, preferential_attachment, total_neighbors |

Finally, it is noted that the samples for the training subset are selected from an earlier instance in time of the co-authors subgraph, which is created from ‘co_authors’ edges that first appeared within or before the year 2013; respectively, the samples of the testing subset include ‘co_authors’ edges created after 2013. This separation in time ensures that we avoid any data leakage between the training and testing subsets [24].
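The temporal separation of the training and testing subsets can be sketched as follows. The sketch assumes that every ‘co_authors’ edge is available as a tuple holding the two authors and the year of their first collaboration; the function name `temporal_split` and the toy edges are hypothetical.

```python
def temporal_split(co_author_edges, cutoff_year=2013):
    """Split 'co_authors' edges by the year of the first collaboration.

    Edges that first appeared in or before cutoff_year go to the training
    subset; edges created later go to the testing subset.
    """
    train = [(a, b) for a, b, year in co_author_edges if year <= cutoff_year]
    test = [(a, b) for a, b, year in co_author_edges if year > cutoff_year]
    return train, test

# Hypothetical edges: (author, author, year of first collaboration).
co_author_edges = [
    ("ann", "bob", 2009),
    ("bob", "cat", 2013),
    ("ann", "dan", 2016),
    ("cat", "dan", 2020),
]

train_edges, test_edges = temporal_split(co_author_edges)
print(train_edges)
print(test_edges)
```

For the split to be leakage-free, the structural features of the training samples should also be computed on the graph as it existed up to the cutoff year, not on the full graph.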
16.4.3 Evaluation
To evaluate the effectiveness of our approach, we assess how the performance of various binary classifiers is affected by the similarity features. The list of the binary classifiers considered in this chapter includes: logistic regression (LR), k-nearest neighbors (50NN), support vector machines with a linear kernel function (LSVM), support vector machines with an RBF kernel function (SVM), decision tree (DT) and neural networks (NN). To normalize the features from our datasets, we employ the min–max normalization procedure. An extensive list of experiments using various classifiers along with different hyperparameter configurations can be found on the GitHub repository of this chapter (https://github.com/imis-lab/book-chapter). The hyperparameter configurations can also be found in Table 16.4.

Table 16.4 Hyperparameter configurations in Scikit-Learn for each of the utilized binary classifiers; further hyperparameter configurations are described in the Scikit-Learn documentation

| Binary classifier | Hyperparameter configuration |
| LR | solver = ‘lbfgs’, multi_class = ‘ovr’ |
| 50NN | k = 50, weights = ‘uniform’ |
| LSVM | kernel = ‘linear’ |
| SVM | kernel = ‘rbf’ |
| DT | max_depth = 5 |
| NN | solver = ‘adam’, activation = ‘relu’, hidden_layers = 100 × 50 |

Our performance metrics include the accuracy, precision and recall of the binary classifiers. The obtained results indicate that the inclusion of the similarity features (i) increases the average accuracy, precision and recall scores, and (ii) decreases the standard deviation of the aforementioned scores (Table 16.5). The decrease of the standard deviation in the accuracy score indicates that our approach is reliable regardless of the size of the given dataset. Furthermore, by comparing the average precision score to the average recall score, we conclude that our approach predicts most of the future collaborations correctly. The best average accuracy score is achieved by the LSVM classifier, using the STR-SIM_top100 and STR-SIM_top250 feature combinations. As far as link prediction is concerned, our algorithm differs from existing ones in that it considers both the textual similarity between the abstracts of the papers for each pair of authors and the structural characteristics of the associated ‘Author’ nodes, aiming to predict a future collaboration between them. The utilization of the textual information in combination with the structural information of a scientific knowledge graph results in better and more reliable ML models, which are less prone to overfitting. Contrary to existing algorithms for the discovery of future research collaborations, our approach exploits structural characteristics and does not ignore the importance of the information related to the unstructured text of papers written by authors.
Finally, existing approaches that concentrate only on the exploitation of unstructured textual data rely heavily on NLP techniques and textual representations, which in turn necessitate the generation of sparse feature spaces; hence, in such approaches, the effects of the ‘curse-of-dimensionality’ phenomenon re-emerge.
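The summary statistics of the kind reported in Table 16.5 can be computed along the following lines. This is a minimal sketch under stated assumptions: labels are encoded as 0/1 with 1 denoting a future collaboration, and the standard deviation is taken as the population standard deviation, since the chapter does not say which variant was used.

```python
from statistics import mean, pstdev


def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for 0/1 labels (1 = future collaboration)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def summarize(scores):
    """AVG, MIN, MAX and SD of one metric over several datasets."""
    return {"AVG": mean(scores), "MIN": min(scores),
            "MAX": max(scores), "SD": pstdev(scores)}
```

Calling summarize once per metric on the nine per-dataset scores of a classifier and feature combination yields one row of such a table.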
Table 16.5 Mean (AVG), minimum (MIN), maximum (MAX) and standard deviation (SD) of accuracy, precision and recall metrics per text classifier for each combination of selected features on the nine different datasets. Rows: the LR, 50NN, LSVM, SVM, DT and NN classifiers, each with the STR-baseline, STR-SIM_top5, STR-SIM_top100 and STR-SIM_top250 feature combinations; columns: AVG, MIN, MAX and SD for accuracy, precision and recall. Bold font indicates the best method for each ML model as far as the mean and the standard deviation value of each individual metric are concerned
16.5 Conclusions
This chapter considers the problem of discovering future research collaborations as a link prediction problem applied on scientific knowledge graphs. The proposed approach integrates into a single knowledge graph both structured and unstructured textual data using the graph-of-docs text representation. For the required experimentations, we generated nine different datasets using the CORD-19 dataset. For evaluation purposes, we assessed our approach against several link prediction settings, which use various combinations of a set of available features. The evaluation results demonstrate state-of-the-art average accuracy, precision and recall of the future collaborations prediction task. However, we expect that these results will be improved through the employment of contextual similarity functions that are based on graph kernels [22]. In any case, our approach has a performance issue, since the time required to build the scientific knowledge graph increases radically with the number of graph nodes. Aiming to address the above limitation, while also enhancing the performance and advancing the applicability of our approach, our future work directions include: (i) the utilization of in-memory graph databases in combination with Neo4j; (ii) the experimentation with word, node and graph embeddings [42–44]; (iii) the integration of other scientific research graphs such as OpenAIRE [45] and Microsoft Academic Graph [46]; and (iv) the integration and meaningful exploitation of our approach into collaborative research environments [47].
As far as the CORD-19 dataset is concerned, it is worth noting here that it is increasingly explored nowadays in the investigation of various research topics. For instance, Colavizza et al. [48] attempt to produce a scientific overview of the dataset by employing various approaches such as a statistical analysis of the dataset’s metadata, unsupervised key-phrase extraction, supervised citation clustering, and LDA topic modelling. Papadopoulos et al. [49] aim to visualize in a graph various triplet facts in the form of subject-predicate-object (i.e. a knowledge graph approach). They achieve their research goal by combining a set of pre-trained BERT models and keyword extraction tools.
Guo et al. [50] aim to augment the task of semantic textual similarity (STS) by producing an ad-hoc dataset (namely, CORD19STS), which aims to alleviate the poor performance of generalized STS models by fine-tuning a BERT-like deep learning language model. Finally, Wang et al. [39, 40] introduce a weakly supervised Named Entity Recognition model by optimizing pre-trained spaCy models (https://spacy.io/), ranging from general English language to domain specific biology terms in English.
Acknowledgements The work presented in this chapter is supported by the OpenBio-C project (www.openbio.eu), which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (Project id: T1EDK-05275).
References
1. D. Nathani, J. Chauhan, C. Sharma, M. Kaul, Learning attention-based embeddings for relation prediction in knowledge graphs, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL) (2019), pp. 4710–4723
2. S. Vahdati, G. Palma, R.J. Nath, C. Lange, S. Auer, M.E. Vidal, Unveiling scholarly communities over knowledge graphs, in International Conference on Theory and Practice of Digital Libraries (Springer, Cham, 2018), pp. 103–115
3. B. Ponomariov, C. Boardman, What is co-authorship? Scientometrics 109(3), 1939–1963 (2016)
4. Q. Wang, Z. Mao, B. Wang, L. Guo, Knowledge graph embedding: a survey of approaches and applications. IEEE Trans. Knowl. Data Eng. 29(12), 2724–2743 (2017)
5. N. Veira, B. Keng, K. Padmanabhan, A. Veneris, Unsupervised embedding enhancements of knowledge graphs using textual associations, in Proceedings of the 28th International Joint Conference on Artificial Intelligence (AAAI Press, 2019), pp. 5218–5225
6. N. Giarelis, N. Kanakaris, N. Karacapilidis, An innovative graph-based approach to advance feature selection from multiple textual documents, in IFIP International Conference on Artificial Intelligence Applications and Innovations (Springer, Cham, 2020a), pp. 96–106
7. N. Giarelis, N. Kanakaris, N. Karacapilidis, On a novel representation of multiple textual documents in a single graph, in Intelligent Decision Technologies 2020 – Proceedings of the 12th KES International Conference on Intelligent Decision Technologies (KES-IDT-20), ed. by I. Czarnowski, R.J. Howlett, L.C. Jain (Split, Croatia, Springer, 2020b)
8. N. Giarelis, N. Kanakaris, N. Karacapilidis, On the utilization of structural and textual information of a scientific knowledge graph to discover future research collaborations: a link prediction perspective, in Proceedings of the 23rd International Conference on Discovery Science (DS 2020), ed. by A. Appice, G. Tsoumakas, Y. Manolopoulos and S. Matwin, vol. 12323 (Springer, Cham, LNAI, 2020c), pp. 437–450
9. Á. Vathy-Fogarassy, J. Abonyi, Graph-Based Clustering and Data Visualization Algorithms (Springer, London, 2013)
10. S. Li, J. Huang, Z. Zhang, J. Liu, T. Huang, H. Chen, Similarity-based future common neighbors model for link prediction in complex networks. Sci. Rep. 8, 1–11 (2018)
11. R. Albert, A. Barabási, Statistical mechanics of complex networks. ArXiv, cond-mat/0106096 (2001)
12. L.A. Adamic, E. Adar, Friends and neighbors on the Web. Soc. Networks 25, 211–230 (2003)
13. P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull. Soc. Vaudoise Sci. Nat. 37, 547–579 (1901)
14. F. Rousseau, M. Vazirgiannis, Graph-of-word and TW-IDF: new approach to ad hoc IR, in Proceedings of the 22nd ACM International Conference on Information & Knowledge Management (ACM Press, 2013), pp. 59–68
15. F. Rousseau, E. Kiagias, M. Vazirgiannis, Text categorization as a graph classification problem, in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1 (2015), pp. 1702–1712
16. X. Yan, J. Han, gspan: Graph-based substructure pattern mining, in Proceedings of the IEEE International Conference on Data Mining (IEEE Press, 2002), pp. 721–724
17. S. Nijssen, J.N. Kok, A quickstart in frequent structure mining can make a difference, in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM Press, 2004), pp. 647–652
18. H. Saigo, S. Nowozin, T. Kadowaki, T. Kudo, K. Tsuda, gBoost: a mathematical programming approach to graph classification and regression. Mach. Learn. 75(1), 69–89 (2009)
19. K. Henni, N. Mezghani, C. Gouin-Vallerand, Unsupervised graph-based feature selection via subspace and PageRank centrality. Expert Syst. Appl. 114, 46–53 (2018)
20. S. Fakhraei, J. Foulds, M. Shashanka, L. Getoor, Collective spammer detection in evolving multi-relational social networks, in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2015), pp. 1769–1778
21. D. Ienco, R. Meo, M. Botta, Using page rank in feature selection, in SEBD (2008), pp. 93–100
22. G. Nikolentzos, G. Siglidis, M. Vazirgiannis, Graph Kernels: a survey. arXiv preprint arXiv:1904.12218 (2019)
23. G. Siglidis, G. Nikolentzos, S. Limnios, C. Giatsidis, K. Skianis, M. Vazirgiannis, Grakel: a graph kernel library in python. arXiv preprint arXiv:1806.02193 (2018)
24. D. Liben-Nowell, J.M. Kleinberg, The link-prediction problem for social networks. J. Am. Soc. Inf. Sci. 58, 1019–1031 (2007)
25. R. Guns, R. Rousseau, Recommending research collaborations using link prediction and random forest classifiers. Scientometrics 101(2), 1461–1473 (2014)
26. J. Huang, Z. Zhuang, J. Li, C.L. Giles, Collaboration over time: characterizing and modeling network evolution, in Proceedings of the 2008 International Conference on Web Search and Data Mining (2008), pp. 107–116
27. Y. Sun, R. Barber, M. Gupta, C.C. Aggarwal, J. Han, Co-author relationship prediction in heterogeneous bibliographic networks, in 2011 International Conference on Advances in Social Networks Analysis and Mining (IEEE, 2011), pp. 121–128
28. Q. Yu, C. Long, Y. Lv, H. Shao, P. He, Z. Duan, Predicting co-author relationship in medical co-authorship networks. PloS one 9(7), e101214 (2014)
29. P.M. Chuan, M. Ali, T.D. Khang, N. Dey, Link prediction in co-authorship networks based on hybrid content similarity metric. Appl. Intell. 48(8), 2470–2486 (2018)
30. M. Fire, L. Tenenboim-Chekina, O. Lesser, R. Puzis, L. Rokach, Y. Elovici, Link prediction in social networks using computationally efficient topological features, in 2011 IEEE Third Int’l Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third Int’l Conference on Social Computing (2011), pp. 73–80
31. K. Julian, W. Lu, Application of machine learning to link prediction (2016)
32. G. Panagopoulos, G. Tsatsaronis, I. Varlamis, Detecting rising stars in dynamic collaborative networks. J. Infor. 11, 198–222 (2017)
33. H. Lu, M. Halappanavar, A. Kalyanaraman, Parallel heuristics for scalable community detection. Parallel Comput. 47, 19–37 (2015)
34. U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007)
35. A. Monge, C. Elkan, An efficient domain-independent algorithm for detecting approximately duplicate database records (1997)
36. S. Fortunato, Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)
37. Z. Yang, R. Algesheimer, C.J. Tessone, A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750 (2016). https://doi.org/10.1038/srep30750
38. C.C. Aggarwal, Machine Learning for Text. Springer International Publishing (2018)
39. L. Wang, K. Lo, Y. Chandrasekhar, R. Reas, J. Yang, D. Eide, P. Mooney, CORD-19: The Covid-19 Open Research Dataset. arXiv preprint arXiv:2004.10706 (2020)
40. X. Wang, X. Song, B. Li, Y. Guan, J. Han, Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision. arXiv preprint arXiv:2003.12218 (2020)
41. Z. Wang, J. Li, Z. Liu, J. Tang, Text-enhanced representation learning for knowledge graph, in Proceedings of International Joint Conference on Artificial Intelligence (IJCAI) (2016), pp. 4–17
42. W. Hamilton, Z. Ying, J. Leskovec, Inductive representation learning on large graphs, in Advances in Neural Information Processing Systems (2017), pp. 1024–1034
43. T. Mikolov, I. Sutskever, K. Chen, G.S. Corrado, J. Dean, Distributed representations of words and phrases and their compositionality, in Advances in Neural Information Processing Systems (NeurIPS) (2013), pp. 3111–3119
44. G. Nikolentzos, P. Meladianos, M. Vazirgiannis, Matching node embeddings for graph similarity, in Thirty-First AAAI Conference on Artificial Intelligence (2017)
45. P. Manghi, C. Atzori, A. Bardi, J. Shirrwagen, H. Dimitropoulos, La Bruzzo, S.F. Summan, OpenAIRE Research Graph Dump (Version 1.0.0-beta). Zenodo (2019)
46. S. Arnab, S. Zhihong, H.M. Yang Song, B.H. Darrin Eide, W. Kuansan, An overview of Microsoft academic service (MAS) and applications, in Proceedings of the 24th International Conference on World Wide Web (WWW ’15 Companion) (ACM, New York, NY, USA, 2015), pp. 243–246
47. A. Kanterakis, G. Iatraki, K. Pityanou, L. Koumakis, N. Kanakaris, N. Karacapilidis, G. Potamias, Towards reproducible bioinformatics: The OpenBio-C Scientific Workflow Environment, in Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering (BIBE) (Athens, Greece, 2019), pp. 221–226
48. G. Colavizza, R. Costas, A. Traag, N. van Eck, T. van Leeuwen, L. Waltman, A Scientometric overview of CORD-19. bioRxiv preprint (2020)
49. D. Papadopoulos, N. Papadakis, A. Litke, A methodology for open information extraction and representation from large scientific corpora: the CORD-19 data exploration use case. Appl. Sci. 10, 5630 (2020)
50. X. Guo, H. Mirzaalian, E. Sabir, A. Jaiswal, W. Abd-Almageed, CORD19STS: COVID-19 Semantic Textual Similarity Dataset. arXiv preprint arXiv:2007.02461 (2020)
Chapter 17
Biometric System De-identification: Concepts, Applications, and Open Problems
Md. Shopon, A. S. M. Hossain Bari, Yajurv Bhatia, Pavan Karkekoppa Narayanaswamy, Sanjida Nasreen Tumpa, Brandon Sieu, and Marina Gavrilova
Abstract This chapter advances information security research by integrating privacy concepts with the most recent biometric developments. Analytical discussions on how physiological, behavioral and social behavioral biometric data can be protected in various authentication applications will be presented. The chapter starts with introducing new de-identification classification, including complete de-identification, soft biometric preserving de-identification, soft biometric preserving utility retained de-identification, and traditional biometric de-identification. It then proceeds to introduce additional types of de-identification, related to emerging biometric research domains which include social behavioral biometrics, aesthetic identification, sensor-based biometrics, spatial and temporal patterns, and psychological user profiles. This chapter also provides some insights into current and emerging research in the multi-modal biometric domain and proposes for the first time multi-modal biometric system de-identification based on deep learning. It concludes with formulating open questions and investigating future directions in this vibrant research field. Answers to those questions will assist not only in the establishment of the new methods in the biometric security and privacy domains, but also provide insights into the future emerging topics in big data analytics and social network research.
Keywords Human identity · Security · Behavioral patterns · De-identification · Privacy · Risk analysis and assessment · Social behavioral biometrics
Md. Shopon · A. S. M. Hossain Bari · Y. Bhatia · P. K. Narayanaswamy · S. N. Tumpa · B. Sieu · M. Gavrilova (B) Department of Computer Science, University of Calgary, Calgary, AB, Canada
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 C.-P. Lim et al. (eds.), Handbook of Artificial Intelligence in Healthcare, Intelligent Systems Reference Library 212, https://doi.org/10.1007/978-3-030-83620-7_17
17.1 Introduction
We live in a deeply interconnected society where aspects of someone’s personal and social life, professional affiliations, shopping profiles, hobbies and interests become increasingly reliant on each other. A notable example where various facets of life become more prominent is our cyberworld profiles, or so-called online identities. In recent times, areas such as decision-making, decision fusion, artificial intelligence, pattern recognition and biometrics now have a dominant presence in the data mining domain. Extensive studies have been dedicated to intelligent methods and information fusion techniques in the information security domain [1, 2]. Recent advancements in machine learning and deep learning open up previously unexplored opportunities to extract new knowledge from publicly available data [3], thus presenting new threats to user privacy. This chapter investigates how existing biometric multi-modal systems can be advanced by integrating de-identification with other types of auxiliary information that may be available directly or indirectly to the biometric system. Analytical discussions on how this information can be extracted and thus de-identified to protect users’ privacy will be presented. This chapter also provides some insights into current and emerging research in the biometric domain, formulates open questions and investigates future directions in this vibrant research field. The answers to these questions will assist not only in the establishment of new methods in the biometric security and privacy domains, but also provide insights into the future emerging research topics in our deeply interconnected world.
17.2 Literature Review and Classification of Biometric De-identification
Privacy is an essential social and political issue in our society, characterized by a large range of enabling and supporting technologies and systems. Amongst these are multimedia, big data, biometrics, communications, data mining, internet, social networks, and audio–video surveillance [4]. The New Yorker magazine stated in an article regarding data privacy: “It has become apparent in the past year, we don’t really know who is seeing our data or how they’re using it. Even the people whose business it is to know don’t know” [5]. De-identification is one of the primary methods for protecting privacy in multimedia contents [6]. It is a process for removing personal identifiers, modifying or replacing them by substitution [4]. However, despite the pressing need for methodologies that can protect personal privacy while ensuring adequate biometric trait recognizability, de-identification has never been accentuated in biometric research.
Due to the large variety of de-identification mechanisms and their variations, a large number of definitions of de-identification can be found. For instance, Meden et al. [7] defined de-identification as follows: “De-identification is defined as the process of concealing personal identifiers or replacing them with suitable surrogates in personal information in order to prevent the disclosure and use of data for purposes unrelated to the purpose for which the data was originally collected”. Nelson et al. [8], in contrast, proposed the following definition: “De-identification refers to the reversible process of removing or obscuring any personally identifiable information from individual records in a way that minimizes the risk of unintended disclosure of the identity of individuals and information about them. It involves the provision of additional information to enable the extraction of the original identifiers by, for instance, an authorized body”. While the main goal of de-identification, protecting users’ privacy, remains unchanged, the form the process takes differs strikingly depending on the application domain or the commercial value of the designed system. In the subsequent sections, we will explore in more depth the differences among de-identification methodologies, create a taxonomy of de-identification methods, and introduce new classes of de-identification based on auxiliary biometric features.
While research into de-identification has emerged over the past decade, there has been no consistent way to classify different approaches and to reconcile various understandings of de-identification. In this book chapter, we create a classification of biometric de-identification into four categories. We separate the classes by the biometric type and the ability of a computer or a human to identify a subject. In the following, a traditional biometric refers to a specific biometric trait, either physiological or behavioral (face, iris, gait, fingerprint). Soft biometrics typically include age, gender, ethnicity, height, weight, etc.
• Complete De-identification: Complete de-identification refers to a process where the biometric modality of a person is entirely de-identified, for instance by being fully masked or obscured [9]. Neither the identity of a person based on this biometric modality nor soft biometrics of the de-identified person can be recognized. This typically holds both for human identification through visual inspection and for the more common computer-based identification. There is a problem known as “a pair-wise constraint” identification [10], which means that people can determine that two de-identified faces in a video belong to the same individual by using hairstyle, clothing, dress style or other soft biometric features as alternative information. Thus, in addition to traditional biometric de-identification, soft biometric de-identification is also necessary. Such a de-identification method is generally used in the mass media or police video footage, where sensitive information needs to be hidden. For instance, news channels may release video footage of violent activity in which sensitive information needs to be hidden from the general audience before the footage is broadcast.
• Soft Biometric Preserving De-identification: In this method, usually a particular biometric feature is de-identified while the soft biometric trait remains distinguishable [11]. The purpose of these de-identification methods is to remove the ability to identify a person using the given biometric while still retaining soft biometric traits. Neither a machine nor a human can identify the de-identified person. This approach allows, for example, the creation of a video of a similar-looking person, such that the perceived identity is changed. This makes it possible for a user to leave a natural-looking video message in a public forum in an anonymous way that would prevent face recognition technology from identifying the individual.
• Soft Biometric Preserving, Utility Retained De-identification: In this method, soft biometrics are preserved and the traditional biometric is obscured; however, the biometric system is still able to identify the person using the obscured traditional biometric. As in the previous category, the soft biometric is preserved, while it becomes significantly more challenging for humans to recognize the individual’s identity.
• Traditional Biometric Preserving De-identification: In this method, only the soft biometrics are obscured, while the traditional biometrics are preserved. For example, in a video the face of an individual remains the same while the type of clothing or hair color is changed.
Figure 17.1 depicts the proposed classification of biometric de-identification. Note that typical biometric identification corresponds to the case: traditional biometric preserved, soft biometric preserved, identifiable biometrics traditional and soft.
Fig. 17.1 Four types of biometric de-identification
In addition to the above-mentioned categories, the de-identification methods can have the following characteristics:
• Reversible De-identification: In reversible de-identification, the system needs to be developed in such a way that the modified biometrics can be reversed to their original form.
• Irreversible De-identification: In irreversible de-identification, the transformation is intentionally developed not to be reversible. Once the data is modified, it cannot be reversed back to its original identity.
Recent developments in the domain of biometrics have expanded our traditional understanding of biometric traits from physiological and behavioral to social, temporal, emotional, and other auxiliary traits.
Thus, all of the above categories can be enhanced with additional emerging biometrics such as social behavioral biometrics, temporal and spatial biometrics, emotion-based biometrics, psychological traits, online acquaintances, communication style and other auxiliary information that can be mined from a biometric sample (an image, a video or a signal). For instance, hair color, type of shoes, outdoor conditions, a style of clothing or even a person’s mental state can be considered an auxiliary biometric. We therefore propose the following new categories of biometric de-identification, based on expanding the notion of soft biometrics to include all of the aforementioned auxiliary biometrics:
• Complete de-identification
• Auxiliary biometric preserving de-identification
• Auxiliary biometric preserving utility retained de-identification
• Traditional biometric preserving de-identification.
A summary of the newly proposed biometric de-identification categories can be seen in Fig. 17.2.
Fig. 17.2 Four types of the proposed auxiliary biometric de-identification
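The four categories can be read as combinations of three yes/no properties, which the following sketch makes explicit. The class, the field names and the lookup function are hypothetical, and marking the traditional biometric preserving category as machine-identifiable is an inference from the statement that the traditional biometric is left intact. The auxiliary variant has the same shape, with "soft" read as "auxiliary".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeidCategory:
    name: str
    traditional_preserved: bool  # traditional trait left unmodified
    soft_preserved: bool         # soft (or auxiliary) traits left unmodified
    machine_identifiable: bool   # a biometric system can still identify the subject


CATEGORIES = [
    DeidCategory("complete", False, False, False),
    DeidCategory("soft biometric preserving", False, True, False),
    DeidCategory("soft biometric preserving, utility retained", False, True, True),
    # Inferred: the preserved traditional trait still identifies the subject.
    DeidCategory("traditional biometric preserving", True, False, True),
]


def classify(traditional_preserved, soft_preserved, machine_identifiable):
    """Return the matching category name, or None if no category applies."""
    if traditional_preserved and soft_preserved:
        return "no de-identification (ordinary biometric identification)"
    key = (traditional_preserved, soft_preserved, machine_identifiable)
    for category in CATEGORIES:
        if (category.traditional_preserved, category.soft_preserved,
                category.machine_identifiable) == key:
            return category.name
    return None
```

Reversibility is orthogonal to these properties and could be added as a fourth field if a system needs to record it.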
17.3 New Types of Biometric De-identification
We now introduce additional types of de-identification, related to emerging biometric research domains. These include social behavioral biometrics, aesthetic identification, sensor-based biometrics, spatial and temporal patterns, and psychological user profiles.
17.3.1 Sensor-Based Biometric De-identification
17.3.1.1 Definition and Motivation
Sensor-based biometric de-identification can be defined as the introduction of perturbation in sensor-based biometric data to obfuscate the traditional biometric, the auxiliary biometrics, or both. This definition can be further extended to obscuring the spatial and temporal biometric features to achieve biometric de-identification. RGB cameras, wearable sensors such as accelerometers and gyroscopes, marker-based sensors such as Vicon, and marker-less sensors such as Kinect are commonly applied to capture the motion of body joints in order to analyze human behavioral biometrics. In the biometric de-identification domain, the proposed complete de-identification, auxiliary biometric preserving de-identification, auxiliary biometric preserving utility-retained de-identification, and traditional biometric preserving de-identification can be designed by considering gait as the primary behavioral biometric and the estimation of age, gender, and activity as auxiliary biometrics. Furthermore, spatial and temporal features extracted over the gait sequence act as the distinguishing characteristics for the identification of primary and auxiliary biometrics. Thus, a dedicated methodology that perturbs the spatial and/or temporal features is required to achieve sensor-based biometric de-identification.
Widespread deployment of sensors in both indoor and outdoor settings has led to applications for biometric identification and verification, estimation of auxiliary information, sports, and assessment of patients in the healthcare sector [12, 13]. In many cases, a multi-modal architecture is employed to enhance the identification accuracy of the biometric system [14, 15]. Conversely, remote data acquisition using sensors compromises privacy with regard to sensitive primary and auxiliary biometric traits. Concealment of biometric traits and/or classifying features using any of the proposed biometric de-identification methods can ensure privacy and protect private data. The next section looks at how the above concepts can be applied to sensor-based biometric de-identification.
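To make the notion of perturbing spatial and temporal features concrete, the sketch below applies two simple perturbations to a skeleton-based gait sequence. It is an illustration only: the data layout (a list of frames, each a list of (x, y, z) joint coordinates), the Gaussian noise model and the bounded frame resampling are assumptions, not methods proposed in this chapter.

```python
import random


def perturb_spatial(sequence, sigma, rng):
    """Add zero-mean Gaussian noise to every joint coordinate of every frame."""
    return [[tuple(c + rng.gauss(0.0, sigma) for c in joint) for joint in frame]
            for frame in sequence]


def perturb_temporal(sequence, max_shift, rng):
    """Resample frames with a bounded random offset to blur the timing pattern."""
    last = len(sequence) - 1
    indices = [min(last, max(0, i + rng.randint(-max_shift, max_shift)))
               for i in range(len(sequence))]
    # Sorting keeps the resampled frames in chronological order.
    return [sequence[i] for i in sorted(indices)]
```

A seeded generator such as random.Random(0) can be passed as rng to make the perturbation repeatable. How strongly sigma and max_shift affect identification and auxiliary estimation would have to be measured empirically.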
17.3.1.2 Background on Sensor-Based Behavioral Biometric
Gait is one of the traditional behavioral biometrics for person identification, where the gait sequence is captured remotely, without the subject’s cooperation, using a multitude of sensors [16]. The literature on person identification from gait sequences is abundant. The GaitSet [17] deep learning architecture was proposed for gait recognition from different viewpoints using an RGB camera sensor. Furthermore, Bari and Gavrilova [18] proposed an artificial neural network for person identification using skeleton-based gait sequences acquired with the Microsoft Kinect sensor. The inertial sensor of a smartphone was utilized to control access to the smartphone through gait analysis [19]. Zou et al. [20] designed a robust deep neural network combining a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to extract spatial and temporal patterns for gait recognition using smartphone-based inertial sensors. To acquire auxiliary biometrics from gait sequences, a large population dataset was released for the estimation of soft biometrics such as age from gait sequences [21]. Li et al. [22] applied a support vector machine to estimate the age group and further estimated the age by incorporating regression analysis. Subsequently, a generative adversarial deep learning architecture was developed to improve the performance of age estimation when accessories were carried [23]. To estimate gender from wearable sensor-based gait sequences, a comparative study in [24] evaluated the performance of features extracted using a deep learning-based architecture against handcrafted features. In addition, Tang et al. [25] proposed a light-weight CNN architecture, and Li et al. [26] designed a hybrid network of CNN and LSTM architectures with domain-specific handcrafted features to identify indoor and outdoor human activities using wearable sensors.
17.3.1.3 Adaptation of Existing Methods for De-identification
Both primary biometric identification and auxiliary biometric estimation can be accomplished with an appropriately designed architecture. Therefore, perturbation needs to be introduced to obscure the primary biometric, the auxiliary biometric, or both in order to support biometric de-identification. Prior research demonstrated the obscuring of auxiliary data by introducing a deep learning-generated neural style while preserving primary biometric traits [27]. Obscuring auxiliary biometrics, such as age, gender, activity, and emotion, while still confirming the identity of a person from their gait can be a future work in sensor-based biometric de-identification. Additionally, perturbing gait patterns, or synthetically generating gait patterns, to obscure gait-based identification while preserving the auxiliary biometrics can be another future research direction. The performance of the de-identification methods in each of these directions can be evaluated using the established primary biometric identification and auxiliary biometric estimation methodologies, which are available in the literature and which operate on the original, unmodified data. To study the effect of perturbation on sensor-based biometric data, a deep learning-based approach such as a Generative Adversarial Network (GAN) [28] can be applied, where the generator of the GAN is responsible for the perturbation and the discriminators identify the primary biometric and estimate the auxiliary biometric traits, respectively. The proposed architecture for sensor-based behavioral biometric de-identification is shown in Fig. 17.3.
Based on the target de-identification method, the identification of the primary biometric, and the estimation of the auxiliary biometric will be minimized or maximized. If minimization is chosen for either biometric, the corresponding biometric will be obscured whereas maximization will result in preserving the corresponding biometric.
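The choice between obscuring and preserving each trait can be sketched as a sign applied to the corresponding discriminator loss. This is only a minimal illustration under stated assumptions: the names `Goal` and `generator_objective` are hypothetical, the losses are plain numbers standing in for discriminator outputs, and the convention that a low loss means good recognition is ours rather than something specified by the proposed architecture.

```python
from enum import Enum


class Goal(Enum):
    """What the generator should do to a trait (hypothetical names)."""
    PRESERVE = "preserve"  # recognition performance should stay high
    OBSCURE = "obscure"    # recognition performance should drop


def generator_objective(id_loss: float, aux_loss: float,
                        id_goal: Goal, aux_goal: Goal) -> float:
    """Combine two discriminator losses into one value the generator minimizes.

    Assumption: a low loss means the discriminator recognizes the trait well.
      PRESERVE adds the loss (the generator keeps it low);
      OBSCURE subtracts the loss (the generator drives it up).
    """
    sign = {Goal.PRESERVE: 1.0, Goal.OBSCURE: -1.0}
    return sign[id_goal] * id_loss + sign[aux_goal] * aux_loss


# Obscure identity while preserving the auxiliary estimate (e.g. age).
objective = generator_objective(id_loss=0.2, aux_loss=1.5,
                                id_goal=Goal.OBSCURE, aux_goal=Goal.PRESERVE)
print(objective)
```

Swapping the two goals gives the opposite configuration, in which identification is preserved and the auxiliary trait is obscured.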
Md. Shopon et al.
Fig. 17.3 Proposed architecture of sensor-based biometric de-identification
17.3.2 Emotion-Based De-identification
17.3.2.1 Definition and Motivation
To discuss de-identification in emotion recognition, we must first define what emotion recognition is and which biometrics are relevant for identifying emotions. Emotion recognition is described as identifying emotions from biometric data, either in discrete categories or on a continuous spectrum; consequently, there are many ways to do so. Emotion recognition has been readily implemented using a person’s facial expressions, and such approaches have often proven to be effective. Voice is another biometric capable of expressing human emotions [29, 30], and hence it is also used for emotion recognition. This is more popularly known as Speech Emotion Recognition and has been comprehensively researched in [31]. Naturally, textual data written by humans is also expressive of their underlying emotions and has been utilized for emotion recognition in studies like [32]. Recently, the research on emotion recognition has broadened to include biometrics such as manner of walking (gait), electroencephalogram (EEG), and electrocardiogram (ECG). Emotion-based de-identification refers to biometric de-identification while recognizing emotions. Although this domain is very specific and new, it is likely to hold high significance in the years to come. Industries use emotion recognition primarily to improve the experience of their users. Users, on the other hand, are concerned with protecting their
privacy. Introducing de-identification in this field creates a secure trade between the two by providing utility to the businesses without compromising user privacy.
17.3.2.2 Background on Emotion-Based De-identification
Although research in de-identification is only recently emerging, some remarkable works have already been published that lay down the path for future research. For instance, the authors of [33] discussed naive approaches, such as blurring, pixelization, blindfolding, and inversion of the face images, before proposing a novel method to de-identify faces while retaining only the eyes and the mouth from the original image. The authors applied adaptive filtering to smooth the facial details to the point that the software-based authentication rate fell to approximately half of the original and the human recognition rate was as low as 24%. While there is no quantitative analysis of emotion preservation in this work, the authors claimed that all the expressions in the images are well preserved. Lastly, this method hides the soft biometrics and the actual facial features that are used for recognition. Therefore, the implementation can be classified as auxiliary biometrics preserving de-identification. Building on the results of the work discussed above, the authors of [34] masked original faces with “donor” faces to de-identify an image of the original person. The results show that disgust, surprise, and neutral expressions are preserved 100% of the time, while anger and sadness are preserved more than 98% of the time. Fear and happiness, however, are preserved only 79% of the time. Similarly, other works like [35, 36] use Generative Neural Networks (GNNs) to mask original faces using donor faces while preserving emotion. The research above aimed to preserve emotion while concealing identity; however, some studies aim to achieve the opposite. For example, the authors of [37] used Cycle Generative Adversarial Networks (Cycle GANs) to transform a person’s voice to hide emotions while retaining the ability to perform personal identification and speech recognition. Thus, this work can be classified as auxiliary biometrics preserving utility retained de-identification. Here, the utility is the identification of the individual. Since de-identification is a relatively new domain that is yet to be explored thoroughly, few studies exist on the de-identification of gait, EEG, or ECG-based data. Therefore, the next subsection discusses possible research directions for emotion-based de-identification.
17.3.2.3 Adaptation of Existing Methods for De-identification
Biometrics such as gait, EEG, and ECG are gaining popularity for the emotion recognition problem and are also being researched for personal identification [38–40]. Since these biometrics have not been researched as extensively as the face, their de-identification has rarely been attempted. The particular biometric features that play a vital role in person identification are still uncertain, and
hence not many have attempted to hide those features. In [41, 42], novel techniques to identify the most significant gait features for emotion recognition were proposed. The general flowchart of such systems is presented in Fig. 17.4. Such works can be extended to learn the features that are important for gait-based person identification. The features exclusively important for identification can then be suppressed to achieve de-identification. Additionally, the work of [18] produces a representative feature vector of the input gait, and hence can be explored to produce a feature vector without personally identifiable features. In [38], the researchers proposed a method for person identification through gait videos. The work first assigns weights to different height-wise sections of the body based on the similarity of appearance to the original registration gait sample. Then, a cumulative similarity score of the matched sections is produced between
Fig. 17.4 Framework of the proposed emotion recognition method
the registered sample and the verification sample. Since different appearances introduce varied gait patterns, this work aimed to handle gait sequences in which a person’s appearance differs, for example when wearing a jacket or holding a bag. Hence, one approach to de-identify persons in gait data can be to alter the appearance of a person by adding fake paraphernalia using GNNs. This might preserve the original gait information for emotion recognition while perturbing the soft biometric traits. However, this also poses a requirement for robust gait-based emotion recognition methods. As a result, future identification systems might also adopt the robustness discussed for the pose extraction method and become unaffected by the added fake appearances.
17.3.3 Social Behavioral Biometrics-Based De-identification
17.3.3.1 Definition and Motivation
As social beings, people communicate with each other through offline and online social interactions. According to the definition of Social Behavioral Biometrics (SBB), these social interactions possess many unique features that can be used as a person’s biometric signature [43]. Over recent years, online social platforms have become highly popular for communication, and their users leave behavioral trails in the form of shared content and interactions. These behavioral trails can be used to identify individuals. Therefore, these person-identifiable footprints must be protected to preserve the users’ privacy. The concept of SBB-based de-identification can help in this regard. SBB-based de-identification hides the original SBB traits to prevent a person’s identity from being revealed.
17.3.3.2 Background on Social Behavioral Biometrics
The concept of Social Behavioral Biometrics (SBB) was introduced by Sultana et al. in 2014 [43], which makes it relatively new compared to traditional biometrics such as the face, fingerprint, voice, and gait. In [44], Sultana et al. proposed an SBB system based on the network information of Twitter users. The weighted networks are generated from the shared URLs, hashtags, retweeted and replied acquaintances, and the tweeting patterns of the users. Li et al. proposed a user identification method across social networks based on k-hop (k > 1) friendship networks, considering the uniqueness of friendship networks and the difficulty of faking them [45]. Brocardo et al. proposed a method using a Gaussian-Bernoulli deep belief network to capture the writing style of users, obtained from lexical, syntactic, and application-specific features, for continuous user authentication on Twitter [46, 47]. Tumpa and Gavrilova proposed an SBB system using users’ linguistic profiles [48]. The authors used the vocabulary sets of users to identify them. Another SBB system was proposed by the same researchers, combining linguistic and temporal profiles with the reply, retweet, shared weblink, and trendy topic networks generated
Fig. 17.5 Workflow diagram of a social behavioral biometrics system
from users’ interactions on Twitter [49]. The authors used a genetic algorithm and the weighted sum rule algorithm for fusing the SBB traits. Figure 17.5 demonstrates the workflow of an SBB system.
17.3.3.3 Adaptation of Existing Methods for De-identification
Social Behavioral Biometrics (SBB) de-identification is a novel concept introduced in this chapter. The proposed classification of de-identification, namely, complete
de-identification, auxiliary biometrics preserving de-identification, auxiliary biometrics preserving utility retained de-identification, and traditional biometric preserving de-identification, can be adapted to SBB. The linguistic profiles and the retweet, reply, hashtag, and URL networks of the users can be considered the traditional biometrics, while the sentiment of the tweets, tweet emotions, and users’ tweeting patterns can be considered the auxiliary biometrics. For complete de-identification, all traditional and auxiliary SBB features must be obscured or masked. For example, one of the traditional SBB features is the linguistic profile. The linguistic profile of a user can be masked by hiding the user’s writing style, which will also change the sentiment and emotion of the written content. Thus, both traditional and auxiliary features are obscured. In the case of auxiliary biometrics preserving de-identification, the sentiment of a user’s tweets can be preserved while changing the vocabulary of the tweets. The user can then no longer be identified using the traditional biometric, namely the linguistic profile, because this profile depends on the user’s vocabulary for identification. However, the tweets will deliver the same messages with the same sentiments, as the auxiliary biometrics are preserved. If the tweets of a user can be changed in such a way that a machine will be able to retrieve the original tweets but a human cannot, then this de-identification will be considered auxiliary biometrics preserving utility retained de-identification. For traditional biometric preserving de-identification, the sentiment can be removed from a tweet so that others will get the information expressed in the tweet but will not understand the user’s sentiment from it. These examples consider the linguistic profile as the traditional biometric and sentiment as the auxiliary biometric. A similar idea can be applied considering the reply, retweet, URL, or hashtag network as the traditional biometric and tweeting behavior or emotion as the auxiliary biometric. The de-identification of SBB systems will help to preserve the privacy of the users without interrupting the legal use of information.
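The auxiliary biometrics preserving case, in which the vocabulary changes but the sentiment stays the same, can be illustrated with a toy example. This is a sketch under strong assumptions: the `SENTIMENT` and `SUBSTITUTE` lexicons are tiny, hand-made, and purely hypothetical, and word-for-word substitution stands in for whatever text rewriting method a real SBB de-identification system would use.

```python
# Hypothetical toy lexicons; a real system would need far richer resources.
SENTIMENT = {"love": 1, "great": 1, "adore": 1, "superb": 1,
             "hate": -1, "awful": -1, "detest": -1, "dreadful": -1}
SUBSTITUTE = {"love": "adore", "great": "superb",
              "hate": "detest", "awful": "dreadful"}


def sentiment_score(tweet: str) -> int:
    """Sum the polarity of every known word in the tweet."""
    return sum(SENTIMENT.get(word, 0) for word in tweet.lower().split())


def mask_vocabulary(tweet: str) -> str:
    """Swap each mapped word for a substitute with the same polarity."""
    return " ".join(SUBSTITUTE.get(word, word) for word in tweet.lower().split())


original = "I love this great phone but hate the battery"
masked = mask_vocabulary(original)

print(masked)
print(sentiment_score(original), sentiment_score(masked))
```

The masked tweet uses different vocabulary, which would weaken a vocabulary-based linguistic profile, while the toy sentiment score is unchanged.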
17.3.4 Psychological Traits-Based De-identification
17.3.4.1 Definition and Motivation
Online social networking (OSN) platforms have evolved to become important extensions of the social fabric. Platforms such as LinkedIn, Facebook, Instagram, and Twitter emulate various avenues of everyday social interactions within the professional, personal, and public realms of our society. Social behavioral patterns provide important biometric cues and hold discriminative capability with regards to an individual’s identity [50]. The emerging area of social behavioral biometrics aims to model distinguishing characteristics that manifest within a subject’s soft-biometric traits such as the patterns in their behaviors, social interactions, and communications. Incidentally, personality traits models have been used extensively by clinical psychologists to study the underlying factors influencing an individual’s behavioral patterns [51]. While the users’ personality traits have been shown to influence the
language used to express themselves and the structure of their social network [52], this concept can be applied to the domain of social behavioral biometric recognition. In the interest of protecting users’ privacy on OSN platforms, it is important to study the de-identification of personality traits from social network data. Social network data collected for the purpose of user identification may also contain information regarding the users’ psychological traits. Moreover, the psychological traits information may also be an essential aspect of the user identification system. In such a scenario, psychological traits-based de-identification refers to the manipulation and storage of social network data in such a way that the personality traits information of users is obfuscated from third parties and from the stakeholders in the development of a user identification system, while preserving the social behavioral user recognition capability of the data.
17.3.4.2 Background on Psychological Traits De-identification
Patterns in social media activity and the contents of social media posts can be analyzed to predict the user’s psychological traits. Research has also suggested that personality expressed through OSN platforms can represent unique, permanent, and predictive models of human behavior, which can further be used for soft-biometric recognition [53]. Automated systems that classify the psychological traits of individuals via social network data use two prevalent personality scales: the Big Five model and the Myers-Briggs Type Indicator (MBTI). The groundwork for applying personality traits-aware social computing systems within the domain of social intelligence and cybersecurity was first established by Wang et al. in 2007 [54]. They showed that the semantic characteristics of an individual’s online language can reflect their underlying psychological state. Moreover, recent advances in the field of Natural Language Processing (NLP) have produced powerful language models that can rapidly extract rich feature sets from textual data to be used for further classification tasks [55]. Using such distributed representations of text has been shown to be instrumental in deciphering authors’ psychological traits from a relatively short corpus of their posts on OSN platforms [56]. A classifier trained to predict users’ psychological traits based on the discussed models can embed information about the user’s personality traits in the low-dimensional representation of data [57].
17.3.4.3 Adaptation of Existing Methods in Psychological Traits-Based Social Biometrics
The context for social behavioral biometric user recognition on Twitter was first established in [58], where a count-based metric, Term Frequency-Inverse Document Frequency (TF-IDF), was used to extract characteristic features from tweets. The authors demonstrated that the TF-IDF measure can be applied to the number of occurrences of replies and retweets to represent a friendship network. The occurrences of common sets of URLs and hashtags shared between users can be
considered as the contextual profile. In addition, temporal behaviors can be extracted by analyzing a user’s posting patterns over time to develop a real-valued representation of the user profile [44]. Follow-up studies aimed at closed-set user recognition on OSNs focus on user tweets and rely on linguistic and stylistic signals [59]. Recently, many neural networks have been trained to efficiently learn representations of graph data for tasks such as node classification, link prediction, and user identity linkage across social media platforms. These ideas allow for the formalization of a generic and reusable methodology for utilizing textual data from various types of online social networks to predict the users’ personality traits and to further discern user identity. Using Twitter as an example of a social networking platform, a schematic diagram of a generic psychological traits-aware user identification system is shown in Fig. 17.6. The first step is to convert textual, image, and/or graph data into real-valued vector representations to be processed by the subsequent individual components, such as psychological traits classification and content-based feature extraction. After obtaining the intermediate representation for each user during enrollment, we can choose to preserve only the low-dimensional representations of the OSN users and discard the original content of their posts. Thereby, any psychological traits information and social behavioral biometric traits embedded in the content are obfuscated. During verification, the trained individual components are used to extract the representation of a test example, and a similarity-based decision-making component is employed to provide the user identification functionality.
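The enroll, discard, and verify flow described above can be sketched with the TF-IDF measure and a cosine similarity decision. This is an illustrative sketch only: the user names and posts are invented, the smoothed IDF formula is one common variant chosen here, and sparse TF-IDF vectors merely stand in for the learned low-dimensional representations (unlike those, they still expose vocabulary).

```python
import math
from collections import Counter


def fit_idf(corpus):
    """Learn smoothed inverse document frequencies from tokenized documents."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Map a token list to a sparse TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Enrollment: build one template per user, then discard the original posts.
posts = {"alice": "coffee code coffee deploy".split(),
         "bob": "football goal match goal".split()}
idf = fit_idf(list(posts.values()))
templates = {user: tfidf_vector(tokens, idf) for user, tokens in posts.items()}
del posts  # only the vector templates are retained


def identify(tokens):
    """Verification: return the enrolled user whose template is most similar."""
    probe = tfidf_vector(tokens, idf)
    return max(templates, key=lambda user: cosine(probe, templates[user]))


print(identify("code coffee".split()))
```

After enrollment only `idf` and `templates` remain in memory, which mirrors the idea of keeping representations while dropping the raw content.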
Fig. 17.6 Methodology for psychological traits-based social behavioral biometric user identification
17.3.5 Aesthetic-Based Biometric De-identification
17.3.5.1 Definition and Motivation
A person’s aesthetics can be described as one’s taste or preference in content. This can include both audio and visual content, ranging from preferring certain images to listening to a favorite song. In the increasingly interconnected world, a person’s aesthetic information is widely accessible on social networks or other online platforms. Systems have been created that leverage knowledge of a user’s aesthetic information for targeted advertisements, recommendation algorithms, and adaptive content. Aesthetic-based biometric identification is a new domain that utilizes a person’s aesthetics as features for classification. Although the means exist and the data is abundant, the handling of personal aesthetic information can raise concerns regarding user privacy and security. Aesthetic-based biometric de-identification can be defined as the process of modifying or obscuring aspects of an individual’s aesthetic information. As opposed to other traditional biometrics, a person’s aesthetics are categorized as a social behavioral biometric resulting from the aggregation of preference samples. Aesthetic-based de-identification aims to obfuscate the patterns and relationships between a preference sample set and the corresponding user. The de-identification of aesthetic biometrics is crucial due to the sensitivity of aesthetic data. In the event that aesthetic information is obtained, preference samples containing personal information can be compromised, and other social attributes of the user can be deduced. The de-identification of these samples, whether complete or incomplete, hinders attempts to draw conclusions from the data. For example, research has shown that a person’s aesthetic preference can be linked to individual personality [60]. This issue can be exacerbated depending on the data quantity and quality, which motivates this form of de-identification.
17.3.5.2 Background on Aesthetic-Based Biometric
Aesthetic-based biometrics are a relatively recent trait, introduced initially for the purpose of person identification. In 2012, Lovato et al. published a proof of concept for an identification system using a set of user-liked images [61]. Although the accuracy was low, the discriminatory value of aesthetic features was established, along with a public dataset encouraging further investigation. Subsequent work by Segalin, Perina, and Cristani in 2014 made use of additional perceptual and content feature categories. An emphasis on feature optimization through a multi-resolution counting grid technique resulted in a rank-1 accuracy of 73% [62]. Azam and Gavrilova followed in 2017 using more detailed feature extraction and principal component analysis for feature reduction. A detailed set of 861 features was reduced to 700 principal components for an accuracy of 84.50% [63]. In 2019, Sieu and Gavrilova introduced the use of Gene Expression Programming (GEP) to construct composite
features, improving the accuracy to 95.1% [64]. Finally, the most recent state-of-the-art method by Bari, Sieu, and Gavrilova in 2020 utilized a convolutional neural network for automatic high-level feature discovery. The network, called AestheticNet, consists of three stages and reaches a rank-1 accuracy of 97.73% [65].
17.3.5.3 Adaptation of Existing Methods for De-identification
In the context of de-identification, many options are available for existing aesthetic-based systems. A trend of machine learning, feature engineering, and, more recently, deep learning is evident among the current state-of-the-art approaches. Thus, an adaptation of the existing methods would rely on the manipulation of the sample inputs or the network architecture. In the case of complete de-identification, algorithms exist that are able to obscure an image beyond both human and machine recognition [66]. For practical use, however, partial preservation is likely to see more application. This can be accomplished by selectively obscuring the auxiliary trait prior to input, for example by applying a grey color filter to color images. The color may be inferable by humans, but systems that rely on the pixel color or gradient of the object in question will falter. In the case that the sample must be unrecognizable to humans yet identifiable by the system, pixelation or noise can be added to the image, and a de-pixelating substructure within the network can then rectify the transformation [67] (Fig. 17.7).
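The two input manipulations mentioned above, a grey filter and pixelation, can be sketched on a tiny in-memory image. The image format (a list of rows of RGB tuples), the function names, and the block-averaging form of pixelation are assumptions made for illustration; the luminance weights are the standard ITU-R BT.601 coefficients. The de-pixelating substructure of [67] is not reproduced here.

```python
def to_grey(image):
    """Replace each RGB pixel with its luminance (ITU-R BT.601 weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in image]


def pixelate(grey, block):
    """Replace every block x block tile of a greyscale image by its mean."""
    height, width = len(grey), len(grey[0])
    out = [row[:] for row in grey]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [grey[y][x] for y in rows for x in cols]
            mean = round(sum(tile) / len(tile))
            for y in rows:
                for x in cols:
                    out[y][x] = mean
    return out


# A 2 x 2 color image: red, green / blue, white.
image = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
grey = to_grey(image)
coarse = pixelate(grey, block=2)
print(grey)
print(coarse)
```

The grey filter removes the color cue while keeping shape information, whereas pixelation removes fine spatial detail; which of the two suits a given system depends on which trait is meant to be obscured.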
17.4 Multi-Modal De-identification System
17.4.1 Definition and Motivation
Many existing biometric systems deployed in real-world scenarios are primarily unimodal, which means only one biometric trait is relied upon. In unimodal biometric systems, intraclass variation, non-universality, interclass variation, and noisy data are some of the problems that can result in high False Acceptance Rates (FAR). Multi-modal biometrics refers to the utilization of two or more biometric traits in
Fig. 17.7 Two example inputs from the dataset collected by Lovato et al.
an identification or verification system. Incorporating multiple biometric traits in a biometric identification system generally increases the accuracy of the system. Multi-modal systems can also mitigate the effect of weaker modalities, which is one of their primary uses [68]. Multi-modal biometric systems have seen widespread commercial use, and many of them use various fusion techniques and wavelet-based implementations [69–73]. Hariprasath and Prabakar [70] proposed a multi-modal identification system with palmprint and iris score-level fusion and the Wavelet Packet Transform (WPT). Murakami and Takahashi [74] utilized face, fingerprint, and iris biometric modalities with a Bayes decision rule-based score-level fusion technique for identification. Ayed et al. [75] developed a system based on face and fingerprint using Local Binary Patterns (LBP) and Gabor wavelets, in which a weighted sum-based match-level fusion allowed for an increase in accuracy.
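Weighted-sum fusion of match scores, the simplest of the fusion rules mentioned above, can be sketched as follows. The scores, weights, and the 0.5 decision threshold are invented for illustration, and the function names are hypothetical; the cited systems use their own score normalization and weighting schemes.

```python
def fuse_scores(scores, weights):
    """Weighted-sum fusion of per-modality match scores, each in [0, 1]."""
    total_weight = sum(weights[m] for m in scores)
    return sum(weights[m] * s for m, s in scores.items()) / total_weight


def accept(scores, weights, threshold=0.5):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# A strong face match compensates for a weak (e.g. noisy) fingerprint match.
scores = {"face": 0.9, "fingerprint": 0.4}
weights = {"face": 0.7, "fingerprint": 0.3}
print(fuse_scores(scores, weights), accept(scores, weights))
```

On its own, the fingerprint score of 0.4 would fall below the threshold, which illustrates how fusion can mitigate the effect of a weaker modality.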
17.4.2 Deep Learning Architecture
Recently, deep learning has attracted great interest in biometric de-identification. In every area, from text to video de-identification, deep learning has shown promising performance. Architectures like autoencoders, neural style transfer, Generative Adversarial Networks (GANs), and Recurrent Neural Networks (RNNs) have been useful for de-identification.
17.4.2.1 Autoencoder
An autoencoder is a neural network architecture used in unsupervised machine learning. It is trained with backpropagation, with the target values set equal to the input representation [76]. It comprises two distinct neural networks: an encoder and a decoder. The encoder compresses the input data into a latent representation, and the decoder reconstructs the original input data from that latent representation. In de-identification, autoencoders are used to re-generate the input data from the compressed representation so that the biometric is concealed.
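The encoder and decoder roles can be illustrated with a deliberately tiny linear autoencoder that compresses two-dimensional points into a single latent value. This is a sketch under stated assumptions: the data, initial weights, learning rate, and epoch count are arbitrary choices, the model has no nonlinearity or bias terms, and it demonstrates only the reconstruction training in which the target equals the input, not a de-identification system.

```python
def train_autoencoder(data, epochs=200, lr=0.01):
    """Train a 2 -> 1 -> 2 linear autoencoder with plain gradient descent."""
    enc = [0.5, 0.5]  # encoder weights: latent z = enc . x
    dec = [0.5, 0.5]  # decoder weights: reconstruction x_hat = dec * z
    for _ in range(epochs):
        for x in data:
            z = enc[0] * x[0] + enc[1] * x[1]
            err = [dec[i] * z - x[i] for i in range(2)]  # target is the input
            # Gradients of 0.5 * ||x_hat - x||^2 via the chain rule.
            grad_dec = [err[i] * z for i in range(2)]
            grad_z = err[0] * dec[0] + err[1] * dec[1]
            grad_enc = [grad_z * x[i] for i in range(2)]
            for i in range(2):
                dec[i] -= lr * grad_dec[i]
                enc[i] -= lr * grad_enc[i]
    return enc, dec


def reconstruction_error(data, enc, dec):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        z = enc[0] * x[0] + enc[1] * x[1]
        total += sum((dec[i] * z - x[i]) ** 2 for i in range(2))
    return total / len(data)


# Points on the line y = 2x, so one latent dimension is enough.
data = [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0), (0.5, 1.0), (-2.0, -4.0)]
before = reconstruction_error(data, [0.5, 0.5], [0.5, 0.5])
enc, dec = train_autoencoder(data)
after = reconstruction_error(data, enc, dec)
print(before, after)
```

Because the latent code holds only one number per sample, any detail that does not fit through this bottleneck is lost in the reconstruction, which is the property a de-identifying autoencoder relies on.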
17.4.2.2 Neural Style Transfer
Neural Style Transfer is a technique that takes two images as its input: a style reference image and a content image. The two images are combined such that the resultant image contains the core elements of the content image but appears to be “painted” in the style of the reference image [77].
17.4.2.3 Generative Adversarial Networks
Generative Adversarial Networks (GANs) are a class of machine learning models that aim to mimic a given data distribution. They were first proposed by Goodfellow et al. in 2014 [78]. A GAN consists of two distinct neural networks: a generator that is trained to generate synthetic data, and a discriminator that is trained to discriminate between synthetic and real data.
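The opposing goals of the two networks can be made concrete by writing out their losses for a single real and a single synthetic sample. This sketch assumes the standard minimax formulation of Goodfellow et al.; the arguments `d_real` and `d_fake` are hypothetical names for the discriminator's output probabilities D(x) and D(G(z)), and no networks are actually trained.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy with real samples labeled 1 and synthetic ones 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Minimax generator term log(1 - D(G(z))), which the generator minimizes."""
    return math.log(1.0 - d_fake)


# A confident, correct discriminator has a low loss ...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ... while a generator that fools it (D(G(z)) near 1) has a low loss instead.
print(generator_loss(d_fake=0.9), generator_loss(d_fake=0.1))
```

When the discriminator can no longer tell the two apart, it outputs 0.5 for both inputs and its loss settles at 2 ln 2.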
17.4.3 Multi-Modal De-identification Methodology
To perform multi-modal biometric de-identification, the biometric modalities to be de-identified must be chosen first. The biometric modalities of interest should be extracted individually from the raw data for processing. After extracting the biometric modalities, the type of de-identification should be selected, and each biometric modality should then be de-identified separately. Finally, the de-identified biometric modalities need to be combined with the original data. Identification or verification is then performed depending on the type of de-identification. The general framework for multi-modal de-identification is depicted in Fig. 17.8. Earlier in this chapter, we introduced four different types of de-identification. We here discuss how those four categories can be applied to multi-modal biometric systems. Complete De-identification is irrelevant for a multi-modal system, as the de-identified data will not allow biometric user recognition to be performed. Soft Biometric Preserving Multi-modal De-identification can be adapted as follows: while choosing the biometric modalities for de-identification, only the traits that fall into the category of traditional biometrics are de-identified, and the rest of the modalities remain unchanged. For Soft Biometric Preserving, Utility Retained Multi-modal De-identification, the traditional biometric modalities that need to be de-identified will still be recognized by the individual biometric recognition modules. Finally, Traditional Biometric Preserving Multi-modal De-identification requires obscuring particular soft or auxiliary biometric modalities while leaving the traditional modalities in their original form.
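The steps of this methodology (choose modalities, de-identify each one separately, merge the result with the untouched data) can be sketched as a small pipeline. Everything named here is hypothetical: the record layout, the `deidentify_multimodal` function, and the placeholder `mask` de-identifier stand in for real extraction and de-identification modules.

```python
def deidentify_multimodal(raw, selected, deidentifiers):
    """De-identify the selected modalities and merge them with the rest.

    raw           -- dict mapping modality name to its extracted data
    selected      -- modalities chosen for de-identification
    deidentifiers -- dict mapping modality name to a callable (placeholder)
    """
    result = dict(raw)  # start from a copy of the original data
    for modality in selected:
        # Each modality is handled separately by its own de-identifier.
        result[modality] = deidentifiers[modality](raw[modality])
    return result


def mask(template):
    """Placeholder de-identifier that hides the template entirely."""
    return "<masked>"


# Soft biometric preserving case: only traditional modalities are changed.
TRADITIONAL = {"face", "gait"}
record = {"face": "face-template", "gait": "gait-template", "age": 34}
output = deidentify_multimodal(record, TRADITIONAL,
                               {name: mask for name in TRADITIONAL})
print(output)
```

Selecting the soft modalities instead of the traditional ones would give the traditional biometric preserving variant with the same function.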
Fig. 17.8 Flowchart of the multi-modal de-identification system
17.4.4 Potential Applications of Multi-Modal Biometric De-identification
17.4.4.1 Multi-Biometric Data Breaching
A multi-modal biometric authentication system contains multiple biometric traits for authentication. Due to this property, compromising a multi-modal biometric authentication system may reveal information spanning multiple modalities. This may lead to multi-modal spoof attacks in other places where similar biometrics are being used. Consider the following case that illustrates the above concept. Person X provided his/her face, fingerprint, and iris information for company A, face information for
company B, and iris information for company C. Now, if the data of person X is leaked from company A, it will be easier for the attacker to spoof the authentication systems of companies B and C, as both use biometric modalities that company A also holds. Multi-biometric de-identification can prove useful in this case to protect some of the user data even in the case of a breach.
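The scenario with person X and companies A, B, and C can be written out as a small set computation. This is an illustrative sketch only: the function `exposed_systems` and the idea of passing a set of `protected` (de-identified) modalities are assumptions introduced here to show how de-identification would shrink the exposure.

```python
# Modalities that person X enrolled with each company (from the scenario).
enrolled = {"A": {"face", "fingerprint", "iris"},
            "B": {"face"},
            "C": {"iris"}}


def exposed_systems(enrolled, breached, protected=frozenset()):
    """Return, per other system, the modalities an attacker could reuse.

    protected -- modalities stored in de-identified form at the breached
                 company, assumed here to be useless to the attacker.
    """
    leaked = enrolled[breached] - set(protected)
    return {name: traits & leaked
            for name, traits in enrolled.items()
            if name != breached and traits & leaked}


print(exposed_systems(enrolled, "A"))
print(exposed_systems(enrolled, "A", protected={"iris"}))
```

With no protection, a breach at company A exposes both B and C; de-identifying the iris data at A removes C from the set of exposed systems.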
17.4.4.2 Health State Prediction
Recent advancements in machine learning have made it possible to predict a person’s health condition from various biometric sources, such as face [79] and gait [42]. An individual’s health condition is widely considered highly sensitive and confidential information. Multi-modal biometric systems hold biometric information from which a person’s health state can be predicted. Misuse of this information may result in the violation of user privacy. Thus, the use of multi-modal de-identification systems becomes essential.
17.4.4.3 Identity Theft
Identity theft is one of the most common privacy invasions that can be carried out using biometric information. Leakage of multi-modal biometric information can worsen the repercussions of identity theft, as the attacker will have access to more than one modality of biometric information and can impersonate the victim more accurately. The de-identification of certain non-essential biometric traits can mitigate the exploitation of some user data in the case of identity theft.
17.5 Potential Applications in Risk Assessment and Public Health
This section summarizes the above discussion by providing a gamut of applications of emerging de-identification research.
Privacy Agreement in Surveillance: Anonymization of primary or auxiliary biometric data enhances the privacy of the public. If the gait sequence is perturbed in such a way that primary biometric identification is successful whereas auxiliary biometric traits are estimated poorly, or vice versa, this solution can be applied to surveillance to meet the privacy agreement.
Gait-based Health Care: Individuals can exhibit postural problems that can be diagnosed through static posture and gait analysis [80]. In such a case, the primary biometric can be readily de-identified while preserving auxiliary biometric traits, such as age, gender, activity, and emotion.
Privacy in Risk Assessment in Public Places: Since a person’s emotional state can be estimated using gait analysis [42], this opens the opportunity to develop applications for the health sector. Depending on the data protection requirements, primary biometrics can be obscured while preserving the auxiliary information for further psychological analysis. Analysis of emotional state can further be applied in the surveillance of public places to estimate the risk of unwanted events by continuously monitoring the emotional state of the public. In such a situation, the de-identification of the primary biometric can ensure the data privacy of an individual.
Advertisement: One reason why many social media companies mine their users’ data is to identify their interests so that relevant product recommendations can be shown to them. Naturally, this raises concerns among users regarding their data and privacy. De-identifying the corresponding sensitive data while still understanding a user’s emotion towards certain products can serve as a substitute for, or a supplement to, data mining. As an added benefit, users might be more inclined to opt into such an approach, since it protects their privacy.
Entertainment: Another possible use of emotion recognition is adaptive entertainment experiences. For instance, movies and/or video games that change the narrative based on the user’s emotional responses can be created. However, such applications require the storage and analysis of user information. Users might be more willing to participate when user data is protected and anonymized.
Medicine: There are already many applications that predict and identify mental and/or physical illnesses by monitoring user emotions. As with the other applications discussed above, users might be more willing to opt in to such services knowing that their identities and other sensitive data are safe.
Robotics: A classical goal in robotics is the creation of human-like robots. Emotion recognition can facilitate this goal through the identification of important emotional features and their simulation in robots. However, such an application requires research that uses private and sensitive data. Hence, de-identification can serve as a pre-processing step for such data before further use.
Cyberspace Surveillance: The surveillance of cyberspace and the gathering of suspicious information are necessary for maintaining a secure environment in the cyberworld. Government-authorized agents have been known to surveil social networks by disguising themselves among other users. Social Behavioral Biometrics-based de-identification can aid security agents in the covert observation and anonymous moderation of cyberspaces.
Continuous Authentication: Continuous authentication refers to a technology that verifies users on an ongoing basis to provide identity confirmation and cybersecurity protection. Social Behavioral Biometrics (SBB) authenticates users on social networking sites continuously without any active participation of the users. The templates of users’ writing patterns and acquaintance network information must be stored in the database for SBB authentication. Instead of storing the identifying templates directly, SBB-based de-identification techniques can be applied to the templates to preserve the security of the system.
Protecting Anonymity: Authorized officials often publish case studies and written content of cybercrime victims to create public awareness. In such cases, social
networking portals and blogs are used as convenient media to disseminate information. Typically, the identities of victims are kept anonymous. However, a victim’s identity might be disclosed by comparing the written content with their online social networking activities. Therefore, de-identification of these published materials helps to protect anonymity where the writers’ identities must be kept confidential.
Multi-Factor Authentication: Leveraging the discriminative ability of an individual’s aesthetic information, a multi-factor authentication system can be implemented. As a remote and accessible biometric, aesthetic identification can provide additional security when another modality is compromised. De-identification in this context would preserve the security of the system when storing a user’s preference template.
Adaptive Caregiving: Given the aesthetic preference information of a user, personalized caretaking behaviors can be learned by a service robot agent. These identified connections can allow for more efficient and directed responses when exposed to aesthetic stimuli from clients. The ability of an intelligent system to analyze aesthetic information and exhibit realistic interactions has high potential. De-identification of this data can preserve client privacy.
Digital Identification: When supplied with a sample of images from either a computer hard drive or an online account, the individual most aligned with those aesthetic features can be determined. In scenarios where testimonies are unreliable or unobtainable, aesthetic identification can be used to extract more information. This can be applied to online authorship identification, border control, and forensic investigation. De-identification would ensure the confidentiality of the individual.
Psychology: A psychological traits de-identification system can be used to protect sensitive user information and to implement privacy-preserving user identification systems. Furthermore, this concept can be applied to user behavior modeling problems such as predicting the likelihood of taking a particular action, for example, clicking on a particular ad. Finally, psychological traits-based de-identification can be used in conjunction with other privacy-preserving measures such as data anonymization to further ensure the protection of OSN user data.
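The Continuous Authentication application above calls for storing de-identified SBB templates rather than the identifying templates themselves. The chapter does not prescribe a technique for this. As one possible illustration, the sketch below uses a keyed random projection, a generic cancelable-template idea swapped in here for concreteness. The function names, the 64-dimensional feature vector, and the key values are all hypothetical.

```python
import numpy as np

def projection_matrix(key: int, in_dim: int, out_dim: int) -> np.ndarray:
    """Derive a reproducible random projection from a revocable integer key."""
    rng = np.random.default_rng(key)
    return rng.standard_normal((out_dim, in_dim)) / np.sqrt(out_dim)

def protect_template(template: np.ndarray, key: int, out_dim: int = 16) -> np.ndarray:
    """Project a raw feature vector; only the projected vector would be stored."""
    return projection_matrix(key, template.shape[0], out_dim) @ template

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two protected templates."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
enrolled = rng.standard_normal(64)                  # hypothetical SBB feature vector
probe = enrolled + 0.05 * rng.standard_normal(64)   # same user, later session (assumed noise)

stored = protect_template(enrolled, key=1234)
same_key = cosine(stored, protect_template(probe, key=1234))     # matching under the issued key
new_key = cosine(stored, protect_template(enrolled, key=9999))   # template re-issued under a new key
```

Under these assumptions, a probe from the same user still matches the stored template when the same key is used, while a template re-issued under a new key is typically only weakly correlated with the old one. A projection of this kind is not by itself a security guarantee; it only illustrates the idea of matching in a transformed, revocable domain.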
17.6 Open Problems
17.6.1 Open Problems of Sensor-Based Biometric De-identification
Several open problems of sensor-based biometric de-identification were identified in this study. First, the effects of perturbing the original data on the identification of primary biometrics and the estimation of auxiliary biometrics have not been investigated. Second, the design of innovative deep learning architectures for sensor-based biometric de-identification is an avenue of potential research leading towards the development of a practical solution. Third, the acceptable degree to which one biometric may be obscured while preserving others is open to discussion. Fourth, since certain behavioral biometrics may change over time, the procedure for adapting biometric de-identification to the updated behavioral biometric requires further analysis.
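The first and third open problems above concern how perturbation affects primary identification and auxiliary estimation, and how much obscuring is acceptable. One way such a trade-off could be examined empirically is sketched below on purely synthetic data. Everything here is an assumption made for illustration: the feature layout (eight identity-bearing dimensions followed by two auxiliary-trait dimensions), the Gaussian perturbation, and the simple nearest-centroid and threshold classifiers. Real sensor data would not separate so cleanly.

```python
import numpy as np

def make_data(n_subjects=20, per_subject=10, seed=0):
    """Synthetic features: 8 identity-bearing dims followed by 2 auxiliary-trait dims."""
    rng = np.random.default_rng(seed)
    centers = 5.0 * rng.standard_normal((n_subjects, 8))
    ids = np.repeat(np.arange(n_subjects), per_subject)
    aux = rng.integers(0, 2, size=ids.size)          # e.g. a binary activity label
    id_part = centers[ids] + 0.3 * rng.standard_normal((ids.size, 8))
    aux_part = (2 * aux - 1)[:, None] + 0.2 * rng.standard_normal((ids.size, 2))
    return np.hstack([id_part, aux_part]), ids, aux

def identification_accuracy(gallery, gallery_ids, probes, probe_ids):
    """Nearest-centroid identification of probes against clean enrolled data."""
    labels = np.unique(gallery_ids)
    centroids = np.stack([gallery[gallery_ids == s].mean(axis=0) for s in labels])
    dists = np.linalg.norm(probes[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.mean(labels[dists.argmin(axis=1)] == probe_ids))

def auxiliary_accuracy(probes, aux):
    """Threshold classifier on the auxiliary dims (the last two columns)."""
    pred = (probes[:, 8:].mean(axis=1) > 0).astype(int)
    return float(np.mean(pred == aux))

def sweep(sigmas, seed=1):
    """For each noise level, perturb only the identity dims and score both tasks."""
    X, ids, aux = make_data()
    rng = np.random.default_rng(seed)
    rows = []
    for sigma in sigmas:
        Xp = X.copy()
        Xp[:, :8] += sigma * rng.standard_normal((X.shape[0], 8))
        rows.append((sigma,
                     identification_accuracy(X, ids, Xp, ids),
                     auxiliary_accuracy(Xp, aux)))
    return rows

results = sweep([0.0, 1.0, 5.0, 20.0, 50.0])
for sigma, id_acc, aux_acc in results:
    print(f"sigma={sigma:5.1f}  identification={id_acc:.2f}  auxiliary={aux_acc:.2f}")
```

In this toy setting the auxiliary accuracy is unaffected because the perturbation never touches the auxiliary dimensions. The open problem is precisely that real biometric signals entangle the two, so the "acceptable" noise level would have to be read off a curve like this one measured on real data.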
17.6.2 Open Problems of Gait and Gesture De-identification
Existing de-identification approaches for gait and gesture are based on the blurring technique [81]. The difficulty with gait and gesture de-identification is how to obscure the characteristics of an individual’s movement and/or walking patterns while retaining the naturalness of the de-identified video. To the best of our knowledge, there are no published research reports dedicated to gait and gesture de-identification.
17.6.3 Open Problems of Emotion-Based De-identification
De-identification while preserving emotional information is a very new area of research, and it has been more prevalent with faces than with any other biometric. For gait, EEG, and ECG, the most significant features for person identification are unknown. Hence, the first step with these biometrics will be to identify the particular biometric features that are crucial for personal identification. Subsequently, methods must be developed to obscure the personal identification features while retaining the emotional information in the data. Face emotion-based de-identification research, in contrast, has already produced some promising results. Hence, increasing the person identification error is a likely future research direction for facial emotion recognition systems that preserve emotion while de-identifying the subject.
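One way to state this goal formally, offered here as an assumed formulation rather than one taken from the works cited in this chapter, is as a two-term objective. Let $T_\theta$ be a hypothetical transformation applied to a sample $x$ with identity label $y$ and emotion label $e$, and let $\mathcal{L}_{\mathrm{id}}$ and $\mathcal{L}_{\mathrm{emo}}$ denote the losses of an identity recognizer and an emotion recognizer, respectively:

```latex
\min_{\theta} \; \mathbb{E}_{(x, y, e)} \Big[ \mathcal{L}_{\mathrm{emo}}\big(T_\theta(x), e\big) \;-\; \lambda \, \mathcal{L}_{\mathrm{id}}\big(T_\theta(x), y\big) \Big], \qquad \lambda > 0
```

Minimizing the first term retains the emotional information, while the negated second term pushes the person identification error up. The weight $\lambda$ sets the balance between the two and would have to be chosen per application.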
17.6.4 Open Problems of Social Behavioral De-identification
Social Behavioral Biometrics (SBB) is a new and promising biometric trait. Therefore, the domain of SBB has many facets to explore and improve, as does the de-identification of SBB traits. Methods for de-identifying different acquaintance networks without raising public suspicion are a possible direction for future research. Changing the writing style of tweets while preserving emotions and information, and vice versa, is an interesting field to explore. The reversibility of de-identified SBB traits back to the originals, and subsequent measures to make such disclosure more difficult, can also be investigated.
17.6.5 Open Problems of Psychological Traits-Based De-identification
There are many potential avenues for applying the concept of psychological trait-based de-identification within the domain of privacy-preserving social behavioral biometrics. However, certain open problems should be considered. Firstly, although clinical research indicates the permanence of psychological traits among adults, these traits can change over time due to significant life events and circumstances. Moreover, psychological traits factorize a wide range of human behaviors into a fixed number of labels. Therefore, any de-identification of psychological traits may lead to the loss of a nuanced representation of user-generated content, which may reduce accuracy on the downstream prediction task. Secondly, the degree to which a dataset is de-identified may not be directly measurable. As humans may not be able to readily discern psychological traits from user content, it is difficult to ascertain whether the information regarding psychological traits is truly obfuscated from automated systems.
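Since the degree of de-identification cannot be observed directly, one indirect proxy is to train an automated attacker on the de-identified content and compare its accuracy with chance. The sketch below shows such a normalization. It is an illustrative assumption, not a measure proposed in this chapter: the function name is hypothetical, the classes are assumed to be balanced, and a low score only says that this particular attacker failed, not that the trait is truly hidden.

```python
def residual_leakage(attacker_accuracy: float, n_classes: int) -> float:
    """Scale attacker accuracy so that chance level maps to 0 and perfect recovery to 1.

    Assumes balanced classes, so that chance accuracy is 1 / n_classes.
    """
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 <= attacker_accuracy <= 1.0:
        raise ValueError("attacker_accuracy must lie in [0, 1]")
    chance = 1.0 / n_classes
    return max(0.0, (attacker_accuracy - chance) / (1.0 - chance))

# Hypothetical attacker predicting a high/low label for one personality trait.
before = residual_leakage(0.75, n_classes=2)   # on the original content
after = residual_leakage(0.50, n_classes=2)    # on the de-identified content
```

With these hypothetical accuracies, the score drops from 0.5 to 0.0, which would indicate that this attacker no longer performs better than guessing.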
17.6.6 Open Problems of Aesthetic-Based Biometric De-identification
Although adaptations and potential applications exist, many open problems for aesthetic-based biometric de-identification remain. The variability of human aesthetics can pose a challenge when performing de-identification. Unlike other forms of biometrics such as iris or fingerprints, what is considered a primary or an auxiliary trait may be malleable depending on the extracted features. Thus, the behavior of the system when obfuscating certain features can be unpredictable. It can also be difficult to gauge what is recognizable by humans or by a biometric system, especially given the social-behavioral nature of an individual’s aesthetics. The degree to which the samples must be modified can be unclear or may require a dynamic response based on the methodology used. As aesthetic-based biometrics is itself a developing domain, other modalities, new architectures, and interdomain correlations are all avenues with high potential.
17.6.7 Open Problems of Multi-Modal De-identification
Multi-modal biometric de-identification is a concept proposed in this book chapter. To the best of our knowledge, no prior research has been conducted in this area. Common multi-modal biometric authentication systems involve traditional biometrics such as face, fingerprint, iris, palm, ear, etc. Using the aforementioned traditional biometrics, it is possible to predict an individual’s emotion [80] and medical conditions [82].
De-identification to protect health information is a newly proposed area of research. Research on concealing information in a single biometric [83] has gained attention in the last couple of years. However, no work on removing soft biometric information from multi-modal biometric information has been conducted as yet. Because biometric modalities can be combined in several ways, no research has yet examined which combinations are most suitable for multi-modal de-identification. In addition, some applications of multi-modal de-identification may need all the biometric traits to be obscured, while others may need only particular traits to be modified. This is a rich avenue for future investigations.
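The observation that some applications need every trait obscured while others need only particular traits modified can be pictured as a per-application policy. The sketch below is a minimal illustration under stated assumptions: the `DeidPolicy` class, the policy names, the modality names, and the placeholder `obscure` function are all hypothetical, and a real system would substitute an actual de-identification routine for each modality.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict

class Action(Enum):
    KEEP = "keep"
    OBSCURE = "obscure"

@dataclass(frozen=True)
class DeidPolicy:
    """Hypothetical per-application policy mapping a modality name to an action."""
    name: str
    actions: Dict[str, Action]

def apply_policy(sample: Dict[str, Any], policy: DeidPolicy,
                 obscure: Callable[[Any], Any] = lambda _: None) -> Dict[str, Any]:
    """Return a copy of `sample` with each modality kept or obscured.

    Modalities missing from the policy are obscured, as a conservative default.
    """
    return {
        modality: data
        if policy.actions.get(modality, Action.OBSCURE) is Action.KEEP
        else obscure(data)
        for modality, data in sample.items()
    }

# Two illustrative policies: obscure every trait, or keep a single one.
FULL = DeidPolicy("obscure_everything",
                  {m: Action.OBSCURE for m in ("face", "fingerprint", "iris")})
SELECTIVE = DeidPolicy("keep_fingerprint_only",
                       {"face": Action.OBSCURE, "fingerprint": Action.KEEP,
                        "iris": Action.OBSCURE})

sample = {"face": "face-template", "fingerprint": "fp-template", "iris": "iris-template"}
full_out = apply_policy(sample, FULL)
selective_out = apply_policy(sample, SELECTIVE)
```

Treating unknown modalities as obscured is a design choice made here so that a newly added sensor never leaks by omission. Which combinations of kept and obscured modalities are actually suitable remains the open question stated above.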
17.7 Conclusion
This chapter provided a comprehensive overview of the domain of biometric information de-identification to ensure user data privacy. Analytical discussions of how physiological, behavioral, and social-behavioral biometric data can be protected in various authentication applications were presented. Drawing on the most recent developments in the biometric security domain, the chapter introduced new de-identification paradigms: social biometric de-identification, aesthetic-based de-identification, context-based de-identification, spatial, temporal, emotion-based de-identification, and sensor-based de-identification. It also proposed the concept of multi-modal biometric system de-identification. A large number of potential applications of the introduced concepts in the public health and safety domains were described. The chapter concluded by formulating open questions and investigating future directions in this vibrant research field. Answers to those questions will not only assist in establishing new methods in the biometric security and privacy domains, but also provide insights into emerging topics in big data analytics and social networking research.
Acknowledgements The authors would like to thank the Natural Sciences and Engineering Research Council of Canada for partial support of this research in the form of the NSERC Discovery Grant #10007544 and NSERC Strategic Planning Grant #10022972.
References
1. L.C. Jain, U. Halici, I. Hayashi, S. Lee, S. Tsutsui, Intelligent Biometric Techniques in Fingerprint and Face Recognition, vol. 10 (CRC press, 1999)
2. L.C. Jain, N. Martin, Fusion of Neural Networks, Fuzzy Systems and Genetic Algorithms: Industrial Applications, vol. 4 (CRC press, 1998)
3. G.A. Tsihrintzis, L.C. Jain, Machine Learning Paradigms: Advances in Deep Learning-based Technological Applications, vol. 18 (Springer Nature, 2020)
4. S. Ribaric, A. Ariyaeeinia, N. Pavesic, De-identification for privacy protection in multimedia content: a survey. Sig. Process. Image Commun. 47, 131–151 (2016)
5. L. Menand, Why Do We Care So Much About Privacy? (The New Yorker, 2018)
6. S.L. Garfinkel, De-identification of Personal Information (NIST Technical Series Publications, 2015)
7. B. Meden, P. Peer, V. Struc, Selective face deidentification with end-to-end perceptual loss learning, in IEEE International Work Conference on Bioinspired Intelligence (IWOBI) (IEEE, 2018), pp. 1–7
8. G.S. Nelson, Practical implications of sharing data: a primer on data privacy, anonymization, and de-identification, in SAS Global Forum Proceedings (2015), pp. 1–23
9. X. Yu, K. Chinomi, T. Koshimizu, N. Nitta, Y. Ito, N. Babaguchi, Privacy protecting visual processing for secure video surveillance, in 15th IEEE International Conference on Image Processing (IEEE, 2008), pp. 1672–1675
10. L. Meng, Z. Sun, O.T. Collado, Efficient approach to de-identifying faces in videos. IET Signal Proc. 11(9), 1039–1045 (2017)
11. E.M. Newton, L. Sweeney, B. Malin, Preserving privacy by de-identifying face images. IEEE Trans. Knowl. Data Eng. 17(2), 232–243 (2005)
12. M.L. Gavrilova, F. Ahmed, A.H. Bari, R. Liu, T. Liu, Y. Maret, B.K. Sieu, T. Sudhakar, Multimodal motion-capture-based biometric systems for emergency response and patient rehabilitation, in Design and Implementation of Healthcare Biometric Systems (IGI Global, 2019), pp. 160–184
13. F. Ahmed, P. Polash Paul, M.L. Gavrilova, Kinect-based gait recognition using sequences of the most relevant joint relative angles. J. WSCG 23(2), 147–156 (2015)
14. M.M. Monwar, M. Gavrilova, Y. Wang, A novel fuzzy multimodal information fusion technology for human biometric traits identification, in IEEE 10th International Conference on Cognitive Informatics and Cognitive Computing (ICCI-CC’11) (IEEE, 2011), pp. 112–119
15. M.M. Monwar, M. Gavrilova, Markov chain model for multimodal biometric rank fusion. SIViP 7(1), 137–149 (2013)
16. A.K. Jain, A.A. Ross, K. Nandakumar, Introduction to Biometrics (Springer Science & Business Media, 2011)
17. H. Chao, Y. He, J. Zhang, J. Feng, Gaitset: regarding gait as a set for cross-view gait recognition. Proc. AAAI Conf. Artif. Intell. 33, 8126–8133 (2019)
18. A.H. Bari, M.L. Gavrilova, Artificial neural network-based gait recognition using kinect sensor. IEEE Access 7, 162708–162722 (2019)
19. J.R. Kwapisz, G.M. Weiss, S.A. Moore, Cell phone-based biometric identification, in Fourth IEEE International Conference on Biometrics: Theory, Applications and Systems (BTAS) (IEEE, 2010), pp. 1–7
20. Q. Zou, Y. Wang, Q. Wang, Y. Zhao, Q. Li, Deep learning-based gait recognition using smartphones in the wild. IEEE Trans. Inf. Forensics Secur. 15, 3197–3212 (2020)
21. C. Xu, Y. Makihara, G. Ogi, X. Li, Y. Yagi, J. Lu, The ou-isir gait database comprising the large population dataset with age and performance evaluation of age estimation. IPSJ Trans. Comput. Vis. Appl. 9(1), 24 (2017)
22. X. Li, Y. Makihara, C. Xu, Y. Yagi, M. Ren, Gait-based human age estimation using age group-dependent manifold learning and regression. Multimedia Tools Appl. 77(21), 28333–28354 (2018)
23. X. Li, Y. Makihara, C. Xu, Y. Yagi, M. Ren, Make the bag disappear: carrying status-invariant gait-based human age estimation using parallel generative adversarial networks, in IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS) (IEEE, 2019), pp. 1–9
24. M.A.R. Ahad, T.T. Ngo, A.D. Antar, M. Ahmed, T. Hossain, D. Muramatsu, Y. Makihara, S. Inoue, Y. Yagi, Wearable sensor-based gait analysis for age and gender estimation. Sensors 20(8), 2424 (2020)
25. Y. Tang, Q. Teng, L. Zhang, F. Min, J. He, Layer-wise training convolutional neural networks with smaller filters for human activity recognition using wearable sensors. IEEE Sens. J. 21(1), 581–592 (2020)
26. F. Li, K. Shirahama, M.A. Nisar, L. Köping, M. Grzegorzek, Comparison of feature learning methods for human activity recognition using wearable sensors. Sensors 18(2), 679 (2018)
27. K. Brkić, I. Sikirić, T. Hrkać, Z. Kalafatić, De-identifying people in videos using neural art, in Sixth International Conference on Image Processing Theory, Tools and Applications (IPTA) (IEEE, 2016), pp. 1–6
28. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27, 2672–2680 (2014)
29. J. Cahn, The generation of affect in synthesized speech. J. Am. Voice I/O Soc. 8, 1–19 (1990)
30. C.E. Williams, K.N. Stevens, Vocal correlates of emotional states. Speech Eval. Psychiatry 221–240 (1981)
31. M. El Ayadi, M.S. Kamel, F. Karray, Survey on speech emotion recognition: features, classification schemes, and databases. Pattern Recogn. 44(3), 572–587 (2011)
32. C.-H. Wu, Z.-J. Chuang, Y.-C. Lin, Emotion recognition from text using semantic labels and separable mixture models. ACM Trans. Asian Lang. Inf. Process. (TALIP) 5(2), 165–183 (2006)
33. G. Letournel, A. Bugeau, V.-T. Ta, J.-P. Domenger, Face de-identification with expressions preservation, in IEEE International Conference on Image Processing (ICIP) (IEEE, 2015), pp. 4366–4370
34. Y. Li, S. Lyu, De-identification without losing faces, in Proceedings of the ACM Workshop on Information Hiding and Multimedia Security (2019), pp. 83–88
35. B. Meden, R.C. Mallı, S. Fabijan, H.K. Ekenel, V. Štruc, P. Peer, Face deidentification with generative deep neural networks. IET Sig. Process. 11(9), 1046–1054 (2017)
36. B. Meden, Ž. Emeršič, V. Štruc, P. Peer, k-same-net: k-anonymity with generative deep neural networks for face deidentification. Entropy 20(1), 60 (2018)
37. R. Aloufi, H. Haddadi, D. Boyle, Emotionless: privacy-preserving speech analysis for voice assistants. (2019). arXiv:1908.03632
38. Y. Iwashita, K. Uchino, R. Kurazume, Gait-based person identification robust to changes in appearance. Sensors 13(6), 7884–7901 (2013)
39. Z.A.A. Alyasseri, A.T. Khader, M.A. Al-Betar, O.A. Alomari, Person identification using eeg channel selection with hybrid flower pollination algorithm. Pattern Recognit. 107393 (2020)
40. D. Jyotishi, S. Dandapat, An LSTM method for person identification using ecg signal. IEEE Sens. Lett. 4(8), 1–4 (2020)
41. F. Ahmed, B. Sieu, M.L. Gavrilova, Score and rank-level fusion for emotion recognition using genetic algorithm, in IEEE 17th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC) (IEEE, 2018), pp. 46–53
42. F. Ahmed, A.H. Bari, M.L. Gavrilova, Emotion recognition from body movement. IEEE Access 8, 11761–11781 (2019)
43. M. Sultana, P.P. Paul, M. Gavrilova, A concept of social behavioral biometrics: motivation, current developments, and future trends, in 2014 International Conference on Cyberworlds (IEEE, 2014), pp. 271–278
44. M. Sultana, P.P. Paul, M.L. Gavrilova, User recognition from social behavior in computer-mediated social context. IEEE Trans. Human-Mach. Syst. 47(3), 356–367 (2017)
45. Y. Li, Z. Su, J. Yang, C. Gao, Exploiting similarities of user friendship networks across social networks for user identification. Inf. Sci. 506, 78–98 (2020)
46. M.L. Brocardo, I. Traore, I. Woungang, M.S. Obaidat, Authorship verification using deep belief network systems. Int. J. Commun. Syst. 30(12), e3259 (2017)
47. M.L. Brocardo, I. Traore, I. Woungang, Continuous authentication using writing style, in Biometric-Based Physical and Cybersecurity Systems (Springer, 2019), pp. 211–232
48. S.N. Tumpa, M. Gavrilova, Linguistic profiles in biometric security system for online user authentication, in IEEE International Conference on Systems, Man, and Cybernetics (SMC) (IEEE, 2020), pp. 1033–1038
49. S.N. Tumpa, M.L. Gavrilova, Score and rank level fusion algorithms for social behavioral biometrics. IEEE Access 8, 157663–157675 (2020)
50. R.V. Yampolskiy, V. Govindaraju, Behavioural biometrics: a survey and classification. Int. J. Biometrics 1(1), 81–113 (2008)
51. L.R. Goldberg, The structure of phenotypic personality traits. Am. Psychol. 48(1), 26 (1993)
52. H. Ning, S. Dhelim, N. Aung, Personet: friend recommendation system based on big-five personality traits and hybrid filtering. IEEE Trans. Comput. Soc. Syst. 6(3), 394–402 (2019)
53. A. Saleema, S.M. Thampi, User recognition using cognitive psychology-based behavior modeling in online social networks, in International Symposium on Signal Processing and Intelligent Recognition Systems (Springer, 2019), pp. 130–149
54. F.-Y. Wang, K.M. Carley, D. Zeng, W. Mao, Social computing: from social informatics to social intelligence. IEEE Intell. Syst. 22(2), 79–83 (2007)
55. J. Pennington, R. Socher, C.D. Manning, Glove: global vectors for word representation, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2014), pp. 1532–1543
56. P.-H. Arnoux, A. Xu, N. Boyette, J. Mahmud, R. Akkiraju, V. Sinha, 25 tweets to know you: a new model to predict personality with social media, in Proceedings of the International AAAI Conference on Web and Social Media (2017)
57. K.N.P. Kumar, M.L. Gavrilova, Personality traits classification on twitter, in 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (2019), pp. 1–8
58. M. Sultana, P.P. Paul, M. Gavrilova, Social behavioral biometrics: an emerging trend. Int. J. Pattern Recognit. Artif. Intell. 29(08), 1556013 (2015)
59. A. Theóphilo, L.A.M. Pereira, A. Rocha, A needle in a haystack? Harnessing onomatopoeia and user-specific stylometrics for authorship attribution of micro-messages, in ICASSP 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2019), pp. 2692–2696
60. T. Chamorro-Premuzic, S. Reimers, A. Hsu, G. Ahmetoglu, Who art thou? Personality predictors of artistic preferences in a large uk sample: the importance of openness. Br. J. Psychol. 100(3), 501–516 (2009)
61. P. Lovato, A. Perina, N. Sebe, O. Zandonà, A. Montagnini, M. Bicego, M. Cristani, Tell me what you like and i’ll tell you what you are: discriminating visual preferences on flickr data, in Asian Conference on Computer Vision (Springer, 2012), pp. 45–56
62. C. Segalin, A. Perina, M. Cristani, Personal aesthetics for soft biometrics: a generative multiresolution approach, in Proceedings of the 16th International Conference on Multimodal Interaction (2014), pp. 180–187
63. S. Azam, M. Gavrilova, Person identification using discriminative visual aesthetic, in Canadian Conference on Artificial Intelligence (Springer, 2017), pp. 15–26
64. B. Sieu, M. Gavrilova, Biometric identification from human aesthetic preferences. Sensors 20(4), 1133 (2020)
65. A.H. Bari, B. Sieu, M.L. Gavrilova, Aestheticnet: deep convolutional neural network for person identification from visual aesthetic. Vis. Comput. 36(10), 2395–2405 (2020)
66. S. Ribaric, N. Pavesic, An overview of face de-identification in still images and videos, in 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), vol. 4 (IEEE, 2015), pp. 1–6
67. H. Mao, Y. Wu, J. Li, Y. Fu, Super resolution of the partial pixelated images with deep convolutional neural network, in Proceedings of the 24th ACM international conference on Multimedia (2016), pp. 322–326
68. P. Sanjekar, J. Patil, An overview of multimodal biometrics. Sig. Image Process. 4(1), 57 (2013)
69. D.R. Kisku, A. Rattani, P. Gupta, J.K. Sing, Biometric sensor image fusion for identity verification: a case study with wavelet-based fusion rules graph matching, in IEEE Conference on Technologies for Homeland Security (IEEE, 2009), pp. 433–439
70. S. Hariprasath, T. Prabakar, Multimodal biometric recognition using iris feature extraction and palmprint features, in IEEE-International Conference on Advances in Engineering, Science and Management (ICAESM-2012) (IEEE, 2012), pp. 174–179
71. A. Kumar, M. Hanmandlu, S. Vasikarla, Rank level integration of face-based biometrics, in Ninth International Conference on Information Technology-New Generations (IEEE, 2012), pp. 36–41
72. A.P. Yazdanpanah, K. Faez, R. Amirfattahi, Multimodal biometric system using face, ear and gait biometrics, in 10th International Conference on Information Science, Signal Processing and Their Applications (ISSPA 2010) (IEEE, 2010), pp. 251–254
73. F. Yang, B. Ma, Notice of retraction: two models multimodal biometric fusion based on fingerprint, palm-print and hand-geometry, in 1st International Conference on Bioinformatics and Biomedical Engineering (IEEE, 2007), pp. 498–501
74. T. Murakami, K. Takahashi, Fast and accurate biometric identification using score level indexing and fusion, in International Joint Conference on Biometrics (IJCB) (IEEE, 2011), pp. 1–8
75. N.G.B. Ayed, A.D. Masmoudi, D.S. Masmoudi, A human identification based on fusion fingerprints and faces biometrics using LBP and GWN descriptors, in Eighth International Multi-Conference on Systems, Signals and Devices (IEEE, 2011), pp. 1–7
76. M.A. Kramer, Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37(2), 233–243 (1991)
77. L.A. Gatys, A.S. Ecker, M. Bethge, Image style transfer using convolutional neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 2414–2423
78. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
79. L. Wen, G. Guo, A computational approach to body mass index prediction from face images. Image Vis. Comput. 31(5), 392–400 (2013)
80. F. Ahmed, A.H. Bari, B. Sieu, J. Sadeghi, J. Scholten, M.L. Gavrilova, Kalman filter-based noise reduction framework for posture estimation using depth sensor, in IEEE Proceedings of 18th International Conference on Cognitive Informatics and Cognitive Computing (IEEE, 2019), pp. 150–158
81. N. Baaziz, N. Lolo, O. Padilla, F. Petngang, Security and privacy protection for automated video surveillance, in 2007 IEEE International Symposium on Signal Processing and Information Technology (IEEE, 2007), pp. 17–22
82. I. El Maachi, G.-A. Bilodeau, W. Bouachir, Deep 1d-convnet for accurate parkinson disease detection and severity prediction from gait. Expert Syst. Appl. 143, 113075 (2020)
83. V. Mirjalili, A. Ross, Soft biometric privacy: retaining biometric utility of face images while perturbing gender, in IEEE International Joint Conference on Biometrics (IJCB) (IEEE, 2017), pp. 564–573