Explainable Machine Learning for Multimedia Based Healthcare Applications
ISBN: 3031380355, 9783031380358

This book covers the latest research studies regarding Explainable Machine Learning used in multimedia-based healthcare applications.


English · 247 pages · 2023


Table of contents :
Foreword
Preface
Acknowledgement
Contents
Automatic Fetal Motion Detection from Trajectory of US Videos Based on YOLOv5 and LSTM
1 Introduction
2 Material and Method
2.1 Dataset
2.2 Structure of YOLO v5
2.3 LSTM (Long-Short Term Memory) Deep Neural Networks
3 Experimental Analysis
4 Conclusion
References
Explainable Machine Learning (XML) for Multimedia-Based Healthcare Systems: Opportunities, Challenges, Ethical and Future Prospects
1 Introduction
1.1 The Following Are the Significant Contributions of This Chapter
1.2 Chapter Organization
2 Multimedia Data in Healthcare Systems
2.1 Types of Multimedia Presentations in Healthcare Systems
2.1.1 Audio
2.1.2 Visual Data
2.1.3 Video
2.1.4 Text
3 Explainable Machine Learning for Multimedia Data in Healthcare Systems
3.1 A Classification of Techniques: Various Interpretability Scopes for Machine Learning
4 The Challenges of Explainable Machine Learning in Healthcare Systems
4.1 Lack of Standardized Requirements for XML
4.2 Unstandardized Representation Techniques
4.3 What Clinicians Expect: Explainability vs. Accuracy
4.4 What and How of the Results Explained
4.5 Security and Privacy Issues
4.6 Verification of Explanations
4.7 Ethical Restrictions
4.8 Lack of Theoretical Knowledge
4.9 Absence of Cause
5 An Effective Explainable Machine Learning Framework for Healthcare Systems
6 Research Prospects and Open Issues
7 Conclusion and Future Directions
References
Ensemble Deep Learning Architectures in Bone Cancer Detection Based on Medical Diagnosis in Explainable Artificial Intelligence
1 Introduction
2 Related Works
3 System Model
3.1 Optimized Kernel Fuzzy C Means Multilayer Deep Transfer Convolutional Learning (OpKFuzCMM-DTCL) Based Segmentation and Classification
3.2 Performance Analysis
3.3 Dataset Description
3.4 Discussion
4 Conclusion
References
Digital Dermatitis Disease Classification Utilizing Visual Feature Extraction and Various Machine Learning Techniques by Explainable AI
1 Introduction
2 Materials and Methods
3 Results
4 Conclusion
References
Explainable Machine Learning in Healthcare
1 Introduction
2 Data Set Used
3 Various Machine Learning Algorithms
4 Linear Regression
5 SVM
6 Naive Bayes
7 Logistic Regression
8 K-Nearest Neighbors (kNN)
9 Decision Trees
10 RF Algorithm
11 Boosted Gradient Decision Trees (GBDT)
12 Clustering with K-Means
13 Analysis by Principal Components (PCA)
14 Case Study
15 Handling Missing Values
16 Result
17 Conclusion
18 Future Scope
References
Explainable Artificial Intelligence with Scaling Techniques to Classify Breast Cancer Images
1 Introduction
2 Related Work
3 Materials and Methods
3.1 Proposed Methodology
3.2 Dataset
3.3 Data Processing
3.3.1 Min-Max Scaling
3.3.2 Normalization
L1 Normalization
L2 Normalization
3.3.3 Z-score
3.4 Model
3.4.1 Logistic Regression
3.4.2 Support Vector Machine (SVM)
Support Vectors
Hyperplane
Margin
SVM Kernels
Linear Kernel
Polynomial Kernel
Radial Basis Function (RBF) Kernel
3.4.3 Decision Tree
Gini Index
Split Creation
3.4.4 Building a Tree
Terminal Node Creation
Recursive Splitting
3.4.5 Naïve Bayes
3.4.6 Random Forest
Working on Random Forest Algorithm
3.4.7 K-Nearest Neighbor (KNN)
3.4.8 Adaptive Boosting
3.4.9 Extreme Gradient Boosting
3.5 Performance Evaluation Metrics
3.5.1 Confusion Matrix
3.5.2 Classification Report
Accuracy
Precision
Recall or Sensitivity
Specificity
F1 Score (F-measure)
Area Under ROC Curve (AUC)
3.5.3 Logarithmic Loss (LOGLOSS)
3.6 Metrics Use Case
3.7 Explainable Artificial Intelligence (XAI)
4 Result and Discussion
4.1 Experimental Setup
4.2 Explainable Result
4.3 Experimental Results
4.3.1 SVM
Random Forest (RF)
4.3.2 Naive Bayes (NB)
4.3.3 Logistic Regression (LR)
4.3.4 KNN
4.3.5 Decision Tree (DT)
4.3.6 Adaboost
4.3.7 XGBoost
4.3.8 Comparative Analysis
5 Conclusion and Future Work
References
A Novel Approach of COVID-19 Estimation Using GIS and Kmeans Clustering: A Case of GEOAI
1 Introduction
2 Discussion and Results
2.1 Temporal Data Distribution
2.2 Equation Based on the Values of the Confirmed Cases Obtained
2.3 Artificial Intelligence in Covid-19 Drones for Survey During Lockdown
2.4 Robots During Covid-19
3 Conclusions
References
A Brief Review of Explainable Artificial Intelligence Reviews and Methods
1 Introduction
2 Fundamental Definitions
3 Recent XAI Methods and Reviews
3.1 Review Studies
3.2 XAI Methods
4 XAI in Medicine
4.1 SHAP
4.2 GRADCAM
4.3 LRP
4.4 Lime
5 Discussion and Future Directions
6 Conclusion
References
Systematic Literature Review in Using Big Data Analytics and XAI Applications in Medical
1 Introduction
1.1 The Concept of Big Data
1.2 Explainable Artificial Intelligence: XAI
1.3 Big Data Analytics and XAI Applications in Medical
2 Methodology
2.1 Research Questions
2.2 Research Process
2.3 Data Collection
2.4 Data Analysis
3 Discussion and Results
3.1 Context Results
3.2 Evaluation of the Studies
4 Conclusions
References
Using Explainable Artificial Intelligence in Drug Discovery: A Theoretical Research
1 Introduction
1.1 A General Introduction to the Drug Discovery Sector
2 Stages of Drug Discovery
2.1 Using Artificial Intelligence in Drug Discovery Phases
3 What Is Explainable Artificial Intelligence?
3.1 The Importance of Explainable Artificial Intelligence in Drug Discovery
4 Academic Studies in Drug Discovery
5 Conclusions
References
Application of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemics
1 Introduction
2 Applications of Explainable Artificial Intelligence in COVID-19 Pandemics
3 Cognitive Internet of Things for COVID-19 Pandemics
3.1 Rapid Diagnosis
3.2 Contact Tracing and Clustering
3.3 Prevention and Control
3.4 Screening and Surveillance
3.5 Remote Monitoring of the Patient
3.6 Real-Time Tracking
3.7 Development of Drugs and Vaccines
4 The Challenges of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemic
5 The Framework of an XAI Enabled CIoT for Fighting COVID-19 Pandemic
6 Conclusion and Future Directions
References
Remote Photoplethysmography: Digital Disruption in Health Vital Acquisition
1 Introduction
2 Basic Principle
2.1 Photoplethysmography (PPG)
2.2 Remote Photoplethysmography (rPPG)
2.2.1 Principle of rPPG
2.2.2 Skin Reflection Model
2.2.3 Use of AI in rPPG
3 Algorithmic Methods
3.1 Blind Source Separation (BSS) Method (PCA/ICA)
3.2 Model-Based Method (CHROM/BVP)
3.2.1 Chrominance-Based Method (CHROM)
3.2.2 BVP Signature-Based Method
3.3 Design-Based Method
4 Issues and Literature Review
4.1 PPG vs. rPPG
4.2 Factors Affecting rPPG Video Capturing
4.2.1 Effect of Light Source
4.2.2 Effect of Body Motion
4.2.3 Effect of Camera's Frame Rate
4.3 Effect of Video Compression
4.4 ROI Detection and Selection Problem
4.5 Signal Processing Techniques Limitations
4.6 Extracted Signal Noise Problem
5 Trends and Tools
6 Study and Results
7 Conclusion
Annexure-I: Informed Consent Form
References

M. Shamim Hossain • Utku Kose • Deepak Gupta, Editors

Explainable Machine Learning for Multimedia Based Healthcare Applications

Editors

M. Shamim Hossain
Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

Utku Kose
Department of Computer Engineering, Suleyman Demirel University, Isparta, Turkey; University of North Dakota, Grand Forks, ND, USA

Deepak Gupta
Maharaja Agrasen Institute of Technology, Delhi, India

ISBN 978-3-031-38035-8    ISBN 978-3-031-38036-5 (eBook)
https://doi.org/10.1007/978-3-031-38036-5

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

With the start of the twenty-first century, humanity has witnessed many technological advancements. These are, of course, the result of discoveries and developments accumulated throughout history, but civilization has felt their exponential growth and effects especially in the last decade. Among all these advancements, the field of Artificial Intelligence plays a critical role in shaping not only the present but also the future world. The effects of intelligent systems are even transforming society; today, one may say that every field of modern life is supported, and in many ways dominated, by intelligent systems built on Artificial Intelligence. At this point, healthcare certainly holds a critical place. It is clear that Artificial Intelligence delivers revolutionary outcomes in the context of healthcare applications. Thanks especially to hybrid intelligent systems and powerful Deep Learning models, we often hear about remarkable applications of Artificial Intelligence to healthcare problems. However, there is a tradeoff: the more advanced the intelligent systems we use, the more their solutions move beyond our capacity to understand them. In other words, today's intelligent systems are so complicated that their inner mechanisms cannot be understood without additional effort, and this causes issues of safety and trust in the interaction between human and machine. This is even more vital when we consider Artificial Intelligence in healthcare problems. Although traditional Machine Learning techniques could be interpreted, it is no longer always possible to do the same for more complicated and more successful systems that combine multiple techniques, including deep analysis capabilities, for challenging healthcare issues. This edited volume, Explainable Machine Learning for Multimedia-Based Healthcare Applications, brings together the latest research on Explainable Machine Learning, one of the most recent approaches for improving the trust and safety of intelligent systems for healthcare problems. Nowadays, much research involves multimedia-based data, since multiple input modalities enable more effective outcomes in healthcare applications. Accordingly, the book focuses on integrating explainability (and, in places, interpretability) into research on medical problems such as diagnosis, tracking, and discovery.


It is also encouraging to see remarkable reviews of the present and future potential of Explainable Artificial Intelligence for healthcare, especially with multimedia-based data. I appreciate the general organization of the chapters: in its current form, the book addresses a wide readership, from academics to degree students and even a general audience. Since the book takes a multidisciplinary approach, it can serve as a reference work in courses across different fields such as Computer Science and Engineering, Biomedical and Medical sciences, Mechatronics, Data Science, and the associated engineering areas for software- and/or hardware-oriented development. I believe the content will inspire researchers at beginner, intermediate, and advanced levels alike. The future will be brilliant with good collaboration between human and machine, and this book offers remarkable points to consider for building strong bridges toward healthcare applications. I would like to thank my dear colleagues and editors, Dr. Hossain, Dr. Kose, and Dr. Gupta, for their great efforts in building such a valuable work for the scientific literature. It is also a pleasure to read contributions from valuable authors around the world, so my special thanks go to them! Now it is time to prepare a coffee, turn to the next pages, and learn more about the latest outcomes of Explainable Machine Learning for healthcare applications covering multimedia data. All the best for the new era of human-machine collaboration in healthcare!

National Autonomous University of Mexico, Mexico City, Mexico

Jose Antonio Marmolejo-Saucedo

Preface

As a result of many advancements in the field of Artificial Intelligence, it is now easier to process data and derive meaningful outcomes for real-world problems represented as digitally modeled relations. The Deep Learning era, in particular, has brought fast and effective intelligent systems capable of dealing with even the most challenging problems of modern life. Among the affected fields, healthcare holds a vital place, as it concerns the well-being of individuals and the building of a sustainable, better future for the next generations. Early applications of Artificial Intelligence in healthcare were associated with rule-based expert systems, and the last decade of the twentieth century brought revolutionary Machine Learning solutions for problems such as diagnosis, treatment, and healthcare planning. Now, approaching the end of the first quarter of the twenty-first century, Artificial Intelligence has already provided cutting-edge solutions for many problems, such as cancer diagnosis, advanced treatment, and even drug discovery, with Deep Learning playing the main role in all these advancements so far. Moreover, the rise of the Internet of Things and wearables has made it possible to gather instant data and maintain continuous communication for better patient tracking and healthcare management. However, such advanced applications have created a tradeoff between interpretability and performance. Although today's Artificial Intelligence-based systems achieve great performance, it has become more difficult to understand how these systems process input data to derive their successful outcomes. Clearly, that is a critical problem for healthcare applications. Additionally, the most recent hybrid Machine Learning and Deep Learning architectures can work on multimedia-based medical data, so it is all the more important to employ effective mechanisms to make them interpretable to humans. Building on advances in the interpretability of Machine Learning, the newer concept of Explainable Artificial Intelligence (XAI) was introduced for more complex intelligent systems. With more research on the interpretability and explainability of intelligent systems, future advancements in multimedia-based healthcare applications will be trustworthy and human-compatible.



Based on the current state of the literature, this edited volume provides the most recent research studies regarding Explainable Machine Learning for multimedia-based healthcare applications. The content collects research works aiming to solve different healthcare problem cases while providing a sufficiently wide literature review of the book's scope. As a result of meticulous efforts by international authors, we have organized a useful reference work that will benefit researchers, degree students, and even private- and public-sector experts in the healthcare and engineering fields. The most critical contribution of the book concerns the use of multimedia data for healthcare problems and the design of interpretable and/or explainable intelligent systems that deliver effective results for their target problems. In this collection, we reviewed and included a total of 12 chapters, briefly summarized as follows:

Chapter “Automatic Fetal Motion Detection from Trajectory of US Videos Based on YOLOv5 and LSTM” reports an automatic fetal motion detection system that uses trajectories from US videos by employing both YOLOv5 and LSTM models.

Chapter “Explainable Machine Learning (XML) for Multimedia-Based Healthcare Systems: Opportunities, Challenges, Ethical and Future Prospects” provides a comprehensive review of Explainable Machine Learning (XML) for multimedia-based healthcare problems, discussing opportunities, challenges, and future directions, including ethical concerns.

Chapter “Ensemble Deep Learning Architectures in Bone Cancer Detection Based on Medical Diagnosis in Explainable Artificial Intelligence” presents an ensemble explainable Deep Learning system for bone cancer detection.

Chapter “Digital Dermatitis Disease Classification Utilizing Visual Feature Extraction and Various Machine Learning Techniques by Explainable AI” addresses digital dermatitis disease classification with a solution combining visual feature extraction and Machine Learning with Explainable Artificial Intelligence (XAI) aspects.

Chapter “Explainable Machine Learning in Healthcare” reviews Explainable Machine Learning for healthcare problems, focusing on cases with multimedia data.

Chapter “Explainable Artificial Intelligence with Scaling Techniques to Classify Breast Cancer Images” considers the problem of breast cancer and introduces an XAI solution with scaling techniques and classification-based outcomes.

Chapter “A Novel Approach of COVID-19 Estimation Using GIS and Kmeans Clustering: A Case of GEOAI” addresses a recent problem, COVID-19, and builds an estimation approach using GIS and K-means clustering within an interpretable framework.


Chapter “A Brief Review of Explainable Artificial Intelligence Reviews and Methods” is a review-based work informing readers about XAI methodologies and their most recent applications to healthcare problems.

Chapter “Systematic Literature Review in Using Big Data Analytics and XAI Applications in Medical” provides a systematic literature review on the use of Big Data analytics and XAI applications for medical problem cases.

Chapter “Using Explainable Artificial Intelligence in Drug Discovery: A Theoretical Research” reviews the use of XAI in a very critical problem area: drug discovery.

Chapter “Application of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemics” revisits the problem area of pandemics and introduces an interpretable Artificial Intelligence-enabled Cognitive Internet of Things for the COVID-19 case.

Chapter “Remote Photoplethysmography: Digital Disruption in Health Vital Acquisition” considers a remarkable topic, remote photoplethysmography, and reports on research regarding digital disruption in healthcare.

We believe that the content of the book will provide ample reference information and ideas for building up-to-date knowledge and establishing further research studies. We send our warmest regards and congratulations to all chapter authors. Our special thanks also go to Prof. Jose Antonio Marmolejo-Saucedo (National Autonomous University of Mexico) for his kind Foreword, which has enhanced the value of the book. We would be grateful to receive ideas, suggestions, and collaborations from readers around the world. We hope you will enjoy your journey through the pages of this valuable, timely work. Thank you!

Riyadh, Saudi Arabia
Isparta, Turkey
Delhi, India

M. Shamim Hossain Utku Kose Deepak Gupta

Acknowledgement

As the editors, we would like to thank Shina Harshavardhan, Susan Grove, and the Springer team for their valuable efforts and great support in organizing the content and publishing the book.


Contents

Automatic Fetal Motion Detection from Trajectory of US Videos Based on YOLOv5 and LSTM
Musa Turkan, Furkan Ertürk Urfalı, and Emre Dandıl

Explainable Machine Learning (XML) for Multimedia-Based Healthcare Systems: Opportunities, Challenges, Ethical and Future Prospects
Joseph Bamidele Awotunde, Agbotiname Lucky Imoize, Abidemi Emmanuel Adeniyi, Kazeem Moses Abiodun, Emmanuel Femi Ayo, K. V. N. Kavitha, Gbemisola Janet Ajamu, and Roseline Oluwaseun Ogundokun

Ensemble Deep Learning Architectures in Bone Cancer Detection Based on Medical Diagnosis in Explainable Artificial Intelligence
Ulaganathan Sakthi and R. Manikandan

Digital Dermatitis Disease Classification Utilizing Visual Feature Extraction and Various Machine Learning Techniques by Explainable AI
İsmail Kirbaş and Kürşad Yiğitarslan

Explainable Machine Learning in Healthcare
Pawan Whig, Shama Kouser, Ashima Bhatnagar Bhatia, Rahul Reddy Nadikattu, and Pavika Sharma

Explainable Artificial Intelligence with Scaling Techniques to Classify Breast Cancer Images
Abdulwasiu Bolakale Adelodun, Roseline Oluwaseun Ogundokun, Akeem Olatunji Yekini, Joseph Bamidele Awotunde, and Christopher Chiebuka Timothy

A Novel Approach of COVID-19 Estimation Using GIS and Kmeans Clustering: A Case of GEOAI
Iyyanki MuraliKrishna and Prisilla Jayanthi

A Brief Review of Explainable Artificial Intelligence Reviews and Methods
Ferdi Sarac

Systematic Literature Review in Using Big Data Analytics and XAI Applications in Medical
Behcet Oznacar and Utku Kose

Using Explainable Artificial Intelligence in Drug Discovery: A Theoretical Research
Bekir Aksoy, Mehmet Yücel, and Nergiz Aydin

Application of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemics
Joseph Bamidele Awotunde, Rasheed Gbenga Jimoh, Abidemi Emmanuel Adeniyi, Emmanuel Femi Ayo, Gbemisola Janet Ajamu, and Dayo Reuben Aremu

Remote Photoplethysmography: Digital Disruption in Health Vital Acquisition
Monika, Harish Kumar, Sakshi Kaushal, and Varinder Garg

Automatic Fetal Motion Detection from Trajectory of US Videos Based on YOLOv5 and LSTM Musa Turkan, Furkan Ertürk Urfalı, and Emre Dandıl

1 Introduction

Abnormal developments in the fetus have, especially in recent years, caused an increase in deaths close to birth in many countries [1]. Almost 48 million babies were stillborn over the past two decades, and an average of 2 million babies are stillborn each year, mostly in low-income countries [2]. It is considered that most infant deaths could be prevented with life-saving measures and quality health care. Additionally, 3–6% of babies worldwide are born with a serious defect each year [3], and maternal mortality due to childbearing and pregnancy is approximately 211 per 100,000 live births [4]. Therefore, early detection of an abnormality in the fetus is very important for both maternal and fetal health. In addition to monitoring the development of fetal anatomical structures, data collected about the fetus can provide strong indications of its development. In particular, the examination of fetal images and videos enables early diagnosis of some diseases and disorders. However, recognizing and tracking movements and identifying anatomical structures in fetal video and image scans are difficult and complex processes.

M. Turkan
Institute of Graduate, Department of Computer and Electronics Engineering, Bilecik Seyh Edebali University, Bilecik, Turkey
e-mail: [email protected]

F. E. Urfalı
Medical Faculty, Department of Radiology, Kütahya Health Sciences University, Kütahya, Turkey
e-mail: [email protected]

E. Dandıl (✉)
Faculty of Engineering, Department of Computer Engineering, Bilecik Seyh Edebali University, Bilecik, Turkey
e-mail: [email protected]



Fetal scans are usually performed with ultrasound (US) imaging. US is widely used in the evaluation of fetal health and development owing to its many advantages, such as low cost, availability, and real-time processing [5]. By accurately monitoring the pregnancy with US data, important anatomical structures in the fetus can be detected and their movements followed. The intensity, frequency, and duration of fetal movements can also be determined through the mother's perception of the movements; automatic detection and evaluation of fetal movements is therefore important. Moreover, the detection of critical health conditions in the fetus, such as the presence of an active heartbeat, is associated with fetal movements [6].

In today's routine US scans for the follow-up of fetal health, several biometric evaluations and measurements of the fetal anatomy are performed, such as measuring the bone lengths of the arms and legs, determining the head circumference, and assessing the heartbeat. Examinations such as estimating fetal weight and evaluating abdominal circumference are also used to monitor the health status of the fetus [7]. The follow-up of fetal movements is an important indicator of a healthy pregnancy, and decreased or absent fetal movements can be a sign of serious risks to the fetus [8]. On the other hand, increased fetal movements may also signal undesirable conditions. Consequently, accurate assessment of such complex fetal movements is vital.

Different studies have previously been proposed for the detection, follow-up, and classification of fetal movements. These studies generally provide tools and methods that support physicians' decision-making. Examples include studies that identify various organs and anatomical structures in the fetus [9], perform biometric measurements such as head circumference and leg length [10], and determine the standard plane in the fetus [11, 12]. There are also studies [13, 14] that classify US images to assist experts in their decisions. Ishikawa et al. [13], for instance, proposed the recognition and classification of fetal parts such as the head, trunk, and legs to predict the fetal position.

Deep learning algorithms, applicable in many fields, have been widely used for processing medical data in recent years, and they continue to improve with higher performance, lower hardware consumption, and more accurate results. In deep learning models developed for the classification of movements, objects can be tracked via their motion trajectories; the changing position of a moving object in each frame provides information about the movement. Several studies combine deep learning methods with classification based on motion trajectories [15–17]. Human movements can also be described by adding the motion trajectories of the joints to the 2D image as an extra input [18], and trajectory information can be used to represent the temporal information of motion [19, 20].

In recent years, many studies [21–23] in which deep learning methods are used for the evaluation of fetal ultrasound data have gained importance.


van den Heuvel et al. [23] proposed a deep learning method for measuring head circumference from ultrasound images in countries where resources may be limited. In another study, Dozen et al. [24] segmented small and dynamically shaped heart structures in fetal cardiac ultrasound videos with a deep learning-based network using time-series information and region-specific information. Ravishankar et al. [25] detected and measured the abdomen region in 2D ultrasound images using a method that combines traditional tissue-recognition methods with deep learning. Yaqub et al. [26] proposed a deep learning method for ensuring the correct localization of the fetal brain, detecting the region of interest, and recognizing the acoustic pattern in the regions that allow the plane to be confirmed, with a view to detecting possible fetal brain abnormalities. Accurate acquisition of fetal planes in ultrasound images is an important consideration for accurate biometric measurements and the application of various diagnoses. Chen et al. [27] developed a framework that enables standard plane recognition from ultrasound videos, and Carneiro et al. [28] presented a method that detects fetal anatomical structures in ultrasound images and can perform biometric measurements. Chen et al. [29] proposed a deep learning-based framework for measuring the fetal lateral ventricles in 2D ultrasound images, and Arnaout et al. [30] developed a neural network method that can identify complex congenital heart conditions before birth using ultrasound images.

It is known that the trajectories of fetal movements form different patterns depending on the anatomical structures involved. Therefore, different movement types can be classified by obtaining the movement trajectories of the anatomical structures in the fetus. In this chapter, a deep learning approach based on the YOLOv5 and LSTM methods is proposed for the detection and recognition of fetal anatomical structures using motion trajectories. First, a dataset is prepared from US videos containing the movements of anatomical structures in the fetus. On this dataset, class and location information are obtained using the YOLOv5 network, which recognizes the fetal anatomical structures (organs). In the next step, using the patterns of the 2D trajectories created by the movements, the anatomical structure responsible for each fetal movement is determined with LSTM deep neural networks. In this way, fetal movements can be classified from trajectory images.

The remainder of the chapter is organized as follows. The second section presents the prepared US dataset and the proposed deep learning methods in detail. The third section gives the research results obtained from the experimental studies. The final section discusses the conclusions, evaluates the experimental results, and outlines planned future work.


2 Material and Method

The position of an object can be represented either by a single location point or by several points, and this representation can then be fed to a deep learning network. Thus, the images obtained from fetal videos can be used to train an object-recognition network that recognizes each anatomical structure, so that the anatomical structures in the fetus can be classified. Any object-recognition algorithm that draws a bounding box around the object can be used to obtain the location information of the fetal organs (anatomical structures). To this end, the ultrasound images are first labeled in accordance with the requirements of the object-recognition algorithm used to classify the anatomical structures; labeling software compatible with the algorithm can be used for this step. After labeling, a point of the bounding box produced by the object-recognition algorithm is taken as the position information, and the position of the anatomical structure is recorded in each video frame. An ordered list of points consisting of x and y coordinates is thus obtained, which follows the change of the organ's position on the screen frame by frame. The position of the fetal anatomical structure appears as points in the 2D plane and depends on the screen resolution; different screen resolutions and different organ positions on the screen can make these raw values vary widely. To eliminate this problem, before the raw location information is used, it is normalized to the range 0–1. The normalized data are treated as an ordered array of points showing the organ's position in each frame. A suitable deep learning algorithm, the LSTM, can be used to process such sequential data: by processing the sequence with an LSTM, the succession of points can be modeled, and this succession differs for each organ. By training an LSTM network on these differences in the ordering of the position information, the anatomical structures in the fetus can be classified.

In this chapter, a hybrid deep learning model based on the YOLOv5 and LSTM methods is proposed for the detection and recognition of fetal anatomical structures from motion trajectories in US scans. On the prepared dataset, the anatomical structures and their locations are first recognized by the YOLOv5 algorithm. Then, the position of each detected object throughout the video is recorded as coordinates in the 2D plane, forming the pattern of the 2D trajectory of the movement. In the last stage, the object's position is extracted in each frame from the point array of the obtained motion trajectory, and the resulting sequences of motion points are classified by the LSTM network. The methodology of the proposed system for the detection and recognition of fetal anatomical structures from the motion trajectories is shown in Fig. 1.
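
For illustration, the normalization step described above can be sketched in a few lines of Python. This is a minimal example assuming the trajectory is already available as an ordered list of (x, y) bounding-box points; the function name is chosen here for clarity and does not come from the authors' code:

```python
import numpy as np

def normalize_trajectory(points):
    """Min-max normalize an ordered array of (x, y) positions to [0, 1],
    making trajectories comparable across screen resolutions and organ
    positions on the screen."""
    pts = np.asarray(points, dtype=float)           # shape: (n_frames, 2)
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard against zero range
    return (pts - mins) / span
```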


Fig. 1 The methodology of the proposed system for the detection and recognition of fetal anatomical structures from the motion trajectories

2.1 Dataset

In this study, the fetal scans in the dataset prepared for predicting the fetal anatomical structures were collected from the Evliya Çelebi Training and Research Hospital of Kütahya Health Sciences University. With the decision of the Ethics Committee of Non-Interventional Clinical Researches of Kütahya Health Sciences University, dated 08.07.2021 and numbered 2021/12-07, it was confirmed that there was no ethical or scientific impediment to conducting the study. The ultrasound scans were obtained from 10 different volunteers who were between 16 and 20 weeks pregnant. All pregnant women in the dataset were selected from second-trimester cases because these allow easy evaluation of fetal movements. For all fetuses, major extremity movements, diaphragm and swallowing movements, and head and body movements were recorded as US video.


Fig. 2 Frame slices of some anatomical structures in the prepared fetal US dataset (a) head, (b) heart, (c) head and body, and (d) head and body

US evaluations of the fetuses were performed using an Acuson S3000 system (Siemens Medical Solutions, Mountain View, CA) with a 4 MHz convex transducer and an obstetric US preset. Unlike our previous study [5], a total of 2500 2D ultrasound images were acquired from the US scans. These images were labeled with the consensus of three different experts, and ground truths were created. The motion trajectories of fetal anatomical structures such as the body, head, arm, and heart were extracted from the fetal US videos in the dataset. Sample frames of the fetal anatomical structures from the dataset are shown in Fig. 2.

2.2 Structure of YOLOv5

YOLO (You Only Look Once) is a deep learning method proposed by Redmon et al. [31] in 2016 and used for object-recognition problems with bounding boxes. YOLO is a one-stage algorithm: both the location information and the classification of the object are obtained in a single pass [32]. Other CNN-based algorithms with accuracy close to YOLO require a heavier computational load because they determine the object class and the object location with separate networks, which are applied to the image at multiple locations and scales. In the YOLO algorithm, by contrast, the entire image is passed through the network only once.


Fig. 3 The modular architecture of the YOLOv5 deep learning model

The YOLOv5 network model was developed by Glenn Jocher [33] in 2020 on top of PyTorch. YOLOv5 can provide higher speed than previous YOLO versions; it also generates smaller model files and generally takes less time to train. Created with the support of many open-source developers, YOLOv5 has configurations for object recognition that differ from previous versions [34]. The architecture of the YOLOv5 deep learning model, a typical one-stage object-recognition method, is shown in Fig. 3. The model consists of three modular sub-components: backbone, neck, and head [35]. In YOLOv5, features are extracted from the input image by the backbone, for which a DarkNet framework-based Cross Stage Partial Network (CSPNet) is used [36]. The second module, the neck, generates the feature pyramid network (FPN); feature pyramids are used to recognize the same object at different sizes and resolutions, and in YOLOv5 a path aggregation network (PANet) neck is generally preferred for obtaining the pyramid features. In the last component, the head, object recognition is performed as in YOLOv4: the probabilities of the object classes and the bounding boxes are generated. YOLOv5 includes architecturally different models; in this study, YOLOv5s, the smallest version, was used to identify the fetal anatomical structures.
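
As a hedged illustration of how such a model can be queried frame by frame, the snippet below loads YOLOv5 through the public torch.hub interface of the ultralytics/yolov5 repository. The weights file `best.pt` is a hypothetical stand-in for weights trained on the fetal dataset, not the authors' released model:

```python
import torch

# Load a custom-trained YOLOv5 model via torch.hub (the 'best.pt' weights
# file is an assumed placeholder for weights trained on the fetal US data).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')

# Run inference on one video frame (file path, PIL image, or numpy array).
results = model('frame_0001.png')

# One row per detection: xmin, ymin, xmax, ymax, confidence, class, name.
detections = results.pandas().xyxy[0]
print(detections[['xmin', 'ymin', 'name', 'confidence']])
```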

2.3 LSTM (Long-Short Term Memory) Deep Neural Networks

Recurrent neural networks (RNNs) are deep learning networks often used to predict the next step in a sequence. Unlike other networks, they can remember: in an RNN, the inputs are correlated with each other to predict the next step, and during the training phase all the relationships between the inputs are remembered, with each result feeding the next step [37]. While CNNs work mostly with spatially related data, RNN architectures were developed for processing time series and sequence data [38] and are widely used in applications such as speech processing, text recognition, author identification, and natural language processing [39]. An RNN cell is shown in Fig. 4. Because the RNN contains loops that make information persistent, a network segment A looks at an input x(t) and generates an output y(t); the loop allows information to pass from one step of the network to the next. An RNN can therefore be thought of as multiple copies of the same network, each cell passing a message to the next, with an activation function applied in each layer [40]. A chain-like structure is thus formed, as in Fig. 5. This chain-like structure reveals that the RNN is closely related to sequences and lists and is the natural neural network architecture for such data.

Fig. 4 Recurrent neural network cell structure

Fig. 5 Layered chain structure of cells in RNN


Fig. 6 A single layer of modules in the RNN structure

Fig. 7 Cell structure in LSTM deep learning networks

In addition, as seen in Fig. 6, each module in the basic RNN structure contains a single layer. The method that makes the RNN successful is the LSTM network. The vanishing gradient problem of simple RNN models is solved by LSTM networks [41], which remember the important data, forget the unimportant data, and thereby avoid vanishing gradients [42]. Within each repeating module of an LSTM network there are four components: the input gate, the output gate, the forget gate, and the memory cell [43]. Each component uses its own weight values, and the output is calculated using the sigmoid function and the previous state [44].


In Fig. 7, the structure of an LSTM cell module is shown. Here, tanh is the hyperbolic tangent activation function, S is the sigmoid activation function, × denotes multiplication, and + denotes addition. In addition, y(t) is the output, c(t−1) is the previous memory cell state, c(t) is the memory state passed to the next cell (the cell state), y(t−1) is the output of the previous cell, and x(t) is the current data input.
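
For reference, the standard LSTM gate equations implied by this description, written in the chapter's notation (with σ the sigmoid, ⊙ elementwise multiplication, and W, b the learned weights and biases of each gate; these symbols follow the usual textbook formulation and are not values given in the chapter), are:

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[y_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i\,[y_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\!\left(W_c\,[y_{t-1}, x_t] + b_c\right) && \text{candidate memory} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{memory cell update} \\
o_t &= \sigma\!\left(W_o\,[y_{t-1}, x_t] + b_o\right) && \text{output gate} \\
y_t &= o_t \odot \tanh(c_t) && \text{cell output}
\end{aligned}
```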

3 Experimental Analysis

In this study, the anatomical structures in ultrasound images are detected with a hybrid deep learning-based method. First, on a fetal US dataset prepared specifically for this study, the positions of the anatomical structures in the 2D plane are obtained using YOLOv5. This position information is then fed to an LSTM network to predict the movements of the anatomical structures in the 2D plane. The results of the experimental analyses show that the anatomical structures in the fetus are detected successfully by the proposed hybrid method using the trajectories.

The US data for the experimental studies were obtained from 10 different volunteers who were between 16 and 20 weeks pregnant. A total of 2500 2D ultrasound images (frames) were acquired from the US scans for recognizing the fetal anatomical structures with YOLOv5 and obtaining their location information; 2000 of these images were used for training and 500 for testing. The frames were delineated with the consensus of three different experts from the Evliya Çelebi Training and Research Hospital of Kütahya Health Sciences University, as seen in Fig. 8. The LabelImg software tool [45] was used by the experts to label the US frames.
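
LabelImg's YOLO export writes one text line per object in the form `class_id x_center y_center width height`, with all coordinates normalized by the image size. The helper below, written purely for illustration, converts such a line back to pixel coordinates:

```python
def parse_yolo_label(line, img_w, img_h):
    """Convert one YOLO-format label line to pixel coordinates.

    Input format: "<class_id> <x_center> <y_center> <width> <height>",
    with coordinates normalized to [0, 1] relative to the image size.
    Returns the class id plus the top-left corner and box size in pixels.
    """
    cls, xc, yc, w, h = line.split()
    xc, yc, w, h = (float(v) for v in (xc, yc, w, h))
    x1 = (xc - w / 2.0) * img_w
    y1 = (yc - h / 2.0) * img_h
    return int(cls), x1, y1, w * img_w, h * img_h
```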

Fig. 8 Labeling of anatomical structures by experts in fetal US images


Fig. 9 Recognition of fetal anatomical structures from US videos with YOLOv5 and generating of bounding boxes with detection scores. (a) body and head, (b) heart, (c) heart, and (d) body, heart and head

Figure 9 shows the recognition results for some anatomical structures, such as the head, body, and heart of the fetus, obtained with the YOLOv5-based method on the fetal US dataset. The fetal anatomical structures are recognized successfully: each object is labeled with a bounding box, and its detection score is also indicated. In Fig. 9a, the body and head are recognized successfully; the heart is recognized in Fig. 9b, c, and the body, head, and heart are recognized in Fig. 9d.

In this study, as in our previous work [5], objects were tracked across the US videos using the Deep-SORT algorithm, while the anatomical structures (head, heart, and body) were recognized with YOLOv5. Their motion trajectories were then extracted by following their movements. Figure 10 shows the tracking of the anatomical structures identified in the fetal ultrasound with the Deep-SORT algorithm throughout the US video and the extraction of the motion trajectories. Here, the upper-left corner of the bounding box of the fetal anatomical structure was used as the reference for the position information. The trajectory of each detected anatomical plane was formed, and 2D trajectory patterns were drawn in different colors. After extracting the trajectories of the anatomical structures from the fetal images, the motion trajectories were saved as raw data in CSV files.
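
A minimal sketch of this logging step is shown below, assuming the tracker yields one bounding box per frame for a tracked structure; the function and field names are illustrative and not taken from the authors' code:

```python
import csv

def log_trajectory(tracked_boxes, out_path):
    """Write the per-frame upper-left corner of a tracked structure to CSV.

    tracked_boxes: iterable of (frame_idx, x1, y1, x2, y2) tuples produced
    by an object tracker such as Deep-SORT (interface assumed here for
    illustration only).
    """
    with open(out_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['frame', 'x', 'y'])
        for frame_idx, x1, y1, _x2, _y2 in tracked_boxes:
            writer.writerow([frame_idx, x1, y1])  # upper-left corner as position
```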


Fig. 10 Extracting the patterns of the motion trajectories of the anatomical structures of the fetus

Fig. 11 2D signals created for heart and head with motion trajectories

The 2D signals created from the motion trajectories of the heart and head are presented in Fig. 11. Different values may appear in the trajectory data depending on the width and height of the screen, and the x and y coordinates also change according to where the fetal anatomical structure is located on the screen. For this reason, to make the trajectory data more meaningful, the obtained signals were subjected to min-max normalization. When the movement points of the anatomical structures in the fetus are connected by a line on the screen, different patterns emerge. Differences in movement can therefore be detected when the position information of the fetal anatomical structures is displayed in the 2D plane as a graph of the x and y points, as in Fig. 11.


Fig. 12 Structure of LSTM network used for classification of trajectory data converted into 2D signal array

From this point of view, either a corner of the bounding box determined by the algorithm or the midpoint of the bounding box can be used as the reference when determining the position of an anatomical structure. For some movements, referencing the midpoint does not produce a descriptive difference; for movements such as the heartbeat, the midpoint of the object barely changes, so it is more appropriate to consider the corner points of the object. Instead of a single corner, all corners of the bounding box can be obtained as trajectory data; in that case, additional features such as the changing size of the object on the screen can also be used for classification. When the position information is taken with respect to the same reference point for each anatomical structure in the US image, a sequence-shaped dataset is formed. The trajectory data can be examined in several ways, such as evaluating each point as a vector with x and y components or examining the change along only one axis. The LSTM method, which is used to predict the next item in a data series, is suitable for examining such time-dependent sequential trajectory data. Therefore, in this study, the trajectory data converted into 2D signal sequences were classified with an LSTM, and the anatomical structure from which each fetal movement originated was determined. The trajectory information was collected from as many different videos as possible for each anatomical structure type, and a dataset was created for training the LSTM network. The LSTM network works on input arrays with a fixed number of elements; when dividing the position data into sequences, successive position values must be used so that the differences between movements are revealed. The sequential position data therefore has to be divided according to the number of elements of each array applied to the LSTM network, and a sequence of points long enough to characterize the motion was chosen as the network input. The structure of the LSTM network used to classify the trajectory data converted into 2D signal arrays is shown in Fig. 12. Training and testing are done using a fully connected layer at the output of the LSTM network. To train this LSTM network, the motion points were divided into arrays of 20 points.
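
This division into fixed-length arrays can be sketched as follows; this is an illustrative helper assuming the normalized trajectory is a NumPy array of (x, y) rows, with the window length of 20 taken from the text:

```python
import numpy as np

def split_into_sequences(trajectory, seq_len=20):
    """Split an ordered, normalized trajectory into fixed-length windows.

    trajectory: array of shape (n_frames, 2) holding (x, y) positions.
    Returns an array of shape (n_windows, seq_len, 2) ready for the LSTM.
    """
    traj = np.asarray(trajectory)
    n_windows = len(traj) // seq_len
    return traj[: n_windows * seq_len].reshape(n_windows, seq_len, -1)
```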


Fig. 13 Changes in the movements of the anatomical structure of fetal heart in different US frames

One hundred movement data points that were not used in training were used in the testing process. The created LSTM network was trained for 100 epochs. The motion trajectory data of the fetal anatomical structures were generated on a computer equipped with an Intel Core i5 CPU and an Intel UHD Graphics 620 graphics card, while the normalization of the data and the training and testing of the LSTM network were performed on a machine with a 2.20 GHz Xeon(R) CPU and an NVIDIA Tesla T4 GPU via Google Colab.

In Fig. 13, four frames of the fetal heart obtained from the US videos and the corresponding fetal movements are shown. In Frame 1, Frame 4, Frame 12, and Frame 26, the visual pattern of the heart, and therefore its movement, changes. The anatomical structure can thus be classified by extracting the pattern of the motion trajectory from each frame of the heart's US video. As seen in Fig. 14a, when the movement data of the heart recognized by YOLOv5 are normalized, a graph is obtained in which the x and y components change as in Fig. 14b. The size of the heart changes during heartbeats, and therefore the position of the bounding box may also change; this change makes it possible to capture heartbeat movements. In experiments that reference the midpoint of the bounding box, it may not be possible to capture differences in motions where the size of the object changes, such as the heartbeat.


Fig. 14 (a) Recognizing the anatomical structure of the heart from the video image of the fetus using YOLOv5 and generating the bounding box, (b) Normalizing the motion trajectory position data for the x-axis of the heart

Fig. 15 Loss function change in LSTM network training and validation

Changes in the heartbeat motion can be observed more accurately if a point on the outer edge of the bounding box is used as the reference. Approximately 21 thousand parameters are trained in the LSTM network created for predicting the anatomical structures from the 2D signal data generated from the motion trajectories. In this network, the first layer is a bi-directional LSTM (Bi-LSTM), followed by an intermediate layer of 50 cells and an output layer with a single output. ReLU is used as the activation function in the intermediate layer, and the output is estimated with a softmax function. The position data are split into 20-element sequences and applied to the LSTM network. The change of the loss function during LSTM network training is shown in Fig. 15.
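
The description above can be turned into a minimal Keras sketch. Note that this is a reconstruction under stated assumptions, since the chapter does not give the LSTM unit count or the number of output classes: four classes are assumed here for head, heart, body, and arm, and 32 Bi-LSTM units are an arbitrary illustrative choice, so the parameter count will not exactly match the reported ~21 thousand:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_trajectory_classifier(seq_len=20, n_features=2, n_classes=4):
    """Sketch of the trajectory classifier: Bi-LSTM front end, a 50-unit
    ReLU intermediate layer, and a softmax output, as described in the
    text. Unit and class counts are assumptions, not the authors' values."""
    model = models.Sequential([
        layers.Input(shape=(seq_len, n_features)),    # 20 (x, y) points
        layers.Bidirectional(layers.LSTM(32)),        # bi-directional LSTM
        layers.Dense(50, activation='relu'),          # 50-cell middle layer
        layers.Dense(n_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

model = build_trajectory_classifier()
model.summary()
```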


Fig. 16 (a) Prediction of anatomical structure from fetal heart movements of the proposed LSTM network for training data, (b) prediction of anatomical structure from fetal heart movements of the proposed LSTM network for validation data

In the experimental studies, the fetal anatomical movements were predicted on the training and test data using the LSTM network trained for 100 epochs. Figure 16a compares the training and prediction values for the heart along the x-axis, and Fig. 16b compares the validation and prediction results on test data not included in training. While the mean squared error (MSE) for prediction on the training data was 0.0069, the mean MSE on the validation data was calculated as 0.0097.


From these values, it can be seen that the prediction results generally agree with the real data and that the LSTM network can recognize the characteristics of the heart movements from both the training and validation sets.
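
For completeness, the MSE values reported above correspond to the usual definition, computed here with NumPy on the normalized positions (a generic helper written for illustration, not the authors' evaluation script):

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE between ground-truth and predicted normalized positions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))
```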

4 Conclusion

In this chapter, a deep learning approach combining the YOLOv5 and LSTM methods was proposed for detecting and recognizing fetal anatomical structures such as the head, heart, body, and arm using motion trajectories. First, the anatomical structures in the dataset created from US videos were recognized with the YOLOv5 network. Then, using the signal patterns of the 2D trajectories created by the movements, the anatomical structure behind each fetal movement was predicted with LSTM deep neural networks. In this way, the classification of fetal movements from trajectory images was achieved. Although many studies use trajectory information for motion classification, few classify objects in ultrasound videos using trajectory information. In this study, it was observed that when the fetal movement trajectory information is normalized and used to train the LSTM network, the network can adapt to the movements of the organ, so that objects in ultrasound videos can indeed be classified using trajectory information.

Obtaining videos in which the anatomical structures move during fetal ultrasound scans, and collecting videos of sufficient clarity for each anatomical structure, is a difficult process. Because the fetus does not always move, waiting for suitable movements to occur during video acquisition is another challenge. Instead of using a single point of the object in the fetal videos, more movement-point data can be collected by using the other points of the bounding box, which can increase the classification performance. Using this property of the network, it may also be possible to predict abnormal motion patterns. Obtaining the movements of fetuses with anomalies is difficult, and the disorders that cause motion anomalies are rare, so collecting such fetal motion videos is also challenging; nevertheless, if ultrasound videos with abnormal movements can be obtained in practice, it is considered that these movements could be predicted. Besides, deep learning models can be used to obtain 3D scans. Generative adversarial networks (GANs) are used to create very lifelike artificial images, and it may be possible to obtain higher-resolution images of the fetal anatomical structures by applying GANs to them. Finally, converting the fetal images to high resolution with such a network could provide clearer images from ordinary ultrasound devices without the need for tedious procedures such as detailed ultrasound examinations.

Acknowledgments We would like to thank the Evliya Çelebi Training and Research Hospital of Kütahya Health Sciences University for providing the dataset used in this study and the Ethics Committee of Non-Interventional Clinical Researches of Kütahya Health Sciences University for its approval.


References

1. Salomon, L. J., Alfirevic, Z., Berghella, V., Bilardo, C., Hernandez-Andrade, E., Johnsen, S., Kalache, K., Leung, K. Y., Malinger, G., & Munoz, H. (2011). Practice guidelines for performance of the routine mid-trimester fetal ultrasound scan. Ultrasound in Obstetrics & Gynecology, 37, 116–126.
2. UN Inter-agency Group for Child Mortality Estimation (2020). A neglected tragedy: The global burden of stillbirths. Accessed 30 Oct 2022. https://childmortality.org/wp-content/uploads/2020/10/UN-IGME-2020-Stillbirth-Report.pdf
3. World Birth Defects Day 2022: Global efforts to prevent birth defects and support families (2022). Accessed 30 Oct 2022. https://www.cdc.gov/globalhealth/stories/2022/world-birthdefects-day-2022.html
4. Trends in Maternal Mortality 2000 to 2017: Estimates by WHO, UNICEF, UNFPA, World Bank Group and the United Nations Population Division (2019). Accessed 29 Oct 2022. https://apps.who.int/iris/bitstream/handle/10665/327596/WHO-RHR-19.23-eng.pdf?sequence=13&isAllowed=y
5. Dandıl, E., Turkan, M., Urfalı, F. E., Bıyık, İ., & Korkmaz, M. (2021). Fetal movement detection and anatomical plane recognition using YOLOv5 network in ultrasound scans. Avrupa Bilim ve Teknoloji Dergisi, 26, 208–216.
6. Wróbel, J., Kupka, T., Horoba, K., Matonia, A., Roj, D., & Jeżewski, J. (2014). Automated detection of fetal movements in Doppler ultrasound signals versus maternal perception. Journal of Medical Informatics & Technologies, 23, 43.
7. Deepika, P., Suresh, R., & Pabitha, P. (2021). Defending against child death: Deep learning-based diagnosis method for abnormal identification of fetus ultrasound images. Computational Intelligence, 37, 128–154.
8. You, J., Li, Q., Guo, Z., & Zhao, R. (2017). Smart fetal monitoring. In International conference on information science and applications (pp. 494–503). Springer.
9. Yaqub, M., Napolitano, R., Ioannou, C., Papageorghiou, A., & Noble, J. A. (2012). Automatic detection of local fetal brain structures in ultrasound images. In 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI) (pp. 1555–1558). IEEE.
10. Sobhaninia, Z., Rafiei, S., Emami, A., Karimi, N., Najarian, K., Samavi, S., & Soroushmehr, S. R. (2019). Fetal ultrasound image segmentation for measuring biometric parameters using multi-task deep learning. In 2019 41st annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 6545–6548). IEEE.
11. Lei, B., Zhuo, L., Chen, S., Li, S., Ni, D., & Wang, T. (2014). Automatic recognition of fetal standard plane in ultrasound image. In 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI) (pp. 85–88). IEEE.
12. Yu, Z., Ni, D., Chen, S., Li, S., Wang, T., & Lei, B. (2016). Fetal facial standard plane recognition via very deep convolutional networks. In 2016 38th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 627–630). IEEE.
13. Ishikawa, G., Xu, R., Ohya, J., & Iwata, H. (2019). Detecting a fetus in ultrasound images using grad CAM and locating the fetus in the uterus. In ICPRAM (pp. 181–189).
14. Malathi, G., & Shanthi, V. (2009). Wavelet based features for ultrasound placenta images classification. In 2009 second international conference on emerging trends in engineering & technology (pp. 341–345). IEEE.
15. Wang, L., Qiao, Y., & Tang, X. (2015). Action recognition with trajectory-pooled deep-convolutional descriptors. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4305–4314).
16. Roy, P., & Bilodeau, G.-A. (2019). Adversarially learned abnormal trajectory classifier. In 2019 16th Conference on Computer and Robot Vision (CRV) (pp. 65–72). IEEE.
17. Shi, Y., Zeng, W., Huang, T., & Wang, Y. (2015). Learning deep trajectory descriptor for action recognition in videos using deep neural networks. In 2015 IEEE international conference on multimedia and expo (ICME) (pp. 1–6). IEEE.

Automatic Fetal Motion Detection from Trajectory of US Videos Based. . .

19

18. Wang, P., Li, Z., Hou, Y., & Li, W. (2016). Action recognition based on joint trajectory maps using convolutional neural networks. In Proceedings of the 24th ACM international conference on multimedia (pp. 102–106). 19. Abdul-Azim, H. A., & Hemayed, E. E. (2015). Human action recognition using trajectorybased representation. Egyptian Informatics Journal, 16, 187–198. 20. Shi, Y., Tian, Y., Wang, Y., & Huang, T. (2017). Sequential deep trajectory descriptor for action recognition with three-stream CNN. IEEE Transactions on Multimedia, 19, 1510–1520. 21. Gao, Y., Maraci, M. A., & Noble, J. A. (2016). Describing ultrasound video content using deep convolutional neural networks. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) (pp. 787–790). IEEE. 22. Sinclair, M., Baumgartner, C. F., Matthew, J., Bai, W., Martinez, J. C., Li, Y., Smith, S., Knight, C. L., Kainz, B., & Hajnal, J. (2018). Human-level performance on automatic head biometrics in fetal ultrasound using fully convolutional neural networks. In 2018 40th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 714–717). IEEE. 23. van den Heuvel, T. L., Petros, H., Santini, S., de Korte, C. L., & van Ginneken, B. (2019). Automated fetal head detection and circumference estimation from free-hand ultrasound sweeps using deep learning in resource-limited countries. Ultrasound in Medicine Biology, 45, 773–785. 24. Dozen, A., Komatsu, M., Sakai, A., Komatsu, R., Shozu, K., Machino, H., Yasutomi, S., Arakaki, T., Asada, K., & Kaneko, S. (2020). Image segmentation of the ventricular septum in fetal cardiac ultrasound videos based on deep learning using time-series information. Biomolecules, 10, 1526. 25. Ravishankar, H., Prabhu, S. M., Vaidya, V., & Singhal, N. (2016). Hybrid approach for automatic segmentation of fetal abdomen from ultrasound images using deep learning. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) (pp. 779–782). IEEE. 26. Yaqub, M., Kelly, B., Papageorghiou, A. T., & Noble, J. A. (2017). A deep learning solution for automatic fetal neurosonographic diagnostic plane verification using clinical standard constraints. Ultrasound in Medicine Biology, 43, 2925–2933. 27. Chen, H., Dou, Q., Ni, D., Cheng, J.-Z., Qin, J., Li, S., & Heng, P.-A. (2015). Automatic fetal ultrasound standard plane detection using knowledge transferred recurrent neural networks. In International conference on Medical Image Computing and Computer-Assisted Intervention (pp. 507–514). Springer. 28. Carneiro, G., Georgescu, B., Good, S., & Comaniciu, D. (2008). Detection and measurement of fetal anatomies from ultrasound images using a constrained probabilistic boosting tree. IEEE Transactions on Medical Imaging, 27, 1342–1355. 29. Chen, X., He, M., Dan, T., Wang, N., Lin, M., Zhang, L., Xian, J., Cai, H., & Xie, H. (2020). Automatic measurements of fetal lateral ventricles in 2d ultrasound images using deep learning. Frontiers in Neurology, 11, 526. 30. Arnaout, R., Curran, L., Zhao, Y., Levine, J. C., Chinn, E., & Moon-Grady, A. J. (2021). An ensemble of neural networks provides expert-level prenatal detection of complex congenital heart disease. Nature Medicine, 27(5), 882–891. 31. Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, realtime object detection. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 779–788). 32. Chen, K., Li, H., Li, C., Zhao, X., Wu, S., Duan, Y., & Wang, J. 
(2022). An automatic defect detection system for petrochemical pipeline based on Cycle-GAN and YOLO v5. Sensors, 22, 7907. 33. YOLOv5. (2020). Accessed 11 Feb 2022. https://github.com/ultralytics/yolov5 34. Malta, A., Mendes, M., & Farinha, T. (2021). Augmented reality maintenance assistant using yolov5. Applied Sciences, 11, 4758.

20

M. Turkan et al.

35. Liu, W., Wang, Z., Zhou, B., Yang, S., & Gong, Z. (2021). Real-time signal light detection based on yolov5 for railway. In IOP conference series: Earth and environmental science (p. 042069). IOP Publishing. 36. Wang, C.-Y., Liao, H.-Y. M., Wu, Y.-H., Chen, P.-Y., Hsieh, J.-W., & Yeh, I.-H. (2020). CSPNet: A new backbone that can enhance learning capability of CNN. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops (pp. 390–391). 37. Staudemeyer, R. C., & Morris, E. R. (2019). Understanding LSTM--A tutorial into long shortterm memory recurrent neural networks, arXiv preprint arXiv:1909.09586. 38. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. 39. Sherstinsky, A. (2020). Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404, 132306. 40. Lei, C. (2021). RNN. In Deep learning and practice with MindSpore (pp. 83–93). Springer. 41. Wang, J., Yang, Y., Mao, J., Huang, Z., Huang, C., & Xu, W. (2016). Cnn-rnn: A unified framework for multi-label image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2285–2294). 42. Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780. 43. Park, K., Choi, Y., Choi, W. J., Ryu, H.-Y., & Kim, H. (2020). LSTM-based battery remaining useful life prediction with multi-channel charging profiles, Ieee. Access, 8, 20786–20798. 44. Williams, G., Baxter, R., He, H., Hawkins, S., & Gu, L. (2002). A comparative study of RNN for outlier detection in data mining. In 2002 IEEE international conference on data mining, 2002. Proceedings (pp. 709–712). IEEE. 45. Labelimg. (2022). Accessed 16 Aug 2022. https://github.com/heartexlabs/labelImg

Explainable Machine Learning (XML) for Multimedia-Based Healthcare Systems: Opportunities, Challenges, Ethical and Future Prospects

Joseph Bamidele Awotunde, Agbotiname Lucky Imoize, Abidemi Emmanuel Adeniyi, Kazeem Moses Abiodun, Emmanuel Femi Ayo, K. V. N. Kavitha, Gbemisola Janet Ajamu, and Roseline Oluwaseun Ogundokun

J. B. Awotunde (✉) Department of Computer Science, Faculty of Communication and Information Sciences, University of Ilorin, Ilorin, Kwara State, Nigeria e-mail: [email protected] A. L. Imoize Department of Electrical & Electronics Engineering, Faculty of Engineering, University of Lagos, Lagos, Nigeria Department of Electrical Engineering and Information Technology, Institute of Digital Communication, Ruhr University Bochum, Germany e-mail: [email protected] A. E. Adeniyi · K. M. Abiodun Department of Computer Science, Landmark University, Omu-Aran, Nigeria e-mail: [email protected]; [email protected] E. F. Ayo Department of Mathematical Sciences, Olabisi Onabanjo University, Ago-Iwoye, Ogun State, Nigeria e-mail: [email protected] K. V. N. Kavitha Department of Communication Engineering, School of Electronics Engineering (SENSE), Vellore Institute of Technology, Vellore, Tamil Nadu, India e-mail: [email protected] G. J. Ajamu Department of Agricultural Extension and Rural Development, Landmark University, Omu Aran, Nigeria e-mail: [email protected] R. O. Ogundokun Department of Computer Science, Landmark University, Omu-Aran, Nigeria Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_2


1 Introduction

The coronavirus (COVID-19) outbreak that began in Wuhan, China, in December 2019 has had a devastating impact on the entire biosphere. The virus is one of the most rapidly spreading contagious pathogens in recent years, posing a new threat to the global healthcare system. As of 18 August 2021, there were an estimated 209,670,370 confirmed cases globally, including 4,399,468 deaths, with 17,343,556 active cases across 213 countries, and these figures have been increasing every hour in numerous countries. Despite the exceptional speed with which vaccines against COVID-19 have been developed and the robust global mass immunization efforts, the emergence of novel SARS-CoV-2 variants threatens to undo the tremendous progress made so far in restricting the spread of this viral infection. In this global health catastrophe, medical experts and researchers are looking for new tools to track and stop the spread of the pandemic [1]. Rapid monitoring of viral infection is essential not only for healthcare professionals but also from a broader public health point of view, in order to provide appropriate patient care, treatment, and isolation for disease containment [2]. Recent smart technologies such as 5G, cloud computing, edge computing, fog computing, Artificial Intelligence (AI), and the Internet of Things (IoT) can be applied to combat the major clinical challenges associated with the COVID-19 pandemic [1, 3].

The capacity to justify decisions, suggestions, forecasts, or other actions taken by a machine intelligence system, as well as the reasoning behind them, is a crucial aspect of Machine Learning (ML) and Artificial Intelligence (AI) algorithms [1]. The idea of interpretation is intimately associated with that of explanation: if a human can comprehend how a system works, either by introspection or through a generated explanation, then that system is open to interpretation [2]. Since the majority of ML-based models are not easily interpretable, explanation is frequently a challenging undertaking. Justification is a related idea: a justification shows why a decision is sound, although it might or might not do so by outlining the specific steps that led to the decision [3]. For non-interpretable systems, justifications can be created in place of introspective interpretations.

Numerous researchers have indicated that explanation is crucial for user acceptance and satisfaction [1]. In an early study, doctors regarded the capacity to justify decisions as the most highly desired feature of a decision-support platform [4]. The authors in [5] tested three different forms of expert system explanation: trace, justification, and strategy. They discovered that justifications in particular, as well as comprehensive explanations, help users embrace the produced recommendations, and that justification (characterized as outlining the reason for each decision-making stage) was the form of explanation that changed users' perceptions of the system most effectively. Later research in many disciplines that investigated the value of explanation to users has repeatedly shown that explanations greatly boost users'


confidence and trust in a system [6–9], as well as their capacity to accurately judge the correctness of a prediction [10–12].

In the past, explanation was viewed mostly as a systems-design task and first arose in the context of rule-based expert systems (i.e., the challenge of creating a system that can produce justifications and explanations for its decisions). The need to explain the decisions made by expert systems has been discussed since the 1970s [13]. A framework for developing expert systems with explanation skills was presented by the authors in [14], which was among the first to emphasize the significance of explanations that offer not just evidence but also reasoning. Such frameworks utilized a distinct strategic knowledge base alongside a domain-specific knowledge base, and were built solely for rule-based systems. Similarly, the authors in [15] further divided the knowledge into three layers by adding a communication layer to the previously mentioned domain and strategic layers; by splitting the communication layer from the rest of the system, the design made it possible for a communication specialist to develop explanations independent of the particular system and domain.

Early work on explanation in the ML literature frequently concentrated on creating visualizations of a prediction to help ML specialists assess the accuracy of the model. Nomograms are a very popular visualization approach: the authors in [16] first applied them to logistic regression models in 1978, the authors in [17] applied them to Naive Bayes in 2004, and the authors in [18] applied the approach to Support Vector Machines (SVMs) in 2005, among other models. A visualization-based explanation approach for Naive Bayes classifiers was put forth by the authors in [19]. The hidden states of neural models have also been the subject of visualization techniques [20]; Convolutional Neural Networks (CNNs) are particularly notable for their use in image classification [21, 22], and Recurrent Neural Networks (RNNs) for applications in Natural Language Processing (NLP) [23–25].

Besides visualization, two major explanation strategies have been the subject of investigation. The first is prediction interpretation and justification, where a (usually hard-to-understand) model and its forecast are given and the prediction requires justification. The second approach, interpretable models, seeks to create models that are naturally interpretable and whose decisions can be justified.

The application of explainability is a big component of developing and raising systems' transparency [26, 27], and cultivating trust is frequently the best strategy for enhancing system transparency [28]. For instance, it is crucial to learn enough about how a system behaves in order to identify previously unknown weaknesses and defects, and to prevent unintended behavior. Researchers contend that explainability is essential in order to assess the genuine capability of a healthcare system. Understanding the internal dynamics of an AI-based system is key, especially from the perspective of the developers designing it. It is also highly important to study the outcomes in order to enhance the process. In other words, explainability can contribute to improving ML-based model results. Improvement is therefore a secondary goal that can be achieved by


Fig. 1 Comprehensive objectives of an explainable AI-based model

using XML approaches. The overall goals of Explainable Machine Learning (XML) are shown in Fig. 1.

While ML-based models do not come with the same level of predetermined understanding, built-in ML-based explanations can be used to find correlations and links in data. According to experts, the objective of XML is to produce deep knowledge by learning from the behavior and results of the algorithm [29]. ML-based algorithms are increasingly used in situations where a human error could have catastrophic results. Although there are several methods for explaining how an ML system works internally, each one has its own advantages and disadvantages [30]. The intended audience and use case are two additional factors that determine what constitutes an outstanding explanation [31].

The question of whether explainability should be a prerequisite for ML-based models that support medical expert systems has generated controversy among researchers. Notwithstanding substantial evidence to the contrary, there are a number of strong arguments against explainability as a virtue that is beneficial, acceptable, desired, and even necessary [32, 33]; there are compelling arguments against this conventional opinion as well [34, 35]. According to the High-Level Expert Group on AI of the European Commission (AI HLEG), transparency is one of the primary prerequisites for trustworthy ML-based algorithms. On this view, explainability is just one of several methods used to determine how transparent an ML-based model is; additional safety measures include accurate documentation of datasets and algorithm code, as well as frank discussion of a system's advantages, disadvantages, and weaknesses [36]. Some AI experts disagree with the AI HLEG's view that medical expert systems should always be trusted because the group values explainability. However, AI


HLEG says that measures should be taken to incorporate additional accessibility and transparency aspects into algorithms that do not already have them [37, 38]. In general, humans do not support procedures that are not transparent, understandable, intuitive, interpretable, maintainable, or reliable [39], which highlights the requirement for ethical ML-based models [40]. Although it is a common fallacy, focusing solely on performance will not always lead to greater results; the systems will only become more complex as a result. It should be highlighted that model performance and transparency often have an antagonistic relationship [41]. Nevertheless, a deeper comprehension of a system can help to remedy its deficiencies. When comprehensibility is used as a design driver, three primary factors can help the design and application of AI and ML-based models:

(a) By identifying and reversing distortion in the training dataset, interpretability contributes to ensuring objectivity in selection.
(b) By highlighting possible threatening perturbations that could lead the prediction to change, interpretability promotes resilience.
(c) Interpretability may serve as a safeguard to guarantee that the model explanation reflects real underlying causation, or that only pertinent variables are used to reach the conclusion.

For a system to be regarded as viable for the aforementioned reasons, its interpretation must provide a visual depiction of the model's decision rules, a knowledge of the model's processes and assumptions, or ideas as to what would disrupt the model [42]. Therefore, this chapter discusses the numerous research opportunities and constraints encountered, as well as the application of explainable ML in healthcare expert systems and medical operations. Moreover, a framework is proposed for the prediction of various diseases using explainable AI.

1.1 The Following Are the Significant Contributions of This Chapter

1. Examining the arguments for and against the explainability of expert systems in the larger healthcare sector, the chapter looks at some of the key XML application areas.
2. The prospects and significant challenges of XML for healthcare systems in general are discussed.
3. A framework based on explainable ML is suggested for the early prediction of various diseases.

1.2 Chapter Organization

The remainder of this chapter is organized as follows. Section 2 presents the types of multimedia data in healthcare systems. Section 3 discusses explainable machine learning for multimedia data in healthcare systems. Section 4 presents the challenges of explainable machine learning in healthcare systems. Section 5 discusses an effective explainable machine learning framework for healthcare systems, Section 6 presents the research prospects and open issues, and finally, Section 7 concludes the chapter with future directions.

2 Multimedia Data in Healthcare Systems

Texts, photos, videos, and audio are examples of the diverse types of data that can be combined to create multimedia data. Data from each of the modalities can be extracted using multimedia data-extraction techniques to enhance application performance [43]. Modality describes a particular method of encoding information. Multimedia data capturing various angles on a physiological object provide additional, complementary information for analysis and interpretation. For instance, in speech recognition, visual modalities concerning lip and mouth motion provide crucial information in addition to the audio data [44]. Multimedia representations naturally pick up characteristics from several modalities when there are correlations and relationships between those modalities [45].

Multimedia data are produced every day as a result of booming online services and cellular technology [46]. The availability of these data has made the creation of various healthcare applications possible. However, given the challenges associated with processing, gathering, managing, and storing such a large amount of data, multimedia big data analytics opens new research areas [47]. Advances in multimedia data research are tied to the development of computing clusters, new hardware innovations, and new data-processing methods. Multimedia analytics examines how to comprehend and visualize many sorts of data to address problems in practical applications [46].

The characteristics of multimedia data include heterogeneity, lack of structure, and size. It draws on information gathered from a variety of sources, including sensors, cameras, and social networks, to mention a few. Due to its heterogeneity, the data must be transformed into a specific format in order to be analyzed and evaluated [48]. Many different types of data are used in healthcare applications, including patient records, medical images, doctors' notes, and radiographic films [49, 50]. Since features can be derived dynamically from data without human interference and can reflect several degrees of abstraction, ML-based techniques are a promising answer for multimedia data [51]. Additionally, they address the difficulty of modeling multiple variables at once by incorporating heterogeneous multimedia data in order to increase accuracy [52].


Multimedia representation's primary goal is to close the distribution gap in a shared feature space while maintaining modality-specific properties. According to the authors in [53], there are three different types of multimedia representation techniques: joint representation, coordinated representation, and encoder-decoder. Joint representation projects all the unimodal representations into a single, shared global subspace by combining all the multimedia elements: a separate neural network encodes each modality, and the encodings are then mapped to a shared subspace where commonalities are extracted and combined into a single vector. Coordinated representation, conversely, increases relationships or similarities using cross-modal similarity models; it develops coordinated representations for each modality under a constraint, thus aiding the preservation of each modality's unique qualities. The encoder-decoder method, which maps one modality to another, learns an intermediate representation. It is made up of an encoder, which maps the source modality into a vector, and a decoder, which uses the vector created by the encoder to create a new sample in the target modality [54].
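As a rough sketch of the joint-representation idea (not an implementation from any of the cited works), the following PyTorch snippet encodes two modalities with separate networks, maps them into a shared subspace, and fuses them into a single vector; the class name and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class JointRepresentation(nn.Module):
    """Project two modality-specific encodings into one shared subspace."""

    def __init__(self, img_dim=512, txt_dim=300, shared_dim=128):
        super().__init__()
        # One encoder per modality, as in the joint-representation scheme
        self.img_encoder = nn.Sequential(
            nn.Linear(img_dim, 256), nn.ReLU(), nn.Linear(256, shared_dim))
        self.txt_encoder = nn.Sequential(
            nn.Linear(txt_dim, 256), nn.ReLU(), nn.Linear(256, shared_dim))
        # Combine the shared-space encodings into a single fused vector
        self.fusion = nn.Linear(2 * shared_dim, shared_dim)

    def forward(self, img_feat, txt_feat):
        z_img = self.img_encoder(img_feat)  # image features -> shared space
        z_txt = self.txt_encoder(txt_feat)  # text features -> shared space
        return self.fusion(torch.cat([z_img, z_txt], dim=-1))

model = JointRepresentation()
img = torch.randn(4, 512)   # e.g., CNN embeddings of medical images
txt = torch.randn(4, 300)   # e.g., vector summaries of clinical notes
joint = model(img, txt)     # shape (4, 128): one fused vector per sample
```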

2.1 Types of Multimedia Presentations in Healthcare Systems

2.1.1 Audio

Real-time audio (for example, speech) analysis applications are necessary for medical equipment in the healthcare industry. The goal of audio analysis is to glean useful information from unstructured raw audio data [55]. Sounds from the human body, including the heartbeat, speech, breathing, and the digestive system, can be recorded with the aid of wearable technology [56]. For instance, Mel-frequency cepstral coefficients, voice strength, and pitch can all be extracted using openSMILE [57].
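The chapter cites openSMILE for this kind of extraction; as a hedged alternative sketch, the snippet below computes comparable features (MFCCs and a pitch track) with the librosa library, using its bundled demo clip as a stand-in for a real body-sound recording.

```python
import librosa

# Demo clip shipped with librosa; in practice, substitute a real recording
# such as a stethoscope or speech capture (illustrative assumption).
y, sr = librosa.load(librosa.example("trumpet"))

# 13 Mel-frequency cepstral coefficients per frame, a standard speech feature
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Fundamental-frequency (pitch) track estimated with the YIN algorithm
pitch = librosa.yin(y, fmin=65, fmax=2093, sr=sr)

print(mfcc.shape, pitch.shape)  # (13, n_frames) and (n_frames,)
```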

2.1.2 Visual Data

Videos and photos (i.e., series of images) are examples of visual data. The overwhelming volume of information is a major barrier for visual data, for which big data solutions are necessary [58]. The rapid development of ML for processing visual data is one of the biggest technological advances in recent years, and among all the elements that influence this evolution, labeled datasets are essential; many such datasets are routinely utilized in ML-based research and in the analysis of various solutions. ML-based object recognition is a key component in many systems, including autonomous vehicles. ML-based models can be used to examine data whose underlying patterns are challenging to convey with mathematical rules, although large volumes of data are frequently needed because of how complicated ML-based models are. Among all ML-based algorithmic success stories, one of the most notable advancements is the technology for identifying objects in pictures and videos. Large datasets, one of many aspects that


go into this, are quite important. To train and test ML-based models and achieve success in computer vision using innovative architectures, labeled visual datasets are used, with examples such as Faster R-CNN [59], FCIS [60], and AlexNet [61].

2.1.3 Video

Applications for video data include, but are not limited to, fraud detection, surveillance, healthcare, and crime detection. Video comprises a series of images that must be examined individually in order to extract crucial information [62]. Numerous imaging technologies are employed in smart healthcare, including X-ray, Optical Coherence Tomography (OCT), Magnetic Resonance Imaging (MRI), CT imaging, Positron Emission Tomography (PET), microscopy images, and functional MRI (fMRI), among others [63, 64]. Medical professionals employ these techniques for both treatment and diagnosis [65]. Since each time step's input is an image, the same approaches used for photographs can also be utilized for videos. ML-based features are utilized in addition to handcrafted features; for instance, OpenFace enables the extraction of head pose, eye gaze, and facial landmarks [66]. However, because visual data are so large, performing analytics research in the aforementioned areas requires high-performance computing and technologies like cloud computing [67].

Medical images, as first-hand evidence, reflect a patient's condition and make it possible to identify organ diseases and pathologies. Ophthalmic imaging, for instance, can be used to evaluate the eyes [68]; MRI images diseases of the brain, heart, bones, and joints [69]; CT images the chest and internal organs [70]; and X-rays are used for the chest and breast [71]. Medical images can be used in ML-based techniques since they have a vector format and 2D or 3D pixel measurements, such as the employment of CNNs and multilayer neural networks to identify diseases, segment images, and forecast disease [72].

2.1.4 Text

Web pages, social media feeds, and metadata are all examples of text-based multimedia data. Text data can be structured or unstructured. Structured data are analyzed using database query retrieval methods; unstructured data must first be converted into structured data in order to be examined [73]. One healthcare application is emotion analysis from text and multimedia data [74]. Information Extraction (IE), sentiment analysis, summarization, and Question Answering (QA) are examples of text analysis approaches [75]. SparkText is a text-mining system for big data text analytics on biological data [76]. ML-based algorithms can be used to evaluate electronic health records (EHRs) and to forecast diseases or complications in order to enhance the quality of medical care. EHRs include a variety of clinical data formats, including name, sex, home address, and phone number, among others, thereby necessitating a suitable fusing


method. Discharge summaries can be found in EHRs (e.g., results of laboratory testing, medical histories, diagnoses made by doctors, and therapies, to name a few), as well as measurements, death certificates, and reports [77].
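As a minimal, hypothetical sketch of this kind of text-based prediction (the note snippets and labels below are invented purely for illustration), TF-IDF features can be combined with a linear classifier:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented discharge-summary snippets with binary complication labels
notes = [
    "patient presents with shortness of breath and wheezing",
    "routine follow-up, vitals stable, no acute distress",
    "fever, productive cough, chest x-ray shows infiltrate",
    "annual physical, labs within normal limits",
]
labels = [1, 0, 1, 0]  # 1 = complication documented

# Unigrams and bigrams turn free text into a structured feature matrix
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, labels)
print(clf.predict(["persistent cough and low-grade fever"]))
```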

3 Explainable Machine Learning for Multimedia Data in Healthcare Systems

The use of ML to solve medical problems has increased significantly over the past few years, and multimedia and medicine have long been seen as natural partners for ML-powered systems [78]. A few significant factors explain why ML has become more and more popular in the medical industry: the rapid growth of health data available for analysis, including images, videos, and sensor data; the development of effective algorithms; and powerful hardware [79]. ML has been used, among other things, to find polyps in the digestive tract [80, 81], to categorize skin cancer [82], and to predict surgical hypoxemia [83]. Manually reviewing and interpreting medical data can take a lot of time, and it is usually best left to a qualified medical professional; additionally, the interpretations can be subjective and dependent on the particular operator. Hence, ML may be helpful, as it paves the way for an autonomous, reliable, and efficient examination of medical data. DL-based models can be an effective method when there is a lot of medical data [84]. Convolutional neural networks (CNNs), for instance, have proved successful in the medical industry for the analysis of images and videos [85]. These algorithms, unfortunately, are extremely complicated and difficult for people to understand. Lack of interpretability can be a challenge when developing reliable systems, and this has been acknowledged as a difficulty when it comes to implementing and using ML models in medicine and healthcare [86, 87]. XML seeks to address this issue by explaining the ML models and their predictions [88]. Applying XML methodologies is necessary for healthcare professionals to trust and choose to employ ML models in their medical practice [89].

For our purposes, it is helpful to distinguish between intrinsic and extrinsic XML methods. Intrinsic explanations aim to describe the model's internal workings, whereas extrinsic explanation techniques present the model's behavior and treat it like a "black box" [90]. Global explanations seek to characterize the model as a whole, while local explanations concentrate on explaining specific predictions [91]. An established extrinsic XML approach is Shapley additive explanations (SHAP) [92]. Because the technique is model-agnostic, it can be used with any ML model; SHAP works with any type of data and gives local model explanations [93]. For ML-based image-processing models, intrinsic gradient-based techniques, such as Grad-CAM [94], are dominant [95].

The XML field is still in its early stages, so there is still room to improve the explanation techniques. Because ML model judgments in a healthcare setting may impact therapies and consequently patient outcomes, the models must be highly


trusted by healthcare professionals. Healthcare practitioners could decline to employ models that are too difficult to comprehend. The application of ML should ultimately result in quicker and more accurate diagnoses as well as better patient outcomes and quality of life.
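To make the SHAP idea above concrete, here is a minimal sketch on a public tabular dataset; since the `shap` API surface has changed across versions, treat this as an assumption-laden example rather than a definitive recipe.

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# TreeExplainer computes Shapley values efficiently for tree ensembles.
# Each value is one feature's contribution to one prediction, i.e., a
# local explanation of a single patient's model output.
explainer = shap.TreeExplainer(model)
explanation = explainer(X_test.iloc[:5])  # recent versions return an Explanation
print(explanation.values.shape)           # per-sample, per-feature contributions
```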

3.1 A Classification of Techniques: Various Interpretability Scopes for Machine Learning

Researchers use the concepts explainability and interpretability interchangeably; although these two notions are very closely related, several works highlight their distinctions. There is no precise mathematical definition of interpretability or explainability, nor have they been measured by some metric. Efforts have nevertheless been made [96, 97] to define these two terms, as well as associated ideas like understandability; all of these formulations, however, lack mathematical formality and rigor [33]. Doshi-Velez and Kim's concept of interpretability is one of the most widely used: they define it as "the ability to explain or to present in words that a human can understand" [96]. Miller's definition of interpretability as "the degree to which a human may understand the cause of a decision" is another widely used one [98]. These definitions, while intuitive, lack mathematical formality and rigor [33].

Based on the foregoing, interpretability is mostly related to the reasoning behind a model's results [33]: the more interpretable a ML-based system is, the simpler it is to find correlations between causes and effects in the system's inputs and outputs. For instance, in image recognition tasks, a system may determine that a particular object is present in an image (output) because of certain prominent patterns in that image (input). Explainability, on the other hand, is connected to the internal mechanics and logic of a ML-based system: the more explainable the model, the greater the human knowledge of the internal processes that take place while the model is learning or making judgments. A model being interpretable does not necessarily mean that people can comprehend its fundamental logic or underlying operations; in the context of ML-based systems, interpretability does not prima facie imply explainability, and vice versa.

When it comes to examining the rapidly evolving field of interpretability methodologies, various viewpoints exist; examples include the kind of data these methods work with, or whether they refer to global or local characteristics. ML-based interpretability techniques should not be categorized in a one-sided way; various points of view set these procedures apart and may subdivide them further. Therefore, it is important to analyze all features of each technique so that a clinician can choose the best approach for the unique requirements of each challenge they meet.


Fig. 2 Machine learning interpretability techniques taxonomy overview map

A particularly significant division of interpretability approaches is based on the varieties of algorithms to which they may be applied. If their use is limited to a particular family of algorithms, we refer to these techniques as model-specific; in comparison, model-agnostic techniques are those that can be used with any algorithm. The scope of interpretation is another essential component distinguishing interpretability techniques: if the procedure offers an explanation only for one particular occurrence, or explains only a portion of the model, it is a local one; otherwise, it is a global one. The kind of data to which these procedures can be applied is the last important distinction; images and tabular data are the two most popular categories, but several approaches for text data also exist. Figure 2 displays a streamlined mind-map illustrating the various criteria that could be used to classify an interpretability technique. Clinicians should always take these factors into consideration so that the best approach for meeting their needs can be found. This classification focuses primarily on the goal these strategies were designed to achieve and the means by which they do so. As a result, the provided taxonomy identifies four key areas for interpretability techniques: strategies for illuminating complicated black-box models; white-box model creation techniques; techniques that encourage equity and limit the presence of discrimination; and methods for analyzing the robustness of predicted results.
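One simple technique that is both model-agnostic and global in the sense of this taxonomy is permutation importance: shuffle one feature at a time and measure the drop in held-out performance. The sketch below runs it on a public dataset; the model and dataset are illustrative choices, not the chapter's.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic and global: any fitted estimator can be probed this way,
# and the result describes the model as a whole, not a single prediction.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])
for name, imp in ranked[:5]:
    print(f"{name}: {imp:.3f}")
```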


4 The Challenges of Explainable Machine Learning in Healthcare Systems

Researchers are working on XML tools and strategies to create reliable and secure models for healthcare contexts. Despite their best efforts, a number of problems still make successful XML difficult; several of these difficulties are detailed in this section.

No formally defined terms: There is no formal definition for the explanation of a model's structure or decisions; explanation is instead defined in light of the issue at hand. The same is true for XML applications in the healthcare industry. Additionally, concepts like feature relevance, feature importance, saliency maps, and heatmaps need to be defined, due to the inconsistent use of this terminology [99].

4.1 Lack of Standardized Requirements for XML

Researchers have created some preliminary recommendations on what makes a good XML model; these rules are nevertheless general. Animal image identification will have distinct requirements for justification than medical image tagging. There are currently no standards for creating, measuring, or testing justifications in the medical XML field. Such criteria are necessary to provide clearer and more organized methods for explaining how black-box models identify or predict a specific disease [99, 101].

4.2 Unstandardized Representation Techniques

All graphical representation explanations produce saliency maps or heatmaps, highlighting the regions of images that are most involved in predictions. How radiologists or neurologists should read these maps, however, is not yet standardized. It is also unclear how the explanations will be interpreted by the end user (a patient or a physician), and it could be challenging for new or inexperienced doctors to comprehend the language of explained outcomes. Medical professionals may likewise be unable to comprehend the listed risk factors and predicted probabilistic explanations [102, 103]. A platform interconnecting XML researchers and medical specialists is necessary for them to collaborate and create standardized explanatory representations [104]. Quantifying the amount of information needed to make a decision intelligible to non-technical end users like patients presents another challenge, and it is also crucial to gaining their confidence in these technologies.
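For reference, a bare-bones version of the Grad-CAM-style heatmaps discussed here can be written with PyTorch hooks. This is a sketch under stated assumptions: an untrained ResNet-18 stands in for a diagnostic model, and a random tensor stands in for a preprocessed scan.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Random weights keep the sketch self-contained; a real system would load
# a network trained on the medical task in question (assumption).
model = models.resnet18(weights=None).eval()

store = {}

def save_activation(module, inputs, output):
    store["act"] = output                                 # last-block feature maps
    output.register_hook(lambda g: store.update(grad=g))  # and their gradients

model.layer4[-1].register_forward_hook(save_activation)

x = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed image
model(x)[0].max().backward()      # gradient of the top class score

# Grad-CAM: channel weights are spatially averaged gradients; the heatmap
# is the ReLU of the weighted activation sum, upsampled to the input size.
w = store["grad"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * store["act"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
print(cam.shape)  # (1, 1, 224, 224) saliency heatmap over the input
```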

4.3 What Clinicians Expect: Explainability vs. Accuracy

It is a well-known issue with XML that simple ML models produce results that are less accurate but easier to explain, while complicated DL models, due to their complex non-linear structures, give more accurate findings with fewer justifications. This problem is not unique to healthcare XML. However, because medical data are multidimensional, DL algorithms whose precise outcomes come with poorly explained results, or with explanations that are algorithm-centric, must be avoided. Designing intrinsically explainable methods that can deliver reliable findings on intricate medical data is one approach to solving this issue [105]. The alternative solution involves taking the end users' preferences into account.
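The trade-off can be seen in miniature with a quick comparison on a public tabular dataset (an illustrative setup, not a medical benchmark): a linear model whose coefficients a clinician can inspect directly versus a harder-to-interpret ensemble.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

candidates = [
    ("logistic regression (interpretable)", LogisticRegression(max_iter=5000)),
    ("random forest (opaque ensemble)",
     RandomForestClassifier(n_estimators=300, random_state=0)),
]
for name, clf in candidates:
    acc = cross_val_score(clf, X, y, cv=5).mean()  # 5-fold mean accuracy
    print(f"{name}: {acc:.3f}")
```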

4.4 What and How of the Results Explained

Recreated images are produced from feature maps of healthcare image data and contain emphasized regions that stand in for features recognizable by humans. But open questions remain: what to do with these imperfectly reconstructed pictures; how to ensure that the feature combinations revealed by the XML are resistant to disruptions; and how investigators might use the internally marked parameters to recover input data that has not yet been taken into account. Reverse image analysis will aid in analyzing complex medical data. By using this technique, practitioners can better comprehend the underlying mechanisms of numerous fatal illnesses such as human immunodeficiency virus (HIV), various cancers, COVID-19 [106], Zaire Ebola [107], and many other infectious diseases.

4.5 Security and Privacy Issues

Despite the state-of-the-art performance of ML, and specifically of DL-based approaches, the weaknesses of these systems against adversarial ML attacks have been emphasized in numerous recent studies [108]. Furthermore, ML/DL-based healthcare platforms have previously been the target of such attacks [109]. Beyond adversarial ML, there are numerous security issues that make it difficult to implement ML/DL in real-world clinical contexts; a thorough summary of these difficulties can be found in [110]. These difficulties pose numerous questions regarding the security of ML/DL-enabled systems; hence, the development of trustworthiness and transparency in ML/DL-enabled medical solutions depends on the robustness of ML/DL models, since an ML/DL system's outstanding functionality alone cannot vouch for its safety, which essentially determines how safe the ML/DL-enabled technology is for people and patients. On the other hand, it is equally crucial that patients and

physicians alike have confidence in ML/DL-based diagnosis, prediction, monitoring, and treatment.

4.6 Verification of Explanations

The metrics used to verify the caliber of the explanations offered are insufficient. A particular issue is the absence of a metric for comparing explanations produced by different techniques. For instance, different XML methodologies have been used to describe the detection of glioma tumors, but no one has compared which approach led to a more accurate explanation of tumor detection. Similarly, clinicians may require different measures for the validation of stated outcomes depending on the medical application. There is no accepted way to evaluate how well healthcare decisions are conveyed, nor any way to choose which of the various explanations generated by the same procedure should be favored [111].

4.7 Ethical Restrictions

The ethical balance between end users and XML must be maintained in black-box model explanations to win the trust of clinicians and patients. An explanation, in particular, needs to be thorough and should not lead the end user astray [112]. To improve fairness and dependability, XML should provide an explanation for any errors in the results. Regrettably, there are no standards for judging how accurate and thorough an explanation is, and without these safeguards, the use of XML in therapeutic contexts could have negative effects. As data reconstruction from explanations can be misused, it is also ethically required to examine how the explanations affect the dignity and wellbeing of patients [113].

4.8 Lack of Theoretical Knowledge

Applied DL for medical applications lacks the theoretical foundations for dealing with data randomness. Field professionals have attempted to close this gap by using mathematical methods for dealing with the random errors and noise prevalent in medical data. However, given the lack of reliable fundamental rules and models, the necessary scale cannot be reached in the explanation of DL. These problems are likewise hampering the development of self-explaining, universal DL for clinical applications [8]. Additionally, the DL's "black-box" character presents a significant obstacle to building credibility [114].

4.9 Absence of Cause

DL is designed to produce exact outcomes by discovering the obscure patterns that generate data. The issue emerges when these approaches are used for medical tasks where decisions should be made based on causal relationships. DL is ineffective at inferring causal relationships between decisions and data; as a result, insufficient results are produced, leading to poor or incomplete explanations. Additionally, XML should respond to cause-and-effect scenarios, i.e., if a physician substitutes treatment C with treatment D, the model's conclusion will switch from A to B [115]. These causal connections are necessary for making ethical decisions. The necessity of a strong correlation between images and their annotations was also stressed by the authors in [116].

5 An Effective Explainable Machine Learning Framework for Healthcare Systems

It is now clear that black-box models must be explainable in order to produce fair and reliable clinical decisions, and techniques to construct explainable models are currently being developed by researchers. However, there are still numerous ways to advance the field of XML in healthcare systems. In this section, we outline a process for making data-driven medical applications understandable, and we discuss why explainability is important for algorithms at every level, from design to clinical use.

1. Explaining the data: ML approaches study patterns of data to make decisions, revealing the hidden parts of the data. Misleading results are those caused by data bias, subjectivity, redundancy, or problems with data representation. To obtain outcomes that are reliable and equitable, we should start with the explanation of the data. As an illustration, consider the work of the authors in [117], who created classifiers to categorize pneumonia patients as having a high or low probability of dying in the hospital. According to the best model they could find, a patient with asthma who is admitted for pneumonia has a low chance of in-hospital mortality; the exact opposite, though, is true. After further investigation, they discovered the reason: asthma patients admitted with pneumonia were treated more quickly than those without asthma, resulting in greater survival. Data on people who were refused medical care because they lacked health insurance can serve as another illustration: the findings produced by ML will be biased if it learns from such data. The same is true for data leakage, which can cause model learning and testing to be inaccurate [118]. Researchers must create a data explanation approach that examines all of the target's dependency relationships in order to avoid these issues.

2. Defining the black-box's structure: The black-box explanation challenge can be further categorized into two groups. The first category is explaining


Fig. 3 The process for deriving explainable machine learning for black-box models

the logic of the black-box in a human-understandable way (model-based explanation); describing the input-output relationships that the model uses to make decisions falls under the second category (explanation of findings) [119]. Model-based explanation techniques have been widely researched and applied in the field of healthcare. When it comes to logic learning, these models very closely mimic the behavior of black-box models and offer global interpretability. Due to their straightforward structure, several ML approaches, such as decision trees and random forests, are naturally explainable (a small code sketch at the end of this section illustrates this white-box route). Nevertheless, many black-box models need explanations from surrogate models that replicate their results.

3. Describing and explaining the findings and results: For certain non-technical medical end users, deciphering a model's structure and logic can be challenging. In this instance, the only information that can be useful is the justification for the model's choice, which typically discusses the importance of each feature for the output. In contrast to the global explanation needed for generality, a local explanation suffices for a single patient.

4. Evaluating the impact and effectiveness of explanations: Evaluating explanations is a challenging undertaking, since explainability is a non-monolithic and subjective notion. We can determine neither the optimal method for measuring an XML approach's quality nor the degree to which a model can be explained. Despite the increasing amount of research on the subject, few scholars have concentrated on the issue of evaluating XML. The following are some strategies that healthcare researchers have adopted for evaluation; these methods can be used for more than just evaluating healthcare XML. The processes necessary to explain the black-box models are shown in Fig. 3.


(a) Application-based evaluation: Integrate the explanation into the solution or application and have the final user, who is typically a subject-matter specialist, test it. This method aids in testing the explanation in actual use-case circumstances. Consider ML-based software used to annotate medical data, which labels the unhealthy areas of the data. The physician would test the annotation software in the clinical application to assess the model: the physician can justify the same choice and assess the utility of the data annotation software and its justification.

(b) Human-based assessment: This method is comparable to application-based evaluation. The primary distinction is that it may be carried out without an expensive experimental setting or a domain expert. The explanations can be tested with non-experts, and the availability of a large number of lay testers makes it easier to generalize the results. The authors in [120] used this evaluation strategy to assess explanations that included both text and visual data.

(c) Function-based evaluation: This method does not involve people in the process (layperson or domain expert). It works properly when human-based or application-based reviews have already been completed.
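As a small illustration of the white-box route in step 2 above, the sketch below fits a deliberately shallow decision tree and prints its rules in plain text; the dataset and depth cap are illustrative assumptions, not part of the proposed framework.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# A shallow tree trades some accuracy for rules a clinician can read
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```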

6 Research Prospects and Open Issues

It is obvious why strong, explainable ML/DL techniques should be used in the healthcare industry. In this section we discuss some potential areas for further research. Based on our analysis of the pertinent literature, we outline opportunities and obstacles for further research in the area; these conclusions are drawn from the reviewed work, from a comparison of all the publications studied, and from the problems raised in each piece.

There is a significant opportunity for improving explainable models when Semantic Web technologies and ML are combined. The identification of knowledge-base items and their matching with ML-based data, which has been referred to as knowledge matching [121], is one major issue that requires resolution in further study. In particular, trustworthy and automated knowledge-matching techniques are needed; for knowledge matching to be effective and efficient, more study in this area and adjacent areas like semantic annotation is required. Explainable reinforcement learning and clustering also require additional research. Additionally, we observe that work across various tasks and specialties still feels rather isolated in this context, even though methods for merging different disciplines are provided by ideas like linked data [122].

Given that explanations are a type of social engagement [33], their clarity and comprehensibility as perceived by the user substantially influence their effectiveness and quality. In other words, the value of an explanation depends on how well the user can comprehend it. We have demonstrated in this review that the appearance and form of explanations vary widely among contemporary systems, and a large number


of those don’t offer explanations in plain language. Therefore, we think that the study of natural language processing (NLP) in general, and the development of natural language (NLG) in particular, provides a useful place to start. In order to provide the user with the most value, we also think that explanations ought to be flexible and participatory. Users using structured knowledge bases might be able to examine and engage with many types of explanations. For instance, the user could look through many explanations, may delve deeper into an explanation to uncover more precise factors that supported a forecast. Regarding the actual form of engagement, there does not appear to be any agreement, nevertheless. In order to display and engage with explanations in the best possible ways, hence, findings from a wider range of study topics must be included in upcoming studies. It is crucial to deal with issues like adversarial ML attacks in order to get the comprehensible, reliable, secure, and robust ML/DL approaches. It has been demonstrated in recent years that ML/DL algorithms can be easily tricked to produce desired results [123]. The crucial nature of application scenarios gives hostile actors good reason to discredit the ML/DL-based system and achieve their objectives. Numerous adversarial ML attacks have previously been put forth in the literature, and very little research has been done to build effective countermeasures [124, 125]. This demonstrates the urgent need for antagonistically robust ML/DL algorithms to be developed. Furthermore, overcoming obstacles like hostile ML attacks is necessary in order to fully realize the clinical benefit of ML/DL developments. The development of ML/DL approaches holds considerable promise for the future of healthcare. But in order to truly profit from these developments, problems like ethical ones must be adequately handled. A few research in this area recommended including all types of stakeholders in the development of ML/DL methods. It could comprise, to name a few, medical professionals, decision-makers, data scientists, ML researchers, and hospital employees [126, 127]. Collaboration amongst knowledge experts will be made possible by such a multidisciplinary research team (physicians and ML researchers, for example) and healthcare service providers who will ultimately boost output and results. Latent variables that are discovered from the data are the foundation of ML approaches’ judgments. One of the most challenging types of data to manage is medical data, which is complicated, multi-variate, occasionally non-stationary, and sparse. Due to the issue of latent-variables’ dependence on one another, which might result in false patterns, the decision-making based on ML will be deceptive. Before developing a model, the data should, according to the literature, be carefully examined to make sure they are acceptable for the issue being represented [128]. Furthermore, it’s crucial to comprehend how and why this healthcare data was gathered. Additionally, dealing with data bias is quite difficult and can eventually result in algorithmic bias [129]. These biases are challenging to eliminate, and doing so can have unanticipated effects on the outcomes [130]. Model trustworthiness is decreased by the inclusion of these small inefficiencies in medical data, especially when they are not adjusted during model construction [131]. Consequently, it is imperative to first explain the reliance and significance of data variables and patterns


in order to design understandable, dependable, resilient, and trustworthy algorithms.

7 Conclusion and Future Directions

Due to technological advancement, this contemporary era has seen the development of numerous new technologies, including cloud computing, edge computing, quantum computing, ML, DL, the Internet of Things, and blockchain. With the aid of these technologies, people can live pleasantly and without difficulty; technology now conserves the environment while assisting humans, wasting as little of the finite available resources as possible. However, these technologies come with problems and challenges that make them difficult to use in many fields, especially in healthcare systems. In healthcare systems, ML-based models are particularly helpful for the diagnosis and treatment of a variety of infectious diseases. Because of the complexity of existing ML-based models, it is challenging for humans to understand the precise decision-making process during interpretation. The inner structure of most models is hidden and can only be examined through the connection between input and output attributes; they are usually referred to as "black-box algorithms" because of this. Explainability is a crucial property that ML-based models must possess to reach a level of adaptability and applicability. The majority of ML techniques are data-restricted and can only offer explanations based on the knowledge included in the training data. Therefore, this chapter has presented the applicability, challenges, prospects, and future directions of XML models in the context of healthcare systems, and has proposed a framework for an XML-based model for the diagnosis, prediction, and forecasting of various diseases.

The abundance of studies on XML-based approaches in recent years has highlighted their advantages and shown that there is still room for improvement and potential for better development. These techniques can improve currently used ML-based models and hence increase their acceptance in healthcare systems. The literature also shows how much these approaches still lack in terms of achievement, with defects and shortcomings. To formalize and quantify how much information integration enhances explainability, more study is required.

References 1. Biran, O., & Cotton, C. (2017, August). Explanation and justification in machine learning: A survey. IJCAI-17 Workshop on Explainable AI (XAI), 8(1), 8–13. 2. Awotunde, J. B., Adeniyi, E. A., Ajamu, G. J., Balogun, G. B., & Taofeek-Ibrahim, F. A. (2022). Explainable artificial intelligence in genomic sequence for healthcare systems prediction. In Connected e-Health (pp. 417–437). Springer.


3. Abiodun, K. M., Awotunde, J. B., Aremu, D. R., & Adeniyi, E. A. (2022). Explainable AI for fighting COVID-19 pandemic: Opportunities, challenges, and future prospects. In Computational intelligence for COVID-19 and future pandemics (pp. 315–332). 4. Teach, R. L., & Shortliffe, E. H. (1981). An analysis of physician attitudes regarding computer-based clinical consultation systems. Computers and Biomedical Research, 14(6), 542–558. 5. Ye, L. R., & Johnson, P. E. (1995). The impact of explanation facilities on user acceptance of expert systems advice. MIS Quarterly, 19, 157–172. 6. Herlocker, J. L., Konstan, J. A., & Riedl, J. (2000, December). Explaining collaborative filtering recommendations. In Proceedings of the 2000 ACM conference on computer supported cooperative work (pp. 241–250). 7. Sinha, R., & Swearingen, K. (2002, April). The role of transparency in recommender systems. In CHI’02 extended abstracts on human factors in computing systems (pp. 830–831). 8. Bilgic, M., & Mooney, R. J. (2005, January). Explaining recommendations: Satisfaction vs. promotion. In Beyond personalization workshop, IUI (Vol. 5, p. 153). 9. Symeonidis, P., Nanopoulos, A., & Manolopoulos, Y. (2009, October). MoviExplain: A recommender system with explanations. In Proceedings of the third ACM conference on recommender systems (pp. 317–320). 10. Gkatzia, D., Lemon, O., & Rieser, V. (2016). Natural language generation enhances human decision-making with uncertain information. arXiv preprint arXiv:1606.03254. 11. Kim, B., Khanna, R., & Koyejo, O. O. (2016). Examples are not enough, learn to criticize! Criticism for interpretability. Advances in Neural Information Processing Systems, 29. 12. Biran, O., & McKeown, K. R. (2017, August). Human-centric justification of machine learning predictions. IJCAI, 2017, 1461–1467. 13. Shortliffe, E. H., & Buchanan, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23(3–4), 351–379. 14. Swartout, W., Paris, C., & Moore, J. (1991). Explanations in knowledge systems: Design for explainable expert systems. IEEE Expert, 6(3), 58–64. 15. Barzilay, R., McCullough, D., Rambow, O., DeCristofaro, J., Korelsky, T., & Lavoie, B. (1998, August). A new approach to expert system explanations. In Natural language generation (pp. 78–87). 16. Lubsen, J., Pool, J., & Van der Does, E. (1978). A practical device for the application of a diagnostic or prognostic function. Methods of Information in Medicine, 17(02), 127–129. 17. Možina, M., Demšar, J., Kattan, M., & Zupan, B. (2004, September). Nomograms for visualization of naive Bayesian classifier. In European conference on principles of data mining and knowledge discovery (pp. 337–348). Springer. 18. Jakulin, A., Možina, M., Demšar, J., Bratko, I., & Zupan, B. (2005, August). Nomograms for visualizing support vector machines. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining (pp. 108–117). 19. Szafron, D., Greiner, R., Lu, P., Wishart, D., MacDonell, C., Anvik, J., et al. (2003). Explaining naïve Bayes classifications (TR03-09). Department of Computing Science, University of Alberta. 20. Tzeng, F. Y., & Ma, K. L. (2005). Opening the black box-data driven visualization of neural networks (pp. 383–390). IEEE. 21. Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034. 22. Zeiler, M. D., & Fergus, R. (2014, September). 
Visualizing and understanding convolutional networks. In European Conference on Computer Vision (pp. 818–833). Springer. 23. Karpathy, A., Johnson, J., & Fei-Fei, L. (2015). Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078. 24. Strobelt, H., Gehrmann, S., Huber, B., Pfister, H., & Rush, A. M. (2016). Visual analysis of hidden state dynamics in recurrent neural networks. CoRR abs/1606.07461 (2016). arXiv preprint arXiv:1606.07461.


25. Li, J., Chen, X., Hovy, E., & Jurafsky, D. (2015). Visualizing and understanding neural models in nlp. arXiv preprint arXiv:1506.01066. 26. Shin, D. (2021). The effects of explainability and causability on perception, trust, and acceptance: Implications for explainable AI. International Journal of Human-Computer Studies, 146, 102551. 27. Shin, D. (2020). User perceptions of algorithmic decisions in the personalized AI system: Perceptual evaluation of fairness, accountability, transparency, and explainability. Journal of Broadcasting & Electronic Media, 64(4), 541–565. 28. Ehsan, U., Liao, Q. V., Muller, M., Riedl, M. O., & Weisz, J. D. (2021, May). Expanding explainability: Towards social transparency in ai systems. In Proceedings of the 2021 CHI conference on human factors in computing systems (pp. 1–19). 29. Meske, C., Bunde, E., Schneider, J., & Gersch, M. (2022). Explainable artificial intelligence: Objectives, stakeholders, and future research opportunities. Information Systems Management, 39(1), 53–63. 30. Pimenov, D. Y., Bustillo, A., Wojciechowski, S., Sharma, V. S., Gupta, M. K., & Kuntoğlu, M. (2023). Artificial intelligence systems for tool condition monitoring in machining: Analysis and critical review. Journal of Intelligent Manufacturing, 34(5), 2079–2121. 31. Ploug, T., & Holm, S. (2020). The four dimensions of contestable AI diagnostics-A patientcentric approach to explainable AI. Artificial Intelligence in Medicine, 107, 101901. 32. Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. 33. Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160. 34. London, A. J. (2019). Artificial intelligence and black-box medical decisions: Accuracy versus explainability. Hastings Cent. Rep., 49(1), 15–21. 35. Robbins, S. (2019). A misdirected principle with a catch: Explicability for AI. Minds and Machines, 29(4), 495–514. 36. Awotunde, J. B., Oluwabukonla, S., Chakraborty, C., Bhoi, A. K., & Ajamu, G. J. (2022). Application of artificial intelligence and big data for fighting COVID-19 pandemic. In Decision sciences for COVID-19 (pp. 3–26). 37. Veale, M. (2020). A critical take on the policy recommendations of the EU high-level expert group on artificial intelligence. European Journal of Risk Regulation, 11(1), 1–10. 38. Hleg, A. I. (2019). High-level expert group on artificial intelligence: Ethics guidelines for trustworthy AI. European Commission, 9, 2019. 39. Zhu, J., Liapis, A., Risi, S., Bidarra, R., & Youngblood, G. M. (2018, August). Explainable AI for designers: A human-centered perspective on mixed-initiative co-creation. In 2018 IEEE conference on Computational Intelligence and Games (CIG) (pp. 1–8). IEEE. 40. Goodman, B., & Flaxman, S. (2017). European Union regulations on algorithmic decisionmaking and a “right to explanation”. AI Magazine, 38(3), 50–57. 41. Došilović, F. K., Brčić, M., & Hlupić, N. (2018, May). Explainable artificial intelligence: A survey. In 2018 41st international convention on information and communication technology, electronics and microelectronics (MIPRO) (pp. 0210–0215). IEEE. 42. Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner, L., Schmidt, E., et al. (2021). 
What do we want from Explainable Artificial Intelligence (XAI)?–A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence, 296, 103473. 43. Ramachandram, D., & Taylor, G. W. (2017). Deep multimodal learning: A survey on recent advances and trends. IEEE Signal Processing Magazine, 34(6), 96–108. 44. Awotunde, J. B., Ogundokun, R. O., Ayo, F. E., & Matiluko, O. E. (2020). Speech segregation in background noise based on deep learning. IEEE Access, 8, 169568–169575.


45. Zhang, S. F., Zhai, J. H., Xie, B. J., Zhan, Y., & Wang, X. (2019, July). Multimodal representation learning: Advances, trends and challenges. In 2019 International Conference on Machine Learning and Cybernetics (ICMLC) (pp. 1–6). IEEE. 46. Pouyanfar, S., Yang, Y., Chen, S. C., Shyu, M. L., & Iyengar, S. S. (2018). Multimedia big data analytics: A survey. ACM Computing Surveys (CSUR), 51(1), 1–34. 47. Chen, S. C. (2019). Multimedia deep learning. IEEE MultiMedia, 26(1), 5–7. 48. Abiodun, M. K., Misra, S., Awotunde, J. B., Adewole, S., Joshua, A., & Oluranti, J. (2021, December). Comparing the performance of various supervised machine learning techniques for early detection of breast cancer. In International conference on hybrid intelligent systems (pp. 473–482). Springer. 49. Supriya, M., & Deepa, A. J. (2020). Machine learning approach on healthcare big data: A review. Big Data and Information Analytics, 5(1), 58–75. 50. Oladipo, I. D., AbdulRaheem, M., Awotunde, J. B., Bhoi, A. K., Adeniyi, E. A., & Abiodun, M. K. (2022). Machine learning and deep learning algorithms for smart cities: A start-of-theart review. In IoT and IoE Driven Smart Cities (pp. 143–162). 51. Cardone, B., Di Martino, F., & Senatore, S. (2022). A fuzzy partition-based method to classify social messages assessing their emotional relevance. Information Sciences, 594, 60–75. 52. Hung, C. Y., Lin, C. H., Chang, C. S., Li, J. L., & Lee, C. C. (2019, July). Predicting gastrointestinal bleeding events from multimodal in-hospital electronic health records using deep fusion networks. In 2019 41st annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 2447–2450). IEEE. 53. Guo, W., Wang, J., & Wang, S. (2019). Deep multimodal representation learning: A survey. IEEE Access, 7, 63373–63394. 54. Richardson, E., Alaluf, Y., Patashnik, O., Nitzan, Y., Azar, Y., Shapiro, S., & Cohen-Or, D. (2021). Encoding in style: A stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 2287–2296). 55. MAlnajjar, M. K., & Abu-Naser, S. S. (2022). Heart sounds analysis and classification for cardiovascular diseases diagnosis using deep learning 56. Cook, J., Umar, M., Khalili, F., & Taebi, A. (2022). Body acoustics for the non-invasive diagnosis of medical conditions. Bioengineering, 9(4), 149. 57. Li, B., Dimitriadis, D., & Stolcke, A. (2019, May). Acoustic and lexical sentiment analysis for customer service calls. In ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5876–5880). IEEE. 58. Chen, P. T., Lin, C. L., & Wu, W. N. (2020). Big data management in healthcare: Adoption challenges and implications. International Journal of Information Management, 53, 102078. 59. Wang, X., Han, S., Chen, Y., Gao, D., & Vasconcelos, N. (2019, October). Volumetric attention for 3D medical image segmentation and detection. In International conference on Medical Image Computing and Computer-Assisted Intervention (pp. 175–184). Springer. 60. Rathi, M., Sahu, S., Goel, A., & Gupta, P. (2022). Personalized health framework for visually impaired. Informatica, 46(1), 77. 61. Nelson, I., Annadurai, C., & Devi, K. N. (2022). An efficient AlexNet deep learning architecture for automatic diagnosis of cardio-vascular diseases in healthcare system. Wireless Personal Communications, 126, 1–17. 62. Folorunso, S. O., Awotunde, J. B., Ayo, F. E., & Abdullah, K. K. A. (2021). 
RADIoT: The unifying framework for iot, radiomics and deep learning modeling. In Hybrid artificial intelligence and IoT in healthcare (pp. 109–128). Springer. 63. Mohammed, B. A., & Al-Ani, M. S. (2020). Review research of medical image analysis using deep learning. UHD Journal of Science and Technology, 4(2), 75–90. 64. Awotunde, J. B., Ajagbe, S. A., Oladipupo, M. A., Awokola, J. A., Afolabi, O. S., Mathew, T. O., & Oguns, Y. J. (2021, October). An improved machine learnings diagnosis technique for COVID-19 pandemic using chest X-ray images. In International Conference on Applied Informatics (pp. 319–330). Springer.


65. Yaqub, M., Jinchao, F., Arshid, K., Ahmed, S., Zhang, W., Nawaz, M. Z., & Mahmood, T. (2022). Deep learning-based image reconstruction for different medical imaging modalities. Computational and Mathematical Methods in Medicine. 66. Li, W., Dong, Q., Jia, H., Zhao, S., Wang, Y., Xie, L., et al. (2019). Training a camera to perform long-distance eye tracking by another eye-tracker. IEEE Access, 7(1), 155313–155324. 67. Awotunde, J. B., Bhoi, A. K., & Barsocchi, P. (2021). Hybrid cloud/Fog environment for healthcare: An exploratory study, opportunities, challenges, and future prospects. In Hybrid artificial intelligence and IoT in healthcare (pp. 1–20). 68. Burlina, P., Freund, D. E., Joshi, N., Wolfson, Y., & Bressler, N. M. (2016, April). Detection of age-related macular degeneration via deep learning. In 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI) (pp. 184–188). IEEE. 69. Liu, J., Pan, Y., Li, M., Chen, Z., Tang, L., Lu, C., & Wang, J. (2018). Applications of deep learning to MRI images: A survey. Big Data Mining and Analytics, 1(1), 1–18. 70. Hu, P., Wu, F., Peng, J., Bao, Y., Chen, F., & Kong, D. (2017). Automatic abdominal multiorgan segmentation using deep convolutional neural network and time-implicit level sets. International journal of computer assisted radiology and surgery, 12(3), 399–411. 71. Bar, Y., Diamant, I., Wolf, L., & Greenspan, H. (2015, March). Deep learning with non-medical training used for chest pathology identification. In Medical imaging 2015: Computer-aided diagnosis (Vol. 9414, pp. 215–221). SPIE. 72. Noor, M. B. T., Zenia, N. Z., Kaiser, M. S., Mamun, S. A., & Mahmud, M. (2020). Application of deep learning in detecting neurological disorders from magnetic resonance images: A survey on the detection of Alzheimer’s disease, Parkinson’s disease and schizophrenia. Brain Informatics, 7(1), 1–21. 73. Che, D., Safran, M., & Peng, Z. (2013, April). From big data to big data mining: Challenges, issues, and opportunities. In International conference on Database Systems for Advanced Applications (pp. 1–15). Springer. 74. Shrivastava, K., Kumar, S., & Jain, D. K. (2019). An effective approach for emotion detection in multimedia text data using sequence based convolutional neural network. Multimedia Tools and Applications, 78(20), 29607–29639. 75. Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137–144. 76. Ye, Z., Tafti, A. P., He, K. Y., Wang, K., & He, M. M. (2016). Sparktext: Biomedical text mining on big data framework. PLoS One, 11(9), e0162721. 77. Pendergrass, S. A., & Crawford, D. C. (2019). Using electronic health records to generate phenotypes for research. Current Protocols in Human Genetics, 100(1), e80. 78. Zhan, A. (2018). Towards AI-assisted healthcare: System design and deployment for machine learning based clinical decision support. Doctoral dissertation, Johns Hopkins University. 79. Quasim, M. T., Khan, M. A., Abdullah, M., Meraj, M., Singh, S. P., & Johri, P. (2019, December). Internet of things for smart healthcare: A hardware perspective. In 2019 First International Conference of Intelligent Computing and Engineering (ICOICE) (pp. 1–5). IEEE. 80. Thambawita, V., Jha, D., Hammer, H. L., Johansen, H. D., Johansen, D., Halvorsen, P., & Riegler, M. A. (2020). An extensive study on cross-dataset bias and evaluation metrics interpretation for machine learning applied to gastrointestinal tract abnormality classification. 
ACM Transactions on Computing for Healthcare, 1(3), 1–29. 81. Riegler, M., Pogorelov, K., Markussen, J., Lux, M., Stensland, H. K., de Lange, T., et al. (2016, May). Computer aided disease detection system for gastrointestinal examinations. In S. L. Eskeland (Ed.), Proceedings of the 7th international conference on Multimedia Systems (pp. 1–4). 82. Kumar, S. N., & Ismail, B. M. (2020). Systematic investigation on multi-class skin cancer categorization using machine learning approach. Materials Today: Proceedings.


83. Wagner, M., Bodenstedt, S., Daum, M., Schulze, A., Younis, R., Brandenburg, J., et al. (2022). The importance of machine learning in autonomous actions for surgical decision making. Artificial Intelligence Surgery, 2(2), 64–79. 84. Awotunde, J. B., Jimoh, R. G., Oladipo, I. D., Abdulraheem, M., Jimoh, T. B., & Ajamu, G. J. (2021). Big data and data analytics for an enhanced COVID-19 epidemic management. In Artificial Intelligence for COVID-19 (pp. 11–29). Springer. 85. Anwar, S. M., Majid, M., Qayyum, A., Awais, M., Alnowami, M., & Khan, M. K. (2018). Medical image analysis using convolutional neural networks: A review. Journal of Medical Systems, 42(11), 1–13. 86. Vellido, A. (2019). Societal issues concerning the application of artificial intelligence in medicine. Kidney Diseases, 5(1), 11–17. 87. He, J., Baxter, S. L., Xu, J., Xu, J., Zhou, X., & Zhang, K. (2019). The practical implementation of artificial intelligence technologies in medicine. Nature Medicine, 25(1), 30–36. 88. Xie, F., Chakraborty, B., Ong, M. E. H., Goldstein, B. A., & Liu, N. (2020). Autoscore: A machine learning–based automatic clinical score generator and its application to mortality prediction using electronic health records. JMIR Medical Informatics, 8(10), e21798. 89. Asan, O., Bayrak, A. E., & Choudhury, A. (2020). Artificial intelligence and human trust in healthcare: Focus on clinicians. Journal of Medical Internet Research, 22(6), e15154. 90. Quinn, T. P., Jacobs, S., Senadeera, M., Le, V., & Coghlan, S. (2022). The three ghosts of medical AI: Can the black-box present deliver? Artificial Intelligence in Medicine, 124, 102158. 91. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). “Why should i trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135–1144). 92. Sun, H., Koo, J., Dickens, B. L., Clapham, H. E., & Cook, A. R. (2022). Short-term and longterm epidemiological impacts of sustained vector control in various dengue endemic settings: A modelling study. PLoS Computational Biology, 18(4), e1009979. 93. Wang, M., Zheng, K., Yang, Y., & Wang, X. (2020). An explainable machine learning framework for intrusion detection systems. IEEE Access, 8, 73127–73141. 94. Teo, Y. Y. A., Danilevsky, A., & Shomron, N. (2021). Overcoming interpretability in deep learning cancer classification. In Deep sequencing data analysis (pp. 297–309). Humana. 95. Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M., Nair, B., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2(1), 56–67. 96. Lipton, Z. C. (2018). The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3), 31–57. 97. Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In In 2018 IEEE 5th international conference on Data Science and Advanced Analytics (DSAA) (pp. 80–89). IEEE. 98. Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. 99. Rasheed, K., Qayyum, A., Ghaly, M., Al-Fuqaha, A., Razi, A., & Qadir, J. (2021). Explainable, trustworthy, and ethical machine learning for healthcare: A survey 100. Rouse, W. B., & Morris, N. M. (1986). 
On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100(3), 349–363. 101. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. 102. Zhang, W., & Ram, S. (2020). A comprehensive analysis of triggers and risk factors for asthma based on machine learning and large heterogeneous data sources. MIS Quarterly, 44(1), 305–349.


103. Mohapatra, S., Satpathy, S., & Paul, D. (2021). Data-driven symptom analysis and location prediction model for clinical health data processing and knowledgebase development for COVID-19. In Applications of artificial intelligence in COVID-19 (pp. 99–117). Springer. 104. Ahmad, M. A., Eckert, C., & Teredesai, A. (2018, August). Interpretable machine learning in healthcare. In Proceedings of the 2018 ACM international conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 559–560). 105. Maadi, M., Akbarzadeh Khorshidi, H., & Aickelin, U. (2021). A review on human–AI interaction in machine learning and insights for medical applications. International Journal of Environmental Research and Public Health, 18(4), 2121. 106. Folorunso, S. O., Ogundepo, E. A., Awotunde, J. B., Ayo, F. E., Banjo, O. O., & Taiwo, A. I. (2022). A multi-step predictive model for COVID-19 cases in Nigeria using machine learning. In Decision sciences for COVID-19 (pp. 107–136). Springer. 107. Jimoh, R., Afolayan, A. A., Awotunde, J. B., & Matiluko, O. E. (2017). Fuzzy logic based expert system in the diagnosis of ebola virus. Ilorin Journal of Computer Science and Information Technology, 2(1), 73–94. 108. Lai, X., Lange, T., Balakrishnan, A., Alexandrescu, D., & Jenihhin, M. (2021, October). On antagonism between Side-Channel security and soft-error reliability in BNN inference engines. In 2021 IFIP/IEEE 29th international conference on Very Large Scale Integration (VLSI-SoC) (pp. 1–6). IEEE. 109. Al-Garadi, M. A., Mohamed, A., Al-Ali, A. K., Du, X., Ali, I., & Guizani, M. (2020). A survey of machine and deep learning methods for internet of things (IoT) security. IEEE Communications Surveys & Tutorials, 22(3), 1646–1685. 110. Awotunde, J. B., Jimoh, R. G., Folorunso, S. O., Adeniyi, E. A., Abiodun, K. M., & Banjo, O. O. (2021). Privacy and security concerns in IoT-based healthcare systems. In The Fusion of Internet of Things, Artificial Intelligence, and Cloud Computing in Health Care (pp. 105–134). Springer. 111. Chatzimparmpas, A., Martins, R. M., Jusufi, I., & Kerren, A. (2020). A survey of surveys on the use of visualization for interpreting machine learning models. Information Visualization, 19(3), 207–233. 112. Floridi, L. (2019). Establishing the rules for building trustworthy AI. Nature Machine Intelligence, 1(6), 261–262. 113. Meikle, S. R., Matthews, J. C., Cunningham, V. J., Bailey, D. L., Livieratos, L., Jones, T., & Price, P. (1998). Parametric image reconstruction using spectral analysis of PET projection data. Physics in Medicine & Biology, 43(3), 651–666. 114. Gille, F., Jobin, A., & Ienca, M. (2020). What we talk about when we talk about trust: Theory of trust for AI in healthcare. Intelligence-Based Medicine, 1, 100001. 115. Abdel-Basset, M., El-Hoseny, M., Gamal, A., & Smarandache, F. (2019). A novel model for evaluation hospital medical care systems based on plithogenic sets. Artificial Intelligence in Medicine, 100, 101710. 116. Castro, D. C., Walker, I., & Glocker, B. (2020). Causality matters in medical imaging. Nature Communications, 11(1), 1–10. 117. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015, August). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1721–1730). 118. Kaufman, S., Rosset, S., Perlich, C., & Stitelman, O. (2012). 
Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(4), 1–21. 119. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5), 1–42. 120. Mohseni, S., Block, J. E., & Ragan, E. D. (2018). A human-grounded evaluation benchmark for local explanations of machine learning. arXiv preprint arXiv:1801.05075.


121. Geng, Y., Chen, J., Jiménez-Ruiz, E., & Chen, H. (2019). Human-centric transfer learning explanation via knowledge graph. arXiv preprint arXiv:1901.08547. 122. Seeliger, A., Pfaff, M., & Krcmar, H. (2019). Semantic web technologies for explainable machine learning models: A literature review. PROFILES/SEMEX@ ISWC, 2465, 1–16. 123. Wichmann, J. L., Willemink, M. J., & De Cecco, C. N. (2020). Artificial intelligence and machine learning in radiology: Current state and considerations for routine clinical implementation. Investigative Radiology, 55(9), 619–627. 124. Ilahi, I., Usama, M., Qadir, J., Janjua, M. U., Al-Fuqaha, A., Hoang, D. T., & Niyato, D. (2021). Challenges and countermeasures for adversarial attacks on deep reinforcement learning. IEEE Transactions on Artificial Intelligence, 3(2), 90–109. 125. Awotunde, J. B., Chakraborty, C., & Adeniyi, A. E. (2021). Intrusion detection in industrial internet of things network-based on deep learning model with rule-based feature selection. Wireless Communications and Mobile Computing, 2021, 7154587–7154517. 126. Rasheed, J., Jamil, A., Hameed, A. A., Aftab, U., Aftab, J., Shah, S. A., & Draheim, D. (2020). A survey on artificial intelligence approaches in supporting frontline workers and decision makers for the COVID-19 pandemic. Chaos, Solitons & Fractals, 141, 110337. 127. Awotunde, J. B., Folorunso, S. O., Bhoi, A. K., Adebayo, P. O., & Ijaz, M. F. (2021). Disease diagnosis system for IoT-based wearable body sensors with machine learning algorithm. In Hybrid artificial intelligence and IoT in healthcare (pp. 201–222). Springer. 128. Wiens, J., Saria, S., Sendak, M., Ghassemi, M., Liu, V. X., Doshi-Velez, F., et al. (2019). Do no harm: A roadmap for responsible machine learning for health care. Nat. Med., 25(9), 1337–1340. 129. Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G., & King, D. (2019). Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine, 17(1), 1–9. 130. Latif, S., Qayyum, A., Usama, M., Qadir, J., Zwitter, A., & Shahzad, M. (2019). Caveat emptor: The risks of using big data for human development. IEEE Technology and Society Magazine, 38(3), 82–90. 131. Darabi, N., & Hosseinichimeh, N. (2020). System dynamics modeling in health and medicine: A systematic literature review. System Dynamics Review, 36(1), 29–73.

Ensemble Deep Learning Architectures in Bone Cancer Detection Based on Medical Diagnosis in Explainable Artificial Intelligence

Ulaganathan Sakthi and R. Manikandan

U. Sakthi: Department of Computational Intelligence, School of Computing, SRM Institute of Science and Technology, Chennai, TN, India. R. Manikandan (✉): School of Computing, SASTRA Deemed University, Thanjavur, India

1 Introduction

Bone cancer is a complex condition that can be brought on by a variety of genetic and physiological factors. It causes the unchecked cell growth that results in malignant bone tumours and invades nearby regions of the body. Cancer is abnormal cell proliferation that has the potential to invade and spread throughout any organ of the human body. In India, a survey conducted by the National Institute of Cancer Prevention and Research (NICPR) revealed that there were about 2.5 million cancer patients, with more than 700,000 new cancer patients and 556,400 cancer-related deaths every year. For 2030, the International Agency for Research on Cancer (IARC) has forecast 21.7 million new cases of cancer worldwide and 13 million cancer-related deaths. There are 75 different varieties of cancer; one of them is bone cancer, which frequently includes osteosarcoma and Ewing tumours. By identifying and diagnosing the kind and stage of cancer early on and beginning the proper treatment, the death rate can be decreased [1]. An X-ray, also known as a radiograph, is a non-invasive medical diagnostic that uses radiation to display the inside of the body so that a radiologist can make a diagnosis. Using strong magnets and radio waves, magnetic resonance imaging shows the same thing in greater detail. The output of both procedures is a grayscale image. On bone X-ray or MRI images, image segmentation algorithms can be used to identify an unwelcome bone growth that may be benign (not cancerous) or malignant (cancerous). The type of bone cancer can also be determined from size, form, and other characteristics. Therefore, the goal of this work is to combine image segmentation with X-ray or MRI technologies to treat

cancer, a highly serious medical condition. Bone cancers of various sorts have been found in the human body; sarcomas are another name for bone malignancies [2]. Benign sarcoma and malignant sarcoma are the two different forms of sarcomas. The American Cancer Society presented an estimate of the individuals impacted by bone cancer in 2014: approximately 3020 new cases were diagnosed, and 1460 deaths among these patients were anticipated as a result of bone cancer. The anatomy of the bones is scanned using computed tomography (CT) or magnetic resonance imaging (MRI). The diagnosis of cancer always benefits greatly from image segmentation. Segmentation's true definition is the division of an image into various regions, followed by the extraction of the relevant data from each region [3]. A variety of segmentation approaches have been applied to MR images. The most popular approaches for edge identification in medical image analysis were segmentation techniques based on thresholding, regions, and clustering. Each segmentation technique has its own benefits and drawbacks, so the user's preference is the deciding factor. In addition to segmenting and detecting bone tumours, it is important to categorise the type of tumour that has been found so that medical professionals can correctly advise the patient towards early intervention. A bone cancer begins to develop into a tumour within healthy cells [4]. A bone tumour is the main sign of bone cancer. The tumour develops gradually and has the potential to spread to other body parts. The bone tissue may be destroyed, making the bone weaker. Statistics show that 3500 Americans were impacted by bone cancer in 2018, and that approximately 47% of those who received a diagnosis of the disease passed away. Numerous tests are used by doctors to diagnose cancer. The examination of X-ray images is used to find bone cancer in people: the rates of X-ray absorption in healthy bone and malignant bone vary, resulting in the ragged appearance of a malignant bone's imaging surface [5]. A stage and a grade are used to determine how severe a bone cancer is, and doctors forecast disease progression using the tumour growth rate. Bone cancer diagnosis takes experience: a doctor must identify bone cancer manually, which adds time and the potential for error [6]. The contributions of this chapter are as follows:
1. To propose a novel cloud-IoT-based technique for bone tumour detection, covering segmentation and classification using deep learning techniques
2. To segment and classify bone cancer patterns using optimized kernel fuzzy C-means multilayer deep transfer convolutional learning (OpKFuzCMM-DTCL)

2 Related Works

This section presents many writers' viewpoints on the stages of bone image processing, as well as methodologies, strategies, concepts, and ideas for bone cancer detection. Researchers have conducted related studies in this area to create automated systems that help doctors. Automated systems are quick and have low error rates. An automated system has been developed using the machine learning


algorithm SVM and digital image processing methods such as preprocessing, edge detection, and feature extraction [7]. In a separate study, [8] created an automated system for human bone diagnostics: they classified healthy and fractured bone using a deep neural network, training the model on an extensively augmented picture dataset. Because the training and test datasets may contain identical copies of the images generated during augmentation, k-fold cross-validation can be performed to prevent biased performance estimates. The GLCM function has been employed by [9] to locate broken bones; they concluded from their experiment that a GLCM-based textural feature alone is insufficient to accurately identify malignant bone. Entropy and skewness are important factors in the prediction of malignant regions: entropy has a low value within the diseased region and a high value outside of it, while the HOG feature of an image provides the shape and orientation of pixels. To distinguish between healthy and malignant bone, [10] combined a number of methods and textural factors; SVM is utilised for the categorisation of the long bone. The technique only focuses on long healthy bones and malignant bones; model performance is 85%, though it can still be increased. [11] suggested using mean pixel power as a method of separating malignant bone growth from MR images. [12] used an MRI scan to differentiate between malignant and benign tumours: they did this by extracting textural information and separating the tumour portion using the K-means clustering technique. The plan put forward by [13] is an additional segmentation method for brain tumours; they employ fuzzy C-means and K-means algorithms in their approach. Another article, by [14], presents a noble strategy that can be coupled with various division techniques on MRI and CT data. [15] provided a method that used a developed region-growing calculation to discern the tumour's size and the stage of bone cancer; this tactic dispersed the region of interest by applying a locally established formula. [16] identified and staged bone cancer using an MRI picture: the image is denoised by creating clusters depending on the properties of the pixels, the cancer stage can be predicted using the value 245 and the mean pixel intensity, and, to estimate the tumour's size, the ROI (region of interest) is taken from the image and compared with a threshold value.

3 System Model

This section discusses the novel cloud-IoT-based technique for bone tumour detection, segmentation, and classification using deep learning techniques. Here, the bone tumour dataset has been collected from various healthcare datasets using a cloud-based IoT module. A variety of experiments have been carried out using a variety of medical datasets, as well as real-time data acquired directly from patients via a cloud device. These data have been segmented and classified using an ensemble of optimized kernel fuzzy C-means multilayer deep transfer convolutional learning (OpKFuzCMM-DTCL). The overall proposed method is shown in Fig. 1.


Fig. 1 Overall proposed architecture

Because the acquired images contain multiple irregular details and low-quality pixels, the accuracy of the predicted bone cancer is reduced. A pixel-intensity examination technique, which significantly modifies the perception of image pixels, improves the quality of CT bone images; the continual pixel change effectively eliminates inconsistent pixels and noisy pixels. The weighted mean histogram equalization strategy is used in this study to analyze the collected CT bone images, since the introduced technique divides images into sub-images. The approach also examines images for quality and quantity. Before boosting the image's quality, each pixel is checked against a limit value, and the median value replaces the pixel's noise. For each available intensity n, let $I_N$ denote the normalised histogram bin of image I:

$$I_N = \frac{\text{number of pixels with intensity } n}{\text{total number of pixels}} \tag{1}$$

Where n = 0, 1, . . ., 255. It improves picture contrast at a local level by dividing the image into many subregions and separately modifying the intensity values of every subregion to match a target histogram. An adaptive bilateral filter (ABF) is then utilized to de-noise the improved CT scan pictures for pre-processing. The ABF is a more advanced version of the bilateral filter and differs from it in a number of ways: the ABF employs a locally adaptable range filter, and the range filter is shifted on the histogram, as described in Eq. (2), by adding a counterweight (offset) to the range filter:

$$\mathrm{ABF}[x_0, y_0] = \sum_{x = x_0 - N}^{x_0 + N} \; \sum_{y = y_0 - N}^{y_0 + N} \exp\!\left( - \frac{(x - x_0)^2 + (y - y_0)^2}{2 \nabla_d^2} \right) \times \exp\!\left( - \frac{\left( G[x, y] - G[x_0, y_0] - \delta[x_0, y_0] \right)^2}{2 \nabla_r^2 [x_0, y_0]} \right) \tag{2}$$

Where $x_0$ denotes the current pixel's row index and $y_0$ the current pixel's column index in the image, and the size of the neighbouring window is denoted by the letter N. The ABF degenerates into a standard bilateral filter if $\nabla_r$ and $\delta$ are both fixed. A fixed low-pass Gaussian filter is used for the domain part of the ABF; the combination of a locally adaptable range filter and a bilateral filter transforms the ABF into a significantly more powerful smoothing and sharpening filter. Furthermore, the ABF improves an image by enhancing the edge slope. In the ABF, $\delta$ is calculated using Eq. (3):

$$\delta[x_0, y_0] = \begin{cases} \mathrm{MAXIMUM}(\beta_{x_0, y_0}) - G[x_0, y_0], & \text{if } \Omega_{x_0, y_0} > 0 \\ \mathrm{MINIMUM}(\beta_{x_0, y_0}) - G[x_0, y_0], & \text{if } \Omega_{x_0, y_0} < 0 \\ 0, & \text{if } \Omega_{x_0, y_0} = 0 \end{cases} \tag{3}$$

Where $x_0$ denotes the current pixel's row index and $y_0$ the current pixel's column index in the image. The input image's window size is denoted as (2W + 1) × (2W + 1). Every pixel is given as $\beta_{x_0, y_0}$ in this case, with the centre $[x_0, y_0]$; MAXIMUM and MINIMUM denote the operations of taking the maximum and minimum values of the data in the window. With a fixed domain Gaussian filter plus an adaptive range filter, the effect of the ABF is strong: the value of $\nabla_d = 1$ remains constant while the value of $\nabla_r$ fluctuates. The segmentation procedure then finds objects or boundaries in the pre-processed image that aid in obtaining the ROI; it divides the image into regions in order to find relevant data. It is critical to segment the cancer nodule from the pre-processed CT scan image when classifying bone cancer.
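To make Eqs. (2) and (3) concrete, the following is a minimal NumPy sketch of such an adaptive bilateral filter; the window size, the $\nabla_d$/$\nabla_r$ values, and the sign test used for $\Omega$ (here the difference between the local mean and the centre pixel) are illustrative assumptions, not values specified in this chapter.

import numpy as np

def adaptive_bilateral_filter(G, N=3, nabla_d=1.0, nabla_r=20.0):
    # Minimal sketch of Eqs. (2)-(3); parameter choices are hypothetical.
    H, W = G.shape
    out = np.zeros((H, W))
    ys, xs = np.mgrid[-N:N + 1, -N:N + 1]
    domain = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * nabla_d ** 2))  # fixed domain Gaussian
    Gp = np.pad(G.astype(float), N, mode="edge")
    for y0 in range(H):
        for x0 in range(W):
            win = Gp[y0:y0 + 2 * N + 1, x0:x0 + 2 * N + 1]
            centre = Gp[y0 + N, x0 + N]
            omega = win.mean() - centre        # stand-in for the Omega test of Eq. (3)
            if omega > 0:
                delta = win.max() - centre     # Eq. (3), first case
            elif omega < 0:
                delta = win.min() - centre     # Eq. (3), second case
            else:
                delta = 0.0
            rng = np.exp(-(win - centre - delta) ** 2 / (2.0 * nabla_r ** 2))
            w = domain * rng                   # combined weights of Eq. (2)
            out[y0, x0] = (w * win).sum() / w.sum()
    return out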

3.1 Optimized Kernel Fuzzy C Means Multilayer Deep Transfer Convolutional Learning (OpKFuzCMM-DTCL) Based Segmentation and Classification

The FCM technique is commonly used to recognise structure in large crisp data sets $X = \{x_1, x_2, \dots, x_n\} \subset \mathbb{R}^p$. The following conditions must be met by such a partition, as given in Eq. (4):

$$\sum_{k=1}^{n} U_{ik} > 0 \;\; \text{for all } i \in \{1, \dots, c\}, \qquad \sum_{i=1}^{c} U_{ik} = 1 \;\; \text{for all } k \in \{1, \dots, n\} \tag{4}$$

In contrast to the HCM method (and the CL method as well), a pattern can now belong to multiple clusters at the same time, with varying degrees of membership. As is well known, the FCM method detects fuzzy clusters as spherical clouds of patterns, each of which is given by a cluster centre $V_i \in \mathbb{R}^p$. The ideal cluster centres for a given pattern belongingness to the clusters must be determined to produce a locally optimal partition of the pattern vectors into clusters. This necessitates optimization of a distance-based objective function, as in Eq. (5):

$$J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} (U_{ik})^m \, d^2(v_i, x_k) \tag{5}$$

where d is the Euclidean distance, suitable for the requirements mentioned above, and 1 < m, where m is the fuzziness value. As is well known, the fuzzy method for optimization of the objective function employs an alternating optimization scheme in which arbitrarily produced prototypes $V^{(0)}$ or partition matrix $U^{(0)}$ are used under a fixed number of clusters, and the values $U^{(t)}$ and $V^{(t)}$ are updated based on Eq. (6) at each optimization step t:

$$U_{ik} = \frac{1}{\sum_{j=1}^{c} \left( d^2(v_i, x_k) / d^2(v_j, x_k) \right)^{1/(m-1)}}, \qquad V_i = \frac{\sum_{k=1}^{n} (U_{ik})^m x_k}{\sum_{k=1}^{n} (U_{ik})^m} \tag{6}$$
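A minimal NumPy sketch of the alternating updates in Eq. (6) may help; the random prototype initialisation, the stopping tolerance on the prototype change, and the small constant added to the distances to avoid division by zero are implementation assumptions.

import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    # Alternating optimisation of Eq. (5) via the update rules of Eq. (6)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)]              # random prototypes V(0)
    for _ in range(max_iter):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12   # d^2(v_i, x_k), shape (c, n)
        ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=1)                          # membership update of Eq. (6)
        V_new = (U ** m) @ X / (U ** m).sum(axis=1, keepdims=True)    # prototype update of Eq. (6)
        if np.abs(V_new - V).max() < eps:                    # stop once prototypes stabilise
            return U, V_new
        V = V_new
    return U, V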

The procedure continues until consecutive approximations to the prototypes $d(V^{(t)}, V^{(t-1)})$ or to the partition $d(U^{(t)}, U^{(t-1)})$ have stabilised, and the decision to end is based on this termination condition. In the following, we apply the former method of initialization, meaning the prototypes $V^{(0)}$ are generated at random. It is worth noting that the fuzzy method produces a partition matrix U in which every pattern vector $x_k$ is allocated to each cluster as a fuzzy subset with a distinct membership degree. The assignment is based on $x_k = \{x_1, x_2, \dots, x_p\}$, a mapping of the cluster to a relative pattern vector. Each pattern will be crisply allocated to the associated cluster, as a consequence of matching, if it has the highest degree of membership or the shortest distance to the cluster centre (defuzzification process). The suggested approach, which calculates a locally optimal partition for every number of clusters c using cluster validity criteria, finds an optimal number of clusters c* according to this clustering. This technique then produces a number of weight vectors, as well as their values, for a particular network. The pooling feature map is used as the feature representation in this part, so the characteristics of each layer are determined by its dimension. As a result, adaptive network clustering can be offered, which includes the benefits of the fuzzy-based method discussed in the preceding subsection. Many validity criteria have been proposed, the most well-known of which are the partition coefficient ($V_{pc}$) and partition entropy ($V_{pe}$). Some indices have been proposed to overcome their weaknesses and give good performance across a wide range of c and fuzziness parameter m. One of these is the $V_{SV}$ validity criterion, which is most suited to the requirements here; this index is effective since it solely uses structural characteristics. We propose a $V_{SV}$-based approach to select an ideal number of clusters c*, considering the benefits of the validity criterion $V_{SV}$. Before we get into the details of this technique, let us have a look at the $V_{SV}$ index. The foundation of the criterion $V_{SV}$ is that clusters are under-partitioned when c < c*:

$$V_u(c, V, X) = \frac{1}{n} \sum_{i=1}^{c} \mathrm{MD}_i, \qquad 2 \le c \le c_{\max} \tag{7}$$

$\mathrm{MD}_i$ is the mean intra-cluster distance of the i-th cluster, which is described by Eq. (8):

$$\mathrm{MD}_i = \sum_{x \in S_i} \| v_i - x \|^2 / n_i \tag{8}$$

where $S_i$ denotes the data subset of the i-th cluster and $n_i$ denotes the number of patterns within it. This function obviously takes very small values when $c \ge c^*$ and very large values when $c < c^*$. An over-partition measure function, $V_o(c, V)$, is used to determine the over-partitioned state, as shown in Eq. (9):

$$V_o(c, V) = c / d_{\min}, \qquad 2 \le c \le c_{\max} \tag{9}$$

where $d_{\min}$, the minimum inter-cluster distance, is calculated using Eq. (10):

$$d_{\min} = \min_{i \ne j} \| v_i - v_j \| \tag{10}$$

After normalisation, the two partition measure functions produce $V_{SV}$, as in Eq. (11):

$$V_{sv}(c, V, X) = V_{uN}(c, V, X) + V_{oN}(c, V) \tag{11}$$

Thus, for c = 2 to $c_{\max}$, the optimal cluster number c* may be identified at the minimum value of $V_{sv}$, and $V_{sv} \in [0, 1]$ will be maintained. As shown in Eq. (12), OFCMNN minimises the following objective function:

$$J_{KFCM}(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ki}^{m} \, \| \Phi(x_k) - \Phi(v_i) \|^2 \tag{12}$$

where $\| \Phi(x_k) - \Phi(\nu_i) \|^2 = K(x_k, x_k) + K(\nu_i, \nu_i) - 2 K(x_k, \nu_i)$.


where $K(x, \nu) = \Phi(x)^T \Phi(\nu)$ is an inner-product kernel function. If we consider the Gaussian function as the kernel, i.e., $K(x, \nu) = \exp(-|x - \nu|^2 / \tau^2)$, then $K(x, x) = 1$, and Eq. (12) can be written as Eq. (13):

$$J_{KFCM}(U, V) = 2 \sum_{i=1}^{c} \sum_{k=1}^{n} \mu_{ki}^{m} \left( 1 - K(x_k, \nu_i) \right) \tag{13}$$

Equation (14) is obtained by minimising Eq. (13) under the constraint on U:

$$\mu_{ki} = \frac{\left( \dfrac{1}{1 - K(x_k, \nu_i)} \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( \dfrac{1}{1 - K(x_k, \nu_j)} \right)^{1/(m-1)}}, \qquad \nu_i = \frac{\sum_{k=1}^{n} \mu_{ki}^{m} K(x_k, \nu_i) \, x_k}{\sum_{k=1}^{n} \mu_{ki}^{m} K(x_k, \nu_i)} \tag{14}$$
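Under the Gaussian-kernel assumption used to derive Eqs. (13)-(14), the updates can be sketched as follows; the bandwidth τ, the iteration count, and the seed are illustrative choices, not values from the chapter.

import numpy as np

def kfcm(X, c, m=2.0, tau=1.0, iters=50, seed=0):
    # Kernel FCM: membership and centre updates of Eq. (14) with a Gaussian kernel
    rng = np.random.default_rng(seed)
    V = X[rng.choice(X.shape[0], size=c, replace=False)]
    for _ in range(iters):
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1)   # |x_k - v_i|^2, shape (c, n)
        K = np.exp(-d2 / tau ** 2)                            # K(x_k, v_i)
        inv = (1.0 / (1.0 - K + 1e-12)) ** (1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)              # membership update of Eq. (14)
        W = (U ** m) * K                                      # outlier-robust weights
        V = (W @ X) / W.sum(axis=1, keepdims=True)            # centre update of Eq. (14)
    return U, V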

Although Eqs. (13) and (14) are obtained using the Gaussian kernel function, other functions fulfilling K(x, x) = 1 can be used in Eqs. (13) and (14) as well; in real-world applications these include the RBF and hyper-tangent functions below. RBF functions are given by Eq. (15):

$$K(x, \nu) = \exp\!\left( - \frac{\sum_{i=1}^{c} \left| x_i^a - \nu_i^a \right|^b}{\tau^2} \right) \tag{15}$$

Hyper-tangent functions are given by Eq. (16):

$$K(x, \nu) = 1 - \tanh\!\left( - \frac{|x - \nu|^2}{\tau^2} \right) \tag{16}$$

It is worth noting that the RBF function reduces to the commonly used Gaussian function when a = 1, b = 2. Equations (15) and (16) can be seen as inducing a new kernel-based measure in the data space, which is described in Eq. (17):

$$d(x, \nu) \triangleq \| \Phi(x) - \Phi(\nu) \| = \sqrt{2 \left( 1 - K(x, \nu) \right)} \tag{17}$$
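The two kernels and the induced distance of Eq. (17) translate directly into code; this is a minimal sketch in which the sign convention of Eq. (16) is kept exactly as printed, and the default a = 1, b = 2 reduces Eq. (15) to the Gaussian kernel as noted above.

import numpy as np

def rbf_kernel(x, v, a=1.0, b=2.0, tau=1.0):
    # Eq. (15); with a = 1 and b = 2 this is the usual Gaussian kernel
    return np.exp(-np.sum(np.abs(x ** a - v ** a) ** b) / tau ** 2)

def hyper_tangent_kernel(x, v, tau=1.0):
    # Eq. (16), sign convention as printed in the text
    return 1.0 - np.tanh(-np.sum((x - v) ** 2) / tau ** 2)

def kernel_distance(x, v, kernel=rbf_kernel):
    # Kernel-induced distance of Eq. (17)
    return np.sqrt(2.0 * (1.0 - kernel(x, v)))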

The data point is given an additional weight $K(x_k, \nu_i)$, which measures the similarity between $x_k$ and $\nu_i$. When $x_k$ is an outlier, that is, when $x_k$ is far from the other data points, $K(x_k, \nu_i)$ is very small, resulting in a more robust weighted sum of data points. The assignment is based on $x_k = \{x_1, x_2, \dots, x_p\}$, a mapping of the cluster to a relative pattern vector. Each pattern will be crisply allocated to the associated cluster, as a consequence of matching, if it has the highest degree of membership or the shortest distance to the cluster centre (defuzzification process). Because a data point with misplaced components is more likely to become an outlier in an incomplete dataset, the OFCMNN-based technique shows a lot of promise for clustering incomplete data.

Algorithm for Fuzzy Technique
Start: initialise the class centers $\nu_i$ (2 ≤ i ≤ n) and the fuzzy c-partition $U^{(0)}$. Set the fuzzification parameter m (1 ≤ m ≤ ∞), constant ζ, and threshold ε > 0;
Step 2: Evaluate the membership matrix $U = [\mu_{ki}]$ using the update below:

$$\mu_{ki} = \frac{\left( 1 / d_{ki}^2 \right)^{1/(m-1)}}{\sum_{j=1}^{c} \left( 1 / d_{kj}^2 \right)^{1/(m-1)}}$$

where $d_{ki}$ is the Euclidean distance between training pattern $x_k$ and class center $\nu_i$;
Step 3: Update the class centers using the rule below:

$$\nu_i = \frac{\sum_{k=1}^{n} \mu_{ki}^{m} x_k}{\sum_{k=1}^{n} \mu_{ki}^{m}}$$

Step 4: Evaluate $\Delta = \max(|U^{(t+1)} - U^{(t)}|)$. If Δ > ε, then return to Step 2; otherwise go to Step 5;
Step 5: Determine the results for the final class centers;
End.

To begin, the suggested method uses a deep network to learn a deep metric as a feature extractor; kernel regression is then performed using low-dimensional features in the learned metric space. Each layer of a neural network, in general, follows Eq. (18):

$$h^{(l)} = f^{(l)}\!\left( W^{(l)} h^{(l-1)} + b^{(l)} \right) \tag{18}$$

where $h^{(l)}$ and $h^{(l-1)}$ are the current layer's and the preceding layer's outputs, $W^{(l)}$ and $b^{(l)}$ are the weight matrix and bias vector of the l-th layer, and $f^{(l)}(\cdot)$ is the activation function. For any input x, a neural network with L layers produces the output $f(x) = f^{(1)} f^{(2)} \cdots f^{(L)}(x)$. When this network is utilized for metric learning, the metric has the form $d(x_i, x_j) = \| f(x_i) - f(x_j) \|$. High-dimensional characteristics are mapped down by making the output layer's dimension smaller than the input layer's. The input to the kernel regression layer is the features produced after dimensionality reduction, and the regression error follows Eq. (19):


$$L_R = \sum_i \left( y_i - \hat{y}_i \right)^2 \tag{19}$$
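As a rough Keras illustration of this pipeline, the sketch below stacks dense layers of the form of Eq. (18) into a low-dimensional encoder and performs Nadaraya-Watson kernel regression in the learned space with the squared-error loss of Eq. (19); the layer sizes and the bandwidth τ are assumptions, not the chapter's exact settings.

import tensorflow as tf

# Hypothetical encoder f: each Dense layer applies Eq. (18),
# with an output dimension smaller than the input for metric learning.
encoder = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4),                      # low-dimensional embedding
])

def kernel_regression(z_query, z_train, y_train, tau=1.0):
    # Nadaraya-Watson regression in the learned metric space
    d2 = tf.reduce_sum((z_query[:, None, :] - z_train[None, :, :]) ** 2, axis=-1)
    w = tf.exp(-d2 / tau ** 2)
    return tf.reduce_sum(w * y_train[None, :], axis=1) / tf.reduce_sum(w, axis=1)

def regression_loss(y_true, y_pred):
    # Squared-error regression loss of Eq. (19)
    return tf.reduce_sum((y_true - y_pred) ** 2)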

In this phase, we use the DL-based classifier function $\varphi : \mathbb{R} \times H \to \Lambda$, which predicts the label probability vector $v \in \{0, 1\}^{\Lambda}$ using an MLP neural network, as shown in Eq. (20):

$$\text{s.t.:} \quad \Theta' = \arg\min_{K_{z_{i,l}},\, A_d,\, b_{z_{i,l}},\, \alpha_d} \left\{ L\!\left( v_t^u, \lambda_t^n \mid \Theta \right); \;\; \forall n \in N \right\} \tag{20}$$

where $\phi_d(u_{d-1}) = \eta_d(A_d u_{d-1} + \alpha_d)$ is a fully-connected layer ruled by the non-linear activation function $\eta_d : \mathbb{R}^{P'_d} \to \mathbb{R}^{P'_d}$; $P'_d \in \mathbb{N}$ is the number of hidden units of layer d; $\alpha_d \in \mathbb{R}^{P'_d}$ is the bias vector; and the hidden-layer vector $u_d \in \mathbb{R}^{P'_d}$ stores the extracted spatial data represented by the resulting 2D feature maps in the Q domain. The hidden-layer vector is iteratively updated by the rule $u_d = \phi_d(u_{d-1})$ for evaluation at each layer, with the initial state vector compressed across the z and $I_l$ domains as $u_0 = \mathrm{vec}(\hat{Y}_z^{L'}) : \forall z \in Z$. The input vector $u_0$ has size $G = W' H' Z \sum_l I_l$, with $W' < W$ and $H' < H$. Furthermore, the label-adjustment optimising estimation framework evaluates the training parameter set $\Theta' = \{ K_{z_{i,l}}, A_d, b_{z_{i,l}}, \alpha_d \}$, while the loss function $L : \mathbb{R}^A \times \mathbb{R}^A \to \mathbb{R}$ generates the gradients used to update the weights and biases. Equation (21) gives the output of a convolutional layer given an input x:

$$x_n = f\!\left( W_n x_{n-1} + b_n \right), \quad n \in (1, N) \tag{21}$$

The activation function is denoted by the letter f. After fine-tuning a multi-layer convolutional network, the usual procedure is to input x into the network and then extract the output of the last convolutional layer as the feature that represents x. Here, instead, the outputs of the multi-layer convolutions are extracted and fused into X as the feature of the image, in the form of splicing (concatenation), as illustrated by Eq. (22):

$$X = \left[ x_1, x_2, \dots, x_N \right] = \left[ f(W_1 x_0 + b_1),\; f(W_2 x_1 + b_2),\; \dots,\; f(W_N x_{N-1} + b_N) \right] \tag{22}$$

The input image is a colour image, which means it has three colour channels (R, G, and B). Owing to the varied feature patterns of the channel parameters, each feature map of the random multi-level features differs in size, and each feature map is downsampled by a pooling layer with a 2 × 2 step size in each layer. The pooling feature map is used as the feature representation in this part, so the characteristics of each layer are determined by its dimension. In the fused feature, each layer has a different proportion. Because the degree to which each layer influences the recognition result is unknown, this section extracts the feature after each layer, resulting in consistent feature dimensions for each layer after pooling. Denoting the n-th pooling layer by $\mathrm{pool}_n$, we obtain Eq. (23):


$$X = \left[ \mathrm{pool}_1\!\left( f(W_1 x_0 + b_1) \right),\; \mathrm{pool}_2\!\left( f(W_2 x_1 + b_2) \right),\; \dots,\; \mathrm{pool}_N\!\left( f(W_N x_{N-1} + b_N) \right) \right] \tag{23}$$

The output of the specified pooling layer is extracted once the bone image is input into the network. Originally, this output was used as an input to the next layer of the network; here it is used as a feature. In multi-layer convolutional networks, the progressiveness stated above, which is part of the expression of the basic features, is gradual, and the rise in the degree of abstraction between adjacent convolutional layers is not particularly large. The pooling feature map is used as the feature representation in this part, so the characteristics of each layer are determined by its dimension. That is, the closer the data is to the network's input, the more comparable the convolutional feature information is and the smaller the information disparity; each convolutional layer contributes an information gap.

Algorithm for OpKFuzCMM-DTCL
Input: X_S: input features in the source domain; Y_S: labels in the source domain; X_T: input features in the target domain; n_b: batch size; n_k: number of nearest neighbors
Initialise {W_l, b_l}
For step = 1, 2, ..., max_step do
  For epoch = 1, 2, ..., max_epoch do
    // feature extraction
    Evaluate the hidden feature H_S^batch; calculate the hidden feature H_T^batch
    // kernel regression
    Select the k-NN batch F_S^knn, Y_S^knn; estimate Ŷ_S^batch with kernel regression
    // evaluate loss and gradient descent
    Evaluate the regression loss L_R(Y_S^batch, Ŷ_S^batch); calculate the reconstruction loss L_AE(X_S^batch, X̂_S^batch); calculate the MK-MMD d_k^2(H_S^batch, H_T^batch); evaluate the total loss
    Evaluate the gradients and update the parameters
  End
End
Output: weights and biases {W_l, b_l}
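A hedged Keras sketch of the multi-layer pooled-feature fusion of Eqs. (21)-(23), together with a simple linear-kernel stand-in for the MK-MMD term in the algorithm above, might look as follows; the input size, filter counts, and pooling choices are assumptions rather than the chapter's exact network.

import tensorflow as tf

inp = tf.keras.Input(shape=(64, 64, 3))            # colour image, three channels
x, pooled = inp, []
for filters in (16, 32, 64):
    x = tf.keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x)  # Eq. (21)
    x = tf.keras.layers.MaxPooling2D(2)(x)         # 2 x 2 pooling after each layer
    pooled.append(tf.keras.layers.GlobalAveragePooling2D()(x))   # pool_n feature map
features = tf.keras.layers.Concatenate()(pooled)   # fused multi-layer feature X, Eqs. (22)-(23)
extractor = tf.keras.Model(inp, features)

def linear_mmd2(h_s, h_t):
    # Simple linear-kernel MMD^2 between source and target hidden features,
    # standing in for the MK-MMD term d_k^2(H_S, H_T) of the training loop
    diff = tf.reduce_mean(h_s, axis=0) - tf.reduce_mean(h_t, axis=0)
    return tf.reduce_sum(diff ** 2)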

3.2 Performance Analysis

The experimental implementation is carried out with TensorFlow 2.4.0, Keras 2.4.3, Python 3.6.9, Pandas 1.1.5, and NumPy 1.19.4 to generate the findings. The tests for evaluating the performance of the DL models used in this research were run on a 68.40 GB hard disk and Google Colab with 12.72 GB RAM and a Tesla T4 GPU, and hyper-parameter optimization was completed using OptKeras 0.0.7 based on Optuna 0.14.0.

3.3 Dataset Description

The Indian Institute of Engineering Science and Technology, Shibpur (IIEST) and TCIA (The Cancer Imaging Archive) are two providers of the publicly accessible data sets for study on bone X-ray pictures. According to the type of lesion, each DICOM (Digital Imaging and Communications in Medicine) file is formatted, labelled, and stored in a distinct folder (0 for benign tumors and 1 for malignant tumors). A sizable and carefully curated dataset is required in order to produce a model with notable performance, so different image processing methods were applied to the dataset utilised in the study, with the main objective of obtaining adequate data for model training.

Table 1 below summarizes the existing techniques, and Table 2 compares the existing and proposed techniques on various bone cancer datasets. Here, the parametric analysis is carried out with accuracy, precision, recall, AUC, TPR, and FPR. The techniques compared are ANN and ML_DCD against the proposed approach for the TCIA and DICOM datasets, and the analysis has been carried out for both datasets with all the parameters. For the TCIA dataset, the proposed approach obtained accuracy of 96%, precision of 91%, recall of 81%, AUC of 69%, TPR of 58%, and FPR of 50%, while the existing techniques attained: ANN, accuracy of 91%, precision of 88%, recall of 77%, AUC of 61%, TPR of 65%, FPR of 55%; ML_DCD, accuracy of 93%, precision of 89%, recall of 79%, AUC of 65%, TPR of 61%, FPR of 51%, as shown in Fig. 2a–f. Secondly, for the DICOM dataset, the proposed approach attained accuracy of 97%, precision of 94%, recall of 85%, AUC of 72%, TPR of 55%, and FPR of 45%, whereas the existing ANN attained accuracy of 92%, precision of 89%, recall of 81%, AUC of 63%, TPR of 61%, FPR of 52%, and ML_DCD achieved accuracy of 94%, precision of 92%, recall of 83%, AUC of 69%, TPR of 58%, FPR of 50%, as shown in Fig. 3a–f. From the above analysis, the proposed technique obtained optimal results in detecting bone cancer based on deep learning architectures (Table 2).
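The reported metrics all follow from a binary confusion matrix; a minimal sketch, assuming hypothetical label and score arrays, is:

from sklearn import metrics

def evaluate(y_true, y_pred, y_score):
    tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),                  # identical to TPR
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "AUC": metrics.roc_auc_score(y_true, y_score),
    }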


Table 1 Comparison between existing techniques

Author | Method/techniques | Dataset | Findings
Abdullah and Ahmed (2021) | MLP, SVM, NN, GBT, DT, KNN, RF, stochastic gradient descent and deep learning | CT scan and X-ray images | Deep learning performed better than the classical machine learning
Heuvelmans et al. (2021) | LCP-CNN and CNN | CT scan images | Benign nodules were differentiated from malignant ones
Chaudhari and Malviya (2021) | PCA & GLCM | CT scan images | A ROC curve of 93.75% was obtained
Pragya et al. (2021) | SVM, CNN & ANN | LUNA16, Super Bowl Dataset 2016, LIDC-IDRI | Bone cancer was classified at the early stage
Raut et al. (2021) | DICOM software, SVM, GLCM | CT scan images | There was a significant increase in classification accuracy
Joshua et al. (2020) | HRCT, MRI, DICOM, SVM, Naïve Bayes, KNN and DT (various machine learning algorithms) | CT scan images | Most algorithms had accuracy levels between 90% and 95%
Khalil and Ma (2020) | FSES | CT scan images | Prediction of bone cancer was achieved relatively fairly
Wutswa and Farhan (2020) | SOM-GRR based on RBFNN | CT scan images | Higher accuracy
Bhandary et al. (2020) | MAN, SVM & DL | Chest X-ray and CT scan images | Accuracy of 97.27% was obtained

3.4 Discussion

Findings Deep learning performed better than the classical machine learning Benign were differentiated from malignant nodules The ROC curve of 93.75% was obtained Bone cancer was classified at the early stage

Discussion

Bone cancer has been detected, predicted, compared, and classified using a variety of machine learning techniques. However, there are certain lessons to be learned and research avenues to be pursued in the field of bone cancer and related diseases. Many machine learning methods are utilized to detect, predict, compare, and categorise bone cancer, yet there is a scarcity of studies on the application of current soft computing to deliver high accuracy; soft computing techniques such as ABC, genetic methods, PSO, functional approximation, and others can be used as single or hybrid methods to better appreciate their strengths and weaknesses in this domain. The number of researchers working on ML applications in bone cancer is growing; however, the majority of research has concentrated on early detection, and there is a need to proceed to severity level and other bone cancer components to assist medical practitioners in their everyday work. Most studies utilized CT scan images to detect, predict, compare, or classify bone cancer using traditional ML methods, but more research is needed to see whether other data sets, such as family history, personal characteristics, and X-ray images, can give insights into the presence of bone cancer and related diseases. The use of ML methods in bone cancer research has grown in popularity over the years. There are, however, few comparison studies available to determine which ML approaches and input sets can deliver high accuracy. To assist medical practitioners and other connected health employees with bone cancer and related disorders, there is a necessity to compare contemporary software with diverse inputs to detect, predict, compare, or classify bone cancer.

Fig. 2 Comparative analysis between existing and proposed technique in terms of (a) Accuracy, (b) Precision, (c) Recall, (d) F-1 score, (e) TPR, (f) FPR for the LUNA16 dataset

Fig. 3 Comparative analysis between existing and proposed technique in terms of (a) Accuracy, (b) Precision, (c) Recall, (d) AUC, (e) TPR, (f) FPR for the LIDC-IDRI dataset


Table 2 Comparative analysis between proposed and existing technique for various datasets

Dataset   Technique         Accuracy   Precision   Recall   AUC   TPR   FPR
TCIA      ANN               85         75          82       48    55    51
TCIA      ML_DCD            88         77          84       51    59    53
TCIA      OpKFuzCMM-DTCL    89         79          86       53    61    55
DICOM     ANN               86         76          85       49    58    52
DICOM     ML_DCD            88         78          88       52    62    54
DICOM     OpKFuzCMM-DTCL    92         81          89       56    63    56

4 Conclusion

This chapter discusses a novel technique for bone tumor detection based on cloud-IoT, with segmentation and classification using deep learning techniques. The bone tumor data were collected from various healthcare datasets using a cloud-based IoT module. A variety of experiments were carried out using several medical datasets, as well as real-time data acquired directly from patients via a cloud device. These data were segmented and classified using an ensemble of optimized kernel fuzzy C means multilayer deep transfer convolutional learning (OpKFuzCMM-DTCL). The presented technique's performance was assessed using a benchmark bone tumour image dataset as well as bone MRI images. When compared to current strategies in the literature, the new method outperformed them in terms of accuracy, recall, precision, AUC, TPR, and FPR, and it outperforms the alternatives across the images of the application dataset in a variety of ways. For the TCIA dataset, the proposed approach obtained accuracy of 96%, precision of 91%, recall of 81%, AUC of 69%, TPR of 58%, and FPR of 50%; for the DICOM dataset, it obtained accuracy of 97%, precision of 94%, recall of 85%, AUC of 72%, TPR of 55%, and FPR of 45%. The proposed network outperformed both pre-trained architectures and state-of-the-art approaches in terms of accuracy. In future, the proposed network's performance will be evaluated using various dropout ratios and without dropout, as well as the importance of the inception layers added to the network and how many inception layers are required to obtain greater performance. Understanding the implications of this systematic investigation has some limitations: because the scanning period for detecting possible publications concluded on a certain date, any article authored after that date will not have been identified in this study, and there is the risk of missing other relevant literature databases that could have an impact on the study's results. We consider these findings to be limited, and we advise caution in applying them to other situations.



Digital Dermatitis Disease Classification Utilizing Visual Feature Extraction and Various Machine Learning Techniques by Explainable AI İsmail Kirbaş and Kürşad Yiğitarslan

İ. Kirbaş (✉) HAYTEK Joint Application and Research Center of Digital Technologies in Livestock Sector, Burdur Mehmet Akif Ersoy University, Burdur, Turkey. e-mail: [email protected]
K. Yiğitarslan Veterinary Faculty, Department of Surgery, Burdur Mehmet Akif Ersoy University, Burdur, Turkey. e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_4

1 Introduction

Bovine digital dermatitis (DD), also known as papillomatous digital dermatitis, Mortellaro's disease, or hairy heel warts, is a highly contagious and widespread foot condition [2, 3]. The condition, which is most frequently seen in the hind feet, is caused by Treponema spp. [5, 20]. The disease, first described in Italy in 1974, was observed to leave affected cows in a herd severely lame [2]. According to the findings of several studies, the incidence of DD is significantly higher in dairy cows (32.2%) than in beef cows (10.8%) [11]. In addition to raising treatment and labor expenses, DD reduces milk output, reproductive performance, and animal welfare in dairy cows [8, 14]. The typical cost of a case of digital dermatitis has been reported as US$132.96 [4]. Even though Treponema spp. is acknowledged as the major pathogen in the development of the disease, DD is a complex illness, exacerbated by muddy and dirty walking paths [12, 21]. Pathogens such as Bacteroidetes, Fusobacteria, Tenericutes, Firmicutes, Proteobacteria, and Actinobacteria all play a secondary role in the development of digital dermatitis [11]. Painful lesions are located above the interdigital area and along the coronary band, near the heels. These lesions are prone to bleeding. The lesioned area may develop filiform papillae, and the lesions themselves may be surrounded by hyperkeratotic skin with hair significantly longer than normal. The scoring system created by Döpfer et al. is utilized to categorize these lesions into five grades [7].

Thanks to improvements in computer hardware and artificial intelligence techniques, it is now possible to design information systems that learn to process images without human input. The digitization and classification of images, especially in light of recent developments in convolutional neural networks and deep learning algorithms, has had a transformative effect. Within the scope of this research, DD is treated as a classification challenge, and models that classify the data with a high degree of precision have been built using a variety of machine learning approaches; a comprehensive analysis of the performance of these models has also been completed. The aim of the study was to diagnose digital dermatitis at the severity-grade level by applying machine learning algorithms to photographs.

2 Materials and Methods

To collect the data for the study, 206 photographs were taken of the lesions on the hind legs of 168 Holstein cows with DD disease, aged 4–7 years. These cows were identified as having the disease as a result of examinations carried out in the Burdur region. The photographs were graded according to the severity of the lesions, from the mildest (degree 1) to the most severe (degree 4). After grouping, there were 60 images in the first degree, 71 in the second degree, 56 in the third degree, and 19 in the fourth degree. Figure 1 illustrates a sample image for each of the classes.

Fig. 1 Examples of photos showing the four degrees of the disease known as digital dermatitis

To obtain satisfying results in classification-based machine learning problems, it is vital to ensure that the data is distributed uniformly among all of the classes of the data set. For this reason, a data set with 1000 instances in each class was generated from the original photographs by utilising data augmentation techniques; this was done to increase the number of examples and produce a more evenly distributed data set. The initial data set was replicated using the Python programming language to perform rotate, zoom-in, and zoom-out operations, and after the augmentation was complete, 1000 representative samples were drawn at random from each category to compile the final data collection. The augmented images were then put through "visual feature extraction" so that the data could be represented numerically and be compatible with machine learning methodologies frequently used in the research literature: a well-known deep learning model, InceptionV3, was applied to the photographs, and the outputs of the layer just before the fully connected layer were digitized, producing more than 2000 numerical features. Stochastic Gradient Descent, Logistic Regression, K-Nearest Neighbour, Support Vector Machine, AdaBoost, Random Forest, Naive Bayes, and Neural Networks are the common machine learning techniques that were used to classify the resulting numerical data set. Within the confines of the study, the dataset was first randomly segmented into 80 percent training data and 20 percent test data; we adopted a supervised learning approach. The models were initially trained using the training set, 10% of which was used for validation, and the training performance of each model was evaluated individually. The stages of augmenting the initial data set, extracting visual features, and finally categorising the data with machine learning techniques are illustrated in Fig. 2.

Fig. 2 Stages of the artificial intelligence-aided process for classifying medical images

Numerous criteria are frequently employed in the literature to assess the effectiveness of models, among them Cumulative Accuracy (CA), Area Under Curve (AUC), F1-score, specificity, precision, and recall. In performance measurements, the abbreviations TN (True Negative), TP (True Positive), FN (False Negative), and FP (False Positive) are widely used. Table 1 lists the performance measurements and calculation formulae used in the study.
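To make these stages concrete, the following is a minimal sketch of the augmentation and visual feature extraction steps, assuming a TensorFlow/Keras environment; the array shapes and augmentation parameters are illustrative assumptions, not the study's exact configuration.

```python
# Sketch of the augmentation + InceptionV3 feature-extraction stages.
# Assumes TensorFlow/Keras; image arrays of shape (n, 299, 299, 3) in [0, 255].
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input

# Random rotation and zoom, mirroring the rotate / zoom-in / zoom-out
# operations described in the text (parameter values are assumptions).
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
])

# InceptionV3 with its classification head removed; pooling="avg" yields the
# ~2000-dimensional vector taken just before the fully connected layer.
backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(images):
    """Return one numeric feature vector per (augmented) input image."""
    images = augment(images, training=True)
    return backbone.predict(preprocess_input(images))
```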


Table 1 Model performance measures and equations used for the classification metrics

Performance metric   Formulation
Precision            TP / (TP + FP)
Recall               TP / (TP + FN)
Specificity          TN / (TN + FP)
Accuracy             (TP + TN) / (TP + FN + TN + FP)
F1 Score             2 · (precision · recall) / (precision + recall)
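As a quick illustration of the formulas in Table 1, the same metrics can be computed with scikit-learn; the toy label arrays below are placeholders, not the study's data.

```python
# Computing the Table 1 metrics from a confusion matrix with scikit-learn.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 1, 0, 1, 1, 0, 0]   # placeholder ground-truth labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]   # placeholder model predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("precision  :", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall     :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("specificity:", tn / (tn + fp))                    # TN / (TN + FP)
print("accuracy   :", accuracy_score(y_true, y_pred))    # (TP + TN) / total
print("F1 score   :", f1_score(y_true, y_pred))
```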

Table 2 Model performance evaluations in numeric form, including all machine learning models

Model                 AUC      Cumulative accuracy   F1       Precision   Recall   Specificity
Logistic Regression   0.9778   0.8735                0.8731   0.8729      0.8735   0.9578
SVM                   0.9806   0.8730                0.8724   0.8746      0.8730   0.9577
Random Forest         0.8009   0.5795                0.5799   0.5806      0.5795   0.8598
AdaBoost              0.6358   0.4538                0.4537   0.4537      0.4538   0.8179
SGD                   0.9073   0.8610                0.8599   0.8595      0.8610   0.9537
Naive Bayes           0.8167   0.5768                0.5780   0.5831      0.5768   0.8589
Neural Network        0.9936   0.9353                0.9349   0.9349      0.9353   0.9784
KNN                   0.9953   0.9540                0.9536   0.9552      0.9540   0.9847

3 Results

Six classification metrics were used to compare the performance of the eight different machine learning models created and trained for the study; the results are shown in Table 2 and Fig. 3. The KNN, ANN, LR, SVM, and SGD models all attained values greater than 0.86 on the Cumulative Accuracy metric. According to these findings, any one of these five models can produce an accurate classification for the given problem. When the models were judged on the precision metric, the KNN, ANN, LR, SVM, and SGD models again demonstrated performance noticeably superior to that of the other models; the same holds for the F1-score and recall metrics. When the specificity metric is taken into account, the KNN and ANN models come out on top. Tree-based models such as AdaBoost and Random Forest are unsuccessful, while there is no discernible gap in performance amongst the top four models. The amount of time invested in training and testing is also an essential statistic, since models must be both constructed and evaluated; it is desirable that the training and test run times of the produced model be as low as possible.


Fig. 3 A graphical representation of a comparison of the performance of each machine learning model

Table 3 A numerical comparison of the training and test times for each machine learning model

Model                 Train time (s)   Test time (s)
Naive Bayes           11.6720          2.7450
AdaBoost              40.8760          2.4560
SVM                   151.7910         21.0330
SGD                   20.3750          4.2810
Logistic Regression   128.6910         2.2080
Random Forest         10.1840          1.8400
KNN                   90.3970          14.9480
Neural Network        83.7030          4.6600

This is in addition to the high classification performance of the model. In Table 3, the training and testing durations of the models considered in the study are presented in seconds, and the numerical data of Table 3 are represented graphically in Fig. 4 as a bar graph. Comparing the values for training and test time shows that the SVM and Logistic Regression models have the two longest training times. With a time of 11.67 s, the Naive Bayes model was among the quickest to train. The longest measured test time, 21 s, belonged to the SVM model, while the Random Forest model had the quickest test time at 1.84 s.


Fig. 4 A graphical representation of the comparison of the training and test times for all machine learning models

Being a vital and incontestable component in the improvement of our lives, artificial intelligence is currently enhancing the quality of life we enjoy. Explainable artificial intelligence, often known as XAI, describes strategies and procedures for developing artificial intelligence applications that allow users to comprehend "why" the applications make particular choices. To put it another way, we refer to an AI system as an XAI system if we are able to query it for explanations regarding its own internal logic. Explainability is a relatively recent trait that is beginning to garner interest in the field of artificial intelligence. Systems that use artificial intelligence have become so complicated over the course of time that it is nearly impossible for a typical user to comprehend how they operate. Yet, users should be aware of how the system operates, or at the very least should be able to acquire that information when it is required. Traditional machine learning methods, such as decision trees, linear regression, and Bayesian networks, have long been the subject of research and development by mathematicians and statisticians; these algorithms were devised well before the creation of computers, and thus they are quite easy to understand. After making a decision using one of these more conventional procedures, it is simple to develop an explanation for that decision. Despite this, they are only capable of achieving a limited degree of accuracy. As a result, the classical techniques could be explained, but their performance as models was poor. Nearly everything changed when multilayer neural networks were first implemented, and this was especially true when deep neural networks were developed. Because of this advancement, a new area called deep learning came into being: a subfield of artificial intelligence that focuses on replicating the working mechanisms of the neuron cells in our brains by utilising artificial neural networks. Complex neural networks that execute with a high level of accuracy have been made possible as a result of advances in processing power as well as the optimization of open-source frameworks for deep learning. Researchers in artificial intelligence started competing with one another to obtain the best possible level of accuracy. This rivalry has unquestionably assisted in the development of fantastic AI-enabled products; nevertheless, it did so at the expense of explainability. Neural network models can be exceptionally difficult to comprehend due to their enormous complexity: they can be constructed with billions of parameters. For instance, OpenAI's natural language processing (NLP) model GPT-3 consists of over 175 billion machine learning parameters, and it is tough to extract any logic from a model this complicated. Figure 5 shows the relationship between the accuracy and explainability of machine learning algorithms. As can be seen, an AI developer stands to lose a significant amount of ground if they fall back on conventional algorithms rather than deep learning models. That is why every day we see more accurate models with lower and lower explainability. However, the increasing use of artificial-intelligence-based systems in sensitive areas and for automatic business decisions shows that explainability is needed more than ever.

Fig. 5 Comparison of accuracy and explainability performances for ML algorithms

In the research that has been done on the topic, two distinct approaches have emerged as the most popular choices for explaining how models interact with tabular data. The first of these is called LIME (Local Interpretable Model-agnostic Explanations). LIME is an explainable artificial intelligence model utilized to gain an understanding of the predictions made by any classifier [13, 18, 19]; the explanations indicate the contribution of each characteristic to a prediction in an interpretable way. LIME can accurately describe the forecasts produced by any classifier by approximating it locally with a model that can be interpreted. Certain classifiers make use of representations that are completely incomprehensible to end users; LIME provides an explanation of these classifiers in terms of interpretable representations, even if the representation in question is not the one that the classifier uses in practice [9].
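The sketch below shows how LIME is typically invoked on tabular data, assuming the open-source lime package is installed; the synthetic four-class data merely stands in for the extracted visual features.

```python
# LIME on a tabular four-class problem (synthetic stand-in data).
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, n_classes=4,
                           n_informative=6, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X, mode="classification",
    feature_names=[f"f{i}" for i in range(10)],
)
# Locally approximate the classifier around one instance with a linear model.
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=5)
print(exp.as_list())   # (feature condition, weight) pairs for this prediction
```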

İ. Kirbaş and K. Yiğitarslan

72

Throughout the course of work on classification problems, a technique known as SHAP (SHapley Additive exPlanations) was developed to explain the predictions produced by machine learning models; it is named after the mathematician Shapley [16, 17]. The findings of any machine learning model can be explained by determining and measuring how much each attribute contributed to the prediction. The approach reveals, both numerically and graphically, the degree to which the model is affected by particular elements, and it is based on notions from game theory: it links the optimal allocation of credit to local explanations with the help of the classic Shapley values from game theory and their related extensions. First published in 2017 by Lundberg and Lee, it is a brilliant method for reverse-engineering the output of any predictive algorithm [15]. SHAP attempts to explain the prediction made for an instance x by computing the contribution of each feature to the prediction. The SHAP explanation method computes Shapley values using coalitional game theory: the feature values of a data instance act as players in a coalition, and the Shapley values distribute the prediction fairly across the features. A player might represent a single feature value, as in tabular data, or a group of feature values; for describing an image, for example, pixels can be grouped into super-pixels and the prediction distributed among them. One of the novelties of SHAP is an additive feature attribution method, i.e. a linear model; this view of Shapley values connects them with LIME. SHAP is expressed in Eq. 1:

$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j \qquad (1)$$

Here g is the explanation model, M is the maximum coalition size, z′ ∈ {0, 1}^M are the simplified features, and φ_j ∈ ℝ is the feature attribution for feature j. Only Shapley values satisfy the efficiency, additivity, dummy, and symmetry properties; since SHAP computes Shapley values, it satisfies these requirements too. The prediction is distributed fairly among the feature values, and we obtain contrastive explanations that compare the prediction to the average prediction. SHAP is the connecting mechanism between Shapley values and LIME, which makes both methods easier to comprehend, and it helps to consolidate interpretable machine learning under a single roof. The implementation for tree-based models in SHAP is very fast, which matters because computation time is the primary barrier to the adoption of Shapley values. The multiple Shapley values required for global model interpretations can therefore be computed quickly. Clustering, summary plots, interactions, feature dependence, and feature importance are some of the methodologies that can be utilised for global interpretation. There is a level of coherence between global interpretations and local descriptions because Shapley values serve as the "atomic unit" of SHAP's global interpretations.

Figures 6 and 7 both show the degree of influence of the factors of the KNN model, which obtained the most successful results in the study, on the determination of the classes. After the visual feature extraction process, 2047 features were identified and converted into numerical values. To evaluate them visually, the 10 most effective features in the classification were selected among the 2047, and SHAP was applied to these numeric features. Analysing the figures reveals that the most significant factors identified for each class have different influence rankings and influence ratios. In the diagrams, blue dots represent a low degree of influence, while red dots represent a high degree of influence of the class-related factor. The summary plot incorporates both the significance of the variables and their impacts. At each of the points that make up the summary plot, there is a Shapley value corresponding to a feature and an instance; the position on the y-axis is determined by the feature and the position on the x-axis by the Shapley value. The value of the characteristic is indicated by the hue, ranging from low to high. It is possible to examine how the Shapley values are distributed for each feature, as the overlapping dots are jittered along the y-axis. The features are presented in descending order of importance. The SHAP analysis shows that some factors of high importance in classification are used in all four classes, but their degree and order of influence vary from class to class.

Fig. 6 The SHAP method provides a demonstration of the factor effects as well as the degree of influence that the KNN model has for each of the four different classes

Fig. 7 The degree of influence of the factors used in the classification estimation according to the classes
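A minimal sketch of the kind of SHAP analysis behind Figs. 6 and 7 is given below, again on synthetic stand-in data; note that return shapes and plotting details vary somewhat between versions of the shap package.

```python
# SHAP values and a summary plot for a tree-based classifier.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)    # fast exact path for tree ensembles
shap_values = explainer.shap_values(X)   # per-feature contributions
shap.summary_plot(shap_values, X)        # importance ranking + effect direction
```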

4 Conclusion

According to reports, Holstein cows are particularly susceptible to digital dermatitis, a prevalent dairy cattle ailment [6]. Although there are Montofon and Simental cattle breeds in the Burdur region, the fact that the affected cows in this study are Holsteins is consistent with this. Poor nail care, unsanitary environments, and damp barn flooring are cited as contributing factors to the development of DD disease [10]; the existence of comparable conditions in the establishments visited during the study supports the information found in the literature. It is stressed that lesions in DD disease are seen in the plantar region and more commonly in the back legs [1]; similarly, the study's observation of DD in cows' hind legs and between their heels is consistent with the literature. No studies specifically designed for DD disease using artificial intelligence for medical image classification were found in the literature review; the present study distinguishes itself as an original work in this sense.


Within the parameters of the study, a computer utilizing artificial intelligence techniques identified and evaluated the prevalent ailment known as digital dermatitis, which results in significant economic losses. Eight alternative machine learning models (Naive Bayes, AdaBoost, LR, SGD, SVM, Random Forest, KNN, and ANN) were constructed and analysed using six distinct classification metrics (AUC, CA, F1-score, precision, recall, and specificity) in our application for classifying medical photographs. An analysis of the results shows that an extremely high classification success of 0.95 was attained. Accordingly, KNN, ANN, SVM, and LR stand out as the most effective classification models, and KNN was identified as the model with the best performance parameters among these four. The results obtained from the study show that all four models can be used reliably, with high performance, in the grading of DD disease. Acknowledgments This research was made possible thanks to the funding provided by the project titled "Diagnosis and treatment of foot diseases in dairy cattle," which is the fifth subproject of the main project titled "Increasing the Sectoral Competitiveness of the Province of Burdur: Integrating Development by Differentiating in Agriculture." The ethical approval decision was dated March 16, 2022 and given the number 869 by the Local Ethics Committee on Animal Experiments at Burdur Mehmet Akif Ersoy University.

References
1. Bassett, D. R., Toth, L. P., LaMunion, S. R., & Crouter, S. E. (2017). Step counting: A review of measurement considerations and health-related applications. Sports Medicine, 47, 1303–1315. https://doi.org/10.1007/s40279-016-0663-1
2. Biemans, F., Bijma, P., Boots, N., & De Jong, M. (2017). Digital dermatitis in dairy cattle: The contribution of different disease classes to transmission. Epidemics, 23, 76–84. https://doi.org/10.1016/j.epidem.2017.12.007
3. Bruijnis, M. R. N., Beerda, B., Hogeveen, H., & Stassen, E. N. (2012). Assessing the welfare impact of foot disorders in dairy cattle by a modeling approach. Animal, 6, 962–970. https://doi.org/10.1017/S1751731111002606
4. Cha, E., Hertl, J. A., Bar, D., & Gröhn, Y. T. (2010). The cost of different types of lameness in dairy cows calculated by dynamic programming. Preventive Veterinary Medicine, 97, 1–8. https://doi.org/10.1016/j.prevetmed.2010.07.011
5. Clegg, S. R., Mansfield, K. G., Newbrook, K., Sullivan, L. E., Blowey, R. W., Carter, S. D., & Evans, N. J. (2015). Isolation of digital dermatitis treponemes from hoof lesions in Wild North American Elk (Cervus elaphus) in Washington State, USA. Journal of Clinical Microbiology, 53, 88–94. https://doi.org/10.1128/JCM.02276-14
6. Demirkan, I., Murray, R., & Carter, S. (2000). Skin diseases of the bovine digit associated with lameness. The Veterinary Bulletin, 70, 149–171.
7. Döpfer, D., Koopmans, A., Meijer, F. A., Szakáll, I., Schukken, Y. H., Klee, W., Bosma, R. B., Cornelisse, J. L., van Asten, A. J., & ter Huurne, A. A. (1997). Histological and bacteriological evaluation of digital dermatitis in cattle, with special reference to spirochaetes and Campylobacter faecalis. The Veterinary Record, 140, 620–623. https://doi.org/10.1136/vr.140.24.620
8. Garbarino, E. J., Hernandez, J. A., Shearer, J. K., Risco, C. A., & Thatcher, W. W. (2004). Effect of lameness on ovarian activity in postpartum Holstein cows. Journal of Dairy Science, 87, 4123–4131. https://doi.org/10.3168/jds.S0022-0302(04)73555-9
9. Hasib, K. Md, Rahman, F., Hasnat, R., & Alam, Md. G. R. (2022). A machine learning and explainable AI approach for predicting secondary school student performance. In 2022 IEEE 12th annual Computing and Communication Workshop and Conference (CCWC), 0399–0405. https://doi.org/10.1109/CCWC54503.2022.9720806
10. Hernandez, J., Shearer, J. K., & Webb, D. W. (2001). Effect of lameness on the calving-to-conception interval in dairy cows. Journal of the American Veterinary Medical Association, 218, 1611–1614. https://doi.org/10.2460/javma.2001.218.1611
11. Hesseling, J., Legione, A. R., Stevenson, M. A., McCowan, C. I., Pyman, M. F., Finochio, C., Nguyen, D., Roic, C. L., Thiris, O. L., Zhang, A. J., van Schaik, G., & Coombe, J. E. (2019). Bovine digital dermatitis in Victoria, Australia. Australian Veterinary Journal, 97, 404–413. https://doi.org/10.1111/avj.12859
12. Holzhauer, M., Bartels, C. J. M., Döpfer, D., & Van Schaik, G. (2008). Clinical course of digital dermatitis lesions in an endemically infected herd without preventive herd strategies. Veterinary Journal, 177, 222–230. https://doi.org/10.1016/j.tvjl.2007.05.004
13. Kumarakulasinghe, N. B., Blomberg, T., Liu, J., Leao, A. S., & Papapetrou, P. (2020). Evaluating local interpretable model-agnostic explanations on clinical machine learning classification models. In 2020 IEEE 33rd international symposium on Computer-Based Medical Systems (CBMS), pp. 7–12.
14. Losinger, W. C. (2006). Economic impacts of reduced milk production associated with papillomatous digital dermatitis in dairy cows in the USA. The Journal of Dairy Research, 73, 244–256. https://doi.org/10.1017/S0022029906001798
15. Lundberg, S., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. arXiv. https://doi.org/10.48550/ARXIV.1705.07874
16. Mangalathu, S., Hwang, S.-H., & Jeon, J.-S. (2020). Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Engineering Structures, 219, 110927. https://doi.org/10.1016/j.engstruct.2020.110927
17. Nohara, Y., Matsumoto, K., Soejima, H., & Nakashima, N. (2019). Explanation of machine learning models using improved Shapley additive explanation. In Proceedings of the 10th ACM international conference on Bioinformatics, Computational Biology and Health Informatics, 546. https://doi.org/10.1145/3307339.3343255
18. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). 'Why should I trust you?': Explaining the predictions of any classifier. arXiv. https://doi.org/10.48550/ARXIV.1602.04938
19. Shi, S., Zhang, X., & Fan, W. (2020). A modified perturbed sampling method for local interpretable model-agnostic explanation. arXiv. https://doi.org/10.48550/ARXIV.2002.07434
20. Sogstad, A. M., Fjeldaas, T., Østerås, O., & Forshell, K. P. (2005). Prevalence of claw lesions in Norwegian dairy cattle housed in tie stalls and free stalls. Preventive Veterinary Medicine, 70, 191–209. https://doi.org/10.1016/j.prevetmed.2005.03.005
21. Trott, D. J., Moeller, M. R., Zuerner, R. L., Goff, J. P., Waters, W. R., Alt, D. P., Walker, R. L., & Wannemuehler, M. J. (2003). Characterization of Treponema phagedenis-like spirochetes isolated from papillomatous digital dermatitis lesions in dairy cattle. Journal of Clinical Microbiology, 41, 2522–2529.

Explainable Machine Learning in Healthcare Pawan Whig, Shama Kouser, Ashima Bhatnagar Bhatia, Rahul Reddy Nadikattu, and Pavika Sharma

P. Whig (✉) · A. B. Bhatia Vivekananda Institute of Professional Studies-TC, New Delhi, India
S. Kouser Department of Computer Science, Jazan University, Jazan, Saudi Arabia. e-mail: [email protected]
R. R. Nadikattu University of the Cumberland, Williamsburg, KY, USA
P. Sharma BPIT, New Delhi, India
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_5

1 Introduction

Breast cancer is becoming a leading cause of death among women globally, and the way in which the illness spreads is examined as the main cause of these deaths. Although innovation has significantly influenced how we live, we still lag in correctly detecting this serious illness in its early stages [1]. Because the condition is not always diagnosed early, the rate of mammography has increased for a particular age group of concerned women [27]. Breast cancer is treatable, and lives may be saved, if it is examined at the beginning [4]. Numerous factors have been considered as potential causes of this terrible disease, including hormonal issues, family history, obesity, radiation therapy, and others. Deep learning and AI algorithms have been used to diagnose this condition [2]. Timely diagnosis of breast cancer can significantly improve a patient's prognosis and increase their chances of survival [6–9]. Furthermore, more accurate classification of benign tumors can prevent unnecessary medical interventions. Consequently, there is significant research focused on improving the diagnosis and categorization of breast cancer using machine learning [10]. Machine learning offers distinct advantages in identifying critical features from complex breast cancer datasets, making it the preferred approach for breast cancer pattern classification and predictive modeling [13]. Figure 1 illustrates the sequential process for treating breast cancer using machine learning.

Fig. 1 Flow of steps for the treatment using machine learning

Preprocessing, feature extraction, and classification are the three basic steps of the many data mining and machine learning approaches developed over the previous several decades for breast cancer diagnosis and classification [29]. Preprocessing of mammography films is crucial to enhance the visibility of peripheral regions and the intensity distribution, thus facilitating interpretation and analysis. Various techniques have been proposed for this purpose, which involve utilizing spatial frequency features of pixel intensity fluctuations and transforming the image using texture analysis algorithms based on transforms such as wavelet, fast Fourier, Gabor, and singular value decomposition [15]. Principal component analysis (PCA) can be applied to reduce the dimensionality of the feature representation. Furthermore, several studies have focused on automating breast cancer detection through machine learning algorithms [26]. Methods for diagnosing breast cancer mentioned in the literature can be regarded as semi-automatic despite major efforts, as shown in Fig. 2. The hyperparameters are those parameters, according to Kuhn and Johnson, that cannot be readily inferred from the data. Usually, certain model parameters need to be adjusted to get an algorithm to work as expected [28]. The choice of any given model's final tuning parameters is thus still up for debate [11].


Fig. 2 Methods for diagnosing breast cancer

Machine learning is becoming more and more in demand, eventually turning into a service. Unfortunately, there are still significant barriers to entry and specialized skills required in the field of machine learning [18]. It takes a certain set of abilities and knowledge to create an efficient machine-learning model that includes the preprocessing, feature selection, and classification phases; a machine learning model can be viewed as a data transformation pipeline, and there are several options available at every pipeline level. A machine learning expert selects the best approach for the problem domain at hand [25].

2 Data Set Used The Wisconsin Breast Cancer dataset secondhand in this training was taken from the UCI Mechanism Knowledge Repository [3]. The same dataset that Bennett uses to distinguish between malignant and non-cancerous tumors is shown in Fig. 3. The dataset consists of measurements taken from digital images of fine-needle aspirate (FNA) tests of breast tumors [16]. The dataset includes 569 observations, each corresponding to a patient at a Wisconsin hospital. The first two characteristics of each observation represent a unique identifying number and the diagnostic status (malignant or benign). Of the 569 observations, 139 were diagnosed as malignant and 337 were diagnosed as benign [19]. The remaining 93 characteristics in the dataset represent 30 genuine qualities measured from the FNA tests, including the mean, standard deviation, and worst 10 cell nucleus characteristics as shown in Fig. 3.


Fig. 3 Wisconsin breast cancer dataset used

Overall, the dataset is intended to aid in the classification of breast tumors as either malignant or benign based on these measured characteristics [12, 20]. The act of selecting a subset of pertinent attributes from a pool of potential subsets is known as feature selection in machine learning, and it is a necessary step in building a model [21]. A successful prediction model must carefully choose its features. Applying feature selection methods has several advantages, including (a) quicker and more efficient machine learning algorithm training, (b) simpler and easier to understand models, (c) increased model accuracy with the proper subset, and (d) less overfitting [5, 14, 22]. Some breast cancer risk factors are presented in Fig. 4.
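Since scikit-learn ships a copy of this dataset, a minimal loading sketch (with the class counts described above) looks as follows; it is an illustration, not the chapter's exact workflow.

```python
# Loading the Wisconsin diagnostic breast cancer data via scikit-learn.
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target         # 569 rows, 30 numeric features
print(X.shape)                        # (569, 30)
print(data.target_names)              # ['malignant' 'benign']
print(y.value_counts())               # class 1 (benign): 357, class 0 (malignant): 212
```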


Fig. 4 Risk factors for breast cancer

3 Various Machine Learning Algorithms

Due to the enormous demand and technological improvements over the past several years, machine learning has become much more common [23]. Machine learning has been enticing to companies across a wide range of sectors due to its capacity to extract value from data. With some tweaking and modest modifications, off-the-shelf machine learning algorithms are used in the design and implementation of the majority of machine learning products [5, 17].

4 Linear Regression

By fitting a linear equation to the data, the supervised learning process of linear regression attempts to describe the relationship between a continuous target variable and one or more independent variables [24]. Scatter plots and correlation matrices are just two of the numerous techniques available to investigate the relationship between variables. A strong association between an independent variable (x-axis) and a dependent variable (y-axis), for instance, may be seen in the scatter plot below: as one rises, so does the other [30]. Ordinary least squares (OLS) is the most often used fitting method. The optimum regression line is identified by minimizing the sum of squared distances between the data points and the regression line. For the aforementioned data points, the OLS-obtained regression line appears as shown in Fig. 5.
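A minimal sketch of OLS fitting with scikit-learn, on synthetic data generated purely for illustration:

```python
# Ordinary least squares on synthetic one-feature data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))               # independent variable
y = 2.5 * x.ravel() + 1.0 + rng.normal(size=50)    # noisy linear target

reg = LinearRegression().fit(x, y)                 # minimizes squared residuals
print(reg.coef_, reg.intercept_)                   # slope ≈ 2.5, intercept ≈ 1.0
```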


Fig. 5 OLS-obtained regression line

Fig. 6 Decision boundary to discriminate between classes

5 SVM

Support Vector Machines (SVM) are a discriminative model that separates the data into different classes using a hyperplane in a high-dimensional space. SVM is particularly useful when the data is non-linearly separable and when the number of features is large, as shown in Fig. 6. The hyperplane is defined by a set of parameters that are learned from the training data, and the margin around it is important because it provides a measure of the robustness of the classifier to new data. Once the data is transformed, SVM finds the hyperplane that separates the data with the maximum margin and then uses that hyperplane to classify new data points into one of the two classes. SVM is a powerful algorithm with several advantages over other classification algorithms. It can handle non-linearly separable data, high-dimensional data, and noisy data, and it has a regularization parameter that controls overfitting, a common problem in machine learning. However, SVM can be computationally intensive and requires careful selection of the kernel function and other parameters. SVM is memory efficient since it only uses a portion of the training data (the support vectors) when determining the decision boundary. On the other hand, huge datasets increase training time, which has a detrimental impact on performance.
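As a sketch of these ideas, here is an RBF-kernel SVM on a small, non-linearly separable synthetic dataset; all parameter values are illustrative.

```python
# RBF-kernel SVM on a non-linearly separable toy dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)   # max-margin boundary
                                                          # in the kernel space
print(clf.score(X, y))               # training accuracy
print(clf.support_vectors_.shape)    # only these points define the boundary
```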

6 Naive Bayes

Naive Bayes is a supervised learning method used for classification problems, which is why it is also known as the Naive Bayes Classifier. It is based on Bayes' theorem and assumes that the features of the data are conditionally independent given the class. This assumption is often referred to as the "naive" assumption and is why the algorithm is called "Naive Bayes". The goal is to classify a new data point into one of several classes based on its features. The algorithm first learns the probability distribution of each feature given the class from the training data; this distribution is typically modeled using a probability density function, such as a Gaussian distribution for continuous features or a multinomial distribution for discrete features. Bayes' theorem is then used to calculate the probability of each class given the features of the new data point. The prior probability of each class is usually estimated from the class frequencies in the training data, although a uniform prior can also be assumed. Naive Bayes has several advantages over other classification algorithms. It is simple to implement and computationally efficient, making it well-suited for large datasets, and it works well with high-dimensional data and is robust to irrelevant features. However, the naive assumption of conditional independence may not always hold in practice, which can lead to suboptimal performance in some cases.


$$p(A \mid B) = \frac{p(A)\, p(B \mid A)}{p(B)} \qquad \text{(Bayes' Theorem)}$$

Because of the presumption that all characteristics are independent, the naive Bayes method is particularly quick compared to complex algorithms; sometimes speed is preferred over greater precision. On the other hand, the same assumption makes the naive Bayes algorithm less accurate than complex algorithms.
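A hedged sketch of Gaussian Naive Bayes (one per-class Gaussian for each continuous feature) on the breast cancer data used later in this chapter:

```python
# Gaussian Naive Bayes on the Wisconsin breast cancer data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)    # priors estimated from class frequencies
print(nb.score(X_te, y_te))          # test accuracy
```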

7 Logistic Regression

Logistic regression is a type of generalized linear model that models the probability of a binary outcome (e.g., yes or no, true or false) given a set of input features. The algorithm works by fitting a logistic function to the training data, which maps the input features to the probability of the binary outcome. Logistic regression works by minimizing a loss function, such as the cross-entropy loss, that measures the difference between the predicted probabilities and the true labels of the training data. The optimization problem is typically solved using gradient descent or other iterative optimization methods, as shown in Fig. 7. Once the logistic regression model is trained, it can be used to predict the probability of the binary outcome for new data points. A decision threshold is typically used to convert the predicted probabilities to binary outcomes. For example, if the decision threshold is set to 0.5, any predicted probability above 0.5 is classified as a positive outcome, while any predicted probability below 0.5 is classified as a negative outcome.

Fig. 7 Supervised learning approach


Logistic regression has several advantages over other classification algorithms. It is simple, interpretable, and computationally efficient, and it works well with both linearly and nonlinearly separable data. However, it may not work well with high-dimensional data or when the relationship between the input features and the binary outcome is complex. The weights of the logistic function are learned from the training data using maximum likelihood estimation or other methods. Choosing the positive class for all probability values greater than 50% is not always preferred. In the case of spam emails, we need to be nearly certain before we can label a message as spam; unless we are almost certain, emails are not labeled as spam. Conversely, when categorizing a health-related issue, we must be far more attentive: we do not want to overlook a cancerous cell, even if we have only a slight suspicion that it may be one. As a result, the threshold that marks the boundary between the positive and negative classes depends on the problem.
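The problem-dependent threshold can be applied directly to the predicted probabilities, as in this sketch; the 0.9 cut-off is an illustrative choice, not a recommendation.

```python
# Logistic regression with a stricter-than-default decision threshold.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]   # probability of class 1 (benign here)
cautious = (proba >= 0.9).astype(int)   # call "benign" only when very confident
print((cautious == y_te).mean())
```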

8 K-Nearest Neighbors (kNN)

The kNN algorithm works by finding the k nearest neighbors to a given data point in the feature space based on a similarity metric, as shown in Fig. 8. In other words, the algorithm identifies the k data points in the training set that are closest to the new data point according to some similarity measure, such as Euclidean distance or cosine similarity.

Fig. 8 Classes of the five nearest points

Once the k nearest neighbors are identified, the algorithm predicts the class or value of the new data point based on the majority class or average value of the k neighbors. For example, in a binary classification task, if the majority of the k nearest neighbors belong to the positive class, the algorithm predicts a positive label for the new data point. The value of k is an important hyperparameter that needs to be tuned. A small value of k (e.g., k = 1) can lead to overfitting, where the algorithm may be too sensitive to noise in the data, while a large value of k (e.g., k = n, where n is the number of training examples) can lead to underfitting, where the algorithm may not capture the underlying structure of the data. kNN can be used for both regression and classification problems: in a regression problem, the algorithm predicts the mean value of the k nearest neighbors; in a classification problem, it predicts the class label that has the highest frequency among them. kNN requires no time-consuming training process, but it can be computationally expensive during testing for large datasets, as it requires computing the distance between the new data point and all training examples. kNN is a simple yet effective algorithm for classification and regression tasks, and it can be used in a wide range of applications, including image and speech recognition, recommendation systems, and natural language processing.
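A short sketch of tuning k with cross-validation, echoing the overfitting/underfitting trade-off above:

```python
# Cross-validated accuracy of kNN for several values of k.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
for k in (1, 3, 5, 11):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k}: {score:.3f}")   # small k risks overfitting, large k underfitting
```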

9 Decision Trees

A decision tree develops by repeatedly dividing the data into categories by asking questions; a visual depiction of a decision tree makes the partitioning of the data easier to see. A decision tree to forecast client turnover is shown in Fig. 9. The first split is based on the total monthly charges, and the system then asks more questions to create distinct class labels; as the tree grows deeper, the questions get more focused. The decision tree technique seeks splits that are maximally informative, so that the model continuously learns more about the data. Arbitrarily dividing the features usually does not provide useful information; splits that yield higher node purity provide more information. The more mixed the distribution of classes inside a node is, the lower its purity. The questions are selected so as to maximize purity, or equivalently to minimize impurity.


Fig. 9 Decision tree to forecast client turnover
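The question-asking structure of a tree can be inspected directly; here is a sketch with a depth-limited tree on the breast cancer data (the depth limit is an illustrative choice):

```python
# A shallow decision tree and the split questions it learned.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0)
tree.fit(data.data, data.target)

# Each printed node is a question; leaves carry the predicted class.
print(export_text(tree, feature_names=list(data.feature_names)))
```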

Fig. 10 Random forest algorithm flow

10 RF Algorithm

Random Forest (RF) is a popular ensemble learning algorithm. It is a type of decision-tree-based algorithm that combines multiple decision trees to make a final prediction, as shown in Fig. 10. The RF algorithm works by creating a large number of decision trees on different subsets of the training data and input features; this process is called bootstrap aggregating, or bagging. Once the decision trees are constructed, the algorithm makes a final prediction by taking the majority vote or average of the predictions from all decision trees. For example, in a binary classification task, if the majority of decision trees predict a positive label, the RF algorithm predicts a positive label for the new data point. The RF algorithm has several advantages over other decision-tree-based algorithms. It is less prone to overfitting and performs well on high-dimensional data, and it provides a measure of feature importance, which can be used for feature selection and interpretation. The hyperparameters of the RF algorithm include the number of decision trees, the maximum depth of each decision tree, the number of input features to consider at each split, and the criterion for splitting nodes, such as the Gini index or entropy. RF can be used for both regression and classification problems. In a regression problem, the algorithm predicts the average of the target values from the leaf nodes of all decision trees; in a classification problem, it predicts the class label that has the highest frequency among the predictions from all decision trees. Overall, RF is a powerful and versatile algorithm for classification and regression tasks, widely used in various fields including bioinformatics, finance, and marketing.
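A sketch of bagging together with the feature-importance readout mentioned above; the hyperparameter values are illustrative.

```python
# Random forest: bagged trees plus impurity-based feature importances.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=0)
rf.fit(data.data, data.target)       # each tree sees a bootstrap sample

# Rank features by average impurity reduction across the forest.
ranked = sorted(zip(rf.feature_importances_, data.feature_names), reverse=True)
for importance, name in ranked[:5]:
    print(f"{name}: {importance:.3f}")
```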

11 Boosted Gradient Decision Trees (GBDT)

Gradient Boosted Decision Trees (GBDT), also known as Gradient Boosting Machines, is an ensemble method that combines many decision trees using the boosting approach, as shown in Fig. 11; the decision trees serve as weak learners. During the training process, the algorithm adjusts the weights of the training examples based on their residual errors from the previous trees. The training examples with larger residual errors are given higher weights, and the subsequent trees are trained to focus on these examples to correct their errors.

Fig. 11 Boosting is the process of successively combining learning algorithms


Once all the decision trees have been constructed, the final prediction is made by summing the predictions from all the trees. In a binary classification problem, the algorithm predicts the probability of a positive label, which is transformed into a binary label using a threshold. In a regression problem, the algorithm predicts the mean value of the target variable. The hyperparameters of the GBDT algorithm include the number of decision trees, the depth of each tree, the learning rate, and the criterion for splitting nodes, such as the mean squared error or the mean absolute error. GBDT has several advantages over other machine learning algorithms. It is highly accurate and can handle a variety of data types and features. The algorithm is also robust to outliers and missing values. Additionally, GBDT provides a measure of feature importance, which can be used for feature selection and interpretation. GBDT is widely used in various fields, including finance, marketing, and healthcare. It has been used for fraud detection, credit risk analysis, and patient outcome prediction, among other applications. When compared to random forests, GBDT is more effective at both classification and regression tasks and offers more precise predictions. It can handle mixed-type features and doesn't require any pre-processing.
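A minimal gradient boosting sketch in the same spirit, using scikit-learn's implementation; the hyperparameters shown are the usual knobs, with illustrative values.

```python
# Gradient boosted decision trees: each tree corrects the previous errors.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=0)
gbdt.fit(X_tr, y_tr)                 # trees are added sequentially
print(gbdt.score(X_te, y_te))        # test accuracy
```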

12 Clustering with K-Means

Clustering is an approach for arranging a collection of data points so that related data points are grouped together; clustering procedures therefore search for similarities or differences between data points. Since clustering is an unsupervised learning method, the data points have no labels attached to them, and clustering techniques look for the data's underlying structure. With K-means clustering, the data is divided into k clusters so that points within a cluster lie close together while the clusters themselves are well separated, as shown in Fig. 12. It is thus a partition-based clustering method. The distance between two points determines how similar they are: K-means clustering seeks to maximize distances across clusters while minimizing distances within each cluster. The K-means method cannot determine the number of clusters on its own; k must be specified when the K-means object is built, which may be difficult.

Fig. 12 K-means clustering
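The snippet below is a small illustrative sketch (synthetic data and an assumed k, not the chapter's code) showing that k must be supplied when the K-means object is built, as noted above.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two synthetic groups of points; the user must still choose k explicitly
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centre near (0, 0), one near (5, 5)
print(km.inertia_)          # within-cluster sum of squared distances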

13 Analysis by Principal Components (PCA)

A dimensionality reduction approach called PCA creates new features from the ones that already exist while retaining as much of the original information as feasible. Although PCA is an unsupervised learning method, it is a common preprocessing step for supervised learning systems, as shown in Fig. 13. By identifying the relationships between features within a dataset, PCA derives new features. PCA's goal is to describe as much of the variation in the original dataset as feasible using fewer features (or columns). The new derived features are called principal components, and they are ranked by the proportion of the original dataset's variance that each component accounts for.

Fig. 13 Analysis by Principal Components
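As a brief hedged sketch (not the chapter's code), the following applies scikit-learn's PCA to the breast cancer features and prints the share of variance explained by each principal component, which is exactly the ranking described above.

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA

X, _ = load_breast_cancer(return_X_y=True)
pca = PCA(n_components=2)             # keep the two strongest components
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                # (569, 2)
print(pca.explained_variance_ratio_)  # variance share per principal component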

14 Case Study

The objective of this research is to identify the most useful characteristics for predicting whether a breast tumor is malignant or benign. Additionally, the study aims to discover broad trends that can assist in selecting the appropriate machine learning model and hyperparameters. The goal is to build a function that can accurately predict the class of a new input as either malignant or benign using classification algorithms; the ultimate objective is to determine the nature of the breast cancer and classify it accordingly. An ML library is often a collection of readily usable functions and procedures. A developer's armory must include a strong collection of libraries to do research and create sophisticated applications without having to write a lot of code; libraries save programmers from repeatedly writing the same code. Additionally, there are libraries for handling various tasks; for instance, there are libraries for text processing, graphics, data manipulation, and scientific calculation. The top five headers of the dataset are shown in Fig. 14. Using the shape attribute, we can see the dimensions of the dataset:

print("Cancer data set dimensions : {}".format(dataset.shape))
# Output: Cancer data set dimensions : (569, 32)

The data set has 569 rows and 32 columns, as can be seen in Fig. 15. The column labeled "diagnosis" tells us whether the cancer is M (malignant) or B (benign); after encoding, malignant is indicated by a 1 and benign by a 0. We can see that 357 of the 569 records carry the label "B" (benign), while 212 carry the label "M" (malignant).

15 Handling Missing Values

There are no missing values; this can be checked using:

dataset.isnull().sum()
dataset.isna().sum()

Fig. 14 Top five headers of the dataset


Fig. 15 Visualization of the dataset

There is no missing value, as shown in Fig. 16. Variables with categorical data have label values rather than numerical values, and frequently only a fixed set of values is available; users are often characterized in terms of their place of origin, gender, age group, and so on. To encode the categorical data, we utilize a label encoder. Our predictive models can better interpret categorical input when it is converted into numbers using the LabelEncoder class of the scikit-learn module in Python, as shown in Fig. 17.
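A minimal sketch of the encoding step (with illustrative labels rather than the study's actual dataframe): LabelEncoder orders the classes alphabetically, so "B" becomes 0 and "M" becomes 1, matching the encoding described above.

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
encoded = le.fit_transform(["M", "B", "B", "M"])
print(encoded)      # [1 0 0 1]
print(le.classes_)  # ['B' 'M']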

16 Result

The tests that were run on the dataset described above using various methodologies are presented in Table 1, where the strategies are compared based on training and testing accuracy and the amount of time spent on the dataset. According to the data, the Extreme Learning Machine is the best among the competition, as it provides 99% accuracy in less time. A bar chart comparison of all the models, broken down by accuracy and time, is shown in Fig. 18.

Fig. 16 Missing values checking

Fig. 17 Diagnosis data with and without encoding

Table 1 Comparative analysis

Model                            Training accuracy (%)   Testing accuracy (%)   Training time (ms)   Testing time (ms)
Decision Tree (DT)               84                      89                     0.04                 0.01
K-Nearest Neighbour (KNN)        89                      90                     0.35                 0.32
Support Vector Machine (SVM)     91                      91                     0.06                 0.01
Random Forest (RF)               94                      94                     0.15                 0.14
Extreme Learning Machine (ELM)   95                      99                     0.04                 0.01

Fig. 18 Accuracy and time comparison

17 Conclusion

Machine learning has shown great promise in detecting breast cancer, with a high accuracy rate of up to 98.4%, making the process more efficient, faster, and less complicated. The model was built using a combination of classifiers and algorithms, including decision tree, random forest, and logistic regression. The model classifies patients as either malignant or benign based on the predictions made by the decision tree classifiers. This model produces far more accurate predictions than incorrect ones and can aid in the early detection of breast cancer, leading to timely treatment and sparing patients extensive surgery. Researchers estimate that in 50 years, breast cancer prediction accuracy rates could reach 99% using the Extreme Learning Machine (ELM) and PCA as the primary tools. This approach can be applied to mammography techniques in the future to distinguish between benign and malignant cells in their early stages. This study can be a valuable resource for researchers conducting related studies, as it showcases the potential of machine learning in detecting breast cancer accurately. The researchers emphasize the need for continuous improvement in this field.

18 Future Scope

Explainable machine learning (XAI) in healthcare is an emerging field that aims to improve the transparency and interpretability of machine learning models used in healthcare applications. By providing explanations for the predictions made by these models, XAI can enhance trust and accountability, enable more informed decision-making, and improve patient outcomes. The future scope of this study is vast, as there is still much to be explored and achieved in the field of XAI in healthcare. Some potential areas of future research and development are:

1. Improved model interpretability: As healthcare applications become more complex, the need for interpretable machine learning models becomes more critical. Future research could focus on developing more sophisticated algorithms that are not only accurate but also provide transparent and understandable predictions.

2. Real-time monitoring: Real-time monitoring of patient data using machine learning algorithms can improve healthcare outcomes significantly. Future studies could focus on developing real-time monitoring tools that are transparent and explainable, making it easier for clinicians to make informed decisions.

3. Ethical considerations: The use of machine learning algorithms in healthcare raises ethical concerns around issues such as bias, fairness, and privacy. Future research could explore ways to address these ethical considerations, such as developing algorithms that are fair and unbiased, and ensuring patient privacy.

4. Clinical decision support: Clinical decision support systems that incorporate machine learning algorithms can assist clinicians in making informed decisions about patient care. Future research could focus on developing clinical decision support systems that are transparent, interpretable, and explainable, allowing clinicians to understand how the system arrived at its recommendations.

5. Integration with electronic health records (EHRs): Machine learning algorithms can be integrated with electronic health records to improve patient outcomes. Future research could explore ways to integrate machine learning algorithms into EHR systems in a way that is transparent, explainable, and easy for clinicians to use.

Overall, the future scope of XAI in healthcare is vast, and there are many potential areas of research and development. By continuing to explore and improve the transparency and interpretability of machine learning models used in healthcare, we can improve patient outcomes and enhance trust and accountability in the healthcare system.


Explainable Artificial Intelligence with Scaling Techniques to Classify Breast Cancer Images

Abdulwasiu Bolakale Adelodun, Roseline Oluwaseun Ogundokun, Akeem Olatunji Yekini, Joseph Bamidele Awotunde, and Christopher Chiebuka Timothy

A. B. Adelodun (✉)
ECWA College of Nursing and Midwifery, Egbe, Kogi State, Nigeria

R. O. Ogundokun
Department of Multimedia Engineering, Kaunas University of Technology, Kaunas, Lithuania
Department of Computer Science, Landmark University, Omu Aran, Nigeria
e-mail: [email protected]; [email protected]

A. O. Yekini
Faculty of Electrical and Electronics Engineering, Kaunas University of Technology, Kaunas, Lithuania
e-mail: [email protected]

J. B. Awotunde
Department of Computer Science, Faculty of Information and Communication Sciences, University of Ilorin, Ilorin, Kwara State, Nigeria
e-mail: [email protected]

C. C. Timothy
Mechatronics Engineering Programme, Bowen University, Iwo, Osun State, Nigeria
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_6

1 Introduction

Activities that formerly needed human intellect can be replicated by software algorithms. AI algorithms can match, and even outperform, human performance in a range of repetitive, well-defined activities. Computer vision is one of the major areas where AI technology is being applied [1–3]. Such algorithms can be trained to extract patterns from massive data sets, including data sets comprising a significant number of medical images [4]. Given enough data, machine learning systems rapidly become more capable on their own. Importantly, they can even learn to make wise choices in situations where humans may lack all the specialized expertise. AI has the potential to improve healthcare by detecting tumors that conventional methods miss, obviating the need for radiologists, and minimizing delays in decision-making that might threaten the lives of women [5, 6]. AI is a set of instructions that can analyze complicated data and find patterns [7], and artificial intelligence-based solutions can play a role in practically all stages of the breast screening pathway.

Cancer is a significant worldwide public health concern and the second most prevalent cause of death in the United States [8]. According to predictions, 1.9 million new cases with 609,360 cancer deaths were projected for the US in 2022 [9]. Breast cancer in women accounted for 11.7% of all malignancies diagnosed in 2020, making it the most frequent cancer globally [10, 11]. According to estimates, breast cancer accounts for 31% of all cases of female cancer in the United States [12], with 287,850 additional cases anticipated in 2022. Breast cancer first manifests itself in the lobules or ductal cells (epithelium) within the glandular tissue of the breast. The initial location of the malignant growth is within the duct or lobule, where it frequently shows no symptoms and has a low chance of spreading [13].

Early detection has been shown to improve breast cancer treatment [14, 15]. That is the reason medical guidelines endorse the use of imaging techniques like CT for its screening. Women with a strong family history of breast cancer are encouraged to go for annual MRIs and mammograms [16]. Mammography screenings are intended to find breast cancer in its earliest stages, when therapy will be more effective [17]. In the majority of nations, screening for breast cancer every 2 years is advised [18]. According to studies, screening mammography can lower the number of breast cancer deaths among women in the age range 40–47 years, with the strongest support for the benefit shown in those between the ages of 50 and 69. However, the advantages of regular mammography screening in women under the age of 40 are not well supported by research. Digital mammography, magnetic resonance imaging (MRI) [19], positron emission tomography (PET) scanning [20], and diffuse optical tomography, which creates images of the breast using light rather than x-rays, are all attempts to enhance traditional mammography.

Despite the guidelines, the healthcare system faces difficulties even in a wealthy nation like the United States. Radiologists are prone to erroneous conclusions [21], and there is considerable variation in radiologists' abilities and conclusions. A further issue that affects developing nations is a lack of qualified radiologists [22]. Artificial intelligence can assist in identifying mammography pictures [23] that require additional diagnostic testing to determine whether the discovered characteristics are cancerous [24, 25]. AI systems are not limited by the limitations of the human eye, which can only view slices of 2D images on a screen; instead, they are capable of viewing the complete 3D image at once. In effect, such a system has 4D vision, with time serving as the fourth dimension, because it can view both recent and older 3D images at the same time. AI has more processing capability than human radiologists, so it can evaluate every pixel, as well as different combinations of pixels, in far more detail. For both diagnosis and treatment of breast cancer, a precise and trustworthy diagnosis is required [26].
With the proliferation of artificial intelligence and machine learning algorithms around the world, it is becoming increasingly important for people to become familiar with the use of these sometimes opaque systems. One of the major problems with AI systems today is that their decision process is not transparent. Unlike humans, artificial intelligence systems cannot explain how they arrived at their decision. In critical cases, it is important to know why a recommendation was formulated, and in this scenario XAI is quite useful [27, 28]. XAI's goal is to create new machine learning systems that can explain their reasoning, characterize their strengths and weaknesses, and provide an understanding of how they will behave in the future. Explainable AI is a set of methods for explaining a model's predictions even when the nature of the modeling algorithm would ordinarily make such an explanation challenging [29, 30]. The commonly used XAI techniques are interpretable machine learning models, LIME, SHAP [31], counterfactuals/adversarial attacks, RETAIN, and LRP. Several algorithms are employed in this study, and the following are its primary contributions:

(i) To evaluate feature importance in breast cancer prediction using an explainable artificial intelligence (XAI) method.
(ii) To explore various feature scaling techniques in the scikit-learn library.
(iii) To assess and contrast the classification metrics of several classification algorithms on the Breast Cancer Wisconsin (Diagnostic) dataset, both with and without scaling.

The remainder of this study is structured as follows: the relevant papers on breast cancer categorization are reviewed in Sect. 2; Sect. 3 presents the proposed system design, materials, and methods; the experimental findings are discussed and illustrated in Sect. 4; and the study is concluded in Sect. 5.

2 Related Work

A significant amount of effort has already been devoted to detecting breast cancer using several techniques [32], with machine learning classifiers compared for breast cancer care depending on the techniques selected. A variety of research methods are presented below.

Research on breast cancer detection with machine learning using the Breast Cancer Wisconsin (Diagnostic) dataset obtained from Kaggle was conducted by [33]. Analysis was done on 569 patient records, each instance having 32 attributes combined with a diagnosis and characteristics; each case contains parameters of the cancerous and non-cancerous cells, and the feature values are displayed numerically. The term "target" refers to whether the patient is suffering from "benign" or "malignant" cancer. The study used the Wisconsin breast cancer dataset to conduct data visualizations and to compare the effectiveness of various machine learning algorithms, including Naive Bayes, KNN, Support Vector Machine, Decision Tree, AdaBoost, XGBoost, and Random Forest [34]. To classify data with high accuracy, precision, sensitivity, and specificity, the efficacy and efficiency of each method were evaluated. The outcomes reveal that XGBoost has the lowest error rate and the best accuracy (98.24%). This study applied several machine learning approaches, but only a small dataset (569 instances) was used.

Utilizing the Wisconsin Diagnostic Breast Cancer dataset, [35] studied the behavior of SVM, Naïve Bayes, and ANN, combining these machine learning approaches with feature selection techniques to find the best-suited one. Despite its longer computation time, SVM-LDA was preferred above all the other approaches, according to the simulation findings. Through machine learning methods such as AdaBoost, RCNN, RNN, Naive Bayes, and SVM, [36] developed a unique approach to identifying breast cancer (HA-BiRNN). The suggested approach was contrasted with the machine learning methods, and the simulation results demonstrated that the DNN algorithm outperformed the other methods in terms of performance, efficacy, and image quality. Combining machine learning methods including Logistic Regression, Decision Tree, Random Forest, KNN, and SVM with CNN, DL, Naive Bayes, and ANN classifier approaches, [37] proposed a fresh technique to find breast cancer. A comparison of machine learning and deep learning methods found that the accuracy of the CNN and ANN models (99.3%) was greater than that of the machine learning models (97.3%). The authors in [38] compared ANN and SVM and, for better dataset processing, included numerous classifiers, such as CNN, KNN, and Inception V3. The experimental results and performance assessments showed that ANN outperformed SVM, making it a superior classifier. In research conducted by [39], images from histopathology microscopy were categorized using ResNet-50. The model uses ResNet-50's powerful transfer learning approach to train on and categorize the BreakHis dataset into benign and malignant conditions. Over 100 epochs, the testing accuracy fluctuated between 98% and 99.8%, and an average testing accuracy of 99% was attained. Research by [40] uses an edge-detection method in a neural network and compares it with a Convolution Neural Network model on time, accuracy, and dimension parameters. Compared to the existing Convolution Neural Network model, the time required for Canny edge detection is lower. Additionally, the proposed work was found to be between 14% and 15% more accurate than the conventional method, although the accuracy fluctuates with image size and image dataset modifications.

Most of the related works reviewed focus on the accuracy of a model, while some compared different models. However, no attention was paid to comparing the models with and without feature scaling, and little attention was given to the implications of false positives and false negatives. In this study, various feature scaling techniques are employed, with priority on the confusion matrix and its significance.


3 Materials and Methods

Since the variables in a dataset may have diverse scales, many different approaches have been developed for rescaling data. Data scaling is frequently seen as a best practice because of the variety of scaling strategies available; however, scaling is data-specific and in some circumstances might not even be necessary. Therefore, this study examines several algorithms and the implications of different scaling strategies on the Wisconsin (Diagnostic) Breast Cancer dataset.

3.1 Proposed Methodology

This research was implemented using the Python programming language in the Jupyter notebook environment. The proposed model implementation is shown in Fig. 1, and the methodology, applied to the Wisconsin Diagnostic Breast Cancer dataset, is outlined as follows:

1. Create models for breast cancer prediction using feature scaling techniques.
2. Divide the data using the training and testing split approach.
3. Train the prediction models on the training dataset.
4. Preprocess the breast cancer dataset using different scaling techniques and retrain the models.
5. Validate the predictive models on eight different algorithms using the training dataset, both scaled and unscaled.
6. Calculate the model performance assessment metrics for the eight algorithms to obtain the accuracy, sensitivity, specificity, and F1-score for both scaled and unscaled data.

3.2 Dataset

The data used for this research work is the Wisconsin (Diagnostic) Breast Cancer dataset made available by the UCI Machine Learning Repository [41]. The dataset was created by Dr. William Wolberg at the University of Wisconsin [42]. It has 569 instances and 32 characteristics, which were computed from digitized images of fine needle aspirates (FNA) of breast masses, covering 357 benign tumors and 212 malignant tumors. The dataset was split into 75% for training, and the remaining 25% was utilized for testing.
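A minimal sketch of that 75/25 split (using scikit-learn's bundled copy of the Wisconsin data, which carries 30 numeric features rather than the 32 raw columns; the random seed is an assumption):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)  # 569 instances
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42   # 75% training, 25% testing
)
print(X_train.shape, X_test.shape)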


Fig. 1 Proposed breast cancer classification architecture

3.3 Data Processing

Machine learning techniques are entirely dependent on data, as it is the element that makes model training possible. On the other hand, the data will be worthless if the machine cannot interpret it, so it must be prepared before being supplied to ML algorithms. Simply said, we must always provide the relevant data, in the proper scale and format and with significant attributes, for the problem we want the machine to resolve. As a result, data preparation becomes the most crucial stage of the ML process. After selecting the raw data for machine learning, the process of "data preparation" increases the dataset's appropriateness for ML approaches, transforming the chosen data into a format that machine learning algorithms can use. Normalization, Z-score, and Min-Max scaling were the scaling methods used in this study.

3.3.1 Min-Max Scaling

This is a scaling method that transforms the data from its original measurement units to a new interval, which goes from lower(F) to upper(F) for a feature F. The bounds can be defined by the user; most often the new bounds are zero and one. Mathematically, as shown in Eq. 1:

V' = \frac{V - \min_F}{\max_F - \min_F} \cdot (\mathrm{upper}_F - \mathrm{lower}_F) + \mathrm{lower}_F    (1)

where
V = current value of feature F (original value)
V' = normalized value
min_F = minimum value of the feature
max_F = maximum value of the feature
upper_F = upper bound
lower_F = lower bound

The MinMaxScaler class of the scikit-learn Python package allows us to rescale the data.
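A tiny sketch of Eq. 1 with scikit-learn (the toy values are chosen purely for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[1.0, 20.0], [2.0, 60.0], [4.0, 100.0]])
scaler = MinMaxScaler(feature_range=(0, 1))  # lower(F)/upper(F) of Eq. 1
print(scaler.fit_transform(X))
# column 1 becomes [0, 1/3, 1]; column 2 becomes [0, 0.5, 1]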

3.3.2 Normalization

Another beneficial data preparation technique is normalization, which rescales each row of data to have a length of 1. It works well with sparse datasets that include a lot of zeros. We may scale the data with the aid of the Normalizer class in the scikit-learn Python package. The normalization preprocessing methods employed in machine learning fall into two categories:


L1 Normalization

It may be described as a normalization method that modifies dataset values so that the sum of the absolute values in each row always equals 1. It is also known as Least Absolute Deviations or Lasso.

L2 Normalization

It is a normalization technique that adjusts dataset values so that the sum of the squares in each row always equals 1. It is also known as Least Squares or Ridge.
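A small sketch of both row-wise normalizations (toy values chosen for illustration):

import numpy as np
from sklearn.preprocessing import Normalizer

X = np.array([[3.0, 4.0], [1.0, 1.0]])
print(Normalizer(norm="l1").fit_transform(X))  # each row's absolute values sum to 1
print(Normalizer(norm="l2").fit_transform(X))  # each row's squared values sum to 1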

3.3.3 Z-score

This is also called zero-mean normalization. Data is transformed to have a mean of 0 and a standard deviation of 1. The normalized value V' is computed as shown in Eq. 2:

V' = \frac{V - \mu}{\sigma}    (2)

where
μ = feature mean
σ = feature standard deviation
V = original value

The StandardScaler class of the scikit-learn Python library may be used to apply the Z-score.
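And a matching sketch of Eq. 2 (toy values for illustration):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0], [2.0], [3.0]])
X_z = StandardScaler().fit_transform(X)  # (V - mu) / sigma, Eq. 2
print(X_z.ravel())                       # approx. [-1.2247, 0.0, 1.2247]
print(X_z.mean(), X_z.std())             # approx. 0.0 and 1.0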

3.4 Model

Classification is the process of inferring a category from observed quantities. The results can be categorized, for example, as black or white, or spam or not spam. In mathematics, classification is the task of approximately mapping input variables (X) to discrete output variables (Y) [40]. In this sort of supervised machine learning, targets are provided along with the input data set. Breast cancer detection is an example of a classification problem; since there are only two possible output categories, "benign" and "malignant", this classification is binary. Before using such a classifier for prediction, we must first train it. Eight classification algorithms were considered in this research.

3.4.1 Logistic Regression

The simplest form of logistic regression is called binary or binomial logistic regression, used when the target or dependent variable can take only two possible values, one or zero. It allows us to model how a binary target variable interacts with several predictor variables. In logistic regression, the linear function is essentially used as input to another function g, as shown in Eqs. 3 and 4 [43]:

h_\theta(x) = g(\theta^T x), \quad \text{where } 0 \le h_\theta(x) \le 1    (3)

The logistic or sigmoid function g can be expressed as follows:

g(z) = \frac{1}{1 + e^{-z}}, \quad \text{where } z = \theta^T x    (4)

The sigmoid curve is illustrated in Fig. 2. As can be seen, the y-axis values range from 0 to 1, and the curve crosses the axis at 0.5.

Fig. 2 Sigmoid curve. (Logistic-curve – Sigmoid function – Wikipedia [44])

The classifications can be divided into positive and negative categories. To facilitate the implementation, we assume that the output of the hypothesis function is classified as positive if it is at least 0.5 and negative otherwise. Additionally, a loss function must be defined to assess the algorithm's effectiveness using the weights, denoted by θ, as follows (see Eq. 5) [45]:

h = g(X\theta), \qquad J(\theta) = \frac{1}{m}\left(-y^T \log(h) - (1 - y)^T \log(1 - h)\right)    (5)

The major goal at this point is to minimize the loss function. This is done by fitting the weights, which means increasing or decreasing them appropriately. Taking the derivative of the loss function with respect to each weight tells us which parameters should be given a high weight and which should be given a lower weight. The gradient descent equation below illustrates how changing the parameters alters the loss (see Eq. 6) ([46]; Machine-Learning-1/LogisticRegression-Model.Md at Master · EyasuTew/Machine-Learning-1 · GitHub [47]):

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} X^T \left(g(X\theta) - y\right)    (6)
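To make Eqs. 3, 4, 5 and 6 concrete, here is a minimal NumPy sketch of logistic regression trained by gradient descent; the data, learning rate, and iteration count are illustrative assumptions, not values from this chapter.

import numpy as np

def sigmoid(z):
    # Eq. 4: g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iter):
        h = sigmoid(X @ theta)             # hypothesis, Eqs. 3-4
        theta -= lr * (X.T @ (h - y)) / m  # gradient step, Eq. 6
    loss = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m  # loss, Eq. 5
    return theta, loss

# tiny separable example; the first column is the bias term
X = np.array([[1, 0.1], [1, 0.4], [1, 0.6], [1, 0.9]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta, loss = fit_logistic(X, y)
print((sigmoid(X @ theta) >= 0.5).astype(int))  # [0 0 1 1]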

3.4.2 Support Vector Machine (SVM)

Support vector machines (SVMs) are effective yet flexible supervised machine learning methods employed in both regression and classification (Support Vector Machine (SVM) Algorithms under Supervised Machine Learning (Tutorial) | by Neelam Tyagi | Analytics Steps | Medium [48]). Nevertheless, they are most frequently used in classification problems. SVMs were first introduced in the 1960s, but significant development occurred around 1990. Unlike other machine learning algorithms, SVMs are implemented differently, and they have recently experienced a significant increase in popularity due to their ability to handle a variety of continuous and categorical variables. An SVM model is essentially a hyperplane separating the classes in a multidimensional space. SVM builds the hyperplane iteratively to reduce error, seeking to categorize the dataset by locating a maximum marginal hyperplane (MMH) [49] (Fig. 3). The following are crucial SVM concepts:

Support Vectors

Data points closest to the hyperplane are called support vectors. These points are used to define the separating line.

Fig. 3 SVM model hyperplane [50]


Hyperplane

As seen in Fig. 3, the hyperplane is the decision plane or space that divides a set of objects having different class memberships.

Margin

It may be described as the gap between the two lines through the closest data points of different classes, calculated as the perpendicular distance from the separating line to the support vectors. A large margin is considered good, whereas a narrow margin is considered bad. To find a maximum marginal hyperplane (MMH), the datasets must first be categorized into classes, which can be accomplished in the following two steps:

(a) SVM first iteratively generates hyperplanes that differentiate the classes optimally.
(b) The hyperplane that best separates the classes is then chosen.

SVM Kernels

The SVM algorithm is implemented using a kernel that transforms the input data space into the required form. SVM employs a technique called the kernel trick to transform a low-dimensional input space into a higher-dimensional one. Simply put, the kernel turns non-separable problems into separable problems by adding extra dimensions, making SVM more powerful, adaptable, and accurate. SVM employs a variety of kernel types, including the following:

Linear Kernel

It may be used as the dot product of any two observations. The formula for a linear kernel is:

K(x, x_i) = \mathrm{sum}(x \cdot x_i)    (7)

The product between two vectors x and x_i is computed as the sum of the products of each pair of input values, as shown in the formula above.

Polynomial Kernel

It is a more versatile generalization of the linear kernel that can distinguish curved or nonlinear input spaces. The formula for the polynomial kernel is as follows:

K(x, x_i) = \left(1 + \mathrm{sum}(x \cdot x_i)\right)^d    (8)

The degree of the polynomial, denoted by the letter d in this case, must be explicitly stated in the learning algorithm.

Radial Basis Function (RBF) Kernel

The RBF kernel, which is frequently used in SVM classification, maps the input space into an infinite-dimensional space. It is explained mathematically by the following formula:

K(x, x_i) = \exp\left(-\gamma \cdot \mathrm{sum}\left((x - x_i)^2\right)\right)    (9)

In this instance, gamma ranges between 0 and 1 and must be specified manually in the learning algorithm; a good default value for gamma is 0.1. Using kernels, SVM may be implemented in Python for data that cannot be separated linearly.
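The following sketch (illustrative parameters, not the study's exact settings) fits an SVC with each of the three kernels of Eqs. 7, 8 and 9:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for kernel in ("linear", "poly", "rbf"):  # Eqs. 7-9
    # degree only affects 'poly'; gamma affects 'poly' and 'rbf'
    clf = SVC(kernel=kernel, degree=3, gamma=0.1)
    clf.fit(X_train, y_train)
    print(kernel, round(clf.score(X_test, y_test), 3))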

3.4.3 Decision Tree

Predictive modeling methods like decision tree analysis may be applied in various situations. Decision trees are generated by an algorithm that splits the data in various ways depending on different conditions. Decision trees are among the most effective supervised algorithms and may be used for both classification and regression problems. The two main parts of a tree are its decision nodes, where the data is split, and its leaves, where the outcomes are located. There are two kinds of decision trees:

Decision trees for classification: the decision variable is categorical.
Decision trees for regression: the decision variable is continuous.

A decision tree is implemented using the following two components:

Gini Index

The Gini index is the name of the cost function used to evaluate binary splits of the dataset on the categorical target variable "Success" or "Failure". The lower the Gini index value, the higher the degree of homogeneity. The worst Gini index value is 0.5, while the ideal value is 0 (for a 2-class problem). The following steps can be used to compute a split's Gini index:

1. Using the formula p^2 + q^2, get the Gini score for each sub-node as the sum of the squares of the probabilities of success and failure.
2. Next, use the weighted Gini score of each node of the split to obtain the Gini index for that split.

Binary splits are produced by the Classification and Regression Tree (CART) algorithm using the Gini technique.
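A small sketch of the computation just described (the groups and labels are hypothetical; this follows the standard CART recipe rather than any code from the chapter):

def gini_index(groups, classes):
    # weighted sum over groups of 1 - (p^2 + q^2 + ...)
    n_total = sum(len(g) for g in groups)
    gini = 0.0
    for group in groups:
        if not group:
            continue
        labels = [row[-1] for row in group]
        score = sum((labels.count(c) / len(group)) ** 2 for c in classes)
        gini += (1.0 - score) * (len(group) / n_total)
    return gini

# a pure split scores 0.0; a 50/50 split scores 0.5 for two classes
print(gini_index([[[1, 0], [1, 0]], [[1, 1], [1, 1]]], [0, 1]))  # 0.0
print(gini_index([[[1, 0], [1, 1]], [[1, 0], [1, 1]]], [0, 1]))  # 0.5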

Split Creation

A split comprises an attribute from the dataset and a value. Creating a split can be divided into the three parts listed below:

1. Calculating the Gini score, as discussed above.
2. Splitting the dataset: this divides the dataset into two lists of rows based on the split value and the index of the chosen attribute. Once the right and left groups have been extracted from the dataset, the split can be scored using the Gini score calculated in the first part; the attribute's split value determines which group each row belongs to.
3. Evaluating all splits: after computing the Gini score and splitting the dataset, all candidate splits are evaluated. To do this, each value of each attribute is checked as a possible split, the cost of each split is analyzed, and the optimal split is selected. The best split becomes a node in the decision tree.

3.4.4 Building a Tree

A tree, as we are all aware, has both root and terminal nodes. We may create the tree by doing two things after building the root node:

Terminal Node Creation

When designing decision tree terminal nodes, it is essential to decide when to stop growing the tree rather than add additional nodes. The two conditions listed below, maximum tree depth and minimum node records, can be applied to accomplish this:

1. Maximum tree depth: as the name denotes, this is the maximum number of nodes a tree can have below the root node. We must stop adding new nodes once the tree has reached its maximum depth.
2. Minimum node records: this may be characterized as the minimum number of training patterns that a given node is responsible for. We must stop adding terminal nodes once the tree reaches these minimum node records or falls below this minimum.

The final prediction is made using the terminal nodes.

Recursive Splitting We can start building our tree now that we are clear on when to create terminal nodes. Recursive splitting is a method of making a tree. This method allows us to generate child nodes (nodes that are attached to an existing node) on each group of data produced by splitting the dataset by using the same function that was used to build the original node.

3.4.5 Naïve Bayes

Naive Bayes algorithms are a classification technique that relies on the Bayes theorem and the strong assumption that all predictors are independent of one another. The basic idea is that a feature's presence in a class is unconnected to the presence of any other feature in the class. Bayesian classification determines the likelihood of a label given a set of observed features by finding the posterior probability P(L | features). The Bayes theorem may be used to quantify this as follows:

P(L \mid \text{features}) = \frac{P(L) \, P(\text{features} \mid L)}{P(\text{features})}    (10)

where
P(L | features) = posterior probability of the class
P(L) = prior probability of the class
P(features | L) = likelihood of the predictors given the class
P(features) = prior probability of the predictors
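A toy numeric check of Eq. 10 (the probabilities below are invented purely for illustration):

# assumed example values: P(L) = 0.3, P(features | L) = 0.8, P(features) = 0.6
p_L, p_feat_given_L, p_feat = 0.3, 0.8, 0.6
posterior = p_L * p_feat_given_L / p_feat
print(posterior)  # 0.4 = P(L | features)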

3.4.6 Random Forest

The Random Forest supervised learning method can be used for both classification and regression, although it is mostly used to address classification problems. Trees form a forest, and the more trees there are, the more robust the forest is. Similarly, the random forest technique builds decision trees on samples of the data, obtains a prediction from each one, and then takes a vote to choose the best response. Since the ensemble technique averages the results, it reduces over-fitting and is thus superior to a single decision tree.


Working of the Random Forest Algorithm

The following steps help us understand how the Random Forest algorithm works:

1. Pick random samples from the existing dataset.
2. Construct a decision tree for each sample and obtain the predicted result from each decision tree.
3. Conduct a vote over the predicted results.
4. Select the prediction that garnered the most votes as the final forecast.

Figure 4 illustrates how it works.

Fig. 4 Random forest operation [51]

3.4.7 K-Nearest Neighbor (KNN)

K-Nearest Neighbor, which employs the supervised learning methodology, is one of the most fundamental machine learning algorithms. The K-NN technique places a new instance in the category most similar to the existing categories, presuming that the new case and the old instances are comparable. After storing all of the previous data, a new data point is categorized using the K-NN algorithm based on similarity. KNN works by finding the data points that are nearest to the new point entered into the machine, ordering the closest points by their distance from the new point. Different methods are employed to estimate this distance, but Euclidean distance is the one specialists most frequently utilize. In the next step, a specific number k of the closest points is selected and grouped by category; k is usually picked as an odd number, for example when there are two classes, to avoid ties. The new data point is then assigned to the category containing the greatest number of its nearest neighbors [52]. The KNN technique can handle enormous data sets and is fairly easy to construct, but it has a high computational cost, since it must calculate the distance to each training sample, and K must always be chosen, which can make the algorithm more complex [53].
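A minimal KNN sketch (k and the seed are illustrative assumptions); k is odd, as suggested above for a two-class problem:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))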

3.4.8 Adaptive Boosting

AdaBoost, short for "Adaptive Boosting," is a classification technique that combines weak predictors or learners into a powerful predictor. The statistical classification meta-algorithm known as AdaBoost, or Adaptive Boosting, was developed by Yoav Freund and Robert Schapire in 1995. The idea of boosting in machine learning emerged from the question of whether a collection of weak classifiers could be combined into a strong classifier. A weak classifier or weak learner is a learner whose performance is only slightly better than random guessing; many such weak classifiers are employed, each of which is only marginally better than random. The most common weak classifier is a simple threshold on a single feature: if the feature is greater than the threshold, the instance falls into the positive category; otherwise, it falls into the negative category. The final classification may be expressed as follows:

F(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \theta_m f_m(x)\right)    (11)

In this instance, f_m stands for the mth weak classifier and θ_m stands for its associated weight.
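A minimal AdaBoost sketch (settings are illustrative): scikit-learn's default weak learner is a one-level decision tree, i.e. exactly the single-feature threshold described above.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# the default base estimator is a depth-1 decision stump
ada = AdaBoostClassifier(n_estimators=50, learning_rate=1.0, random_state=0)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))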

3.4.9 Extreme Gradient Boosting

The acronym XGBoost stands for Extreme Gradient Boosting. XGBoost, a distributed gradient boosting library, was developed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework and offers parallel tree boosting to handle a variety of data science tasks efficiently.
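A small sketch using the xgboost library's scikit-learn wrapper (the hyperparameter values are illustrative, not the authors' settings):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

xgb = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
xgb.fit(X_train, y_train)
print("Test accuracy:", xgb.score(X_test, y_test))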

3.5 Performance Evaluation Metrics

We may assess the effectiveness of ML algorithms using a variety of metrics. We must be very careful when selecting the criteria used to evaluate ML effectiveness, because the metric chosen determines exactly how the effectiveness of ML algorithms is measured and compared, and it greatly influences how the respective weights of the numerous contributing factors are balanced in the outcome. The evaluation metrics for the proposed model's prediction performance are listed below:

3.5.1 Confusion Matrix

When the output may contain two or more distinct classes, the confusion matrix is the easiest way to assess how well a classification model is functioning. A confusion matrix is simply a table with two dimensions, Actual and Predicted, containing True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN), as illustrated in Fig. 5:

(i) True Positives (TP): both the predicted value and the actual value are positive.
(ii) True Negatives (TN): both the predicted value and the actual value are negative.
(iii) False Positives (FP): the prediction is positive but the actual outcome is negative; this is also called a Type 1 error.
(iv) False Negatives (FN): the prediction is negative but the actual outcome is positive; this is also called a Type 2 error.

A strong model has high TP and TN rates and low FP and FN rates. When working with an unbalanced dataset, the confusion matrix is often a superior assessment criterion for a machine learning model. Confusion matrices are frequently utilized because classification accuracy alone provides a less complete picture of a model's performance; accuracy studies, for instance, do not reveal how many instances were incorrectly labeled. Using the confusion_matrix function of sklearn.metrics, the confusion matrix of our classification model can be calculated.

Fig. 5 Confusion Matrix (Ting, 2017)
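A tiny sketch of that call (the label vectors are invented for illustration); for binary labels, ravel() unpacks the 2 × 2 matrix as TN, FP, FN, TP:

from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2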

3.5.2 Classification Report

It is essentially a longer version of the confusion matrix. There are metrics available besides the confusion matrix that can aid in a more thorough comprehension and evaluation of our model’s functionality.

Accuracy

Its use as a performance indicator for classification algorithms is very widespread. It may be described as the proportion of accurate predictions out of all predictions made. It is simple to compute from the confusion matrix using the following formula:

\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}    (12)

Precision

It serves as a gauge of how accurate the positive predictions are, providing the ratio of correct positive predictions to all positive predictions. It is calculated by dividing the number of correctly identified positive instances by the total number of instances predicted as positive, i.e., the proportion of all predicted positive classifications that we correctly predicted. Precision should be high (ideally 1):

\text{Precision} = \frac{TP}{TP + FP}    (13)

Precision is an important statistic when the danger of False Positives is greater than the risk of False Negatives.

Recall or Sensitivity

It counts the observations in the positive class that are correctly predicted to be positive; it is also called sensitivity. Recall is a useful evaluation statistic when we want to capture as many positives as possible. It is defined as the number of correctly classified positive instances divided by the number of all positive instances, i.e., how many of all the positive cases we correctly predicted. Recall should be quite high, ideally 1:

\text{Recall} = \frac{TP}{TP + FN}    (14)

Recall is a valuable indicator in medical circumstances, where it matters little if we issue a false alarm but the true positive instances must not be overlooked, i.e., a False Negative is worse than a False Positive. Since we don't want to mistakenly discharge an infected individual and allow them to mix with the healthy population, recall is the preferable metric in such cases.

Specificity

Specificity measures the proportion of actual negatives that our ML model correctly identifies. It is simple to compute from the confusion matrix using the formula in Eq. 15:

\text{Specificity} = \frac{TN}{TN + FP}    (15)

F1 Score (F-measure)

The F1 score, a number between 0 and 1, represents the harmonic mean of precision and recall. In contrast to an ordinary average, it is not susceptible to exceptionally high values. The F1 score attempts to balance the classifier's precision and recall; because it is sometimes difficult to decide whether precision or recall is more important, we combine the two. In practice, when we improve the precision of a model, the recall declines, and vice versa. The F1 score captures both trends in a single number:

F1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}    (16)

The harmonic mean penalizes extreme values more severely than the arithmetic mean does. The F1 score should be high (ideally 1).
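The sketch below (with the same invented label vectors as before) computes Eqs. 12, 13, 14 and 16 with scikit-learn:

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, classification_report)

y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print("accuracy :", accuracy_score(y_true, y_pred))   # Eq. 12
print("precision:", precision_score(y_true, y_pred))  # Eq. 13
print("recall   :", recall_score(y_true, y_pred))     # Eq. 14
print("F1 score :", f1_score(y_true, y_pred))         # Eq. 16
print(classification_report(y_true, y_pred))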

Area Under ROC Curve (AUC)

AUC-ROC (Area Under Curve – Receiver Operating Characteristic) is a performance metric used to evaluate classification problems across varying threshold values. ROC, as its name suggests, is a probability curve, whereas AUC assesses separability. Simply put, the AUC-ROC metric shows how well the model can distinguish between different classes; the model improves with increasing AUC. The curve is generated by plotting TPR (True Positive Rate), also known as sensitivity or recall, against FPR (False Positive Rate), also known as 1 − specificity, at various threshold levels. The ROC and AUC are shown in Fig. 6, with TPR on the y-axis and FPR on the x-axis.

Fig. 6 AUC (ROC Curve & AUC Explained with Python Examples – Data Analytics [54])

3.5.3 Logarithmic Loss (LOGLOSS)

Log loss is also known as cross-entropy loss or logistic regression loss. It assesses how effectively a classification model works when its output is a probability between 0 and 1, and it is based on probability calculations. Whereas accuracy simply counts how many of the model's predictions equal the actual values, log loss measures the degree of uncertainty of a forecast by how much the predicted probability deviates from the true label; because it can be differentiated precisely, it is also convenient to work with. The log loss value therefore gives a more accurate view of the model's performance.

3.6 Metrics Use Case

1. Accuracy is employed when the True Positives and True Negatives are most crucial. Accuracy is a better metric for balanced data.
2. Use Precision when the False Positives are much more crucial.
3. Use Recall when the False Negatives are significantly more crucial.
4. The F1-score is employed when both the False Positives and False Negatives are significant. The F1-score is a better indicator for unbalanced data.

3.7 Explainable Artificial Intelligence (XAI)

LIME is utilized in this study to explain predictions locally. LIME (Local Interpretable Model-Agnostic Explanations) is currently the most widely used XAI method. LIME is a post hoc method, i.e., one that produces an explanation after a decision has been made. LIME has the benefit of being model-agnostic [55]: as the name implies, it does not matter what kind of model it is applied to. With the LIME approach, the model input is perturbed or slightly modified to see how the output changes. This reveals how the model arrived at its judgments and enables us to understand which inputs have the most impact on the result.
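A minimal sketch of this workflow with the lime package (the model, data, and parameter choices are assumptions made for illustration, not the study's exact code):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
rf = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)
exp = explainer.explain_instance(data.data[0], rf.predict_proba, num_features=5)
print(exp.as_list())  # (feature condition, weight) pairs; the sign gives the direction of impact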

4 Result and Discussion

4.1 Experimental Setup

The computer in use runs Windows 10, a 64-bit operating system with an x64-based processor, equipped with Python 3. The system's processor is an Intel(R) Core(TM) i3-3110M CPU @ 2.40 GHz with 8.00 GB of RAM. All of the code used in this study was written in the Jupyter notebook environment, and all relevant Python libraries were installed using pip install and correctly imported.

4.2 Explainable Result

Of all the classifiers used in the proposed work, the Random Forest classifier has the best performance metrics, evidenced by 98% accuracy. The explainability analysis in this study therefore applied LIME to the Random Forest classifier. We examine individual predictions to see whether the model uses the relevant characteristics and functions as planned. As shown in Fig. 7a, the actual value of 0 denotes the absence of breast cancer, and the predicted value of 0 is given together with its prediction probability. This prediction is explained by the features and feature values shown in Fig. 7a: features pulling in the left direction have a strongly negative impact, since their values are relatively low, while features pulling in the right direction have a strongly positive impact, with higher values supporting the prediction. Since concavity_worst is the only feature pulling in the right direction, and it is not even the most important feature according to the model, the model concludes that there is no breast cancer.

Fig. 7a No breast cancer prediction

Fig. 7b Presence of breast cancer prediction

The prediction in Fig. 7b gave an actual value of 1, meaning the presence of breast cancer, with a predicted value of 1 as the prediction probability. All the features pull toward the right, showing a strongly positive impact, with radius_worst as the most important feature and concavity_worst as the least important feature.

4.3 Experimental Results

The confusion matrix for each model and the performance metrics are presented in this section.


4.3.1 SVM

The confusion matrix for SVM without scaling is shown in Fig. 8a, with a True Positive of 85, True Negative of 54, False Positive of 2, and False Negative of 2, while Fig. 8b shows the confusion matrix for SVM with Min-Max scaling, with a True Positive of 83, True Negative of 51, False Positive of 4, and False Negative of 5.

Fig. 8 SVM Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling


Table 1 SVM model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9720          | 0.9371  | 0.9371 | 0.9720 | 0.9371
Precision              | 0.9642          | 0.9273  | 0.9123 | 0.9643 | 0.9123
Recall                 | 0.9642          | 0.9107  | 0.9286 | 0.9643 | 0.9286
F1-score               | 0.9643          | 0.919   | 0.9204 | 0.9643 | 0.9204
ROC                    | 0.9706          | 0.9324  | 0.9356 | 0.9706 | 0.9356
False positive         | 2               | 4       | 5      | 2      | 5
False negative         | 2               | 5       | 4      | 2      | 4

The confusion matrix for SVM with L1 scaling is shown in Fig. 8c, with True Positive values of 82, True Negative values of 52, False Positive values of 5, and False Negative values of 4. Figure 8d depicts the confusion matrix for SVM with L2 scaling, with a True Positive of 85, True Negative of 54, False Positive of 2, and False Negative of 2, while the confusion matrix for SVM with Z-score scaling is represented in Fig. 8e, with a True Positive of 82, True Negative of 52, False Positive of 5, and False Negative of 4. As Table 1 shows, SVM performed best with L2 scaling, with a recall of 0.96 and a false negative count of 2, although there is no significant difference between the accuracy of the model with L2 scaling and without scaling.
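As a hedged illustration of how such TP/TN/FP/FN counts are typically read off with scikit-learn (the toy labels below are stand-ins, not this study's test split):

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for a held-out test split. With the label order
# fixed to (0, 1), ravel() yields TN, FP, FN, TP in that order.
y_test = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}  TN={tn}  FP={fp}  FN={fn}")
```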

Random Forest (RF)

The confusion matrix for Random Forest is shown in Fig. 9a, with True Positive values of 86, True Negative values of 52, False Positive values of 1, and False Negative values of 4. Figure 9b shows the confusion matrix for Random Forest with Min-Max scaling, with a True Positive of 86, True Negative of 52, False Positive of 1, and False Negative of 4. The confusion matrix for Random Forest with L1 scaling is shown in Fig. 9c, with True Positive values of 85, True Negative values of 54, False Positive values of 2, and False Negative values of 2, while the confusion matrix with L2 scaling is shown in Fig. 9d, with a True Positive of 86, True Negative of 55, False Positive of 1, and False Negative of 1, and the confusion matrix with Z-score scaling is shown in Fig. 9e, with a True Positive of 85, True Negative of 55, False Positive of 2, and False Negative of 1. Random Forest, on the other hand, produced an improved model on scaling with Z-score and L2, both having a recall of 0.9821 and a false negative count of 1. However, judging by the accuracy presented in Table 2, L2 is the better choice of scaling for Random Forest.
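A minimal sketch of such a scaling comparison is given below. It uses scikit-learn's copy of the Wisconsin dataset and default Random Forest settings, so the exact numbers will not reproduce Table 2; it only illustrates the procedure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

scalers = {
    "no scaling": None,
    "Min-Max": MinMaxScaler(),
    "L1": Normalizer(norm="l1"),
    "L2": Normalizer(norm="l2"),
    "Z-score": StandardScaler(),
}

for name, scaler in scalers.items():
    if scaler is None:
        tr, te = X_tr, X_te
    else:
        tr, te = scaler.fit_transform(X_tr), scaler.transform(X_te)
    clf = RandomForestClassifier(random_state=0).fit(tr, y_tr)
    # In scikit-learn's copy of the dataset, class 0 is malignant, so recall
    # for the cancer class is computed with pos_label=0.
    print(name, "recall:", round(recall_score(y_te, clf.predict(te), pos_label=0), 4))
```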


Fig. 9 RF Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling

4.3.2 Naive Bayes (NB)

Figure 10a displays the confusion matrix for Naive Bayes without scaling, with True Positive values of 84, True Negative values of 51, False Positive values of 3, and False Negative values of 5.


Table 2 RF model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9650          | 0.9650  | 0.9720 | 0.9860 | 0.9790
Precision              | 0.9811          | 0.9811  | 0.9643 | 0.9821 | 0.9649
Recall                 | 0.9286          | 0.9286  | 0.9643 | 0.9821 | 0.9821
F1-score               | 0.9541          | 0.9541  | 0.9643 | 0.9821 | 0.9735
ROC                    | 0.9585          | 0.9585  | 0.9706 | 0.9853 | 0.9796
False positive         | 1               | 1       | 2      | 1      | 2
False negative         | 4               | 4       | 2      | 1      | 1

The confusion matrix for Naive Bayes with Min-Max scaling is shown in Fig. 10b, with True Positive values of 84, True Negative values of 51, False Positive values of 3, and False Negative values of 5, and the confusion matrix for Naive Bayes with L1 scaling is shown in Fig. 10c, with values of 69 True Positive, 54 True Negative, 18 False Positive, and 2 False Negative. The confusion matrix for Naive Bayes with L2 scaling is shown in Fig. 10d, with a True Positive of 65, True Negative of 54, False Positive of 22, and False Negative of 2, and the confusion matrix with Z-score scaling is shown in Fig. 10e, with True Positive values of 69, True Negative values of 54, False Positive values of 18, and False Negative values of 2. As Table 3 shows, Naïve Bayes performs best without scaling and behaves the same on scaling with Min-Max. Since recall is of utmost priority for this research, however, the model with L1 and Z-score scaling, with a recall of 0.96 and a false negative count of 2, is considered the best Naïve Bayes model.

4.3.3 Logistic Regression (LR)

The confusion matrix for Logistic Regression without scaling is shown in Fig. 11a, with True Positive, True Negative, False Positive, and False Negative values of 81, 49, 6, and 7, respectively, while the confusion matrix for Logistic Regression with Min-Max scaling is shown in Fig. 11b, with True Positive values of 87, True Negative values of 51, False Positive values of 0, and False Negative values of 5. Figure 11c shows the confusion matrix for Logistic Regression with L1 scaling, with a True Positive of 87, True Negative of 8, False Positive of 0, and False Negative of 48, while the confusion matrix with L2 scaling is shown in Fig. 11d, with True Positive values of 87, True Negative values of 21, False Positive values of 0, and False Negative values of 35, and the confusion matrix with Z-score scaling is shown in Fig. 11e, with True Positive values of 87, True Negative values of 8, False Positive values of 0, and False Negative values of 48. Only Min-Max scaling improves on Logistic Regression without scaling; Min-Max has the best recall and the fewest false negatives, as shown in Table 4.

Fig. 10 NB confusion matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling

Table 3 NB model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9441          | 0.9441  | 0.8601 | 0.8322 | 0.8601
Precision              | 0.9444          | 0.9444  | 0.7500 | 0.7105 | 0.7500
Recall                 | 0.9107          | 0.9107  | 0.9643 | 0.9643 | 0.9643
F1-score               | 0.9273          | 0.9273  | 0.8437 | 0.8182 | 0.8437
ROC                    | 0.9381          | 0.9381  | 0.8787 | 0.8557 | 0.8787
False positive         | 3               | 3       | 18     | 22     | 18
False negative         | 5               | 5       | 2      | 2      | 2


Fig. 11 LR Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling

Table 4 LR model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9091          | 0.9650  | 0.6643 | 0.7552 | 0.6643
Precision              | 0.8909          | 1.00    | 1.00   | 1.00   | 1.00
Recall                 | 0.8750          | 0.9107  | 0.1429 | 0.3750 | 0.1429
F1-score               | 0.8829          | 0.9532  | 0.2500 | 0.5455 | 0.2500
ROC                    | 0.9030          | 0.9554  | 0.5714 | 0.6875 | 0.5714
False positive         | 6               | 0       | 0      | 0      | 0
False negative         | 7               | 5       | 48     | 35     | 48


4.3.4 KNN

The confusion matrix for KNN without scaling is shown in Fig. 12a, with True Positive values of 84, True Negative values of 55, False Positive values of 3, and False Negative values of 1, and the confusion matrix for KNN with Min-Max scaling is depicted in Fig. 12b, with a True Positive of 86, True Negative of 51, False Positive of 1, and False Negative of 5.

Fig. 12 KNN Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling


Table 5 KNN model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9720          | 0.9580  | 0.9510 | 0.9491 | 0.9510
Precision              | 0.9483          | 0.9808  | 0.9623 | 0.9444 | 0.9623
Recall                 | 0.9821          | 0.9107  | 0.9107 | 0.9107 | 0.9107
F1-score               | 0.9649          | 0.9444  | 0.9358 | 0.9273 | 0.9358
ROC                    | 0.9738          | 0.9496  | 0.9439 | 0.9381 | 0.9439
False positive         | 3               | 1       | 2      | 3      | 2
False negative         | 1               | 5       | 5      | 5      | 5

The confusion matrix for KNN with L1 scaling is shown in Fig. 12c, with a True Positive of 85, True Negative of 51, False Positive of 2, and False Negative of 5. The confusion matrix for KNN with L2 scaling is shown in Fig. 12d, with True Positive values of 84, True Negative values of 51, False Positive values of 3, and False Negative values of 5, and the confusion matrix with Z-score scaling is represented in Fig. 12e, with a True Positive of 85, True Negative of 51, False Positive of 2, and False Negative of 5. As indicated in Table 5, KNN requires no scaling for its best performance: without scaling, the recall is 0.98 with a false negative count of 1.

4.3.5 Decision Tree (DT)

The confusion matrix for the Decision Tree without scaling is shown in Fig. 13a, with True Positive values of 84, True Negative values of 51, False Positive values of 3, and False Negative values of 5, and the confusion matrix for the Decision Tree with Min-Max scaling is shown in Fig. 13b, with True Positive values of 84, True Negative values of 50, False Positive values of 3, and False Negative values of 6. The confusion matrix with L1 scaling is shown in Fig. 13c, with True Positive values of 79, True Negative values of 54, False Positive values of 8, and False Negative values of 4, while Fig. 13d illustrates the confusion matrix with L2 scaling, with True Positive values of 80, True Negative values of 52, False Positive values of 7, and False Negative values of 4, and the confusion matrix with Z-score scaling is shown in Fig. 13e, with True Positive values of 83, True Negative values of 52, False Positive values of 4, and False Negative values of 4. From Table 6, L1, L2, and Z-score scaling all achieve a recall of 0.93 and produce a false negative count of 4. However, Z-score scaling gives higher accuracy and fewer false positives and is therefore considered the best scaling method for the Decision Tree.


Fig. 13 DT Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling


Table 6 DT model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9441          | 0.9370  | 0.9161 | 0.9231 | 0.9441
Precision              | 0.9444          | 0.9434  | 0.8667 | 0.8814 | 0.9286
Recall                 | 0.9107          | 0.8929  | 0.9286 | 0.9286 | 0.9286
F1-score               | 0.9273          | 0.9174  | 0.8966 | 0.9043 | 0.9286
ROC                    | 0.9381          | 0.9292  | 0.9183 | 0.9241 | 0.9413
False positive         | 3               | 3       | 8      | 7      | 4
False negative         | 5               | 6       | 4      | 4      | 4

4.3.6 Adaboost

The confusion matrices for Adaboost without scaling and with Min-Max, L1, and Z-score scaling have identical values, as shown in Fig. 14a-c, e, with a True Positive of 87, True Negative of 53, False Positive of 0, and False Negative of 3, while Fig. 14d shows the confusion matrix for Adaboost with L2 scaling, with a True Positive of 87, True Negative of 54, False Positive of 0, and False Negative of 2. Only L2 scaling had any effect on Adaboost. Table 7 shows that L2 achieves a recall of 0.96 and a false negative count of 2 on Adaboost, making it the best scaling for this model.

4.3.7 XGBoost

The confusion matrices for XGBoost without scaling and with Min-Max scaling are shown in Fig. 15a, b, respectively, with True Positive values of 85, True Negative values of 53, False Positive values of 2, and False Negative values of 3, while Fig. 15c, e show the confusion matrices for XGBoost with L1 scaling and with Z-score scaling, respectively, with a True Positive of 86, True Negative of 53, False Positive of 1, and False Negative of 3, and the confusion matrix for XGBoost with L2 scaling is shown in Fig. 15d, with a True Positive of 84, True Negative of 53, False Positive of 3, and False Negative of 3. Table 8 shows that there is no significant difference between the recall of XGBoost with and without scaling, and all variants produce a false negative count of 3; however, L1 and Z-score, both with a higher accuracy of 0.97 and a lower false positive count of 1, are considered the better scaling options for XGBoost.


Fig. 14 Adaboost Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling

Table 7 Adaboost model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9790          | 0.9790  | 0.9790 | 0.9860 | 0.9790
Precision              | 1.00            | 1.00    | 1.00   | 1.00   | 1.00
Recall                 | 0.9464          | 0.9464  | 0.9464 | 0.9643 | 0.9464
F1-score               | 0.9725          | 0.9725  | 0.9725 | 0.9818 | 0.9725
ROC                    | 0.9732          | 0.9732  | 0.9732 | 0.9821 | 0.9732
False positive         | 0               | 0       | 0      | 0      | 0
False negative         | 3               | 3       | 3      | 2      | 3


Fig. 15 XGBoost Confusion Matrix (a) without scaling; (b) with Min-Max scaling; (c) with L1, (d) with L2 scaling, and (e) with Z-score scaling

4.3.8 Comparative Analysis

It is misleading to compare this study directly with other studies, as most concentrate only on the accuracy of the model, which is not the best metric for this type of classification. Nevertheless, Table 9 compares the study by Mangukiya et al. [33] with our proposed model.


Table 8 XGBoost model metrics summary

Performance parameters | Without scaling | Min-Max | L1     | L2     | Z-score
Accuracy               | 0.9650          | 0.9650  | 0.9720 | 0.9580 | 0.9720
Precision              | 0.9636          | 0.9636  | 0.9815 | 0.9464 | 0.9815
Recall                 | 0.9464          | 0.9464  | 0.9464 | 0.9464 | 0.9464
F1-score               | 0.9550          | 0.9550  | 0.9636 | 0.9464 | 0.9636
ROC                    | 0.9617          | 0.9617  | 0.9675 | 0.9560 | 0.9675
False positive         | 2               | 2       | 1      | 3      | 1
False negative         | 3               | 3       | 3      | 3     | 3

Table 9 Our model compared with others

Techniques    | Mangukiya model, without scaling (%) | Proposed model, without scaling (%) | Mangukiya model, with scaling (%) | Proposed model, with scaling (%)
SVM           | 57.89 | 97.20 | 96.49 | 97.20
KNN           | 93.85 | 97.20 | 57.89 | 95.80
Random forest | 97.36 | 96.50 | 75.43 | 98.60
Decision tree | 94.73 | 94.41 | 75.43 | 94.41
Naïve Bayes   | 94.73 | 94.41 | 93.85 | 94.41
Adaboost      | 94.73 | 97.90 | 94.73 | 98.60
XGboost       | 98.24 | 96.50 | 98.24 | 97.20

Table 10 Comparison between our models

Model               | Best scaling                 | Recall | False negative
SVM                 | L2                           | 0.96   | 2
Random forest       | L2                           | 0.98   | 1
Naïve Bayes         | L1/Z-score                   | 0.96   | 2
Logistic regression | Min-Max                      | 0.91   | 5
KNN                 | Performs best without scaling | 0.98   | 1
Decision tree       | Z-score                      | 0.93   | 4
Adaboost            | L2                           | 0.96   | 2
XGBoost             | L1/Z-score                   | 0.95   | 3

Table 9 shows that our model performed better, with an accuracy above 94% in every case, because we employ alternative scaling approaches. From Table 10, Random Forest with L2 scaling and KNN without scaling give the best recall of 0.98 each. However, from Table 2, Random Forest with L2 scaling produces a False Positive count of 1, while from Table 5, KNN without scaling produces a False Positive count of 3. Keeping both False Positives and False Negatives at a minimum, Random Forest with L2 scaling is the best classifier among our models.


5 Conclusion and Future Work

It is evident from the results that certain methods do not benefit from scaling, and it would be false to accept the notion that feature scaling is always necessary for greater accuracy. Additionally, for the algorithms whose models were enhanced by scaling, it is crucial to pick the proper metrics to assess the effectiveness of the model in light of the problem being solved. Since we want our model to detect the patients at high risk of breast cancer as reliably as possible, recall was employed as the decisive criterion. Recall is a valuable statistic in medical environments, where False Negatives are especially costly: both False Positive and False Negative instances need to be reported and handled, but False Negative results might stall therapy and offer affected women a false sense of security. Future studies should use feature selection to determine the optimum scaling approach for each algorithm. Additionally, it is advisable to aim for an enhanced recall of at least 0.98 and an accuracy of at least 98.60%, with a potential for zero False Negatives.

References

1. Awotunde, J. B., Adeniyi, E. A., Ajamu, G. J., Balogun, G. B., & Taofeek-Ibrahim, F. A. (2022). Explainable artificial intelligence in genomic sequence for healthcare systems prediction. In Connected e-health (pp. 417–437). Springer.
2. Ogundokun, R. O., Maskeliūnas, R., & Damaševičius, R. (2022). Human posture detection using image augmentation and hyperparameter-optimized transfer learning algorithms. Applied Sciences, 12(19), 10156.
3. Pärtel, J., Pärtel, M., & Wäldchen, J. (2021). Plant image identification application demonstrates high accuracy in Northern Europe. AoB Plants, 13(4). https://doi.org/10.1093/AOBPLA/PLAB050
4. Bhowmik, A., & Eskreis-Winkler, S. (2022). Deep learning in breast imaging. BJR|Open, 4, 20210060.
5. Hosny, A., Parmar, C., Quackenbush, J., Schwartz, L. H., & Aerts, H. J. W. L. (2018). Artificial intelligence in radiology. Nature Reviews Cancer, 18(8), 500–510. https://doi.org/10.1038/S41568-018-0016-5
6. Taylor-Phillips, S., Seedat, F., Kijauskaite, G., Marshall, J., Halligan, S., Hyde, C., Given-Wilson, R., Wilkinson, L., Denniston, A. K., Glocker, B., Garrett, P., Mackie, A., & Steele, R. J. (2022). UK National Screening Committee's approach to reviewing evidence on artificial intelligence in breast cancer screening. The Lancet Digital Health, 4(7), e558–e565. https://doi.org/10.1016/S2589-7500(22)00088-7
7. Topol, E. J. (2019). High-performance medicine: The convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56. https://doi.org/10.1038/S41591-018-0300-7
8. Yin, X., Chen, Y., Ruze, R., Xu, R., Song, J., Wang, C., & Xu, Q. (2022). The evolving view of thermogenic fat and its implications in cancer and metabolic diseases. Signal Transduction and Targeted Therapy, 7(1), 324. https://doi.org/10.1038/s41392-022-01178-6
9. Siegel, R. L., Miller, K. D., Fuchs, H. E., & Jemal, A. (2022). Cancer statistics, 2022. CA: A Cancer Journal for Clinicians, 72(1), 7–33. https://doi.org/10.3322/CAAC.21708
10. Ogundokun, R. O., Misra, S., Douglas, M., Damaševičius, R., & Maskeliūnas, R. (2022). Medical Internet-of-Things based breast cancer diagnosis using hyperparameter-optimized neural networks. Future Internet, 14(5), 153. https://doi.org/10.3390/fi14050153


11. Sung, H., Ferlay, J., Siegel, R. L., Laversanne, M., Soerjomataram, I., Jemal, A., & Bray, F. (2021). Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians, 71(3), 209–249. https://doi.org/10.3322/CAAC.21660
12. Riley, D., Charlton, M., Chrischilles, E. A., Lizarraga, I. M., Phadke, S., Smith, B. J., Skibbe, A., & Lynch, C. F. (2022). Hospital rurality and gene expression profiling for early-stage breast cancer among Iowa residents (2010–2018). The Breast Journal, 9, 1–11. https://doi.org/10.1155/2022/8582894
13. Feng, Y., Spezia, M., Huang, S., Yuan, C., Zeng, Z., Zhang, L., Ji, X., Liu, W., Huang, B., Luo, W., Liu, B., Lei, Y., Du, S., Vuppalapati, A., Luu, H. H., Haydon, R. C., He, T. C., & Ren, G. (2018). Breast cancer development and progression: Risk factors, cancer stem cells, signaling pathways, genomics, and molecular pathogenesis. Genes and Diseases, 5(2), 77–106. https://doi.org/10.1016/J.GENDIS.2018.05.001
14. Abu Al-Haija, Q., & Adebanjo, A. (2020). Breast cancer diagnosis in histopathological images using ResNet-50 convolutional neural network. IEEE Xplore.
15. Marmot, M. G., Altman, D. G., Cameron, D. A., Dewar, J. A., Thompson, S. G., & Wilcox, M. (2013). The benefits and harms of breast cancer screening: An independent review. British Journal of Cancer, 108(11), 2205–2240. https://doi.org/10.1038/BJC.2013.177
16. Cardoso, F., Kyriakides, S., Ohno, S., Penault-Llorca, F., Poortmans, P., Rubio, I. T., Zackrisson, S., & Senkus, E. (2019). Early breast cancer: ESMO clinical practice guidelines for diagnosis, treatment, and follow-up. Annals of Oncology, 30(8), 1194–1220. https://doi.org/10.1093/ANNONC/MDZ173
17. McKinney, S. M., Sieniek, M., Godbole, V., Godwin, J., Antropova, N., Ashrafian, H., Back, T., Chesus, M., Corrado, G. S., Darzi, A., Etemadi, M., Garcia-Vicente, F., Gilbert, F. J., Halling-Brown, M., Hassabis, D., Jansen, S., Karthikesalingam, A., Kelly, C. J., King, D., et al. (2020). Erratum: Addendum: International evaluation of an AI system for breast cancer screening (Nature (2020) 577 7788 (89–94)). Nature, 586(7829), E19. https://doi.org/10.1038/S41586-020-2679-9
18. Duijm, L. E., Broeders, M. J., Setz-Pels, W., van Breest Smallenburg, V., van Beek, H. C., Donkers-van Rossum, A. B., et al. (2022). Effects of nonparticipation at previous screening rounds on the characteristics of screen-detected breast cancers. European Journal of Radiology, 154, 110391.
19. Adegun, A. A., Viriri, S., & Ogundokun, R. O. (2021). Deep learning approach for medical image analysis. Computational Intelligence and Neuroscience, 2021, 1–9. https://doi.org/10.1155/2021/6215281
20. Soleiman, M., & Yari, H. (2021). Approaches to breast cancer diagnosis. Khazar Journal of Science and Technology, 5, 29.
21. Brady, A. P. (2017). Error and discrepancy in radiology: Inevitable or avoidable? Insights into Imaging, 8(1), 171–182. https://doi.org/10.1007/S13244-016-0534-1
22. Artificial intelligence in mammography: Medtech innovation briefing. (2021). www.nice.org.uk/guidance/mib242
23. Mendelson, E. B. (2019). Artificial intelligence in breast imaging: Potentials and limitations. American Journal of Roentgenology, 212(2), 293–299. https://doi.org/10.2214/AJR.18.20532
24. Awotunde, J. B., Adeniyi, A. E., Ajagbe, S. A., Jimoh, R. G., & Bhoi, A. K. (2022). Swarm intelligence and evolutionary algorithms in processing healthcare data. In Connected e-health (pp. 105–124). Springer.
25. Geras, K. J., Mann, R. M., & Moy, L. (2019). Artificial intelligence for mammography and digital breast tomosynthesis: Current concepts and future perspectives. Radiology, 293(2), 246–259. https://doi.org/10.1148/RADIOL.2019182627
26. Abiodun, M. K., Misra, S., Awotunde, J. B., Adewole, S., Joshua, A., & Oluranti, J. (2021, December). Comparing the performance of various supervised machine learning techniques for early detection of breast cancer. In International conference on hybrid intelligent systems (pp. 473–482). Springer.


27. Dauda, O. I., Awotunde, J. B., AbdulRaheem, M., & Salihu, S. A. (2022). Basic issues and challenges on Explainable Artificial Intelligence (XAI) in healthcare systems. In Principles and methods of explainable artificial intelligence in healthcare (pp. 248–271).
28. Pawar, U., O'shea, D., Rea, S., & O'reilly, R. (2020). Explainable AI in healthcare.
29. Abiodun, K. M., Awotunde, J. B., Aremu, D. R., & Adeniyi, E. A. (2022). Explainable AI for fighting COVID-19 pandemic: Opportunities, challenges, and future prospects. In Computational intelligence for COVID-19 and future pandemics (pp. 315–332).
30. Lundberg, S., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. http://arxiv.org/abs/1705.07874
31. Kokkotis, C., Giarmatzis, G., Giannakou, E., Moustakidis, S., Tsatalas, T., Tsiptsios, D., Vadikolias, K., & Aggelousis, N. (2022). An explainable machine learning pipeline for stroke prediction on imbalanced data. Diagnostics, 12(10), 2392. https://doi.org/10.3390/diagnostics12102392
32. Gupta, A., Kaushik, D., Garg, M., & Verma, A. (2020). Machine learning model for breast cancer prediction. In 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics, and cloud) (I-SMAC) (pp. 472–477). https://doi.org/10.1109/I-SMAC49090.2020.9243323
33. Mangukiya, M., Vaghani, A., & Savani, M. (2022). Breast cancer detection with machine learning. International Journal for Research in Applied Science and Engineering Technology, 10(2), 141–145. https://doi.org/10.22214/ijraset.2022.40204
34. Folorunso, S. O., Awotunde, J. B., Adeniyi, E. A., Abiodun, K. M., & Ayo, F. E. (2021, November). Heart disease classification using machine learning models. In International conference on informatics and intelligent applications (pp. 35–49). Springer.
35. Omondiagbe, D. A., Veeramani, S., & Sidhu, A. S. (2019, April). Machine learning classification techniques for breast cancer diagnosis. In IOP Conference Series: Materials Science and Engineering (Vol. 495, No. 1, p. 012033). IOP Publishing.
36. Nemade, V., Pathak, S., & Dubey, A. K. (2022). A systematic literature review of breast cancer diagnosis using machine intelligence techniques. Archives of Computational Methods in Engineering, 29, 4401–4430.
37. Tiwari, M., Bharuka, R., Shah, P., & Lokare, R. (2020). Breast cancer prediction using deep learning and machine learning techniques. Available at SSRN 3558786.
38. Debelee, T. G., Gebreselasie, A., Schwenker, F., Amirian, M., & Yohannes, D. (2019). Classification of mammograms using texture and CNN based extracted features. Journal of Biomimetics, Biomaterials and Biomedical Engineering, 42, 79–97. Trans Tech Publications Ltd.
39. Al-Haija, Q. A., & Adebanjo, A. (2020). Breast cancer diagnosis in histopathological images using ResNet-50 convolutional neural network. In 2020 IEEE International IoT, Electronics and Mechatronics conference (IEMTRONICS), 1–7. https://doi.org/10.1109/IEMTRONICS51293.2020.9216455
40. Saoud, H., Ghadi, A., & Ghailani, M. (2019). A proposed approach for breast cancer diagnosis using machine learning. In Proceedings of the 4th international conference on Smart City Applications (pp. 1–5). https://doi.org/10.1145/3368756.3369089
41. Wolberg, W. H., Street, W. N., & Mangasarian, O. L. (1995). UCI machine learning repository. Breast Cancer Wisconsin (Diagnostic) Data Set.
42. Memon, M. H., Li, J. P., Haq, A. U., Memon, M. H., & Zhou, W. (2019). Breast cancer detection in the IoT health environment using modified recursive feature selection. Wireless Communications and Mobile Computing, 2019, 1–19. https://doi.org/10.1155/2019/5176705
43. Binary Logistic Regression Model of ML. (n.d.). Retrieved September 27, 2022, from https://www.tutorialspoint.com/machine_learning_with_python/machine_learning_with_python_binary_logistic_regression_model.htm
44. Logistic-curve – Sigmoid function – Wikipedia. (n.d.). Retrieved September 27, 2022, from https://en.wikipedia.org/wiki/Sigmoid_function#/media/File:Logistic-curve.svg


45. Classification: True vs. False and Positive vs. Negative | Machine Learning | Google Developers. (n.d.). Retrieved September 27, 2022, from https://developers.google.com/machine-learning/crash-course/classification/true-false-positive-negative
46. Lichman, M. (2017). UCI machine learning repository. Breast Cancer Wisconsin (Diagnostic) Data Set (2014).
47. machine-learning-1/logistic-regression-model.md at master · EyasuTew/machine-learning-1 · GitHub. (n.d.). Retrieved September 27, 2022, from https://github.com/EyasuTew/machine-learning-1/blob/master/week3/logistic-regression-model.md
48. Support Vector Machine (SVM) Algorithms under Supervised Machine Learning (Tutorial) | by Neelam Tyagi | Analytics Steps | Medium. (n.d.). Retrieved September 27, 2022, from https://medium.com/analytics-steps/support-vector-machine-svm-algorithms-under-supervised-machine-learning-tutorial-b5a385f05f89
49. Awad, M., & Khanna, R. (2015). Support vector machines for classification. In Efficient Learning Machines (pp. 39–66). https://doi.org/10.1007/978-1-4302-5990-9_3
50. Support Vector Machine — Introduction to Machine Learning Algorithms | by Rohith Gandhi | Towards Data Science. (n.d.). Retrieved September 27, 2022, from https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
51. Machine Learning Random Forest Algorithm – Javatpoint. (n.d.). Retrieved September 28, 2022, from https://www.javatpoint.com/machine-learning-random-forest-algorithm
52. Nahid, A., & Kong, Y. (2017). Involvement of machine learning for breast cancer image classification: A survey. Computational and Mathematical Methods in Medicine. https://doi.org/10.1155/2017/3781951
53. Bhise, S., Gadekar, S., Gaur, A., Bepari, S., Kale, D., & Aswale, S. (2021). Breast cancer detection using machine learning techniques. International Journal of Engineering Research & Technology (IJERT), 10, 98.
54. ROC Curve & AUC Explained with Python Examples – Data Analytics. (n.d.). Retrieved September 28, 2022, from https://vitalflux.com/roc-curve-auc-python-false-positive-true-positive-rate/
55. Molnar, C. (2022). Interpretable machine learning: A guide for making black box models explainable. International Kindle Paperwhite.

A Novel Approach of COVID-19 Estimation Using GIS and Kmeans Clustering: A Case of GEOAI

Iyyanki MuraliKrishna and Prisilla Jayanthi

1 Introduction

COVID-19 (CV-19) was never expected to be the dreadful virus that would escalate from one country to another. Enormous human loss was seen in just 2 months, and researchers all over the world are trying to find solutions for this disease. This study brings different technologies and a mathematical model to bear on Covid-19. Murugesan et al. [1], in a Covid-19 study, analyzed a dataset using the IDW interpolation model; the aim of their research was to investigate the spatial distribution of COVID-19 and predict the trend of the disease's spread using GIS software. Mollalo et al. [2] compiled a geo-database of 35 environmental, socio-economic, topographical, and demographic variables that explain the spatial variability of CV-19 incidence, implementing SLM and SEM to examine spatial dependence, and GWR and multiscale GWR models to examine local spatial non-stationarity. Ceylan [3], in a CV-19 study, implemented ARIMA models to predict the epidemiological trend of CV-19 prevalence in Italy, Spain, and France; the CV-19 datasets for their study were collected from the WHO website. In another CV-19 case study, Cássaro and Pires [4] commented on the difficulty of predicting, from the early data of the first, second, or third weeks of CV-19 cases, how the pandemic would progress; CDF and its derivative were implemented in their study. Gupta et al. [5] performed a bivariate analysis but failed to find any significant association between the numbers of infected cases in India; however, VIP through the PLS technique showed higher significance, and the study concludes that hot and dry states of India are more vulnerable to Covid-19 infection.

I. MuraliKrishna Defence Research and Development Organization, Hyderabad, India P. Jayanthi (✉) Ecole Centrale School of Engineering, Mahindra University, Hyderabad, India e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_7


Biswas and Sen [6] stated that the number of cases follows an inverse square law with distance from the epicenter, an interesting observation that probably has some association with the law of gravity in social dynamics. Kastner et al. [7] discuss NewsStand CoronaViz, through which users have access to dynamic views of the disease-related variables corresponding to the numbers of cases via a map query interface. Iban [8] discusses the concept of isolation tracking of Covid-19 and highlights its impact on the prevention and recovery phases; the study stresses the necessity of geospatial data science in epidemiological research and community resilience. In this study of Covid-19 cases, GIS thematic maps of the confirmed cases for 12 different states are implemented together with the K-means clustering algorithm. The highlight of this study is an equation derived for computing the confirmed Covid-19 cases based on the daily increase in cases. The data were collected from https://www.mohfw.gov.in/ from 12 April 2020 to 21 May 2020. The day-wise increase graphs depict how the cases in each state rise, although why and how remain unknown. Despite the lockdown period, a few states showed very rapid increases in Covid-19 cases.

2 Discussion and Results

2.1 Temporal Data Distribution

The dataset of CV-19 was obtained from https://www.mohfw.gov.in/ and analyzed using STATA software for the daily increase in the states with the highest confirmed cases, shown in the graphs of Fig. 1. The highest increase was found in Maharashtra, with over 44,582 cases, whereas the Kerala (732) and Karnataka (1743) curves in the graph seem to flatten; next comes Telangana with 1761 cases (Table 1). Kerala had the highest number of reported cases in March 2020; however, the cases were controlled and decreased by May 2020, and the curve is flat with few new cases. Figure 2 describes the daily increase in the confirmed cases in India from May 1, 2020, to May 23, 2020. Day-to-day readings were recorded, and the day-wise increase was observed for the 12 different states in Fig. 3. Peak rises in cases were found on May 18 and May 23, 2020, with 5217 and 6654 cases, respectively. Figures 4, 5, and 6 depict the spatial distribution of the Covid-19 data using QGIS for the confirmed, recovered, and death cases, respectively. Cases increased due to community spread; in view of the lockdown across almost the whole country (India), the cases were found to be increasing only in a few states, namely Maharashtra, Delhi, Tamil Nadu, and Madhya Pradesh. Delhi's cases were reported to increase due to the Tablighi Jamaat gathering at Nizamuddin Markaz in early March 2020, after which the virus spread rapidly across Delhi; Delhi has the fourth-highest number of cases in India.


Fig. 1 Day-wise increase graphs in the different states in India as on 23 May 2020


Table 1 Dataset of Covid-19 of India

State               | Reported confirmed | Recovered | Demise
Maharashtra         | 44,582 | 12,583 | 1517
Gujarat             | 13,268 | 5880   | 802
Madhya Pradesh      | 6170   | 3089   | 272
West Bengal         | 3332   | 1221   | 265
Delhi               | 12,319 | 5897   | 208
Rajasthan           | 6494   | 3680   | 153
Uttar Pradesh       | 5735   | 3238   | 152
Tamil Nadu          | 14,753 | 7128   | 98
Andhra Pradesh      | 2709   | 1763   | 55
Telangana           | 1761   | 1043   | 45
Karnataka           | 1743   | 597    | 41
Punjab              | 2029   | 1847   | 39
Jammu & Kashmir     | 1489   | 720    | 20
Haryana             | 1067   | 706    | 16
Bihar               | 2177   | 629    | 11
Odisha              | 1189   | 436    | 7
Assam               | 259    | 54     | 4
Kerala              | 732    | 512    | 4
Chandigarh          | 218    | 178    | 3
Himachal Pradesh    | 168    | 59     | 3
Jharkhand           | 308    | 136    | 3
Meghalaya           | 14     | 12     | 1
Uttarakhand         | 153    | 56     | 1
Arunachal Pradesh   | 1      | 1      | 0
An. & Nico. Islands | 33     | 33     | 0
Chhattisgarh        | 172    | 62     | 0
D & N Haveli        | 1      | 0      | 0
Goa                 | 54     | 16     | 0
Ladakh              | 44     | 43     | 0
Manipur             | 26     | 2      | 0
Mizoram             | 1      | 1      | 0
Pondicherry         | 26     | 10     | 0
Tripura             | 175    | 152    | 0

2.2 Equation Based on the Values of the Confirmed Cases Obtained

The notation is framed based on the confirmed cases and the day-wise increase in cases. It gives the nearest value to India's confirmed cases; here q is assumed to be the day on which the cases became known in India, as the 1st day, 2nd day, and so on. The results of P are cross-checked with the cases shown in "2020_coronavirus_pandemic_in_India".


Fig. 2 Day-wise increase in confirmed cases in India

Fig. 3 Day-wise increase in confirmed cases in 12 different states


Fig. 4 Confirmed cases in India using QGIS (as on May 23, 2020)

Fig. 5 Recovered cases in India using QGIS


Fig. 6 Death cases in India using QGIS

Bias (B) is the error value obtained through the notation:

P = (2)^(3q^(-1)+1) · (2q^2 + 1) · ((1/q) + 1)^2 / ((1/q)^2 + 1) ± bias    (1)

Let us consider a few values of q to test:

q = 0, P = 8 ± B
q = 1, P = 288 ± B
q = 2, P = 486 ± B
q = 3, P = 991 ± B
q = 4, P = 1856 ± B
q = 5, P = 3172 ± B
q = 6, P = 5034 ± B
q = 7, P = 7536 ± B
q = 8, P = 10775 ± B
q = 9, P = 14846 ± B
q = 10, P = 19846 ± B

Hence, Eq. (1) can be used to forecast the number of cases ahead of time.

2.3 Artificial Intelligence in Covid-19: Drones for Survey During Lockdown

Drones were used in surveying the lockdown in all the technology-driven states during CV-19. The CV-19 virus escalates from human to human very rapidly, so maintaining the lockdown helps bring down the spread of the virus. India and many other countries imposed lockdowns to prevent the virus from escalating, and social distancing was implemented in all countries. In such cases, the deployment of drones helps to avoid the virus escalating and makes it possible to study and know the status of the lockdown in very thickly populated areas. Drones help in reaching areas that cannot be reached on foot, whether for medication or emergencies; their deployment by health centers helps in reaching out to people in need of medication. In Delhi, drones were used for video surveillance during the coronavirus outbreak, and disinfectant drones were used in China to fight corona. In the Delhi and Telangana states, drones were used to spray disinfectant in all areas in order to avoid human involvement and save lives (Fig. 7). Various preventive measures were taken to keep the cities clean and reduce the spread of the virus.

2.4 Robots During Covid-19

During the CV-19 attack, patients were quarantined and treated intensively. Hospitals needed more beds and more space, and the process of quarantine requires a large number of doctors, nurses, and hospital staff; true to the scenario of CV-19, several countries, namely China, India, Cuba, and many more, sent doctors, nurses, and staff to assist the countries affected by corona. There were cases where doctors, nurses, and staff treating CV-19 patients were themselves infected and lost their lives. In such cases, robots can help with routine work; since corona is a communicable disease, the risk can be reduced by introducing robots in hospitals to lessen the burden and stress that each individual doctor, nurse, or staff member goes through. Everyone at the hospital faced the unavoidable circumstances and tiring situation of CV-19, not to forget the policemen who spent their time enforcing the discipline of the lockdown. Robots could have replaced the policemen standing in the scorching sun, monitoring the roads during the lockdown, as shown in Fig. 8. In a few states robots were deployed, but if this technique had been implemented in all parts of the country, CV-19 cases might have been reduced or brought down. Awareness of AI techniques to reduce or fight CV-19 should be spread globally, at least to avoid direct contact and overcome the CV-19 virus. Lazer et al. [9] portray no such constraint concerning derivative and aggregated data. According to Hao [10], machine learning algorithms make predictions in real time, but the catch is that, lacking historical data maintained from previous pandemics, the research teams are relying on data from the current pandemic. Due to a lack of patient reports, the implementation of deep learning techniques was not carried out.


Fig. 7 Drones in survey and spraying in Delhi, Telangana

Naudé [11] suggests AI contributes to six areas of the fight against CV-19, namely (a) early warnings and alerts, (b) tracking and prediction, (c) data dashboards, (d) diagnosis and prognosis, (e) treatments, and (f) social control. One such algorithm helps to understand the data based on the clusters formed from Table 1 and Fig. 1. Using the K-means algorithm, implemented in Python on Spyder, centroids are found for the data points of the confirmed cases of all 12 states, as listed in Table 2. K-means clustering groups together all the data points that are nearest to each other; here it suggests that the regions near those with the highest case counts will affect their neighboring regions too. The K-means clustering graph is shown in Fig. 9. Thus, AI techniques can be implemented to estimate the rapid escalation of CV-19 and help to take the necessary measurable precautions to overcome the escalation.
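A minimal sketch of this clustering step is given below; the choice of three clusters is an assumption for illustration, since the chapter does not state the k used.

```python
import numpy as np
from sklearn.cluster import KMeans

# Confirmed-case counts for the 12 states (values from Table 1).
states = ["Maharashtra", "Gujarat", "Madhya Pradesh", "West Bengal", "Delhi",
          "Rajasthan", "Uttar Pradesh", "Tamil Nadu", "Andhra Pradesh",
          "Telangana", "Karnataka", "Kerala"]
cases = np.array([[44582], [13268], [6170], [3332], [12319], [6494],
                  [5735], [14753], [2709], [1761], [1743], [732]])

# Group states with similar case loads; members of the same cluster are
# expected to show similar escalation patterns.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(cases)
for state, label in zip(states, km.labels_):
    print(state, "-> cluster", label, "centroid", km.cluster_centers_[label][0])
```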


Fig. 8 Robots in hospitals and roads

Table 2 Centroids obtained using KMeans clustering algorithm

States         | Centroids
Andhra Pradesh | 1463.0
Delhi          | 3738.0
Gujarat        | 4721.0
Karnataka      | 589.0
Kerala         | 497.0
Madhya Pradesh | 2719.0
Maharashtra    | 11506.0
Rajasthan      | 2666.0
Tamil Nadu     | 2526.0
Telangana      | 1039.0
Uttar Pradesh  | 2328.0
West Bengal    | 795.0

3 Conclusions

The study of the number of confirmed cases was carried out to understand the increase in each state individually. In spite of the lockdown, the number of cases was rising sharply. Various studies were analyzed, and in a few countries the curve was found to flatten, with a decrease in the number of cases. The implementation of GIS maps and K-means clustering makes it possible to identify the states of India where the increase in cases was largest.


Fig. 9 Kmeans Clustering for confirmed cases

References

1. Murugesan, B., Karuppannan, S., Mengistie, A. T., Ranganathan, M., & Gopalakrishnan, G. (2020). Distribution and trend analysis of COVID-19 in India: Geospatial approach. Journal of Geographical Sciences, 4(1), 1–9.
2. Mollalo, A., Vahedi, B., & Rivera, K. M. (2020). GIS-based spatial modeling of COVID-19 incidence rate in the continental United States. Science of the Total Environment, 728, 138884.
3. Ceylan, Z. (2020). Estimation of COVID-19 prevalence in Italy, Spain, and France. Science of the Total Environment, 729, 138817.
4. Cássaro, F. A. M., & Pires, L. F. (2020). Can we predict the occurrence of COVID-19 cases? Considerations using a simple model of growth. Science of the Total Environment, 728, 138834.
5. Gupta, A., Banerjee, S., & Das, S. (2020). Significance of geographical factors to the COVID-19 outbreak in India. Modeling Earth Systems and Environment. Springer Nature Switzerland AG.
6. Biswas, K., & Sen, P. (2020). Space-time dependence of corona virus (COVID-19) outbreak. arXiv:2003.03149v1 [physics.soc-ph].
7. Kastner, J., Wei, H., & Samet, H. (2020). Viewing the progression of the novel corona virus (COVID-19) with NewsStand. arXiv:2003.00107.
8. Iban, M. C. (2020). Geospatial data science response to COVID-19 crisis and pandemic isolation tracking. Turkish Journal of Geosciences, 01(0), 1–7.
9. Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). The parable of Google Flu: Traps in big data analysis. Science, 343, 1203–1205. www.sciencemag.org
10. Hao, K. (2020, March 13). This is how the CDC is trying to forecast coronavirus's spread. MIT Technology Review. www.technologyreview.com
11. Naudé, W. (2020). Artificial intelligence against COVID-19: An early review (IZA DP No. 13110). http://ftp.iza.org/dp13110.pdf. ISSN: 2365-9793.

A Brief Review of Explainable Artificial Intelligence Reviews and Methods

Ferdi Sarac

1 Introduction

AI has been considered the most pervasive technology, especially over the last 5 years. A report by IDC states that global spending on artificial intelligence (AI), including software, hardware, and services, is forecast to exceed $118 billion in 2022 and $300 billion in 2026 [1]. Nowadays, artificial intelligence is exploited in various sectors for vital tasks, such as loan approval in the banking sector, automatic machines in the defense sector, and disease detection in the health sector. As a result, people increasingly rely on decisions made by AI in their daily lives. Although artificial intelligence algorithms have been developed to support users in their normal activities, their reliability is questioned by people due to their black-box nature [2]. In other words, AI-based models often lack explainability and are not considered trustworthy and transparent by users. To bridge this gap, the concept of explainable artificial intelligence (XAI), a field that explains how AI systems make decisions, has emerged. XAI is proposed to overcome the aforementioned problems and enhance the transparency and interpretability of AI models, and it enables researchers, domain experts, and users to perceive how machine learning algorithms work internally [3]. XAI has attracted the attention of researchers, and studies in the field have increased significantly in recent years. To demonstrate this, we conducted a literature search using the keywords "Explainable Artificial Intelligence" on PubMed, Science Direct, and Google Scholar for the last 5 years. The results are shown in Fig. 1. The number of studies carried out in the field of XAI so far in 2022 (to November) is 1183, 16,011, and 29,200 in the PubMed, Science Direct, and Google

F. Sarac (✉) Department of Computer Engineering, Suleyman Demirel University, Isparta, Turkey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_8


Fig. 1 The number of XAI studies for the last 5 years

Scholar databases, respectively. These results show that studies in the field of explainable artificial intelligence have increased approximately fourfold in the last 5 years. In this chapter, a brief review of XAI methods and review studies is presented. The chapter is organized as follows: Section 2 describes fundamental terms of XAI, Section 3 presents a brief review of the latest XAI methods and several recent reviews of XAI, and Section 4 concludes the chapter.

2 Fundamental Definitions

In this section, the differences between XAI and AI are presented and essential XAI terms are defined. AI can be defined as a simulation of human intelligence in computer systems. Although artificial-intelligence-based models produce superior performance compared to traditional models, they are unable to produce explanations for their predictions. The purpose of XAI is to provide explanations for AI models while maintaining their superior performance.


Fig. 2 AI and XAI from human perspective

Fig. 3 Four key factors to demonstrate the differences between AI and XAI

Figure 2 illustrates AI and XAI from the human perspective. In order to clarify the differences between AI and XAI, four essential terms must be considered: Interpretability, Explainability, Transparency, and Adaptability. These factors are shown in Fig. 3. Explainability and interpretability can be considered the two main terms that distinguish XAI from AI. Although these terms are generally thought to have the same meaning [2, 4], there are in fact certain differences between them. Interpretability supports the explanation of cause-effect associations between the inputs and the target of a system [4]; Miller [5] describes interpretability as the degree to which a person understands the reason for a decision made by a system. Explainability, on the other hand, is associated with decoding the internal structure of a model or system, such as understanding which attributes contribute to the model's prediction and why these features contribute. Indeed, explainability can also be viewed as the key that transitions a system from a black-box model to a white-box model.


Fig. 4 The number of peer-reviewed journal articles discussing AI transparency in healthcare for the last 11 years

In other words, explainability makes machine learning models more transparent. Transparency of a machine learning system is absolutely essential, especially in healthcare; thereby, researchers' interest in AI transparency in healthcare has been growing exponentially. Bernal and Mazo [6] searched the Web of Science database for the number of peer-reviewed journal articles discussing artificial intelligence transparency in healthcare over the last 11 years; the results they obtained are shown in Fig. 4 and demonstrate the importance of transparency in artificial intelligence systems, especially in the field of health. Adaptability refers to the capability of the model to adapt to environmental changes such as users' level of expertise, domain knowledge, cultural background, and interests. XAI models should be highly adaptable to emerging changes; for example, domain experts may be interested in implementing a new feature or modifying an existing one [7].

3 Recent XAI Methods and Reviews

This section presents several recent XAI review studies and XAI methods. XAI review studies and XAI methods are listed with references in Tables 1 and 2, respectively. Additionally, we grouped the XAI methods into two categories, Scope and Methodology, which are further divided into subcategories and presented in Sect. 3.2.

Table 1 A list of recent XAI review studies along with published year and reference

Author/s               | Year | Reference
Zhang and Zhu          | 2018 | [8]
Adadi and Berrada      | 2018 | [3]
Gilpin et al.          | 2018 | [9]
Miller                 | 2019 | [5]
Guidotti et al.        | 2019 | [10]
Guo                    | 2020 | [11]
Ivanovs et al.         | 2021 | [12]
Langer et al.          | 2021 | [13]
Chakrobartty and Gayar | 2021 | [14]
Machlev et al.         | 2022 | [15]
Speith                 | 2022 | [16]
Charmet                | 2022 | [17]
Minh et al.            | 2022 | [18]

Table 2 A list of the XAI methods presented in this study

Method   | Scope  | Methodology | Usage | References | Year | Code repository
GA2M     | Global | OT          | INT   | [19]       | 2015 | https://github.com/interpretml/interpret
LRP      | Both   | GRA         | INT   | [20]       | 2015 | https://github.com/chr5tphr/zennit
LIME     | Local  | PER         | Post  | [21]       | 2016 | https://github.com/marcotcr
CAM      | Local  | GRA         | Post  | [22]       | 2016 | https://github.com/zhoubolei/CAM
SHAP     | Both   | PER         | Post  | [23]       | 2017 | https://github.com/slundberg/shap
GRADCAM  | Local  | GRA         | Post  | [24]       | 2017 | https://github.com/jacobgil/pytorch-grad-cam
DTD      | Local  | GRA         | Post  | [25]       | 2017 | https://github.com/albermax/innvestigate
PDA      | Local  | PER         | Post  | [26]       | 2017 | https://github.com/lmzintgraf/DeepVis-PredDiff
RISE     | Local  | PER         | Post  | [27]       | 2018 | https://github.com/eclique/RISE
CACE     | Global | OT          | Post  | [28]       | 2019 | No code repository
NAM      | Global | OT          | INT   | [29]       | 2020 | https://neural-additive-models.github.io/
LRP-IFT  | Both   | GRA         | INT   | [30]       | 2022 | https://github.com/SunJiamei/LRP-imagecaptioning-pytorch

3.1 Review Studies

In this subsection, some of the recently published XAI review articles are briefly discussed.

Gilpin et al. [9] proposed a taxonomy of interpretability methods for neural networks, dividing them into three branches: the first includes methods that mimic the processing of data to generate insights into the connections between the model's inputs and outputs; the second includes approaches that attempt to explain the representation of data within a network; and the last consists of self-explanatory, transparent networks. Zhang and Zhu [8] reviewed explainable deep learning models and analyzed recent XAI methods through which interpretability of pre-trained Convolutional Neural Network models can be achieved. In 2018, Adadi and Berrada [3] investigated the main aspects of XAI, examined state-of-the-art XAI approaches, and analyzed 381 papers published between 2004 and 2018. Miller [5] described the challenges of XAI and investigated more than 250 XAI publications from the perspective of the social sciences. Guidotti et al. [10] discussed essential XAI components, including algorithms, data, and problems, and divided these components into classes. Guo [11] discussed the main XAI approaches for wireless network configurations and summarized XAI studies in 6G. Ivanovs et al. [12] published a survey highlighting the urgent need for XAI and showing recent progress of perturbation-based XAI methods. Langer et al. [13] conducted research to identify both the stakeholders of XAI and their requirements. Chakrobartty and Gayar [14] systematically examined XAI methods in the field of medicine, covering studies between 2008 and 2020. In a recent study, Machlev et al. [15] presented a review of XAI techniques for power system applications. In a very recent study, Speith [16] reviewed the pros and cons of existing taxonomies of XAI methods and proposed a new taxonomy that combines them. Charmet et al. [17] examined the literature on cybersecurity applications of XAI and the security of XAI itself. Minh et al. [18] provided a comprehensive review of XAI approaches and split these techniques into three categories: pre-modeling explainability, post-modeling explainability, and interpretable models. Table 1 lists the recent XAI review studies mentioned above.

3.2 XAI Methods

In this subsection, recent XAI methods are presented and divided into different categories and subcategories. A list of the XAI methods is also provided along with their categorizations, references, and code repositories to help future researchers (Table 2).

Ribeiro et al. [21] proposed a novel XAI method named Local Interpretable Model-Agnostic Explanations (LIME), which builds a nonparametric regression model by exploiting perturbations of data instances in order to analyze the resulting influence on the output. LIME does not directly use the complex model to describe its target estimates; instead, it uses a surrogate model that is easier to explain and approximates the complex model locally. It should also be noted that LIME does not require any knowledge of the internal components of the model, such as weights or topology; in fact, it only deals with the model's output.

Layer-Wise Relevance Propagation (LRP) [20], as the name suggests, is a propagation-based explanation approach. Unlike LIME, it requires access to the internal network structure of the model and distributes relevance scores between every pair of adjacent layers. This layer-by-layer decomposition simplifies the problem and lets LRP produce explanations efficiently. In 2022, Sun et al. [30] developed a model by adapting LRP and backpropagation-based explanation methods to explain image captioning predictions; they call this method LRP-IFT.

In 2021, Agarwal et al. [31] proposed an explainable model using a new measure of feature relevance and showed that iteratively removing redundant input variables reduces overfitting and improves accuracy.

Zintgraf et al. [26] proposed the Prediction Difference Analysis (PDA) method. For a given prediction, each input attribute is assigned a relevance value according to its correlation with the predicted class C. The idea behind PDA is that the relevance of an attribute x can easily be estimated by observing the change in the prediction when the attribute is unknown.

In 2017, Lundberg and Lee [23] developed SHapley Additive exPlanations (SHAP), a game-theoretic approach. SHAP exploits Shapley values (Sv), which are broadly used in game theory; to generate an interpretable prediction model, SHAP integrates the ideas from LIME and Sv.

Deep Taylor Decomposition (DTD) [25] is a gradient-based method that redistributes the outputs of the neural network to the input features layer by layer and determines the relevance of lower-layer elements in the redistribution process using (first-order) Taylor expansion.

The Class Activation Mapping (CAM) approach [22] is a CNN-based interpretability technique that obtains the class-specific significance of each position of an image by calculating a weighted sum of the feature activation values at that location across all channels. The Gradient-weighted Class Activation Mapping (GradCAM) [24] method can be considered an enhanced version of CAM and can only be applied to Convolutional Neural Networks (CNNs). It exploits gradient information of the final convolutional layer, where the most important features appear, and aims at enhancing the interpretability of CNN models while preserving their high performance.
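As an illustration of how a post-hoc explainer such as SHAP is typically invoked in practice (the stand-in tree ensemble and public dataset below are assumptions, not models from the studies reviewed here):

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# A stand-in tree ensemble fitted on a public dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shapley-value attributions for every feature of every sample.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# For binary classifiers, shap may return one array per class (older
# versions) or a 3D array (newer versions); keep the positive-class slice.
if isinstance(shap_values, list):
    shap_values = shap_values[1]
elif getattr(shap_values, "ndim", 2) == 3:
    shap_values = shap_values[:, :, 1]

# Global summary: which features matter most, and in which direction.
shap.summary_plot(shap_values, X)
```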

158

F. Sarac

Petsiuk et al. [27] proposed a black-box XAI method named Randomized Input Sampling for Explanation (RISE). RISE creates a saliency map that reveals the importance of each pixel in the image for the network's estimate. It is a perturbation-based method and therefore does not require intrinsic information such as gradients. Goyal et al. [28] proposed the Causal Concept Effect (CACE) metric, which measures the causal effect that the presence or absence of a high-level concept in the image, such as brightness, has on the prediction, in order to explain classifiers' decisions. The authors leveraged a Variational AutoEncoder (VAE)-based architecture to generate counterfactuals.

Generalized Additive Models (GAM) [32] can be considered an extension of the traditional multiple linear model

$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_i x_i + \varepsilon$

By replacing each linear term $a_j x_j$ with a non-linear function $f_j(x_j)$, we obtain

$y = a_0 + f_1(x_1) + f_2(x_2) + \dots + f_i(x_i) + \varepsilon$

where the $x_j$ are the input features, $y$ is the output, and each $f_j$ is a univariate shape function. The model is called a GAM because a separate shape function $f_j(x_j)$ is estimated for each $j = 1, 2, \dots, i$, and the estimates are then added together (a brief fitting sketch is given below). Caruana et al. [19] introduced a variation of GAM called GA2Ms, in which pairwise interactions are added to the traditional GAM; they applied the proposed method to healthcare problems. In a recent study, Agarwal et al. [29] proposed a novel XAI method named Neural Additive Models (NAM), which can be considered an extension of GAM. It differs from GAM in that it exploits deep neural networks to learn the non-linear shape functions. The main disadvantage of NAM is that the method is model-specific.

We divided the explainability of AI methods into two branches, Scope and Methodology, to determine the type of an explainability method; these are shown in Fig. 4. If an explanation is post hoc, it can be derived from a pre-trained model. Intrinsic explanations, in contrast, relate to the internal structure of a model that is already explainable. Most post-hoc methods are Model-Agnostic (MA), meaning that they do not depend on the model architecture. Unlike post-hoc methods, internal or intrinsic methods are specific to the model architecture and are therefore Model-Specific. The scope of an explanation falls into two categories: local explanations describe only one part of the model, such as the contribution of a single feature, whereas global explanations describe the entire model. XAI methods can also be categorized as Perturbation-Based (PER), Gradient- or Backpropagation-Based (GRA), or Others (OTH), as shown in Fig. 5.
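As a brief illustration of the GAM formulation above, the sketch below fits one univariate shape function per feature and inspects the fitted functions. It is a sketch under stated assumptions, using the third-party pyGAM library with synthetic data rather than any dataset from this chapter.

```python
# Fit a GAM of the form y = a0 + f1(x1) + f2(x2) + error with pyGAM.
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 500)

# One smooth univariate shape function per feature, added together.
gam = LinearGAM(s(0) + s(1)).fit(X, y)

# Each fitted shape function can be inspected directly, which is what
# makes the model interpretable.
for i in range(2):
    XX = gam.generate_X_grid(term=i)
    print(gam.partial_dependence(term=i, X=XX)[:3])
```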


Fig. 5 Categorization of explainability methods

Perturbation-based methods change the inputs of a model and investigate the effect of those changes on the outputs. The investigation of changes in the target variable is expected to reveal the parts of the input that are crucial for the prediction. The significance of a perturbed feature is calculated by comparing the output produced when the feature is present with the output produced in its absence. Gradient-based methods utilize the partial derivatives computed during the backpropagation stage of neural network algorithms, where the gradients propagate back from the prediction layer to the input layer. Table 2 shows a list of the XAI methods mentioned above along with their scope, methodologies, usages, references, and publication years.
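As a concrete instance of the gradient-based family, the following sketch computes a plain input-gradient saliency map in PyTorch. The names model (a trained classifier) and image (a batch containing one image tensor) are assumptions, and what is shown is simple gradient saliency rather than any particular published variant.

```python
# Minimal gradient-based saliency: backpropagate the predicted class score
# to the input and read off per-pixel relevance.
import torch

def gradient_saliency(model, image):
    model.eval()
    image = image.clone().requires_grad_(True)  # (1, C, H, W), assumed
    scores = model(image)                       # forward pass
    top = scores.argmax(dim=1).item()           # predicted class
    scores[0, top].backward()                   # backpropagate its score
    # Relevance of each pixel = magnitude of the partial derivative,
    # reduced over the channel dimension.
    return image.grad.abs().max(dim=1).values   # (1, H, W) saliency map
```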

4 XAI in Medicine

In this section, XAI methods and review studies in the field of health are discussed. Artificial intelligence (AI) has been broadly utilized in medical applications, including health service management, predictive medicine, clinical decision making, and patient data and diagnostics [33]. While it is broadly accepted that AI will revolutionize healthcare in the future, significant progress is yet to be made in order to gain the trust of healthcare professionals and patients.


Fig. 6 A visual representation of a medical XAI application

Increasing AI transparency is a reliable way to address such trust issues; however, transparency still lacks maturity and agreed definitions. AI has not yet gained full acceptance in healthcare because healthcare professionals have not been able to validate the predictions provided by AI models. One of the most important ingredients of trust is explainability, especially in the medical domain, since trust depends on justifications of an AI algorithm's output, which lead to an understanding of the system's inner workings. Interpretable AI models can explain why a particular decision is made for a particular patient by showing the behaviors of the model that lead to the prediction. Hence, the lack of interpretability limits the use of AI models in the medical domain [34]. Zhang et al. [35] provided a visual representation of a medical XAI application, which is shown in Fig. 6. The authors note that if an intrinsic XAI method is used, the medical application itself can analyze the medical data and provide explanations to the doctors; if a post-hoc XAI model is used instead, black-box approaches are applied to the medical data to make the decisions, and the post-hoc XAI method then explains the outputs of those black-box methods. In the next subsections, we present four XAI methods (also mentioned in Sect. 3.2) that are widely used in medical applications.

4.1 SHAP

The idea behind SHAP comes from the cooperative game theory approach [36]. It prioritizes attributes according to their average contribution to the model's prediction. SHAP is a perturbation-based explainability method that can be exploited for both image and tabular data, and several more specialized variants have been developed on top of it, such as KernelSHAP, DeepSHAP, and TreeSHAP. SHAP is a widely used XAI method in medical applications such as cancer classification, diagnosis of COVID-19, and heart anomaly detection. Biomarkers can be beneficial for diagnosing or treating diseases, and SHAP can also be exploited in medical applications to identify discriminative biomarkers that contribute to a particular disease.
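A typical SHAP workflow on tabular medical data might look as follows. This is a sketch assuming the open-source shap package and scikit-learn are installed, with the bundled breast cancer dataset standing in for real clinical data.

```python
# TreeSHAP example: exact Shapley values for a tree ensemble.
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Build the explainer and compute per-feature Shapley values.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(data.data)

# Global view: which features drive the predictions on average.
shap.summary_plot(shap_values, data.data, feature_names=data.feature_names)
```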


Table 3 A list of some recent XAI studies where SHAP is used for medical applications

Name | Objective | Ref. | Year | Results
Explainable machine learning model for predicting first-time acute exacerbation in patients with chronic obstructive pulmonary disease | Prediction of first-time acute exacerbation in patients with chronic obstructive pulmonary disease | [37] | 2022 | Accuracy: 0.7843; Specificity: 0.7749; Sensitivity: 0.7941; AUC: 0.832; F1 score: 0.7105
Predicting the occurrence of postoperative malnutrition | Prediction of malnutrition status for children with congenital heart disease | [38] | 2022 | Accuracy: 0.81; Specificity: 0.88; Sensitivity: 0.85; AUC: 0.87
Forecasting adverse surgical events using self-supervised transfer learning for physiological signals | Forecasting adverse surgical events | [39] | 2021 | AUC: 0.84–0.89
A hierarchical expert-guided machine learning framework for clinical decision support systems: an application to traumatic brain injury prognostication | Traumatic brain injury prognostication | [40] | 2021 | Accuracy: 0.7488; Specificity: 0.7750; Sensitivity: 0.7941; AUC: 0.8085; F1 score: 0.7045
XOmiVAE: an interpretable deep learning model for cancer classification using high-dimensional omics data | Cancer classification | [41] | 2021 | AUC: 0.94; F1 score: 0.90
A robust interpretable deep learning classifier for heart anomaly detection without segmentation | Heart anomaly detection | [42] | 2021 | Accuracy: 0.9978; Specificity: 0.9972; Sensitivity: 0.9977
A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer's disease | Multimodal detection and prediction model for Alzheimer's disease | [43] | 2021 | Accuracy: 0.8708; AUC: 0.8708; F1 score: 0.8709
Patient-level cancer prediction models from a nationwide patient cohort: model development and validation | Patient-level cancer prediction | [44] | 2021 | AUC: 0.74–0.86


Table 3 provides a list of some recent XAI studies that use SHAP as the explainability method. The studies are listed along with their objectives, years, and results; the results are included so that future researchers can select appropriate XAI methods for their research.

4.2 GRADCAM

GradCAM is a gradient-based explainability approach that generates heat maps to explain classification results. Recently, GradCAM has been commonly used in medical applications, including chest X-ray and brain tumor imaging, to diagnose patients' diseases and localize the affected areas. Table 4 provides a list of some recent XAI studies using GradCAM as the explainability method in the medical domain, listed along with their objectives, years, and results; a short implementation sketch follows the table.

Table 4 A list of some recent XAI studies in which the GradCAM method is used

Name | Objective | Ref. | Year | Results
Interpretable deep learning approach for oral cancer classification using guided attention inference network | Classification of oral cancer | [45] | 2022 | Accuracy: 0.8484; Specificity: 0.766; Sensitivity: 0.893
The clinical value of explainable deep learning for diagnosing fungal keratitis using in vivo confocal microscopy images | Detection of fungal keratitis | [46] | 2021 | Accuracy: 0.965; Specificity: 0.982; Sensitivity: 0.936; AUC: 0.983
Explainable DCNN based chest X-ray image analysis and classification for COVID-19 pneumonia detection | Detection of COVID-19 | [47] | 2021 | Accuracy: 0.9603; Precision: 0.9615; F1 score: 0.96
Robust and interpretable convolutional neural networks to detect glaucoma in optical coherence tomography images | Detection of glaucoma | [48] | 2021 | Accuracy: 0.911
Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning | Assessment of the risk of breast cancer | [49] | 2021 | AUC: 0.955

4.3 LRP

LRP is a gradient-based, intrinsic XAI method that is used to explain the decisions of AI models. It is commonly utilized for interpreting CNNs applied to image data. Table 5 lists recent XAI studies using LRP as the explanatory method in the medical field. We found only three such recent studies, all of which use LRP for image classification; LRP therefore appears to be less popular for medical applications than both SHAP and GradCAM.
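For illustration, the following didactic sketch applies an epsilon-style LRP rule to a tiny two-layer ReLU network in NumPy, redistributing the output score back to the inputs layer by layer. The weights and input are synthetic placeholders, and real LRP implementations use layer-specific rules beyond this simplified one.

```python
# Didactic epsilon-LRP for a two-layer ReLU network in NumPy.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 1))
x = rng.normal(size=4)                 # synthetic input

a1 = np.maximum(0, x @ W1)             # hidden activations
out = a1 @ W2                          # output score to be explained

def lrp_linear(a, W, R, eps=1e-6):
    # Share the relevance R of each upper neuron among its inputs in
    # proportion to their contribution a_j * w_jk (epsilon rule).
    z = a @ W + eps * np.sign(a @ W)
    return a * ((W * (R / z)).sum(axis=1))

R_hidden = lrp_linear(a1, W2, out)     # relevance of hidden neurons
R_input = lrp_linear(x, W1, R_hidden)  # relevance of each input feature
print(R_input, R_input.sum())          # sums approximately to the output
```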

4.4 LIME

LIME approximates a prediction by utilizing local surrogate models, which may be linear regressions or decision trees. It is a perturbation-based, post-hoc method that locally perturbs the input around the sample until it reaches a linear approximation, helping the decision maker justify the model's behavior [53]. Table 6 shows recent XAI studies that used LIME in the medical domain.
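Typical usage of the reference lime package on tabular data could look like the following sketch; the dataset and model are stand-ins for real clinical data, not material from the studies cited in this section.

```python
# Explain a single prediction with the reference `lime` package.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)
# Fit a local linear surrogate around one instance.
exp = explainer.explain_instance(data.data[0], model.predict_proba,
                                 num_features=5)
print(exp.as_list())   # top features with their local weights
```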

Table 5 A list of recent XAI studies that used LRP for medical applications

Name | Objective | Ref. | Year | Results
Morphological and molecular breast cancer profiling through explainable machine learning | Morphological and molecular breast cancer profiling | [50] | 2021 | Accuracy: 0.98
Explaining decisions of graph convolutional neural networks: patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer | Patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer | [51] | 2022 | Accuracy: 0.81; Sensitivity: 0.85; Specificity: 0.88; AUC: 0.87
Revealing the unique features of each individual's muscle activation signatures | Explaining biometrics using EMG | [52] | 2021 | Accuracy: 0.993 (pedalling); 0.989 (walking)


Table 6 A list of recent XAI studies where LIME is used for medical applications

Name | Objective | Ref. | Year | Results
Deep learning for prediction of depressive symptoms in a large textual dataset | Prediction of depressive symptoms in a large textual dataset | [54] | 2022 | Accuracy: 0.9977
Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning | Human activity recognition | [55] | 2021 | Recall: 0.99
Interpretable heartbeat classification using local model-agnostic explanations on ECGs | Interpretable heartbeat classification | [56] | 2021 | Sensitivity: 0.8950; AUC: 0.888

5 Discussion and Future Directions

XAI is a field that aims to provide transparency and generate explanations for artificial intelligence systems. Recently, the popularity of XAI has increased, especially in the medical industry. In this chapter, we aimed to present the current state of XAI research, particularly in healthcare. If users are satisfied with the explanations provided by XAI models, they will have confidence in their decision making. However, the details of decision making in complex DNN models with very large numbers of hidden layers and parameters are not yet clear. When it comes to human health, the risk to be taken should be close to zero; therefore, human supervision is still necessary even when interpretable algorithms are used in medical applications.

6 Conclusion

In this chapter, we covered the essential XAI concepts and terms, relevant XAI review studies, and recent XAI methods. We categorized XAI methods with respect to their Scope and Methodology and then subcategorized these into global, local, intrinsic, and post-hoc. We found that most XAI methods are post-hoc, meaning that they are not themselves transparent. We provided a list of recent XAI methods along with their scope, methodology, references, usage, and code repositories in order to assist future practitioners in their research. Additionally, we briefly reviewed the XAI methods that are applied to healthcare.


References

1. IDC. (2022). Worldwide spending on AI-centric systems will pass $300 billion by 2026. https://www.idc.com/getdoc.jsp?containerId=prUS49670322. Accessed 18 Dec 2022.
2. Islam, S. R., Eberle, W., Ghafoor, S. K., & Ahmed, M. (2021). Explainable artificial intelligence approaches: A survey. arXiv preprint arXiv:2101.09429.
3. Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access, 6, 52138–52160.
4. Linardatos, P., Papastefanopoulos, V., & Kotsiantis, S. (2020). Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1), 18.
5. Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38.
6. Bernal, J., & Mazo, C. (2022). Transparency of artificial intelligence in healthcare: Insights from professionals in computing and healthcare worldwide. Applied Sciences, 12(20), 10228.
7. Sheu, R. K., & Pardeshi, M. S. (2022). A survey on medical explainable AI (XAI): Recent progress, explainability approach, human interaction and scoring system. Sensors, 22(20), 8068.
8. Zhang, Q. S., & Zhu, S. C. (2018). Visual interpretability for deep learning: A survey. Frontiers of Information Technology & Electronic Engineering, 19(1), 27–39.
9. Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th international conference on data science and advanced analytics (DSAA) (pp. 80–89).
10. Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., & Pedreschi, D. (2018). A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5), 1–42.
11. Guo, W. (2020). Explainable artificial intelligence for 6G: Improving trust between human and machine. IEEE Communications Magazine, 58(6), 39–45.
12. Ivanovs, M., Kadikis, R., & Ozols, K. (2021). Perturbation-based methods for explaining deep neural networks: A survey. Pattern Recognition Letters, 150, 228–234.
13. Langer, M., Oster, D., Speith, T., Hermanns, H., Kästner, L., Schmidt, E., Sesing, A., & Baum, K. (2021). What do we want from explainable artificial intelligence (XAI)? A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence, 296, 103473.
14. Chakrobartty, S., & El-Gayar, O. (2021). Explainable artificial intelligence in the medical domain: A systematic review. Proceedings of the Americas Conference on Information Systems (AMCIS).
15. Machlev, R., Heistrene, L., Perl, M., Levy, K. Y., Belikov, J., Mannor, S., & Levron, Y. (2022). Explainable Artificial Intelligence (XAI) techniques for energy and power systems: Review, challenges and opportunities. Energy and AI, 9, 100169.
16. Speith, T. (2022). A review of taxonomies of explainable artificial intelligence (XAI) methods. In 2022 ACM conference on fairness, accountability, and transparency (pp. 2239–2250).
17. Charmet, F., Tanuwidjaja, H. C., Ayoubi, S., Gimenez, P. F., Han, Y., Jmila, H., et al. (2022). Explainable artificial intelligence for cybersecurity: A literature survey. Annals of Telecommunications, 77, 1–24.
18. Minh, D., Wang, H. X., Li, Y. F., & Nguyen, T. N. (2022). Explainable artificial intelligence: A comprehensive review. Artificial Intelligence Review, 55, 3503–3568.
19. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015, August). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1721–1730).
20. Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One, 10(7), e0130140.


21. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, San Francisco, CA, USA, 13–17 August 2016 (pp. 1135–1144).
22. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2016). Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2921–2929).
23. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 4765–4774.
24. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626).
25. Montavon, G., Lapuschkin, S., Binder, A., Samek, W., & Müller, K. R. (2017). Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recognition, 65, 211–222.
26. Zintgraf, L. M., Cohen, T. S., Adel, T., & Welling, M. (2017). Visualizing deep neural network decisions: Prediction difference analysis. arXiv preprint arXiv:1702.04595.
27. Petsiuk, V., Das, A., & Saenko, K. (2018). RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.
28. Goyal, Y., Feder, A., Shalit, U., & Kim, B. (2019). Explaining classifiers with causal concept effect (CaCE). arXiv preprint arXiv:1907.07165.
29. Agarwal, R., Melnick, L., Frosst, N., Zhang, X., Lengerich, B., Caruana, R., & Hinton, G. E. (2021). Neural additive models: Interpretable machine learning with neural nets. Advances in Neural Information Processing Systems, 34, 4699–4711.
30. Sun, J., Lapuschkin, S., Samek, W., & Binder, A. (2022). Explain and improve: LRP-inference fine-tuning for image captioning models. Information Fusion, 77, 233–246.
31. Agarwal, P., Tamer, M., & Budman, H. (2021). Explainability: Relevance based dynamic deep learning algorithm for fault detection and diagnosis in chemical processes. Computers & Chemical Engineering, 154, 107467.
32. Hastie, T. J., & Tibshirani, R. J. (1990). Generalized additive models (Monographs on statistics and applied probability) (Vol. 43, p. 335). Chapman & Hall.
33. Loh, H. W., Ooi, C. P., Seoni, S., Barua, P. D., Molinari, F., & Acharya, U. R. (2022). Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Computer Methods and Programs in Biomedicine, 226, 107161.
34. Lundberg, S. M., Nair, B., Vavilala, M. S., Horibe, M., Eisses, M. J., Adams, T., et al. (2018). Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2(10), 749–760.
35. Zhang, Y., Weng, Y., & Lund, J. (2022). Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics, 12(2), 237.
36. Kuhn, H. W., & Tucker, A. W. (Eds.). (2016). Contributions to the theory of games (AM-24), Volume I (Vol. 24). Princeton University Press.
37. Kor, C. T., Li, Y. R., Lin, P. R., Lin, S. H., Wang, B. Y., & Lin, C. H. (2022). Explainable machine learning model for predicting first-time acute exacerbation in patients with chronic obstructive pulmonary disease. Journal of Personalized Medicine, 12(2), 228.
38. Shi, H., Yang, D., Tang, K., Hu, C., Li, L., Zhang, L., et al. (2022). Explainable machine learning model for predicting the occurrence of postoperative malnutrition in children with congenital heart disease. Clinical Nutrition, 41(1), 202–210.
39. Chen, H., Lundberg, S. M., Erion, G., Kim, J. H., & Lee, S. I. (2021). Forecasting adverse surgical events using self-supervised transfer learning for physiological signals. NPJ Digital Medicine, 4(1), 167.
40. Farzaneh, N., Williamson, C. A., Gryak, J., & Najarian, K. (2021). A hierarchical expert-guided machine learning framework for clinical decision support systems: An application to traumatic brain injury prognostication. NPJ Digital Medicine, 4(1), 78.


41. Withnell, E., Zhang, X., Sun, K., & Guo, Y. (2021). XOmiVAE: An interpretable deep learning model for cancer classification using high-dimensional omics data. Briefings in Bioinformatics, 22(6), bbab315.
42. Dissanayake, T., Fernando, T., Denman, S., Sridharan, S., Ghaemmaghami, H., & Fookes, C. (2020). A robust interpretable deep learning classifier for heart anomaly detection without segmentation. IEEE Journal of Biomedical and Health Informatics, 25(6), 2162–2171.
43. El-Sappagh, S., Alonso, J. M., Islam, S. M., Sultan, A. M., & Kwak, K. S. (2021). A multilayer multimodal detection and prediction model based on explainable artificial intelligence for Alzheimer's disease. Scientific Reports, 11(1), 1–26.
44. Lee, E., Jung, S. Y., Hwang, H. J., & Jung, J. (2021). Patient-level cancer prediction models from a nationwide patient cohort: Model development and validation. JMIR Medical Informatics, 9(8), e29807.
45. Figueroa, K. C., Song, B., Sunny, S., Li, S., Gurushanth, K., Mendonca, P., et al. (2022). Interpretable deep learning approach for oral cancer classification using guided attention inference network. Journal of Biomedical Optics, 27(1), 015001.
46. Xu, F., Jiang, L., He, W., Huang, G., Hong, Y., Tang, F., et al. (2021). The clinical value of explainable deep learning for diagnosing fungal keratitis using in vivo confocal microscopy images. Frontiers in Medicine, 8, 797616.
47. Hou, J., & Gao, T. (2021). Explainable DCNN based chest X-ray image analysis and classification for COVID-19 pneumonia detection. Scientific Reports, 11(1), 1–15.
48. Thakoor, K. A., Koorathota, S. C., Hood, D. C., & Sajda, P. (2020). Robust and interpretable convolutional neural networks to detect glaucoma in optical coherence tomography images. IEEE Transactions on Biomedical Engineering, 68(8), 2456–2466.
49. Qian, X., Pei, J., Zheng, H., Xie, X., Yan, L., Zhang, H., et al. (2021). Prospective assessment of breast cancer risk from multimodal multiview ultrasound images via clinically applicable deep learning. Nature Biomedical Engineering, 5(6), 522–532.
50. Binder, A., Bockmayr, M., Hägele, M., Wienert, S., Heim, D., Hellweg, K., et al. (2021). Morphological and molecular breast cancer profiling through explainable machine learning. Nature Machine Intelligence, 3(4), 355–366.
51. Chereda, H., Bleckmann, A., Menck, K., Perera-Bel, J., Stegmaier, P., Auer, F., et al. (2021). Explaining decisions of graph convolutional neural networks: Patient-specific molecular subnetworks responsible for metastasis prediction in breast cancer. Genome Medicine, 13, 1–16.
52. Aeles, J., Horst, F., Lapuschkin, S., Lacourpaille, L., & Hug, F. (2021). Revealing the unique features of each individual's muscle activation signatures. Journal of the Royal Society Interface, 18(174), 20200770.
53. Knapič, S., Malhi, A., Saluja, R., & Främling, K. (2021). Explainable artificial intelligence for human decision support system in the medical domain. Machine Learning and Knowledge Extraction, 3(3), 740–770.
54. Uddin, M. Z., Dysthe, K. K., Følstad, A., & Brandtzaeg, P. B. (2022). Deep learning for prediction of depressive symptoms in a large textual dataset. Neural Computing and Applications, 34(1), 721–744.
55. Uddin, M. Z., & Soylu, A. (2021). Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Scientific Reports, 11(1), 16455.
56. Neves, I., Folgado, D., Santos, S., Barandas, M., Campagner, A., Ronzio, L., et al. (2021). Interpretable heartbeat classification using local model-agnostic explanations on ECGs. Computers in Biology and Medicine, 133, 104393.

Systematic Literature Review in Using Big Data Analytics and XAI Applications in Medical

Behcet Oznacar and Utku Kose

1 Introduction

1.1 The Concept of Big Data

Big data has aroused interest in the last few years, first in the field of engineering and then in other scientific fields. Although big data brings many advantages with it, it has also raised questions about data control. The use of big data, both in practice and in academic research, has become very popular in recent years, and researchers have been using multiple types of big data and several advanced methods to analyze it [15; p. 145]. Digitalization and the increase of online applications in both science and business have also increased the amount of data produced, and big data, which comes with digitalization, has taken its place in our lives as a term. At the same time, this comprehensive data growth has popularized the term big data, describing data resources of various types and sources as well as the analytical techniques involved [19; p. 129]. With access to the Internet and advances in computer science, much regular and irregular data has emerged [12]. Big data commonly refers to datasets that are very high in velocity, volume, and variety [4; p. 443]; it is data that contains great diversity and is rapidly increasing in volume [25]. These are datasets characterized by volume, velocity, and variety; the 5Vs are given in Table 1.

B. Oznacar (✉) Ataturk Faculty of Education, Near East University, Nicosia, Northern Cyprus e-mail: [email protected]
U. Kose Department of Computer Engineering, Suleyman Demirel University, Isparta, Turkey; University of North Dakota, Grand Forks, ND, USA e-mail: [email protected]; [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_9



Table 1 Five features of big data [4, 5; p. 443, 13; p. 642]

Volume: Big data are generated continuously from millions of devices and applications.
Velocity: Data are generated quickly and should be processed easily in order to extract important information and knowledge.
Variety: Big data are created from many sources and in a variety of formats.
Veracity: Refers to the quality aspect, since the data can be collected from multiple sources, which may include low-quality samples.
Verification: Concerns by whom and in what way the obtained data will be viewed, and which of these data should not be disclosed.

Big data has been incorporated into various research areas, and attractive innovations have been introduced [21]. Every individual who uses the Internet contributes to the growth and development of big data, which makes data from observations, research, search engines, blogs, social media, and many other sources meaningful and actionable. Big data provides important contributions to making the right decisions and developing strategies, especially by examining companies' customer behavior. The data at hand are reduced to their simplest, most processable form, comparison methods are used, and the relationships among the obtained data are examined. In this way, it is possible to predict the consequences of decisions in advance [27]; this is where predictive research comes in. With simulations created by changing the locations of various points in the data, the reactions to different decisions can be seen. Since big data is based entirely on the analysis of real data, it reveals useful results in many ways and saves cost and time in every area where estimates can be made. Big data's success is necessarily linked to intelligent management of data selection and use, as well as collective work towards clear rules regarding data quality [3].

Big data analytics is the use of advanced analytical techniques against very large, diverse datasets containing structured, semi-structured, and unstructured data from distinct sources, at sizes ranging from terabytes to zettabytes. In addition to contributing to the scientific world, big data and big data analytics have brought about a number of discussions, one of which is ethics: with the increasing use and reuse of big datasets, new ethical concerns emerge [10; p. 2]. Big data analytics is the process of examining and analyzing big data to support decision making, especially by uncovering hidden patterns, unknown correlations, market trends, customer preferences, and other useful information [7; p. 1]. Big data analytics became necessary because of the abundance of information in different fields such as institutions, organizations, security, education, biology, marketing, astronomy, and medicine. Through better analysis of large data volumes there is the potential to make faster advances in many scientific disciplines and to improve the profitability and success of many enterprises [1]. The field of education should also be included in this potential: big data should be used to discover new approaches and methods in education, to uncover student success and interest, and to reach different dimensions in measurement and evaluation [2, 9].

1.2 Explainable Artificial Intelligence: XAI

Explainable artificial intelligence comprises the processes and methods that help us understand and trust the outcomes produced by machine learning algorithms. As artificial intelligence develops, people have a hard time understanding and following how an algorithm comes to a conclusion. There are many advantages to understanding how an AI-enabled system arrives at a particular result: explainability increases the reliability of the result and allows those affected by a decision to contest or change the outcome. Explainability is thus a basic requirement for the application of artificial intelligence. Just as in big data analytics, a process based on trust should be followed and ethical principles should be taken into account. To implement artificial intelligence responsibly, organizations need to embed ethical rules into their AI applications and procedures by building AI systems based on trust and transparency. An XAI is an artificial intelligence whose actions can be easily understood by humans; it implies dealing with interpretable machine learning (IML) models, whose behavior and predictions are understandable to humans [6; p. 1]. ML has grown large in both research and industrial applications, especially with the success of DL and NNs, so large that its influence and possible after-effects should be taken seriously [22; p. 1]. Sometimes it is necessary to know the reason behind an output; this is very useful especially in areas where detecting and repairing errors matters, and precautions must be taken so that there are no serious consequences. As in the health sciences, detecting errors in other disciplines also raises scientific quality. Artificial intelligence is often implemented as a "black box", which produces an output for a given input without revealing how that output was obtained [16; p. 1]. In this context, the effort to open up this black box is called "explainable artificial intelligence." XAI has emerged as a response to the growing "black box" problem of AI, whereby models and their performance are not comprehensible to humans [8; p. 1284]. XAI is a set of methods developed to make a model more understandable by revealing the relationships between the output and input variables for each observation, so that the results of modeling become more understandable and explainable. XAI models can be classified as follows [8; p. 1289]:

I. Intrinsically transparent models
II. Model-agnostic XAI frameworks

The features of XAI are transparency, justification, informativeness, and uncertainty estimation. In addition, there are many variations among different XAI methods, such as whether a method is global, local, ante-hoc, post-hoc, or surrogate [11; p. 1].


The most common XAI tools can be listed as follows:

I. LIME
II. DeepVis
III. TreeInterpreter
IV. Microsoft InterpretML
V. MindsDB
VI. SHAP, etc.

The aims of a functioning XAI can be summarized as:

I. I understand why
II. I understand why not
III. I know when you succeed
IV. I know when you fail
V. I know when to trust you
VI. I know why you erred

XAI is used in many areas. A machine using explainable artificial intelligence could save medical personnel significant time and allow them to focus on the interpretive work of medicine instead of repetitive tasks; more patients can be seen, and at the same time more consideration can be given to each patient. The potential value is great, but the traceable explanation offered by explainable artificial intelligence is definitely a requirement: explainable AI must not only allow a machine to evaluate the data and come to a conclusion, it must also give a doctor or nurse the decision tree data needed to understand how that conclusion was reached.

1.3 Big Data Analytics and XAI Applications in Medical

Big Data in health refers to large datasets that are captured and stored electronically, routinely or automatically, for the purpose of improving health and health system performance. This information leads us to the concept of medical data, which within this scope includes the patient's gender, progress notes, problems, treatments, vital signs and alerts, medical history, immunizations, laboratory data, and radiology reports. Due to its nature, the healthcare sector is one of the areas where Big Data analytics applications are a significant factor. Patients' health records, and the patterns obtained from the mass storage of these records, can pave the way for revolutionary developments in the field of health. Owing to Big Data, the symptoms of a large number of patients can be analyzed, the general course of diseases accurately determined, and early diagnosis provided. Big Data analysis is also very useful for improving health services and treatment methods, detecting infectious diseases, and monitoring hospital quality. The pharmaceutical industry, which is part of the health sector, is also one of the


Fig. 1 The relationship among deep learning, machine learning, artificial intelligence, and XAI

areas where Big Data applications are heavily utilized. In fact, the development of drugs with the new and complex molecular structures used for the treatment of diseases is possible only by using Big Data analysis techniques. BD has been characterized as the representation of information assets described by such high volume, velocity, and variety as to require specific technology and analytical methods [20; p. 2]. In the medical area, big data resources such as patient reports, hospital information, diagnostic tests, and data from smart medical devices form a source of great value for future analysis [17; p. 1]. The big data obtained in the field of biomedical studies, of course, began with the arrival of comprehensive health services and the increase in patient data; the health sciences were among the first fields to encounter big data. Big data analysis needs the most efficient and advanced data-analytical tools, and the algorithms and tools used for such analysis largely come from the field of ML. AI is often used in family medicine, diagnosis, personalized medicine, findings obtained from different diseases, the discovery of new treatments, and the prediction of a healthier, longer life. Medical areas where AI is expected to be applied to functional use easily include genomic medicine, diagnostic imaging support, diagnosis and treatment support, and drug discovery [23; p. 313]. Two types of AI technologies are used in healthcare applications: machine learning (ML) and deep learning (DL) [14; p. 2]. An example of a deep learning system could be the output "cancer" given the input of an image showing a cancer [24; p. 1]. ML and DL have accomplished impressive development, and the success of artificial intelligence (AI) in the medical field has resulted in significant growth in medical AI applications [26; p. 2] (Fig. 1). AI practices such as deep learning and explainable artificial intelligence have played an important role in healthcare in recent years. Nevertheless, the black-box structure of DL models limits their explainability and thus their full application in medicine [26; p. 4]. The correct use of artificial intelligence in medicine depends not on the model but on the explainability of the model; therefore, there is a need for explainable artificial intelligence.


2 Methodology

This study was conducted as a systematic literature review (SLR); the stages of the SLR method are stated below. Systematic reviews aim to bring together all research addressing a particular question so that they give a balanced and neutral summary of the literature [18; p. 381]. A systematic review is a scientific synthesis of evidence on a clearly stated topic that uses critical methods to identify, describe, and evaluate research on the topic.

2.1 Research Questions

The research questions that the study addresses are:

RQ1. What was the big data and XAI activity between 2016 and 2022?
RQ2. What research topics are covered?
RQ3. What are the limitations of the current research?

Regarding the limitations of the research included in the systematic literature review:

RQ3.1. What is the scope of the research?
RQ3.2. To which databases was the research restricted?

2.2 Research Process

The search process covered journal articles published between 2016 and 2022 and was conducted as a keyword search of PubMed. Searching on big data analytics together with the concept of explainable artificial intelligence returned 466 articles, of which 321 were accessible in full text. In the qualifying stage, 20 articles were selected from among these; the articles were selected because the topics of big data analytics and explainable artificial intelligence have found a place in the health sciences. In addition, a word cloud was created from the keywords used in the selected papers (Fig. 2).
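For reproducibility, a PubMed keyword search of this kind can also be issued programmatically. The sketch below uses Biopython's Entrez client; the e-mail address and the exact query string are illustrative assumptions, not the query actually used in this study.

```python
# Programmatic PubMed search over the 2016-2022 publication window.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # required by NCBI; placeholder
handle = Entrez.esearch(
    db="pubmed",
    term='"big data analytics" AND "explainable artificial intelligence"',
    datetype="pdat", mindate="2016", maxdate="2022",
    retmax=500,
)
record = Entrez.read(handle)
print(record["Count"], "hits;", len(record["IdList"]), "IDs retrieved")
```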

2.3 Data Collection

The data extracted from each study were the source, scope, research topic, country, and year. The collection of data consisted of the processes mentioned above.


Fig. 2 Cloud of keywords in selected papers

2.4 Data Analysis

The data are presented in tabular form. The tables address the following questions:

(a) What are the sources on which the data are based?
(b) What is the scope of the research?
(c) What is the research topic?
(d) To which country is the research connected?
(e) In which year was the research conducted?

3 Discussion and Results

The findings obtained from the research are shown in this section, presented in two tables. After the initial stage of the study selection process, 20 papers were included; opinions were shared on the selection criteria for the articles, and after the data were extracted, their contents were checked and tabulated. The first table includes the article code, year, and country information (Table 2); the second table covers the research topic, scope, and source.


Table 2 The year and country information in big data analytics and XAI applications in medical

Paper code | Year | Country
PC1 | 2022 | China
PC2 | 2022 | China
PC3 | 2018 | China
PC4 | 2022 | China
PC5 | 2022 | China
PC6 | 2016 | USA
PC7 | 2020 | China
PC8 | 2020 | China
PC9 | 2019 | Singapore
PC10 | 2020 | China
PC11 | 2020 | USA
PC12 | 2021 | China
PC13 | 2020 | China
PC14 | 2022 | China
PC15 | 2016 | India
PC16 | 2020 | Iran
PC17 | 2019 | Singapore
PC18 | 2019 | Singapore
PC19 | 2021 | China
PC20 | 2022 | China

3.1 Context Results

As can be seen in Table 3, this part of the research presents the contents of the articles included in the review; the table consists of subject, scope, and source information, and the articles are coded with the prefix "PC". Considering the research topics of the selected articles, themes such as drug delivery (PC1); analyses of complication rates and diagnosis; diseases such as stress disorders, liver damage, and respiratory tract conditions (PC6, PC7, PC12); artificial intelligence and the health sciences; older people; health information; business processes; healthcare ethics (PC17, PC18); the environment (PC20); and human personality were found. The research results showed that the selected articles were collected under the concept of big data analytics, while explainable artificial intelligence appeared under the headings of deep learning or machine learning. It is understood that the journal BioMed Research International has published studies on machine learning and big data analytics within this scope; the Wiley Online Library sources cover big data and deep learning; and the journal Computational Intelligence and Neuroscience has also shown itself within the scope of big data. It should not be surprising to find such a wide range in the use, analysis, and guidance of big data in the health sciences.


Table 3 The research topic, scope and source of papers in big data analytics and XAI applications in medical

Paper code | Topic | Scope | Source
PC1 | Nano drug-targeted delivery systems | Big Data | BioMed Research International
PC2 | Fine-scale population spatialization | Big Data | Scientific Data
PC3 | Parallel graph-theoretical analysis package | Big Data | Wiley Online Library
PC4 | Influence of complication rates on clinical PKP surgery | Big Data | Journal of Healthcare Engineering
PC5 | Fault diagnosis strategy for microgrid | Big Data | Computational Intelligence and Neuroscience
PC6 | Oxidative stress | Big Data | Environmental Health Perspectives
PC7 | Liver injury | Machine Learning | BioMed Research International
PC8 | Predicting hospital emergency room visits for respiratory diseases | Machine Learning | Environmental Science and Pollution Research
PC9 | AI-assisted decision-making | Machine Learning and AI | Asian Bioethics Review
PC10 | Older people using real-world big data mining | Big Data | Age and Ageing
PC11 | Electronic health record data | XAI | Journal of the American Medical Informatics Association
PC12 | Pulmonary ground-glass nodules by sequential computed tomography imaging | Deep Learning | Wiley Online Library
PC13 | Rice genome reannotation | Big Data | Genomics Proteomics Bioinformatics
PC14 | SMEs' competitive performance | Big Data | Information Systems Frontiers
PC15 | Optimizing work processes | Big Data | Nursing Informatics
PC16 | Functional networks | Big Data and Deep Learning | Frontiers in Neuroscience
PC17 | An ethics structure for Big Data | Big Data | Asian Bioethics Review
PC18 | Ethics structure for Big Data | Big Data | Asian Bioethics Review
PC19 | Music emotion analysis | Big Data | Computational Intelligence and Neuroscience
PC20 | Human personality and geographical environment | Machine Learning | International Journal of Environmental Research and Public Health

3.2 Evaluation of the Studies

This section first addresses PC2 and PC9. These studies stated that a toolkit for fast and scalable computational solutions is missing, especially in work involving big data analysis. The focus has been on computability and transparency in the transformation of artificial intelligence-based systems, and it was emphasized that artificial intelligence-supported decision systems must be adapted to clinical applications in an ethical and responsible manner. Secondly, the prediction of hospital emergency room visits for respiratory diseases, which is of great importance for public health, brought up the use of machine learning methods; it has been stated that such methods are promising for predictions of this kind, given their strong short-term forecasting capabilities (PC8).

4 Conclusions

The answers to the research questions are discussed in this section.

What was the big data and XAI activity between 2016 and 2022? On a yearly basis, the article search on big data and explainable artificial intelligence yielded 2 papers in 2016, 1 in 2018, 3 in 2019, 6 in 2020, 2 in 2021, and 6 in 2022.

What research topics are covered? The scope of the selected articles covers nano-targeted drug delivery systems; a parallel graph-theoretical analysis package; the impact of surgery and complication rates in clinical settings; diagnostic strategies; oxidative stress; liver damage; hospital emergency room visits for respiratory diseases; artificial intelligence-aided decision making; older people; electronic health record data; imaging of pulmonary ground-glass nodules by computed tomography; the rice genome; functional networks; big data analytics and ethics; and the problem of human personality and the geographical environment.

What are the limitations of the current research? The research has a fairly wide subject area. It aimed to concentrate on a specific area and at the same time to categorize that area; among the articles, studies in which the use of big data analytics and explainable artificial intelligence come to the fore were selected. It is also limited to the last six years.


What is the scope of the research? Considering the scope of the selected studies, big data analytics is the most common working area.

To which databases was the research restricted? The research is limited to the PubMed database.

References

1. Agrawal, D., Bernstein, P., Bertino, E., Davidson, S., Dayal, U., Franklin, M., . . . & Widom, J. (2011). Challenges and opportunities with Big Data 2011-1.
2. Ashrafimoghari, V. (2022). Big Data and education: Using big data analytics in language learning. arXiv preprint arXiv:2207.10572.
3. Buhl, H. U., Röglinger, M., Moser, F., & Heidemann, J. (2013). Big data. Business & Information Systems Engineering, 5(2), 65–69.
4. Cao, G., Tian, N., & Blankson, C. (2022). Big data, marketing analytics, and firm marketing capabilities. Journal of Computer Information Systems, 62(3), 442–451.
5. Deepa, N., Pham, Q. V., Nguyen, D. C., Bhattacharya, S., Prabadevi, B., Gadekallu, T. R., et al. (2022). A survey on blockchain for big data: Approaches, opportunities, and future directions. Future Generation Computer Systems, 131, 209–226.
6. Duval, A. (2019). Explainable artificial intelligence (XAI). MA4K9 Scholarly Report, Mathematics Institute, The University of Warwick, 1–53.
7. Gandomi, A. H., Chen, F., & Abualigah, L. (2022). Machine learning technologies for big data analytics. Electronics, 11(3), 421.
8. Gerlings, J., Shollo, A., & Constantiou, I. (2020). Reviewing the need for explainable artificial intelligence (xAI). arXiv preprint arXiv:2012.01007, 1284–1293.
9. Gu, X. (2022). Evaluation of teaching quality on IP environment driven by multiple values theory based on Big Data. Journal of Environmental and Public Health, 2022, 1–11.
10. Hosseini, M., Wieczorek, M., & Gordijn, B. (2022). Ethical issues in social science research employing Big Data. Science and Engineering Ethics, 28(3), 1–21.
11. Islam, S. R., Eberle, W., Ghafoor, S. K., & Ahmed, M. (2021). Explainable artificial intelligence approaches: A survey. arXiv preprint arXiv:2101.09429.
12. Khan, M. A., & Khojah, M. (2022). Artificial intelligence and big data: The advent of new pedagogy in the adaptive e-learning system in the higher educational institutions of Saudi Arabia. Education Research International.
13. Li, J., Ye, Z., & Zhang, C. (2022). Study on the interaction between big data and artificial intelligence. Systems Research and Behavioral Science, 39(3), 641–648.
14. Loh, H. W., Ooi, C. P., Seoni, S., Barua, P. D., Molinari, F., & Acharya, U. R. (2022). Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Computer Methods and Programs in Biomedicine, 226, 107161.
15. Lv, H., Shi, S., & Gursoy, D. (2022). A look back and a leap forward: A review and synthesis of big data and artificial intelligence literature in hospitality and tourism. Journal of Hospitality Marketing & Management, 31(2), 145–175.
16. Mehta, H., & Passi, K. (2022). Social media hate speech detection using explainable artificial intelligence (XAI). Algorithms, 15(8), 291.
17. Namamula, L. R., & Chaytor, D. (2022). Effective ensemble learning approach for large-scale medical data analytics. International Journal of System Assurance Engineering and Management (online first/in press). https://doi.org/10.1007/s13198-021-01552-7.


18. Nightingale, A. (2009). A guide to systematic literature reviews. Surgery (Oxford), 27(9), 381–384.
19. Oesterreich, T. D., Anton, E., Teuteberg, F., & Dwivedi, Y. K. (2022). The role of the social and technical factors in creating business value from big data analytics: A meta-analysis. Journal of Business Research, 153, 128–149.
20. Pablo, R. G. J., Roberto, D. P., Victor, S. U., Isabel, G. R., Paul, C., & Elizabeth, O. R. (2022). Big data in the healthcare system: A synergy with artificial intelligence and blockchain technology. Journal of Integrative Bioinformatics, 19(1), 1–16.
21. Tang, L., Li, J., Du, H., Li, L., Wu, J., & Wang, S. (2022). Big data in forecasting research: A literature review. Big Data Research, 27, 100289.
22. Tjoa, E., & Guan, C. (2020). A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Transactions on Neural Networks and Learning Systems, 32(11), 4793–4813.
23. Tsuneki, M. (2022). Deep learning models in medical image analysis. Journal of Oral Biosciences, 64, 312–320.
24. Van der Velden, B. H., Kuijf, H. J., Gilhuijs, K. G., & Viergever, M. A. (2022). Explainable artificial intelligence (XAI) in deep learning-based medical image analysis. Medical Image Analysis, 102470, 1–21.
25. Wu, R. (2022). Path of preschool education personnel training under the background of Big Data. International Journal of Educational Innovation and Science, 3(1), 1–9.
26. Zhang, Y., Weng, Y., & Lund, J. (2022). Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics, 12(2), 237, 1–18.
27. Zhang, R., Zhou, J., Hai, T., Zhang, S., Iwendi, M., Biamba, C., & Anumbe, N. (2022). Quality assurance awareness in higher education in China: Big data challenges. Journal of Cloud Computing, 11(1), 1–9.

Using Explainable Artificial Intelligence in Drug Discovery: A Theoretical Research

Bekir Aksoy, Mehmet Yücel, and Nergiz Aydin

1 Introduction

Drug discovery is a process that starts with the ability to modify a compound so that it affects a disease, and it takes place in a series of stages. Roughly one in 10,000 hits ends up being discovered and delivered as a drug, in a process that takes around 14 years and costs around 2 billion dollars. This can be called a challenging process in terms of its long duration and high cost. The priority in drug discovery is to be fast and effective; however, the process shows weakness in terms of speed. In this respect, it is very important to address and investigate the issue and find a solution to the problem so that the treatment of diseases can proceed on time. In this book chapter, drug discovery and its integration with artificial intelligence technologies, one of today's advanced technologies, are examined. Artificial intelligence technology, as a powerful data mining tool built on the development of machine learning theory and the accumulation of pharmacological data, is frequently used in various areas of drug design, such as virtual screening, activity prediction, QSAR analysis, de novo drug design, and in silico evaluation of absorption, distribution, metabolism, excretion, and toxicity (ADME/T) properties [1].

B. Aksoy (✉) · M. Yücel · N. Aydin Faculty of Technology, Department of Mechatronics Engineering, Isparta University of Applied Sciences, Isparta, Turkey e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_10


1.1 A General Introduction to the Drug Discovery Sector

Countries primarily need healthy societies in order to develop economically and socially. Drug discovery and production, one of the most important factors in the creation of a healthy society, should be carried out in accordance with the rules, and the timely delivery of needed medicines to all living beings is one of the primary responsibilities of states. The ability of a society to meet its health needs is directly related to its economic competence: the budget allocated for health in developed countries is much higher than in developing countries. Chronic diseases have increased with the increasing average life expectancy in the world, and as a result of ever-changing and diversifying types of diseases, the pharmaceutical industry allocates more time and budget to the discovery and production of innovative drugs. The pharmaceutical industry is a global industry with the highest research and development (R&D) potential. When R&D in the pharmaceutical industry is compared with research in other industries, it is seen to have different characteristics; one of the main features that makes the pharmaceutical industry special and different is that analyses involving human participants are carried out during the clinical research phase. R&D in the pharmaceutical industry is a long and costly process that includes studies such as finding new molecules, determining new uses for existing molecules, and redesigning a drug that has side effects. Today, billions of dollars are spent on the discovery of a molecule, and it takes more than a decade to bring it to market.

2 Stages of Drug Discovery

Drug discovery phases are carried out according to the following process steps, together with analysis. Drug discovery is a complex and costly process in which drugs are designed or discovered, and it involves fields such as chemistry, biology, pharmacology, and the clinical sciences. As shown in Fig. 1, drug discovery consists of five main stages [2]. In the target identification phase, many techniques are used to identify a target for a specific disease and determine its function. The target validation phase involves selecting, from among many candidates, one or more targets that are likely to be useful in the development of new treatments for the identified disease. In the lead compound identification phase, scientists must compare known substances with new compounds to determine the probability of success. The lead compound optimization phase then determines the best compound based on the results from the previous phase, and in the final phase the discovery is brought to the market [3]. Drugs often act on cellular or genetic chemicals in the body known as targets, which are believed to be associated with various diseases. In the target identification phase, a number of techniques are used to locate and isolate a target, learn more about its functions, and establish how these functions affect diseases.

Using Explainable Artificial Intelligence in Drug Discovery: A. . .

183

Fig. 1 Stages of drug discovery [2]

validation phase, scientists need to compare each drug target by analyzing its relationship with a specific association to select targets that could be useful in the development of new drugs. Tests are performed to verify the interactions between the drug target and a desired change in the behavior of diseased cells. In the final step, research scientists identify compounds that act on the target [4, 5].

2.1 Using Artificial Intelligence in Drug Discovery Phases

The drug discovery process ranges from screening and analyzing the existing literature to testing the ways in which potential drugs interact with targets. The preclinical development phase of drug discovery involves testing potential drug targets in animal subjects. Using artificial intelligence at this stage can help trials run smoothly: it is thought to enable researchers to predict more quickly and successfully how a drug might interact in animal subjects. Once a drug has gone through preclinical development and received approval from the United States Food and Drug Administration (FDA or USFDA), researchers begin testing it on humans. In general, this is a four-step process and is generally recognized as the longest and most expensive part of the manufacturing phase [4]. A representative set of drug discovery steps is shown in Fig. 2 [6].

Fig. 2 Drug discovery stages [6]

3 What Is Explainable Artificial Intelligence?

Explainable Artificial Intelligence (XAI) can be defined as the set of techniques and methods that enable the results produced by artificial intelligence to be explained at a level that humans can understand. Explainable AI has recently been discussed in the literature alongside the concepts of interpretability and explainability. Interpretability is the ability of a given model to make sense to a human observer; this property is also referred to as transparency. In contrast, explainability can be seen as an active property of a model and refers to any action or procedure performed by the model in order to explain or elaborate its internal functions. Figure 3 shows the distribution of research on explainable artificial intelligence by year [7]. When Fig. 3 is analyzed, it is observed that research on explainable artificial intelligence has increased. One of the issues preventing the establishment of common ground for explainable artificial intelligence is the interchangeable use of the concepts of interpretability and explainability in the literature. It is very important not to confuse the frequently used concepts related to explainable artificial intelligence, and to use them correctly, in order to understand the subject. In Table 1, frequently encountered concepts related to explainable artificial intelligence are given with their definitions [8].

Fig. 3 Distribution of academic studies on explainable artificial intelligence by years [7]

Table 1 Concepts used in explainable artificial intelligence

Transferability: The capacity of the explainability method to transfer prior knowledge to unfamiliar situations
Understandability: Quality of the language used, for clarity
Completeness: The degree to which an inferential system is covered by its explanations
Accuracy: The capacity of the explainability method to select truly relevant features
Reversibility: The ability of end-users to restore the system to its original state after it has been subjected to a malicious action that worsens its predictions
Robustness: Stability of the explainability model under small perturbations of the input
Plausibility: The capacity of the method to persuade users to perform certain actions
Causality: Potential of the method to clarify the relationship between input and output
Consistency: Success in providing similar explanations for similar or neighboring inputs
Simplicity: The ability to select only those reasons that are necessary and sufficient to explain the model's prediction
Defensibility: The ability of a subject-matter expert to assess whether the model is compatible with domain knowledge
Transparency: The capacity to explain how the system works even when it behaves in an unexpected way
Interpretability: The capacity to provide or produce the meaning of an abstract concept

Explainable AI has emerged due to the low explainability of high-performance results in AI. In particular, the costs of bad outcomes caused by wrong decision-making have led to limited use of AI. These costs can be financially significant in the finance sector and of vital importance in health and law.

The need for explainable artificial intelligence varies in proportion to the needs of the field in which it is used: the application of results in critical areas such as finance, law and medicine increases the need for explainability [9]. In order to increase accuracy in artificial intelligence applications, models with a more complex structure, such as neural networks, have started to be used.

Fig. 4 The inverse relationship between interpretability and accuracy in artificial intelligence algorithms [10]

Figure 4 shows that while complex models increase accuracy, they are insufficient to explain the reasons for their results [10]. For this reason, they turn into structures called black boxes that reveal very little information about their content, and people do not adopt techniques that cannot be directly interpreted, followed and trusted [7]. Thus, high-accuracy models are not sufficient on their own; the importance of explaining the causes of results has increased, and interest in the study of explainable artificial intelligence has grown over time. As can be seen from the graph in Fig. 4, early AI systems were easily interpretable, but in recent years, with the introduction of more complex structures such as Deep Neural Networks (DNNs), interpretability has declined. The main reason for this decline is that DNNs contain hundreds of layers and millions of parameters, which makes them complex black-box models [11].
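To make this trade-off concrete, the short sketch below, a hypothetical illustration using scikit-learn rather than anything from the cited studies, contrasts a shallow decision tree, whose decision rules can be printed verbatim, with a multilayer perceptron whose learned weights offer no comparable human-readable account of a prediction.

# Minimal sketch of the interpretability-accuracy trade-off using
# scikit-learn; the dataset and model settings are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.neural_network import MLPClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Transparent model: every prediction can be traced to explicit rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print(export_text(tree))                  # human-readable decision rules
print("tree accuracy:", tree.score(X_te, y_te))

# Black-box model: often more accurate, but its thousands of weights
# give no direct account of why a particular prediction was made.
mlp = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)
print("MLP accuracy:", mlp.score(X_te, y_te))

On many tabular tasks the two accuracies are close, but the key point here is qualitative: the tree's reasoning can be handed to a clinician as rules, whereas the network's cannot.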

3.1 The Importance of Explainable Artificial Intelligence in Drug Discovery

The discovery of a new drug is a challenging, expensive and lengthy process, so one of the main goals is to simplify the drug discovery steps, reduce costs and speed up discovery. While the drug discovery process relies on traditional methods, it is very difficult to achieve these goals; for this reason, methods beyond the traditional ones have emerged as a research topic. Today, artificial intelligence is applied in many fields, and it is predicted that its application areas will increase in the future [12]. Thus, the idea has emerged that artificial intelligence can be a solution to the problems encountered in the drug discovery phases. The use of artificial intelligence in the preclinical development phase of drug discovery, where potential drug targets are tested in animal subjects, is thought to help trials run smoothly. According to Insider Intelligence's Artificial Intelligence in Drug Discovery and Development report, the use of artificial intelligence can reduce drug discovery costs for companies by up to 70% [1]. Given that artificial intelligence systems work quickly, and given the high costs mentioned above, hopes have risen that the use of artificial intelligence in drug discovery can alleviate the difficulties encountered in terms of cost and speed, and studies aiming to advance drug discovery with artificial intelligence have multiplied. However, although the use of artificial intelligence is promising, certain problems have prevented the models from being used in real life. Among these drawbacks is the demand for models with very high accuracy in a vital domain such as the health sector. Complexity increases in models with high accuracy, and as the complexity increases, the results of artificial intelligence turn into a black box. In such a critical field, unanswered questions are not acceptable, and this creates hesitation about the use of artificial intelligence. As a solution to the black-box problem, the use of explainable artificial intelligence in drug discovery is considered an opportunity: it appears to be a safer way to overcome the weaknesses that arise where the desired results cannot be obtained from conventional artificial intelligence.

4 Academic Studies in Drug Discovery

It is thought that the use of explainable artificial intelligence can help reduce the inadequacies that arise from the use of artificial intelligence in drug discovery, and that explainable artificial intelligence can thus help scientists navigate the scientific process [13]. When the academic literature is examined, Jiménez-Luna et al. discussed the use of explainable artificial intelligence in drug discovery. They underlined that it is quite difficult to express infallible mathematical models of drug effect, and the corresponding explanations, as equations. In this context, it was emphasized that explainable artificial intelligence has the potential to augment human intuition and skills in designing new bioactive compounds with desired properties [14]. In another study, Harren et al. expressed the view that it is quite difficult to understand which structural features are important for activity in intrinsically black-box models. They applied explainable artificial intelligence methods to study and compare well-established structure-activity relationships, available X-ray crystal structures and lead optimization datasets, and they stated that DNN models can be combined with some powerful interpretation methods to obtain easy-to-understand and comprehensive interpretations [15]. In another study, Jiménez-Luna et al. tried to improve modeling transparency for rational molecular design by applying the integrated-gradients explainable artificial intelligence approach to graph neural network models. The explainable AI methods highlighted molecular features and structural elements that are in agreement with known pharmacophore motifs, accurately identified feature differences, and provided insights into non-specific ligand-target interactions [16]. In their study, Espinoza et al. developed a new model to respond to the need for compounds with novel targets or mechanisms of action (MOA) in the face of rising antimicrobial resistance. The model is based on a human-interpretable artificial intelligence classification framework that can be used to build accurate and flexible prediction models. They performed testing with 41 known antibiotic compounds, 9 crude extracts and darobactin, a compound with novel MOA activity, and developed an algorithm that allows a feature set to be selected for building high-performance explainable machine learning models [17]. In their study, Al-Taie et al. developed a method for patient classification and drug repositioning using explainable machine learning. In their method, they used contrast pattern mining and network analysis to discover homogeneous subgroups within the disease population. The phenotypic and genotypic data of patients were used with a heterogeneous knowledge base, as this provides a versatile perspective for finding new indications for drugs beyond their original use. The developed method found 130 statistically significant colorectal cancer subgroups [18].
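Since the integrated-gradients technique mentioned above underlies one of these studies, a minimal NumPy sketch of its core computation follows. The toy logistic "activity model" and its three input descriptors are assumptions made purely for illustration; they are not the graph neural network implementation of the cited work.

# Minimal NumPy sketch of integrated gradients on a toy differentiable
# model; the logistic model here is a stand-in for the graph networks
# used in the cited studies.
import numpy as np

w = np.array([0.8, -1.2, 0.5])             # toy model weights

def grad_f(x):
    # Gradient of f(x) = sigmoid(w . x) with respect to x.
    s = 1.0 / (1.0 + np.exp(-w @ x))
    return s * (1.0 - s) * w

def integrated_gradients(x, baseline, steps=100):
    # IG_i = (x_i - x'_i) * average of dF/dx_i along the straight-line
    # path from the baseline x' to the input x (Riemann approximation).
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

x = np.array([1.0, 0.5, 2.0])              # e.g., three molecular descriptors
print(integrated_gradients(x, baseline=np.zeros(3)))  # per-feature attributions

The attributions sum approximately to the difference between the model output at the input and at the baseline, which is what makes the method attractive for attributing an activity prediction to individual molecular features.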

5 Conclusions

Today, with the rapid development of technology, artificial intelligence is applied in many fields such as engineering, the defense industry and education. One of its important areas of use is the health sector, where artificial intelligence methods are frequently used in the diagnosis and treatment of diseases. Another area of the health sector concerns the drugs used in the treatment of diseases. The low speed and high cost of the drug discovery stages have recently created difficulties for drug discovery.


The use of artificial intelligence methods has reduced the difficulties in drug discovery. However, the inability to interpret the highly accurate results obtained from artificial intelligence models has limited their use in a field where human life is at stake, such as the health sector. To overcome these limitations, explainable artificial intelligence methods have started to be used. In this study, a review of the academic literature on explainable artificial intelligence methods used in drug discovery was conducted. The reviewed studies indicate that explainable artificial intelligence in particular has significantly reduced the costs of drug discovery and has thus enabled a faster process in the treatment of diseases.

References

1. Çelik, İ. N., Arslan, F. K., Ramazan, T. U. N. Ç., & Yildiz, İ. (2021). İlaç Keşfi ve Geliştirilmesinde Yapay Zekâ. Journal of Faculty of Pharmacy of Ankara University, 45(2), 400–427.
2. Duman, Y. E. (2014). Biyoinformatik ve İlaç Keşfi. academica.eu, 1–6.
3. Ratti, E., & Trist, D. (2001). Continuing evolution of the drug discovery process in the pharmaceutical industry. Pure and Applied Chemistry, 73(1), 67–75.
4. Uysal, İ., & Köse, U. (2022). İlaç Keşfi ve Yapay Zeka. In Yapay Zekanın Değiştirdiği Dinamikler. Eğitim Yayınevi, 19–35.
5. Katsila, T., Spyroulias, G. A., Patrinos, G. P., & Matsoukas, M. T. (2016). Computational approaches in target identification and drug discovery. Computational and Structural Biotechnology Journal, 14, 177–184.
6. Chan, H. S., Shan, H., Dahoun, T., Vogel, H., & Yuan, S. (2019). Advancing drug discovery via artificial intelligence. Trends in Pharmacological Sciences, 40(8), 592–604.
7. Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115.
8. Terzi, R. (2021). Sağlık Sektöründe Açıklanabilir Yapay Zekâ. In Yapay Zekâ ve Büyük Veri Çalışmaları, Siber Güvenlik ve Mahremiyet. Nobel Yayınevi, 157–175.
9. Pehlivanlı, A. Ç., & Deliloğlu, R. A. S. (2021). Hibrit Açıklanabilir Yapay Zeka Tasarımı ve LIME Uygulaması. Avrupa Bilim ve Teknoloji Dergisi, (27), 228–236.
10. Sağıroğlu, Ş., & Demirezen, M. U. (Eds.). (2022). Yorumlanabilir ve Açıklanabilir Yapay Zeka ve Güncel Konular. Nobel Yayınevi, (4), 261.
11. Castelvecchi, D. (2016). Can we open the black box of AI? Nature News, 538(7623), 20.
12. Karaduman, T. (2019). Yapay zekâ uygulama alanları. Gazi Üniversitesi, Bilişim Enstitüsü, Adli Bilişim ABD.
13. Askr, H., Elgeldawi, E., Aboul Ella, H., Elshaier, Y. A., Gomaa, M. M., & Hassanien, A. E. (2023). Deep learning in drug discovery: An integrative review and future challenges. Artificial Intelligence Review, 56, 5975–6037.
14. Jiménez-Luna, J., Grisoni, F., & Schneider, G. (2020). Drug discovery with explainable artificial intelligence. Nature Machine Intelligence, 2(10), 573–584.
15. Harren, T., Matter, H., Hessler, G., Rarey, M., & Grebner, C. (2022). Interpretation of structure–activity relationships in real-world drug design data sets using explainable artificial intelligence. Journal of Chemical Information and Modeling, 62(3), 447–462.
16. Jiménez-Luna, J., Skalic, M., Weskamp, N., & Schneider, G. (2021). Coloring molecules with explainable artificial intelligence for preclinical relevance assessment. Journal of Chemical Information and Modeling, 61(3), 1083–1094.


17. Espinoza, J. L., Dupont, C. L., O'Rourke, A., Beyhan, S., Morales, P., Spoering, A., et al. (2021). Predicting antimicrobial mechanism-of-action from transcriptomes: A generalizable explainable artificial intelligence approach. PLOS Computational Biology, 17(3), e1008857.
18. Al-Taie, Z., Liu, D., Mitchem, J. B., Papageorgiou, C., Kaifi, J. T., Warren, W. C., & Shyu, C. R. (2021). Explainable artificial intelligence in high-throughput drug repositioning for subgroup stratifications with interventionable potential. Journal of Biomedical Informatics, 118, 103792.

Application of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemics

Joseph Bamidele Awotunde, Rasheed Gbenga Jimoh, Abidemi Emmanuel Adeniyi, Emmanuel Femi Ayo, Gbemisola Janet Ajamu, and Dayo Reuben Aremu

1 Introduction

The coronavirus disease (COVID-19) epidemic that started in December 2019 in Wuhan, China, has had a devastating impact on the entire biosphere. The pandemic is one of the most rapidly spreading contagious viruses in recent years, posing a new threat to the global healthcare system. As of 18 August 2021, the estimated infections globally had reached 209,670,370 confirmed cases, including 4,399,468 deaths, with 17,343,556 active cases across 213 countries, and these numbers have been increasing steadily every hour in numerous countries. Notwithstanding the astonishing speed with which vaccinations against COVID-19 have been developed and aggressive global mass vaccination efforts, the emergence of new SARS-CoV-2 mutations could jeopardize the tremendous success achieved thus far in controlling the spread of this viral infection through immunization efforts. In this global health catastrophe, medical experts and researchers are looking for new tools to hunt for and stop the spread of the pandemic [1].

In order to provide appropriate patient care, treatment, and isolation for disease containment, quick surveillance of the virus is critical, not just for medical practitioners but also for community health in general [2]. Advanced computational technologies such as 5G, cloud computing, edge computing, fog computing, Artificial Intelligence (AI), and the Internet of Things (IoT) are the recent smart technologies that can be applied to fight the major clinical challenges associated with the COVID-19 pandemic [1, 3].

The IoT connects physical items to the Internet, allowing data to be sent or retrieved. Sensors, machine learning (ML), real-time measurement, and embedded systems have all grown into the IoT notion, along with the concept of the smart hospital and other Internet-controlled fixed or wireless devices. Smart gadgets can collect data in everyday life and communicate it to complete a desired objective. Smart cities, vehicles, computers, surround-sound speakers, houses, and integrated medical devices are just a few examples of what IoT applications connect. For IoT adoption in the medical world, sensing devices, hospital instruments, machine intelligence, diagnostics, and sophisticated imaging devices are required. These technologies boost efficiency as well as life satisfaction in both traditional and new sectors and societies.

The core idea underlying the Cognitive Internet of Things (CIoT) is to dynamically assign radio channels for data transmission between densely linked things. The CIoT concept is well suited to this pandemic, since it is critical to connect and track every individual across a large network. Due to the worldwide lockdown and limitations on movement and crowds, most operations, such as e-commerce, e-learning, smart metering, e-surveillance, e-health, and telecommunication solutions, are carried out remotely. These operations are made possible via wireless communication and networking, both of which require a large amount of bandwidth. The huge CIoT network transmits small packets by opportunistically scanning idle channels, conserving bandwidth and effectively using spectrum resources.

Early AI systems were easy to understand, but in recent years it has been observed that Deep Neural Networks (DNNs) have moved away from transparent decision processes. Deep Learning (DL) models such as DNNs have achieved empirical success due to a combination of superior learning techniques and their huge parameter space. DNNs are difficult black boxes to interpret, since they include hundreds of layers and millions of attributes [4, 5]. The pursuit of a complete understanding of how a model works is the polar opposite of a black-box model [6]. As black-box ML algorithms are progressively being used in critical situations to make consequential forecasts, some AI proponents are advocating for greater openness [7]. The danger lies in making and enforcing decisions that are not reasonable or legitimate, or that simply do not permit detailed justification of the actions taken [8]. In targeted therapy, explanations sustaining a model's output are crucial, as professionals require considerably more knowledge than a modest binary forecast to support their conclusions [9].
Other application areas include self-driving automobiles, transportation, security, and finance. Humans are notoriously reluctant to adopt strategies they cannot follow. However, several contributions claim to have accomplished designs that are interpretable and procedures that facilitate explainability. Gunning's formulation in [7] may be a useful starting point for the notion of Explainable Artificial Intelligence (XAI) and for explaining this lack of consensus: XAI will create a collection of ML approaches that will help consumers to comprehend, believe in, and effectively manage the evolving era of artificially intelligent associates. This formulation combines two notions that must be addressed ahead of time (understanding and trust). However, additional factors that drive the need for explainable AI models, such as trust, justice, causation, and ease of transferability, are not considered [10–12]. As illustrated by the definition above, a detailed, full understanding of explainability in AI can occasionally escape our grasp. A broader re-formulation of this concept is "an artificial intelligence that explains the operations by which it produces solutions." This still does not fully capture the definition; crucial factors such as the objective are left out. To expand on completeness, a description of explanation is required first: an explanation consists of the details or reasons given to make something simple or easy to understand [13]. In the context of an ML model, this might be reshaped as the details or justifications provided by the model to ensure that its functioning is clear and easy to understand. Medical work necessitates a high level of precision and accuracy, and explanations of system parameters, projections and opinions are necessary for consistency. This necessitates improved interpretability, which in turn necessitates an understanding of the algorithms' processes. Regrettably, the black-box nature of DL remains a mystery, and many system decisions are still poorly understood. The models addressing this challenge fall under the XAI category, which is broadly recognized as a critical requirement for the practical use of AI models, especially during the COVID-19 epidemic. However, the ML and DL algorithms have two significant drawbacks: first, training necessitates a huge COVID-19 dataset with a diversity of characteristics, which is difficult to obtain; secondly, deep learning discoveries demand ethics approval and explanation by the health care industry, as well as other stakeholders, in order to be acknowledged. Therefore, this chapter discusses the relevance and potential of XAI-enabled CIoT in the context of the current epidemic. The chapter also covers the difficulties that XAI-enabled CIoT faces in the healthcare system, and it proposes an XAI-enabled structure upon which COVID-19 diagnosis can be built.


2 Applications of Explainable Artificial Intelligence in COVID-19 Pandemics

Recent applications of ML in the medical area have gained prominence, and neural network performance is becoming comparable to that of medical specialists [14]. Due to the need for a relatively precise and rapid diagnosis procedure, AI can play a critical role in optimizing the discovery of COVID-19 cases [15, 16]. Because AI solutions are usually built on complicated, so-called black-box models [17], it can be challenging to determine what factors influence a model's forecast. This deficiency of interpretability could be harmful, since it could lead to skewed outcomes and choices in real-world diagnostic procedures [18]. Recent progress in the field of XAI has highlighted the necessity of model clarifications in avoiding incorrect predictions [19, 20]. Surprisingly, there are still few results in the field of COVID-19 prediction and diagnosis involving the application of XAI for better interpretation of the analysis.

The phrase "interpretability" rather than "explainability" is frequently used in the ML literature. However, according to [21, 22], interpretability is unsatisfactory because it does not handle all of the challenges that might occur when attempting to comprehend "black-box" methods. Explainability goes beyond simple interpretability in order to gain user trust and obtain real knowledge about the causes, rationale, and judgements of "black-box" procedures. Although explainable systems are interpretable by default, the converse is not necessarily the case: transparent models can be simply explained, whereas opaque models require post-hoc procedures to be understood. The majority of current XAI research is still focused on sensitivity analysis [23], layer-by-layer feature relevance propagation and attribution [24], LIME's local pseudo-explanations [25], Shapley additive explanations based on game theory [26], gradient-based localisation such as Grad-CAM [27], and surrogate models (Fig. 1).

Fig. 1 The ontology of explainable intelligence techniques

LIME is a model-independent method for producing locally optimal ML model descriptions [25]. To understand the local behavior of forecasts from a global "black-box" model, LIME trains an interpretable surrogate model. For image classification, an input picture is broken down into regions of consecutive superpixels (i.e., picture entities), after which a weighted localised model is trained on a fresh collection of perturbed instances of the actual picture (certain superpixels are grayscaled). The idea is that by modifying human-understandable parts of the input data (spatial entities) and observing the differences between those adjustments and the actual facts, it is possible to determine what aspect of the input contributes to each given class. However, if the parameters that regulate the perturbations are chosen exclusively on the basis of heuristics, these explanations are not always instructive or believable. Despite their benefits, the methods described above do not offer clear, intelligible explanations. They merely touch the black box's cover, providing post-hoc suggestions regarding attributes (feature contributions) or locations within a picture. This is vastly different from how humans think and make decisions, form associations, compare and contrast, and draw analogies that may be communicated in litigation or to another specialist (for example, in health, economics, law, or another field). The approaches described above do not provide answers to the underlying questions about model structure and the properties that are relevant to the actual nature of the problem, and they entirely disregard logic. In [28], the authors suggested a fundamentally different approach to explainability, treating it as a humanistic (human-centered) phenomenon rather than a statistical one. Humans do, in fact, compare items (such as photographs, songs, and movies) as a whole, rather than per feature or pixel. People correlate new data with previously learnt and consolidated prototypes using similarity, whereas statistics is based on averages [29, 30].

Because of the prevalence and relevance of algorithms in applications, regulators and government agencies have developed policies to ensure that algorithmic decision-making is held to a high standard. Although the scope of this right is debatable, the debate has emphasized the need for automated methods to eliminate inequity and prejudice in decision-making. In addition, in safety-critical tasks, they must meet the standards for safety and security. As a result, there has been a recent surge in interest in XAI models in a variety of fields. XAI has recently been used in a variety of important domain applications, including medicine [31], automated driving [32], and the criminal justice system [33]. There is an increasing demand for algorithms in the medical field, most significantly during the COVID-19 epidemic. AI applications, however, must not only do well in regard to categorization metrics and accuracy; especially for clinical decision-making, they must also be trustworthy, clear, interpretable, and explainable [31].
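As a concrete illustration of the LIME procedure described above, the following minimal sketch fits a black-box classifier and asks LIME for a local, per-feature explanation of one prediction. It assumes the open-source lime and scikit-learn packages; the feature names and synthetic data are invented for illustration only.

# Minimal LIME sketch on synthetic tabular data; requires the
# third-party `lime` and `scikit-learn` packages.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
feature_names = ["temperature", "heart_rate", "spo2", "resp_rate"]
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # synthetic labels

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# LIME perturbs the instance, queries the black-box model on the
# perturbations, and fits a weighted linear surrogate locally.
explainer = LimeTabularExplainer(
    X, feature_names=feature_names,
    class_names=["negative", "positive"], mode="classification")
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())   # per-feature contributions to this one prediction

The printed list is exactly the kind of post-hoc, local suggestion the text cautions about: useful for a single prediction, but not a global account of the model.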


In [34], for example, the authors provided a simple DL method for COVID-19 discovery utilizing computed tomography (CT) images. In terms of accuracy, F1 score, and other statistical performance metrics, the suggested strategy was claimed to outperform standard DL systems such as ResNet [35], GoogleNet [36], and VGG-16 [37]. More importantly, this method is built on prototypes, in this case CT scans that a radiologist can understand. CT images of subjects who had, and did not have, COVID are used to create the prototypes. The method can easily be extended to incorporate other classes, like "moderate" or "extreme" COVID, or to the superpixel level, as in [38]. In addition, the suggested deep neural network has a well-defined and understandable design (each layer has a distinct significance, and visual pictures of CT scans are used to help conceptualize the decision). The authors in [39] suggested an explainable DeepDream technique in which a neuron's activation is maximized by gradient ascent on an image. The approach generates output curves that indicate how the characteristics evolve during the maximization process. This improves the neural network's visibility and interpretability, and it was used to separate tumors from CT scans of the liver [39].

The criminal justice system is another example of XAI in action. In particular nations, like the United States, automated algorithms are being utilized to predict who is most likely to be involved in significant crime, who is most likely to skip a court date, and who is most likely to commit additional crimes in the future [33]. Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) is one such often-utilized criminal risk analysis approach. While the COMPAS data does not include a person's race, other parts of the data may be associated with race, resulting in racial biases in the projections. As a result, clarifications of such crucial decisions are required in order to promote equality and eliminate bigotry during the decision-making process [33]. Prototype-based algorithms, as discussed in [40], can be used to reduce bias and favor fairness, because the prototypes derived can be checked and balanced to ensure a more equitable decision. Furthermore, the approach includes human-explainable rules to aid decision-making by specialists.

3 Cognitive Internet of Things for COVID-19 Pandemics

The ability to quickly monitor a patient is vital not just for healthcare practitioners, but also for public health officials, to ensure adequate patient separation and disease confinement [41]. Advanced computational technologies such as IoT and AI are contemporary digital innovations that can be leveraged to solve important clinical challenges associated with COVID-19 in this situation [42–44]. Recent improvements in current digital technology, including IoT on 5G network architecture, AI and DL approaches, big-data analysis, cloud services, Industry 4.0, and blockchain innovation, can provide long-term answers to the COVID-19 outbreak [45–47]. These instruments may assist in illness identification and treatment, as well as in minimizing its dissemination. These networked gadgets can assist in gathering real-time data from individuals in remote regions via the IoT; AI and big-data analytics are used to analyse, interpret, forecast, and make decisions; cloud computing is used to back up the data; and blockchain technology enhances this for secure data networks.

CIoT is one such technology that allows any physical entity on the planet to actively connect and share data while adhering to Quality of Service (QoS) standards. CIoT is an abbreviation for Cognitive Radio (CR) powered IoT, which allows machine-to-machine interaction in a growing network of wireless devices [48]. A CR-based dynamic relay selection method is the best way to deal with the activity of a tremendous number of gadgets, and cognitive-radio-based CIoT is an endowed technology for the effective exploitation of finite spectrum to address this ever-increasing bandwidth requirement [49]. The core idea is to automatically assign radio frequencies for data transmission between densely connected things. Because everyone will be linked and controlled over a huge network, the CIoT concept is well adapted to this pandemic. Due to the global lockdown and restrictions on movement and crowding, the majority of activities, such as e-commerce and other Web transactions, e-learning, smart metering, e-surveillance, e-health, and telehealth, are carried out remotely [4]. Wireless contact and connectivity, which demand throughput, make these processes possible. The massive CIoT network sends tiny packets by opportunistically seeking vacant channels, saving capacity and effectively using frequency resources [50, 51].

It is difficult to keep track of psychological, behavioral, and bodily conditions, particularly during an outbreak, and then use that data in cooperative responses to emerging infectious diseases. Recent IoT improvements have demonstrated promising outcomes when it comes to collecting a variety of emotional and physical wellbeing data from the household [52]. ML and DL models may run in a resource-constrained edge environment, allowing CIoT device data to be analyzed directly at the edge and inferences about in-home health to be made [53, 54]. This permits health data to remain close to the user edge while maintaining the system's privacy, confidentiality, and low latency [1, 53]. Smart health-care monitoring is a new IoT application that is particularly useful during the COVID-19 pandemic [1]. Instead of depending on established treatment approaches, this technology enables significant breakthroughs in COVID-19 management: the condition is being identified, monitored, recorded, and managed in real time, including new instances of the illness on a daily basis [55]. Since the entire neighborhood is affected, it is difficult for a few persons to manage the problem unless real-time data is accessible. CIoT facilitates the combination of sensory data, automated processing, and network connectivity [56, 57]. In this way, CIoT can be used to combat COVID-19 in a variety of ways. In terms of COVID-19 strategic planning, IoT is widely utilized in offering online medical services to patients, acquiring adequate healthcare, and screening at home or at a quarantine center. It can also serve as a medical platform for the database administration that is important to government and healthcare organizations.
Figure 2 displays the basic areas of application of CIoT in the fight against the COVID-19 outbreak.
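As a rough illustration of the opportunistic idle-channel scanning described above, the toy sketch below simulates energy-detection-based channel selection. The channel count, threshold, and random energy readings are invented for illustration and do not correspond to any real cognitive-radio stack.

# Toy sketch of opportunistic spectrum access via energy detection:
# a CIoT node measures energy on each channel and transmits on the
# first one that falls below an occupancy threshold. All numbers are
# illustrative assumptions, not a real cognitive-radio implementation.
import random

NUM_CHANNELS = 8
THRESHOLD = 0.3   # normalized energy below which a channel counts as idle

def sense_channel(ch):
    # Stand-in for an energy measurement on channel `ch`.
    return random.random()

def find_idle_channel():
    for ch in range(NUM_CHANNELS):
        if sense_channel(ch) < THRESHOLD:
            return ch
    return None       # all channels busy; back off and retry later

channel = find_idle_channel()
if channel is None:
    print("spectrum busy, deferring transmission")
else:
    print(f"transmitting packet on idle channel {channel}")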


Fig. 2 Areas of application of the Cognitive Internet of Things (CIoT) for fighting COVID-19 outbreak

3.1 Rapid Diagnosis

Tourists and presumed patients are confined in isolation even when they show no disease manifestations, so early detection is critical [58]. Through dedicated network applications, CIoT enables such people with travel histories to interact with hospital facilities in order to obtain a quick diagnosis with minimum inaccuracy. Laboratory personnel can take X-rays or CT scans directly from the central station through real-time video streaming, which may then be evaluated by AI-enabled video cameras, allowing for faster diagnosis and validation of the problem. This also allows for contact-free and rapid diagnosis of viral transmission [59, 60].

3.2 Contact Tracing and Clustering

To stop the pandemic from spreading further, it is critical to track down the confirmed cases' contacts [61]. This time-consuming task can be made easier if the COVID-positive patient's location history is readily available in the database and the healthcare authorities have access to this data [62]. Based on the number of confirmed cases, area-wise grouping, categorizing the regions as confinement zones, safeguard zones, red zones, orange zones, green zones, and so on, can be updated quickly using CIoT [63] (a toy illustration follows below). When medical and health-care units are connected through IoT, the number of positive cases per region may be gathered in real time. The authorities can access these statistics and issue a notification for medical checks in the affected area, and this can be accomplished quickly using such a system [64]. The zone aggregation also allows the government to enforce different shutdown and social-distancing regulations and directives.
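A toy sketch of such case-count-based zoning is given below; the thresholds and region data are invented for demonstration and do not reflect any official classification policy.

# Illustrative sketch of area-wise zoning from confirmed-case counts;
# the thresholds and regions are hypothetical examples only.
def classify_zone(confirmed_cases):
    if confirmed_cases == 0:
        return "green"
    if confirmed_cases < 10:
        return "orange"
    return "red"

regions = {"District A": 0, "District B": 7, "District C": 42}
for region, cases in regions.items():
    print(region, "->", classify_zone(cases), "zone")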

3.3 Prevention and Control

Individual alertness, as well as timely intervention by healthcare and government agencies, can help to control the virus’s spread [65]. Using apps like Arogya Setu, which are popular in India [66], the CIoT allows users to learn about positive cases in their area and be alerted.

3.4 Screening and Surveillance

At numerous entry points at airports, train stations, hostels, and other locations, thermal-imaging and facial-recognition data can be collected and accessed using CIoT by the healthcare authorities and the public for surveillance and screening purposes [67]. This continuous monitoring of suspected and confirmed cases may aid in infection control [68].

3.5 Remote Monitoring of the Patient

Because the COVID-19 virus is extremely infectious, physicians and medical workers are at risk of contracting it on the job. The CIoT allows clinicians to remotely monitor a patient's health status using fingertip medical data such as electrocardiogram (ECG), breathing rate, pulse rate, temperature, blood pressure, glucose level, heart rate, and so on, with wearable IoT sensors used to collect these clinical parameters [54, 69, 70]. Because all of the COVID-19 hospitals' units are substantially connected, it is possible to send medical data in real time through the internet, saving time and effort. The aged, and people with various illnesses, benefit greatly from the use of CIoT [71].
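The sketch below illustrates one simple form such server-side screening of wearable vitals could take; the normal ranges and payload fields are hypothetical illustrations, not clinical guidance and not the chapter's proposed system.

# Minimal sketch of server-side screening of wearable vitals; the
# ranges below are made-up examples, not medical thresholds.
from dataclasses import dataclass

@dataclass
class VitalsReading:
    patient_id: str
    temperature_c: float
    heart_rate_bpm: int
    spo2_percent: float

# (low, high) bounds each reading is checked against.
NORMAL_RANGES = {
    "temperature_c": (36.0, 37.5),
    "heart_rate_bpm": (50, 110),
    "spo2_percent": (94.0, 100.0),
}

def flag_anomalies(reading: VitalsReading) -> list[str]:
    alerts = []
    for metric, (low, high) in NORMAL_RANGES.items():
        value = getattr(reading, metric)
        if not (low <= value <= high):
            alerts.append(f"{metric}={value} outside [{low}, {high}]")
    return alerts

reading = VitalsReading("patient-42", 38.4, 118, 91.0)
for alert in flag_anomalies(reading):
    print("ALERT:", alert)   # would trigger a clinician notification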

3.6 Real-Time Tracking

This system allows for real-time global COVID-19 case updates every day, including the number of recovered individuals, deaths, and active cases in various areas. As a consequence, AI may be used to evaluate illness severity and anticipate disease activity, enabling health officials and politicians to improve their choices and be better equipped for management [71, 72]. Every individual linked to the CIoT network will have access to government operations, preventative health-care interventions, and water recycling upgrades [3, 73].

3.7 Development of Drugs and Vaccines

AI is used for pharmaceutical research by analyzing existing COVID-19 data, and it is useful for designing and developing medication delivery systems. This technique is used to accelerate drug screening, a lengthy process in existing approaches; it substantially speeds up an operation that a human would not be able to perform at this scale [74]. It might help in the development of effective treatments for COVID-19 patients, and it has grown into an important tool for the development of diagnostic tests and vaccinations [75, 76]. AI accelerates the development of vaccines and medicines, as well as clinical trials, at a far quicker rate than was previously achievable in drug development. AI-based algorithms have changed drug discovery in general during the last decade [77–79]. Numerous virtual screening structures, also known as rule-based screening systems, have been developed as a result of AI [80, 81]. ML allows for the building of models that learn and extrapolate trends from accessible data and can draw conclusions from unseen data. DL-based models allow for autonomous feature extraction from raw data as part of the learning process [82]. Furthermore, it has recently been found that DL's feature extraction can produce better results than conventional ML models [83].

4 The Challenges of Interpretable Artificial Intelligence Enabled Cognitive Internet of Things for COVID-19 Pandemic

As previously noted, one of the challenges with XAI is determining unbiased metrics for what constitutes a good description. One strategy for diminishing the subjectivity of constructing scientifically persuasive hypotheses is to be motivated by research in human neurology, sociology, or natural psychology. The use of such results when creating an explanatory AI model is highlighted in [84]: a worthy explication must not only explain why the model made decision X, but must also illustrate why the algorithm chose option X over option Y. Additionally, they claim that explanations are selective, indicating that the focus of a decision-making technique should be solely on significant causes. Unfalsifiable justifications have also been demonstrated to aid consumer comprehension of a model's decision [85, 86]. Combining connectionist and symbolic frameworks appears to be a possible approach to this difficulty [87, 88]. On the one hand, fully connectionist tactics are more effective, but they are also more opaque. Symbolic techniques, on the other hand, are widely seen as less successful, despite the fact that they convey information with better clarity and thus meet the following criteria: (a) symbolic processes can be constrictive due to their ability to appeal to established standards of reasoning; (b) by using a codified knowledge base, such as an ontology, data may be handled qualitatively with ease; (c) the basis for a choice is less obvious in connectionist models than in symbolic systems. Considering that a decent description changes the consumer's conceptual framework, which is a depiction of natural reality made up of symbols, it seems fair to use the symbolic learning approach to provide a rationale. As a consequence, neural-symbolic approaches may give persuasive interpretations while retaining or improving general efficiency [89]. According to Doran et al. [10], a truly intelligible model should not leave users with a plethora of explanations, each of which has a different meaning depending on the situation, and from which multiple conclusions might be drawn. By giving a semantic representation of information, a model ought to be capable of generating natural-language concepts by combining basic logic rationale and human-readable language qualities [90]. Furthermore, it is critical to make a concerted effort to properly formalize measuring techniques prior to the adoption of an objective metric. One technique should be motivated by sociology, being trustworthy in the selection of assessment questions and the study population employed [91]. The difficulty that XAI processes for deep learning must solve involves giving explanations that are concise and accessible to society, legislators, and the law in general. More specifically, the transmission of non-technical explanations will be critical in dealing with uncertainty and establishing the societal right to an explanation under the EU General Data Protection Regulation (GDPR) [92].


Security is another concern that explainable AI faces; worries about XAI-related secrecy have so far seen little discussion. Algorithmic domain knowledge and trade secrets used in XAI are critical to keep secure, yet these principles have received insufficient attention [84]. In the AI sense, if the property that makes something secret is privacy, numerous parts of a model must preserve this property. Consider a model developed by an organisation after several decades of study in a certain subject: the knowledge embodied in the system could be categorized as proprietary, with only the output and input being accessible [93], assuring its security. This suggests that, under some situations, theft of a model's features is conceivable. The authors of [94] suggest a method for making deep learning models exposed through a set of accessible queries more resistant to such theft of intellectual material. Recent research has revealed the requirement to develop XAI instruments capable of clarifying machine learning models while keeping the protection of the model itself in mind. Ideally, XAI ought to be able to provide clarification of what the model accomplishes inside. Yet the knowledge revealed by XAI approaches may be used to develop more effective adversarial attacks aimed at perplexing the model, as well as to strengthen systems against leakage of private content through such knowledge. Adversarial attacks aim to manipulate an ML process after determining the exact information to be input into the system in order to produce a specific outcome [95]. Such attacks, for example, try to locate the smallest changes to input data that can be made to produce a specific category in a supervised ML classification model, as the toy sketch below illustrates.
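The following sketch shows this idea with a fast-gradient-sign-style perturbation of a toy logistic classifier; the model parameters and step size are invented for illustration and do not reproduce any attack from the cited literature.

# Sketch of a fast-gradient-sign-style adversarial perturbation against
# a toy logistic classifier; parameters and epsilon are illustrative only.
import numpy as np

w, b = np.array([1.5, -2.0, 0.7]), 0.1     # toy trained parameters

def predict_proba(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

x = np.array([0.2, -0.1, 0.4])
print("original score:", predict_proba(x))

# Smallest-change attack: step each feature by epsilon in the direction
# that increases the positive-class score (the sign of the gradient).
epsilon = 0.25
x_adv = x + epsilon * np.sign(w)           # gradient of the logit is w
print("perturbed score:", predict_proba(x_adv))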

Another issue that requires immediate consideration is the issue of terminology. Plans are now being formed in the field of XAI, but numerous hurdles remain in clarifying the meaning of deep learning models. The numerous definitions of XAI, and the lack of agreement on the vocabulary of words and features linked to the same definition, can be observed here. This is especially evident in visualization approaches, where there is no uniformity in what is called sensitivity tables, saliency maps, heat maps, neuron activations, attributions, and other methods. Because XAI is a new field, there is no standard terminology for the community. Whether the viewer is an enterprise professional, a policymaker, or a consumer with no machine learning background, the explanation provided does not need to be at the same level of intellectual depth to give each of them an insight [96].

The key concern with leveraging the CIoT in the present COVID-19 disease outbreak situation is the preservation and security of the data obtained, which is particularly important from the standpoint of patient health [54]. The second issue concerns the measures that must be taken while establishing a data connection between the systems and protocols in question. As the number and diversity of smart sensors integrated into CIoT networks grows, so does the potential for security threats. While the Internet of Things improves company competitiveness and community life, it also expands the threat vectors for hacking and other types of cyber fraud. According to Hewlett Packard's (2014) research, 70% of the most widely implemented CIoT systems contain significant problems: inadequate transport cryptography, hazardous network interfaces, information security failures, and lack of authorization are all issues with IoT systems. Each device had an average of 25 holes, indicating the potential for a network connection breach [53]. Typically, data encryption mechanisms are not used by IoT devices. Critical utilities and vital assets, such as smart grid and network protection, are supported by IoT technologies. Other IoT technologies will gradually generate huge quantities of private details on a person's home, health, and financial situation, which the firms that have access to them will be able to use to benefit their businesses. Businesses and consumers would be less likely to accept the IoT if security and privacy were compromised. Educating developers on integrating cryptographic techniques (e.g., intrusion detection techniques, VPNs) into apps, and letting people employ the IoT security protocols loaded on their devices, might help to solve these security challenges.

Multi-purpose gadgets and interactive apps, if not correctly designed, can wreak havoc on our lives. A slight error or failure in an unconnected world may not bring the system down; nevertheless, a breakdown in one element of a densely connected system will cause pandemonium across the entire thing. Home automation and health-care monitoring and control modules are made up of embedded sensors, communications equipment, and controllers [54]. If a medical monitoring and control device's sensor fails, the operator may receive an incorrect result, which could be fatal to the consumer. Connected home equipment such as thermometers and household electricity meters commonly fail or become infected with malware, posing unexpected safety risks. The proliferation of device data traffic can exceed network capacity, resulting in system-wide delay difficulties [1]. A single device may be of no concern; nonetheless, chain reactions through other connected devices could prove disastrous for the entire network. Companies must make every effort to minimize misunderstanding in the hyper-connected IoT, manage the complexity of linked networks, increase the dependability and quality assurance of software, and guarantee the security and confidentiality of users anywhere, on any device in the world.

The latest wave of health-data digitalization has brought a change in thinking in the health coverage industry. The amount of data in the sector is increasing in complexity and diversity, and big-data technology is gaining traction as a viable solution for transforming the healthcare sector. A transition from reactive to proactive healthcare would save money on medical bills and may even spur economic development. Security and privacy concerns have become increasingly relevant as new dangers and weaknesses emerge in the health industry, which controls the majority of big data. When dealing with health surveillance, privacy and data protection should be thoroughly investigated. Developers can assist in the integration of security features into computers, software, and applications [97]. Developers should use a client-server architecture for transmitted data, in which the server sends a specific type of information to users while holding other information secured by encryption and a suitable certificate [54]. Data protection has become a major issue as a result of these advances, especially because of the risks and misappropriation that come with them. Modern information ethics [98] has arisen as a novel field: the study of ethical concerns with knowledge and information, techniques, and related technologies, behaviors and infrastructures, as outlined elsewhere, falls under this area of morals. Hospitals and immigration authorities must now make preparations to provide important statistics to the IoT system, including data on a sudden increase of patients with high fever and persons entering or fleeing a region, so that they may be followed in real time. Moreover, all essential systems, including edge routers and cloud computing connections over a 5G network, must be established to ensure fast connectivity to all gadgets available to electronic systems and the various tiers of end-users. An in-depth examination of the usage of IoT in tracking is necessary, as is an improved comprehension of the data-breach risks it poses. Unlike internet analysis methods, event-based IoT sensing gathers and communicates direct information from an array of private sources (headlines, posts on social media, and online comments) to identify impending epidemic occurrences that disperse quicker than conventional, more rigid strategies [99]. Bacterial detection and diagnosis (rapid molecular identification of microbes and infectious-illness forecasting) have both benefited from this [100]. The employment of robotics during the COVID-19 outbreak, as well as discrimination and secrecy, are all major concerns; the dangers of clinicians making the wrong choice, and the need to preserve the huge amounts of data obtained, must both be recognized [101]. The IoT is the innovation framework most likely to be leveraged to combat the pandemic. Physicians can make use of its resources to sense patients' symptoms, and it also keeps track of people's health. Its most notable feature is that it supports case diagnosis by tracking the whereabouts of patients. Because data is unique from one user to another, the most serious issue with IoT implementation is data security [42]. The safety of data sent over the network is the primary issue. The HIPAA Protection Act, which includes a complete collection of safety rules and procedural standards as well as contractual criteria, must be followed by the health-care industry, including network operators. Around the same time, the establishment of the Health Information Technology for Economic and Clinical Health (HITECH) breach notification law (combined with concomitant sanctions and a sizable proportion of health-data safety breaches) has put a strong emphasis on ensuring medical information security. The protection of private patient data and health history is one of the most crucial and carefully regulated responsibilities of health-care institutions. Information security is essential to secure data as it enters or exits the network, since corruption renders data useless. It also necessitates dependable connectivity for interaction, protected session login, and encryption keys as data travels around the network and into the cloud [3]. The cryptography method, on the other hand, is extremely computationally costly when using the Advanced Encryption Standard (AES) method.
This kind of software-based encryption relies on compute-intensive approaches that can affect a computer system's efficiency, especially when used to secure large amounts of data moving to and from the server [3]. Traditional encryption approaches might produce processing inefficiencies due to significant performance overheads, making them less than ideal for securing cloud network resources. Any OpenSSL-enabled program will instantly benefit from the Intel framework changes [102]. Medical businesses may make the most of broadband networks and offer extensive data protection to and from the cloud by accelerating cryptographic operations, private session initiation, and massive file transfers [102].
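For illustration, the following is a minimal sketch of protecting a sensor payload with AES, assuming the Python cryptography package and AES-GCM as the authenticated-encryption mode (the chapter names AES but prescribes neither a specific mode nor a library):

```python
# Minimal sketch of AES encryption for IoT health data in transit.
# Assumptions (not from the chapter): the "cryptography" package and
# AES-256 in GCM mode; key distribution and certificates are out of scope.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # shared via a secure channel
aesgcm = AESGCM(key)

payload = b'{"patient_id": "P-001", "temp": 38.2, "hr": 96}'
nonce = os.urandom(12)                      # 96-bit nonce, never reused per key
associated = b"device-42"                   # authenticated but not encrypted

ciphertext = aesgcm.encrypt(nonce, payload, associated)
plaintext = aesgcm.decrypt(nonce, ciphertext, associated)
assert plaintext == payload
```

AES-GCM is chosen here because it provides integrity as well as confidentiality, which matches the chapter's point that corrupted health data are useless.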

5 The Framework of an XAI-Enabled CIoT for Fighting the COVID-19 Pandemic

The CIoT is a medical-care-specific version of the IoT that might be used to give treatment or care guidance to health-care workers, as well as to ensure isolation compliance and trace disease sources [1]. COVID-19 identification tests and data collection may be carried out using sensors integrated with intelligent headgear and robots. These techniques would collect data, which would then be sent to a central cloud repository for processing. Healthcare experts and government agencies would be well equipped to respond to the COVID-19 catastrophe thanks to the data generated by such a system. As a result of these findings, healthcare practitioners will be able to provide patients with more personalized electronic wellness sessions. Patients will be able to seek more effective therapy while also limiting their exposure to the virus and preventing it from spreading further. Government agencies, in collaboration with local public health departments, the Department of Health and Human Services (HHS), and the Centers for Disease Control and Prevention (CDC), will have the ability to allocate resources, evaluate isolation requirements, monitor epidemics, and use this information to implement emergency procedures [103, 104]. The increased need for patient monitoring, paired with the storage capacity of the cloud, has encouraged the development of CIoT-based healthcare solutions. Well-implemented surveillance systems have reduced the frequency of COVID-19 cases and their side effects [105, 106]. Because the actual purpose is monitoring patients, the patients and their caregivers make decisions about physiological parameters via video meetings [107]. Flexible, real-time, and bidirectional communication in the healthcare sector provides a platform for a variety of technologies to support participant involvement [1]. This chapter presents a CIoT-based adaptive XAI-enabled system for battling COVID-19 outbreaks. The proposed approach facilitates the detection of the COVID-19 epidemic through the employment of a variety of connected wearable gadgets that keep track of a patient's condition; body temperature and pulse rate, for example, contribute to the gathering of clinical information signals. Because of the limited computational resources (node and memory) of the gadgets, and to avoid using a smartphone as a processing system, the sensor data gathered by these wearable devices are transferred directly to the public cloud. Figure 3 demonstrates the XAI-enabled CIoT platform for the COVID-19 outbreak. The concept is organized into three major modules. The first module, at the bottom, consists of sensors and IoT-based technologies that may be utilized to collect and capture data such as temperature, heartbeat rate, glucose, and pulse from users in hospitals and homes in real time.

[Fig. 3 The XAI-enabled CIoT system for the COVID-19 pandemic. Recoverable elements of the diagram: vital CIoT devices/sensors; a data collection/network module; the CIoT cloud repository and learning module (COVID-19 data storage, data processing, AI model, explainable AI, prediction, explanations); and a COVID-19 monitoring and alert system serving the hospital and doctor.]
A patient with an infection can have their heart rate recorded and sent to a clinical data center using a mobile phone app. The signals would subsequently be classified by an AI module as normal or as needing more investigation. The second module, in the middle layer, consists of the CIoT cloud repository and learning module, where content is stored and handled on the internet. The cloud enables clinical information to be analyzed from the repository through the network and then evaluated by doctors. Individual COVID-19 data captured by CIoT devices and sensors undergo data preprocessing, and trained AI models are employed to predict the likelihood of particular abnormalities or diseases. XAI approaches leverage the predictions, as well as the gathered COVID-19 data, to provide explanations. Expert knowledge can be used to analyze these explanations, which will allow clinicians to validate the AI model's predictions and thus provide greater openness. If the predictions are right, explanations combined with clinical expertise can yield significant actionable insights. If predictions are erroneous, the discrepancy between explanations and clinician expertise can be utilized to identify aspects that lead to faulty predictions, allowing the AI model to be improved. The third module is the monitoring and alert platform, in which the physician is responsible for keeping track of the health records and sensory data. Doctors may review and act on data provided by the cloud-based program. In this system, knowledge is replicated in real time from the CIoT database as soon as data arrive, ensuring that specialists are current on the patient's condition and supporting paramedics in making a quick assessment in an emergency so that the condition does not worsen and hospitalization is avoided. One of the most essential objectives in achieving health fairness is to deploy XAI-enabled CIoT-based systems to replace various components of present health care. Such cloud-based systems satisfy customer demand in a timely manner, consider the patient's present health condition, increase interaction between specialists and sick individuals, and minimize the time it takes for clients to obtain medical services, improving client loyalty while also decreasing costs and improving the effectiveness of a hospital. A consistent standard for choosing the best treatments, especially during the COVID-19 epidemic, might also be established. This theoretical system could be used for a variety of purposes beyond smart healthcare, including smart agriculture, intelligent automobiles, and smart cities. Sensors and equipment could be tracked and regulated using drones and aircraft to assure convenience, health, and safety.
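As a minimal sketch of the middle layer's prediction-plus-explanation step, assuming a tabular vitals dataset, scikit-learn's RandomForestClassifier as the AI model, and permutation importance as a stand-in XAI technique (the chapter does not prescribe specific libraries or a specific explanation method):

```python
# Minimal sketch of the cloud-side "AI model + explanation" step.
# Assumptions (not from the chapter): synthetic vitals stand in for CIoT
# sensor data; scikit-learn supplies both the model and the explanation.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
vitals = pd.DataFrame({
    "temperature": rng.normal(37.0, 0.8, n),
    "heart_rate": rng.normal(80, 12, n),
    "spo2": rng.normal(96, 2, n),
})
# Hypothetical label: flag cases with fever plus low oxygen saturation.
label = ((vitals["temperature"] > 37.8) & (vitals["spo2"] < 95)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    vitals, label, test_size=0.3, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Explanation step: which vitals drive the alert? A clinician can check
# whether these importances agree with medical knowledge, as the chapter
# describes for validating (or correcting) the AI model.
result = permutation_importance(model, X_test, y_test, n_repeats=10,
                                random_state=0)
for name, score in zip(vitals.columns, result.importances_mean):
    print(f"{name}: {score:.3f}")
```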

6 Conclusion and Future Directions

The worldwide COVID-19 outbreak has been a key focus of scientific investigation, and cutting-edge technology will provide an ideal answer to this global catastrophe. The Internet of Things has been shown to be beneficial in identifying, tracking, mapping contacts, and managing this viral infection. Integrating CIoT-based technologies into the response to the present COVID-19 epidemic can help build a social forum through which patients receive suitable home therapy, as well as a powerful administration and illness-management repository for health-care institutions. The establishment of models and approaches that aid in the interpretation and understanding of AI decisions is a growing area of XAI research; XAI approaches can be used to explain and trace the outcomes of AI-based autonomous systems. Hence, the introduction of an XAI-enabled CIoT-based system will greatly help during this period. This chapter has accordingly discussed the application of XAI-enabled CIoT-based systems during the COVID-19 outbreak. As stated in this chapter, however, incorporating XAI-enabled CIoT techniques poses some challenges that need to be solved before the techniques can be fully exploited in fighting current and future pandemics. An intelligent XAI-enabled CIoT-based system was proposed; the framework can be used during the COVID-19 pandemic to provide real-time monitoring, prediction, and diagnosis. Other DL models will be evaluated in the architecture in the future, as well as the use of protease gene sequencing. Other possibilities for the future include adding forecasting and time-series assessment methods to the platform. Pervasive edge computing may also be used to increase safety and minimize complexity.

References 1. Awotunde, J. B., Jimoh, R. G., AbdulRaheem, M., Oladipo, I. D., Folorunso, S. O., & Ajamu, G. J. (2022). IoT-based wearable body sensor network for COVID-19 pandemic. Studies in Systems, Decision and Control, 2022(378), 253–275. 2. World Health Organization. (2018). Managing epidemics: Key facts about major deadly diseases. World Health Organization. 3. Awotunde, J. B., Jimoh, R. G., Oladipo, I. D., Abdulraheem, M., Jimoh, T. B., & Ajamu, G. J. (2021). Big data and data analytics for an enhanced COVID-19 epidemic management. Studies in Systems, Decision and Control, 2021(358), 11–29. 4. Castelvecchi, D. (2016). Can we open the black box of AI? Nature News, 538(7623), 20–23. 5. Awotunde, J. B., Folorunso, S. O., Jimoh, R. G., Adeniyi, E. A., Abiodun, K. M., & Ajamu, G. J. (2021). Application of artificial intelligence for COVID-19 epidemic: An exploratory study, opportunities, challenges, and future prospects. In Artificial intelligence for COVID-19 (pp. 47–61). Springer. 6. Lipton, Z. C. (2018). The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3), 31–57. 7. Preece, A., Harborne, D., Braines, D., Tomsett, R., & Chakraborty, S. (2018). Stakeholders in explainable AI. arXiv preprint arXiv:1810.00184. 8. Gunning, D. (2016). Explainable Artificial Intelligence (XAI): Technical report defense advanced research projects agency darpa-baa-16-53. DARPA. 9. Tjoa, E., & Guan, C. (2019). A survey on Explainable Artificial Intelligence (XAI). arXiv: 1907.07374. 10. Doran, D., Schulz, S., & Besold, T. R. (2017). What does explainable AI really mean? A new conceptualization of perspectives. arXiv preprint arXiv:1710.00794. 11. Doshi-Velez, F., Kim, B., Towards a rigorous science of interpretable machine learning, 2017. (Molnar, C., Casalicchio, G., & Bischl, B. (2020, September). Interpretable machine learning–


a brief history, state-of-the-art and challenges. In Joint European conference on Machine Learning and Knowledge Discovery in Databases (pp. 417–431). Springer). 12. Vellido, A., Martín-Guerrero, J. D., & Lisboa, P. J. (2012, April). Making machine learning models interpretable. In ESANN (Vol. 12, pp. 163–172). 13. Walter, E. (2008). Cambridge advanced learner’s dictionary. Cambridge University Press. 14. Ghoshal, B., & Tucker, A. (2020). Estimating uncertainty and interpretability in deep learning for coronavirus (COVID-19) detection. arXiv preprint arXiv:2003.10769. 15. Angelov, P., & Soares, E. (2020). Explainable-by-design approach for covid-19 classification via ct-scan. medRxiv. 16. Matsuyama, E. (2020). A deep learning interpretable model for novel coronavirus disease (COVID-19) screening with chest CT images. Journal of Biomedical Science and Engineering, 13(07), 140–152. 17. Rudin, C. (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5), 206–215. 18. Chatterjee, S., Saad, F., Sarasaen, C., Ghosh, S., Khatun, R., Radeva, P., ... & Nürnberger, A. (2020). Exploration of interpretability techniques for deep covid-19 classification using chest x-ray images. arXiv preprint arXiv:2006.02570. 19. Sarker, L., Islam, M. M., Hannan, T., & Ahmed, Z. (2020). COVID-DenseNet: A deep learning architecture to detect COVID-19 from chest radiology images. 20. Ouyang, X., Huo, J., Xia, L., Shan, F., Liu, J., Mo, Z., et al. (2020). Dual-sampling attention network for diagnosis of COVID-19 from community acquired pneumonia. IEEE Transactions on Medical Imaging, 39(8), 2595–2605. 21. Brennen, A. (2020, April). What do people really want when they say they want “explainable AI?” we asked 60 stakeholders. In Extended abstracts of the 2020 CHI conference on human factors in computing systems (pp. 1–7). 22. Burkart, N., & Huber, M. F. (2021). A survey on the explainability of supervised machine learning. Journal of Artificial Intelligence Research, 70, 245–317. 23. Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., et al. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. 24. Tritscher, J., Ring, M., Schlr, D., Hettinger, L., & Hotho, A. (2020, September). Evaluation of post-hoc xai approaches through synthetic tabular data. In International symposium on methodologies for intelligent systems (pp. 422–430). Springer. 25. Dieber, J., & Kirrane, S. (2020). Why model why? Assessing the strengths and limitations of LIME. arXiv preprint arXiv:2012.00093. 26. Chen, J., Hua, C., & Liu, C. (2019). Considerations for better construction and demolition waste management: Identifying the decision behaviors of contractors and government departments through a game theory decision-making model. Journal of Cleaner Production, 212, 190–199. 27. Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., & Batra, D. (2017). Gradcam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (pp. 618–626). 28. Angelov, P., & Soares, E. (2020). Towards explainable deep neural networks (xDNN). Neural Networks, 130, 185–194. 29. Bien, J., & Tibshirani, R. (2011). Prototype selection for interpretable classification. The Annals of Applied Statistics, 5(4), 2403–2424. 30. Bishop, C. M. (2006). 
Pattern recognition. Machine Learning, 128(9), 1–738. 31. Goebel, R., Chander, A., Holzinger, K., Lecue, F., Akata, Z., Stumpf, S., et al. (2018, August). Explainable AI: The new 42? In International cross-domain conference for machine learning and knowledge extraction (pp. 295–303). Springer. 32. Cysneiros, L. M., Raffi, M., & do Prado Leite, J. C. S. (2018, August). Software transparency as a key requirement for self-driving cars. In 2018 IEEE 26th international requirements engineering conference (RE) (pp. 382–387). IEEE.


33. Dressel, J., & Farid, H. (2018). The accuracy, fairness, and limits of predicting recidivism. Science Advances, 4(1), eaao5580. 34. Soares, E. A., Angelov, P. P., Costa, B., Castro, M., Nageshrao, S., & Filev, D. (2020). Explaining deep learning models through rule-based approximation and visualization. IEEE Transactions on Fuzzy Systems, 1, 1–10. 35. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770–778). 36. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–9). 37. Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. 38. Tetila, E., Bressem, K., Astolfi, G., Sant'Ana, D. A., Pache, M. C., & Pistori, H. (2020). System for quantitative diagnosis of COVID-19-associated pneumonia based on superpixels with deep learning and chest CT. Research Square, 1, 1–13. https://doi.org/10.21203/rs.3.rs123158/v1 39. Couteaux, V., Nempont, O., Pizaine, G., & Bloch, I. (2019). Towards interpretability of segmentation networks by analyzing deepDreams. In Interpretability of machine intelligence in medical image computing and multimodal learning for clinical decision support (pp. 56–63). Springer. 40. Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I., & Atkinson, P. M. (2021). Explainable artificial intelligence: An analytical review (p. e1424). Data Mining and Knowledge Discovery. 41. Allam, Z., Dey, G., & Jones, D. S. (2020). Artificial Intelligence (AI) provided early detection of the coronavirus (COVID-19) in China and will influence future urban health policy internationally. AI, 1(2), 156–165. 42. Singh, R. P., Javaid, M., Haleem, A., & Suman, R. (2020). Internet of things (IoT) applications to fight against COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(4), 521–524. 43. Ting, D. S. W., Carin, L., Dzau, V., & Wong, T. Y. (2020). Digital technology and COVID-19. Nature Medicine, 26(4), 459–461. 44. Vaishya, R., Javaid, M., Khan, I. H., & Haleem, A. (2020). Artificial intelligence (AI) applications for COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(4), 337–339. 45. Haleem, A., Javaid, M., & Vaishya, R. (2020). Effects of COVID-19 pandemic in daily life. Current Medicine Research and Practice, 10(2), 78–79. 46. Javaid, M., Haleem, A., Vaishya, R., Bahl, S., Suman, R., & Vaish, A. (2020). Industry 4.0 technologies and their applications in fighting COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(4), 419–422. 47. Ejaz, W., & Ibnkahla, M. (2017). Multiband spectrum sensing and resource allocation for IoT in cognitive 5G networks. IEEE Internet of Things Journal, 5(1), 150–163. 48. Ahmed, R., Chen, Y., Hassan, B., & Du, L. (2021). CR-IoTNet: Machine learning based joint spectrum sensing and allocation for cognitive radio enabled IoT cellular networks. Ad Hoc Networks, 112, 102390. 49. Verma, S., Kaur, S., Rawat, D. B., Xi, C., Alex, L. T., & Jhanjhi, N. Z. (2021). Intelligent framework using IoT-based WSNs for wildfire detection. IEEE Access, 9, 48185–48196. 50. Dang, L. M., Piran, M., Han, D., Min, K., & Moon, H. (2019). A survey on internet of things and cloud computing for healthcare. Electronics, 8(7), 768. 51. Osifeko, M. 
O., Hancke, G. P., & Abu-Mahfouz, A. M. (2020). Artificial intelligence techniques for cognitive sensing in future IoT: State-of-the-Art, potentials, and challenges. Journal of Sensor and Actuator Networks, 9(2), 21.


52. Swayamsiddha, S., & Mohanty, C. (2020). Application of cognitive internet of medical things for COVID-19 pandemic. Diabetes & Metabolic Syndrome: Clinical Research & Reviews, 14(5), 911–915. 53. Awotunde, J. B., Bhoi, A. K., & Barsocchi, P. (2021). Hybrid cloud/fog environment for healthcare: An exploratory study, opportunities, challenges, and future prospects. Intelligent Systems Reference Library, 2021(209), 1–20. 54. Awotunde, J. B., Folorunso, S. O., Bhoi, A. K., Adebayo, P. O., & Ijaz, M. F. (2021). Disease diagnosis system for IoT-based wearable body sensors with machine learning algorithm. Intelligent Systems Reference Library, 2021(209), 201–222. 55. Lee, H. A., Kung, H. H., Lee, Y. J., Chao, J. C., Udayasankaran, J. G., Fan, H. C., et al. (2020). Global infectious disease surveillance and case tracking system for COVID-19: Development study. JMIR Medical Informatics, 8(12), e20567. 56. Pramanik, P. K. D., Pal, S., & Choudhury, P. (2018). Beyond automation: The cognitive IoT. Artificial intelligence brings sense to the internet of things. In Cognitive computing for big data systems over IoT (pp. 1–37). Springer. 57. Patra, M. K. (2017, February). An architecture model for smart city using cognitive internet of things (CIoT). In 2017 second International Conference on Electrical, Computer and Communication Technologies (ICECCT) (pp. 1–6). IEEE. 58. Lin, C., Braund, W. E., Auerbach, J., Chou, J. H., Teng, J. H., Tu, P., & Mullen, J. (2020). Policy decisions and use of information technology to fight coronavirus disease, Taiwan. Emerging infectious diseases, 26(7), 1506–1512. 59. Muhammad, L. J., Algehyne, E. A., Usman, S. S., Ahmad, A., Chakraborty, C., & Mohammed, I. A. (2021). Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset. SN Computer Science, 2(1), 1–13. 60. Shi, F., Wang, J., Shi, J., Wu, Z., Wang, Q., Tang, Z., et al. (2020). Review of artificial intelligence techniques in imaging data acquisition, segmentation, and diagnosis for COVID-19. IEEE Reviews in Biomedical Engineering, 14, 4–15. 61. Garg, L., Chukwu, E., Nasser, N., Chakraborty, C., & Garg, G. (2020). Anonymity preserving IoT-based COVID-19 and other infectious disease contact tracing model. IEEE Access, 8, 159402–159414. 62. Lai, S. H. S., Tang, C. Q. Y., Kurup, A., & Thevendran, G. (2021). The experience of contact tracing in Singapore in the control of COVID-19: Highlighting the use of digital technology. International Orthopaedics, 45(1), 65–69. 63. Kretzschmar, M. E., Rozhnova, G., Bootsma, M. C., van Boven, M., van de Wijgert, J. H., & Bonten, M. J. (2020). Impact of delays on effectiveness of contact tracing strategies for COVID-19: A modelling study. The Lancet Public Health, 5(8), e452–e459. 64. Rao, A. S. S., & Vazquez, J. A. (2020). Identification of COVID-19 can be quicker through artificial intelligence framework using a mobile phone–based survey when cities and towns are under quarantine. Infection Control & Hospital Epidemiology, 41(7), 826–830. 65. Tambo, E., Djuikoue, I. C., Tazemda, G. K., Fotsing, M. F., & Zhou, X. N. (2021). Early stage risk communication and community engagement (RCCE) strategies and measures against the coronavirus disease 2019 (COVID-19) pandemic crisis. Global Health Journal., 5, 44–50. 66. Kodali, P. B., Hense, S., Kopparty, S., Kalapala, G. R., & Haloi, B. (2020). How Indians responded to the Arogya Setu app? Indian Journal of Public Health, 64(6), 228. 67. Vaishya, R., Haleem, A., Vaish, A., & Javaid, M. (2020). 
Emerging technologies to combat the COVID-19 pandemic. Journal of Clinical and Experimental Hepatology, 10(4), 409–411. 68. Wax, R. S., & Christian, M. D. (2020). Practical recommendations for critical care and anesthesiology teams caring for novel coronavirus (2019-nCoV) patients. Canadian Journal of Anesthesia/Journal canadien d’anesthésie, 67(5), 568–576. 69. Pan, X. B. (2020). Application of personal-oriented digital technology in preventing transmission of COVID-19, China. Irish Journal of Medical Science (1971-), 189(4), 1145–1146. 70. Sood, S. K., & Mahajan, I. (2017). Wearable IoT sensor based healthcare system for identifying and controlling chikungunya virus. Computers in Industry, 91, 33–44.


71. Adeniyi, E. A., Ogundokun, R. O., & Awotunde, J. B. (2021). IoMT-based wearable body sensors network healthcare monitoring system. Studies in Computational Intelligence, 2021(933), 103–121. 72. Awotunde, J. B., Adeniyi, A. E., Ogundokun, R. O., Ajamu, G. J., & Adebayo, P. O. (2021). MIoT-based big data analytics architecture, opportunities and challenges for enhanced telemedicine systems. Studies in Fuzziness and Soft Computing, 2021(410), 199–220. 73. Kumar, H., Singh, M. K., Gupta, M. P., & Madaan, J. (2020). Moving towards smart cities: Solutions that lead to the Smart City transformation framework. Technological Forecasting and Social Change, 153, 119281. 74. Haleem, A., Vaishya, R., Javaid, M., & Khan, I. H. (2020). Artificial Intelligence (AI) applications in orthopaedics: An innovative technology to embrace. Journal of Clinical Orthopaedics and Trauma, 11(Suppl 1), S80–S81. 75. Chen, S., Yang, J., Yang, W., Wang, C., & Bärnighausen, T. (2020). COVID-19 control in China during mass population movements at new year. The Lancet, 395(10226), 764–766. 76. Bobdey, S., & Ray, S. (2020). Going viral–Covid-19 impact assessment: A perspective beyond clinical practice. Journal of Marine Medical Society, 22(1), 9. 77. Zhong, F., Xing, J., Li, X., Liu, X., Fu, Z., Xiong, Z., et al. (2018). Artificial intelligence in drug design. Science China Life Sciences, 61(10), 1191–1204. 78. Duan, Y., Edwards, J. S., & Dwivedi, Y. K. (2019). Artificial intelligence for decision making in the era of Big Data–evolution, challenges and research agenda. International Journal of Information Management, 48, 63–71. 79. Lavecchia, A. (2019). Deep learning in drug discovery: Opportunities, challenges and future prospects. Drug Discovery Today, 24(10), 2017–2032. 80. Naz, K., Naz, A., Ashraf, S. T., Rizwan, M., Ahmad, J., Baumbach, J., & Ali, A. (2019). PanRV: Pangenome-reverse vaccinology approach for identifications of potential vaccine candidates in microbial pangenome. BMC Bioinformatics, 20(1), 1–10. 81. Ong, E., Wang, H., Wong, M. U., Seetharaman, M., Valdez, N., & He, Y. (2020). Vaxign-ML: Supervised machine learning reverse vaccinology model for improved prediction of bacterial protective antigens. Bioinformatics, 36(10), 3185–3191. 82. LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. 83. Zhavoronkov, A., Ivanenkov, Y. A., Aliper, A., Veselov, M. S., Aladinskiy, V. A., Aladinskaya, A. V., et al. (2019). Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nature Biotechnology, 37(9), 1038–1040. 84. Tjoa, E., & Guan, C. (2020). A survey on explainable artificial intelligence (xai): Toward medical xai. IEEE Transactions on Neural Networks and Learning Systems, 32(11), 4793– 4813. 85. Goudet, O., Kalainathan, D., Caillou, P., Guyon, I., Lopez-Paz, D., & Sebag, M. (2018). Learning functional causal models with generative neural networks. In Explainable and interpretable models in computer vision and machine learning (pp. 39–80). Springer. 86. Byrne, R. M. (2019, August). Counterfactuals in Explainable Artificial Intelligence (XAI): Evidence from human reasoning. In IJCAI (pp. 6276–6282). 87. Garcez, A. D. A., Gori, M., Lamb, L. C., Serafini, L., Spranger, M., & Tran, S. N. (2019). Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. arXiv preprint arXiv:1905.06088. 88. Garnelo, M., & Shanahan, M. (2019). 
Reconciling deep learning with symbolic artificial intelligence: Representing objects and relations. Current Opinion in Behavioral Sciences, 29, 17–23. 89. Donadello, I., Serafini, L., & Garcez, A. D. A. (2017). Logic tensor networks for semantic image interpretation. arXiv preprint arXiv:1705.08968. 90. Bennetot, A., Laurent, J. L., Chatila, R., & Díaz-Rodríguez, N. (2019). Towards explainable neural-symbolic visual reasoning. arXiv preprint arXiv:1909.09065.


91. Kelley, K., Clark, B., Brown, V., & Sitzia, J. (2003). Good practice in the conduct and reporting of survey research. International Journal for Quality in Health Care, 15(3), 261–266. 92. Wachter, S., Mittelstadt, B., & Floridi, L. (2017). Why a right to explanation of automated decision-making does not exist in the general data protection regulation. International Data Privacy Law, 7(2), 76–99. 93. Orekondy, T., Schiele, B., & Fritz, M. (2019). Knockoff nets: Stealing functionality of blackbox models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 4954–4963). 94. Oh, S. J., Schiele, B., & Fritz, M. (2019). Towards reverse-engineering black-box neural networks. In Explainable AI: Interpreting, explaining and visualizing deep learning (pp. 121–144). Springer. 95. Alabugin, S. K., & Sokolov, A. N. (2020, November). Applying of generative adversarial networks for anomaly detection in industrial control systems. In 2020 Global Smart Industry Conference (GloSIC) (pp. 199–203). IEEE. 96. George, R. Z., & Bruce, J. B. (Eds.). (2008). Analyzing intelligence: Origins, obstacles, and innovations. Georgetown University Press. 97. Kotz, D., Fu, K., Gunter, C., & Rubin, A. (2015). Security for mobile and cloud frontiers in healthcare. Communications of the ACM, 58(8), 21–23. 98. Castiglione, A., D’Ambrosio, C., De Santis, A., Castiglione, A., & Palmieri, F. (2013, July). On secure data management in health-care environment. In 2013 seventh international conference on innovative mobile and internet services in ubiquitous computing (pp. 666–671). IEEE. 99. Christaki, E. (2015). New technologies in predicting, preventing and controlling emerging infectious diseases. Virulence, 6(6), 558–565. 100. Rahman, M. S., Peeri, N. C., Shrestha, N., Zaki, R., Haque, U., & Ab Hamid, S. H. (2020). Defending against the novel coronavirus (COVID-19) outbreak: How can the internet of things (IoT) help to save the world?. Health Policy and Technology (Vol. 9, pp. 136–138). 101. Howard, A., & Borenstein, J. (2020). AI, robots, and ethics in the age of COVID-19. Retrieved August, 18, 2021. 102. Awotunde, J. B., Jimoh, R. G., Folorunso, S. O., Adeniyi, E. A., Abiodun, K. M., & Banjo, O. O. (2021). Privacy and security concerns in IoT-based healthcare systems. In Internet of Things (pp. 105–134). 103. Richardson, E., & Devine, C. (2020). Emergencies end eventually: How to better analyze human rights restrictions sparked by the COVID-19 pandemic under the international covenant on civil and political rights. Michigan Journal of International Law, 42, 105. 104. Walsh, D. (2021). COVID-19: A crisis and an opportunity to improve the emergency use authorization process. Minnesota Journal of Law, Science & Technology, 22(2), 169. 105. Jin, L. S., & Fisher, D. (2021). MDRO transmission in acute hospitals during the COVID-19 pandemic. Current Opinion in Infectious Diseases, 34(4), 365–371. 106. Ding, J., Dai, Q., Li, Y., Han, S., Zhang, Y., & Feng, Y. (2021). Impact of meteorological condition changes on air quality and particulate chemical composition during the COVID-19 lockdown. Journal of Environmental Sciences, 109, 45–56. 107. Felten-Barentsz, K. M., van Oorsouw, R., Klooster, E., Koenders, N., Driehuis, F., Hulzebos, E. H., et al. (2020). Recommendations for hospital-based physical therapists managing patients with COVID-19. Physical Therapy, 100(9), 1444–1457.

Remote Photoplethysmography: Digital Disruption in Health Vital Acquisition

Monika, Harish Kumar, Sakshi Kaushal, and Varinder Garg

Monika (✉) · H. Kumar · S. Kaushal: UIET, Panjab University, Chandigarh, UT, India; e-mail: [email protected]; [email protected]; [email protected]
V. Garg: PGIMER, Chandigarh, UT, India; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. M. S. Hossain et al. (eds.), Explainable Machine Learning for Multimedia Based Healthcare Applications, https://doi.org/10.1007/978-3-031-38036-5_12

1 Introduction

An individual's vital signs, such as respiration rate, heart rate and its variability, temperature, and oxygen saturation (SpO2), are crucial information associated with both the physical and the physiological condition of that individual. With advancements in image-processing technologies such as rPPG, it is now possible to acquire and retrieve health information from the human body by employing merely an optical sensor, such as a normal red-green-blue (RGB) camera, without any physical contact. rPPG is a low-priced, contactless, and emerging technology for monitoring vital signs via video analysis. It uses Computer Vision (CV) technology to extract data on variations in light absorption on the facial skin that represent the physiological state of the individual. rPPG is a modification of the photoplethysmography (PPG) technique, introduced by Hertzman [1] in 1937, in which a photoelectric cell positioned beneath a finger irradiated by a light source was used to measure differences in the light absorption of human skin. Traditional photoplethysmography (PPG), the base technology for wearable devices such as pulse oximeters and smartwatches, uses an emitter as a light source projected onto the skin's surface and a receptor in contact with the skin to capture reflectivity information that corresponds to an individual's heartbeat and other health vitals. The same process can be carried out remotely, using visible light as the source and a simple RGB camera to capture the video; vital signs may be assessed simply by analyzing a video stream of the user's face using rPPG.


Video processing can be done in real time or on a video clip that has already been recorded. The advantage of rPPG is that it is a totally contactless approach that does not require specialized instruments. The accuracy of rPPG, on the other hand, depends on the quality of the processed signals, which is influenced by a variety of factors such as luminance, distance, and image quality. Heart rate (HR) and its variability (HRV), temperature, respiratory rate (RR), and oxygen saturation (SpO2) are just a few of the vital indicators that can be measured with rPPG technology. Furthermore, HRV, sleep quality, heart-rhythm abnormalities, and drowsiness can all be utilized to assess the stress levels and mental wellness of human beings. rPPG is a special attraction in telehealth due to striking features like non-contact monitoring, with uses in aviation health screening, employee wellbeing, pet health, and more. Because of its simplicity, cost, flexibility, and convenience of use, rPPG can bring a larger range of care services to rural and marginalized communities, including childcare, senior care, disability care, chronic-illness care, and mental care. With all these advantages, rPPG also faces numerous challenges in various fields. This paper provides an overview of the issues in currently available rPPG algorithms and analyzes the development trends and available tools in this field. So far, only three commercially available rPPG web applications are active in the market: Wellfie, Covitor, and BwellInsure. The Wellfie and Covitor applications were chosen for evaluation and analysis, as the BwellInsure application is not available for free general public use. This analysis and evaluation of commercially available rPPG applications is the first of its kind; no paper containing an analysis of such applications was found during the literature review. The rest of the paper is arranged as follows: the basic working principle of rPPG technology is contained in Sect. 2. Existing rPPG algorithms are demonstrated in Sect. 3. Section 4 describes the flaws present in existing rPPG techniques and the ways these flaws have been handled in research work so far. Current trends and tools in rPPG technology and its applications are explained in Sect. 5. Section 6 presents a study performed to test the accuracy and reliability of a few heart-rate-estimating tools available online, along with its results. The conclusion of the paper is given in Sect. 7. In addition, "Acknowledgment" and "Conflict of Interest" sections are also provided.

2 Basic Principle

2.1 Photoplethysmography (PPG)

The basic technique behind rPPG development is photoplethysmography (PPG). PPG can be described as a non-invasive, optical approach for sensing the blood volume pulse (BVP). Traditional heart-rate measuring methods involving contact with the patient, such as the pulse oximeter and electrocardiography (ECG), are based on the PPG technique. ECG provides the most precise HR data; however, it necessitates medical electrodes on the patient. A pulse oximeter, which uses PPG, must likewise be attached to a part of the body, like the finger or the earlobe. These two procedures are both reliable and cost-effective but have limitations due to the requirement of close interaction with the subject. It is inconvenient to attach electrodes and medical sensors to the individual, since they may cause irritation and discomfort. A whole machinery setup and medically trained personnel are required to operate an ECG machine for patient monitoring, which is not available to everyone in need, such as people in remote areas. Also, remote transmission of a patient's health data is not possible with either ECG or the pulse oximeter, so no remote monitoring is possible. The difficulties of contact-based methods outlined above highlight the requirement for non-contact heart-rate measurement methods.

2.2 Remote Photoplethysmography (rPPG)

rPPG has received a lot of attention in research alongside the rise of wearable devices. The ability to measure vital signs without making physical contact with the subject is the unique characteristic of this method; it removes any annoyance during the assessment process, making measurement an easy procedure. This eye-catching feature allows rPPG technology to be used in both clinical and non-clinical applications such as face anti-spoofing, driver monitoring, fitness and cardio training, and home health monitoring. Furthermore, this method is promising in terms of security, cost, health, dependability, and computational efficiency.

2.2.1 Principle of rPPG

rPPG upholds the PPG principle in a contactless setting. In a wearable device, light is emitted onto the skin by one sensor, and the amount of light returned is detected by a second sensor, giving a difference between emitted and reflected light. Because the amount of reflected light varies with blood volume due to capillary dilation and constriction, it can be used to determine heart rate. Lower blood pressure ahead of the pulse wave means narrower arteries and less absorption of green light (higher reflectivity); conversely, higher blood pressure ahead of the pulse wave means broader arteries and more green-light absorption (lower reflectivity). Similarly, rPPG determines the variations in red, green, and blue light reflected from the facial skin, separating specular from diffuse reflection. A basic rPPG approach has the following steps:
1. Skin pixel selection: The face in the acquired webcam image is identified and modeled to establish facial landmarks and head orientation. The region of interest is roughly the top two-thirds of the face, where the majority of the blood vessels are concentrated.
2. Signal extraction: For both specular and diffuse reflections, the average of each pixel color (red, green, blue) in the region is measured over time. A suitable algorithm is then used for signal extraction.
3. Signal filtering: The face model is fitted to detect noise from head motion, and subsequently a noise-free heart-rate signal is generated.
4. Heart rate calculation: Heart rate and other health vitals are calculated by detecting peaks and measuring inter-beat intervals.
A skin reflection model is discussed below for understanding the signal extraction and signal filtering steps in more detail.
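A minimal sketch of these four steps follows, assuming OpenCV's bundled Haar face detector as a crude skin-pixel selector, green-channel averaging for signal extraction, a Butterworth band-pass filter for step 3, and an FFT peak for step 4; a production system would use far more careful ROI selection and motion handling:

```python
# Minimal sketch of the four rPPG steps on a recorded video.
# Assumptions (not from the chapter): OpenCV Haar cascade for face
# detection, green-channel averaging, SciPy band-pass + FFT for the pulse.
import cv2
import numpy as np
from scipy.signal import butter, filtfilt

cap = cv2.VideoCapture("face.mp4")           # hypothetical input file
fps = cap.get(cv2.CAP_PROP_FPS)
face_det = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

trace = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_det.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        continue
    x, y, w, h = faces[0]
    roi = frame[y:y + 2 * h // 3, x:x + w]    # step 1: top two-thirds of face
    trace.append(roi[:, :, 1].mean())         # step 2: mean green (BGR index 1)
cap.release()

signal = np.asarray(trace) - np.mean(trace)
b, a = butter(3, [0.7 / (fps / 2), 4.0 / (fps / 2)], btype="band")  # 42-240 bpm
filtered = filtfilt(b, a, signal)             # step 3: band-pass filtering

freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
spectrum = np.abs(np.fft.rfft(filtered))
hr_bpm = 60 * freqs[np.argmax(spectrum)]      # step 4: spectral peak -> HR
print(f"Estimated heart rate: {hr_bpm:.1f} bpm")
```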

2.2.2 Skin Reflection Model

A basic rPPG model for skin reflections is shown in Fig. 1, which takes into account the relevant optical and physiological features of skin reflection. This model explains the generic rPPG method in detail and, on its basis, provides an overview of the ways different rPPG methods implement the same task. Consider a light source that illuminates a patch of facial skin tissue with pulsatile blood flowing through it, and a distant color camera that captures the image. It is assumed that the light source has a constant spectral composition with varying intensity; the intensity variations depend on the distance between the light source and the facial skin surface and on the distance between the light source and the camera. The skin patch captured by the camera has a certain color, which is a combination of the light reflected from the source, the skin's own color, and the sensitivities of the camera's color channels. This captured skin color varies with time because of motion-induced intensity changes (known as specular variations) and pulse-induced intensity changes. These changes in color are known as temporal changes and depend on the intensity of the illuminating light. Based on the dichromatic model [2], each facial skin pixel's reflection in a captured image can be characterized as a time-dependent RGB-channel function, as shown in Eq. (1):

c_k(t) = I(t) · (v_s(t) + v_d(t)) + v_n(t)    (1)

Here, c_k(t) represents the RGB channels of the k-th skin pixel, ordered in columns. I(t) is the illuminance intensity level, which captures the intensity variations due to the light source as well as changes in the distance between the source, the facial skin surface, and the camera. I(t) modulates the two constituents of the dichromatic model: the specular reflection v_s(t) and the diffuse reflection v_d(t). All components depend on time because of the pulsatile blood flow and body motion. v_n(t) is the quantization noise of the camera sensor.

[Fig. 1 The basic model of the rPPG principle]

No pulsatile information is present in the specular reflection, which is a mirror-like reflection of light from the facial skin surface; its spectral composition is the same as that of the light source. Specular reflection is time-dependent due to body movement, which changes the distance between the skin surface, the light source, and the camera. This time dependency can be depicted as shown in Eq. (2):

v_s(t) = u_s · (s_0 + s(t))    (2)

Here, u_s is the unit color vector of the light spectrum; s_0 is the stationary part and s(t) the time-varying, motion-induced part of the specular reflection.

Diffuse reflection results from the scattering and absorption of light in the facial skin tissues. Its spectral composition differs from that of the light source, since the melanin and hemoglobin present in the skin tissues lead to a particular chromaticity for v_d. v_d is also time-dependent due to changes in blood volume, as described in Eq. (3):

v_d(t) = u_d · d_0 + u_p · p(t)    (3)

Here, u_d represents the unit color vector of the facial skin tissue, d_0 is the strength of the stationary reflection, u_p represents the corresponding pulsatile strengths in the RGB channels, and p(t) is the pulse signal. Substituting Eqs. (2) and (3) into Eq. (1), c_k(t) is calculated as in Eq. (4):

c_k(t) = I(t) · (u_s · (s_0 + s(t)) + u_d · d_0 + u_p · p(t)) + v_n(t)    (4)

In Eq. (4), the stationary parts of the diffuse and specular reflections are combined into a single component, Eq. (5), which is the stationary skin reflection:

u_c · c_0 = u_s · s_0 + u_d · d_0    (5)

Here, u_c is the unit color vector of the skin reflection and c_0 is the stationary reflection strength. Rewriting Eq. (4) according to Eq. (5) gives Eq. (6):

c_k(t) = I_0 · (1 + i(t)) · (u_c · c_0 + u_s · s(t) + u_p · p(t)) + v_n(t)    (6)

I(t) is represented here as a mixture of two parts: a stationary part I_0 and a time-dependent part I_0 · i(t). In other words, the intensity variation captured by the camera, which is caused by body motion, is directly proportional to the amount of intensity present. The signals i(t), s(t), and p(t) are all zero-mean. Specular reflection can be the dominant component, often outweighing all other factors in a scene; it is therefore presumed that measures such as classifiers exist that can reject the locations where the specular reflection component is most significant. As a result, only the pixels k where the reflection is dominated by the diffuse component are taken into consideration. The extraction of p(t) from c_k(t) is the ultimate goal of any rPPG approach. Depending on how p(t) is extracted from c_k(t), different rPPG algorithms are discussed in the next section.

2.2.3 Use of AI in rPPG

There is a pressing need for high-quality rPPG pulse signals in various applications, including emotion recognition and health monitoring. However, because of inaccurate pulse signals, the majority of rPPG techniques now in use can only provide average heart rate (HR) measurements. Researchers have recently started using artificial intelligence (AI) and deep learning (DL) approaches in different stages of rPPG, motivated by the success of these methods in other fields such as image and video analysis. A conventional rPPG method has the following stages: (1) face video capture using a digital camera, (2) face detection to get the subject's bounding-box coordinates, (3) ROI selection, (4) rPPG signal extraction, and (5) heart rate estimation. DL can be applied to remote heart rate estimation either as an end-to-end model or as a hybrid DL method. End-to-end DL approaches use a single model to produce the HR or rPPG signal, whereas hybrid DL methods employ DL at various phases. An end-to-end three-dimensional Convolutional Neural Network (CNN) spatio-temporal network (STN) called PhysNet [3] aims to identify the peak of each individual heartbeat; it receives the original RGB video frames as input and directly outputs the final rPPG signal. Accurate HR and HRV estimation are possible, enabling more challenging applications like emotion identification. As a hybrid DL method, [4] applied an LSTM network for signal filtering and quality improvement of the extracted signal. The LSTM network was first trained on large amounts of synthetic data; its capacity for generalization was then improved by additional training on real data for model fine-tuning. AI can be implemented at every stage of rPPG to obtain improved and more accurate health-related information. Various applications of AI-embedded rPPG are described below.
Telehealth: Throughout the world, telehealth has grown in popularity both during and after the COVID-19 epidemic. Thanks to the integration of telehealth software and rPPG technology, a user can evaluate their physiological signals with a consumer-level gadget and identify early signs of various health issues from any location. BINAH.AI is one such telehealth application present in the commercial market. The video-based monitoring solutions from Binah.ai fulfill the promise of universal access to health and wellness services by providing businesses and consumers with a strong tool that is highly accessible and simple to use. The technology of Binah.ai [28] analyzes a signal acquired from exposed skin on the surface of a human face using a special combination of signal processing and AI technologies, along with a unique mathematical back-end.
Patient monitoring: The emotion-detection capability of AI can be combined with rPPG in hospital infrastructure such as appointment-scheduling software and waiting areas, where it can assist in giving priority to patients who are in urgent danger over others. AI methods can also assist in keeping track of a patient's health journey and tracking changes in their vital signs. Other applications [5] include pandemic control, deepfake detection, face anti-spoofing, neonatal monitoring, and fitness tracking.
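As a rough sketch of the hybrid idea, an LSTM mapping a noisy pulse window to a cleaner one, assuming TensorFlow/Keras and synthetic noisy-versus-clean training pairs standing in for the large synthetic dataset used in [4]:

```python
# Minimal sketch of LSTM-based rPPG signal filtering (hybrid DL stage).
# Assumptions (not from the chapter): TensorFlow/Keras; synthetic sine
# waves with added noise stand in for real rPPG training data.
import numpy as np
import tensorflow as tf

fs, seconds = 30, 8                        # 30 fps, 8-second windows
t = np.arange(fs * seconds) / fs

def make_pair(rng):
    hr = rng.uniform(0.8, 3.0)             # 48-180 bpm, expressed in Hz
    clean = np.sin(2 * np.pi * hr * t)
    noisy = clean + 0.5 * rng.standard_normal(t.size)
    return noisy, clean

rng = np.random.default_rng(0)
pairs = [make_pair(rng) for _ in range(512)]
X = np.stack([p[0] for p in pairs])[..., None]   # shape (batch, time, 1)
Y = np.stack([p[1] for p in pairs])[..., None]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(t.size, 1)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, Y, epochs=5, batch_size=32, verbose=0)

denoised = model.predict(X[:1])[0, :, 0]   # cleaner pulse for HR estimation
```

In the published hybrid pipelines, a model of this kind sits between signal extraction and heart-rate estimation, which is why it is framed here as a sequence-to-sequence filter rather than a direct HR regressor.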

3 Algorithmic Methods

Several rPPG methods for extracting the cardiac signal from videos have been developed. This section presents an overview of the methods produced by researchers for estimating heart rate. They fall into the following three broad categories.

3.1 Blind Source Separation (BSS) Method (PCA/ICA)

Generalizing time-series data into the frequency domain offers an alternate representation of the information; this representation supports signal interpretation, data filtering, and interpolation. BSS approaches can be separated into two categories: ICA-based methods [6] and PCA-based methods [7]. The most significant distinction between the two is the assumption made about the source signals: PCA assumes they are uncorrelated, whereas ICA assumes they are statistically independent. Following the BSS step, the pulse is selected as the factorized source signal with the highest frequency periodicity. As a consequence, these systems cannot handle situations where the motion is itself periodic and repetitive, such as when the subject is exercising in a gym. Using this method, the BVP signal can be recovered and decoded by splitting the mixed RGB signals into separate, independent signals.
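A minimal BSS sketch follows, assuming scikit-learn's FastICA applied to the mean RGB traces, with the pulse chosen as the component whose spectrum is most sharply peaked in the heart-rate band (one common selection heuristic; the chapter does not fix one):

```python
# Minimal sketch of ICA-based blind source separation for rPPG.
# Assumptions (not from the chapter): scikit-learn FastICA on an
# (n_frames, 3) array of mean R, G, B values per frame; fps is known.
import numpy as np
from sklearn.decomposition import FastICA

def ica_pulse(rgb_traces: np.ndarray, fps: float) -> np.ndarray:
    """rgb_traces: shape (n_frames, 3), mean ROI color per frame."""
    x = rgb_traces - rgb_traces.mean(axis=0)
    sources = FastICA(n_components=3, random_state=0).fit_transform(x)

    # Heuristic: keep the component with the strongest spectral peak
    # inside the plausible heart-rate band (0.7-4 Hz).
    freqs = np.fft.rfftfreq(len(sources), d=1.0 / fps)
    band = (freqs > 0.7) & (freqs < 4.0)
    peakiness = []
    for k in range(3):
        spec = np.abs(np.fft.rfft(sources[:, k])) ** 2
        peakiness.append(spec[band].max() / (spec[band].sum() + 1e-12))
    return sources[:, int(np.argmax(peakiness))]
```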

3.2 Model-Based Method (CHROM/BVP)

Unlike BSS-based algorithms, which assume nothing about the colors of the source signals, model-based approaches govern the demixing: they use knowledge of each component's color vector. Common to both solutions is the removal of c(t)'s dependence on the average skin reflection color, which includes both the intrinsic skin color and the light source. A further advantage of model-based approaches is motion tolerance. The model-based strategy encompasses the PBV and CHROM methodologies introduced by de Haan [5, 6].

3.2.1 Chrominance-Based Method (CHROM)

This method, proposed by de Haan and Jeanne [8], linearly mixes the chrominance signals under the assumption of a standardized skin color in order to white-balance the images.
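A minimal sketch of the CHROM projection follows, using the fixed chrominance combinations X = 3R_n - 2G_n and Y = 1.5R_n + G_n - 1.5B_n from [8]; the original's overlapping-window processing is omitted for brevity:

```python
# Minimal sketch of the CHROM method (no windowing, for brevity).
# Input: (n_frames, 3) mean R, G, B traces; `filt` is any band-pass
# filter callable (e.g., the Butterworth filter from the sketch above).
import numpy as np

def chrom_pulse(rgb_traces: np.ndarray, filt) -> np.ndarray:
    # Temporal normalization removes the average skin/illuminant color.
    rn, gn, bn = (rgb_traces / rgb_traces.mean(axis=0)).T
    x = 3.0 * rn - 2.0 * gn                  # chrominance signal X
    y = 1.5 * rn + gn - 1.5 * bn             # chrominance signal Y
    xf, yf = filt(x), filt(y)
    alpha = np.std(xf) / np.std(yf)          # alpha-tuning for motion robustness
    return xf - alpha * yf
```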

3.2.2 BVP Signature-Based Method

This approach is resistant to the effects of motion artifacts. It distinguishes between color fluctuations caused by the heartbeat and noise caused by motion artifacts, separating the two using the signature of the blood volume pulse at the different optical wavelengths [9]. Wenjing Wang also proposed the plane-orthogonal-to-skin (POS) method [10], a newer approach. POS is similar to CHROM but uses alternative priors to change the order in which the expected color distortions are reduced: the authors construct a plane orthogonal to the skin tone in a temporally normalized RGB space.
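Likewise, a minimal single-window sketch of POS follows; the projection plane and alpha-tuning step follow the published method [10], while the sliding-window processing is omitted for brevity:

```python
# Minimal sketch of the POS method (single window, for brevity).
# Input: (n_frames, 3) mean R, G, B traces from the facial ROI.
import numpy as np

def pos_pulse(rgb_traces: np.ndarray) -> np.ndarray:
    cn = rgb_traces / rgb_traces.mean(axis=0) - 1.0   # temporal normalization
    # Projection plane orthogonal to the skin tone:
    p = np.array([[0.0, 1.0, -1.0],
                  [-2.0, 1.0, 1.0]])
    s1, s2 = (p @ cn.T)                               # two projected signals
    h = s1 + (np.std(s1) / np.std(s2)) * s2           # alpha-tuned combination
    return h - h.mean()
```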

3.3 Design-Based Method

A novel technique called Spatial Subspace Rotation (2SR) was presented in [11]. The 2SR method treats RGB values as a spatial representation, and the temporal rotation of the skin-pixel subspace is measured to extract the BVP signal: throbbing blood creates differences in the RGB channels in the temporal domain, modifying the subspace of skin pixels. The way the red, green, and blue signals are combined to obtain the pulse signal is the fundamental difference between the various rPPG techniques. A comparison of a few rPPG algorithms is presented in Table 1. All the algorithms considered have implementations in both Python and MATLAB.

Table 1 Comparison of remote photoplethysmography algorithms along with limitations

1. Principal component analysis (PCA); authors: Cho et al. [7]; category: blind source separation.
Description: A generic signal-decomposition solution [26] that does not exploit the skin-reflection properties specific to the rPPG problem.
Limitation: PCA computes eigenvectors from the covariance of the RGB signals, so the amplitude variations of the noise and the pulse must be sufficiently distinct for the eigenvector directions to be determined.

2. Independent component analysis (ICA); authors: Dasari et al. [6]; category: blind source separation.
Description: Assuming the input signals are independent, this statistical approach decomposes [27] a multivariate signal into its component signals.
Limitation: The order of the independent components cannot be known; the columns of the mixing matrix can be permuted arbitrarily, and the precise amplitude and sign of the independent components are unknown.

3. Chrominance-based method (CHROM); authors: Haan et al. [8]; category: model-based method.
Description: Linearly combines the chrominance signals, using a predetermined skin tone to white-balance the images.
Limitation: CHROM has the downside of relying on a white-skin reference vector.

4. BVP signature-based method (PBV); authors: de Haan and van Leest [9]; category: model-based method.
Description: By restricting all color changes to the pulsatile direction, it directly recovers [26] the pulse from the pulsatile component.
Limitation: PBV is a single-step mechanism that relies on the blood-volume signature; precise knowledge of the cardiac signature of the blood volume is required.

5. Spatial Subspace Rotation (2SR); authors: Wang et al. [11]; category: design-based method.
Description: Builds a subject-dependent skin color space and measures the pulse by tracking the tone change through time, determining the instantaneous tone from the statistical distribution of the image's skin pixels.
Limitation: 2SR's subspace axes are entirely data-driven, with no physiological considerations; when spatial measurements are unreliable (i.e., when the skin mask is poorly selected or noisy), this causes performance issues in practice.

6. Plane Orthogonal to Skin (POS); authors: Artemyev et al. [26]; category: model-based method.
Description: Similar to CHROM, but uses alternative priors to change the order in which the expected color distortions are reduced.
Limitation: Works best in stationary scenarios.

4 Issues and Literature Review

4.1 PPG vs. rPPG

New issues have arisen in the transition from PPG to rPPG [12]. To begin with, the rPPG signal extracted from a video sequence has a significantly lower SNR than a contact PPG signal. To improve signal quality, PPG employs contact probes and specific light sources; PPG treats ambient light as noise, whereas rPPG treats it as the light source. Furthermore, rPPG uses a complementary metal-oxide-semiconductor (CMOS) sensor from a digital camera in place of the mono-color photodetector that was specifically designed for PPG, and there is a predetermined distance between the camera and the subject during the rPPG measurement. All these differences between PPG and rPPG make it difficult for rPPG to recover a high-quality signal. Second, the motion-artifact problem in rPPG is even more severe than in PPG. Ambient light does not have a spatially homogeneous intensity distribution in everyday surroundings, and a digital camera's CMOS sensor and optics do not have a uniform response across the whole field of view. When measuring a subject in motion, rPPG must adjust the size and location of the ROI; due to the resulting change in radiant flux, the response of the digital camera to the ROI is also affected. The rPPG signal is distorted because the small skin-color variation carrying the BVP information is dwarfed by global shading effects. Such distortions are termed motion artifacts [12]. The signal quality of rPPG is thus significantly harmed by motion artifacts and low SNR.

4.2 Factors Affecting rPPG Video Capturing

rPPG is suitable for both clinical and non-clinical applications, such as patient monitoring [13], neonatal monitoring, and sleep monitoring, all of which utilize the heart rate and its variability recorded by rPPG as markers. Despite its benefits for telemedicine services, lighting conditions [14], body motion, and camera parameters all contribute to an increase in the inaccuracy of the HRV obtained using rPPG. As a result, one of the primary problems is developing a strategy to reduce the impact of these elements in order to obtain precise rPPG measurements. Tohma et al. [13] conducted an experiment to examine the effect of these factors on rPPG signal quality, which are listed one by one below.

4.2.1 Effect of Light Source

Akito Tohma et al. [13] set up an experimental condition to study the effect of light illuminance on HRV measurement. The experiment was carried out by altering the light source’s intensity and direction. It had three circumstances in terms of light source direction: top, front, and front and top. The light illuminance was modified to four levels: 100, 300, 500, and 700 lux per light source direction. Results show that signal extraction is more accurate when lighted from the front than illuminated from the top or front. Changes in light source direction result in specular reflection changes on the skin surface, altering rPPG strengths in RGB components. Wang et al. [15] discovered that a single light source had a more excellent signal-to-noise ratio (S/N) than many light sources because of the varying color shifts in the facial area. Also, the estimated heart rate quality was better when 500–700 lux light was shed on the subject from the front. When the light level is low (100 lux), the accuracy


is reduced. In low-light conditions, the amplitude of the heartbeat pulse signal diminishes and the camera noise increases. Wang et al. [15] found that the signal-to-noise ratio (S/N) declines under low levels of light. Because it uses infrared light, near-infrared (NIR) remote photoplethysmography [16] is regarded as a viable solution to this problem.

4.2.2 Effect of Body Motion

Body motion significantly impacts HRV measurement, since it is hard to control the user's activity in realistic situations. Breathing and small face and lip movements are also part of the overall motion of the body. When a person's face moves a lot, the position of the light source relative to the skin and the camera shifts, impacting the quality of the extracted blood pulse. Small facial movements, such as lip motion during speech, affect the reflection of light on the skin; consequently, they appear in the arterial pulse as unwanted noise. Three conditions were used to assess the consequences of these bodily actions: fixation, non-fixation, and non-fixation while talking. According to the results of the experiments [13], there is a large motion aberration in the non-fixation and talking conditions. To use rPPG in application scenarios, the facial motion must be controlled, or an algorithm that compensates for facial motion must be used, such as the one proposed by Wang et al. [17].

4.2.3 Effect of Camera's Frame Rate

A typical camera's frame rate is 30 frames per second. Tohma et al. [13] examined the effect of frame rate on the extracted signal quality by considering four frame rates: 15, 30, 60, and 100 fps. The accuracy of the extracted signal was found to be good at frame rates of 30 fps and higher.

4.3 Effect of Video Compression

Video compression is one of the most important preprocessing phases for video processing systems in general, and for rPPG systems in particular. In rPPG, video compression is required in a variety of circumstances, for example: (1) processing the captured video in a remote system, such as sending a patient's video to the hospital for diagnosis; (2) storing the video data for future use. Because video data takes up large storage space and is difficult to download and upload, this is also one reason for the non-availability of large video training datasets. Video compression is useful in these instances because it reduces the quantity of video data needed for transmission and storage. But video compression also has a negative effect on rPPG: traditional video compression techniques do not tend to preserve the physiological features of facial skin during compression. Also, most remote


photoplethysmographic algorithms use uncompressed video data as input, so little attention has been paid to developing compression methods that conserve physiological features for rPPG. Zhao et al. [14] addressed this issue by proposing a video compression technique that preserves the physiological features of the skin while compressing. This technique allots more bits to the compression of the pulsatile area and fewer bits to the non-pulsatile skin regions. The findings [14] show that the suggested technique is effective in maintaining physiological signals in facial videos under typical light intensity and is unaffected by ROI size, shape, or subject count.
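As an illustration of how this compression penalty could be measured in practice, the sketch below re-encodes a face video at several H.264 CRF levels with ffmpeg (higher CRF means stronger compression) and extracts the mean green-channel trace from a fixed ROI. The file name face.mp4 and the ROI coordinates are placeholder assumptions, and the trace's standard deviation is only a crude proxy; a real evaluation would feed each trace into a full rPPG pipeline and compare SNR or HR error.

```python
# Hedged sketch: quantify how H.264 compression degrades the rPPG trace.
import subprocess
import cv2
import numpy as np

def green_trace(path, roi=(100, 100, 164, 164)):
    """Mean green value per frame inside a fixed (x0, y0, x1, y1) ROI."""
    x0, y0, x1, y1 = roi
    cap, vals = cv2.VideoCapture(path), []
    ok, frame = cap.read()
    while ok:
        vals.append(frame[y0:y1, x0:x1, 1].mean())  # OpenCV frames are BGR
        ok, frame = cap.read()
    cap.release()
    return np.asarray(vals)

for crf in (18, 28, 38):
    out = f"face_crf{crf}.mp4"
    subprocess.run(["ffmpeg", "-y", "-i", "face.mp4",
                    "-c:v", "libx264", "-crf", str(crf), out], check=True)
    # Crude proxy for remaining pulsatile content; compare across CRF levels.
    print(crf, green_trace(out).std())
```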

4.4 ROI Detection and Selection Problem

The effectiveness of HR measurement for rPPG is heavily influenced by ROI quality [18]. Hair or a beard, for example, can readily occlude some ROIs, leaving them without pulsatile information. Facial actions like blinking or talking can readily disrupt other ROIs. Some ROIs, such as the forehead and cheeks, are more sensitive to cardiac pulsations because of differences in the distribution of facial vasculature. A skewed angle between the face and the light source may also result in uneven lighting on the skin surface, as shown in [18]. It is therefore essential to identify high-quality facial ROIs to improve heart-rate measurement results. In the early days of rPPG research, the ROI was frequently defined as a box enclosing the entire facial skin region, or simply as a patch on the forehead. Several studies [12] identified the cheeks and forehead as the ROIs best suited to producing a strong pulse signal. In later work, the best ROIs are typically selected from smaller patches across the forehead or the entire face region, with a quality index defined for every patch to support ROI evaluation. Using the cross-correlation (CC) coefficients and SNR of pulse waveforms from two contiguous windows, Feng et al. [19] split a fixed ROI into non-overlapping patches. When developing EEMD-MCCA for optimal ROI identification, Song et al. [18] designed the framework to work with multiple observation sets described by optimal patch ROIs.
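A minimal sketch of this patch-based selection idea follows, reusing the rppg_snr helper from the earlier sketch: the face ROI is tiled into non-overlapping patches, each patch's mean green-channel trace is scored, and the top-scoring patches are kept. The patch size and top-k value are assumed, and Feng et al.'s actual quality index also includes cross-correlation terms that are not reproduced here.

```python
# Hedged sketch: rank face-ROI patches by pulse-signal quality.
import numpy as np
# rppg_snr is the SNR helper sketched earlier in this chapter.

def best_patches(frames, fs, hr_hz, patch=32, top_k=5):
    """frames: (T, H, W, 3) RGB crop of the face ROI.
    Returns the top_k (y, x) patch origins ranked by SNR score."""
    T, H, W, _ = frames.shape
    scores = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            trace = frames[:, y:y + patch, x:x + patch, 1].mean(axis=(1, 2))
            scores.append(((y, x), rppg_snr(trace, fs, hr_hz)))
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]
```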

4.5 Signal Processing Techniques Limitations

Despite its benefits for a variety of clinical and non-clinical services, the deviation in the HRV obtained using rPPG increases as a result of a variety of circumstances, including illumination conditions [20], body motion [17], and camera parameters. Traditional rPPG methods are not fully robust against one or more of these noise sources. As a result, one of the primary problems is developing a strategy to reduce the impact of these factors in order to obtain precise rPPG measurements.


1. The rPPG signal quality was improved using blind source separation (BSS) techniques [21]. Within BSS, independent component analysis (ICA) is one way to recover maximally uncorrelated source signals. However, long time-series signals are required for HR measurement, so good performance cannot be expected from short recordings.
2. In contrast to BSS, model-based techniques leverage background knowledge of the various color constituents. Diffuse and specular reflections are taken into account to model color changes in CHROM, a chrominance-based technique developed by de Haan and colleagues [8]. The method assumes a standardized skin tone and uses a linear combination of chrominance attributes to suppress specular reflection in the images.
3. Furthermore, de Haan et al. [9] presented the blood volume pulse (PBV) signature approach, which exploits the unique signature defined by hemoglobin's absorption spectrum and recovers the pulse by confining all color fluctuations to the pulsatile direction.

While these efforts are ongoing, scaling up from the laboratory to real-world contexts such as telemedicine remains a problem, because the impacts of body motion and lighting are difficult to eliminate entirely.
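Of these, CHROM is compact enough to sketch directly. Below is a minimal implementation of the chrominance projection of de Haan and Jeanne [8], assuming an (N, 3) array of spatially averaged skin-pixel RGB values per frame; the filter order and pulse band are illustrative choices, and the original method applies the projection in overlapping windows rather than over the whole trace.

```python
# Minimal CHROM sketch after de Haan & Jeanne [8].
import numpy as np
from scipy.signal import butter, filtfilt

def chrom_pulse(rgb, fs=30.0):
    """rgb: (N, 3) mean skin-pixel R, G, B per frame -> pulse signal."""
    rgb_n = rgb / rgb.mean(axis=0)          # temporal normalization
    r, g, b = rgb_n[:, 0], rgb_n[:, 1], rgb_n[:, 2]
    x = 3 * r - 2 * g                       # chrominance projections that
    y = 1.5 * r + g - 1.5 * b               # suppress specular reflection
    bb, aa = butter(3, [0.7, 4.0], btype="band", fs=fs)
    xf, yf = filtfilt(bb, aa, x), filtfilt(bb, aa, y)
    return xf - (xf.std() / yf.std()) * yf  # alpha-tuned combination

# Usage sketch: the dominant spectral peak of the pulse gives the HR estimate.
# s = chrom_pulse(rgb_trace, fs=30.0)
# f = np.fft.rfftfreq(len(s), 1 / 30.0)
# hr_bpm = 60 * f[np.abs(np.fft.rfft(s)).argmax()]
```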

4.6 Extracted Signal Noise Problem

The rPPG signal obtained as the output of the basic rPPG methods is still affected by different kinds of noise caused by body motion and uneven illumination, which degrades the quality of the heart-rate estimate, because the traditional methods are not fully robust against these noise sources. To tackle this problem, Song et al. [22] process the rough rPPG signal obtained from a conventional method with a generative adversarial network (GAN) to reduce the noise present in the signal, and then calculate the heart rate from the cleaned signal to obtain more accurate results.
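To make the adversarial-denoising idea concrete, the toy PyTorch sketch below trains a small 1-D convolutional generator to map rough rPPG windows to reference pulse waveforms, with a discriminator judging realism. The layer sizes, losses, and L1 weighting are illustrative assumptions and do not reproduce the actual PulseGAN architecture of [22].

```python
# Toy adversarial denoiser in the spirit of PulseGAN [22]; x and y are
# paired (batch, 1, T) tensors of rough rPPG and reference PPG windows.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a noisy rPPG window to a cleaned pulse waveform."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 15, padding=7), nn.ReLU(),
            nn.Conv1d(16, 16, 15, padding=7), nn.ReLU(),
            nn.Conv1d(16, 1, 15, padding=7))
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """Scores whether a waveform looks like a real reference pulse."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 15, stride=2, padding=7), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 32, 15, stride=2, padding=7), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 1))
    def forward(self, x):
        return self.net(x)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x, y):
    # Discriminator: real reference windows vs. generated ones.
    fake = G(x).detach()
    loss_d = bce(D(y), torch.ones(y.size(0), 1)) + \
             bce(D(fake), torch.zeros(x.size(0), 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool D, plus an L1 term pulling output toward the reference.
    fake = G(x)
    loss_g = bce(D(fake), torch.ones(x.size(0), 1)) + \
             10.0 * nn.functional.l1_loss(fake, y)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```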

5 Trends and Tools

rPPG is still an emerging technology with a promising future both in the healthcare field and in general-life situations such as employee health monitoring and drivers' heart-rate monitoring. As its technical aspects improve, rPPG can be used by health insurance businesses in various ways, for example to improve the process of stratifying high-risk individuals so that the insurer may better manage the insurance pool's risk. One such medical insurance web application present in the commercial market is BwellInsure. It uses data from medical bio-markers, obtained through consumers' rPPG-based medical diagnoses, to help them choose the best medical insurance. Because rPPG can run on readily available mobile devices, insurers can potentially expand their user base by gaining access to a diverse range of clients while keeping an acceptable level of risk. The underwriting process takes approximately 90 seconds to measure


the heart rate through rPPG, which is engaging and appealing to people. With the advancement of information and communication technology (ICT), several telemedicine services have sparked interest [13]. Furthermore, the COVID-19 pandemic has increased the need for telemedicine because there is a risk of infection in actual hospitals; this need can be met by the rPPG method. However, telemedicine requires health vitals to be measured with high accuracy, so it is essential to create rPPG algorithms that are resistant to diverse noise types, such as light-intensity variation and body motion, and that produce promising results in terms of accuracy. Few applications for commercial use are seen in cyberspace that measure heart rate and other health vitals by capturing a facial video; three such applications presently available are listed below:

• Wellfie [23]
• Covitor [24]
• BwellInsure [25]

The Wellfie and Covitor applications take as input the physical details of a person, such as height and weight, record a facial video of the person for 45 seconds, and output health vitals such as heart rate, oxygen saturation level, respiration rate, and Body Mass Index (BMI). BwellInsure is a health insurance application that provides users with an rPPG-based medical diagnosis for choosing the best medical cover by leveraging medical bio-marker data. As BwellInsure is not available to the general public without health insurance, only the Wellfie and Covitor applications are considered for analysis.

6 Study and Results

This section evaluates two of the above-listed tools present in cyberspace (Wellfie and Covitor) based on a self-conducted experiment on 50 people (both male and female) in the age group 18–60 years, to check each tool's robustness and accuracy. The hardware requirements of the experiment were a light source, a chair, a distance-measuring tape, a computer with a webcam or a mobile phone with a camera, a desk for the setup, and a medical pulse oximeter; the software requirements were the Wellfie and Covitor applications. Each person was seated at a distance of 0.5 m from the webcam (or smartphone), as shown in Fig. 2, and readings were taken using the online tool (Wellfie and Covitor, one at a time) and a pulse oximeter simultaneously. All the readings were noted to create the database, which was then evaluated to check the significance of the health-vital results produced by these online tools. Consent was taken from each subject for the use of their health information in this paper, as per the Informed Consent Form in Annexure I. As a measure of performance, the Mean Absolute Error (MAE) of both heart rate and oxygen saturation level was calculated for both applications, as shown in Tables 2 and 3.
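The MAE itself is straightforward to reproduce; a minimal sketch follows, in which the two arrays are illustrative placeholders rather than the study's actual 50-subject readings.

```python
# MAE of application readings against the pulse-oximeter reference,
# as reported in Tables 2 and 3. Values below are placeholders.
import numpy as np

hr_app = np.array([72, 80, 65, 91])   # heart rate from Wellfie or Covitor
hr_ref = np.array([70, 77, 68, 88])   # simultaneous pulse-oximeter reading
mae = np.mean(np.abs(hr_app - hr_ref))
print(f"MAE = {mae:.2f} bpm")
```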


Fig. 2 Experimental setup to record input for both the Wellfie and Covitor applications

Table 2 Mean Absolute Error (MAE) of heart rate measured by the Wellfie and Covitor websites, in comparison with the pulse oximeter

Application | MAE (heart rate, bpm)
Wellfie | 4
Covitor | 3.55

Table 3 Mean Absolute Error (MAE) of oxygen saturation level measured by the Wellfie and Covitor websites, in comparison with the pulse oximeter

Application | MAE (oxygen level, %)
Wellfie | 2.05
Covitor | 1.75

Result

The mean absolute error for heart rate in the Wellfie and Covitor applications is 4 bpm and 3.55 bpm, respectively; the mean absolute error for oxygen saturation level in Wellfie and Covitor is 2.05% and 1.75%, respectively. Although significant results are obtained from both applications, there are a few cases in which the applications are unable to detect the face and therefore cannot estimate the health vitals. These scenarios are listed below:

• When the subject is wearing a hat or a turban, it becomes difficult for the application to detect the face and produce results. The same problem occurs when the subject has a facial injury.
• A good-quality webcam or smartphone camera with adequate resolution is needed, because the picture quality must be high enough for the rPPG algorithm to measure the color variations on the skin surface.
• Good lighting conditions are a must for the measurement of health vitals, because it is difficult to get the applications to work in the evening or in a low-light environment.
• A large deviation from the standard pulse oximeter can be seen when measuring heart rate after a workout session or a run. An explanation for this is the periodic motion involved in exercise, which interferes with the periodic motion of the pulse.
• The distance between the camera and the subject is also a crucial parameter: during the study, it was difficult to detect a face when the smartphone was more than 0.5 m from the subject.

The above-listed exceptions observed during the study are summarized in Table 4.

Table 4 Outliers detected in heart rate estimation (in bpm) during the study

Scenario | Wellfie | Covitor | Standard device
Exercise | 94 | 92 | 102
Turban | NA | NA | 81
Hat | NA | NA | 83
Beard and mustache | 81 | 82 | 89
Distance >0.5 m | NA | NA | 86

As Table 4 shows, the Wellfie and Covitor applications were not able to detect health vitals when the subject was wearing a hat or a turban, or at a distance of more than 0.5 m, and there is a large deviation in the health-vital readings for the exercise and beard-and-mustache scenarios.

7 Conclusion

Every rPPG algorithm existing so far suffers from certain limitations, whether due to noise in the source signal, uneven illumination, body movements, distance from the camera, or other unavoidable circumstances such as a facial injury, a beard, or a turban. This paper studied the various issues present in rPPG technology and the challenges for heart-rate estimation, accompanying each issue with the current research addressing it. It also studied the effect of video compression on heart-rate estimation and presented the limitations of the existing rPPG algorithms. It then discussed the current technology trends in this field and evaluated the Wellfie and Covitor health-vital estimation websites present in cyberspace through a study on 50 people. The MAE for heart rate in the Wellfie and Covitor applications is 4 bpm and 3.55 bpm, respectively, and the MAE for oxygen saturation level is 2.05% and 1.75%, respectively. Both websites showed significant results, except when the subject was wearing a turban or hat, had a heavy beard and mustache, was more than 0.5 m from the camera, or was in low-light conditions; these cases can be treated as a research gap, and future work can focus on them to make such applications more realistic and efficient. The paper concludes that remote photoplethysmography is still an open research problem, because issues at each stage of the rPPG pipeline need to be resolved before medical-grade implementation. AI can be combined with the existing rPPG approaches to improve the overall estimated heart rate as well as the output


of each intermediate stage, since AI and DL methods provide better accuracy than the traditional rPPG methods. Possible future work is either the development of rPPG methods robust against different kinds of noise and signal deterioration, or further research to find the best-suited experimental conditions for obtaining the most significant results from the existing rPPG methods. In the future, rPPG has potential use for continuous patient monitoring in hospital emergency wards such as the ICU.

Acknowledgement The authors would like to thank Bhawna Sethi, Amit Kumar, and Deepali Sharma for their discussions on the topic.

Conflict of Interest The authors affirm that they have no known financial or interpersonal conflicts that could have appeared to influence the research presented in this study.

Annexure-I: Informed Consent Form

Participant's Name: _______________    Age: _________________

1. I understand that my participation in the study is voluntary and that I am free to withdraw at any time, without giving any reason and without my medical care or legal rights being affected. I agree to take part in the above study.  Yes / No
2. I agree not to restrict the use of any data or results that arise from this study, provided such use is only for scientific purpose(s).  Yes / No

Participant's Signature/Thumb impression:                    Date:

References

1. Hertzman, A. B. (1937, December). Photoelectric plethysmography of the fingers and toes in man. Proceedings of the Society for Experimental Biology and Medicine, 33, 529–534.
2. Wang, W., et al. (2017, July). Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7), 1479–1491.
3. Hu, M., Qian, F., Wang, X., He, L., Guo, D., & Ren, F. (2021). Robust heart rate estimation with spatial-temporal attention network from facial videos. IEEE Transactions on Cognitive and Developmental Systems. https://doi.org/10.1109/TCDS.2021.3131197
4. Bian, M., Peng, B., Wang, W., & Dong, J. (2019). An accurate LSTM based video heart rate estimation method. In Z. Lin, L. Wang, J. Yang, G. Shi, T. Tan, N. Zheng, X. Chen, & Y. Zhang (Eds.), Pattern recognition and computer vision (pp. 409–417). Springer.
5. Cheng, C. H., et al. (2021). Deep learning methods for remote heart rate measurement: A review and future research agenda. Sensors, 21(18), 6296.
6. Dasari, A., et al. (2021, June). Evaluation of biases in remote photoplethysmography methods. npj Digital Medicine, 4(1), 1–13.


7. Cho, D., et al. (2021, August). Reduction of motion artifacts from remote photoplethysmography using adaptive noise cancellation and modified HSI model. IEEE Access, 9, 122655–122667.
8. de Haan, G., & Jeanne, V. (2013, October). Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 60(10), 2878–2886.
9. de Haan, G., & van Leest, A. (2014, October). Improved motion robustness of remote-PPG by using the blood volume pulse signature. Physiological Measurement, 35(9), 1913–1922.
10. van der Kooij, K. M., & Naber, M. (2019, May). An open-source remote heart rate imaging method with practical apparatus and algorithms. Behavior Research Methods, 51, 2106–2119.
11. Wang, W., et al. (2016, September). A novel algorithm for remote photoplethysmography: Spatial subspace rotation. IEEE Transactions on Biomedical Engineering, 63(9), 1974–1984.
12. Po, L. M. (2017). Block-based adaptive ROI for remote photoplethysmography. Multimedia Tools and Applications, 77(6), 6503–6529.
13. Tohma, A., et al. (2021, December). Evaluation of remote photoplethysmography measurement conditions toward telemedicine applications. Sensors, 21(24), 8357.
14. Zhao, C., et al. (2019, June). Physiological signal preserving video compression for remote photoplethysmography. IEEE Sensors Journal, 19(12), 4537–4548.
15. Wang, W., den Brinker, A. C., Stuijk, S., & de Haan, G. (2017). Robust heart rate from fitness videos. Physiological Measurement, 38, 1023–1044.
16. van Gastel, M., Stuijk, S., & de Haan, G. (2015). Motion robust remote-PPG in infrared. IEEE Transactions on Biomedical Engineering, 62, 1425–1433.
17. Wang, W., Stuijk, S., & de Haan, G. (2015). Exploiting spatial redundancy of image sensor for motion robust rPPG. IEEE Transactions on Biomedical Engineering, 62, 415–425.
18. Song, R., et al. (2021, March). Remote photoplethysmography with an EEMD-MCCA method robust against spatially uneven illuminations. IEEE Sensors Journal, 21(12), 13484–13494.
19. Feng, L., et al. (2015, April). Dynamic ROI based on K-means for remote photoplethysmography. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 1310–1314).
20. Papageorgiou, A., & de Haan, G. (2014, August 31). Adaptive gain tuning for robust remote pulse rate monitoring under changing light conditions. Master's thesis, Eindhoven University of Technology.
21. Poh, M.-Z., McDuff, D. J., & Picard, R. W. (2011). Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transactions on Biomedical Engineering, 58(1), 7–11.
22. Song, R., et al. (2021, January). PulseGAN: Learning to generate realistic pulse waveforms in remote photoplethysmography. IEEE Journal of Biomedical and Health Informatics, 25(5), 1373–1384.
23. AI-Based Wellness Selfie | Get your Key Health Vitals Anywhere, Anytime (wellfie.in). Last accessed on 20 April 2022.
24. AI-Based Wellness Selfie | Get your Key Health Vitals Anywhere, Anytime (covitor.ai). Last accessed on 20 April 2022.
25. https://bwellinsure.com. Last accessed on 20 April 2022.
26. Artemyev, M., et al. (2020). Robust algorithm for remote photoplethysmography in realistic conditions. Digital Signal Processing, 104, 102737.
27. Macwan, R., & Benezeth, Y. (2018). Heart rate estimation using remote photoplethysmography with multi-objective optimization. Biomedical Signal Processing and Control, 49, 24–33.
28. https://www.binah.ai/company/. Last accessed on 23 Sept 2022.